podcasts

▸ 13 episodes · updated 3m ago

6 channels tracked

all · Latent Space 91 · Dwarkesh Patel 62 · 最佳拍档 (BestPartners) 49 · TheValley101 (硅谷101) 37 · Lex Fridman (YouTube RSS) 15 · Dwarkesh Patel 14


▸ Dwarkesh Patel · 13 episodes

2026-07-01 · Wed

Dwarkesh Patel · rss · EN · 22:13 · 07·01

→Dwarkesh Podcast Announces Winners of AI Essay Contest: Biosecurity, Growth Policy, and Business Models

Dwarkesh Podcast announced the winners of its 'Big Questions About AI' essay contest. First place Jassi Pannu argues the OpenAI Foundation should spend tens of billions on physical infrastructure (e.g., far-UVC lamps) to end airborne pathogen transmission, yielding both everyday health benefits and pandemic tail-risk reduction. Second place Ege Erdil advises countries outside the AI supply chain to stick with strong property rights, low capital taxes, and open regulation—policies that will drive even larger growth differentials in an AI-driven world. Third place Michael Li draws an analogy to Hong Kong's MTR: AI labs' core product burns CapEx, but they can profit by buying complementary assets (like adjacent real estate). The post does not disclose prize amounts or judging details.

#Dwarkesh Podcast #Jassi Pannu #Johns Hopkins University

editor take

Dwarkesh essay contest winners: first place argues OpenAI Foundation should spend tens of billions to end airborne disease.

HKR breakdown

hook ✗ · knowledge ✓ · resonance ✗

SCORE

H0·K1·R0

2026-06-30 · Tue

FEATURED · Dwarkesh Patel · rss · EN · 15:53 · 06·30

→Grant Sanderson on AI and math: IMO gold isn't AGI, but math will be the first field to see superintelligence

Grant Sanderson told Dwarkesh why IMO gold didn't turn out to be AGI. Geometry problems get brute-forced in 19 seconds, but combinatorics still trips the models up—the capability frontier is spiky. He pointed out that verifying a conceptual breakthrough can take a century, and even an AI proof of the Riemann hypothesis might be incomprehensible to humans. There's a big overhang in connecting ideas already in the literature, but real-world tasks don't fit neatly into RL environments, and good writing still requires a theory of mind that AI lacks. His advice for students: learning will keep depending on human curation.

#Reasoning #Grant Sanderson #3Blue1Brown #Dwarkesh Patel

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Grant Sanderson: AI math is spiky—geometry brute-forced in 19s, combinatorics still trips it up.

sharp

This one's worth clicking because Grant Sanderson gets concrete about why IMO gold didn't mean AGI. At the 2024 IMO, the geometry problem got brute-forced in 19 seconds by DeepMind's AlphaGeometry 2, basically a search engine over synthetic proofs. But that year's test happened to have two combinatorics problems, the playful puzzle-type ones, and the models choked on both, missing gold by a hair. His point: even within math, the capability frontier is spiky. Some subfields yield to compute; others need conceptual leaps that current systems can't make. He also raises something I rarely hear: an AI proof of the Riemann hypothesis might be incomprehensible to humans, with a verification cycle stretching a century. That's a sharper framing than the usual "AI will replace mathematicians" hand-waving. The bit about the overhang from connecting ideas already in the literature tracks with what a lot of agent-based literature review tools are trying to do right now. His advice for students (learning will keep depending on human curation) is grounded. What's missing: he doesn't unpack exactly what "theory of mind for good writing" means for AI, but the conversation is still tighter than most podcast interviews.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

2026-06-26 · Fri

FEATURED · Dwarkesh Patel · rss · EN · 15:51 · 06·26

→The next big breakthrough will be AIs learning on the job

Dwarkesh Patel argues the labs' current RL-heavy bet—training AIs on millions of verifiable tasks—hits an underrated wall: a domain must be not just verifiable but also grindable, meaning you can run many parallel rollouts in a deterministic, replayable simulator. He uses computer use as a case study: ordering on Etsy is verifiable, but you can't spin up 1,000 agents to hammer the same Amazon checkout without getting banned. That's why computer use lags behind coding and math. The post doesn't offer a fix, but notes that if AIs get good enough to code high-fidelity app clones themselves, the grindability bottleneck could dissolve.

#Agent #Dwarkesh Patel

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Dwarkesh flags an underrated bottleneck: a task must be grindable, not just verifiable, which explains why computer use lags far behind coding.

sharp

I'd open this because Dwarkesh puts the labs' current RL bet under a clear lens. The pitch is: train AIs on millions of verifiable tasks across diverse environments, and you get general problem-solving. His pushback is that verifiability isn't enough—you also need grindability: a deterministic, replayable simulator where you can run tons of parallel rollouts. The computer-use example makes it concrete. Ordering on Etsy is verifiable, but you can't spin up 1,000 agents to hammer the same Amazon checkout without getting banned. That's why computer use lags behind coding and math—code has reproducible test suites, math has formal verifiers, but real websites don't offer that sandbox. He doesn't offer a fix, but points to one interesting escape hatch: if AIs get good enough to code high-fidelity app clones themselves, the grindability bottleneck could dissolve. That's still speculative, but the framing is worth tracking.
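A toy way to see the verifiable-versus-grindable distinction: if an environment is a deterministic function of a seed plus the agent's actions, you can clone it per rollout, run many rollouts in parallel, score each with a verifier, and replay any success exactly. The sketch below is hypothetical throughout: `ToyCheckoutEnv`, its reward, and the random "policy" are invented for illustration and are not from the post.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class ToyCheckoutEnv:
    """Hypothetical deterministic environment: the same seed always yields the
    same target item, so any episode can be replayed exactly."""

    def __init__(self, seed: int):
        self.target = random.Random(seed).randrange(10)

    def verify(self, actions: list[int]) -> float:
        # Verifiable reward: 1.0 if the final action "orders" the target item.
        return 1.0 if actions and actions[-1] == self.target else 0.0


def rollout(env_seed: int, policy_seed: int, horizon: int = 3) -> tuple[int, float]:
    env = ToyCheckoutEnv(env_seed)        # a fresh, identical clone per rollout
    rng = random.Random(policy_seed)      # stand-in for a stochastic policy
    actions = [rng.randrange(10) for _ in range(horizon)]
    return policy_seed, env.verify(actions)


def grind(env_seed: int, n_rollouts: int) -> list[tuple[int, float]]:
    # "Grindable": hammer clones of the same state in parallel, nobody gets banned.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(lambda s: rollout(env_seed, s), range(n_rollouts)))


results = grind(env_seed=42, n_rollouts=1000)
successes = [seed for seed, reward in results if reward == 1.0]
print(f"{len(successes)} of {len(results)} rollouts passed the verifier")

# Determinism means every successful rollout can be replayed bit-for-bit.
assert all(rollout(42, s)[1] == 1.0 for s in successes)
```

The contrast is the point: a live checkout page can't be cloned per rollout, so there is nothing for `grind` to call, even though the final order is still easy to verify.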

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

2026-06-08 · Mon

FEATURED · Dwarkesh Patel · rss · EN · 18:09 · 06·08

→The sample efficiency black hole: AI models need far more data than humans to learn

Dwarkesh Patel argues that recent AI progress comes from more and better data, not better sample efficiency. RL is framed as synthetic data generation: spend compute to find good rollouts, then train the model to predict them. Each skill requires hundreds of human experts writing examples and rubrics, fueling a data-labeling industry earning billions annually. A human sees ~200M tokens by adulthood; frontier models train on tens to hundreds of trillions, a gap of roughly five to six orders of magnitude. A person learns to teleoperate a robot in hours, while self-driving models need 3–4 orders of magnitude more data than a teen learning to drive. Open models lag closed ones by only 4 months because data is easy to distill from public APIs, unlike architecture tricks. The post does not propose a fix for sample efficiency.
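The "RL as synthetic data generation" framing can be sketched as a filter: sample many candidate rollouts, keep the ones a verifier accepts, and reuse the survivors as ordinary next-token-prediction targets. This is a minimal sketch in the style of rejection-sampling fine-tuning, not a description of any lab's pipeline; `sample_rollout` and the addition "task" are hypothetical stand-ins for a model and a verifiable environment.

```python
import random


def sample_rollout(rng: random.Random, a: int, b: int) -> str:
    """Hypothetical noisy 'model': usually right, sometimes off by one."""
    guess = a + b + rng.choice([0, 0, 0, 1, -1])
    return f"{a}+{b}={guess}"


def verifier(a: int, b: int, rollout: str) -> bool:
    # Verifiable reward: the final answer is checked exactly.
    return rollout == f"{a}+{b}={a + b}"


def build_sft_dataset(problems, samples_per_problem: int, seed: int = 0):
    """Spend sampling compute, keep only verified rollouts.
    Returns (prompt, completion) pairs plus the number of samples spent."""
    rng = random.Random(seed)
    dataset, spent = [], 0
    for a, b in problems:
        for _ in range(samples_per_problem):
            spent += 1
            rollout = sample_rollout(rng, a, b)
            if verifier(a, b, rollout):
                dataset.append((f"{a}+{b}=", rollout.split("=", 1)[1]))
                break  # one verified rollout per problem is enough here
    return dataset, spent


problems = [(3, 4), (10, 25), (7, 8)]
data, spent = build_sft_dataset(problems, samples_per_problem=16)
print(f"kept {len(data)} examples after {spent} samples: {data}")
```

The compute goes into `spent`; what the model ultimately learns from is just `data`, which is why this view treats RL progress as a data story.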

#Dwarkesh Patel #Mercor #Epoch AI

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Dwarkesh reframes RL as compute-heavy data filtering, arguing data volume—not algorithmic elegance—drove recent AI gains.

sharp

This piece clicks because it connects a few scattered observations into one clean thesis: models got better mainly by eating more and better data, not by learning more efficiently. Dwarkesh reframes RL as a synthetic data pipeline: spend compute to find good rollouts, then train the model to predict them, the same logic as next-token prediction in pretraining. Two comparisons make the gap concrete. A human sees ~200M tokens by adulthood, while frontier models train on tens to hundreds of trillions, up to a million-fold difference. Learning to teleoperate a robot takes a person hours; self-driving models need 3–4 orders of magnitude more data than a teen learning to drive. He offers an explanation I buy: open models lag closed ones by only 4 months because data is easy to distill from public APIs, while architecture tricks and training recipes aren't. If algorithmic efficiency were the main driver, that gap would be wider. The post doesn't propose a fix; it ends on the "data black hole" metaphor. I'd read it as a diagnosis, not a roadmap.
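How big the gap is depends on which end of "tens to hundreds of trillions" you pick. A quick check, where the 20T and 200T token counts are illustrative assumptions inside the post's range rather than figures from it:

```python
import math

HUMAN_TOKENS = 200e6  # ~200M tokens seen by adulthood, per the post


def data_gap(model_tokens: float, human_tokens: float = HUMAN_TOKENS) -> float:
    """How many times more tokens a model trains on than a human sees."""
    return model_tokens / human_tokens


# Illustrative pretraining sizes inside "tens to hundreds of trillions" (assumed).
for model_tokens in (20e12, 200e12):
    gap = data_gap(model_tokens)
    print(f"{model_tokens / 1e12:.0f}T tokens -> {gap:,.0f}x "
          f"(~{math.log10(gap):.0f} orders of magnitude)")
# 20T tokens -> 100,000x (~5 orders of magnitude)
# 200T tokens -> 1,000,000x (~6 orders of magnitude)
```

So "million-fold" is the high end of the range; at tens of trillions of tokens the gap is closer to five orders of magnitude.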

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

2026-06-04 · Thu

FEATURED · Dwarkesh Patel · rss · EN · 16:14 · 06·04

→Alex Imas and Phil Trammell – What Remains Scarce After AGI?

Dwarkesh Patel interviewed Alex Imas and Phil Trammell on seven AGI economics topics, including capital share, AI wealth taxation, redistribution, demand collapse, developing countries, and what remains scarce after automation. The transcript names human-in-the-loop relational services as a scarcity candidate, but the post does not disclose quantitative forecasts for wages, labor share, or inequality.

#Dwarkesh Patel #Alex Imas #Phil Trammell #Commentary

why featured

Featured · importance 76 · hook + knowledge + resonance

editor take

AGI economics keeps circling jobs; this episode drags scarcity to the uglier question: who still gets paid for being human.

sharp

The useful claim here is not “which jobs survive AGI.” It is that value flows to preference targets that automation cannot copy. The concrete hook is clean: one robot can become many robots next year, while the number of ballerinas stays fixed. The transcript also names seven AGI-econ buckets, among them capital share, AI wealth taxes, redistribution, demand collapse, developing countries, and human-in-the-loop services. I buy the frame, not the confidence around it. Human baristas, dancers, therapists, and relationship labor do look like scarce goods if people pay for the human label. But the post gives no quantitative forecast for wages, labor share, tax rates, or inequality. Compared with the agent-workflow story dominating AI products, this pushes labor value back into identity and taste. The missing number is GDP scale: luxury scarcity is real, but it does not automatically absorb a displaced labor market.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

2026-05-22 · Fri

FEATURED · Dwarkesh Patel · rss · EN · 15:38 · 05·22

→Reiner Pope – Chip Design from the Bottom Up

Dwarkesh Patel interviews MatX CEO Reiner Pope on chip design, starting with a 4-bit multiply and 8-bit accumulate example that uses 16 AND gates, then covering systolic arrays, pipeline registers, FPGAs versus ASICs, cache versus scratchpad, and why GPU cores are smaller than CPU cores.
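For readers who haven't seen a systolic array, here is a cycle-by-cycle simulation of the output-stationary variant, one of several dataflows (weight-stationary designs such as the original TPU's are another). It is a sketch for intuition and makes no claim about MatX's design: each processing element (PE) keeps one output accumulator, values of A move one PE right per cycle, values of B move one PE down per cycle, and the inputs are skewed so that A[i][k] and B[k][j] meet at PE (i, j) on cycle i + j + k.

```python
def systolic_matmul(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    """Simulate an M x N output-stationary systolic array computing C = A @ B."""
    M, K, N = len(A), len(A[0]), len(B[0])
    assert len(B) == K, "inner dimensions must match"
    a_reg = [[0] * N for _ in range(M)]   # A operand currently held in PE (i, j)
    b_reg = [[0] * N for _ in range(M)]   # B operand currently held in PE (i, j)
    C = [[0] * N for _ in range(M)]       # stationary accumulators, one per PE
    for t in range(M + N + K - 2):        # last useful cycle is (M-1)+(N-1)+(K-1)
        new_a = [[0] * N for _ in range(M)]
        new_b = [[0] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                if j == 0:                # left edge: row i of A, delayed by i cycles
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < K else 0
                else:                     # otherwise take the left neighbour's value
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:                # top edge: column j of B, delayed by j cycles
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < K else 0
                else:                     # otherwise take the upper neighbour's value
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b       # every pipeline register updates on one clock edge
        for i in range(M):
            for j in range(N):
                C[i][j] += a_reg[i][j] * b_reg[i][j]   # one MAC per PE per cycle
    return C


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]
```

Each operand is fetched from memory once at the array's edge and then reused as it marches through neighbouring PEs, which is the data-movement saving the episode keeps returning to.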

#Inference-opt #Reiner Pope #MatX #Dwarkesh Patel

why featured

Featured · importance 74 · hook + knowledge + resonance

editor take

Dwarkesh makes MatX’s pitch through a 4-bit MAC lesson; AI chip talk finally moves from H100 procurement to data movement cost.

sharp

The useful move here is forcing AI chip hype back down to circuit-level constraints. Pope starts with a 4-bit multiply, 8-bit accumulate, and 16 AND gates, then walks into systolic arrays, pipeline registers, FPGA versus ASIC, and cache versus scratchpad. The hook is plain: matrix multiply is cheap to describe; moving data and scheduling it are where designs bleed. Dwarkesh discloses he is an early MatX investor, so don’t treat this as neutral education. I actually like the honesty. MatX’s pitch smells less like “GPU killer” theater and more like a TPU-style bet on specialization, scratchpad discipline, and compiler co-design for inference. Nvidia’s moat still sits in CUDA, supply, and deployment muscle, not in the romance of one MAC unit.
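The 16-AND-gate count follows from the schoolbook binary multiplier: each of the 4 bits of one operand is ANDed with each of the 4 bits of the other, giving 16 partial-product bits that adders then sum. A bit-level Python sketch, assuming unsigned operands and an 8-bit accumulator that wraps on overflow (the episode summary doesn't say whether Pope's example wraps or saturates):

```python
def mul4_with_and_gates(a: int, b: int) -> tuple[int, int]:
    """Unsigned 4-bit x 4-bit multiply built from AND-gate partial products.
    Returns (product, number_of_AND_gates_used)."""
    assert 0 <= a < 16 and 0 <= b < 16
    product, and_gates = 0, 0
    for i in range(4):                    # bit i of b gates a shifted copy of a
        b_i = (b >> i) & 1
        row = 0
        for j in range(4):
            a_j = (a >> j) & 1
            row |= (a_j & b_i) << j       # one AND gate per (i, j) pair
            and_gates += 1
        product += row << i               # the adder tree sums the shifted rows
    return product, and_gates


def mac_4x4_acc8(acc: int, a: int, b: int) -> int:
    """Multiply-accumulate acc + a*b in an 8-bit register (wraps mod 256, assumed)."""
    product, _ = mul4_with_and_gates(a, b)
    return (acc + product) & 0xFF


print(mul4_with_and_gates(13, 11))   # (143, 16)
```

The largest 4-bit product, 15 × 15 = 225, already fits in 8 bits, so the accumulator's headroom for summing many products is small; accumulator width is a design choice in its own right.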

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

2026-05-16 · Sat

FEATURED · Dwarkesh Patel · rss · EN · 19:04 · 05·16

→The mistake of conflating intelligence and power

Dwarkesh Patel argues that intelligence and power are being conflated: current AI systems improve through economically valuable tasks such as coding, while real-world power depends more on authority, trust, and large-scale cooperation than isolated strategic reasoning.

#Reasoning #Alignment #Dwarkesh Patel #Donald Trump

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Dwarkesh lands the cut: stop extrapolating SWE-bench cleverness into Stalin-grade political power.

sharp

Dwarkesh’s sharp move is forcing the AI-safety definition of intelligence into an ugly corner. If intelligence means “achieving goals across domains,” the article says Donald Trump, Xi Jinping, Vladimir Putin, and Stalin outrank the physicists. Their power comes from legitimacy, trust, and hundreds of millions of people coordinating around institutions, not isolated reasoning horsepower. That pushback hits the current agent narrative hard. Models are improving through coding, tool use, and economically valuable tasks. That path makes automated firms nastier competitors; it does not automatically create a lone digital mind that captures authority through clever strategy. If a threat model skips institutions, distribution, and authorization, it starts looking less like political economy and more like a Diplomacy board.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · Dwarkesh Patel · rss · EN · 19:01 · 05·16

→Notes on Pretraining Parallelisms and Failed Training Runs

Dwarkesh documents pretraining failure modes and parallelism tradeoffs: expert choice and token dropping can break causality in MoE routing, FP16 collectives can bias repeated additions after values exceed 1024, pretraining FLOPs are given as 6ND, B300 HBM is listed as 288GB, and FSDP communication can reach params × 3 with reduce-scatter.
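The expert-choice causality leak is easy to reproduce in miniature. In expert-choice routing each expert picks its top-`capacity` tokens from the whole sequence, so whether token 0 gets an expert can depend on tokens that come after it, information an autoregressive model won't have at inference time. A toy sketch with made-up router scores:

```python
def expert_choice_route(scores: list[list[float]], capacity: int) -> dict[int, list[int]]:
    """scores[t][e] is the router affinity of token t for expert e.
    Each expert independently keeps its `capacity` highest-scoring tokens."""
    n_tokens, n_experts = len(scores), len(scores[0])
    assignment: dict[int, list[int]] = {t: [] for t in range(n_tokens)}
    for e in range(n_experts):
        ranked = sorted(range(n_tokens), key=lambda t: scores[t][e], reverse=True)
        for t in ranked[:capacity]:
            assignment[t].append(e)
    return assignment


# Made-up router scores for 2 experts.
token0 = [0.6, 0.4]
token1 = [0.9, 0.1]   # a later token that strongly prefers expert 0

prefix_only = expert_choice_route([token0], capacity=1)
full_seq = expert_choice_route([token0, token1], capacity=1)
print("token 0, prefix only: ", prefix_only[0])   # [0, 1]
print("token 0, full sequence:", full_seq[0])     # [1]
```

Token 0's routing changed only because token 1 was appended. Token-choice routing, where each token picks its own experts, avoids this particular dependence, though capacity-based token dropping can reintroduce a cross-token one.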

#Fine-tuning #Inference-opt #Benchmarking #Dwarkesh

why featured

Featured · importance 82 · hook + knowledge + resonance

editor take

Dwarkesh’s note reads like a pretraining incident log: FLOPs are the easy part; causality leaks and numeric bias burn clusters quietly.

sharp

Pretraining failure is not mysticism; tiny engineering choices get amplified at cluster scale. Dwarkesh’s concrete hook is brutal: expert choice can make token n’s expert assignment depend on token n+k, and token dropping can let later tokens crowd out earlier ones. That is training-time information leakage that inference never gets. The FP16 collectives example is even uglier: after an accumulator passes 1024, adding 1 can round back to 1024, so 10,000 additions can land 10x wrong. Outside chatter still fixates on 6ND FLOPs, B300’s 288GB HBM, or FSDP traffic at parameters × 3. This note is a reminder that frontier training advantage includes boring competence: avoid dumb numerical bugs, then find the ones you still shipped.
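A quick way to watch a running sum stall, using Python's standard `struct` module to round every partial sum to IEEE 754 half precision (round-to-nearest-even). Where the sum stops moving depends on the increment: FP16 spacing is 0.5 between 512 and 1024, 1 between 1024 and 2048, and 2 above 2048, so +0.5 steps stall at 1024 while +1 steps stall at 2048. This sketch rounds after each scalar addition; a real collective's error also depends on its reduction order and tree shape.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_running_sum(increment: float, steps: int) -> float:
    """Add `increment` `steps` times, rounding the accumulator to FP16 after every add."""
    acc = 0.0
    inc = to_fp16(increment)
    for _ in range(steps):
        acc = to_fp16(acc + inc)
    return acc


for inc in (1.0, 0.5):
    got = fp16_running_sum(inc, 10_000)
    print(f"+{inc} x 10,000: exact {inc * 10_000:.0f}, FP16 running sum {got:.0f}")
# +1.0 x 10,000: exact 10000, FP16 running sum 2048
# +0.5 x 10,000: exact 5000, FP16 running sum 1024
```

Either way the answer is several-fold too small, and nothing crashes, which is what makes this class of bug expensive.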

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · Dwarkesh Patel · rss · EN · 19:00 · 05·16

→RLVR might be disproportionately bad at science

Dwarkesh argues that RLVR fits scientific discovery poorly, using heliocentrism’s 1543–1838 verification gap and Mercury’s 43-arcsecond-per-century precession as examples of long, ambiguous theory-evaluation loops.

#Reasoning #Alignment #Dwarkesh #Michael Nielsen

why featured

Featured · importance 82 · hook + knowledge + resonance

editor take

Dwarkesh hits RLVR where it hurts: science is not LeetCode; the reward can arrive centuries late and still favor the wrong theory.

sharp

RLVR breaks on scientific discovery because the reward is often late, noisy, and historically misleading. Dwarkesh’s examples are brutal: heliocentrism was published in 1543, but stellar parallax was not measured until 1838; Mercury’s extra 43 arcseconds per century pointed Newtonians toward Vulcan, then Einstein explained the anomaly with general relativity in 1915. That should make AI-research-booster claims sound less automatic. Code and math give dense feedback through tests, proof checkers, and SWE-bench-style evals. Science often runs on judgment, instrument availability, unification taste, and decades of ambiguous evidence. I don’t buy the straight line from “RLVR works on verifiable tasks” to “models will be unusually good scientists.” RLVR's payoff lands first in simulatable, automatable, short-loop research, not in theory choice.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

2026-05-15 · Fri

● P1 · Dwarkesh Patel · rss · EN · 16:04 · 05·15

→Eric Jang Rebuilds AlphaGo from Scratch with Modern Tools

Eric Jang explains how to build AlphaGo from scratch with modern AI tools, comparing MCTS training targets with credit assignment in LLM reinforcement learning over 100k+ token trajectories.

#Reasoning #Agent #Code #Eric Jang

why featured

Featured · importance 88 · hook + knowledge + resonance

editor take

Eric Jang rebuilt AlphaGo from scratch with modern tools. The real insight isn't the rebuild — it's his side-by-side comparison of why MCTS-style RL works for Go but breaks for LLMs, and what that ...

sharp

Eric Jang walked through his from-scratch AlphaGo rebuild on Dwarkesh's podcast. Both sources are Dwarkesh's own content (article plus YouTube), so there's no independent angle here — but the material is Jang's firsthand technical explanation, not a secondhand summary. His core comparison is sharp: AlphaGo uses Monte Carlo Tree Search for self-play, where every move gets a clear "this is better than that" training signal. LLM RL training, by contrast, has to deal with trajectories of 100k+ tokens, and the model has to guess which specific action earned the reward. That's the credit assignment problem, and Jang argues human learning looks more like the former. Current LLM RL is stuck with the latter's inefficiency. He also touched on using LLMs for automated AI research — implementing experiments and tuning hyperparameters works decently, but picking the right research question and escaping dead ends still doesn't. That connects directly to the intelligence explosion debate. I'd treat the automation section as personal experience rather than a systematic evaluation, since he only ran this on one project.
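The contrast Jang draws shows up in how each setup builds its training targets. The sketch assumes the AlphaGo Zero / AlphaZero style of target, where the policy is trained toward normalized MCTS visit counts; the original AlphaGo used a different pipeline, and the summary doesn't say which variant Jang rebuilt. The LLM side is a plain outcome-reward setup in which one scalar advantage is copied onto every token. Both functions are illustrative, not Jang's code.

```python
def mcts_policy_target(visit_counts: list[int], temperature: float = 1.0) -> list[float]:
    """AlphaZero-style target: pi(a) proportional to N(a) ** (1 / temperature).
    Every position in a self-play game gets its own full distribution over moves."""
    weights = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(weights)
    return [w / total for w in weights]


def outcome_advantages(num_tokens: int, reward: float, baseline: float) -> list[float]:
    """Outcome-only RL on a long trajectory: one scalar (reward - baseline) is
    broadcast to every token, so the update can't tell which tokens mattered."""
    return [reward - baseline] * num_tokens


# One Go position: search visited three candidate moves 10, 30 and 60 times.
print(mcts_policy_target([10, 30, 60]))       # [0.1, 0.3, 0.6]

# One 100k-token LLM rollout that passed its check, against a 0.5 baseline.
adv = outcome_advantages(100_000, reward=1.0, baseline=0.5)
print(len(adv), set(adv))                     # 100000 {0.5}
```

Per move, the Go learner gets a whole distribution as supervision; the LLM learner gets one number spread across 100,000 decisions. That asymmetry is the credit assignment problem in miniature.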

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

2026-04-29 · Wed

FEATURED · Dwarkesh Patel · rss · EN · 17:07 · 04·29

→Reiner Pope: The Math Behind How LLMs Are Trained and Served

Dwarkesh interviewed Reiner Pope in a single-session blackboard lecture on LLM training and serving. The post lists 7 timestamps covering batch size, MoE rack layout, pipeline parallelism, KV cache, and API pricing. The key mechanism is cost: without batching, serving economics can be 1,000x worse.

#Inference-opt #Reasoning #Dwarkesh Patel #Reiner Pope

why featured

Featured · importance 77 · hook + knowledge + resonance

editor take

This is more useful than another model launch: a 1,000x serving-cost swing explains why fast modes, batching, and long-context pricing are product politics.

sharp

Dwarkesh’s best move here is turning frontier-model mystique into a serving ledger. Reiner Pope works through batch size, MoE rack layout, pipeline parallelism, KV cache, and API prices to build up the cost of inference. The sharp number is brutal: skipping batching can make serving economics 1,000x worse. That single mechanism explains why Claude, Codex, and Cursor keep bending fast modes around latency, price, and queueing. I think 2026 AI discourse over-indexes on intelligence jumps and under-indexes on per-token margin. This lecture flips the order: compute throughput first, memory pressure second, product shape third. Dwarkesh discloses he is an angel investor in MatX, so the chip-startup angle is not neutral. Still, the equations are harder to PR-wash than another vendor benchmark.
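The batching effect falls out of a back-of-envelope roofline model of decoding. Each decode step streams every weight from HBM once no matter how many sequences share it, while compute grows with batch size, so cost per token falls roughly as 1/batch until the step becomes compute-bound. Hedged sketch: the hardware numbers approximate an H100 SXM's published dense BF16 throughput and HBM bandwidth, the 70B dense model is hypothetical, and KV-cache reads, attention FLOPs, and MoE routing are ignored, so the printed multiple is illustrative and not the lecture's 1,000x figure.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    flops_per_s: float       # peak dense matmul throughput
    hbm_bytes_per_s: float   # memory bandwidth


@dataclass
class DenseModel:
    params: float
    bytes_per_param: float


def decode_step_seconds(model: DenseModel, hw: Accelerator, batch: int) -> float:
    """One decode step: weights are read once; each sequence needs ~2 FLOPs/param.
    The step is bound by whichever of the two takes longer."""
    weight_read = model.params * model.bytes_per_param / hw.hbm_bytes_per_s
    compute = batch * 2 * model.params / hw.flops_per_s
    return max(weight_read, compute)


def seconds_per_token(model: DenseModel, hw: Accelerator, batch: int) -> float:
    return decode_step_seconds(model, hw, batch) / batch


hw = Accelerator(flops_per_s=989e12, hbm_bytes_per_s=3.35e12)   # ~H100 SXM (assumed)
model = DenseModel(params=70e9, bytes_per_param=2)              # hypothetical 70B, BF16

for batch in (1, 8, 64, 512):
    ms = seconds_per_token(model, hw, batch) * 1e3
    print(f"batch {batch:>3}: {ms:.3f} ms of accelerator time per token")

ratio = seconds_per_token(model, hw, 1) / seconds_per_token(model, hw, 512)
print(f"batch 1 costs ~{ratio:.0f}x more per token than batch 512")
```

The crossover batch is B* = F·b / (2·BW), about 295 for these numbers, so the exact multiple moves with the chip's FLOPs-to-bandwidth ratio and the weight precision.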

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

2026-04-27 · Mon

FEATURED · Dwarkesh Patel · rss · EN · 13:51 · 04·27

→What I've been Thinking About This Weekend: Open Questions, Intelligence vs Power, Verification in Science

Dwarkesh lists open AI questions, including that five hyperscalers own over 70% of global AI compute. He asks about coding agents, KV cache costs, merging training with inference, and online learning; the post gives questions, not experimental answers.

#Agent #Code #Memory #Dwarkesh

why featured

Featured · importance 76 · hook + knowledge + resonance

editor take

Dwarkesh offers questions, not answers, but “5 hyperscalers own 70%+ of AI compute” cuts through a lot of agent theater.

sharp

Dwarkesh’s sharpest move is dragging capability talk back to compute ownership. If five hyperscalers hold 70%+ of global AI compute, and much of it is reserved for OpenAI, Anthropic, and GDM, long-horizon coding agents are not just algorithmic progress. They are a resource allocation outcome. The KV-cache example is the hard hook: Llama 3 70B uses about 320KB per token in cache, versus 0.075 bits per token if weights are amortized over pretraining tokens. That 35-million-fold gap makes “in-context learning” look like an expensive memory trick, not magic sample efficiency. I don’t buy the post as merely a list of open questions. It has a thesis: pretraining, RL generation, and inference collapse into online learning. The weak spot is verification; the article gives no experimental result or lab evidence that anyone has made that loop reliable.
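Both per-token figures can be reproduced from public model details. Assumptions: Llama 3 70B's published config (80 layers, 8 KV heads under grouped-query attention, head dimension 128), BF16 for both cache and weights, a round 70B parameters, and the roughly 15T pretraining tokens Meta reported.

```python
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128   # Llama 3 70B config (assumed)
BYTES_BF16 = 2
PARAMS = 70e9                             # rounded parameter count (assumed)
PRETRAIN_TOKENS = 15e12                   # ~15T tokens, per Meta (assumed)

# Per token, every layer caches one K and one V vector for each KV head.
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_BF16
kv_bits_per_token = kv_bytes_per_token * 8

# Weight bits spread evenly over every pretraining token.
weight_bits_per_token = PARAMS * BYTES_BF16 * 8 / PRETRAIN_TOKENS

print(f"KV cache: {kv_bytes_per_token / 1024:.0f} KiB per token")
print(f"Weights amortized: {weight_bits_per_token:.3f} bits per token")
print(f"Ratio: {kv_bits_per_token / weight_bits_per_token / 1e6:.1f} million x")
# KV cache: 320 KiB per token
# Weights amortized: 0.075 bits per token
# Ratio: 35.1 million x
```

The cache figure is 320 KiB (327,680 bytes), which is what "about 320KB" refers to; the 35-million-fold ratio follows directly.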

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

2026-04-24 · Fri

Dwarkesh Patel · rss · EN · 16:37 · 04·24

→Blog Prize for the Big Questions About AI

Dwarkesh Patel launched a $20,000 AI blog prize; entrants answer one of four questions in 1,000 words. Prizes are $10,000, $6,000, and $4,000, with a May 10, 11:59 PM PST deadline. The key detail is the hiring funnel: the contest also screens for a research collaborator.

#Reasoning #Alignment #Dwarkesh Patel #OpenAI

editor take

Dwarkesh Patel's $20K blog prize is a hiring funnel for a research collaborator.

sharp

Dwarkesh Patel launched a $20,000 AI blog prize with four 1,000-word prompts and a May 10, 11:59 PM PST deadline. I would not read this as a media creator running an essay contest. It is a compact hiring mechanism for AI judgment: low prize money, hard questions, short word limit, public submissions.

He says the quiet part out loud. The contest is meant to find a research collaborator. The prize split is $10,000, $6,000, and $4,000. In the AI labor market, that is tiny. Someone who can reason well about frontier-model economics, RL scaling, AI philanthropy, and national strategy has a much higher opportunity cost. OpenAI, Anthropic, Epoch AI, METR, policy shops, and serious grantmakers all compete for that kind of person. The money is not the wage. The money is the lure for a high-signal funnel.

The prompts are sharper than the prize announcement. The first asks why AI progress did not slow when systems moved deeper into RL-style regimes. It names the old intuition: longer horizons reduce reward signal per FLOP under naive policy gradients, and GPT-4 to o1 to o3 already crossed many orders of magnitude of RL compute. That framing matters. A lot of timeline arguments from 2024 treated reasoning progress as if test-time compute and long-horizon RL were the whole story. The better update came from verifier design, synthetic data, tool environments, process supervision, curriculum construction, and evaluation loops. Naive policy gradient was an easy target. The hard question is which of those engineering levers still scale.

The second prompt is the most commercially relevant one: when do foundation-model companies make money? The article cites OpenAI’s new raise at an $852 billion valuation and says the OpenAI Foundation stake is now worth $180 billion. That number changes the conversation. Single-model profitability is not enough if the model depreciates after three months and the next training run costs more. Epoch AI has written about whether individual models can earn back training costs, but Dwarkesh pushes toward the company-level problem. Labs face distillation, low switching costs, open-weight catch-up, and cloud platforms taking distribution margin. I do not buy the clean story where frontier labs naturally earn durable API margins. They need workflow control, enterprise lock-in, compliance moats, agent execution surfaces, or some way to tax valuable actions. The article gives no answer from Dwarkesh, which is fine. The absence is the test.

The third prompt asks what the OpenAI Foundation should do with wealth at the hundreds-of-billions scale. That is a nastier question than “which AI safety cause deserves funding?” AI safety people are comfortable naming areas: evals, governance, alignment research, biosecurity, compute monitoring. Turning $100 billion into impact requires organizations, operators, procurement channels, government interfaces, and tolerance for failed programs. Open Philanthropy has funded AI risk work for years, but my memory is that its AI spending has been far below the $100 billion scale. Once the budget moves two orders of magnitude up, the bottleneck stops being “smart people need grants.” It becomes absorption capacity. Dwarkesh is filtering for people who can describe a money-to-impact machine, not people who can recite values.

The fourth prompt asks what countries outside the AI production chain should do. It names India and Nigeria. That pairing is useful because it punishes generic development-policy answers. India has software services, English-speaking technical labor, a large domestic market, and digital public infrastructure like UPI. Nigeria faces very different constraints around electricity reliability, capital cost, GPU access, and state capacity. Neither country is going to become TSMC or Anthropic by executive will. Good answers need to talk about procurement, education, cloud access, energy, diaspora talent, service exports, and where local firms can capture value around deployment. “Invest in skills and infrastructure” will be filler unless the writer gives a sequence and a budget logic.

I do have a concern about the format. A 1,000-word limit tests clarity and compression. It does not test deep research. Each of the four prompts can support a 50-page memo. The format will reward people who sound decisive under uncertainty. Some of them will be genuinely good. Some will be overconfident stylists. Dwarkesh’s own interview style favors fast abstraction, brave synthesis, and clean causal stories. This funnel may select for that same cognitive shape rather than a complementary collaborator. The article also does not disclose judging criteria, judges, citation expectations, or whether private background knowledge is acceptable. Those details affect who applies and who looks good.

Still, I like the mechanism more than most AI research hiring exercises. The job is not “read papers and summarize them.” The job is building a usable world model while the facts are incomplete. These prompts force candidates to handle numbers, mechanisms, counterexamples, and timing. A good submission will not prove the writer is right. It will show how they are likely to be wrong. For a research-media hybrid like Dwarkesh, that signal is valuable. Spending $20,000 to attract a pile of dense answers and identify one collaborator is a very efficient search strategy.
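The first prompt's "reward signal per FLOP" intuition can be written as a back-of-envelope bound. Assume a binary pass/fail verifier, which carries at most one bit of information per episode, an N-parameter model, and the standard approximation of about 2N FLOPs per generated token (attention and the training backward pass ignored). For an episode of T tokens:

```latex
\frac{\text{reward information (bits)}}{\text{rollout FLOPs}} \;\le\; \frac{1}{2NT}
```

Under these assumptions, doubling the horizon T halves the best-case signal per FLOP. Denser supervision, such as the process supervision mentioned in the post, is one way to get more than one bit out of an episode.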

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1
