Dwarkesh Patel launched a $20,000 AI blog prize: four prompts, a 1,000-word limit, and a May 10, 11:59 PM PST deadline. I would not read this as just a media creator running an essay contest. It is a compact hiring mechanism for AI judgment: a small prize pool, hard questions, a short word limit, and public submissions.
He says the quiet part out loud: the contest is meant to find a research collaborator. The prize split is $10,000, $6,000, and $4,000. In the AI labor market, that is tiny. Someone who can reason well about frontier-model economics, RL scaling, AI philanthropy, and national strategy has a much higher opportunity cost. OpenAI, Anthropic, Epoch AI, METR, policy shops, and serious grantmakers all compete for that kind of person. The money is not the wage. It is the bait at the top of a high-signal funnel.
The prompts are sharper than the prize announcement. The first asks why AI progress did not slow when systems moved deeper into RL-style regimes. It names the old intuition: under naive policy gradients, longer horizons mean less reward signal per FLOP, and the jump from GPT-4 to o1 to o3 already spanned many orders of magnitude of RL compute. That framing matters. A lot of timeline arguments from 2024 treated reasoning progress as if test-time compute and long-horizon RL were the whole story. The better update was that progress also came from verifier design, synthetic data, tool environments, process supervision, curriculum construction, and evaluation loops. Naive policy gradient was an easy target. The hard question is which of those engineering levers still scale.
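To make the old intuition concrete, here is a toy calculation, not a model of any lab's actual training setup. It assumes one binary pass/fail reward per episode, rollout compute proportional to episode length, and a hypothetical fixed cost (`FLOPS_PER_TOKEN`) per generated token. Under those assumptions, the information a single outcome reward carries per FLOP falls as 1/horizon. A per-step reward, used here as a crude stand-in for process supervision with a hypothetical `tokens_per_step` parameter, keeps that ratio flat.

```python
import math

# Hypothetical constant: roughly 2 * parameters FLOPs per generated token
# for a forward pass of a 1-trillion-parameter model. Illustrative only.
FLOPS_PER_TOKEN = 2e12


def binary_entropy(p: float) -> float:
    """Maximum bits of information in one pass/fail reward with success rate p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def outcome_reward_bits_per_flop(horizon_tokens: int, success_rate: float) -> float:
    """One terminal reward per episode: the signal is fixed, compute grows with horizon."""
    rollout_flops = horizon_tokens * FLOPS_PER_TOKEN
    return binary_entropy(success_rate) / rollout_flops


def process_reward_bits_per_flop(
    horizon_tokens: int, tokens_per_step: int, success_rate: float
) -> float:
    """One pass/fail reward per step. Upper bound: assumes step signals are independent."""
    steps = horizon_tokens // tokens_per_step
    rollout_flops = horizon_tokens * FLOPS_PER_TOKEN
    return steps * binary_entropy(success_rate) / rollout_flops


if __name__ == "__main__":
    for horizon in (1_000, 10_000, 100_000):
        outcome = outcome_reward_bits_per_flop(horizon, success_rate=0.5)
        process = process_reward_bits_per_flop(horizon, tokens_per_step=500, success_rate=0.5)
        print(f"horizon={horizon:>7} tokens  outcome={outcome:.2e}  process={process:.2e} bits/FLOP")
```

The per-step figure is an upper bound: real step rewards are correlated and noisy, and the sketch ignores gradient-estimator variance entirely. It only shows why denser supervision changes the scaling, not by how much.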
The second prompt is the most commercially relevant: when do foundation-model companies make money? The article cites OpenAI’s new raise at an $852 billion valuation and says the OpenAI Foundation’s stake is now worth $180 billion. That number changes the conversation. Single-model profitability is not enough if each model is largely obsolete after three months and the next training run costs more. Epoch AI has written about whether individual models can earn back their training costs, but Dwarkesh pushes toward the company-level problem. Labs face distillation, low switching costs, open-weight catch-up, and cloud platforms capturing distribution margin. I do not buy the clean story where frontier labs naturally earn durable API margins. They need workflow control, enterprise lock-in, compliance moats, agent execution surfaces, or some other way to tax valuable actions. Dwarkesh offers no answer of his own, which is fine. The absence is the test.
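The gap between model-level and company-level economics can be shown with a toy cash-flow sketch. Every number here is hypothetical and not from the article: training costs grow 3x per generation, and each model earns 1.5x its own training cost in gross profit before it is superseded. Under those made-up assumptions every individual model pays back, yet the company runs a growing deficit, because each period's profit has to fund a larger next run.

```python
def per_model_profit(costs: list[float], payback_multiple: float) -> list[float]:
    """Lifetime gross profit minus the model's own training cost, per generation."""
    return [c * payback_multiple - c for c in costs]


def company_net_by_period(costs: list[float], payback_multiple: float) -> list[float]:
    """Each period: the current model's lifetime profit minus the next model's training bill."""
    return [costs[i] * payback_multiple - costs[i + 1] for i in range(len(costs) - 1)]


if __name__ == "__main__":
    costs = [1.0, 3.0, 9.0, 27.0]  # hypothetical: 3x cost growth per generation, arbitrary units
    multiple = 1.5                 # hypothetical: each model earns 1.5x its training cost
    print(per_model_profit(costs, multiple))       # [0.5, 1.5, 4.5, 13.5]
    print(company_net_by_period(costs, multiple))  # [-1.5, -4.5, -13.5]
```

In this toy setup the company only self-funds when the payback multiple exceeds the per-generation cost growth (here, 3x). One way to read the moats above is as attempts to raise that multiple or stretch each model's earning window.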
The third prompt asks what the OpenAI Foundation should do with wealth on the scale of hundreds of billions of dollars. That is a nastier question than “which AI safety cause deserves funding?” AI safety people are comfortable naming areas: evals, governance, alignment research, biosecurity, compute monitoring. Turning $100 billion into impact requires organizations, operators, procurement channels, government interfaces, and tolerance for failed programs. Open Philanthropy has funded AI risk work for years, but as far as I recall its AI spending has been far below the $100 billion scale. Once the budget moves two orders of magnitude up, the bottleneck stops being “smart people need grants.” It becomes absorption capacity: how much money existing institutions can actually deploy well. Dwarkesh is filtering for people who can describe a money-to-impact machine, not people who can recite values.
The fourth prompt asks what countries outside the AI production chain should do. It names India and Nigeria. That pairing is useful because it punishes generic development-policy answers. India has software services, English-speaking technical labor, a large domestic market, and digital public infrastructure like UPI. Nigeria faces very different constraints: electricity reliability, capital costs, GPU access, and state capacity. Neither country is going to build its own TSMC or Anthropic by executive will. Good answers need to address procurement, education, cloud access, energy, diaspora talent, service exports, and where local firms can capture value around deployment. “Invest in skills and infrastructure” is filler unless the writer gives a sequence and a budget logic.
I do have a concern about the format. A 1,000-word limit tests clarity and compression. It does not test deep research. Each of the four prompts can support a 50-page memo. The format will reward people who sound decisive under uncertainty. Some of them will be genuinely good. Some will be overconfident stylists. Dwarkesh’s own interview style favors fast abstraction, brave synthesis, and clean causal stories. This funnel may select for that same cognitive shape rather than a complementary collaborator. The article also does not disclose judging criteria, judges, citation expectations, or whether private background knowledge is acceptable. Those details affect who applies and who looks good.
Still, I like the mechanism more than most AI research hiring exercises. The job is not “read papers and summarize them.” The job is building a usable world model while the facts are incomplete. These prompts force candidates to handle numbers, mechanisms, counterexamples, and timing. A good submission will not prove the writer is right. It will show how they are likely to be wrong. For a research-media hybrid like Dwarkesh’s operation, that signal is valuable. Spending $20,000 to attract a pile of dense answers and identify one collaborator is a very efficient search strategy.