FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 17:01 · 05·22
Agent Workloads Quietly Reshape Inference Economics
SemiAnalysis analyzed 432,000 real coding-agent requests and found a median input length of 96,000 tokens, not 32,000 or 64,000. The post does not disclose the model mix, cost curve, sampling method, or time window.
#Agent #Code #Inference-opt #SemiAnalysis
why featured
HKR: H, K, and R all pass. SemiAnalysis contributes a dataset of 432k coding-agent requests with a 96k-token median input. The missing model mix, cost curve, and sampling method keep it in the strong-data-point band, not must-write.
editor take
432k coding-agent requests hit a 96k-token median input; that punctures cheap short-context math, but missing model mix keeps it from becoming a market baseline.
sharp
A 96k median input says coding-agent economics have moved to prefix ingestion, not the final few hundred output tokens. SemiAnalysis claims 432,000 real requests, which is large enough to take seriously; the median call ingests more text than The Great Gatsby before the user’s actual ask gets answered. That breaks product math built around 32k or 64k context assumptions once repos, retrieval chunks, tool logs, and prior state pile up.
I would not treat it as the market curve yet. The snippet gives no model mix, time window, sampling method, cache hit rate, or pricing tier. A Claude Sonnet-style long-context coding workflow and a cheap MoE router have very different marginal costs. Narrow claim: coding-agent pricing cannot keep borrowing chatbot assumptions.
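To make the prefix-ingestion point concrete, here is a rough back-of-envelope sketch of per-request cost. Only the 96k median input and the 32k comparison point come from the post. Everything else is a placeholder assumption for illustration: the 500-token output, the $3 / $15 per-million-token prices, and the 90% cache hit rate billed at 10% of the input price. None of these are SemiAnalysis figures or any vendor's actual rate card.

```python
def request_cost(input_tokens, output_tokens, in_price, out_price,
                 cache_hit_rate=0.0, cached_discount=0.1):
    """Return (input_cost, output_cost) in dollars for one request.

    Prices are dollars per million tokens (hypothetical values).
    cache_hit_rate is the fraction of input tokens served from a prefix cache.
    cached_discount is the fraction of the normal input price charged for
    those cached tokens (also an assumption, not a quoted tariff).
    """
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    input_cost = (fresh * in_price + cached * in_price * cached_discount) / 1e6
    output_cost = output_tokens * out_price / 1e6
    return input_cost, output_cost


def input_share(input_tokens, output_tokens, **pricing):
    """Fraction of the per-request cost attributable to input tokens."""
    input_cost, output_cost = request_cost(input_tokens, output_tokens, **pricing)
    return input_cost / (input_cost + output_cost)


if __name__ == "__main__":
    # Placeholder prices and output length; only 96k and 32k come from the post.
    pricing = dict(in_price=3.0, out_price=15.0)
    for label, tokens, hit in [("32k, no cache", 32_000, 0.0),
                               ("96k, no cache", 96_000, 0.0),
                               ("96k, 90% cached", 96_000, 0.9)]:
        share = input_share(tokens, 500, cache_hit_rate=hit, **pricing)
        print(f"{label}: input share of cost = {share:.1%}")
```

Under these placeholder numbers, input accounts for roughly 97% of the cost of a 96k-token request with no caching and still about 88% with a 90% cache hit rate. That is why the undisclosed cache hit rate and pricing tier matter so much for reading the data point.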
HKR breakdown
hook ✓ knowledge ✓ resonance ✓