STILL DEVELOPING · 1dFEATUREDAI HOT (Curated Pool)· aihot-apiZH00:00 · 06·16
→OpenRouter's Subagent tool lets frontier models delegate routine tasks to cheaper workers
OpenRouter launched a server-side tool called Subagent. Add openrouter:subagent to your tools array and your orchestrator model can hand off mechanical work—summarization, data extraction, boilerplate, reformatting—to a smaller, cheaper worker mid-generation. Claude Opus 4.8 costs $5 per million input tokens; GLM 5.2 costs $1.40, a 3.6x spread. In a 20-tool-call agent workflow, 5–8 calls might be delegations, cutting per-request cost without touching reasoning quality. Each delegation is isolated: the worker sees only the task_description, no parent context or memory. Workers can carry their own tools like web_search, recursion is blocked, and delegations cap at 10 per request. OpenRouter also highlighted the Advisor tool, which escalates hard decisions upward to a stronger model. The two can be used together in a single request.
#Agent#OpenRouter#Anthropic Claude Opus 4.8#GLM 5.2
why featured
OpenRouter turned sub-task delegation into a server-side tool — not just another API wrapper. The Opus 4.8 vs GLM 5.2 cost comparison ($5 vs $1.4) makes the savings tangible. Deduction: no latency numbers disclosed, and no fallback behavior described when the subagent fails. R...
editor take
OpenRouter turned model delegation into a server-side tool, with a 3.6x cost spread that hits your token bill directly.
sharp
This one's worth opening because it turns a common agent cost-saving pattern into infrastructure. Before this, if you wanted Claude Opus 4.8 to hand off summarization, data extraction, or boilerplate to a cheaper model, you had to build the dispatch logic yourself. Now you add openrouter:subagent to your tools array, the model decides when to delegate, GLM 5.2 does the grunt work, and the orchestrator just reads the result.
The cost spread is real: Claude Opus 4.8 at $5 per million input tokens vs. GLM 5.2 at $1.40, with an even wider gap on output. In a 20-tool-call agent workflow, 5-8 calls might be delegations—mechanical tasks that don't need frontier reasoning. You save money without touching the quality ceiling on the hard parts.
One design choice I like: the worker sees only the task_description, no parent context or memory. Each delegation is isolated. That keeps the small model from getting confused by the full conversation and saves tokens. Recursion is blocked, delegations cap at 10 per request—guardrails that keep the tool from spiraling.
OpenRouter also highlighted the Advisor tool, which does the opposite: escalates hard decisions upward to a stronger model. You can use both in a single request, which effectively gives your model a dispatch system for "delegate down, escalate up."
Where I'd discount this a bit: it's an OpenRouter ecosystem optimization. You need to be on their API and model catalog. If you're already using another routing layer or custom orchestration, migration cost is on you. The worker isolation is clean, but it also means the subagent can't leverage implicit context from the parent conversation—some tasks might actually need that to do a good job. The post doesn't give latency numbers either. Small models should be fast on mechanical work, but network round-trips plus scheduling overhead aren't spelled out.
HKR breakdown
hook ✓knowledge ✓resonance ✓