Hacker News Frontpage · RSS · EN · 23:59 · 04·28
Claude system prompt bug wastes user money and bricks managed agents
A GitHub issue says a Claude Code system prompt bug wastes user money and bricks managed agents. The RSS snippet lists 40 HN points and 10 comments; the post does not disclose reproduction steps, scope, or fix status.
#Agent #Code #Tools #Anthropic
why featured
HKR-H and HKR-R pass: Claude Code incident, wasted spend, and broken agents are discussable. HKR-K fails because repro steps, scope, and fix status are undisclosed, so this stays all.
editor take
Claude Code v2.1.111 reportedly hit the same prompt-control failure again; agent products break first in governance, not in model IQ.
sharp
Claude Code v2.1.111 is accused of injecting a malware reminder on every Read and causing subagent refusals; the article does not disclose repro steps, affected scope, wasted spend, or Anthropic’s fix status. Thin evidence, yes. But I would not dismiss this as random GitHub noise. It hits the ugliest part of coding agents: system prompts, tool calls, subagent authority, and billing paths now share one failure surface.
The title contains two useful anchors. First, it calls this a regression and references #47027 and v2.1.92. The reporter believes Anthropic fixed a similar issue before, then shipped it again in v2.1.111. Second, the phrase “malware reminder on every Read” matters because Read is one of Claude Code’s highest-frequency operations. If every file read appends a security warning into context, the cost damage has two layers. Tokens grow directly, and the subagent’s behavior distribution shifts. The article gives no token delta per Read, so I will not invent the bill. But managed coding agents can run tens or hundreds of file reads on a serious task. A repeated warning is not just prompt clutter; it changes both invoice size and refusal behavior.
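The two cost layers can be bounded with simple arithmetic even without a measured token delta. The sketch below is illustrative only: the function names are hypothetical, and the worst-case model assumes one model turn per Read, the full context resent on every turn, and no prompt caching or context compaction. None of these figures come from the issue.

```python
def one_time_overhead(reads: int, reminder_tokens: int) -> int:
    """Extra input tokens if each appended reminder is billed exactly once."""
    return reads * reminder_tokens


def cumulative_overhead(reads: int, reminder_tokens: int) -> int:
    """Extra input tokens if the whole context is resent on every turn.

    Assumption (not from the issue): one turn per Read, no caching, no
    compaction. The reminder added at read i is then resent on turns
    i..reads, so the total is reminder_tokens * (1 + 2 + ... + reads).
    """
    return reminder_tokens * reads * (reads + 1) // 2


def overhead_cost(tokens: int, usd_per_million_input_tokens: float) -> float:
    """Convert a token count to dollars at a caller-supplied price."""
    return tokens / 1_000_000 * usd_per_million_input_tokens


if __name__ == "__main__":
    # Placeholder inputs, NOT measurements from the GitHub issue.
    reads, reminder_tokens = 100, 50
    print(one_time_overhead(reads, reminder_tokens))
    print(cumulative_overhead(reads, reminder_tokens))
```

With those placeholder inputs the one-time figure is 5,000 tokens and the worst-case figure is 252,500 tokens. The gap between linear and quadratic growth is the point; caching or compaction would pull real usage somewhere below the worst case.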
I am sensitive to this class of bug because coding-agent competition has moved past the demo phase. Cursor, Claude Code, OpenAI’s Codex-style tooling, and GitHub Copilot’s agent mode are all fighting for the same developer loop. Model quality still matters, but the failures users remember often sit in tool protocols, permission boundaries, context compaction, retries, and recovery. Claude 3.5 Sonnet earned real goodwill with coding. The later Sonnet line kept that reputation alive. But if a basic Read call keeps reintroducing a high-priority malware warning, the model’s coding ability is beside the point. The agent starts treating “am I handling malware?” as part of the task. Refusal becomes a product behavior, not a model oddity.
Anthropic’s safety-heavy posture is not the issue. The issue is using coarse natural-language reminders to steer tool behavior inside an agentic workflow. LLMs do not treat high-priority text like a traditional ACL. They interpret it semantically. If every Read says “malware,” the warning will not only fire on actual malware reverse-engineering. It can bleed into normal repos containing payload fixtures, suspicious strings, binary names, exploit tests, or security scanners. To a safety team, that is conservative. To a paying user, the agent has been hijacked by its own guardrail. Managed agents make this worse. A human can edit context, rerun, or steer around a refusal. A managed subagent can wedge the whole queue.
I do have doubts about the evidence here. The scraped body is mostly GitHub chrome. The HN snapshot shows 40 points and 10 comments, which is tiny. There is no reproduction repo, no log excerpt, no command sequence, no before-and-after run on v2.1.92 versus v2.1.111, and no maintainer response in the provided text. “Wastes user money” and “bricks managed agents” are strong claims. The article does not prove broad impact. The safer read is: the title gives version numbers, issue references, Read calls, and subagent refusal as locating details; the body does not give conditions or blast radius.
Still, this belongs on an AI practitioner’s radar because it exposes a product debt I keep seeing: agent vendors ship safety policy as prompt patching instead of as a testable control system. A serious fix would include regression metrics. Same repo, same task, same Read sequence, run on v2.1.92 and v2.1.111. Compare refusal rate, tool-call count, input-token growth, task completion, and recovery rate. The article has none of that. Anthropic should publish those numbers if it wants users to trust the fix. A plain “fixed” reply is weak when the reporter’s core claim is that the earlier fix did not hold.
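A minimal sketch of what that comparison could look like, assuming each run on a fixed repo and task is logged as a simple record. The RunRecord fields and both function names are hypothetical; nothing here reflects Anthropic's actual tooling or log format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunRecord:
    """One agent run on the same repo and task. All fields are hypothetical."""
    version: str
    refused: bool      # the agent refused at least once during the run
    recovered: bool    # it continued after refusing (meaningful only if refused)
    completed: bool    # the task finished successfully
    tool_calls: int
    input_tokens: int


def summarize(runs: list[RunRecord]) -> dict[str, Optional[float]]:
    """Aggregate the five metrics named in the text for one version."""
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    refusals = [r for r in runs if r.refused]
    return {
        "refusal_rate": len(refusals) / n,
        # Undefined when nothing refused, so report None instead of 0.
        "recovery_rate": (
            sum(r.recovered for r in refusals) / len(refusals) if refusals else None
        ),
        "completion_rate": sum(r.completed for r in runs) / n,
        "mean_tool_calls": sum(r.tool_calls for r in runs) / n,
        "mean_input_tokens": sum(r.input_tokens for r in runs) / n,
    }


def compare(baseline: list[RunRecord], candidate: list[RunRecord]) -> dict[str, float]:
    """Candidate minus baseline, for every metric defined on both sides."""
    b, c = summarize(baseline), summarize(candidate)
    return {k: c[k] - b[k] for k in b if b[k] is not None and c[k] is not None}
```

Feeding the v2.1.92 runs as the baseline and the v2.1.111 runs as the candidate would give one signed delta per metric. Recovery rate drops out of the comparison whenever one side has no refusals, which keeps a clean baseline from being reported as a zero.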
My read: the HN heat is less important than whether Anthropic treats this as a product incident. If the response is just removing one reminder string, the same failure returns under another safety banner. If Read-level prompt injection becomes part of versioned regression testing, Claude Code starts looking more like infrastructure for long-running agents. Coding-agent reliability is no longer about writing one clean function in a demo. It is whether the agent can run for hours without tripping over its own system prompt.
HKR breakdown
hook ✓ · knowledge ✗ · resonance ✓
SCORE 65
H1·K0·R1