FEATURED · r/LocalLLaMA · rss · EN · 21:17 · 05·11
→I catalogued every way local models break JSON output and built a repair library across 288 model calls
Reddit user kexxty ran 288 structured-output calls through OpenRouter models, including Llama 3, Mistral, Command R, DeepSeek, and Qwen, and found similar JSON failure categories across local and API-only models. The MIT-licensed Python library outputguard validates against JSON Schema, applies 15 ordered repair strategies, includes 2,001 tests, and has no LLM provider dependency.
#Code #Tools #Benchmarking #OpenRouter
why featured
HKR-H/K/R all pass: 288 calls, the outputguard library, and a 15-step repair chain give practitioners reusable detail. Source is a single Reddit post, so it stays in the 72–77 featured band, not 78+.
editor take
288 calls cannot justify “every way,” but outputguard’s 15-step repair chain is closer to production reality than most structured-output demos.
sharp
The useful part is not “local models break JSON too”; it is the boring repair layer turned into code. The summary says 288 OpenRouter calls across Llama 3, Mistral, Command R, DeepSeek, and Qwen. outputguard validates with JSON Schema, then applies 15 ordered repair strategies, backed by 2,001 tests. That is more honest than the usual “turn on function calling and ship it” advice.
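To make "validate, then apply ordered repairs" concrete, here is a minimal standard-library sketch of that pattern. It is not outputguard's API, which is not visible from the summary. Every name here (parse_with_repairs, the three repair functions, the required-keys check standing in for real JSON Schema validation) is a hypothetical illustration, and three repairs stand in for the library's 15.

```python
import json
import re
from typing import Any, Callable, List, Optional, Tuple

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly for readability


def strip_code_fences(text: str) -> str:
    """Return the body of the first Markdown code fence, if there is one."""
    pattern = FENCE + r"(?:json)?\s*(.*?)" + FENCE
    match = re.search(pattern, text, flags=re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else text


def extract_outer_object(text: str) -> str:
    """Keep only the span from the first '{' to the last '}'."""
    start, end = text.find("{"), text.rfind("}")
    return text[start:end + 1] if 0 <= start < end else text


def remove_trailing_commas(text: str) -> str:
    """Delete commas directly before '}' or ']'.

    Naive on purpose: it would also touch such a sequence inside a string.
    """
    return re.sub(r",\s*([}\]])", r"\1", text)


# Order matters: cheap, low-risk repairs run first. (Hypothetical ordering.)
REPAIRS: List[Callable[[str], str]] = [
    strip_code_fences,
    extract_outer_object,
    remove_trailing_commas,
]


def is_valid(value: Any, required: dict) -> bool:
    """Stand-in for JSON Schema validation: required keys with expected types."""
    return isinstance(value, dict) and all(
        key in value and isinstance(value[key], expected)
        for key, expected in required.items()
    )


def parse_with_repairs(raw: str, required: dict) -> Tuple[Optional[dict], List[str]]:
    """Try the raw text first, then re-try after each repair in order.

    Returns (parsed value or None, names of repairs that changed the text).
    """
    candidate = raw
    applied: List[str] = []
    steps: List[Optional[Callable[[str], str]]] = [None] + list(REPAIRS)
    for repair in steps:
        if repair is not None:
            repaired = repair(candidate)
            if repaired != candidate:
                applied.append(repair.__name__)
            candidate = repaired
        try:
            value = json.loads(candidate)
        except json.JSONDecodeError:
            continue  # still broken; move on to the next repair
        if is_valid(value, required):
            return value, applied
    return None, applied


if __name__ == "__main__":
    raw = "Sure! Here is the result:\n" + FENCE + 'json\n{"name": "Ada", "age": 36,}\n' + FENCE
    print(parse_with_repairs(raw, {"name": str, "age": int}))
```

Returning the list of applied repairs alongside the value is a deliberate choice in this sketch: logging which repairs fired per model is what would turn a repair layer into the kind of failure catalogue the post describes.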
I don’t buy the title’s “every way.” The Reddit post returned HTTP 403 when fetched, so model versions, prompts, schema complexity, temperature, and failure distribution are not visible here. And 288 calls is a smoke test, not a taxonomy of structured-output failure. Still, it hits the part practitioners keep rediscovering: once structured output enters an agent pipeline, it stops being a model feature and becomes I/O fault tolerance, even with OpenAI or Anthropic JSON modes.
HKR breakdown
hook ✓ · knowledge ✓ · resonance ✓