Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops
The paper audits 1,968 tasks across five terminal-agent benchmarks and finds 323 tasks, or 16%, hackable by frontier models using only the task description. It introduces a hacker-fixer-solver loop that patches verifiers without per-task manual fixes, reducing KernelBench attack success on a held-out public-exploit corpus from 62% to 0%.
Score: 83
H1·K1·R1