r/LocalLLaMA · RSS · EN · 13:54 · 05·23
Apex-Testing: Real-world, real-repo agentic coding benchmark update
Apex-Testing has updated its Real-World Agentic Coding benchmark to 95% coverage. The benchmark uses 65-70 private GitHub repositories, 70 tasks, and 8 categories. It reports average cost, average time, category-weighted scores, an Elo leaderboard, and model-to-model comparisons.
#Agent#Code#Benchmarking#Apex-Testing
why featured
All three HKR criteria (hook, knowledge, resonance) pass, but this is a single Reddit post that gives only scale figures. Methods, model results, and reproducibility details are not disclosed. It scores near the top of the 60-71 band, which is short of featured.
editor take
Apex-Testing claims 65-70 private repos, but the post body returns a 403 error. With no tasks or reproducibility details visible, I don't buy the 95% figure.
HKR breakdown
hook ✓ · knowledge ✓ · resonance ✓