FEATURED · r/LocalLLaMA · rss · EN · 14:57 · 05·05
Heretic 1.3 Released: Reproducible Models, Integrated Benchmarks, Lower Peak VRAM
Heretic 1.3 adds reproducible runs, integrated benchmarks, lower peak VRAM, and broader model support. The project claims 20,000 GitHub stars and 13 million model downloads. Reproduce directories capture PyTorch, GPU, driver, and accelerator details; benchmarks use lm-evaluation-harness for MMLU, EQ-Bench, GSM8K, and HellaSwag. The post names Qwen3.5 and Gemma 4 support, but does not disclose VRAM reduction figures.
#Benchmarking #Inference-opt #Safety #Heretic
why featured
HKR-K and HKR-R pass: the 20k stars, 13M downloads, reproducibility metadata, and eval harness integration are concrete. HKR-H fails, and the VRAM reduction lacks numbers, so this sits at the featured threshold.
editor take
Heretic 1.3 is less about model support and more about making local inference reproducible; the VRAM claim needs numbers before anyone cheers.
sharp
Heretic 1.3 is aiming at the ugly part of local model work: runs happen, but reproduction rots fast. The concrete hook is useful: reproduce directories capture PyTorch, GPU, driver, and accelerator details, while benchmarks plug into lm-evaluation-harness across MMLU, EQ-Bench, GSM8K, and HellaSwag. That matters more for teams than another line saying Qwen3.5 or Gemma 4 now loads.
The adoption numbers are nontrivial: 20,000 GitHub stars and 13 million model downloads. But the Reddit post body returned HTTP 403 and could not be read, and the claimed peak VRAM reduction comes with no disclosed percentage or test condition. That matters because local inference projects often turn allocator tweaks into performance theater. Against llama.cpp and vLLM, Heretic’s credible lane is reproducibility, not vague memory-saving claims.
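For readers unsure what "capturing PyTorch, GPU, driver, and accelerator details" means in practice, here is a minimal sketch of that kind of environment snapshot. It is not Heretic's code or file format: the function name `collect_environment` and every key in the dictionary are hypothetical, chosen for illustration. It records only what PyTorch itself exposes, so the driver version (which typically comes from a tool such as nvidia-smi) is left out.

```python
import json
import platform
import sys


def collect_environment() -> dict:
    """Snapshot of run metadata. Key names are illustrative, not Heretic's."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "torch": None,         # filled in only if PyTorch is installed
        "cuda_runtime": None,  # CUDA version PyTorch was built against
        "gpus": [],            # device names visible to PyTorch
    }
    try:
        import torch
    except ImportError:
        # No PyTorch available: return the partial snapshot.
        return info

    info["torch"] = str(torch.__version__)
    info["cuda_runtime"] = torch.version.cuda  # None on CPU-only builds
    if torch.cuda.is_available():
        info["gpus"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return info


if __name__ == "__main__":
    # Print the snapshot as JSON so it could be stored next to a run's outputs.
    print(json.dumps(collect_environment(), indent=2))
```

Saving a file like this alongside each run lets two people compare their environments when a benchmark score fails to reproduce.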
HKR breakdown
hook ✗ · knowledge ✓ · resonance ✓