FEATURED · r/LocalLLaMA · rss · EN · 00:48 · 05·10
NVIDIA AI Releases Star Elastic: One Checkpoint Contains 30B, 23B, and 12B Reasoning Models
NVIDIA AI released Star Elastic, a single checkpoint that can be sliced zero-shot into 30B, 23B, and 12B reasoning models in BF16, FP8, and NVFP4. When the 23B submodel handles the thinking phase and the 30B model writes the final answer, the reported gains are 16% higher accuracy and 1.9× lower latency on AIME-2025, GPQA, LiveCodeBench v5, and MMLU-Pro.
#Reasoning #Inference-opt #Benchmarking #NVIDIA
why featured
HKR: hook, knowledge, and resonance all pass. Star Elastic has a concrete mechanism and testable numbers for inference deployment. Its reach is still narrower than a frontier-model release, so it sits in the high-quality featured band.
editor take
NVIDIA is turning model size into a runtime knob; the spicy claim is 23B thinking plus 30B answering with +16% accuracy.
sharp
Star Elastic looks less like another Nemotron drop and more like NVIDIA packaging model routing into one checkpoint. The claim is concrete: one checkpoint slices into 30B, 23B, and 12B models, ships BF16, FP8, and NVFP4, and reports +16% accuracy with 1.9× lower latency when 23B handles reasoning and 30B writes the final answer across AIME-2025, GPQA, LiveCodeBench v5, and MMLU-Pro.
I’d discount the number until the eval details show up. The post body returns a Reddit 403, so hardware, batch size, routing policy, and scripts are not visible here. NVIDIA has spent the last year pushing Nemotron as deployable inference infrastructure, not just open weights. If zero-shot slicing holds without retraining, it pressures both MoE serving and hand-built cascades on cost control.
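To make the "23B thinks, 30B answers" handoff concrete, here is a minimal sketch of a two-stage cascade. It is only an illustration under stated assumptions: the routing policy is not visible in the source, so the prompt format, the `think_then_answer` helper, and the stub models are all hypothetical and do not represent NVIDIA's method or any Star Elastic API.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to generated text.
Generate = Callable[[str], str]


def think_then_answer(question: str, thinker: Generate, answerer: Generate) -> str:
    """Two-stage cascade: a smaller submodel drafts reasoning, a larger one answers.

    Assumption: the handoff is plain text. The real routing policy is unknown.
    """
    # Stage 1: the smaller submodel produces the reasoning draft.
    reasoning = thinker(f"Question: {question}\nThink step by step.")
    # Stage 2: the larger model sees the question plus the draft and answers.
    final_prompt = (
        f"Question: {question}\n"
        f"Reasoning draft:\n{reasoning}\n"
        "Final answer:"
    )
    return answerer(final_prompt)


# Stand-in models so the sketch runs without any weights (purely illustrative).
calls: List[str] = []


def stub_23b(prompt: str) -> str:
    calls.append("23B")
    return "2 + 2 = 4"


def stub_30b(prompt: str) -> str:
    calls.append("30B")
    # Answer only if the reasoning draft actually reached this stage.
    return "4" if "2 + 2 = 4" in prompt else "unknown"
```

Calling `think_then_answer("What is 2 + 2?", stub_23b, stub_30b)` runs the stubs in order, 23B first and 30B second. In a real deployment the two callables would wrap the sliced submodels, and the latency claim would depend on details the post does not expose, such as batch size and hardware.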
HKR breakdown
hook ✓ · knowledge ✓ · resonance ✓