posts · 2026-05-16

▸ 50 items · updated 3m ago

2026-05-16 · Sat

23:57

72d ago

FEATURED · r/LocalLLaMA · rss · EN · 23:57 · 05·16

→Same Models Tested Across Strix Halo, RTX 3090, and RTX 5070

C_Coffie published 55 local inference benchmark runs across Strix Halo, RTX 3090, RTX 5070, five backends, and 0.35B to 35B-A3B models; RTX 5070 beats RTX 3090 on models fitting 12GiB, while RTX 3090 leads in the 14–31B band that exceeds 12GiB but fits 24GiB.

#Inference-opt #Benchmarking #Reasoning #C_Coffie

why featured

Featured · importance 74 · hook + knowledge + resonance

editor take

A 55-run hobbyist bench says more than vendor slides: 5070 wins small fits, 3090 owns 14–31B because VRAM still decides local inference.

sharp

This Reddit bench punctures the lazy “new GPU wins” take: RTX 5070 beats RTX 3090 when the model fits inside 12GiB, while the 24GiB 3090 wins across the 14–31B band. The useful part is the messiness: 55 runs, five backends, 0.35B through 35B-A3B, and hardware people actually buy or already own. I trust this kind of dirty bench more than vendor slides for local inference. It mixes backend overhead, quantization choices, and the VRAM wall in one place. Strix Halo being included also says the comparison set has moved beyond discrete GPUs. Reddit 403 blocks the original table, so exact tok/s and settings aren’t verifiable here. The direction still matches the field: small models reward newer silicon; larger local models punish 12GiB cards fast.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

74

SCORE

H1·K1·R1

23:39

72d ago

r/LocalLLaMA · rss · EN · 23:39 · 05·16

→Anyone else running pre-release MTP branches to maintain higher speeds?

A Reddit user says a pre-release MTP branch runs about 20% faster on Dual Xeon 8268 CPUs with a Tesla T4, reaching about 38 output tokens per second; the release branch reaches about 30 tokens per second and crashed llama.cpp during light coding.

#Inference-opt #Vision #Code #Reddit

editor take

MTP pre-release hits 38 t/s on a T4; I trust the throughput claim before I trust the stability story.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

56

SCORE

H1·K1·R1

23:04

72d ago

AI HOT (Curated Pool) · aihot-api · ZH · 23:04 · 05·16

→Figure humanoid robot runs autonomously for four consecutive days, moving toward practical use

Figure’s F.03 humanoid robot entered its fourth day of 24/7 autonomous testing in a real warehouse, performing grasping, carrying, and sorting tasks; the post does not disclose failure counts or maintenance intervals.

#Robotics #Agent #Figure #Benchmark

editor take

Figure F.03 ran warehouse tasks for four days; without failures or maintenance intervals, don't call it practical yet.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

70

SCORE

H1·K1·R1

22:23

72d ago

Hacker News Frontpage · rss · EN · 22:23 · 05·16

→Zerostack – A Unix-inspired coding agent written in pure Rust

Zerostack published a 1.0.0 package on crates.io, and the title describes it as a Unix-inspired coding agent written in pure Rust; the post does not disclose its architecture, tool interface, or benchmark results.

#Agent #Code #Tools #Zerostack

editor take

Zerostack shipped crates.io 1.0.0; only the title is disclosed, with no architecture, tool API, or benchmarks.

HKR breakdown

hook ✓ · knowledge ✗ · resonance ✓

→ open source

64

SCORE

H1·K0·R1

22:19

72d ago

r/LocalLLaMA · rss · EN · 22:19 · 05·16

→Now that MTP is merged, what are the best Qwen 3.6 35B outputs on 2×3090s?

A Reddit user asks for Qwen 3.6 35B results on dual RTX 3090s after llama.cpp merged MTP; their split-layer setup previously reached 1500 t/s prompt processing (p/p) and 120 t/s token generation (t/g), MTP testing fell to 80 t/g, and their CPU overflow fallback reports 3500 p/p and 80 t/g.

#Inference-opt #Qwen #llama.cpp #NVIDIA

editor take

Qwen 3.6 35B on 2x3090 drops to 80 t/s with MTP. Honestly, one Reddit rig is not a win signal.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

52

SCORE

H1·K1·R1

21:54

72d ago

r/LocalLLaMA · rss · EN · 21:54 · 05·16

→Qwen3.5-122B-Q5-MTP and Qwen3.5-122B-Q6-MTP

A Reddit user tested two Qwen3.5-122B MTP quantized models under llama.cpp server-rocm-mtp with --spec-type draft-mtp and --spec-draft-n-max 3; Qwen3.5-122B-Q5-MTP-General reached 20.24 t/s over 4,200 eval tokens, while Qwen3.5-122B-Q6-MTP-General reached 17.17 t/s over 3,283 eval tokens.

#Inference-opt #Benchmarking #Qwen #Unsloth

editor take

Qwen3.5-122B MTP shows 20.24 t/s, but the body is 403; treat this as one Reddit rig's number.

HKR breakdown

hook ✗ · knowledge ✓ · resonance ✓

→ open source

66

SCORE

H0·K1·R1

21:43

72d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 21:43 · 05·16

→MagicPath Integrates with Codex to Combine Design and Development

MagicPath AI CEO @skirano demonstrated MagicPath running inside Codex as a native canvas, with users configuring it through one command, dragging UI elements, and letting Codex generate and edit code in real time.

#Agent #Code #Tools #MagicPath AI

why featured

Featured · importance 74 · hook + knowledge + resonance

editor take

MagicPath inside Codex is smart packaging, but UI drag-to-code is the easy part; design systems and state boundaries are where demos usually crack.

sharp

MagicPath is betting on a canvas inside the coding surface, not another Figma-side handoff tool. The demo says one command installs it in Codex, then users drag UI elements while Codex generates and edits code live. That placement is sharp: developers already use Codex to inspect diffs, run projects, and change logic. I don’t buy the clean “design and development merge” story yet. The snippet gives no framework list, no design-token story, and no answer on component-library constraints. v0, Bolt, and Lovable already proved that prompt-to-UI can look impressive; the debt appears after the page enters a real repo. State, styling, and maintainability start charging interest. If MagicPath only improves canvas interaction, it is a nicer scaffold. If it edits existing codebases without breaking conventions, then it earns a place in team workflow.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

74

SCORE

H1·K1·R1

21:34

72d ago

r/LocalLLaMA · rss · EN · 21:34 · 05·16

→I fitted the new δ-mem research for Apple Silicon using MLX and OpenClaw integration

A Reddit user adapted δ-mem to MLX on a 64GB Apple Silicon Mac mini and tested Qwen3-4B-Instruct with OpenClaw history. LoCoMo-10 mini rose from 0.0500 to 0.1833, while OpenClaw replay improved from 6/8 to 7/8 passed probes with about 1.30x latency.

#Memory #Agent #Benchmarking #Apple

editor take

Summary says δ-mem lifts LoCoMo-10 from 0.0500 to 0.1833; body is 403, so distrust the 1.30x tradeoff.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

70

SCORE

H1·K1·R1

20:40

72d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 20:40 · 05·16

→Study on the Cognition–Action Disconnect in Tool-Using Agents

An interpretability paper studies tool-using agents and finds models often recognize when to call a tool but fail to act, with a cognition-to-action mismatch rate of 26%–54%.

#Agent #Tools #Interpretability #Research release

why featured

Featured · importance 80 · hook + knowledge + resonance

editor take

A 26%–54% tool-use gap is brutal: the model knows the tool is needed, then loses the action signal near the final token.

sharp

This paper moves agent failure from “the model didn’t understand” to “the model understood and still didn’t act.” That is the useful part. The concrete hook is sharp: hidden states can decode that a tool should be called, yet the cognition-to-action mismatch sits at 26%–54%. The failure is localized in the transition to action, where late-layer final-token geometry rotates the signal until it is nearly orthogonal to the emitted action. That fits the ceiling many teams hit with tool-use prompt A/B tests. Repeating “use search when needed” pressures the front end of the trajectory; it does not fix a late-layer routing problem. Compared with ReAct or function-calling wrappers, this says the interface contract is cleaner than the internal control path. The snippet does not disclose the exact models, task set, or intervention size.
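
One way to read the 26%–54% figure is as a conditional rate: of the steps where a probe on the hidden state says a tool call is needed, how many end without one. The sketch below assumes that definition. The snippet does not give the paper's exact metric, and the function name and toy records are hypothetical.

```python
def mismatch_rate(records):
    """Share of probe-positive steps where no tool call was emitted.

    `records` is a list of (probe_says_tool, model_called_tool) booleans.
    Assumed definition for illustration, not taken from the paper.
    """
    needed = [called for says_tool, called in records if says_tool]
    if not needed:
        return 0.0
    return sum(1 for called in needed if not called) / len(needed)


# Hypothetical toy records, not data from the paper.
toy = [(True, True), (True, False), (True, True), (False, False), (True, False)]
rate = mismatch_rate(toy)  # 2 of the 4 probe-positive steps end without a call
```

Conditioning on the probe-positive steps is what separates "understood but did not act" from ordinary misses, where the model never registered the need for a tool.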

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

80

SCORE

H1·K1·R1

20:17

73d ago

FEATURED · TechCrunch AI · rss · EN · 20:17 · 05·16

→The Haves and Have-Nots of the AI Gold Rush

Deedy Das estimated that about 10,000 founders and employees at companies including OpenAI, Anthropic, and Nvidia have accumulated more than $20 million in wealth, while many software engineers face layoffs, sub-$500,000 career ceilings, and anxiety that their core skills are losing labor-market value.

#Deedy Das #OpenAI #Anthropic #Commentary

why featured

Featured · importance 72 · hook + knowledge + resonance

editor take

The 10,000 AI millionaires aren’t bubble trivia; they mark a class break in tech labor, and SWE anxiety is about a closed ladder.

sharp

Deedy Das is hitting distribution, not model capability. His rough estimate says about 10,000 founders and employees at OpenAI, Anthropic, Nvidia, and peers now hold more than $20 million in wealth, while many software engineers stare at sub-$500,000 career ceilings, layoffs, and skill depreciation. That gap changes labor pricing fast: frontier researchers, infra engineers, and inference-cost people keep getting bid up, while ordinary product engineers get repriced against Copilot, Cursor, and Devin-style workflows. TechCrunch only cites Das’s back-of-the-envelope math, with no sample or methodology, so don’t treat 10,000 as a statistic. But the labor-market signal is real. The 2025 slogan about “AI-using engineers replacing non-AI engineers” has now hardened into compensation bands.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

72

SCORE

H1·K1·R1

19:58

73d ago

AI HOT (Curated Pool) · aihot-api · ZH · 19:58 · 05·16

→US Starts Seeing Heavy Job Losses in Roles Exposed to AI

Bloomberg says US roles exposed to AI are starting to see heavy job losses; the post does not disclose layoff counts, affected industries, or the measurement method.

#Bloomberg #Commentary

editor take

Bloomberg flags heavy AI-exposed US job losses, but gives no counts or method here; don’t weaponize this headline in planning.

HKR breakdown

hook ✓ · knowledge ✗ · resonance ✓

→ open source

70

SCORE

H1·K0·R1

19:51

73d ago

r/LocalLLaMA · rss · EN · 19:51 · 05·16

→Local Qwen 3.6 vs Frontier Models on a Single-File HTML Canvas Driving Animation

A Reddit user tested 11 models with the same single-file HTML Canvas driving-animation prompt, and local Qwen3.6-27B Q4_K_M ranked second subjectively at 2.70 tok/s, behind Kimi k2.6 Thinking and ahead of the Claude-opus-reasoning-distilled 27B quant.

#Code #Benchmarking #Qwen #Claude

editor take

Title says Qwen3.6-27B Q4_K_M ranked 2nd among 11 models; body is 403, so scoring and GIFs are unverified.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

70

SCORE

H1·K1·R1

19:43

73d ago

AI HOT (Curated Pool) · aihot-api · ZH · 19:43 · 05·16

→Codex Adds Custom Keyboard Shortcuts

Codex added custom keyboard shortcuts, letting users adjust key bindings in settings; the post does not disclose a version number, supported platforms, or rollout schedule.

#Code #Tools #Product update

editor take

Codex now supports custom shortcuts in settings. No version, platforms, or rollout disclosed; this is editor-table-stakes catch-up.

HKR breakdown

hook ✗ · knowledge ✓ · resonance ✗

→ open source

58

SCORE

H0·K1·R0

19:04

73d ago

FEATURED · Dwarkesh Patel · rss · EN · 19:04 · 05·16

→The mistake of conflating intelligence and power

Dwarkesh Patel argues that intelligence and power are being conflated: current AI systems improve through economically valuable tasks such as coding, while real-world power depends more on authority, trust, and large-scale cooperation than isolated strategic reasoning.

#Reasoning #Alignment #Dwarkesh Patel #Donald Trump

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Dwarkesh lands the cut: stop extrapolating SWE-bench cleverness into Stalin-grade political power.

sharp

Dwarkesh’s sharp move is forcing the AI-safety definition of intelligence into an ugly corner. If intelligence means “achieving goals across domains,” the article says Donald Trump, Xi Jinping, Vladimir Putin, and Stalin outrank the physicists. Their power comes from legitimacy, trust, and hundreds of millions of people coordinating around institutions, not isolated reasoning horsepower. That pushback hits the current agent narrative hard. Models are improving through coding, tool use, and economically valuable tasks. That path makes automated firms nastier competitors; it does not automatically create a lone digital mind that captures authority through clever strategy. If a threat model skips institutions, distribution, and authorization, it starts looking less like political economy and more like a Diplomacy board.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

78

SCORE

H1·K1·R1

19:01

73d ago

FEATURED · Dwarkesh Patel · rss · EN · 19:01 · 05·16

→Notes on Pretraining Parallelisms and Failed Training Runs

Dwarkesh documents pretraining failure modes and parallelism tradeoffs: expert choice and token dropping can break causality in MoE routing, FP16 collectives can bias repeated additions after values exceed 1024, pretraining FLOPs are given as 6ND, B300 HBM is listed as 288GB, and FSDP communication can reach params × 3 with reduce-scatter.

#Fine-tuning #Inference-opt #Benchmarking #Dwarkesh

why featured

Featured · importance 82 · hook + knowledge + resonance

editor take

Dwarkesh’s note reads like a pretraining incident log: FLOPs are the easy part; causality leaks and numeric bias burn clusters quietly.

sharp

Pretraining failure is not mysticism; tiny engineering choices get amplified at cluster scale. Dwarkesh’s concrete hook is brutal: expert choice can make token n’s expert assignment depend on token n+k, and token dropping can let later tokens crowd out earlier ones. That is training-time information leakage that inference never gets. The FP16 collectives example is even uglier: after an accumulator passes 1024, adding 1 can round back to 1024, so 10,000 additions can land 10x wrong. Outside chatter still fixates on 6ND FLOPs, B300’s 288GB HBM, or FSDP traffic at parameters × 3. This note is a reminder that frontier training advantage includes boring competence: avoid dumb numerical bugs, then find the ones you still shipped.
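
Both failure modes fit in a few lines. The sketch below is a toy under stated assumptions, not Dwarkesh's code: the router scores are made up, `expert_choice` is a hypothetical single-expert top-k selector, and the accumulator is modelled by rounding to IEEE binary16 after every add using Python's `struct` half-precision format. In that format the running count of ones stalls at 2048 rather than 1024, so the exact threshold depends on the number format and rounding mode in play. The mechanism is the same.

```python
import struct


def expert_choice(scores, capacity):
    """Token indices one expert keeps: the `capacity` highest router scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:capacity])


# Made-up router scores for a single expert.
prefix = [0.6, 0.2]         # tokens 0..1 only
full = prefix + [0.9, 0.8]  # the same tokens plus two later ones
token0_kept_in_prefix = 0 in expert_choice(prefix, capacity=2)
token0_kept_in_full = 0 in expert_choice(full, capacity=2)


def to_fp16(x):
    """Round a Python float to IEEE binary16 and back (round-half-to-even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_count(n):
    """Add 1.0 to an accumulator n times, rounding to binary16 after each add."""
    acc = 0.0
    for _ in range(n):
        acc = to_fp16(acc + 1.0)
    return acc


stalled = fp16_count(10_000)
```

Token 0 is kept when the expert sees only the prefix and dropped once later tokens outscore it, which is the leak: its routing depends on positions it should not be able to see. The counter shows the numeric side: once the spacing between representable values exceeds the increment, every further add is rounded away.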

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

82

SCORE

H1·K1·R1

19:00

73d ago

FEATURED · Dwarkesh Patel · rss · EN · 19:00 · 05·16

→RLVR might be disproportionately bad at science

Dwarkesh argues that RLVR fits scientific discovery poorly, using heliocentrism’s 1543–1838 verification gap and Mercury’s 43-arcsecond-per-century precession as examples of long, ambiguous theory-evaluation loops.

#Reasoning #Alignment #Dwarkesh #Michael Nielsen

why featured

Featured · importance 82 · hook + knowledge + resonance

editor take

Dwarkesh hits RLVR where it hurts: science is not LeetCode; the reward can arrive nearly 300 years late and still favor the wrong theory.

sharp

RLVR breaks on scientific discovery because the reward is often late, noisy, and historically misleading. Dwarkesh’s examples are brutal: heliocentrism was published in 1543, but stellar parallax was not measured until 1838; Mercury’s extra 43 arcseconds per century pointed Newtonians toward Vulcan, then Einstein closed it with general relativity in 1915. That should make AI-research-booster claims sound less automatic. Code and math give dense feedback through tests, proof checkers, and SWE-bench-style evals. Science often runs on judgment, instrument availability, unification taste, and decades of ambiguous evidence. I don’t buy the straight line from “RLVR works on verifiable tasks” to “models will be unusually good scientists.” It lands first in simulatable, automatable, short-loop research, not in theory choice.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

82

SCORE

H1·K1·R1

19:00

73d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 19:00 · 05·16

→RLVR May Perform Disproportionately Poorly in Science

Dwarkesh argues that RLVR has a short-feedback weakness in scientific theory validation; the post says validation loops can span decades or centuries, and does not disclose experimental results or benchmark numbers.

#Reasoning #Alignment #Dwarkesh #Commentary

why featured

Featured · importance 81 · hook + knowledge + resonance

editor take

Dwarkesh hits RLVR where the hype is loudest: science is not LeetCode, and a loop closing in 1838 is not a reward signal.

sharp

RLVR’s weakness in science is not raw compute; it is late, messy reward. Dwarkesh’s best example is sharp: Copernicus’s 1543 model was not clearly better on accuracy or simplicity, Kepler’s third law arrived in 1619, Newton’s unification in 1687, and stellar parallax was measured only in 1838. That is not a training loop any current RLVR story can comfortably digest. I read this as a needed cold shower for the “science is verifiable, so RL will crush it” line. Code has tests, math has proof checkers, and AlphaGeometry-style tasks have clean graders. Theory choice does not. Neptune in 1846 is the success case; Mercury’s extra 43 arcseconds per century sent people hunting Vulcan before Einstein closed it in 1915. RLVR gets paid on short feedback. Science often pays out after a century of choosing which bad prediction to tolerate.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

81

SCORE

H1·K1·R1

18:58

73d ago

r/LocalLLaMA · rss · EN · 18:58 · 05·16

→How I Started Programming Differently Over the Last Year. What About You?

Reddit user /u/ievkz says they stopped using LLM autocomplete in the IDE, now use a CLI coding agent with @-referenced files, and keep the IDE mainly for Git diffs, debugging, and navigation that they estimate covers 5-10% of their work.

#Agent #Code #Tools #JetBrains

editor take

The poster says IDE navigation/debugging is 5-10% of work. CLI agents replacing autocomplete tracks my experience.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

66

SCORE

H1·K1·R1

18:56

73d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 18:56 · 05·16

→Eric Jang shares lessons from building AlphaGo from scratch

Eric Jang spent several months implementing AlphaGo from scratch and says that in 2026, training a strong Go AI requires only a few thousand dollars in rented compute rather than DeepMind-scale resources.

#Reasoning #Code #Eric Jang #AlphaGo

why featured

Featured · importance 80 · hook + knowledge + resonance

editor take

A few thousand dollars to train strong Go AI is not AlphaGo nostalgia; it is a warning that today’s moat becomes tomorrow’s weekend project.

sharp

Eric Jang’s post is about cost collapse, not AlphaGo nostalgia. In 2016, DeepMind needed elite researchers, heavy engineering, and serious compute to crack Go. In 2026, Jang says one person can spend several months and a few thousand dollars in rented compute to train a strong Go system from scratch, with code and tutorials published. That is an ugly reminder for current agent and reasoning startups. Once a capability is well-scoped, search, recipes, and open implementations compress the moat fast. I would not overextend the analogy: Go has closed rules, clean rewards, and self-play. Most enterprise agent tasks do not. But the pattern is still brutal. The first path is expensive; the second path becomes a repo, a rental GPU bill, and patience.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

80

SCORE

H1·K1·R1

18:31

73d ago

AI HOT (Curated Pool) · aihot-api · ZH · 18:31 · 05·16

→Customize Keyboard Shortcuts to Fit Your Workflow

OpenAI Devs says Codex now supports custom keyboard shortcuts through settings. Users can map shortcuts around their workflow, but the post does not disclose platform coverage, rollout timing, or version requirements.

#Code #Tools #OpenAI #Product update

editor take

Codex now supports custom shortcuts; platform and version are undisclosed. Small fix, but default keymaps finally stop dictating flow.

HKR breakdown

hook ✗ · knowledge ✓ · resonance ✓

→ open source

63

SCORE

H0·K1·R1

18:12

73d ago

r/LocalLLaMA · rss · EN · 18:12 · 05·16

→OpenReader: Open-source read-along document reader with TTS and audiobook export

OpenReader v3.0.0 ships an open-source TTS document reader for EPUB, PDF, DOCX, TXT, and Markdown, with OpenAI, Replicate, Deepinfra, or self-hosted OpenAI-compatible APIs, plus m4b/mp3 audiobook export with chapter metadata through ffmpeg.

#Audio #Tools #OpenReader #OpenAI

editor take

OpenReader v3.0.0 covers 5 formats to m4b/mp3; the body is 403-blocked, so I’d treat it as handy tooling.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✗

→ open source

65

SCORE

H1·K1·R0

17:43

73d ago

Product Hunt · AI · rss · EN · 17:43 · 05·16

→CtrlOps

CtrlOps says it uses AI to deploy, debug, and manage Linux servers; the post does not disclose pricing, permission controls, supported distributions, or operational safeguards.

#Agent #Code #Tools #CtrlOps

editor take

CtrlOps claims AI-managed Linux servers, but discloses no permission model; before prod, ask where the audit log lives.

HKR breakdown

hook ✓ · knowledge ✗ · resonance ✓

→ open source

48

SCORE

H1·K0·R1

17:19

73d ago

r/LocalLLaMA · rss · EN · 17:19 · 05·16

→Corsair desktop PC with Ryzen AI Max 395 and 128GB unified RAM: has anyone tested it for LLM?

A Reddit user posted a Corsair AI Workstation 300 listing with Ryzen AI Max 395, 128GB LPDDR5X memory, up to 96GB VRAM, and a 1TB SSD; the post does not disclose LLM throughput, tested model sizes, or the actual price.

#Inference-opt #Corsair #AMD #Reddit

editor take

Title says Ryzen AI Max 395 and 128GB; Reddit 403 hides tokens/s and price, so skip the value hype.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

46

SCORE

H1·K1·R1

17:02

73d ago

r/LocalLLaMA · rss · EN · 17:02 · 05·16

→LLM Phone Home: Reliable Apps That Can Deliver Inference from a Local Backend

A Reddit user asks for an iOS app that can serve an OpenAI-compatible endpoint from a local backend and has tested Apollo, Locally AI, Noema, and 3 Sparks. The post says 3 Sparks works for endpoint use but lacks MCP and web search, while Noema fails to complete DeepSeek V4 Flash requests from a Mac Studio.

#Agent #Tools #Inference-opt #3 Sparks

editor take

Body is only a 403; four iOS clients are named, and local OpenAI endpoints still smell like tinkering, not dependable UX.

HKR breakdown

hook ✗ · knowledge ✓ · resonance ✓

→ open source

46

SCORE

H0·K1·R1

17:00

73d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 17:00 · 05·16

→Latest Open Artifacts #21: Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1, and More

Open AI model teams released Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1, and other versions this month, and the post says they were tested under CAISI’s V4 evaluation framework, but the RSS snippet does not disclose scores.

#Benchmarking #Gemma #DeepSeek #Kimi

why featured

Featured · importance 80 · hook + knowledge + resonance

editor take

Don’t buy the “open model bonanza” framing too fast; CAISI’s V4 shows how benchmark choice can stretch the gap narrative.

sharp

CAISI is making the open-model gap sound cleaner than the evidence supports. The post says V4 uses nine benchmarks, but DeepSeek V4’s large Elo hit comes heavily from CTF-Archive-Diamond subset extrapolation, CAISI-private PortBench, and ARC-AGI-2 with scoring different from public leaderboards. One private benchmark plus two special-case treatments can bend the aggregate. I buy Interconnects’ pushback more than the headline. A bash loop with fixed token budget is not how Claude Code or OpenCode elicit coding models. The Bun Zig-to-Rust port with 1 million LOC changed is a nasty counterexample to benchmark claims that porting apps is currently impossible. Open models trail closed frontier models, but this Elo story is too dependent on the harness.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

80

SCORE

H1·K1·R1

16:41

73d ago

r/LocalLLaMA · rss · EN · 16:41 · 05·16

→Strix Halo Llama.cpp MTP Benchmarks: 27B Gets Much Faster, 35B Is Mixed

Qwen3.6-27B-MTP reduced llama.cpp wall time from 258.65s to 200.55s in a 5-turn test reaching about 28.5k context, while Qwen3.6-35B-MTP increased wall time from 58.86s to 60.24s under the same setup.

#Inference-opt #Benchmarking #Qwen #Unsloth

editor take

Qwen3.6-27B-MTP hit 200.55s; body is 403, and 35B slowing to 60.24s kills blind MTP toggles.
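
The wall times in the summary convert to a speedup factor and a percent change in a couple of lines. This is a minimal sketch using only the four reported numbers; the helper name `speedup` is hypothetical, and nothing here re-runs the benchmark.

```python
def speedup(baseline_s, mtp_s):
    """Return (speedup factor, percent change in wall time) for MTP vs. baseline."""
    factor = baseline_s / mtp_s
    pct_change = (mtp_s - baseline_s) / baseline_s * 100.0
    return factor, pct_change


f27, d27 = speedup(258.65, 200.55)  # Qwen3.6-27B-MTP, as reported
f35, d35 = speedup(58.86, 60.24)    # Qwen3.6-35B-MTP, as reported
```

On those figures the 27B run is about 1.29x faster (roughly 22% less wall time), while the 35B run is about 2% slower, which is why MTP is worth measuring per model rather than enabling blindly.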

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

67

SCORE

H1·K1·R1

16:38

73d ago

AI HOT (Curated Pool) · aihot-api · ZH · 16:38 · 05·16

→vLLM Adds Support for Trillion-Parameter Models

The title says vLLM supports trillion-parameter models, while the body only mentions Day 0 community collaboration and does not disclose the model name, exact parameter count, implementation details, or reproducible conditions.

#Inference-opt #vLLM #Product update #Open source

editor take

vLLM claims trillion-scale support, but gives no model name, size, or repro path; don’t treat Day 0 coordination as a perf win.

HKR breakdown

hook ✓ · knowledge ✗ · resonance ✓

→ open source

63

SCORE

H1·K0·R1

16:05

73d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 16:05 · 05·16

→Ring-2.6-1T Open-Sourced and Listed on OpenRouter for Agent Workflows

AntLingAGI open-sourced Ring-2.6-1T and listed it on OpenRouter with a 75% discount through the end of May; the trillion-scale reasoning model targets agent workflows, including planning, tool use, context maintenance, and complex task execution, using Async RL and IcePop training methods.

#Agent #Reasoning #Tools #AntLingAGI

why featured

Featured · importance 74 · hook + knowledge + resonance

editor take

Ring-2.6-1T is chasing agent devs with open weights plus OpenRouter discounts; without evals or pricing, the 1T story gets a haircut.

sharp

Ring-2.6-1T reads more like a distribution test than a model-generation claim. AntLingAGI is stacking open source, OpenRouter access, and a 75% discount through May to lower trial friction. The positioning hits the hot agent checklist: planning, tool use, context maintenance, and complex workflow execution. I’m discounting the “trillion-scale reasoning model” line until the missing parts show up. The snippet gives no architecture, context window, baseline price, SWE-bench, τ-bench, or ToolBench results. It also names Async RL and IcePop without saying what training stage they touch. Open agent models do not need louder task-execution claims; they need reproducible traces on long-horizon failure, tool recovery, and state drift. OpenRouter can get Ring-2.6-1T sampled. It does not prove it belongs inside a production agent loop.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

74

SCORE

H1·K1·R1

15:37

73d ago

The Verge · AI · rss · EN · 15:37 · 05·16

→Sony tries to explain that its AI Camera Assistant doesn’t suck

Sony says the Xperia 1 XIII AI Camera Assistant does not edit photos; it gives four suggestions for exposure, color, and background blur based on lighting, depth, and subject.

#Vision #Sony #The Verge #Product update

editor take

Sony’s AI Camera Assistant gives four shooting suggestions; the “photogenic angle” demo only shows zoom, so the AI label feels padded.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✗

→ open source

61

SCORE

H1·K1·R0

15:28

73d ago

r/LocalLLaMA · rss · EN · 15:28 · 05·16

→Local speech to text for iOS using Apple Watch

The author released Dictawiz for Apple Watch recording and local iPhone transcription, citing Parakeet and Whisper support plus integrations with Notion, Obsidian, custom webhooks, and a Cloudflare memory layer; the post does not disclose latency, pricing, model sizes, or accuracy metrics.

#Audio #Tools #Memory #Apple

editor take

Dictawiz records on Apple Watch and transcribes locally on iPhone; no latency, pricing, or accuracy, so I don't buy the productivity pitch yet.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

64

SCORE

H1·K1·R1

15:25

73d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 15:25 · 05·16

→SANA-WM: A 2.6B-Parameter Open-Source World Model for 1-Minute 720p Video

NVIDIA researchers released SANA-WM, a 2.6B-parameter open-source world model that generates videos up to 1 minute at 720p, and the project is available on its GitHub page.

#Multimodal #Vision #NVIDIA #SANA-WM

why featured

Featured · importance 82 · hook + knowledge + resonance

editor take

SANA-WM puts 60s 720p video into 2.6B params and one H100; NVIDIA is showing the cost curve, not just prettier demos.

sharp

SANA-WM’s sharp move is dragging “world model” back into reproducible engineering. It claims 2.6B parameters, 213K public clips, 15 days on 64 H100s, and one-H100 inference for 60s 720p video. The distilled path is the eye-catcher: NVFP4 on an RTX 5090 denoises a 60-second clip in 34 seconds. I don’t fully buy the visual-quality framing yet. The pipeline uses a 17B second-stage long-video refiner, model weights are still marked “soon,” and the 36x throughput claim comes from NVIDIA’s own benchmark. Compared with Sora or Veo-style cinematic generation, this reads more like a controllable camera-trajectory simulator. The 6-DoF adherence and metric pose annotation are the parts robotics and 3D data teams should care about.
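
The claimed figures translate into a compute budget and a real-time factor with simple arithmetic. The sketch assumes all 64 H100s ran around the clock for the full 15 days, and that the 34 seconds covers denoising only (no refiner or decode time). Neither assumption is confirmed by the snippet.

```python
# Training budget implied by "15 days on 64 H100s", assuming continuous use.
train_gpu_hours = 15 * 24 * 64

# Seconds of 720p video produced per second of denoising on the distilled path
# (60-second clip in 34 seconds, as claimed for NVFP4 on an RTX 5090).
realtime_factor = 60 / 34
```

That is about 23,000 H100-hours of training and roughly 1.8x faster than real time for the denoising step alone, which is the cost-curve point the take is making.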

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

82

SCORE

H1·K1·R1

15:21

73d ago

FEATURED · Hacker News Frontpage · rss · EN · 15:21 · 05·16

→Tesla reveals two Robotaxi crashes involving teleoperators

Tesla disclosed two Robotaxi crashes involving teleoperators, according to the TechCrunch headline. The RSS snippet only lists 27 Hacker News points and 17 comments; the post does not disclose crash locations, injuries, dates, vehicle behavior, or the teleoperation handoff mechanism.

#Robotics #Tesla #TechCrunch #Hacker News

why featured

Featured · importance 73 · hook + knowledge + resonance

editor take

Tesla disclosed two Robotaxi crashes involving teleoperators, with no location, injury, or handoff details; human backup is not a safety case.

sharp

Tesla’s ugly word here is “teleoperators.” Once Robotaxi safety depends on remote humans, the incident is no longer just an autonomy failure. It exposes system boundaries, latency, and liability. The disclosed number is two crashes; the RSS copy gives only 27 HN points and 17 comments. Location, injuries, dates, vehicle behavior, and the handoff trigger are not given. Waymo has at least spent years spelling out rider-only zones, disengagement framing, and operating constraints. Tesla saying a teleoperator was involved, without saying whether that person monitored, advised, took control, or intervened after the fact, makes the disclosure thinner rather than safer. It drags the Robotaxi pitch away from end-to-end autonomy and back toward a remote-support safety pad.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

→ open source

73

SCORE

H1·K1·R1

15:18

73d ago

r/LocalLLaMA · rss · EN · 15:18 · 05·16

→Extension idea: llama-server with custom samplers

DeProgrammer99 proposed a llama-server custom sampler extension prototype, with one short C++ loop-detector example that breaks repeated 1-3 token loops seen in heavily quantized models. The branch targets llama.cpp master after MTP was merged, works with speculative decoding, and includes a Windows x64 Vulkan release plus an example command using Qwen3.6-27B with 32,768 context.

#Inference-opt#Code#Tools#DeProgrammer99

editor take

Title says llama-server custom samplers; body is 403, no patch details disclosed, so wait for a reproducible branch.

HKR breakdown

hook —knowledge ✓resonance ✓

→ open source

63

SCORE

H0·K1·R1

14:54

73d ago

AI HOT (Curated Pool)· aihot-apiZH14:54 · 05·16

→Show HN: Burn, Baby, Burn (Those Tokens)

A developer open-sourced “Burn, Baby, Burn” on GitHub, providing a tool for users to burn their own tokens to reduce total supply; the Hacker News post reached 100 points.

#GitHub#Hacker News#Open source

editor take

GitHub body only shows chrome, HN has 100 points; a token-burn tool smells like a gag, not an AI signal.

HKR breakdown

hook —knowledge —resonance —

→ open source

28

SCORE

H0·K0·R0

14:40

73d ago

r/LocalLLaMA· rssEN14:40 · 05·16

→macOS support in Lemonade has graduated out of beta

Lemonade moved macOS support out of beta and says five capability areas are available: OmniRouter, coding, image generation, speech generation, and transcription; the post also states the local AI tool uses a 3 MB portable binary across Linux, Windows, and macOS.

#Multimodal#Code#Audio#Lemonade

editor take

Lemonade says macOS is stable with 5 capability areas; Reddit 403s, so I won't endorse the 3 MB binary claim.

HKR breakdown

hook —knowledge ✓resonance ✓

→ open source

66

SCORE

H0·K1·R1

14:15

73d ago

r/LocalLLaMA· rssEN14:15 · 05·16

→Same double-pendulum prompt, same renderer, two models picked opposite θ conventions

The author tested Claude 3.5 Sonnet and DeepSeek V3 with the same double-pendulum contract, using θ1=π/2, θ2=π/2, and zero angular velocities; under one host renderer, the two outputs showed mirror-image behavior within one second.
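A short Python sketch shows how an angle-sign convention alone can produce mirror-image output from the same initial state. The post does not say which conventions the two models chose; the sketch assumes both measure angles from the downward vertical and differ only in which horizontal direction counts as positive. The function name and the unit rod lengths are illustrative.

```python
import math


def bob_positions(theta1, theta2, l1=1.0, l2=1.0, sign=1):
    """Cartesian positions of both bobs, with y pointing up and angles
    measured from the downward vertical.

    sign=+1: a positive angle swings toward +x.
    sign=-1: a positive angle swings toward -x (the mirrored convention).
    Both conventions are assumptions for illustration.
    """
    x1 = sign * l1 * math.sin(theta1)
    y1 = -l1 * math.cos(theta1)
    x2 = x1 + sign * l2 * math.sin(theta2)
    y2 = y1 - l2 * math.cos(theta2)
    return (x1, y1), (x2, y2)


# Same initial state as the post: theta1 = theta2 = pi/2, zero angular velocity.
right = bob_positions(math.pi / 2, math.pi / 2, sign=1)
left = bob_positions(math.pi / 2, math.pi / 2, sign=-1)
```

Under these assumptions the two starting poses have equal heights and opposite x coordinates, so a renderer that fixes one convention will draw one of the simulations flipped from the first frame onward.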

#Code#Reasoning#Benchmarking#Claude 3.5 Sonnet

editor take

Same pendulum prompt split Claude 3.5 Sonnet and DeepSeek V3 within 1s; Reddit 403s, so don't benchmark from screenshots.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

68

SCORE

H1·K1·R1

13:46

73d ago

AI HOT (Curated Pool)· aihot-apiZH13:46 · 05·16

→Hangzhou Base Opens as a National Vocational Skills Training Site for Robots

The National AI Application Pilot Base for Embodied Intelligence opened in Hangzhou on May 16. The city hosts more than 700 robotics-related companies, and its embodied intelligence industrial cluster reached 106.8 billion yuan in output value in 2025.

#Robotics#Hangzhou#National AI Application Pilot Base#Policy

editor take

Hangzhou opened an embodied-AI pilot base with 700+ robotics firms; without open data and eval protocols, it's a policy showroom.

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

66

SCORE

H1·K1·R0

13:05

73d ago

FEATUREDAI HOT (Curated Pool)· aihot-apiZH13:05 · 05·16

→Anthropic Founder’s Playbook warns AI can raise startup failure rates

Anthropic published Founder’s Playbook, arguing that AI tools such as Claude Code reduce prototyping cost but increase startup failure risk across the Idea, MVP, Launch, and Scale stages through false validation, confirmation bias, agentic technical debt, and founder decision bottlenecks.

#Agent#Code#Tools#Anthropic

why featured

Featured · importance 73 · hook + knowledge + resonance

editor take

Anthropic is cooling its own Claude Code hype: cheap prototypes make bad founder judgment look like product velocity.

sharp

Anthropic is naming the self-deception around AI startups: Claude Code lowers prototype cost, then founders confuse “it runs” with “people want it.” The playbook’s useful hook is its four-stage failure map: Idea, MVP, Launch, Scale, with false validation, confirmation bias, agentic technical debt, and founder decision bottlenecks called out. That is much cleaner than the usual one-person-unicorn fantasy. I buy the Skills point: the durable asset is structured vertical knowledge, not prompt fluency. But Anthropic has skin in this framing. Blaming founder judgment is convenient when Claude Code itself can generate systems whose maintenance boundary is still fuzzy. “Agentic technical debt” should not become a polite way to make startups absorb model/tooling failure modes.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

73

SCORE

H1·K1·R1

12:49

73d ago

r/LocalLLaMA· rssEN12:49 · 05·16

→Built a 6x Cheaper CodeRabbit Alternative Using Open Source Models

Reddit user Axintwo says PrixAI uses open source models for PR review and detected 10 of 10 planted issues in a test PR, while costing 6x less than CodeRabbit’s stated $60 per month plan.

#Code#Agent#CodeRabbit#PrixAI

editor take

PrixAI claims 10/10 detections at 6x lower cost; the body is 403, with no model, repo, or repro script.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

70

SCORE

H1·K1·R1

12:11

73d ago

Product Hunt · AI· rssEN12:11 · 05·16

→pixserp

pixserp offers a live-web LLM endpoint with ten answer shapes, but the RSS post does not disclose pricing, supported models, latency, or API details.

#RAG#Tools#pixserp#Product update

editor take

pixserp discloses one endpoint and ten answer shapes; no models, latency, or pricing, so I’m filing this as a wrapper.

HKR breakdown

hook —knowledge ✓resonance —

→ open source

42

SCORE

H0·K1·R0

12:06

73d ago

FEATUREDHacker News Frontpage· rssEN12:06 · 05·16

→SANA-WM, a 2.6B open-source world model for 1-minute 720p video

SANA-WM’s title says the project is a 2.6B open-source world model for 1-minute 720p video; the RSS body only lists the project URL, Hacker News comments URL, 9 points, and 8 comments, and the post does not disclose training data, license terms, inference cost, evaluation setup, or benchmark results.

#Multimodal#Vision#NVIDIA#Open source

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

SANA-WM’s 60s 720p single-GPU claim is strong; the “open-source” label is premature while model weights still say “soon.”

sharp

SANA-WM’s sharp claim is efficiency, not the “world model” branding. The page gives concrete numbers: 2.6B parameters, about 213K public clips, 15 days on 64 H100s, and one H100 for 60-second 720p generation. The distilled variant uses NVFP4 on an RTX 5090 and denoises a 60s clip in 34 seconds. If reproducible, that pulls minute-scale video back into an engineering conversation. The catch is the release shape. The model link still says “soon,” license terms are not visible, and the 36x throughput claim comes from NVIDIA’s own one-minute benchmark. Comparing this to Sora or Genie is the wrong fight; the better test is whether open long-video systems can keep 6-DoF camera control and late-window consistency on one GPU.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

11:34

73d ago

Hacker News Frontpage· rssEN11:34 · 05·16

→OpenClaw Creator Spent $1.3M on OpenAI Tokens in 30 Days

The title says the OpenClaw creator spent $1.3 million on OpenAI tokens in 30 days; the post does not disclose usage volume, model mix, pricing structure, or billing evidence.

#OpenClaw#OpenAI#Commentary

editor take

OpenClaw’s creator claims $1.3M in OpenAI tokens over 30 days; without bills or model mix, I treat it as spend-bragging.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

68

SCORE

H1·K0·R1

11:03

73d ago

r/LocalLLaMA· rssEN11:03 · 05·16

→Reduce Your GPU Power Limit

Reddit user NotArticuno tested GPU power-limit changes against TG128 generation and PP512 processing, likely using qwen3.5:9b; the RSS body does not disclose the exact GPU model or numeric results.

#Inference-opt#NotArticuno#Qwen#Commentary

editor take

Title says lower GPU power limits; body is 403. No GPU model or tok/s, so don't call this inference optimization yet.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

52

SCORE

H1·K0·R1

10:22

73d ago

FEATUREDSynced (机器之心) · WeChat· rssZH10:22 · 05·16

→Why Robots Need World Models: Top Institutions Release Joint Survey

NTU MARS Lab and collaborators released a 43-page survey on robot world models, covering definitions, architectures, applications, benchmarks, and challenges around action-conditioned consistency, inference efficiency, and physical grounding.

#Robotics#Multimodal#Benchmarking#NTU MARS Lab

why featured

Featured · importance 72 · hook + knowledge

editor take

The 43-page survey pulls robot world models out of video-gen hype; I buy the framing, but closed-loop task gains are the only scoreboard.

sharp

Robot world models are getting dragged into the wrong story: prettier video is not sturdier control. This 43-page NTU MARS Lab survey lands on the right fault line: a useful model predicts the state after a specific action, not a plausible future clip. The concrete hook is the evaluation shift from open-loop visual fidelity to closed-loop task utility, with LIBERO, RoboTwin, CALVIN, and SIMPLER named as task grounds. I buy that framing. VLA systems made “image-plus-language to action” look clean, but contact, occlusion, long-horizon drift, and recovery stay ugly. Cosmos Policy and VideoVLA-style systems need to prove action-conditioned consistency and inference latency, not just rollout aesthetics. Until then, a lot of robot world-model work is still video prior dressed as control.

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

72

SCORE

H1·K1·R0

10:22

73d ago

Synced (机器之心) · WeChat· rssZH10:22 · 05·16

→Anthropic Brings Claude Code to a Card-Sized Computer

Anthropic gave developers a Cardputer at its Code With Claude event, and the post says the ESP32-S3 handheld development board can run full Claude Code.

#Code#Tools#Anthropic#Claude

editor take

Cardputer running Claude Code cites a GitHub link, with no local inference disclosed; this smells like terminal-wrapper demo art.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

69

SCORE

H1·K1·R1

10:22

73d ago

Synced (机器之心) · WeChat· rssZH10:22 · 05·16

→This Time, Robots Compete on Work, Not Flashy Demos

The 2026 Hangzhou International Embodied Robot Scenario Application Competition set three tracks and tested more than 200 teams in real scenarios including fire rescue, power inspection, data centers, underwater rescue, and warehouse logistics.

#Robotics#Agent#Multimodal#机器之心

editor take

Hangzhou tested 200+ robot teams in field-like tasks; useful, but no completion rates, failure rates, or procurement data yet.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

68

SCORE

H1·K1·R1

09:30

73d ago

Hacker News Frontpage· rssEN09:30 · 05·16

→Δ-Mem: Efficient Online Memory for Large Language Models

The title presents Δ-Mem as an efficient online memory method for large language models; the post only discloses an arXiv URL, 36 Hacker News points, and 8 comments, and does not disclose the mechanism, benchmark results, model scale, latency, memory cost, or code availability.

#Memory#Research release

editor take

Δ-Mem claims 1.10× average gain with an 8×8 state; I buy the lightweight-memory angle, not agent longevity without code.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

62

SCORE

H1·K0·R1

08:52

73d ago

FEATUREDAI HOT (Curated Pool)· aihot-apiZH08:52 · 05·16

→Researchers use Anthropic Mythos to build a macOS kernel exploit bypassing Apple M5 MIE

Three researchers used Anthropic Mythos to develop a macOS kernel exploit in six days, moving from discovery on April 25 to completion on May 1, bypassing Apple’s MIE memory-integrity system for M5 and A19 chips and gaining root via standard unprivileged system calls; the full technical report will follow Apple’s patch.

#Agent#Code#Safety#Anthropic

why featured

Featured · importance 82 · hook + knowledge + resonance

editor take

Mythos turned an M5/A19 MIE bypass into a six-day kernel exploit; that is commercial agents compressing the 0day weaponization window.

sharp

Mythos just compressed the expensive part of exploit work to six days. Apple spent five years on MIE for M5 and A19, and three researchers bypassed it with a pure data attack. The hook is brutal: discovered April 25, finished May 1, root via standard unprivileged system calls, no pointer manipulation, full report delayed until Apple ships a patch. I don’t read this as a clean Anthropic safety win. It smells like agentic coding arriving on both sides of security at once. For the last year, vendors sold SWE-bench as “models fix bugs.” This story puts the same capability into macOS kernel exploit development. The scary part is not that a model can write exploit code. It is that expert scarcity moves from “can anyone do it” to “who runs the loop first.”

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

82

SCORE

H1·K1·R1

08:10

73d ago

FEATUREDAI HOT (Curated Pool)· aihot-apiZH08:10 · 05·16

→Codex adds multi-device remote control and shared context

Codex controls multiple devices through ChatGPT, switches by project to access each device’s context and files, and supports remote SSH setup for other VMs.

#Agent#Tools#Code#Codex

why featured

Featured · importance 76 · hook + knowledge + resonance

editor take

Codex is moving past IDE helper into remote machine control; the snippet lacks permissions and audit details, so I’d treat it as high-value agent infra with sharp risk.

sharp

Codex is pushing ChatGPT from code assistant into a remote machine control surface. That is a bigger deal than another coding benchmark. The concrete hook here is project-based switching across device context and files, plus remote SSH setup for other VMs. If that works as described, ChatGPT becomes the entry point for operating several dev environments. I’m wary of the product story. GitHub Copilot Workspace, Cursor, and Devin mostly fight inside repos, sandboxes, or hosted environments. Codex touching local machines and VMs raises a nastier set of questions: permission scope, command audit, rollback, secret handling, and blast radius. The snippet does not disclose those controls. Without them, this is great demo material and scary production plumbing.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

76

SCORE

H1·K1·R1

07:28

73d ago

AI Chat-Group Daily (群聊日报)· atomZH07:28 · 05·16

→2026-05-15 Chat Group Daily

The chat-group daily summarizes 5 AI discussion areas, including Bloomberg's report of a 0.2% employment drop across 18 BLS-labeled AI-exposed occupations and Anthropic's reset of Claude Code 5-hour and weekly rate limits without changing the original reset schedule.

#Agent#Code#Tools#Bloomberg

editor take

Bloomberg says 18 AI-exposed jobs fell 0.2%; technical writers dropped 18.1%, while Claude Code's reset is just candy.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

66

SCORE

H1·K1·R1
