posts · 2026-05-24

2026-05-24 · Sun

22:21

64d ago

r/LocalLLaMA · rss · EN · 22:21 · 05·24

→hipEngine: Fast Native Qwen 3.6 Inference for RDNA3

hipEngine released an AGPLv3 ROCm-native inference engine for Qwen3.6 on RDNA3 GPUs; on Qwen3.6 35B-A3B at 128K context with INT8 KV cache, it reports 20.89 GiB allocator peak, 1076.5 tok/s prefill, and 60.0 tok/s decode.

#Inference-opt#hipEngine#Qwen#AMD

editor take

hipEngine claims 60 tok/s decode for Qwen3.6 35B-A3B on RDNA3; Reddit 403 blocks license and repro checks.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

22:13

64d ago

AI HOT (Curated Pool) · aihot-api · ZH · 22:13 · 05·24

→Luma Agents Enables Scaled Authentic UGC Ad Generation

Luma Labs says Luma Agents generates UGC-style ads from a defined brief and style settings; the post does not disclose generation volume, pricing, model details, or ad deployment conditions.

#Agent#Luma Labs#Product update

editor take

Luma Agents only discloses brief and style inputs; volume, pricing, and deployment are missing, so I’d file it as ad-creative tooling.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

19:23

65d ago

r/LocalLLaMA · rss · EN · 19:23 · 05·24

→What frontend do you guys use?

Reddit user Borkato asks the LocalLLaMA community which frontend they use; the post only discloses that the author uses Vim with a custom text-completion plugin and views llama-server as a sensible but limited default.

#Code#Tools#Reddit#LocalLLaMA

editor take

Borkato uses Vim plus a custom completion plugin; no comment breakdown disclosed. LocalLLaMA frontends still smell artisanal.

HKR breakdown

hook —knowledge —resonance ✓

→ open source

SCORE

H0·K0·R1

19:10

65d ago

FEATURED · r/LocalLLaMA · rss · EN · 19:10 · 05·24

→Users Successfully Run Large Language Model Qwen 3.6 on Consumer GPUs

A Reddit user ran unsloth qwen3.6-35B-a3b-MTP-GGUF UD Q4_K_XL in LMStudio on Windows with a GTX 1060 6GB, 32GB DDR3, and an E5-2698v3; the setup used ctx length 131072, 41 GPU-offload layers, KV Q4_0, and reported about 130-150 tps prefill at 16k and 16 tps decode at 4k.

#Inference-opt#Qwen#LMStudio#Reddit

why featured

Featured · importance 82 · hook + knowledge + resonance

editor take

Two LocalLLaMA posts test Qwen 3.6 on consumer GPUs; the body is 403-blocked, so 4.5 t/s is a field signal, not a model verdict.

sharp

Two Reddit posts point the same way: users are testing Qwen 3.6 on a GTX 1060 6GB and a 3080 Ti; the only visible number is 4.5 t/s for 27B MTP on the 3080 Ti, while the body is 403-blocked. That is a narrow signal, but a useful one for local inference people: the fight has moved from leaderboard bragging to VRAM, quantization, and whether MTP-style decoding makes 27B/35B usable on old cards. I'll be real: 4.5 t/s is rough for live writing, but acceptable for offline agent loops or batch work. Treating it like a Qwen3-Coder or DeepSeek-R1 experience claim would be sloppy.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

19:00

65d ago

TechCrunch AI · rss · EN · 19:00 · 05·24

→Xreal, Google’s smart glasses partner, says it has mastered the tricky smart glasses industry

Xreal founder and CEO Chi Xu says the smart glasses business has reached a turning point, but the RSS snippet does not disclose Google partnership details, product specifications, pricing, or a launch timeline.

#Vision#Xreal#Google#Chi Xu

editor take

Chi Xu says smart glasses are at a turning point; no specs, pricing, or timeline are disclosed, so I don’t buy it yet.

HKR breakdown

hook ✓knowledge —resonance —

→ open source

SCORE

H1·K0·R0

18:08

65d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 18:08 · 05·24

→DeepSeek to Make Permanent 75% Discount on Flagship AI Model

The title says DeepSeek will make a 75% discount on its flagship AI model permanent; the post does not disclose the model name, applicable API, start date, or original price.

#DeepSeek#Product update

why featured

Featured · importance 77 · hook + knowledge + resonance

editor take

DeepSeek making a 75% flagship discount permanent is not a sale; it drags the margin anchor for rival closed APIs lower.

sharp

DeepSeek is cutting price expectations, not running a customer-acquisition coupon. The title says its flagship AI model gets a permanent 75% discount, while the model name, API surface, start date, and original price are not given. Thin disclosure, hard move: “permanent” lands in procurement sheets and default developer routing. Alibaba, Zhipu, and MiniMax have all used price cuts, but many looked like timed campaigns. If DeepSeek keeps a flagship tier at one-quarter of list price, buyers will ask why comparable closed APIs deserve 4x pricing. The catch is basic but important: no latency, rate-limit, context-window, or batch-pricing data is disclosed. A 75% headline can still hide worse total cost if throughput or availability is constrained.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

17:46

65d ago

r/LocalLLaMA · rss · EN · 17:46 · 05·24

→OCR: granite-docling-258m vs granite-docling-2stage-258m: has anyone noticed improvements?

A Reddit user compares IBM granite-docling-258M with granite-docling-2stage-258m; the post only says the 2stage version uses a dynamic prompt to precompute page layout objects, and it does not disclose OCR benchmarks or accuracy numbers.

#Vision#IBM#Reddit#Granite Docling

editor take

Only the title and a 403 page are visible; no OCR metrics, so don’t treat 258M two-stage gains as proven.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

17:18

65d ago

AI HOT (Curated Pool) · aihot-api · ZH · 17:18 · 05·24

→Self-optimizing prompt framework for Codex

The prompt framework instructs Codex to review sessions and Memories, select repeated tasks that appear at least twice with stable inputs, and convert them into skills, subagents, or automation tools while avoiding duplicate assets.

#Code#Agent#Memory#Codex

editor take

The framework uses “repeated at least twice + stable inputs” as the filter; I buy that threshold: agent memory should learn chores before taste.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1
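
A minimal Python sketch of the selection rule described in the Codex item above: keep tasks that show up at least twice with unchanged inputs, and skip anything that already has an asset. `TaskRecord`, `input_shape`, and `automation_candidates` are hypothetical names for illustration; the post does not show how the framework represents sessions or Memories.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    task: str         # normalized task description (hypothetical field)
    input_shape: str  # stand-in for "the inputs", e.g. "csv->md" (hypothetical field)


def automation_candidates(history, existing_assets=frozenset(), min_repeats=2):
    """Return tasks worth turning into a skill, subagent, or automation.

    A task qualifies when it appears at least `min_repeats` times, every
    occurrence had the same input shape, and no asset for it exists yet.
    """
    shapes_by_task = defaultdict(list)
    for record in history:
        shapes_by_task[record.task].append(record.input_shape)

    picked = []
    for task, shapes in shapes_by_task.items():
        repeated = len(shapes) >= min_repeats
        stable = len(set(shapes)) == 1
        if repeated and stable and task not in existing_assets:
            picked.append(task)
    return sorted(picked)


if __name__ == "__main__":
    history = [
        TaskRecord("weekly report", "csv->md"),
        TaskRecord("weekly report", "csv->md"),
        TaskRecord("fix lint", "py"),
        TaskRecord("fix lint", "ts"),  # inputs changed, so not stable
    ]
    print(automation_candidates(history))
```

The duplicate check is the `existing_assets` set: passing the names of skills that already exist keeps the same chore from being packaged twice.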

17:00

65d ago

FEATURED · Financial Times · Technology · rss · EN · 17:00 · 05·24

→ECB Orders Banks to Fix Security Flaws Exposed by AI Models

The ECB summoned banks to a hastily arranged meeting to push fixes for flaws exposed by the latest AI models; the RSS snippet says supervisors will stress financial-system risks but does not disclose the banks involved, flaw categories, or remediation deadlines.

#European Central Bank#Policy

why featured

Featured · importance 76 · hook + resonance

editor take

The ECB summoned banks to fix risk-control flaws that the latest AI models can expose. This isn't a generic warning; it suggests stress tests already found concrete holes.

sharp

Both FT and Bloomberg covered this, but Bloomberg's headline explicitly credits FT, so we're looking at a single original source. The FT article is behind a paywall, so I can't see which models, which flaws, or which banks are involved. But the fact that the ECB convened banks in person, rather than issuing a routine guidance note, suggests this isn't theoretical. Regulators don't call emergency meetings over hypotheticals. More likely, internal red-team exercises or audits already surfaced real cases where new large models were used to bypass anti-fraud or credit-scoring systems. I'd discount the confidence a bit until we see the actual flaw types, the bank list, and the remediation timeline. If a bank responds publicly or the ECB releases a formal report, this gets a lot more solid.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

16:31

65d ago

FEATURED · Hacker News Frontpage · rss · EN · 16:31 · 05·24

→Memory has grown to nearly two-thirds of AI chip component costs

Epoch AI says memory has grown to nearly two-thirds of AI chip component costs; the RSS body only lists the article URL, 68 points, and 71 comments, and the post does not disclose the methodology or sample scope.

#Inference-opt#Epoch AI#Commentary

why featured

Featured · importance 74 · hook + knowledge + resonance

editor take

Memory at 63% of AI chip component cost is a loud warning against FLOPS-only thinking; methodology is missing here, so treat it as direction, not gospel.

sharp

The 63% figure drags AI chip economics back to bandwidth, not raw FLOPS. Epoch AI’s title says memory is 63% of component cost, but the captured body only shows navigation and the title. It gives no sample scope, BOM definition, HBM generation, packaging split, or methodology. I buy the direction, not the precision. H100/H200 and Blackwell economics already made HBM3E, CoWoS, and advanced packaging the pressure points. If memory really takes nearly two-thirds of component cost, inference pricing cannot be discussed without KV cache, quantization, speculative decoding, and memory bandwidth. Put 63% in the memo; don’t put it straight into a financial model.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

16:24

65d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 16:24 · 05·24

→TrapDoor Supply Chain Attack Makes AI Assistants a New Attack Surface

TrapDoor hit npm, PyPI, and Crates.io with 34 malicious packages, using manipulated CLAUDE.md and .cursorrules files in pull requests to make Claude Code and Cursor treat attacker content as trusted instructions and run malicious commands.

#Agent#Code#Safety#npm

why featured

Featured · importance 80 · hook + knowledge + resonance

editor take

TrapDoor turns CLAUDE.md and .cursorrules into supply-chain payloads; coding agents are now paying for treating repo text as authority.

sharp

TrapDoor’s sharp edge is not the 34 malicious packages; it is the break in context trust. The campaign hit npm, PyPI, and Crates.io, targeting wallets, SSH keys, and cloud credentials. The wild part is the delivery path: PRs injected manipulated CLAUDE.md and .cursorrules files, then Claude Code and Cursor treated repo text as project authority. That is exactly the security debt coding agents created by making “read the repo rules” a default behavior. Package scanners can flag typosquats; they are much worse at deciding whether an instruction file is hostile.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1
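
One check that follows from the TrapDoor item above is to flag any pull request that touches an agent instruction file before a coding agent reads it. This is a sketch under assumptions: the two file names come from the item, while `instruction_files_touched`, `needs_human_review`, and the trusted-author flag are illustrative and not something the report describes.

```python
from pathlib import PurePosixPath

# File names taken from the TrapDoor item; compared case-insensitively.
AGENT_INSTRUCTION_FILES = {"claude.md", ".cursorrules"}


def instruction_files_touched(changed_paths):
    """Return the changed paths whose file name is an agent instruction file."""
    return sorted(
        path
        for path in changed_paths
        if PurePosixPath(path).name.casefold() in AGENT_INSTRUCTION_FILES
    )


def needs_human_review(changed_paths, author_is_trusted):
    """Hypothetical policy: untrusted authors may not change agent rules unreviewed."""
    return bool(instruction_files_touched(changed_paths)) and not author_is_trusted


if __name__ == "__main__":
    changed = ["src/app.py", "docs/CLAUDE.md", ".cursorrules"]
    print(instruction_files_touched(changed))
    print(needs_human_review(changed, author_is_trusted=False))
```

A path match only says that the rules changed. It cannot say whether the new text is hostile, which is the gap the item points at.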

15:24

65d ago

● P1 · r/LocalLLaMA · rss · EN · 15:24 · 05·24

→OpenBMB Releases BitCPM-CANN 1.58-Bit Language Model Training on Ascend NPU

OpenBMB released BitCPM-CANN, a 1.58-bit QAT training stack on Ascend NPU with 0.5B, 1B, 3B, and 8B models trained from scratch, where the 1B to 8B variants retain 95.7%–97.2% of full-precision MiniCPM4 performance across 11 benchmarks.

#Fine-tuning#Inference-opt#Benchmarking#OpenBMB

why featured

Featured · importance 96 · hook + knowledge + resonance

editor take

BitCPM-CANN gets 1.58-bit QAT to 8B on Ascend 910B; treat this less as a model drop and more as a low-bit training proof for non-CUDA stacks.

sharp

All 3 items track the same OpenBMB paper and repo, so this is an official technical-release chain, not independent benchmark validation. BitCPM-CANN trains 0.5B/1B/3B/8B models on Huawei Ascend 910B, with the 1B–8B variants retaining 95.7%–97.2% of full-precision MiniCPM4 performance and QAT adding 4.5% throughput overhead. That 4.5% is the sharper claim than the “first domestic NPU” framing. I read this as an infrastructure event, not an 8B model event. Getting CANN, MindSpeed, and Megatron-LM wired for end-to-end 1.58-bit training gives Ascend a reproducible low-bit path outside CUDA. I would not overread the Qwen3-8B comparison: the post says MiniCPM4 used 8T tokens versus Qwen3-8B’s 36T, but BitCPM-CANN still needs public latency and serving-throughput numbers.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1
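
For readers new to the "1.58-bit" label in the BitCPM-CANN item above, here is a minimal sketch of ternary weight quantization. The post does not say which quantizer BitCPM-CANN uses; the absmean scheme below is the one described for BitNet b1.58 and is only an assumption here, and `ternary_quantize` and `dequantize` are illustrative names.

```python
import math


def ternary_quantize(weights, eps=1e-8):
    """Map float weights to codes in {-1, 0, +1} plus one shared scale.

    Absmean scheme (assumed, not confirmed for BitCPM-CANN): divide by the
    mean absolute weight, round to the nearest integer, then clip to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from ternary codes."""
    return [c * scale for c in codes]


# Three possible values per weight carry log2(3) bits, about 1.585,
# which is the usual origin of the "1.58-bit" name.
BITS_PER_TERNARY_WEIGHT = math.log2(3)

if __name__ == "__main__":
    codes, scale = ternary_quantize([0.9, -0.05, -1.2, 0.4])
    print(codes, round(scale, 4), round(BITS_PER_TERNARY_WEIGHT, 3))
```

In quantization-aware training, a quantizer like this typically runs in the forward pass while full-precision weights are kept for the update. That is the general pattern, not a description of the OpenBMB code.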

15:05

65d ago

AI HOT (Curated Pool) · aihot-api · ZH · 15:05 · 05·24

→Pixverse Tests a Character Design Workflow

Pixverse tested a character design workflow that uses GPT Image 2.0 to create Lucas’s visual concept and Seedance 2.0 to generate an animated bouncing performance.

#Multimodal#Vision#Pixverse#GPT Image 2.0

editor take

Pixverse chains GPT Image 2.0 with Seedance 2.0. No frame consistency or control data is shown, so ignore the “cinematic” claim.

HKR breakdown

hook —knowledge ✓resonance —

→ open source

SCORE

H0·K1·R0

15:02

65d ago

r/LocalLLaMA · rss · EN · 15:02 · 05·24

→GPU VRAM only for small models with llama.cpp: is it possible?

A Reddit user running llama.cpp on an RTX 4070 with 12GB VRAM says Gemma4 26B and Qwen 3.6 35B MoE reach about 40 t/s; they ask whether a Qwen3.5-9B quant can run entirely in VRAM, because gemma4-e2b IQ4_XS still uses about 3.5GB of host RAM at 8192 context.

#Inference-opt#Reddit#Qwen#Gemma

editor take

RTX 4070 12GB hits 40 t/s, but Reddit body is 403; I don't buy any all-VRAM claim without llama.cpp flags.

HKR breakdown

hook —knowledge ✓resonance ✓

→ open source

SCORE

H0·K1·R1

15:00

65d ago

TechCrunch AI · rss · EN · 15:00 · 05·24

→I Tried Amazon’s Bee Wearable and Am Both Intrigued and Slightly Creeped Out

TechCrunch tried Amazon’s Bee wearable and described it as combining convenience with privacy anxiety; the RSS snippet does not disclose price, sensor specifications, launch timing, or availability conditions.

#Audio#Memory#Amazon#TechCrunch

editor take

The Amazon Bee piece offers only “convenience plus privacy anxiety”; no price, sensors, or launch terms, so this smells like another AI Pin trial balloon.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

14:22

65d ago

r/LocalLLaMA · rss · EN · 14:22 · 05·24

→Gemma 4 2B handles structured JSON, tool calling, and reasoning traces via Spring AI / LM Studio

A Reddit user tested Gemma 4 2B locally through LM Studio and Spring AI on three tasks. It returned schema-valid JSON, called a weather tool with Riga as the parameter, exposed reasoning_content, and scored a Java review 50/100 after finding a string == bug.

#Tools#Reasoning#Code#Google

editor take

Gemma 4 2B has only a title-level 3-task test; 403 hides prompts and sampling, so I won’t treat it as evidence.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

14:09

65d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 14:09 · 05·24

→Greg Brockman: The 72 Hours That Nearly Destroyed OpenAI

The title says Greg Brockman discusses the 72 hours that nearly destroyed OpenAI, but the post does not disclose the timeline, participants, or specific mechanisms behind the crisis.

#Greg Brockman#OpenAI#Commentary

why featured

Featured · importance 72 · hook + resonance

editor take

Brockman frames the 72 hours as resilience, but Phoenix and Ilya’s tweet make OpenAI’s old governance look shockingly improvised.

sharp

OpenAI’s 72-hour story lands less like founder lore and more like a governance autopsy with missing organs. The hard details are ugly enough: after Sam Altman was fired, Greg Brockman quit the same day; the next morning, they designed a “Phoenix” backup company at Sam’s house; Ilya Sutskever’s tweet then changed the trajectory. For a company now tied to GPT-5, massive compute allocation, and enterprise dependency, that is a wild failure mode. I don’t buy the clean “we had to leave the pure nonprofit structure” framing without the board mechanics. The post names the crisis beats, but gives no voting rules, investor triggers, Microsoft leverage, or internal safety dispute details. That makes this useful as Brockman’s version of the war story, not a serious account of why OpenAI’s old control system broke.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

14:09

65d ago

● P1 · Hacker News Frontpage · rss · EN · 14:09 · 05·24

→DeepSeek Announces Permanent 75% Price Cut on Flagship AI Model

Bloomberg’s headline says DeepSeek will make a 75% discount on its flagship AI model permanent; the RSS body only lists the Hacker News entry with 46 points and 45 comments, and the post does not disclose the model name, pricing, or effective date.

#DeepSeek#Bloomberg#Hacker News#Product update

why featured

Featured · importance 89 · hook + knowledge + resonance

editor take

DeepSeek made the 75% flagship discount permanent; stop calling this promo pricing. The closed-model API margin story just took another cut.

sharp

Three headlines align on the same payload: DeepSeek is making a permanent 75% discount on its flagship AI model. That looks like one Bloomberg-led source chain; the scraped body does not disclose the model name, original price, or token pricing. My read: DeepSeek is turning discounting from a customer-acquisition tactic into the reference price. A 75% permanent cut changes procurement math, not just developer sentiment. OpenAI and Anthropic can still defend premium pricing with tools, enterprise controls, and long-context workflows. The exposed layer is everyone reselling “good enough” inference with thin differentiation. If your pitch is model access plus a wrapper, DeepSeek just made your gross margin look fictional.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

13:05

65d ago

r/LocalLLaMA · rss · EN · 13:05 · 05·24

→Qwen3.6-35B-A3B vs Gemma4-26B-A4B

Reddit user MarcCDB compares Qwen3.6-35B-A3B with Gemma4-26B-A4B, saying Gemma4 runs faster on a Radeon 9070 XT with the latest llama.cpp, while the post does not disclose benchmark scores or prompt conditions.

#Inference-opt#Benchmarking#Qwen#Gemma

editor take

Gemma4-26B-A4B is faster on 9070 XT, but no scores; Reddit 403 makes this a lead, not evidence.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

13:02

65d ago

Hacker News Frontpage · rss · EN · 13:02 · 05·24

→DeepSeek Reasonix, a DeepSeek-native coding agent with high caching and low cost

The title identifies DeepSeek Reasonix as a DeepSeek-native coding agent focused on high caching and low cost; the post only discloses 41 points and 24 comments, and does not disclose its caching mechanism, pricing, benchmark results, or coding capability details.

#Agent#Code#Inference-opt#DeepSeek

editor take

Reasonix claims 94% cache hit and 2.5× lower cost; I buy the cache-first angle, but coding quality lacks benchmarks.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

12:55

65d ago

Hacker News Frontpage · rss · EN · 12:55 · 05·24

→Constraint Decay: The Fragility of LLM Agents in Back End Code Generation

The title states that Constraint Decay studies LLM agent fragility in back-end code generation; the RSS body only discloses an arXiv link, 13 Hacker News points, and 3 comments, and the post does not disclose methods, models, metrics, or results.

#Agent#Code#Research release

editor take

Across 80 greenfield tasks, added structural constraints cut pass rates by 30 points; ORM and framework conventions still break agents.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

12:05

65d ago

AI HOT (Curated Pool) · aihot-api · ZH · 12:05 · 05·24

→Claude Code automatic mode: a key technique for parallel tasks

The author says Claude Code automatic mode removes permission prompts, letting a user start one session and work on another session in parallel while the first keeps running.

#Agent#Code#Tools#Claude

editor take

Claude Code auto mode removes permission prompts. Parallel sessions sound useful, but the snippet omits sandboxing and rollback details.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

11:31

65d ago

r/LocalLLaMA · rss · EN · 11:31 · 05·24

→Qwen Plays DCSS: qwen3.6-35b-a3b@q4_k_xl Handles the Open-Source Roguelike Better Without MTP

A Reddit user ran qwen3.6-35b-a3b@q4_k_xl on DCSS with 240k context, 8k output, 0.6 temperature, and LM Studio on an RTX 5090; the non-MTP build handled gameplay, while the MTP build produced malformed tool calls and repeated wrong tool calls.

#Agent#Tools#Vision#Qwen

editor take

Qwen3.6-35B ran DCSS with 240k context; MTP tool calls broke, so this smells like an agent regression test.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

11:12

65d ago

r/LocalLLaMA · rss · EN · 11:12 · 05·24

→Gemma 4 E2B quality degrades after ~30-40 continuous inferences on 4GB VRAM?

A user ran Gemma 4 E2B through llama-server on a GTX 1650 with 4GB VRAM, and after about 30-40 calls the outputs became shorter, missed JSON fields, or returned empty; restarting llama-server immediately restored quality.

#Inference-opt#Gemma#llama-server#NVIDIA

editor take

Title says Gemma 4 E2B degrades after 30-40 calls on GTX 1650 4GB; body is 403, so inspect llama-server leakage first.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

11:02

65d ago

FEATURED · r/LocalLLaMA · rss · EN · 11:02 · 05·24

→Using llama.cpp native tools for web RAG inside llama-server WebUI

A Reddit user describes using llama.cpp native tools for web RAG inside llama-server WebUI with a 7-step setup: enable get_datetime and exec_shell_command, then run wget through firejail, a separate Linux user, and an Alpine OCI VM sandbox.

#RAG#Tools#Agent#llama.cpp

why featured

Featured · importance 72 · hook + knowledge + resonance

editor take

Only the title and summary are visible; Reddit 403 blocks the body. Still, llama.cpp web_fetch inside WebUI turns sandboxing into product work.

sharp

llama.cpp becomes a security product the moment tool calling reaches the WebUI. The summary gives a 7-step setup: enable get_datetime and exec_shell_command, then run wget through firejail, a separate Linux user, and an Alpine OCI VM. That is ugly plumbing, but it points at the right failure mode: web RAG risk is not retrieval; it is letting page text sit near command execution. Reddit returns 403, so I cannot verify the prompts, permission flags, or llama-server version. Still, this is more useful than another hosted agent demo. Local agents do not get managed egress, filesystem policy, identity, or audit logs for free. The user ends up assembling a small security platform around one wget call.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

10:17

65d ago

r/LocalLLaMA · rss · EN · 10:17 · 05·24

→What workstation to get for ~13k EUR?

A Reddit user compares a 13,000 EUR M5 Ultra Mac Studio against an RTX PRO 5000 workstation for local testing of 30B-35B open-weight LLMs, 262k-token context, harnesses, and inference systems, while excluding local fine-tuning because renting a B200 on RunPod is sufficient for that workload.

#Inference-opt#Fine-tuning#Reddit#RunPod

editor take

Only a 403 body; title says €13k. First compute 262k-token KV cache, then stop fetishizing Mac memory bandwidth.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1
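
The "first compute the KV cache" step in the workstation item above can be sketched in a few lines of Python. Every model dimension in the example (48 attention layers, 8 KV heads, head size 128) is a hypothetical placeholder and not the published configuration of any model named here; "262k" is read as 262,144 tokens, which is also an assumption.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_element):
    """Size of a standard attention KV cache for one sequence.

    The leading 2 covers keys and values. Only layers that keep a full
    attention cache should be counted in `n_layers`.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_element


def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    context = 262_144  # assumed reading of "262k-token context"
    # Hypothetical model shape, for illustration only.
    fp16 = kv_cache_bytes(48, 8, 128, context, bytes_per_element=2)
    int8 = kv_cache_bytes(48, 8, 128, context, bytes_per_element=1)
    print(to_gib(fp16), to_gib(int8))  # 48.0 24.0
```

With those placeholder numbers the cache alone is 48 GiB at 16-bit and 24 GiB at 8-bit, before any weights are loaded, so it makes sense to run this arithmetic on the real model config before comparing the two machines.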

08:45

65d ago

r/LocalLLaMA · rss · EN · 08:45 · 05·24

→Frustrating results with product searching

A Reddit user tested a gemma4 26b agent for product research, and it finished in 1 minute with the wrong direction and generic categories; Claude Sonnet 4.6 searched longer, but only produced concrete product candidates after a second prompt excluding manufacturers without matching products.

#Agent#Tools#Gemma#Claude

editor take

Body is just Reddit 403; test details are missing. A 1-minute wrong search smells like bad retrieval policy, not model failure.

HKR breakdown

hook —knowledge ✓resonance ✓

→ open source

SCORE

H0·K1·R1

08:29

65d ago

Hacker News Frontpage · rss · EN · 08:29 · 05·24

→Greg Brockman: Inside the 72 Hours That Almost Killed OpenAI

The title says Greg Brockman discusses the 72 hours that nearly killed OpenAI; the RSS body only lists the article URL, Hacker News comments URL, 4 points, and 0 comments, and the post does not disclose event details.

#Greg Brockman#OpenAI#Commentary

editor take

The page gives 6 clip timestamps, not OpenAI’s AI-written code share; I’d skip to 40:38 on hidden reasoning traces.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

07:30

65d ago

AI Chat-Group Daily (群聊日报) · atom · ZH · 07:30 · 05·24

→2026-05-23 Chat Group Daily

The chat group daily records discussion around a coding-plan infographic: a $200/month plan is valued at $8,000–$10,000 in API-equivalent usage, while MIT HAN Lab open-sourced KDA and placed in the top three at MLSys 2026.

#Agent#Code#Inference-opt#Microsoft

editor take

A $200 coding plan maps to $8K–$10K API value; looks like subsidy arbitrage, not durable pricing.

HKR breakdown

hook —knowledge ✓resonance ✓

→ open source

SCORE

H0·K1·R1

07:00

65d ago

FEATURED · Synced (机器之心) · WeChat · rss · ZH · 07:00 · 05·24

→ICML 2026: First Parallel Thinking Framework for Vision-Language Models

Visual Para-Thinker introduces a parallel thinking framework for vision-language models, using Pa-Attention and LPRoPE to isolate four visual reasoning paths and training on 163,000 question-answer pairs.

#Multimodal#Vision#Reasoning#Visual Para-Thinker

why featured

Featured · importance 79 · hook + knowledge + resonance

editor take

Visual Para-Thinker splits VLM reasoning into four visual paths; I buy the mechanism, not the “first framework” victory lap.

sharp

Visual Para-Thinker’s useful part is the mechanism, not the “parallel thinking” branding. It isolates four visual reasoning paths with Pa-Attention, keeps shared position ranges unbiased, then adds LPRoPE so paths stay distinguishable. The training set is also concrete: 163,000 QA pairs distilled mainly from Qwen3-VL-235B-A22B-Instruct. That targets a real VLM failure mode. Long CoT often dilutes attention over visual tokens, which shows up as hallucination rather than better reasoning. The reported gains are nontrivial: +12.6 / +6.3 on V* for 3B / 7B, and +6.1 / +5.0 on HallusionBench. I don’t buy the “first framework” framing, since K2.5, Step3-VL, and LongCat-Flash-Thinking already explored reasoning width. This reads more like a clean VLM-specific patch; the open question is whether it holds outside curated perception benchmarks.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

07:00

65d ago

FEATURED · Synced (机器之心) · WeChat · rss · ZH · 07:00 · 05·24

→Meta layoff survivors face a difficult choice

Meta is pushing some post-layoff employees into new roles: some engineering managers are returning to IC work, while some Infra and AI engineers are being reassigned to data labeling; the article cites a manager-to-report ratio shift from 1:8 to 1:50 and says Meta holds a 49% stake in Scale AI.

#Agent#Fine-tuning#Meta#Scale AI

why featured

Featured · importance 76 · hook + knowledge + resonance

editor take

Meta is pushing managers back to IC and infra/AI engineers into labeling; this smells less like efficiency and more like attrition by humiliation.

sharp

Meta’s sharp move is not layoffs; it is repricing expensive engineering labor as interchangeable workflow. The article gives two concrete hooks: manager span moving from 1:8 to 1:50, and infra plus AI engineers being reassigned to data labeling. The first cuts middle management. The second is harsher: distributed-systems talent gets harvested for “expert labeling.” I don’t buy the clean “data moat” story. Meta reportedly holds 49% of Scale AI, yet still pushes internal engineers into labeling. That smells like a retention filter: people who tolerate it stay, the expensive people with market value leave first. OpenAI and Anthropic also chase high-quality data, but they rarely put scarce engineers so visibly on a labeling line.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

06:08

65d ago

r/LocalLLaMA · rss · EN · 06:08 · 05·24

→Qwen3.6-35B-A3B-Uncensored-Genesis-APEX-MTP

A Reddit user shared Hugging Face links for Qwen3.6-35B-A3B Uncensored Genesis V2 in GGUF and FP8 Safetensors formats, and reported Q8_K_P MTP quantization tests on Beelink GTR9 Pro plus Strix Halo hardware: 5 sessions at 200k context had no glitches, loops, or repeated tool calls, and a task switch after 120k tokens completed correctly.

#Code#Tools#Inference-opt#Qwen

editor take

Title says Qwen3.6-35B-A3B has GGUF/FP8 builds; body is 403, so the 200k no-loop claim is poster-only.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

04:51

65d ago

r/LocalLLaMA · rss · EN · 04:51 · 05·24

→I built a local GUI for the TradingAgents framework — works with Ollama

AI_Trenches forked TradingAgents and added a local web GUI with support for 10 LLM providers, including OpenAI, Anthropic, Ollama, Qwen, and DeepSeek; the concise report mode saves about 50% of tokens.

#Agent#Tools#RAG#TradingAgents

editor take

Title claims a local GUI with 10 providers; Reddit 403 hides the repo, so I’d treat this as a demo post.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

04:09

65d ago

FEATURED · AI Era (新智元) · WeChat · rss · ZH · 04:09 · 05·24

→Anthropic’s Three Cards Surface: Mythos 1 Appears, Opus 4.8 Spotted

Xinzhiyuan says Anthropic’s claude-opus-4.8 appeared in Google Vertex AI, while a 59.8MB Claude Code source-map leak with 512,000 TypeScript lines exposed Sonnet 4.8 references and Mythos 1 clues tied to Claude Code and Claude Security.

#Code#Safety#Vision#Anthropic

why featured

Featured · importance 80 · hook + knowledge + resonance

editor take

Only the summary has signal: claude-opus-4.8 on Vertex AI plus a 59.8MB source-map leak. This smells like release plumbing, not a capability launch.

sharp

Anthropic’s signal here looks like an engineering leak, not a model reveal. The article body is just a WeChat verification page, so the usable facts come from the summary: claude-opus-4.8 appeared on Google Vertex AI, and a 59.8MB Claude Code source-map leak exposed 512,000 TypeScript lines with Sonnet 4.8 and Mythos 1 references. That is concrete enough to take seriously, but pricing, context window, benchmarks, and launch timing are missing. I would not auto-file Mythos 1 as a frontier model. The clues tie it to Claude Code and Claude Security, which sounds more like product packaging or a security layer than a clean model-family launch. Anthropic has spent the last year turning coding agents into distribution. This leak has weight because of where the names surfaced, not because it proves a capability jump.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

04:09

65d ago

FEATURED · AI Era (新智元) · WeChat · rss · ZH · 04:09 · 05·24

→AI Agent Completes Chip Design from 219 Words to 7nm GDSII Without Engineer Input

Verkor’s Design Conductor generated an ASAP7 7nm GDSII layout for the VerCore RISC-V CPU from a 219-word English spec in 12 hours, with no engineer in the design loop; the reported result scored 3,261 CoreMark at 1.48GHz, but it has not been fabricated and lacks cache implementation.

#Agent#Code#Tools#Verkor

why featured

Featured · importance 79 · hook + knowledge + resonance

editor take

Verkor pushed AI chip design to GDSII, but don’t get dazzled by “7nm”: ASAP7, no cache, no silicon; the hard part is 12-hour toolchain control.

sharp

Verkor’s hard result is not the 3,261 CoreMark score; it is Design Conductor turning a 219-word spec into a closed RTL-to-GDSII loop. In 12 hours, it produced an ASAP7 7nm layout for VerCore at 1.48GHz and 2,809 µm². The useful detail is the debugging path: it converted VCD to CSV, wrote Python, found a bad JAL flush, patched RTL, and reran tests. But “AI designed a production chip” is still a stretch. ASAP7 is an academic predictive PDK, VerCore has no cache, no out-of-order logic, and no fabricated silicon. The performance reference is a 2011 Celeron SU2300. Cadence and Synopsys have spent the last year selling AI EDA copilots; Verkor is more aggressive because the agent runs the whole flow. I buy the direction. I don’t buy the 7nm victory lap.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

04:09

65d ago

FEATURED · AI Era (新智元) · WeChat · rss · ZH · 04:09 · 05·24

→AI-generated articles now outnumber human-written ones: what is left for the brain?

Graphite sampled 43,000 CommonCrawl articles and found AI-generated English articles exceeded human-written ones from November 2024, with its detector reporting about a 4.2% false-positive rate and 0.6% false-negative rate.

#Benchmarking#Graphite#Merriam-Webster#CommonCrawl

why featured

Featured · importance 76 · hook + knowledge + resonance

editor take

Graphite’s 43k CommonCrawl sample says AI articles crossed 50%; I buy the pollution trend, not the “humans stopped writing” panic.

sharp

Graphite’s finding reads more like an SEO-farm health check than proof that human writing has collapsed. Its 43,000 CommonCrawl sample says AI-written English articles exceeded human-written ones from November 2024. But the detector has a 4.2% false-positive rate and 0.6% false-negative rate, so the 50% crossing is fuzzier than the headline sells. The nastier part is the measurement gap: “pure AI-generated” content excludes AI drafts edited by humans. For training corpora and search indexes, that hybrid layer is harder to filter than obvious slop. The 2024 Nature model-collapse paper supports the contamination concern, but jumping from web article share to “your brain is shrinking” needs user-behavior data and quality segmentation.
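
To see how much the detector error rates can move a 50% reading, here is a rough sketch using the standard Rogan-Gladen prevalence correction. It assumes "false positive" means a human-written article flagged as AI and "false negative" means an AI article flagged as human, and that both rates hold uniformly across the sample. Nothing in the summary says Graphite applied this correction.

```python
def observed_share(true_share, fpr, fnr):
    """Share the detector would flag as AI, given the true AI share."""
    return true_share * (1.0 - fnr) + (1.0 - true_share) * fpr


def corrected_share(observed, fpr, fnr):
    """Rogan-Gladen estimate of the true AI share from the flagged share."""
    denom = 1.0 - fpr - fnr
    if denom <= 0:
        raise ValueError("detector is no better than chance")
    estimate = (observed - fpr) / denom
    return min(1.0, max(0.0, estimate))   # clamp to a valid proportion


FPR = 0.042   # reported false-positive rate
FNR = 0.006   # reported false-negative rate

# A true 50/50 split would be measured as slightly more than half AI.
print(f"true 50.0% -> observed {observed_share(0.50, FPR, FNR):.1%}")

# A measured 50% corresponds to a true share a bit under half.
print(f"observed 50.0% -> corrected {corrected_share(0.50, FPR, FNR):.1%}")
```

Under those assumptions the bias is about two percentage points near the crossing, enough to shift the crossover date but not to reverse the trend.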

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

04:00

65d ago

Financial Times · Technology · rss · EN · 04:00 · 05·24

→How AI Is Forcing McKinsey and Its Peers to Rethink Pricing

The title says AI is forcing McKinsey and its peers to rethink pricing; the post only discloses that clients are questioning advisory value and growing accustomed to fees tied to successful task completion.

#McKinsey#Financial Times#Commentary

editor take

McKinsey clients are questioning advisory value. Only success-fee mechanics are disclosed, no rates; AI is squeezing slide-hours into acceptance tests.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

04:00

65d ago

AI HOT (Curated Pool) · aihot-api · ZH · 04:00 · 05·24

→OpenClaw 2026.5.22 Released With Performance Optimizations and Security Hardening

OpenClaw released version 2026.5.22, reducing the /models response time to about 5 ms and adding locked dependencies for the npm package.

#Inference-opt#Safety#OpenClaw#Product update

editor take

OpenClaw cuts /models latency to ~5 ms; locked npm deps are practical, but test conditions are undisclosed.

HKR breakdown

hook —knowledge ✓resonance —

→ open source

SCORE

H0·K1·R0

03:51

65d ago

QbitAI (量子位) · WeChat · rss · ZH · 03:51 · 05·24

→Hu Yanbin Is Also Practicing Vibe Coding

The article says Hu Yanbin spent one month vibe-coding the fan community app Yanhuo, Yu Hua mentioned learning “local deployment” on a show, and Milla Jovovich’s MemPalace memory system scored 96.6% on LongMemEval.

#Agent#Code#Memory#Hu Yanbin

editor take

Hu Yanbin shipped a fan app in 1 month; no code quality disclosed, so don’t call celebrity Cursor use developer migration.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

03:21

65d ago

r/LocalLLaMA · rss · EN · 03:21 · 05·24

→TTS Benchmark Comparison for Tools Known to the Author up to May 2026

UkieTechie released tts-bench for local TTS tool testing. The repository already includes Windows and Mac results, while Linux testing is pending on a 5900XT and RTX 3090 workstation.

#Audio#Benchmarking#UkieTechie#Benchmark

editor take

UkieTechie posted tts-bench, but Reddit 403 hides the body; with only Win/Mac and 5900XT+3090 disclosed, don’t rank TTS yet.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

03:05

65d ago

FEATURED · r/LocalLLaMA · rss · EN · 03:05 · 05·24

→Vision-capable LLMs vs. OCR for long-document QA with charts, images, and tables

The author tested Claude Sonnet 4.5 on 171 questions from 30 image-heavy MMLongBench-Doc PDFs, comparing native PDF vision use with OCR pipelines. Native PDF ranked fifth of six at 52.0% accuracy and cost $0.2552 per query, while LlamaCloud premium with full context reached 59.6% at $0.1885 per query.

#Vision#RAG#Benchmarking#Claude

why featured

Featured · importance 74 · hook + knowledge + resonance

editor take

Only the summary is visible, but Sonnet 4.5 native PDF looks worse and pricier than OCR here. Don’t default to vision-PDF ingestion.

sharp

Sonnet 4.5 native PDF reading loses cleanly in the visible summary: 30 MMLongBench-Doc PDFs, 171 questions, 52.0% accuracy, and $0.2552 per query. LlamaCloud premium with full context hits 59.6% at $0.1885 per query. Reddit 403 blocks the body, so I can’t inspect prompts, sampling, judge setup, or page-count distribution, and I wouldn’t treat this as a leaderboard. The result still matches the engineering pattern: long-document QA usually fails in layout parsing, table structure, chunking, and context packing before it fails in raw “can the model see images” capability. Native vision-PDF ingestion is a nice demo path, but production pipelines still need OCR/layout tooling when charts, tables, and scanned pages dominate. The lazy path is now visibly more expensive too.
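
One way to put the two pipelines on a single axis is cost per correct answer. The sketch below uses only the four figures from the summary; the metric is my own framing rather than the post's, and it treats every query as independent with the same flat per-query cost.

```python
def cost_per_correct(cost_per_query, accuracy):
    """Expected dollars spent per correctly answered question."""
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_query / accuracy


# (cost per query in USD, accuracy) as reported in the summary.
pipelines = {
    "native PDF": (0.2552, 0.520),
    "LlamaCloud premium, full context": (0.1885, 0.596),
}

for name, (cost, accuracy) in pipelines.items():
    print(f"{name}: ${cost_per_correct(cost, accuracy):.4f} per correct answer")
```

On these numbers native PDF costs roughly $0.49 per correct answer against roughly $0.32 for the OCR pipeline, so the gap widens once accuracy is priced in.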

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

02:49

65d ago

r/LocalLLaMA · rss · EN · 02:49 · 05·24

→Is there any reason for an uncensored model if you have no interest in roleplaying?

A Reddit user questions the value of uncensored models for RAG when roleplaying is not the goal. The post cites the OpenAI-Pentagon deal, unspecified tests where uncensored variants showed random problems, and Qwen3.6 giving restricted-topic answers that changed after a “no propaganda” system-style prompt. It does not disclose test counts, model versions beyond Qwen3.6, or evaluation criteria.

#RAG#Safety#Alignment#OpenAI

editor take

Reddit body is 403; only the summary mentions the Qwen3.6 prompt workaround. With no sample count, there is no RAG takeaway for model selection.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

02:47

65d ago

r/LocalLLaMA · rss · EN · 02:47 · 05·24

→How are you handling agents and sub-agents?

A Reddit user describes a three-model agent setup in LibreChat: DeepSeek v4 pro via OpenRouter acts as the master planner, a local Qwen 35B runs at about 160 tokens per second as the worker, and a mini PC runs Gemma E2B for trivial tasks. The post asks whether smaller role-specific models or better orchestration patterns exist.

#Agent#Tools#Inference-opt#DeepSeek

editor take

The summary describes a three-model LibreChat setup, but the Reddit body is 403; don’t infer a working architecture until stable routing across all 3 models is shown.

HKR breakdown

hook —knowledge ✓resonance ✓

→ open source

SCORE

H0·K1·R1

01:16

65d ago

r/LocalLLaMA · rss · EN · 01:16 · 05·24

→Minor speed bump for MTP with Qwen3.6-27B-MTP Q6_K_XL

A user tested Qwen3.6-27B on a MacBook M5 Max with 128GB RAM using llama.cpp, and MTP raised throughput from 19 tps to 22.3 tps under the listed sampling, cache, and batch settings.

#Inference-opt#Benchmarking#Qwen#Unsloth

editor take

The summary reports Qwen3.6-27B MTP on an M5 Max at 22.3 vs 19 tps, about a 17% gain. Body is 403, so settings stay unverified.

HKR breakdown

hook —knowledge ✓resonance ✓

→ open source

SCORE

H0·K1·R1

00:19

65d ago

r/LocalLLaMA · rss · EN · 00:19 · 05·24

→llampart 1.0.0: Standalone local web UI for llama-server released

The developer released llampart 1.0.0, a standalone local web UI for llama-server with 6 interface languages, MCP tool flows, a two-column conversation sidebar, local import/export defaults, and an MIT license.

#Tools#Reasoning#llama.cpp#Svelte

editor take

llampart 1.0.0 ships 6 UI languages and MCP flows; local LLM UI still wins or loses on daily ergonomics.

HKR breakdown

hook —knowledge ✓resonance ✓

→ open source

SCORE

H0·K1·R1

00:13

65d ago

FEATURED · r/LocalLLaMA · rss · EN · 00:13 · 05·24

→It's OK to Quantize the KV Cache; Model Quant Matters More in Qwen3.6 27B KLD Tests

Reddit user hopbel tested Qwen3.6 27B with approximate KLD on wikitext-2 at 16k context, using Q5_K_M as the proxy baseline; Q5_K_S weights with q4_0 KV cache scored 0.016304, while Q4_K_XL with f16 KV cache scored 0.026067, so weight quant tier dominated KV-cache quant in this setup.

#Inference-opt#Benchmarking#Qwen#llama.cpp

why featured

Featured · importance 73 · hook + knowledge + resonance

editor take

This Reddit result is a local-inference budgeting note: protect weight quant first; q4_0 KV cache did less damage here.

sharp

Hopbel’s numbers challenge a common local-inference instinct: on Qwen3.6 27B, wikitext-2, and 16k context, weight quantization hurt more than KV-cache quantization. Q5_K_S weights with q4_0 KV scored 0.016304 approximate KLD, below Q4_K_XL with f16 KV at 0.026067. The proxy baseline was Q5_K_M, not full fp16. I’d treat this as a config-priority signal for llama.cpp and Unsloth users, not a law. The Reddit body is blocked by 403, so I can’t inspect seeds, prompt mix, throughput, or VRAM curves. wikitext-2 is also language-modeling terrain, not long-horizon agent tool use. Still, for 16k local deployment, don’t sacrifice the weight tier just to keep f16 KV.
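
For readers unfamiliar with the metric: a KLD score of this kind is typically the mean, over token positions, of the KL divergence between the baseline model's next-token distribution and the quantized model's. The sketch below shows that calculation on toy logits. It is an assumption about what "approximate KLD" means here, not hopbel's script or llama.cpp's implementation, and the logit values are invented.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; q is floored at eps to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_kld(base_logits, test_logits):
    """Average per-position KL between baseline and quantized outputs."""
    per_position = [
        kl_divergence(softmax(b), softmax(t))
        for b, t in zip(base_logits, test_logits)
    ]
    return sum(per_position) / len(per_position)


# Toy data: 2 token positions, vocabulary of 4 (values are made up).
baseline = [[2.0, 1.0, 0.1, -1.0], [0.5, 0.5, 3.0, 0.0]]
quantized = [[1.9, 1.1, 0.0, -0.8], [0.4, 0.7, 2.8, 0.1]]

print(f"mean KLD: {mean_kld(baseline, quantized):.6f}")
```

Lower is closer to the baseline, and because the baseline in the post is Q5_K_M, the reported scores measure drift from that proxy instead of from fp16.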

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

00:00

65d ago

FEATURED · Computing Life · Share (鸭哥 research reports) · rss · ZH · 00:00 · 05·24

→You May Have Coded for 10 Years, but You Are Still a Beginner with AI

The article discusses the debate sparked by Armin Ronacher using Pi to develop Pi, citing issue tracker data to argue that experienced programmers can still be misled by confident but wrong AI outputs.

#Code#Agent#Armin Ronacher#Commentary

why featured

Featured · importance 73 · hook + knowledge + resonance

editor take

The Ronacher/Pi case lands, but don’t turn steering into mysticism; without issue counts, this is craft lore, not evidence.

sharp

I buy half of the claim that “ten-year programmers are AI beginners.” The Armin Ronacher/Pi dispute hits a real failure mode: senior engineers bring old debugging instincts to model output, while confident wrong answers quietly reset their review rhythm. The evidence is thin in the provided text. The snippet says it uses issue tracker data, but gives no issue count, error taxonomy, fix time, or even a clear description of whether Pi is a model, toolchain, or project setup. Downgrading double-checking and elevating steering needs reproducible tasks, not just taste. SWE-bench-style coding-agent results already show models breaking on long-horizon state and local confidence, not merely on users asking badly. This reads like a useful corrective for veteran ego, not proof that the definition of expert has changed.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

00:00

65d ago

Computing Life · Share (鸭哥 research reports) · rss · ZH · 00:00 · 05·24

→When Data Centers Became a Hot Potato

The article says U.S. local governments are turning against data centers after a 20-year period of favoring them, with examples from Maine to Seattle; the post does not disclose specific moratoriums, power-use figures, or impacts on AI infrastructure projects.

#Policy#Commentary

editor take

Local pushback spans Maine to Seattle; without moratoriums or power figures, treat the AI-infra panic as unproven.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1
