hot events · 2026-05-17

▸ 19 signals · updated 3m ago

live · 217 today · policy v2

LATENT SPACE · Anthropic pulls Fable and Mythos after US e… · 96
LATENT SPACE · Anthropic launches Claude Fable 5, its firs… · 88
HACKER NEWS FRONTPAG · Did Anthropic ask for its own export contro… · 82
HACKER NEWS FRONTPAG · Anthropic flies senior technical staff to D… · 82
AI HOT (CURATED POOL · WSJ: OpenAI weighs steep price cuts and pla… · 82
HACKER NEWS FRONTPAG · Bram Cohen: Claude is turning into an assho… · 78
R/LOCALLLAMA · Xiaomi serves MiMo V2.5 at 1000–3000 tps wi… · 78
IMPORT AI (JACK CLAR · AI learns to game society's rules, and Anth… · 78
MIT TECHNOLOGY REVIE · Google DeepMind is worried about what happe… · 78
DWARKESH PATEL · The sample efficiency black hole: AI models… · 78
LATENT SPACE · Cognition launches FrontierCode: a coding b… · 78
HACKER NEWS FRONTPAG · Gabriel Weinberg argues with data that “eve… · 78


2026-05-17 · Sun

FEATURED · r/LocalLLaMA · rss · EN · 22:57 · 05·17

→Benchmarking vLLM vs SGLang vs llama.cpp on a mixed Blackwell/Ada cluster

The author benchmarked long-context prefill on a 7-GPU mixed Blackwell/Ada cluster; on Qwen3.5-397B-A17B with 75k tokens, vLLM reached 9.8s TTFT and 7,683 t/s, while llama.cpp took 57.2s and 1,319 t/s.

#Inference-opt #Benchmarking #vLLM #SGLang

why featured

Single-source Reddit benchmark, so source authority keeps it near the threshold. HKR-H/K/R pass on the mixed 7-GPU setup, 397B at 75k tokens, and concrete TTFT/throughput numbers.

editor take

vLLM is crushing mixed-GPU prefill here; long-context pain is now execution graphs and layer placement, not model size alone.

sharp

vLLM exposes the ugly truth of local multi-GPU inference: heterogeneous rigs work only if the engine handles the pipeline sanely. On Qwen3.5-397B-A17B with 75k tokens across seven mixed Blackwell/Ada GPUs, vLLM hits 9.8s TTFT and 7,683 t/s. llama.cpp lands at 57.2s and 1,319 t/s, roughly a 6x gap. The useful detail is not “vLLM is faster.” It is manual layer placement via VLLM_PP_LAYER_PARTITION, which balances fast Blackwell cards against slower 4090s doing FP4 emulation. SGLang looks fine on pure Blackwell, with 5.3s versus vLLM’s 5.0s on Qwen3.5-122B, then crashes when Ada enters because FP4 lacks a software fallback. Single Reddit benchmark, single topology, no independent replication; still, anyone stitching together used 4090s for 397B-class models should take the warning seriously.
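The manual balancing described above can be sketched as a proportional split. This is a minimal sketch, not the benchmark author's script: the layer count and per-GPU speed weights are hypothetical placeholders, and `layer_partition` is an illustrative helper. The only name taken from the post is `VLLM_PP_LAYER_PARTITION`, which is assumed here to take a comma-separated layer count per pipeline stage.

```python
def layer_partition(num_layers, speed_weights):
    """Split num_layers across pipeline stages in proportion to speed_weights.

    Uses largest-remainder rounding so the counts always sum to num_layers.
    """
    total = sum(speed_weights)
    shares = [num_layers * w / total for w in speed_weights]
    counts = [int(s) for s in shares]
    leftover = num_layers - sum(counts)
    # Hand leftover layers to the stages with the largest fractional share.
    by_fraction = sorted(
        range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True
    )
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical rig: 60 layers, two fast cards weighted 2.0, two slow cards weighted 1.0.
counts = layer_partition(60, [2.0, 2.0, 1.0, 1.0])
partition = ",".join(str(c) for c in counts)
print(f"VLLM_PP_LAYER_PARTITION={partition}")
```

In practice the weights would come from measured per-layer latency on each card rather than a guess, and a stage that also hosts embeddings or the output head may need fewer layers than its raw share.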

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · r/LocalLLaMA · rss · EN · 22:22 · 05·17

→LLMs on Android: Snapdragon 8 Elite MoE Experience

A Reddit user tested MoE LLMs on an Honor Magic 7 Pro with Snapdragon 8 Elite and 24GB RAM; under Q4 quantization, LFM2-24b-a2b reached about 24 tokens/s while Gemma reached about 11 tokens/s, and CPU inference was still faster than NPU or GPU in the reported setup.

#Inference-opt #Benchmarking #Qualcomm #Honor

why featured

HKR-H/K/R all pass: a named Reddit test gives hardware, quantization, and token/s figures. Single-device anecdote and weak source authority keep it at the low featured band.

editor take

Only the summary is visible; 24 tok/s Q4 MoE on a 24GB Android phone makes runtime maturity look like the bottleneck, not model size.

sharp

A 24 tok/s LFM2-24b-a2b run puts Android local inference inside the usable zone. The reported setup is concrete: Honor Magic 7 Pro, Snapdragon 8 Elite, 24GB RAM, Q4 quantization. Gemma lands around 11 tok/s, while the MoE model reportedly hits about 24 tok/s. The wild part is CPU beating NPU and GPU in that setup. Qualcomm has sold the AI Engine story for years, but LocalLLaMA-style tests keep exposing the boring layer: memory movement, operator coverage, and runtime glue. The Reddit page is blocked by 403, so batch size, context length, backend, and sampling settings are not available here. I read this as a good sign for on-device MoE, and a bad sign for the claim that phone NPUs automatically own LLM inference.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · Hacker News Frontpage · rss · EN · 15:37 · 05·17

→Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep

MinishLab open-sourced Semble, a code-search tool for agents that combines Model2Vec embeddings, BM25, RRF fusion, and reranking; on a 63-repo benchmark, it used 98% fewer tokens than grep+read, reached 0.854 NDCG@10, and ran CPU queries in about 1.5 ms.

#Agent #Code #Embedding #MinishLab

why featured

HKR-H/K/R all pass: the 98% token claim is clickworthy, the 63-repo benchmark adds substance, and coding-agent context cost is a real practitioner nerve. Impact is still toolchain-level, so it stays below must-write.

editor take

Semble pulls agent code search back from context stuffing to IR; 98% token savings is sharp, but grep+read is a soft target.

sharp

Semble matters because it attacks the boring cost center in coding agents: context waste. On a 63-repo benchmark, it claims 98% fewer tokens than grep+read, 0.854 NDCG@10, and roughly 1.5 ms CPU queries. The stack is not magic: Model2Vec embeddings, BM25, RRF fusion, then reranking. I don’t buy grep+read as the serious opponent. Cursor, Claude Code, and Sourcegraph Cody have moved past naked grep into repo maps, AST-ish indexes, and symbol search. Still, the direction is right. Coding agents fail less from “not enough intelligence” than from retrieving 40 bad chunks and spending the next two calls laundering that noise.
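The fusion step in that stack is easy to picture. Below is a minimal sketch of Reciprocal Rank Fusion over two ranked lists, not Semble's actual code: the file names are hypothetical, `rrf_fuse` is an illustrative helper, and `k=60` is a commonly used default rather than a value confirmed for Semble.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over its lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results for one query from a lexical and an embedding retriever.
bm25_hits = ["auth.py", "session.py", "utils.py"]
embedding_hits = ["session.py", "utils.py", "auth.py"]
fused = rrf_fuse([bm25_hits, embedding_hits])
print(fused)
```

RRF only needs ranks, not comparable scores, which is why it is a natural way to combine BM25 with embedding similarity before a reranker sees the shortlist.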

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · r/LocalLLaMA · rss · EN · 15:26 · 05·17

→MiroThinker-1.7 Open-Weight Deep Research Agent Based on Qwen3 MoE

MiroMindAI released the MiroThinker-1.7-deepresearch and mini APIs, with the mini version using 30B total parameters and 3B active parameters, weights on HuggingFace, and context management based on sliding window K=5 plus episode restarts.

#Agent #Reasoning #Tools #MiroMindAI

why featured

HKR-H/K/R all pass, but the source is a Reddit thread and the lab is not top-tier. Open weights, MoE sizing, and context-management details clear featured, not same-day must-write.

editor take

Only the title and summary are visible; MiroThinker-1.7 mini pushes deep research into 30B/3B active, but tok/s on consumer GPUs decides if this matters.

sharp

MiroThinker-1.7 mini has a clean pitch: 30B total parameters, 3B active, Qwen3 MoE base, weights on HuggingFace. That is not a leaderboard flex. It is an attempt to squeeze a deep-research agent into hardware people actually own. Sliding window K=5 plus episode restarts also admits the hard part: long research runs still break context, so the system is patching continuity with control flow. Reddit is 403-blocked here, so benchmark scores, tool success rate, VRAM use, and tokens/sec are not visible. The LocalLLaMA question about consumer hardware speed is the right pressure test. DeepSeek-R1-Distill and Qwen3 already lowered the “can run locally” bar; MiroThinker needs to show a research loop that stays usable on 24GB or 48GB cards, not just another open-weight badge.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

● P1 · Bloomberg Technology · rss · EN · 14:00 · 05·17

→Apple's Revamped Siri App Will Support Auto-Deleting Chats

The title says Apple’s ChatGPT-like Siri app will support auto-deleting chats; the RSS snippet only adds that iOS 27 will include a Genmoji upgrade, and the post does not disclose retention periods, release timing, or feature details.

#Agent #Multimodal #Apple #Siri

why featured

HKR-H and HKR-R pass because Bloomberg frames a specific Apple Siri privacy angle; HKR-K fails since retention and feature mechanics are missing, so this stays at the low featured threshold.

editor take

Three titles, no body: Apple’s auto-deleting Siri chats read like privacy containment, not evidence it has caught ChatGPT-class assistants.

sharp

Three outlets tracked the same Siri auto-delete angle, but the available body is only Bloomberg’s title, while Verge says “reportedly” and TechCrunch says “could.” That smells like one leak chain spreading, not three independently confirmed product reads. My read is blunt: Apple is boxing in memory risk before selling a ChatGPT-like Siri. Auto-deleting chats reduces audit, shared-device, and enterprise-compliance headaches, but it also cuts against the sticky personalization OpenAI and Anthropic are pushing through memory, projects, and persistent context. Apple is still using privacy as the product surface while Siri’s actual model competence remains unproven. Pricing, launch date, retention window, and default behavior are not disclosed in the titles.

HKR breakdown

hook ✓ · knowledge — · resonance ✓

SCORE

H1·K0·R1

FEATURED · r/LocalLLaMA · rss · EN · 11:18 · 05·17

→85 GPU-hours comparing 5 abliteration methods on Qwen3.6-27B

Abliterlitics compared five Qwen3.6-27B abliteration variants against the base model using 85 GPU-hours of benchmarks, HarmBench, KL divergence, and weight forensics; Huihui had the smallest benchmark deltas, Heretic had the lowest KL divergence, and all five variants reached near-complete safety removal.

#Safety #Benchmarking #Interpretability #Qwen

why featured

HKR-H/K/R all pass: the post gives an 85-GPU-hour comparison across five abliteration methods on Qwen3.6-27B. Niche open-model safety work, not a lab release, so it stays at the featured threshold.

editor take

85 GPU-hours turns abliteration into an engineering benchmark; open model safety now has a weights-level removal market, not a prompt jailbreak problem.

sharp

Abliterlitics hits the uncomfortable layer: refusal behavior can be stripped as an engineering target, not argued around in policy docs. The disclosed hooks are concrete enough: 85 GPU-hours on Qwen3.6-27B, five abliteration variants, HarmBench, KL divergence, and weight forensics. The summary says all five reached near-complete safety removal; Huihui kept the smallest benchmark deltas, while Heretic had the lowest KL divergence. Reddit blocked the body with a 403, so I cannot verify exact scores, sample sizes, or reproduction scripts. Still, the pattern is clear. LocalLLaMA is moving from “which prompt bypass works” to “which weight edit preserves capability while deleting refusals.” That is a much nastier problem for open-weight safety than another jailbreak leaderboard.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

● P1 · QbitAI (量子位) · WeChat · rss · ZH · 10:22 · 05·17

→Weilan Technology unveils BabyAlpha A3 quadruped robot with domestic heterogeneous chips

Weilan Technology unveiled BabyAlpha A3, a consumer quadruped robot using a six-chip heterogeneous cluster that runs a 7B-parameter model on-device at 280 TPS; the article says it has 66MP vision, 2.232 million point-cloud samples per second, and a planned Q3 launch.

#Robotics #Inference-opt #Multimodal #Weilan Technology

why featured

HKR-H/K/R pass: the robot-dog-versus-Nvidia angle is clickable, and 280 TPS on a local 7B model is concrete. Single-source summary lacks price, power draw, and benchmark setup, so it stays near the featured floor.

editor take

Three outlets pushed the “topple Nvidia” angle, but the body is a WeChat gate. Treat the 7B model, 1000x compute, and 1/10 cost claims as unverified PR math.

sharp

Three headlines align tightly: BabyAlpha A3, a domestic heterogeneous chip, framed against Nvidia Jetson Thor. That smells like a coordinated launch narrative, not three independent teardown reads. The hooks are loud: a 7B model running on-device, 1000x compute uplift, and 1/10 the cost. The available body is only a WeChat access-error page, so chip name, power draw, TOPS, memory bandwidth, and latency are absent. I don’t buy the “topple Nvidia” headline. Jetson’s moat is not a peak-compute slide; it is CUDA, TensorRT, drivers, sensor integration, and boring deployment stability. Running a 7B model on a quadruped is a useful milestone. Replacing Jetson needs the same task, same power envelope, same thermal budget, and continuous runtime evidence.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · QbitAI (量子位) · WeChat · rss · ZH · 10:22 · 05·17

→TGO Aligns Visual Generative Models with Scalar Feedback Without Preference Pairs | ICML 2026

NUS proposed Threshold-Guided Optimization, which converts scalar feedback into positive or negative updates through a score-distribution threshold and was accepted by ICML 2026; experiments cover Stable Diffusion v1.5, FLUX, Wan 1.3B, and Meissonic across image and video generation settings.

#Fine-tuning #Alignment #Vision #NUS

why featured

HKR-H/K/R pass: the paper has a concrete mechanism and tests across SD v1.5, FLUX, Wan 1.3B, and Meissonic. Impact is research-heavy, so it lands in featured, not must-write.

editor take

TGO is a clean escape from synthetic preference pairs, but a global threshold is a blunt tool once product feedback gets noisy.

sharp

TGO matters because it treats visual feedback as scores, not forced winner/loser pairs. The mechanism is simple: a score-distribution threshold sets update direction, and distance from that threshold sets weight. The paper tests Stable Diffusion v1.5, FLUX, Wan 1.3B, and Meissonic, so this is broader than a one-backbone diffusion trick. I don’t buy the “new paradigm” framing. PMPO already loosens unpaired positive/negative feedback, and QRPO handles pointwise absolute rewards through quantiles. TGO is the visual-generation engineering compromise: cheap, readable, and easy to plug into diffusion or masked generators. The weak spot is the global threshold. It compresses prompt difficulty, style taste, and reward-model bias into one cutoff. If the scorer is skewed, pseudo-negatives will suppress minority aesthetics with mathematical confidence.
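The mechanism as summarized (threshold sets direction, distance sets weight) fits in a few lines. This is a minimal sketch under stated assumptions, not the paper's loss: using the batch median as the threshold is an assumption, the scores are hypothetical, and `threshold_signals` is an illustrative helper.

```python
import statistics


def threshold_signals(scores, threshold=None):
    """Map scalar scores to (direction, weight) pairs around a threshold.

    direction is +1 above the threshold, -1 below it, 0 exactly on it;
    weight is the absolute distance from the threshold.
    """
    if threshold is None:
        # Assumption: use the batch median as the score-distribution threshold.
        threshold = statistics.median(scores)
    signals = []
    for s in scores:
        gap = s - threshold
        direction = (gap > 0) - (gap < 0)
        signals.append((direction, abs(gap)))
    return signals


# Hypothetical reward-model scores for four generated samples.
sample_scores = [0.2, 0.4, 0.6, 0.9]
for score, (direction, weight) in zip(sample_scores, threshold_signals(sample_scores)):
    print(score, direction, round(weight, 2))
```

The sketch also makes the weak spot visible: one cutoff decides the sign for every prompt in the batch, so a skewed scorer flips samples to negative regardless of prompt difficulty or style.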

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 10:04 · 05·17

→Microsoft AI CEO predicts AI will automate all white-collar jobs within 18 months

Mustafa Suleyman predicts AI will reach human-level performance within 18 months and automate most professional tasks, including accounting, law, marketing, and project management.

#Agent #Reasoning #Microsoft AI #Mustafa Suleyman

why featured

HKR-H and HKR-R are strong, and HKR-K passes on the testable 18-month timeline. The score stays in the low 78–84 band because this is a CEO forecast, not evidence, benchmarks, or a shipped capability.

editor take

Suleyman is selling an 18-month white-collar wipeout, but the snippet gives no evals, cost curve, or deployment constraints. Smells more like narrative pressure than a roadmap.

sharp

Suleyman’s “18 months to automate everyone sitting at a computer” is too clean for the evidence given. The snippet names accounting, law, marketing, and project management, but gives no benchmark, error rate, liability model, or deployment cost. The hard part in white-collar work is not drafting a document. It is context access, approvals, audit trails, system permissions, and owning bad calls. A Microsoft AI CEO talking up “superintelligence” is expected. Compressing the timeline to 18 months is the aggressive part. OpenAI, Anthropic, and Google are already pushing agents into Office, IDEs, support, and analytics, but task automation and job replacement are separated by procurement, compliance, and accountability. I don’t buy this claim without reproducible enterprise agent success rates.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 07:23 · 05·17

→Grok Imagine image generation is officially released

Grok Imagine is now available on X for all users, with text-to-image generation for realistic images and multiple aspect ratios; the post does not disclose model parameters, pricing, or regional limits.

#Multimodal #Vision #Grok #X

why featured

HKR-H/K/R pass, but the post only discloses availability and basic image features; model details, pricing, and regions are absent, so this lands at the featured threshold.

editor take

Grok Imagine is open to all X users, but pricing, regions, and model details are missing; this smells like distribution first, capability second.

sharp

Grok Imagine is leading with X distribution, not model evidence. The post says it is available to all users, supports realistic text-to-image output, and offers multiple aspect ratios. It gives no pricing, regional limits, model card, safety policy, or reproducible comparison against Midjourney, GPT-4o image, or Imagen. That omission matters because image generation is already crowded and heavily benchmark-resistant. The wild part is the channel. X gives Grok a default creation-and-sharing loop that standalone image tools have to buy through ads or creator communities. Even a second-tier model can absorb casual meme, avatar, and post-illustration demand if the button sits inside the feed. I don’t buy the implied capability claim until we see hard prompts: text rendering, character consistency, editing control, and commercial-use terms. Right now the product surface is visible; the moat is not.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · r/LocalLLaMA · rss · EN · 06:35 · 05·17

→DeepSeek V4's 1M Context Window: The Breaking Point

A Reddit user tested DeepSeek V4 on 45k, 180k, and 520k-token codebases and found 150k-250k tokens best for coding work. Past 300k tokens, line-number precision degraded; at 520k, outputs shifted toward architecture summaries and skipped implementation details.

#Code #Reasoning #Memory #DeepSeek

why featured

A single Reddit post limits authority, but HKR-H/K/R all pass: it is a numbered first-person test with a concrete long-context failure pattern. The right band is featured, not 78+, because replication and model details are thin.

editor take

Only the summary is usable: DeepSeek V4’s 1M window reads like a marketing ceiling; 150k-250k is the coding bandwidth that matters.

sharp

DeepSeek V4’s 1M context is not proving whole-repo coding here; it shows a usable band. The user tested 45k, 180k, and 520k-token codebases. Their sweet spot was 150k-250k tokens. Past 300k, line-number precision degraded. At 520k, the model shifted into architecture summaries and skipped implementation details. I trust that Reddit failure mode more than the 1M headline. Coding needs retrieval, references, and local edits, not a giant prompt stuffed with a repo. Gemini 1.5 Pro had the same 1M-context aura, and serious users still leaned on chunking, search, and repo maps for reliability. The body is blocked by 403, so prompt, model settings, and DeepInfra config are missing. But the “long enough becomes a summarizer” pattern is painfully familiar.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · Financial Times · Technology · rss · EN · 04:00 · 05·17

→Chinese AI Groups Pull Ahead of US Rivals in Video Generation Race

FT says Chinese AI groups have moved ahead of US rivals in video generation; the RSS snippet names ByteDance and Kuaishou and says they outshine western competitors in advertising and entertainment quality, but the post does not disclose benchmark metrics or model details.

#Multimodal #Vision #ByteDance #Kuaishou

why featured

FT authority plus a China-vs-US video-generation lead claim clears HKR-H and HKR-R. HKR-K fails because the body lacks metrics, samples, and eval method, so it sits at the low featured threshold.

editor take

Only an FT title and RSS line, no metrics or model names; naming ByteDance and Kuaishou says video gains are landing first inside distribution apps.

sharp

FT’s claim reads like a product judgment, not a model judgment. The disclosed text names ByteDance and Kuaishou, limits the claim to advertising and entertainment video, and gives no benchmark, model version, blind-test size, or target against Sora, Veo, or Runway. That is too thin for “pull ahead of US rivals.” I still buy half the direction. Video generation is not won by a single demo clip; it rewards asset pools, creator feedback, ad conversion loops, and moderation pipes. Douyin and Kuaishou have daily short-video production and ad testing loops that pure model labs do not. OpenAI Sora owns the launch-stage perception; ByteDance and Kuaishou are closer to commercial quality tuned through gray-release A/B tests. Until metrics show up, read this as platform production advantage, not proof that Chinese video models beat US models.

HKR breakdown

hook ✓ · knowledge — · resonance ✓

SCORE

H1·K0·R1

FEATURED · Synced (机器之心) · WeChat · rss · ZH · 03:06 · 05·17

→AI agents may spend 1,000x more tokens without better results: the hidden bill

Researchers used OpenHands to analyze traces from 8 frontier models on 500 SWE-bench Verified tasks, finding that agentic coding reached a 154:1 input-to-output token ratio and that human difficulty labels correlated only weakly with token use (Kendall tau 0.32).

#Agent #Code #Benchmarking #OpenAI

why featured

All HKR axes pass: strong cost-performance hook, concrete benchmark setup and correlation numbers, and direct resonance with coding-agent economics. It is not a model or platform launch, so it fits the 78–84 quality-recommendation band.

editor take

Only the summary is visible. OpenHands on 500 SWE-bench Verified tasks exposes the ugly part: 154 input tokens for every output token before the code even lands.

sharp

Agentic coding’s ugly limit is not patch generation; it is hidden token spend. The visible summary gives a sharp hook: researchers used OpenHands to trace 8 frontier models on 500 SWE-bench Verified tasks, finding a 154:1 input-to-output token ratio and a Kendall tau of only 0.32 between human difficulty labels and token use. Human “hard” does not predict token burn cleanly, which is exactly the failure mode vendors avoid showing in demos. That hits the margin story behind Cursor, Devin, and OpenHands-style workflows. A higher SWE-bench pass rate looks great on a launch slide; enterprise buyers care about cost per merged PR. The full WeChat body is blocked by verification, so model names and pricing assumptions are not available here. I’d treat 154:1 as a serious warning flare, not a settled measurement.
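To see what a 154:1 ratio does to a bill, here is a minimal arithmetic sketch. Only the 154:1 figure comes from the summary; the 5x output-to-input price multiple is a hypothetical placeholder, since the article's pricing assumptions are not visible, and both helpers are illustrative.

```python
def input_share(input_to_output_ratio):
    """Fraction of all tokens that are input tokens, given an input:output ratio."""
    return input_to_output_ratio / (input_to_output_ratio + 1.0)


def input_cost_share(ratio, input_price, output_price):
    """Fraction of spend that goes to input tokens, given per-token prices."""
    input_cost = ratio * input_price
    return input_cost / (input_cost + output_price)


# Share of tokens that are input at the reported 154:1 ratio.
print(f"{input_share(154):.4f}")
# Hypothetical pricing where output tokens cost 5x input tokens (not from the article).
print(f"{input_cost_share(154, input_price=1.0, output_price=5.0):.4f}")
```

Even with output priced at a multiple of input, nearly all spend sits on the input side under this ratio, which is why cache hit rate and context trimming matter more than shorter patches.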

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · Synced (机器之心) · WeChat · rss · ZH · 03:06 · 05·17

→Peter Steinberger Says His Monthly Token Bill Hit $1.3M, Covered by OpenAI

Peter Steinberger used 603 billion tokens across 7.6 million requests in 30 days, with the bill exceeding $1.3 million; he said disabling fast mode cut the price by 70%, and OpenAI does not charge him for the tokens.

#Agent #Code #Tools #Peter Steinberger

why featured

HKR-H/K/R all pass: the story has a sharp cost hook, concrete usage numbers, and strong practitioner resonance. It is a first-person bill disclosure, not an OpenAI pricing or product launch, so it sits just above the featured threshold.

editor take

603B tokens and a $1.3M monthly bill is not a hobbyist flex; it is OpenAI stress-testing agent economics through extreme users.

sharp

The 603B-token number matters less than OpenAI eating a $1.3M bill for Peter Steinberger. Over 30 days, 7.6M requests averages about 79K tokens per request, which smells like continuous code-agent traffic, not chat usage. His claim that disabling fast mode cuts price by 70% says latency is becoming a hidden tax in agent products. I don’t read this as generosity. It looks like OpenAI buying a real workload trace from a very visible power user. The article body is only a CAPTCHA page, so model name, cache hit rate, and input/output split are not disclosed. Without those three, the $1.3M figure proves burn rate, not viable unit economics.
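The per-request average above can be checked directly from the summary's figures. A minimal sketch: the three inputs are the reported numbers, and the blended price per million tokens is a derived lower bound (the bill "exceeded" $1.3M), not a figure stated in the article.

```python
TOKENS = 603e9     # tokens over 30 days, from the summary
REQUESTS = 7.6e6   # requests over 30 days, from the summary
BILL_USD = 1.3e6   # the bill exceeded this, so treat it as a lower bound

tokens_per_request = TOKENS / REQUESTS
usd_per_million_tokens = BILL_USD / (TOKENS / 1e6)
requests_per_day = REQUESTS / 30

print(f"{tokens_per_request:,.0f} tokens per request")
print(f"${usd_per_million_tokens:.2f} per million tokens (blended, lower bound)")
print(f"{requests_per_day:,.0f} requests per day")
```

The blended rate mixes input, output, and any cached tokens into one number, so it cannot stand in for the missing cache hit rate or input/output split.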

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · Synced (机器之心) · WeChat · rss · ZH · 03:06 · 05·17

→What Are World Models? Their History and the $10 Billion Bet

Jiqizhixin translated a MoE Capital blog tracing two world-model lineages. The article says more than $10 billion entered the category over 18 months, and cites DreamDojo as using 44,711 hours of first-person video pretraining to reach r=0.995 correlation with real-world robot policy outcomes.

#Agent #Robotics #Multimodal #MoE Capital

why featured

HKR-H/K/R all pass: the hook is strong and the article gives concrete figures, but it is a compiled explainer rather than a new release. It fits the featured-threshold band for a strong commentary/tutorial.

editor take

Only the summary is available; “world model” reads like a funding filter here, and r=0.995 is too shiny without eval details.

sharp

The world-model narrative is getting financially front-run. More than $10B over 18 months sounds like consensus, but it also bundles robotics, video generation, simulation, and agent training into one investable label. The hard hook is DreamDojo: 44,711 hours of first-person video pretraining, then r=0.995 correlation between policy evaluation and real robot outcomes. If that holds, it moves from “predicts frames” toward “filters robot policies.” I don’t buy the number at face value yet. The available body is a CAPTCHA page, so the benchmark setup, task mix, robot hardware, and correlation method are not disclosed. r=0.995 is unusually clean for robotics. NVIDIA Cosmos, Genie-style environment models, and LeCun’s JEPA line are all circling this terrain; the useful test is transfer across embodiments and long-horizon failure, not whether the deck says “world model.”

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 02:50 · 05·17

→Anthropic CEO discusses AI’s dual impact: high growth and high unemployment

Dario Amodei said AI may drive 5%-10% GDP growth while increasing unemployment and inequality, and near-free software costs would challenge the assumptions behind traditional software business models.

#Code #Anthropic #Dario Amodei #Commentary

why featured

HKR-H/K/R all pass: Dario Amodei’s 5%-10% GDP and near-free software claims are concrete and highly discussable. The source is an X summary, not a full primary transcript, so it stays at 78.

editor take

Dario pairing 5%-10% GDP growth with high unemployment reads like social licensing for Anthropic’s automation roadmap.

sharp

Dario’s framing is careful: admit high unemployment and inequality first, then put 5%-10% GDP growth on the table. That gives Anthropic room to sell automation without pretending it is just a safety lab watching from the sidelines. The hard claim is near-free software cost; that hits SaaS seats, implementation fees, and outsourced dev work in one shot. I don’t buy the clean “engineers move into editing or upgrading work” line. Claude Code, Cursor, and Devin already show the editor layer does not appear one-for-one for displaced engineers. AI compresses delivery from billable human time into task output; bargaining power falls before neat new roles arrive.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 02:43 · 05·17

→Anthropic CEO predicts near-free software and major job shifts

Dario Amodei said in a Wall Street Journal YouTube interview that software costs will fall sharply toward near-free, and the traditional assumption that software needs millions of users to spread costs will no longer hold.

#Anthropic #Dario Amodei #The Wall Street Journal #Commentary

why featured

HKR-H/K/R all pass: Dario Amodei’s software-cost and labor-structure claim is highly discussable. The source is a secondhand X summary, with no full argument, timeline, or data disclosed, so it stays in the low featured band.

editor take

Dario is selling near-free software, while Anthropic still charges per million tokens; inference gets cheaper before software margins vanish.

sharp

Amodei is overstating the “software becomes near-free” line. The body gives only a WSJ YouTube interview and the claim that million-user cost spreading breaks; it gives no price curve, timeline, or software category. SaaS cost is not code alone. Compliance, sales, integration, uptime, and liability remain stubbornly non-free. Anthropic still prices Claude by tokens, and enterprise AI still sells permissions, audit logs, and data isolation. Code generation will crush prices for CRUD apps, internal tools, and disposable scripts; that part is real. But Workday, ServiceNow, and Salesforce customers buy workflow ownership and risk transfer. Amodei’s warning works as a labor-market alarm. It fails as a clean forecast for software margins going to zero.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · r/LocalLLaMA · rss · EN · 00:10 · 05·17

→Gemma 4 finetuned models released for creative writing tasks

LLMFan46 released G4-Meromero-31B-Uncensored-Heretic, with Safetensors and GGUF builds linked on Hugging Face; the title states it is a Gemma 4 31B finetune for creative tasks with KLD 0.0100 and 15 refusals per 100 tests.

#Fine-tuning #LLMFan46 #Gemma #zerofata

why featured

HKR-H/K/R pass via the uncensored hook, refusal metric, and local-model control angle, but this is a niche community finetune with no broad benchmark or adoption signal, so it stays in the small open-source update band.

editor take

G4-Meromero-31B claims KLD 0.0100 and 15/100 refusals; Reddit body is 403, so prose quality stays unverified.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1

FEATURED · Computing Life · Share (鸭哥 research reports) · rss · ZH · 00:00 · 05·17

→Vibe Coding’s Security Crisis

AI coding platforms exposed sensitive data from thousands of enterprise applications through public-by-default deployment settings; the snippet names hospital schedules, bank financial data, and clinical trial data, and identifies one-click deployment defaults rather than AI-generated code as the core mechanism.

#Code #Safety #Incident #Commentary

why featured

HKR-H/K/R pass: the public-by-default deployment angle is clickable, concrete, and practitioner-relevant. Lack of named platform detail or top-tier sourcing keeps it in the lower good-quality band.

editor take

Don’t blame “bad AI code” here: the leak came from public-by-default deployment, so vibe coding’s risk sits in product defaults.

sharp

Vibe coding’s security failure sits in deployment defaults, not model-generated code. The snippet says “thousands” of enterprise apps were exposed, including hospital schedules, bank financial data, and clinical trial data. That is not a toy-app bug; that is regulated data on the open internet. I don’t buy the easy excuse that users misconfigured access. Tools like Lovable, Replit Agent, and Bolt push non-engineers straight from prompt to production, so the default permission becomes the security boundary. The body does not disclose the named platforms, exposure duration, or remediation path, and those gaps matter. But the mechanism is already damning: AI code review will not catch a public-by-default deploy button, and enterprise procurement often misses that layer.

HKR breakdown

hook ✓ · knowledge ✓ · resonance ✓

SCORE

H1·K1·R1
