LATENT SPACEAnthropic pulls Fable and Mythos after US e…96·LATENT SPACEAnthropic launches Claude Fable 5, its firs…88·HACKER NEWS FRONTPAGDid Anthropic ask for its own export contro…82·HACKER NEWS FRONTPAGAnthropic flies senior technical staff to D…82·AI HOT (CURATED POOLWSJ: OpenAI weighs steep price cuts and pla…82·HACKER NEWS FRONTPAGBram Cohen: Claude is turning into an assho…78·R/LOCALLLAMAXiaomi serves MiMo V2.5 at 1000–3000 tps wi…78·IMPORT AI (JACK CLARAI learns to game society's rules, and Anth…78·MIT TECHNOLOGY REVIEGoogle DeepMind is worried about what happe…78·DWARKESH PATELThe sample efficiency black hole: AI models…78·LATENT SPACECognition launches FrontierCode: a coding b…78·HACKER NEWS FRONTPAGGabriel Weinberg argues with data that “eve…78·
→Users Successfully Run Large Language Model Qwen 3.6 on Consumer GPUs
A Reddit user ran unsloth qwen3.6-35B-a3b-MTP-GGUF UD Q4_K_XL in LMStudio on Windows with a GTX 1060 6GB, 32GB DDR3, and an E5-2698v3; the setup used ctx length 131072, 41 GPU-offload layers, KV Q4_0, and reported about 130-150 tps prefill at 16k and 16 tps decode at 4k.
#Inference-opt#Qwen#LMStudio#Reddit
why featured
HKR-H/K/R all pass, but this is a single Reddit experiment without replication, release context, or fuller throughput comparisons. Lower band: useful browse signal, not featured.
editor take
Two LocalLLaMA posts test Qwen 3.6 on consumer GPUs; the body is 403-blocked, so 4.5 t/s is a field signal, not a model verdict.
sharp
Two Reddit posts point the same way: users are testing Qwen 3.6 on a GTX 1060 6GB and a 3080 Ti; the only visible number is 4.5 t/s for 27B MTP on the 3080 Ti, while the body is 403-blocked. That is a narrow signal, but a useful one for local inference people: the fight has moved from leaderboard bragging to VRAM, quantization, and whether MTP-style decoding makes 27B/35B usable on old cards. I'll be real: 4.5 t/s is rough for live writing, but acceptable for offline agent loops or batch work. Treating it like a Qwen3-Coder or DeepSeek-R1 experience claim would be sloppy.
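To put those throughput numbers in wall-clock terms, here is a rough sketch. It assumes "16k" means 16,000 prompt tokens and picks a 500-token reply as an illustrative length; neither figure comes from the posts. The GTX 1060 decode rate was reported at 4k context, so decoding after a 16k prompt is probably slower than this suggests.

```python
def response_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Estimated wall-clock time: prompt ingestion plus token-by-token generation."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# Reported GTX 1060 figures: 130-150 tps prefill at 16k, 16 tps decode at 4k.
# Treating "16k" as 16,000 tokens and using a 500-token reply are assumptions.
slow = response_seconds(16_000, 500, prefill_tps=130, decode_tps=16)
fast = response_seconds(16_000, 500, prefill_tps=150, decode_tps=16)

# The 3080 Ti post only exposes a decode rate, so prefill is left out here.
decode_only_3080ti = 500 / 4.5

print(f"GTX 1060, 16k prompt + 500 tokens: {fast:.0f}-{slow:.0f} s")
print(f"3080 Ti, 500 tokens of decode only: {decode_only_3080ti:.0f} s")
```

Either way the reply lands in minutes, not seconds, which is why these setups fit batch and offline agent loops better than live writing.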
FEATURED · Financial Times · Technology · rss · EN · 17:00 · 05·24
→ECB Orders Banks to Fix Security Flaws Exposed by AI Models
The ECB summoned banks to a hastily arranged meeting to push fixes for flaws exposed by the latest AI models; the RSS snippet says supervisors will stress financial-system risks but does not disclose the banks involved, flaw categories, or remediation deadlines.
#European Central Bank#Policy
why featured
FT's ECB item clears HKR-H and HKR-R through regulatory pressure on bank AI risk. HKR-K fails because flaw types, bank count, and remediation timeline are not disclosed, so it stays in the 60–71 band.
editor take
The ECB summoned banks to fix security flaws that the latest AI models can expose. That is more than a generic warning: it suggests supervisors have already seen concrete holes.
sharp
Both FT and Bloomberg covered this, but Bloomberg's headline explicitly credits FT, so we're looking at a single original source. The FT article is behind a paywall, so I can't see which models, which flaws, or which banks are involved. But the fact that the ECB convened banks in person, rather than issuing a routine guidance note, suggests this isn't theoretical. Regulators don't call emergency meetings over hypotheticals. More likely, internal red-team exercises or audits already surfaced real cases where new large models were used to bypass anti-fraud or credit-scoring systems, though nothing in the visible snippet confirms that. I'd discount the confidence a bit until we see the actual flaw types, the bank list, and the remediation timeline. If a bank responds publicly or the ECB releases a formal report, this gets a lot more solid.
→Memory has grown to nearly two-thirds of AI chip component costs
Epoch AI says memory has grown to nearly two-thirds of AI chip component costs; the RSS body only lists the article URL, 68 points, and 71 comments, and the post does not disclose the methodology or sample scope.
#Inference-opt#Epoch AI#Commentary
why featured
HKR-H/K/R all pass: the cost-share claim is clickable, specific, and relevant to infra economics. Sparse body details keep it near the featured floor: method, sample, and timeline are not disclosed.
editor take
Memory at 63% of AI chip component cost is a loud warning against FLOPS-only thinking; methodology is missing here, so treat it as direction, not gospel.
sharp
The 63% figure drags AI chip economics back to bandwidth, not raw FLOPS. Epoch AI’s title says memory is 63% of component cost, but the captured body only shows navigation and the title. It gives no sample scope, BOM definition, HBM generation, packaging split, or methodology.
I buy the direction, not the precision. H100/H200 and Blackwell economics already made HBM3E, CoWoS, and advanced packaging the pressure points. If memory really takes nearly two-thirds of component cost, inference pricing cannot be discussed without KV cache, quantization, speculative decoding, and memory bandwidth. Put 63% in the memo; don’t put it straight into a financial model.
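Why memory sets the terms for inference pricing can be sketched with the usual bandwidth-bound approximation: each decoded token has to stream the active weights and the KV cache from memory once, so bandwidth divided by bytes read caps tokens per second. The bandwidth and size figures below are hypothetical placeholders, not Epoch AI data or any vendor's spec.

```python
def decode_tps_ceiling(bandwidth_gb_s, weight_gb, kv_cache_gb):
    """Upper bound on single-stream decode speed when every token must
    stream all active weights and the KV cache from memory once."""
    return bandwidth_gb_s / (weight_gb + kv_cache_gb)

# Hypothetical accelerator and model sizes, chosen only for illustration.
BANDWIDTH_GB_S = 3000.0
fp16 = decode_tps_ceiling(BANDWIDTH_GB_S, weight_gb=140.0, kv_cache_gb=10.0)
int4 = decode_tps_ceiling(BANDWIDTH_GB_S, weight_gb=35.0, kv_cache_gb=10.0)
int4_kv_q4 = decode_tps_ceiling(BANDWIDTH_GB_S, weight_gb=35.0, kv_cache_gb=2.5)

for label, tps in [("fp16 weights, f16 KV", fp16),
                   ("4-bit weights, f16 KV", int4),
                   ("4-bit weights, 4-bit KV", int4_kv_q4)]:
    print(f"{label}: at most {tps:.1f} tokens/s per stream")
```

Batching and speculative decoding change the picture because they amortize the same memory reads over more tokens, which is the reason they belong in any pricing discussion alongside the cost share.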
FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 16:24 · 05·24
→TrapDoor Supply Chain Attack Makes AI Assistants a New Attack Surface
TrapDoor hit npm, PyPI, and Crates.io with 34 malicious packages, using manipulated CLAUDE.md and .cursorrules files in pull requests to make Claude Code and Cursor treat attacker content as trusted instructions and run malicious commands.
#Agent#Code#Safety#npm
why featured
HKR-H/K/R all pass: AI coding assistants become the execution surface, with 34 malicious packages across three registries. Single-post sourcing lacks IOCs, timeline, and victim scale, so this stays in the 78–84 band.
editor take
TrapDoor turns CLAUDE.md and .cursorrules into supply-chain payloads; coding agents are now paying for treating repo text as authority.
sharp
TrapDoor’s sharp edge is not the 34 malicious packages; it is the break in context trust. The campaign hit npm, PyPI, and Crates.io, targeting wallets, SSH keys, and cloud credentials. The wild part is the delivery path: PRs injected manipulated CLAUDE.md and .cursorrules files, then Claude Code and Cursor treated repo text as project authority. That is exactly the security debt coding agents created by making “read the repo rules” a default behavior. Package scanners can flag typosquats; they are much worse at deciding whether an instruction file is hostile.
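One cheap mitigation the delivery path suggests is to treat any pull request that touches an agent instruction file as needing human review before a coding agent reads it. The sketch below assumes you already have the list of changed paths from your CI system; the function names and the sample file list are hypothetical, and this is not a detector for the TrapDoor packages themselves.

```python
from pathlib import PurePosixPath

# File names taken from the TrapDoor summary; extend the set for other agents.
AGENT_INSTRUCTION_FILES = {"claude.md", ".cursorrules"}

def instruction_files_touched(changed_paths):
    """Return the changed paths whose file name is an agent instruction file."""
    return [
        path for path in changed_paths
        if PurePosixPath(path).name.lower() in AGENT_INSTRUCTION_FILES
    ]

def needs_human_review(changed_paths):
    """A pull request that edits agent instructions should not be auto-trusted."""
    return bool(instruction_files_touched(changed_paths))

# Hypothetical pull request file list.
pr_files = ["src/index.ts", "docs/CLAUDE.md", ".cursorrules", "README.md"]
print(instruction_files_touched(pr_files))
print(needs_human_review(pr_files))
```

This only flags where the instructions changed. Deciding whether the new text is hostile still takes a reviewer, which is exactly the gap package scanners leave open.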
→DeepSeek Announces Permanent 75% Discount on Flagship AI Model
Bloomberg’s headline says DeepSeek will make a 75% discount on its flagship AI model permanent; the RSS body only lists the Hacker News entry with 46 points and 45 comments, and the post does not disclose the model name, pricing, or effective date.
#DeepSeek#Bloomberg#Hacker News#Product update
why featured
HKR-H/K/R pass on the permanent 75% discount and cost-competition angle. The RSS body only shows HN traction and omits model name, price, and timing, so this stays in low featured.
editor take
DeepSeek made the 75% flagship discount permanent; stop calling this promo pricing. The closed-model API margin story just took another cut.
sharp
Three headlines align on the same payload: DeepSeek is making a permanent 75% discount on its flagship AI model. That looks like one Bloomberg-led source chain; the scraped body does not disclose the model name, original price, or token pricing.
My read: DeepSeek is turning discounting from a customer-acquisition tactic into the reference price. A 75% permanent cut changes procurement math, not just developer sentiment. OpenAI and Anthropic can still defend premium pricing with tools, enterprise controls, and long-context workflows. The exposed layer is everyone reselling “good enough” inference with thin differentiation. If your pitch is model access plus a wrapper, DeepSeek just made your gross margin look fictional.
→Using llama.cpp native tools for web RAG inside llama-server WebUI
A Reddit user describes using llama.cpp native tools for web RAG inside llama-server WebUI with a 7-step setup: enable get_datetime and exec_shell_command, then run wget through firejail, a separate Linux user, and an Alpine OCI VM sandbox.
#RAG#Tools#Agent#llama.cpp
why featured
HKR-H/K/R all pass: the post gives a concrete local web-RAG recipe with sandboxing. It is a community tutorial, not a model or product launch, so the narrow reach and source authority keep it at the low featured band.
editor take
Only the title and summary are visible; Reddit 403 blocks the body. Still, llama.cpp web_fetch inside WebUI turns sandboxing into product work.
sharp
llama.cpp becomes a security product the moment tool calling reaches the WebUI. The summary gives a 7-step setup: enable get_datetime and exec_shell_command, then run wget through firejail, a separate Linux user, and an Alpine OCI VM. That is ugly plumbing, but it points at the right failure mode: web RAG risk is not retrieval; it is letting page text sit near command execution.
Reddit returns 403, so I cannot verify the prompts, permission flags, or llama-server version. Still, this is more useful than another hosted agent demo. Local agents do not get managed egress, filesystem policy, identity, or audit logs for free. The user ends up assembling a small security platform around one wget call.
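Since the post body is blocked, the following is only a guess at the shape of the innermost layer: a wrapper that builds a firejail-confined wget command and refuses anything that is not a plain web URL. The function name, timeout, and flag selection are my assumptions, not the Reddit author's recipe; the separate Linux user and the Alpine OCI VM would sit outside this.

```python
from urllib.parse import urlparse

def build_sandboxed_fetch(url, timeout_s=10):
    """Build (but do not run) a firejail-wrapped wget command for one URL."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"refusing to fetch non-web URL: {url!r}")
    return [
        "firejail", "--quiet",
        "--private",        # throwaway home directory
        "--caps.drop=all",  # drop Linux capabilities
        "--seccomp",        # default syscall filter
        "--noroot",         # no root user inside the sandbox
        "wget", "-q", "-O", "-",  # page goes to stdout, nothing persists
        f"--timeout={timeout_s}", "--tries=1", "--max-redirect=2",
        url,
    ]

cmd = build_sandboxed_fetch("https://example.com/article")
print(" ".join(cmd))
```

The list form can be passed to `subprocess.run(cmd, capture_output=True)` without a shell, so page-derived strings never reach shell parsing. The fetched text should still be handled as untrusted data, not as instructions.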
The title says Greg Brockman discusses the 72 hours that nearly killed OpenAI; the RSS body only lists the article URL, Hacker News comments URL, 4 points, and 0 comments, and the post does not disclose event details.
#Greg Brockman#OpenAI#Commentary
why featured
HKR-H and HKR-R pass: Brockman on OpenAI's 72-hour crisis has a strong hook and governance resonance. HKR-K fails because the feed discloses no concrete details, keeping it in the 60–71 band.
editor take
Brockman gives a firsthand account of the 72 hours after Sam Altman's firing in Nov 2023 — quitting the same day, planning a backup company called Phoenix at Sam's house, and the moment Ilya's twee...
sharp
The reason this is worth opening: Brockman finally told his side of the 72 hours that nearly broke OpenAI in November 2023. On Shane Parrish's podcast, he shared details that weren't public before — where he was when the board called, why he quit the same day, how the backup company "Phoenix" was architected at Sam Altman's house the next morning, and the moment Ilya Sutskever's regret tweet changed the trajectory.
Both sources covering this — Hacker News front page and an AI newsletter — are just relaying the same podcast. No independent reporting, no cross-checking. The alignment is total because there's only one source: Brockman's own account.
I'd take this with a grain of salt. It's a firsthand memoir, not a third-party reconstruction. Brockman is telling this story two and a half years after the fact, and he's still OpenAI's president. The Phoenix company details, Ilya's real motivations, the board's full reasoning — we're only hearing one side. What's missing is any public response from Ilya or the former board members who made the call.
→ICML 2026: First Parallel Thinking Framework for Vision-Language Models
Visual Para-Thinker introduces a parallel thinking framework for vision-language models, using Pa-Attention and LPRoPE to isolate four visual reasoning paths and training on 163,000 question-answer pairs.
#Multimodal#Vision#Reasoning#Visual Para-Thinker
why featured
HKR-H/K/R pass: the ICML 2026 paper offers a concrete parallel-thinking mechanism, four isolated paths, and 163K training pairs. It remains a single research release without broad replication or product impact, so it fits 78–84.
editor take
Visual Para-Thinker splits VLM reasoning into four visual paths; I buy the mechanism, not the “first framework” victory lap.
sharp
Visual Para-Thinker’s useful part is the mechanism, not the “parallel thinking” branding. It isolates four visual reasoning paths with Pa-Attention, keeps shared position ranges unbiased, then adds LPRoPE so paths stay distinguishable. The training set is also concrete: 163,000 QA pairs distilled mainly from Qwen3-VL-235B-A22B-Instruct.
That targets a real VLM failure mode. Long CoT often dilutes attention over visual tokens, which shows up as hallucination rather than better reasoning. The reported gains are nontrivial: +12.6 / +6.3 on V* for 3B / 7B, and +6.1 / +5.0 on HallusionBench. I don’t buy the “first framework” framing, since K2.5, Step3-VL, and LongCat-Flash-Thinking already explored reasoning width. This reads more like a clean VLM-specific patch; the open question is whether it holds outside curated perception benchmarks.
Meta is pushing some post-layoff employees into new roles: some engineering managers are returning to IC work, while some Infra and AI engineers are being reassigned to data labeling; the article cites a manager-to-report ratio shift from 1:8 to 1:50 and says Meta holds a 49% stake in Scale AI.
#Agent#Fine-tuning#Meta#Scale AI
why featured
HKR-H/K/R all pass: the piece has a concrete oddity, numbers, and a job-security nerve. It is still workforce reporting rather than a model launch or executive departure, so it sits in the lower featured band.
editor take
Meta is pushing managers back to IC and infra/AI engineers into labeling; this smells less like efficiency and more like attrition by humiliation.
sharp
Meta’s sharp move is not layoffs; it is repricing expensive engineering labor as interchangeable workflow. The article gives two concrete hooks: manager span moving from 1:8 to 1:50, and infra plus AI engineers being reassigned to data labeling. The first cuts middle management. The second is harsher: distributed-systems talent gets harvested for “expert labeling.”
I don’t buy the clean “data moat” story. Meta reportedly holds 49% of Scale AI, yet still pushes internal engineers into labeling. That smells like a retention filter: people who tolerate it stay, the expensive people with market value leave first. OpenAI and Anthropic also chase high-quality data, but they rarely make scarce engineers visibly look like a labeling line.
→Anthropic’s Three Cards Surface: Mythos 1 Appears, Opus 4.8 Spotted
Xinzhiyuan says Anthropic’s claude-opus-4.8 appeared in Google Vertex AI, while a 59.8MB Claude Code source-map leak with 512,000 TypeScript lines exposed Sonnet 4.8 references and Mythos 1 clues tied to Claude Code and Claude Security.
#Code#Safety#Vision#Anthropic
why featured
HKR-H/K/R all pass, but this is a leak plus Vertex listing, not an Anthropic launch. No capability numbers, pricing, context window, or reproducible evals, so it stays in the 78–84 band.
editor take
Only the summary has signal: claude-opus-4.8 on Vertex AI plus a 59.8MB source-map leak. This smells like release plumbing, not a capability launch.
sharp
Anthropic’s signal here looks like an engineering leak, not a model reveal. The article body is just a WeChat verification page, so the usable facts come from the summary: claude-opus-4.8 appeared on Google Vertex AI, and a 59.8MB Claude Code source-map leak exposed 512,000 TypeScript lines with Sonnet 4.8 and Mythos 1 references. That is concrete enough to take seriously, but pricing, context window, benchmarks, and launch timing are missing.
I would not auto-file Mythos 1 as a frontier model. The clues tie it to Claude Code and Claude Security, which sounds more like product packaging or a security layer than a clean model-family launch. Anthropic has spent the last year turning coding agents into distribution. This leak has weight because of where the names surfaced, not because it proves a capability jump.
→AI Agent Completes Chip Design from 219 Words to 7nm GDSII Without Engineer Input
Verkor’s Design Conductor generated an ASAP7 7nm GDSII layout for the VerCore RISC-V CPU from a 219-word English spec in 12 hours, with no engineer in the design loop; the reported result scored 3,261 CoreMark at 1.48GHz, but it has not been fabricated and lacks cache implementation.
#Agent#Code#Tools#Verkor
why featured
HKR-H/K/R all pass, but VerCore is not taped out and lacks cache, so the claim stays at demo-and-benchmark level. Concrete numbers and test conditions put it in the 78–84 recommendation band.
editor take
Verkor pushed AI chip design to GDSII, but don’t get dazzled by “7nm”: ASAP7, no cache, no silicon; the hard part is 12-hour toolchain control.
sharp
Verkor’s hard result is not the 3,261 CoreMark score; it is Design Conductor turning a 219-word spec into a closed RTL-to-GDSII loop. In 12 hours, it produced an ASAP7 7nm layout for VerCore at 1.48GHz and 2,809 µm². The useful detail is the debugging path: it converted VCD to CSV, wrote Python, found a bad JAL flush, patched RTL, and reran tests.
But “AI designed a production chip” is still a stretch. ASAP7 is an academic predictive PDK, VerCore has no cache, no out-of-order logic, and no fabricated silicon. The performance reference is a 2011 Celeron SU2300. Cadence and Synopsys have spent the last year selling AI EDA copilots; Verkor is more aggressive because the agent runs the whole flow. I buy the direction. I don’t buy the 7nm victory lap.
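The VCD-to-CSV step is the kind of glue worth picturing. Below is a minimal sketch of that conversion, handling only a small subset of the VCD format (single-line `$var` declarations, `#time` stamps, scalar and vector value changes). The sample waveform and function names are hypothetical; this is not Verkor's script.

```python
import csv
import io

def vcd_to_rows(vcd_text):
    """Parse a small subset of VCD into (time, signal, value) rows."""
    names = {}      # identifier code -> signal name
    rows = []
    now = 0
    in_defs = True
    for raw in vcd_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_defs:
            parts = line.split()
            if parts[0] == "$var" and len(parts) >= 5:
                # $var <type> <width> <id> <name> ... $end
                names[parts[3]] = parts[4]
            elif parts[0] == "$enddefinitions":
                in_defs = False
            continue
        if line.startswith("#"):
            now = int(line[1:])          # new simulation timestamp
        elif line.startswith("$"):
            continue                     # $dumpvars / $end markers
        elif line[0] in "bBrR":
            value, ident = line[1:].split()
            rows.append((now, names.get(ident, ident), value))
        else:
            # Scalar change: one value character followed by the identifier.
            rows.append((now, names.get(line[1:], line[1:]), line[0]))
    return rows

def rows_to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["time", "signal", "value"])
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical two-signal waveform.
SAMPLE_VCD = """$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var wire 4 % pc $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
b0000 %
$end
#5
1!
b0100 %
"""

print(rows_to_csv(vcd_to_rows(SAMPLE_VCD)))
```

Once the waveform is tabular, finding something like a bad flush becomes an ordinary filter over rows, which is the sort of task an agent can script and rerun.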
→AI-generated articles now outnumber human-written ones: what is left for the brain?
Graphite sampled 43,000 CommonCrawl articles and found AI-generated English articles exceeded human-written ones from November 2024, with its detector reporting about a 4.2% false-positive rate and 0.6% false-negative rate.
why featured
HKR-H/K/R all pass: the article has a sharp web-content crossover claim, concrete sampling/error numbers, and clear data-quality resonance. Single-study sourcing and no platform-level impact keep it below the 78 band.
editor take
Graphite’s 43k CommonCrawl sample says AI articles crossed 50%; I buy the pollution trend, not the “humans stopped writing” panic.
sharp
Graphite’s finding reads more like an SEO-farm health check than proof that human writing has collapsed. Its 43,000 CommonCrawl sample says AI-written English articles exceeded human-written ones from November 2024. But the detector has a 4.2% false-positive rate and 0.6% false-negative rate, so the 50% crossing is fuzzier than the headline sells.
The nastier part is the measurement gap: “pure AI-generated” content excludes AI drafts edited by humans. For training corpora and search indexes, that hybrid layer is harder to filter than obvious slop. The 2024 Nature model-collapse paper supports the contamination concern, but jumping from web article share to “your brain is shrinking” needs user-behavior data and quality segmentation.
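How fuzzy the 50% crossing is can be estimated with a standard misclassification correction (Rogan-Gladen). The sketch assumes the 4.2% false-positive rate means human articles flagged as AI and the 0.6% false-negative rate means AI articles passed as human; Graphite's exact definitions are not visible here, so take the output as illustrative.

```python
def corrected_share(observed_share, false_positive_rate, false_negative_rate):
    """Rogan-Gladen correction: estimate the true positive share from the
    share a detector reports, given its error rates."""
    sensitivity = 1.0 - false_negative_rate
    specificity = 1.0 - false_positive_rate
    estimate = (observed_share + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(1.0, max(0.0, estimate))

def observed_for_true(true_share, false_positive_rate, false_negative_rate):
    """Inverse direction: what the detector would report for a given true share."""
    sensitivity = 1.0 - false_negative_rate
    return true_share * sensitivity + (1.0 - true_share) * false_positive_rate

FPR, FNR = 0.042, 0.006
at_crossing = corrected_share(0.50, FPR, FNR)
needed = observed_for_true(0.50, FPR, FNR)

print(f"Detector says 50% AI -> corrected estimate {at_crossing:.1%}")
print(f"True 50% AI would show up as {needed:.1%} on the detector")
```

Under these assumptions a measured 50% corresponds to a true share a couple of points lower, so the crossover date is sensitive to detector error even before the hybrid-content problem enters.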
→Vision-capable LLMs vs. OCR for long-document QA with charts, images, and tables
The author tested Claude Sonnet 4.5 on 171 questions from 30 image-heavy MMLongBench-Doc PDFs, comparing native PDF vision use with OCR pipelines. Native PDF ranked fifth of six at 52.0% accuracy and cost $0.2552 per query, while LlamaCloud premium with full context reached 59.6% at $0.1885 per query.
#Vision#RAG#Benchmarking#Claude
why featured
HKR-H/K/R pass: the post gives 30 PDFs, 171 questions, accuracy, and per-question cost for long-document QA. Limited sample and Reddit sourcing keep it in the featured-threshold band.
editor take
Only the summary is visible, but Sonnet 4.5 native PDF looks worse and pricier than OCR here. Don’t default to vision-PDF ingestion.
sharp
Sonnet 4.5 native PDF reading loses cleanly in the visible summary: 30 MMLongBench-Doc PDFs, 171 questions, 52.0% accuracy, and $0.2552 per query. LlamaCloud premium with full context hits 59.6% at $0.1885 per query. Reddit 403 blocks the body, so I can’t inspect prompts, sampling, judge setup, or page-count distribution, and I wouldn’t treat this as a leaderboard.
The result still matches the engineering pattern: long-document QA usually fails in layout parsing, table structure, chunking, and context packing before it fails in raw “can the model see images” capability. Native vision-PDF ingestion is a nice demo path, but production pipelines still need OCR/layout tooling when charts, tables, and scanned pages dominate. The lazy path is now visibly more expensive too.
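Folding accuracy and price into one number makes the gap easier to read. This sketch uses only the two configurations visible in the summary and a simple cost-per-correct-answer ratio, which is my framing and not a metric the author reported.

```python
# Accuracy and per-query cost as quoted in the summary.
results = {
    "Sonnet 4.5 native PDF": {"accuracy": 0.520, "cost_per_query": 0.2552},
    "LlamaCloud premium, full context": {"accuracy": 0.596, "cost_per_query": 0.1885},
}

def cost_per_correct(accuracy, cost_per_query):
    """Expected spend to obtain one correct answer."""
    return cost_per_query / accuracy

costs = {name: cost_per_correct(**row) for name, row in results.items()}
for name, dollars in costs.items():
    print(f"{name}: ${dollars:.3f} per correct answer")

ratio = costs["Sonnet 4.5 native PDF"] / costs["LlamaCloud premium, full context"]
print(f"Native PDF costs {ratio:.2f}x as much per correct answer")
```

With 171 questions the accuracy figures carry wide error bars, so the ratio is a direction, not a benchmark result.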
→It's OK to Quantize the KV Cache; Model Quant Matters More in Qwen3.6 27B KLD Tests
Reddit user hopbel tested Qwen3.6 27B with approximate KLD on wikitext-2 at 16k context, using Q5_K_M as the proxy baseline; Q5_K_S weights with q4_0 KV cache scored 0.016304, while Q4_K_XL with f16 KV cache scored 0.026067, so weight quant tier dominated KV-cache quant in this setup.
#Inference-opt#Benchmarking#Qwen#llama.cpp
why featured
HKR-H/K/R all pass, backed by first-person test numbers. Source is a single Reddit post, the metric is approximated KLD, and the claim is narrow, so it sits at the featured threshold.
editor take
This Reddit result is a local-inference budgeting note: protect weight quant first; q4_0 KV cache did less damage here.
sharp
Hopbel’s numbers challenge a common local-inference instinct: on Qwen3.6 27B, wikitext-2, and 16k context, weight quantization hurt more than KV-cache quantization. Q5_K_S weights with q4_0 KV scored 0.016304 approximate KLD, below Q4_K_XL with f16 KV at 0.026067. The proxy baseline was Q5_K_M, not full fp16.
I’d treat this as a config-priority signal for llama.cpp and Unsloth users, not a law. The Reddit body is blocked by 403, so I can’t inspect seeds, prompt mix, throughput, or VRAM curves. wikitext-2 is also language-modeling terrain, not long-horizon agent tool use. Still, for 16k local deployment, don’t sacrifice the weight tier just to keep f16 KV.
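For readers who have not worked with the metric: KLD here compares the next-token distribution of a quantized configuration against a baseline, averaged over token positions. The sketch below shows that calculation on made-up logits; it is not hopbel's script, and a real run would take the logits from the inference engine.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities, shifted for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def mean_kld(baseline_logits, candidate_logits):
    """Average KL(baseline || candidate) over token positions."""
    per_token = [
        kl_divergence(softmax(base), softmax(cand))
        for base, cand in zip(baseline_logits, candidate_logits)
    ]
    return sum(per_token) / len(per_token)

# Hypothetical logits for two token positions over a three-word vocabulary.
baseline = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
candidate = [[1.9, 1.1, 0.1], [0.5, 2.4, 0.4]]

print(f"mean KLD vs baseline: {mean_kld(baseline, candidate):.6f}")
```

A lower value means the candidate's predictions stay closer to the baseline. Because the baseline in the post was Q5_K_M, not fp16, the reported scores measure drift from an already-quantized reference.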
FEATURED · Computing Life · Share (鸭哥 research reports) · rss · ZH · 00:00 · 05·24
→You May Have Coded for 10 Years, but You Are Still a Beginner with AI
The article discusses the debate sparked by Armin Ronacher using Pi to develop Pi, citing issue tracker data to argue that experienced programmers can still be misled by confident but wrong AI outputs.
#Code#Agent#Armin Ronacher#Commentary
why featured
HKR-H/K/R all pass, but this is commentary around the Armin Ronacher debate, not a model or product launch. The issue-tracker evidence lifts it to the featured threshold.
editor take
The Ronacher/Pi case lands, but don’t turn steering into mysticism; without issue counts, this is craft lore, not evidence.
sharp
I buy half of the claim that “ten-year programmers are AI beginners.” The Armin Ronacher/Pi dispute hits a real failure mode: senior engineers bring old debugging instincts to model output, while confident wrong answers quietly reset their review rhythm.
The evidence is thin in the provided text. The snippet says it uses issue tracker data, but gives no issue count, error taxonomy, fix time, or even a clear description of whether Pi is a model, toolchain, or project setup. Downgrading double-checking and elevating steering needs reproducible tasks, not just taste. SWE-bench-style coding-agent results already show models breaking on long-horizon state and local confidence, not merely on users asking badly. This reads like a useful corrective for veteran ego, not proof that the definition of expert has changed.