all posts


2026-07-13 · Mon

16:06

15d ago

FEATURED · Hacker News Frontpage · rss · EN · 16:06 · 07·13

→Apple's new SpeechAnalyzer beats Whisper Small on accuracy in first public benchmark

Inscribe benchmarked Apple's new SpeechAnalyzer API against the legacy SFSpeechRecognizer and three Whisper models on 5,559 LibriSpeech utterances. SpeechAnalyzer hit 2.12% WER on clean speech and 4.56% on noisy speech, beating Whisper Small by 1.62 and 3.39 points respectively while running ~3x faster. The legacy API scored 9.02% WER, worse than the 40MB Whisper Tiny. All engines ran fully on-device on an M2 Pro. Inscribe switched its default engine to SpeechAnalyzer and released all transcripts and scoring code. The post does not disclose SpeechAnalyzer's model architecture or parameter count.
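For readers unfamiliar with the metric: word error rate (WER) is the word-level edit distance between the reference transcript and the engine's output, divided by the number of reference words. The Python sketch below is a minimal illustration, not Inscribe's scoring code; the function name and sample sentences are made up, and real harnesses usually normalize punctuation and casing more carefully before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A 2.12% WER on this definition means roughly two word-level errors per hundred reference words.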

#Benchmarking #Apple #OpenAI #Inscribe

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Apple's new SpeechAnalyzer hits 2.12% WER, beating Whisper Small by 1.62 points and running 3x faster.

sharp

This is worth opening because Apple shipped SpeechAnalyzer with zero accuracy numbers, and Inscribe just published the first real comparison on 5,559 LibriSpeech utterances. The result is clean: 2.12% WER on clean speech, 4.56% on noisy, crushing the legacy SFSpeechRecognizer's 9.02% and edging out Whisper Small while running 3x faster, all on-device on an M2 Pro. I'd discount it slightly: LibriSpeech is read speech, not real meetings or street noise. But the Whisper numbers match OpenAI's own published figures almost exactly, which validates the harness. Inscribe already switched its default engine and released all transcripts and scoring code. For teams building English voice products, this basically means Whisper is no longer the automatic accuracy pick on Apple hardware. Whisper's remaining advantages are multilingual coverage and cross-platform support. The post doesn't disclose SpeechAnalyzer's architecture or parameter count, so don't speculate there.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

15:56

15d ago

Hacker News Frontpage · rss · EN · 15:56 · 07·13

→Go-Flavored Concurrency in C: Tradeoffs with POSIX Threads

The author replicates Go's concurrency model in C for the Solod transpiler using POSIX threads. sync.Mutex and Cond wrap pthread primitives; atomics map to compiler builtins with performance matching Go. However, conc.Go spawns OS threads, not goroutines, making them expensive for short tasks—worker pools are recommended. The post does not disclose the pool's implementation details or benchmarks.
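The worker-pool advice can be illustrated outside C. The sketch below is Python, not Solod and not the author's pool (the post does not show one). It runs the same short tasks once with a fresh OS thread per task and once on a fixed set of reused threads. `square`, the function names, and the pool size are illustrative assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def square(n: int) -> int:
    """Stand-in for a short task."""
    return n * n


def run_thread_per_task(values: list[int]) -> list[int]:
    """Start one OS thread per task, the pattern the post calls expensive."""
    results = [0] * len(values)

    def worker(index: int, value: int) -> None:
        results[index] = square(value)

    threads = [
        threading.Thread(target=worker, args=(i, v)) for i, v in enumerate(values)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def run_with_pool(values: list[int], workers: int = 4) -> list[int]:
    """Reuse a fixed set of threads, so thread creation is paid once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))


if __name__ == "__main__":
    data = list(range(10))
    print(run_thread_per_task(data) == run_with_pool(data))
```

Both functions return the same results. The difference is that the pool creates at most `workers` threads no matter how many tasks are submitted.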

#Inference-opt #Anton Zhiyanov #Solod

editor take

Replicates Go's concurrency in C via POSIX threads—atomics match Go's speed, but conc.Go spawns an OS thread per call, making short tasks expensive.

HKR breakdown

hook —knowledge ✓resonance —

→ open source

55

SCORE

H0·K1·R0

15:34

15d ago

TechCrunch AI · rss · EN · 15:34 · 07·13

→Anthropic starts localizing Claude pricing for India, its biggest market after the US

Anthropic has started showing Indian rupee pricing for Claude subscriptions in India, its largest market outside the US. But payments still require a card or Apple/Google billing—no support yet for India's popular UPI instant payments. OpenAI rolled out rupee pricing and UPI for ChatGPT back in August.

#Anthropic #Claude #OpenAI

editor take

Anthropic shows rupee pricing in India but still has no UPI support; OpenAI has offered both since last August.

HKR breakdown

hook —knowledge ✓resonance —

→ open source

55

SCORE

H0·K1·R0

14:02

15d ago

Hacker News Frontpage · rss · EN · 14:02 · 07·13

→Clawk: Give coding agents a disposable Linux VM, not your laptop

Clawk is an open-source tool that routes AI coding agent commands into an isolated, disposable Linux VM instead of your local machine. The post body is a thin RSS snippet from the GitHub README—it doesn't detail which agents are supported, how the VM lifecycle is managed, or what the performance overhead looks like. The headline's core pitch is safety through isolation: if the agent messes up, it only trashes a throwaway VM.

#Clawk

editor take

Clawk routes coding agent commands into a disposable Linux VM—if the agent breaks something, only the VM dies.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

68

SCORE

H1·K0·R1

13:35

15d ago

Hacker News Frontpage · rss · EN · 13:35 · 07·13

→Grok CLI uploaded the entire home directory to GCS

A user found that the Grok CLI tool uploaded their entire home directory to Google Cloud Storage. The post is just a tweet link—no details on whether this is a bug or intended behavior, and no mention of scope. Only the title is disclosed so far.

#Grok #Google Cloud Storage

editor take

Just a tweet claiming Grok CLI uploaded the entire home dir to GCS. No word yet on whether it's a bug or intended behavior, so don't panic yet.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

72

SCORE

H1·K0·R1

12:28

15d ago

AI Era (新智元) · WeChat · rss · ZH · 12:28 · 07·13

→AI overhauled Tao's 30-year-old homepage and found two bugs hidden for over 20 years

Someone used AI to completely revamp mathematician Terence Tao's personal homepage, which hadn't been touched in 30 years. The AI not only redesigned the layout and style but also uncovered two code bugs that had been hidden for over two decades. The post doesn't specify which model or tool was used, but the result is intriguing: bugs in old code that even the owner hadn't noticed.

#Terence Tao

editor take

AI revamped Terence Tao's 30-year-old homepage and dug out two bugs hidden for over 20 years.

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

62

SCORE

H1·K1·R0

12:00

15d ago

Financial Times · Technology · rss · EN · 12:00 · 07·13

→Intel invests €5bn in Irish plant as AI chip demand surges

Intel is pouring €5 billion into its Irish fabrication plant to expand advanced chip capacity. The move directly targets surging demand for AI chips from data centers and edge devices. The post doesn't specify which process node the money will fund or a timeline, but the scale signals Intel's urgency to catch up with TSMC and Samsung in the capacity race.

#Intel #TSMC #Samsung

editor take

Intel drops €5B on Irish fab expansion but won't say which node — that's the whole difference between real urgency and noise.

HKR breakdown

hook —knowledge —resonance —

→ open source

55

SCORE

H0·K0·R0

11:51

15d ago

Hacker News Frontpage · rss · EN · 11:51 · 07·13

→DOM-docx: Convert HTML to native, editable Word docs, MIT licensed

A new GitHub project, DOM-docx, converts semantic HTML fragments into native OOXML Word documents. The output is editable docx, not images or PDF. Licensed under MIT, it currently has 22 stars. The post doesn't specify which HTML tags are supported, conversion speed, or browser compatibility—those details require checking the source code or running it.

#Code #GitHub #floodtide #Open source

editor take

DOM-docx converts HTML fragments into native, editable .docx files—not PDFs or images. MIT license, 22 stars.

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

55

SCORE

H1·K1·R0

11:45

15d ago

FEATURED · Hacker News Frontpage · rss · EN · 11:45 · 07·13

→Control the Ideas, Not the Code

Redis creator antirez argues that line-by-line code review no longer makes sense when LLMs can generate 5k lines a day. Models are strong at local code but weaker on big-picture design, so engineers should shift to controlling ideas, doing more QA, and having LLMs write DESIGN.md files that capture the thinking behind data structures. He cites his own Redis sorted-set memory optimization: he still reviews manually out of respect for users, but believes GPT 5.6 and Fable would catch more bugs. For juniors, he recommends building an interpreter or hash table instead of reviewing customer JS.

#Code #antirez #Redis #DeepSeek v4

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

antirez: line-by-line code review is dead when LLMs write 5k lines/day. Control the ideas and QA, not the code.

sharp

antirez isn't picking a fight here — he's giving anxious programmers a way to reframe. His core claim: GPT 5.6 and Fable are good enough at local code that line-by-line review is now a bad tradeoff. He's dogfooding this on his own Redis sorted-set memory optimization PR, where he still reviews manually but admits it's mostly pointless — the models would catch more bugs. His advice for juniors is sharper: skip reviewing customer JS, go build an interpreter or hash table instead. The real skill shift is from "how is this written" to "why is it designed this way." The DESIGN.md trick is practical — have the LLM write down the thinking behind data structures in plain language, so both humans and future models can understand it. I'd discount this maybe 30%. antirez has two decades of C under his belt; his "code doesn't matter" stance comes from a height of abstraction most teams don't have. If you drop code review entirely, locally-optimal LLM chunks can quietly rot the architecture. But he's right about the direction: shift energy from reading code to setting design constraints and testing hard.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

11:41

15d ago

AI HOT (Curated Pool) · aihot-api · ZH · 11:41 · 07·13

→German AI consortium releases Soofi S, an open 30B model topping English and German benchmarks

Soofi S is a 30B open model trained entirely on Deutsche Telekom's AI cloud, coordinated by the German AI Association. It uses a hybrid Mamba-Transformer MoE architecture—31.6B total params, only 3.2B active per token—so throughput stays nearly flat even at 256K context. At 40K tokens with 32 parallel requests, it generates roughly 8× more tokens/sec/GPU than dense models of similar size. The 27T-token training mix deliberately raised German data from 7.2% to 15.3% across phases. It beats Olmo 3 32B and Apertus 70B on German, English, and coding benchmarks. Caveat: only a pretraining report is out; no instruction-tuned or RLHF version yet.

#Code #KI Bundesverband #Deutsche Telekom #Soofi S

editor take

31.6B total, 3.2B active per token, German data at 15.3%, and throughput stays flat at long context—but only a pretraining report, no instruct version yet.

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

72

SCORE

H1·K1·R0

11:12

15d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 11:12 · 07·13

→Tencent Hunyuan open-sources HyOCR-1.5: a 1B end-to-end OCR model with 6.37× faster inference

Tencent Hunyuan fully open-sourced HyOCR-1.5—training, inference, and model weights—a first for end-to-end OCR large models. The 1B-parameter model handles 8+ text-centric tasks and scores 94.74 on OmniDocBench v1.6, ranking first end-to-end. DFlash speculative decoding speeds up inference 6.37× under Transformers and 2.14× under vLLM, hitting 1.408s per page. It supports 4K resolution and a 128K context window, and uses Agentic Data Flow to extend low-resource OCR to 331 languages, ancient script recognition, and multi-image QA.

#Tencent Hunyuan #HyOCR-1.5 #DFlash #Open source

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

A 1B fully open-source OCR model hits 94.74 on OmniDocBench and 6.37× faster inference—end-to-end first place.

sharp

This one's worth opening because Tencent Hunyuan fully open-sourced HyOCR-1.5—training code, inference code, and model weights—a first for end-to-end OCR large models. It's only 1B parameters, scores 94.74 on OmniDocBench v1.6 (first place end-to-end), and handles 8+ text-centric tasks like document parsing, formula recognition, and table restoration. On inference speed: they used a speculative decoding method called DFlash, hitting 6.37× acceleration under Transformers and 2.14× under vLLM, landing at 1.408 seconds per page end-to-end. It supports 4K resolution and a 128K context window, and uses Agentic Data Flow to stretch low-resource OCR to 331 languages, ancient script recognition, and multi-image QA. I'd discount this a bit for now—we only have the RSS snippet, no full eval comparisons or failure cases. A 1B model running 4K resolution with a 128K context window raises real questions about memory and long-sequence stability, and the post doesn't spell those out. But fully open-source plus end-to-end first place is a strong combo. If you're building document parsing or OCR pipelines, it's worth pulling the code and running it yourself.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

09:58

15d ago

AI HOT (Curated Pool) · aihot-api · ZH · 09:58 · 07·13

→Meta expands Louisiana data center to 5GW, total investment exceeds $50B

Meta is expanding its Louisiana data center to 5GW, with total investment exceeding $50 billion. Meta covers all energy and water costs, and signed a deal with Entergy to fund seven new gas plants, grid batteries, and nuclear capacity. The post doesn't specify GPU count or timeline.

#Meta #Entergy #Funding

editor take

Meta is spending $50B+ to push its Louisiana data center to 5GW, covering all energy and water costs itself.

HKR breakdown

hook —knowledge —resonance —

→ open source

55

SCORE

H0·K0·R0

09:48

15d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 09:48 · 07·13

→ByteDance's Seedream 5.0 Pro: point, box, scribble to edit images locally

ByteDance released Seedream 5.0 Pro. Image quality and prompt understanding match GPT-Image 2.0; overall capability ranks second. The standout is editable interaction: place points, draw boxes, or scribble on the image, then @-tag in the prompt to replace a sofa or change wall color precisely while leaving other areas untouched. Demos include swapping six furniture items at once, an exploded keyboard view with callouts, and poster text placed in drawn boxes. Color palette and SKU color swaps are supported. The Volcano Engine API is live; Jimeng, Doubao, and Lumina offer access.

#Vision #ByteDance #Seedream 5.0 Pro #GPT-Image 2.0

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Seedream 5.0 Pro turns local editing into point-and-@-tag: halving the friction for interior and e-commerce image revisions.

sharp

The interaction model is what makes this worth a click. Instead of regenerating the whole image to swap a sofa, you place a point or draw a box on the image, @-tag it in the prompt, and only that region changes. ByteDance's demos are practical: six furniture swaps in one go, an exploded keyboard view with callouts, poster text placed inside drawn boxes, plus color palette and SKU swaps. Image quality and prompt understanding match GPT-Image 2.0, ranking second overall. The Volcano Engine API is live, with Jimeng, Doubao, and Lumina offering access. Two things I'd watch: edge blending in complex scenes, and multi-turn @-tag stability. If both hold up, e-commerce and interior design revision workflows get a real shortcut.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

09:00

15d ago

FEATURED · The Verge · AI · rss · EN · 09:00 · 07·13

→Waze integrates Gemini for natural-language road reporting

Waze is integrating Google's Gemini AI so drivers can report road conditions in plain language instead of memorizing fixed commands. The post doesn't specify a launch date but confirms Gemini will parse natural phrases like "cop ahead" and generate shorter voice prompts to reduce chatter.

#Waze #Google #Gemini

why featured

Featured · importance 72 · hook + knowledge

editor take

Waze adds Gemini-powered voice reporting. Both sources agree because they're working from the same official announcement, but neither mentions offline capability or latency.

sharp

Waze is rolling out Gemini-powered conversational reporting: you can describe road hazards in natural language instead of tapping preset buttons. The app also promises less chatty AI navigation prompts, cutting down on unnecessary interruptions. The Verge and TechCrunch are telling the same story because they're both working from Waze's official announcement—no independent testing, no third-party benchmarks. I'd take the "conversational reporting" pitch with a grain of salt until we see real-world voice recognition accuracy and latency numbers. Those matter a lot when you're driving. One gap worth flagging: neither source mentions whether this runs on-device or requires a cloud connection. Waze is a Google product, so Gemini integration isn't surprising, but if you're on a mountain road with spotty signal, this feature might just go silent.

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

72

SCORE

H1·K1·R0

08:10

15d ago

AI Chat-Group Daily (群聊日报) · atom · ZH · 08:10 · 07·13

→Subsidy war: Claude extends Fable 5, OpenAI removes Codex 5-hour limit

Anthropic extended Fable 5 paid access to July 19 and raised the Claude Code weekly quota by 50%. OpenAI removed the 5-hour limit on Codex, announced 6M active users, did another reset within an hour, and will give everyone a banked reset tomorrow to mark the 700M milestone. Day 4 with GPT-5.6: consensus shifted from 'which is stronger' to 'how to manage it'. With 5.5 you push it to do more; with 5.6 you push it to do less. Ultra scheduling is dumb: all subtasks use sol max with no model customization. The Grok CLI npm package uploads the entire working directory on every task and steals Claude Code config and plaintext API keys, so rotate keys if you used it.

#Anthropic #OpenAI #Claude

editor take

Grok CLI npm package uploads your entire working directory every task and steals Claude Code config and plaintext API keys—rotate keys now if you used it.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

55

SCORE

H1·K1·R1

04:47

15d ago

FEATURED · AI Chat-Group Daily (群聊日报) · atom · ZH · 04:47 · 07·13

→GPT-5.6 Sol Pro decoded: 'Pro' is a reasoning mode, not a new model

Packet capture reveals OpenCode's Sol Pro is just gpt-5.6-sol with reasoning.mode: "pro" — not a separate model. Mode, effort, and service_tier can be freely combined. A simple greeting jumps from 12 to 1,527 input tokens with Pro enabled, roughly 100x more expensive. Separately, GPT-5.6 now charges for cache writes, potentially doubling Codex costs for long tasks. One user burned 19B tokens in two days, 98% from cache reads. The biggest shock: a researcher's 2024 open problem was solved by gpt-5.6-sol ultra in 46 minutes, verified correct by Fable.
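A quick check of the arithmetic behind the "roughly 100x" figure, as a Python sketch. It uses only the two token counts quoted above; the constant and function names are illustrative, and nothing here calls or models a real API.

```python
# Token counts quoted in the summary above, not measured independently.
BASE_INPUT_TOKENS = 12     # simple greeting without Pro
PRO_INPUT_TOKENS = 1_527   # same greeting with Pro enabled


def input_token_multiplier(base: int, with_mode: int) -> float:
    """How many times more input tokens the request carries."""
    return with_mode / base


multiplier = input_token_multiplier(BASE_INPUT_TOKENS, PRO_INPUT_TOKENS)
extra_tokens = PRO_INPUT_TOKENS - BASE_INPUT_TOKENS

print(f"{multiplier:.0f}x input tokens, {extra_tokens} extra tokens per request")
# prints "127x input tokens, 1515 extra tokens per request"
```

On these two numbers the multiplier is about 127x for a 12-token greeting. If the extra 1,515 tokens are a roughly fixed overhead, which the post does not confirm, the ratio would shrink as prompts get longer.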

#Reasoning #Code #OpenAI #GPT-5.6

why featured

Featured · importance 72 · hook + knowledge + resonance

editor take

OpenCode's Sol Pro isn't a separate model — just gpt-5.6-sol with reasoning.mode: "pro", making a greeting 100x more expensive.

sharp

This one's worth opening because someone actually packet-captured the truth: Sol Pro in OpenCode isn't a separate model. It's just gpt-5.6-sol with reasoning.mode: "pro" tacked on. Mode, effort, and service_tier can be freely combined — you can run pro + xhigh + priority in a single request. A simple greeting jumps from 12 to 1,527 input tokens with Pro enabled, roughly 100x more expensive. The bigger story from the same day: a researcher's 2024 open problem was solved by gpt-5.6-sol ultra in 46 minutes, verified correct by Fable. His calibration is useful — 5.4 was the first version that solved something they considered important; 5.6 just flattened him. I'd discount this a bit: the post doesn't spell out the problem domain or difficulty, and 46 minutes is one data point. But the token explosion and cache-write charges are confirmed cost signals — GPT-5.6's reasoning investment is shifting from "tweak a parameter" to "flip the money-burn switch." Run the numbers before you commit.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

72

SCORE

H1·K1·R1

03:00

15d ago

Financial Times · Technology · rss · EN · 03:00 · 07·13

→Employers pushed staff to use AI more. That has backfired

FT reports that top-down mandates to use AI are backfiring. The article doesn't disclose specific data or cases, but the headline highlights a counterintuitive outcome: forced adoption can trigger resistance or misuse, hurting productivity. For AI practitioners, it's a reminder to factor in human and organizational dynamics, not just model metrics.

#Financial Times

editor take

FT: Top-down AI mandates are backfiring on productivity. Full article behind paywall, but the headline alone is a useful warning.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

55

SCORE

H1·K0·R1

01:24

15d ago

Hacker News Frontpage · rss · EN · 01:24 · 07·13

→HN users propose an AI-generated article flag; dang says tagging may come

HN user levkk proposed adding an 'AI-generated' flag so readers can skip such articles. Admin dang replied that HN already bans AI-generated text on the site, but no rule yet covers external articles. He noted the community is developing an allergy to LLM-sounding prose—once triggered, the writing gets relegated to low-status instantly. Dang said HN may finally add a 'reason for flagging' step, with 'I think it's genai' as one option. The post doesn't clarify whether the flag would affect ranking or give a timeline.

#Hacker News #levkk #dang

editor take

dang says HN readers are developing an allergy to LLM-sounding prose—once triggered, the article gets instantly relegated to low-status, and a 'genai' flag reason is coming.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

72

SCORE

H1·K0·R1

00:55

15d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 00:55 · 07·13

→Nvidia quarterly revenue nears $100B; Rubin Ultra on track for next year, says Jensen Huang

At a Morgan Stanley NDR, Jensen Huang delivered three signals: quarterly revenue is approaching $100B and growth is still accelerating; Rubin Ultra is not delayed and will ship next year; a leading AI lab that previously relied almost entirely on ASICs now sources nearly 50% of its compute from Nvidia GPUs—widely read as Anthropic. Nvidia sees growth coming from AI labs, cloud hyperscalers, and sovereign AI. Its CPU business is targeting $20B this year. Morgan Stanley kept a $288 target and said the real bottleneck is no longer demand but delivery constrained by memory, power, and data center space.

#Nvidia #Jensen Huang #Morgan Stanley

why featured

Featured · importance 88 · hook + knowledge + resonance

editor take

Jensen Huang dropped three hard numbers at a Morgan Stanley NDR: quarterly revenue nearing $100B, Rubin Ultra on track for next year, and Anthropic shifting nearly 50% of compute to Nvidia GPUs.

sharp

The reason to click is that ASIC-to-GPU shift detail. Investors have been worried that Google and Amazon's custom chips would eat Nvidia's lunch. Huang's counter: a leading AI lab that used to run almost entirely on ASICs now sources nearly half its compute from Nvidia GPUs. The market reads this as Anthropic. If true, it means customers are optimizing for cost per token, not chip price — and Nvidia's stack often wins that math. On Rubin Ultra, Huang was direct: the 2028 delay rumor was a misread. They're swapping rack designs to support larger clusters, but the roadmap hasn't changed. I'd discount this a bit — rack redesigns can still slip — but the official line is next-year shipment. The $20B CPU target is easy to miss but worth noting. Vera isn't just a sidekick in GPU servers anymore; it's moving into general-purpose server markets. Morgan Stanley's $288 target and "bottleneck is delivery, not demand" thesis align with Huang's tone. What's missing: actual customer order data and a firm Rubin Ultra production timeline. Those are the real things to watch.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

88

SCORE

H1·K1·R1

00:00

15d ago

Computing Life · Share (鸭哥 research reports) · rss · ZH · 00:00 · 07·13

→GPT-5.6 Sol Pro Max Fast: How a community joke became a cheat sheet for OpenAI's API control surface

The community nicknamed GPT-5.6 'Sol Pro Max Fast'—a joke that maps directly onto four real API control dimensions: Sol is the base model, Pro is the reasoning execution mode, Max is the highest reasoning effort tier, and Fast comes from the Priority service tier. The article breaks down four control layers—reasoning, expression, state & tools, and scheduling—explaining what each parameter controls and how billing works. Tests show Pro mode has no separate pricing but can spike input tokens by ~1,500; Priority doubles short-context unit prices. The Responses endpoint is now the recommended default, and the old Assistants API shuts down on August 26, 2026. Going forward, reporting just a model ID isn't enough—you need model, mode, effort, context, and service tier.
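One way to act on the "model ID isn't enough" point is to log all five fields next to every benchmark or cost figure. The Python record below is a hypothetical sketch: the class, field names, and example values are assumptions for illustration, not OpenAI API parameters.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunConfig:
    """The five fields the article says a reported result should carry."""

    model: str
    mode: str
    effort: str
    context_tokens: int
    service_tier: str

    def label(self) -> str:
        return (
            f"{self.model} | mode={self.mode} | effort={self.effort} "
            f"| ctx={self.context_tokens} | tier={self.service_tier}"
        )


# The "Sol Pro Max Fast" nickname written out as a record.
nickname_run = RunConfig(
    model="gpt-5.6-sol",
    mode="pro",
    effort="max",
    context_tokens=8_000,  # made-up value for the example
    service_tier="priority",
)

print(nickname_run.label())
```

Two runs that share a model ID but differ in any other field get different labels, which keeps cost and latency comparisons honest.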

#Reasoning #OpenAI #GPT-5.6 Sol #Responses API

editor take

The GPT-5.6 'Sol Pro Max Fast' joke maps directly onto four real API control dimensions (base model, reasoning mode, effort, and service tier), so you can tell which knobs spike your bill vs. just add latency.

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

72

SCORE

H1·K1·R0

00:00

15d ago

FEATURED · Computing Life · Share (鸭哥 research reports) · rss · ZH · 00:00 · 07·13

→Turso bets on per-agent SQLite databases, but the new sync engine is still in beta

Turso started with edge SQLite to cut latency, then moved to per-tenant databases, and now pushes the boundary to per-agent or per-task databases. The pitch: libSQL branches plus AgentFS pack files, KV state, and audit logs into a single SQLite file that syncs to the cloud via explicit push/pull. The new Turso Database engine, Sync protocol, and AgentFS are all still in private beta or beta—the post warns not to confuse this with the mature Turso Cloud. Real trade-offs include schema migration across hundreds of thousands of databases and application-level conflict resolution during sync.
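The "one SQLite file per agent" idea can be sketched with Python's standard `sqlite3` module. This is plain SQLite, not libSQL, AgentFS, or the Turso sync protocol; the table layout, function names, and file naming are assumptions for illustration, and the push/pull step is left out entirely.

```python
import sqlite3
import time
from pathlib import Path

# Hypothetical layout: files, key-value state, and an audit log in one file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, content BLOB NOT NULL);
CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts REAL NOT NULL,
    action TEXT NOT NULL
);
"""


def open_agent_db(root: Path, agent_id: str) -> sqlite3.Connection:
    """Open (or create) the single database file owned by one agent."""
    root.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(root / f"{agent_id}.db")
    conn.executescript(SCHEMA)
    return conn


def write_file(conn: sqlite3.Connection, path: str, content: bytes) -> None:
    """Store a file and its audit entry in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO files (path, content) VALUES (?, ?)",
            (path, content),
        )
        conn.execute(
            "INSERT INTO audit_log (ts, action) VALUES (?, ?)",
            (time.time(), f"write {path}"),
        )
```

Because each agent owns its own file, a schema change has to be applied to every file separately, which is the migration headache the post flags.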

#Turso #libSQL #AgentFS

why featured

Featured · importance 72 · hook + knowledge + resonance

editor take

Turso pushes per-agent databases, but the new engine and sync protocol are still in private beta.

sharp

The reason to click: Turso's product arc is unusually clear. They started with edge SQLite to cut latency, moved to per-tenant databases, and are now betting on per-agent or even per-task databases. The pitch is libSQL branches plus AgentFS—pack files, KV state, and audit logs into one SQLite file that syncs via explicit push/pull. But the post draws its own line: the new Turso Database engine, Sync protocol, and AgentFS are all still in private beta or beta. Don't confuse this with the mature Turso Cloud that's been running in production for years. Two real headaches: schema migration across hundreds of thousands of databases, and application-level conflict resolution during sync. The competitive landscape section is honest. Cloudflare Durable Objects come closest to the per-agent model but lock you into Workers—no offline portability. Neon's cloud multi-tenant Postgres is strong on query power and extensions, but there's no local replica. Electric plus PGlite aim for local-first Postgres, yet the bidirectional sync loop isn't closed. If your agent runs on a single machine, plain SQLite is still the cheapest option. I'd discount this a bit: the vision is right, but the product isn't ready for real workloads yet. What to watch: when the sync protocol and AgentFS exit beta, and whether actual customers fill in the schema migration and conflict resolution gaps.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

72

SCORE

H1·K1·R1

2026-07-12 · Sun

23:41

15d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 23:41 · 07·12

→Ploy migrated its production AI agent from Claude Opus 4.8 to GPT-5.6 Sol: 2.2× faster builds, 27% cheaper

Ploy's agent builds real marketing websites. Claude Opus 4.8 held the default slot for four months with no challenger. GPT-5.6 Sol changed that on launch day. Head-to-head: mean build time dropped from 8m to 3m 42s, cost from $3.06 to $2.22, visual score rose from 0.936 to 0.970. The migration wasn't plug-and-play. Their eval harness was tuned to Opus's sequential style; GPT-5.6's parallel tool calls blew through budgets, and roughly a third of initial failures were harness assumptions, not model errors. GPT-5.6 also writes leaner code—one case went from 17,957 CSS characters and 174 variables to 2,508 characters and 45 variables. Design output is clean and modern but tends toward uniformity; brand adherence required extra steering. The post flags tool schemas, caching, and reasoning replay as additional migration steps but doesn't detail the fixes.
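The headline ratios can be reproduced from the figures in the summary. The Python sketch below uses only those quoted numbers; the helper names are illustrative.

```python
def speedup(before: float, after: float) -> float:
    """How many times faster the new run is."""
    return before / after


def percent_drop(before: float, after: float) -> float:
    """Relative reduction, in percent."""
    return (before - after) / before * 100


build_before_s = 8 * 60        # 8m
build_after_s = 3 * 60 + 42    # 3m 42s

build_speedup = speedup(build_before_s, build_after_s)
cost_drop = percent_drop(3.06, 2.22)
css_drop = percent_drop(17_957, 2_508)
css_vars_drop = percent_drop(174, 45)

print(f"build: {build_speedup:.1f}x faster, cost: -{cost_drop:.0f}%")
print(f"CSS characters: -{css_drop:.0f}%, CSS variables: -{css_vars_drop:.0f}%")
```

The quoted numbers are internally consistent: 480s to 222s is about 2.2×, and $3.06 to $2.22 is about a 27% drop.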

#Agent #Code #Benchmarking #Ploy

why featured

Featured · importance 82 · hook + knowledge + resonance

editor take

Ploy swapped its production website-building agent from Claude Opus 4.8 to GPT-5.6 Sol: build time more than halved (8m to 3m 42s), cost down 27%, visual score up.

sharp

This post is worth reading because Ploy's agent isn't a demo—it builds real marketing websites in production. Claude Opus 4.8 held that slot for four months with no challenger. GPT-5.6 Sol got tested on launch day, and the numbers are stark: mean build time dropped from 8 minutes to 3m 42s, cost from $3.06 to $2.22, visual score from 0.936 to 0.970. The migration wasn't plug-and-play. Their eval harness was tuned to Opus's sequential tool-calling style; GPT-5.6's parallel calls blew through budgets on cases it was solving correctly. Roughly a third of initial failures were harness assumptions, not model errors. GPT-5.6 also writes leaner code—one case went from 17,957 CSS characters and 174 variables to 2,508 characters and 45 variables. The tradeoff: design output trends uniform, and brand adherence needed extra steering. The post flags tool schemas, caching, and reasoning replay as additional migration steps but doesn't detail the fixes. I'd treat this as a practical field manual for production model swaps, which is more useful than any benchmark leaderboard.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

82

SCORE

H1·K1·R1

18:31

16d ago

FEATURED · Hacker News Frontpage · rss · EN · 18:31 · 07·12

→I love LLMs, I hate hype

George Hotz is excited about GPT-5.6, GLM-5.2, and coding agents, but calls out two things he hates: negative-valence hype about closing windows and perpetual underclasses, and the strawman jump from 'fancy autocomplete' to 'owning the whole light cone.' He argues AI progress is mostly Moore's law and commoditization, not frontier-lab magic, and that anti-open-source arguments are really about fear of commodification. He also walks back his earlier dismissal of models for programming—he's getting better at using them—but warns they can increase cognitive fatigue and that vibe-coded stuff is still slop.

#Code #George Hotz #OpenAI #Tesla

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Hotz walks back his 'models can't program' take, but warns vibe-coded stuff is still slop and models increase cognitive fatigue.

sharp

This is worth reading because Hotz is publicly correcting his own earlier take. In May he said models couldn't program; now he admits he's getting better at using them—GLM-5.2 running opencode locally installed his tmux config in one shot. But he's not praising vibe coding. His exact words: all vibe-coded stuff is still slop, and he links to a piece on AI-driven cognitive fatigue. The real target here is two groups: people pushing negative-valence hype about closing windows and perpetual underclasses, and people making the strawman jump from 'fancy autocomplete' to 'owning the whole light cone.' His core argument hasn't changed: AI progress is mostly Moore's law and commoditization, not frontier-lab magic, and anti-open-source arguments are really about fear of commodification. This isn't technical analysis—it's a veteran hacker's position statement. He's genuinely excited about GPT-5.6, GLM-5.2, and coding agents, but his skepticism about lab valuations and narrative bubbles is fully intact.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

18:28

16d ago

AI HOT (Curated Pool) · aihot-api · ZH · 18:28 · 07·12

→Juggler: Open-source GUI coding agent from JUCE creator

Juggler is an open-source GUI coding agent built by the creator of JUCE. It lets AI directly manipulate GUI code, useful for desktop app prototyping. The repo just launched on GitHub with 118 stars; the post doesn't spell out supported frameworks or model backends.

#Code#juggler-ai#JUCE#Open source

editor take

JUCE's creator open-sourced a GUI coding agent. 118 stars on GitHub, but no word on supported UI frameworks or model backends.

HKR breakdown

hook —knowledge —resonance —

→ open source

55

SCORE

H0·K0·R0

18:26

16d ago

AI HOT (Curated Pool) · aihot-api · ZH · 18:26 · 07·12

→Codex drops 5-hour cap, GPT 5.6 Sol gets token efficiency boost

Codex and ChatGPT Work temporarily remove the 5-hour usage cap for Plus, Business, and Pro plans. GPT 5.6 Sol gets efficiency improvements that should use fewer tokens, though the post doesn't quantify the savings. Active users hit 6 million, with a usage reset coming in the next hour.

#Code#OpenAI#Codex#GPT 5.6 Sol

editor take

Codex and ChatGPT Work drop the 5-hour cap for now; GPT 5.6 Sol claims token savings but no numbers yet, so I'd discount that.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

72

SCORE

H1·K0·R1

18:25

16d ago

Hacker News Frontpage · rss · EN · 18:25 · 07·12

→Claude Code sends 33k tokens before reading the prompt; OpenCode sends 7k

Systima ran Claude Code and OpenCode on the same model, same machine, and same tasks, capturing every request at the API boundary. In the simplest case Claude Code consumed roughly 33,000 tokens of system prompt, tool schemas, and injected scaffolding before the user prompt arrived; OpenCode used about 7,000. Switching to Claude Fable 5 narrowed the gap to roughly 3.3× because Claude Code sends a shorter system prompt to newer models. The cache-efficiency gap is larger: OpenCode's request prefix was byte-identical every run, so it paid the cache-write cost once and read the cache back cheaply afterward; Claude Code rewrote tens of thousands of cache tokens mid-session, writing up to 54× more cache tokens on the same task. In a production setup a 72 KB AGENTS.md file adds another ~20,000 tokens and five MCP servers add 5,000–7,000, pushing the first request to 75,000–85,000 tokens before the user types a word. Subagents inflate cost further: a small task that cost 121,000 tokens when run directly cost 513,000 when fanned out to two subagents. The one win for Claude Code: on multi-step tasks it batches tool calls into fewer requests, so the whole-task total can come out lower than OpenCode's. The post does not disclose the specific repo or MCP server names used.

#Anthropic#Claude Code#OpenCode

editor take

Systima measured API-level overhead: Claude Code burns ~33k tokens of system scaffolding before the prompt arrives, OpenCode ~7k. The cache gap is worse—Claude Code rewrites cache tokens mid-session.
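The cache point is easier to see with a toy check. Prompt caching keys on the request prefix, so anything volatile injected ahead of the user prompt invalidates what follows. The sketch below is not Systima's harness; the request shape and key names (`system`, `tools`, `user`) are assumptions made up for illustration.

```python
import hashlib
import json


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes shared by a and b."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i
    return n


def prefix_bytes(request: dict, prefix_keys: list[str]) -> bytes:
    """Serialize only the parts sent before the user prompt, deterministically."""
    prefix = {k: request[k] for k in prefix_keys}
    return json.dumps(prefix, sort_keys=True, separators=(",", ":")).encode("utf-8")


def prefix_digest(request: dict, prefix_keys: list[str]) -> str:
    """Hash of the prefix; equal digests mean a byte-identical prefix."""
    return hashlib.sha256(prefix_bytes(request, prefix_keys)).hexdigest()


# Hypothetical captured requests (key names are assumptions, not a real API schema).
run_a = {"system": "You are a coding agent.", "tools": ["read", "edit"], "user": "fix the bug"}
run_b = {"system": "You are a coding agent.", "tools": ["read", "edit"], "user": "add a test"}
# Same task, but the scaffolding injected a timestamp into the system prompt.
run_c = {"system": "You are a coding agent. Now: 18:25", "tools": ["read", "edit"], "user": "fix the bug"}

keys = ["system", "tools"]
stable = prefix_digest(run_a, keys) == prefix_digest(run_b, keys)
drifted = prefix_digest(run_a, keys) != prefix_digest(run_c, keys)
shared = common_prefix_len(prefix_bytes(run_a, keys), prefix_bytes(run_c, keys))

print("prefix stable across prompts:", stable)
print("prefix drifted with injected timestamp:", drifted)
print("bytes still shared before the drift:", shared)
```

Only the bytes before the first difference can be served from cache; everything after has to be written again, which is the kind of mid-session rewrite the post describes.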

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

72

SCORE

H1·K1·R0

17:59

16d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 17:59 · 07·12

→Codex and ChatGPT Work drop the 5-hour cap, roll out GPT 5.6 Sol efficiency gains

Three updates landed in 48 hours: the 5-hour usage cap is temporarily removed for Plus, Business, and Pro plans; GPT 5.6 Sol is getting efficiency improvements that reduce per-request usage, with numbers promised after quantification; and active users hit 6 million, with a usage reset rolling out within the hour. The post doesn’t say how long “temporarily” lasts or give a range for the efficiency gain, so I’d hold off on pricing that in.

#Codex#ChatGPT Work#GPT 5.6 Sol#Product update

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Codex and ChatGPT Work temporarily drop the 5-hour cap, and GPT 5.6 Sol gets efficiency tweaks — but neither change comes with numbers yet.

sharp

Three updates landed in 48 hours. The headline: the 5-hour usage cap is temporarily removed for Plus, Business, and Pro plans. GPT 5.6 Sol is also getting efficiency improvements that reduce per-request usage, but the post says numbers will come after quantification. Active users hit 6 million, with a usage reset rolling out within the hour. The post doesn't say how long "temporarily" lasts or give a range for the efficiency gain, so I'd hold off on pricing that in.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

17:47

16d ago

Hacker News Frontpage · rss · EN · 17:47 · 07·12

→Against Usefulness: Why the next paradigm needs stubbornly useless research

The author tried Folk Computer, a physical programming system where paper runs code and the room is the display. She argues the entire field is converging on agents, and the next paradigm will come from the strange questions the curve forgot. She traces a lineage from Xerox PARC to CDG, noting that useful companies stand on rails built by stubbornly useless research. The post does not disclose technical specs or a commercialization timeline for Folk Computer.

#Folk Computer#Omar Rizwan#Andrés Cuervo

editor take

Folk Computer turns paper into running code and the room into a display—the author calls it the most human programming she's ever done.

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

68

SCORE

H1·K1·R0

17:13

16d ago

FEATURED · Hacker News Frontpage · rss · EN · 17:13 · 07·12

→Ploy migrated its production AI agent from Claude Opus 4.8 to GPT-5.6: 2.2x faster, 27% cheaper

Ploy's agent builds real marketing sites. For four months, no model beat Claude Opus. GPT-5.6 Sol is the first. Migration cut build time from 8 min to 3 min 42 sec, cost from $3.06 to $2.22, with a slightly higher visual score. The switch wasn't plug-and-play: eval harness, tool schemas, caching, and reasoning replay all needed rework because the stack had quietly specialized around Opus. The post doesn't disclose GPT-5.6's API pricing or context window.

#Agent#Code#Benchmarking#Ploy

why featured

Featured · importance 82 · hook + knowledge + resonance

editor take

Ploy swapped its production agent from Claude Opus to GPT-5.6 Sol: 2.2x faster builds, 27% cheaper, but the migration exposed how deeply their stack had specialized around Opus.

sharp

This is worth reading because it's not a benchmark post—it's a production brain-transplant log. Ploy's agent builds real marketing sites, and for four months Opus was unbeatable. GPT-5.6 Sol is the first model to beat it on speed, cost, and visual quality simultaneously: builds dropped from 8 minutes to 3:42, cost from $3.06 to $2.22. The useful bit isn't the numbers—it's the migration. Their eval harness, tool-call budgets, and caching were all tuned to Opus's sequential calling style. GPT-5.6 fans out parallel calls and blew through those budgets. Roughly a third of initial failures were harness assumptions, not model errors. I'd discount this a bit: the post doesn't disclose GPT-5.6's API pricing or context window, so the 27% cost drop is relative to Opus pricing—if the new model costs more, that gap shrinks. Sample sizes are tiny too, just 10-11 builds. But the real value here is the concrete reminder that swapping models in production is never plug-and-play.
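The headline ratios can be checked from the figures quoted in the post. A quick sketch; the function names are mine, not Ploy's.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """How many times faster the new run is."""
    return before_seconds / after_seconds


def percent_drop(before: float, after: float) -> float:
    """Relative reduction, as a percentage of the starting value."""
    return (before - after) / before * 100


build_before = 8 * 60        # 8 min, in seconds
build_after = 3 * 60 + 42    # 3 min 42 sec, in seconds

print(f"speedup: {speedup(build_before, build_after):.2f}x")
print(f"cost drop: {percent_drop(3.06, 2.22):.1f}%")
```

Both land where the title says after rounding: a bit under 2.2x on build time and a bit over 27% on cost per build.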

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

82

SCORE

H1·K1·R1

16:27

16d ago

FEATURED · The Verge · AI · rss · EN · 16:27 · 07·12

→Apple's canceled car project left behind a powerful AI chip: the M7 Ultra

Apple is speeding up work on the M7 Ultra, a chip that supports up to 1.5TB of RAM and traces its roots to the canceled self-driving car project. Originally built for on-vehicle AI, it's now likely headed to the Mac Pro for local LLM inference. The post doesn't disclose a launch date or price, but 1.5TB is enough to hold full weights of nearly any open-source model without offloading.

#Apple#M7 Ultra

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Apple repurposed its canceled car project's AI chip into the M7 Ultra for Mac Pro, supporting up to 1.5TB RAM to run full open-source model weights locally.

sharp

The 1.5TB number is what makes this worth a click. Current local inference setups top out around 192GB on a Mac Studio, which still forces you to quantize models like DeepSeek-V3. 1.5TB means full weights, no offloading, no external retrieval tricks. The chip's lineage is the interesting part: Apple poured billions into its self-driving car program, killed it, and now the silicon designed for on-vehicle AI is landing in the Mac Pro for local LLM work. That's the most useful thing to come out of that project. The post doesn't give a launch date or price. Knowing Apple's pricing, a 1.5TB config won't be cheap — likely more than a fully specced Mac Pro today. I'd wait for real-world inference speed and power numbers before getting excited.
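A back-of-envelope way to see why the memory ceiling matters: weight size is roughly parameter count times bytes per parameter. The sketch assumes DeepSeek-V3's publicly stated 671B total parameters (not a figure from this post), treats 1.5TB as 1,500 GB, and ignores KV cache, activations, and runtime overhead, so real requirements are higher.

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1e9 params * bytes / 1e9 bytes per GB)."""
    return params_billion * bytes_per_param


def fits(footprint_gb: float, memory_gb: float) -> bool:
    """True if the weights alone fit in the given memory."""
    return footprint_gb <= memory_gb


PARAMS_B = 671  # assumption: DeepSeek-V3 total parameters, in billions

for label, bytes_per_param in [("BF16", 2), ("FP8", 1), ("4-bit", 0.5)]:
    gb = weights_gb(PARAMS_B, bytes_per_param)
    print(f"{label:>5}: {gb:7.1f} GB | fits 192 GB: {fits(gb, 192)} | fits 1500 GB: {fits(gb, 1500)}")
```

Under these assumptions even a 4-bit copy overshoots 192GB, while the 16-bit weights squeeze under 1.5TB with limited headroom for context.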

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

16:24

16d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 16:24 · 07·12

→Tencent Hunyuan releases Hy3: a 295B MoE model positioned as an agent-oriented LLM, already integrated into WeChat

Tencent Hunyuan's Hy3 is a 295B-total, 21B-active MoE model whose inference efficiency matches flagship models 2–5× its size. Positioned as an agent-oriented LLM, it was refined from preview to release using feedback from 50+ real business cases: internal WorkBuddy task success rose from 72% to 90% and latency dropped 34%. It excels at coding, office tasks, and complex planning; pure vision is a weak spot. Hy3 is already integrated into WeChat, serving over 1 billion users.

#Agent#Code#Reasoning#Tencent Hunyuan

why featured

Featured · importance 86 · hook + knowledge + resonance

editor take

Tencent Hunyuan Hy3 is a 295B-total, 21B-active MoE model already serving 1B+ WeChat users, with agent task success up from 72% to 90%.

sharp

The wild part here is deployment scale: Hy3 is already inside WeChat, serving over a billion users. It's a 295B-total, 21B-active MoE model, and Tencent claims inference efficiency matches flagship models 2–5× its size. That architecture choice makes sense if you're trying to run something this big inside a super-app without burning cash. They're positioning it as an agent-oriented LLM, refined from preview to release using feedback from 50+ real business cases. Internal WorkBuddy task success went from 72% to 90%, latency dropped 34%, and hallucinations plus common-sense errors kept declining. Coding, office tasks, and complex planning are the strong suits; pure vision is a weak spot. I only have the title and summary right now—no benchmark scores, pricing, or API details. If the numbers hold, the real story isn't model quality. It's that Tencent shoved a large MoE model into its own super-app workflow and it apparently runs. That's more interesting than any leaderboard position.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

86

SCORE

H1·K1·R1

15:17

16d ago

Hacker News Frontpage · rss · EN · 15:17 · 07·12

→Don't You Mean Extinct?

Fabien Sanglard draws a parallel between Jurassic Park's stop-motion artist Phil Tippett being replaced by CGI and today's programmers fearing LLMs. His advice: evolve. Learn how LLMs work (Karpathy's videos, Raschka's book), and use them to write code without abandoning quality. He iterates on PRs and maintains GEMINI.md / CLAUDE.md files to teach agents his style. He warns that context switching causes mental burnout, and argues that code review standards should rise, with commit messages following strict rules.

#Fabien Sanglard#Phil Tippett#John Carmack

editor take

Fabien Sanglard compares Jurassic Park's stop-motion artist being replaced by CGI to programmers fearing LLMs—his advice: learn how they work, don't resist.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

55

SCORE

H1·K0·R1

15:09

16d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 15:09 · 07·12

→Satya Nadella warns of the 'inverse information paradox' when enterprises use AI

Microsoft CEO Satya Nadella argues that enterprises pay for AI while leaking proprietary know-how—prompts, tool use, correction feedback—as 'intellectual exhaust' that models absorb. He calls for a trust boundary where private evaluations, organizational memory, and the right to fine-tune own models with model outputs stay under enterprise control. The post does not disclose a timeline or product specifics.

#Satya Nadella#Microsoft

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Nadella calls enterprise prompts and corrections 'intellectual exhaust' and argues for a trust boundary—no product timeline disclosed.

sharp

The reason to click: Nadella puts a sharp label on something enterprises already feel. You pay for AI, but your prompts, tool-use patterns, and corrections become training signal for the model provider—he calls it 'intellectual exhaust.' His fix is a trust boundary where private evals, organizational memory, and the right to fine-tune your own models with model outputs stay on your side. The framing isn't new, but it lands differently coming from Microsoft's CEO, since Microsoft sits on both sides of the table. The post is a tweet with no product specifics or timeline, so I'd read it as Nadella setting the narrative on enterprise AI trust rather than a near-term ship date.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

14:32

16d ago

Hacker News Frontpage · rss · EN · 14:32 · 07·12

→Autoresearch with Claude: 10 autonomous iterations on file compression, from 0 to 0.67 ratio

Elliot Smith ran an autoresearch experiment using Claude Code with Sonnet 4.6: an agent autonomously iterated on a Rust file compression project, aiming to reduce total compressed bytes to under 67% of original size while keeping bit-perfect round-trips and a 300-second per-file timeout. Over 10 iterations with zero human plan edits, the agent read benchmark results, modified code, and ran tests. The compression ratio dropped from 1.0 (no compression) to 0.67. The post doesn't disclose final comparisons against gzip/bzip2 or detail which compression strategies the agent used. I'd treat this as a proof-of-concept for letting a model iteratively optimize a non-ML metric, not a claim that it beats existing libraries.

#Agent#Code#Benchmarking#Elliot Smith

editor take

Claude Code iterated on a Rust compressor for 10 rounds with zero human edits, hitting 0.67 ratio—but the post doesn't compare against gzip or bzip2.
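For reference, the metric the agent was optimizing is simple to state: total compressed bytes over total original bytes, with every file required to round-trip bit-perfectly. The sketch below uses Python's `zlib` as a stand-in compressor and a made-up two-file corpus; the actual project is in Rust, and its algorithm and per-file timeout handling are not shown here.

```python
import zlib


def total_ratio(files: dict[str, bytes]) -> float:
    """Total compressed bytes / total original bytes, enforcing exact round-trips."""
    original = 0
    compressed = 0
    for name, data in files.items():
        packed = zlib.compress(data, 9)
        if zlib.decompress(packed) != data:
            raise ValueError(f"round-trip mismatch for {name}")
        original += len(data)
        compressed += len(packed)
    return compressed / original


# Hypothetical corpus, highly repetitive on purpose so the stand-in compresses well.
corpus = {
    "a.txt": b"hello world " * 200,
    "b.bin": bytes(range(256)) * 4,
}

ratio = total_ratio(corpus)
print(f"ratio: {ratio:.3f} | meets 0.67 target: {ratio <= 0.67}")
```

A ratio of 1.0 means no compression at all, which is where the experiment started; lower is better.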

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

72

SCORE

H1·K1·R0

14:16

16d ago

FEATURED · Hacker News Frontpage · rss · EN · 14:16 · 07·12

→Agent Harness Engineering: The Scaffolding Matters More Than the Model

Addy Osmani argues that a coding agent's effectiveness depends as much on its harness—prompts, tools, hooks, sandboxes, and feedback loops—as on the underlying model. Citing Viv Trivedy and others, he notes a decent model with a great harness beats a great model with a bad one. A key data point: Claude Opus 4.6 jumped from Top 30 to Top 5 on Terminal Bench 2.0 solely by switching to a custom harness. The core discipline is a ratchet: every agent mistake becomes a permanent rule or check in AGENTS.md or a pre-commit hook, so it never repeats.

#Addy Osmani#Viv Trivedy#Dex Horthy

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

A coding agent's effectiveness is half model, half the harness you build around it.

sharp

Addy Osmani nails something a lot of engineers have felt but couldn't name: the scaffolding around your model—prompts, tools, hooks, sandboxes, feedback loops—matters as much as the model itself. A decent model with a great harness beats a great model with a bad one. The clearest proof: Claude Opus 4.6 jumped from Top 30 to Top 5 on Terminal Bench 2.0 just by switching to a custom harness. The core discipline is a ratchet: every time the agent messes up, you add a permanent rule or check so it never repeats that mistake. That's way more practical than waiting for the next model version. I'd send this to anyone on my team still obsessing over which model to pick—the time spent on harness tuning probably has higher ROI.
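One way the ratchet could look in practice is a small checker that a pre-commit hook runs, where each rule records a past agent mistake. This is a minimal sketch, not something from Osmani's post; the rule names, patterns, and reasons are invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str   # regex applied to each line
    reason: str    # the past mistake that motivated the rule


# Hypothetical rules; in a real repo each would be added after an observed failure.
RULES = [
    Rule("no-debug-print", r"^\s*print\(", "agent left debug prints in a past change"),
    Rule("no-bare-except", r"^\s*except\s*:", "agent swallowed an error in a past change"),
]


def check(source: str, rules: list[Rule] = RULES) -> list[tuple[int, str]]:
    """Return (line number, rule name) for every violation found in source."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule in rules:
            if re.search(rule.pattern, line):
                violations.append((lineno, rule.name))
    return violations


sample = "try:\n    run()\nexcept:\n    print('oops')\n"
for lineno, name in check(sample):
    print(f"line {lineno}: {name}")
```

The list only ever grows, which is the point: a mistake the agent made once becomes a check it cannot get past again.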

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

13:34

16d ago

Hacker News Frontpage · rss · EN · 13:34 · 07·12

→Skillscript: A declarative, sandboxed language that turns fixed agent workflows into executable scripts

Scott built Skillscript, a small language for writing fixed agent workflows as readable, version-controlled scripts. He got tired of his NanoClaw agent re-reasoning the same morning routine every session—checking tickets, summarizing deploys—burning tokens and drifting. Skillscript defines named steps, variables, conditions, and tool calls in a text file, then hands it to a local model as a runtime to execute, not interpret. The language is deliberately minimal: no eval, no arbitrary imports, no subprocess, no unbounded loops. Everything it can do is in the file. It's pre-1.0 (0.30), MCP-native, self-hosted, and currently assumes Ollama for local models. Scott says first-run setup is clunky, the grammar is still shifting, and it's not production-ready, but he uses it daily. He's asking for feedback on language design and what trust mechanisms people would need before running agent-authored skills on their own machines.

#Scott (sshwarts)#Skillscript#NanoClaw

editor take

A tiny DSL that turns fixed agent workflows into readable scripts, with a local model as the runtime. Pre-1.0, author uses it daily but says it's not production-ready.
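To make the idea concrete, here is a rough Python analogue of a fixed workflow with named steps, variables, a condition, and allow-listed tool calls. It is not Skillscript's grammar or runtime (the post says a local model executes the script); the step format and tool names are hypothetical.

```python
from typing import Callable

# Allow-list of tools: anything not named here cannot be called.
TOOLS: dict[str, Callable[[dict], object]] = {
    "count_open_tickets": lambda ctx: 3,  # stand-in for a real ticket query
    "summarize": lambda ctx: f"{ctx['open']} open tickets",
}

# The whole workflow is data: a finite list of steps, no eval, no loops.
SKILL = [
    {"step": "fetch", "tool": "count_open_tickets", "save_as": "open"},
    {"step": "report", "tool": "summarize", "save_as": "summary", "when": "open"},
]


def run(skill: list[dict], tools: dict[str, Callable[[dict], object]]) -> dict:
    """Execute each step once, in order, storing results in a shared context."""
    ctx: dict = {}
    for step in skill:
        condition = step.get("when")
        if condition is not None and not ctx.get(condition):
            continue  # skip the step when its condition variable is falsy
        tool = tools.get(step["tool"])
        if tool is None:
            raise PermissionError(f"tool not allowed: {step['tool']}")
        ctx[step["save_as"]] = tool(ctx)
    return ctx


print(run(SKILL, TOOLS))
```

Because the script is plain data and every capability is in the allow-list, a reviewer can read the file and know the full set of things it can do, which is the trust property the author is asking about.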

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

72

SCORE

H1·K1·R0

13:26

16d ago

FEATURED · Hacker News Frontpage · rss · EN · 13:26 · 07·12

→AI boosts individual research careers but narrows the scope of scientific discovery

A new study of 68 million papers, patents, and products finds that AI tools raise individual productivity and citations, but shrink the overall scope of exploration across a field. Researchers cluster into directions where AI gives quick wins, reducing disciplinary diversity. The paper frames this as an exploration–exploitation trade-off: AI helps dig deeper on known paths, but fewer people chase the cold, potentially disruptive directions. The post doesn't offer specific policy fixes, just flags that the narrowing is a systemic incentive shift, not an individual skill gap.

#IEEE Spectrum

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

68M papers show AI lifts individual output and citations, but shrinks a field's overall exploration scope.

sharp

This one's worth opening because it puts hard data behind a gut feeling: AI helps scientists publish more and get cited more, but the trade-off is that entire fields start clustering around the same few directions. The study frames it as an exploration–exploitation problem—AI makes you faster at digging deeper on known paths, but fewer people go off to chase the weird, potentially disruptive stuff. I'd discount it a bit: the post doesn't break down how this varies by discipline or offer policy fixes, so it's more of a systemic warning than a playbook. If you're building AI-for-science tools, the question it leaves you with is whether your product is deepening the trench or widening the map.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

13:21

16d ago

FEATURED · Hacker News Frontpage · rss · EN · 13:21 · 07·12

→Only 1 of 4,356 reachable MCP servers passes the 2026-07-28 spec check

Roee-Tsur scanned public MCP servers and found the ecosystem is far from the new spec. Out of 8,100 registered endpoints, 4,356 were reachable. Only one passed the 2026-07-28 MCP specification compliance check. The post doesn't name the single passing server or detail which spec requirements caused the mass failures. Take the number with a grain of salt—the strictness of the check and sampling bias aren't spelled out—but the signal is clear: MCP server implementations are not keeping up with the protocol.

#Roee-Tsur#MCP

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Out of 4,356 reachable MCP servers, only one passed the 2026-07-28 spec compliance check.

sharp

The number is brutal: 8,100 registered endpoints, 4,356 reachable, one compliant. The author doesn't name the single passing server or detail which spec requirements caused the mass failures. I'd discount this a bit—we don't know how strict the check is or whether the sample skews toward abandoned endpoints. But the signal is loud: MCP server implementations are way behind the protocol. If you maintain an MCP server, you've got two weeks before the July 28 spec date, and this tool can at least tell you where you're falling short.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

12:00

16d ago

The Verge · AI · rss · EN · 12:00 · 07·12

→Community pushback against AI data centers is forcing companies to rethink expansion plans

This Verge column maps a growing wave of local opposition to AI data center projects across the US. In Mount Carmel Township, Pennsylvania, residents are posting yard signs against a planned facility; in Loudoun County, Virginia, already-approved projects face lawsuits. Noise, power consumption, and strain on local grids are the core friction points. Some tech companies are now reassessing site plans. The piece doesn't name specific firms or project sizes, but the trend is clear: these conflicts will spread as AI compute demand keeps ballooning.

#The Verge#Emma Roth#Mount Carmel Township#Policy

editor take

The Verge maps growing US community backlash against AI data centers over noise and power strain.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

62

SCORE

H1·K0·R1

11:09

16d ago

FEATURED · Hacker News Frontpage · rss · EN · 11:09 · 07·12

→Terence Tao revives 1999 Java math applets and builds two new tools using AI coding agents

Terence Tao used an AI coding agent to port 24 Java math applets from 1999 to JavaScript in a few hours. Only one minor drag-handling bug was found; the agent also caught two bugs in the original code. He then built two new apps: a Minkowski spacetime diagram tool he abandoned in 1999, and a visualization for his Gilbreath conjecture paper. Full conversation transcripts with the agent are public. Tao considers AI-generated code acceptable for non-critical visual aids.

#Terence Tao#Allen Knutson

why featured

Featured · importance 72 · hook + knowledge + resonance

editor take

Terry Tao ported 24 Java math applets from 1999 to JavaScript in hours with an AI coding agent, finding only one minor drag bug.

sharp

This is worth reading because Terence Tao himself validated AI coding agents for non-critical work. He handed 24 Java math applets from 1999—complex analysis, linear algebra, honeycombs—to an AI agent for porting to JavaScript. Done in hours. Only one minor bug: a drag event misbehaving outside the main box. The agent also caught two bugs in the original code Tao never noticed. Then he did two new things: built a Minkowski spacetime diagram tool he'd abandoned in 1999 due to complexity, and created an interactive visualization for his just-published Gilbreath conjecture paper. Full conversation transcripts with the agent are public—you can see exactly how he guided the model step by step. Tao's framing is clear-eyed: these are secondary visual aids, not critical components of mathematical arguments, so the downside risk of AI-generated bugs is low. That's a far more grounded take than "AI will replace programmers." It looks more like a top mathematician offloading grunt work so he can focus on defining problems and verifying results.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

72

SCORE

H1·K1·R1

10:23

16d ago

FEATURED · Hacker News Frontpage · rss · EN · 10:23 · 07·12

→Big tech datacenters now emit a third of France's total carbon output

Microsoft, Amazon, and Google datacenters emitted 104 million tonnes of CO₂ in 2025—equal to 33% of France's national total. Microsoft accounted for nearly half, driven by AI infrastructure expansion. All three have walked back clean-energy pledges: Amazon and Google dropped 24/7 carbon-free targets, and Microsoft's carbon-offset contracts were found to overstate impact. These figures are self-reported, so real emissions are likely higher.

#Microsoft#Amazon#Google#Policy

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Microsoft, Amazon, and Google datacenters emitted 104M tonnes of CO₂ in 2025—33% of France's total—and these are self-reported numbers, so real emissions are likely higher.

sharp

The number is what makes this worth clicking: 104 million tonnes of CO₂, a third of France's entire national output. Microsoft alone accounts for nearly half, driven by its AI infrastructure buildout. The quieter story is that all three are walking back their clean-energy pledges—Amazon and Google dropped 24/7 carbon-free targets, and Microsoft's carbon offsets were found to overstate impact. These figures are self-reported, so treat them as a floor. The real number is almost certainly worse.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

09:15

16d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 09:15 · 07·12

→Altman now 'pretty sure' AI is net job-creating, Amodei also walks back job-killer claims

OpenAI CEO Sam Altman posted on X that he's 'pretty sure' AI has been net job-creating so far, a sharp pivot from his earlier 'potentially a little scary' warning. Anthropic CEO Dario Amodei also reframed automation as a productivity multiplier rather than a job killer. No studies yet show a significant AI impact on overall productivity or the labor market; the Yale Budget Lab found no AI-related job market shifts. The article notes some companies did cite AI for layoffs, but often as a shareholder-friendly excuse.

#OpenAI#Sam Altman#Anthropic

why featured

Featured · importance 72 · hook + resonance

editor take

Altman and Amodei both walked back AI job-killer claims, but the Yale Budget Lab still finds no AI footprint in labor data.

sharp

This one's worth opening because two people with the most inside visibility changed their tune at roughly the same time, and they didn't hedge much. Altman posted on X that he's "pretty sure" AI has been net job-creating so far, adding "this is not what I expected." Amodei reframed automation as a productivity multiplier, backing away from his earlier 20% unemployment prediction. The thing is, the studies cited in the article don't back the new optimism. The Yale Budget Lab found no AI-related labor market shifts. A multi-university study showed the job decline for programmers and copywriters started in early 2022, months before ChatGPT launched. The article also notes some companies did cite AI in layoff announcements, but often as a shareholder-friendly narrative. I'd discount this a bit. Right now it reads more like a positioning shift than a data-driven conclusion. If macro data keeps showing nothing, either the impact hasn't hit yet or it's being absorbed by other factors—neither equals "net job-creating" as a settled fact.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

72

SCORE

H1·K0·R1

05:51

16d ago

Hacker News Frontpage · rss · EN · 05:51 · 07·12

→Mindwalk: Replay AI coding-agent sessions on 3D codebase map

Mindwalk is an open-source visualization tool that replays coding-agent sessions on a 3D map of your codebase. You can step through each action the agent took—edits, navigation, decisions—like watching a replay. The post doesn't specify which agent frameworks or IDEs are supported; only the GitHub repo with 21 stars is available.

#cosmtrek#Open source

editor take

Mindwalk is an open-source tool that replays coding-agent sessions on a 3D map of your codebase, so you can see exactly what the AI changed and how. Only 21 stars on GitHub so far and no real-world...

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

70

SCORE

H1·K1·R0

04:15

16d ago

Hacker News Frontpage · rss · EN · 04:15 · 07·12

→Why Write Code in 2026

Software engineer Doug Turnbull argues that even when AI generates most code, writing code by hand still matters. It lets you feel system fragility directly instead of observing through English descriptions. He uses a 'software factory' metaphor: engineers build the assembly line for AI, but occasionally must tear it down and fix the engine. Writing code helps maintain attention, build ownership, and prevent AI from amplifying bad human decisions. The post doesn't specify which models or tools, but references 'reverse centaur' and 'slop hurts agents' concepts.

#Doug Turnbull

editor take

Writing code by hand lets you feel system fragility directly, not through English descriptions.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

62

SCORE

H1·K1·R1

01:40

16d ago

AI HOT (Curated Pool) · aihot-api · ZH · 01:40 · 07·12

→Tibo shares how to swap Claude Code's backend to GPT-5.6 Sol via CLIProxyAPI in three steps

Tibo shares a method to swap Claude Code's backend model to GPT-5.6 Sol via CLIProxyAPI in three steps: install CLIProxyAPI, authenticate, and set an environment alias `claudex`. The alias configures sub-agent model, always-on Effort, max concurrent tool calls, etc. Theo adds that with a proxy already set up, it takes about two prompts. Tibo says the whole process takes ~5 minutes and can be reset if blocked. The post doesn't disclose GPT-5.6 Sol's specific capabilities or pricing.

#Claude Code#GPT-5.6 Sol#CLIProxyAPI

editor take

Swap Claude Code's backend to GPT-5.6 Sol in ~5 min via CLIProxyAPI. The post doesn't spell out GPT-5.6 Sol's capabilities or pricing.

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

55

SCORE

H1·K1·R0

01:09

16d ago

● P1 · Hacker News Frontpage · rss · EN · 01:09 · 07·12

→xAI's Grok Build CLI uploads entire codebase and environment secrets by default

A packet capture of Grok Build CLI (v0.2.93) shows it uploads the entire project repo to xAI's GCS bucket by default, including plaintext secrets in .env and full git history. Even with a prompt telling the model to reply 'OK' and read no files, the whole repo is still uploaded. On a 12 GB test repo, the storage upload hit 5.10 GiB—roughly 27,800× the model-turn channel data. Disabling 'Improve the model' does not stop the upload.

#Code#Agent#xAI#Grok

why featured

Featured · importance 100 · hook + knowledge + resonance

editor take

Wire capture confirms: Grok Build CLI uploads your entire repo, git history, and .env secrets to xAI's GCS bucket by default, and the opt-out toggle doesn't stop it.

sharp

This blew up on Reddit and HN simultaneously because it's not a rumor—someone actually captured the traffic. The analyst used a test repo with fake canary secrets, told Grok to just reply 'OK' and not read any files, and the CLI still bundled the entire repo and uploaded it, .env secrets and full git history included. The wild part: after toggling off 'Improve the model' in settings, the wire capture still showed trace_upload_enabled: true. All three sources point to the same gist, so the facts aren't in dispute. The one thing I'd caution: this proves transmission and storage, not that xAI trains on it—that's a policy question the packet capture can't answer. If you're using Grok Build right now, the immediate takeaway is don't keep unencrypted secrets in your project directory.
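The canary technique mentioned above is easy to reproduce in principle: plant unique fake secrets, then search whatever left the machine for them. A minimal sketch, assuming you already have the decrypted and unpacked upload body as bytes (a real capture is TLS-encrypted and may be archived or compressed first); the names and payload here are hypothetical, not taken from the gist.

```python
import secrets


def make_canary(label: str) -> str:
    """Create a unique, obviously fake secret that cannot appear by accident."""
    return f"CANARY_{label}_{secrets.token_hex(8)}"


def find_canaries(payload: bytes, canaries: dict[str, str]) -> list[str]:
    """Return the names of planted canaries whose values appear in the payload."""
    return sorted(name for name, value in canaries.items() if value.encode() in payload)


# Plant one canary per location you want to test.
canaries = {
    ".env": make_canary("ENV"),
    "git-history": make_canary("GIT"),
}

# Hypothetical captured upload body that happens to contain the .env canary.
captured = b"POST body start ... API_KEY=" + canaries[".env"].encode() + b" ... end"

print("leaked:", find_canaries(captured, canaries))
```

A hit shows the value was transmitted, nothing more; as noted above, what happens to it afterward is a policy question the capture cannot answer.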

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

100

SCORE

H1·K1·R1

00:00

16d ago

FEATURED · Computing Life · Share (鸭哥 research reports) · rss · ZH · 00:00 · 07·12

→Stronger Models, More Bloated Code: The Structural Blind Spot of AI Code Bloat

An arXiv paper found that stronger AI models produce more bloated code, with a 0.94 correlation between code volume and architectural flaws. GitClear's 2026 report shows duplicate code blocks grew 81% since 2023, while refactoring dropped from 21% to under 4%. A company called Slopfix charges $10,000/week to delete AI-generated bloat—one case went from 100K to 35K lines. The post argues manual cleanup services are transitional; SaaS tools like CodeRabbit and platforms like Cursor and Microsoft Copilot are absorbing that demand.

#Code#Slopfix#CodeRabbit#Greptile

why featured

Featured · importance 82 · hook + knowledge + resonance

editor take

Stronger models produce more bloated code (0.94 correlation between code volume and architectural flaws); Slopfix charges $10K/week to delete AI bloat, but manual cleanup is a transitional business.

sharp

This piece connects two signals. The first is an arXiv paper finding that stronger models write more bloated code, with a 0.94 correlation between code volume and architectural flaws. The second is Slopfix: three engineers charging $10K/week to delete AI-generated code, with one case going from 100K to 35K lines by using Claude Code to remove what Claude Code wrote. The paper calls this the Volume-Quality Inverse Law: smarter models try harder to cover edge cases and end up stuffing in more redundant code. GitClear's 2026 data backs this up: duplicate code blocks are up 81% since 2023, and refactoring is down from 21% to under 4%. I'd discount Slopfix as a long-term business, though. The article itself maps out the absorption chain clearly: mid-layer SaaS tools (CodeRabbit's ARR jumped from ~$5M to $40M in a year) and top-layer platforms (Cursor acquired Graphite, Microsoft baked Copilot Reviews into subscriptions) are eating the manual cleanup layer. It is the same pattern as Kite dying after GitHub Copilot launched, or Jasper's ARR collapsing from $120M to $55M. The governance need is real and lasting, but standalone manual cleanup services are transitional. What's worth watching: SaaS players like CodeRabbit and Greptile with proprietary data and workflow integration, and how platforms like Cursor and Microsoft Copilot make governance a built-in feature.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

82

SCORE

H1·K1·R1

00:00

16d ago

FEATUREDComputing Life · Share (鸭哥 research reports)· rssZH00:00 · 07·12

→Codex Merges into ChatGPT: Why Agents Are Going Cross-Interface

OpenAI merged Codex into ChatGPT and dropped its standalone desktop app. The same Codex tech that wrote code now handles docs, spreadsheets, and web pages, with over 5 million weekly active users—1 million of them for non-dev work. Anthropic and Cursor are making similar moves: putting different execution modes into one app so agents run in the background while users start tasks from the web, check progress on a phone, and approve results. Two axes drive this: agents are getting better at completing tasks on their own, and systems are compressing long execution runs into quick-to-review summaries, diffs, and anomalies. Coding got there first because software engineering already had mature verification tools like tests, diffs, and PRs. Knowledge work lacked that compression layer until recently, which is why reviewing a 30-slide deck on a phone in two minutes is now becoming feasible. Always-on doesn't require the cloud—a local Mac mini paired with a phone client can achieve the same pattern, with different trade-offs in responsibility and data boundaries. IDEs and desktop apps aren't disappearing; they're becoming specialized execution views beneath a cross-device, always-available delegation service.

#OpenAI#Codex#ChatGPT

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Codex merging into ChatGPT isn't a product cut—it's decoupling execution from the desktop IDE so phones can approve long-running tasks.

sharp

This piece connects several recent product moves into one clear trend: execution environments and human interfaces are decoupling. OpenAI folded Codex into ChatGPT, Anthropic put Chat/Cowork/Code in one app, and Cursor 3.0 split out a task manager—all three are letting agents run in the background while users start tasks from the web, check progress on a phone, and approve with a tap. The article's framework hangs on two axes: can the agent finish the task on its own, and can the user verify the result cheaply. Coding got there first because software engineering already had mature verification tools—tests, diffs, PRs. Knowledge work (reports, decks) lacked that compression layer. You can't judge a 30-slide auto-generated deck in two minutes. The July 2026 shift is that systems are now adding source tracking, change highlighting, and outline pre-approval, shrinking review from "read 30 slides" to "scan a few anomaly markers." I'd discount this a bit: the article is AI-generated and doesn't cite independent verification for OpenAI's claimed 5M weekly active users or 1M non-dev users. But the framework itself is useful—it explains why IDEs aren't dying, just becoming specialized execution views beneath a cross-device delegation layer. Also worth noting: always-on doesn't require the cloud. A Mac mini paired with a phone client can achieve the same pattern, which matters for people who care about data boundaries.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

00:00

16d ago

FEATUREDComputing Life · Share (鸭哥 research reports)· rssZH00:00 · 07·12

→Cloudflare bakes agent format negotiation into new site deployment

On July 8, Cloudflare launched Drop, a drag-and-drop deploy tool that bakes Markdown for Agents into the new site claim flow. A site now defaults to treating AI agents as first-class readers alongside browsers. Combined with the x402 billing layer and Access identity layer, the edge node answers four questions at once: who you are, can you enter, do you pay, and what format to return. A test page dropped from 12,345 tokens to 725, cutting LLM token cost by 94%. The post also notes that free-tier users are locked out, mainstream agent frameworks don't yet send Accept: text/markdown by default, and conversion quality on dynamic SPAs remains unproven.

#Cloudflare#VoidZero#Vercel

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Cloudflare baked Markdown for Agents into its new site deploy flow, cutting LLM token cost by 94%, but free-tier users are locked out.

sharp

The reason this is worth a look: Cloudflare isn't just shipping a format converter. On July 8, they launched Drop, a drag-and-drop deploy tool that looks like a Vercel Drop clone, but the real move is in the site claim flow—there's now a one-click toggle for Markdown for Agents. From day one, a new site defaults to treating AI agents as first-class readers alongside browsers. Technically, the edge node converts HTML to clean Markdown in real time. One test page dropped from 12,345 tokens to 725, a 94% reduction in LLM cost. Combined with the x402 billing layer and Access identity layer, the edge answers four questions at once: who you are, can you enter, do you pay, and what format to return. No competitor currently offers this full stack—Vercel has deploy but no edge billing, AWS Bedrock has orchestration but no ultra-low-latency sandbox. I'd discount this on two fronts. One, free-tier users are locked out, so a huge chunk of personal sites and small projects can't use it. Two, mainstream agent frameworks don't yet send Accept: text/markdown by default, and conversion quality on dynamic SPAs is unproven. This feels like Cloudflare laying the tracks—whether trains actually run depends on the agent ecosystem catching up.
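For a sense of what the agent side would need to do, here is a minimal sketch of a request that asks for Markdown first and falls back to HTML, plus the token arithmetic behind the 94% figure. The function names, the `q=0.8` fallback weight, and the example URL are assumptions for illustration; the post only says that frameworks don't yet send `Accept: text/markdown` by default.

```python
from urllib.request import Request


def markdown_request(url):
    """Build (but do not send) a GET request that prefers Markdown.

    The q=0.8 HTML fallback is an assumed choice, not something the post
    specifies.
    """
    return Request(url, headers={"Accept": "text/markdown, text/html;q=0.8"})


def token_reduction(before, after):
    """Fraction of tokens saved when a page shrinks from before to after."""
    return 1 - after / before


req = markdown_request("https://example.com/")  # placeholder URL
saved = token_reduction(12_345, 725)            # figures quoted in the post
print(req.get_header("Accept"), f"{saved:.1%}")
```

The reduction works out to about 94.1%, which matches the rounded figure in the post. Whether a given site actually returns Markdown depends on the toggle being enabled on the origin's plan.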

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1

2026-07-11 · Sat

22:54

16d ago

FEATUREDHacker News Frontpage· rssEN22:54 · 07·11

→Fixed three bugs that made Qwen3.5-122B a daily driver on Mac Studio

Running Qwen 3.5 122B on an M3 Ultra Mac Studio for long-context coding, the author found every follow-up took minutes before the first token. The model wasn't the problem—three cache bugs in the serving stack were: a per-turn message ID in the system prompt, a missing assistant reply in the conversation history, and stale checkpoint data poisoning the disk cache. After fixes, follow-up latency on a 130k-token context dropped from minutes to 11 seconds. The code is open-sourced as qMLX.

#Qwen 3.5 122B#M3 Ultra Mac Studio#qMLX

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Three cache bugs made a 122B model take minutes per turn; fixes dropped 130k-context latency to 11 seconds.

sharp

This post is worth opening because it names a specific, common pain point in local inference: the model is fine, but the serving stack's cache logic silently kills speed. The author ran Qwen 3.5 122B on an M3 Ultra Mac Studio for long-context coding, and every follow-up took minutes before the first token. Three bugs: a per-turn message ID in the system prompt broke prompt caching; a missing assistant reply in the history messed up context assembly; and stale checkpoint data poisoned the disk cache. After fixes, follow-up latency on a 130k-token context dropped to 11 seconds. The code is open-sourced as qMLX. I'd treat this as a local-inference debugging log, not a universal speedup recipe—it's tied to MLX and Apple Silicon—but the troubleshooting pattern is worth bookmarking.
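The first two bugs are easy to see in a toy model of prefix caching, where a server can only reuse work for the longest token prefix shared between the previous turn and the new request. This is a sketch under stated assumptions, not qMLX's code: whitespace "tokens", the `msg_id=` placement at the start of the system prompt, and every name below are hypothetical.

```python
def render(system, turns):
    """Flatten a system prompt plus (role, text) turns into toy tokens."""
    tokens = system.split()
    for role, text in turns:
        tokens += [f"<{role}>"] + text.split()
    return tokens


def reusable_prefix(prev, new):
    """Count leading tokens shared by the cached turn and the new request."""
    n = 0
    for a, b in zip(prev, new):
        if a != b:
            break
        n += 1
    return n


STABLE = "You are a coding assistant."
first = [("user", "explain the parser")]
reply = ("assistant", "it is a recursive descent parser")
follow_up = ("user", "now add tests")

# What the server has cached after turn one: prompt plus generated reply.
cached = render(STABLE, first + [reply])

# Healthy follow-up: same system prompt, full history.
good = reusable_prefix(cached, render(STABLE, first + [reply, follow_up]))

# Bug 1: a per-turn ID in the system prompt changes the very first token.
per_turn_id = reusable_prefix(
    render("msg_id=1 " + STABLE, first + [reply]),
    render("msg_id=2 " + STABLE, first + [reply, follow_up]),
)

# Bug 2: the assistant reply is missing from the resent history.
missing_reply = reusable_prefix(cached, render(STABLE, first + [follow_up]))

print(good, per_turn_id, missing_reply)
```

In this toy setup the healthy request reuses all 16 cached tokens, the per-turn ID reuses none, and the dropped reply reuses only the 9 tokens before the divergence. On a 130k-token context, the same effect is the difference between a near-instant follow-up and reprocessing the whole prompt.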

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

78

SCORE

H1·K1·R1
