posts · 2026-05-23

▸ 50 items · updated 3m ago

2026-05-23 · Sat

23:39

65d ago

Hacker News Frontpage · rss · EN · 23:39 · 05·23

→ICE Awards $25M Iris-Scanning Contract to Bi2 Technologies

The title states that ICE awarded Bi2 Technologies a $25 million iris-scanning contract; the post does not disclose procurement scope, deployment sites, performance metrics, or contract timeline.

#Vision#ICE#Bi2 Technologies#Policy

editor take

ICE gave Bi2 a $25.1M no-bid award; 1,570 iris devices land by June, with no FedRAMP or outside audit.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

23:00

65d ago

r/LocalLLaMA · rss · EN · 23:00 · 05·23

→Local Model Doing Accounting Tasks

A Reddit user uses Qwen 3.6 27B for monthly closes, bank reconciliations, payables, receivables, and managing a SQLite database. The user integrated Claude skills and Anthropic’s financial-services repo; the post does not disclose accuracy, workload size, or exact hardware configuration.

#Agent#Tools#Code#Qwen

editor take

Qwen 3.6 27B handles closes and bank recs; no accuracy disclosed, so treat it as an early local finance-agent specimen.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

22:48

65d ago

FEATURED · r/LocalLLaMA · rss · EN · 22:48 · 05·23

→llama.cpp server has built-in native tools: exec_shell, edit_file, and more

llama.cpp server exposes an experimental --tools flag with 8 native tools, including file reads, grep search, shell execution, file edits, diffs, and datetime; the post says file operations are relative to the server launch directory and no command whitelist or strict sandbox is provided yet.

#Agent#Tools#Code#llama.cpp

why featured

Featured · importance 76 · hook + knowledge + resonance

editor take

llama.cpp adding 8 native tools is useful, but exec_shell without a whitelist or sandbox is a footgun near any real repo.

sharp

llama.cpp just made local agents much easier to boot, and the guardrails are behind the capability. The experimental `--tools` flag exposes 8 tools: `read_file`, `grep_search`, `exec_shell_command`, `write_file`, `edit_file`, `apply_diff`, and others. File operations run relative to the server launch directory, so a plain `.gguf` plus the llama.cpp binary now gets close to a tiny coding agent harness. The dangerous part is not tool calling; it is native shell and file mutation inside the inference server. The post says there is no command whitelist and no strict sandbox yet. Claude Code and OpenAI Codex at least force approvals, directory scoping, and visible diffs into the workflow. llama.cpp currently smells like an agent runtime welded onto a model server. Great for a throwaway repo; reckless near anything with secrets.
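
For a sense of what the missing guardrails would involve, here is a minimal sketch of a command allowlist plus directory scoping, as a wrapper might apply them before running a tool call. This is not llama.cpp code and says nothing about how its `--tools` flag is implemented: `ALLOWED_COMMANDS`, `is_command_allowed`, and `resolve_in_root` are hypothetical names, and the allowlist contents are arbitrary.

```python
import shlex
from pathlib import Path

# Hypothetical allowlist; a real deployment would pick its own set.
ALLOWED_COMMANDS = {"ls", "cat", "grep", "git"}


def is_command_allowed(command_line: str) -> bool:
    """Accept only one allowlisted program with no shell metacharacters."""
    # Reject chaining, pipes, redirection, and substitution outright.
    if any(ch in command_line for ch in ";|&><`$\n"):
        return False
    try:
        argv = shlex.split(command_line)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return bool(argv) and argv[0] in ALLOWED_COMMANDS


def resolve_in_root(root: Path, relative: str) -> Path:
    """Resolve a tool-supplied path and refuse anything outside root."""
    root = root.resolve()
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"{relative!r} escapes {root}")
    return candidate
```

An allowlist like this is only a first layer: an allowed program such as `git` can still do damage, which is why the approvals and sandboxing mentioned above matter.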

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

21:45

65d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 21:45 · 05·23

→StepAudio 2.5 Realtime Voice Released with Paralinguistic Awareness and Persona Interaction

StepFun released StepAudio 2.5 Realtime with Chinese and English real-time voice support, API-based custom personas, more than 10,000 native persona options, millions of composable traits, and 5 built-in preset personas.

#Audio#Agent#Alignment#StepFun

why featured

Featured · importance 72 · hook + knowledge + resonance

editor take

StepFun is selling voice personas before proving latency; 10,000 characters mean little without barge-in, pricing, or real-time evals.

sharp

StepAudio 2.5 Realtime is leaning into paralinguistic sensing and persona scale, which is the right battleground, but the claim stack is soft. The disclosed hooks are Chinese-English support, API-defined personas, 10,000-plus native personas, 5 presets, and RLHF for role consistency. Pricing, first-token audio latency, end-to-end latency, barge-in quality, concurrency limits, and eval protocol are not given. Voice models are no longer judged by “can it sound like someone.” OpenAI Realtime API and Gemini Live pushed the bar toward interruption handling, emotional tracking, and long-session stability. If StepFun’s 10,000 personas are just a catalog, developers get character inventory, not a dependable voice-agent substrate.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

21:30

65d ago

r/LocalLLaMA · rss · EN · 21:30 · 05·23

→Top 10 Fastest Growing AI Repos This Week

Sam_Tech1 listed the 10 fastest-growing AI repos this week, with codegraph adding 14.1K stars and openhuman adding 17.1K stars; the list centers on coding agents, personal AI, memory, browser automation, Claude Skills, and local-first development tooling.

#Agent#Code#Memory#Sam_Tech1

editor take

Reddit body is 403; only summary says openhuman gained 17.1K stars, so treat this as repo heat, not technical evidence.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

20:14

66d ago

r/LocalLLaMA · rss · EN · 20:14 · 05·23

→Command A+ (218B MoE) Running on Apple Silicon — MLX Port, PR Open

A developer wrote an mlx-lm port for Cohere Command A+ 218B MoE, and a larger Apple Silicon test box ran BF16-to-Q8 generation at 22.9 tok/s with 241GB peak memory.

#Inference-opt#Tools#Cohere#Apple

editor take

Command A+ 218B hits 22.9 tok/s on MLX; the catch is 241GB peak memory, not your everyday Mac setup.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

19:51

66d ago

r/LocalLLaMA · rss · EN · 19:51 · 05·23

→Embeddings for NVIDIA's Nemotron Personas

Feisty_Plant4567 published precomputed embeddings for NVIDIA Nemotron-Personas, using Qwen 0.6B on millions of synthetic personas with names, ages, jobs, and hobbies. The release covers Korea, Japan, France, and the USA, with a Hugging Face collection and a web demo for semantic search and K-nearest-neighbor grouping.

#Embedding#Agent#NVIDIA#Qwen

editor take

Title says Nemotron-Personas embeddings shipped; body is 403, with no dimensions, license, or retrieval evals disclosed.

HKR breakdown

hook —knowledge ✓resonance ✓

→ open source

SCORE

H0·K1·R1

19:00

66d ago

AI HOT (Curated Pool) · aihot-api · ZH · 19:00 · 05·23

→Replit Agent Integrates with Squidler for Automated AI Quality Assurance

Replit Agent integrated Squidler through Replit’s MCP library, creating a build-test-fix loop where users describe app features in natural language and Squidler tests deployed apps without test scripts.

#Agent#Tools#Code#Replit

editor take

Replit Agent now loops build-test-fix via Squidler; no coverage or false-positive data, so “no scripts” is still marketing.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

18:32

66d ago

r/LocalLLaMA · rss · EN · 18:32 · 05·23

→Inference Provider Tiers by Cache-Hit Rates, Using OpenRouter Data

The Reddit post title says it ranks inference providers by cache-hit rates using OpenRouter data; the RSS body only includes an image link and does not disclose the sample size, provider list, or cache-hit percentages.

#Inference-opt#OpenRouter#Benchmark

editor take

Title ranks providers by OpenRouter cache-hit rates, but sample size is undisclosed; I don’t buy screenshot leaderboards.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

18:10

66d ago

r/LocalLLaMA · rss · EN · 18:10 · 05·23

→Run Chrome’s tiny Gemma4 (aka Gemini Nano) directly on PC without GPU

A Reddit user released the Dobby Chrome extension, which runs Gemini Nano locally inside Google Chrome and needs 16GB of RAM and some disk space but no GPU; the post says Chrome caps each session at 9,216 tokens, and the author's figure of about 20 tokens per second is an estimate, not a measured speed.

#Inference-opt#Tools#Google#Chrome

editor take

Dobby runs Gemini Nano in Chrome with 16GB RAM and 9,216 tokens; Reddit is 403, so I don't buy the 20 tok/s estimate yet.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

17:39

66d ago

r/LocalLLaMA · rss · EN · 17:39 · 05·23

→Hermes Agent issues with directory creation

A user ran Hermes Agent with Qwen3.5 9B to create one directory, but the agent reported mkdir success while the filesystem did not change, and the Hermes logs showed no warnings.

#Agent#Tools#Code#Hermes Agent

editor take

Qwen3.5 9B made Hermes Agent fake one mkdir success; body is 403, with permissions and sandbox details undisclosed.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

16:45

66d ago

r/LocalLLaMA · rss · EN · 16:45 · 05·23

→30 llama-bench runs to tune Gemma 4 and Qwen3 on an MI60 for Frigate and HomeAssistant

A Reddit user ran 30 llama-bench tests on an MI60 32GB GPU for Gemma 4 26B Q4_1 and Qwen3 35B Q4_0, using a fixed 512-token prompt and 128 generated tokens, and reported under 1.2 seconds for HomeAssistant voice commands and under 18 seconds for Frigate footage summaries.

#Inference-opt#Benchmarking#Reddit#Gemma

editor take

Reddit title gives 30 llama-bench runs; body is 403, so don't generalize MI60 latency claims yet.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

16:06

66d ago

Hacker News Frontpage · rss · EN · 16:06 · 05·23

→Show HN: I built a RAG and knowledge graph agent that runs locally

Claw-Coder runs a coding agent locally on a laptop with RAG, a knowledge graph, search, Docker execution, and a vision LLM; the post says the project is closed source during heavy testing and provides Homebrew commands for installation.

#Agent#RAG#Code#Claw-Coder

editor take

Claw-Coder offers brew install, closed source, no benchmarks; local RAG+KG sounds fine, but coding agents live on reproducible evals.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

16:04

66d ago

r/LocalLLaMA · rss · EN · 16:04 · 05·23

→Any reason to run dense over MoE for RAGs?

A Reddit user tested RAG on a single RTX 3090 and says Qwen3.6 35B APEX produced better answers at about 150 tok/s, compared with Qwen3.6 27B MTP at 60 tok/s; the post does not disclose retrieval setup, prompts, quantization, or evaluation metrics.

#RAG#Inference-opt#Claude#Qwen

editor take

Single 3090 claim: Qwen3.6 35B APEX hits 150 tok/s. 403 body; no RAG setup, so don't crown MoE.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

15:38

66d ago

r/LocalLLaMA · rss · EN · 15:38 · 05·23

→Needle 26M vs Qwen3-0.6B CPU Function-Calling Benchmark

Reddit user gvij tested Needle 26M and Qwen3-0.6B on 50 tool-calling queries using a 4-core CPU: Needle reached 72.0% tool_match with 10.9s mean latency, while Qwen3-0.6B reached 56.0% tool_match with 47.9s mean latency.

#Agent#Tools#Benchmarking#Needle

editor take

Needle 26M beats Qwen3-0.6B on 50 CPU tool calls; body is 403, so treat the numbers as unverified.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

15:38

66d ago

r/LocalLLaMA · rss · EN · 15:38 · 05·23

→GPT 5.5 “secret sauce” is just caveman-mode thinking?

A Reddit user claims GPT-5.5 leaked its thinking trace during a normal conversation and links one Gist log; the post does not disclose a reproducible setup, model provenance, or token-efficiency measurements.

#Reasoning#Fine-tuning#OpenAI#GPT-5.5

editor take

Reddit 403 leaves title plus summary: one Gist is not GPT-5.5 evidence; this smells like prompt-injection crumbs.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

15:12

66d ago

● P1 · AI HOT (Curated Pool) · aihot-api · ZH · 15:12 · 05·23

→Anthropic reportedly nears over $30B funding round, with valuation set to overtake OpenAI

Bloomberg reports that Anthropic is nearing a funding round of over $30 billion, expected to close as soon as next week, pushing its valuation above $900 billion, while the company projects second-quarter revenue of $10.9 billion and its first profitable quarter.

#Anthropic#OpenAI#Bloomberg#Funding

why featured

Featured · importance 88 · hook + knowledge + resonance

editor take

If Anthropic raises $30B at $900B, investors are buying Claude’s enterprise pull, not the safety-lab myth.

sharp

Anthropic’s reported round breaks the normal model-company frame: a $900B valuation against projected $50B ARR is about 18x forward revenue. That multiple is not insane; the slope is. The article says Q2 revenue will hit $10.9B, more than double the prior quarter, and ARR may jump from $4B last July to $50B by month-end. If those numbers hold, Claude Code, enterprise API usage, and cloud channels are turning Claude into a default workflow layer. I don’t buy the “passes OpenAI” headline as the useful read. OpenAI owns the consumer entry point and the stronger public brand; Anthropic owns the safer enterprise procurement story plus AWS and Google routes. The $30B matters because it becomes compute prepayments almost immediately. Investors and terms are not disclosed, and that gap matters more than the valuation flex.
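
The multiples quoted above are simple ratios, and a short sketch makes them easy to recheck. All inputs are the figures reported in the item; the function names are illustrative, and the "prior quarter ceiling" is only what "more than double" implies, not a disclosed number.

```python
def forward_multiple(valuation_bn: float, forward_revenue_bn: float) -> float:
    """Valuation divided by projected annualized revenue."""
    return valuation_bn / forward_revenue_bn


def growth_factor(later_bn: float, earlier_bn: float) -> float:
    """How many times larger the later figure is than the earlier one."""
    return later_bn / earlier_bn


# Figures as reported in the item (USD billions).
valuation_bn = 900.0
projected_arr_bn = 50.0
arr_last_july_bn = 4.0
q2_revenue_bn = 10.9

print(forward_multiple(valuation_bn, projected_arr_bn))  # 18.0
print(growth_factor(projected_arr_bn, arr_last_july_bn))  # 12.5

# "More than double the prior quarter" implies the prior quarter was below this.
prior_quarter_ceiling_bn = q2_revenue_bn / 2
```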

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

13:58

66d ago

FEATURED · Synced (机器之心) · WeChat · rss · ZH · 13:58 · 05·23

→FlashAR speeds up pretrained autoregressive image models by 22.9x using 0.05% data

Zhejiang University and the University of Adelaide introduced FlashAR, using 0.05% of the original training data to reduce Emu3.5-Image-34B 512×512 generation latency from 130.10 seconds to 5.68 seconds, while GenEval changed from 80.48 to 80.29.

#Inference-opt#Vision#Multimodal#Zhejiang University

why featured

Featured · importance 80 · hook + knowledge + resonance

editor take

FlashAR’s bite is 1024 serial steps collapsing to 63 with only 80k images; diffusion’s deployment moat just got thinner.

sharp

FlashAR hits the ugliest deployment flaw in AR image models: quality has arrived, latency has not. On Emu3.5-Image-34B at 512×512, it cuts generation from 130.10 seconds to 5.68 seconds, while GenEval moves from 80.48 to 80.29. The core trick is concrete: add a vertical prediction head and reduce 32×32-token decoding from 1024 serial steps to H+W-1, or 63 steps. I only half-buy the “near-lossless” claim. GenEval does not cover aesthetics, text rendering, or long-prompt consistency, and 80k adaptation images do not prove broad robustness. Still, BlockDiffusion reportedly falls to 73.83 under the same setting. FlashAR shows AR image generation does not need fresh pretraining to get real parallelism.
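
The H+W-1 step count is what an anti-diagonal (wavefront) schedule produces, so here is a sketch of that schedule. It assumes a token can be decoded once its left and upper neighbours exist, which is one reading of what the added vertical head enables; the item gives only the step count, so `wavefront_schedule` is an illustration, not FlashAR's actual decoding loop.

```python
def wavefront_schedule(height: int, width: int) -> list[list[tuple[int, int]]]:
    """Group grid positions by anti-diagonal: step k holds every (row, col) with row + col == k."""
    steps: list[list[tuple[int, int]]] = [[] for _ in range(height + width - 1)]
    for row in range(height):
        for col in range(width):
            # (row, col) needs (row, col - 1) and (row - 1, col),
            # both of which sit on the previous anti-diagonal.
            steps[row + col].append((row, col))
    return steps


schedule = wavefront_schedule(32, 32)
print(len(schedule))                        # 63 parallel steps
print(sum(len(step) for step in schedule))  # 1024 tokens in total
print(round(130.10 / 5.68, 1))              # 22.9, the reported latency ratio
```

Raster-order decoding would take one step per token (1024); the wavefront decodes up to 32 tokens per step, which is where the drop in serial steps comes from under this assumption.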

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

13:58

66d ago

FEATURED · Synced (机器之心) · WeChat · rss · ZH · 13:58 · 05·23

→Bengio Paper Raises Recursive Reasoning Limits as Parallel Trajectories Beat Serial Reasoning

Yoshua Bengio’s team introduced GRAM, a generative recursive reasoning model that samples multiple latent trajectories; on Sudoku-Extreme, GRAM reached 97.0% accuracy with 16 recursive steps and 20 parallel samples, exceeding TRM’s 90.5% result at 320 serial recursive steps.

#Reasoning#Inference-opt#Benchmarking#Yoshua Bengio

why featured

Featured · importance 80 · hook + knowledge + resonance

editor take

GRAM’s punch is not 97.0% Sudoku; it turns reasoning scale from “think longer” into “race 20 latent bets in parallel.”

sharp

GRAM adds trained search width to recursive reasoning, not random noise. On Sudoku-Extreme, 16 recursive steps plus 20 parallel samples hit 97.0%, beating TRM’s 90.5% at 320 serial steps; that gap lands directly on the latency problem of long-token CoT reasoning. I would not stretch this into a general-agent claim yet. The wins sit on structured tasks: Sudoku, N-Queens, Graph Coloring, ARC-AGI, with majority voting or LPRM picking candidates. Still, it gives LeCun’s latent-space planning line a measurable engineering shape: under similar compute, sampling multiple latent trajectories beats forcing one chain to grind deeper.
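
Two pieces of the comparison above can be sketched: the step budget behind "similar compute" and the majority vote that picks among parallel samples. This is not GRAM's implementation. Counting recursive steps as the compute proxy is an assumption, the LPRM selector is not modelled, and `majority_vote` and `total_recursive_steps` are illustrative names.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(answers: Sequence[Hashable]) -> Hashable:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def total_recursive_steps(steps_per_sample: int, n_samples: int) -> int:
    """Crude compute proxy: recursive steps summed over all parallel samples."""
    return steps_per_sample * n_samples


# Figures from the item: 16 steps x 20 samples versus 320 serial steps.
gram_budget = total_recursive_steps(16, 20)
trm_budget = total_recursive_steps(320, 1)
print(gram_budget, trm_budget)  # 320 320

# Toy usage: three sampled trajectories disagree, the vote picks the mode.
print(majority_vote(["grid_a", "grid_b", "grid_a"]))  # grid_a
```

The total step count is equal, but the 20 samples can run side by side, so the serial depth is 16 steps instead of 320.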

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

13:58

66d ago

Synced (机器之心) · WeChat · rss · ZH · 13:58 · 05·23

→How AppLovin Built a Hundred-Billion-Dollar Ad Business Without LLMs or Owned Traffic

AppLovin used Axon 2 to shift ad buying toward LTV prediction, with its stock rising 790% in 2024 and its market value approaching $250 billion in 2025.

#Embedding#Multimodal#Agent#AppLovin

editor take

AppLovin rose 790% in 2024; don’t mythologize Axon 2 as LLM magic—LTV prediction prints the cash.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

13:54

66d ago

r/LocalLLaMA · rss · EN · 13:54 · 05·23

→Apex-Testing: Real-world, real-repo agentic coding benchmark update

Apex-Testing updated its Real-World Agentic Coding benchmark to 95% coverage, using 65-70 private GitHub repositories, 70 tasks, and 8 categories, with metrics for average cost, average time, category-weighted scoring, ELO leaderboard, and model comparison.

#Agent#Code#Benchmarking#Apex-Testing

editor take

Apex-Testing claims 65-70 private repos; the body is 403, so without tasks or reproducibility, I don't buy the 95%.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

13:45

66d ago

r/LocalLLaMA · rss · EN · 13:45 · 05·23

→Llama.cpp vs LiteRT on a Custom Xiaomi 12 Pro 24/7 Server (V2 Redesign)

The author tested gemma-4-E4B on a custom Xiaomi 12 Pro server: Llama.cpp reached 30.6 prompt t/s and 5.7 generation t/s, while LiteRT generated slightly faster but maxed out the CPUs and drew more power.

#Inference-opt#Benchmarking#Xiaomi#Google

editor take

Title says Xiaomi 12 Pro runs gemma-4-E4B at 5.7 gen t/s via llama.cpp; Reddit 403 blocks LiteRT power checks.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

13:29

66d ago

r/LocalLLaMA · rss · EN · 13:29 · 05·23

→I added native MTP to exo for Qwen3.6 MLX models; here are the exactness and speed results

A developer submitted a native MTP PR for exo; on an M5 Max 48GB laptop, 27B rose from 17.27 to 34.06 tok/s at K=2, while 35B-A3B rose from 85.14 to 98.59 tok/s at K=1.

#Inference-opt#exo#Qwen#Apple

editor take

exo native MTP hits 34.06 tok/s on 27B with M5 Max 48GB; body is 403, so exactness details remain unverified.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

13:00

66d ago

TechCrunch AI · rss · EN · 13:00 · 05·23

→Elon Musk has given up on solar power (on Earth)

TechCrunch says xAI has gone all in on natural gas and SpaceX is focused on orbital data centers; the RSS snippet does not disclose project scale, costs, timelines, or Musk’s direct statements.

#Elon Musk#xAI#SpaceX#Commentary

editor take

TechCrunch only gives xAI gas and SpaceX orbital data centers; no scale, cost, or timeline, so don’t over-read Musk’s energy pivot.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

12:53

66d ago

r/LocalLLaMA · rss · EN · 12:53 · 05·23

→Qwen3.6 35B-A3B MTP hits 249 t/s on a 24GB RTX 5090M

Qwen3.6-35B-A3B-MTP-GGUF reached 249.30 t/s on a 24GB RTX 5090M in 10 runs of 2,000 tokens, with 86.6% draft acceptance and n_max=3. The same image, args, and context gave 74.28 t/s for the 27B dense MTP variant, while 262K context used about 22.4GB VRAM with q4_0 KV cache.

#Inference-opt#Code#Benchmarking#Qwen

editor take

Qwen3.6-35B-A3B hits 249 t/s on 24GB 5090M; the win is MoE 3B activation plus 86.6% MTP acceptance.
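
To see why acceptance rate matters for MTP throughput, a rough sketch: if each drafted token were accepted independently with probability p and drafting stopped at the first rejection, the expected number of tokens per verification pass is a truncated geometric sum. Both conditions are assumptions; the reported 86.6% may be an aggregate accepted-over-drafted ratio, so treat the result as an estimate, not a reconstruction of the benchmark.

```python
def expected_tokens_per_step(accept_prob: float, n_draft: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumes independent per-token acceptance with probability accept_prob,
    stopping at the first rejection; the verifier always adds one token,
    so the result is 1 + p + p^2 + ... + p^n_draft.
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be in [0, 1]")
    if n_draft < 0:
        raise ValueError("n_draft must be non-negative")
    if accept_prob == 1.0:
        return float(n_draft + 1)
    return (1.0 - accept_prob ** (n_draft + 1)) / (1.0 - accept_prob)


# Figures from the item: 86.6% acceptance, n_max=3.
print(round(expected_tokens_per_step(0.866, 3), 2))  # about 3.27
```

Real speedup is lower than this ratio because drafting and verification add their own cost per pass.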

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

11:50

66d ago

Hacker News Frontpage · rss · EN · 11:50 · 05·23

→Making Deep Learning Go Brrrr from First Principles

The title identifies a first-principles deep learning performance topic, while the RSS body only discloses 6 Hacker News points and 0 comments; the post does not disclose methods, benchmarks, or hardware conditions.

#Inference-opt#Commentary

editor take

Horace He splits perf into compute, memory, and overhead; better than hoarding 50 PyTorch folklore tricks.

HKR breakdown

hook ✓knowledge —resonance —

→ open source

SCORE

H1·K0·R0

11:40

66d ago

FEATURED · r/LocalLLaMA · rss · EN · 11:40 · 05·23

→Qwen3.6 27B Model Inference Speed Benchmarked at 40GB VRAM and 100k Context

A Reddit user runs Qwen3.6 27B with 40GB of VRAM and reports 22-30 tok/s generation at a 100k context window, with prompt processing at 300-500 tok/s.

#Agent#Inference-opt#Multimodal#Qwen

why featured

Featured · importance 78 · hook + knowledge + resonance

editor take

Two LocalLLaMA posts and a 403 body are not enough to judge Qwen3.6 27B at 100k; don’t turn screenshot numbers into infra choices yet.

sharp

Two LocalLLaMA posts discuss Qwen3.6 27B speed and quality at a 100k context window, but the body is blocked by 403 and only titles are visible. The coverage is aligned because it is the same subreddit thread cluster, not independent validation. I’m skeptical of long-context speed claims from screenshots. At 100k, prefill behavior, KV-cache layout, quantization format, and batch size can swing tokens/sec hard. One title asks how to optimize speed and quality; the other asks someone to explain the results. That says the mechanism is not settled by the posters themselves. Compared with a reproducible vLLM run on Qwen2.5 32B or Llama 3.x long-context configs, this is useful smoke, not evidence for deployment choices.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

11:01

66d ago

Bloomberg Technology · rss · EN · 11:01 · 05·23

→Nvidia CEO Urges Super Micro to Tighten Up on Compliance

Bloomberg's title says Nvidia's CEO urged Super Micro to tighten compliance, with a published time of 2026-05-23; the scraped body does not disclose the Taiwan crackdown details, specific compliance issues, or any response from Super Micro.

#Nvidia#Super Micro#Bloomberg#Policy

editor take

Bloomberg names Nvidia and Super Micro, but discloses no probe details; AI server compliance risk is now supply-chain risk.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

11:00

66d ago

FEATURED · The Verge · AI · rss · EN · 11:00 · 05·23

→Google’s New Anything-to-Anything AI Model Is Wild

The Verge tried Google’s new Gemini anything-to-anything model for a stuffed-deer deepfake video, but the RSS snippet gives only that one example and does not disclose model parameters, pricing, release timing, or safety controls.

#Multimodal#Vision#Google#Gemini

why featured

Featured · importance 73 · hook + resonance

editor take

One stuffed-deer demo, no pricing, parameters, or launch date; Google’s anything-to-anything pitch still smells more like capability theater than product proof.

sharp

Google’s uncomfortable win here is not video quality; it is how little friction The Verge needed for a stuffed-deer deepfake. The snippet gives one Buddy the deer example and withholds parameters, pricing, release timing, and safety controls, so treating this as a Gemini product victory is premature. I’m wary of the “anything-to-anything” framing. It sounds like a unified model story, but Google demos often hide a chain of tools behind one clean label. Veo, Sora, and Runway already showed the hard part is not making pixels move; it is identity consistency, edit control, and abuse cost. This snippet proves a narrower point: casual realistic video fabrication just got easier.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

10:01

66d ago

r/LocalLLaMA · rss · EN · 10:01 · 05·23

→Have We Passed the Peak of Inflated Expectations?

Reddit user fairydreaming posted that LocalLLaMA participation has declined and referenced Google Trends; the post does not disclose specific trend values, time ranges, or measurement methods.

#Reddit#LocalLLaMA#Google#Commentary

editor take

The title claims LocalLLaMA peaked, but the body is just 403; no Google Trends values, no inflection proof.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

09:46

66d ago

AI HOT (Curated Pool) · aihot-api · ZH · 09:46 · 05·23

→Doubling Down on Science to Win Industrial AI

Mistral AI signed a definitive agreement to acquire Emmi AI, adding more than 30 researchers and engineers with physics simulation and digital twin expertise to its industrial AI team.

#Robotics#Mistral AI#Emmi AI#Partnership

editor take

Mistral AI buys Emmi AI and adds 30+ staff; the page 404s, with price and deployments undisclosed.

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

SCORE

H1·K1·R0

09:16

66d ago

r/LocalLLaMA · rss · EN · 09:16 · 05·23

→DGX Spark agentic usage numbers

A Reddit user tested RedHatAI/Qwen3.6-35B-A3B-NVFP4 on DGX Spark with a 30k-token prompt and 5,000-token outputs, reporting about 51 TPS for one stream and 138.56 aggregate TPS across four concurrent requests.

#Agent#Tools#Inference-opt#RedHatAI

editor take

Title claims DGX Spark runs Qwen3.6-35B at 51 TPS; body is 403, so treat 138.56 TPS as community telemetry.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

08:51

66d ago

r/LocalLLaMA · rss · EN · 08:51 · 05·23

→Best open-source and proprietary options for Indic language ASR

A Reddit user asks for Indic-language ASR options covering Hindi, South Indian languages, and code-mixed audio, with a preference for ready-to-use models over fine-tuning; the post mentions Sarvam Saaras v3 but does not disclose benchmark scores, pricing, or deployment constraints.

#Audio#Reddit#Sarvam#Saaras v3

editor take

Title only says Hindi, South Indian languages, code-mixed ASR; Reddit 403 hides benchmarks, pricing, deployment constraints.

HKR breakdown

hook —knowledge —resonance ✓

→ open source

SCORE

H0·K0·R1

08:00

66d ago

FEATURED · Financial Times · Technology · rss · EN · 08:00 · 05·23

→SpaceX, OpenAI and Anthropic plan initial public offerings

The title identifies IPOs for SpaceX, OpenAI, and Anthropic as a test of the AI boom, but the FT body is a subscription page and does not disclose valuations, timing, proceeds, or deal structures.

#SpaceX#OpenAI#Anthropic#Funding

why featured

Featured · importance 82 · hook + resonance

editor take

Three sources put SpaceX, OpenAI, and Anthropic in one IPO frame; public markets are finally being asked to underwrite GPU burn.

sharp

Three sources converge on the same frame: FT stresses giant IPOs and Wall Street trading heat, while yage-share packages it as three prospectus bets. The FT body is paywalled, so valuation, timing, and proceeds are not disclosed. I read this less as a normal IPO window than as private AI marks being pushed onto public-market buyers. OpenAI and Anthropic have a specific problem: training and inference spend keeps absorbing cash, and prospectuses force cleaner disclosure on revenue quality. Unlike Databricks or Stripe, these labs must explain GPU leases, cloud dependence, and gross-margin trajectory. Putting SpaceX in the same basket is convenient; it gives the package a Musk-era hard-asset halo while softening the anxiety around AI cash burn.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

07:45

66d ago

AI Chat-Group Daily (群聊日报) · atom · ZH · 07:45 · 05·23

→AI Chat Group Daily, 2026-05-22

The chat-group daily covers GPT-5 refuting Erdős’s unit distance conjecture, GLM-5.1 reaching 400 tokens/s, DeepSeek V4 Pro cutting API prices to one-quarter of the original rate, and antirez’s ds4 running the 284B DeepSeek V4 Flash locally on an M5 Max at 270 t/s prefill and 25 t/s decode under q2 quantization.

#Reasoning#Inference-opt#Tools#OpenAI

editor take

Four hard signals in one chat digest; GPT-5 math, GLM-5.1 speed, and DeepSeek pricing are dense but verification-heavy.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

07:44

66d ago

r/LocalLLaMA · rss · EN · 07:44 · 05·23

→Gemma4 26B A4B Apex Quant Is Quite Good

A Reddit user tested mudler’s Gemma4 26B A4B Apex GGUF on an RX 9060 XT 16GB with llama.cpp Vulkan, reporting 38 tps at 90k context with no loop and no visible quality degradation.

#Inference-opt#Gemma#mudler#llama.cpp

editor take

Title claims Gemma4 26B A4B hits 90k context and 38 tps on 16GB VRAM; body is 403, so treat as folklore.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

07:15

66d ago

AI HOT (Curated Pool) · aihot-api · ZH · 07:15 · 05·23

→Feishu-Claude Code Bridge Open-Source Project

feishu-claude-code-bridge connects Feishu with the local Claude Code CLI: it converts Feishu messages into prompts for `claude -p` and streams outputs back into Feishu. The post says Claude subscription plans will bill this mode separately from June 15, 2026.

#Agent#Code#Tools#Feishu

editor take

feishu-claude-code-bridge pipes Feishu into claude -p; separate billing after June 15 makes chat-to-CLI bridges hit cost first.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

05:23

66d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 05:23 · 05·23

→Microsoft Says AI Use Can Cost More Than Human Wages

Microsoft says AI use costs more than human wages in specific work scenarios, with its report comparing token- and agent-based usage costs against the cost of hiring people for the same tasks.

#Agent#Microsoft#Commentary

why featured

Featured · importance 76 · hook + knowledge + resonance

editor take

Microsoft says some AI workflows cost more than wages; that hits enterprise Copilot ROI, not model capability theater.

sharp

Microsoft’s report punctures the agent budget story: companies are not paying for one inference, but for tokens, tool calls, retries, monitoring, and human fallback. The snippet only says “specific work scenarios” cost more than human wages; it does not disclose task types, model prices, success rates, or wage assumptions. That makes the evidence thin, but the direction tracks with what teams hit in production. A failed multi-step agent run burns more than extra tokens; it creates rollback, review, and permission overhead. A lot of Copilot selling over the last year treated labor savings as linear. Microsoft putting wages on the other side of the ledger is a quiet admission that enterprise AI has moved from demo quality to unit economics.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

05:21

66d ago

r/LocalLLaMA · rss · EN · 05:21 · 05·23

→Experimental “Preserve Thinking” Jinja Template for Gemma4 31B in llama.cpp

Reddit user ggonavyy posted one Gemma4 31B Jinja template for llama.cpp, saying Pi-coding-agent tests no longer showed thinking-tag open or close errors, but the post does not disclose benchmark results or reproduction details.

#Code#Agent#Tools#Google

editor take

ggonavyy posted one Gemma4 31B Jinja template with no benchmarks; I’d treat it as a llama.cpp tool-call bandage.

HKR breakdown

hook ✓knowledge ✓resonance —

→ open source

SCORE

H1·K1·R0

04:46

66d ago

Product Hunt · AI · rss · EN · 04:46 · 05·23

→Goldfish: Press Option, it knows your work context

Goldfish is a Mac AI writing assistant that privately remembers the user's recent work; pressing Option in any app drafts replies, summarizes threads, rewrites sentences, or recalls details without copy-pasting or re-explaining context. The post does not specify the underlying model, Windows support, or pricing.

#Goldfish#Ben Lang#Haylli Weintraub

editor take

Goldfish remembers your recent work on Mac and lets you press Option to reply anywhere without copy-pasting context.

HKR breakdown

hook ✓knowledge —resonance —

→ open source

SCORE

H1·K0·R0

04:21

66d ago

Latent Space · rss · EN · 04:21 · 05·23

→[AINews] All Model Labs Are Now Agent Labs

Latent Space summarized AI News for May 4–5 after checking 12 subreddits and 544 Twitter accounts, arguing that OpenAI, AI21, DeepSeek and other model labs are moving product focus from standalone models to agents, harnesses, workflows, UI, memory and cost structure.

#Agent#Tools#Code#Latent Space

editor take

Latent Space checked 12 subreddits and 544 accounts; model labs are adding agent shells, and closed harnesses can choke API competition.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

03:47

66d ago

● P1 · QbitAI (量子位) · WeChat · rss · ZH · 03:47 · 05·23

→DeepSeek V4 cuts prices as CATL, JD.com and NetEase discuss investment; Liang Wenfeng targets AGI

DeepSeek-V4-Pro API will keep its promotional pricing from June 1, with cached input at RMB 0.025 per million tokens, while Bloomberg says DeepSeek is pursuing a RMB 70 billion round at a USD 45 billion pre-money valuation.

#Inference-opt#DeepSeek#CATL#Liang Wenfeng

why featured

Featured · importance 90 · hook + knowledge + resonance

editor take

DeepSeek’s RMB 0.025/M cached-token price is not generosity; it’s a funding-backed API price war with infrastructure bills attached.

sharp

DeepSeek’s sharpest move here is not the AGI line; it is locking V4-Pro cached input at RMB 0.025 per million tokens. Uncached input is RMB 3, output is RMB 6, all one-quarter of the prior list price. Put that beside the reported RMB 70B round and USD 45B pre-money valuation, and the pricing story turns into a capital and infrastructure story. CATL’s role makes more sense than JD or NetEase. DeepSeek is building data centers in Inner Mongolia and already had a nearly 12-hour outage. CATL just spent USD 942M for 38.1% of VNET, a major China data-center operator. Liang Wenfeng can say commercialization is secondary, but permanent low API pricing forces the market to follow. The contest moves to power, cooling, cache hit rates, and how cheaply each lab can finance compute.
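Why cache hit rate ends up on the list of deciding factors is easier to see with the quoted prices plugged in. The sketch below assumes the RMB 3 and RMB 6 figures use the same per-million-token unit as the cached price; the function name and the traffic numbers are illustrative only.

```python
# Prices quoted in the item, in RMB per million tokens (the unit for the
# uncached-input and output figures is assumed to match the cached figure).
CACHED_INPUT_RMB = 0.025
UNCACHED_INPUT_RMB = 3.0
OUTPUT_RMB = 6.0


def request_cost_rmb(input_tokens: int, output_tokens: int,
                     cache_hit_rate: float) -> float:
    """Cost in RMB for a workload, given the share of input served from cache."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached_tokens = input_tokens * cache_hit_rate
    uncached_tokens = input_tokens - cached_tokens
    total = (cached_tokens * CACHED_INPUT_RMB
             + uncached_tokens * UNCACHED_INPUT_RMB
             + output_tokens * OUTPUT_RMB)
    return total / 1_000_000
```

At these prices an uncached input token costs 120 times a cached one (3 / 0.025), so one million input tokens cost RMB 3 at a 0% hit rate and RMB 0.025 at 100%. Output pricing is unaffected by caching.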

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

03:44

66d ago

Hacker News Frontpage · rss · EN · 03:44 · 05·23

→Microsoft Reports AI Is More Expensive Than Paying Human Employees

The title says Microsoft reported AI costs more than paying human employees; the RSS body only lists the URL, 17 points, and 2 comments, and the post does not disclose the cost basis, employee roles, or token/agent mechanism.

#Agent#Microsoft#Commentary

editor take

Microsoft says AI costs exceed employee pay; the RSS body shows only 17 points and 2 comments, with no cost basis, so I don’t buy it yet.

HKR breakdown

hook ✓knowledge —resonance ✓

→ open source

SCORE

H1·K0·R1

03:27

66d ago

FEATURED · r/LocalLLaMA · rss · EN · 03:27 · 05·23

→meituan-longcat/LongCat-Video-Avatar-1.5 on Hugging Face

Meituan LongCat released LongCat-Video-Avatar-1.5 on Hugging Face, supporting AT2V, ATI2V, and video continuation while replacing Wav2Vec2 with Whisper-Large and using DMD2 distillation to reduce inference to 8 NFE; the model weights are released under the MIT License.

#Multimodal#Audio#Vision#Meituan

why featured

Featured · importance 76 · hook + knowledge + resonance

editor take

Meituan LongCat cut avatar-video inference to 8 NFE and used MIT weights; this smells like workflow infrastructure, not a paper demo.

sharp

Meituan LongCat’s sharp move is the 8 NFE (number of function evaluations) inference target, not the avatar-video label. The summary names AT2V, ATI2V, video continuation, Whisper-Large replacing Wav2Vec2, DMD2 distillation, and an MIT license; the Reddit body is blocked by 403, so sample quality, VRAM, max duration, and training data are not verifiable here. If 8 NFE preserves lip sync and identity consistency, LongCat-Video-Avatar-1.5 lands near the HeyGen, MuseTalk, and LivePortrait problem space. Meituan has obvious internal gravity for this: merchant support, local ads, and creator commerce all reward cheap controlled video humans. But without latency numbers, memory requirements, or failure cases, the open weights prove an engineering posture, not production readiness.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

03:16

66d ago

FEATURED · r/LocalLLaMA · rss · EN · 03:16 · 05·23

→club-rdna16: Practical 16GB AMD/Radeon local LLM testing repo

club-rdna16 publishes a practical 16GB Radeon local LLM testing repo, with an RX 6900 XT running llama.cpp on ROCm/HIP and Qwen3.6 35B-A3B reaching a stable 131k context using q8 KV cache.

#Inference-opt#Benchmarking#Qwen#AMD

why featured

Featured · importance 73 · hook + knowledge + resonance

editor take

A 16GB Radeon hitting Qwen3.6 35B-A3B at 131k context is exactly the AMD local-inference data people lack, not another CUDA vanity chart.

sharp

club-rdna16 matters because it records where 16GB Radeon inference breaks, not because it proves a 35B model can boot. The first profile uses an RX 6900 XT with llama.cpp on ROCm/HIP, running Qwen3.6 35B-A3B with Unsloth UD-IQ3_XXS and q8 KV at a stable 131k context. MTP reaches 100k, but only with careful settings. That is the useful layer: KV cache type, prefill behavior, driver stack, and AMD power profile decide whether local inference survives real prompts. NVIDIA users have had this folk knowledge around CUDA for years. Radeon users still stitch it together from Reddit comments. If RX 6800 XT, 7800 XT, 7900 GRE, and similar 16GB cards submit the same template, this repo becomes a better engineering entry point than most ROCm sample pages.
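A rough sizing sketch shows why KV cache type is one of the settings that decides whether a long context fits in 16GB. The layer count, KV head count, and head dimension below are hypothetical placeholders, not the published Qwen3.6 35B-A3B configuration, and the formula only covers standard attention layers that store full keys and values. The q8_0 figure follows llama.cpp's block layout of 32 int8 values plus one fp16 scale.

```python
# Bytes per cached element. f16 stores 2 bytes per value; q8_0 stores blocks of
# 32 int8 values plus one 2-byte fp16 scale, i.e. 34 bytes per 32 values.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,
}


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 n_ctx: int, cache_type: str) -> float:
    """Approximate KV cache size in GiB for full-attention layers."""
    # Factor 2 covers the separate key and value tensors in each layer.
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elements * BYTES_PER_ELEMENT[cache_type] / 2**30


# Hypothetical dimensions, chosen only to illustrate the scale of the effect.
for cache_type in ("f16", "q8_0"):
    size = kv_cache_gib(n_layers=48, n_kv_heads=4, head_dim=128,
                        n_ctx=131_072, cache_type=cache_type)
    print(f"{cache_type}: {size:.3f} GiB")
```

For these placeholder dimensions the cache drops from 12 GiB at f16 to about 6.4 GiB at q8_0, before counting model weights and compute buffers.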

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

02:35

66d ago

AI HOT (Curated Pool) · aihot-api · ZH · 02:35 · 05·23

→Kling AI Appears at Cannes to Discuss AI Film Production Workflows

Kling AI held an official session at Cannes Marché du Film, and the post says it has been used for four production types: animated features, Hollywood series, experimental shorts, and theatrical films.

#Multimodal#Vision#Kling AI#Marché du Film

editor take

Kling AI held one Cannes session; the post names 4 use cases, but gives no titles, shot counts, or costs.

HKR breakdown

hook —knowledge —resonance ✓

→ open source

SCORE

H0·K0·R1

01:39

66d ago

AI HOT (Curated Pool) · aihot-api · ZH · 01:39 · 05·23

→Models.dev: An open-source database for AI model specs, pricing, and features

Models.dev released an open-source AI model database on GitHub covering specs, pricing, and features; the post reports 101 Hacker News points but does not disclose the number of covered models.

#Models.dev#GitHub#Hacker News#Open source

editor take

Models.dev open-sourced a specs/pricing/features database; HN shows 101 points, but covered model count is undisclosed.

HKR breakdown

hook —knowledge ✓resonance ✓

→ open source

SCORE

H0·K1·R1

01:10

66d ago

r/LocalLLaMA · rss · EN · 01:10 · 05·23

→G4-MeroMero-26B-A4B-it-uncensored-heretic Is Out, With KLD 0.0152

LLMFan46 released G4-MeroMero-26B-A4B-it-uncensored-heretic, a finetune of gemma-4-26B-A4B-it, with Safetensors and GGUF files on Hugging Face; the title reports KLD 0.0152 and 12/100 refusals, while the post says a benchmark is included.
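The post does not say how the KLD figure was measured. One common reading, assumed here, is the mean KL divergence between the base model's and the finetune's next-token distributions over a set of prompts, where a value near zero means the finetune's outputs stay close to the base model's. The function names are illustrative.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats for two discrete distributions over the same vocabulary."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i <= 0.0:
                return math.inf  # q gives zero mass where p does not
            total += p_i * math.log(p_i / q_i)
    return total


def mean_kld(base: Sequence[Sequence[float]],
             tuned: Sequence[Sequence[float]]) -> float:
    """Average KL(base || tuned) over paired next-token distributions."""
    if len(base) != len(tuned) or not base:
        raise ValueError("need equal, non-empty lists of distributions")
    return sum(kl_divergence(p, q) for p, q in zip(base, tuned)) / len(base)
```

Identical distributions give 0.0. Whether the reported 0.0152 uses this direction, these units, or this averaging is not stated in the post.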

#Fine-tuning#Benchmarking#LLMFan46#Hugging Face

editor take

LLMFan46 claims KLD 0.0152 and 12/100 refusals; Reddit 403 blocks the body, so safety and benchmark details stay unverifiable.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1

00:41

66d ago

AI HOT (Curated Pool) · aihot-api · ZH · 00:41 · 05·23

→Expanding Collaboration with Singapore for Safe AI Deployment at Scale

Google DeepMind expanded its collaboration with Singapore, with new projects covering three areas: scientific discovery, pandemic preparedness, and healthcare; the post does not disclose budget, timeline, model details, or deployment metrics.

#Safety#Google DeepMind#Singapore#Partnership

editor take

Google DeepMind names 3 Singapore tracks; budget, timeline, and model details are undisclosed, so this reads like policy positioning.

HKR breakdown

hook —knowledge ✓resonance —

→ open source

SCORE

H0·K1·R0

00:05

66d ago

FEATURED · AI HOT (Curated Pool) · aihot-api · ZH · 00:05 · 05·23

→AI Replaces Entry-Level Work: Tech Hit Hardest as 74% of CEOs Freeze or Cut Hiring

Oliver Wyman’s study says the tech sector faces the heaviest AI-related hiring shock, with 74% of CEOs freezing or cutting hiring and the share of companies planning entry-level role reductions rising from 17% to 43%.

#Oliver Wyman#Commentary

why featured

Featured · importance 74 · hook + knowledge + resonance

editor take

74% of CEOs freezing or cutting hiring is bad; killing entry-level roles is the bigger self-inflicted talent bug.

sharp

CEOs are using AI as cover for org slimming, and the fragile point is entry-level work. Oliver Wyman’s numbers are blunt: 74% of tech CEOs are freezing or cutting hiring, up from 67% a year earlier. The share planning to reduce entry-level roles jumped from 17% to 43%, while only 17% plan to add junior roles. That reads like ripping out the training loop for junior engineers, analysts, and support staff. The timing is the tell. The same study says 67% of companies are still in planning or pilot mode for AI. Cutting junior roles before workflows are production-stable is a bet on automation that many firms have not earned. Microsoft and Google at least have internal Copilot and Gemini deployment muscle; average companies copying the headcount move will discover the middle layer does not refill itself.

HKR breakdown

hook ✓knowledge ✓resonance ✓

→ open source

SCORE

H1·K1·R1
