
podcasts

30 episodes · updated 3m ago
6 channels tracked
Latent Space · 30 episodes
2026-06-05 · Fri
2026-06-04 · Thu
2026-06-03 · Wed
2026-06-02 · Tue
2026-06-01 · Mon
2026-05-28 · Thu
2026-05-27 · Wed
2026-05-22 · Fri
2026-05-21 · Thu
2026-05-20 · Wed
2026-05-19 · Tue
2026-05-18 · Mon
2026-05-14 · Thu
2026-05-12 · Tue
● P1 · Latent Space · rss · EN · 04:33 · 05·12
Thinking Machines' Native Interaction Models: TML-Interaction-Small 276B-A12B Advances Realtime Voice
Thinking Machines released TML-Interaction-Small, a 276B-parameter MoE model with 12B active parameters. The post says it advances realtime voice through 200ms time-aligned microturns and encoder-free early fusion for audio and images under 200ms, and it reports benchmark wins over GPT-Realtime-2 and Gemini 3.1-Flash.
#Multimodal #Audio #Agent #Thinking Machines
why featured
HKR-H/K/R all pass: TML-Interaction-Small gives architecture, active parameters, 200ms interaction, and named rivals. Benchmarks still need replication, but a real-time voice SOTA claim is same-day material.
editor take
Thinking Machines moved realtime voice inside the model loop: 276B MoE, 12B active, 200ms microturns. That hits harder than another chat leaderboard.
sharp
Thinking Machines is betting on the interaction clock, not a speech wrapper. TML-Interaction-Small is a 276B MoE with 12B active parameters, encoder-free early fusion for audio and images, and 200ms time-aligned microturns. That attacks the hand-coded turn logic sitting between VAD, ASR, LLM, and TTS stacks. I’d discount the official leaderboard for now: wins over GPT-Realtime-2 and Gemini 3.1-Flash on BigBench Audio, IFEval, and FD-bench lack reproducibility details in the snippet. The stronger signal is the new task shape: TimeSpeak, CueSpeak, RepCount-A, and ProactiveVideoQA test when to talk, when to stay silent, and when visual evidence becomes available. OpenAI’s 4o “Her” demo sold presence; Thinking Machines is trying to own timing.
HKR breakdown
hook · knowledge · resonance
SCORE 88 · H1·K1·R1
2026-05-09 · Sat
2026-05-05 · Tue
2026-05-01 · Fri
2026-04-30 · Thu
2026-04-27 · Mon
2026-04-25 · Sat
● P1 · Latent Space · rss · EN · 05:00 · 04·25
DeepSeek V4 Pro and Flash released, runnable on Huawei Ascend chips
DeepSeek released V4 Pro (1.6T total parameters, 49B active) and V4 Flash (284B total, 13B active). Both support 1M-token context, Base/Instruct variants, and an MIT license; the report claims V4 needs 27% of the FLOPs and 10% of the KV cache of V3.2 at 1M tokens. The key point is Huawei CANN compatibility, not just benchmarks, because it reduces CUDA dependence.
#Reasoning #Code #Inference-opt #DeepSeek
why featured
HKR-H/K/R all pass: a major DeepSeek release adds concrete specs, 1M context, MIT licensing, and Huawei Ascend support. This sits in the 85–94 must-write band, with hardware independence pushing it upward.
editor take
DeepSeek V4 pairs 1M context with Huawei CANN support; the shot is less at Kimi than at CUDA lock-in.
sharp
DeepSeek V4’s sharp edge is not that it matches the GPT 5.4 / Opus 4.6 class. It is that it binds long-context efficiency to a non-CUDA inference path. V4 Pro is 1.6T with 49B active; V4 Flash is 284B with 13B active. At 1M tokens, the report claims 27% of V3.2 FLOPs and 10% of its KV cache, with Base/Instruct releases under MIT. CANN support gives this release a hardware escape hatch. The article says Ascend supply is only one quarter of H100 supply, so calling it an NVIDIA replacement is hype. But open weights that run on Ascend cut a real CUDA tax for Chinese cloud and private deployments. Kimi K2.6 may still hold the open-model leaderboard narrative; DeepSeek is pushing a more useful engineering bet: less memory, longer context, portable hardware.
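To put the quoted ratios in relative terms, here is a minimal sketch that uses only the two figures cited above (27% of V3.2 FLOPs and 10% of its KV cache at 1M tokens). The function name and dictionary are hypothetical, and the ratios are the report's claims, not independent measurements.

```python
def reduction_factor(remaining_fraction: float) -> float:
    """Convert 'uses X of the baseline' into 'the baseline is N times larger'."""
    if not 0 < remaining_fraction <= 1:
        raise ValueError("remaining_fraction must be in (0, 1]")
    return 1.0 / remaining_fraction


# Claimed V4 cost relative to V3.2 at 1M tokens (from the report, unverified).
CLAIMED = {"flops": 0.27, "kv_cache": 0.10}

for name, fraction in CLAIMED.items():
    factor = reduction_factor(fraction)
    print(f"{name}: {fraction:.0%} of V3.2, roughly {factor:.1f}x smaller")
```

If the claims hold, the KV cache saving is the larger of the two, which fits the card's emphasis on memory over raw compute.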
HKR breakdown
hook · knowledge · resonance
SCORE 92 · H1·K1·R1
2026-04-22 · Wed
2026-04-21 · Tue
● P1 · Latent Space · rss · EN · 00:19 · 04·21
Moonshot Kimi K2.6 open-weight model refresh aims to catch Opus 4.6
Moonshot released Kimi K2.6, a 1T-parameter MoE with 32B active and 256K context. The post cites 58.6 on SWE-Bench Pro, 4,000+ tool calls, 12+ hour runs, and 300 parallel sub-agents. The key signal is long-horizon agent execution, not only open-model scores.
#Agent #Code #Multimodal #Moonshot
why featured
HKR-H/K/R all pass: Kimi K2.6 has a strong race narrative, concrete model and agent metrics, and direct relevance to open-model builders. The domestic flagship release signal lifts it into P1.
editor take
Kimi K2.6 is an open-weight agent bet: 1T MoE, 256K context, 4,000+ tool calls. This is no leaderboard-only refresh.
sharp
Kimi K2.6 pushes open weights into long-horizon agent execution, not another polite benchmark chase. The concrete hook is strong: 1T-parameter MoE, 32B active, 384 experts, 256K context, 58.6 on SWE-Bench Pro, plus 4,000+ tool calls, 12+ hour runs, and 300 parallel sub-agents. That is the part practitioners should care about, because it tests persistence and coordination, not just prompt-time cleverness. I have doubts about the “catch up to Opus 4.6” framing, since the article says the extra pre/post-training amount was not disclosed. K2.5 already put Moonshot near the top of open Chinese labs in January; K2.6 looks less like a clean model-quality leap and more like a serious agent-runtime bet. Against DeepSeek V4 rumor cycles, Moonshot is shipping deployable artifacts.
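The MoE releases in this feed are all quoted as total versus active parameters, so a quick comparison of the active share may help. This is a minimal sketch using only the parameter counts stated in the cards; the `MoEModel` class and its field names are hypothetical, and the counts are vendor-reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEModel:
    name: str
    total_b: float   # total parameters, in billions
    active_b: float  # active parameters per token, in billions

    @property
    def active_fraction(self) -> float:
        """Share of the weights that is active for any one token."""
        return self.active_b / self.total_b


# Figures as quoted in the cards (vendor-reported, not independently verified).
MODELS = [
    MoEModel("TML-Interaction-Small", 276, 12),
    MoEModel("DeepSeek V4 Pro", 1600, 49),
    MoEModel("DeepSeek V4 Flash", 284, 13),
    MoEModel("Kimi K2.6", 1000, 32),
]

# Print the sparsest model first.
for m in sorted(MODELS, key=lambda m: m.active_fraction):
    print(f"{m.name}: {m.active_b:g}B of {m.total_b:g}B active ({m.active_fraction:.1%})")
```

All four land in a narrow band of a few percent active, so the differences the cards stress (context length, agent runtime, hardware path) are not explained by sparsity alone.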
HKR breakdown
hook · knowledge · resonance
SCORE 88 · H1·K1·R1
2026-04-20 · Mon
2026-04-18 · Sat
2026-04-16 · Thu
2026-04-07 · Tue
● P1 · Latent Space · rss · EN · 17:14 · 04·07
Extreme Harness Engineering for Token Billionaires: 1M LOC, 1B toks/day, 0% human code, 0% human review
OpenAI Frontier says it built an internal beta over five months with a repo above 1M LOC, over 1B tokens per day, and 0% human-written or human-reviewed code before merge. The post says the team treated failures as missing capability, context, or structure, then used Symphony orchestration, specs, tests, observability, and sub-1-minute build loops to constrain Codex. The shift to watch is from humans reviewing code to humans designing the harness; the $2k-$3k/day cost is cited secondhand in the post.
#Agent #Code #Tools #OpenAI
why featured
HKR-H/K/R all pass: the headline is clickworthy, and the piece includes concrete workflow details plus scale numbers. It stays below p1 because this is an interview-style report, not an official launch, and key claims like 1B tokens/day and cost lack independent verification.
editor take
OpenAI Frontier moved review upstream into tests and orchestration. I buy that part; “0% human review” sounds more like process discipline than model reliability.
sharp
OpenAI Frontier says it built an internal beta in five months with a repo above 1M LOC and more than 1B tokens a day. That points to a shift I do buy: the bottleneck for coding agents is no longer “can the model write code,” but “can your system cage failure.” The solid part here is not the slogan about 0% human-written code or 0% pre-merge human review. It is the operating model: classify failures as missing capability, context, or structure, then constrain the agent with specs, tests, observability, and sub-minute build loops. That is a serious change in where engineering control sits.
A lot of teams still use coding agents like fancy autocomplete with a longer memory. The 2025 wave of products, from Cursor’s background workflows to Devin-style autonomous task execution, already showed that agents can touch many files, open PRs, and run some checks. But the default safety model still assumed a human reviewer at the end. OpenAI is describing a different posture: move the control point upstream into the harness. In a million-line codebase, that is not cosmetic. Human review often catches local style and obvious logic bugs; it is weak at system-wide regressions. Tests, evaluators, rollout gates, and observability are much closer to the actual control plane.
I still have some doubts about the “0% human review” framing. The article gives repo scale, token consumption, and the broad mechanism. It does not disclose defect rates, rollback frequency, incident counts, escaped bugs, or a speed comparison against a human-led team. Without those numbers, “0% review” is a management signal, not a reliability conclusion. A team can skip pre-merge review only if the acceptance surface is brutally explicit: strong tests, hard release gates, good isolation, fast rollback, and instrumentation that catches regressions early. If the harness has blind spots, the model just makes the wrong thing faster.
I also don’t fully buy the cost discourse as presented. The $2k–$3k per day figure is cited secondhand in the post, not disclosed as an official bill. Even if that estimate is directionally right for 1B tokens/day, token spend is not the hard part for a frontier lab, and for some startups it still would not be the main constraint. The expensive piece is the discipline needed to maintain the harness: PRDs that read like executable contracts, one-minute build loops, evals that mean something, and a team habit of filing each failure under capability, context, or structure instead of shrugging that “the model was weird today.” Plenty of readers will take this as “burn more tokens.” I read the opposite. Without a test factory, more tokens just buy you more noise.
There is also a broader product signal here that the article only hints at. OpenAI is using its own coding stack at a very high intensity. That is different from routine dogfooding. It suggests the product is moving away from the IDE-plugin frame and toward a constrained software factory. If Symphony-style multi-agent orchestration is reproducible, senior engineers will spend less time writing business logic and more time defining specs, tests, evaluators, and release policies. That is a real labor shift. We have seen pieces of this before in SWE-bench chasing, autonomous PR demos, and internal devtools teams building eval harnesses around codegen. OpenAI is packaging those fragments into an operating doctrine.
My pushback is portability. This probably works inside OpenAI because several luxuries line up at once: tight coupling to their own models, deep tool integration, huge token budgets, and a direct path to feed failures back into the system. The article does not prove that an ordinary company can reproduce the same result with off-the-shelf agents on a messy legacy stack. A lot of autonomous coding demos over the last year broke at exactly that boundary: clean repo in the demo, ugly dependencies in production.
So yes, this is important. But what it proves is narrower than the headline suggests. It shows that a very strong harness can hold a very strong agent. It does not yet show that most software teams can run a dark factory by copying the playbook.
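The triage habit described above (every failure filed under capability, context, or structure) can be sketched as a small intake function. This is an illustration under stated assumptions, not OpenAI's tooling: `FailureKind`, `Failure`, `triage`, `summarize`, and the sample log entries are all hypothetical names and data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    """The three buckets named in the post."""
    CAPABILITY = "capability"
    CONTEXT = "context"
    STRUCTURE = "structure"


@dataclass(frozen=True)
class Failure:
    task_id: str
    kind: FailureKind
    note: str


def triage(task_id: str, label: str, note: str) -> Failure:
    """File a failure under exactly one bucket; refuse any other label."""
    try:
        kind = FailureKind(label.strip().lower())
    except ValueError:
        allowed = ", ".join(k.value for k in FailureKind)
        raise ValueError(f"unclassified failure {label!r}; use one of: {allowed}") from None
    return Failure(task_id, kind, note)


def summarize(failures: list[Failure]) -> Counter:
    """Count failures per bucket to show where the harness is weakest."""
    return Counter(f.kind for f in failures)


# Hypothetical log entries, for illustration only.
log = [
    triage("task-1", "context", "agent never saw the migration spec"),
    triage("task-2", "structure", "no test covered the edge case"),
    triage("task-3", "Context", "stale API docs in the prompt"),
]
for kind, count in summarize(log).most_common():
    print(f"{kind.value}: {count}")
```

The design choice worth noting is the hard rejection: a label such as "model was weird" raises an error instead of being stored, which is the discipline the commentary argues is more expensive than the tokens.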
HKR breakdown
hook · knowledge · resonance
SCORE 88 · H1·K1·R1
2026-04-03 · Fri
