ax@ax-radar:~/papers $ grep -E 'arxiv|paper' sources/tags
45 src · signal 72% · cycle 04:32

papers · 2026-05-15

209 papers · updated 3m ago
2026-05-15 · Fri
17:42
24d ago
● P1 · arXiv · cs.CL · atom · EN · 17:42 · 05·15
FORGE: Self-Evolving Agent Memory Without Weight Updates
FORGE improves hierarchical ReAct agents on the 30-step CybORG CAGE-2 B-line task across four LLM families, raising average evaluation return by 1.7-7.7x over zero-shot and 29-72% over Reflexion without weight updates.
#Agent #Memory #Reasoning #Gemini
why featured
HKR-H/K/R all pass: the paper offers a concrete no-weight-update memory mechanism and testable CAGE-2 gains across 4 LLM families. It stays below P1 because this is still an arXiv benchmark result, not a shipped product or broad field event.
editor take
FORGE’s population-broadcast memory looks useful, but the evidence lives inside CAGE-2 B-line; don’t sell it as general agent learning yet.
sharp
Two arXiv tracks, cs.CL and cs.LG, point to the same 2605.16233v1 paper with identical framing; that is taxonomy spread, not independent corroboration. Under CAGE-2, 30-step horizon, B-line attacker, FORGE reports 1.7-7.7x average return over zero-shot and 29-72% over Reflexion across four model families. I buy the engineering instinct here: failed trajectories become Rules or Examples, then the best instance’s memory gets broadcast to the population. That is a stronger agent-training scaffold than isolated Reflexion loops. But the authors also fence the claim tightly: all evidence is confined to CAGE-2 B-line. Compared with the Voyager/Reflexion lineage, FORGE’s clean win is no weight update; its unresolved risk is open-ended tasks, long-horizon drift, and memory contamination.
HKR breakdown
hook knowledge resonance
open source
90
SCORE
H1·K1·R1
17:34
24d ago
arXiv · cs.AI · atom · EN · 17:34 · 05·15
Evaluating Design Video Generation: Metrics for Compositional Fidelity
The paper proposes a fully automated evaluation framework for design animation generation, covering four dimensions: layout fidelity, motion correctness, temporal quality, and content fidelity.
#Multimodal #Vision #Benchmarking #Research release
why featured
HKR-K passes via a concrete 4-axis evaluation framework, but HKR-H and HKR-R are weak: no surprising hook, no disclosed benchmark size, results, or artifact. This fits the 60–71 research-interest band.
editor take
The paper defines 4 automated metrics, but no dataset size is disclosed. Design-video generation needs rulers before victory laps.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
17:23
24d ago
arXiv · cs.CL · atom · EN · 17:23 · 05·15
Cost-Performance Study of Compound LLM Agents in Adversarial POMDP
The paper evaluates compound LLM agents in CybORG CAGE-2 across five model families, six models, twelve configurations, and 3,475 episodes with token-level cost accounting. Programmatic state abstraction raises mean return by up to 76%, while distributed deliberation tools in hierarchies produce up to 3.4× worse mean return and use 1.8–2.7× more tokens.
#Agent #Reasoning #Tools #CybORG CAGE-2
why featured
HKR-K is strong, with concrete scale and effect sizes; HKR-H comes from the 3.4x return gap. The CybORG CAGE-2 setting is niche and academic, so it stays below featured.
editor take
Across 3,475 CybORG CAGE-2 episodes, state abstraction gained up to 76% and hierarchical deliberation lost up to 3.4×; agent stacks need plumbing, not more pondering.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
12:09
24d ago
HuggingFace Papers (takara mirror) · rss · EN · 12:09 · 05·15
Linked Multimodal Data on Russian Domestic and Foreign Policy Speeches
The paper introduces a Russian government political communication dataset covering decades of speeches from Kremlin and Russian Ministry of Foreign Affairs actors, with Russian and English texts, available images, captions, linked identifiers, harmonized metadata, and expert-refined multimodal topic annotations.
#Multimodal #Vision #Benchmarking #Kremlin
why featured
HKR-K lands because the corpus combines Russian/English speeches, images, captions, and metadata. HKR-H and HKR-R miss: no product, model capability, or practitioner-facing industry mechanism is disclosed.
editor take
The dataset spans decades of Kremlin and MFA speeches; sample size is undisclosed, so don't call it a benchmark yet.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
05:09
25d ago
HuggingFace Papers (takara mirror) · rss · EN · 05:09 · 05·15
LRCP: Low-Rank Compressibility Guided Visual Token Pruning for Efficient LVLMs
LRCP estimates the dominant low-rank subspace of visual tokens with PCA and keeps tokens with high projection residuals, preserving 94.7% of original image-understanding performance after an 88.9% token reduction and 97.8% average video-understanding accuracy after an 87.5% token reduction.
#Multimodal #Vision #Inference-opt #LRCP
why featured
HKR-K/R pass: the paper offers a concrete PCA-based pruning mechanism and a cost/latency angle. HKR-H is weak, and a single technical paper without implementation or real latency data stays in the 60–71 band.
editor take
LRCP cuts 88.9% of visual tokens while keeping 94.7% image performance; I buy PCA residuals over attention-score pruning.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
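The LRCP card above describes keeping the visual tokens that a dominant low-rank subspace explains worst. A minimal sketch of that idea, assuming a rank-1 subspace found by power iteration on centered token vectors. The function names `top_direction` and `prune_by_residual`, the rank, and the toy data are hypothetical and not taken from the paper:

```python
import math

def top_direction(rows, iters=100):
    """Power iteration for the leading principal direction of centered rows."""
    dim = len(rows[0])
    v = [1.0 / math.sqrt(dim)] * dim  # fixed start keeps the sketch deterministic
    for _ in range(iters):
        # w = (X^T X) v, computed without forming the covariance matrix
        w = [0.0] * dim
        for x in rows:
            proj = sum(a * b for a, b in zip(x, v))
            for j in range(dim):
                w[j] += proj * x[j]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # all rows identical after centering
            break
        v = [c / norm for c in w]
    return v

def prune_by_residual(tokens, keep):
    """Return sorted indices of the `keep` tokens with the largest residual."""
    n, dim = len(tokens), len(tokens[0])
    mean = [sum(t[j] for t in tokens) / n for j in range(dim)]
    centered = [[t[j] - mean[j] for j in range(dim)] for t in tokens]
    v = top_direction(centered)
    residuals = []
    for x in centered:
        proj = sum(a * b for a, b in zip(x, v))
        residuals.append(math.sqrt(sum((x[j] - proj * v[j]) ** 2 for j in range(dim))))
    order = sorted(range(n), key=lambda i: residuals[i], reverse=True)
    return sorted(order[:keep])

# Four tokens lie on a line; the fifth sticks out of that subspace.
tokens = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [2.5, 1.0]]
kept = prune_by_residual(tokens, keep=1)
```

In this toy case the token that leaves the shared line is the one retained, which is the intuition behind residual-based pruning: redundancy lives in the subspace, novelty lives outside it.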
04:00
25d ago
● P1 · arXiv · cs.LG · atom · EN · 04:00 · 05·15
Circuit Attribution Enables Machine Unlearning to Persist Through Quantization
The paper introduces MANSU, which combines circuit attribution, null-space projection, and a per-parameter magnitude floor to keep unlearning intact after 4-bit NF4 quantization; across baselines, per-parameter updates sit 47-828x below the quantization bin width, and gradient-based baselines recover up to +0.05 accuracy under compression.
#Alignment #Safety #Interpretability #MANSU
why featured
HKR-H/K/R all pass: the title has a counterintuitive hook, the summary gives MANSU’s mechanism and the 47-828x quantization-bin gap, and the safety risk is deployment-relevant. Single arXiv paper, so it stays in the 78-84 band.
editor take
Two arXiv tracks are not media heat; they are taxonomy spillover. Still, the 47–828x update gap nails a real audit hole in post-unlearning quantization.
sharp
cs.LG and cs.CL list the same arXiv v1, with identical framing; the signal is author-supplied, not independently corroborated. The strongest hook is concrete: baseline per-parameter updates sit 47–828x below the NF4 quantization bin width, and 4-bit PTQ recovers up to +0.05 accuracy after unlearning. I buy the problem framing. Full-precision unlearning evals are too clean for a deployment path that usually ends in 4-bit or NF4 inference. MANSU’s recipe—circuit attribution, null-space projection, diagonal-Fisher retain bound, and a magnitude floor—sounds more serious than another behavioral suppression loop. But the body surfaced here only names “multiple model families” and “hazard benchmarks,” without model names or tables. Treat this as a sharp mechanistic paper, not a compliance-ready unlearning recipe yet.
HKR breakdown
hook knowledge resonance
open source
90
SCORE
H1·K1·R1
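The 47-828x gap in the MANSU card matters because round-to-nearest quantization maps any weight change much smaller than the bin width back to the same code. A toy illustration, assuming a uniform grid (NF4 itself uses non-uniform levels) and a hypothetical bin width and weight; none of these values come from the paper:

```python
def quantize_code(w, bin_width):
    """Integer code under round-to-nearest on a uniform grid (a simplification)."""
    return round(w / bin_width)

BIN = 0.1          # hypothetical bin width
W_ORIGINAL = 0.42  # hypothetical weight before unlearning

# An unlearning update 47x smaller than the bin width.
w_small_update = W_ORIGINAL + BIN / 47
# An update pushed up to one full bin width, in the spirit of a magnitude floor.
w_floored_update = W_ORIGINAL + BIN

# True means the update is still visible after quantization.
survives_small = quantize_code(w_small_update, BIN) != quantize_code(W_ORIGINAL, BIN)
survives_floored = quantize_code(w_floored_update, BIN) != quantize_code(W_ORIGINAL, BIN)
```

Under these assumptions the small update is erased and the floored update survives, which is why a full-precision unlearning eval can look clean while the quantized model reverts.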
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Reinforcement Learning for Tool-Calling Agents in Fast Healthcare Interoperability Resources (FHIR)
The authors post-train a multi-turn CodeAct agent with reinforcement learning for FHIR-AgentBench, raising answer correctness from 50% with o4-mini to 77% with the smaller Qwen3-8B under execution-grounded LLM-judge rewards and data-integrity constraints.
#Agent #Reasoning #Tools #Qwen
why featured
HKR-K and HKR-R pass: the benchmark delta is concrete, and small vertical tool agents are relevant to practitioners. The narrow FHIR scope and single arXiv source keep it below featured.
editor take
RL-trained Qwen3-8B reaches 77% on FHIR-AgentBench versus 50% for o4-mini; healthcare agents need trained traversal discipline, not another tool wrapper.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
GEAR: Self-Distillation Method for Granularity-Adaptive Advantage Reweighting in LLM Agents
GEAR reshapes trajectory-level GRPO advantages with self-distillation signals, and experiments on eight mathematical reasoning and agentic tool-use benchmarks using Qwen3 4B and 8B models report consistent gains over GRPO, self-distillation baselines, and token- or turn-level credit assignment, with improvements reaching about 20% over GRPO on harder long-horizon settings.
#Agent #Reasoning #Fine-tuning #Qwen
why featured
HKR-K and HKR-R pass through a concrete GEAR mechanism and gains of up to about 20% over GRPO, reported on the harder long-horizon settings among 8 benchmarks. HKR-H is weak, and the narrow RL-training scope keeps it below featured.
editor take
GEAR reports up to 20% over GRPO on 8 benchmarks; I buy the direction—long-horizon credit assignment gets a usable scalpel.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H0·K1·R1
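GEAR starts from trajectory-level GRPO advantages. For orientation, a minimal sketch of the commonly used group-relative advantage: reward minus the group mean, divided by the group standard deviation. GEAR's self-distillation reweighting is not shown because the feed does not specify it, and `group_advantages` and the rewards below are hypothetical:

```python
import statistics

def group_advantages(rewards, eps=1e-8):
    """One scalar advantage per sampled trajectory in a group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Four rollouts for one prompt: two succeed, two fail.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_advantages(rewards)
```

Every token in a trajectory inherits that single scalar, which is the coarse credit assignment that finer-grained reweighting methods try to sharpen.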
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces
The paper defines minimal cores for overcomplete reasoning traces across six reasoning benchmarks, finding that 46% of steps are removable on average while preserving the original answer in 86% of cases, and the top three steps account for 65% of measured necessity mass.
#Reasoning #Interpretability #Benchmarking #Research release
why featured
HKR-H/K/R all pass, but this is a single arXiv paper with abstract-level numbers only; no tool, code, or adoption evidence is disclosed, so it stays in the lower 60–71 band.
editor take
Across six benchmarks, 46% of reasoning steps can be removed on average while the answer is preserved in 86% of cases; long traces carry dead tokens.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy
Langzhou He and nine coauthors propose ActFocus, a token-level energy-informed reweighting method for agentic reinforcement learning; across four environments and multiple model sizes, it beats PPO and GRPO by up to 65.2 and 63.7 percentage points at the final step without extra runtime or memory cost.
#Agent #Reasoning #Fine-tuning #Langzhou He
why featured
HKR-K is strong and HKR-H has a concrete method hook, but this is a single arXiv training paper with no disclosed code, reproduction setup, or adoption signal, so it stays in the 60–71 band.
editor take
ActFocus beats PPO by up to 65.2 points across 4 environments; I buy action-token bottlenecks, pending task complexity in the PDF.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Hand-in-the-Loop: Improving Dexterous VLA via Seamless Interventional Correction
HandITL blends human corrective intent with autonomous policy execution for bimanual dexterous manipulation, reducing takeover jitter by 99.8%, grasp failures by 87.5%, mean completion time by 19.1%, and producing policies that outperform standard teleoperation-trained policies by 19% on average across three long-horizon tasks.
#Robotics #Agent #Multimodal #HandITL
why featured
HKR-H/K/R all pass, but this is a single arXiv robotics paper. The post gives the mechanism and headline metrics, not task scale, baselines, or code, so it stays at the high end of 60–71.
editor take
HandITL cuts takeover jitter 99.8%; strong result, but three long-horizon tasks is not general dexterous VLA yet.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing
VER distills multiple vision foundation models into an expert library for robot learning, fine-tunes only a routing network with fewer than 0.4% of parameters for downstream tasks, and reports state-of-the-art results across 17 robotic tasks with multiple policy heads.
#Vision #Robotics #Fine-tuning #Research release
why featured
HKR-H/K/R all pass: the paper reports <0.4% tuning, 17-task SOTA, and dynamic routing. It stays below featured because it is a single arXiv research item without a named lab, artifact, or cross-source pickup.
editor take
VER tunes under 0.4% of parameters across 17 robot tasks; expert routing is practical, but SOTA needs real-robot replication.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
The paper proposes QAOD, a single-pass white-box hallucination detection framework that removes question-aligned directions from answer representations; on BioASQ out-of-distribution transfer, its orthogonal-only probe beats the best white-box baseline by up to 21% while using under 25% of generation cost.
#Safety #Interpretability #Benchmarking #QAOD
why featured
HKR-H/K/R all pass, but this is a single arXiv method paper limited to BioASQ OOD and white-box detection. Without a major lab release, tool artifact, or cross-source uptake, it stays in the 60–71 band.
editor take
QAOD beats the best white-box baseline by up to 21% on BioASQ OOD transfer; hallucination probes are finally taking domain shift seriously.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
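Removing question-aligned directions from an answer representation is, in its simplest form, an orthogonal projection. A minimal sketch, assuming a single question direction and plain Python lists as vectors. QAOD's actual subspace construction and probe are not described in the feed, and `remove_direction` is a hypothetical helper:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def remove_direction(answer, question):
    """Return the component of `answer` orthogonal to `question`."""
    qq = dot(question, question)
    if qq == 0.0:  # a zero question vector defines no direction to remove
        return list(answer)
    scale = dot(answer, question) / qq
    return [a - scale * q for a, q in zip(answer, question)]

# Hypothetical 3-dimensional representations.
question_vec = [1.0, 1.0, 0.0]
answer_vec = [2.0, 0.0, 3.0]
residual = remove_direction(answer_vec, question_vec)
```

The residual has zero inner product with the question vector, so a probe trained on it sees only what the answer adds beyond the question.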
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Test-Time Learning with an Evolving Library
EvoLib lets large language models accumulate skills and reflective insights across test instances without parameter updates or external supervision, using a shared library plus weighting and consolidation to turn instance-specific abstractions into reusable knowledge over time.
#Reasoning #Code #Agent #EvoLib
why featured
HKR-H/K/R pass: the no-parameter test-time learning angle is clickable, and EvoLib adds a concrete shared-library mechanism. Score stays in 60–71 because the feed gives no benchmark results, code, or adoption signal.
editor take
EvoLib accumulates skills across tasks without parameter updates; no benchmark numbers disclosed, so I file it under memory engineering beating fine-tune iteration.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Collider-Bench: Benchmarking AI Agents with Particle Physics Analysis Reproduction
Collider-Bench evaluates LLM agents by asking them to reproduce LHC experimental analyses using only public papers and open scientific software, then scores predicted collision event yields with histogram metrics, per-task compute cost, and an LLM judge for qualitative failures; the paper reports that no agent reliably beats the physicist-in-the-loop solution on average.
#Agent #Code #Benchmarking #Collider-Bench
why featured
HKR-H/K/R all pass, but this is an arXiv domain benchmark with a high particle-physics barrier and weaker spread than general agent evals. No hard exclusion; it fits the 60-71 interesting band.
editor take
Collider-Bench makes agents reproduce LHC analyses and submit event yields; none reliably beats physicist-in-the-loop on average.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning
The paper evaluates federated fine-tuning on the Sherpa.ai Federated Learning platform across four healthcare and finance datasets: MedQA, MedMCQA, FPB, and FiQA-SA. It compares LoRA, QLoRA, and IA3 under non-IID institutional settings, and reports performance close to centralized training, better results than isolated single-institution learning, and higher efficiency from QLoRA and IA3 with limited accuracy loss.
#Fine-tuning #Benchmarking #Sherpa.ai #Research release
why featured
HKR-K/R pass: the paper adds a healthcare/finance federated fine-tuning benchmark and method comparison, tied to private-data training pain. HKR-H is weak, and as a single arXiv item with no code or cross-source pickup, it stays below featured.
editor take
The paper tests federated tuning on 4 health/finance datasets; I don’t buy the “next frontier” label without node counts or privacy-attack evals.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Proxy Compression for Language Modeling
The paper introduces proxy compression, training one language model on raw byte sequences and externally compressed views while using only raw bytes at inference; code language modeling experiments show better fixed-compute efficiency than pure byte-level baselines, but the RSS snippet does not disclose exact improvement numbers.
#Inference-opt #Code #Research release #Open source
why featured
HKR-H and HKR-K pass: the train-time proxy versus raw-byte inference setup is concrete. HKR-R fails because no gain numbers, deployment target, or named lab push it beyond an interesting research item.
editor take
Proxy compression trains on bytes plus compressed views, then infers on bytes only; no gains disclosed, so don’t bury tokenizers yet.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Researchers introduce evolutionary multi-agent system for code solving
The paper introduces EvE, a decentralized co-evolving system for existing coding agents; it maintains two populations, code solvers and guidance states, and evaluates marginal gains through synchronous races with empirical Elo updates.
#Agent #Code #Reasoning #EvE
why featured
HKR-K and HKR-R pass: EvE has a concrete mechanism and targets coding-agent orchestration. The post lacks performance numbers, an open artifact, or production evidence, so it stays in the 60–71 band.
editor take
EvE scores agent marginal gains via synchronous races and Elo; ICON is neat, but benchmarks, code, and cost are undisclosed.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
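The EvE card mentions Elo updates after synchronous races. As a reference point, the standard two-player Elo update looks like this. The K-factor of 32, the starting ratings, and the function names are assumptions, and EvE's empirical variant may differ:

```python
def expected_score(r_a, r_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32.0):
    """score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for a loss."""
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Two equally rated solvers race once; solver A wins.
solver_a, solver_b = elo_update(1500.0, 1500.0, score_a=1.0)
```

With equal starting ratings the winner gains half the K-factor and the loser gives up the same amount, so total rating is conserved across the population.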
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Conformal Thinking: Risk Control for Reasoning on a Compute Budget
The paper frames reasoning token budget selection as a risk-control problem, using a target risk and validation set to set upper and parametric lower stopping thresholds, and reports compute-efficiency gains across multiple reasoning tasks while keeping error rates within the user-specified risk target.
#Reasoning #Inference-opt #Research release
why featured
HKR-K and HKR-R pass: it reframes reasoning-token budgets as risk control, relevant to cost-sensitive teams. No concrete savings rate, task list, or code is disclosed, so it stays in the 60–71 band.
editor take
Conformal Thinking sets stopping thresholds from target risk plus validation data; I like the framing, but the abstract omits token savings.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves
CurveBench introduces 756 images of non-intersecting Jordan curves and asks models to recover the full rooted containment tree from visual input; Gemini 3.1 Pro reaches 71.1% tree-generation accuracy on CurveBench-Easy and 19.1% on CurveBench-Hard.
#Vision #Reasoning #Benchmarking #Gemini
why featured
HKR-H/K/R all pass, but this is a niche arXiv benchmark rather than a major model or product release. Concrete scores justify the upper 60–71 band, not featured.
editor take
Gemini 3.1 Pro scores 19.1% on Hard; CurveBench is another clean reminder that VLM vision still fails exact topology.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning
MoMo uses a scalar user preference to modulate plan conservativeness at inference time without retraining; the paper reports results across six environments, where MoMo adjusts plan safety smoothly and improves temporal and preferential consistency over state-augmentation baselines.
#Reasoning #MoMo #Research release
why featured
HKR-H/K/R pass, but this is still an arXiv methods paper. The 6-environment result and no-retraining mechanism are useful; product impact or broad replication is not shown.
editor take
MoMo tunes plan conservativeness with one scalar across six environments. Nice no-retrain knob; failure rates stay undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation
RxEval evaluates LLM medication recommendation with 1,547 multiple-choice questions covering 584 patients, 18 diagnostic categories, and 969 unique medications; across 16 LLMs, F1 ranges from 45.18 to 77.10, and the best Exact Match reaches only 46.10%.
#Reasoning #Benchmarking #RxEval #Research release
why featured
HKR-K and HKR-R pass: the benchmark gives concrete numbers and targets high-risk medical use. HKR-H is weak, and a single arXiv benchmark without product impact stays in 60-71.
editor take
RxEval tests 16 models; best Exact Match is 46.10%. Medication copilots still fail on stated patient facts.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Selective Safety Steering via Value-Filtered Decoding
The paper proposes value-filtered decoding, a test-time steering method that filters tokens with a value-based safety criterion and uses one threshold hyperparameter to control an explicit bound on false-intervention probability.
#Safety #Alignment #Inference-opt #Research release
why featured
HKR-K/R pass: it offers a concrete inference-time safety decoding mechanism and speaks to over-refusal cost. HKR-H is weak, and the feed gives no experiment scale or model results, so this stays in the interesting research band.
editor take
Value-filtered decoding bounds false interventions with one threshold; I buy the target, since safety steering often mangles safe answers.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
ClawGym: A Scalable Framework for Building Effective Claw Agents
ClawGym provides a framework for Claw-style personal agent development, with 13.5K synthesized tasks, 200 benchmark instances, and ClawGym-Agents trained via supervised fine-tuning plus a lightweight reinforcement-learning rollout pipeline.
#Agent #Tools #Benchmarking #ClawGym
why featured
HKR-K passes with task counts, benchmark size, and training recipe; HKR-R passes because agent evaluation tooling is a live pain point. HKR-H is weak, and this is a single arXiv paper, so it stays in all.
editor take
ClawGym ships 13.5K tasks and a 200-case bench; I buy the data loop, not the “soon released” IOU.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
AIS: Adaptive Importance Sampling for Quantized RL
AIS adds three real-time diagnostics to GRPO to tune importance sampling per batch, and on LLaDA-8B-Instruct, Qwen3-8B, and Qwen3.5-9B it matches the BF16 baseline on most mathematical reasoning and planning tasks while retaining FP8 rollout speedups from 1.5x to 2.76x.
#Reasoning #Fine-tuning #Inference-opt #LLaDA
why featured
HKR-K and HKR-R pass: the paper gives 3 GRPO diagnostics and 1.5–2.76x FP8 rollout speedups, tied to post-training cost. HKR-H fails, and the method is too technical for featured.
editor take
AIS uses 3 diagnostics to tune GRPO weights; keeping 1.5-2.76x FP8 rollout speed makes this a stability patch, not mere quantization thrift.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Pause and Reflect: Conformal Aggregation for Chain-of-Thought Reasoning
The paper introduces conformal aggregation for Chain-of-Thought reasoning, replacing majority voting with weighted score aggregation and a conformal abstention rule, and reports finite-sample guarantees on confident-error rate across four benchmarks, four open-source models, and three score classes; on GSM8K, it reaches 90.1% selective accuracy while abstaining on under 5% of problems, versus 82% accuracy for majority voting.
#Reasoning #Inference-opt #Benchmarking #Research release
why featured
HKR-H and HKR-K pass: the abstention mechanism and GSM8K figure are concrete. Impact remains an arXiv methods paper without major-model, cost, or deployment evidence, so it stays in the 60–71 band.
editor take
Conformal CoT hits 90.1% selective accuracy on GSM8K; abstaining under 5% for +8.1 points is an engineering trade I buy.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
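The +8.1-point figure in the conformal aggregation card compares selective accuracy (accuracy on the problems actually answered) with plain majority-vote accuracy. A minimal sketch of weighted score aggregation with an abstention rule. The threshold here is a fixed hypothetical value, whereas the paper calibrates its rule conformally, and `aggregate` and the sample scores are invented for illustration:

```python
from collections import defaultdict

def aggregate(samples, threshold):
    """samples: non-empty list of (answer, positive score).

    Returns the top-weighted answer, or None to abstain when its
    share of the total score falls below `threshold`.
    """
    totals = defaultdict(float)
    for answer, score in samples:
        totals[answer] += score
    best = max(totals, key=totals.get)
    share = totals[best] / sum(totals.values())
    return best if share >= threshold else None

# Strong agreement: answer is returned.
confident = aggregate([("42", 0.9), ("42", 0.8), ("41", 0.2)], threshold=0.7)
# Fragmented scores: the rule abstains instead of guessing.
uncertain = aggregate([("42", 0.5), ("41", 0.4), ("40", 0.3)], threshold=0.7)
```

Majority voting would still answer "42" in the second case; the abstention rule trades a little coverage for fewer confident errors.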
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Boosting LLM Reasoning via Human-Inspired Reward Shaping
The paper introduces T2T, a reward-shaping framework that encourages broader search on incorrect attempts and applies length penalties after correctness; experiments across 5 mainstream LLMs on MATH-500, AIME, and AMC report better performance than standard GRPO and recent baselines.
#Reasoning #Alignment #Benchmarking #Research release
why featured
HKR-H/K pass: the mechanism is concrete and tested on 5 models across 3 math benchmarks. HKR-R is weak because gains, code, and training cost are not disclosed, keeping it in the normal research band.
editor take
T2T expands search on failures and penalizes length after correctness; 5 LLMs beat GRPO on 3 math sets, but gains aren't disclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
V2M-Zero: Zero-Pair Time-Aligned Video-to-Music Generation
V2M-ZERO trains a text-to-music model on intra-modal music event curves, swaps in video event curves at inference, and reports state-of-the-art results on OES-Pub, MovieGenBench-Music, and AIST++ without paired video-music data, including 21-52% better temporal synchronization and 28% higher beat alignment on dance videos.
#Multimodal #Audio #Fine-tuning #V2M-ZERO
why featured
HKR-H and HKR-K pass: zero-pair training plus 21-52% sync gains give a concrete mechanism and number. HKR-R is narrow, limited to music-generation research, so it stays below featured.
editor take
V2M-ZERO claims 21–52% better sync with zero paired training; clever shortcut, but benchmark bias can flatter event-curve methods.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Ready from Day 1: Population-Aware Coordination for Large-Scale Constrained Multi-Agent Systems
The paper proposes population-aware coordination interfaces that condition learned primal and dual maps on compact population summaries, reducing forecast error by 16–19% and capacity violations by 20–51% versus population-unaware baselines in a supply-chain capacity-control case study, while 20K-agent cohorts coordinate 500K-agent populations and simulator-trained primal maps reach 11.1% MAPE on real observations.
#Agent #Robotics #Benchmarking #arXiv
why featured
HKR-K is strong: the paper gives a concrete coordination mechanism and supply-chain deltas. HKR-R passes for constraint failures in multi-agent deployment, but HKR-H is weak and single-source arXiv limits the score.
editor take
Population summaries let 20K agents coordinate 500K; I buy the direction—constrained MAS needs less policy flexing, more planner interfaces.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
InfoSFT: Learn More and Forget Less with Information-Aware Token Weighting
InfoSFT changes the SFT objective with medium-confidence token weighting and reports better generalization than vanilla SFT and likelihood-weighted baselines across math, code, and chain-of-thought tasks, while preserving prior capabilities; the abstract describes a one-line token-wise loss modification but does not disclose exact scores in the RSS snippet.
#Fine-tuning #Reasoning #Code #InfoSFT
why featured
HKR-K/R pass: InfoSFT offers a concrete SFT loss mechanism and claims gains on math, code, and CoT. No effect sizes, author authority, or reproducibility details are disclosed, so it stays in the interesting-research band.
editor take
InfoSFT changes one token-loss line; RSS gives no scores. I buy the direction, not the free-lunch framing.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Diagnosing Training-Inference Mismatch in LLM Reinforcement Learning
The paper introduces VeXact to isolate training-inference mismatch in LLM reinforcement learning, where rollout generation and policy optimization assign different token probabilities under identical weights. The authors report that small token-level numerical disagreements can independently cause training collapse, alter the effective optimization problem, and require systems-level remedies rather than being treated as benign numerical noise.
#Alignment #Inference-opt #Research release
why featured
HKR-H/K/R all pass, but the item is an arXiv paper with abstract-level facts only; no code, scale, or external replication is disclosed. It stays in the upper all band, below featured.
editor take
VeXact reproduces token-probability drift under identical weights; stop blaming reward first, inference-stack numerics need acceptance tests.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs
UQ4CT calibrates confidence in the functional space induced by prompt-dependent mixtures of LoRA experts, and the paper reports over 25% lower Expected Calibration Error across four multiple-choice benchmarks and two open-ended generative QA tasks while preserving high accuracy under distribution shift.
#Fine-tuning #Alignment #Benchmarking #Research release
why featured
HKR-K/R pass: the paper gives a prompt-dependent LoRA expert-mixture mechanism and a >25% ECE drop, touching fine-tuned LLM deployment risk. HKR-H is weak because the title is technical and lacks a product hook.
editor take
UQ4CT cuts ECE by over 25% on 6 tasks; useful for LoRA calibration, but the generalization bill stays unpaid.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
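Expected Calibration Error, the metric behind the 25% figure in the UQ4CT card, is typically computed by binning predictions by confidence and averaging the gap between accuracy and mean confidence in each bin. A minimal sketch with equal-width bins. The bin count and toy predictions are assumptions, and the paper's exact binning is not stated in the feed:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - confidence| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

# Two overconfident predictions and two roughly calibrated ones.
confs = [0.95, 0.95, 0.55, 0.55]
hits = [True, False, True, False]
ece = expected_calibration_error(confs, hits)
```

A lower value means stated confidence tracks observed accuracy more closely; it says nothing about accuracy itself, which is why the card reports both.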
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
GIFT: Group-Relative Implicit Fine-Tuning Integrates GRPO with DPO and UNA
GIFT combines GRPO-style group sampling, DPO-style implicit rewards, and UNA-style MSE to replace GRPO’s externally tuned beta with prompt-adaptive beta(x), and reports faster convergence than GRPO, DAPO, and GSPO on 7B-32B backbones.
#Fine-tuning #Reasoning #Alignment #GIFT
why featured
HKR-K and HKR-R pass: the mechanism and 7B-32B comparisons add signal, and GRPO alternatives matter to post-training teams. HKR-H fails because the title is jargon-heavy, so this stays below featured.
editor take
GIFT reports faster convergence on 7B-32B than GRPO, DAPO, and GSPO; endogenous beta(x) attacks a real RLVR tuning tax.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Adaptive Consensus in LLM Ensembles via Sequential Evidence Accumulation: Automatic Budget Identification and Calibrated Commit Signals
DASE uses adaptive stopping for iterative LLM ensembles, committing on consensus and falling back to global frequency under fragmented evidence; on GPQA-Extended with N=546 and a 70B ensemble, its commit-type partition produced an 81.1% right-wall accuracy versus 41.5% left-wall accuracy, a 39.5 percentage-point routing gap.
#Reasoning #Inference-opt #Benchmarking #Research release
why featured
HKR-K/R pass: the paper has a concrete DASE mechanism and GPQA-Extended numbers, plus relevance to ensemble inference budgets. HKR-H is weak, and the feed lacks code, cost savings, or reproduction details, so it stays in 60–71.
editor take
DASE shows a 39.5pp routing gap on GPQA-Extended; I buy adaptive stopping, not more deliberation by default.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
The paper introduces M²RNN, a nonlinear recurrent architecture with matrix-valued hidden states for language modeling; in a 7B MoE hybrid model, Hybrid M²RNN beats equivalent Gated DeltaNet hybrids by 0.4–0.5 perplexity points while using 3× smaller recurrent-layer states.
#Reasoning#Memory#Benchmarking#M²RNN
why featured
HKR-K is solid: the paper gives comparable perplexity and state-size claims. HKR-R is moderate for model-cost debates, but HKR-H is weak and this is a single arXiv architecture paper, so it stays below featured.
editor take
M²RNN cuts 0.4–0.5 PPL at 7B MoE with 3× smaller state; nonlinear RNNs just beat linear-attention hybrids.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Finding Interpretable Prompt-Specific Circuits in Language Models
The paper introduces ACC++, a circuit-tracing method that extracts attention-causal communication circuits from a single forward pass, without replacement models or patching. Across multiple models and a four-language IOI case study, ACC++ finds many low-dimensional signals with short natural-language descriptions, prompt-specific IOI circuit clusters, reused components across languages, and often language-specific signals.
#Interpretability#Reasoning#arXiv#Research release
why featured
HKR-H/K pass: ACC++ offers a concrete one-forward-pass circuit method and four-language IOI tests. HKR-R is weak, and this is a specialist research release rather than a product or lab-scale milestone.
editor take
ACC++ traces attention-causal circuits in one forward pass; the four-language IOI split between reused heads and language-specific signals is the hook.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
An Interpretable Latency Model for Speculative Decoding in LLM Serving
The paper proposes an interpretable latency model for speculative decoding in LLM serving. It infers effective batch size from request rate via Little’s Law, decomposes prefill, drafting, and verification demand, and validates the model with vLLM measurements across verifier and drafter sizes, sequence lengths, request rates, draft lengths, and acceptance probabilities.
#Inference-opt#Benchmarking#vLLM#Research release
why featured
HKR-K/R pass: the paper gives a testable latency mechanism for speculative decoding and flags degraded gains under load. HKR-H is weak, and the LLM-serving focus keeps it in the 60–71 band.
editor take
The paper uses Little’s Law to estimate batch size and shows vLLM load erodes speculative-decoding speedups; cleaner than offline speedup charts.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
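The Little's Law step in the speculative-decoding card above can be made concrete with two textbook formulas. This is a sketch under stated assumptions, not the paper's latency model: `effective_batch_size` applies L = λW directly, and `expected_tokens_per_step` uses the standard i.i.d.-acceptance approximation for speculative decoding. Both function names are hypothetical.

```python
def effective_batch_size(request_rate_per_s, mean_latency_s):
    """Little's Law: mean in-flight requests L = arrival rate * mean time in system."""
    if request_rate_per_s < 0 or mean_latency_s < 0:
        raise ValueError("rate and latency must be non-negative")
    return request_rate_per_s * mean_latency_s


def expected_tokens_per_step(acceptance_prob, draft_len):
    """Expected tokens emitted per verification step.

    Assumes each drafted token is accepted independently with the same
    probability, which gives (1 - a^(k+1)) / (1 - a) for draft length k.
    """
    if not 0.0 <= acceptance_prob <= 1.0 or draft_len < 0:
        raise ValueError("need 0 <= acceptance_prob <= 1 and draft_len >= 0")
    if acceptance_prob == 1.0:
        return float(draft_len + 1)
    return (1.0 - acceptance_prob ** (draft_len + 1)) / (1.0 - acceptance_prob)


# Illustrative numbers only: 8 requests/s at 2.5 s mean latency.
batch = effective_batch_size(8.0, 2.5)  # 20 requests in flight on average
```

The point of the card's editor take follows from the first function: as request rate rises, the effective batch grows, and the verifier has less idle compute for speculation to exploit.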
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Towards Resource-Efficient LLMs: End-to-End Energy Accounting of Distillation Pipelines
arXiv:2605.13981 presents an end-to-end energy accounting framework for LLM distillation pipelines, logging GPU power by stage and measuring two methods: classic logit-based knowledge distillation and synthetic-data supervised fine-tuning, with energy-quality Pareto frontiers and an open-source measurement harness.
#Fine-tuning#Benchmarking#Inference-opt#Research release
why featured
HKR-K and HKR-R pass: the accounting method and tool are useful for distillation work and touch GPU-cost pain. No quantified savings or broad deployment claim keeps it in the 60–71 research-signal band.
editor take
This paper accounts for full-pipeline GPU energy in logit distillation and synthetic-data SFT; good: teacher-side cost belongs in the bill.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
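Stage-level energy accounting of the kind described in the distillation card above reduces to integrating sampled GPU power over time and attributing each interval to a pipeline stage. The sketch below is a hedged illustration, not the paper's harness: the sample format, the stage names, and the rule that an interval belongs to the stage of its starting sample are all assumptions.

```python
from collections import defaultdict


def energy_by_stage(samples):
    """Integrate power samples into joules per stage.

    samples: list of (timestamp_s, power_w, stage) tuples sorted by time.
    Uses the trapezoidal rule; each interval is attributed to the stage
    of the sample that starts it (an assumption of this sketch).
    """
    totals = defaultdict(float)
    for (t0, p0, stage), (t1, p1, _) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by timestamp")
        totals[stage] += 0.5 * (p0 + p1) * (t1 - t0)
    return dict(totals)


def joules_to_kwh(joules):
    """1 kWh = 3.6e6 J."""
    return joules / 3.6e6


# Hypothetical trace: teacher inference followed by student training.
trace = [
    (0.0, 100.0, "teacher_inference"),
    (10.0, 100.0, "teacher_inference"),
    (20.0, 300.0, "student_training"),
    (30.0, 300.0, "student_training"),
]
per_stage = energy_by_stage(trace)
```

Keeping teacher-side stages as separate keys is what lets the teacher's cost show up in the total rather than being dropped.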
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Minimal-Intervention KV Retention: A Design-Space Study and a Diversity-Penalty Survivor
The paper tests seven KV-cache compression mechanisms on MATH-500 with Qwen-7B and Llama-8B DeepSeek-R1-Distill variants at budgets 64 and 128, rejects all seven, then reports that α with λ=0.5 passes Bonferroni in two of four model-budget cells without significant negative cells.
#Reasoning#Inference-opt#Benchmarking#Qwen
why featured
HKR-K is strong: models, budgets, MATH-500, and Bonferroni-tested negative results are concrete. HKR-R is moderate on inference cost, but HKR-H is weak and the arXiv paper is narrow, so it stays in 60-71.
editor take
Seven KV compressors failed on MATH-500 small budgets; α wins 2/4 cells, so trust the protocol before the method.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
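The "passes Bonferroni in two of four cells" claim in the KV-retention card above depends on a corrected significance threshold. A minimal sketch of the correction follows, assuming the test family is the four model-budget cells and a nominal α of 0.05; the card confirms neither, and the cell keys are hypothetical.

```python
def bonferroni_pass(p_values, alpha=0.05):
    """Return, per cell, whether its p-value clears the Bonferroni threshold.

    p_values: dict mapping a cell label to its raw p-value.
    The per-test threshold is alpha divided by the number of tests.
    """
    m = len(p_values)
    if m == 0:
        return {}
    threshold = alpha / m
    return {cell: p <= threshold for cell, p in p_values.items()}


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests
```

With four cells the per-cell bar drops from 0.05 to 0.0125, which is why a method can look promising on raw p-values and still clear only half the grid.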
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling
OPT-ENGINE introduces a controllable-complexity benchmark covering 10 canonical operations research problems; its experiments show pure-text reasoning loses robustness as complexity increases, external tools fix local arithmetic only, and solver-integrated reasoning is mainly bottlenecked by automated constraint formulation.
#Reasoning#Tools#Benchmarking#OPT-ENGINE
why featured
HKR-H/K pass: the paper brings a new benchmark, 10 problem classes, and robustness findings under complexity scaling. The OR-modeling focus is niche and PTR/SIR are not unpacked, so it stays in 60–71.
editor take
OPT-ENGINE spans 10 OR tasks; I don’t buy pure CoT for optimization, and constraint formulation is the wall for solver-integrated reasoning (SIR).
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models
MPU addresses dual non-disclosure constraints in LLM machine unlearning with perturbed model copies and update aggregation; experiments on seven unlearning algorithms show most algorithms keep average degradation below 1% under noise up to 10%.
#Fine-tuning#Safety#MPU#Research release
why featured
HKR-K and HKR-R pass: the paper gives a concrete mechanism and experiment numbers, tied to privacy deletion and model safety. HKR-H is weak, and this is still a single arXiv paper with no disclosed artifact or adoption.
editor take
MPU holds under 10% noise across seven unlearning algorithms; dual non-disclosure feels closer to deployment than another forgetting metric.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
MALLVI: A Multi-Agent Framework for Integrated Generalized Robotics Manipulation
MALLVI uses a multi-agent LLM/VLM closed-loop framework for robotic manipulation, taking a natural-language instruction and an environment image to generate atomic robot actions, while a Reflector agent performs targeted error recovery by reactivating only relevant agents instead of triggering full replanning.
#Agent#Robotics#Vision#MALLVI
why featured
Single arXiv robotics-agent framework with a concrete mechanism, but no disclosed metrics, task suite, or reproducibility details. HKR-K/R pass, HKR-H is weak, so it stays in all.
editor take
MALLVI discloses the loop, not success-rate numbers; targeted agent restarts smell like a practical robotics patch.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning
The paper presents a dynamic abstention framework for LLM reasoning, terminating low-value chain-of-thought traces at each token position and using an abstention reward parameter to trade off compute against information.
#Reasoning#Inference-opt#Safety#Research release
why featured
HKR-H/K/R pass, but the body gives only the framework mechanism, with no experiment numbers, model scope, or artifact. As a single arXiv research item, it stays in the 60–71 band.
editor take
The paper gives token-level abstention, but no metrics in the snippet; CoT compute control needs value functions, not post-hoc confidence.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Researchers introduce higher-order linear attention mechanism reducing computational complexity
The paper introduces Higher-order Linear Attention, where the second-order case keeps a constant-size streaming state, computes each token in linear time, and avoids materializing any n×n attention matrix.
#Reasoning#Inference-opt#Research release
why featured
HKR-H and HKR-K pass: the mechanism is concrete and relevant to long-context inference efficiency. HKR-R is weak because no benchmark, code, or model-scale test is disclosed, so this stays in the 60–71 research band.
editor take
HLA claims constant-state second-order streaming per token; no benchmarks disclosed, so don’t confuse algebraic elegance with long-context wins.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Mining Subscenario Refactoring Opportunities in Behaviour-Driven Software Test Suites: ML Classifiers and LLM-Judge Baselines
The study mines 5,382,249 contiguous Gherkin slices from 339 repositories and 276 upstream owners, collapsing them into 692,020 recurring patterns; its XGBoost classifier reaches 0.891 out-of-fold F1 under 5-fold cross-validation, beating a tuned rule baseline at 0.836 and the better open-weight LLM judge at 0.728.
#Code#Benchmarking#Sentence-BERT#XGBoost
why featured
HKR-H/K/R pass via the classic-ML-beats-LLM angle and concrete F1 data. The BDD test-suite refactoring niche limits reach, so this stays in all rather than featured.
editor take
XGBoost hits 0.891 F1 on a 200-slice labeled pool; LLM Judge gets 0.728. Small labels wobble, but don't worship LLM judges for code hygiene.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
NeuroAtlas Benchmarks Foundation Models for Clinical EEG and Brain-Computer Interfaces
NeuroAtlas evaluates foundation models on 42 EEG datasets and 260k hours across epilepsy, sleep medicine, brain age estimation, and brain-computer interfaces; the paper reports that EEG-specific FMs do not consistently beat generic time-series FMs, standard ML metrics miss clinical utility, and current models still lack an out-of-the-box unified EEG capability.
#Benchmarking#NeuroAtlas#Research release#Benchmark
why featured
HKR-H and HKR-K pass: the large EEG benchmark gives a counterintuitive result with concrete scale and comparisons. The clinical EEG/BCI focus narrows audience fit, so it stays below featured.
editor take
NeuroAtlas tests 42 datasets and 260k EEG hours; EEG-specific FMs still fail to reliably beat generic time-series FMs.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
ScaLoRA: Optimally Scaled Low-Rank Adaptation for Efficient High-Rank Fine-Tuning
ScaLoRA accumulates high-rank updates from consecutive low-rank increments and analytically scales LoRA columns; tests on LLMs up to 12 billion parameters report consistent gains and faster convergence versus LoRA variants across NLU, commonsense reasoning, and math tasks.
#Fine-tuning#Inference-opt#Reasoning#ScaLoRA
why featured
HKR-K and HKR-R pass: ScaLoRA offers a testable fine-tuning mechanism and 12B-parameter evaluation context tied to LoRA efficiency. HKR-H is weak, and the summary lacks concrete benchmark numbers.
editor take
ScaLoRA tests up to 12B parameters; low-rank increments stack into high-rank updates, and LoRA tuning still has convergence debt.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents
LQM-ContextRoute models same-function tool-provider routing as a contextual bandit and improves F1 by 2.18 percentage points over SW-UCB on the main web-search load benchmark.
#Agent#Tools#RAG#arXiv
why featured
HKR-K and HKR-R pass: the mechanism and +2.18 F1 result are concrete, and the problem maps to production agents. HKR-H is weak, and a single arXiv paper stays below featured.
editor take
LQM-ContextRoute gains +2.18 F1 pp on web-search; I buy the setup—tool routing should price quality per service cycle.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
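For readers who have not met the SW-UCB baseline named in the tool-routing card above, here is a minimal sketch of sliding-window UCB over a fixed set of providers. It illustrates the baseline only, not LQM-ContextRoute; the class name, the window and exploration defaults, and the idea of a single scalar reward per call are assumptions of this sketch.

```python
import math
from collections import deque


class SlidingWindowUCB:
    """UCB that only trusts the most recent `window` observations."""

    def __init__(self, arms, window=100, exploration=1.0):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.arms = list(arms)
        self.window = window
        self.exploration = exploration
        self.history = deque(maxlen=window)  # (arm, reward), newest last
        self.t = 0

    def select(self):
        counts = {a: 0 for a in self.arms}
        sums = {a: 0.0 for a in self.arms}
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # Any arm with no pulls inside the window is tried first.
        for arm in self.arms:
            if counts[arm] == 0:
                return arm
        horizon = min(self.t, self.window)

        def score(arm):
            bonus = math.sqrt(self.exploration * math.log(horizon) / counts[arm])
            return sums[arm] / counts[arm] + bonus

        return max(self.arms, key=score)

    def update(self, arm, reward):
        self.history.append((arm, reward))
        self.t += 1


# Hypothetical providers; reward could blend answer quality and latency.
router = SlidingWindowUCB(["provider_a", "provider_b"], window=50)
choice = router.select()
router.update(choice, reward=0.8)
```

Because old observations fall out of the window, an arm that degrades under load loses its advantage after at most `window` calls. A contextual router would additionally condition the choice on request features.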
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
DMAP: A Distribution Map for Text
The paper presents DMAP, a method that maps text through a language model into unit-interval samples, and evaluates it in 3 case studies covering generation-parameter validation, machine-generated text detection, and forensic analysis of statistical fingerprints from synthetic-data post-training.
#Benchmarking#DMAP#Research release
why featured
HKR-K and HKR-R pass: DMAP offers a testable text-distribution mapping mechanism for detection and synthetic-data fingerprints. No effect sizes or released artifacts are disclosed, so it stays in the mid research band.
editor take
DMAP maps text into unit-interval samples and tests 3 cases; I buy the direction—perplexity is too blunt for text forensics.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Paper introduces dynamic latent routing method for improved low-data fine-tuning
The paper introduces Dynamic Latent Routing, a post-training method that jointly learns discrete latent codes, routing policies, and model parameters; in low-data fine-tuning across four datasets and six models, DLR matches or beats supervised fine-tuning with a mean gain of 6.6 percentage points.
#Fine-tuning#Reasoning#Tools#Research release
why featured
HKR-K is solid with 4 datasets, 6 models, and a +6.6-point gain; HKR-R fits low-data fine-tuning cost concerns. HKR-H is weak, and the single arXiv paper lacks code or production evidence, so it stays in all.
editor take
DLR beats SFT by 6.6 points across 4 datasets and 6 models; I’d wait for ablation replications before adopting it.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Synthetic Sociality: How Generative Models Privatize the Social Fabric
arXiv:2605.14090 proposes a Synthetic Sociality framework for analyzing how generative models automate “social doing” and either substitute for or mediate social relations; the abstract cites existing empirical research but does not disclose sample sizes or evaluation conditions.
#Alignment#Safety#arXiv#Silicon Valley
why featured
HKR-H/K/R pass, but this is an arXiv conceptual frame with no disclosed sample size or reproducible experiment. It belongs in the feed, below the 72 featured threshold.
editor take
arXiv 2605.14090 offers a theory, with no sample size disclosed; I’d test it against Replika-style attachment first.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
OMAC: A Holistic Optimization Framework for LLM-Based Multi-Agent Collaboration
OMAC defines five optimization dimensions for LLM-based multi-agent systems and uses two actors, the Semantic Initializer and the Contrastive Comparator, to optimize single dimensions and joint multi-dimension settings.
#Agent#Reasoning#Code#OMAC
why featured
HKR-K/R pass: the paper names concrete mechanisms and targets multi-agent collaboration reliability. HKR-H is weak, and the post gives no benchmark numbers, code release, or production impact, so it stays in the 60–71 research band.
editor take
OMAC names 5 MAS optimization dimensions, but the snippet gives no benchmark numbers; treat it as a framework paper, not a new agent baseline.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Multi-Scale Dequant: Eliminating Dequantization Bottleneck via Activation Decomposition for Efficient LLM Inference
Multi-Scale Dequant decomposes BF16 activations into low-precision components and removes INT8-to-BF16 weight dequantization from the GEMM path; its two-pass MXFP4 decomposition reaches 6.6 effective bits, and the paper’s latency and HBM models show up to 2.5x lower KV cache HBM traffic in attention.
#Inference-opt#arXiv#Ascend#Research release
why featured
HKR-K/R pass: the paper gives a testable mechanism and up to 2.5x lower HBM traffic tied to inference cost. The low-level quantization angle keeps it below featured.
editor take
MSD splits BF16 activations into low-precision parts, hitting 6.6 effective bits with two-pass MXFP4; Ascend-style dequant stalls get a serious attack.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks
The paper presents DUET, a global-to-local method that optimizes LLM fine-tuning data mixtures from multiple feedback rounds on an unseen evaluation task, combining influence functions for data selection with Bayesian optimization; the abstract reports regret analysis and experiments across language tasks, but does not disclose exact benchmark scores in the snippet.
#Fine-tuning#Benchmarking#Research release
why featured
HKR-K/R pass: DUET offers a concrete mechanism for fine-tuning data mixtures using unseen-task feedback, tied to cost and generalization. HKR-H is weak, and no experimental numbers are disclosed, so this stays in 60–71.
editor take
DUET tunes fine-tuning mixtures from feedback rounds; scores are undisclosed. I buy the setup: encrypted user tasks break offline data recipes.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Exemplar Partitioning for Mechanistic Interpretability
The paper introduces Exemplar Partitioning, an unsupervised activation-dictionary method using about 10^3× fewer tokens than comparable SAEs; on AxBench latent concept detection at Gemma-2-2B-it L20, EP reaches mean AUROC 0.881, 0.126 above the canonical GemmaScope SAE entry and 0.030 below SAE-A, at about 10^3× less build compute.
#Interpretability#Benchmarking#Alignment#Gemma
why featured
HKR-H and HKR-K pass: 10^3× fewer tokens and 0.881 AUROC provide a concrete mechanism and result. HKR-R is weak, and the mechanistic-interpretability niche keeps it in the interesting band.
editor take
EP hits 0.881 AUROC with ~10^3× fewer tokens; if SAE-A only wins by 0.030, the compute bill looks ugly.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
A Hardware-Aware, Per-Layer Methodology for Post-Training Quantization of Large Language Models
SOP reports lower weight reconstruction error than an E4M3 FP8 8.0 bpw per-layer-POT baseline across six open model families, using an FP6 E2M3sUE4M4 6.5 bpw operating point with 1.5 bpw less storage.
#Inference-opt#Research release
why featured
HKR-K/R pass: the paper gives model coverage and bpw comparisons tied to inference cost. HKR-H fails because the angle is a specialist PTQ method with no product release or artifact, so it stays in the 60-71 band.
editor take
SOP beats FP8 reconstruction at 6.5 bpw across six model families; I want task scores before calling this deployable.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
TopoPrimer: The Missing Topological Context in Forecasting Models
TopoPrimer feeds the global topological structure of a series population into Chronos and TimesFM, improves forecasting accuracy across four public benchmarks, cuts ECL MSE by up to 7.3%, keeps peak seasonal degradation within 10%, and reduces cold-start MAE by 27% versus a topology-free baseline.
#Benchmarking#Fine-tuning#TopoPrimer#Chronos
why featured
HKR-H and HKR-K pass: the mechanism and metrics are concrete. It remains a single forecasting-model paper with weak broader resonance, so it stays in all rather than featured.
editor take
TopoPrimer adds topology priors to Chronos and TimesFM on 4 benchmarks; 7.3% MSE is modest, 27% cold-start MAE is the signal.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
On the Unreasonable Effectiveness of Last-layer Retraining
The paper tests why last-layer retraining improves worst-group accuracy, rejects the neural-collapse mitigation hypothesis, and attributes the gain to better group balance in the held-out set under LLR, CB-LLR, and AFR.
#Fine-tuning#Alignment#Benchmarking#Research release
why featured
HKR-H/K/R pass, but this is a single arXiv mechanism paper. The feed summary gives the LLR explanation, not model scale, datasets, or reproduction details, so it stays in all rather than featured.
editor take
LLR boosts worst-group accuracy via held-out group balance; stop using neural collapse as the catch-all robustness story.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Predict-then-Diffuse: Adaptive Response Length for Compute-Budgeted Inference in Diffusion LLMs
Predict-then-Diffuse uses AdaRLP to estimate response length before D-LLM inference, then applies a small data-driven length increase to reduce truncation reruns; experiments on multiple datasets show fewer FLOPs than default D-LLM inference while preserving output quality.
#Inference-opt#Research release
why featured
HKR-K and HKR-R pass via a concrete inference mechanism and cost angle. HKR-H is weak, and the post lacks FLOP deltas, model sizes, and reproducible settings, so it stays in the 60–71 all band.
editor take
Predict-then-Diffuse predicts length, then pads slightly; FLOP numbers are undisclosed, but the D-LLM fixed-length tax deserves its own optimizer.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
EMA: Efficient Model Adaptation for Learning-based Systems
EMA reduces adaptation costs by 14.9-42.4% across eight learning-based systems and improves system performance, including network throughput, by 6.9-31.3%, using state transformers for warm-start adaptation and utility-prioritized labeling to balance training and labeling costs.
#Fine-tuning#Inference-opt#Research release
why featured
HKR-K/R pass: the paper gives concrete cost and throughput numbers. HKR-H fails, and as a single arXiv systems paper without release, major-lab backing, or adoption signal, it stays in the 60-71 band.
editor take
EMA cuts adaptation cost 14.9-42.4% across 8 systems; for systems ML, this beats another generic fine-tuning trick.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
GPart: End-to-End Isometric Fine-Tuning via Global Parameter Partitioning
GPart maps a d-dimensional trainable vector directly into the full model weight space with one isometric partition matrix, stores only d+1 values including the vector and a random seed, and reports superior or comparable results against existing PEFT methods on natural language understanding, computer vision, and mathematical reasoning tasks.
#Fine-tuning#Vision#Reasoning#Research release
why featured
HKR-K and HKR-R pass: the paper gives a d+1 storage mechanism and tests across NLU, vision, and math reasoning. HKR-H is weak, and the technical PEFT framing keeps it in the 60–71 band.
editor take
GPart stores only d+1 values for PEFT; I don’t buy “removing the low-rank bottleneck” without disclosed baselines and model scale.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
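One way to picture the d+1 storage claim in the GPart card above: a seed regenerates a random partition of all weights into d groups, and each weight takes its group's scalar divided by the square root of the group size. The implied matrix then has orthonormal columns, so the map preserves norms. This construction is an assumption made for illustration; the card does not disclose the paper's actual partition matrix, and `partition_map` is a hypothetical name.

```python
import math
import random


def partition_map(theta, n_params, seed):
    """Expand a d-vector into n_params weights via a seeded random partition.

    Each weight index is assigned to one of d groups. A weight in group j
    receives theta[j] / sqrt(size of group j), which makes the mapping an
    isometry: the squared norm of the output equals that of theta.
    """
    d = len(theta)
    if not 0 < d <= n_params:
        raise ValueError("need 0 < len(theta) <= n_params")
    rng = random.Random(seed)
    # Round-robin first so no group is empty, then shuffle with the seed.
    assignment = [i % d for i in range(n_params)]
    rng.shuffle(assignment)
    sizes = [0] * d
    for group in assignment:
        sizes[group] += 1
    return [theta[group] / math.sqrt(sizes[group]) for group in assignment]


# Only theta (d values) and the seed need to be stored: d + 1 numbers.
delta_w = partition_map([3.0, 4.0], n_params=10, seed=0)
```

Whether such a map matches low-rank adapters at scale is exactly what the editor take says is still undisclosed.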
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
SpeakerLLM Audio Language Model for Speaker Understanding and Verification Reasoning
SpeakerLLM uses a hierarchical speaker tokenizer to handle four tasks: single-utterance profiling, recording-condition understanding, utterance-pair comparison, and verification reasoning. The authors state that SpeakerLLM-Base improves profile and condition understanding over general audio-LLMs, and they plan to release the metadata-enriched supervision dataset plus target-construction code.
#Audio#Reasoning#SpeakerLLM#Research release
why featured
HKR-H/K/R pass, but this is a vertical arXiv audio paper. The post gives a mechanism and planned release, not benchmark numbers or production adoption, so it stays in the 60–71 band.
editor take
SpeakerLLM unifies 4 speaker tasks. The sharp bit is forcing verification evidence, not another opaque similarity score.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Silent Neuron Theory and Plasticity Preservation for Deep Reinforcement Learning in Adaptive Video Streaming
The paper proposes ReSiN, which resets silent neurons using forward and backward propagation states, and reports up to 168% higher bitrate and 108% better QoE in an adaptive video streaming system while maintaining comparable smoothness.
#Reasoning#Alignment#arXiv#ReSiN
why featured
HKR-H/K pass: ReSiN links silent-neuron plasticity to streaming QoE, with +168% bitrate and +108% QoE. HKR-R is weak because the DRL streaming setting sits far from mainstream AI tooling, so it stays in 60–71.
editor take
ReSiN claims 168% higher bitrate; I don't buy the generalization story without disclosed baselines or network traces.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
ReMIA: A Powerful and Efficient Alternative to Membership Inference Attacks against Synthetic Data Generators
ReMIA evaluates privacy risk for tabular synthetic data generators with 2 SDG training runs and auxiliary data no larger than the original training set, while experiments across multiple datasets and SDGs report sensitivity comparable to state-of-the-art membership inference attacks.
#Safety#Benchmarking#Aindo#Research release
why featured
HKR-K/R pass: the 2-run privacy test gives a concrete, testable mechanism and touches synthetic-data compliance. HKR-H is weak, and a single arXiv paper with a technical privacy angle stays in the 60–71 band.
editor take
ReMIA needs 2 SDG training runs and nears shadow-MIA sensitivity; tabular synthetic-data privacy testing gets less ceremonial.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models
The paper proposes TraFL, a trajectory-balance objective for diffusion language models that anchors a reward-tilted target distribution to a frozen reference model; across math reasoning and code generation benchmarks, TraFL is the only evaluated post-training method that improves over the base model in every benchmark-length setting.
#Reasoning#Code#Fine-tuning#TraFL
why featured
HKR-K passes: TraFL offers a new post-training objective, constraint mechanism, and math/code benchmark claim. HKR-H and HKR-R are weak, so this fits the all tier rather than featured.
editor take
TraFL beats the base model across all math/code length settings; I care more whether trajectory locking reproduces independently.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
A Problem-Oriented Taxonomy of Evaluation Metrics for Time Series Anomaly Detection
The study groups more than 20 time-series anomaly detection metrics into six problem-oriented dimensions and compares score distributions under genuine, random, and oracle detection scenarios; NAB and Point-Adjust show limited resistance to random-score inflation, while most event-level metrics retain stronger separability.
#Benchmarking#Research release#Benchmark
why featured
HKR-K is strong, and HKR-H comes from the random-detector score inflation finding. The scope is methodology-heavy and limited to time-series anomaly detection, so it stays in the 60–71 band.
editor take
This taxonomy sorts 20+ TSAD metrics into six dimensions; NAB and Point-Adjust inflating random detectors should embarrass old leaderboards.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
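The score inflation flagged in the TSAD-metrics card above is easy to reproduce with the Point-Adjust protocol as it is commonly defined: if any point inside a true anomaly segment is flagged, the whole segment counts as detected. The sketch below uses toy labels and hypothetical function names; it is not the paper's evaluation code.

```python
def point_adjust(labels, preds):
    """Mark a whole true-anomaly segment as detected if any point in it is flagged."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    adjusted = list(preds)
    i, n = 0, len(labels)
    while i < n:
        if labels[i] == 1:
            j = i
            while j < n and labels[j] == 1:
                j += 1
            if any(preds[i:j]):
                for k in range(i, j):
                    adjusted[k] = 1
            i = j
        else:
            i += 1
    return adjusted


def f1_score(labels, preds):
    """Point-wise F1 for binary labels; returns 0.0 when there are no true positives."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy series: one 10-point anomaly, one lucky hit, one false alarm.
labels = [0] * 10 + [1] * 10 + [0] * 10
preds = [0] * 30
preds[2] = 1   # false positive
preds[15] = 1  # single hit inside the anomaly
raw_f1 = f1_score(labels, preds)
adjusted_f1 = f1_score(labels, point_adjust(labels, preds))
```

On this toy series a single lucky hit lifts F1 from about 0.17 to about 0.95, which is why long anomaly segments let near-random detectors look strong under this protocol.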
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Embedding Perturbation May Better Reflect Intermediate-Step Uncertainty in LLM Reasoning
The paper proposes measuring LLM intermediate-step uncertainty through sensitivity to perturbations on preceding token embeddings, and reports stronger uncertainty quantification performance than probability-based, sampling-based, and Bayesian baselines; the RSS abstract does not disclose datasets, model names, or numeric scores.
#Reasoning#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper offers a concrete uncertainty metric and speaks to reasoning reliability. It remains a single arXiv methods paper without disclosed adoption or strong practical result, so it stays in 60–71.
editor take
Embedding perturbation flags shaky reasoning steps; scores are undisclosed, but this smells better than token-prob confidence.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
MUON+: Towards More Effective Muon via One Additional Normalization Step for LLM Pre-training
MUON+ inserts one normalization step after polar orthogonalization without adding optimizer state; the paper reports lower training and validation perplexity than Muon across GPT and LLaMA pre-training runs from 60M to 7B parameters and token-to-parameter ratios up to about 200.
#Fine-tuning#Inference-opt#Benchmarking#Muon
why featured
HKR-K is clear: mechanism, scale, and perplexity comparison are disclosed; HKR-R is limited to pretraining teams. This is optimizer research, not a model or product launch, so it fits the 60-71 signal band.
editor take
MUON+ adds one post-polar normalization; 60M–7B pretraining beats Muon, so I’d test it on our small stack first.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
A Foundation Model for Instruction-Conditioned In-Context Time Series Tasks
The paper introduces iAmTime, a time-series foundation model trained with instruction-conditioned amortized meta-learning. It uses structured prompts, semantic tokens, a Hierarchical Multi-Scope Transformer Encoder, and a Task-Conditioned Patch Decoder across six task types, including forecasting, imputation, classification, anomaly detection, and source de-mixing.
#Reasoning#Benchmarking#iAmTime#arXiv
why featured
HKR-K/R pass: the post gives a concrete mechanism and 6 task categories, with relevance to unified time-series modeling. It stays in 60–71 because no benchmark gains, code artifact, or major-lab signal are disclosed.
editor take
iAmTime spans six time-series tasks; RSS gives no benchmark numbers, so don’t crown instruction-conditioned ICL as time-series GPT yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Do-Undo Bench: Reversibility for Action Understanding in Image Generation
Do-Undo Bench introduces an image-generation benchmark that requires models to simulate a real-world action and reverse it to the original state, using reversible actions from real scenarios; the arXiv snippet says current models struggle with reversibility but does not disclose benchmark size or scores.
#Multimodal#Vision#Reasoning#Research release
why featured
HKR-H/K pass: Do-Undo offers a fresh reversible-action test for causal understanding in image generation. HKR-R is weak, and sample size or major model results are not disclosed, so this stays in the normal research band.
editor take
Do-Undo Bench tests do-then-undo generation, but gives no size or scores; I buy the setup, not the causality claim.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Intelligence Impact Quotient (IIQ): A Framework for Measuring Organizational AI Impact
The IIQ paper proposes a 0-1000 index for measuring organizational AI integration, combining novelty-weighted time-decayed token stock, usage frequency, a recency gate, organizational leverage, task complexity, and autonomy; it frames IIQ as a deployment metric, not a direct model-capability score or causal productivity estimate.
#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: IIQ proposes a 0-1000 organizational AI-impact index built from six listed components. As a single arXiv framework, it lacks disclosed validation, enterprise samples, or adoption, so it stays in all.
editor take
IIQ compresses organizational AI adoption into 0–1000; I don’t buy it without disclosed token and autonomy weights.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Polaris: A Gödel Agent Framework for Small Language Models through Experience-Abstracted Policy Repair
Polaris applies experience-abstracted policy repair to a 7B model on MGSM, DROP, GPQA, and LitBench, using auditable policy patches rather than response-level correction or parameter tuning; the abstract reports consistent gains over the base policy and competitive baselines, but the post does not disclose the exact improvement numbers.
#Agent#Reasoning#Code#Polaris
why featured
HKR-K/R pass: 7B agents, experience-abstracted policy repair, and four benchmarks add signal, and small-model agents hit cost concerns. No concrete gains are disclosed, so this stays in the 60–71 research-release band.
editor take
Polaris only discloses a 7B run on four benchmarks; no gain numbers, but auditable policy patches sound less hand-wavy than self-correction.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Kairos: Toward Adaptive and Parameter-Efficient Time Series Foundation Models
Kairos addresses temporal heterogeneity in time-series forecasting with dynamic patching, mixture-of-size encoding, and dynamic RoPE, and reports stronger zero-shot results with fewer parameters on two benchmarks, GIFT-Eval and Time-Series-Library.
#Reasoning#Benchmarking#Kairos#GIFT-Eval
why featured
HKR-K is clear and HKR-R is modest: the paper offers mechanisms and benchmarks, but it is a single arXiv time-series model item with no production replacement claim, so it stays in the 60–71 band.
editor take
Kairos uses dynamic patching on GIFT-Eval and TSL; parameter counts are undisclosed, so I buy the mechanism before the win.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Paraphrasing Attack Resilience of Various AI-Generated Text Detection Methods
The paper evaluates fine-tuned RoBERTa, Binoculars, text feature analysis, and Random Forest ensembles under paraphrasing attacks, finding that Binoculars-inclusive ensembles achieve the strongest results but suffer the largest performance losses during attacks.
#Safety#Benchmarking#RoBERTa#Binoculars
why featured
HKR-K and HKR-R pass: the paper gives method-level comparisons under paraphrasing attacks and touches detector trust. It remains a routine arXiv benchmark, not a major model, product, or industry-moving release.
editor take
The paper tests RoBERTa, Binoculars, feature methods, and RF ensembles; Binoculars wins clean and bleeds hardest under paraphrasing.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning
The paper proposes neighbor distance minimization (NDM) to learn non-basis-aligned subspaces without supervision, and tests the link between learned subspaces and circuit variables on known GPT-2 circuits and a 2B model.
#Interpretability#GPT-2#Research release
why featured
HKR-K passes: NDM and GPT-2/2B validation are concrete. HKR-H and HKR-R are weak, and the mechanistic-interpretability topic has a high specialty bar, so it fits the 60–71 interesting band.
editor take
NDM finds subspaces in GPT-2 circuits and a 2B model; I buy the direction, but the abstract gives no scores.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
SEDGE: Structural Extrapolated Data Generation
The paper proposes SEDGE for structural extrapolated data generation, gives reliability and approximate identifiability conditions under conservative assumptions, and tests two algorithmic routes—structure-informed optimization and diffusion posterior sampling—on synthetic data and extrapolated image generation.
#Multimodal#Inference-opt#arXiv#SEDGE
why featured
HKR-K/R pass: SEDGE states reliability conditions for new-spec data and tests two algorithmic routes. HKR-H misses because the title is technical; the lack of result numbers, code, benchmarks, or lab signal keeps it in 60-71.
editor take
SEDGE formalizes extrapolated generation under conservative assumptions; don’t hype generalization without image scale or failure cases disclosed.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning
Realiz3D trains diffusion models with a domain covariate and small residual adapters to separate control signals from real or synthetic visual domains, targeting the domain gap created when image generators are fine-tuned on rendered 3D assets, and the paper evaluates it on text-to-multiview generation and texturing from 3D inputs.
#Vision#Multimodal#Realiz3D#Research release
why featured
HKR-K lands: the summary gives a concrete domain-covariate and residual-adapter mechanism. HKR-H/R miss because the post lacks metrics, datasets, open source status, or production-replacement proof.
editor take
Realiz3D adds a domain covariate and small residual adapters; I buy the target, since photoreal 3D has bled on render-domain bias for years.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Nexus: An Agentic Framework for Time Series Forecasting
Nexus decomposes time-series forecasting into multi-agent stages for macro fluctuations, micro fluctuations, and available contextual signals. The paper evaluates on data dated after LLM knowledge cutoffs, spanning Zillow real estate metrics and volatile equities, and reports that Nexus matches or outperforms state-of-the-art TSFMs and strong LLM baselines.
#Agent#Reasoning#Tools#Nexus
why featured
HKR-K passes because the paper offers a testable mechanism and evaluation setup. HKR-H/R are weak: the title is academic, and the impact stays inside forecasting rather than the broader AI workflow.
editor take
Nexus splits forecasting into 3 agent roles; I don’t buy “beyond sequence modeling” until cutoff data and ablations hold up.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Croissant Baker: Metadata Generation for Discoverable, Governable, and Reusable ML Datasets
Croissant Baker generates validated Croissant metadata from local dataset directories through a modular handler registry, and the paper evaluates it on more than 140 datasets, scaling to MIMIC-IV with 886 million rows and 374 Parquet files while reporting 97–100% agreement against producer-authored or standards-derived ground truth.
#Tools#Croissant Baker#NeurIPS#MIMIC-IV
why featured
HKR-K is solid: 140+ datasets and MIMIC-IV at 886M rows across 374 Parquet files give scale. The topic is data-governance infrastructure, with weak HKR-H and a narrower audience, so it stays in the 60–71 all band.
editor take
Croissant Baker ran on 140+ datasets; local metadata beats upload-first workflows, but 97–100% agreement needs field-level error detail.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
TILBench: A Systematic Benchmark for Tabular Imbalanced Learning Across Data Regimes
TILBench evaluates more than 40 imbalanced-learning algorithms across 57 tabular datasets and runs over 200,000 controlled experiments; the study finds that no single method consistently dominates, with performance depending on dataset characteristics and computational constraints.
#Benchmarking#TILBench#arXiv#Research release
why featured
HKR-K is solid: the paper adds scale and a testable “no single method wins” claim; HKR-R applies to tabular ML practitioners. The topic is a conventional ML benchmark, not a model/product industry event, so it stays in the 60–71 band.
editor take
TILBench runs 40+ algorithms on 57 tabular datasets across 200k+ experiments; stop defaulting to SMOTE and profile data plus compute first.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Bridging the Rural Healthcare Gap: A Cascaded Edge-Cloud Architecture for Automated Retinal Screening
The paper evaluates a two-tier edge-cloud retinal screening cascade on 733 APTOS 2019 test images, using MobileNetV3-small for local referable-DR triage and sending 49.52% of images to cloud-based RETFoundDINOv2, reducing cloud calls by 50.48% versus a cloud-only pipeline.
#Vision#Inference-opt#APTOS#MobileNetV3-small
why featured
HKR-K and HKR-R pass: MobileNetV3-small filters on edge, RETFoundDINOv2 verifies in cloud, with clear routing numbers. The medical-imaging scope is narrow, so it stays in the 60-71 band.
editor take
The cascade cuts cloud calls 50.48% on 733 APTOS images, losing 0.0017 Kappa; the rural-care pitch hides threshold risk.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
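The routing arithmetic in the retinal-screening item above can be checked with a short sketch. The image counts used here (363 routed to the cloud, 370 resolved on the edge) are inferred from the reported 49.52% of 733 images and are not quoted from the paper. The uncertainty-band rule in `route_to_cloud` and its thresholds are hypothetical stand-ins, since the summary does not state the actual triage rule.

```python
TOTAL_IMAGES = 733  # APTOS 2019 test images, as stated in the summary


def cloud_share(n_routed: int, n_total: int) -> float:
    """Percentage of images forwarded to the cloud model, rounded to 2 places."""
    return round(100.0 * n_routed / n_total, 2)


def route_to_cloud(edge_prob: float, low: float = 0.2, high: float = 0.8) -> bool:
    """Hypothetical rule: escalate when the edge model's score is in an unsure band."""
    return low <= edge_prob <= high


n_routed = 363  # assumption: inferred from 49.52% of 733, not given in the source
sent = cloud_share(n_routed, TOTAL_IMAGES)
saved = round(100.0 - sent, 2)  # reduction versus a cloud-only pipeline
print(sent, saved)  # 49.52 50.48
```

The two percentages are complements by construction, so the "50.48% fewer cloud calls" figure carries no information beyond the routing rate itself; where the threshold sits is what decides the clinical risk.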
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Detecting Overfitting in Neural Networks During Long-Horizon Grokking Using Random Matrix Theory
The paper proposes an overfitting detector that needs no train or test data: it randomizes each layer’s weight matrix, fits the empirical spectrum with a Marchenko-Pastur distribution, and uses Correlation Traps to mark the anti-grokking phase where train accuracy stays high while test accuracy falls.
#Interpretability#Safety#Benchmarking#Research release
why featured
HKR-H/K pass: detecting overfitting without data is a real hook, and the RMT/MP/Correlation Traps mechanism is testable. HKR-R is weak; grokking plus random matrix theory keeps this narrow, so it stays in all.
editor take
The method flags overfitting without train/test data via spectral outliers; unnamed LLM evidence makes the broad claim weak.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
A Hormone-Inspired Emotion Layer for Transformer Language Models (HELT)
The paper introduces HormoneT5, which adds six continuous hormone-like values to a Transformer via specialized attention heads, and reports over 85% per-hormone accuracy within a 0.15 tolerance threshold on its curated emotion-labeled dataset.
#Alignment#Agent#HormoneT5#T5
why featured
HKR-H and HKR-K pass: the mechanism and metric are concrete, and the title has novelty. It remains a single arXiv paper with no disclosed open-source artifact, replication setup, or production-replacement claim, so HKR-R is weak and the item stays in 60–71.
editor take
HormoneT5 adds 6 continuous hormone values; 85% accuracy is on a curated emotion set, so the endocrine framing smells decorative.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
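For readers unsure what "accuracy within a 0.15 tolerance" means for a continuous value, here is a minimal sketch of one plausible reading: a prediction counts as correct when it lands within 0.15 of the label. The function name, the inclusive comparison, and the sample numbers are all assumptions for illustration; the summary does not give the paper's exact definition or any data.

```python
def within_tolerance_accuracy(preds, targets, tol=0.15):
    """Fraction of predictions whose absolute error is at most `tol`."""
    hits = sum(abs(p - t) <= tol for p, t in zip(preds, targets))
    return hits / len(targets)


# Made-up values for a single hormone-like dimension (not from the paper).
preds = [0.10, 0.50, 0.90, 0.30]
targets = [0.20, 0.45, 0.60, 0.35]

score = within_tolerance_accuracy(preds, targets)
print(score)  # 0.75: three of four predictions fall within 0.15 of the label
```

Under this reading the metric depends heavily on how the labels are spread: if most labels cluster in a narrow range, a constant prediction can already score high.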
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability
Gonzalez and five coauthors introduce tensor similarity, a weight-based metric for tensor models that is invariant to weight-space symmetries and computed with a recursive algorithm; the 22-page paper with 8 figures says it tracks functional training dynamics such as grokking and backdoor insertion better than existing metrics.
#Interpretability#Benchmarking#ML Nissen Gonzalez#Logan Riggs Smith
why featured
HKR-H and HKR-K pass: the title has a real hook, and the paper gives a new metric plus recursive mechanism. Its math-heavy interpretability focus keeps it in the 60–71 band, not featured.
editor take
Gonzalez et al. turn network similarity into recursive algebra; tensor-model scope keeps this far from real LLM verification.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Research proposes output alignment method for 1-bit post-training quantization of large language models
The paper proposes a PTQ method for 1-bit LLMs that targets two identified failure modes: error accumulation across layers and anisotropic distortion in representation space, and its experiments report consistent gains over existing 1-bit PTQ methods while keeping calibration-based post-training quantization computationally efficient.
#Inference-opt#Research release
why featured
LLM quantization matters for inference cost, so HKR-K/R pass via the stated PTQ mechanisms and cost nerve. HKR-H is weak, and the post gives no speed, memory, model-scale, or open-source details.
editor take
This 1-bit PTQ paper targets layer error and anisotropic distortion; no model sizes or scores in the snippet, so don’t buy “consistent gains” yet.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
R-DMesh uses a VAE to separate a conditional base mesh, relative motion trajectories, and a rectification jump offset, then trains on Video-RDMesh with over 500k dynamic mesh sequences to address pose mismatch between an input mesh and the first frame of a reference video.
#Multimodal#Vision#R-DMesh#Video-RDMesh
why featured
HKR-H/K pass: video-guided 3D mesh animation is a clear hook, and K comes from 500k+ sequences plus the three-part VAE decomposition. HKR-R is weak; no code, product path, or production metric is disclosed, so it stays in the 60–71 band.
editor take
R-DMesh trains on 500k dynamic meshes for pose mismatch; I buy the problem, not the abstract’s “solves” claim.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Causal Foundation Models with Continuous Treatments
The paper introduces a causal foundation model for continuous treatments. It trains a transformer on a synthetic causal corpus to reconstruct individual treatment-response curves from observational data, without extra training or fine-tuning on unseen tasks.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via the continuous-treatment, observational-data, no-finetuning mechanism. HKR-H/R are weak, and the causal-inference arXiv framing is specialized, so this stays in 60–71.
editor take
The paper trains a transformer on synthetic causal data for zero-finetune dose-response curves; benchmarks are undisclosed, so “first” needs receipts.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity
arXiv:2605.14075 proposes measuring LLM layer relevance by the accuracy drop after removing a layer, and reports that cosine similarity often has weak or moderate correlation with actual performance degradation across tested LLMs.
#Interpretability#Benchmarking#Inference-opt#Research release
why featured
HKR-K passes: the paper offers a testable layer-removal accuracy-drop metric and challenges cosine similarity as a proxy. HKR-H/R are weak, and the arXiv summary alone keeps it in all, below featured.
editor take
arXiv 2605.14075 ranks layers by accuracy drop after deletion; I buy the direction, but models and tasks aren't disclosed here.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
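The layer-removal metric described in the item above can be sketched on a toy model: score each layer by how much accuracy falls when that layer is skipped. Everything here is illustrative; the integer "layers", the dataset, and the function names are hypothetical and stand in for real transformer blocks and benchmarks, which the summary does not name.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[int], int]


def run(layers: Sequence[Layer], x: int) -> int:
    """Apply the layers in order to a single input."""
    for layer in layers:
        x = layer(x)
    return x


def accuracy(layers: Sequence[Layer], data: Sequence[Tuple[int, int]]) -> float:
    """Exact-match accuracy of the layer stack on (input, target) pairs."""
    return sum(run(layers, x) == y for x, y in data) / len(data)


def layer_relevance(layers: Sequence[Layer], data: Sequence[Tuple[int, int]]) -> List[float]:
    """Accuracy drop caused by removing each layer in turn."""
    base = accuracy(layers, data)
    scores = []
    for i in range(len(layers)):
        ablated = list(layers[:i]) + list(layers[i + 1:])
        scores.append(base - accuracy(ablated, data))
    return scores


# Toy stack: add one, double, then an identity layer that does no work.
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x]
data = [(x, (x + 1) * 2) for x in range(5)]

relevance = layer_relevance(layers, data)
print(relevance)  # [1.0, 1.0, 0.0]
```

The toy only shows how the metric is computed: removing the identity layer costs nothing, while removing either working layer breaks every example. It does not reproduce the paper's comparison against cosine similarity.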
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
PRAETORIAN: GNN Backdoor Defense Using Trigger Internal and External Characteristics
PRAETORIAN reduces average GNN backdoor attack success rate to 0.55% with a 0.62% clean-accuracy drop; under the same conditions, state-of-the-art defenses still leave average ASR above 20% and clean-accuracy loss above 3%.
#Safety#Benchmarking#PRAETORIAN#arXiv
why featured
HKR-K is strong: 0.55% ASR and 0.62% clean-accuracy loss, against >20% ASR for state-of-the-art defenses, give a testable comparison. HKR-H is narrow, and HKR-R is weak because GNN backdoor defense lacks product or frontier-model impact.
editor take
PRAETORIAN cuts GNN backdoor ASR to 0.55%; I buy the mechanism forcing attackers into >80% ASR with >10% CA loss.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey
Kunil Lee and coauthors evaluate six vector-merging variants for multilingual knowledge editing across two backbone LLMs, two editing methods, and 12 languages on MzsRE. Vector summation with shared covariance is the most reliable overall strategy, simple summation performs poorly, and TSVM improves some settings but shows limited mitigation of multilingual interference.
#Fine-tuning#Benchmarking#Kunil Lee#Ki-Young Shin
why featured
HKR-K passes: the paper gives a concrete multilingual knowledge-editing test matrix and result. HKR-H and HKR-R are weak, so this is useful niche research, not a featured item.
editor take
Lee et al. test 6 merging methods across 2 LLMs and 12 languages; shared-covariance summation wins, TSVM barely tames interference.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
TabClustPFN: A Prior-Fitted Network for Tabular Data Clustering
TabClustPFN clusters unseen tabular datasets in one forward pass while inferring both cluster assignments and cluster cardinality, and the paper says its code is available on GitHub.
#Reasoning#TabClustPFN#GitHub#Research release
why featured
HKR-H and HKR-K pass: the paper offers a concrete one-forward-pass clustering mechanism and open code. HKR-R fails because niche tabular clustering lacks a strong LLM/agent practitioner nerve, so it stays in the 60-71 all band.
editor take
TabClustPFN infers cluster count and assignments in one pass; scale is undisclosed, so the real test is messy tabular benchmarks.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Vendor-Conditioned Contrastive Learning for Predicting Organizational Cyber Threat Targets
The paper proposes TRACE, a CySecBERT-based vendor-conditioned contrastive learning framework, to predict seven organizational cyber-threat target categories using 129,126 samples from 352,866 posts across nine exploit databases and hacker forums, and reports 97.00% macro F1 under temporal out-of-distribution evaluation.
#Embedding#Fine-tuning#Benchmarking#CySecBERT
why featured
HKR-K passes with a named method, sample count, and temporal OOD F1; HKR-H and HKR-R are weak. The cybersecurity-targeting niche keeps it in the lower interesting band, so tier is all.
editor take
TRACE reports 97.00% macro F1 under temporal OOD; I’d audit label leakage before celebrating vendor-conditioned contrastive learning.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts
The paper proposes L2R for MoE routing, assigning experts in a shared low-rank latent space and using Saturated Inner-Product Scoring to control Lipschitz behavior; experiments on an OLMoE-based language MoE model and an ImageNet vision MoE setting report improved routing geometry, expert discrimination, and overall performance, while the code is not yet released.
#Inference-opt#Benchmarking#OLMoE#ImageNet
why featured
HKR-K is present via a concrete routing mechanism, and HKR-R ties to MoE cost and stability. HKR-H is weak, and the post gives no result numbers, so this stays in all below featured.
editor take
L2R tests low-rank routing on OLMoE and ImageNet; code is unreleased, so the SIPS stability claim stays provisional.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
AMiD: Knowledge Distillation for LLMs with α-mixture Assistant Distribution
AMiD proposes an α-mixture assistant distribution for LLM knowledge distillation, makes α a tunable distribution design variable, generalizes the related divergence family, and releases code for arXiv:2510.15982v3 at the project repository.
#Fine-tuning#Inference-opt#KAIST#Research release
why featured
HKR-K passes with a concrete distillation mechanism and code. HKR-H/R are weak because benchmarks, model scale, and inference gains are not disclosed, so this stays in all.
editor take
AMiD makes KD’s α tunable and ships code; the snippet gives no benchmark numbers, so I don’t buy “superior” yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Language-Induced Priors for Domain Adaptation
The paper proposes Language-Induced Prior, which turns textual target-domain descriptions into a choice model and integrates it with EM, validating the framework on three tasks: Gaussian estimation, C-MAPSS, and MuJoCo hopper.
#Reasoning#arXiv#Research release
why featured
HKR-K passes: the method has a concrete mechanism and tests on Gaussian, C-MAPSS, and MuJoCo hopper. HKR-H/R are weak, so this stays in the 60–71 academic-research band.
editor take
LIP plugs target-domain text into EM, tested on 3 tasks; I buy the cold-start need, not the “correct LLM prior” premise.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Learning Multimodal Embeddings for Traffic Accident Prediction and Causal Estimation
The study builds a six-state U.S. dataset with 9 million accident records and 1 million high-resolution satellite images, then shows multimodal embeddings reach 90.1% average AUROC, a 3.7% gain over graph-only GNN models.
#Multimodal#Vision#Embedding#arXiv
why featured
HKR-K passes with concrete dataset scale and AUROC gains. HKR-H/R are weak because the paper is a niche traffic-prediction application with no model, product, or tooling impact for AI practitioners.
editor take
Six-state data hits 90.1% AUROC; I trust the prediction lift more than the matched 24% precipitation effect.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Towards Fine-Grained and Verifiable Concept Bottleneck Models
The paper proposes a fine-grained CBM framework that grounds each concept in localized visual evidence; experiments use medical imaging benchmarks, but the RSS snippet does not disclose the number of datasets or specific performance metrics.
#Vision#Interpretability#Research release
why featured
HKR-K and HKR-R pass: the mechanism is concrete and medical-AI verifiability has practitioner pull. Missing dataset counts and performance numbers keep it in the 60-71 band.
editor take
FG-CBM grounds concepts in local evidence; RSS gives no dataset count or metrics, so I don’t buy the clinical-readiness leap.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
XFP: Quality-Targeted Adaptive Codebook Quantization with Sparse Outlier Separation for LLM Inference
XFP achieves 138 tok/s single-stream decode on Qwen3.5-122B-A10B in V2 mode on RTX PRO 6000 Blackwell with TP=2, and reports 94.49% GSM8K strict-match across 3 seeds and 3,957 problems.
#Inference-opt#Benchmarking#Qwen#arXiv
why featured
HKR-K/R pass: XFP reports decode throughput for a 122B model and GSM8K strict-match accuracy, with clear serving-cost relevance. HKR-H fails because the angle is dense quantization detail for a narrow infra audience.
editor take
XFP hits 138 tok/s on Qwen3.5-122B; the auto codebook path is neat, but 397B evidence is single-seed GSM8K.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Slower Generalization, Faster Memorization: A Sweet Spot in Algorithmic Learning
The paper shows that, on Needleman-Wunsch matrix generation, small Transformers reach high validation exact-match accuracy fastest at an intermediate dataset size, while larger post-threshold datasets still generalize but require more gradient updates.
#Reasoning#Benchmarking#Research release
why featured
HKR-H/K pass: the title has a paradox hook and the paper gives a concrete data-scale result. HKR-R is weak because the arXiv study is narrow and distant from product, cost, or safety stakes.
editor take
Small Transformers hit NW exact-match fastest at mid-scale data; treating more data as faster convergence looks too lazy here.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Interestingness as an Inductive Heuristic for Future Compression Progress
The paper formalizes interestingness as an inductive heuristic for future compression progress, proves expected progress changes exponentially with the recency of the last observed breakthrough, and reports experimental confirmation across three universal computational paradigms.
#Reasoning#Benchmarking#Research release
why featured
HKR-H/K pass, but the item is an arXiv theory-paper abstract with limited reproducible detail and no product or agent link. This fits the 60–71 research-interest band.
editor take
This pins interestingness to compression progress across 3 paradigms; the gap to agent task selection is still engineering-sized.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
What if Tomorrow is the World Cup Final? Counterfactual Time Series Forecasting with Textual Conditions
The paper introduces counterfactual time series forecasting with textual conditions, adds an evaluation framework covering factual and counterfactual settings without ground-truth future series, and proposes a text-attribution mechanism that separates mutable from immutable factors to improve forecasts under stochastic textual conditions.
#Benchmarking#arXiv#SeqML#Research release
why featured
HKR-H and HKR-K pass: the counterfactual setup is clickable, and the post names a new task, evaluation setup, and attribution mechanism. HKR-R is weak; as a single arXiv paper, it fits all, below featured.
editor take
arXiv 2605.14422 adds text-conditioned counterfactual forecasting; I don't buy no-ground-truth evaluation until TADiff shows its guardrails.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Vision-LLMs for Spatiotemporal Traffic Forecasting
The paper proposes ST-Vision-LLM for spatiotemporal mobile traffic forecasting, feeding historical global traffic matrices as image sequences into a Vision-LLM, using single-token floating-point encoding, two-stage numerical alignment, and GRPO, and reporting a 15.6% gain in long-term prediction accuracy.
#Vision#Multimodal#Fine-tuning#Research release
why featured
HKR-K passes via concrete mechanisms and a 15.6% accuracy gain. HKR-H/R are weak: this is domain traffic-forecasting research, not a general agent, product, or foundation-model competition story.
editor take
ST-Vision-LLM reports a 15.6% long-horizon accuracy gain. Treating traffic grids as images beats cramming time series into text.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages
The paper formulates diffusion sequence generation as a finite-horizon MDP and derives an exact unbiased policy gradient over denoising steps, then uses entropy-guided step selection and one-step denoising rewards to estimate advantages without explicit sequence likelihoods or costly multi-step rollouts.
#Reasoning#Code#Fine-tuning#Research release
why featured
HKR-K passes because the mechanism is concrete for diffusion-LLM training watchers. HKR-H/R are weak, and the post discloses no result numbers, code artifact, or production impact, so it stays a normal research update.
editor take
DLM-RL gets an unbiased stepwise gradient; SOTA numbers aren’t in the snippet, so I’d inspect repo cost first.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
AudioMosaic: Contrastive Masked Audio Representation Learning
AudioMosaic constructs positive pairs with structured time-frequency masking on spectrogram patches, reduces memory usage for large-batch contrastive pre-training, and reaches state-of-the-art results on several standard audio benchmarks under linear probing and fine-tuning.
#Audio#Embedding#Benchmarking#AudioMosaic
why featured
HKR-K passes on a concrete training mechanism and benchmark claim; HKR-H and HKR-R are weak because the angle is academic and narrow. This is useful research signal, not featured-level industry news.
editor take
AudioMosaic uses structured time-frequency masks for positives; memory savings lack numbers, so hold the SOTA claim lightly.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
MetaMoE unifies independently trained domain experts with public proxy data, uses diversity-aware proxy selection for router supervision, and outperforms recent privacy-preserving MoE unification methods on computer vision and NLP benchmarks.
#Fine-tuning#Alignment#Benchmarking#MetaMoE
why featured
HKR-K passes with a concrete mechanism and benchmark claim. HKR-H/R are weak: the angle is specialist, and the post lacks numbers, code, or production impact, so it stays in the lower research-news band.
editor take
MetaMoE trains routers with public proxy data; gains are undisclosed. Privacy MoE will hinge on proxy contamination, not expert count.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
MoZoo: Unleashing Video Diffusion Power in Animal Fur and Muscle Simulation
MoZoo synthesizes animal videos from coarse meshes under multimodal guidance, using RAR-RoPE, Asymmetric Decoupled Attention, and MoZooBench with 120 mesh-video pairs to evaluate fur simulation across animal skeletons and layouts.
#Multimodal#Vision#Benchmarking#MoZoo
why featured
HKR-H and HKR-K pass: the angle is novel and the post gives mechanisms plus MoZooBench size. HKR-R is weak because this is graphics-heavy arXiv research with limited near-term industry pull.
editor take
MoZooBench has only 120 mesh-video pairs; fur dynamics are hard, but that scale cannot carry the “cinematic-quality” claim.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
CA2: Code-Aware Agent for Automated Game Testing
CA2 trains a game-testing agent with function call traces and game state, then evaluates it in two instrumented environment types: state-based and image-based.
#Agent#Code#Valliappan Chidambaram Adaikkappan#Vincent Martineau
why featured
HKR-K passes because CA2 adds a concrete mechanism: call stacks plus game state for a testing agent. HKR-H/R are weak, and the excerpt gives no metrics, code, or production-replacement claim, so this stays niche research.
editor take
CA2 feeds call stacks to a testing agent across 2 environment types; I buy the direction, not the vague “consistent improvement.”
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Critical Challenges and Guidelines in Evaluating Synthetic Tabular Data: A Systematic Review
The systematic review selected 134 studies from 2,067 papers published over ten years and identifies gaps in synthetic health tabular data evaluation, including no consensus on methods, inconsistent metric use, limited domain expert involvement, incomplete dataset reporting, and limited reproducibility.
#Benchmarking#arXiv#Research release
why featured
HKR-K is solid: 134 reviewed studies produce concrete evaluation gaps. HKR-R is niche to synthetic health tabular data, with no product, model, or open-source artifact, so this stays in all.
editor take
The review keeps 134 studies; synthetic health tabular evaluation is still metric soup, with clinicians and reproducibility missing.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
BioHuman: Learning Biomechanical Human Representations from Video
BioHuman introduces BioHuman10M, a dataset with synchronized video, motion, and muscle activations, and trains an end-to-end model that takes monocular video to jointly predict human motion and muscle activations.
#Vision#Multimodal#Benchmarking#BioHuman
why featured
HKR-H/K pass: the hook extends video human modeling to muscle activation, with BioHuman10M’s synced data modalities. HKR-R is weak; no product, open-source, or robotics deployment detail is disclosed, so it stays low-tier all.
editor take
BioHuman10M syncs video, motion, and muscle activation at 10M scale; activation is simulation-derived, so rehab claims need restraint.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H1·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Temporal Fair Division in Multi-Agent Systems: From Precise Alternation Metrics to Scalable Coordination Proxies
The paper introduces Rotational Periodicity (RP) and ALT temporal fairness metrics for repeated multi-agent resource competition, evaluates MBoE with 2, 3, 5, 8, and 10 agents, and reports that RP runs 12-25x faster than ALT while exposing Q-learning coordination failures that reward fairness misses.
#Agent#Benchmarking#Research release#Benchmark
why featured
HKR-K passes: new metrics, agent counts, and a 12-25x speed result. HKR-H/R are weak; this is a niche arXiv methods paper without product or major agent-framework impact, so it stays in the 40-59 band.
editor take
RP runs 12–25x faster than ALT on 2–10 agents; stop trusting Reward Fairness for repeated allocation agents.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Pro-DG: Procedural Diffusion Guidance for Architectural Facade Generation
Pro-DG infers a facade hierarchy from one image and its segmentation, then uses procedural control maps in Stable Diffusion and ControlNet to perform structural edits such as floor duplication and window rearrangement.
#Vision#Multimodal#arXiv#Stable Diffusion
why featured
HKR-K passes because Pro-DG gives concrete inputs and a control mechanism; HKR-H/R are weak because the use case stays inside architectural facade generation, with no broad product or model-competition signal.
editor take
Pro-DG edits facades from one image plus segmentation. Metrics are undisclosed; the useful bit is procedural rules inside ControlNet.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Communication-Efficient Federated Fine-Tuning
The paper proposes the FDA-Opt algorithm family for federated language-model fine-tuning, replacing FedOpt’s fixed exchange intervals with dynamic synchronization and outperforming FedOpt on downstream NLP experiments even when FedOpt uses hyperparameters optimized for those tasks.
#Fine-tuning#Research release
why featured
HKR-K passes on the dynamic-sync FDA-Opt mechanism, but the article gives no gain size, communication rounds, or reproducible setup. HKR-H/HKR-R are weak, so this stays a niche research signal.
editor take
FDA-Opt replaces FedOpt’s fixed exchange interval with dynamic sync; I buy the direction, but rounds and model sizes are undisclosed.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
UniMamba: A Unified Spatial-Temporal Modeling Framework with State-Space and Attention Integration
UniMamba integrates Mamba, FFT-Laplace Transform, TCN, and spatial-temporal attention for multivariate time-series forecasting, and the paper reports better forecasting accuracy and computational efficiency than prior models on eight public benchmark datasets.
#Reasoning#Benchmarking#UniMamba#Mamba
why featured
HKR-K passes via the concrete architecture mix and 8 public benchmarks. HKR-H/R are weak: this is a routine arXiv methods paper with no production replacement claim or open-source impact, so it stays in the upper 40-59 band.
editor take
UniMamba wins on 8 public benchmarks; without ablations or cost tables here, Mamba+attention+FFT-Laplace smells stacked.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
RQ-MoE: Residual Quantization via Mixture of Experts for Efficient Input-Dependent Vector Compression
RQ-MoE combines a two-level MoE with dual-stream quantization to adapt codebooks per input for high-dimensional embedding compression, and experiments report state-of-the-art or on-par reconstruction and retrieval with 6–14x faster decoding than prior vector quantization methods.
#Embedding#Inference-opt#KDEGroup#Research release
why featured
HKR-K/R pass: the paper has a concrete mechanism and 6–14x decoding claim. It remains a narrow embedding-compression paper with no major lab release, ecosystem signal, or production-replacement proof, so it stays below featured.
editor take
RQ-MoE claims 6–14x faster decoding; I’d benchmark ANN latency first, because reconstruction scores don’t ship products.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R1
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Mini-JEPA Foundation Model Fleet Enables Agentic Hydrologic Intelligence
The paper proposes five 22M-parameter Mini-JEPA models with a router LLM selecting sensors per query; dual retrieval over AlphaEarth and the routed fleet outperforms AlphaEarth alone on physics-matched questions, with Cohen's d=1.10 and p=0.031.
#Agent#RAG#Vision#Google AlphaEarth
why featured
HKR-K passes via the small-model fleet, routing LLM, dual retrieval setup, and effect size; HKR-H/R are weak because the hydrology focus is narrow. No hard exclusion applies, so it lands low in the all tier.
editor take
Five 22M Mini-JEPAs beat AlphaEarth-only retrieval; the router’s perfect hit rate is on curated questions, so “agentic” feels inflated.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
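The Mini-JEPA summary above quotes an effect size as Cohen's d. A minimal sketch of the pooled-standard-deviation form of that statistic follows, assuming two independent samples; the feed does not say which variant the paper uses, and the sample values and the function name `cohens_d` are made up for illustration.

```python
import math
import statistics
from typing import Sequence


def cohens_d(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Cohen's d for two independent samples, using the pooled sample SD."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / pooled_sd


# Made-up scores, not data from the paper.
routed_fleet = [2.0, 4.0, 6.0]
baseline_only = [1.0, 3.0, 5.0]
effect = cohens_d(routed_fleet, baseline_only)
print(effect)
```

By the usual rule of thumb, d near 0.8 or above is read as a large effect, which is the context for the reported d=1.10.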
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
SurF: A Generative Model for Multivariate Irregular Time Series Forecasting
SurF maps event sequences to i.i.d. unit-rate exponential noise via the Time Rescaling Theorem, trains one model across heterogeneous event streams, and reports the best time RMSE on 3 of 6 real-world benchmarks: Earthquake, Retweet, and Taobao.
#Reasoning#Benchmarking#SurF#Amazon
why featured
HKR-K passes via a testable mechanism and 6-benchmark result; HKR-H/R miss because this is a niche time-series modeling paper with no product or industry spillover.
editor take
SurF tops time RMSE on 3/6 benchmarks; TRT as a learnable bijection is a credible pretraining handle for async event streams.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
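The Time Rescaling Theorem step in the SurF summary above can be illustrated with a small sketch. This is not SurF's learned model: it assumes a known, hypothetical cumulative intensity function (here `constant_rate_two`) and only shows the mapping from event times to rescaled gaps, which follow a unit-rate exponential distribution when the events really come from that intensity.

```python
from typing import Callable, List, Sequence


def rescale_intervals(
    event_times: Sequence[float],
    cumulative_intensity: Callable[[float], float],
) -> List[float]:
    """Map sorted event times to rescaled inter-event gaps.

    With Lambda(t) the intensity integrated from 0 to t, the gaps
    Lambda(t_i) - Lambda(t_{i-1}) are i.i.d. Exp(1) if the events
    follow that intensity (Time Rescaling Theorem).
    """
    gaps = []
    previous = cumulative_intensity(0.0)
    for t in event_times:
        current = cumulative_intensity(t)
        gaps.append(current - previous)
        previous = current
    return gaps


# Hypothetical intensity: a constant 2 events per unit time, so Lambda(t) = 2 * t.
def constant_rate_two(t: float) -> float:
    return 2.0 * t


rescaled = rescale_intervals([0.5, 1.0, 2.0], constant_rate_two)
print(rescaled)
```

In a generative setting the mapping is run in reverse: sample Exp(1) gaps, then invert the cumulative intensity to recover event times.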
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
AaSP: Aliasing-aware Self-Supervised Pre-Training for Audio Spectrogram Transformers
AaSP pre-trains audio spectrogram Transformers on AudioSet with AaPE, teacher-student masked modeling, a cross-attention predictor, and multi-mask contrastive regularization. It reports state-of-the-art fine-tuning results on AS-20K, ESC-50, and NSynth among the compared self-supervised baselines, and linear evaluation also shows gains on US8K and NSynth.
#Audio#Multimodal#Benchmarking#AudioSet
why featured
HKR-K passes via named mechanisms and three fine-tuning benchmarks. HKR-H/R fail because the paper is niche audio representation work with no code, effect sizes, or broader practitioner pull.
editor take
AaSP pretrains on AudioSet and wins 3 fine-tuning benchmarks; audio SSL is finally treating patch aliasing as a first-class bug.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Causal Time Series Generation via Diffusion Models
The paper introduces CaTSG, a diffusion-based framework that uses backdoor-adjusted guidance and abduction-action-prediction to generate observational, interventional, and counterfactual time series across synthetic and real-world datasets.
#Reasoning#CaTSG#Research release
why featured
HKR-K passes for a concrete CaTSG mechanism and three generation targets. HKR-H/R are weak, and this single arXiv paper gives no production replacement or open-source impact.
editor take
CaTSG spans observational, interventional, and counterfactual series; smells like Pearl’s ladder inside diffusion sampling, with the causal graph still doing the hard work.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Deep Image Segmentation via Discriminant Feature Learning
The paper introduces DDA, an architecture-agnostic segmentation loss evaluated on DIS5K across multiple architectures, which maximizes between-class variance and minimizes within-class variance to improve segmentation accuracy, boundary sharpness, and model confidence without adding inference cost.
#Vision#Benchmarking#Research release#Benchmark
why featured
HKR-K passes on the DDA loss mechanism and “no inference cost”; HKR-H/R are weak because the title is academic and the audience is narrow. No hard exclusion, but this is niche vision research, so it lands in the 40–59 band.
editor take
DDA improves DIS5K boundaries across architectures with zero inference cost; honestly, loss-side fixes beat another segmentation head.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
PaAno: Patch-Based Representation Learning for Time-Series Anomaly Detection
PaAno uses short temporal patches and a 1D CNN for time-series anomaly detection, training embeddings with triplet loss plus pretext loss and evaluating on TSB-AD across univariate, multivariate, range-wise, and point-wise measures.
#Embedding#Benchmarking#PaAno#TSB-AD
why featured
HKR-K passes because the method and benchmark setup are concrete. HKR-H/R are weak: this is a narrow time-series anomaly-detection paper with limited general AI-practitioner pull.
editor take
PaAno claims TSB-AD SOTA but gives no scores here; a 1D-CNN patch method beating heavy models needs code and tables.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Architecture-Aware Explanation Auditing for Industrial Visual Inspection
The paper tests explanation auditing on WM-811K with 9 classes and 172k wafer maps, where ViT-Tiny plus Attention Rollout records a Deletion AUC of 0.211, while Swin-Tiny, ResNet18+CBAM, and DenseNet121 plus Grad-CAM score 0.432-0.525 and RISE compresses all families to about 0.1.
#Vision#Interpretability#Benchmarking#WM-811K
why featured
HKR-K passes with dataset size, class count, and Deletion AUC comparison. HKR-H/R are weak: this is a narrow industrial-vision interpretability benchmark, useful but not broad enough for featured.
editor take
ViT-Tiny+Attention Rollout scores 0.211 Deletion AUC on WM-811K; heatmap audits hinge on readout and perturbation choice.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
RoSHAP: A Distributional Framework and Robust Metric for Stable Feature Attribution
RoSHAP models SHAP score distributions with bootstrap resampling and kernel density estimation, then uses asymptotic Gaussianity under mild regularity conditions to reduce distribution-estimation cost while ranking features by activity, strength, and stability.
#Interpretability#Research release
why featured
HKR-K passes: RoSHAP introduces a concrete mechanism for stable feature attribution, but the post gives no experiment numbers, code, or production claim. The academic framing keeps it in the 40–59 band.
editor take
RoSHAP adds bootstrap+KDE stability to SHAP ranking; no cost numbers disclosed, so test it first on seed-sensitive feature selection.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
CAKE: Confidence in Assignments via K-partition Ensembles
CAKE evaluates per-point confidence in clustering assignments with K-partition ensembles, combining cross-run assignment stability and local geometric-fit consistency into one interpretable score in [0,1].
#Benchmarking#CAKE#Research release
why featured
HKR-K passes because the post states a testable mechanism: a [0,1] assignment-confidence score from stability and local geometry. HKR-H and HKR-R are weak; this is a narrow methods paper, not featured.
editor take
CAKE scores each clustered point in [0,1]; no code or datasets disclosed, so don't treat robustness proofs as usability.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Dywave: Event-Aligned Dynamic Tokenization for Heterogeneous IoT Sensing Signals
Dywave applies wavelet-based hierarchical decomposition to event-aligned dynamic tokenization for heterogeneous IoT sensing signals, and evaluations on five real-world datasets for activity recognition, stress assessment, and nearby object detection report up to 12% higher accuracy and up to 75% shorter input token lengths across mainstream sequence models.
#Inference-opt#Dywave#Research release#Benchmark
why featured
HKR-K passes on mechanism and numbers, but the story is niche IoT time-series research with little product or developer-workflow impact. No hard-exclusion rule is triggered, so it stays in the low-value research-signal band.
editor take
Dywave reports up to 12% higher accuracy and up to 75% shorter token inputs on 5 IoT datasets; sensor-swap robustness is the hard test.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
bde: A Python Package for Bayesian Deep Ensembles via MILE
bde releases a Python package for Bayesian Deep Ensembles, built on a JAX implementation of MILE sampling-based inference, with scikit-learn compatible estimators for tabular regression and classification uncertainty quantification.
#Benchmarking#bde#JAX#scikit-learn
why featured
HKR-K passes via a concrete implementation and supported tasks; HKR-H and HKR-R are weak, with no major lab or broad industry impact. This fits the upper 40–59 band as a niche research-tool release.
editor take
bde ships JAX MILE samplers with scikit-learn estimators; it is another tabular uncertainty tool, but no benchmarks are disclosed, so don’t buy “fast” yet.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Toward Privileged Foundation Models: LUPI for Accelerated and Improved Learning
The paper introduces PIQL, a framework that adds two train-time privileged-information sources to tabular foundation models: aggregate dataset statistics and encodings of data-generating programs; the abstract says PIQL improves convergence, final loss, and generalization, but the post does not disclose concrete experimental numbers.
#Fine-tuning#Inference-opt#Reasoning#Research release
why featured
HKR-K passes because PIQL gives a testable mechanism using two classes of training-time privileged information. HKR-H/R are weak, and no concrete experiment numbers are disclosed, so this stays in the lower research-signal band.
editor take
PIQL adds two train-time privileged signals for tabular FMs, but reports no numbers here; I don’t buy the “first framework” flex without code.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Understanding Imbalanced Forgetting in Rehearsal-Based Class-Incremental Learning
The paper constructs three last-layer coefficients to predict class-wise forgetting ranks in rehearsal-based class-incremental learning, and identifies the self-induced interference coefficient as the strongest predictor under controlled experiments.
#Fine-tuning#Interpretability#Research release
why featured
HKR-K passes because the paper names three testable coefficients for forgetting order. HKR-H/R fail: the angle is academic and niche, with no broad product, cost, safety, or competition hook; no hard exclusion triggered.
editor take
Three last-layer coefficients predict forgetting ranks; the snippet lacks datasets and effect sizes, so mitigation claims wait.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
AIM Framework for Standardised Explainability Evaluation in Graph Neural Networks
The paper introduces AIM, a framework that evaluates GNN explainability with three measure groups: accuracy, instance-level explanations, and model-level explanations. It applies AIM to graph kernel networks and prototype networks and uses the GKN case study to derive xGKN; the abstract does not disclose benchmark scores or datasets.
#Interpretability#Benchmarking#Research release
why featured
HKR-K passes on AIM metrics and xGKN, but HKR-H/HKR-R are weak. The GNN/GKN explainability angle needs specialist graph-ML background and gives no product path, triggering hard-exclusion-technical-accessibility; capped at 39.
editor take
AIM scores GNNs across accuracy, instance explanations, and model explanations. This 19-page TMLR paper pays down XAI’s benchmark debt.
HKR breakdown
hook knowledge resonance
open source
51
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
On the Burden of Achieving Fairness in Conformal Prediction
The paper derives a conservation law and lower bound for pooled split conformal calibration, showing that cross-group quantile heterogeneity creates irreducible group-wise coverage distortion and that Equalized Coverage conflicts with Equalized Set Size under the studied policy families.
#Benchmarking#Research release
why featured
Hard-exclusion-technical-accessibility applies: conformal-prediction fairness bounds are niche statistical theory with no product, agent, or engineering path. HKR-K passes, but the cap keeps it excluded.
editor take
The paper proves a conservation law and a lower bound: pooled calibration turns group heterogeneity into coverage distortion. Fair conformal prediction has no free lunch.
HKR breakdown
hook knowledge resonance
open source
51
SCORE
H0·K1·R0
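The coverage distortion described in the conformal prediction summary above can be seen in a toy sketch of pooled split conformal calibration. It does not reproduce the paper's conservation law or bound. The nonconformity scores and group labels are made up, and coverage is measured on the calibration scores themselves purely to keep the example short.

```python
import math
from typing import Dict, List, Sequence


def pooled_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Split conformal threshold from pooled calibration scores.

    Uses the standard finite-sample rank k = ceil((n + 1) * (1 - alpha)).
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(cal_scores)[k - 1]


def coverage_by_group(
    scores: Sequence[float], groups: Sequence[str], threshold: float
) -> Dict[str, float]:
    """Fraction of points per group whose score is within the threshold."""
    hits: Dict[str, List[int]] = {}
    for score, group in zip(scores, groups):
        hits.setdefault(group, []).append(1 if score <= threshold else 0)
    return {g: sum(v) / len(v) for g, v in hits.items()}


# Made-up nonconformity scores: group "a" is easy, group "b" is hard.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
groups = ["a"] * 5 + ["b"] * 5

threshold = pooled_threshold(scores, alpha=0.2)
coverage = coverage_by_group(scores, groups, threshold)
print(threshold, coverage)
```

With one pooled threshold, the easy group is over-covered and the hard group sits at the nominal level, which is the kind of group-wise gap that cross-group quantile heterogeneity produces.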
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
The Spheres Dataset: Multitrack Orchestral Recordings for Music Source Separation and Information Retrieval
The paper introduces The Spheres dataset with over one hour of multitrack orchestral recordings by Colibrì Ensemble, captured with 23 microphones, and provides isolated stems, estimated room impulse responses, and X-UMX baselines for orchestral family separation and microphone debleeding.
#Audio#Benchmarking#Colibrì Ensemble#The Spheres
why featured
HKR-K passes with concrete dataset size, capture setup, and baseline. HKR-H and HKR-R are weak because the story is niche music source-separation research, so it stays in all.
editor take
The Spheres offers over one hour of 23-microphone orchestral multitracks; the corpus is small, but stems plus RIRs make it useful.
HKR breakdown
hook knowledge resonance
open source
50
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
WarmPrior: Straightening Flow-Matching Policies with Temporal Priors
WarmPrior replaces the standard Gaussian source distribution with a temporal prior built from recent action history, improving success rates for generative visuomotor robot control; the abstract does not disclose the number of tasks, success-rate gains, or sample sizes.
#Robotics#Inference-opt#WarmPrior#Research release
why featured
HKR-K passes for a testable mechanism in policy generation. The summary discloses no task count, success-rate gain, or sample size, and the angle is specialized robotics research, so it stays in the lower band.
editor take
WarmPrior swaps Gaussian sources for recent action history; no task counts or gains disclosed, but source distributions deserve control-stack attention.
HKR breakdown
hook knowledge resonance
open source
50
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Distributional Principal Autoencoders
The paper proposes the Distributional Principal Autoencoder (DPA), which uses an encoder to adaptively choose latent dimensions and a decoder to match the conditional distribution given low-dimensional variables, with numerical results on climate data, single-cell data, and image benchmarks showing reconstruction of the original data distribution.
#Embedding#Benchmarking#Research release#Benchmark
why featured
HKR-K passes because the abstract gives a concrete mechanism and benchmark domains. HKR-H/R are weak: this is a technical representation-learning paper with no product, agent, or market hook.
editor take
DPA claims original-distribution reconstruction at any retained dimension; I don’t buy it without disclosed limits beyond climate, single-cell, and image benchmarks.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
GFMate: Empowering Graph Foundation Models with Test-time Prompt Tuning
GFMate applies centroid and layer prompts after pre-training for Graph Foundation Models, then tunes them at test time with labeled and unlabeled target-domain data; experiments on 12 benchmark datasets report performance gains up to 30.63%, and the authors provide code on GitHub.
#Fine-tuning#Benchmarking#GFMate#Research release
why featured
HKR-K passes via 12 benchmarks and gains of up to 30.63%, but HKR-H and HKR-R miss: the graph-model prompt-tuning angle is niche and mostly academic. This fits the low-value research band, so tier is all.
editor take
GFMate reports gains of up to 30.63% on 12 graph benchmarks; the useful bit is unlabeled target-graph tuning, not another few-shot prompt wrapper.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Measuring the Stability and Plasticity of Recommender Systems
The paper proposes an offline evaluation protocol that profiles recommender models after retraining by stability and plasticity, then reports preliminary results on three algorithm types using the GoodReads dataset. The abstract does not disclose the exact metrics, model names, or numerical scores.
#Benchmarking#GoodReads#Research release#Benchmark
why featured
HKR-K passes: the paper offers a stability/plasticity offline evaluation protocol with GoodReads tests. The topic is niche recommender-system evaluation, with no product, open-source, or foundation-model impact shown.
editor take
The paper tests 3 recommender types on GoodReads; metrics and scores are undisclosed, but retraining drift belongs in offline eval.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Network-Aware Bilinear Tokenization for Brain Functional Connectivity Representation Learning
NERVE tokenizes brain functional connectivity matrices into intra- and inter-network blocks and evaluates behavior and psychopathology prediction across three developmental cohorts: ABCD, PNC, and CCNP.
#Embedding#NERVE#ABCD#PNC
why featured
Triggers hard-exclusion-4: brain connectivity prediction is traditional science plus AI, with no agent, product, or engineering implication disclosed. HKR-K passes via the tokenization mechanism, while HKR-H and HKR-R fail.
editor take
NERVE tokenizes FC as network-pair blocks; three cohorts back transfer, and image-MAE defaults look lazy here.
HKR breakdown
hook knowledge resonance
open source
47
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Breaking the Reasoning Horizon in Entity Alignment Foundation Models
Yuanning Cui and four coauthors propose an entity alignment foundation model that uses seed entity pairs as local anchors for parallel encoding; the abstract reports experiments on unseen knowledge graphs, but the post does not disclose dataset counts or performance numbers.
#Reasoning#Yuanning Cui#Zequn Sun#Wei Hu
why featured
HKR-K comes from one mechanism: seed entity pairs as local anchors for parallel encoding; the post gives no datasets, metrics, or code. Niche entity alignment has weak practitioner resonance, so it sits in the 40–59 low-value research band.
editor take
Cui’s team uses seed entity pairs as anchors; no dataset counts or metrics are disclosed, so I don’t buy the “foundation model” label yet.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Data-Augmented Game Starts for Accelerating Self-Play Exploration in Imperfect Information Games
The paper proposes Data-Augmented Game Starts (DAGS), which samples intermediate states from offline demonstrations for two-player zero-sum imperfect-information games, and tests it on long-horizon variants of Kuhn Poker, Goofspiel, and a counterexample game under fixed compute budgets.
#Reasoning#Benchmarking#OpenSpiel#Research release
why featured
HKR-K passes because DAGS gives a concrete mid-state self-play mechanism and 3 test environments. HKR-H/R are weak: dry paper framing and limited relevance beyond niche RL/game research.
editor take
DAGS starts self-play from offline mid-states and reports lower exploitability under fixed compute; I buy the exploration trick, not the demo-coverage assumption.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Time Series Forecasting Through the Lens of Dynamics
The paper proposes the PRO-DYN nomenclature to analyze time-series forecasting models through dynamics, reporting two observations: under-performing architectures learn dynamics only partially, and placing the dynamics block at the model end is critical.
#Benchmarking#Research release
why featured
Only HKR-K lands: the post gives a PRO-DYN taxonomy and a module-placement claim, but no numbers, artifact, or product angle. This is niche forecasting research, so it stays in all.
editor take
PRO-DYN frames forecasting as dynamics-block placement; the snippet gives no benchmark scale, so I don’t buy the design-guide claim yet.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Exploitation of Hidden Context in Dynamic Movement Forecasting: From Recurrent to Graph Neural Networks and General Purpose Transformers
The paper evaluates LSTM, GNN, Transformer, and linear baselines for NBA movement forecasting under forecast horizons up to 2 seconds; a context-augmented hybrid LSTM achieves the lowest final displacement error at 1.51 m, beating TCNN, GAT, and Transformers while using less data and training time than GAT and Transformers.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes because the paper gives a 2-second forecasting setup and a 1.51 m FDE result. HKR-H/R miss: this is a niche trajectory-forecasting benchmark with unclear product, agent, or platform impact.
editor take
Hybrid LSTM hits 1.51 m FDE on 2 s NBA forecasting; Transformers lose when short-horizon context beats model fashion.
HKR breakdown
hook knowledge resonance
open source
44
SCORE
H0·K1·R0
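The final displacement error (FDE) quoted in the NBA forecasting summary above is the Euclidean distance between the last predicted position and the last true position. A minimal sketch, using made-up 2D court coordinates in metres rather than data from the paper:

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def final_displacement_error(
    predicted: Sequence[Point], actual: Sequence[Point]
) -> float:
    """Euclidean distance between the last predicted and last true position."""
    (px, py), (ax, ay) = predicted[-1], actual[-1]
    return math.hypot(px - ax, py - ay)


# Made-up trajectories; only the final points matter for FDE.
predicted_path = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
actual_path = [(0.0, 0.0), (0.5, 0.5), (0.0, 0.0)]
fde = final_displacement_error(predicted_path, actual_path)
print(fde)
```

Because FDE ignores every intermediate step, a model can score well on it while tracking the path poorly, so it is usually read alongside an average displacement measure.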
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Exploring Geographic Relative Space in Large Language Models through Activation Patching
The paper uses activation patching to examine how LLMs process relative geographic space; the RSS abstract discloses the mechanistic interpretability method but not the model names, datasets, or evaluation metrics.
#Interpretability#Research release
why featured
HKR-H barely passes on the geographic-representation hook, while HKR-K/R fail because the feed gives no models, datasets, metrics, or practical implication. This is relevant interpretability research, but thin and niche.
editor take
The paper uses activation patching for relative geography, but names no models or metrics; good question, thin evidence so far.
HKR breakdown
hook knowledge resonance
open source
44
SCORE
H1·K0·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Fully Dynamic Rebalancing in Dockless Bike-Sharing Systems via Deep Reinforcement Learning
The paper proposes a DRL method that routes one truck in real time for pick-up, drop-off, and charging actions in dockless bike-sharing systems; experiments use real-world data, but the RSS snippet does not disclose the exact reduction in availability failures.
#Agent#Robotics#Research release
why featured
HKR-K passes: the paper gives a real-time 1-truck dispatch mechanism tested on real data. HKR-H/R fail because this is a narrow operations application with no reported performance lift or AI-product implication.
editor take
DRL routes 1 truck for live rebalancing; no failure-rate delta is disclosed, so the engineering claim stays discounted.
HKR breakdown
hook knowledge resonance
open source
44
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Comparative Evaluation of Machine Learning Approaches for Minority-Class Financial Distress Prediction Under Class Imbalance Constraints
The arXiv paper compares statistical methods, ensemble learning, and exploratory neural models for minority-class financial distress prediction, using SMOTE, five ensemble architectures including XGBoost and LightGBM, and SHAP attribution under severe class imbalance conditions.
#Benchmarking#Interpretability#arXiv#XGBoost
why featured
HKR-K passes weakly because the setup names concrete methods, but there are no result numbers or production implications. The applied finance paper is vertical, not hard-excluded, so it stays low-value but browseable.
editor take
The paper compares 5 ensemble models plus SMOTE; dataset and AUC are undisclosed, so I file it as routine risk-ML replication.
HKR breakdown
hook knowledge resonance
open source
44
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
XAI and Statistical Analysis for Reliable Intrusion Detection in the UAVIDS-2025 Dataset
Zarkadis and Douligeris compare tree ensembles, DNNs, hybrid stacking models, and ensemble neural networks on UAVIDS-2025 with stratified 10-fold cross-validation, then use SHAP and statistical tests to analyze XGBoost errors in Wormhole and Blackhole attacks.
#Interpretability#Benchmarking#Iakovos-Christos Zarkadis#Christos Douligeris
why featured
HKR-K passes via a new UAVIDS-2025 benchmark setup and model ranking; HKR-H/R are weak, and metrics are not disclosed. This is niche security-ML research, so it stays in all.
editor take
Zarkadis and Douligeris use 10-fold CV on UAVIDS-2025. XGBoost wins, but no scores are disclosed; SHAP isn't mechanistic interpretability.
HKR breakdown
hook knowledge resonance
open source
42
SCORE
H0·K1·R0
04:00
25d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·15
Proposal and Study of Statistical Features for String Similarity Computation and Classification
The paper applies co-occurrence matrix (COM) and run-length matrix (RLM) features to string similarity computation; in the first synthetic experiment set, COM and RLM beat other statistical features, and in 3 of 4 cases their advantage over the second-best, distance-based group was significant at a p-value below 0.001.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes on concrete experiment details, but HKR-H and HKR-R fail. This is a narrow string-similarity methods paper with no product, agent, or foundation-model industry impact, so it stays in the low-value non-excluded band.
editor take
COM/RLM won 3 of 4 synthetic cases at p<0.001; looks useful for brittle similarity checks, not semantic retrieval.
HKR breakdown
hook knowledge resonance
open source
42
SCORE
H0·K1·R0
