papers · 2026-06-01

261 papers · updated 3m ago
2026-06-01 · Mon
17:59
7d ago
arXiv · cs.AI · atom · EN · 17:59 · 06·01
Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling
The paper defines Perceptual Judgment Bias in multimodal LLM-as-a-Judge systems and trains judges with a perceptually perturbed dataset, a structured GRPO-based reward, and a batch-ranking objective; the RSS snippet does not disclose dataset size, benchmark names, or exact improvement numbers.
#Multimodal#Vision#Alignment#Research release
why featured
HKR-K/R pass: the mechanism is concrete and the topic matters for multimodal eval reliability. No sample size, gains, or reproducible setup are disclosed in the feed, so this stays in the interesting band.
editor take
The paper trains MLLM judges with perturbations and GRPO, but RSS gives no dataset size or gains; I buy the failure mode, not the victory lap.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
17:59
7d ago
HuggingFace Papers (takara mirror) · rss · EN · 17:59 · 06·01
RoboDream: Compositional World Models for Scalable Robot Data Synthesis
RoboDream anchors generation to rendered robot motion and synthesizes photorealistic robot demonstrations with novel objects, scenes, and viewpoints; the snippet reports improved downstream policy performance and lower real-world data needs, but the post does not disclose task counts, dataset scale, or reduction percentages.
#Robotics#Multimodal#Vision#Research release
why featured
HKR-H/K/R pass, but the post lacks task counts, success rates, or data-cost deltas, so it stays in the 60–71 research-interest band rather than featured.
editor take
RoboDream constrains video generation with rendered robot motion; no task count, dataset scale, or reduction percent disclosed, so I don’t buy the “significantly reduces real data” claim yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
17:52
7d ago
arXiv · cs.CL · atom · EN · 17:52 · 06·01
From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression
SubFit compresses LLMs at the Attention and FeedForward submodule level using non-contiguous selection and fitted residual bypasses; across 10 LLMs, five sparsity levels from 12.5% to 37.5%, and four replacement baselines, it retains 84.6% dense downstream accuracy at 25% sparsity versus 81.6% for the strongest baseline.
#Inference-opt#Benchmarking#SubFit#Research release
why featured
HKR-K is solid and HKR-R is moderate: SubFit shifts replacement to Attention and FeedForward submodules, with 10-model tests and 84.6% accuracy retention. The angle is niche compression research, so HKR-H misses and it stays below featured.
editor take
SubFit keeps 84.6% accuracy at 25% sparsity across 10 LLMs; layer-level compression looks lazy after this.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
17:51
7d ago
arXiv · cs.CL · atom · EN · 17:51 · 06·01
HERO'S JOURNEY: Testing Complex Rule Induction with Text Games
HERO'S JOURNEY introduces 8 goal-directed text-game tasks where LLM agents infer hidden rules from demonstrations and execute them across multiple steps, with results showing limited, uneven rule induction and no reliable procedural-task gains from induction-specific steering methods.
#Agent#Reasoning#Benchmarking#HERO'S JOURNEY
why featured
HKR-H and HKR-K pass: 8 text-game tasks make rule induction and multi-step execution testable. No model scores, release details, or deployment stake are disclosed, keeping it in the normal research-benchmark band.
editor take
HERO'S JOURNEY tests 8 text games; LLMs still choke on procedural induction, and induction-specific steering methods don't fix it.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
17:50
7d ago
arXiv · cs.AI · atom · EN · 17:50 · 06·01
Modeling Depth Ambiguity: A Mixture-Density Representation for Flying-Point-Free Depth Estimation
MDA predicts multiple depth hypotheses and probabilities per pixel, then decodes depth from one hypothesis at object boundaries, reducing flying-point artifacts caused by single-depth training targets that place predictions between foreground and background surfaces.
#Vision#MDA#Research release
why featured
HKR-K passes for a concrete mechanism, but the item has only an arXiv title/brief summary with no metrics, code, or deployment angle. Depth-estimation research is narrow for this audience.
editor take
MDA predicts per-pixel depth mixtures; flying points get treated as target ambiguity, not cleanup noise.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
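A toy illustration of the failure mode the MDA summary describes: averaging over two surfaces puts the depth between them, while decoding from one hypothesis stays on a surface. The two-hypothesis pixel, its probabilities, and the function names are made up for illustration; the paper's actual decoder is not disclosed in the feed.

```python
def decode_expected(depths, probs):
    """Probability-weighted average, which is what a single-depth target tends toward."""
    return sum(d * p for d, p in zip(depths, probs))


def decode_top_hypothesis(depths, probs):
    """Return the depth of the most probable hypothesis (one surface, no blending)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return depths[best]


# Hypothetical boundary pixel: foreground at 2 m, background at 10 m.
depths = [2.0, 10.0]
probs = [0.55, 0.45]

flying_point = decode_expected(depths, probs)      # falls between the two surfaces
on_surface = decode_top_hypothesis(depths, probs)  # stays on the foreground surface
```

The averaged value sits in empty space between foreground and background, which is the flying-point artifact; picking a single hypothesis avoids it at the cost of committing to one surface.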
17:49
7d ago
arXiv · cs.CL · atom · EN · 17:49 · 06·01
SN-WER: Script-Normalized WER for Multi-Script Indic ASR Evaluation
The paper proposes SN-WER, a training-free ASR evaluation metric that transliterates references and hypotheses into a language-specific canonical script before WER, then evaluates it on 5 Indic languages, 2 datasets, and 3 ASR models.
#Audio#Benchmarking#arXiv#Research release
why featured
HKR-K passes because SN-WER gives a concrete metric mechanism and test setup. HKR-H and HKR-R are weak: multi-script Indic ASR evaluation is narrow, so it stays in the 40–59 research-signal band.
editor take
SN-WER cuts inflated gaps by 12% across 5 Indic languages; I buy the metric, but Common Voice still exposes weak ASR.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
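A minimal sketch of the SN-WER idea as the summary states it: map reference and hypothesis into one canonical script, then compute ordinary WER. The word-level lookup table `TOY_TABLE` is a hypothetical stand-in for a real transliterator, and the function names are assumptions; the paper's actual normalization rules are not given in the feed.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance over word lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, 1):
        cur = [i]
        for j, h in enumerate(hyp_words, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Plain word error rate: edit distance divided by reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def to_canonical(text, table):
    """Toy normalization: replace each known word with its canonical-script form."""
    return " ".join(table.get(word, word) for word in text.split())


def sn_wer(reference, hypothesis, table):
    """WER computed after both sides are mapped to the canonical script."""
    return wer(to_canonical(reference, table), to_canonical(hypothesis, table))


# Hypothetical table: romanized Hindi words mapped to Devanagari.
TOY_TABLE = {"namaste": "नमस्ते", "duniya": "दुनिया"}

reference = "नमस्ते दुनिया"
hypothesis = "namaste दुनिया"  # correct word, different script

raw_error = wer(reference, hypothesis)
normalized_error = sn_wer(reference, hypothesis, TOY_TABLE)
```

In this toy case plain WER counts the romanized word as an error, while the normalized score does not, which is the inflated gap the metric is meant to remove.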
17:46
7d ago
arXiv · cs.CL · atom · EN · 17:46 · 06·01
SimSD: Simple Speculative Decoding in Diffusion Language Models
SimSD adds valid token-level contexts to diffusion language models through a plug-and-play masking strategy, and experiments on SDAR-family dLLMs across four benchmarks report up to 7.46x higher decoding throughput while maintaining or improving average generation quality.
#Inference-opt#SimSD#SDAR#Research release
why featured
HKR-H/K/R pass via the 7.46x throughput hook, concrete masking mechanism, and inference-cost angle. The niche diffusion-LM scope keeps it below featured.
editor take
SimSD reports up to 7.46x throughput for SDAR-family dLLMs on four benchmarks; training-free is nice, but one model family is thin evidence.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
17:40
7d ago
arXiv · cs.AI · atom · EN · 17:40 · 06·01
Research Proposes Text Embedding Direction Method for Measuring Adaptive Agent Behavior Traits
The authors define agent traits as directions in text-embedding space and score skill-file edits by projection; on 68 labeled skill-diff pairs for propensity to seek sensitive data, the method reaches 91.2% sign classification accuracy and Spearman ρ=0.82 under leave-one-out cross-validation.
#Agent#Embedding#Safety#Research release
why featured
HKR-K/R pass with a concrete mechanism and metrics tied to agent-safety evaluation. HKR-H is weak, and this is a single arXiv paper with no disclosed tool, code, or production path, so it stays in the interesting-not-featured band.
editor take
Embedding-direction trait tracking hits 91.2% on 68 diffs; tiny sample, but skill files as auditable behavior surfaces is right.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
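A rough sketch of what "traits as directions, edits scored by projection" could look like. It assumes the direction is the normalized difference of class means and that an edit is scored by projecting the embedding change onto it; the feed does not say how the authors build the direction, and the 2-D vectors below are invented toy data, not real embeddings.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def trait_direction(positive, negative):
    """Unit vector pointing from the negative-example mean to the positive-example mean."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    diff = [p - q for p, q in zip(pos_mean, neg_mean)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def edit_score(before, after, direction):
    """Projection of the embedding change (after - before) onto the trait direction."""
    return sum((a - b) * d for a, b, d in zip(after, before, direction))


# Toy embeddings: texts that show the trait vs. texts that do not.
positive = [[1.0, 0.0], [0.8, 0.2]]
negative = [[-1.0, 0.0], [-0.8, 0.2]]
direction = trait_direction(positive, negative)

# Hypothetical skill-file edit, embedded before and after the change.
score_up = edit_score([0.1, 0.5], [0.6, 0.4], direction)
score_down = edit_score([0.6, 0.4], [0.1, 0.5], direction)
```

The sign of the score is what a sign-classification accuracy would be measured on; the magnitude is what a rank correlation such as Spearman ρ would use.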
17:32
7d ago
arXiv · cs.CL · atom · EN · 17:32 · 06·01
FigSIM: A Dataset for Fine-grained Suicide Severity and Figurative Language in Suicide Memes
FigSIM introduces a public dataset of 1,049 suicide memes annotated for severity levels, figurative phenomena, and suicide-related content, and benchmarks 16 unimodal and multimodal models across figurative language, severity, and content detection tasks.
#Multimodal#Vision#Benchmarking#FigSIM
why featured
HKR-H/K/R all pass, but this is a niche safety benchmark, not a model or product release. The 1,049-sample dataset and 16-model test add signal, while audience reach stays limited.
editor take
FigSIM ships 1,049 annotated suicide memes; 16 models underpredict severe figurative cases, exactly where moderation breaks.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
16:37
7d ago
HuggingFace Papers (takara mirror) · rss · EN · 16:37 · 06·01
Learning When to Translate for Multilingual Reasoning
Luar trains reasoning language models to choose between direct reasoning on the original input and reasoning over an English translation, outperforming GRPO and other training baselines on multilingual reasoning benchmarks, while the post does not disclose exact scores.
#Reasoning#Alignment#Luar#GRPO
why featured
HKR-H and HKR-K pass: the routing mechanism is concrete and the GRPO benchmark claim is testable. Specific scores, model scale, and release details are not disclosed, so this stays interesting but not featured.
editor take
Luar makes RLMs translate on demand; no scores disclosed, so I buy the low-resource trigger idea, not the GRPO win claim.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R0
16:30
7d ago
HuggingFace Papers (takara mirror) · rss · EN · 16:30 · 06·01
Active Exploring like a Pigeon: Reinforcing Spatial Reasoning via Agentic Vision-Language Models
The paper proposes a dynamic cognitive map and Spatial Assertion Codes for agentic VLM spatial reasoning, reaching 80.5% overall accuracy on MindCube and outperforming the prior best method by 29.5 accuracy points on the Rotation subset.
#Agent#Vision#Reasoning#Research release
why featured
HKR-H/K/R all pass, but this is a single research item with impact limited to MindCube and the Rotation subset; no broad replication or product path is disclosed, so it stays in the high 60–71 band.
editor take
The paper hits 80.5% on MindCube. SAC’s dense checks matter; the pigeon framing is just garnish.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
15:28
7d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:28 · 06·01
Honey, I Shrunk the Arc de Triomphe!
The authors introduce MetricScenes, a metrically grounded in-the-wild dataset using Internet photo collections, stereo imagery, geotagged metadata, and stereo baselines to recover absolute scale, then fine-tune MoGe-2 to reduce scale collapse in distant landmarks and open-domain scenes; the post does not disclose dataset size or benchmark numbers.
#Vision#Fine-tuning#Benchmarking#MetricScenes
why featured
HKR-H and HKR-K pass: the title gives a vivid failure case, and the post names MetricScenes plus the MoGe-2 fine-tuning path. Sample size is not disclosed, and HKR-R is narrow to CV researchers.
editor take
MetricScenes adds geotags and stereo baselines for absolute scale; size and metrics are undisclosed. The data-bottleneck blame sounds right.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
15:00
7d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:00 · 06·01
TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos
TROPHIES jointly estimates dynamic humans, static scenes, and camera poses from multi-view videos in one global coordinate frame, using scale consistency, contact priors, and cross-view temporal coherence for global alignment and reporting stronger global fidelity and human-scene consistency on EgoHuman and EgoExo4D.
#Vision#Multimodal#Reasoning#TROPHIES
why featured
HKR-K passes because the post gives a concrete joint reconstruction mechanism and EgoHuman/EgoExo4D setting. HKR-H and HKR-R are weak, and the 3D vision paper is niche for this feed, so it stays in the lower all tier.
editor take
TROPHIES tests 4D joint reconstruction on EgoHuman and EgoExo4D; metrics are undisclosed, so treat “physically plausible” as unproven.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
14:23
7d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:23 · 06·01
Beyond Isolated Behaviors: Hierarchical User Modeling for LLM Personalization
The paper proposes PHF, a three-level user modeling framework with practices, habitus, and fields, and evaluates a frozen-LLM PHF-Compass implementation on the LaMP benchmark for LLM personalization tasks.
#Memory#Interpretability#Benchmarking#Pierre Bourdieu
why featured
HKR-H/K/R pass at modest strength: PHF gives a testable three-layer personalization mechanism on LaMP. The post discloses no gain size, code, or production validation, so it stays below featured.
editor take
PHF tests a 3-layer user model on LaMP, but gains are undisclosed; nice sociology wrapper, prove it beats long-context memory.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
11:49
7d ago
HuggingFace Papers (takara mirror) · rss · EN · 11:49 · 06·01
ProbRes: Volatility Learning for Probabilistic Time-Series Forecasting
ProbRes models conditional mean and conditional volatility with two architecture-agnostic modules, then generates predictive distributions at inference by resampling normalized residuals for univariate and multivariate heteroskedastic time series.
#Benchmarking#ProbRes#Research release
why featured
HKR-K passes via a concrete forecasting mechanism for heteroskedastic series. HKR-H/R are weak, and the post does not disclose benchmark gains, code, or production evidence, so it stays in the low research-signal band.
editor take
ProbRes uses two modules for mean and volatility; I like the calibration angle, but baselines and datasets are undisclosed.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
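The resampling step in the ProbRes summary can be sketched as follows, assuming the mean and volatility predictions already exist. The numbers, the function names, and the choice to sample the pool uniformly are illustrative assumptions; the two learned modules are not modeled here.

```python
import random


def normalized_residuals(observed, mean_pred, vol_pred):
    """Residuals scaled by predicted volatility: z = (y - mean) / vol."""
    return [(y - m) / s for y, m, s in zip(observed, mean_pred, vol_pred)]


def sample_forecast(mean, vol, residual_pool, n_samples, rng):
    """Draw predictive samples as mean + vol * z, with z resampled from the pool."""
    return [mean + vol * rng.choice(residual_pool) for _ in range(n_samples)]


# Toy history: observations with the mean and volatility predicted for each step.
pool = normalized_residuals(
    observed=[1.0, 2.0, 4.0],
    mean_pred=[1.5, 2.0, 3.0],
    vol_pred=[0.5, 1.0, 2.0],
)

# Predictive samples for a new step with predicted mean 10 and volatility 2.
samples = sample_forecast(10.0, 2.0, pool, n_samples=200, rng=random.Random(0))
```

Because the residuals are normalized before pooling, the same pool can be rescaled by whatever volatility is predicted at inference time, which is how a time-varying spread enters the predictive distribution.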
10:16
7d ago
HuggingFace Papers (takara mirror) · rss · EN · 10:16 · 06·01
World-Task Factorization Framework for Robot Learning
The paper proposes a world-task factorization framework for robot learning, pairs AICON with a compact learned policy, and reports tests on three robotics problems where it outperforms end-to-end baselines and analytical heuristics, generalizes zero-shot to out-of-distribution configurations, and transfers to real hardware without retraining.
#Robotics#Agent#Reasoning#AICON
why featured
HKR-K is clear: a named framework, 3 robotics problems, and zero-shot OOD results. HKR-R is limited to robotics-learning practitioners; no hard exclusion, but it lacks major-lab/product impact, so it stays in the 60–71 band.
editor take
AICON plus a compact learned policy beats end-to-end baselines on 3 robot tasks; sample counts aren’t disclosed, but world/task factorization beats pure scaling here.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
09:50
7d ago
HuggingFace Papers (takara mirror) · rss · EN · 09:50 · 06·01
CARTE: A Benchmark for Mapping Language Model Knowledge Across France
CARTE evaluates 27 LLMs from 1B to 12B parameters with 2,431 multiple-choice questions across France’s 13 metropolitan regions and 14 domains, including culture, language, demographics, economy, environment, and mobility.
#Reasoning#Benchmarking#CARTE#Research release
why featured
HKR-K is concrete and HKR-R matters for localization/eval teams, but this is a narrow benchmark paper without a major lab, broad artifact impact, or industry-level result, so it fits the 60–71 all band.
editor take
CARTE tests 27 small LLMs on 2,431 France questions; useful regional probe, but few-shot MCQ stays far from real retrieval.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
09:46
7d ago
HuggingFace Papers (takara mirror) · rss · EN · 09:46 · 06·01
MT-EditFlow: Reinforcement Learning for Multi-Turn Image Editing with Flow Matching
MT-EditFlow applies flow-matching reinforcement learning to multi-turn image editing, combining multi-reward signals with GRPO and NFT-based methods; on FLUX.1-Kontext-dev, it raises turn-3 overall performance by 6.85 points and surpasses open-source models such as Qwen-Image-Edit, while the post does not disclose dataset size or training cost.
#Vision#Multimodal#Fine-tuning#FLUX.1-Kontext-dev
why featured
HKR-H and HKR-K pass: the paper has a clear multi-turn editing mechanism and a +6.85-point result. HKR-R is weak, and this is a normal research update, so it stays in the 60–71 band.
editor take
MT-EditFlow lifts FLUX.1-Kontext-dev turn-3 by 6.85 points; dataset size and training cost are undisclosed, so reproducibility is still thin.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
09:14
7d ago
HuggingFace Papers (takara mirror) · rss · EN · 09:14 · 06·01
WALL-WM: Carving World Action Modeling at the Event Joints
WALL-WM shifts video-action learning to event-grounded VLA pretraining with event captions, cluster-balanced sampling, and two inference modes; the post says it reaches state-of-the-art performance in large-scale real-world generalization evaluation, but does not disclose scores or benchmark names.
#Robotics#Vision#Multimodal#WALL-WM
why featured
HKR-K passes on concrete mechanisms, but the post does not disclose real-generalization scores and stays within robotics/VLA research. HKR-H and HKR-R miss, so this lands as useful but narrow signal.
editor take
WALL-WM uses event-level VLA pretraining, but scores and benchmarks are undisclosed; I don’t buy the SOTA claim without open evals.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
09:08
7d ago
HuggingFace Papers (takara mirror) · rss · EN · 09:08 · 06·01
Beyond Low-Rank: Low-Rank Sparse Prompting via Spiking Neural Network and Prompt Factorization
The paper proposes LoRSP, which combines low-rank prompt factorization with an SNN integrate-and-fire mechanism to generate instance-specific sparse visual prompts. Experiments cover five heterogeneous vision backbones and multiple benchmarks, while the snippet does not disclose exact accuracy, parameter counts, datasets, or energy metrics.
#Vision#Fine-tuning#Inference-opt#Research release
why featured
HKR-K passes via a concrete mechanism and 5-backbone evaluation. HKR-H/R are weak: the angle is narrow and the body does not disclose gain numbers, code, or deployment context, so this stays in low-value research territory.
editor take
LoRSP tests 5 vision backbones, but accuracy and energy numbers are undisclosed; I want the parameter table before buying SNN prompting.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
07:42
8d ago
HuggingFace Papers (takara mirror) · rss · EN · 07:42 · 06·01
Dynamic Trust-Aware Sparse Communication Topology for LLM-Based Multi-Agent Consensus
DySCo selects a small set of communication edges in each reasoning round using agent reliability, answer divergence, and task relevance under budget constraints. The paper evaluates the mechanism on mathematical reasoning, logical reasoning, and factual question answering, but the RSS snippet does not disclose concrete token-cost, latency, or accuracy numbers.
#Agent#Reasoning#DySCo#Research release
why featured
HKR-K/R pass: DySCo adds trust-, disagreement-, and relevance-based sparse communication for LLM agents. No cost-reduction ratio or standout benchmark result is disclosed, so it stays in the 60–71 research-signal band.
editor take
DySCo picks edges by reliability, divergence, and relevance; no cost numbers disclosed, so sparse communication has not won yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
07:34
8d ago
HuggingFace Papers (takara mirror) · rss · EN · 07:34 · 06·01
TalkTag: Fine-Grained Morphosyntactic Error Annotation for Transcribed Speech
TalkTag uses a fine-tuned LLM to automate CHAT-style morphosyntactic error annotation in spoken-language transcripts, developed with children’s narrative data under extreme data scarcity; the post says evaluation found precise annotations and ambiguity detection, but does not disclose dataset size, metrics, or model details.
#Fine-tuning#TalkTag#Research release
why featured
HKR-K passes on the concrete mechanism, but data size, accuracy, and reproducible setup are not disclosed. The computational-linguistics annotation niche has limited AI-practitioner resonance, so it sits in the low-value research band.
editor take
TalkTag targets CHAT speech errors, but gives no scale or metrics; clinical low-resource annotation needs error-cost reporting first.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:51
8d ago
HuggingFace Papers (takara mirror) · rss · EN · 04:51 · 06·01
HAIM: Human-AI Music Datasets for AI Music Production Tracking Benchmark
The paper introduces HAIM, a dataset for tracking AI intervention across music production stages, with labels for hybrid production and agent-level tracking; the post does not disclose dataset size or detector scores.
#Audio#Benchmarking#Agent#HAIM
why featured
HKR-H/K/R pass through the provenance hook, multi-stage labels, and creator-rights nerve. Importance stays in 60–71: the post gives no sample size, results, release status, or adoption signal.
editor take
HAIM discloses staged labels, not dataset size or detector scores; AI music detection needs to drop binary purity tests.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:27
8d ago
HuggingFace Papers (takara mirror) · rss · EN · 04:27 · 06·01
Time-Aware Diffusion Based on Preference Disentanglement for Generative Recommendation
TDPM disentangles user preference into long-span period preference and recent event-triggered point preference, then injects time-aware diffusion into SID tokens; on three public real-world datasets, it improves over state-of-the-art baselines by up to 29.21% in HR@20 and 25.45% in NDCG@20.
#Embedding#Benchmarking#TDPM#Research release
why featured
HKR-K passes: TDPM splits long-term period preference from recent point preference and reports three-dataset gains. HKR-H/R fail because this is a narrow recommender paper with no product release, code, or broader practitioner conflict.
editor take
TDPM claims +29.21% HR@20 on 3 datasets; I’d audit splits and negative sampling first, recommender gains inflate fast.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
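For readers auditing the TDPM numbers, here is how HR@k and NDCG@k are commonly computed when each user has exactly one held-out target item. That single-target setup is an assumption; the feed does not describe the paper's evaluation protocol, and the item IDs are placeholders.

```python
import math


def hit_rate_at_k(ranked_items, target, k):
    """1.0 if the held-out target appears in the top-k ranked items, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k):
    """With one relevant item the ideal DCG is 1, so NDCG is 1 / log2(rank + 1)."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-indexed position
    return 1.0 / math.log2(rank + 1)


# Placeholder ranking for one user; the target is ranked second.
ranking = ["item_a", "item_b", "item_c"]
hit = hit_rate_at_k(ranking, "item_b", k=2)
gain = ndcg_at_k(ranking, "item_b", k=2)
miss = hit_rate_at_k(ranking, "item_c", k=2)
```

Both metrics depend heavily on which candidates are ranked against the target, so a relative gain such as +29.21% HR@20 only means something once the candidate set and data split are known.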
04:00
8d ago
● P1 · arXiv · cs.LG · atom · EN · 04:00 · 06·01
No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval
The paper proposes Single-stage Sparse Retrieval, using a Sparse Autoencoder to project token embeddings into high-dimensional sparse representations; on BEIR, SSR reports 15x faster indexing than ColBERTv2, half the retrieval latency, and higher retrieval performance than leading baselines.
#RAG#Embedding#Inference-opt#ColBERT
why featured
HKR-H has a clear anti-K-means hook; HKR-K has the SAE mechanism plus BEIR numbers; HKR-R hits RAG infra cost. It stays below 78 since this is one arXiv paper with no code, author context, or production use disclosed.
editor take
Three sources trace to one arXiv paper; SSR dodges K-means with SAE, and 15x indexing is tempting, but BEIR is not production proof.
sharp
Three sources use the same title and point back to arXiv 2605.30120; this is a single paper chain, not independent confirmation. SSR makes a clean bet: the pain in multi-vector retrieval is less MaxSim itself, more the K-means tax ColBERTv2 pays to survive storage and indexing. The hook is concrete: SAE projects token embeddings into high-dimensional sparse codes, skips clustering, uses inverted indexes, claims 15x lower indexing time than ColBERTv2, half the retrieval latency, and better BEIR results. I buy the problem framing before I buy the “paradigm” language. CRISP tried to make vectors more clusterable during training; SSR walks around clustering entirely. The deciding cost is billion-scale corpus updates and inverted-list blowup, and the abstract does not show that bill.
HKR breakdown
hook knowledge resonance
open source
89
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
KernelCraft: Benchmarking Agentic Close-to-Metal Kernel Generation on Emerging Hardware
KernelCraft evaluates LLM agents generating low-level kernels for three emerging accelerators, across more than 20 machine-learning tasks and five configurations per task. The strongest reasoning models produced correct kernels for unseen ISAs within a few refinement steps, and their optimized kernels matched or beat compiler baselines.
#Agent#Code#Benchmarking#KernelCraft
why featured
HKR-H/K/R pass: unseen-ISA kernel generation, 3 accelerators, 20+ tasks × 5 configs, and compiler baselines give substance. The close-to-metal hardware niche lowers accessibility, so it stays below featured.
editor take
KernelCraft tests 3 accelerators and 20+ tasks; unseen-ISA kernels matching compilers is wild, but model names and failure rates aren't disclosed.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Memory-Efficient Structured Backpropagation for On-Device LLM Fine-Tuning
The paper proposes MeSP, which recomputes LoRA’s intermediate projection h=xA during backward passes; on Qwen2.5 0.5B–3B models, it cuts average memory by 49% versus MeBP while producing mathematically identical gradients.
#Fine-tuning#Inference-opt#Qwen#Research release
why featured
HKR-K/R pass: the paper gives a 49% memory cut, gradient equivalence, and Qwen2.5 0.5B–3B test setting. HKR-H is weak, and this remains a single method paper, below featured.
editor take
MeSP cuts memory 49% on Qwen2.5 0.5B–3B; LoRA on-device tuning should squeeze backward caches first.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H0·K1·R1
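A small sketch of why recomputing h = xA in the backward pass leaves the gradients unchanged. It covers only the LoRA branch (xA)B with plain-list matrices; the helper names and toy values are assumptions, and this is not the paper's MeSP implementation.

```python
def matmul(a, b):
    """Naive matrix product for small list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def lora_forward(x, A, B, keep_h):
    """LoRA branch out = (x A) B; optionally drop h instead of caching it."""
    h = matmul(x, A)
    saved = {"x": x, "h": h if keep_h else None}
    return matmul(h, B), saved


def lora_backward(saved, A, B, grad_out):
    """Gradients for A and B; h is recomputed from x when it was not cached."""
    h = saved["h"] if saved["h"] is not None else matmul(saved["x"], A)
    grad_B = matmul(transpose(h), grad_out)          # dL/dB = h^T dL/dout
    grad_h = matmul(grad_out, transpose(B))          # dL/dh = dL/dout B^T
    grad_A = matmul(transpose(saved["x"]), grad_h)   # dL/dA = x^T dL/dh
    return grad_A, grad_B


x = [[1.0, 2.0], [3.0, 4.0]]                 # batch of 2, input dim 2
A = [[0.5], [-1.0]]                          # rank-1 down-projection
B = [[2.0, 0.0, 1.0]]                        # rank-1 up-projection to dim 3
grad_out = [[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]]

_, cached = lora_forward(x, A, B, keep_h=True)
_, recomputed = lora_forward(x, A, B, keep_h=False)
grads_cached = lora_backward(cached, A, B, grad_out)
grads_recomputed = lora_backward(recomputed, A, B, grad_out)
```

The recompute path repeats the same x·A product the forward pass ran, so both paths return the same gradients; the trade is one extra small matmul for not holding h in memory.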
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers
The paper introduces VMoER, a Bayesian approach that confines inference to MoE expert routing, adding under 1% FLOPs while reducing calibration error by 94%, improving routing stability under noise by 38%, and increasing out-of-distribution AUROC by 12% across fine-tuned foundation models.
#Reasoning#Inference-opt#Safety#Research release
why featured
HKR-H/K/R pass, but this is a single arXiv methods paper with no named-lab impact or replication scope. The <1% FLOPs and 94% calibration-error drop place it above routine papers, below featured.
editor take
VMoER confines Bayes to MoE routing at under 1% FLOPs; the 94% calibration drop needs open reproduction.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
What Is Missing? Explaining Neurons Activated by Absent Concepts
The paper proposes two extensions to attribution and feature visualization methods to detect neuron activations caused by absent concepts, then tests them on ImageNet models; the abstract says mainstream XAI methods miss these encoded absences in their standard form and reports improved debiasing when absences are considered, but the snippet does not disclose model counts or metric values.
#Vision#Interpretability#Alignment#arXiv
why featured
HKR-H/K/R pass, but the post only gives method direction and ImageNet setting; no effect size, code, or major-lab signal is disclosed. This stays just below featured.
editor take
The paper adds two XAI extensions, but omits model counts and metrics; absence-activated neurons expose a real blind spot.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training
The paper analyzes two spurious-feature channels in DPO-style preference learning for log-linear policies: mean spurious bias and causal-spurious correlation leakage, then proposes tie training with equal-utility preference pairs as data-driven regularization.
#Alignment#Safety#Fine-tuning#Research release
why featured
HKR-K and HKR-R pass: it offers concrete DPO spurious-correlation mechanisms and tie training. As a single arXiv paper with no disclosed results or broad uptake, it stays in the lower 60–71 band.
editor take
DPO gets two spurious-feature channels under log-linear policies; tie pairs look clean, but equal-utility labeling cost is undisclosed.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Skill Reuse as Compression in Agentic RL
The paper introduces ReuseRL, an MDL-based agentic RL method that extracts a shared skill dictionary from successful trajectories and adds a segmentation cost to penalize poorly compressible behaviors. On ALFWorld, TextWorld-Cooking, and Countdown-Stepwise, ReuseRL improves in-distribution and out-of-distribution success over vanilla GRPO and round-length baselines.
#Agent#Reasoning#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the MDL skill-dictionary framing and three agent benchmarks add signal. Kept in all because the summary lacks gain sizes, author context, code, or real-task validation.
editor take
ReuseRL beats GRPO on 3 benchmarks; I buy the MDL angle, but the snippet hides effect sizes.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference
IntAttention replaces floating-point softmax with IndexSoftmax in an integer-only attention path. Armv8 CPU experiments report up to 3.7x speedup and 61% lower energy than FP16 baselines, plus up to 2.0x speedup over conventional INT8 attention pipelines.
#Inference-opt#IntAttention#Research release#Open source
why featured
HKR-H/K/R pass, but this is a narrow inference-optimization paper rather than a broad model or product release. The Armv8 speed and energy numbers lift it to the high end of 60–71.
editor take
IntAttention reports 3.7x speedup and 61% less energy on Armv8; the 65% softmax detour is the edge bottleneck to kill.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
LVSA: Training-Free Sparse Attention for Long Video Diffusion
LVSA replaces dense self-attention with training-free block-sparse attention for video diffusion transformers. It cuts compute by up to 3.17x on Wan 2.1 1.3B at a 6x horizon, and enables single-GPU HunyuanVideo 1.5 generation at a 2x horizon where dense attention runs out of memory.
#Vision#Inference-opt#Benchmarking#Wan
why featured
HKR-H/K/R are present: training-free sparse attention, 3.17x compute reduction, and single-card long-video inference hit real GPU-cost nerves. This remains an arXiv method paper without disclosed code, adoption cost, or production validation, so it stays in 60–71.
editor take
LVSA cuts Wan 2.1 1.3B compute 3.17x at 6x horizon; training-free is strong, but VQeval needs outside replication.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
OrcaRouter: A Production-Oriented LLM Router with Hybrid Offline-Online Learning
OrcaRouter routes LLM requests with a LinUCB contextual bandit over lexical and sentence-embedding features, and its May 20, 2026 RouterArena submission ranked second with a 72.08 arena score, 75.54% accuracy, and a cost of USD 1.00 per 1,000 queries.
#Agent#Embedding#Inference-opt#OrcaRouter
why featured
HKR-K and HKR-R pass: the paper gives a concrete routing mechanism, rank, accuracy, and cost. HKR-H is weak, and no open-source artifact, deployment case, or cross-source cluster is disclosed, so it stays high-all.
editor take
OrcaRouter scored 72.08 for second on RouterArena; LinUCB routing keeps making giant-model-only inference stacks look wasteful.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
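A generic disjoint LinUCB router, sketched to show the kind of mechanism the OrcaRouter summary names. The arm names, the two-dimensional context, the reward values, and `alpha` are all hypothetical; OrcaRouter's actual features, reward design, and offline-online training are not described in the feed.

```python
import math


class LinUCBArm:
    """One candidate model: ridge-regression reward estimate plus a confidence bonus."""

    def __init__(self, dim):
        # Inverse of A = I + sum(x x^T), maintained directly.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, vec):
        return [sum(row[j] * vec[j] for j in range(len(vec))) for row in self.a_inv]

    def ucb(self, x, alpha):
        """Upper confidence bound: theta^T x + alpha * sqrt(x^T A^-1 x)."""
        theta = self._a_inv_times(self.b)
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        """Sherman-Morrison rank-1 update of A^-1, then accumulate b += reward * x."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(len(x)):
            self.b[i] += reward * x[i]


def route(arms, x, alpha):
    """Pick the arm (model) with the highest upper confidence bound for this request."""
    return max(arms, key=lambda name: arms[name].ucb(x, alpha))


arms = {"small_model": LinUCBArm(2), "large_model": LinUCBArm(2)}
easy_query = [1.0, 0.0]  # hypothetical feature vector for an easy request
for _ in range(5):
    arms["small_model"].update(easy_query, reward=1.0)  # cheap and correct
    arms["large_model"].update(easy_query, reward=0.0)  # correct but penalized for cost
choice = route(arms, easy_query, alpha=0.5)
```

In a routing setting the reward would typically combine answer quality with a cost penalty, which is how a bandit learns to send easy requests to the cheaper model.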
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection
Token Sparse Attention compresses per-head Q/K/V into a smaller token set during attention, then decompresses outputs to the original sequence, reaching up to 3.23x attention speedup at 128K context with less than 1% accuracy degradation.
#Inference-opt#Research release
why featured
HKR-K and HKR-R pass via a concrete 128K speed result and cost/latency relevance. HKR-H is weak, and a single arXiv inference paper without adoption evidence stays in the 60–71 band.
editor take
Token Sparse Attention hits 3.23x at 128K with under 1% loss; reversible token selection beats one-shot eviction.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Expand Neurons, Not Parameters
The paper shows that increasing neuron count while keeping total non-zero parameters fixed improves accuracy on symbolic Boolean tasks, classifiers over CLIP embeddings, CNNs, and deeper MLPs, with gains tied to lower feature interference and reduced polysemanticity from splitting neurons into sparser sub-neurons.
#Interpretability#Inference-opt#Benchmarking#arXiv
why featured
Single arXiv architecture paper with HKR-H/K/R, but no concrete gain sizes, model scale, or replication detail in the feed. Useful for efficiency-minded practitioners; not same-day must-write.
editor take
More neurons at fixed nonzero parameters improve accuracy; random splits nearly work, which makes superposition look like an engineering constraint.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs
SimulCost introduces 4,878 physics-simulation tuning tasks across 13 simulators; frontier LLMs reach 72-81% success in multi-round mode, but run 1.5-2.5x slower than traditional scanning.
#Agent#Reasoning#Benchmarking#Rose-STL-Lab
why featured
HKR-H/K/R all pass: SimulCost has a clear speed-vs-success hook, concrete benchmark scale, and an agent cost lesson. It stays below featured because the physics-simulation scope is narrow and lacks major-lab or cross-source weight.
editor take
SimulCost has 4,878 tasks; even at 72-81% multi-round success, frontier LLMs still run 1.5-2.5x slower than scanning.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Expert Merging in Sparse Mixture of Experts with Nash Bargaining
The paper introduces NAMEx, a Nash Bargaining framework for weighting and merging experts in Sparse MoE models. It reports experiments on language modeling, text and image classification, corruption robustness, and large-scale tests on Qwen1.5-MoE 14B and DeepSeek-MoE 16B in zero-shot and fine-tuning settings.
#Inference-opt#Benchmarking#Qwen#DeepSeek
why featured
HKR-H and HKR-K pass: the Nash Bargaining mechanism is specific, with tests on two MoE bases under zero-shot and fine-tuning settings. HKR-R is weaker because latency, memory, and deployment gains are not disclosed.
editor take
NAMEx merges experts on Qwen1.5-MoE 14B and DeepSeek-MoE 16B; without effect sizes, the Nash framing stays unproven.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Cost-Aware Learning
The paper proposes Cost-Aware SGD and Cost-Aware GRPO, sampling finite-sum components by gradient norms and costs, and reports that experiments on 1.5B, 4B, and 8B LLMs reduce policy-optimization tokens while matching or exceeding baseline accuracy.
#Fine-tuning#Inference-opt#Research release
why featured
HKR-K/R pass: the methods and 1.5B/4B/8B experiments add real signal, and token savings map to team costs. No reduction percentage or artifact is disclosed, so this stays high-all rather than featured.
editor take
Cost-Aware GRPO cuts policy-optimization tokens on 1.5B/4B/8B; no ratio disclosed, but cost-weighted sampling beats batch fiddling.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
The SuperActivator Mechanism: Transformers Concentrate Reliable Concept Signals in the Tail
The paper presents the SuperActivator mechanism: concept-aligned attention heads amplify activation gaps, and detection typically peaks using 5–10% of in-concept token activations, with F1 improving by up to 0.14 over standard aggregators and prompting baselines.
#Interpretability#Multimodal#Benchmarking#Research release
why featured
HKR-H/K pass: the tail-signal mechanism is a real hook and the abstract gives 5–10% token and +0.14 F1 claims. Single arXiv paper with limited application context keeps it in the 60–71 band.
editor take
SuperActivator peaks at 5–10% concept tokens and adds up to 0.14 F1; I buy the tail-signal claim, pending replication.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
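To make the tail-aggregation claim in the SuperActivator entry above concrete, here is a small sketch that scores a concept from only the strongest token activations instead of averaging all of them. The function name and the `top_fraction` parameter are illustrative assumptions; the paper's actual aggregator is not described in the feed.

```python
import math


def tail_score(activations, top_fraction=0.1):
    """Mean of the top `top_fraction` of token activations (at least one token)."""
    if not activations:
        raise ValueError("need at least one activation")
    k = max(1, math.ceil(top_fraction * len(activations)))
    top = sorted(activations, reverse=True)[:k]
    return sum(top) / k


def mean_score(activations):
    """Standard mean-pooling baseline for comparison."""
    return sum(activations) / len(activations)
```

When only a few tokens carry the concept, `tail_score` stays high while `mean_score` is diluted by the many weakly activated tokens, which is the intuition behind using 5–10% of in-concept activations.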
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
XLGoBench: Detecting Cross-Lingual Skill Gaps with Algorithmic Tasks
XLGoBench detects cross-lingual skill gaps in large language models with synthetic algorithmic tasks, where each task can vary in complexity and has an objective correctness criterion; the abstract says extensive experiments expose persistent gaps across multiple state-of-the-art models.
#Benchmarking#Reasoning#XLGoBench#Research release
why featured
HKR-K and HKR-R pass: the paper adds an objective cross-lingual algorithmic benchmark with generated complexity. HKR-H is weak, and the summary gives no gap numbers or model ranking, so it stays in the 60–71 band.
editor take
XLGoBench uses synthetic algorithmic tasks for cross-lingual gaps; model names aren’t disclosed, so trust the auditable templates, not “SOTA.”
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
The Illusion of Generalization in Tabular Language Models
The paper re-evaluates Tabula-8B on 165 UniPredict datasets and reports near-zero median lift over majority-class baselines for binary and categorical classification, with aggregate gains driven by quartile tasks. It also reports pervasive train-test overlap and task-level leakage, and finds that instruction tuning without tabular exposure recovers 92.2% of standard classification performance.
#Benchmarking#Reasoning#Fine-tuning#Tabula-8B
why featured
HKR-H/K/R pass: the paper offers a concrete benchmark critique of Tabula-8B on 165 UniPredict datasets. Scope is niche, so it stays in the 60–71 band rather than featured.
editor take
Tabula-8B shows near-zero median lift on 165 UniPredict datasets; I don’t buy TLM generalization when non-tabular tuning recovers 92.2%.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Forgetting Has Neighbors: Localized Collateral Forgetting in Machine Unlearning
The paper compares unlearned models with models retrained after deletion and finds pointwise discrepancies grow near the forget set for gradient-ascent and random-labeling methods, with or without retain-set fine-tuning; it proposes Local Teacher Distillation using soft labels from a small teacher trained on retained neighbors.
#Safety#Fine-tuning#Research release#Safety/alignment
why featured
HKR-H/K/R are present, but this is a single arXiv machine-unlearning paper; the article discloses no code, affiliations, or cross-source pickup. The localized forgetting mechanism keeps it in all, below featured.
editor take
This pins unlearning failure to local neighborhoods; CIFAR-100 numbers aren’t disclosed, but aggregate-only unlearning evals deserve demotion.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Who Endorsed It? Measuring Authority Bias Across Expertise Levels in Language Models
The paper evaluates 11 models on 4 math, legal, and medical reasoning datasets. Higher-authority misleading endorsements reduce accuracy and increase confidence in wrong answers.
#Reasoning#Interpretability#Benchmarking#Research release
why featured
HKR-H/K/R pass, but the post gives only 4 datasets, 11 models, and directional results; model names, effect sizes, and reproducibility details are not disclosed, keeping it below featured.
editor take
11 models across 4 reasoning sets follow high-authority wrong endorsements; expert labels are now an attack surface.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
MLIPilot: LLM-Driven Auto-Research for Machine-Learned Interatomic Potentials
MLIPilot uses tool-calling LLM agents to propose hypotheses, edit MLIP training code, launch HPC jobs, and accept or revert changes with a fixed physics-constrained scorecard across MACE optimization benchmarks.
#Agent#Code#Tools#OpenAI
why featured
HKR-H/K/R all pass: the agent loop is concrete and relevant to research automation. Kept in 60–71 because MLIP/MACE/HPC is niche, and the post gives no result numbers, open artifact, or reproducibility detail.
editor take
MLIPilot tests four LLM families on MACE optimization; I buy the physics scorecard, not the “auto-research” framing.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Aligning Dense Retrievers with LLM Utility via Distillation
The paper proposes Utility-Aligned Embeddings, which trains a bi-encoder with perplexity-reduction distillation, improving Recall@1 by 30.59%, MAP by 30.16%, and Token F1 by 17.3% over BGE-Base on QASPER.
#RAG#Embedding#Fine-tuning#QASPER
why featured
HKR-H/K/R all pass, but this is a single arXiv retrieval paper with evidence limited to QASPER vs BGE-Base, not a must-write product or framework release; lower-band score is 70.
editor take
UAE lifts BGE-Base Recall@1 by 30.59% on QASPER; distilling perplexity gain into a bi-encoder cuts reranking cost 180x.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Who Gets Credit or Blame? Attributing Accountability in Modern AI Systems
The paper proposes an accountability attribution framework for multi-stage AI development, using counterfactual estimators to quantify how pretraining, fine-tuning, and alignment stages affect model behavior without retraining the model.
#Alignment#Interpretability#Safety#Research release
why featured
HKR-H/K/R all pass, but this is a single arXiv paper with no disclosed metrics, author signal, or visible debate; useful research signal, below the featured threshold.
editor take
This paper attributes behavior across pretraining, fine-tuning, and alignment without retraining; I want proof it survives billion-scale models.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Automating Formal Verification with Reinforcement Learning and Recursive Inference
The thesis uses RLVR and verifier-guided search to improve Dafny and Lean generation. Dafny verified reward rose from 2.2% to 58.1%, filtered multi-turn RLVR raised pass rate from 9.7% to 31.1%, and a Lean scaffold improved VeriCoding pass rate from 46.2% to 69.2%.
#Code#Reasoning#Tools#arXiv
why featured
HKR-K is strong with concrete Dafny gains, and HKR-R fits code-agent reliability. Kept below featured because Lean/Dafny formal verification is specialist, with no code, authors, or reproducible setup disclosed.
editor take
RLVR lifted Dafny verified reward to 58.1%, but spec hacking broke the story; formal-verification rewards need adversarial specs, not pass-rate worship.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Scaling Multi-Hop Training Data via Graph-Constrained Path Selection
The paper uses graph-constrained path selection to generate multi-hop training data from plain unannotated text, then fine-tunes Qwen3-32B on 80K CUAD legal-contract examples and raises closed-book Token F1 from 21.66% to 38.58%, with the full-scale gain attributed to a 4.4× expansion of usable corpus rather than higher per-chain quality.
#Reasoning#Fine-tuning#Embedding#Qwen
why featured
HKR-H and HKR-K pass: the method and CUAD numbers are concrete for synthetic training data work. HKR-R is weaker, and a single arXiv paper without code or cross-source traction stays in the 60–71 band.
editor take
Qwen3-32B gets 80K CUAD samples and Token F1 jumps 21.66 to 38.58; the gain is corpus yield, not better chains.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry
WAV decomposes action-conditioned state prediction into state plausibility and action reachability checks. Across nine MiniGrid, RoboMimic, and ManiSkill tasks, it reports 2x higher sample efficiency and over 22% better downstream policy performance.
#Robotics#Reasoning#Benchmarking#Research release
why featured
HKR-H/K pass: the title has a self-improving world-model hook, and the article gives WAV’s mechanism plus nine-task results. HKR-R is narrow, and this remains a single arXiv paper below the featured threshold.
editor take
WAV reports 2x sample efficiency across 9 tasks. Video-derived subgoals plus inverse checks beat brute forward prediction.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
DARTS: Distribution-Aware Active Rollout Trajectory Shaping for Accelerating LLM Reinforcement Learning
DARTS uses distribution-aware trajectory sampling and adaptive redundancy allocation to shorten long-tail rollout distributions in LLM reinforcement learning, reporting up to 1.77x acceleration over state-of-the-art systems without compromising model performance.
#Reasoning#Inference-opt#DARTS#arXiv
why featured
HKR-K/R pass: 1.77x speedup and rollout-tail shaping are concrete and cost-relevant. HKR-H is weak, and the arXiv systems angle is specialized, so this stays in all.
editor take
DARTS reports up to 1.77x faster RL rollouts; I care whether it cuts verbosity or silently narrows exploration.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
AMNESIA: A Large-Scale Medical Unlearning Benchmark Suite with Disease-Informed Analysis
AMNESIA introduces an open-source medical unlearning benchmark with 70,560 question-answer pairs from 8,820 patient notes across 11 disease categories, evaluating four unlearning methods at random-patient and disease levels.
#Fine-tuning#Safety#Benchmarking#AMNESIA
why featured
HKR-K and HKR-R pass: the dataset scale and evaluation setup are concrete, and medical unlearning ties to privacy compliance. As a single arXiv benchmark without visible adoption or debate, it stays in the interesting-but-not-featured band.
editor take
AMNESIA ships 70,560 medical unlearning QAs; patient-level forgetting damages same-disease knowledge, a concrete failure mode benchmarks often dodge.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Shared Doubt: Zero-shot Cross-Lingual Confidence Estimation for Language Models
The paper trains a lightweight linear probe on one language to predict answer correctness from intermediate representations, then transfers it zero-shot to unseen languages, with ablations showing confidence features concentrate in middle layers.
#Reasoning#Interpretability#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper offers a testable cross-lingual confidence-estimation mechanism and touches multilingual reliability. No models, datasets, or numbers are disclosed, so it stays in the 60–71 band.
editor take
A monolingual linear probe transfers zero-shot across languages; models and datasets aren’t disclosed in the snippet, so I’d audit the middle-layer confidence-subspace claim first.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
A Persona-Based Evaluation Framework for Pluralistic Alignment in Generative AI
arXiv 2605.31021 proposes an evaluation framework using synthetic cognitive profiles, replacing a single assessment function with a state-space constrained manifold, and reports that sequential inference and stochastic prompt perturbations degrade persona coherence through state-space drift and semantic inconsistency.
#Alignment#Benchmarking#Safety#Research release
why featured
HKR-K/R pass: the paper offers a new eval mechanism and testable drift conditions tied to alignment. Single arXiv item lacks models, sample size, and metrics, so it stays in the 60–71 band.
editor take
arXiv 2605.31021 discloses only the abstract, no models or sample size; persona eval lives or dies on drift reproducibility.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
CacheProbe: Auditing Prompt Cache Isolation in Gateway APIs
CacheProbe audits prompt-cache isolation in OpenRouter’s API gateway, testing whether shared organizational credentials create global cache sharing across all OpenRouter users; the RSS snippet describes the threat model and cites Gu et al. at ICML 2025, but does not disclose empirical results.
#Inference-opt#Safety#OpenRouter#Gu et al.
why featured
HKR-H and HKR-R pass because prompt-cache isolation is a real AI API risk. HKR-K fails: no CacheProbe results, sample size, or vulnerability conclusion are disclosed, so this stays in the 60–71 band.
editor take
CacheProbe tests OpenRouter prompt-cache isolation, but results are undisclosed; I’d inspect the gateway credential model before buying the vuln headline.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K0·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation
GLIDE unifies PPI++, Stratified PPI, Predict-Then-Debias, Active Statistical Inference, and four sampler types in a scipy-style Python API for mean estimation. The paper says an agentic evaluation case study reduces human annotation at equivalent precision, but the RSS snippet does not disclose the exact savings rate.
#Agent#Benchmarking#Tools#GLIDE
why featured
HKR-K/R pass: GLIDE packages several PPI methods into a scipy-style API for agent evaluation costs. But it is a single arXiv source, technically narrow, and lacks a labeling-savings number, so it stays in 60–71.
editor take
GLIDE unifies 4 PPI estimator families and 4 samplers; savings rate is undisclosed, so treat it as eval plumbing.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
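For context on the GLIDE entry above, the basic prediction-powered mean estimate can be sketched in a few lines: average the model's predictions on the large unlabeled set, then correct by the average error measured on the small human-labeled set. This is the textbook PPI estimator, not GLIDE's API; the function and argument names are hypothetical.

```python
from statistics import fmean


def ppi_mean(labeled_y, labeled_pred, unlabeled_pred):
    """Prediction-powered estimate of a population mean.

    labeled_y:      human labels on the small annotated sample
    labeled_pred:   model predictions on that same sample
    unlabeled_pred: model predictions on the large unannotated sample
    """
    if len(labeled_y) != len(labeled_pred):
        raise ValueError("labeled_y and labeled_pred must align")
    # Rectifier: average amount by which the model under- or over-predicts.
    rectifier = fmean([y - p for y, p in zip(labeled_y, labeled_pred)])
    return fmean(unlabeled_pred) + rectifier
```

The human-annotation saving the paper claims comes from the rectifier needing far fewer labels than a direct human-only estimate at the same precision; the exact rate is not disclosed in the snippet.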
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Compile to Compress: Boosting Formal Theorem Provers by Compiler Outputs
The paper introduces a learning-to-refine framework that uses compiler outputs to compress diverse proof attempts into structured failure modes. Under comparable test-time budgets, the method reports state-of-the-art PutnamBench results among publicly reported roughly 8B and 32B parameter models, while avoiding long histories of proof attempts.
#Reasoning#Code#Tools#PutnamBench
why featured
HKR-H and HKR-K pass: the mechanism and benchmark condition are concrete. HKR-R is weak because formal theorem proving is niche, with no absolute lift or usable artifact disclosed.
editor take
Compile to Compress turns compiler errors into failure modes; 8B/32B PutnamBench SOTA is reported, but rollout budgets lack detail.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Retriever Portfolios: A Principled Approach to Adaptive RAG
The paper introduces Retriever Portfolios for adaptive RAG, using an expected best-of-k objective to select a small diverse retriever subset, and reports better retrieval metrics and answer quality than single-retriever and naive multi-retriever baselines across multiple QA benchmarks.
#RAG#Inference-opt#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper offers a concrete adaptive RAG mechanism and benchmark claim. HKR-H is weak, and this is still an arXiv-level retrieval optimization result, so it stays in all.
editor take
Retriever Portfolios uses expected best-of-k to pick few retrievers; RAG tuning hurts most at latency and token cost.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
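The expected best-of-k objective in the Retriever Portfolios entry above can be illustrated with a greedy selection over a per-query score table. The greedy procedure is an assumption made for this sketch, not necessarily the paper's algorithm, and `scores[q][r]` is a hypothetical matrix of retriever `r`'s quality on query `q`.

```python
def greedy_portfolio(scores, k):
    """Greedily pick up to k retrievers maximizing the mean best-of-set score.

    scores[q][r] is the quality of retriever r on query q (hypothetical input).
    """
    n_queries = len(scores)
    n_retrievers = len(scores[0])
    chosen = []
    best_so_far = [float("-inf")] * n_queries  # best score per query so far

    for _ in range(min(k, n_retrievers)):
        def value(r):
            # Mean over queries of the best score if retriever r were added.
            return sum(max(b, row[r]) for b, row in zip(best_so_far, scores)) / n_queries

        candidates = [r for r in range(n_retrievers) if r not in chosen]
        pick = max(candidates, key=value)
        chosen.append(pick)
        best_so_far = [max(b, row[pick]) for b, row in zip(best_so_far, scores)]
    return chosen
```

The second pick favors a retriever that covers queries the first one misses, which is why a small diverse subset can beat simply taking the top individual retrievers.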
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
SemStruct: Contextualizing Semantic Embeddings with Structural Information for Schema Matching
SemStruct models tables as heterogeneous graphs with column and value nodes, trains only a lightweight structural encoder, keeps the PLM frozen, and outperforms fully fine-tuned baselines on the Valentine and SOTAB-SM schema-matching benchmarks.
#Embedding#Benchmarking#SemStruct#Valentine
why featured
HKR-H and HKR-K pass: a frozen PLM plus a lightweight structural encoder beating full fine-tuning is a concrete mechanism and claim. The schema-matching niche limits HKR-R, so it stays in the 60–71 all band.
editor take
SemStruct freezes the PLM and trains a structural encoder; beating Valentine and SOTAB-SM baselines is a clean jab at text-only table matching.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Differentiable Mixture-of-Agents Incentivizes Swarm Intelligence of Large Language Models
The paper proposes DMoA, a multi-agent framework that sparsely activates agents at each reasoning step, uses predictive entropy as a self-supervised routing signal, and reports state-of-the-art results across 9 benchmarks.
#Agent#Reasoning#Inference-opt#Research release
why featured
HKR-H/K/R pass, but only arXiv-level facts are available: SOTA on 9 benchmarks and a routing mechanism, with no code, model scale, cost curve, or real-task replication disclosed.
editor take
DMoA reports SOTA on 9 benchmarks, with no cost disclosed; adaptive routing is neat, but agent swarms still need a bill.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
FOCUS: Forcing In-Context Object Localization through Visual Support Constraints and Policy Optimization
FOCUS uses a two-stage training framework and GRPO to optimize in-context object localization without category supervision; its 7B-parameter model outperforms models up to 72B parameters in experiments, while the snippet does not disclose dataset names.
#Vision#Multimodal#Benchmarking#FOCUS
why featured
HKR-H/K/R are present, but this is a single arXiv vision-localization paper with no dataset name, code, or outside validation disclosed. It stays in the 60–71 band.
editor take
FOCUS 7B beats models up to 72B; datasets aren’t disclosed, so hold the applause, though the anti-category-supervision direction is right.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Mechanistic Interpretability as Statistical Estimation: A Variance Analysis
The paper frames circuit discovery as statistical estimation built on causal mediation analysis and reports that exact single-input CMA scores have high intrinsic variance, while small input-data or hyperparameter perturbations yield different circuits.
#Interpretability#Research release
why featured
HKR-H/K/R all pass: the paper makes a concrete reliability claim about circuit discovery. It stays in the 60–71 band because it is a technical arXiv interpretability paper with no disclosed code, scale, or debate signal.
editor take
The paper recasts circuit discovery as CMA estimation; high single-input variance undercuts MI’s tidy deterministic circuit diagrams.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Don't Fool Me Twice: Adapting to Adversity in the Wild with Experience-Driven Reasoning
The paper proposes Don't Fool Me Twice for mobile robots facing embodiment-specific disturbances in unstructured environments. The agent records disturbance effects, queries a VLM with visual context for causes, models local anomalies with kernel regression, and validates four hypotheses in simulation and hardware across embodiments and adversity modes.
#Robotics#Reasoning#Vision#Research release
why featured
HKR-H and HKR-K pass: the paper offers experience-driven reasoning with VLM attribution and kernel-regression anomaly modeling, tested in simulation and hardware. Its academic robotics focus lacks broad practitioner resonance, so it stays in all.
editor take
Don't Fool Me Twice validates 4 hypotheses in sim and hardware; I buy online attribution, but baselines and failure rates are undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
How Can Embedding Models Bind Concepts?
The paper analyzes why CLIP fails at concept binding: scene embeddings decompose additively into object representations, but CLIP’s binding function remains high-complexity; controlled Transformers trained from scratch learn multiplicative interactions and generalize when data coverage is sufficient.
#Embedding#Multimodal#Vision#CLIP
why featured
HKR-H and HKR-K pass: the paper gives a concrete mechanism for CLIP binding failures. It remains research-heavy with limited product or competitive impact, so it fits the 60–71 band.
editor take
CLIP decomposes scene embeddings additively, yet binding stays high-complexity; I buy this diagnosis over another retrieval leaderboard.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Fixed-Point Masked Generative Modeling
CoFRe replaces part of the denoiser with a fixed-point solver. On OpenWebText it cuts parameters by 38.8%, training time by 11.5%, and VRAM by 16.9%, and improves generative perplexity from 830.8 to 101.8 under 96 transformer-block forward passes, all versus MDLM.
#Inference-opt#Multimodal#Fine-tuning#arXiv
why featured
HKR-H/K/R pass via a novel mechanism, concrete efficiency numbers, and cost resonance. Still, this is a specialist arXiv architecture paper with evidence limited to OpenWebText/CoFRe, so it stays in the 60–71 band.
editor take
CoFRe cuts parameters 38.8% on OpenWebText and hits 101.8 generative PPL; masked LMs finally get a credible compute story.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
On the “Induction Bias” in Sequence Models
The paper compares transformers and RNNs on state-tracking data efficiency, finding that transformers require training data that grows faster with state-space size and sequence length, while cross-length weight sharing is negligible or harmful even when train and test distributions match.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-H/K/R pass, but the post gives conclusions without experiment scale, datasets, or reproduction details. It is core ML research signal, fit for all but below the featured threshold.
editor take
Transformers need more state-tracking data than RNNs even in-distribution; no multiplier is disclosed, but cross-length weight sharing that is negligible or harmful cuts deep.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
The Information Geometry of Softmax: Probing and Steering
arXiv:2602.15293v2 introduces dual steering, a linear-probe method for steering representations toward a target concept, and proves it minimizes changes to off-target concepts while empirically improving controllability and stability.
#Interpretability#Alignment#Research release
why featured
HKR-K/R pass: dual steering is a testable steering mechanism tied to model control. HKR-H misses, and the single arXiv post gives no experiment scale, model list, or product path, so it stays in all.
editor take
arXiv:2602.15293v2 proves dual steering; I’d check replication before treating linear probes as control knobs again.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Quantifying the Uncertainty of Foundation Models with Singular Value Ensembles
The paper proposes Singular Value Ensemble, freezing singular vectors and training only per-member singular values, keeping the base model’s parameter increase below 1% while improving calibration on NLP and vision tasks without reducing predictive accuracy.
#Benchmarking#Vision#Research release
why featured
HKR-K and HKR-R pass: an under-1% parameter-overhead ensemble method is concrete and relevant to reliability. As a single arXiv paper with a technical title, it stays below featured.
editor take
SVE adds <1% parameters for calibration; I like the engineering, if singular vectors really hold as “knowledge directions.”
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
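The parameter arithmetic behind the Singular Value Ensemble entry above can be sketched as follows: with singular vectors frozen, each ensemble member only owns a vector of singular values, so the added trainable parameters scale with rank rather than with the full matrix. The helper names and the matrix sizes used below are hypothetical, and the paper's under-1% figure refers to the whole base model, not to one matrix.

```python
def compose(U, s, Vt):
    """Rebuild W = U diag(s) Vt from frozen factors and one member's singular values."""
    rank = len(s)
    return [
        [sum(U[i][k] * s[k] * Vt[k][j] for k in range(rank)) for j in range(len(Vt[0]))]
        for i in range(len(U))
    ]


def extra_param_fraction(rows, cols, rank, members):
    """Trainable singular values added by the ensemble, relative to one dense matrix."""
    return members * rank / (rows * cols)
```

For example, four members on a hypothetical 1024 x 1024 full-rank layer add 4 * 1024 singular values against 1,048,576 dense weights, well under one percent for that layer.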
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge
REAL uses a generalized policy gradient to optimize regression rewards for LLM-as-a-Judge, and on Qwen3-32B it improves over the SFT baseline by +8.40 Pearson and +7.20 Spearman.
#Reasoning#Fine-tuning#Benchmarking#Qwen
why featured
HKR-K and HKR-R pass: the post gives a training mechanism and Qwen3-32B metric gains, and it hits eval trust. It remains a narrow arXiv method paper without tooling or production proof, so it sits in 60–71.
editor take
REAL beats SFT on Qwen3-32B by +8.40 Pearson; binary RL rewards are a bad fit for 5-point judge scoring.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Covariance Structure and Coordinate Heterogeneity Govern Binary Quantization of Contrastive Embeddings
The paper analyzes binary quantization for contrastive embeddings with a Gaussian model. Experiments cover 18 datasets and 9 embedding families; off-diagonal covariance contributes 30–50% of the signal, while coordinate heterogeneity governs the value of extra bits and whether random rotation helps or hurts.
#Embedding#Inference-opt#Benchmarking#arXiv
why featured
HKR-K is solid with 18 datasets, 9 embedding families, and a 30–50% signal claim. HKR-R fits embedding cost/quality tradeoffs, but HKR-H fails and the mechanism is specialized, so it stays in all.
editor take
The paper tests 18 datasets and 9 embedding families; 30–50% of signal sits off-diagonal, so stop blindly rotating BQ embeddings.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
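As background for the binary-quantization entry above, this sketch shows the plain one-bit scheme the paper analyzes: keep only the sign of each coordinate and compare codes by bit agreement. The function names are illustrative, and the sketch deliberately omits the random rotation and extra-bit variants the paper studies.

```python
def binarize(vec):
    """One-bit quantization: 1 for positive coordinates, 0 otherwise."""
    return [1 if v > 0 else 0 for v in vec]


def hamming_similarity(code_a, code_b):
    """Fraction of positions where two binary codes agree."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    matches = sum(a == b for a, b in zip(code_a, code_b))
    return matches / len(code_a)
```

Because each coordinate contributes exactly one bit regardless of its variance, coordinates with very different scales are treated alike, which is where the coordinate-heterogeneity question in the summary comes from.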
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
How Does Bayesian Sampling Help Membership Inference Attacks?
The paper proposes Bayesian Membership Inference Attack, which uses Laplace approximation on a single reference model to estimate a posterior over parameters; experiments span image, text, and tabular datasets, and the authors report state-of-the-art effectiveness and efficiency.
#Safety#Benchmarking#Research release#Safety/alignment
why featured
HKR-K/R pass: the paper adds a single-reference-model MIA with Laplace posterior sampling across image, text, and tabular tests. HKR-H fails because the angle is a specialist methods paper, so it stays in 60–71.
editor take
BMIA uses one reference model plus Laplace posterior sampling; multi-reference MIA just got cheaper, so average privacy risk reports look weaker.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Re-examining Low-Rank Adaptation for Private LLM Fine-Tuning
The paper proposes restoring the fast singular-value decay of gradients during DP-SGD private fine-tuning, and evaluates it on GLUE, E2E, and DART with RoBERTa, Qwen, and Llama models up to 4B parameters while keeping the same privacy guarantees.
#Fine-tuning#Safety#Inference-opt#RoBERTa
why featured
HKR-K is clear via DP-SGD private tuning, singular-value decay, and tests up to 4B parameters. HKR-R comes from privacy and sample efficiency, but HKR-H is weak, so this stays in the 60–71 band.
editor take
The paper restores gradient singular-value decay in DP-SGD; I buy it, since DP-LoRA controls rank but ignores spectral damage from noise.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents
The paper introduces CoSee to audit read-write-verify loops in document VQA with 4B–8B weak learners, and finds that without explicit verification, shared workspaces can amplify hallucinations and make extra compute correlate negatively with accuracy.
#Agent#Vision#Benchmarking#CoSee
why featured
HKR-H/K/R pass, but this is a single arXiv paper with only setup and headline finding disclosed; benchmark size, effect numbers, and artifacts are missing, so it stays in the 60–71 band.
editor take
CoSee tests 4B–8B document VQA: without explicit verification, shared workspaces amplify hallucinations; small-agent teams shouldn’t add rounds first.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Detect in Any Scene: An Agentic Framework for Object Detection with Experience-Aware Reasoning
DetAS-X models object detection as a dynamic decision process, uses an MLLM to select restoration modules and specialized detectors, and reports a 28.36% average F1 gain across six benchmarks, with a 37.01% gain on DarkFace.
#Agent#Multimodal#Vision#Research release
why featured
HKR-H and HKR-K pass: DetAS-X has a clear agentic routing mechanism and six-benchmark gains. It remains a single arXiv vision paper with limited industry spread or HKR-R resonance, so it stays in all.
editor take
DetAS-X lifts F1 by 28.36% across six benchmarks; I’d scrutinize toolbox cost, since inference latency is undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Auto-Discovery-Bench: Diagnosing Structured State Tracking in Oracle-Guided Discovery
Auto-Discovery-Bench tests agents on three controlled discovery abstractions: directed graph discovery, undirected relational discovery, and symbolic equation discovery; across models, performance declines as variable count, trajectory length, and distractors increase.
#Agent#Reasoning#Benchmarking#Auto-Discovery-Bench
why featured
HKR-K/R pass: it gives reproducible stress factors for agent state tracking. Single arXiv paper, with no model list, scores, or code disclosed in the summary, so it stays in the 60–71 band.
editor take
Auto-Discovery-Bench tests 3 discovery tasks; I buy the split: skip science-agent hype until long-range state tracking holds.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Is the Last Layer Sufficient for Uncertainty Quantification?
The paper compares full-network and last-layer linearized GLMs for epistemic uncertainty quantification, using random matrix theory and large-scale empirical evaluation; it finds no meaningful UQ gain from full linearization, while the last-layer approximation delivers comparable performance with lower computational cost.
#Safety#Benchmarking#Research release
why featured
HKR-H/K/R pass: the paper claims last-layer linearized GLMs can approximate full-network UQ with lower compute. It remains a single arXiv research item with high theory overhead and no product or open-source artifact, so it stays in 60–71.
editor take
arXiv 2605.30741 finds no UQ gain from full linearization; last-layer GLMs deserve baseline status until tasks are disclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
The paper decomposes an LLM policy into internal layer and modular policies via the Transformer residual stream, reports progressive reasoning in Qwen versus abrupt convergence in Llama, and proposes BuPO to optimize internal layers during early RL stages on complex reasoning benchmarks.
#Reasoning#Fine-tuning#Interpretability#Qwen
why featured
HKR-H and HKR-K pass: the title has a counterintuitive hook, and the summary gives a residual-stream decomposition plus BuPO. No experiment numbers or code are disclosed, and HKR-R is weak, so this stays in the 60–71 research-signal band.
editor take
BuPO claims gains on complex reasoning, but scores aren’t disclosed; if layer-level RL holds, Qwen/Llama divergence is the sharp part.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video
RayDer uses one feed-forward transformer to combine camera estimation, scene reconstruction, and rendering for self-supervised novel view synthesis from real-world video; across multiple model sizes and orders of magnitude in data, the paper reports clean power-law scaling and zero-shot open-set results competitive with supervised methods.
#Vision#Multimodal#Benchmarking#RayDer
why featured
HKR-H/K pass: the mechanism is concrete and the scaling-law claim is testable. HKR-R is weak because RayDer is still a niche NVS research paper without product implications or major-lab pull, so it stays in 60–71.
editor take
RayDer folds 3 NVS modules into one transformer; if its power laws reproduce, video self-supervision gets a scalable shape.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Representation Collapse in Sequential Post-Training of Large Language Models
The paper defines a measurement suite for hidden states, logits, token trajectories, and LoRA updates. It analyzes five post-training settings: supervised fine-tuning, preference optimization, safety/refusal tuning, math and code specialization, and long chain-of-thought tuning under controlled stage orderings.
#Fine-tuning#Alignment#Interpretability#Research release
why featured
HKR-H and HKR-K pass: the collapse angle is clickable, and the post gives a concrete measurement suite across five post-training settings. It remains a single arXiv methods paper with no disclosed model list, experiment scale, or production impact.
editor take
The paper tests collapse across 5 post-training regimes; I buy the setup, but RSS omits models, scale, and effect sizes.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Diversity Matters: Revisiting Test-Time Compute in Vision-Language Models
The paper evaluates test-time compute across seven VLMs and six benchmarks, testing feature scoring and majority voting. It proposes ETTC, an entropy-based selector that beats majority voting and the best single model in ensembles.
#Vision#Reasoning#Benchmarking#Research release
why featured
HKR-H/K/R all pass, but this is a single arXiv benchmark paper with no disclosed code, source authority, or cross-source pickup. The 7-model/6-benchmark ETTC result is useful, not same-day featured.
editor take
Seven VLMs, six benchmarks: single-model voting barely helps; ETTC’s entropy selector beats brute-force sampling as the cleaner TTC bet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Self-Captioning Multimodal Interaction Tuning: Amplifying Exploitable Redundancies for Robust Vision Language Models
The paper proposes self-captioning multimodal interaction tuning, using a Multimodal Interaction Gate to convert unique interactions into redundant ones, reducing visually induced errors by 38.3% and improving consistency by 16.8% under ambiguous or corrupted modalities.
#Multimodal#Vision#Alignment#Research release
why featured
HKR-K and HKR-R pass: the paper offers a concrete mechanism plus 38.3%/16.8% results, tied to VLM robustness. Single arXiv paper, jargon-heavy title, and no disclosed artifact or deployment keep it in 60–71.
editor take
This paper reports 38.3% fewer visually induced errors via redundancy amplification. I buy the angle: robustness beats purity-of-grounding dogma.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Speculative Decoding Across Languages
The paper compares three draft-model strategies for speculative decoding across 11 languages. Task-specific distillation improves translation efficiency but generalizes poorly to story generation; n-gram draft models have lower acceptance rates yet deliver large speed-ups because draft generation is much faster.
#Inference-opt#Fine-tuning#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper offers concrete experimental axes and inference-speed findings. HKR-H is weak, and speculative decoding remains specialized, so it stays in the lower all band.
editor take
The paper tests spec decoding on 11 languages; I’d bet on n-grams here: lower acceptance, faster drafts, less fine-tune debt.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
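Why a low-acceptance n-gram drafter can still win on speed, as the speculative decoding entry above reports, follows from the standard speedup estimate for speculative decoding. A minimal Python sketch, assuming an i.i.d. per-token acceptance rate `alpha`, `gamma` drafted tokens per round, and a draft token costing `draft_cost` target forward passes. The function names and all numbers are illustrative assumptions, not values from the paper.

```python
def expected_tokens_per_round(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per verification round.

    Assumes each drafted token is accepted independently with
    probability alpha (a simplifying assumption).
    """
    if alpha == 1.0:
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def speedup(alpha: float, gamma: int, draft_cost: float) -> float:
    """Estimated speedup over plain autoregressive decoding.

    One round costs gamma draft steps plus one target forward pass,
    measured in units of a target forward pass.
    """
    round_cost = gamma * draft_cost + 1.0
    return expected_tokens_per_round(alpha, gamma) / round_cost


# Hypothetical drafters: a small neural model vs. an n-gram table.
neural = speedup(alpha=0.8, gamma=4, draft_cost=0.2)
ngram = speedup(alpha=0.5, gamma=4, draft_cost=0.001)
print(f"neural draft: {neural:.2f}x, n-gram draft: {ngram:.2f}x")
```

With these made-up settings the n-gram drafter edges out the neural one despite the lower acceptance rate, because its per-round cost is close to a single target pass.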
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging
The paper proposes OSRM to constrain LoRA subspaces before fine-tuning and evaluates model merging on 8 datasets, 3 widely used LMs, and 2 large LMs.
#Fine-tuning#OSRM#LoRA#Research release
why featured
HKR-K/R pass: OSRM gives a testable mechanism and concrete evaluation scope, tied to LoRA merge pain. Single arXiv paper and narrow title keep it below featured.
editor take
OSRM tests LoRA merging on 8 datasets and 5 LMs; pre-constraining subspaces is practical, but gains are undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents
The paper proposes a goal-directedness evaluation framework for LLM agents. In a 2D grid-world case study, it compares behavior with optimal policies across grid sizes, obstacle densities, and goal structures, then uses probes to decode coarse spatial maps and multi-step action plans from internal representations.
#Agent#Interpretability#Reasoning#Research release
why featured
HKR-H and HKR-K pass: testing whether agents are genuinely goal-directed is a clean hook, with a 2D gridworld and probe findings. No major lab, tool release, or production validation, so it stays below featured.
editor take
The evidence is a 2D grid world with probes decoding coarse maps and plans; don’t sell it as general agent-goal measurement.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks
MASPOB optimizes prompts for multi-agent systems using UCB bandits, GNN topology representations, and coordinate ascent. The paper says it reduces search complexity from exponential to linear and outperforms existing baselines across multiple benchmarks, but the RSS snippet does not disclose benchmark names or exact scores.
#Agent#Tools#Benchmarking#MASPOB
why featured
HKR-K and HKR-R pass: the mechanism and complexity claim are concrete, and agent prompt tuning is a real pain. Single arXiv paper with no code, named lab, or discussion keeps it in all.
editor take
MASPOB claims exponential-to-linear MAS prompt search, but names no benchmarks or scores; I’d file it as promising plumbing.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
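For readers unfamiliar with the bandit component named in the MASPOB entry above, here is a minimal sketch of textbook UCB1 arm selection in Python. It is not MASPOB's algorithm, which per the summary also uses GNN topology representations and coordinate ascent; `ucb1_select` and the toy counts are hypothetical.

```python
import math


def ucb1_select(counts, means, c=2.0):
    """Pick the arm with the highest UCB1 score.

    counts[i] is how often arm i was tried, means[i] its average reward.
    Untried arms are selected first so every count becomes positive.
    """
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [
        m + math.sqrt(c * math.log(total) / n)  # mean + exploration bonus
        for m, n in zip(means, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: each arm stands for one candidate prompt.
print(ucb1_select(counts=[10, 10], means=[0.2, 0.8]))   # equal bonus, higher mean wins
print(ucb1_select(counts=[100, 1], means=[0.6, 0.5]))   # rarely tried arm gets explored
```

The exploration bonus shrinks as an arm is tried more often, so the rule shifts from exploring under-sampled prompts to exploiting the best-scoring one.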
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Efficient Benchmarking Is Just Feature Selection and Multiple Regression
The arXiv paper reframes efficient LLM benchmarking as feature selection plus multiple regression, then uses kernel ridge regression for score prediction and mRMR for question subset selection; outside very data-poor settings, the method reports lower MAE and RMSE plus higher Spearman ρ and Kendall τ than existing efficient benchmarking approaches.
#Benchmarking#Research release#Benchmark
why featured
HKR-H and HKR-K pass: the title has a contrarian framing and the summary gives mRMR/KRR under a stated data condition. It is eval-method research with no concrete gains disclosed, so it stays in the 60–71 band.
editor take
KRR+mRMR beats prior efficient benchmarking methods; honestly, this reads like statistics catching up with LLM eval folklore.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Balanced LoRA: Removing Parameter Invariance to Accelerate Convergence
The paper introduces BaLoRA, which projects LoRA iterates onto a balanced manifold to preserve the adapted matrix and improve conditioning; the abstract says it converges faster than standard LoRA across fine-tuning tasks, but the snippet does not disclose exact speed gains.
#Fine-tuning#Research release
why featured
HKR-K passes on the balanced-manifold projection mechanism, and HKR-R passes for fine-tuning cost pressure. The post gives no concrete speedup numbers, so this stays in the 60–71 band.
editor take
BaLoRA projects LoRA onto a balanced manifold; no speed numbers disclosed, so I’d file it as a plug-in training trick.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
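The parameter invariance in the BaLoRA title above can be seen in the smallest case. A minimal Python sketch for a rank-1 adapter, assuming the update is the outer product of a column `b` and a row `a`: rescaling `b` by `s` and `a` by `1/s` leaves the product unchanged, and one particular `s` equalizes the two norms. This illustrates the invariance only; it is not the paper's balanced-manifold projection, and every name here is hypothetical.

```python
import math


def outer(b, a):
    """Rank-1 update matrix b a^T as a list of rows."""
    return [[bi * aj for aj in a] for bi in b]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def balance_rank1(b, a):
    """Rescale (b, a) -> (s*b, a/s) so both factors have equal norm.

    Requires both vectors to be nonzero. The product b a^T is preserved
    up to floating-point rounding.
    """
    s = math.sqrt(norm(a) / norm(b))
    return [s * x for x in b], [x / s for x in a]


b, a = [3.0, 4.0], [0.1, 0.2]          # badly scaled factors
b2, a2 = balance_rank1(b, a)
print(norm(b), norm(a))                # unequal norms before
print(norm(b2), norm(a2))              # equal norms after
print(outer(b, a), outer(b2, a2))      # same adapted matrix
```

For higher ranks the same freedom is a full invertible matrix rather than a scalar, which is where a projection step like the one the summary describes would come in.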
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Differentially Private Preference Data Synthesis for Large Language Model Alignment
The paper introduces DPPrefSyn, an algorithm that uses a Bradley-Terry preference model, public prompts, and DP-PCA to synthesize differentially private preference data for LLM alignment; the code is available on GitHub.
#Alignment#Safety#Fine-tuning#DPPrefSyn
why featured
HKR-H/K/R all pass because DPPrefSyn gives a concrete DP preference-synthesis mechanism and code. Single arXiv source, with no experiment numbers, data scale, or production replacement claim, keeps it in the all band.
editor take
DPPrefSyn uses BT modeling and DP-PCA for preference synthesis; ε, baselines, and model scale are absent, so “strong DP” is not deployment evidence.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R1
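The Bradley-Terry model named in the DPPrefSyn entry above maps two latent scores to a preference probability. A minimal Python sketch of that standard formula, assuming scalar scores per response; how DPPrefSyn obtains or privatizes the scores is not disclosed in the summary, and `bt_preference_prob` is a hypothetical name.

```python
import math


def bt_preference_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over B.

    Equals sigmoid(score_a - score_b), computed in a numerically
    stable way for large positive or negative differences.
    """
    d = score_a - score_b
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


print(bt_preference_prob(0.0, 0.0))   # equal scores give 0.5
print(bt_preference_prob(2.0, 0.0))   # higher score is preferred more often
```

Only the score difference matters, so adding a constant to every score leaves all preference probabilities unchanged.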
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
dgMARK: Decoding-Guided Watermarking for Diffusion Language Models
The paper proposes dgMARK, a decoding-guided watermarking method for discrete diffusion language models that steers unmasking order with a binary-hash parity constraint and uses sliding-window detection for insertion, deletion, substitution, and paraphrasing edits.
#Safety#Inference-opt#Research release
why featured
Single arXiv paper with a concrete dLLM watermarking mechanism, but no disclosed metrics, artifact, or deployment path; HKR-K/R pass, HKR-H is weak, so it stays all.
editor take
dgMARK watermarks dLLMs by steering unmasking order; I buy the channel, but false positives and attack cost are undisclosed.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Effective Reasoning Chains Reduce Intrinsic Dimensionality
Using Gemma-3 1B and 4B on GSM8K, the paper shows that effective CoT strategies reduce task intrinsic dimensionality, and that intrinsic dimensionality correlates strongly and inversely with both in-distribution and out-of-distribution generalization performance.
#Reasoning#Interpretability#Benchmarking#Gemma
why featured
HKR-H/K/R pass, but evidence is limited to GSM8K with Gemma-3 1B/4B and the intrinsic-dimensionality framing is research-heavy. Useful paper, not a same-day industry item.
editor take
Gemma-3 1B/4B on GSM8K shows CoT lowers intrinsic dimensionality; I buy the metric, not the scope.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
De-attribute to Forget for LLM Unlearning
The paper proposes DareU, an LLM unlearning framework that uses reinforcement learning to reduce attribution scores from generated responses to forget-data owners. Its evaluation uses an LLM classifier as an attribution proxy and reports a better balance between forget quality and model utility than baselines; the RSS snippet does not disclose dataset size.
#Alignment#Safety#Fine-tuning#Research release
why featured
HKR-H/K/R pass: the paper reframes unlearning via attribution and gives a concrete RL mechanism. Single arXiv release, no disclosed dataset scale or deployment result, so it stays in 60–71.
editor take
DareU lowers attribution via RL; dataset size is undisclosed, and I don’t buy an LLM classifier as the attribution proxy.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Latent Geometric Chords for Query-Efficient Decision-Based Adversarial Attacks
The paper proposes LGC and LGC-H for decision-based black-box adversarial attacks, using curvature-aware geometric search and a Residual-based Adversarial Generation mechanism to reach SSIM above 0.99 and LPIPS below 0.01 at 5,000 queries.
#Vision#Safety#Benchmarking#Research release
why featured
HKR-H/K/R pass via the 5,000-query imperceptible-attack claim, concrete metrics, and security relevance. It stays in 60–71 because this is a specialized adversarial-attack paper with no disclosed code or wider industry uptake.
editor take
LGC hits SSIM>0.99 and LPIPS<0.01 at 5,000 queries; I care most about reproducible robust-model breakage.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Learned Relay Representations for Forward-Thinking Discrete Diffusion Models
The paper proposes Relay, a differentiable per-token channel for MDMs trained with truncated BPTT, and scales it to Fast-dLLM v2, where coding-task experiments reduce inference latency by up to 32% versus the reported baselines.
#Inference-opt#Code#Fast-dLLM v2#Research release
why featured
HKR-K lands via Relay, truncated BPTT, and a 32% latency claim; HKR-R is cost-driven. The niche discrete-diffusion angle and jargon-heavy title keep it in the 60–71 band.
editor take
Relay cuts Fast-dLLM v2 coding latency by up to 32%; discrete diffusion needs memory before it can threaten autoregression.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Native Hierarchical and Compositional Representations with Subspace Embeddings
The paper proposes representing concepts as linear subspaces instead of vectors, trains them with differentiable soft projection matrices, and reports state-of-the-art results on hierarchical and natural language inference benchmarks while preserving compatibility with efficient Euclidean vector search.
#Embedding#Reasoning#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the representation mechanism is novel and benchmark-testable. But this is an arXiv representation-learning paper with no disclosed scores, dataset details, or product impact, so it stays in the lower band.
editor take
Subspace Embeddings learns concept dimensions via soft projections; SOTA tables aren’t disclosed, but the negation result is the sharper claim.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go
Go-UT-Bench provides 5,264 code and unit-test pairs from 10 permissively licensed Go repositories for fine-tuning LLMs on unit test generation; the fine-tuned models outperform their base versions on more than 75% of benchmark tasks.
#Code#Fine-tuning#Benchmarking#Go-UT-Bench
why featured
HKR-K/R pass via dataset size and fine-tuning results, and the topic matters to AI coding workflows. HKR-H is weak; this is a narrow Go unit-test benchmark, so it stays in the 60–71 band.
editor take
Go-UT-Bench has 5,264 Go test pairs; 10 repos is thin, so don't extrapolate 75% wins to real CI yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Destruction is a General Strategy to Learn Generation; Diffusion's Strength is to Take it Seriously; Exploration is the Future
arXiv:2605.30553v1 presents diffusion models through a destroy-then-generate view, classifying them as training methods that withhold input information and predict it, with discussion of data-scarce settings and conditions for porting reinforcement learning techniques into diffusion contexts.
#Reasoning#arXiv#Research release#Commentary
why featured
HKR-H and HKR-K pass: the title has a sharp thesis and the summary gives a destroy-then-generate mechanism. As a single arXiv perspective paper with no disclosed benchmark, experiment number, or production result, it stays in the mid-interest band.
editor take
The paper offers destroy-then-generate, with no empirical numbers; I don’t buy the exploration claim, but data-scarce training is testable.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Can Subgraph Explanations Be Weaponized to Steal Graph Neural Networks?
The paper presents the first strict black-box model extraction attack for graph classification, where the attacker observes only discrete class labels and binary explanation masks, then uses Monte Carlo edge-sensitivity estimation and explanation subgraphs to narrow the decision-boundary search space.
#Interpretability#Safety#Benchmarking#LabRAI
why featured
HKR-H/K/R pass: the weaponized-explanation angle is clickable, and the attack conditions are concrete. Kept in all because it is one arXiv paper, the snippet gives no success rates, datasets, or code details, and GNN graph classification is niche.
editor take
XSTEAL extracts GNNs using only class labels and binary explanation masks; don't ship explainability APIs blindly.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Scaling Multi-Agent Environment Co-Design with Diffusion Models
DiCoDe uses Projected Universal Guidance and critic distillation for multi-agent environment co-design; on the warehouse benchmark, it reports 39% higher rewards with 66% fewer simulation samples than the prior state of the art.
#Agent#Robotics#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper gives mechanisms and two concrete metrics, and it hits multi-agent training cost. Single arXiv paper with a narrow warehouse-simulation scope keeps it in all.
editor take
DiCoDe reports 39% higher warehouse reward with 66% fewer samples; I want the PUG constraints tested on real robots.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Rays as Pixels: Learning a Joint Distribution of Videos and Camera Trajectories
Rays as Pixels uses one Video Diffusion Model to learn a joint distribution over videos and camera trajectories, representing cameras as dense ray pixels in the same latent space as frames. The single trained model handles 3 tasks: pose prediction from video, trajectory-conditioned video generation from images, and joint synthesis of video and trajectory from images.
#Vision#Multimodal#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the paper has a concrete “rays as pixels” modeling hook and states one video diffusion model handles 3 trajectory/video tasks. No metrics, open-source artifact, or product path are disclosed, so it stays mid-band.
editor take
Rays as Pixels folds 3 camera-video tasks into one VDM; I buy raxels if closed-loop consistency beats pose-only score chasing.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
VeriGate: Verifier-Gated Step-Level Supervision for GRPO
VeriGate trains 1.5B and 7B Qwen2.5-Instruct models on MATH and improves average accuracy by about 20% and 12%, respectively, across six reasoning benchmarks, using verifier-gated step-level rewards only when GRPO verifier rewards are degenerate.
#Reasoning#Alignment#Benchmarking#Aakriti Agrawal
why featured
HKR-K/R pass: it has a concrete training mechanism and six-benchmark gain claims, with relevance to open reasoning post-training. HKR-H is weak, and this is a single arXiv paper without code or production evidence, so it stays in 60–71.
editor take
VeriGate lifts Qwen2.5-Instruct by 20%/12% across 6 reasoning benchmarks; GRPO’s zero-gradient failure gets a cleaner patch than blunt PRM reward hacking.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
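The "degenerate" case in the VeriGate entry above is easy to see from the group-relative advantage that GRPO-style training uses. A minimal Python sketch, assuming the common form (reward minus group mean, divided by group standard deviation plus a small epsilon); the use of population standard deviation and the `eps` value are assumptions, and implementations differ.

```python
from statistics import fmean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages for one prompt's sampled responses."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# All samples pass (or all fail) the verifier: every advantage is 0.0,
# so this group contributes no policy-gradient signal.
print(grpo_advantages([1.0, 1.0, 1.0, 1.0]))

# Mixed outcomes: the passing sample gets a positive advantage,
# the failing ones negative.
print(grpo_advantages([1.0, 0.0, 0.0, 0.0]))
```

This is the gap that, per the summary, VeriGate fills with step-level rewards only when the outcome rewards in a group carry no signal.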
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Quantifying Error Propagation and Model Collapse in Diffusion Models
The paper analyzes distribution drift in recursively trained score-based diffusion models, assuming each round mixes synthetic data with fresh target-distribution samples, and derives upper and lower bounds on accumulated divergence between generated and target distributions, with regimes determined by score estimation error and the fresh-data proportion.
#Fine-tuning#Benchmarking#arXiv#Research release
why featured
HKR-H/K/R all pass, but this is a single arXiv theory paper with bounds, not code, scale, or product impact. The technical-accessibility drag keeps it in all, below featured.
editor take
arXiv 2602.16601 bounds drift in recursive diffusion training; I buy the fresh-data-ratio knob more than another scary collapse plot.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Autoregressive Visual Generation Needs a Prologue
Prologue prepends learned tokens to autoregressive image sequences and trains them only with AR cross-entropy, while visual tokens keep reconstruction duties. On ImageNet 256×256, Prologue-Base reduces gFID from 21.01 to 10.75 without classifier-free guidance, and 16 prologue tokens reach 35.88% Top-1 in linear probing versus 23.71% for the first 16 standard tokenizer tokens.
#Vision#Benchmarking#ImageNet#Research release
why featured
HKR-K is strong and HKR-H has a clean hook, but HKR-R is narrow. Without a major lab, open-source artifact, or production-pipeline claim, this stays in the interesting research band.
editor take
Prologue-Base cuts ImageNet gFID to 10.75; I buy the split: stop forcing one token stream to serve reconstruction and generation.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
The Fundamental Limits of Fraud Detection in Card Payment Networks
The paper formalizes card authorization as a sequential decision problem and derives a minimax regret lower bound where delayed, censored, corrupted, and counterfactually missing feedback reduce the achievable learning rate through a multiplicative denominator.
#Reasoning#Benchmarking#Research release
why featured
HKR-K/R pass: the paper adds a concrete sequential-decision framing and multiplicative feedback-limit claim. Niche payments-risk scope and no product/model impact keep it in the 60–71 band.
editor take
The paper gives a minimax regret bound: delay, censoring, corruption, missing counterfactuals multiply the learning drag; bigger models won’t fix issuer feedback.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Survival Reinforcement Learning: Toward Scalable Self-Supervised RL
The paper introduces Survival Reinforcement Learning, an online classification alternative that maximizes an agent’s dwell time at target goals and outperforms CRL by 2x to 8x on stable long-horizon locomotion tasks.
#Agent#Robotics#Reasoning#Research release
why featured
HKR-H and HKR-K pass: the paper has a clear reframing and a 2–8x experimental claim. HKR-R is weak, and this is a niche arXiv RL methods paper, so it stays in the 60–71 band.
editor take
SRL beats CRL by 2–8x on long-horizon locomotion; I’m not buying it until benchmarks and code land.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Advancing Creative Physical Intelligence in Large Multimodal Models
The paper introduces MM-CreativityBench to test affordance-grounded creative tool use in LMMs, using scenario images plus candidate entity and part views, and reports that Direct Preference Optimization improves correct entity and part selection while reducing visual hallucination errors.
#Multimodal#Vision#Alignment#Research release
why featured
HKR-K and HKR-R pass: the paper offers a new benchmark, concrete eval mechanisms, and DPO for hallucination reduction. As a single arXiv research item without visible open-source uptake or cross-source traction, it sits in 60–71.
editor take
MM-CreativityBench tests creative tool use in LMMs; scale is undisclosed. DPO reduces hallucination, but benchmark gains aren't physical intelligence.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
PRISM: Preference-Aware Influence Function Based Data Selection for Fine-Tuning
PRISM weights target examples with model preferences and selects training samples by their influence on that preference-aware direction for efficient fine-tuning; the abstract says experiments cover diverse architectures and parameter scales, but the post does not disclose the specific models, datasets, metrics, or scores.
#Fine-tuning#Alignment#Safety#Research release
why featured
HKR-K and HKR-R pass: the mechanism is clear and the problem maps to fine-tuning cost. HKR-H fails, and the post lacks model names, datasets, and scores, so it stays in the lower research band.
editor take
PRISM uses preference-weighted influence functions for fine-tuning data selection; only the abstract is disclosed, with no models, datasets, or scores.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
idSCD: Identifying Training Datasets through Semantic Correlation Descriptors
idSCD uses semantic correlation descriptors for white-box dataset-level membership inference, comparing against RMIA, Attack-P, LiRA, and SIF across three task settings; the paper reports perfect separation in a controlled leave-one-dataset-out diagnostic and a largest relative ROC-AUC gain above 60% when dataset groups show distinct semantic particularities.
#Safety#Interpretability#Benchmarking#Andrada Gobeaja
why featured
HKR-K and HKR-R are clear, but this is a single arXiv paper without visible industry uptake. The method is niche, so it fits the 60–71 research-release band.
editor take
idSCD beats 4 baselines across 3 tasks; white-box membership inference gets sharper, but weak semantic separation limits the trick.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Bounded Behavioral Indistinguishability for Black-Box LLM Distillation
Munawar Hasan defines bounded behavioral indistinguishability as an (ε,q,t,A) condition over a prompt distribution, then tests Qwen and Llama teacher-student pairs on 5,000 behavioral probes; LoRA raises semantic similarity to 0.862 for Qwen and 0.874 for Llama, but learned discriminators still retain nonzero distinguishing advantage.
#Fine-tuning#Benchmarking#Alignment#Munawar Hasan
why featured
HKR-K/R pass: the paper offers a new metric, a 5,000-prompt test, and a LoRA finding tied to distillation mimicry. HKR-H is weak, and a single technical arXiv paper stays below featured.
editor take
LoRA lifts Llama similarity to 0.874, yet discriminators still separate it; semantic-score-only distillation eval is too lax.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Conformal Reliability: A New Evaluation Metric for Conditional Generation
The paper proposes reliability score, a conformal-prediction metric for conditional generation that measures worst-case performance within a prediction set at a preset confidence level. It also introduces CReL to construct covered prediction sets and optimize the score, with experiments on synthetic data, image-to-text, and text-to-image tasks.
#Benchmarking#Multimodal#arXiv#Research release
why featured
HKR-K and HKR-R pass: the paper offers a conformal-prediction reliability metric plus code for generation evaluation. HKR-H is weak, and the method-paper angle keeps it below featured.
editor take
CReL optimizes a worst-case generation score at a preset confidence level. I like the move: single-output metrics deserve pressure from risk-set audits.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
BOKBO (Best of K Bad Options): Calibrated Abstention for VLA Policies
BOKBO adds a conformal abstention layer to K-sample VLA inference and gives finite-sample distribution-free guarantees on executed-violation rate. On libero_object_temp_x0.1 with OpenVLA-OFT at ε=0.05, its learned violation predictor reaches 78% coverage and 70% net task success, while Mondrian-BOKBO raises the minimum per-task conditional hold fraction from 0.71 to 0.93.
#Robotics#Vision#Safety#BOKBO
why featured
HKR-H and HKR-K pass: the abstention-over-bad-options angle is fresh, and the paper gives ε=0.05 plus 0.71→0.93 retention. HKR-R is narrow because VLA robot safety is specialist-facing, so it stays below featured.
editor take
BOKBO lifts per-task hold fraction from 0.71 to 0.93 at ε=0.05; stop trusting internal confidence for VLA safety.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
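For readers new to the conformal piece of the BOKBO item, a minimal sketch of a generic split-conformal threshold follows. It is not BOKBO's procedure: the function names and the abstain-when-above rule are hypothetical, and the only guarantee illustrated is the standard one (a fresh exchangeable score exceeds the threshold with probability at most eps). Tying that to an executed-violation rate needs the paper's own construction.

```python
import math

def conformal_threshold(cal_scores, eps):
    """Return the ceil((n+1)(1-eps))-th smallest calibration score.

    Assumes calibration and test scores are exchangeable and that a
    higher score means a riskier action (both are assumptions here).
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - eps))
    if k > n:
        # Too few calibration points to certify this eps.
        return math.inf
    return sorted(cal_scores)[k - 1]

def should_abstain(score, threshold):
    """Hypothetical rule: abstain when the risk score exceeds the threshold."""
    return score > threshold
```

With 10 calibration scores and eps=0.2 the threshold is the 9th smallest score; with only 3 scores and eps=0.05 it is infinite, so the rule never abstains and certifies nothing useful.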
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
CSULoRA: Closest Safe Update Low-Rank Adaptation
CSULoRA estimates a safety-aligned subspace from weight displacement between aligned and base checkpoints. It corrects trained LoRA adapters with a closed-form penalized minimum-change update. Adversarial fine-tuning tests report lower attack success rate while preserving most LoRA utility gains, but the snippet does not disclose exact numbers.
#Fine-tuning#Alignment#Safety#CSULoRA
why featured
HKR-K/R pass: the mechanism is concrete and relevant to LoRA safety after fine-tuning. HKR-H is weak, and the post withholds attack-success-rate numbers, so this stays an interesting research release, not featured.
editor take
CSULoRA post-corrects trained LoRA via weight displacement, but ASR numbers are missing; neat closed-form fix, pending subspace validation.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Eigenvectors of Experts Are Training-Free Non-Collapsing Routers
The paper proposes SSMoE, a training-free routing framework that uses SVD-derived spectral features from expert weight matrices, and evaluates expert collapse across language tasks, vision tasks, clean data, and corrupted data; the abstract reports public code but does not disclose model names, dataset counts, or numeric gains.
#Inference-opt#Interpretability#SSMoE#Research release
why featured
HKR-H/K pass: SSMoE offers an SVD-based training-free router and collapse tests. This remains a technical arXiv paper; model names, scale numbers, and production impact are not disclosed, so it stays in 60–71.
editor take
SSMoE routes via expert-weight SVD with zero training; the abstract omits models and gains, so treat “non-collapsing” as unverified.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
MAAT: Multi-phase Adapter-Aware Targeted Unlearning
The paper introduces 5WBENCH and MAAT. 5WBENCH has 5,000 samples, with 1,000 per 5W category, while MAAT applies a three-phase LoRA-adapter procedure to target Why-type causal unlearning failures.
#Reasoning#Fine-tuning#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the item gives 5WBENCH size and a 3-phase LoRA mechanism, tied to unlearning/compliance. As an arXiv method paper without disclosed metrics or strong source authority, it stays in the normal research-signal band.
editor take
5WBENCH gives Why 1,000 cases; I buy the angle—0.06% causal coverage let unlearning scores hide failures.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Unlearning's Blind Spots: Over-Unlearning and Prototypical Relearning Attack
The paper introduces OU@epsilon and the Prototypical Relearning Attack for class-level machine unlearning, then proposes Spotter, a plug-and-play objective tested on CIFAR, TinyImageNet, and CASIA-WebFace to reduce over-unlearning and block prototype-based relearning.
#Safety#Alignment#Benchmarking#arXiv
why featured
HKR-H and HKR-K pass: the paper introduces a metric, an attack, and a mitigation tested on CIFAR, TinyImageNet, and CASIA-WebFace. No major lab or product impact is disclosed, so it stays in the 60–71 research-signal band.
editor take
Spotter reports 3 datasets; if few samples restore a forgotten class, forget accuracy is a weak deletion receipt.
HKR breakdown
hook knowledge resonance
open source
65
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Rationalize: Shared Semantic Reasoning for Human-AI Alignment
Rationalize proposes four human-AI role pairs for data-driven sensemaking, making purposes, questions, assumptions, evidence, inferences, and implications explicit to support bidirectional alignment between humans and AI systems.
#Reasoning#Alignment#Rationalize#Research release
why featured
HKR-K comes from the 4 role-pair mechanism; HKR-R comes from the safety boundary in human-AI collaboration. HKR-H is weak, and no results, artifact, or production claim are disclosed.
editor take
Rationalize defines 4 human-AI role pairs; no experiments disclosed, so read it as interaction design, not model progress.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
LLMs Without Deep Neural Networks: New Architecture, Benefits and Case Study
Vincent Granville proposes an RBF-style alternative architecture for LLMs in a 9-page arXiv paper, claiming it finds the global optimum of the loss function in closed form in one iteration; the post does not disclose reproducible experimental details beyond a high-level case study and comparison.
#Reasoning#Interpretability#Vincent Granville#arXiv
why featured
HKR-H and HKR-K pass: the title attacks the DNN premise and offers an RBF/closed-form claim. As a lone arXiv paper with no disclosed benchmarks or replication details, it stays in the 60–71 band.
editor take
Vincent Granville claims closed-form one-pass LLM training in 9 pages; no code or benchmarks, so I’m filing this as RBF repackaging.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
HetCCL: Enabling Collective Communication for Mixed-Vendor Heterogeneous Clusters
HetCCL uses heterogeneous P2P transport and a border-communicator mechanism for collective communication in mixed-vendor clusters; across 4 heterogeneous settings, it delivers 17-19x higher bandwidth than Gloo and reduces end-to-end LLM training per-step time by up to 16.9%.
#Inference-opt#HetCCL#Gloo#OpenMPI
why featured
HKR-K and HKR-R pass: the paper has concrete mechanisms and numbers, and mixed-vendor training clusters touch cost. The low-level collective-communication focus keeps it below featured.
editor take
HetCCL shows 17-19x Gloo bandwidth across 4 mixed-vendor setups; 16.9% step-time gain is modest, but the baseline matters.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
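To put the HetCCL step-time figure on the same footing as its bandwidth multiple, the reduction can be converted into a throughput speedup. A small sketch, where the helper name is hypothetical and the only input taken from the item is the 16.9% figure:

```python
def speedup_from_time_reduction(reduction):
    """Convert a fractional per-step time reduction into a throughput speedup."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    # New step time is (1 - reduction) of the old one, so steps per second
    # scale by the reciprocal.
    return 1.0 / (1.0 - reduction)

hetccl_speedup = speedup_from_time_reduction(0.169)  # roughly 1.2x
```

So "up to 16.9%" is about a 1.2x end-to-end speedup at best, far from the 17-19x transport-level number.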
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
TRINE: A Token-Aware, Runtime-Adaptive FPGA Inference Engine for Multimodal AI
TRINE runs single-bitstream multimodal inference on Alveo U50 and ZCU104, reducing latency by up to 22.57x versus RTX 4090 at 20–21 W, while int8 quantization keeps accuracy drops below 2.5% across representative tasks.
#Multimodal#Inference-opt#Vision#TRINE
why featured
HKR-H and HKR-K pass via the 22.57x latency and <2.5% accuracy-loss claims. FPGA inference hardware is narrow for this audience, so it stays in the lower interesting band.
editor take
TRINE claims 22.57x lower latency than RTX 4090 at 20–21 W; I want batch sizes, because FPGA papers love dunking on underfed GPUs.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
DisasterLex: An Expert Concept-to-Schema Knowledge Graph for Geospatial Reasoning in Disaster Analytics
DisasterLex links user queries to disaster databases through an expert knowledge graph with 107 concepts, 117 causal edges, and 52 concept-to-schema links, and on a 75-query test set over 36 geospatial tables it outperforms four baselines by 1.4x to 2.75x across seven base models.
#RAG#Reasoning#Tools#DisasterLex
why featured
HKR-K passes because the paper gives concrete dataset, graph, and baseline-gain numbers. HKR-H/R are weak: the domain is vertical disaster geospatial analytics, so this belongs in all, not featured.
editor take
DisasterLex wins 1.4–2.75x with 107 concepts and 117 causal edges; 3.56/5 says expert graphs remain a hard patch for geo-SQL.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
A Kinetic Energy Perspective of Flow Matching
The paper introduces Kinetic Path Energy, a per-sample diagnostic that accumulates kinetic effort along an ODE trajectory; experiments report two correspondences with semantic fidelity and sparse representation regions, and Kinetic Trajectory Shaping uses a two-phase training-free inference strategy to reduce memorization.
#Inference-opt#Benchmarking#Research release
why featured
HKR-K passes with KPE, KTS, and a testable training-free memorization claim. HKR-H/R are weak, and the flow-matching energy framing is specialist, so this stays in all.
editor take
KPE scores per-sample ODE trajectory energy; I buy the diagnostic, but KTS needs disclosed benchmark numbers.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
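One way to read "accumulates kinetic effort along an ODE trajectory" is as a time integral of squared speed. The sketch below assumes that reading: the integrand (one half of the squared velocity norm), the Euler integrator, and the toy velocity field are all assumptions, since the feed item does not give the paper's exact definition.

```python
def kinetic_path_energy(velocity, x0, n_steps=1000, t0=0.0, t1=1.0):
    """Accumulate 0.5 * ||v(x, t)||^2 * dt along an Euler-integrated path.

    velocity(x, t) returns a list the same length as x. The integrand is
    an assumed definition, not necessarily the paper's.
    """
    dt = (t1 - t0) / n_steps
    x = list(x0)
    t = t0
    energy = 0.0
    for _ in range(n_steps):
        v = velocity(x, t)
        energy += 0.5 * sum(vi * vi for vi in v) * dt  # left-endpoint rule
        x = [xi + dt * vi for xi, vi in zip(x, v)]     # Euler step
        t += dt
    return energy

# Toy field v(x, t) = -x: samples decay toward the origin.
toy_energy = kinetic_path_energy(lambda x, t: [-xi for xi in x], [1.0])
```

For this toy field the continuous-time value is (1 - e^-2) / 4, so the Euler estimate should land close to 0.216.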
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
dashi: A Python Library for Dataset Shift Characterization to Support Trustworthy AI Development and Deployment
dashi provides an open-source Python library for dataset shift analysis, using unsupervised information-geometry metrics and supervised performance-degradation checks across user-defined temporal or source batches, with demonstrations on 3 health AI case studies: gestational diabetes, COVID-19, and emergency medical dispatch.
#Tools#Safety#Benchmarking#dashi
why featured
HKR-K/R pass: dashi has a concrete tool shape and 3 health-AI examples for trustworthy deployment. HKR-H is weak, and this is not a model or major platform update, so it sits in 60–71.
editor take
dashi packages dataset-shift checks into Python and shows 3 health cases; I buy the tooling, not the “trustworthy AI” wrapper.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments
The paper introduces Flow Equivariant World Modeling, which makes latent memory transform equivariantly with self-motion and inferred object motion. It evaluates the method on 2D and 3D partially observed video world-modeling benchmarks against diffusion, memory-augmented, and recurrent architectures, but the snippet does not disclose exact metric values.
#Memory#Vision#Benchmarking#Research release
why featured
HKR-K passes for a concrete memory mechanism and 2D/3D partially observed video benchmarks, but metrics are not disclosed. HKR-H/R are weak, so this stays in the normal research-release band at 64.
editor take
Flow Equivariant World Modeling compares 3 architecture classes, with no metrics disclosed; I buy the bet—memory must move with motion.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Polaris: Coupled Orbital Polar Embeddings for Hierarchical Concept Learning
Polaris separates semantics and hierarchy with angular geometry and radius, then evaluates taxonomy expansion across trees, multi-parent DAGs, and multimodal hierarchies; against fourteen baselines, it improves top-K retrieval by up to about 19 points and reduces mean rank by up to about 60%.
#Embedding#Multimodal#RAG#Polaris
why featured
HKR-K passes because the paper gives a testable embedding mechanism and benchmark gains. HKR-H/R are weak: the hook is a niche method name, and the practical nerve is limited to retrieval, taxonomy, and multimodal hierarchy work.
editor take
Polaris gains up to 19 top-K points over 14 baselines; angle/radius separation looks worth testing for RAG taxonomies.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
ProofWala: A Framework for Multilingual Proof Data Synthesis and Theorem-Proving
ProofWala provides a unified ITP interface for Lean 4 and Rocq, open-sources two repositories, and supports repository-scale extraction, parallel proof search, and multilingual training across theorem-proving datasets.
#Reasoning#Code#Tools#ProofWala
why featured
HKR-K passes: the post gives a unified ITP interface, 2 open-source repos, and parallel proof search. The theorem-proving toolchain is niche and technical, but not a hard-exclusion case, so it stays in the 60–71 band.
editor take
ProofWala bridges Lean 4 and Rocq; no lift numbers are disclosed, so treat it as proof-data plumbing, not reasoning progress.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Giving Sensors a Voice: Multimodal JEPA for Semantic Time-Series Embeddings
CHARM adds channel-level text descriptions to a channel-order-equivariant Transformer and trains semantic time-series embeddings with JEPA, evaluating the learned representations with only a linear probe across anomaly detection, classification, and short- and long-term forecasting.
#Multimodal#Embedding#Interpretability#CHARM
why featured
HKR-K passes because CHARM has a concrete mechanism and evaluation setup. HKR-H/R are weak: this is niche time-series representation learning, useful signal but below the featured bar.
editor take
CHARM trains JEPA time-series embeddings and tests four tasks with linear probes; I buy text as channel IDs, not sensor semantics.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs
ZO-Finetuner learns per-LLM perturbation strategies for zeroth-order fine-tuning, and experiments on 4 LLMs and 7 datasets show it beats prior zeroth-order baselines in 82.1% of task-model combinations.
#Fine-tuning#Inference-opt#ASTRAL-Group#Research release
why featured
HKR-K passes with concrete scale and an 82.1% win rate. HKR-H is weak and HKR-R is narrow; zeroth-order optimization remains specialist, so this stays mid-band all.
editor take
ZO-Finetuner wins 82.1% across 4 LLMs and 7 datasets; model-version drift is the obvious tax on its train-once story.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
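For context on the ZO-Finetuner item, this is a sketch of the standard two-point zeroth-order gradient estimate with a fixed Gaussian perturbation, the kind of baseline a learned perturbation strategy would presumably replace. The function name, `mu`, and the toy objective are illustrative assumptions; nothing here is ZO-Finetuner's own method.

```python
import random

def zo_gradient(f, theta, mu=1e-3, rng=random):
    """Two-point estimate: ((f(theta + mu*z) - f(theta - mu*z)) / (2*mu)) * z.

    Only function evaluations are used, no backpropagation.
    """
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = [t + mu * zi for t, zi in zip(theta, z)]
    minus = [t - mu * zi for t, zi in zip(theta, z)]
    scale = (f(plus) - f(minus)) / (2 * mu)
    return [scale * zi for zi in z]

def mean_zo_gradient(f, theta, n_samples, rng):
    """Average many single-direction estimates to reduce variance."""
    total = [0.0] * len(theta)
    for _ in range(n_samples):
        g = zo_gradient(f, theta, rng=rng)
        total = [a + b for a, b in zip(total, g)]
    return [a / n_samples for a in total]
```

Each estimate is noisy because it probes a single random direction; averaging over many directions recovers the true gradient in expectation, which is why the choice of perturbation distribution matters.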
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
DISCO: Mitigating Bias in Deep Learning with Conditional Distance Correlation
DISCO introduces the SAM causal framework plus DISCO_m and sDISCO estimators, evaluates them against observed bias mitigation methods on six datasets, and releases source code on GitHub.
#Alignment#Benchmarking#DISCO#Research release
why featured
HKR-K/R pass: the paper offers a concrete mechanism, 6-dataset evaluation, and code, with fairness relevance. HKR-H is weak; this is an academic methods paper without a product or industry-event hook.
editor take
DISCO matches or beats bias baselines on 6 datasets; I want repo-level reproduction and multi-bias compute cost first.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Softsign: Smooth Sign in Your Optimizer for Better Parameter Heterogeneity Handling
The paper proposes SoftSignum and SoftMuon, replacing hard sign updates with a temperature-controlled soft-sign transform and an adaptive quantile temperature schedule. Experiments across deep learning tasks, including LLM pretraining, report consistent gains over hard sign-based optimizers and AdamW, while the paper proves stochastic non-convex convergence through a geometry-relaxation framework.
#Inference-opt#Benchmarking#Research release#Benchmark
why featured
HKR-K has concrete mechanisms and HKR-R matters to LLM pretraining practitioners. The post does not disclose gains, scale, or reproducibility details, and the optimizer-paper angle stays niche, so it lands in all.
editor take
SoftSignum swaps hard sign for temperature soft-sign; LLM scale is undisclosed, so don’t bury AdamW yet.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
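The hard-sign versus soft-sign contrast in the Softsign item can be sketched with one common smooth surrogate, tanh(g / tau). That choice is an assumption: the feed does not give the paper's exact transform or its quantile temperature schedule, and the names below are hypothetical.

```python
import math

def soft_sign(g, tau):
    """Smooth surrogate for sign(g); approaches the hard sign as tau -> 0."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    return math.tanh(g / tau)

def soft_sign_step(params, grads, lr, tau):
    """One signSGD-style update using the soft sign instead of the hard sign."""
    return [p - lr * soft_sign(g, tau) for p, g in zip(params, grads)]
```

With a small tau every coordinate moves by roughly `lr`, as in hard-sign methods; with a larger tau, coordinates whose gradients are small relative to tau move proportionally less, which is the heterogeneity-handling intuition.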
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Plain Transformers are Surprisingly Powerful Link Predictors
PENCIL uses an encoder-only plain Transformer with attention over sampled local subgraphs, and the paper reports stronger results than heuristic-informed GNNs across multiple benchmarks while releasing code publicly.
#Reasoning#Benchmarking#PENCIL#arXiv
why featured
HKR-H/K pass: the angle is a plain Transformer challenging GNN link predictors, and the post gives PENCIL’s mechanism plus code. No major lab or product impact; graph link prediction is niche, so this stays in all.
editor take
PENCIL uses a plain encoder Transformer on local subgraphs, but no scores are disclosed here; I’d reproduce before buying the GNN-beating claim.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
GEM: Geometric Entropy Mixing for Optimal LLM Data Curation
GEM reformulates LLM pre-training data curation as a variational problem on the hypersphere with a mixing-balance regularizer, and experiments on 1.1B-parameter models show up to 1.2% higher average downstream accuracy when integrated with DoReMi and RegMix.
#Fine-tuning#Benchmarking#GEM#DoReMi
why featured
HKR-K passes with a testable mechanism and +1.2% result; HKR-R passes on pretraining cost. HKR-H is weak, and the gain is small and specialized, so this stays at 63.
editor take
GEM reports up to +1.2% on 1.1B models; I’d want replication before buying geometry as the cure for data-mix noise.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Discovering a Zeta Map Algorithm on Dyck Paths via Mechanistic Interpretability
Researchers trained a one-layer, one-head encoder-decoder Transformer on the zeta map for Dyck paths and analyzed it with decoder cross-attention, linear probing, and causal intervention. The study extracts a level-based mechanism and converts it into a peak-centered scaffolding algorithm, then proves agreement with the zeta map up to a labeling reversal convention.
#Interpretability#Reasoning#Research release
why featured
HKR-H and HKR-K pass: the paper turns a tiny Transformer’s internals into a provable algorithm. The Dyck-path/zeta-map setup is niche and has no direct product or safety impact, so it stays in all.
editor take
A 1-layer 1-head Transformer learns Dyck zeta maps; I buy this—interpretability produced a provable algorithm, not vibes.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Welfare, Improvability, and Variance: A Principal-Agent Approach to Optimal Benchmark Item Aggregation
The paper models benchmark aggregation as a multitask principal-agent game and audits OLMES items across 3 item-level primitives: welfare alignment, marginal improvability, and performance variance. It uses WORKBank, EvoLM 4B, and PolyPythias 410M, identifies Pareto-inferior OLMES items under a pro-worker welfare operationalization, and releases code on GitHub.
#Benchmarking#Alignment#OLMES#WORKBank
why featured
HKR-K comes from a testable benchmark aggregation mechanism and code; HKR-R comes from evaluation trust. The academic framing and narrow impact keep it in the mid all band.
editor take
The paper audits OLMES with 3 item-level primitives; uniform averaging deserves scrutiny, but the pro-worker welfare choice carries the punchline.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Breaking Information Cocoons: A Hyperbolic Framework for Balancing Exploration and Exploitation in Recommender Systems
HERec aligns textual semantics with collaborative signals in hyperbolic space and optimizes Dasgupta's cost for automatic hierarchy clustering, reporting up to 5.49% utility improvement and 11.39% diversity increase over Euclidean and hyperbolic recommender baselines.
#Embedding#Benchmarking#HERec#Research release
why featured
HKR-H/K pass: the hook is hyperbolic geometry against information cocoons, and the post gives HERec plus +5.49% utility/+11.39% diversity. HKR-R fails because the impact is narrow recommender research, so it stays all.
editor take
HERec reports up to 5.49% utility and 11.39% diversity gains; honestly, deployment hinges on controllable exploration, not hyperbolic elegance.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
What Does Preference Learning Recover from Pairwise Comparison Data?
The paper formalizes CPRD from triplet comparison data, gives conditions under which the Bradley-Terry model fits the distribution, and identifies margin and connectivity as two factors controlling sample efficiency.
#Alignment#Fine-tuning#Benchmarking#Research release
why featured
HKR-K passes because the paper adds BT-model conditions and a sample-efficiency mechanism. HKR-H/R are weak: the title is academic, and the feed gives no experiments, numbers, or deployment stakes.
editor take
CPRD formalizes triplet preferences; when BT assumptions fail, your learned reward scores may lack stable meaning.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
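As a reminder of what the Bradley-Terry model in the item above asserts, here is the standard pairwise form: the probability that item i beats item j is the logistic function of the score difference. The helper names and the dictionary-of-scores layout are illustrative; the paper's CPRD construction is not reproduced.

```python
import math

def bt_prob(s_i, s_j):
    """Bradley-Terry probability that the item with score s_i beats s_j."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))

def bt_nll(scores, comparisons):
    """Negative log-likelihood of (winner, loser) pairs under BT scores."""
    return -sum(math.log(bt_prob(scores[w], scores[l])) for w, l in comparisons)
```

Only score differences enter the model, so adding a constant to every score leaves all predictions unchanged. That is one reason raw reward values carry no absolute meaning even when the BT assumptions hold.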
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
ForecastCompass: Guiding Agentic Forecasting with Adaptive Factor Memory
ForecastCompass adds factor memory and reasoning memory for agentic forecasting, and experiments on Prophet Arena and FutureX with GPT-5-mini and Gemini-2.5-Flash report improved probabilistic accuracy and calibration.
#Agent#Memory#Reasoning#ForecastCompass
why featured
HKR-K passes: the paper offers a memory mechanism and two benchmark settings for agent forecasting. HKR-H and HKR-R are weak, and the post does not disclose gain size or artifacts, so it stays in the normal research band.
editor take
ForecastCompass reports gains on 2 benchmarks and 2 models, but no deltas; I’d scrutinize time leakage before buying it.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Diving into Kronecker Adapters: Component Design Matters
The paper proposes CDKA, which tunes the dimensions and number of Kronecker components and adds parameter-budget-aware configuration guidelines; the abstract says experiments cover multiple architectures and modalities, but the post does not disclose specific metrics.
#Fine-tuning#Multimodal#Research release#Open source
why featured
HKR-K passes because CDKA offers a concrete adapter-configuration mechanism and budget guide. HKR-H/R are weak, and no experiment metrics are disclosed, so this stays in all.
editor take
CDKA tunes Kronecker component dimensions and counts; no metrics disclosed, so I’d treat it as LoRA-family tuning work.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
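Why component dimensions matter for Kronecker adapters can be seen from a plain Kronecker product: two small factors expand into one large update matrix. The sketch below is generic linear algebra in pure Python; the helper names are hypothetical and CDKA's configuration guidelines are not reproduced.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]

def param_counts(a_shape, b_shape):
    """Return (factor parameters, entries in the full Kronecker update)."""
    (m, n), (p, q) = a_shape, b_shape
    return m * n + p * q, (m * p) * (n * q)

factor_params, dense_entries = param_counts((2, 2), (2, 3))  # 10 vs 24
```

For a fixed target shape there are many ways to split it into factor shapes, and each split gives a different parameter budget, so both the dimensions and the number of components are real design knobs.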
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Assessing Predictive Models for Fairness Based on Movement Patterns
The paper proposes assessing spatial fairness in predictive models using individual movement patterns, with multi-resolution spatial partitions and a spatial scan statistic, and evaluates the method on thousands of synthetic unfair datasets.
#Alignment#Benchmarking#Research release
why featured
HKR-K passes via a concrete spatial-fairness mechanism and synthetic test scale. HKR-H/R are weak because the title is dry and the movement-pattern setting is narrow; no hard-exclusion rule applies.
editor take
The paper tests movement-pattern fairness across thousands of synthetic datasets; without real mobility data, the claim stays methodological.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Learning to Reason with Insight for Informal Theorem Proving
The paper proposes DeepInsight, a three-part training framework for informal theorem proving that teaches LLMs to identify core proof techniques; the abstract says it outperforms baselines on mathematical benchmarks, but the post does not disclose exact scores.
#Reasoning#Fine-tuning#Benchmarking#DeepInsight
why featured
HKR-K passes because the article gives a three-part DeepInsight training mechanism. HKR-H and HKR-R are weak: no concrete benchmark numbers, product angle, safety issue, or competitive trigger.
editor take
DeepInsight trains proof-technique recognition with 3 components; scores are undisclosed, and “insight” needs reproducible rewards or it’s branding.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
LARK: Learnability-Grounded Trajectory Selection for Efficient Reasoning Distillation
The paper introduces LARK for reasoning distillation trajectory selection, using a learnability factor ρ to estimate the student model’s loss reduction rate and a χ²-regularized selection policy to balance learnability with distributional coverage.
#Reasoning#Fine-tuning#Tianrun Yu#Research release
why featured
HKR-K lands: LARK’s trajectory-selection mechanism is concrete. HKR-H is weak and HKR-R lacks benchmark gains or cost numbers, so this is useful research signal but not featured.
editor take
LARK scores trajectories by student loss-drop rate ρ; gains aren’t disclosed, so I buy the learnability angle pending replication.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection
AxonAD predicts future multi-head attention query vectors from past context and combines reconstruction error with query mismatch, improving ranking quality and temporal localization on TSB-AD’s 17 datasets and 180 series.
#Benchmarking#AxonAD#TSB-AD#Research release
why featured
HKR-K lands with a concrete AxonAD mechanism and TSB-AD coverage across 17 datasets and 180 series. HKR-H is only a research hook, and HKR-R misses broader practitioner nerves.
editor take
AxonAD improves ranking and localization on TSB-AD’s 17 datasets, 180 series; query drift is a cleaner anomaly signal than residuals.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Toward Identifiable Sparse Autoencoders
The paper introduces two iSAE variants for unstable TopK SAE training; the abstract reports lower reconstruction error and improved stability, but the RSS snippet does not disclose experiment scale or benchmark details.
#Interpretability#Research release
why featured
HKR-K passes: iSAE targets TopK SAE instability with a new mechanism and performance claim. HKR-H and HKR-R are weak, and experiment scale is not disclosed, so this stays as niche research signal.
editor take
iSAE claims lower TopK SAE error and stabler dictionaries; RSS gives no scale, so don’t equate identifiability with usability.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Assign and Add: A Mechanistic Study of Compositional Arithmetic
The paper trains small transformers on a controlled variable-assignment and modular-addition task, finds generalization to unseen variable-number combinations, and reports three learning phases: modular addition, variable-assignment structure, and refinement on hard unseen sequences.
#Reasoning#Interpretability#Research release
why featured
HKR-K and HKR-R pass: the paper gives a controlled setup, generalization condition, and 3 learning stages. It remains a narrow mechanistic-interpretability paper, with no production claim or frontier-model result, so it stays in 60–71.
editor take
Small transformers reuse one modular-addition MLP for direct and variable inputs; controlled tasks beat mystical LLM attribution here.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Research discovers randomized self-reductions to improve query efficiency
Bitween discovers randomized self-reductions for 64 of 80 functions on RSR-Bench, with Agentic Bitween using LLM agents to propose new query functions and raising the hit rate from the linear-regression backend’s 54% to 80%.
#Agent#Reasoning#Benchmarking#Bitween
why featured
HKR-K is solid with 80 functions, 64 findings, and a 54%→80% hit-rate gain; HKR-H and HKR-R stay weak because the paper is theory-heavy and narrow.
editor take
Agentic Bitween hits 64/80 functions; here the LLM is a search heuristic, not a proof machine.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
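A textbook example shows what a randomized self-reduction is: for a linear function modulo a prime, f(x) equals f(x + r) minus f(r) for any r, so f(x) can be recovered from two queries at random-looking points. This is a classic illustration, not an entry from RSR-Bench, and the constants and names below are hypothetical.

```python
import random

P = 101  # small prime modulus, chosen only for illustration
A = 7    # hidden slope of the toy linear function

def f(x):
    """Toy target function: linear map modulo P."""
    return (A * x) % P

def f_via_rsr(x, oracle, rng=random):
    """Recover f(x) from oracle queries at x + r and r for a random r."""
    r = rng.randrange(P)
    return (oracle((x + r) % P) - oracle(r)) % P
```

Because r is uniform, each query on its own is uniformly distributed and reveals nothing about x. That property is what makes such reductions useful for self-correction and checking.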
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
ReTabAD: A Benchmark for Restoring Semantic Context in Tabular Anomaly Detection
ReTabAD releases 20 tabular datasets with structured textual metadata, plus implementations of classical, deep learning, and LLM-based anomaly detection methods and a zero-shot LLM baseline that uses semantic context without task-specific training.
#Reasoning#Benchmarking#ReTabAD#arXiv
why featured
HKR-K passes: ReTabAD provides 20 datasets with structured text metadata and zero-shot LLM baselines. HKR-H/R are weak, so it sits in the 60–71 band as a niche benchmark resource.
editor take
ReTabAD ships 20 metadata-rich tabular sets; I buy the direction, but the abstract hides LLM baseline gains.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Remembering by Reconstructing: Domain Incremental Learning With Test-Time Training on Video Streams
The paper proposes online test-time training on a masked autoencoder head to select the domain LoRA matching the current video-stream input, and evaluates the method on domain-incremental action recognition and semantic segmentation tasks.
#Vision#Fine-tuning#Research release
why featured
HKR-K passes on a concrete mechanism: MAE-based test-time training selects a domain LoRA for video streams. HKR-H/R miss due to no result number, product path, or practitioner pain hook, so it stays in all.
editor take
The paper uses MAE test-time training to pick domain LoRAs; no gains disclosed, but treating forgetting as routing is neat.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Memory by Design: Probabilistic Sequence Layers
The paper introduces a design-model framework that writes memory through exact Bayesian filtering; its Bayesian Layer propagates both mean and covariance, and the authors show linear attention, GLA, and Mamba-2/SSD as exact filters under one design model.
#Memory#Reasoning#Benchmarking#arXiv
why featured
HKR-H/K pass: the Bayesian filtering view across Mamba-2/SSD, GLA, and linear attention is a concrete mechanism. The paper is theory-heavy and gives no experiment numbers or deployment condition here, so technical accessibility keeps it in all.
editor take
Bayesian Layer keeps covariance and distills into 340M Gated DeltaNet for RULER gains; I buy the frame, but scores are missing.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Research proposes TimeRCD foundation model for zero-shot time series anomaly detection
TimeRCD uses Relative Context Discrepancy pre-training to detect time-series anomalies by comparing a query pattern with its surrounding context, and the arXiv abstract says it outperforms existing general-purpose and anomaly-specific foundation models in most zero-shot TSAD benchmark settings while staying competitive with dataset-specific full-shot baselines.
#Reasoning#Benchmarking#TimeRCD#Research release
why featured
HKR-K passes: the paper gives TimeRCD, RCD pretraining, and claimed wins across zero-shot TSAD benchmarks. HKR-H and HKR-R are weak because this is a narrow research item with no product, safety, or major-lab hook.
editor take
TimeRCD uses RCD for zero-shot TSAD; benchmark counts are undisclosed, so discount the strong claim.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
SWIM: Single-Instance Whole-Body Imitation for Swimming
SWIM learns whole-body swimming control from a single swimming motion and generalizes to unseen environments, body conditions, and swimming styles; the abstract does not disclose dataset size, metric values, or code availability.
#Robotics#Agent#Benchmarking#Research release
why featured
HKR-H and HKR-K pass on the single-instance swimming-control claim, but HKR-R fails: no product tie, code, metrics, or mainstream model angle is disclosed.
editor take
SWIM trains on one swim motion; no metrics or code disclosed, so I don’t buy the style-generalization claim yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
NeUQI: Near-Optimal Uniform Quantization Parameter Initialization for Low-Bit LLMs
NeUQI reduces uniform quantization initialization from joint scale and zero-point optimization to scale-only optimization, then reports stronger results than existing low-bit uniform quantization methods across LLaMA and Qwen settings and tasks. The arXiv snippet does not disclose exact bit widths, datasets, latency numbers, or performance deltas.
#Inference-opt#LLaMA#Qwen#Research release
why featured
HKR-K/R pass because the paper offers a concrete quantization mechanism tied to inference cost. HKR-H fails, and the post lacks bit widths, datasets, and lift numbers, so it stays in the lower 60–71 band.
editor take
NeUQI collapses scale/zero-point init to scale-only; without bit widths or deltas, I’m not buying the PV-tuning win yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
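Editor sketch for the NeUQI entry above: the summary refers to the two parameters of uniform quantization, a scale and a zero-point. The block below shows only the conventional min-max initialization of those two parameters, as a reference point for what "joint scale and zero-point" means. It is not NeUQI's method, and the function names, toy weights, and 4-bit setting are illustrative assumptions (the feed discloses no bit widths).

```python
def minmax_params(weights, bits=4):
    """Conventional min-max init of scale and zero-point (baseline, not NeUQI)."""
    qmax = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(weights, scale, zero_point, bits=4):
    """Map floats to integer codes in [0, 2^bits - 1], clamping at the ends."""
    qmax = 2 ** bits - 1
    return [min(qmax, max(0, round(w / scale) + zero_point)) for w in weights]


def dequantize(codes, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]


# Toy weights, chosen only for illustration.
weights = [-0.6, -0.1, 0.0, 0.3, 0.9]
scale, zero_point = minmax_params(weights)
codes = quantize(weights, scale, zero_point)
restored = dequantize(codes, scale, zero_point)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_err)
```

A scale-only formulation, as the summary describes it, would search over the scale alone and derive the zero-point from it; how NeUQI does that is not stated in the snippet.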
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Cross-Layer Subspace Coupling for LLM Compression: A Unifying Framework and Its Empirical Limits
The paper unifies SVD-LLM and Basis Sharing under one optimization problem and reports up to 46% lower weight reconstruction error on Pythia models, but downstream perplexity and accuracy degrade versus standard per-layer SVD-LLM.
#Inference-opt#Pythia#Research release
why featured
HKR-K passes: the paper adds a unified optimization framing plus a 46% reconstruction-error result that fails on downstream metrics. HKR-H/R are weak; the framing is niche and no production impact is shown.
editor take
Cross-Layer Subspace Coupling cuts Pythia reconstruction error 46%; perplexity still loses to per-layer SVD, so weight-space compression fails again.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Improving Selective Classification with Pairwise Queries for Binary Classification
The paper proposes pairwise queries to the same model for detecting high-error samples in selective binary classification, and reports better accuracy-cost tradeoffs than raw confidence estimates such as LLM next-token logits on 1 synthetic and 4 real in-context learning datasets.
#Reasoning#Benchmarking#Research release
why featured
HKR-K passes: the paper offers a concrete pairwise-query mechanism and dataset scope. HKR-H and HKR-R are weak because the title is academic and the impact is narrow, so it fits the low-60s research band.
editor take
Pairwise queries beat raw logits on 5 binary datasets; when confidence is inconsistent, asking the same model twice saves expert budget.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Improving Relative Representations with Learned Anchors and Whitened Inner Products
The paper proposes learned semantic anchors and whitened inner products for Relative Representations, replacing random anchors and cosine similarity to improve cross-model communication on vision and language tasks, including stable zero-shot communication between heterogeneous small language models.
#Embedding#Multimodal#Research release
why featured
HKR-K passes: the paper names learned anchors and whitened inner products, with a zero-shot heterogeneous SLM communication claim. HKR-H/R are weak, and no numbers or deployment conditions are disclosed.
editor take
Learned anchors plus whitened inner products replace random anchors and cosine; “nearly lossless” has no numbers, so treat this as RR repair work.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Paper proposes Mixture of Concept Bottleneck Experts framework extending CBM
The paper proposes M-CBE, extending CBM task predictors from one preset expression to multiple expert expressions, and evaluates two instances: Linear M-CBE and Symbolic M-CBE.
#Interpretability#Research release
why featured
HKR-K passes: M-CBE extends CBM task predictors into multiple expert expressions, with Linear and Symbolic variants. No metrics, code, or production claim is disclosed, so it stays in the 60–71 band.
editor take
M-CBE turns CBM predictors into multiple expert expressions; no metrics disclosed, so this reads like interpretability tuning, not proof.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Generalizing Multi-Scale Time-Series Modeling with a Single Operator
SiGMA uses a learnable discrete Gaussian kernel for distance-aware scaling, ranks best in 13 of 16 long-term forecasting settings, and reports up to 5.3x faster training plus up to 3.8x lower memory use than the strongest competitors.
#Benchmarking#SiGMA#Research release#Open source
why featured
HKR-K is solid because the post gives a concrete mechanism and benchmark numbers. HKR-H and HKR-R are weak: this is a niche time-series modeling paper, not a broad model, agent, or product update.
editor take
SiGMA wins 13/16 long-horizon settings; I’d trust the 5.3x speedup only after reproducing their code.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
ScaleMAP: Preserving Local Density and Neighborhood Structure in Low-Dimensional Embeddings
ScaleMAP rescales pairwise embedding displacements by original-space local radii, preserving density without adding a competing penalty. It matches DensMAP on density preservation, maintains UMAP-level neighborhood preservation, recovers sparse transcriptomic bridges collapsed by UMAP, and represents flow-cytometry density across 17 orders of magnitude; the same mechanism also improves PaCMAP density preservation.
#Embedding#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: ScaleMAP has a concrete mechanism and a 17-order-magnitude evaluation claim. The topic remains algorithmic research with limited product or industry resonance, so it stays in all.
editor take
ScaleMAP rescales displacements by local radii and spans 17 density orders; I buy this as cleaner than bolting penalties onto UMAP.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification
The paper evaluates audio SSL probing on 13 datasets and 6 spectrogram-based encoders, introducing binarized prototypical probes that use class-wise prototypes to aggregate localized token information and outperform linear and attentive probing.
#Audio#Embedding#Benchmarking#arXiv
why featured
HKR-K passes with concrete test scope and a named probe mechanism. HKR-H/R are weak: the hook is niche, and the paper lacks product, cost, safety, or competitive impact, so it sits in the low-60 research band.
editor take
This tests 13 datasets and 6 spectrogram encoders; for audio SSL, CLS linear probes are a bad proxy.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
PINE: Pruning Boosted Tree Ensembles with Conformal In-Distribution Prediction Equivalence
PINE prunes boosted tree ensembles by using conformal calibration with a single alpha parameter to control an in-distribution region, and experiments on 12 public tabular datasets report up to a 30% higher compression ratio while preserving predictions at a level comparable to existing faithful pruning methods.
#Inference-opt#Benchmarking#PINE#arXiv
why featured
HKR-K passes with a concrete mechanism and 12-dataset result. HKR-H/R are weak: tabular tree-ensemble pruning is useful but narrow, so this stays as regular research signal.
editor take
PINE reports 30% more compression on 12 tabular sets; limiting equivalence to in-distribution regions is the pragmatic trade.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Generalistic or Specific Embeddings, Which Is Better? An Empirical Study on Clinical Coding Search in Non-English Languages
The study fine-tunes a Spanish biomedical two-stage retriever on about 19,500 Gemini-generated pairs, raising aggregate R@5 to 0.822 versus BioBERT-ST’s 0.790 while improving four of five evaluated languages.
#Embedding#RAG#Fine-tuning#Gemini
why featured
HKR-K has concrete metrics, and HKR-R touches domain-adaptation costs for multilingual medical RAG. The topic is academic and narrow, with no product, framework, or broad mechanism, so it stays in the 60–71 all band.
editor take
19.5k Gemini pairs push R@5 to 0.822; I trust this narrow clinical recipe more than generic embedding leaderboards.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R1
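Editor sketch for the clinical coding entry above: the headline number is R@5. The block below computes recall at k in its common form, the fraction of relevant items found in the top k results, averaged over queries. The feed does not say which exact variant the study uses (some papers report a hit rate instead), and the query data and code labels here are made up for illustration.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def mean_recall_at_k(queries, k=5):
    """Average recall@k over (ranked_ids, relevant_ids) pairs."""
    return sum(recall_at_k(ranked, rel, k) for ranked, rel in queries) / len(queries)


# Hypothetical retrieval results: (ranked candidate codes, gold codes).
queries = [
    (["A", "B", "C", "D", "E", "F"], ["C"]),  # gold code at rank 3: counted
    (["X", "Y", "Z", "W", "V", "U"], ["U"]),  # gold code at rank 6: missed
]
mean_r5 = mean_recall_at_k(queries, k=5)
print(mean_r5)
```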
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Smaller and Faster 3DGS via Post-Training Dictionary Learning
The paper introduces a post-training dictionary-learning compression pipeline for 3DGS and reports average compression ratios of 3.95x, 3.10x, and 4.55x on 3DGS, 3DGS-MCMC, and PixelGS across 13 benchmark scenes, with rendering speedups of 23.3%, 24.3%, and 25.3% while maintaining image quality.
#Vision#Inference-opt#Benchmarking#Research release
why featured
HKR-K passes with a concrete post-training compression method and 13-scene ratios. The 3DGS dictionary-learning angle is niche, so HKR-H/R are weak and it stays in the 60–71 band.
editor take
Post-training dictionary learning gives PixelGS 4.55x compression without retraining; I’d check PSNR off those 13 scenes first.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Pairwise Reference Alignment as a Model-Level Ordinal Observable
The paper defines pairwise reference alignment as the probability that a model score ranks y+ above y- under a reference pair distribution P_pair, then gives finite-sample estimators, concentration bounds, a margin extension, and an initial study on Qwen2.5 models and RewardBench.
#Alignment#Benchmarking#Qwen#RewardBench
why featured
HKR-K passes with a concrete alignment observable, estimator, bounds, and Qwen2.5/RewardBench tests. HKR-H/R are weak, so this is useful eval research but too narrow for featured.
editor take
The paper defines one preference-order probability; Qwen2.5 and RewardBench results lack scale, so this reads as metric hygiene.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
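Editor sketch for the pairwise reference alignment entry above: the observable is described as the probability that a model score ranks y+ above y- under a pair distribution. A natural finite-sample estimate is the fraction of sampled pairs ranked correctly, and a standard Hoeffding bound gives a confidence half-width for it. Both are assumptions about the simplest form; the paper's own estimators, tie handling, and bounds may differ. The length-based `score` is a stand-in, not a real reward model.

```python
import math


def pairwise_alignment(score, pairs):
    """Empirical fraction of pairs where score(y_plus) > score(y_minus)."""
    wins = sum(1 for y_plus, y_minus in pairs if score(y_plus) > score(y_minus))
    return wins / len(pairs)


def hoeffding_halfwidth(n, delta=0.05):
    """Two-sided Hoeffding half-width for a mean of n i.i.d. values in [0, 1]."""
    return math.sqrt(math.log(2 / delta) / (2 * n))


# Toy reference pairs (preferred, rejected); score is a placeholder.
pairs = [
    ("long answer", "short"),
    ("ok", "longer"),
    ("abcd", "abc"),
    ("aa", "a"),
]
score = len
estimate = pairwise_alignment(score, pairs)
halfwidth = hoeffding_halfwidth(len(pairs))
print(estimate, halfwidth)
```

With only four pairs the half-width is far wider than the estimate is useful, which is the practical reason the pair-sample size matters when reading such numbers.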
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
PROWL: Prioritized Regret-Driven Optimization for World Model Learning
PROWL trains a diffusion-based action-conditioned world model with a KL-constrained adversarial curriculum and evaluates it in MineRL. Its PAT buffer re-ranks trajectories by prediction error, action fidelity, and learning progress, while the abstract says robustness improves over passive-data training but does not disclose numeric gains.
#Agent#Vision#Fine-tuning#PROWL
why featured
Only HKR-K lands: the PAT buffer and KL-constrained curriculum are testable mechanisms, but MineRL metrics are not disclosed and the title is paper jargon. This fits all, below featured.
editor take
PROWL reports MineRL and the mechanism, not numeric gains; I don't buy broad generalization, but PAT targets the right world-model failure mode.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Reward Learning from Best-of-N Preference Data: Targets, Tradeoffs, and Design Principles
The paper analyzes Bradley–Terry reward learning from Best-of-N preference data, where N candidates are sampled and the best is paired with a rejected response. It derives closed-form targets for independent-reference variants, shows Best-vs-Random and Best-vs-Worst generally fail exact BT representability, and reports that larger N increases pairwise margins while reducing connectivity.
#Alignment#Benchmarking#Research release
why featured
HKR-K passes via a testable Best-of-N tradeoff between margin and connectivity. HKR-H/R are weak, and the reward-modeling scope is too niche for featured.
editor take
Best-of-N widens margins and hurts connectivity; crank N only when labels are costly, not when generation is the bottleneck.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
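Editor sketch for the Best-of-N entry above: the summary describes sampling N candidates and pairing the best with a rejected response, with Best-vs-Random and Best-vs-Worst as two variants. The block below builds such pairs from a list of candidates. The function name, the choice to draw the random reject from the non-best candidates, and the numeric toy rewards are assumptions; the paper's exact sampling scheme is not given in the snippet.

```python
import random


def best_of_n_pair(candidates, reward, mode="random", rng=random):
    """Pair the highest-reward candidate with a rejected one.

    mode="worst"  -> Best-vs-Worst: reject the lowest-reward candidate.
    mode="random" -> Best-vs-Random: reject a uniformly chosen non-best candidate.
    """
    ranked = sorted(candidates, key=reward, reverse=True)
    best, rest = ranked[0], ranked[1:]
    if not rest:
        raise ValueError("need at least two candidates")
    rejected = rest[-1] if mode == "worst" else rng.choice(rest)
    return best, rejected


# Toy setup: candidates are their own rewards.
candidates = [0.2, 0.9, 0.5, 0.1]
identity = lambda x: x
bw_best, bw_rejected = best_of_n_pair(candidates, identity, mode="worst")
br_best, br_rejected = best_of_n_pair(
    candidates, identity, mode="random", rng=random.Random(0)
)
print(bw_best - bw_rejected, br_best - br_rejected)
```

In this toy, Best-vs-Worst always yields the largest possible margin within the sample, which matches the summary's point that margins grow with N.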
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Trust-Region Behavior Blending for On-Policy Distillation
The paper proposes TRB, a warmup method that replaces early rollout policy within a student-centered KL trust region, keeps the reverse-KL OPD loss unchanged, and reports the strongest average performance across two math-reasoning distillation settings.
#Reasoning#Fine-tuning#Alignment#Research release
why featured
HKR-K passes with a concrete distillation mechanism and two math-reasoning settings. HKR-H/R are weak, and no code, model name, or major-lab source is disclosed, so this stays in the 60–71 research-signal band.
editor take
TRB only changes early rollouts and wins in 2 math distillation settings; I’d probe whether KL annealing erases the gain.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Federated Learning with Enhanced Privacy via Model Splitting and Random Client Participation
The paper proposes MS-PAFL, a federated learning framework that splits each client model into a local private submodel and an aggregated public submodel, injects calibrated Gaussian noise only into the public part, and analyzes single-round and total privacy loss under random client participation and local data subsampling.
#Fine-tuning#Alignment#Benchmarking#Research release
why featured
HKR-K passes on a concrete mechanism and privacy-loss analysis. HKR-H/R fail: this is a narrow arXiv federated-privacy paper with limited immediate industry pull.
editor take
MS-PAFL adds Gaussian noise only to the public submodel; no datasets, ε, or accuracy numbers in the snippet, so I don’t buy “significant.”
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
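Editor sketch for the MS-PAFL entry above: the described mechanism is a split into a private local submodel and a shared public submodel, with Gaussian noise added only to the public part before aggregation, under random client participation. The block below mimics that flow with scalar parameters. Everything concrete here is an assumption for illustration: the dictionary layout, the key names, plain averaging as the aggregator, and a fixed `sigma` (the paper calibrates the noise, and that calibration is not shown).

```python
import random


def noisy_public_update(client_model, public_keys, sigma, rng):
    """Return only the public parameters, each perturbed with N(0, sigma^2) noise."""
    return {k: client_model[k] + rng.gauss(0.0, sigma) for k in public_keys}


def aggregate(updates):
    """Average the noisy public parameters across participating clients."""
    keys = updates[0].keys()
    return {k: sum(u[k] for u in updates) / len(updates) for k in keys}


def one_round(clients, public_keys, sigma, participation_prob, rng):
    """One aggregation round; each client joins independently at random."""
    active = [c for c in clients if rng.random() < participation_prob]
    if not active:
        return None  # nobody participated this round
    return aggregate(
        [noisy_public_update(c, public_keys, sigma, rng) for c in active]
    )


# Hypothetical clients: "priv_w" never leaves the client, "pub_w" is shared.
clients = [
    {"pub_w": 1.0, "priv_w": 10.0},
    {"pub_w": 3.0, "priv_w": -10.0},
]
rng = random.Random(0)
noiseless = one_round(clients, ["pub_w"], sigma=0.0, participation_prob=1.0, rng=rng)
noisy = one_round(clients, ["pub_w"], sigma=0.5, participation_prob=1.0, rng=rng)
print(noiseless, noisy)
```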
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Conditional Attribution for Root Cause Analysis in Time-Series Anomaly Detection
The paper proposes a conditional attribution framework that retrieves contextually similar normal states via VAE latent spaces and UMAP embeddings, then evaluates root-cause identification, temporal localization, and robustness on the SWaT and MSDS benchmarks across multiple anomaly detection models.
#Interpretability#Benchmarking#Research release#Benchmark
why featured
HKR-K passes with a concrete attribution mechanism and SWaT/MSDS evaluation. HKR-H/R are weak, and this is a single arXiv method paper without production replacement or strong SOTA numbers, so it sits in 60–71.
editor take
The paper tests conditional attribution on SWaT and MSDS; gains aren’t disclosed, so don’t crown VAE+UMAP retrieval as RCA’s fix.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
BAT: Better Audio Transformer Guided by Convex Gated Probing
The paper introduces Convex Gated Probing and BAT for audio SSL evaluation, using gated access to frozen layers; the abstract claims new SOTA on audio benchmarks, but the post does not disclose benchmark scores.
#Audio#Benchmarking#Research release#Benchmark
why featured
HKR-K passes: the paper adds Convex Gated Probing and a frozen-layer gating mechanism, but the summary gives no scores or production impact. No hard exclusion; this fits a standard research-release score.
editor take
BAT claims SOTA via CGP, but scores are undisclosed; I’d treat this as a probing paper before buying the leaderboard claim.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
FlexRank: Nested Low-Rank Knowledge Decomposition for Adaptive Model Deployment
FlexRank uses low-rank weight decomposition and importance-ordered nested consolidation to extract submodels from pretrained LLMs and ViTs under different compute budgets; the arXiv abstract does not disclose benchmark scores, latency numbers, or implementation details.
#Inference-opt#Research release
why featured
HKR-K passes because the paper offers a testable adaptive-deployment mechanism. HKR-H/R are weak, and no performance numbers are disclosed, so it stays below the interesting band.
editor take
FlexRank extracts budgeted submodels, but reports no scores or latency; I don't buy “train once, deploy everywhere” yet.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
What Changes After Deployment? A Survey on On-device Learning in TinyML
The survey organizes about 70 TinyML on-device learning works by distribution-change regime, then analyzes how change types affect deployable applications, hardware choices, and solution structure.
#Fine-tuning#Benchmarking#Research release
why featured
HKR-K passes with about 70 surveyed works and a distribution-shift taxonomy. HKR-H and HKR-R are weak: the niche TinyML survey lacks a click hook and broad industry tension, so it stays in all.
editor take
This survey maps ~70 TinyML ODL papers; centering distribution shift beats another benchmark leaderboard for deployment reality.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Subspace-Decomposed JEPAs: Disentangling Progression and Content in Latent World Models
SD-JEPA splits JEPA latents into two orthogonal subspaces, including an 8-dimensional progression subspace. That subspace is 4.2% of the latent, explains 72–95% of task-progress variance across four environments, and improves semantic event localization on 40 held-out cube episodes by up to +0.18 pooled AUROC.
#Agent#Reasoning#Benchmarking#arXiv
why featured
HKR-K passes on a concrete mechanism and numbers. HKR-H/R miss: JEPA latent-space decomposition is narrow, with no product or open-source hook, so it sits in low all rather than featured.
editor take
SD-JEPA’s 8-D subspace explains 72–95% progress variance; I buy the split, but 40 cube episodes is thin.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Zero Collapse: A Failure Mode of Policy Gradient Methods in Discontinuous Reward Environments
The paper identifies “zero collapse” in policy-gradient RL for discontinuous reward environments, demonstrated across REINFORCE and actor-critic variants. In first-price auctions, flat zero-reward regions and sharp reward thresholds let stochastic exploration and gradient updates overshoot high-reward regions, after which missing gradient signals make recovery sample-inefficient.
#Reasoning#Benchmarking#Research release
why featured
HKR-H/K pass via a named failure mode and a concrete RL mechanism. HKR-R is weak; no product, open-source artifact, or major-lab move, so it stays in the low-value upper band.
editor take
Zero collapse hits REINFORCE and actor-critic; in auction RL, exploration tuning won’t save you when reward cliffs erase gradients.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Federated Variational Preference Alignment with Gumbel-Softmax Prior for Personalized User Preferences
The paper proposes FedVPA-GP, a federated variational preference alignment framework that uses a Federated Mixture Prior and Orthogonal Loss to separate user preferences, and evaluates it against monolithic reward-model baselines on the HH-RLHF dataset.
#Fine-tuning#Alignment#Research release#Safety/alignment
why featured
HKR-K passes via the FedVPA-GP mechanism and HH-RLHF evaluation. HKR-H/R are weak: the title is specialist-heavy, and the paper lacks a production-impact or safety-incident hook.
editor take
FedVPA-GP is tested only on HH-RLHF, with client count undisclosed; the idea is sane, but “significantly outperforms” needs runs.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Adaptive NAD: Online and Self-adaptive Unsupervised Network Anomaly Detector
Adaptive NAD evaluates unsupervised network anomaly detection on three security datasets (CIC-Darknet2020, NSL-KDD, and Edge-IIoTset), reporting false alarm rates of 1.33%, 0.71%, and 0.08%, plus online inference more than 3x faster than state-of-the-art baselines.
#Benchmarking#Adaptive NAD#Research release#Open source
why featured
HKR-K passes on concrete false-positive and latency numbers. HKR-H/R are weak, and network anomaly detection is specialized, so this stays in the lower research-news band without hard exclusion.
editor take
Adaptive NAD reports 0.08% false alarms on Edge-IIoTset; I care whether its online self-training survives poisoned traffic.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Bridging the Gap Between Natural Language and Market Dynamics via High-Dimensional Representation Learning
The paper replaces scalar sentiment scores with dense FinBERT embeddings in a Transformer forecasting architecture, benchmarking raw embeddings, attention-weighted aggregation, and Siamese-optimized embeddings on the FNSPID dataset; Siamese embeddings outperformed the scalar baseline and raw embeddings, while attention aggregation struggled under financial data’s low signal-to-noise condition.
#Embedding#Benchmarking#FinBERT#FNSPID
why featured
HKR-K passes via the FinBERT-embedding mechanism and three strategy comparisons. HKR-H/R fail, and no performance numbers are disclosed, so this stays a narrow research item in all.
editor take
Siamese FinBERT embeddings beat scalar sentiment baselines here; stop worshipping sentiment scores, though the snippet omits effect size.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
HUNT: High-Speed UAV Navigation and Tracking in Unstructured Environments via Instantaneous Relative Frames
HUNT unifies UAV traversal, target acquisition, and tracking in one relative formulation using onboard instantaneous observables such as attitude, altitude, and velocity; the abstract reports outdoor tests in forests, container compounds, and SAR scenes, but does not disclose speed, success rate, or quantitative baselines.
#Robotics#Research release
why featured
HKR-K passes: HUNT proposes one relative-frame mechanism using onboard instantaneous observations for three UAV tasks. No speed, success-rate, or baseline numbers are disclosed, so HKR-H/R stay weak.
editor take
HUNT unifies search and tracking via instantaneous relative frames; no speed or success rate disclosed, so I don’t buy “high-speed robust” yet.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Performance and Complexity Trade-off Optimization of Speech Models During Training
The paper proposes a feature-noise-injection reparameterization method that lets SGD jointly optimize speech-model task performance and computational complexity during training, instead of applying post hoc pruning or quantization; the authors evaluate it in 3 case studies, covering a synthetic setup, voice activity detection, and audio anti-spoofing, and state that the related code is public.
#Audio#Inference-opt#Research release#Open source
why featured
HKR-K and HKR-R pass via a concrete training mechanism and cost angle; HKR-H fails because this is a niche academic optimization paper with no disclosed savings number or product impact.
editor take
Feature-noise injection lets SGD optimize speech-model error and FLOPs in 3 cases; this smells useful, not another pruning wrapper.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
FlagGAM: Rule-Based Generalized Additive Modeling for Explainable Tabular Prediction
FlagGAM converts numerical and categorical variables into sparse human-readable rule bases, then uses a default additive head that stays close to EBM on tabular benchmarks and shows smaller AUROC degradation under missing and noisy perturbations.
#Interpretability#Benchmarking#FlagGAM#Research release
why featured
HKR-K passes via a concrete mechanism and robustness claim, but HKR-H/R are weak: this is a niche tabular interpretability paper with no product pull or broad practitioner debate. No hard exclusion applies.
editor take
FlagGAM keeps a sparse rule-basis matrix; the EBM-close claim lacks concrete benchmark numbers, so don’t crown it yet.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Supervised Training Rapidly Degrades Early Visual Cortex Alignment Across Biologically Plausible Learning Rules
The paper evaluates four learning rules using 720 THINGS images and fMRI data from three subjects across six visual ROIs. One training epoch reduces V1 alignment by 25–90%, with backpropagation showing the largest drop and predictive coding plus STDP preserving more alignment.
#Vision#Benchmarking#Alignment#arXiv
why featured
HKR-H/K pass: the counterintuitive drop has concrete setup and numbers. HKR-R is weak because this is niche neuro/vision representation research with limited product or practitioner impact.
editor take
One epoch drops V1 alignment 25–90%; stop using brain similarity as a halo for backprop. Even this 3-subject fMRI cut stings.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Graph Machine Learning in the Era of Large Language Models
arXiv:2404.14928v3 surveys two-way links between Graph ML and LLMs, covering graph feature enhancement, reduced labeled-data reliance, graph heterophily, OOD generalization, and graph-based improvements to LLM pre-training and inference.
#Reasoning#RAG#Research release
why featured
HKR-K passes because the survey gives a mechanism map for graph ML and LLM integration. HKR-H/R fail, and the post lacks a new model, benchmark number, or product impact, so it stays in all.
editor take
arXiv 2404.14928v3 is survey-only here, with no benchmarks disclosed; Graph-LLM work needs reproducible wins, not another taxonomy.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Forecasting with Hyper-Trees
The paper introduces Hyper-Trees, a gradient-boosted tree framework that learns parameters for target time-series models such as ARIMA or Exponential Smoothing, and uses a shallow network to reduce scaling limits when estimating high-dimensional parameter sets.
#Benchmarking#Research release
why featured
HKR-K passes on a concrete modeling mechanism, but no benchmark numbers, code, or production-replacement claim is disclosed. HKR-H and HKR-R are weak, so this stays in all.
editor take
Hyper-Trees uses GBDT to predict ARIMA/ES parameters; I buy the direction, but no benchmark numbers are disclosed.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization
The paper proposes H-EARS, which encodes known dominant energy terms into reward potentials with O(n) per-step computation, and reports gains in convergence speed, policy stability, and final performance across 4 continuous-control benchmarks and 4 baseline algorithms.
#Robotics#Reasoning#Benchmarking#Research release
why featured
HKR-K passes: H-EARS adds dominant energy terms to the reward potential, with O(n), 4 benchmarks, and 4 baselines. The RL-paper framing lacks HKR-H and HKR-R, so it stays in the 40–59 band.
editor take
H-EARS adds known energy terms to reward at O(n); 4 benchmarks are thin, so verify the extreme-road sim.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
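Editor sketch for the H-EARS entry above: the summary says known energy terms are encoded into reward potentials at O(n) per step. One standard way to use a potential in a reward is potential-based shaping, r' = r + gamma * phi(s') - phi(s). The block below applies that textbook form with phi(s) set to negative mechanical energy summed over n bodies. Treat this as an assumption about the general shape only; the paper's actual potential, energy terms, and weighting are not given in the snippet, and the state layout here is hypothetical.

```python
def energy(state, masses, g=9.81):
    """Kinetic plus gravitational energy over n bodies: one pass, O(n)."""
    return sum(0.5 * m * v * v + m * g * h for m, (h, v) in zip(masses, state))


def shaped_reward(r, state, next_state, masses, gamma=0.99, weight=1.0):
    """Potential-based shaping with potential phi(s) = -weight * energy(s)."""
    phi = -weight * energy(state, masses)
    phi_next = -weight * energy(next_state, masses)
    return r + gamma * phi_next - phi


# Toy transition: one 1 kg body drops from height 1 m to 0 m, at rest both times.
masses = [1.0]
state = [(1.0, 0.0)]       # (height, velocity) per body
next_state = [(0.0, 0.0)]
bonus = shaped_reward(0.0, state, next_state, masses)
print(bonus)
```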
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Learning to Perceive the World Through Control: Empowerment-Based Representation Learning
arXiv:2605.30656 studies empowerment-based representation learning in reinforcement learning environments where observations exceed control-relevant variables. The paper shows empowerment agents induce two complementary representations, forward and backward, both invariant to control-irrelevant features, and argues that interaction aimed at maximizing control is required for these invariance properties.
#Agent#Reasoning#Research release
why featured
HKR-H and HKR-K pass via the agent-control framing and concrete representation claims. HKR-R is weak: single arXiv theory paper, no product path, artifact, or industry debate disclosed.
editor take
arXiv 2605.30656 proves two empowerment representations; I buy the invariance angle, but sample complexity is undisclosed.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H1·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation
The paper proposes DSP for few-shot atypical layout-to-image generation, using Semantic Anchoring, Primitive Imbuing, and Conceptual Steering to improve visual fidelity and alignment in the 5-shot regime.
#Vision#Multimodal#iCVTEAM#Research release
why featured
HKR-K passes on the 5-shot atypical L2I setup and DSP mechanisms. HKR-H/R are weak, and the post lacks metrics, code quality, or reproducibility details, so it stays in the lower research-release band.
editor take
DSP claims 5-shot gains but exposes no metrics here; I’d file it as a patch for long-tail L2I layouts.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG· atomEN04:00 · 06·01
The Challenges of Using Reinforcement Learning for Controlling Industrial Energy Systems
The paper analyzes four challenges in deploying reinforcement learning to a real industrial thermal heating network: partial observability, action-space design, reward design, and the simulation-to-reality gap; the real deployment reaches operational stability, but the abstract does not disclose the size of the performance gap versus simulation.
#Agent#Robotics#Research release
why featured
HKR-K passes because the paper gives four RL deployment blockers for industrial heat networks; HKR-R is limited to real-world control practitioners. No performance delta or AI product angle, so it stays in the lower research band.
editor take
RL ran stably on a real heating network, but gap size is undisclosed; control papers need failure boundaries, not SOTA theater.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R1
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Enhancing Regime Shift Detection Using Unstructured Data: A Study on the Treasury Market
The paper proposes a text-enhanced regime shift detection pipeline that uses LLM reasoning over FOMC minutes, validates candidates with a bootstrap likelihood-ratio test on VAR, and evaluates 2010-2024 data with a 14-variable U.S. Treasury and macro panel; it reports F1 = 0.82 and same-day modal detection latency against verified monetary-policy regime shifts.
#Reasoning#Benchmarking#FOMC#U.S. Treasury
why featured
HKR-K passes via a concrete LLM-plus-FOMC-minutes setup, a 2010-2024 panel, and F1=0.82. HKR-H and HKR-R miss because this is a narrow finance paper, not a core AI product or model-capability story.
editor take
FOMC minutes plus a 14-variable panel hit F1=0.82; I buy LLM-as-candidate, not LLM-as-trading-signal.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Unicorn: Scaling High-Dimensional Time Series Forecasting via Universal Correlation Modeling
Haochen Yuan and three coauthors propose Unicorn, a high-dimensional time-series forecasting framework that uses a latent prototype codebook to decouple correlation modeling from channel identities for multi-dataset pretraining and few-shot transfer.
#Benchmarking#Haochen Yuan#Yichen Song#Yunbo Wang
why featured
HKR-K passes: Unicorn uses a latent prototype codebook for multi-dataset pretraining and few-shot transfer. HKR-H/R fail, and no benchmark number or production impact is disclosed.
editor take
Unicorn decouples channel identity via a prototype codebook; no benchmark numbers disclosed, so I’d file it as a promising time-series pretraining bet.
HKR breakdown
hook knowledge resonance
open source
49
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
How Well Does Classification Accuracy Capture Concept Drift Detection Quality?
The paper studies the relationship between eight drift detection quality metrics and classifier performance across seven synthetic data stream generators, with drift dynamics included as an evaluation condition.
#Benchmarking#arXiv#Research release#Benchmark
why featured
HKR-K passes on concrete evaluation scope: 8 metrics and 7 stream generators. HKR-H/R are weak, and the body does not disclose the main finding, so this stays in the lower-value "all" tier.
editor take
This tests 8 drift metrics across 7 synthetic stream generators; judging drift detection by accuracy alone was overdue for a teardown.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
HADT: A Heterogeneous Multi-Agent Differential Transformer for Autonomous Earth Observation Satellite Cluster
The paper proposes HADT for autonomous resource management in heterogeneous EO satellite clusters, modeling the task as sequential decision-making with relational observation-action tokenization and differential attention; the RSS snippet does not disclose baseline names, dataset settings, or exact performance gains.
#Agent#Reasoning#Robotics#Research release
why featured
HKR-K passes for the HADT mechanism and tokenization design. HKR-H and HKR-R fail: no baseline names or gains are disclosed, and satellite resource management is distant from mainstream AI product practice.
editor take
HADT frames heterogeneous EO satellite scheduling as sequential decisions; baseline names and gains are undisclosed, so treat it as an engineering idea.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Student Capacity Moderates Knowledge Distillation Effectiveness Across ResNet Teacher-Student Pairs on CIFAR-10
The paper tests three ResNet teacher-student pairs on CIFAR-10 under three seeds with mean and standard deviation reported. R50→R34 Feature-KD gains +0.30pp over baseline, while a 32×32-aware ResNet stem correction raises teacher accuracy by more than 5pp, far larger than any distillation gain.
#Vision#Benchmarking#arXiv#ResNet
why featured
HKR-K passes with reproducible teacher-student pairs and concrete point gains. HKR-H/R fail because this is a narrow distillation ablation on an old vision benchmark, not broad industry signal.
editor take
R50→R34 Feature-KD gains just 0.30pp; the 32×32 stem fix lifts teacher accuracy by 5pp+, so check implementation before praising KD.
HKR breakdown
hook knowledge resonance
open source
47
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
MADQI: An Evaluation Metric for Unsupervised Learning in AIS-Based Maritime Anomaly Detection
The paper proposes MADQI to evaluate unlabeled AIS-based maritime anomaly detection, combining four metrics—ARC, PPS, SDS, and ECE—and reports a MADQI score of 80.37% on an AIS dataset.
#Benchmarking#Ismet Gocer#Zakirul Bhuiyan#Raza Hasan
why featured
HKR-K passes because the paper names a metric, four components, and an 80.37% result. HKR-H/R fail: AIS maritime anomaly detection is narrow, with no agent, product, or frontier-model implication, so it sits in the low-value research band.
editor take
MADQI combines 4 metrics and reports 80.37%. I don’t buy it yet: unlabeled evaluation easily turns heuristics into a score.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Why Linear Recurrent Memory Works in Partially Observable Reinforcement Learning
The paper constructs two linear filters for partially observable reinforcement learning: one exactly reproduces belief-vector pre-softmax logits under deterministic HMM transitions, and the other drives state-decoding error to zero under nearly deterministic transitions.
#Reasoning#Memory#Research release
why featured
HKR-K passes for a testable mechanism around linear filters and HMM assumptions. HKR-H/R are weak, and the partially observable RL theory barrier keeps it in the lower research-signal band.
editor take
The paper gives two linear filters; under deterministic HMM transitions, one recovers belief logits exactly. Linear memory gets a mechanism, not emergence folklore.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
8d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·01
Early Prediction of Future Behavioral Strategy from Process Traces
The paper introduces PLVM, a process-level latent variable model that fuses partial traces from two cleaning tasks to predict whether PowerWash Simulator players use locally persistent Zone Planner behavior or frequent Zone Hopper behavior in the held-out Fire Station level; the abstract does not disclose dataset size or accuracy numbers.
#Benchmarking#PowerWash Simulator#Research release
why featured
HKR-H comes from the odd game setting, and HKR-K has a concrete PLVM trace-prediction setup. No metrics or product/agent implications are disclosed, so this stays in the low-value research band.
editor take
PLVM predicts Fire Station strategy from partial traces of two cleaning tasks; no sample size or accuracy disclosed, so this reads like telemetry modeling, not agent benchmarking.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H1·K1·R0
02:59
8d ago
HuggingFace Papers (takara mirror) · rss · EN · 02:59 · 06·01
Exploiting Semantic and Pixel Representations for Ultra-Low Bitrate Image Compression
SPRDiff applies a diffusion-based triple-encoder design and a distortion-aware reconstruction module to ultra-low-bitrate image compression, using pretrained distortion-oriented and semantic-oriented encoders to compensate for a frozen VAE encoder; benchmark experiments report better rate-distortion-perception trade-offs than state-of-the-art methods below 0.03 bpp, and the authors say code and trained models will be released on GitHub.
#Vision#Multimodal#Benchmarking#SPRDiff
why featured
HKR-K passes with testable details: below 0.03 bpp, a triple-encoder design, and distortion-aware reconstruction. HKR-H/R stay weak because this is niche image-compression research without product or broad cost impact.
editor take
SPRDiff reports better rate-distortion-perception trade-offs than SOTA below 0.03 bpp; I care whether inference latency eats the compression win after weights ship.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
01:44
8d ago
HuggingFace Papers (takara mirror) · rss · EN · 01:44 · 06·01
CRePE: Convolution-aware Relative Importance in Efficient Post-training Pruning
CRePE adds 2D local neighborhood context and adaptive coefficients to relative-importance post-training pruning, while PHO replaces repeated perplexity evaluations and reduces coefficient search time from about 11 hours to about 20 minutes.
#Inference-opt#CRePE#PHO#RIA
why featured
HKR-K is strong and HKR-R is moderate: the 11h-to-20m search cut is concrete and cost-relevant. HKR-H is weak because the paper is narrow pruning research, so it stays in the "all" tier.
editor take
PHO cuts search from 11 hours to 20 minutes; I buy transferable pruning knobs, but accuracy numbers aren't disclosed.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
00:08
8d ago
HuggingFace Papers (takara mirror) · rss · EN · 00:08 · 06·01
Agent Operating Systems (AOS): Integrating Agentic Control Planes into and Beyond Traditional Operating Systems
The paper defines an Agent Operating System architecture for agent workloads, decomposing its control plane into five responsibility areas: scheduling, context and memory management, tool and capability registries, policy and trust enforcement, and observability and audit, while mapping integration models onto Linux and Windows primitives rather than proposing wholesale OS replacement.
#Agent#Memory#Safety#Linux
why featured
HKR-H/K/R all pass, but the item gives only a paper title and architecture summary, with no implementation, benchmark, or code. It stays in the 60–71 band.
editor take
AOS splits agent control planes into 5 duties; I buy the systems problem, not the OS-name ambition.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
