ax@ax-radar:~/papers $ grep -E 'arxiv|paper' sources/tags

papers · 2026-06-02

473 papers · updated 3m ago
2026-06-02 · Tue
17:59
6d ago
arXiv · cs.AI · atom · EN · 17:59 · 06·02
Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models
The authors introduce Imaginative Perception Tokens (IPT) for BAGEL and evaluate them on PET, PT, and MVC with about 20K examples; IPT supervision raises MVC accuracy by 3.4% and often beats textual chain-of-thought training, without requiring image generation at inference time.
#Multimodal#Vision#Reasoning#BAGEL
why featured
HKR-H/K pass: the title offers a new mechanism, and the body gives training scale plus an accuracy delta. No product path, open-source impact, or major-lab signal, so it stays in the 60–71 band.
editor take
IPT trains BAGEL on ~20K examples and adds 3.4% on MVC; I buy the anti-text-CoT signal for spatial reasoning.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
17:59
6d ago
arXiv · cs.AI · atom · EN · 17:59 · 06·02
Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking
Humanoid-GPT trains a GPT-style causal-attention Transformer on a 2B-frame retargeted motion corpus, combining major mocap datasets and in-house recordings for whole-body control, and reports zero-shot tracking on unseen motions and control tasks.
#Robotics#Agent#Benchmarking#Humanoid-GPT
why featured
HKR-H/K/R all pass, but this is a single arXiv robotics-control paper with method and data scale only; code, real-robot results, and independent reproduction are not disclosed. Lower-band score: 70, tier all.
editor take
Humanoid-GPT trains on 2B motion frames. Big zero-shot claim, but the feed gives no metrics.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
17:58
6d ago
arXiv · cs.CL · atom · EN · 17:58 · 06·02
Language Models Compare Quantities Using Number-specific and Unit-specific Heuristics
The paper tests LMs on quantity comparisons such as 110 cm versus 1.2 m across several controlled unit systems, finds accuracy drops near comparison boundaries, and shows linear surrogate models predict preferences from numerical-difference and unit-scale-difference cues.
#Reasoning#Interpretability#Research release
why featured
HKR-H/K/R all pass, but this is a single arXiv mechanism paper with no production replacement claim or major model release. The concrete finding is useful, yet the impact stays in the 60–71 band.
editor take
LMs degrade near 110cm-vs-1.2m boundaries; unit conversion looks less like computation than heuristic voting.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
17:56
6d ago
● P1 · arXiv · cs.AI · atom · EN · 17:56 · 06·02
Research Proposes Sleep Paradigm for Language Models to Consolidate Memory and Self-Modify
The paper proposes a “Sleep” paradigm with two stages: Knowledge Seeding distills a smaller self into a larger network using on-policy distillation and RL-based imitation learning, while Dreaming uses RL to generate synthetic curricula for rehearsing new knowledge without human supervision.
#Memory#Fine-tuning#Reasoning#Research release
why featured
HKR-H/K/R all pass: the title has a strong hook, the summary gives a two-stage mechanism, and memory consolidation is a live agent problem. Missing metrics and artifacts keep it in the 78–84 band.
editor take
“LLMs need sleep” is sticky framing, but the actual bet is moving episodic context into weights; without forgetting and safety data, don’t call it self-improvement yet.
sharp
Three sources track the same arXiv 2606.03979 paper: cs.AI and cs.LG are duplicate listings, while Jiqizhixin turns the abstract into the “dreaming” hook. The agreement comes from the paper’s own framing, not independent validation. The concrete mechanism is two-stage: Knowledge Seeding distills a “smaller-self” memory into a larger network, then Dreaming uses RL to generate synthetic curricula for rehearsal. I like the direction more than another context-window stunt, because it targets weight-level continual learning rather than retrieval cache. But I don’t buy the strong “self-modify” framing yet. The abstract claims experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization, but gives no forgetting rate, contamination protocol, or rollback condition. Compared with RAG memory or long-context Claude/Gemini-style product memory, this reads like a research probe, not a deployable memory substrate.
HKR breakdown
hook knowledge resonance
open source
92
SCORE
H1·K1·R1
17:56
6d ago
arXiv · cs.AI · atom · EN · 17:56 · 06·02
Research paper formalizes visual binding problem using information-theoretic approach with Vision Transformer probe
The paper formalizes the visual binding problem with an information-theoretic approach and introduces a probe to measure binding information in ViT representations, testing [CLS] and spatial tokens across feature sharing, occlusion, and natural-feature datasets while comparing several pre-trained ViTs; the RSS snippet does not disclose model names, dataset names, or quantitative results.
#Vision#Interpretability#Benchmarking#Research release
why featured
HKR-K passes: the post gives a testable ViT binding-information probe and experiment conditions. The angle is academic interpretability, with no product impact or broad industry nerve, so it stays in all.
editor take
This paper gives ViT binding an information-theoretic probe; names and scores are undisclosed, so don’t crown it a benchmark yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
17:53
6d ago
arXiv · cs.CL · atom · EN · 17:53 · 06·02
QUBRIC: Co-Designing Queries and Rubrics for RL Beyond Verifiable Rewards
QUBRIC co-designs query rewriting and rubric generation for rubric-based RL beyond verifiable rewards, using teacher-derived key points, contrastive rubric generation, and learnability filtering for GRPO training. It reports a +5.5 point ArenaHard gain over the SFT baseline and a +6.3 point average transfer gain across three held-out legal, moral, and narrative reasoning benchmarks.
#Reasoning#Alignment#Benchmarking#QUBRIC
why featured
HKR-H/K/R pass, but this is a single arXiv methods paper with benchmark gains only; no artifact, major-lab signal, or production replacement claim is disclosed. It stays in the interesting research band below featured.
editor take
QUBRIC beats SFT by 5.5 on ArenaHard; I buy the direction, but rubric RL still inherits teacher-keypoint quality.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
17:52
6d ago
arXiv · cs.CL · atom · EN · 17:52 · 06·02
AlignAtt4LLM: Fast Simultaneous Speech Translation for Decoder-Only LLMs at IWSLT 2026
AlignAtt4LLM uses a Qwen3-ASR and Gemma-4 E4B-it cascade on the IWSLT 2026 development set, beating the supplied baselines for English-German and English-Italian at about 2 seconds in the low-latency regime and below 4 seconds CU-LongYAAL in the high-latency regime, while English-Chinese results are more mixed.
#Audio#Alignment#Inference-opt#Qwen
why featured
HKR-K passes with model pairing, language pairs, and latency numbers. HKR-H/R miss because this is a narrow task-paper result with limited product or competitive impact for general AI practitioners.
editor take
AlignAtt4LLM beats IWSLT 2026 baselines for En-De/En-It at ~2s latency; mixed En-Zh keeps the Gemma cascade honest.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
17:51
6d ago
arXiv · cs.CL · atom · EN · 17:51 · 06·02
Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning
The paper introduces ACTS, a Markov decision process controller that reads the reasoning trace and remaining token budget at each step, then selects a reasoning strategy and steering phrase for a frozen reasoner. Experiments across multiple benchmarks report full-thinking-level performance with token savings, but the snippet does not disclose exact savings or benchmark scores.
#Agent#Reasoning#Inference-opt#Research release
why featured
HKR-H/K/R pass, but the post gives a mechanism and qualitative “near full-thinking with token savings” only; no savings ratio or strong benchmark number is disclosed, so it stays in the 60–71 research band.
editor take
ACTS reads trace and token budget each step; no savings ratio disclosed, so I file it as reasoning scheduling, not an efficiency breakthrough.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
17:50
6d ago
arXiv · cs.AI · atom · EN · 17:50 · 06·02
Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation
AgenticRL uses a multimodal GPT agent to generate and refine reward functions for vision-conditioned UAV navigation, trains policies with PPO, and reports a 71% policy-behavior improvement over initial rewards, with 91% real-world success and 94% sim-to-real accuracy.
#Agent#Vision#Robotics#AgenticRL
why featured
HKR-H and HKR-K pass: the paper gives a concrete mechanism plus 71% and 91% results. As a single arXiv robotics/RL paper without product uptake or multi-source discussion, it stays at the top of the 60–71 band.
editor take
AgenticRL reports 91% real-world UAV success. GPT-written reward loops remove one manual robotics-RL knob.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R0
17:42
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 17:42 · 06·02
VLESA: Vision-Language Embodied Safety Agent for Human Activity Monitoring
VLESA monitors egocentric video, predicts dangerous human actions, and triggers safety interventions; on ASIMOV-2.0, it exceeds baselines in exact-frame intervention accuracy, while a GRPO-trained goal-conditioned Q-filter improves action safety by over 41 percentage points.
#Agent#Vision#Safety#VLESA
why featured
Concrete mechanism and a +41pp result give HKR-K, with H/R present but narrow. This is a single paper with no major-lab, product, or multi-source adoption signal, so it stays in 60–71.
editor take
VLESA lifts action safety by 41 points; ASIMOV-2.0 is useful, but home-video generalization remains unproven.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
17:37
6d ago
arXiv · cs.CL · atom · EN · 17:37 · 06·02
A Pocket Offline Model for Simultaneous Speech Translation as CUNI Submission to IWSLT 2026
CUNI implements simultaneous speech translation with the offline direct speech-to-text Canary model and AlignAtt, submitting it to the IWSLT 2026 shared task for Czech-English, English-German, and English-Italian; the system has 1B parameters and supports 25 source and 25 target languages.
#Audio#Multimodal#Benchmarking#CUNI
why featured
HKR-H/K pass: the pocket offline speech-translation angle is clicky, and the post gives 1B parameters, 25×25 languages, and IWSLT tasks. HKR-R is weak; this is a niche benchmark submission, not a product or flagship model.
editor take
CUNI runs 1B Canary on three IWSLT 2026 pairs; offline ST doing simultaneity is neat, but latency numbers are undisclosed.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
15:57
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:57 · 06·02
Leveraging BART to Assess CS1 C++ Programming Assignments using Rubric-based Criteria
The paper fine-tunes BART with LoRA on multi-semester CS1 C++ submissions to jointly predict numeric scores and letter-grade buckets, using rubrics and a distribution-matching loss; multitask BART with boundary-based soft labels and rubric context reports lower MAE and better grade-distribution alignment than single-task, hard-label, or code-only baselines.
#Fine-tuning#Code#Benchmarking#Research release
why featured
HKR-K passes because the post gives a concrete model and labeling mechanism, but no MAE number, dataset size, or reproducible setup. The CS1 grading focus is far from mainstream AI product or tooling concerns.
editor take
BART+LoRA lowers MAE on multi-semester CS1 data; sample size is undisclosed, so don't trust the grading story yet.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
15:18
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:18 · 06·02
Merit or networks? What decides where research is published
The study used a discipline-trained LLM to score idea quality before publication across 6,208 economics working papers, then estimated journal placement from five inputs; execution quality was the largest input, while connections raised placement odds and mattered most near the most selective journals.
#Reasoning#Benchmarking#Research release
why featured
HKR-H/K pass: the title has tension, and the summary gives 6,208 papers plus a concrete quality-vs-network finding. AI is mainly a research instrument here, with no model, product, or direct practitioner impact.
editor take
An LLM blind-scored 6,208 econ papers: execution dominates, connections bite near top journals; cronyism exists, but not as the whole story.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H1·K1·R0
14:49
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:49 · 06·02
Research proposes conformal language modeling via posterior sampling
The paper proposes sampling from approximations to an LLM posterior conditioned on a calibrated high-scoring region, and evaluates the method on open-ended biography generation and mathematical problem solving while retaining target risk control.
#Reasoning#Alignment#Benchmarking#Research release
why featured
HKR-K/R pass: the paper gives a testable mechanism for posterior sampling plus calibrated high-score regions, tied to hallucination control. HKR-H is weak, and the source omits authors, code, and metrics, so it stays in all.
editor take
Posterior sampling controls hallucination here; only bio and math cases disclosed, with no model scale or risk threshold.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
14:48
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:48 · 06·02
Re-Ranking Through an Attribution Lens for Citation Quality in Legal QA
The paper finds that semantic similarity does not correlate with passage attribution on AQuAECHR, then trains a lightweight cross-encoder on continuous perturbation-based attribution scores to re-rank legal QA retrieval passages under two language models and five-fold cross-validation.
#RAG#Benchmarking#Research release#Benchmark
why featured
HKR-K/R pass: the item gives a dataset, mechanism, and 5-fold setup, and it targets RAG citation quality. No effect size is disclosed, and the angle is narrow, so it stays in the 60–71 band.
editor take
On AQuAECHR, similarity ranking loses to random; using embedding top-k as a citation proxy in legal RAG looks sloppy.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
14:34
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:34 · 06·02
Investigating Adversarial Robustness of Multi-modal Large Language Models
The paper studies adversarial robustness in MLLMs and reports that end-to-end training with robust vision encoders improves performance under strong attacks by 28 CIDEr points and 11.7% VQA accuracy over constrained plug-and-play baselines.
#Multimodal#Vision#Safety#CLIP
why featured
HKR-K/R pass: the summary gives a robust vision-encoder mechanism and two attack-time gains. HKR-H is weak, and this is a single paper summary without artifact details or visible industry debate, so it stays in the 60–71 band.
editor take
End-to-end robust vision encoders add 28 CIDEr and 11.7% VQA; CLIP-alignment defenses look like a ceiling, not a moat.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
13:07
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 13:07 · 06·02
Research proposes PF-OPSD method combining world models and language models for complementary reasoning
The paper proposes PF-OPSD and reports 10.6% and 10.9% gains over baselines on VRQABench and OpenWorldQA; training uses ground-truth future videos as teacher-side privileged context, while the deployable student never observes true futures at test time.
#Reasoning#Multimodal#Vision#Research release
why featured
HKR-H comes from the future-video teacher setup, and HKR-K has method plus two benchmark gains. It remains an academic multimodal-reasoning paper without product impact or industry tension, so it stays in the 60–71 band.
editor take
PF-OPSD gains 10.6%/10.9%; using true futures only as teacher privilege is a cleaner answer than trusting video rollouts.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
12:36
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 12:36 · 06·02
When Attention Collapses: Stage-Aware Visual Token Pruning from Structure to Semantics
The paper introduces STS, a two-stage visual token pruning framework for VLM inference: repulsion-based sampling first preserves spatial and structural diversity, then instruction-aware cross-attention filters prompt-irrelevant tokens; the snippet does not disclose model names, benchmark scores, latency gains, or token reduction ratios.
#Vision#Multimodal#Inference-opt#Research release
why featured
HKR-H/K/R pass, but the item only provides a title and mechanism summary, with no benchmark, code artifact, or production claim. This stays in the mid “all” band.
editor take
STS prunes visual tokens in two stages; no reduction or latency numbers are disclosed, so I don’t buy the win yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
11:43
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 11:43 · 06·02
Post-Hoc Robustness for Model-Based Reinforcement Learning
The paper introduces inference-time robustification for deep RL agents, using a trained nominal policy and learned transition model for one robust policy improvement step without extra neural-network training.
#Agent#Reasoning#Inference-opt#Gymnasium MuJoCo
why featured
HKR-K passes for a concrete inference-time robustness mechanism. HKR-H/R are weak, and the post gives no benchmark numbers or product path, so it stays in the lower all band.
editor take
The paper adds one robust improvement step for perturbed MuJoCo; MPC+PGD at inference is useful, but latency is undisclosed.
HKR breakdown
hook knowledge resonance
open source
49
SCORE
H0·K1·R0
11:27
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 11:27 · 06·02
EvoMemNav: Efficient Self-Evolving Fine-Grained Memory for Zero-Shot Embodied Navigation
EvoMemNav builds a Visual-Semantic Memory Graph that stores raw views with semantic cues and topological relations in a room-view-object hierarchy, then uses budgeted coarse-to-fine VLM calls and reflection-driven write-back; experiments on GOAT-Bench and HM3D report SR/SPL gains across object, text-description, and image-goal modalities.
#Agent#Vision#Memory#EvoMemNav
why featured
HKR-H and HKR-K pass via VSMGraph, the room-view-object hierarchy, and GOAT-Bench/HM3D claims. Exact gains are not disclosed, and embodied navigation remains too niche for featured.
editor take
EvoMemNav keeps raw views in VSMGraph and budgets VLM calls; SR/SPL gains are claimed, but no margins disclosed.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
11:23
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 11:23 · 06·02
BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language
BaltiVoice releases a 16.8-hour read-speech corpus for Balti with 10,060 validated Nastaliq utterances, and a fine-tuned OpenAI Whisper-small model reduces WER from a 182.18% zero-shot baseline to 30.07% on 538 held-out validation utterances.
#Audio#Fine-tuning#OpenAI#HuggingFace
why featured
HKR-K is solid: the article gives corpus size, text count, and WER change for a reproducible Whisper-small setup. HKR-H and HKR-R are weak because the release is niche academic ASR work.
editor take
BaltiVoice cuts Whisper-small WER to 30.07% with 16.8 hours; low-resource ASR still lives or dies on clean data.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
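A note on the 182.18% baseline in the BaltiVoice item: word error rate divides substitutions, deletions, and insertions by the number of reference words, so a model that emits many spurious words can score above 100%. The sketch below assumes the standard word-level edit-distance definition of WER; the function name and the toy strings are illustrative and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Toy example (made up): a two-word reference against a five-word hypothesis.
# Two substitutions plus three insertions over two reference words.
toy_wer = word_error_rate("a b", "x y z w v")
```

Because the denominator is the reference length only, heavy insertion errors push the ratio past 1.0, which is how a zero-shot model on an unseen script can plausibly report a WER above 100%.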
11:07
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 11:07 · 06·02
Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs
Tree-like Self-Play frames secure code generation as fine-grained sequential decision-making, raising CodeLlama-7B's SPR@1 on Python security benchmarks to 75.8% versus 57.0% for SFT.
#Code#Fine-tuning#Safety#CodeLlama
why featured
HKR-H/K/R pass, but this is a niche secure-code training paper rather than a broad model or product release. The 75.8% vs 57.0% result gives signal, placing it in all below featured.
editor take
TSP lifts CodeLlama-7B to 75.8% SPR@1 on Python; I buy token-level self-play, but need real-repo patch data.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
10:45
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 10:45 · 06·02
Research Paper Reevaluates Tensor Decompositions for Language Model Compression
The paper evaluates tensor compression across dense and MoE LLM architectures, identifies a mismatch between tensor decompositions’ shared-subspace assumption and heterogeneous representations in modern LLMs, and releases code on GitHub, while the snippet does not disclose model sizes or compression ratios.
#Inference-opt#Benchmarking#Research release#Open source
why featured
HKR-K/R pass: it offers a mechanism for why tensor-decomposition compression fails and open code. Missing compression ratios, model list, and benchmark numbers keep it in the 60–71 band.
editor take
The paper tests tensor compression on dense and MoE LLMs; no model sizes or ratios disclosed, so TT-LLM stays unproven for deployment.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
09:26
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 09:26 · 06·02
Paper Proposes Gaussian Trust Region Policy Optimization Method for PPO
The paper proposes Gaussian Trust Region Policy Optimization to reshape PPO’s trust region with a Gaussian kernel. Its bounded, non-monotonic constraint relaxes under sustained high-advantage updates. The method is tested across games, simulated robotic control, open-world exploration, and language model post-training. The code is available through an anonymous 4open repository.
#Fine-tuning#Robotics#Benchmarking#Research release
why featured
HKR-K passes: GTR uses a Gaussian kernel to reshape PPO trust regions, with bounded non-monotonic constraints and public code. HKR-H/R are weak; no baseline gains or training cost are disclosed.
editor take
GTR reshapes PPO’s trust region with a Gaussian kernel; no benchmark numbers are disclosed, so four-domain claims need restraint.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
08:54
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:54 · 06·02
Beyond Semantics: Modeling Factual and Affective Perceptual Experiences from Vision-Language Data
The paper introduces PercepT, a two-stage architecture for P-Topics modeling, and reports 0.97 silhouette score and 0.94 AUC on ArtELingo, compared with 0.37 and 0.77 from the closest baseline.
#Multimodal#Vision#Benchmarking#PercepT
why featured
HKR-K passes on the PercepT mechanism and ArtELingo metrics, but HKR-H/R are weak: no demo, release path, adoption signal, or practitioner pain point. No hard exclusion; this fits a routine research-release all tier.
editor take
PercepT hits 0.97 silhouette on ArtELingo; I trust the clustering signal, not the cross-cultural perception claim yet.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
08:40
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:40 · 06·02
Evaluating LLMs' Effectiveness on Real-World Consumer Device Repair Questions
The study introduces a benchmark of 991 Reddit repair questions and evaluates six LLMs in English and Bangla; GPT-5.4 ranks best overall, while all models still make substantial errors in high-risk repair tasks.
#Reasoning#Safety#Benchmarking#Reddit
why featured
HKR-H/K/R all pass, but this is a narrow single-paper benchmark without broad field impact yet. The concrete signal is 991 Reddit repair questions, 6 LLMs, English/Bangla testing, and unreliable high-risk repair advice.
editor take
991 Reddit repair questions test six models; GPT-5.4 leads, but high-risk fixes still fail, and Bangla lags English.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
05:48
7d ago
HuggingFace Papers (takara mirror) · rss · EN · 05:48 · 06·02
SenseJudge: Human-Centric Preference-Driven Judgment Framework
The paper proposes SenseJudge and SenseBench for two tasks: personalized LLM judging and model ranking; the RSS snippet does not disclose dataset size, baseline list, or exact scores.
#Alignment#Benchmarking#SenseJudge#SenseBench
why featured
HKR-K passes for a new eval framework and two disclosed tasks, but sample size, baselines, and scores are not disclosed. HKR-H and HKR-R are weak, so it stays in all.
editor take
SenseJudge covers 2 eval tasks; dataset size and scores are undisclosed, so I don’t buy the “human preference” claim yet.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:45
7d ago
HuggingFace Papers (takara mirror) · rss · EN · 04:45 · 06·02
$A^2$: Smaller Self-Supervised ViTs Localize Better than Larger Ones
The paper proposes $A^2$, which uses a small self-supervised ViT to locate attention peaks and crop regions, then embeds the crops with a larger ViT; across 5 benchmarks, it is competitive with DFR and outperforms end-to-end attention training under stronger distribution shifts.
#Vision#Embedding#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the title has a counterintuitive finding, and the summary gives $A^2$’s two-step mechanism plus 5-benchmark results. HKR-R is weak, and this is a technical vision paper, so it stays in the 60–71 all band.
editor take
$A^2$ lets small ViTs crop and large ViTs embed; across 5 benchmarks, that inverse-scaling jab lands.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
A Theoretical Framework for Statistical Evaluability of Generative Models
The paper introduces a theoretical framework for generative model evaluation and proves that IPMs over bounded test classes are evaluable from finite samples, while Rényi and KL divergences are not, because rare events can determine their values.
#Benchmarking#Research release
why featured
HKR-K/R pass: the paper gives a finite-sample evaluability boundary for generative-model metrics. HKR-H fails; no experiments or tool artifact are disclosed, so it stays at the top of 60–71.
editor take
This nails the finite-sample line: bounded IPMs are evaluable; KL/Rényi break on rare events. Stop treating divergence scores as certainty.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H0·K1·R1
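For readers who want the two quantities from the evaluability item side by side, here are the standard textbook definitions (not the paper's formal framework, which the feed does not reproduce). In an integral probability metric each test function is bounded, so every expectation is an average of bounded terms and sample means concentrate. The KL integrand is an unbounded log-ratio, so a region that samples rarely reach can still carry an arbitrarily large share of the value.

```latex
% Integral probability metric over a bounded test class F (|f| <= 1):
d_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}}
  \left| \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{x \sim Q}[f(x)] \right|

% Kullback-Leibler divergence, with densities p and q:
\mathrm{KL}(P \,\|\, Q) = \mathbb{E}_{x \sim P}\!\left[ \log \frac{p(x)}{q(x)} \right]
```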
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
FLARE: Diffusion for Hybrid Language Models
FLARE converts hybrid-attention AR LLMs into diffusion language models. One checkpoint supports AR-style verified decoding and diffusion-style parallel denoising. The paper reports throughput gains over open-source dLLM baselines under single-GPU concurrent serving, and identifies transfer data quality as the main factor for capability preservation.
#Inference-opt#Reasoning#FLARE#arXiv
why featured
HKR-H/K/R all pass: the hook is one checkpoint doing AR and diffusion decoding, with a single-GPU throughput claim touching serving cost. Kept below featured because exact numbers, model size, and reproducible setup are not disclosed in the feed.
editor take
FLARE runs AR and diffusion from one checkpoint. I buy the data-quality diagnosis; single-GPU throughput is the narrow proof.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
CRMA: A Spectrally Bounded Backbone for Modular Continual Fine-Tuning of LLMs
CRMA uses Sinkhorn normalization to keep its mixing matrix M doubly stochastic at every forward pass; on Mistral-7B across 5 sequential domains, it reports loss-relative drift of -0.17%, versus +42.96% for naive sequential fine-tuning.
#Fine-tuning#Memory#Benchmarking#Mistral
why featured
HKR-K/R pass: the post gives a Sinkhorn doubly stochastic constraint and a Mistral-7B five-domain drift result. HKR-H fails on a jargon-heavy title; this is useful research, not a major model release.
editor take
CRMA cuts Mistral-7B five-domain drift to -0.17%; I’d check code first, but the 98/100 toggle test is hard to ignore.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H0·K1·R1
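The Sinkhorn step named in the CRMA item can be sketched as alternating row and column rescaling of a positive matrix until both sets of sums equal 1. This is a generic illustration under stated assumptions: the feed does not say how CRMA parameterizes M, how many iterations it runs, or how it differentiates through them, and the 3x3 matrix below is made up.

```python
def sinkhorn_normalize(matrix, iterations=200):
    """Alternately rescale rows and columns of a positive square matrix."""
    n = len(matrix)
    m = [list(row) for row in matrix]
    for _ in range(iterations):
        # Row step: make every row sum to 1.
        for i in range(n):
            row_sum = sum(m[i])
            m[i] = [v / row_sum for v in m[i]]
        # Column step: make every column sum to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


# Hypothetical 3x3 positive mixing weights, chosen only for illustration.
raw = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]
mixing = sinkhorn_normalize(raw)
row_sums = [sum(row) for row in mixing]
col_sums = [sum(mixing[i][j] for i in range(3)) for j in range(3)]
```

A doubly stochastic matrix has spectral norm at most 1, which is presumably the property the title's "spectrally bounded" refers to; that reading is an inference from the title, not a claim stated in the feed.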
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
A combination of noise and bilateral filters achieve supralinear and scalable adversarial robustness in CNNs
The paper proposes a preprocessor combining Gaussian noise and bilateral filtering; paired with adversarial training, it ranks second on RobustBench under AutoAttack while using about 35% of the training FLOPs of state-of-the-art defenses.
#Vision#Safety#Benchmarking#RobustBench
why featured
HKR-K is strong: RobustBench #2, AutoAttack, and 35% training FLOPs are concrete. HKR-H/R mainly serve vision-safety readers, while CNN adversarial robustness has limited spillover to LLM and agent practitioners.
editor take
Gaussian noise plus bilateral filtering ranks second on AutoAttack at 35% training FLOPs; I’d audit adaptive attacks before buying it.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Advantage Collapse in Group Relative Policy Optimization: Diagnosis and Mitigation
The paper introduces ACR to measure ineffective-gradient batches in GRPO training, and AVSPO reduces advantage collapse by 58-63% versus GRPO across 0.5B to 14B models on mathematical reasoning benchmarks.
#Reasoning#Fine-tuning#Benchmarking#Research release
why featured
HKR-H/K/R all pass, but this is a narrow arXiv post-training paper with method names, scale, and reduction only; no artifact or external replication is disclosed, so it stays below featured.
editor take
AVSPO cuts ACR 58-63% on 0.5B-14B math models; GRPO’s failure mode is measurable, but virtual rewards need bias audits.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
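Why GRPO batches can carry no gradient: GRPO scores each sampled completion relative to the mean and standard deviation of its own group, so a group whose rewards are all identical yields all-zero advantages. The sketch below illustrates that failure mode only. The feed does not give ACR's exact definition, so `collapsed_fraction` is a hypothetical stand-in, the use of the population standard deviation is an assumption, and nothing here describes how AVSPO works.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group, GRPO-style."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def collapsed_fraction(groups):
    """Hypothetical stand-in metric: share of groups with all-zero advantages."""
    collapsed = sum(
        1 for rewards in groups
        if all(a == 0.0 for a in group_relative_advantages(rewards))
    )
    return collapsed / len(groups)


# Made-up binary correctness rewards for four prompts, four samples each.
groups = [
    [1.0, 1.0, 1.0, 1.0],  # every sample correct: no learning signal
    [0.0, 0.0, 0.0, 0.0],  # every sample wrong: no learning signal
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
]
fraction = collapsed_fraction(groups)
```

In this toy batch half of the groups contribute nothing to the policy gradient, which is the kind of waste a collapse metric is meant to expose.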
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
How Generation Architecture Shapes Code Complexity in Multi-Agent LLM Systems: A Paired Study on HumanEval
The paper compares six multi-agent code-generation architectures under two GPT-4o-family models across 164 HumanEval tasks and 1,968 paired observations, finding two complexity clusters that are internally indistinguishable but separated from each other by a 50–130% gap, while the heavier cluster shows no pass@1 advantage over leaner architectures.
#Agent#Code#Benchmarking#OpenAI
why featured
HKR-H/K/R all pass, but this is still a single arXiv HumanEval study with no disclosed adoption or tooling impact; defaulting to the lower 60–71 band keeps it in all.
editor take
Six agent architectures split into two clusters across 1,968 paired observations; 50–130% extra code complexity buys no pass@1 gain.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
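A quick consistency check on the counts in the HumanEval item: the 1,968 figure matches a full crossing of architectures, models, and tasks. That every architecture-model-task combination contributes exactly one observation is an assumption here; the feed does not describe the pairing design.

```python
# Counts taken from the summary; the full-crossing design is an assumption.
architectures = 6
models = 2
tasks = 164

observations = architectures * models * tasks
```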
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
scicode-lint: Detecting Methodology Bugs in Scientific Python Code with LLM-Generated Patterns
scicode-lint detects methodology bugs in scientific Python code with a two-tier design that generates patterns at build time and runs a small local model at runtime; it reports 97.7% accuracy across 66 controlled patterns, plus 65% precision at 100% recall for preprocessing leakage on Kaggle notebooks.
#Code#Tools#Benchmarking#scicode-lint
why featured
HKR-H/K/R all pass, but this is a single arXiv tooling paper with abstract-level metrics only; open-source status, real-project scale, and external replication are not disclosed, so it stays in 60–71.
editor take
scicode-lint hits 97.7% on 66 controlled patterns, but 54% precision on held-out papers; I don’t buy the tokens-over-engineering pitch.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Reconsidering Positional Supervision in Masked Diffusion Language Model Training
The paper tests positional sensitivity in LLaDA-8B-Instruct under iterative MDLM decoding: shifting only 1% of generated tokens by one position substantially reduces Arena-Hard win rates against the unintervened model. A CTC-style supervised fine-tuning objective with a <slack> token beats the original model and a matched cross-entropy baseline on four open-ended generation benchmarks, with statistically significant gains on all four.
#Fine-tuning#Benchmarking#Inference-opt#Research release
why featured
HKR-H and HKR-K pass: the 1% positional shift result is testable, and CTC-style SFT gives a concrete comparison. The MDLM-training scope is too narrow for featured.
editor take
LLaDA-8B-Instruct breaks under 1% token shifts; MDLM training should stop treating position-wise CE as harmless.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
MURMUR: An Efficient Inference System for Long-Form ASR
Murmur matches single-pass accuracy on AMI-IHM while cutting long-form ASR latency by 4.2x. It uses intermediate chunk sizes plus sliding-window KV cache eviction over output and speech tokens, with less than 1% relative tcpWER degradation.
#Audio#Inference-opt#Murmur#Research release
why featured
HKR-H/K/R all pass, but this is a niche arXiv ASR inference paper rather than a broad model or product release. The 4.2x latency result is useful, so it lands high in 60–71.
editor take
Murmur cuts AMI-IHM latency 4.2x; I trust this KV-eviction scalpel more than another giant ASR retrain.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
RAFT: Data Refinement and Adaptive Distillation for Domain Fine-Tuning with Alleviated Forgetting
RAFT improves average domain accuracy by 23.2% over standard SFT across three instruction-tuned backbones and five domains, while recovering SFT-induced degradation on MS-Bench and IFEval by 18.2% and 10.2%, respectively.
#Fine-tuning#Alignment#Benchmarking#RAFT
why featured
HKR-H/K/R all pass, but this is an arXiv fine-tuning method paper with metrics only; no artifact or adoption is disclosed, so it stays in the interesting 60–71 band.
editor take
RAFT beats SFT by 23.2% across 3 backbones and 5 domains; its useful claim is trajectory preservation, not more data.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Towards Sparse Video Understanding and Reasoning
REVISE uses a multi-round agent for video question answering by selecting a small set of informative frames, maintaining a summary-as-state across rounds, and stopping early when confidence is sufficient.
#Agent#Reasoning#Vision#REVISE
why featured
HKR-H/K/R pass, but this is a single arXiv paper with no disclosed benchmark gains, code, or reproducible setup in the provided text. It stays in all below the 72 featured line.
editor take
REVISE sparsifies multi-round VQA, but frame-reduction numbers are undisclosed; EAGER’s 3-part reward is the credible part.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Continuous Reasoning for Vision-Language-Action
The paper proposes Continuous Reasoning for Vision-Language-Action, using a shared Gaussian latent interface and a self-verification objective, and reports a 40.4% mean subtask success gain over π0.5 on TX-G2 plus 26.3% on HSR.
#Reasoning#Vision#Robotics#AgiBot
why featured
HKR-K is strong and HKR-H clears on the VLA angle, but this is a single arXiv robotics paper with no disclosed code, lab authority, or replication detail. Audience impact stays below featured.
editor take
Continuous Reasoning beats π0.5 by 40.4% on TX-G2; I buy the bet that VLA reasoning shouldn't be text-shaped.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher
The paper introduces trust functions that assign each weak label a scalar trust score, then filter weak supervision for student training across world knowledge, quantitative reasoning, and strategy games; the abstract reports near-lossless weak-to-strong generalization, but does not disclose exact benchmark scores.
#Fine-tuning#Reasoning#Alignment#Research release
why featured
HKR-H/K/R all pass, but the text gives mechanism and domains only, with no authors, metrics, or artifact. As a single arXiv research item, it stays in the high 60–71 band, not featured.
editor take
Trust functions score and filter weak labels; scores aren’t disclosed. I buy data selection, not the “near-lossless” claim yet.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement
StressDream steers diffusion-based video world models by optimizing initial noise at inference time, using a vision-language semantic objective and a plausibility objective to generate high-impact but plausible futures for policy evaluation in autonomous driving and robotic manipulation.
#Robotics#Vision#Agent#StressDream
why featured
HKR-H/K/R all pass, but the article gives only arXiv title-level facts. The mechanism is useful, yet no metrics, artifact, or top-lab signal is disclosed, so it stays in the upper 60–71 band.
editor take
StressDream optimizes diffusion initial noise, not the world model; smells like a red-team layer for autonomy sims, gated by OOD control.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
WUSH: Near-Optimal Adaptive Transforms for LLM Quantization
WUSH derives blockwise linear transforms for joint LLM weight-activation quantization under RTN AbsMax quantizers, and on Llama-3.1-8B-Instruct with MXFP4 W4A4 it improves average accuracy by 2.8 points over Hadamard-based baselines while reaching up to 5.8x per-layer throughput over BF16 via FP4 MatMul.
#Inference-opt#IST-DASLab#Llama#Research release
why featured
HKR-K/R pass: the paper gives a concrete transform mechanism and a +2.8-point W4A4 result on Llama-3.1-8B, tied to inference cost. HKR-H is weak, and quantization math keeps it in the 60–71 band.
editor take
WUSH beats Hadamard by 2.8 points on Llama-3.1-8B MXFP4 W4A4; FP4 quantization is moving from clever rotations to provable transforms.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
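The WUSH entry above refers to RTN AbsMax quantizers. As a rough illustration of that quantizer family only, here is a minimal sketch of round-to-nearest AbsMax quantization for one block. It assumes a signed integer grid rather than MXFP4, and it does not include the WUSH transform itself; the function names and the sample block are hypothetical.

```python
def rtn_absmax_quantize(block, bits=4):
    """Round-to-nearest AbsMax quantization of one block (assumed signed integer grid)."""
    qmax = 2 ** (bits - 1) - 1              # largest code, e.g. 7 for 4 bits
    absmax = max(abs(x) for x in block)     # AbsMax: scale comes from the largest magnitude
    if absmax == 0.0:
        return [0] * len(block), 1.0
    scale = absmax / qmax
    # Round each value to the nearest grid point, then clamp to the code range.
    codes = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to floating-point values."""
    return [c * scale for c in codes]


# Hypothetical block of weights, used only for illustration.
codes, scale = rtn_absmax_quantize([1.0, -0.6, 0.2], bits=4)
recon = dequantize(codes, scale)
```

A single large value sets the scale for the whole block, so small values lose resolution. That outlier sensitivity is the usual motivation for applying a transform before quantizing.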
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Safety Game: Inference-Time Alignment of Black-Box LLMs via Constrained Optimization
The paper proposes Safety Game, a black-box inference-time alignment framework that requires no retraining or model-internal access and uses a two-player zero-sum game plus a linear programming solver to compute equilibrium strategies between safety and helpfulness.
#Alignment#Safety#Inference-opt#Research release
why featured
HKR-H/K/R pass: black-box, no-retraining inference alignment has a real hook. The body gives no experiment numbers, model list, or artifact, so it stays below featured.
editor take
Safety Game needs only black-box inference access; no metrics are disclosed, so LP equilibrium sounds neat but latency decides.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Silent Failures in Federated Personalization of Foundation Models
The paper defines six “silent failure” modes in federated personalization of foundation models, including amplified bias, fairness collapse, and alignment erosion. It argues that privacy constraints limit behavioral visibility, while existing federated benchmarks measure system performance and centralized trustworthiness benchmarks require model access incompatible with federated privacy.
#Fine-tuning#Safety#Benchmarking#Research release
why featured
HKR-H/K/R pass, but this is a single arXiv paper with taxonomy and benchmark-gap claims only; no tool, measured deployment impact, or adoption signal is disclosed, so it stays in all at 70.
editor take
The paper names 6 silent-failure modes in federated personalization; I buy the framing, but taxonomy is not a benchmark.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Sympatheia: Emotionally Adaptive Voice Assistant with Continuous Affect Conditioning
Researchers introduce Sympatheia, a speech-to-speech dialogue framework, and build Sympatheia-18k with 18,000 synthetic dialogues and 12 emotion anchors to condition responses through a continuous valence-arousal control signal.
#Audio#Multimodal#Alignment#Sympatheia
why featured
HKR-H/K/R all pass, but this is a single arXiv paper with a framework, synthetic dataset, and control signal only; no real-user evaluation or product deployment is disclosed, so it stays at the top of 60–71.
editor take
Sympatheia-18k trains on 18k synthetic dialogues; I don’t buy the empathy framing, but VA control is useful for voice agents.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
The Shape of Wisdom: Decision Trajectories in Language Models
The paper analyzes 9,000 MMLU decision trajectories across Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, and Mistral-7B-Instruct-v0.3, finding unstable-correct cases form the largest group rather than stable-correct cases.
#Reasoning#Interpretability#Benchmarking#Qwen
why featured
HKR-H/K/R all pass, but this is still a narrow arXiv eval paper: 3 small instruct models on MMLU trajectories, with no known-author pull, tool release, or cross-source pickup, so it stays high-all.
editor take
Across 9,000 MMLU trajectories, unstable-correct is largest; stop treating correct as solved in 7B/8B models.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Modeling Robotics Dataset Construction as an Artifact-Based Build Process
The paper introduces Bagzel, an open-source Bazel extension that models ROS bag to nuScenes dataset construction as artifact-based dependency-graph builds, reporting up to 386.26x faster warm builds and 7.21x faster incremental builds than a sequential rosbag2nuscenes baseline on a 20.4 GB dataset.
#Robotics#Multimodal#Bagzel#Bazel
why featured
HKR-H and HKR-K pass: Bagzel reframes robotics dataset construction as artifact builds and reports 386.26x warm-build speedup on 20.4GB. Robotics MLOps is niche, so it stays below featured.
editor take
Bagzel reports 386.26x faster warm builds on 20.4GB ROS data; robotics pipelines should have stolen Bazel years ago.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?
The paper introduces the Oracle Performance Gap metric and a diagnostic suite, finding that RL training on benchmark train splits reaches nearly the same performance as training on test splits, so current LLM RL benchmarks fail to separate further progress or expose failures under distribution shifts, difficulty changes, and counterfactual scenarios.
#Reasoning#Benchmarking#Alignment#Research release
why featured
HKR-H/K/R all pass, but this is a single arXiv paper with OPG and diagnostic-suite claims only; authors, experiment scale, and adoption signal are not disclosed, so it stays high in the 60–71 band.
editor take
OPG quantifies the gap between training on train splits and training on test splits; near-zero gaps make RL benchmark wins smell like answer-key adaptation.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Step-Level Sparse Autoencoder for Reasoning Process Interpretation
The paper proposes SSAE to interpret LLM Chain-of-Thought reasoning with step-level sparse features; experiments span multiple base models and reasoning tasks, and the code is available in the Miaow-Lab/SSAE GitHub repository.
#Reasoning#Interpretability#Miaow-Lab#Research release
why featured
HKR-H/K/R pass, but the body gives no result numbers, model list, or reproducible setup details. A single arXiv interpretability paper has signal, not enough for featured.
editor take
SSAE extracts step-level sparse CoT features; linear probes recover correctness and logicality, a cleaner debugging target than token-level SAEs.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Hierarchical Online Prompt Mutation with Dual-Loop Feedback for Guardrailed Evidence Document Generation
HOPM evaluated seven prompt-adaptation variants on the same 600 marketplace dispute-evidence cases, raising count win rate from 34.7% to 45.7% and amount-weighted win rate from 22.3% to 41.4% versus a static prompting control.
#Agent#Alignment#Benchmarking#arXiv
why featured
HKR-K and HKR-R pass with 600 matched samples and concrete win-rate gains in a production workflow. HKR-H is weak because the title is dense, so this stays in the interesting-not-featured band.
editor take
HOPM gains 11.0pp in count win rate on 600 matched cases; less flashy agent lore, more treating prompts as production policies.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning
The paper proposes TS-OPSD, which applies high-temperature scaling to a collapsed RL checkpoint’s own logits and distills the smoother distribution back into the student, with experiments on Qwen3-4B-Base and Qwen3-8B-Base showing stronger continued-RL initialization than standard continued RL and rollout-level temperature reheating.
#Reasoning#Fine-tuning#Alignment#Qwen
why featured
HKR-H/K/R pass, but this is a single arXiv post-training method with Qwen3-4B/8B evidence only, no disclosed code, lab signal, or cross-source pickup; it stays in the 60–71 band.
editor take
TS-OPSD reheats collapsed Qwen3-4B/8B RL checkpoints; I buy the angle—rollout temperature that never enters weights is a leaky fix.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
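The TS-OPSD entry above describes high-temperature scaling of a checkpoint's own logits, followed by distillation of the smoother distribution. A minimal sketch of those two ingredients follows, assuming a plain softmax with temperature and a KL distillation loss. The entry does not state the paper's exact objective, and the logits here are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_divergence(teacher, student):
    """KL(teacher || student), an assumed stand-in for the distillation loss."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0.0)


logits = [6.0, 1.0, 0.5, 0.0]                       # hypothetical collapsed (peaked) logits
sharp = softmax_with_temperature(logits, 1.0)       # what the checkpoint samples from today
reheated = softmax_with_temperature(logits, 4.0)    # smoother target with T > 1
loss = kl_divergence(reheated, sharp)               # positive until the student matches the target
```

Distilling toward `reheated` moves the extra entropy into the weights, whereas raising the temperature only at rollout time leaves the checkpoint unchanged.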
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
How to Correctly Report LLM-as-a-Judge Evaluations
The paper proposes a plug-in framework that corrects bias from imperfect LLM-judge sensitivity and specificity, then builds confidence intervals using uncertainty from both the test dataset and a human-labeled calibration dataset.
#Benchmarking#Alignment#Research release#Benchmark
why featured
HKR-H/K/R pass, but this is a single arXiv methods paper; the post gives the mechanism, not sample size, error reduction, or adoption, so it stays in 60–71.
editor take
This paper corrects two LLM-judge error types; sample sizes are undisclosed, but evals need statistics, not judge worship.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
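The LLM-as-a-Judge entry above mentions correcting bias from imperfect judge sensitivity and specificity. A minimal sketch of the standard misclassification correction (the Rogan-Gladen form) is shown here as an assumption, since the entry does not give the paper's estimator. Confidence intervals are omitted, and all names and numbers are illustrative.

```python
def corrected_pass_rate(judge_rate, sensitivity, specificity):
    """Estimate the true pass rate from the rate an imperfect judge reports.

    Model: judge_rate = sensitivity * true + (1 - specificity) * (1 - true),
    solved for `true`. Sensitivity and specificity would be estimated on a
    human-labeled calibration set.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("judge must beat chance: sensitivity + specificity > 1")
    estimate = (judge_rate + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))     # clip to a valid proportion


# Hypothetical numbers: the judge approves 70% of outputs,
# with 90% sensitivity and 80% specificity on calibration data.
true_rate = corrected_pass_rate(0.70, 0.90, 0.80)
```

With these inputs the corrected rate is 0.5 / 0.7, slightly above the raw 70%, because the judge's missed passes outweigh its false approvals at that operating point.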
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation
DREAM-S uses neural architecture search, target-aware supernet training, and attention-entropy-guided feature distillation to speed up speculative decoding for VLMs, reporting up to 3.85× speedup over standard decoding across multiple established VLMs, with code released on GitHub.
#Multimodal#Vision#Inference-opt#SAI-Lab-NYU
why featured
HKR-H/K/R pass: the 3.85x VLM decoding claim is concrete and cost-relevant, with code and a named NAS/drafting mechanism. As a single arXiv inference paper, it stays in the 60–71 band.
editor take
DREAM-S reports up to 3.85× VLM decoding speedup; I care whether its NAS-chosen draft architecture reproduces across hardware.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Detector-Evasive LLM Paraphrasing via Constrained Policy Optimization
The paper proposes DEPO, a Lagrangian primal-dual reinforcement learning method that formulates detector-evasive LLM paraphrasing as a Constrained Markov Decision Process and evaluates it on MAGE, M4, RAID, and peer-review datasets against five detectors.
#Alignment#Safety#Benchmarking#MAGE
why featured
HKR-H/K/R all pass: the adversarial detection-evasion angle is relevant and the post names DEPO plus evaluation datasets. It lacks evasion rates, semantic-preservation numbers, and code, so it stays below featured.
editor take
DEPO tests 4 dataset groups against 5 detectors; hard semantic constraints make this closer to an attack baseline than prompt hacks.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
AdaptiveK: Complexity-Driven Sparse Autoencoders for Interpretable Language Model Representations
AdaptiveK SAE uses linear probes to estimate input semantic complexity and dynamically adjusts Top K sparsity during training, with experiments across 10 language models reporting better reconstruction fidelity, explained variance, cosine similarity, and interpretability metrics than fixed-sparsity baselines.
#Interpretability#AdaptiveK#Research release#Open source
why featured
HKR-H and HKR-K pass: AdaptiveK offers dynamic Top K sparsity and 10 model experiments. The topic is niche interpretability research; no repo, effect size, or production condition is disclosed, so it stays all.
editor take
AdaptiveK tunes Top K across 10 language models; I buy the direction, but no effect sizes are disclosed here.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
MineDraft: A Framework for Batch Parallel Speculative Decoding
MineDraft overlaps drafting for one request batch with verification for another, reducing the sequential bottleneck in standard speculative decoding. The paper reports up to 75% higher throughput and up to 39% lower end-to-end latency, and implements MineDraft as a vLLM plugin for inference systems.
#Inference-opt#MineDraft#vLLM#Research release
why featured
HKR-K and HKR-R pass: the story has a concrete mechanism, benchmark numbers, and a vLLM plugin for serving teams. HKR-H is weak because the topic is narrow and systems-heavy, so it stays in the 60–71 band.
editor take
MineDraft overlaps two request batches and reports 75% throughput gains; the vLLM plugin is nice, but workload details are undisclosed.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers
The paper proposes a Bayesian stopping policy for multi-sample LLM answer aggregation, tracking only the L-1 most frequent answer counts; it proves L=3 reaches asymptotic optimality and reports up to 50% fewer LLM calls at similar answer accuracy.
#Reasoning#Inference-opt#Research release
why featured
HKR-H/K/R pass, but this is an arXiv methods paper with mechanism and savings only, not production adoption or broad tooling impact. Defaulting to the lower 60–71 band gives 70 and tier all.
editor take
Bayesian stopping with L=3 tracks top-two answer counts and cuts calls up to 50%; sampling-vote inference finally gets a clean cost knife.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
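The Bayesian stopping entry above describes deciding when to stop sampling from the most frequent answer counts. The sketch below is one simple rule in that spirit, not the paper's policy. It assumes a uniform Beta(1, 1) prior on the leader beating the runner-up and a hypothetical confidence threshold, and it bases the decision on the top two counts only.

```python
from collections import Counter
from math import comb


def prob_leader_ahead(leader, runner_up):
    """P(p > 1/2) under a Beta(leader + 1, runner_up + 1) posterior (uniform prior)."""
    n = leader + runner_up + 1
    # Identity for integer parameters: this equals P(Binomial(n, 1/2) <= leader).
    return sum(comb(n, k) for k in range(leader + 1)) / 2 ** n


def sample_until_confident(answers, threshold=0.95, max_samples=32):
    """Consume sampled answers until the leader is confidently ahead of the runner-up."""
    counts = Counter()
    used = 0
    for answer in answers:
        counts[answer] += 1
        used += 1
        top = counts.most_common(2)         # only the top two counts drive the decision
        leader = top[0][1]
        runner_up = top[1][1] if len(top) > 1 else 0
        if prob_leader_ahead(leader, runner_up) >= threshold or used >= max_samples:
            break
    if not counts:
        raise ValueError("no answers were sampled")
    return counts.most_common(1)[0][0], used


# Hypothetical stream of sampled answers; in practice each item is one LLM call.
answer, calls = sample_until_confident(iter(["A", "A", "A", "A", "B", "A"]))
```

A `Counter` over all answers is kept here for brevity; a closer match to the entry would store only the top counts.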
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Hypothesis Generation and Inductive Inference in Children and Language Models
The paper compares children and LLM-based agents in a Box Task formalized as Bayesian particle-based program induction, and reports that both discount unreliable evidence and seek missing information, while LLM-based agents over-observe and over-comply with instructions relative to children.
#Agent#Reasoning#Benchmarking#Research release
why featured
HKR-H/K/R all pass, but this is a single arXiv cognition-evaluation paper with no disclosed deployable fix or market impact. It stays in the 60–71 band, not featured.
editor take
Box Task shows LLM agents discount unreliable evidence; their over-observation is a cost-model bug, not childlike reasoning.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
PolarMem: A Training-Free Polarized Latent Graph Memory for Verifiable Vision-Language Models
PolarMem converts frozen VLM perceptual signals into HAS, NOT_HAS, and Uncertain memory states, stores them in a polarized graph, and applies lexicographical logic-aware retrieval before semantic similarity during inference; the paper reports improvements on retrieval-intensive tasks and fewer retrieval-level contradictions across eight frozen VLM backbones and six multimodal benchmarks, with code released on GitHub.
#Memory#Multimodal#Vision#PolarMem
why featured
HKR-K and HKR-R pass: the ternary graph memory plus 8 VLM backbones and 6 benchmarks are testable, and VLM reliability is a live practitioner concern. HKR-H is weak and this is a single arXiv paper, so it stays in all.
editor take
PolarMem tests 8 VLMs and 6 benchmarks; explicit NOT_HAS memory is sane, but the snippet gives no gains, so don’t buy breakthrough claims.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Contrastive Representation Regularization for Vision-Language-Action Models
The paper introduces Robot State-aware Contrastive Loss for VLA models, using relative distances between robot proprioceptive states as soft supervision, and reports 69.7% on RoboCasa-Kitchen plus real-robot manipulation success rates rising from 45.0% to 58.3%.
#Multimodal#Robotics#Alignment#arXiv
why featured
HKR-H/K/R are supported by a concrete VLA mechanism and real-robot gain from 45.0% to 58.3%. Still, this is a single arXiv methods paper with no disclosed open-source artifact, major-lab release, or product impact, so it stays in 60–71.
editor take
RS-CL lifts real-robot success from 45.0% to 58.3%; VLA needs proprioceptive structure, not another bigger VLM.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
TIGER: Traceable Inference with Graph-Based Evidence Routing for Mitigating Hallucinations in Multimodal Generation
TIGER extracts an observation graph from the input and a claim graph from the current output at inference time, assigns each claim a graph-conditioned risk score, and repairs high-risk facts with a frozen backbone across four cross-modal paths: image-to-text, image+text-to-text, audio-to-text, and video-to-text.
#Multimodal#Vision#Audio#TIGER
why featured
HKR-K and HKR-R pass: the mechanism and experiment scope are concrete, and multimodal reliability matters. Single arXiv paper with no effect size, author signal, or artifact keeps it in the lower band.
editor take
TIGER covers 4 cross-modal paths; claim-level repair beats training another judge when the backbone stays frozen.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Research Proposes Importance-Aware Attention Mechanism to Improve Model Performance
Soohyeong Shin and Yeongwook Yang propose SISA, which inserts an SSM-derived importance term into attention scores and runs as one SDPA call; at 152M parameters trained on 5B tokens, it reaches 17.3% LAMBADA-greedy and 100% NIAH from step 1K.
#Reasoning#Inference-opt#Benchmarking#Soohyeong Shin
why featured
HKR-H/K pass: the title challenges attention and the post gives SISA plus concrete small-scale metrics. As a single arXiv paper at 152M/5B tokens with no disclosed code or large-scale replication, it stays in all.
editor take
SISA hits 17.3% LAMBADA at 152M/5B tokens; I buy the SDPA trick before I buy the “forget attention” headline.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Grounded Decoding: Retrieval-Anchored Probability Fusion for Faithful RAG
The paper proposes Grounded Decoding, a training-free RAG decoding framework that fuses a full RAG distribution with a retrieval-only distribution via a KL-barycenter objective, and reports higher factual accuracy and citation quality on ALCE, Natural Questions, and FActScore while keeping model parameters unchanged.
#RAG#Inference-opt#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the mechanism and benchmark suite are clear, and RAG faithfulness matters to builders. No gain numbers, code artifact, or production evidence are disclosed, so it stays in the 60–71 band.
editor take
Grounded Decoding fuses two distributions via a KL barycenter; no effect sizes disclosed, so I’d treat it as a clean RAG decoding patch.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
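The Grounded Decoding entry above describes fusing two next-token distributions via a KL-barycenter objective. As a sketch under one assumption about the direction of the divergence: the distribution q minimizing w·KL(q‖p_full) + (1−w)·KL(q‖p_retrieval) is the normalized weighted geometric mean of the two. The entry does not say which direction or weighting the paper uses, and the names and numbers here are illustrative.

```python
import math


def kl_barycenter(p_full, p_retrieval, weight=0.5):
    """Normalized weighted geometric mean of two distributions over the same vocabulary."""
    logs = []
    for a, b in zip(p_full, p_retrieval):
        if a <= 0.0 or b <= 0.0:
            logs.append(-math.inf)          # a token must have mass under both to survive
        else:
            logs.append(weight * math.log(a) + (1.0 - weight) * math.log(b))
    m = max(logs)
    if m == -math.inf:
        raise ValueError("distributions share no support")
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical two-token vocabulary: the full RAG pass is undecided,
# while the retrieval-only pass strongly prefers the first token.
fused = kl_barycenter([0.5, 0.5], [0.9, 0.1], weight=0.5)
```

The geometric form acts like a soft AND: a token keeps high probability only when both distributions support it, which is one way to read "retrieval-anchored".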
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Policy and World Modeling Co-Training for Language Agents
PaW adds auxiliary world-modeling supervision to the same policy during RL, using on-policy rollout transitions as training data, and reports consistent gains over strong RL baselines on three agentic task benchmarks across models and RL algorithms.
#Agent#Reasoning#Fine-tuning#Research release
why featured
HKR-K is clear and HKR-R is relevant to agent training, but the post only says PaW beats strong RL baselines on 3 benchmarks. Model scale, task details, and release status are not disclosed, so it stays at 69.
editor take
PaW co-trains world modeling from on-policy transitions and beats strong RL on 3 agent benchmarks; skipping simulators is the practical win.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs
The paper compares Shared-Policy and Isolated-Policy RL for multi-agent LLM workflows across Eval-Opt, Voting, Orch-Workers, math and code tasks, and 0.6B, 1.7B, and 4B models, finding that gains depend on workflow, task, and scale rather than policy sharing alone.
#Agent#Reasoning#Code#Research release
why featured
HKR-H/K/R all pass, but the body gives the experimental matrix without main findings, author authority, or a reproducible tool. This stays in the upper 60–71 research-interest band.
editor take
The paper tests 3 workflows, 2 task types, and 3 scales; policy sharing isn’t a stabilizer, it just moves failure around.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Principle-Evolvable Scientific Discovery via Uncertainty Minimization
PiEvo models scientific discovery as Bayesian optimization over an expanding principle space, using Gaussian Process-based information-directed hypothesis selection and anomaly-driven augmentation; across four benchmarks, it reports 90.81%–93.15% average solution quality, 29.7%–31.1% above state of the art, and an 83.3% convergence-step speedup.
#Agent#Reasoning#Benchmarking#PiEvo
why featured
HKR-K is strong and HKR-R is moderate: PiEvo gives a Bayesian-optimization mechanism and roughly 30% benchmark gains. HKR-H is weak; an unknown-team arXiv paper without real-world task evidence stays in all.
editor take
PiEvo reports 90.81%–93.15% quality on 4 benchmarks; I’d audit task design first, since scientific-discovery evals love self-congratulation.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
PETS: A Principled Framework Towards Optimal Trajectory Allocation for Efficient Test-Time Self-Consistency
PETS allocates stochastic reasoning trajectories using a self-consistency rate, defined as agreement with infinite-budget majority vote; on GPQA, it reaches perfect self-consistency in both offline and online settings while reducing sampling budgets by up to 75% (offline) and 55% (online) versus uniform allocation.
#Reasoning#Inference-opt#Benchmarking#ZDCSlab
why featured
HKR-K and HKR-R pass: the paper gives a concrete allocation mechanism and GPQA sampling reductions tied to inference cost. As a single technical arXiv paper with a weak headline hook, it stays in the 60–71 band.
editor take
PETS cuts GPQA trajectories by 75%/55%; adaptive sampling finally treats self-consistency as allocation, not a uniform-vote script.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
ProbeScale: Probing Analysis to Optimize Neural Scaling Laws for Efficient Small Language Model Inference
ProbeScale uses task-specific probes to select subnetworks inside pre-trained SLMs; on RoBERTa-Large and T5-Base, the method reduces parameters by 5 to 10 times while retaining 95% to 98% of the original model performance on targeted tasks.
#Inference-opt#Interpretability#RoBERTa#T5
why featured
HKR-H/K/R pass, but this is a single arXiv compression paper with method and two model results only; no code, production workload, or cross-source traction is disclosed, so it stays in the 60–71 band.
editor take
ProbeScale cuts RoBERTa-Large/T5-Base by 5–10x; the catch is target-task 95–98%, with generalization and latency undisclosed.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Simple Recipe Works: Vision-Language-Action Models Are Natural Continual Learners with Reinforcement Learning
UT Austin researchers study continual reinforcement learning for pretrained VLA models across multiple lifelong RL benchmarks, finding that sequential fine-tuning with LoRA preserves plasticity, shows little forgetting, retains zero-shot generalization, and often outperforms more complex continual RL methods.
#Robotics#Fine-tuning#Agent#UT Austin
why featured
HKR-H/K/R pass, but this is a single technical arXiv paper with no exact scores, benchmark names, or artifact details in the feed. Robotics continual RL is useful but niche, so it stays in 60–71.
editor take
UT Austin says LoRA sequential FT shows little forgetting across lifelong RL benchmarks; I buy it, but benchmarks aren't robot deployment.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
BitsMoE: Efficient Spectral Energy-Guided Bit Allocation for MoE LLM Quantization
BitsMoE decomposes each MoE layer with SVD and assigns bits via integer linear programming; under 2-bit quantization on Qwen3-30B-A3B-Base, it runs quantization 12.3× faster than GPTQ, improves average accuracy by 27.83 percentage points, and increases decoding speed by 1.76×.
#Inference-opt#Qwen#GPTQ#BitsMoE
why featured
HKR-K/R pass: the paper gives concrete mechanisms and metrics for 2-bit Qwen3-30B-A3B-Base quantization. The inference-optimization topic is technical, so it stays in the lower 60–71 band.
editor take
BitsMoE beats GPTQ by 27.83 points on Qwen3-30B-A3B 2-bit; MoE quantization needs spectral budgets, not layer-level bluntness.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
STARFISH: Fast Accuracy Recovery in Pruned Networks from Internal State Healing
STARFISH aligns a pruned network’s internal representations with the original model using a tiny unlabeled calibration set, improving recovered accuracy by up to 22% over state-of-the-art methods on ViT-based networks after 50% weight pruning.
#Inference-opt#Vision#STARFISH#DeiT-B
why featured
HKR-K and HKR-R pass: the paper gives a concrete mechanism and a +22% recovery claim tied to inference cost. HKR-H is weak, and this single arXiv pruning paper stays in the 60–71 band.
editor take
STARFISH restores 82% dense DeiT-B accuracy after 75% pruning using 0.4% ImageNet calibration; internal-state healing looks cheap and nasty.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification
OmniOPD replaces token-level logit matching with multi-token chunk semantic verification, beating standard OPD by up to 28.64% on math benchmarks and adding 9.54% relative gain when paired with black-box teachers Claude-4.5-Haiku and Gemini-2.5-Flash.
#Reasoning#Fine-tuning#Benchmarking#Claude-4.5-Haiku
why featured
HKR-K passes with a concrete mechanism and +28.64%/+9.54% gains. HKR-H/R are weak: this is a niche training-method paper without a product, cost, or safety angle, so it stays in all.
editor take
OmniOPD beats standard OPD by up to 28.64% on math; chunk verification fits black-box teachers better than logit distillation.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
WildCat: Near-Linear Attention in Theory and Practice
Tobias Schröder and Lester Mackey introduce WildCat, which selects a weighted coreset via randomly pivoted Cholesky and approximates exact attention in O(n^{1+o(1)}) time under bounded inputs.
#Inference-opt#Benchmarking#Tobias Schröder#Lester Mackey
why featured
HKR-H/K/R all pass: the runtime claim and randomly pivoted Cholesky coreset are concrete, and long-context cost matters. Still, this is a theory-heavy arXiv item with no benchmark scale, code, or reproduction setup disclosed.
editor take
WildCat claims O(n^{1+o(1)}) attention; the bounded-input assumption is the catch, and real long-context workloads will test it hard.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
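The feed does not describe how WildCat builds or weights its coreset, so the sketch below only illustrates the named primitive, randomly pivoted Cholesky, on a small kernel matrix. The function name `rp_cholesky`, the Gaussian kernel, and the toy points are assumptions.

```python
import math
import random

def rp_cholesky(A, k, rng):
    """Randomly pivoted partial Cholesky of a PSD matrix A (list of lists).

    Returns (F, pivots) with A approximately equal to F @ F.T, where F has
    k columns. Each pivot is sampled with probability proportional to the
    current residual diagonal.
    """
    n = len(A)
    F = [[0.0] * k for _ in range(n)]
    diag = [A[i][i] for i in range(n)]
    pivots = []
    for t in range(k):
        s = rng.choices(range(n), weights=diag)[0]
        pivots.append(s)
        # Residual column s: A[:, s] minus the part already explained by F.
        g = [A[i][s] - sum(F[i][j] * F[s][j] for j in range(t)) for i in range(n)]
        scale = math.sqrt(g[s])
        for i in range(n):
            F[i][t] = g[i] / scale
            diag[i] = max(0.0, diag[i] - F[i][t] ** 2)
        diag[s] = 0.0  # a chosen pivot has zero residual; never pick it again
    return F, pivots

# Toy Gaussian kernel matrix on five 1-D points (illustrative data only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
K = [[math.exp(-(a - b) ** 2) for b in xs] for a in xs]

F2, pivots2 = rp_cholesky(K, 2, random.Random(0))
print(pivots2)  # the two sampled pivot indices act as a small "coreset"
```

With `k` equal to the matrix size, the factorization reproduces the matrix up to rounding; smaller `k` trades accuracy for cost.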
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
AnomSeer: Reinforcing Multimodal LLMs to Reason for Time-Series Anomaly Detection
AnomSeer trains Qwen2.5-VL-3B/7B-Instruct with TimerPO for time-series anomaly classification, localization, and explanation, and the paper reports higher classification and localization accuracy than larger commercial baselines such as GPT-4o, especially on point- and frequency-driven exceptions.
#Multimodal#Reasoning#Fine-tuning#Qwen
why featured
HKR-H/K/R pass, but this is a niche arXiv task paper centered on anomaly-detection benchmarks, with no disclosed production replacement or artifact details; it stays in the 60–71 band.
editor take
AnomSeer has Qwen2.5-VL-3B/7B beat GPT-4o on TSAD classification and localization; I want replication, because CoT supervision can fake neat explanations.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Inverse Depth Scaling From Most Layers Being Similar
The paper quantifies how LLM depth affects loss and finds loss scales roughly inversely with depth, attributing the effect to ensemble averaging across functionally similar layers rather than compositional learning or discretizing smooth dynamics.
#Benchmarking#Interpretability#Research release
why featured
HKR-H and HKR-K pass: the paper gives a counterintuitive depth-scaling claim and a mechanism. HKR-R is weak, and the feed text omits model sizes, setups, or code, so it stays in the 60–71 research-interest band.
editor take
The paper says LLM loss scales roughly inversely with depth; if similar layers just ensemble errors, depth is an ugly efficiency tax.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Unveiling the Entropy Dynamics of Chain-of-Thought Reasoning
The paper splits CoT entropy dynamics into an exploratory uncertainty region and a convergent confidence region; its training-free CUSUM early-exit controller reaches 63.06% accuracy with an 11.1% token reduction, outperforming DEER and Dynasor by 3.28 and 4.36 accuracy points.
#Reasoning#Inference-opt#CUSUM#DEER
why featured
HKR-H/K/R all pass: the paper offers a CoT entropy mechanism, CUSUM early-stopping numbers, and a reasoning-cost angle. As a single arXiv result with modest gains, it stays below featured.
editor take
CUSUM early exit hits 63.06% accuracy with 11.1% fewer tokens; treating CoT entropy as changepoints beats another trained controller.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
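For readers unfamiliar with CUSUM, the sketch below runs a one-sided CUSUM detector over a per-step entropy trace and reports the first step at which a sustained drop is detected. It is an illustration only, not the paper's controller: the function name `cusum_exit_step`, the `target`, `slack`, and `threshold` parameters, and the toy trace are assumptions.

```python
def cusum_exit_step(entropies, target, slack, threshold):
    """Return the first step where a downward entropy shift is detected.

    One-sided CUSUM: accumulate how far each value falls below
    (target - slack), reset the running sum to zero when it would go
    negative, and fire once the sum reaches `threshold`.
    Returns None if no shift is detected.
    """
    s = 0.0
    for step, h in enumerate(entropies):
        s = max(0.0, s + (target - slack - h))
        if s >= threshold:
            return step
    return None

# Hypothetical trace: high entropy while exploring, low once converged.
trace = [2.0, 2.0, 2.0, 0.5, 0.5, 0.5, 0.5]
exit_step = cusum_exit_step(trace, target=2.0, slack=0.5, threshold=2.0)
print(exit_step)  # → 4
```

The detector needs two consecutive low-entropy steps here before it fires, which is the point of accumulating evidence rather than thresholding a single value.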
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Can Vision Language Models Learn Intuitive Physics from Interaction?
The paper trains vision-language models with reinforcement learning in a simulated environment; interaction improves within-task performance, but models trained on one task still do not reliably generalize to related tasks sharing visual statistics and physical principles.
#Multimodal#Vision#Reasoning#Research release
why featured
HKR-H/K/R pass, but the item gives only the question, RL setup, and negative transfer result, with no metrics or artifact details. This fits the 60–71 research-interest band.
editor take
RL interaction improves in-task scores, but transfer still fails; VLM physics intuition is not fixed by more rollouts.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Score × Decoder: A Unified View of Unsupervised Inference-Time Scaling for Hallucination Mitigation
The paper pairs four intrinsic scores with three decoding families and evaluates all cells on MATH500 using base and instruction-tuned Qwen3-1.7B, finding that self-verification with a training-free virtual-thinking prefix works well in most settings, while score quality depends on the decoder and model capability.
#Reasoning#Inference-opt#Benchmarking#Qwen
why featured
HKR-K/R pass: the paper gives a reproducible score-decoder grid with a named model and benchmark, and targets hallucination mitigation. HKR-H is weak, and this is a single arXiv paper without production impact evidence, so it stays in 60–71.
editor take
The paper tests 4 scores × 3 decoder families; I buy the negative result: unsupervised anti-hallucination scores don't transfer cleanly.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Towards Lightweight Reliability: Using Soft Prompts for Hallucination Mitigation in Large Language Models
The paper presents Responsible Contrastive Soft Prompting, evaluated on five generative QA datasets with Gemma 3 12B and Llama 3.1 8B, using contrastive loss, curriculum learning, and KL regularization to suppress hallucinations, encourage abstention under uncertainty, and preserve factual recall.
#Alignment#Safety#Fine-tuning#Gemma
why featured
HKR-K/R pass: the method, models, and 5-dataset setup give testable detail, and reliability is a live practitioner concern. HKR-H is weak, and effect size is not disclosed, so this stays in the 60–71 band.
editor take
RCSP trains only soft prompts across 5 QA sets on Gemma 3 12B and Llama 3.1 8B; LLM-judge evidence needs human labels.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Automatically Differentiable Nonlinear Tensor Networks for Exponential Compression of Deep Neural Networks
The paper introduces ADNTNs as structured weight generators trained by reverse-mode automatic differentiation, and simulations on AlexNet and VGG-16 layers show per-layer compression ratios of roughly 2000× to 77000×, with accuracy often matching the dense baseline and improving it in several VGG-16 cases.
#Fine-tuning#Inference-opt#AlexNet#VGG-16
why featured
HKR-H/K/R pass, but the evidence is limited to AlexNet/VGG-16 single-layer simulations, not LLM compression or production inference. Research novelty earns all, below featured.
editor take
ADNTNs compress AlexNet/VGG-16 layers 2,000×-77,000×; I don’t buy deployment relevance until end-to-end kernels land.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization
The paper identifies massive LLM activation spikes as structural bias vectors and proposes INSERTQUANT, a post-training quantization framework that clamps spikes and restores their function with pre-computed template vectors, enabling low-bit quantization and reporting generalization beyond text to ViTs.
#Interpretability#Inference-opt#Multimodal#Research release
why featured
HKR-H/K/R pass, but this is a technical arXiv quantization paper with no disclosed bit-width, speed, or accuracy numbers in the feed, so it stays in the 60–71 band.
editor take
INSERTQUANT replaces activation spikes with template vectors; accuracy, bit width, and model scale are undisclosed, so buy the mechanism later.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
BlockBatch: Multi-Scale Consensus Decoding for Efficient Diffusion Language Model Inference
BlockBatch runs multiple block-size branches for the same request inside a batched forward pass, using confidence-gated merging, leader synchronization, and periodic full-sequence refreshes; across 3 dLLMs and 4 datasets, it reduces denoising NFEs by 26.6% on average and reaches a 1.33× end-to-end speedup over Fast-dLLM while preserving accuracy.
#Inference-opt#BlockBatch#Fast-dLLM#Research release
why featured
HKR-K and HKR-R pass: the mechanism and benchmark numbers are concrete, and inference efficiency matters. HKR-H is weak, and dLLM decoding optimization is narrow, so it stays in all.
editor take
BlockBatch cuts 26.6% NFEs across 3 dLLMs; dLLM inference is starting to look like branch scheduling, not just denoising.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
KG-Guard: Graph-Based Hallucination Detection for Knowledge Base Question Answering
KG-Guard frames hallucination detection in KBQA as answer-node classification, reaches F1 scores of 82.0, 87.4, and 84.3 on WebQSP, ComplexWebQuestions, and PUGG, and uses about 305 times fewer parameters than reference approaches.
#RAG#Reasoning#Benchmarking#KG-Guard
why featured
HKR-H and HKR-K pass: the mechanism and benchmark numbers are concrete. HKR-R is weak; as a single arXiv paper in narrow KBQA, it fits the interesting-but-not-featured band.
editor take
KG-Guard hits 82.0/87.4/84.3 F1; node classification beats LLM judges with 305x fewer parameters, a practical KBQA guardrail.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Saliency-Aware Model Merging
The paper introduces SA-Merging for data-free model merging, using SynFlow-style connectivity saliency over task vectors and merge-aware expert agreement, and extends the method to LoRAs through rank-wise saliency decomposition without changing their structural integrity.
#Fine-tuning#Inference-opt#Research release
why featured
HKR-K and HKR-R pass: the mechanism is concrete and maps to LoRA merging pain. The arXiv snippet gives no metrics, model scale, or reproducible setup, so it stays in the 60–71 band.
editor take
SA-Merging applies SynFlow-style saliency to data-free merging and LoRA ranks; scores are undisclosed, so don't retire TTA yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
World Models: A Comprehensive Survey of Architectures, Methodologies, Reasoning Paradigms, and Applications
arXiv:2606.00133v1 presents a four-axis survey framework for world models, covering architecture, methodological family, reasoning strategy, and application domain, and discusses systems including PlaNet, Dreamer, MuZero, Sora, Cosmos, and Genie.
#Agent#Reasoning#Robotics#PlaNet
why featured
HKR-K and HKR-R pass: the survey maps world-model architectures and systems. HKR-H is weak, and this is not a new model, benchmark, or reproducible experiment, so it stays in all.
editor take
arXiv 2606.00133 folds PlaNet-to-Sora into four axes; huge survey scope, but no benchmark table disclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Dynamic Proxy-Mixing: Transferring Replay Controllers from Small to Large Models for Continual Instruction Tuning
PROXYMIX transfers a frozen replay controller trained on a small proxy model to LLaMA-3-8B across five continual instruction-tuning sequences, improving average accuracy by 3.4 points, reducing final forgetting by 3.5 points, and raising safety score by 5.8 points over the strongest non-oracle baseline at roughly 50x lower policy-learning cost than Oracle Target RL.
#Fine-tuning#Safety#Alignment#LLaMA
why featured
HKR-K/R pass: the paper gives testable metrics and targets regression risk in continual tuning. HKR-H is weak, and this is a single arXiv method paper with no disclosed release or adoption.
editor take
PROXYMIX gives LLaMA-3-8B +3.4 accuracy points; transferable proxy controllers are a practical cut to continual-tuning RL cost.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
LASER: Loss-Aware SVD and Rank Allocation for Efficient Low-Precision Vision-Language Models
LASER compresses vision-language models with a curvature-weighted SVD objective, Kronecker-factored Fisher information, and calibration-gradient rank allocation, achieving more than 2.3x decoding speedup over prior work under low-precision inference.
#Multimodal#Vision#Inference-opt#LASER
why featured
HKR-K and HKR-R pass: 2.3x decoding speed and Fisher-based rank allocation are useful. HKR-H is weak, and a single technical arXiv compression paper stays below featured.
editor take
LASER claims 2.3x decoding speedup; Fisher-weighted ranks plus FFN compression are solid, but the snippet hides accuracy loss.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Post-Deterministic Distributed Systems: A New Foundation for Trustworthy Autonomous Infrastructure
The paper introduces Post-Deterministic Distributed Systems as a model for coordinating deterministic code, stochastic models, and autonomous agents, outlines five architectural pillars including Verifiable Agentic Infrastructure and Epistemic State Replication, and defines failure classes for autonomous infrastructure.
#Agent#Memory#Safety#Research release
why featured
HKR-K/HKR-R pass because it offers a five-pillar model and failure taxonomy for agentic infrastructure; HKR-H is weak, and the feed item gives no experiments, implementation, or adoption signal, so it stays in the 60–71 research-signal band.
editor take
PDDS lists five pillars, but proofs are undisclosed; I don’t buy “new foundation,” yet distributed systems must face nondeterministic agents.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Shortcut to Nowhere: Demystifying Deep Spurious Regression
The paper defines Deep Spurious Regression for attribute-label confounding in continuous targets, then evaluates calibration strategies on real-world datasets spanning computer vision, environmental sensing, and LLM regression.
#Benchmarking#Research release#Benchmark
why featured
HKR-H/K/R all hit weakly: catchy framing, a new DSR definition, and reliability resonance. Single arXiv paper lacks metrics, code, or product impact, so it stays below featured.
editor take
DSR targets continuous regression shortcuts; datasets and metrics aren’t disclosed in the snippet, so treat “superior performance” as unproven.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
When Hard Negatives Hurt: Bridging the Generative-Discriminative Gap in Hard Negative Synthesis for Retrieval
The paper proposes CausalNeg with 2 modules: CoT-guided counterfactual perturbation for negative construction and query-view entropy maximization during training; the abstract says naive generated negatives often degrade retrieval performance, while the snippet does not disclose benchmark names or numeric gains.
#RAG#Embedding#Reasoning#CausalNeg
why featured
HKR-H/K/R pass: the hard-negative reversal is a clear hook, the two CausalNeg mechanisms are concrete, and retrieval risk matters for RAG. The post discloses no benchmark numbers or code link, so it stays in the 60–71 all band.
editor take
CausalNeg has 2 modules, but no benchmarks or gains in the snippet; I buy the diagnosis, not the cure yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
TrustLDM: Benchmarking Trustworthiness in Language Diffusion Models
TrustLDM evaluates LDM trustworthiness across safety, privacy, and fairness, and TrustLDM-Auto uses LDM decoding flexibility to identify vulnerable configurations; the paper reports that malicious post contexts attached to masked responses degrade alignment behavior across evaluated models and dimensions.
#Safety#Alignment#Benchmarking#PKU-ML
why featured
HKR-H/K/R all pass, but this is a single arXiv benchmark with only dimensions and an auto-search mechanism disclosed; no model scale, dataset size, or results numbers, so it stays in the 60–71 research band.
editor take
TrustLDM tests 3 trust axes; malicious post-context breaks alignment, so AR-era safety checks won't cover LDM decoding.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
RDA: Reward Design Agent for Reinforcement Learning
RDA uses a VLM-based agentic loop to decompose tasks, inspect trajectories, summarize failures, and revise reward code, improving instruction alignment across 12 ManiSkill tabletop manipulation tasks and 4 HumanoidBench whole-body manipulation tasks while maintaining comparable success rates.
#Agent#Vision#Robotics#RDA
why featured
HKR-H/K pass: the paper gives an automated reward-code design mechanism and 16-task evaluation setup. It remains a single arXiv research item with no disclosed artifact, effect size, or production replacement claim, so it stays in all.
editor take
RDA edits reward code across 16 robotics tasks; I buy the direction—RL needs visible semantic feedback, not success-rate worship.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Coherent Off-Policy Improvement of Large Behavior Models with Learned Rewards
The paper uses inverse reinforcement learning to fine-tune pi-0.5, maintaining or improving performance across six sparse manipulation tasks and reaching a ≥90% success rate on five of six complex manipulation tasks.
#Robotics#Fine-tuning#Research release#Benchmark
why featured
HKR-K and HKR-R pass: IRL fine-tunes pi-0.5 across 12 manipulation tasks, with 5/6 complex tasks at ≥90%. HKR-H is weak; no code, lab, or deployment detail keeps it in all.
editor take
IRL fine-tuning keeps pi-0.5 from regressing on 6 sparse tasks; sparse-reward RL looks like the wrong baseline here.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Don't Read Everything: A Curvature-Conditioned Query for Linear Attention
The paper introduces Curvature-Conditioned Query, a read-step modification for linear attention that contracts queries using running key covariance; when attached to GLA and Gated DeltaNet, it improves perplexity, zero-shot accuracy, S-NIAH retrieval at and beyond training context, 4K-to-20K length extrapolation, and LongBench accuracy, while the abstract does not disclose exact scores or overhead.
#Inference-opt#Reasoning#Benchmarking#GLA
why featured
HKR-H/K/R pass: the title has a clean hook, CCQ is a concrete linear-attention mechanism, and long-context cost resonates. Kept in all because the post gives summary-level facts without gain size, code, or reproduction details.
editor take
CCQ only changes the read step on GLA and Gated DeltaNet; gains extend to 4K-to-20K length extrapolation, but overhead is undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
3DCodeBench: Benchmarking Agentic Procedural 3D Modeling via Code
3DCodeBench evaluates 12 VLMs on translating text and image references into procedural 3D modeling code, and releases a toolkit with multimodal prompts, procedural code, 3D object triplets, an evaluation protocol, and the public 3DCodeArena pairwise human-preference ranking platform.
#Agent#Multimodal#Vision#3DCodeBench
why featured
HKR-H and HKR-K pass: the item gives 12 VLMs, text/image-to-procedural-3D-code tasks, and a released toolkit. The impact is still niche benchmarking/open-source tooling, so it sits in the 60–71 band.
editor take
3DCodeBench tests 12 VLMs writing 3D code; API mismatch is the failure mode vendors avoid showing.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
From Zero to Hero: Advancing Zero-Shot Foundation Models for Tabular Outlier Detection
OUTFORMER pretrains a tabular outlier-detection foundation model only on synthetic labeled datasets, uses a new task’s training data as in-context input, and reports state-of-the-art results on AdBench plus two new large-scale benchmarks covering more than 1,500 datasets.
#Reasoning#Benchmarking#OUTFORMER#FoMo-0D
why featured
HKR-K is strong via the 1,500+ dataset result and synthetic-label pretraining mechanism; HKR-R is limited to tabular anomaly teams. Practical research claim, but too niche for featured.
editor take
OUTFORMER claims SOTA across 1,500+ datasets; synthetic pretraining for zero-shot OD is strong if its new benchmarks survive leakage checks.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Characterization of Multi-Model Agentic AI Systems on General Tasks via Trace-Driven Simulation
The paper introduces GAIATrace and Vidur-Agent, capturing token-level traces from MiroThinker and OWL on the GAIA benchmark and replaying them for reproducible, lower-cost system evaluation across simulated environments.
#Agent#Reasoning#Tools#MiroThinker
why featured
HKR-K and HKR-R pass: the paper offers new traces and a simulation tool for agent evaluation costs. HKR-H is weak, and the body does not disclose cost reduction size, release link, or baselines.
editor take
GAIATrace logs token-level GAIA runs for MiroThinker and OWL; replayable traces beat another leaderboard for agent systems work.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Efficient LLM Moderation with Multi-Layer Latent Prototypes
The paper introduces MLPM, an input moderation method using prototypes from intermediate representations across multiple layers. The arXiv v4 abstract claims negligible generation overhead and state-of-the-art results on diverse moderation benchmarks, but the snippet does not disclose exact scores, latency, or model-specific settings.
#Safety#Alignment#Inference-opt#arXiv
why featured
HKR-K and HKR-R pass: the paper offers a concrete moderation mechanism and low-overhead claim tied to safety and cost. Missing benchmark scores and a technical title keep it in the 60–71 research-signal band.
editor take
MLPM moderates via multi-layer latent prototypes; scores and latency are undisclosed, so the SOTA and negligible-overhead claims stay discounted.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Stabilizing Policy Optimization via Logits Convexity
The paper proposes Logits Convex Optimization, using logits-level convexity to explain the stability gap between SFT and PPO, and reports that LCO improves training stability across multiple model families and benchmarks, while the RSS snippet does not disclose benchmark names, model sizes, or exact scores.
#Fine-tuning#Alignment#Reasoning#Research release
why featured
HKR-K/R pass: logits convexity reframes SFT/PPO stability and LCO claims gains across model families. HKR-H fails; no scores, model names, or artifact are disclosed, so this stays a specialized training paper.
editor take
LCO bets on logits convexity; sizes, benchmark names, and scores are undisclosed, so don’t retire PPO yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Controllable Value Alignment in Large Language Models through Neuron-Level Editing
The paper proposes NeVA, a neuron-level editing framework that identifies sparse value-relevant neurons and edits activations at inference time to reduce non-target value leakage during value steering; the abstract does not disclose the evaluated models, datasets, or exact leakage reduction numbers.
#Alignment#Safety#Interpretability#NeVA
why featured
HKR-H/K/R pass, but the body gives the method idea only; models, datasets, and reduction numbers are not disclosed. This is useful alignment research, not a same-day must-write.
editor take
NeVA has only an RSS abstract, with no models or reductions disclosed; neuron editing sounds clean, but don't buy it pre-replication.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding
The paper introduces LK losses to directly optimize speculative decoding acceptance rate, and experiments across 4 draft architectures and 6 target models from 8B to 685B parameters report up to 8–10% gains in average acceptance length over KL-based training.
#Inference-opt#Research release
why featured
HKR-K/R pass: the paper gives a concrete mechanism and cross-model numbers, and it maps to inference cost. HKR-H is weak because the angle is specialist infra, so it stays below featured.
editor take
LK losses lift acceptance length by up to 8–10% across 4 draft architectures and 6 targets from 8B to 685B; speculative decoding should stop worshipping KL proxies.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
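Why KL can be a poor proxy: in standard speculative sampling, a drafted token is accepted with probability sum over tokens of min(p, q), where p is the target distribution and q the draft. The sketch below compares that quantity with KL(p || q) on two toy drafts. It does not implement the paper's LK losses; the KL direction and the distributions are assumptions chosen for illustration.

```python
from math import log

def acceptance_rate(p, q):
    """Per-token acceptance probability under standard speculative sampling."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """KL(p || q) in nats, skipping zero-probability target tokens."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy two-token vocabulary (illustrative numbers only).
target = [0.9, 0.1]
draft_a = [0.99, 0.01]  # overconfident draft
draft_b = [0.7, 0.3]    # underconfident draft

for name, draft in (("a", draft_a), ("b", draft_b)):
    print(name, acceptance_rate(target, draft), kl_divergence(target, draft))
```

On these numbers draft `b` has the lower KL while draft `a` has the higher acceptance rate, so ranking drafts by KL and ranking them by acceptance can disagree.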
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
TLG: Temporal-Logic Grounding for Video Question Answering via Source-Annotation Reconstruction and Category-Targeted Reasoning
TLG raises TimeLogic Challenge test accuracy from a 46.9% VLM baseline to 71.37% by reconstructing action timelines from source-dataset annotations, parsing questions into temporal-logic programs, and executing 16 operator types including before, after, until, and always.
#Reasoning#Vision#Benchmarking#TLG
why featured
HKR-H and HKR-K pass via the benchmark jump and mechanism; HKR-R is weak. A single arXiv multimodal-eval paper stays in the interesting-but-not-featured band.
editor take
TLG hits 71.37% on TimeLogic; the win comes from annotation timelines, not a bigger VLM.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
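The feed names four of TLG's 16 operators (before, after, until, always) but not their semantics. The sketch below shows one plausible way to execute such operators over an action timeline of `(action, start, end)` tuples. The interval semantics, function names, and toy timeline are assumptions, not the paper's definitions.

```python
def intervals(timeline, action):
    """All (start, end) spans during which `action` occurs."""
    return [(s, e) for name, s, e in timeline if name == action]

def before(timeline, a, b):
    """True if some occurrence of a ends no later than some b starts."""
    return any(ea <= sb
               for _, ea in intervals(timeline, a)
               for sb, _ in intervals(timeline, b))

def after(timeline, a, b):
    """True if some occurrence of a starts no earlier than some b ends."""
    return before(timeline, b, a)

def always(timeline, a, start, end):
    """True if one occurrence of a covers the whole window [start, end]."""
    return any(s <= start and end <= e for s, e in intervals(timeline, a))

def until(timeline, a, b):
    """True if some occurrence of a is still running when some b starts."""
    return any(sa <= sb <= ea
               for sa, ea in intervals(timeline, a)
               for sb, _ in intervals(timeline, b))

# Hypothetical annotated timeline (seconds).
timeline = [
    ("pick up cup", 0, 3),
    ("pour water", 3, 6),
    ("hold cup", 0, 6),
]
print(before(timeline, "pick up cup", "pour water"))
print(always(timeline, "hold cup", 0, 6))
```

Once a question is parsed into a composition of such functions, answering it is a deterministic lookup over the timeline rather than a VLM judgment.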
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
MARFT: Multi-Agent Reinforcement Fine-Tuning
The MARFT paper proposes Flex-MG and a universal algorithmic framework for reinforcement fine-tuning of LLM-based multi-agent systems; the v5 abstract identifies three differences from classical MARL—asynchronous interactions, profile-aware agent design, and heterogeneous architectures—and provides a GitHub implementation.
#Agent#Fine-tuning#Alignment#Research release
why featured
HKR-K and HKR-R pass: the post names concrete mechanisms and an implementation, and maps to agent post-training. HKR-H is weak, and the arXiv-summary-only evidence keeps it below featured.
editor take
MARFT v5 names 3 LaMAS gaps and ships GitHub; it still reads framework-heavy, with sample inefficiency unsolved.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
CaptionFormer Unifies Video Object Segmentation, Tracking, and Captioning
CaptionFormer combines video object detection, segmentation, tracking, and captioning in an end-to-end DVOC model, extends LVIS and LV-VIS with synthetic captions generated by a state-of-the-art VLM, and reports state-of-the-art results on three benchmarks: VidSTG, VLN, and BenSMOT.
#Vision#Multimodal#Benchmarking#CaptionFormer
why featured
HKR-H/K pass: the four-task video-object setup and 3 benchmark SOTAs add concrete signal. HKR-R fails; this is a vision-benchmark paper without product, cost, or platform-competition pull.
editor take
CaptionFormer unifies detection, segmentation, tracking, and captioning; the SOTA rests on VLM-synthetic labels, so inspect LVISCap noise first.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting
SCOPE routes on-policy rollouts by correctness into two supervision paths, and experiments on six reasoning benchmarks report average relative gains of 11.42% in Avg@32 and 7.30% in Pass@32 over competitive baselines.
#Reasoning#Fine-tuning#Alignment#Research release
why featured
HKR-K passes with a mechanism and benchmark deltas; HKR-R passes for distillation cost/performance pressure. HKR-H fails, and a single technical paper belongs in the 60–71 band.
editor take
SCOPE lifts Avg@32 by 11.42% on six reasoning benchmarks; correctness-routed supervision is a cleaner OPD credit-assignment patch.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
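The feed does not define Avg@32 or Pass@32. The sketch below uses the common reading, which is an assumption here: Avg@k is the mean fraction of correct samples per problem, and Pass@k is the fraction of problems with at least one correct sample among k. It also shows how a relative gain is computed; all numbers are toy values.

```python
def avg_at_k(results):
    """Mean per-problem accuracy over k sampled answers (1 = correct)."""
    return sum(sum(r) / len(r) for r in results) / len(results)

def pass_at_k(results):
    """Fraction of problems solved by at least one of the k samples."""
    return sum(1 for r in results if any(r)) / len(results)

def relative_gain(new, old):
    """Relative improvement of `new` over `old`, e.g. 0.25 means +25%."""
    return (new - old) / old

# Toy correctness matrix: 3 problems, k = 4 samples each.
results = [
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
print(avg_at_k(results), pass_at_k(results))
print(relative_gain(0.5, 0.4))
```

A relative gain is not a percentage-point gain: moving from 0.40 to 0.50 is +10 points but +25% relative, which matters when reading the 11.42% and 7.30% figures.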
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
MENTIS: What Belief Changes Under Alignment? Measuring Multi-Scale Latent Torsion in Language Models
MENTIS compares four 7–8B instruction-tuned and preference-aligned checkpoint pairs with T1, T2, and ERA diagnostics, finding alignment-induced internal changes are selective, larger for normative concepts than factual concepts, negatively correlated with contextual entropy, and concentrated in architecture-specific mid-to-late layers.
#Alignment#Interpretability#Benchmarking#MENTIS
why featured
HKR-K and HKR-R pass via concrete checkpoints and an alignment-safety question. HKR-H is weak because the method is specialist-heavy, so this stays in the 60–71 all band.
editor take
MENTIS tests four 7–8B IT/PA pairs: normative concepts twist more than factual ones; useful map, still far from intervention.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Tensor Network Method Accelerates Shapley Values and Interactions Computation
TN-SHAP replaces O(2^n) coalition enumeration with targeted evaluations on a tensor-network surrogate, computes order-1 and order-2 Shapley interactions at O(n*poly(chi)+n^2) cost, and reports 25-1000x wall-clock speedups over KernelSHAP-IQ on UCI datasets at comparable accuracy.
#Interpretability#KernelSHAP-IQ#UCI#Research release
why featured
HKR-H and HKR-K pass: the mechanism, complexity, and speedup numbers are concrete. HKR-R is weak because tensor-network SHAP is specialist; no hard exclusion applies, but it stays in the 60–71 band.
editor take
TN-SHAP cuts order-1/2 interactions to O(n*poly(chi)+n^2). I’d stress-test surrogate error; 25-1000x on UCI is not enough.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
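For context on the O(2^n) cost that TN-SHAP avoids, the sketch below computes exact Shapley values by brute-force coalition enumeration. It shows the baseline definition only, not the tensor-network method; the toy value function is an assumption.

```python
from itertools import combinations
from math import factorial

def shapley_values(n, value):
    """Exact Shapley values by enumerating all coalitions.

    `value` maps a frozenset of player indices to a number. For each player
    the loop visits every subset of the other n-1 players, so the number of
    evaluations grows exponentially in n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical game: additive payoffs plus a bonus when 0 and 1 cooperate.
def toy_value(coalition):
    base = sum({0: 1.0, 1: 2.0, 2: 3.0}[j] for j in coalition)
    bonus = 4.0 if {0, 1} <= coalition else 0.0
    return base + bonus

phi = shapley_values(3, toy_value)
print(phi)  # the pairwise bonus is split evenly between players 0 and 1
```

With three players this is 12 marginal-contribution evaluations; at 30 features it is already beyond 10^10, which is the regime where a surrogate with polynomial cost becomes attractive.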
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Discrete Diffusion VLA discretizes action chunks and performs diffusion decoding inside a unified Transformer backbone, reaching 96.4% average success on LIBERO, 71.2% visual matching on SimplerEnv-Fractal, 54.2% overall on SimplerEnv-Bridge, and two real-robot evaluations on AgileX Cobot Magic.
#Robotics#Multimodal#Inference-opt#AgileX
why featured
HKR-H/K pass: the method, 96.4% LIBERO result, and AgileX robot tests add real signal. As a single arXiv robotics-policy paper without open-source or deployment evidence, it stays in 60–71.
editor take
Discrete Diffusion VLA hits 96.4% on LIBERO. I buy the secondary re-masking: action decoding finally gets error correction.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
STABLEVAL: Disagreement-Aware and Stable Evaluation of AI Systems
The paper introduces STABLEVAL, a disagreement-aware evaluation framework that models latent item correctness and annotator-specific confusion patterns, and reports that it produces more stable system rankings than majority vote across controlled synthetic experiments and multiple human-annotated benchmarks.
#Benchmarking#Alignment#STABLEVAL#Research release
why featured
HKR-K and HKR-R pass: the paper offers a concrete alternative to majority-vote evaluation and targets ranking reliability. No effect sizes, code, or marquee benchmarks are disclosed, so it stays in the 60–71 band.
editor take
STABLEVAL models latent item correctness and annotator confusion; benchmark counts aren’t disclosed, so don’t generalize its majority-vote win yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Self-Improving Small Object Grounding in LVLMs
The authors propose ACS to select candidate boxes from LVLM attention maps; its lightweight IoU regressor reaches Pearson r above 0.67, and experiments on COCO and Objects365 report up to 19% improvement in small-object localization.
#Vision#Multimodal#Interpretability#Research release
why featured
HKR-H and HKR-K pass: the paper offers a self-improvement mechanism and up to 19% gains on COCO and Objects365. The LVLM grounding focus is narrow, so HKR-R fails and it stays in the 60–71 band.
editor take
ACS lifts LVLM small-object grounding by up to 19%; Pearson r>0.67 is useful, but cross-LVLM generalization is undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
KG-FairDiff: Knowledge Graph-Guided Prompt Refinement for Demographically Fair Text-to-Image Generation
KG-FairDiff refines text-to-image prompts at inference time using a knowledge graph of about 1,200 culture- and bias-related triples, an LLM rewriter, and a validator that accepts only prompts reducing divergence-based fairness loss while preserving semantic fidelity; the paper also audits eight widely deployed backbone generators and reports reduced gender, race, age, and intersectional disparities.
#Vision#Safety#Tools#Research release
why featured
HKR-K has a concrete mechanism and evaluation scale; HKR-R fits image-bias governance concerns. HKR-H is weak, and this is a single arXiv method paper without visible adoption or debate, so it stays in 60–71.
editor take
KG-FairDiff edits prompts at inference with 1,200 triples across 8 generators; prompt-layer fairness still lets vendors outsource bias cleanup to wrapping paper.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
DOT-MoE: Differentiable Optimal Transport for MoEfication
DOT-MoE formulates dense-layer decomposition as a differentiable optimal transport problem, uses Sinkhorn-Knopp iterations and straight-through estimators to learn expert assignment and routing, and retains 90% of the dense model’s performance while reducing active parameters by 50% across multiple architectures and benchmarks.
#Inference-opt#Fine-tuning#Benchmarking#Research release
why featured
HKR-K/R pass: 50% active params retain 90% dense-model performance, directly tied to inference cost. HKR-H is weak, and the arXiv summary lacks code, model scale, and reproducibility details, so it stays in all.
editor take
DOT-MoE keeps 90% dense performance with 50% fewer active params; I buy OT assignment, but model scale is undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
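The DOT-MoE summary names Sinkhorn-Knopp iterations without detail, so the following is only a generic sketch of that iteration: it alternately rescales rows and columns of a positive kernel until the resulting transport plan matches target marginals. The affinity matrix, `eps`, and the uniform marginals are assumptions for illustration; how the paper builds its cost or applies straight-through estimators is not shown.

```python
import math

def sinkhorn(scores, row_marg, col_marg, eps=0.5, iters=200):
    """Entropy-regularized transport plan via Sinkhorn-Knopp scaling."""
    # Gibbs kernel: higher affinity means more mass may flow between row i and column j.
    K = [[math.exp(s / eps) for s in row] for row in scores]
    n, m = len(K), len(K[0])
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows, then columns, toward the requested marginals.
        u = [row_marg[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col_marg[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical affinities between 4 neurons (rows) and 2 experts (columns).
scores = [[2.0, 0.1], [1.5, 0.3], [0.2, 1.8], [0.4, 1.1]]
plan = sinkhorn(scores, row_marg=[0.25] * 4, col_marg=[0.5, 0.5])
```

Each row of `plan` is a soft assignment of one neuron across experts, and the column marginals force a balanced load, which is one plausible reason to frame expert assignment as optimal transport.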
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Prototype Transformer: Towards Language Model Architectures Interpretable by Design
The paper introduces ProtoT, an autoregressive LM architecture that replaces quadratic-cost Transformer self-attention with a linear-cost module using learned prototypes, and evaluates it against baselines on text generation, GLUE, scaling with model and data size, and robustness to input perturbations.
#Interpretability#Inference-opt#Benchmarking#Research release
why featured
HKR-H/K/R pass, but the body gives mechanism and benchmark scope without concrete scores, model scale, or code status. This stays in the high 60–71 research-paper band.
editor take
ProtoT replaces self-attention with learned prototypes; no model sizes or scores are disclosed, so I don't buy the interpretable-architecture pitch yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Beyond the Frontier: Stochastic Backtracking for Efficient Test-Time Scaling
The paper introduces stochastic backtracking over a persistent pool of historical prefixes, and reports higher accuracy per generated token across mathematical reasoning benchmarks and model scales versus PRM-guided baselines.
#Reasoning#Inference-opt#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the mechanism targets token cost in test-time scaling, but the post lacks savings rate, model list, and reproducibility details. A single arXiv paper stays in the 60–71 band.
editor take
Stochastic backtracking adds a persistent prefix pool; no exact token savings disclosed, and I suspect PRM-noise gains are overstated.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
GRASP: Plan-Guided Graph Retrieval with Adaptive Fusion and Reranking on Semi-Structured Knowledge Bases
GRASP combines plan-based graph retrieval, plan-conditioned dense-retriever fusion, and a fine-tuned reranker into a three-stage SKB retrieval framework, raising average Hit@1 from 62.0 to 73.9 across three STaRK benchmarks.
#RAG#Embedding#Benchmarking#GRASP
why featured
HKR-K is strong and HKR-R is moderate: the method and Hit@1 gain are concrete, but this is still a single benchmark paper without production replacement or open-source adoption details.
editor take
GRASP lifts STaRK Hit@1 from 62.0 to 73.9; I buy plan-constrained retrieval, but cost and latency are undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
A Structured Benchmark for Text-Guided Anomaly Detection: When Language Stops Conditioning the Decision
TGAD evaluates text-guided anomaly detection across 3 scenarios and finds that current multimodal systems mostly use language superficially; the generative model’s I-AUROC drops from 97.4 to 82.6 when the object noun is removed, while three paradigms score 71.2, 50.5, and 31.5 on APD.
#Multimodal#Vision#Benchmarking#MVTec AD
why featured
HKR-H/K/R pass, but this is a niche industrial-vision benchmark rather than a broad model or product update. Concrete metrics keep it in the 60–71 band.
editor take
TGAD tests 3 settings; removing object nouns drops I-AUROC from 97.4 to 82.6. Industrial VLMs barely follow text.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Byte Pair Encoding for Efficient Time Series Forecasting
The paper proposes the first pattern-centric tokenization scheme for time series, using a discrete vocabulary of frequent motifs to merge patterned samples into adaptive tokens; on recent time series foundation models, it improves forecasting performance by 40% and average efficiency by 2314%, while conditional decoding adds no gradient computation and reduces MSE by up to 48%.
#Benchmarking#Inference-opt#Research release
why featured
HKR-H comes from moving NLP tokenization into time-series models, and HKR-K has concrete gains plus gradient-free conditional decoding. The audience fit is narrow, so it stays below featured.
editor take
BPE time-series tokenization claims 2314% average efficiency gains. Smells like a low-entropy-series win; vocabulary transfer details aren’t disclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Gradient Preconditioning for Efficient and Reliable Reward-Guided Generation
The paper proposes gradient preconditioning for reward-guided generation by projecting reward gradients onto a white Gaussian noise feasible set; in FLUX experiments with four reward models, it reaches a comparable Aesthetic Score using 30% of the wall-clock time of a regularization-based baseline.
#Inference-opt#Alignment#FLUX#Research release
why featured
HKR-K/R pass: the paper gives a concrete preconditioning mechanism and a 30% wall-clock result tied to generation cost. HKR-H is weak, and the work remains a methods paper rather than a product or broad industry update.
editor take
FLUX hits comparable Aesthetic Score at 30% wall-clock across 4 reward models; closed-form projection is the useful part.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Dropout Universality: Scaling Laws and Optimal Scheduling at the Edge of Chaos
The paper develops a mean-field theory of dropout and reports that front-loaded dropout schedules reduce test loss by 18%–35% versus constant dropout in MLPs and Vision Transformers under a fixed budget.
#Benchmarking#Vision#Research release
why featured
HKR-K is solid: the paper gives a testable 18%–35% loss reduction via front-loaded dropout. HKR-R is mild on training efficiency, but the edge-of-chaos framing is niche, so it stays in the 60–71 band.
editor take
Front-loaded dropout cuts MLP/ViT test loss 18%–35% at fixed budget; I buy the mechanism, pending non-toy training replication.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction
The paper introduces Field-Aware Transformer for CTR prediction, replacing standard Transformer assumptions with field-centric parameters and a Basis-Composed Hypernetwork; experiments report up to 4.38% AUC improvement, plus 2.33% CTR and 0.66% RPM gains in live production.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K is strong with a field-centered mechanism and online CTR/RPM numbers. HKR-R is narrow to ad/recsys teams, while HKR-H is weak, so this stays below featured.
editor take
FAT reports up to +4.38% AUC and +2.33% live CTR; blindly scaling Transformers for CTR looks lazy against field-aware structure.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Heterogeneous Decentralized Diffusion Models
The paper presents a heterogeneous decentralized diffusion training framework that mixes DDPM and Flow Matching objectives, unifies them at inference without retraining, and reports 16× less compute and 14× less data than prior DDM training scale on LAION-Aesthetics.
#Multimodal#Fine-tuning#Inference-opt#arXiv
why featured
HKR-H/K/R all pass via the cost-cut numbers and mixed-objective mechanism, but this is a single arXiv method paper with narrow validation and a research-heavy audience, so it stays in 60–71.
editor take
DDM training needs 16× less compute than the prior 1176 GPU-days, with a 24–48GB single-GPU entry point; FID/diversity alone won’t prove scale.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Value-Free Policy Optimization via Reward Partitioning
The paper introduces Reward Partition Optimization, which normalizes scalar rewards using prompt-level reward partitions and trains policies without value function learning, auxiliary models, or reinforcement learning loops.
#Fine-tuning#Alignment#Research release
why featured
HKR-H/K/R are present, but the post only discloses the mechanism, not benchmarks, author authority, or reproducible results. This fits the 60–71 band for a technical arXiv alignment-training paper.
editor take
RPO trains on prompt-level reward partitions; cutting value functions, auxiliary models, and RL loops is a pragmatic offline-feedback bet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Which Leakage Types Matter? A Quantitative Landscape Across 2,047 Benchmark Datasets
The paper runs 28 within-subject counterfactual experiments across 2,047 iid tabular datasets and one boundary experiment on 129 temporal datasets. It finds normalization leakage negligible with |ΔAUC| ≤ 0.005 across nine conditions, while selection leakage produces inflation consistent with about 90% noise exploitation.
#Benchmarking#arXiv#Research release#Benchmark
why featured
HKR-H/K/R pass, but the scope is iid tabular dataset leakage rather than LLMs, agents, or product news. Strong numbers, limited industry spillover, so it sits in the 60–71 research-signal band.
editor take
2,047 iid tabular datasets put normalization leakage at ≤0.005 AUC; stop blaming scalers, because seed cherry-picking is the dirty part.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Interpreto: An Explainability Library for Transformers
Interpreto releases an open-source Python library for HuggingFace language models, providing two method families, attribution and concept-based explanations, with a unified API for classification and text generation workflows.
#Interpretability#Tools#Interpreto#HuggingFace
why featured
HKR-K and HKR-R pass: it offers a testable transformer explainability tool, but it is not a major lab release and discloses no adoption data or standout benchmark, so it stays in 60–71.
editor take
Interpreto covers two explanation families for HuggingFace; the concept pipeline is useful, but no benchmarks or overhead disclosed.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Understanding the Effects of Distractors on Reasoning Vision-Language Models
The paper introduces Idis, a visual question-answering dataset that varies image distractors across semantic and numerical dimensions; visual distractors reduce accuracy in reasoning VLMs without increasing reasoning length, and the authors add a prompting strategy to reduce distractor-driven predictions.
#Reasoning#Multimodal#Vision#Research release
why featured
HKR-H/K/R pass, but this is a single arXiv benchmark paper with no major model release, tool artifact, or cross-source debate; it fits the 60–71 research/benchmark band.
editor take
Idis varies visual distractors by semantics and count; VLMs get worse without longer traces, smelling more like visual binding failure.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
TIMEGATE: Sustainable Time-Boxed Promotion Gates for Continual ML Adaptation Under Resource Constraints
TIMEGATE manages continual ML adaptation with budgets for time, labeling, training, and evaluation; in a 100-cycle simulation, it saved 66% of evaluation compute with no silent mis-promotions, and a 10% slice evaluation on LLaMA used 89% less wall-clock time and energy on one H200.
#Fine-tuning#Inference-opt#Benchmarking#TIMEGATE
why featured
HKR-K and HKR-R pass: the paper gives a budget-gated mechanism and a 66% evaluation-compute reduction. HKR-H is weak, and it is a single arXiv result without production deployment evidence.
editor take
TIMEGATE saved 66% eval compute over 100 simulated cycles; I care whether its record of zero silent mis-promotions survives online drift.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Research proposes replacing standard neurons in artificial neural networks with cortical cell model
The paper replaces the ANN point neuron with a recent cortical-cell model and reports higher expressivity, robustness, and learning speed without increasing parameter count.
#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the paper replaces ANN point neurons with a cortical-cell model and claims gains without more parameters. The feed gives no benchmark numbers or reproducible setup, keeping it in the 60–71 band.
editor take
The paper swaps point neurons without extra parameters; benchmarks aren’t disclosed, so I’d discount the speed-robustness claims hard.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Task Diversity Produces Systematic Transfer but Inhibits Continual Reinforcement Learning
The paper introduces Banyan, a GPU-accelerated continual RL benchmark that controls task diversity across 3 axes—map layouts, objects, and hierarchical sub-goal dependencies—and reports that diversity improves local transfer after individual distribution shifts, but repeated shifts cause longer-horizon tasks to plateau and earlier task distributions to be forgotten.
#Agent#Reasoning#Benchmarking#Banyan
why featured
HKR-H/K/R all pass, but this is a niche arXiv continual-RL benchmark for agent training rather than a broad product or model release. Concrete axes and findings keep it in all, below featured.
editor take
Banyan splits diversity into 3 axes; local transfer improves, long-horizon RL still plateaus and forgets.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization
The paper proposes an exposure-based framework using BLiMP minimal pairs and critical phrases to split proxy-train and proxy-validation sets, and reports delayed generalization across five grammatical phenomena during LLM pre-training.
#Reasoning#Interpretability#Benchmarking#BLiMP
why featured
HKR-H and HKR-K pass: the paper frames grokking-like delayed grammar generalization and gives a concrete BLiMP-based setup. HKR-R is weak because no deployment, cost, or competitive implication is disclosed.
editor take
BLiMP proxy splits show delayed generalization across 5 grammar types; I buy the method, not the pretraining-grokking label yet.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Prospect-Theory Behavior from Bellman Optimality in MDPs with Catastrophic States
The paper shows that risk-neutral Bellman optimal control in MDPs with an absorbing catastrophic state produces three prospect-theory-like signatures, and reproduces policy reversal across 495 configurations.
#Reasoning#Benchmarking#Research release
why featured
A single arXiv theory paper clears HKR-H/K with a concrete mechanism and numbers, but it has no code, product tie-in, or industry discussion signal. It stays in the 60–71 band.
editor take
Absorbing catastrophe states make Bellman optimality mimic prospect theory across 495 setups; preferences may be boundary artifacts.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
DeepLatent: Think with Images via Parallel Latent Visual Reasoning
DeepLatent proposes a parallel latent visual reasoning framework with LatentFormer, a continuous-space reinforcement learning algorithm, and the DeepLatent-180K dataset; the abstract claims state-of-the-art results across multiple benchmarks, but the post does not disclose specific scores.
#Reasoning#Vision#Multimodal#DeepLatent
why featured
HKR-H and HKR-K pass: the title has a latent visual-reasoning hook, and the summary names a method plus dataset. No scores, code, or model scale are disclosed, so this stays in the 60–71 research-signal band.
editor take
DeepLatent discloses a 180K dataset and parallel latent stack, but no scores; I don’t buy the SOTA claim yet.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning
Prune-OPD monitors local student-teacher compatibility with signals such as top-k overlap, down-weights unreliable rewards after prefix drift, and truncates rollouts dynamically; across AMC, AIME, and HMMT benchmarks, it reduces training time by 37.6%–68.0% while preserving or often improving performance.
#Reasoning#Fine-tuning#Inference-opt#Research release
why featured
HKR-K is strong: the mechanism and AMC/AIME/HMMT numbers are clear. HKR-R holds on training cost, but HKR-H is weak and this is a single arXiv method paper without open-source or production proof.
editor take
Prune-OPD cuts training time 37.6%–68.0%; using top-k overlap to kill drifted rollouts is saner than paying for bad teacher rewards.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
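The Prune-OPD summary mentions top-k overlap as a student-teacher compatibility signal and dynamic rollout truncation. A minimal sketch of both ideas follows; the overlap definition, the `threshold` and `patience` stopping rule, and all names are assumptions for illustration, since the feed does not disclose the paper's actual criteria.

```python
def top_k_overlap(student_probs, teacher_probs, k):
    """Fraction of top-k tokens shared by student and teacher next-token distributions."""
    def top_k(probs):
        return set(sorted(probs, key=probs.get, reverse=True)[:k])
    return len(top_k(student_probs) & top_k(teacher_probs)) / k

def truncation_step(overlaps, threshold, patience):
    """Index of the step where overlap has stayed below threshold for `patience` steps, else None."""
    run = 0
    for t, overlap in enumerate(overlaps):
        run = run + 1 if overlap < threshold else 0
        if run >= patience:
            return t
    return None

# Hypothetical next-token distributions at one decoding step.
student = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
teacher = {"a": 0.4, "c": 0.35, "d": 0.2, "b": 0.05}
step_overlap = top_k_overlap(student, teacher, k=2)  # 0.5: only "a" is shared

# Hypothetical per-step overlaps along one rollout.
cut = truncation_step([1.0, 0.5, 0.0, 0.0, 0.5], threshold=0.5, patience=2)  # 3
```

Stopping a rollout at `cut` avoids generating and scoring tokens after the student prefix has drifted away from what the teacher would plausibly produce, which is where the reported training-time savings would come from under this reading.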
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents
The paper uses TradeArena to analyze eight LLM trading trajectories and 80 rolling failure anchors, finding pre-failure embedding drift, effective-rank contraction, and model-dependent calibration or return changes under structured risk feedback.
#Agent#Alignment#Benchmarking#TradeArena
why featured
HKR-H/K/R all pass, but this is a narrow arXiv research paper with 8 traces and 80 failure anchors. No reproducible artifact or production impact is disclosed, so it stays in the 60–71 band.
editor take
TradeArena has 8 trajectories and 80 failure anchors; ignore profit talk, embedding drift is the reproducible hook.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Causal Evaluation of Membership Inference Attacks
The paper frames MIA evaluation as causal inference, defines memorization as the causal effect of including a data point in training, and proposes estimators for multi-run, one-run, and zero-run regimes with non-asymptotic consistency guarantees.
#Safety#Benchmarking#Research release#Safety/alignment
why featured
HKR-K/R pass: the paper offers a causal framework and testable estimators for MIA evaluation, with clear privacy relevance. HKR-H fails, and this is a niche single arXiv paper, not same-day must-write.
editor take
The paper recasts MIA evaluation as causal effect estimation across multi-, one-, and zero-run settings; I buy it—zero-run shift finally gets handled directly.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Distributional Open-Ended Evaluation of LLM Cultural Value Alignment Based on Value Codebook
DOVE builds a compact value codebook from 10K documents, compares human-written text distributions with outputs from 12 LLMs, and reports 31.56% correlation with downstream tasks while maintaining reliability with 500 samples per culture.
#Alignment#Benchmarking#DOVE#Research release
why featured
HKR-K and HKR-R pass via concrete metrics and alignment relevance, but HKR-H is weak and this is a single arXiv evaluation paper; it fits the 60–71 band, not featured.
editor take
DOVE tests 12 LLMs with 10K documents; 31.56% downstream correlation is modest, but beats multiple-choice alignment theater.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
When Does Predictive Inverse Dynamics Outperform Behavior Cloning?
The paper explains PIDM’s bias-variance tradeoff against behavior cloning: in 2D navigation, BC needs up to 5x more demonstrations and 3x on average, while in a 3D video-game environment with visual inputs and stochastic transitions, BC needs over 66% more samples.
#Robotics#Reasoning#Benchmarking#Research release
why featured
HKR-H/K/R pass because the paper offers a clear method duel, concrete sample-efficiency numbers, and a training-data cost nerve. It remains a specialized imitation-learning paper, so it stays in the 60–71 all band.
editor take
PIDM cuts 2D demos by up to 5x; this paper turns future prediction from a trick into a bias-variance account.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Consistent Diffusion Language Models
The paper introduces CDLM, a single-stage training framework that uses exact posterior bridges instead of a sample-space ODE for discrete diffusion, and reports stronger conditional and unconditional text generation than base discrete diffusion models under few-step sampling budgets.
#Reasoning#Inference-opt#Research release#Benchmark
why featured
HKR-K is clear via the posterior-bridge mechanism, and HKR-R links to sampling cost. HKR-H is weak, and the post gives no benchmark numbers or reproducible setup, so this stays in the normal research band.
editor take
CDLM swaps discrete diffusion’s shaky ODE story for exact posterior bridges; no gain numbers in the snippet, so AR replacement talk is premature.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Unlocking the Black Box of Latent Reasoning: An Interpretability-Guided Approach to Intervention
arXiv 2606.01243 proposes training-free decode-time interventions for latent reasoning, using structural, causal, and geometric probes to analyze continuous reasoning vectors, and reports that early latent vectors act as critical causal hubs across multiple model scales and task domains.
#Reasoning#Interpretability#Research release
why featured
HKR-H and HKR-K pass, but the article gives only an arXiv-level summary without model scale, tasks, or reproducible result details. The interpretability angle has signal, yet its technical narrowness keeps it in the 60–71 band.
editor take
arXiv 2606.01243 claims training-free reasoning gains, but scales and baselines are undisclosed; treat it as a strong control claim pending code.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Benchmarking Recursive-Collapse Warning Claims Under Matched False-Positive Control
The paper introduces Loopzero, a claim-bounded benchmark with a Lean-specified boundary, and evaluates two frozen public benchmarks under a locked false-positive contract of 0.03–0.07; neither standard comparators nor Loopzero’s pre-registered quantile detector reached an accepted operating point.
#Benchmarking#Safety#Alignment#Loopzero
why featured
HKR-K and HKR-R pass: the paper gives a new benchmark, false-positive contract, and negative result for safety evals. HKR-H is weak, and Lean/quantile-detector framing keeps it in the 60–71 band.
editor take
No detector reached an accepted operating point on Loopzero at FP 0.03–0.07; making non-acceptance first-class beats another collapse-warning metric.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Soft-NBCE: Entropy-Weighted Chunk Fusion for Long-Context
Soft-NBCE replaces hard chunk routing with temperature-scaled Softmax fusion over entropy-weighted chunk distributions, raising LongBench MuSiQue F1 from 0.275 to 0.310 and HotpotQA F1 from 0.427 to 0.479 while reporting NIAH-32K retrieval accuracy of 0.909 and O(L^2/n) peak memory.
#RAG#Inference-opt#Reasoning#Soft-NBCE
why featured
HKR-K and HKR-R pass: the mechanism and LongBench numbers are concrete, and RAG teams care about chunk fusion. HKR-H is weak, and a single arXiv benchmark gain fits the 60–71 band.
editor take
Soft-NBCE lifts MuSiQue F1 to 0.310; modest gains, but soft fusion is the sane fix for brittle chunk routing.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
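The Soft-NBCE summary describes temperature-scaled softmax fusion over entropy-weighted chunk distributions in place of hard chunk routing. One way to read that is sketched below: each chunk's next-token distribution gets a softmax weight from its negative entropy, and the fused distribution is the weighted mixture. The exact weighting formula, the `temperature` default, and the example distributions are assumptions, not the paper's stated method.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def soft_fuse(chunk_dists, temperature=0.5):
    """Mix per-chunk next-token distributions, favoring low-entropy (confident) chunks."""
    logits = [-entropy(p) / temperature for p in chunk_dists]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    vocab = len(chunk_dists[0])
    fused = [sum(w * p[t] for w, p in zip(weights, chunk_dists)) for t in range(vocab)]
    return weights, fused

# Hypothetical 3-token vocabulary: one confident chunk, one uninformative chunk.
chunks = [[0.9, 0.05, 0.05], [1 / 3, 1 / 3, 1 / 3]]
weights, fused = soft_fuse(chunks)
```

As `temperature` shrinks toward zero the weights concentrate on the single lowest-entropy chunk, which recovers hard routing under this reading; larger temperatures let several chunks contribute.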
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories
SkillAdaptor updates reusable external skills for LLM agents through step-level failure attribution, keeps the backbone frozen, and reports maximum gains of 1.7 points on WebShop success rate, 1.5 on PinchBench Avg Score%, and 1.8 on Claw-Eval Avg Score.
#Agent#Tools#Alignment#Kimi-K2.5
why featured
HKR-H/K/R all pass, but the evidence is a single arXiv paper with small gains of 1.7/1.5/1.8 points, so this stays an incremental agent-research item.
editor take
SkillAdaptor tops out at +1.8 points; frozen-backbone step attribution is clean, but the gain barely outruns agent-benchmark noise.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
HASTE: Hardware-Aware Dynamic Sparse Training for Large Output Spaces
HASTE uses a group-shared fixed fan-in sparse output layer for million-label XMC, reporting up to 4.4× forward speedup and up to 25× backward speedup over standard fixed fan-in sparsity.
#Inference-opt#Benchmarking#HASTE#arXiv
why featured
HKR-K is strong with a mechanism and speed numbers, and HKR-R hits training cost. HKR-H is weak, and the large-output-space training niche keeps it below featured.
editor take
HASTE reports 4.4× forward and 25× backward speedups on million-label XMC; sparse training only matters when CUDA likes it.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
CUPID in the Model Zoo: Online Matchmaking for Selecting Your Dream LLM
CUPID uses a dueling bandit algorithm to iteratively select pairs of LLMs, collect user feedback, and update beliefs about latent preferences under user-specified cost and time budgets.
#Alignment#Benchmarking#CUPID#Research release
why featured
HKR-H/K/R pass: the LLM matchmaking angle is relevant and mechanism-specific. As a single arXiv method with no disclosed scale, datasets, results, or usable artifact, it stays in the 60–71 band.
editor take
CUPID uses dueling bandits for LLM choice; no model count or cost curve disclosed, so I read it as preference routing.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
Research Paper Analyzes Structural Properties of Multilingual Large Language Models
The paper studies LLM multilinguality with representational structural analysis and reports that low-resource languages are structurally farther from English than high- and mid-resource languages, while language-specific post-training changes their structures but preserves inter-language relationships.
#Benchmarking#Research release
why featured
HKR-K has concrete claims on low-resource language drift and post-training effects; HKR-R fits multilingual deployment pain. HKR-H is weak, and the arXiv summary lacks model names, data scale, and reproducible setup.
editor take
The paper uses structural RSA; models and language count are undisclosed, so don't overgeneralize the low-resource-English distance claim.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·02
DarkVesselNet: Multi-Modal Remote Sensing and Trajectory Reasoning for Dark Vessel Detection
DarkVesselNet combines Sentinel-1 SAR, Sentinel-2 optical imagery, and AIS trajectory reasoning to detect dark vessels; the available evidence is software-grounded, with tests for SAR speckle filtering, optical band ratios, TGARD gap emission, sensor coregistration, backbone token shapes, and differentiable anomaly scoring.
#Multimodal#Vision#Reasoning#DarkVesselNet
why featured
HKR-H/K pass: dark-vessel detection is a strong hook and the post names Sentinel-1, Sentinel-2, AIS reasoning, and a HF Space. HKR-R is weak: niche maritime remote sensing, with no adoption, performance, or product evidence, so it stays in 60–71.
editor take
DarkVesselNet fuses Sentinel-1, Sentinel-2, and AIS; evidence is package tests and a Space, far from maritime recall.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Random Erasing vs. Model Inversion: A Promising Defense or a False Hope?
The paper evaluates Random Erasing as a defense against model inversion attacks across 37 setups, showing lower reconstruction quality and attack accuracy while maintaining reasonable natural accuracy, with some configurations degrading attack accuracy without reducing utility.
#Safety#Vision#Benchmarking#Research release
why featured
HKR-H/K/R pass, but this is a model-inversion defense evaluation with limited deployment detail. It fits the 60–71 research-security band, not featured.
editor take
Random Erasing weakens inversion attacks across 37 setups; I buy the mechanism, not the SOTA claim without tables.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Multi-Rollout On-Policy Distillation via Peer Successes and Failures
MOPD uses successful and failed rollouts from the same prompt to construct teacher signals, and experiments across four benchmark categories—competitive programming, mathematical reasoning, scientific question answering, and tool use—show improvements over standard on-policy distillation baselines.
#Reasoning#Tools#Fine-tuning#Research release
why featured
HKR-K passes: the paper introduces a concrete distillation mechanism across programming, math, science QA, and tool-use benchmarks. HKR-H/R are weak because gains, model scale, and reproduction details are not disclosed.
editor take
MOPD beats standard OPD on 4 benchmark types; feeding peer successes and failures to the teacher turns RL sampling waste into signal.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Finer Parameter Steps for Low-Rank PEFT: A Controlled Study with CP Tensor Adapters
The paper compares CP tensor adapters with LoRA on OPT-1.3B, where each CP component stores 193 trainable scalars per projection, about 21 times smaller than one LoRA rank step. SST-2 hits an early low-budget plateau, BoolQ benefits before saturating slightly below LoRA, and RTE remains LoRA-favored.
#Fine-tuning#Benchmarking#OPT#Research release
why featured
HKR-K lands with concrete PEFT numbers, while HKR-R is limited to fine-tuning specialists. The study is useful but narrow, tied to OPT-1.3B and a few tasks, so it stays in 60–71.
editor take
CP uses 193 scalars per component versus LoRA’s 4096 per rank; finer budget steps help diagnosis, not accuracy.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
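A quick arithmetic check on the "about 21 times smaller" figure, as a sketch. The 4096 value is taken from the editor take; reading it as 2 × 2048 (a rank-1 LoRA update on a 2048-wide square projection) is an assumption, not something the feed states.

```python
# Trainable scalars per budget step, as quoted in the entry above.
CP_SCALARS_PER_COMPONENT = 193
# Assumed to be 2 * 2048: one row of A plus one column of B for a single LoRA rank.
LORA_SCALARS_PER_RANK = 2 * 2048

ratio = LORA_SCALARS_PER_RANK / CP_SCALARS_PER_COMPONENT
print(f"one LoRA rank step is about {ratio:.1f} CP components")
```

The ratio is what makes CP a finer budget dial: under this assumption, roughly 21 CP components fit inside a single LoRA rank step.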
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
The Assistant as a Privileged Persona: A Canonical Reference in Cross-Persona Self-Recognition
The paper measures cross-persona authorship claim matrices on Llama-3.1-70B-Instruct and finds that, on the Assistant evaluator row, claim rate, activation-space distance from Assistant, and entropy gap are tightly coupled. The same coupling fails for pirate, dragon, and Shakespeare evaluators, where authorship judgments track surprise relative to Assistant rather than the generator persona.
#Interpretability#Benchmarking#Llama-3.1-70B-Instruct#Research release
why featured
HKR-H and HKR-K pass: the “privileged persona” angle is novel, and the post names Llama-3.1-70B plus attribution/activation/entropy links. HKR-R is weak because it stays as a single-model interpretability paper without product or safety impact.
editor take
Llama-3.1-70B-Instruct treats Assistant as the authorship baseline; persona self-recognition tests miss the asymmetry if they only check mutual role claims.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Research paper proposes Trust Region On-Policy Distillation for stable LLM distillation
The paper proposes TrOPD, a trust-region on-policy distillation method for LLM post-training that uses three mechanisms: reliable-region supervision, outlier handling via clipping, masking, or forward KL, and off-policy guidance, with experiments across mathematical reasoning, code generation, and general-domain benchmarks against OPD, EOPD, and REOPOLD.
#Fine-tuning#Reasoning#Code#Research release
why featured
HKR-K passes: TrOPD proposes three mechanisms for stable on-policy distillation. HKR-H fails because the headline is a method name, and HKR-R is weak without cost, performance, or open-source deployment numbers.
editor take
TrOPD adds 3 stabilizers to OPD; gains are undisclosed, so don’t crown it a distillation breakthrough yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
IMWM: Intuition Models Complement World Models for Latent Planning
IMWM outperforms a world-model-only planner across four pixel-based goal-reaching tasks, using Retrieval Initialization, Hybrid Cost, and a Reliability Gate; the largest reported gains are Two-Room at 99.2% success with +11.5 points and OGBench-Cube at 94.7% success with +28.5 points.
#Robotics#Reasoning#Research release#Benchmark
why featured
HKR-H and HKR-K pass: the mechanism has a clear twist, and the summary gives testable success rates. As a single arXiv latent-planning paper without product impact or broad debate, it stays in the 60–71 band.
editor take
IMWM wins all 4 pixel tasks, +28.5 pts on Cube; stop blaming every planning miss on world-model accuracy.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
RLVR without Ineffective Samples: Group Prioritized Off-Policy Optimization for LLM Reasoning
The paper proposes POPO for zero-variance samples in RLVR, using prioritized group replay and decoupled importance sampling to replace ineffective on-policy groups and reduce off-policy bias, with evaluations on mathematics, planning, and visual geometry showing faster RL finetuning with fewer rollouts.
#Reasoning#Fine-tuning#Vision#Research release
why featured
HKR-K is clear via POPO and ineffective-rollout handling; HKR-R is limited to training-cost pressure with no savings number. A single technical arXiv paper fits the 60–71 all band.
editor take
POPO replaces zero-variance RLVR groups with replayed effective groups; rollout reduction isn’t disclosed, but the compute-saving angle is practical.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
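Why zero-variance groups are "ineffective" can be seen in a minimal sketch of GRPO-style group normalization. The function names and the `eps` constant are illustrative; POPO's prioritized replay and decoupled importance sampling are not sketched here.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def is_zero_variance(rewards):
    """True when every rollout in the group received the same reward."""
    return len(set(rewards)) == 1

mixed = [1.0, 0.0, 0.0, 1.0]        # some rollouts verified, some not
all_correct = [1.0, 1.0, 1.0, 1.0]  # every rollout verified

print(group_advantages(mixed))        # nonzero advantages, so a gradient signal exists
print(group_advantages(all_correct))  # all zeros, so the group contributes no update
```

A group like `all_correct` costs a full set of rollouts but yields no policy gradient, which is the waste the entry describes replacing with replayed groups.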
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Compliance-Scored Best-of-N Guardrail Orchestration for Multimodal Document Generation in Payments Dispute Defense
The paper presents a compliance-scored Best-of-N guardrail orchestration layer for text and image inputs in payments dispute defense, reporting 5 attempts within 20 seconds, 91% compliance, and aggregate variable-cohort win rates of 301/659 versus 536/1548 controls.
#Multimodal#Safety#Tools#Research release
why featured
HKR-K and HKR-R pass: the paper gives a Best-of-N guardrail mechanism plus 91% compliance. The payments-dispute niche limits broader pull, so it stays in the 60–71 band.
editor take
The paper reports 5 attempts, under 20 seconds, 91% compliance; 301/659 vs 536/1548 is not A/B evidence.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
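The two win-rate fractions are easier to compare as percentages. A short sketch; the variable names are illustrative, and the raw gap says nothing about how the cohorts were selected.

```python
# Aggregate counts as quoted in the entry above.
treated_wins, treated_total = 301, 659
control_wins, control_total = 536, 1548

treated_rate = treated_wins / treated_total
control_rate = control_wins / control_total
gap = treated_rate - control_rate

print(f"treated {treated_rate:.1%}, control {control_rate:.1%}, gap {gap:+.1%}")
```

The cohorts differ in size by more than 2x and the feed does not say how cases were assigned, so the gap is descriptive only.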
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Knowing Isn't Understanding: Re-grounding Generative Proactivity with Epistemic and Behavioral Insight
arXiv:2602.15259v2 proposes an epistemic incompleteness framing for generative proactivity, arguing that agents should surface unknown unknowns while constraining when, how, and how far they intervene to avoid misdirecting attention, overwhelming users, or causing harm.
#Agent#Alignment#Safety#Research release
why featured
HKR-K and HKR-R pass: the paper offers a constraint frame for proactive generative agents and maps to agent safety. The disclosed facts stay conceptual, with no benchmark, artifact, or reproducible system, so it fits 60–71.
editor take
arXiv 2602.15259v2 gives a framework, no experiments; proactive agents must prove when to stay quiet first.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
"I've Seen How This Goes": Characterizing Diversity via Progressive Conditional Surprise
The paper proposes Decan, a diversity metric that reads per-token log-probabilities from a base model in one forward pass per permutation without embeddings, reference corpora, or human labels; it reaches 0.846 OCA on McDiv prompt_gen, below SentBERT’s reported 0.897.
#Benchmarking#Tevet and Berant#SentBERT#OLMo
why featured
HKR-K is strong thanks to the mechanism and OCA result; HKR-R is moderate for generation-eval teams. The scope is narrow and it remains a single arXiv metric paper, below featured threshold.
editor take
Decan hits 0.846 OCA on McDiv prompt_gen, below SentBERT’s 0.897; its edge is no embeddings, corpus, or labels.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
S-SPPO: Semantic-Calibrated Self-Play Preference Optimization
The paper proposes S-SPPO, a dual-space calibration method for SPPO using semantic gating and latent repulsion, and reports a 52.19% win rate and 47.46% length-controlled win rate on AlpacaEval 2.0 with Llama-3-8B without extra human-annotated preferences.
#Alignment#Fine-tuning#Benchmarking#Llama
why featured
HKR-K passes with concrete mechanisms and a 52.19% benchmark result; HKR-R passes on human-preference-label cost. HKR-H is weak, and this remains a single arXiv method paper, so it fits the 60–71 band.
editor take
S-SPPO reports 52.19% on AlpacaEval 2.0 with Llama-3-8B; the useful bit is naming SPPO’s overconfident near-duplicate failure mode.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
DuetServe: Harmonizing Prefill and Decode for LLM Serving via Adaptive GPU Multiplexing
DuetServe separates prefill and decode inside a single GPU with SM-level spatial multiplexing, activates partitioning when Time-Between-Tokens degradation is predicted, and reports up to 1.3x higher total throughput while maintaining low generation latency versus state-of-the-art serving frameworks.
#Inference-opt#DuetServe#Research release
why featured
HKR-K/R pass: DuetServe gives a concrete serving mechanism and 1.3x throughput claim, with cost relevance. The GPU-scheduling angle is specialized, so it stays in the lower “all” band.
editor take
DuetServe reports 1.3x throughput via single-GPU SM partitioning; I’d question its TBT predictor under messy co-served traffic.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
RuleEdit: Failure-Guided Human-AI Model Editing with Prospective Impact Preview
RuleEdit uses rule-table mismatch signals and prospective embedding previews for human-AI model editing in stroke rehabilitation assessment, raising Human+AI performance by 14.16% (p<0.001) and increasing post-update local gains from 11.50% to 36.38% after users’ rule-based feedback.
#Alignment#Interpretability#Tools#RuleEdit
why featured
HKR-K is strong with concrete numbers and a named mechanism; HKR-R lands on reliability in model editing. HKR-H is weak, and this is a niche arXiv paper without product or major-lab pull.
editor take
RuleEdit lifts Human+AI stroke assessment 14.16%; pre-edit previews are useful, but global degradation keeps this from being a safety patch.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Model Multiplicity and Predictive Arbitrariness in Recidivism Risk Assessment
The study builds a dataset of thousands of inmate releases from a recidivism risk assessment system used for over 15 years and finds that similarly accurate models show higher empirical predictive agreement than worst-case theoretical guarantees suggest.
#Benchmarking#Interpretability#Alignment#arXiv
why featured
HKR-K and HKR-R pass: the paper offers a new long-span recidivism dataset and a concrete claim about same-accuracy model agreement. It remains a niche ML fairness paper, with no product or broad industry trigger.
editor take
On a 15-year recidivism system, equal-accuracy models agree above worst-case bounds; I buy that, but lowest-risk policy costs are undisclosed.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Lookahead Sample Reward Guidance for Test-Time Scaling of Diffusion Models
The paper introduces LiDAR sampling, which computes expected future reward from marginal samples of a pre-trained diffusion model and matches the latest gradient guidance method on SDXL in GenEval performance with a 9.5x speedup.
#Vision#Inference-opt#Alignment#KAIST
why featured
HKR-K is strong and HKR-H rides on the 9.5x speed claim, but this is a diffusion-sampling paper with a high technical entry cost and no disclosed product impact, so it stays in all.
editor take
LiDAR matches gradient guidance on SDXL GenEval and runs 9.5x faster; no-backprop test-time guidance is the clean hook.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Test-Time Training for Zero-Resource Dense Retrieval Reranking
DART adapts a bilinear scoring matrix at inference time using top documents as pseudo-positives and bottom documents as pseudo-negatives, and on six BEIR benchmarks it reports a mean per-dataset relative NDCG@10 gain of 2.1% over the dense retrieval baseline with under 10 ms added latency per query.
#RAG#Inference-opt#Benchmarking#arXiv
why featured
HKR-K and HKR-R pass: DART gives testable BEIR results and latency conditions, and targets RAG retrieval quality. The +2.1% relative gain is modest and the source is a single arXiv paper, so it stays below featured.
editor take
DART gains 2.1% NDCG@10 on 6 BEIR sets; per-query W updates under 10ms look like a cheap RAG rerank patch.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
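The pseudo-label idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the paper's procedure: the identity initialization, the single gradient step, the linear margin objective, and the names `score` and `adapt` are all illustrative.

```python
def score(q, W, d):
    """Bilinear relevance score q^T W d."""
    return sum(q[i] * W[i][j] * d[j] for i in range(len(q)) for j in range(len(d)))

def adapt(q, docs, k=1, lr=0.1):
    """One test-time step: push W toward the top-k docs and away from the bottom-k docs."""
    n = len(q)
    # Start from the identity, so the initial score is a plain dot product (assumption).
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    ranked = sorted(docs, key=lambda d: score(q, W, d), reverse=True)
    pos, neg = ranked[:k], ranked[-k:]  # pseudo-positives and pseudo-negatives
    # Gradient of (mean positive score - mean negative score) w.r.t. W
    # is the outer product of q with (mean_pos - mean_neg).
    diff = [sum(d[j] for d in pos) / k - sum(d[j] for d in neg) / k for j in range(n)]
    for i in range(n):
        for j in range(n):
            W[i][j] += lr * q[i] * diff[j]
    return W

query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
W = adapt(query, docs)
reranked = sorted(docs, key=lambda d: score(query, W, d), reverse=True)
print(reranked)
```

In this toy the order is unchanged and only the margin between top and bottom documents widens; whether such updates change rankings on real queries is what the reported NDCG@10 gain measures.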
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
DenseMLLM: Standard Multimodal LLMs for Dense Prediction
DenseMLLM adapts standard MLLMs to semantic segmentation and depth estimation using a vision-token supervision strategy, without task-specific decoders, and the project is available on GitHub; the abstract does not disclose benchmark scores or model size.
#Multimodal#Vision#DenseMLLM#Research release
why featured
HKR-H/K pass: visual-token supervision extends standard MLLMs to segmentation and depth estimation with open code. HKR-R misses; this is a single arXiv vision paper with narrow practitioner resonance, below the featured threshold.
editor take
DenseMLLM uses vision-token supervision for segmentation and depth; no scores disclosed, so don’t treat “decoder-free” as a win yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
HOIST: Humanoid Optimization with Imitation and Sample-efficient Tuning for Manipulating Suspended Loads
HOIST fine-tunes a VLA policy from VR teleoperation demonstrations and then applies iterative batched RL for humanoid suspended-load manipulation, reducing translational placement error by 19.9 cm and raw angular error by 3.56 degrees versus pure VLA rollouts in simulation and real-robot experiments.
#Robotics#Agent#Vision#HOIST
why featured
HKR-H and HKR-K pass: the task is concrete, with a clear method and error numbers. As a single arXiv robotics paper with no named lab impact or product path disclosed, it stays in the 60–71 band.
editor take
HOIST cuts VLA error by 19.9 cm; for suspended loads, RL tuning beats hoarding more VR demos.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards
The paper proposes MAHALO, a framework that combines PRM step-level supervision, Multi-Action-Head DPO, objective-specific weighting, and PRM-guided decoding to align models across three settings: math reasoning, human values alignment, and multi-turn tutoring.
#Alignment#Reasoning#Tools#MAHALO
why featured
HKR-K and HKR-R pass: the mechanism and target conflicts are concrete for alignment practitioners. The post gives no scores, model scale, or artifact details, so this stays in the 60–71 band.
editor take
MAHALO targets 3 alignment settings, but metrics are undisclosed here; I buy multi-head DPO, not the no-interference claim.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Position: Neglecting the Sustainability of AI is Fuelling a Global AI Arms Race
The position paper introduces the Climate and Resource Aware Machine Learning framework across five levels: individual, community, industry, government, and global, arguing that sustainable AI must address both climate impact and equitable access to development resources.
#Karl Marx#Research release#Policy#Commentary
why featured
HKR-H/K/R all pass, but the article only discloses a position paper and the five-level CARAML frame, with no new dataset, policy action, or industry move; this fits the 60–71 commentary band.
editor take
CARAML spans 5 governance levels, but no carbon ledger is disclosed; Marx adds edge, and also risks thesis cosplay.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
GIFT: Geometry-Induced Functional Transfer for Category-level Object Manipulation
GIFT transfers complex object manipulation skills from a single human demonstration, using Functional Maps and ScLERP to map object-centric interactions and generate smooth robot paths, with experiments reporting task execution across diverse real-world environments without additional training.
#Robotics#Research release
why featured
HKR-H and HKR-K pass: one-demo, no-extra-training manipulation transfer has a concrete mechanism. Success rates, task count, and baselines are not disclosed, so this stays in the 60–71 band.
editor take
GIFT transfers manipulation from one human demo, but reports no success rate; clean geometry story, not VLA-grade generalization yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective
The paper reformulates GRPO as a discriminative objective and identifies two objective-level limits: likelihood-misaligned surrogate scores and score-insensitive credit assignment. ConSPO uses length-normalized sequence log-probabilities, group-wise InfoNCE contrast between verified positive and negative rollouts, plus a curriculum-scheduled margin; the abstract says it beats strong baselines, but does not disclose benchmark numbers.
#Reasoning#Alignment#Benchmarking#Research release
why featured
HKR-K is solid: ConSPO gives a testable objective rewrite and contrastive mechanism. HKR-R is narrow but real for RLVR/GRPO post-training; no benchmark lift or production impact is disclosed, so it stays in 60–71.
editor take
ConSPO swaps GRPO for group-wise InfoNCE; no benchmark numbers are disclosed, so “strong baselines” is placeholder language.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Rethinking Weak Supervision in Anomaly Detection: A Comprehensive Benchmark
WSADBench evaluates 36 algorithms across 4 modalities under standardized changes in label quantity, granularity, and quality, and the authors report more than 700K experiments plus an open-source release with code and datasets for weakly supervised anomaly detection research.
#Benchmarking#SUFE-AILAB#WSADBench#Research release
why featured
HKR-K is clear: WSADBench reports 4 modalities, 36 algorithms, 700K+ experiments, and open artifacts. HKR-R is niche and HKR-H is weak, so this sits in the 60–71 research-benchmark band.
editor take
WSADBench ran 36 algorithms, 4 modalities, 700K tests; WSAD silos look tired once tabular foundation models get labels.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
When Data Is Scarce: Scaling Sparse Language Models with Repeated Training
The paper fits a scaling law for data-constrained sparse language models using experiments up to 1.92B parameters, 93.75% sparsity, 2.6B unique tokens, 41.6B total tokens, and 16 training epochs.
#Reasoning#Inference-opt#Benchmarking#Research release
why featured
HKR-H/K/R pass, but this is a niche sparse-LM scaling-law paper with experiment settings rather than a production replacement claim or major lab release, so it stays in the 60–71 band.
editor take
The paper fits sparse scaling with 1.92B models and 16 epochs; 50% sparsity sells loss, 93.75% sells compute.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
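The headline numbers in this entry are mutually consistent, which a short sketch can confirm. It assumes total tokens equal unique tokens times epochs and reads sparsity as the fraction of zeroed weights; neither reading is spelled out in the feed.

```python
# Figures as quoted in the entry above.
unique_tokens = 2.6e9
epochs = 16
sparsity = 0.9375

total_tokens = unique_tokens * epochs  # assumed: every epoch repeats the same unique tokens
density = 1 - sparsity                 # fraction of weights that remain nonzero

print(f"total tokens: {total_tokens / 1e9:.1f}B")
print(f"nonzero weights: 1 in {1 / density:.0f}")
```

Under these assumptions the 41.6B total is exactly 16 passes over 2.6B tokens, and 93.75% sparsity leaves one weight in sixteen active.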
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging
MERIT splits mixtures by dataset-level gradient conflicts, fine-tunes partitions without inter-partition communication, and raises Qwen2.5-VL-3B’s 8-benchmark average on 136 Vision-FLAN tasks from 54.3 under joint training to 57.0.
#Fine-tuning#Multimodal#Benchmarking#Qwen
why featured
HKR-K is solid with Qwen2.5-VL-3B, 136 Vision-FLAN tasks, and 54.3→57.0. HKR-R is limited to tuning practitioners, while HKR-H is weak, so this stays in the 60–71 band.
editor take
MERIT lifts 136 Vision-FLAN tasks from 54.3 to 57.0; I buy the no-communication split more than merge mystique.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
On the Difficulty of Learning a Meta-network for Training Data Selection
The paper analyzes two obstacles in MTS for training-data selection, low gradient signal-to-noise ratio and uninformative features; across four benchmarks, it reports average gains of 5.49% over training without selection and 2.89% over the strongest baseline.
#Fine-tuning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes with mechanisms and four-benchmark numbers; HKR-R touches fine-tuning data efficiency. HKR-H is weak, and the topic is academic training-method work, so it stays in all.
editor take
MTS gains 5.49% on four benchmarks; I buy the GSNR diagnosis, not batch size as the scaling fix.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Foundation-Preserving Adaptation via Generalized Rayleigh-Quotient Optimization
The paper proposes FoLoRA, a forgetting-aware LoRA framework that scores update directions by task utility per unit forgetting penalty using a generalized Rayleigh quotient, then evaluates it against baselines on math, code, and instruction-following adaptation; the snippet does not disclose dataset names or exact scores.
#Fine-tuning#Alignment#Reasoning#FoLoRA
why featured
HKR-K/R pass: FoLoRA has a concrete mechanism and tests math, code, and instruction following. Single arXiv paper, high technical framing, and no disclosed code or production evidence keep it in 60–71.
editor take
FoLoRA gates LoRA updates with a generalized Rayleigh quotient; no code disclosed, so beware elegant spectra losing to cheap regularizers.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
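For reference, the textbook form of a generalized Rayleigh quotient, written as a sketch. The symbols are illustrative rather than the paper's notation: $v$ is a candidate update direction, $U$ a symmetric task-utility matrix, and $F$ a positive-definite forgetting-penalty matrix.

```latex
\rho(v) = \frac{v^\top U v}{v^\top F v}, \qquad F \succ 0
```

For symmetric $U$ and positive-definite $F$, the stationary points of $\rho$ are the solutions of the generalized eigenproblem $U v = \lambda F v$, so the best "utility per unit forgetting" direction is the eigenvector with the largest $\lambda$.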
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Adversarial Dual On-Policy Distillation from Expressive Teacher
FA-OPD co-trains a Flow Matching teacher with a lightweight MLP student, using reward and action channels on student rollouts, and outperforms strong baselines across six robot navigation, manipulation, and locomotion benchmarks under noisy or limited demonstrations.
#Robotics#Fine-tuning#Alignment#FA-OPD
why featured
HKR-K/R pass: the post gives a concrete mechanism and 6 robotics benchmarks, tied to lightweight policy deployment. HKR-H is weak, and no code, real-robot result, or major-lab signal is disclosed, so it stays below featured.
editor take
FA-OPD wins on 6 robotics benchmarks; pulling an FM teacher into on-policy loops beats another offline BC leaderboard bump.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL
The paper proposes a local perturbation model for multi-domain RL, where later-domain training damages earlier domains through a second-order term concentrated in a low-dimensional shared conflict subspace. After Code→Math→QA→CW training, a short Re-Math refresh raises Math from 57.66 to 66.04 while largely preserving other domains, with the best average score at 66.39.
#Reasoning#Code#Fine-tuning#Research release
why featured
HKR-K and HKR-R pass: the paper gives a mechanism and recovery numbers for multi-domain RL interference. HKR-H is weak, and the technical entry cost keeps it in the 60–71 research-signal band.
editor take
Short Re-Math lifts Math from 57.66 to 66.04; I buy the local conflict-subspace framing over generic forgetting talk.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL
ReSkill adds three mechanisms to GRPO’s group-wise structure for agentic RL: assertion-driven conditional skill revisions, within-group rollout comparisons of skill versions, and Thompson Sampling with adaptive discounting for version selection. The abstract says it beats memory and skill-based RL methods across several domains, with the largest gains on unseen tasks.
#Agent#Reasoning#Memory#Anthropic
why featured
HKR-K passes with concrete mechanisms, and HKR-R is moderate because skill reuse in agentic RL is a real practitioner pain. No benchmark numbers or reproducible setup are disclosed, and HKR-H is weak, so it stays in the 60–71 band.
editor take
ReSkill plugs 3 skill loops into GRPO; versioned rollouts are neat, but no overhead or benchmark table yet, so generalization claims stay provisional.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
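Thompson Sampling with discounting, the version-selection ingredient named above, can be sketched with a Beta-Bernoulli posterior. The class name, the fixed discount of 0.9, and the two hypothetical skill versions are illustrative; how ReSkill adapts the discount is not disclosed in the feed.

```python
import random

class DiscountedBetaArm:
    """Beta posterior over one skill version's success rate, with exponential forgetting."""

    def __init__(self):
        self.alpha, self.beta = 1.0, 1.0  # uniform prior

    def update(self, success, discount):
        # Shrink old evidence toward the prior, then add the new observation.
        self.alpha = 1.0 + discount * (self.alpha - 1.0) + (1.0 if success else 0.0)
        self.beta = 1.0 + discount * (self.beta - 1.0) + (0.0 if success else 1.0)

    def sample(self, rng):
        return rng.betavariate(self.alpha, self.beta)

def pick_version(arms, rng):
    """Thompson Sampling: draw once from each posterior and pick the largest draw."""
    draws = {name: arm.sample(rng) for name, arm in arms.items()}
    return max(draws, key=draws.get)

rng = random.Random(0)
arms = {"skill_v1": DiscountedBetaArm(), "skill_v2": DiscountedBetaArm()}
for _ in range(20):
    arms["skill_v1"].update(success=False, discount=0.9)
    arms["skill_v2"].update(success=True, discount=0.9)

choice = pick_version(arms, rng)
print(choice, arms["skill_v2"].alpha, arms["skill_v1"].beta)
```

Discounting caps how much evidence an arm can accumulate, so a skill version that was good early can still be displaced once the policy changes underneath it.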
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Model Parallelism With Subnetwork Data Parallelism
The paper introduces Subnetwork Data Parallelism, which partitions models into structured subnetworks across workers without exchanging activations, and reports 28%-60% lower per-device memory in experiments from 1B LLaMA pre-training on FineWeb to ResNet-18 on CIFAR under FLOP-matched settings.
#Inference-opt#arXiv#LLaMA#FineWeb
why featured
HKR-K and HKR-R pass: the paper gives a named method and 28%-60% memory reduction. HKR-H is weak because the framing is narrow ML-systems jargon, so this stays in the interesting-not-featured band.
editor take
SDP cuts memory 28–60% on 1B LLaMA and ResNet-18; don’t celebrate until comms and convergence curves are disclosed.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Distillation of Large Language Models via Concrete Score Matching
KAIST proposes Concrete Score Distillation, a discrete score-matching objective that aligns relative logit differences across all vocabulary pairs between student and teacher models, and evaluates it on GPT-2-1.5B, OpenLLaMA-7B, and GEMMA-7B-IT for instruction-following and task-specific distillation.
#Fine-tuning#Inference-opt#Benchmarking#KAIST
why featured
HKR-K and HKR-R pass: the full-vocab pairwise-logit CSD mechanism is concrete and relevant to model-compression costs. As a KAIST arXiv method without disclosed code, SOTA numbers, or production replacement evidence, it stays in 60–71.
editor take
KAIST’s CSD matches all-pair logit gaps; the shift-invariance fix is credible, but RSS gives no exact gains.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
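The shift-invariance point can be checked with a toy pairwise objective. This is a sketch assuming an unweighted squared error over all token pairs; the paper's actual weighting and score definition are not given in the feed, and `pairwise_gap_loss` is a hypothetical name.

```python
from itertools import combinations

def pairwise_gap_loss(student_logits, teacher_logits):
    """Mean squared mismatch between student and teacher logit gaps over all token pairs."""
    pairs = list(combinations(range(len(student_logits)), 2))
    total = 0.0
    for i, j in pairs:
        student_gap = student_logits[i] - student_logits[j]
        teacher_gap = teacher_logits[i] - teacher_logits[j]
        total += (student_gap - teacher_gap) ** 2
    return total / len(pairs)

teacher = [2.0, 0.5, -1.0]
student = [1.0, 0.0, -1.0]
shifted = [s + 5.0 for s in student]  # same softmax distribution as `student`

print(pairwise_gap_loss(student, teacher))
print(pairwise_gap_loss(shifted, teacher))
```

Because only differences between logits enter the loss, adding a constant to every student logit leaves it unchanged, whereas a direct squared error on raw logits would penalize the shift.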
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models
The paper analyzes fine-tuning objectives in linear attention models and finds that updating all attention parameters harms few-shot performance, while restricting updates to the value matrix improves zero-shot performance and preserves in-context learning under the studied conditions.
#Fine-tuning#Reasoning#Research release
why featured
HKR-H/K/R all pass, but this is a linear-attention theory paper; the feed gives no real-model benchmark, code, or production impact, so it stays in the lower research-signal band.
editor take
Linear-attention theory says full fine-tuning hurts few-shot; value-only updates are a small lever, but useful for preserving ICL.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
RAIGen: Rare Attribute Identification in Text-to-Image Generative Models
RAIGen introduces a label-free rare-attribute discovery framework for diffusion models, using Matryoshka Sparse Autoencoders and a minority metric based on activation frequency and semantic distinctiveness to audit Stable Diffusion and SDXL.
#Vision#Interpretability#Safety#RAIGen
why featured
HKR-K passes: RAIGen presents an unlabeled rare-attribute discovery mechanism for Stable Diffusion and SDXL. HKR-H is weak, and the post lacks result numbers, keeping it in the 60–71 band.
editor take
RAIGen audits Stable Diffusion and SDXL, but scale is undisclosed; activation-frequency rarity risks mixing real minorities with artifacts.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Inner Product Aware Quantization: Provably Fast, Accurate, and Adaptive Algorithms
The paper introduces inner product aware quantization objectives and adaptive unbiased methods that preserve inner products for worst-case and average-case inputs; its practical ASQ algorithms run 2-10× faster than prior state-of-the-art methods while maintaining quality.
#Inference-opt#arXiv#Research release
why featured
HKR-K/R pass on the 2-10x ASQ speedup and cost-quality angle. HKR-H fails: this is a narrow quantization algorithm paper with no product rollout, open-source artifact, or LLM deployment case, so it stays in all.
editor take
ASQ runs 2-10× faster at the same quality; I buy the inner-product target, since MSE quantization is stale for retrieval.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
CRAFT: Fine-Grained Cost-Aware Expert Replication for Efficient MoE Serving
CRAFT estimates per-layer expert replication benefit and replicates MoE experts under a fixed memory budget, raising end-to-end serving throughput by 1.14× on average and up to 1.2× over existing replication techniques in large-scale deployments.
#Inference-opt#Research release
why featured
HKR-K and HKR-R pass: the article gives a concrete mechanism and throughput gains, tied to inference cost. HKR-H is weak, and this is a narrow single arXiv systems paper, so it stays in 60–71.
editor take
CRAFT lifts MoE serving throughput 1.14× on average under fixed memory; expert replication is now a per-layer ROI problem.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Capturing LLM Capabilities via Evidence-Calibrated Query Clustering
The paper proposes ECC, which calibrates semantic embeddings with limited posterior model comparisons and parameterizes cluster capability profiles using a Bradley-Terry model; it improves LLM capability ranking quality by 17.64 percentage points over human-labeled baselines and 18.02 points over embedding-based baselines on reported evaluations.
#Benchmarking#Embedding#Tools#Research release
why featured
HKR-K passes with a concrete method and a 17.64-point result. HKR-H and HKR-R are weak, so this is useful eval research rather than a same-day industry story.
editor take
ECC lifts ranking quality by 17.64 points; the useful jab is simple: semantic clusters are a bad proxy for capability clusters.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
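The Bradley-Terry model named in the ECC item above turns pairwise wins into per-model strengths. The sketch below fits those strengths with the standard minorization-maximization update. How ECC ties this to calibrated clusters is not disclosed in the summary, so the win counts and function names here are assumptions for one hypothetical cluster.

```python
def bt_win_prob(strength_i, strength_j):
    """Bradley-Terry probability that model i beats model j."""
    return strength_i / (strength_i + strength_j)


def fit_bradley_terry(wins, n_iter=200):
    """Fit strengths from wins[i][j] = number of times model i beat model j.

    Assumes every model has at least one win, so no strength collapses to zero.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(new_p)
        p = [x / norm for x in new_p]  # strengths are only defined up to scale
    return p


# Hypothetical pairwise results for three models on one query cluster.
wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
strengths = fit_bradley_terry(wins)
print([round(s, 3) for s in strengths])
print(round(bt_win_prob(strengths[0], strengths[2]), 3))
```

A capability profile in this framing would be one such strength vector per cluster, which is why cluster quality matters for the final ranking.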
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models
LayerRoute adds per-layer routers and rank-8 LoRA adapters to all 24 blocks of Qwen2.5-0.5B-Instruct, then trains for 3,000 steps on agentic data to skip 15.25% of FLOPs for tool calls and 2.34% for planning steps.
#Agent#Inference-opt#Fine-tuning#Qwen
why featured
HKR-H/K/R all pass, but this is a single arXiv inference-optimization paper tested on Qwen2.5-0.5B, so scope and transfer remain limited; it fits the 60–71 band.
editor take
LayerRoute skips 15.25% FLOPs on tool calls but 2.34% on planning; agent inference savings live in step classification.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
MESA: Improving MoE Safety Alignment via Decentralized Expertise
The paper proposes MESA for MoE-based LLM safety alignment, using optimal transport to reallocate safety duties across experts and routing constraints to activate decentralized modules; the authors report stronger defense on harmful benchmarks while preserving helpfulness, and the code is available on GitHub.
#Alignment#Safety#MESA#Research release
why featured
HKR-K is clear with a concrete mechanism and code; HKR-R lands for MoE safety alignment. HKR-H is weak, and this is a single arXiv method paper without reported gains or adoption, so it stays in the 60–71 band.
editor take
MESA frames MoE safety sparsity as OT allocation plus router constraints; I buy the problem, but base models and gains are undisclosed.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Rethinking the Role of Temperature in Large Language Model Distillation
The paper compares FKL and RKL under temperature scaling in LLM distillation: RKL outperforms FKL at τ=1, while FKL consistently exceeds RKL at higher temperatures across instruction-following benchmarks.
#Fine-tuning#Benchmarking#Research release#Benchmark
why featured
HKR-H and HKR-K pass: the paper claims temperature flips the FKL/RKL ranking, a testable distillation recipe. Its reach is mostly training researchers, below product-update or model-release weight.
editor take
Temperature flips the KL story: RKL wins at τ=1, FKL wins higher; I don’t buy KL rankings without τ disclosed.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
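The FKL/RKL comparison in the distillation item above depends on how temperature reshapes both distributions. The sketch below computes forward KL, KL(teacher || student), and reverse KL, KL(student || teacher), at several temperatures. The logits are made up, and applying the same τ to teacher and student is an assumption, not the paper's stated protocol.

```python
import math


def softmax(logits, tau=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / tau for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill_losses(teacher_logits, student_logits, tau):
    """Forward and reverse KL between temperature-softened distributions."""
    p = softmax(teacher_logits, tau)  # teacher
    q = softmax(student_logits, tau)  # student
    return {"fkl": kl(p, q), "rkl": kl(q, p)}


# Hypothetical next-token logits over a four-token vocabulary.
teacher = [4.0, 1.0, 0.5, -2.0]
student = [2.5, 2.0, 0.0, -1.0]
for tau in (1.0, 2.0, 4.0):
    losses = distill_losses(teacher, student, tau)
    print(tau, round(losses["fkl"], 4), round(losses["rkl"], 4))
```

Because the two divergences weight the same mismatch differently, a ranking reported at one τ says little about another, which is the editor's point about disclosing τ.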
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Perception First: A Frontier Native-Video Model with Self-Consistency for Implicit Video QA
The paper evaluates multiple-choice video QA on VRR-QA across Qwen2.5-VL, Qwen3-VL, InternVL3, Gemma-3, Video-R1, and VideoChat-R1.5; a prompt that injects monocular depth cues lowers test accuracy by 5.8 points.
#Multimodal#Vision#Reasoning#Qwen
why featured
HKR-H and HKR-K pass: the counterintuitive depth-prompt result and 5.8-point number add signal. HKR-R is weak; this remains an arXiv multimodal evaluation paper, not a product or competitive industry event.
editor take
VRR-QA depth prompting drops accuracy 5.8 points; piling CoT onto weak video perception just amplifies noise.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
FLaG: Fine-Grained Latent Grouping for Hallucination Detection
FLaG models LLM hallucination detection as mechanism-aware evidence aggregation, softly routes each instance to multiple latent evidence groups with an energy-based mechanism, and combines group-conditional reliability signals through log-marginal aggregation; the paper says the frozen-model head leaves the underlying model unchanged, but the RSS snippet does not disclose the number of benchmarks, LLM backbones, or overhead figures.
#Safety#Alignment#Benchmarking#Research release
why featured
HKR-K/R pass: the post gives a concrete detection mechanism and targets LLM reliability. HKR-H is weak, and benchmark count or effect sizes are not disclosed, so this stays in the all band.
editor take
FLaG adds a frozen head for hallucination detection; benchmarks and overhead are undisclosed, so don't buy the SOTA claim yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
CardioLens: Revealing the Clinical Reality Gap of MLLMs via Multi-Sequence Cardiac MRI Evaluations
CardioLens evaluates 24 MLLMs on multi-sequence cardiac MRI using 473,896 slices and 13,494 verified QA pairs, finding poor performance that degrades across image understanding, report generation, and diagnosis, while random, clinical, and data-driven slice selection protocols usually change results by only about 1%.
#Multimodal#Vision#Benchmarking#CardioLens
why featured
HKR-K/R pass: the dataset scale and 24-model evaluation add concrete signal, and the real-workflow drop hits medical deployment safety. HKR-H is weak, and the cardiac MRI niche keeps it in 60–71.
editor take
CardioLens tests 24 MLLMs on 473,896 slices; 1% slice-protocol swings pin the failure on cross-sequence evidence, not sampling.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Coarse-to-Fine Compositional Diffusion for Long-Horizon Planning
CoFi separates global scaffold formation from local detail refinement at inference time, then reuses the same pretrained local diffusion prior; across long-horizon robotic planning, panoramic image generation, and long video generation, it reports better global coherence and local sample quality than prior compositional baselines with 2-8x fewer denoiser evaluations.
#Robotics#Vision#Inference-opt#Research release
why featured
HKR-K and HKR-R pass: the paper gives a mechanism and a 2-8x efficiency claim tied to planning cost. As a single arXiv research item with no code or product release disclosed, it stays in the 60–71 band.
editor take
CoFi uses 2-8x fewer denoiser calls for long-horizon composition; I like that it changes inference, not the pretrained prior.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
IntraShuffler: Privacy-Preserving Framework for Heterogeneous Differential Privacy Federated Learning
IntraShuffler targets heterogeneous DP federated learning by grouping clients into privacy-compatible buckets and shuffling parameters within each bucket; across four datasets, it reduces gradient recoverability by over 60% and lowers surrogate inference accuracy from 0.78 to 0.33 while preserving epsilon-aware aggregation and comparable utility.
#Safety#Research release#Safety/alignment
why featured
HKR-K and HKR-R pass: the paper gives checkable privacy metrics and maps to training-security concerns. The DP federated-learning scope is narrow, so it stays in the 60–71 band.
editor take
IntraShuffler cuts surrogate inference from 0.78 to 0.33; ε-aware FL aggregation leaks structure, not just noise budget.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
VRPRM: Process Reward Modeling via Visual Reasoning
VRPRM trains with 3.6K CoT-PRM SFT samples and 50K non-CoT PRM RL samples, surpasses a non-thinking PRM trained on 400K total samples, and reaches up to 118% relative improvement over the base model in the BoN experiment.
#Reasoning#Vision#Alignment#VRPRM
why featured
HKR-K passes with concrete dataset sizes and a 118% BoN gain. HKR-H is weak and HKR-R is narrow; no hard exclusion applies, so this stays an interesting research item rather than featured.
editor take
VRPRM beats a 400K non-thinking PRM with 53.6K samples; the 118% BoN gain is nice, but task coverage is undisclosed.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Escaping the Mode Lottery: Multi-Response Training Improves Language Model Generalization
The paper studies multi-response training that keeps multiple answers per prompt, and explains its distributional generalization gains through a variance-budget tradeoff, with the largest gains reported under high response diversity and low prompt redundancy conditions.
#Fine-tuning#Alignment#Benchmarking#Research release
why featured
HKR-H/K pass: the title has a “Mode Lottery” hook and the summary gives a multi-response training plus variance-budget mechanism. No model scale, dataset, gain size, or artifact is disclosed, so this stays research-interest only.
editor take
MRT helps most with high response diversity and low prompt redundancy; treating RLHF preference picks as distribution samples is sloppy.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
CalArena: A Large-Scale Post-Hoc Calibration Benchmark
CalArena introduces a post-hoc calibration benchmark with nearly 2,000 experiments across tabular and computer vision tasks, covering binary, multiclass, and large-scale classification, and releases data, code, and evaluation tools for reproducible comparison of calibration methods.
#Benchmarking#CalArena#arXiv#Research release
why featured
HKR-K/R pass: the scale, task coverage, and open artifacts are concrete additions tied to model reliability. HKR-H is weak, and the research-benchmark angle stays below the 72 featured bar.
editor take
CalArena covers nearly 2,000 calibration runs; if PHI beats ECE, many old calibration leaderboards age badly.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
MOSAIC: Modular Orchestration for Structured Agentic Intelligence and Composition
MOSAIC frames automated data science as staged model selection, and on financial time-series forecasting and generation it builds a blueprint from task profiles, retrieved prior cases, source-code modules, and execution feedback; the abstract says experiments beat AutoML and agentic baselines, but the snippet does not disclose numeric results.
#Agent#RAG#Code#MOSAIC
why featured
HKR-K and HKR-R pass: MOSAIC describes a staged orchestration mechanism for automated data science on financial time-series tasks. No result numbers, open-source status, or reproducible setup are disclosed, so it stays in the normal research-release band.
editor take
MOSAIC reportedly beats AutoML and agent baselines on finance time-series, but no numbers are disclosed; blueprint-constrained code is credible, scoreless wins are not.
HKR breakdown
hook knowledge resonance
open source
65
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Improving Visual Representation Alignment Generation with GRPO
VRPO replaces REPA’s static alignment loss with a reward-guided generative representation policy optimization objective, and on ImageNet-256x256 it improves FID by up to 1.8 points while training 2.3x faster than REPA under identical compute budgets.
#Vision#Fine-tuning#Alignment#Research release
why featured
HKR-H/K pass: GRPO in visual alignment is a real technical hook, with FID and training-speed numbers. The work is still a narrow research-method story, so it stays in the 60–71 band.
editor take
VRPO beats REPA by up to 1.8 FID and trains 2.3x faster on ImageNet-256; I want reward ablations before buying the RL branding.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
CAREF: Calibration-Aware Regularization for Explanation Faithfulness Without Rationale Supervision
CAREF optimizes predictive accuracy and explanation faithfulness with a unified LSCED loss, and its CAREF-AQ variant reaches 89.04 average accuracy and 81.00 nBERT explanation alignment on four NLE benchmarks using 6.43% trainable parameters.
#Fine-tuning#Alignment#Interpretability#CAREF
why featured
HKR-K and HKR-R pass: the post gives a concrete loss and benchmark numbers, tied to explanation faithfulness. HKR-H is weak, and the narrow arXiv method scope keeps it below featured.
editor take
CAREF-AQ hits 89.04 accuracy with 6.43% trainable params; nBERT faithfulness still needs tougher causal checks.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Multimodal Music Recommendation System Using LLMs
The paper adds audio embeddings, lyric embeddings, LLM-generated semantic metadata, and listening completion ratios to LastFM-1K, and reports that content-based features improve ID-only baselines by up to 95% in Recall and 79% in NDCG.
#Multimodal#Embedding#Fine-tuning#LastFM-1K
why featured
HKR-H and HKR-K pass: the paper reports a specific LastFM-1K multimodal setup and metric gains. HKR-R is weak because music recommendation is niche for AI practitioners, so this stays in the 60–71 band.
editor take
LastFM-1K gains up to 95% Recall from 4 content signals; I’d credit completion ratios before LLM metadata.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Turning Back Without Forgetting: Selective Backward Refinement for Parameter-Efficient Continual Learning
SABER proposes a replay-free backward refinement framework for prompt-based continual learning, using prompt-gradient geometry and loss-distribution similarity to select beneficial task updates, then restricting changes to non-interfering prompt-space directions; the abstract reports experiments across multiple continual learning benchmarks and pretrained backbones including T5-Large, LLaMA, and Qwen.
#Fine-tuning#Memory#Benchmarking#T5-Large
why featured
HKR-K/R pass because the post gives a concrete continual-learning mechanism and tests on T5-Large, LLaMA, and Qwen. Missing gains, datasets, and reproducibility details keep it in the lower 60–71 band.
editor take
SABER tests replay-free backward refinement on T5-Large, LLaMA, and Qwen; no gains disclosed, so treat “positive transfer” as unproven.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Explainable AI Through a Democratic Lens: DhondtXAI for D'Hondt-Projected Feature Attribution
The paper presents DhondtXAI, a SHAP-independent tabular attribution framework that allocates feature seats with the D’Hondt rule; on WDBC and diabetes datasets, it reports Spearman rho values of 0.9273 and 0.9353 against SHAP under aligned settings.
#Interpretability#DhondtXAI#SHAP#LIME
why featured
HKR-H/K pass: an election seat-allocation rule is mapped to tabular attribution, with two concrete correlation results. HKR-R fails because it is niche XAI research without product impact or a practitioner-wide debate hook.
editor take
DhondtXAI hits 0.9273/0.9353 rho versus aligned SHAP; I buy the complement, not a SHAP replacement.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
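The D'Hondt rule behind the DhondtXAI item above is a highest-averages method: each seat goes to the party with the largest quotient votes / (seats won + 1). The sketch below applies it to feature scores treated as votes. How the paper derives those votes is not disclosed in the summary, so the feature names and scores are hypothetical.

```python
def dhondt_allocate(votes, seats):
    """Allocate seats with the D'Hondt highest-averages rule.

    votes: dict mapping a name to a nonnegative score.
    Ties go to the earlier key in dict order.
    """
    allocation = {name: 0 for name in votes}
    for _ in range(seats):
        # The next seat goes to the largest quotient votes / (seats_won + 1).
        winner = max(votes, key=lambda name: votes[name] / (allocation[name] + 1))
        allocation[winner] += 1
    return allocation


# Hypothetical feature scores standing in for party votes.
feature_votes = {"feature_a": 100, "feature_b": 80, "feature_c": 30, "feature_d": 20}
seat_share = dhondt_allocate(feature_votes, seats=8)
print(seat_share)
# {'feature_a': 4, 'feature_b': 3, 'feature_c': 1, 'feature_d': 0}
```

The seat count acts as a resolution knob: with few seats, small features round to zero, which is one reason rank agreement with SHAP does not make the two interchangeable.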
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models
RefLoRA selects the optimal low-rank factorization at each step to minimize an upper loss bound, and the paper evaluates convergence and performance on DeBERTaV3, LLaMA-7B, LLaMA2-7B, and LLaMA3-8B across natural language understanding and commonsense reasoning tasks.
#Fine-tuning#Inference-opt#Benchmarking#DeBERTaV3
why featured
HKR-K and HKR-R pass: RefLoRA gives a concrete training mechanism and tests DeBERTaV3 plus LLaMA 7B/8B variants. As a single arXiv fine-tuning method, it stays incremental and below featured.
editor take
RefLoRA refactors low-rank matrices each step; gains are undisclosed here, so treat it as a LoRA stability patch.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Quality Audio Prototyping: A Prototype System for Unified Sound Retrieval and Procedural Generation
The paper introduces QuAP, a prototype that combines similarity-based audio retrieval, real-time procedural sound models, and a rule-based parameter assistant in one interface; preliminary evaluation reports statistically significant quality gains in five of six embedded synthesis models and a user study with 16 practitioners.
#Audio#Tools#Quality Audio Prototyping#QuAP
why featured
HKR-K passes with a concrete mechanism and evaluation: 5 of 6 synthesis models improved and 16 practitioners participated. The topic is niche audio tooling with a dry paper angle, so it fits the 60–71 interesting band.
editor take
QuAP was evaluated with 16 practitioners and 6 synthesis models; it smells like Copilot for sound tools, with small-sample evidence.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
The Representation-Rationalizability Tradeoff in Reward Learning
The paper decomposes excess cross-entropy loss in RLHF reward learning into two terms: a representational term that shrinks with a richer φ, and an aggregation term that grows when richer representations expose more comparisons that no scalar reward can rank consistently.
#Alignment#Fine-tuning#Reasoning#Research release
why featured
HKR-K is clear: the representation and aggregation terms give RLHF reward learning a concrete mechanism. HKR-R is narrow to alignment/reward-modeling readers, and HKR-H is too academic for featured.
editor take
This decomposes RLHF loss into representation and aggregation terms; DPO is hit too, as richer φ exposes scalar-inconsistent preference cycles.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
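The aggregation term in the reward-learning item above concerns comparisons that no scalar reward can rank consistently. For strict preferences, that happens exactly when the preference graph contains a directed cycle. The sketch below checks for one; the response labels and function name are illustrative assumptions, and the paper's own formalism is not reproduced here.

```python
def has_preference_cycle(comparisons):
    """Return True if (winner, loser) pairs contain a directed cycle.

    A cycle such as A > B > C > A means no scalar reward r can satisfy
    r(winner) > r(loser) for every pair.
    """
    graph = {}
    for winner, loser in comparisons:
        graph.setdefault(winner, set()).add(loser)
        graph.setdefault(loser, set())

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge closes a cycle
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


cyclic = [("A", "B"), ("B", "C"), ("C", "A")]
consistent = [("A", "B"), ("B", "C"), ("A", "C")]
print(has_preference_cycle(cyclic), has_preference_cycle(consistent))
# True False
```

A richer representation can separate responses that a coarse one lumps together, which is how new cycles can become visible as φ grows.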
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Per-Group Error, Not Total MSE: Fine-Tuning VLA Models for 11-DoF Mobile Manipulation
The paper fine-tunes SmolVLA and π0.5 on the 11-DoF Toyota HSR, and 60 real-robot trials show π0.5 80k scoring 4.0/4, above expert-only 3k at 3.75/4 and HSR-SmolVLA at 3.5/4.
#Robotics#Fine-tuning#Benchmarking#Toyota
why featured
HKR-K is solid: 60 real-robot trials and a 4.0/4 result add testable signal. HKR-H and HKR-R are weak because the VLA robotics framing is niche, so it stays in the 60–71 band.
editor take
π0.5 80k scores 4.0/4 on 60 trials; I buy it: total MSE lies on heterogeneous robot joints.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms
The paper argues that generative pre-training should design the inference procedure before the training objective, using three mechanisms: DDIM-style samplers’ target-time limitation, multi-token prediction’s joint-distribution limitation, and flow-map plus few-step distillation methods that parameterize long-range inference moves.
#Inference-opt#Reasoning#Research release
why featured
HKR-H and HKR-K pass: the angle is unusual and the summary names three mechanisms. No experiment scale, gain numbers, or code are disclosed, so HKR-R fails and the item stays in the 60–71 research-interest band.
editor take
This frames AR and diffusion as inference-procedure choices; its 3 mechanisms cut to one point: objectives cannot rescue bad sampling factorization.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Learning Fine-grained Parameter Sharing via Sparse Tensor Decomposition
FiPS combines cross-block parameter sharing, low-rank factorization, and sparsity for transformer MLP compression, reducing ViTs by up to 33% with under 1% top-1 accuracy loss on ImageNet-1k and reaching 57% compression when combined with fine-tuning.
#Inference-opt#Fine-tuning#FiPS#Gemma
why featured
HKR-K is backed by a concrete compression result and mechanism, and HKR-R touches inference cost. HKR-H is weak; the post lacks code, deployment evidence, or LLM-scale results, so it stays in all.
editor take
FiPS cuts ViTs by up to 33% with under 1% ImageNet loss. The wild part: 3-bit FiPS beats 2-bit QAT perplexity on Gemma-2-2B at 8x compression.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Semantic-Geometric Task Representations for Bimanual Manipulation
The paper introduces a semantic-geometric graph task representation for bimanual manipulation, using an MPNN encoder and Transformer decoder to predict future actions, objects, and motions across 11 tasks from two datasets.
#Robotics#Reasoning#arXiv#Research release
why featured
HKR-K passes: the item gives a semantic-geometric representation, 11 bimanual tasks, and model architecture. HKR-H and HKR-R are weak, so this stays as useful robotics research, not featured industry news.
editor take
SGTR spans 11 bimanual tasks; only 2 real-robot successes are disclosed, so I want failure splits and robot-set size.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Generative AI and Digital Ecosystem Resilience: A Proactive Lifecycle-Based Survey
The arXiv survey uses the C5 Interaction Model to review proactive detection of synthetic content threats, covering Coordinated Inauthentic Behavior, multi-layer graph coordination detection, Hawkes processes, and agentic AI systems.
#Agent#Embedding#Safety#Research release
why featured
This is a safety survey, not a model release or reproducible experiment. HKR-K has concrete frameworks and detection mechanisms, HKR-R hits synthetic-content risk, but HKR-H is weak, so it stays in all.
editor take
This survey covers C5, CIB, Hawkes, and agentic AI, but reports no benchmarks; I don’t buy the proactive-detection wrapper.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
KDH-CAD: Knowledge-data hybrid CAD learning under data scarcity
KDH-CAD integrates foundation-model priors, structured CAD knowledge from textbooks and tutorials, and small labeled datasets for mechanical part classification, reaching 92.6% accuracy with 250 training samples and 95.8% with 1,000 samples without fine-tuning the foundation model.
#Fine-tuning#Benchmarking#arXiv#KDH-CAD
why featured
HKR-K is strong: KDH-CAD reports 92.6% accuracy with 250 CAD samples and 95.8% with 1,000, without fine-tuning the foundation model. The CAD classification niche lacks product impact, so it stays in the 60–71 all band.
editor take
KDH-CAD hits 92.6% with 250 samples and no foundation-model tuning; for CAD, this beats another synthetic-data treadmill.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Learning to Sample From Diffusion Models via Inverse Reinforcement Learning
Bourdrez et al. introduce an inverse reinforcement learning framework that learns diffusion sampling strategies without retraining the denoiser, modeling sampling as a finite-horizon MDP; on ImageNet-64, one training run replaces exhaustive grid search at up to 9x lower cost with 16% inference overhead.
#Inference-opt#Reasoning#Constant Bourdrez#Alexandre Vérine
why featured
HKR-K/R pass: the paper gives a concrete mechanism and ImageNet-64 numbers, with a real cost angle. HKR-H is weak, and the IRL sampler topic is technical, so it stays in the 60–71 band.
editor take
Bourdrez et al. use IRL for sampling, cutting ImageNet-64 grid-search cost by up to 9x; I buy the tuning win, not the 16% inference tax.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
CARE-RL: Capability-Aware Reinforcement Learning for Mitigating Cross-Domain Conflicts
CARE-RL reports Total Avg scores of 47.9 on Qwen2.5-7B and 50.7 on Qwen3-4B, combining PA-GRM reward generation with DACSP capability subspace projection across math, chat, and instruction-following benchmarks.
#Reasoning#Alignment#Fine-tuning#Qwen
why featured
HKR-K and HKR-R pass: the item gives Qwen2.5-7B/Qwen3-4B scores and concrete reward/subspace mechanisms. HKR-H is weak, and this is a single arXiv method paper with no disclosed code or production impact.
editor take
CARE-RL scores 47.9 on Qwen2.5-7B; I buy DACSP more than PA-GRM’s protocol-wrapped reward judging.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Exploring and Exploiting Stability in Latent Flow Matching
The paper shows that LFM models preserve similar outputs under data reduction and architectural shrinkage with identical noise seeds, then uses three sample-scoring criteria and a two-model coarse-to-fine trajectory to save data and achieve more than 2x inference speedup with comparable generative outputs.
#Inference-opt#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper gives three scoring criteria and a testable >2x inference-speed claim. The LFM scope is narrow and lacks product uptake, so it stays in the interesting band, not featured.
editor take
LFM keeps outputs similar under identical noise seeds and gets 2x+ speedup; if reproducible, this pressures distillation-heavy pipelines.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Sim-to-Real Transfer for Muscle-Actuated Robots via Generalized Actuator Networks
The authors introduce GenAN, a sim-to-real pipeline that learns actuator models from joint position trajectories and transfers simulation-trained reaching, ball-in-a-cup, and table-tennis policies to PAMY2, a four-degree-of-freedom tendon-driven robot arm powered by pneumatic artificial muscles.
#Robotics#PAMY2#GenAN#Research release
why featured
HKR-H comes from the muscle-actuated robot/table-tennis angle, and HKR-K has GenAN plus a 4-DOF PAMY2 test. HKR-R is weak because this is niche robotics control, not a broad AI-industry trigger.
editor take
GenAN identifies actuation from joint trajectories and transfers three tasks on 4-DoF PAMY2; no torque sensors is the sharp part.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
When Do Attention Circuits Form? Developmental Trajectories of Capability and Attention-Sink Emergence Across Three 1B-Class Architectures
The paper tracks 30 mechanistic-interpretability runs across Pythia 1B, OLMo 1B, and OLMoE 1B-7B, finding that in DCLM-trained models induction circuits form 10-20x earlier in tokens than BOS-attractor attention sinks.
#Interpretability#Reasoning#Pythia#OLMo
why featured
HKR-H and HKR-K pass: the training-time circuit hook is specific, and the post gives 30 runs plus a 10-20x token-timing gap. HKR-R is weak, and the mechanistic-interpretability angle stays niche, so this sits in all.
editor take
Across 30 Pythia/OLMo/OLMoE runs, induction appears 10–20x earlier in tokens; stop bundling capability circuits with BOS sinks.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
SurrogateSHAP: Training-Free Contributor Attribution for Text-to-Image Models
SurrogateSHAP replaces per-subset retraining with inference from a pretrained model and uses a gradient-boosted tree to derive Shapley values analytically, with evaluation across 3 attribution tasks covering DDPM-CFG on CIFAR-20, Stable Diffusion on Post-Impressionist artworks, and FLUX.1 on Fashion-Product data.
#Multimodal#Vision#Interpretability#SurrogateSHAP
why featured
HKR-K passes with a concrete method and evaluation setup. HKR-H and HKR-R are weak; as a single arXiv interpretability paper with no product uptake signal, it fits the 60–71 interesting band.
editor take
SurrogateSHAP covers 3 T2I attribution tasks; I buy the audit angle, but fair payment still needs a pricing mechanism.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
MidSteer: Optimal Affine Framework for Steering Generative Models
The paper introduces MidSteer, an affine steering framework that links standard behavior removal to LEACE, defines LEACE-Switch for concept switching, and evaluates directed minimal-disturbance transformations across vision diffusion models and LLMs.
#Alignment#Safety#Multimodal#MidSteer
why featured
HKR-K and HKR-R pass: the post gives MidSteer, a LEACE-special-case proof, and tests on diffusion models plus LLMs. HKR-H is weak, and the arXiv summary lacks code, metrics, or broad replication.
editor take
MidSteer frames behavior removal as LEACE; I buy the theory cleanup, but LLM tasks and baselines aren't disclosed.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Position: Beyond Sensitive Attributes, ML Fairness Should Quantify Structural Injustice via Social Determinants
The paper argues that ML fairness audits should quantify social determinants before mitigation, using a college admissions model, a U.S. census demographic study, and a breast cancer screening application to show that mitigation centered only on sensitive attributes can introduce structural injustice.
#Alignment#Safety#arXiv#Research release
why featured
HKR-K and HKR-R pass: it offers a concrete fairness-audit mechanism and examples, but no new data, tool, or reproducible test. No hard exclusion; this fits the interesting-but-not-featured band.
editor take
Three cases hit a stale fairness habit: sensitive-attribute fixes can treat structural variables as noise; I buy the warning.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Short-form Text Rewriting with Phi Silica
The paper adapts Phi Silica for short-form rewriting using public slide-deck text, GPT-5-chat supervision, parameter-efficient fine-tuning, and LLM-as-judge evaluation; the abstract reports improved semantic fidelity, reduced hallucinations, and a higher preference win rate against GPT-5-chat rewrites, but it does not disclose dataset size or exact scores.
#Fine-tuning#Alignment#Benchmarking#Phi Silica
why featured
HKR-K and HKR-R pass: the paper gives a reproducible fine-tuning/evaluation setup, but no concrete win-rate or hallucination numbers are disclosed, and this is not a flagship model release.
editor take
Phi Silica beats GPT-5-chat by GPT-5-chat judging; no dataset size or scores disclosed, so I’d treat this as a neat distillation loop.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Logit Distillation on Manifolds: Mapping by Learning
The paper introduces layer-wise and point-wise projection mappings that align student and teacher representations during training, and when combined with LoRA injection, the method reduces student trainable parameters to under 1% of the teacher model while improving WER over other distillation methods in ablation studies.
#Fine-tuning#Inference-opt#Benchmarking#Research release
why featured
HKR-K/R pass: the paper gives a concrete compression mechanism, <1% trainable-parameter claim, and WER ablation. HKR-H is weak, and single arXiv work without a release or major lab link stays in all.
editor take
Logit Distillation cuts trainable params below 1% of the teacher; I want the full WER table, not an RSS claim.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Calibrating Uncertainty for Zero-Shot Adversarial CLIP
The paper proposes an adversarial fine-tuning objective for CLIP that reparameterizes outputs as Dirichlet concentration parameters and reports improved uncertainty calibration across multiple zero-shot benchmarks, while the abstract does not disclose benchmark names, attack settings, or numeric gains.
#Vision#Alignment#Benchmarking#CLIP
why featured
HKR-K/R pass: the mechanism targets adversarial robustness and uncertainty calibration for zero-shot CLIP. Benchmarks, gains, and reproducible setup are not disclosed, and the angle is narrow, so it stays in the lower interesting band.
editor take
CLIP outputs become Dirichlet concentrations; no attacks or gains disclosed, so treat the calibration claim as unverified.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
MomentKV: Closing the Directional Gap in KV Cache Eviction for Long-Context Inference
MomentKV keeps compact moment statistics for evicted tokens, including count, key mean, value mean, and value-key covariance, and tests the method on LongBench and RULER with LLaMA-3.1-8B-Instruct and Qwen3-4B-Instruct; the abstract says it beats baselines at every cache budget but does not disclose exact scores.
#Inference-opt#Memory#Benchmarking#LLaMA
why featured
HKR-K/R pass: the mechanism is concrete and tied to long-context cost. HKR-H is weak, and the summary gives no LongBench or RULER scores, so this stays mid-band research signal.
editor take
MomentKV keeps four moment statistics for evicted KV entries; no LongBench/RULER scores are disclosed, but fixing directional mismatch is a stronger thesis than yet another eviction heuristic.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
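The four statistics named in the MomentKV summary (count, key mean, value mean, value-key covariance) can be maintained incrementally as tokens leave the cache. A minimal sketch follows, assuming a Welford-style running update; the class name `EvictedMoments` is hypothetical, and how the paper feeds these moments back into attention is not stated in the abstract.

```python
class EvictedMoments:
    """Running moments over evicted (key, value) pairs (hypothetical helper)."""

    def __init__(self, d_k, d_v):
        self.n = 0
        self.key_mean = [0.0] * d_k
        self.val_mean = [0.0] * d_v
        # co[i][j] accumulates sum of (v_i - mean_v_i) * (k_j - mean_k_j)
        self.co = [[0.0] * d_k for _ in range(d_v)]

    def evict(self, key, value):
        """Fold one evicted token into the running statistics."""
        self.n += 1
        dk_old = [k - m for k, m in zip(key, self.key_mean)]
        self.key_mean = [m + d / self.n for m, d in zip(self.key_mean, dk_old)]
        dv_old = [v - m for v, m in zip(value, self.val_mean)]
        self.val_mean = [m + d / self.n for m, d in zip(self.val_mean, dv_old)]
        # Welford co-moment update: old value deviation times new key deviation.
        dk_new = [k - m for k, m in zip(key, self.key_mean)]
        for i, dv in enumerate(dv_old):
            for j, dk in enumerate(dk_new):
                self.co[i][j] += dv * dk

    def covariance(self):
        """Population value-key covariance; zeros while nothing is evicted."""
        n = max(self.n, 1)
        return [[c / n for c in row] for row in self.co]
```

Storage is O(d_k * d_v) per accumulator regardless of how many tokens are evicted, which is the appeal of keeping moments instead of the tokens themselves.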
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Improving Diffusion Planners by Self-Supervised Action Gating with Energies
SAGE re-ranks sampled diffusion-planner trajectories at inference time using JEPA latent prediction error as an energy score, combines it with value estimates for action selection, and requires no environment rollouts or policy retraining across locomotion, navigation, and manipulation benchmarks.
#Agent#Reasoning#Inference-opt#SAGE
why featured
HKR-K passes with a clear inference-time gating mechanism, and HKR-R fits robotics/agent planning concerns. HKR-H is weak, no improvement numbers are disclosed, and the arXiv paper remains specialist, so it stays in 60–71.
editor take
SAGE re-ranks trajectories with JEPA error; no rollouts or retraining makes this inference patch more practical than another policy-training loop.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
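The SAGE re-ranking step can be illustrated with a small scorer. This is a sketch under stated assumptions: the energy is taken as mean squared latent prediction error, and it is combined with the value estimate linearly through a hypothetical `weight`; the summary does not say how the paper combines the two.

```python
def latent_energy(predicted, encoded):
    """Mean squared error between predicted and encoded latents."""
    return sum((p - e) ** 2 for p, e in zip(predicted, encoded)) / len(predicted)


def select_trajectory(candidates, weight=1.0):
    """Return the index of the candidate maximising value - weight * energy.

    Each candidate is (value_estimate, predicted_latent, encoded_latent).
    The linear combination is an assumption made for illustration.
    """
    scores = [v - weight * latent_energy(p, e) for v, p, e in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

With `weight=0` the selector falls back to pure value ranking, so the energy term acts only as a gate on trajectories the latent model finds implausible.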
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Fair Finetuning Mitigates Distribution Inference Attacks
The paper proposes Fair Fine-tuning, which fine-tunes trained models on complementary-distribution samples under an Equalized Odds constraint, and reports adversarial accuracy gaps below the 0.1 detection threshold across six datasets.
#Fine-tuning#Safety#Alignment#Research release
why featured
HKR-K/R pass: the paper gives a concrete defense mechanism and 6-dataset result, with privacy risk relevance for fine-tuning. Its niche arXiv security angle lacks product or industry impact, so it stays in 60–71.
editor take
Fair Fine-tuning cuts DIA gaps below 0.1 on six datasets; EO helps, but the accuracy-cost curve is undisclosed.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
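For readers unfamiliar with the Equalized Odds constraint mentioned in the Fair Fine-tuning summary, the quantity it bounds can be computed directly. The sketch below uses the standard definition for binary labels and two groups; the function name is hypothetical, and how the paper enforces the constraint during fine-tuning is not covered here. Note that this gap is distinct from the adversarial accuracy gap the summary compares against 0.1.

```python
def equalized_odds_gap(y_true, y_pred, group):
    """Largest between-group difference in positive-prediction rate,
    taken over true label 0 (FPR gap) and true label 1 (TPR gap).

    Assumes binary labels, binary predictions, groups coded 0/1, and
    at least one example in every (label, group) cell.
    """
    def rate(label, g):
        preds = [p for t, p, a in zip(y_true, y_pred, group)
                 if t == label and a == g]
        return sum(preds) / len(preds)

    return max(abs(rate(label, 0) - rate(label, 1)) for label in (0, 1))
```

A gap of zero means both groups share the same true-positive and false-positive rates.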
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Domain-Shift-Aware Conformal Prediction for Large Language Models
The paper proposes Domain-Shift-Aware Conformal Prediction, which reweights calibration samples by proximity to the test prompt under domain shift, and reports more reliable coverage than standard conformal prediction on MMLU while maintaining efficiency.
#Alignment#Benchmarking#arXiv#MMLU
why featured
HKR-K passes via a concrete mechanism and MMLU setup; HKR-H is weak and HKR-R is narrow. This is useful for calibration/evaluation practitioners, but not broad enough for featured.
editor take
DS-CP reweights calibration by prompt proximity; MMLU works, but cross-task shift and open-set details are undisclosed.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
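The reweighting idea in the DS-CP summary can be sketched with a standard weighted conformal quantile. This is an illustrative construction, not the paper's exact procedure: the weights are assumed to be given (for example from embedding similarity to the test prompt), and `test_weight` is the mass reserved for the test point itself.

```python
import math


def weighted_conformal_threshold(scores, weights, alpha, test_weight=1.0):
    """Smallest calibration score whose cumulative normalised weight
    reaches 1 - alpha; returns infinity if that level is never reached.

    scores:  nonconformity scores of calibration samples
    weights: proximity weights (assumed precomputed)
    """
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1.0 - alpha:
            return score
    return math.inf
```

Uniform weights recover ordinary split conformal prediction; concentrating weight on nearby calibration prompts moves the threshold toward their scores.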
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning
The paper proposes OGLS-SD, an outcome-guided logit-steering framework that contrasts teacher logits from successful and failed on-policy trajectories and uses verifiable outcome rewards for token-level guidance; experiments on mathematical reasoning benchmarks report more stable self-distillation and higher performance than standard OPSD and other variants.
#Reasoning#Fine-tuning#Alignment#Research release
why featured
HKR-K passes: the article gives a concrete outcome-guided logit steering mechanism and claims math-benchmark gains over OPSD. HKR-H and HKR-R are weak, so this stays in the 60–71 all band.
editor take
OGLS-SD contrasts teacher logits from success/failure traces. Scores are undisclosed; I’d file it as an OPSD stability patch.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
How Can Reinforcement Learning Achieve Expert-level Placement?
The paper proposes inferring step-by-step trajectories from final expert chip layouts and training a reward model with demonstrations or preferences; experiments report that the framework learns from even a single design and generalizes to unseen cases.
#Agent#Reasoning#Research release
why featured
HKR-H/K pass via the reverse-trajectory mechanism and single-design generalization claim. No benchmark numbers, code, or product path are disclosed, and EDA placement is narrow, so it stays in the 60–71 band.
editor take
The paper claims one expert layout trains the reward model; benchmark scale is undisclosed, so treat generalization lightly.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
The Attribution Contract: Feature Attribution for Generative Language Models
The paper introduces the Attribution Contract for generative language model attribution, specifying five items: the explained output, eligible features, assumed generation process, held-fixed variables, and attributed model score.
#Interpretability#Research release
why featured
HKR-K passes: the paper defines a five-part attribution contract for generative LMs. HKR-H/R are weak; no experiment, code, or production claim is disclosed, so it stays in all.
editor take
Attribution Contract names 5 required choices; I buy the move from attribution heatmaps to explicit explanatory contracts.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Feature to Dynamics: Feature-space to Autoregression Strategy for Zero-shot Time Series Forecasting
The paper proposes FSA for zero-shot univariate time-series forecasting, mapping interpretable features to autoregressive strategies and outperforming Transformer-based architectures under identical pretraining data, training protocol, and comparable parameter budgets.
#Reasoning#Benchmarking#arXiv#FSA
why featured
HKR-H/K pass: the paper has a Transformer comparison under controlled budgets. As a single arXiv item with narrow time-series scope and limited disclosed reproducibility details, it sits in the 60–71 band.
editor take
FSA beats Transformers under matched data, protocol, and params; no datasets or error tables in the snippet, so don't crown it yet.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Perturbation Effects on Accuracy and Fairness among Similar Individuals
The paper defines Robust Individual Fairness and introduces RIFair, a black-box decoupled perturbation framework that builds semantics-preserving instance pairs and exposes failure modes missed by robustness-only or fairness-only metrics across multiple model architectures and real-world textual datasets.
#Safety#Benchmarking#RIFair#Research release
why featured
HKR-K has a concrete mechanism and HKR-R fits fairness evaluation concerns. But this is a specialist arXiv research item without product impact, major-lab signal, or a strong hook, so it stays in all.
editor take
RIFair tests RIF with black-box perturbations; dataset counts are undisclosed, but separate robustness and fairness metrics miss joint failures.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
DAPD: Dependency-Aware Parallel Decoding via Attention for Diffusion LLMs
DAPD uses self-attention to build a conditional dependency graph over masked tokens, then selects an independent set for parallel unmasking at each iteration; experiments on LLaDA and Dream report a better accuracy-steps trade-off than existing methods without auxiliary models or retraining.
#Inference-opt#Reasoning#LLaDA#Dream
why featured
Narrow arXiv inference-optimization paper: HKR-K lands on a concrete mechanism, and HKR-R is limited to diffusion-LLM latency watchers. No speedup or benchmark numbers are disclosed, so it stays in 60–71.
editor take
DAPD picks independent sets from attention graphs on LLaDA and Dream; training-free is nice, but the snippet gives no speed or accuracy numbers.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
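The DAPD selection step (build a dependency graph over masked tokens, then unmask an independent set) can be sketched as a greedy pass. Everything specific here is an assumption: the edge rule (attention above a fixed `threshold` in either direction), the greedy ordering by model confidence, and the function name. The summary does not give the paper's actual graph construction.

```python
def independent_unmask_set(attn, confidence, threshold=0.1):
    """Pick masked positions that can be unmasked together.

    attn[i][j]:    attention from masked position i to masked position j
    confidence[i]: model confidence for position i (higher goes first)
    """
    def dependent(i, j):
        # Treat two positions as dependent if either attends strongly to the other.
        return max(attn[i][j], attn[j][i]) > threshold

    chosen = []
    for i in sorted(range(len(attn)), key=lambda k: -confidence[k]):
        if not any(dependent(i, j) for j in chosen):
            chosen.append(i)
    return sorted(chosen)
```

Raising the threshold admits more positions per iteration (fewer steps) at the cost of unmasking tokens that may depend on each other.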
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Value Flows
Value Flows uses flow-based models to estimate full future return distributions and identify high-variance states, then reports a 1.3x average success-rate improvement across 37 state-based and 25 image-based benchmark tasks.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes on a concrete mechanism and 1.3x results across 62 tasks. HKR-H/R are weak: the title is a standard arXiv method name, and the post does not disclose code release or product impact.
editor take
Value Flows reports 1.3x success across 62 RL tasks; I buy the direction, pending offline baselines and compute cost.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Resonant Context Anchoring: Decoupling Attention Routing and Signal Gain at Inference Time
The paper proposes RCA, an inference-time method that uses raw pre-softmax attention scores to build a dynamic gain field and amplify context-token value-vector norms without changing attention probabilities; Llama-3 experiments report improved factual consistency under knowledge conflicts, but the snippet does not disclose exact scores.
#Inference-opt#Reasoning#Llama#Research release
why featured
HKR-K/R pass: RCA describes an inference-time value-norm gain mechanism for factual consistency. HKR-H fails, and the post gives no concrete scores, so it stays in the 60–71 research-signal band.
editor take
RCA boosts value norms from pre-softmax scores; without exact numbers, treat the Pareto claim as arXiv self-reporting.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Position: Current Benchmarking Hinders Real Progress in Deep Learning for Time Series Forecasting
An arXiv position paper argues that time-series forecasting benchmarks overlook design dimensions such as globality and locality, and proposes an auxiliary forecasting model card template to record key architectural choices when comparing existing and new models.
#Benchmarking#arXiv#Research release#Benchmark
why featured
HKR-H and HKR-K pass: the paper has a clear anti-benchmark angle and concrete evaluation dimensions. Its reach stays inside time-series forecasting research, with no model release, tool adoption, or cost signal, so it fits the 60–71 band.
editor take
arXiv 2512.22702 says globality/locality can dominate sequence layers; time-series SOTA without these controls is config-table theater.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Tackling Misinformation by Teaching Logical Fallacies via Socratic Questioning
The paper introduces LFTutor, an LLM-based tutoring system that teaches laypeople logical fallacies using intent-driven Socratic questioning and critical argumentation, and automatic plus human evaluations show it significantly outperforms baseline LLMs that lack those pedagogical strategies.
#Reasoning#Alignment#Safety#LFTutor
why featured
HKR-K passes via a concrete tutoring mechanism and auto/human evaluation against baselines; HKR-H and HKR-R are weak. This is a readable LLM safety-education paper, but no metrics or deployment path are disclosed.
editor take
LFTutor claims significant gains over baseline LLMs, but sample size and effect size are missing; don't buy tutor vibes as misinformation defense.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
On the Uncertainty Quantification Ability of Tabular Foundation Models
The paper compares TabPFN v2.5 with Gaussian processes on multiple regression settings for uncertainty quantification, finding that GPs deliver stronger predictive accuracy and UQ in data-scarce cases or when the chosen kernel matches the underlying function prior.
#Benchmarking#TabPFN#Gaussian processes#Research release
why featured
HKR-H and HKR-K pass: TabPFN v2.5 underperforming GPs is a useful twist, with clear small-data and matched-kernel conditions. The topic is niche tabular-UQ benchmarking, so it stays in the 60–71 band.
editor take
TabPFN v2.5 loses on UQ to GPs in data-scarce regression or when the kernel matches the function prior; learned priors still don't replace Bayesian ones.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Why Self-Inconsistency Arises in GNN Explanations and How to Exploit It
The paper attributes self-inconsistency in SI-GNN explanations to re-explanation-induced context perturbation and proposes Self-Denoising, a model-agnostic, training-free post-processing method that uses one extra forward pass and adds about 4–6% computational overhead in experiments.
#Interpretability#Research release
why featured
HKR-K passes with a concrete mechanism and overhead claim. HKR-H and HKR-R are weak because GNN explanation consistency is narrow for general AI practitioners, so it stays in the lower 60–71 band.
editor take
Self-Denoising adds one forward pass and 4–6% overhead; SI-GNN explanation instability finally gets a cheap patch.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Multimodal Function Vectors for Visual Relations
The paper identifies a small subset of attention heads in OpenFlamingo and Qwen3-VL that transmit visual relation representations, extracts multimodal function vectors through causal mediation analysis, and fine-tunes them with frozen LMM parameters to outperform in-context learning baselines.
#Multimodal#Vision#Interpretability#OpenFlamingo
why featured
HKR-K passes: the item gives a testable multimodal function-vector mechanism and names OpenFlamingo and Qwen3-VL. The paper is niche, with no product angle or broad HKR-R nerve, so it sits in the 60–71 band.
editor take
OpenFlamingo and Qwen3-VL localize visual relations in a few attention heads; no gains disclosed, but vector tuning beats prompt stuffing.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
ODTQA-FoRe: An Open-Domain Tabular QA Dataset for Future Data Forecasting and Reasoning
ODTQA-FoRe introduces an open-domain tabular QA task using real estate data for time-series forecasting and forecast-based reasoning. TimeFore splits the pipeline into three roles: a Retriever generates SQL, a Forecaster calls external time-series models, and an Analyzer composes the final answer.
#Agent#Reasoning#Tools#ODTQA-FoRe
why featured
HKR-K passes with a new dataset and the TimeFore retriever/forecaster/analyzer mechanism. HKR-H/R are weak, so this stays in the lower interesting band without hard exclusion.
editor take
ODTQA-FoRe discloses real estate scope and a three-role pipeline, not dataset size; external forecasters make this an orchestration test.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Temporally-Aligned Evaluation for Audio-Driven Talking Head Generation
The paper reformulates audio-driven talking-head evaluation with Soft Dynamic Time Warping and benchmarks 20 methods across seven datasets under standardized protocols.
#Audio#Vision#Benchmarking#Research release
why featured
HKR-K passes with a concrete metric mechanism and benchmark scale. HKR-H and HKR-R are weak because the topic is niche, so it fits the 60–71 interesting band.
editor take
Soft-DTW benchmarked 20 methods on 7 datasets; frame-wise talking-head scores punish harmless timing drift, so old lip-sync leaderboards look suspect.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
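Soft Dynamic Time Warping, the metric named in the talking-head evaluation summary, replaces the hard minimum in DTW with a smooth soft-min. A minimal sketch of the usual recursion follows; the squared-difference cost on scalar per-frame features is an assumption, and the paper's features and protocol are not given in the summary.

```python
import math


def soft_min(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    m = min(values)
    if math.isinf(m):
        return m
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW discrepancy between two scalar sequences."""
    n, m = len(x), len(y)
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i][j] = cost + soft_min(
                (R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]), gamma
            )
    return R[n][m]
```

For two sequences that differ only by a small timing shift, such as `[0, 0, 1, 2]` and `[0, 1, 2, 2]`, the frame-wise squared error is 2 while the aligned score stays close to zero, which is the behaviour the editor take alludes to.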
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Reason, Retrieve, Re-rank: A Zero-Shot Reasoning-Aware Framework for Composed Video Retrieval
R3-CoVR achieves 91.9% R@1 on the CVPR 2026 VidLLMs zero-shot CoVR-R test set; Qwen3-VL-8B first verbalizes the post-edit result, SigLIP-2 retrieves candidates, and the same multimodal model re-ranks the shortlist with constraint-aware judging.
#Reasoning#Multimodal#RAG#Qwen
why featured
HKR-K passes with a concrete metric and pipeline; HKR-H and HKR-R are weak. This is a niche video-retrieval paper, not hard-excluded, so it sits in the 60–71 band.
editor take
R3-CoVR hits 91.9% zero-shot R@1; SigLIP retrieval is routine, Qwen3-VL-8B reranking from 72.7 is the punchline.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Learning to Reduce Search Space for Generalizable Neural Routing Solver
The paper introduces L2R, a learning-based dynamic search-space-reduction framework that prunes nodes at each construction step using problem-specific features, and reports experiments across VRP variants where the solver scales to 10 million-node instances while maintaining solution quality; the code is released on GitHub.
#Reasoning#Inference-opt#CIAM-Group#Research release
why featured
HKR-K passes on the dynamic pruning mechanism, 10M-node VRP claim, and open code. HKR-H and HKR-R are weak because this is a specialist routing-solver paper, not a broad AI product or practice story.
editor take
L2R claims 10M-node VRP scaling, but hardware and latency are undisclosed; I’d treat it as an NCO scaling stress test.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Unsupervised Cognition
arXiv:2409.18624v4 proposes a primitive-based unsupervised method for decision-making, models input space as an input-agnostic distributed hierarchical structure, and claims stronger results than prior methods on classification tasks with small datasets and incomplete data, and on cancer-type classification.
#Reasoning#Benchmarking#arXiv#Research release
why featured
HKR-H/K pass, but this is a single arXiv v4 paper with method claims only; code, authorship signal, and deployment conditions are not disclosed, so it stays in the 60–71 research band.
editor take
arXiv:2409.18624v4 claims unsupervised beats supervised baselines; datasets and metrics are undisclosed, so discount the cognition framing.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Medication-Aware Financial Exploitation Detection for Alzheimer's Patients Using Edge-Aware Interaction Risk Modeling
The paper evaluates financial exploitation detection on a 45-day hybrid simulation for 180 Alzheimer’s patients, using 8,100 medication records and 30,855 transactions; the interaction-aware model raises recall in medication-induced vulnerability windows from 0.7442 to 0.9070, while the financial-only baseline still has the highest global F1-score at 0.5000.
#Benchmarking#Research release#Benchmark
why featured
HKR-H/K/R pass, but this is a narrow single-paper result based on simulated patient data, with no product path or broader model impact disclosed. Keep it in all, below featured.
editor take
On 180 simulated patients over 45 days, interaction modeling lifts vulnerable-window recall to 0.9070; global F1 still loses to the 0.5000 baseline.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
MADPO trains a reward model to estimate preference margins, then applies per-sample continuous weights to the DPO loss; the paper reports better results than strong baselines on a human-preference summarization task across a sweep of decoding temperatures.
#Alignment#Fine-tuning#Research release
why featured
HKR-K passes via a concrete MADPO mechanism, but HKR-H and HKR-R are weak: the framing is technical and the audience impact is narrow. No hard exclusion applies, so it sits in the lower interesting band.
editor take
MADPO reweights DPO loss per margin; with only summarization results disclosed, I’d wait for code and cross-task replication.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
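The MADPO idea of per-sample weights on the DPO loss can be written down in a few lines. The DPO term below is the standard one; the weights are taken as given, since the summary does not state how the paper maps reward-model margins to weights. Function names are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dpo_loss(logp_c, logp_r, ref_c, ref_r, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of log-probabilities."""
    return -log_sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r)))


def weighted_dpo_loss(batch, weights, beta=0.1):
    """Mean of per-sample DPO losses, each scaled by a continuous weight.

    batch:   list of (logp_c, logp_r, ref_c, ref_r) tuples
    weights: per-sample weights, assumed derived from reward-model margins
    """
    losses = [w * dpo_loss(*pair, beta=beta) for pair, w in zip(batch, weights)]
    return sum(losses) / len(losses)
```

Setting every weight to 1 recovers plain DPO, so the weighting can be ablated without touching the rest of the training loop.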
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Tempora: Characterising the Time-Contingent Utility of Online Test-Time Adaptation
Tempora evaluates 11 test-time adaptation methods with 3 time-contingent utility metrics, and 750+ temporal evaluations show conventional rankings do not predict rankings under latency pressure.
#Inference-opt#Benchmarking#Tempora#Research release
why featured
HKR-H/K pass: the paper has concrete evaluation scale and a counterintuitive latency-ranking claim. TTA benchmarking is niche and far from product or agent impact, so it stays in the lower band.
editor take
Tempora tests 11 TTA methods across 750+ temporal runs; offline accuracy rankings collapse under latency constraints.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Embedding-Space Diffusion for Zero-Shot Environmental Sound Classification
The paper introduces a class-auxiliary-data-conditioned diffusion model that generates synthetic embeddings for zero-shot environmental sound classification, combines them with seen-class embeddings to train a classifier, and reports average gains over baselines across six audio datasets including ESC-50, UrbanSound8k, TAU Urban Acoustics 2019, and GTZAN.
#Audio#Embedding#Benchmarking#Research release
why featured
HKR-K passes with a concrete mechanism and results across 6 datasets. HKR-H/R are weak, and zero-shot environmental audio classification is niche research, so it stays in all.
editor take
Diffusion wins on average across 6 audio sets; no gain sizes are disclosed, so zero-shot audio is far from solved.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Beyond Model Base Retrieval: Weaving Knowledge to Master Fine-grained Neural Network Design
M-DESIGN uses edit-effect evidence graphs for retrieval-augmented model refinement, and experiments on 67,760 graph neural networks across 22 datasets show it reaches the search-space best performance in 26 of 33 cases under a strict budget.
#RAG#Reasoning#Benchmarking#M-DESIGN
why featured
HKR-K is solid: the mechanism and evaluation scale are concrete, with 26/33 best cases. HKR-H/R are weak because fine-grained GNN architecture design is narrow, so this stays in all rather than featured.
editor take
M-DESIGN hits search-space best in 26/33 cases; better than static retrieval, but the strict budget is undisclosed.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
How Hard Can It Be? Hardness-Aware Multi-Objective Unlearning
The paper proposes HAMU, a machine unlearning algorithm that quantifies hardness via similarity between forget and retain data, then updates weights to guarantee a specified forget-quality improvement while minimizing retain-utility degradation under non-convex models.
#Alignment#Safety#HAMU#Research release
why featured
HKR-K and HKR-R pass: HAMU offers a concrete hardness mechanism and touches compliance/safety pain. HKR-H is weak, and the post lacks experiment numbers, code conditions, or mainstream-model results.
editor take
HAMU scores hardness via forget/retain similarity; the useful bit is telling you when unlearning is doomed, not another benchmark bump.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Efficient Weighted Sampling via Score-based Generative Models
The paper proposes a training-free weighted sampling framework that augments the backward diffusion process with auxiliary guidance, avoids Hessian evaluations and particle-based resampling, and reports 1.2× to 4.7× speedups in settings including Stable Diffusion XL.
#Inference-opt#Stable Diffusion XL#Research release
why featured
HKR-K/R pass: the post gives a training-free reverse-diffusion guidance mechanism and 1.2×–4.7× speedups. Still a technical arXiv sampling paper, so it stays in the 60–71 band.
editor take
This reports 1.2×–4.7× speedups on SDXL-class settings; skipping Hessians and resampling is a practical diffusion-control trick.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
A Theoretical Framework for Self-Play Theorem Proving Algorithms
The paper models theorem sets as graphs and proves that, when the theorem graph is well connected, a prover–conjecturer system using a reversible random walk can grow the set of proved theorems exponentially.
#Agent#Reasoning#Embedding#Research release
why featured
HKR-H/K pass: self-play prover-conjecturer systems and an exponential-expansion claim add signal. No experiments, code, or product path are disclosed, and the theory-heavy angle keeps it in the interesting band.
editor take
The proof gives exponential growth on well-connected theorem graphs; that assumption does the heavy lifting, far from Lean-scale evidence.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
EMoE: Training-Free Expert Disagreement for Uncertainty-Aware Text-to-Image Diffusion
EMoE separates expert-specific paths at an early MoE layer in pre-trained text-to-image diffusion models, reuses the same initial noise, and measures latent variance after the first denoising step; on COCO and CC3M, it ranks prompts by text-image alignment quality more consistently than diffusion-specific and router-based baselines.
#Multimodal#Vision#Benchmarking#EMoE
why featured
HKR-K passes with a clear mechanism and benchmark setting; HKR-H and HKR-R are weak. This is a narrow arXiv research item, so it belongs in all, below featured.
editor take
EMoE predicts alignment risk from first-step latent variance. It skips full sampling, but only for MoE diffusion models, not SDXL.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
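The disagreement score described in the EMoE summary (latent variance across expert-specific paths after the first denoising step) reduces to a per-element variance. A minimal sketch, assuming each expert path yields a flattened latent of equal length; the function name and the averaging over dimensions are illustrative choices.

```python
def expert_disagreement(latents):
    """Mean per-element population variance across expert latents.

    latents[e] is the flattened latent produced by expert path e after one
    denoising step from the same initial noise (assumed precomputed).
    """
    n_experts = len(latents)
    dims = len(latents[0])
    total = 0.0
    for d in range(dims):
        column = [latent[d] for latent in latents]
        mean = sum(column) / n_experts
        total += sum((x - mean) ** 2 for x in column) / n_experts
    return total / dims
```

Prompts can then be ranked by this score, with higher disagreement read as higher alignment risk.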
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Collaborative and Efficient Fine-tuning: Leveraging Task Similarity
The paper proposes CoLoRA, a fine-tuning method that trains one shared adapter for task similarity and personalized adapters for user-specific tasks, with theoretical guarantees on heterogeneous linear regression and NLP experiments under varying task similarity.
#Fine-tuning#CoLoRA#LoRA#Research release
why featured
HKR-K is clear and HKR-R is modest: CoLoRA offers a shared-adapter plus per-user-adapter fine-tuning setup. No metrics or strong headline hook are disclosed, so it stays in the normal research band.
editor take
CoLoRA adds 1 shared adapter plus personal adapters; multi-tenant fine-tuning pain is real, but NLP gains are undisclosed.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Ethical Fairness in Ubiquitous Health Sensing without Known Attributes
Flare uses Fisher Information to identify latent subgroups without demographic or heterogeneous attributes, then applies do-no-harm optimization and a BHE metric suite across four health-sensing datasets: EDA, OhioT1DM, IHS, and Percept-R.
#Alignment#Interpretability#Shaily Roy#Tanzeem Choudhury
why featured
HKR-K and HKR-R pass: the Fisher Information mechanism and 4 datasets are concrete, and fairness without attributes is relevant. HKR-H is weak, and health sensing is too narrow for featured.
editor take
Flare tests attribute-free fairness on 4 health datasets; I buy the mechanism, not the ethics gloss without metric weighting disclosed.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Adaptive Time Series Reasoning via Segment Selection
ARTIST frames time-series reasoning as a sequential decision problem and improves average accuracy by 6.46 absolute percentage points over the strongest baseline across six benchmarks.
#Reasoning#Agent#Fine-tuning#ARTIST
why featured
HKR-K is solid: ARTIST turns time-series reasoning into sequential segment selection and reports +6.46 pp average accuracy across 6 benchmarks. HKR-H and HKR-R are weak because the paper is narrow academic ML, so it sits in the 60–71 band.
editor take
ARTIST gains 6.46 points on six benchmarks; stuffing whole time series into models looks lazier than segment selection.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Large Language Model Guided Incentive Aware Reward Design for Cooperative Multi-Agent Reinforcement Learning
The paper introduces an LLM-guided framework that synthesizes executable auxiliary reward programs, trains policies from scratch with MAPPO under a fixed compute budget, and evaluates candidates across four Overcooked-AI layouts using sparse task returns for selection.
#Agent#Reasoning#Overcooked-AI#Research release
why featured
HKR-K passes with a concrete mechanism and test setup; HKR-H and HKR-R are weak. MARL reward design is narrow for general AI practitioners, so this fits the 60–71 research-update band.
editor take
LLM writes reward programs for MAPPO across 4 Overcooked-AI layouts; the key claim lacks effect sizes in the snippet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Adaptive Querying with AI Persona Priors
The paper introduces a finite-dictionary AI persona latent variable model for adaptive querying under tight query budgets, using closed-form posterior updates and finite-mixture predictions for sequential item selection, with experiments on synthetic data and WorldValuesBench; the abstract does not disclose dictionary size, query budget values, or model names.
#Reasoning#Benchmarking#WorldValuesBench#Research release
why featured
HKR-K passes with a concrete mechanism and benchmark setting, while HKR-H and HKR-R are weak. This is useful evaluation research, not a product update or broad industry debate, so it sits in the 60–71 band.
editor take
The paper uses finite persona priors with closed-form updates; dictionary size, budgets, and model names are undisclosed.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
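The closed-form updates in the persona-prior summary follow from having a finite dictionary: the posterior over personas is a reweighted, renormalised prior, and predictions are a finite mixture. A sketch under those assumptions; the function names are hypothetical and the item-selection rule is left out because the abstract does not specify it.

```python
def posterior_update(prior, likelihoods):
    """Bayes update over a finite persona dictionary.

    prior[k]:       current weight of persona k
    likelihoods[k]: P(observed answer | persona k) for the queried item
    """
    unnormalised = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnormalised)
    return [u / z for u in unnormalised]


def mixture_predict(posterior, persona_probs):
    """Predictive answer distribution for one item.

    persona_probs[k][a] is P(answer a | persona k).
    """
    n_answers = len(persona_probs[0])
    return [
        sum(w * probs[a] for w, probs in zip(posterior, persona_probs))
        for a in range(n_answers)
    ]
```

Each query costs one pass over the dictionary, which is what makes sequential selection cheap under a tight query budget.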
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Memory-Efficient LLM Training with Dynamic Sparsity: From Stability to Practical Scaling
The paper proposes SMET for LLM training, stabilizing Dynamic Sparse Training with optimizer warm-up and density-aware learning-rate scaling while storing gradients and optimizer states only for active parameters.
#Fine-tuning#Inference-opt#SMET#Research release
why featured
HKR-K and HKR-R pass: SMET names concrete DST stability mechanisms and active-parameter state storage. No savings ratio, scale result, or repo detail is disclosed, so the technical paper stays in all.
editor take
SMET stores gradients and optimizer state only for active weights; without scale numbers disclosed, I file it as a sparse-training patch.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
GNMR: Runtime Stability Control for Low-Precision Large Language Model Training
The paper presents GNMR, a lightweight controller that compares each recoverable unit’s current gradient norm with its historical mean, then applies sparse bounded recovery under a hard maxO budget and short lock interval without changing numerical format, kernels, or backend recipe.
#Fine-tuning#Inference-opt#GNMR#DeepSeek
why featured
HKR-K and HKR-R pass: GNMR offers a concrete risk-detection and sparse-recovery mechanism for low-precision training. No experiment numbers, model scale, or artifact are disclosed, so the narrow training-infra angle stays in all.
editor take
GNMR gates low-precision training risk via gradient-norm ratios; maxO values are undisclosed, so backend-agnostic claims get a discount.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
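A rough sketch of the detection half of the GNMR item above: compare a unit's current gradient norm with its running historical mean, then cap how many units may be recovered per step. `NormMonitor`, `select_for_recovery`, and the threshold and budget values are hypothetical; the paper's actual recovery rule, budget value, and lock interval are not disclosed in the snippet.

```python
class NormMonitor:
    """Tracks a running mean of gradient norms per recoverable unit."""

    def __init__(self):
        self.mean = {}
        self.count = {}

    def ratio(self, unit, norm):
        """Return norm / historical mean, then fold norm into the mean.

        The first observation of a unit has no history, so its ratio is 1.0.
        """
        n = self.count.get(unit, 0)
        mean = self.mean.get(unit, norm)
        r = norm / mean if mean > 0 else 1.0
        self.mean[unit] = (mean * n + norm) / (n + 1)
        self.count[unit] = n + 1
        return r


def select_for_recovery(ratios, threshold, budget):
    """Pick at most `budget` units whose ratio exceeds `threshold`, worst first."""
    flagged = sorted(((r, u) for u, r in ratios.items() if r > threshold), reverse=True)
    return [u for _, u in flagged[:budget]]
```

The hard budget keeps recovery sparse even when many units spike at once, which is the property the summary emphasizes.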
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
d2: Improving Reasoning in Diffusion Language Models via Trajectory Likelihood Estimation
d2 introduces a policy-gradient reasoning framework for masked diffusion language models using sampling-trajectory likelihood estimates; d2-AnyOrder obtains exact trajectory likelihood in one model pass when any-order decoding is supported, while d2-StepMerge approximates likelihood for standard masked diffusion models with an analytic compute-accuracy tradeoff.
#Reasoning#Alignment#Benchmarking#Research release
why featured
HKR-K passes because the paper states a concrete trajectory-likelihood method and two variants for masked DLMs. HKR-H/R are weak, and the topic is specialist, so it stays in all, below featured.
editor take
d2 estimates trajectory likelihood in one pass; DLM reasoning now hinges on trainability, not generation quality.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Position: Stop Preaching and Start Practising Data Frugality for Responsible AI Development
The position paper urges the machine learning community to practice data frugality, using ImageNet-1K downstream use to estimate energy consumption and carbon emissions, but the RSS snippet does not disclose the specific figures or experimental settings.
#Benchmarking#Safety#ImageNet-1K#Research release
why featured
HKR-H and HKR-R pass: the title has conflict and the topic fits responsible AI. HKR-K is weak because energy and carbon values are not disclosed, so this stays in the mid interesting band.
editor take
The paper uses ImageNet-1K to estimate carbon costs, but discloses no figures; data frugality is right, auditability is the test.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K0·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference
The paper proposes TACG and GESR for multi-task MoE inference, using task-family co-activation traces, exact GPU capacity constraints, and selective replication of generic experts; experiments on three open-source MoE models reduce average communication cost by 31.39% over the baseline while preserving a 0.9975 average Jain fairness index.
#Inference-opt#Research release
why featured
HKR-K is solid: named methods, test setting, and cost-reduction numbers. HKR-R is limited to MoE serving costs; with no product tie-in or broad industry implication, this stays in the lower research-release band.
editor take
TACG cuts communication 31.39% on three MoEs; I buy task-conditioned placement, but GESR replication is the production risk.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
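For readers unfamiliar with the 0.9975 figure in the TACG item above, Jain's fairness index is (sum of x)^2 / (n * sum of x^2): it equals 1.0 when all values are equal and 1/n when one value carries everything. A small sketch follows; what the paper feeds into the index (per-GPU load is a natural guess) is an assumption here.

```python
def jain_index(loads):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in [1/n, 1]."""
    n = len(loads)
    sum_sq = sum(x * x for x in loads)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero load")
    return sum(loads) ** 2 / (n * sum_sq)


balanced = jain_index([1.0, 1.0, 1.0, 1.0])    # perfectly even
skewed = jain_index([1.0, 0.0, 0.0, 0.0])      # one device does all the work
```

An average of 0.9975 therefore means the measured quantity is close to evenly spread after the communication savings.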
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Iterated Population Based Training with Task-Agnostic Restarts
The paper introduces IPBT, a Population Based Training variant that automatically adjusts the interval between hyperparameter updates via task-agnostic restarts, reused weights, and time-varying Bayesian optimization; on 8 image classification and reinforcement learning tasks, it matches or outperforms 5 prior PBT variants plus random search, ASHA, and SMAC3 on average without increasing the budget.
#Fine-tuning#Benchmarking#IPBT#ASHA
why featured
HKR-K passes on a concrete method and 8-task comparison; HKR-H and HKR-R miss because the title is academic and the impact is narrow. No hard-exclusion rule fires, so this lands in the lower research-method band.
editor take
IPBT beats 5 PBT variants on 8 tasks; small sample, but auto-tuning PBT update intervals is the useful bit.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Collaborative Attention Reconstruction Improves Multimodal Embedding Quality
CoCoA adds collaborative attention and an EOS-based reconstruction task to Qwen2-VL and Qwen2.5-VL, and MMEB-V1 experiments show improved multimodal embedding quality; the abstract does not disclose exact score gains.
#Multimodal#Embedding#Vision#Jiahan Chen
why featured
HKR-K passes: CoCoA adds collaborative attention and EOS reconstruction on Qwen2-VL/Qwen2.5-VL with MMEB-V1 evaluation. HKR-H and HKR-R are weak, and the excerpt gives no gain numbers, so this stays in the all tier.
editor take
CoCoA changes Qwen2-VL attention and EOS reconstruction; no scores disclosed, so I read it as pre-contrastive embedding prep.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Concept Heterogeneity-aware Representation Steering
CHaRS models representation steering as discrete optimal transport between semantic latent clusters, then uses barycentric projection to produce an input-dependent steering map. The paper says this kernel-weighted cluster-shift method outperforms single global steering directions across multiple experimental settings, but the RSS snippet does not disclose benchmark names or numeric gains.
#Alignment#Inference-opt#Research release
why featured
HKR-K passes: CHaRS uses semantic-cluster discrete optimal transport and barycentric projection to build input-dependent steering maps, with claimed gains over global directions. HKR-H/R are weak; the impact stays niche.
editor take
CHaRS swaps global steering for discrete OT; benchmarks and gains are undisclosed, so file it under activation editing getting less lazy.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Design-MLLM: A Reinforcement Alignment Framework for Verifiable and Aesthetic Interior Design
Design-MLLM optimizes interior design generation with reinforcement alignment, using three mechanisms: programmatic spatial constraint checks, aesthetic scoring only among feasible candidates, and group-relative optimization for stable preference signals.
#Multimodal#Alignment#Reasoning#Yuxuan Yang
why featured
HKR-K passes because the paper names a concrete 3-part alignment setup. HKR-H/R are weak: the interior-design vertical lacks a broader practitioner hook, so this sits in the low 60s.
editor take
Design-MLLM splits constraint checks, aesthetic scoring, and group-relative optimization into 3 steps; I buy the direction, but gains are undisclosed.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
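One way to read the three mechanisms in the Design-MLLM item above is as a filter-then-normalize pipeline: drop candidates that fail the programmatic checks, score only the survivors, and standardize scores within the group. The sketch below assumes a GRPO-style mean/std normalization; `is_feasible` and `aesthetic_score` are hypothetical callables, and the paper's exact formulation is not disclosed.

```python
from statistics import mean, pstdev


def group_relative_advantages(candidates, is_feasible, aesthetic_score, eps=1e-8):
    """Standardized aesthetic scores computed among feasible candidates only.

    Infeasible candidates are excluded before scoring, so a pretty but
    invalid layout can never earn a positive advantage.
    """
    feasible = [c for c in candidates if is_feasible(c)]
    scores = [aesthetic_score(c) for c in feasible]
    if not scores:
        return {}
    mu = mean(scores)
    sigma = pstdev(scores)
    return {c: (s - mu) / (sigma + eps) for c, s in zip(feasible, scores)}


# Illustrative group: layout "c" violates a spatial constraint.
toy_scores = {"a": 1.0, "b": 3.0, "c": 9.0}
advantages = group_relative_advantages(
    ["a", "b", "c"],
    is_feasible=lambda c: c != "c",
    aesthetic_score=lambda c: toy_scores[c],
)
```

Normalizing within the group keeps the preference signal on a comparable scale across prompts, which is presumably what "stable preference signals" refers to.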
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Semantic Retrieval for Product Search in E-Commerce
The paper presents a Siamese LLM dual-encoder for e-commerce semantic retrieval, using two training stages: contrastive learning with a false-negative margin mask in Stage 1, and ROAR preference optimization over graded relevance groups in Stage 2.
#RAG#Embedding#Fine-tuning#Research release
why featured
HKR-K passes with testable mechanisms: a Siamese LLM dual encoder, contrastive learning with a false-negative margin mask, and ROAR graded preference optimization. HKR-H/R are weak because the title is academic and the audience fit is narrow.
editor take
Siamese LLM dual-encoder uses two-stage training; A/B numbers are undisclosed, and ROAR substitute ranking beats the model-name pitch.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
You Can Learn Tokenization End-to-End with Reinforcement Learning
The paper learns discrete token boundaries with score function estimates and reports qualitative and quantitative gains over straight-through estimates at the 100 million parameter scale.
#Reasoning#Fine-tuning#Research release
why featured
HKR-H and HKR-K pass: the title has a real hook, and the post gives score-function estimates, discrete token boundaries, and a 100M-parameter comparison. It remains a narrow training-method paper, so it stays in the 60–71 band.
editor take
The paper learns token boundaries at 100M params with score-function estimates; RL variance control makes hardcoded BPE look lazier.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H1·K1·R0
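To make "score function estimates" in the tokenization item above concrete, here is a sketch for a single Bernoulli boundary decision parameterized by a logit. The estimator is (reward - baseline) * d log p(b) / d logit, and for a Bernoulli that derivative is b - sigmoid(logit). The rewards and baseline are illustrative; the paper's model and variance-reduction details are not disclosed in the snippet.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_function_grad(logit, boundary, reward, baseline=0.0):
    """Single-sample score-function (REINFORCE) gradient w.r.t. the logit.

    boundary is 1 if a token boundary was sampled here, else 0.
    """
    p = sigmoid(logit)
    return (reward - baseline) * (boundary - p)


def expected_grad(logit, reward_if_boundary, reward_if_not, baseline=0.0):
    """Exact expectation of the estimator over both outcomes."""
    p = sigmoid(logit)
    return (p * score_function_grad(logit, 1, reward_if_boundary, baseline)
            + (1.0 - p) * score_function_grad(logit, 0, reward_if_not, baseline))


no_baseline = expected_grad(0.0, 2.0, 1.0)
with_baseline = expected_grad(0.0, 2.0, 1.0, baseline=1.5)
```

Both expectations equal p(1-p)(r1 - r0), so the baseline changes variance but not the expected gradient. A straight-through estimator instead backpropagates through the discrete choice as if it were continuous, which is generally biased.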
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
SemImage: HSV-Based Semantic Image Encoding for Disentangled Text Representation
SemImage encodes a document as a 2D semantic image: each word maps to a pixel, sentences map to rows, and HSV channels represent topic, sentiment, and intensity or certainty.
#Vision#Interpretability#Benchmarking#SemImage
why featured
HKR-H/K pass: the title and summary give a concrete text-to-HSV-image mechanism. No benchmark numbers, open artifact, or product implication are disclosed, so this stays in all.
editor take
SemImage maps words to pixels and HSV to 3 semantic channels; no accuracy table disclosed, so I’d file it under interpretability.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H1·K1·R0
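A toy version of the SemImage encoding above, assuming hue carries topic, saturation carries sentiment, and value carries intensity, each already scaled to [0, 1]. That channel assignment and the per-word scores are assumptions for illustration; the snippet does not say how the paper derives them.

```python
import colorsys


def word_to_pixel(topic, sentiment, intensity):
    """Map three scores in [0, 1] to an 8-bit RGB pixel via HSV."""
    r, g, b = colorsys.hsv_to_rgb(topic, sentiment, intensity)
    return (round(r * 255), round(g * 255), round(b * 255))


def document_to_image(sentences):
    """One row per sentence, one pixel per word.

    Rows are ragged if sentences differ in length; padding is left out.
    """
    return [[word_to_pixel(*word) for word in sentence] for sentence in sentences]


# Two sentences with hypothetical (topic, sentiment, intensity) scores per word.
image = document_to_image([
    [(0.0, 1.0, 1.0), (0.0, 0.0, 1.0)],
    [(0.0, 0.0, 0.0)],
])
```

Keeping the three signals in separate channels is what makes the representation "disentangled": each can be read or edited without touching the others.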
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Multi-Objective Reference-Aligned Machine Unlearning
RAUL replaces unbounded loss maximization with bounded KL alignment toward a reference distribution and uses Jacobian descent to aggregate non-conflicting gradients; the abstract says RAUL achieves the closest gap to full retraining, but the snippet does not disclose datasets, model sizes, or numeric results.
#Fine-tuning#Alignment#Research release
why featured
HKR-K passes: RAUL provides concrete mechanisms and a claim of the smallest gap to full retraining. HKR-H/R are weak, and the item is a single arXiv research release, so it stays in all.
editor take
RAUL uses bounded KL plus Jacobian descent for unlearning; datasets and numbers are undisclosed, so don't buy the retraining-gap claim yet.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
ChronosAD: Leveraging Time Series Foundation Models for Accurate Anomaly Detection
ChronosAD uses a time-series foundation model to extract zero-shot embeddings, then refines them with a Temporal Block combining BiLSTM and Multi-Head Attention; across 11 benchmarks, it reports average gains of 4.72% in AUC and 6.60% in AP over existing methods.
#Embedding#Benchmarking#ChronosAD#Intelligolabs
why featured
HKR-K passes via a concrete mechanism and gains on 11 benchmarks. HKR-H/R are weak because this is a niche time-series anomaly-detection paper, so it sits in the 60–71 band rather than featured.
editor take
ChronosAD reports +4.72% AUC and +6.60% AP on 11 benchmarks; I want leave-one-domain-out results, not averaged comfort.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
LLMSynthor: Macro-Aligned Micro-Records Synthesis with Large Language Models
LLMSynthor uses a pretrained LLM as a macro-aware simulator, iteratively generating record batches to reduce discrepancies between synthetic aggregates and target statistics across mobility, e-commerce, and population domains.
#Agent#LLMSynthor#Research release
why featured
This is a synthetic-data paper with HKR-K: LLMs batch-generate micro-records aligned to aggregate targets. No metrics, datasets, or artifact are disclosed, so HKR-H/R fail and it stays in all rather than featured.
editor take
LLMSynthor batch-generates micro-records across 3 domains; treating an LLM as a nonparametric copula is neat, but metrics are undisclosed.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Feature Alignment Determines Fusion Strategy: Cross-Attention vs. Concatenation in Multimodal Learning
The paper compares cross-attention and concatenation on Flickr8k, showing concatenation leads by 4.1-5.1 percentage points across 2,048-16,384 samples when CLIP features are pre-aligned by vision-language pretraining.
#Multimodal#Vision#Benchmarking#arXiv
why featured
HKR-K passes because the paper gives a testable dataset, sample range, and accuracy gap. HKR-H and HKR-R are weak, so this stays in all as a narrow but useful multimodal methods item.
editor take
On Flickr8k, CLIP-feature concatenation wins by 4.1–5.1 points; don’t default to cross-attention when alignment is already paid for.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
LLM-Guided Communication for Cooperative Multi-Agent Reinforcement Learning
The paper proposes LMAC, an LLM-driven communication protocol for cooperative multi-agent reinforcement learning, using a state-awareness criterion to refine messages. The abstract says LMAC improves state reconstruction and performance over prior communication baselines, but the post does not disclose benchmark names, effect sizes, or model details.
#Agent#Reasoning#arXiv#Research release
why featured
HKR-K passes because LMAC uses an LLM to design MARL communication protocols. The post names no benchmarks or numbers, and the topic is narrow, so it stays in the lower research-release band.
editor take
LMAC uses an LLM to design MARL communication; benchmarks, gains, and model details are missing, so I’d file it as an idea, not evidence.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
SHARP: Sleep-based Hierarchical Accelerated Replay for Long-Range Non-Stationary Temporal Pattern Recognition
SHARP splits streaming temporal learning into a memory module and a pattern-recognition module, using offline sleep phases to replay structured traces and reporting improved recurrent-baseline performance on text8 and PG-19 with linearly scaled compute cost.
#Memory#Reasoning#Benchmarking#SHARP
why featured
HKR-K passes via a concrete mechanism and benchmarks, but gains, code, and reproducibility details are not disclosed. HKR-H and HKR-R are weak, so this stays in all as a routine arXiv research item.
editor take
SHARP claims linear-cost context growth on text8 and PG-19; only the abstract is disclosed, with no baseline sizes or gains.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
BRo-JEPA: Learning Modular Arithmetic in Latent Space
BRo-JEPA tests abstract rule learning with MNIST digits as states and modulo-10 operations as actions. Its ResNet-based JEPA block-rotation model reaches 99.46% zero-shot accuracy and 99.46% rollout accuracy, while additive-operation JEPA baselines fail on unseen operations.
#Reasoning#Benchmarking#Research release#Open source
why featured
HKR-K passes on the 99.46% zero-shot/rollout result and ResNet JEPA block-rotation mechanism. HKR-H/R are weak because this is a narrow arXiv representation-learning paper with no product or engineering pull.
editor take
BRo-JEPA hits 99.46% zero-shot on MNIST mod-10; hard-coded cyclic structure won, so don’t sell this as general symbolic reasoning.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Why Are DMD Students Lazy? Understanding Copying Behavior in Few-Step Distillation
The paper analyzes copying in DMD few-step distillation: high-dimensional student models reproduce the teacher’s original noise-data pairings, and the authors attribute the behavior to limited geometric freedom during high-dimensional distillation rather than adversarial objectives or teacher memorization.
#Fine-tuning#Vision#Research release
why featured
HKR-H and HKR-K pass: the title has a clear twist, and the summary gives a mechanism for copying in few-step distillation. The DMD geometry angle is specialist research, so this stays below featured.
editor take
DMD students copy teacher noise-data pairings in high-dimensional few-step distillation; latent remapping freedom takes a real hit if this holds.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning
The paper proposes GraphGPO, which aggregates all rollout trajectories into one state-transition graph and assigns edge credit by estimating how much each transition reduces the distance to the task goal.
#Agent#Reasoning#GraphGPO#Research release
why featured
HKR-K passes on a concrete credit-assignment mechanism; HKR-H is weak and HKR-R is niche. No metrics, benchmark tasks, or code are disclosed, so this stays near the top of the low-value research band.
editor take
GraphGPO scores edges in a rollout graph; no benchmark numbers are disclosed, so I’d treat the SOTA claim as unverified.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
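The GraphGPO summary above describes merging rollouts into one state-transition graph and crediting each edge by how much it reduces distance to the goal. A minimal sketch under a simple assumption: distance is the shortest hop count to the goal in the merged graph, and credit for u -> v is dist(u) - dist(v). The paper's actual distance estimate is not disclosed.

```python
from collections import defaultdict, deque


def edge_credits(trajectories, goal):
    """Merge trajectories into a graph and score each observed edge."""
    edges = set()
    reverse = defaultdict(set)  # node -> set of predecessors
    for traj in trajectories:
        for u, v in zip(traj, traj[1:]):
            edges.add((u, v))
            reverse[v].add(u)

    # Breadth-first search backward from the goal gives hop distances.
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        node = queue.popleft()
        for prev in reverse[node]:
            if prev not in dist:
                dist[prev] = dist[node] + 1
                queue.append(prev)

    # Edges touching states that never reach the goal get no credit entry.
    return {(u, v): dist[u] - dist[v] for (u, v) in edges if u in dist and v in dist}


credits = edge_credits([["s", "a", "g"], ["s", "b", "a", "g"]], goal="g")
```

Pooling rollouts lets the detour s -> b receive zero credit even though its trajectory eventually succeeds, which trajectory-level attribution cannot express.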
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
OncoReason: Structuring Clinical Reasoning in LLMs for Robust and Interpretable Survival Prediction
OncoReason trains autoregressive LLMs on MSK-CHORD for binary survival classification, continuous survival-time regression, and rationale generation; CoT raises F1 by 6.0 and cuts MAE by 12%, while GRPO improves interpretability and prediction across BLEU, ROUGE, and BERTScore.
#Reasoning#Fine-tuning#Alignment#OncoReason
why featured
HKR-K is solid and HKR-R is limited: MSK-CHORD, multitask training, CoT, and GRPO give concrete data. The oncology-survival niche lacks product, open-source, or adoption signals, so it stays in the upper low-value band.
editor take
OncoReason lifts F1 6 points and cuts MAE 12% on MSK-CHORD; clinical LLMs need auditable reasoning, not answer theater.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Positional Encodings Anchor Spatial Structure in Vision Transformers: A Geometric Perspective on Robustness
The paper introduces Spatial Similarity Distance Correlation to measure spatial structure in ViT token representations and compares learned absolute, sinusoidal, and rotary positional encodings under content-disrupting distribution shifts.
#Vision#Benchmarking#Research release
why featured
HKR-K passes: SSDC plus a three-way positional-encoding robustness comparison gives a testable claim. HKR-H and HKR-R are weak; this is useful niche ViT research, so it fits all, not featured.
editor take
SSDC shows three ViT PE families stabilize index anchors; don’t oversell the mechanism until token-permutation stress tests replicate.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Faster Synchronous On-Policy RL via Straggler-Aware Group Sizing
The paper proposes SAGC, a dynamic group-size controller that adjusts synchronous GRPO and DAPO training groups online based on rollout behavior; the abstract says it reduces straggler incidence and improves wall-clock efficiency, but the post does not disclose numerical gains.
#Reasoning#Alignment#Research release
why featured
HKR-K passes because SAGC is a testable training mechanism, but no speedup numbers are disclosed. The title is niche systems jargon, so this stays below featured rather than triggering a hard exclusion.
editor take
SAGC tunes GRPO/DAPO group size online; no gain numbers disclosed, but straggler control is a real sync-RL bottleneck.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
HyperDet: 3D Object Detection with Hyper 4D Radar Point Clouds
HyperDet improves radar-only 3D object detection on two public surround-view 4D radar datasets by building task-aware hyper 4D radar point clouds, using LiDAR-guided pseudo-radar supervision only during training, and requiring radar input alone at inference.
#Vision#Robotics#Benchmarking#HyperDet
why featured
HKR-K passes via a concrete training/inference setup and 2 public datasets. HKR-H and HKR-R are weak, and 3D radar point-cloud detection is narrow for a general AI-practitioner feed.
editor take
HyperDet improves on 2 public 4D radar sets; LiDAR stays training-only, so this smells more practical than another detector head.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Guidance for Low-Level Perceptual Editing in Unconditional Diffusion Models
The paper introduces a training-free image-editing framework for unconditional diffusion models, using degradation concept vectors, bottleneck patching, and classifier-free guidance at inference time to steer samples away from degraded manifolds and improve low-level perceptual quality.
#Vision#Inference-opt#Research release
why featured
HKR-K passes because the paper names a testable training-free diffusion-editing mechanism. HKR-H and HKR-R are weak, and the low-level vision focus keeps it below featured.
editor take
The paper says h-space patching fails global low-level edits; I buy the problem, but “consistent improvement” lacks disclosed benchmarks.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation
The paper introduces MAVL, a multilingual multimodal benchmark for singable animated-song lyric translation, and proposes SylAVL-CoT with audio-video cues and syllable constraints; the RSS snippet does not disclose dataset size, language count, or concrete evaluation scores.
#Multimodal#Audio#Benchmarking#MAVL
why featured
HKR-H passes on the unusual animated-song translation angle, but HKR-K lacks scale, languages, or scores and HKR-R is weak. This stays in all as a niche research item.
editor take
MAVL names the task and method, but omits size, languages, and scores; song translation needs benchmarks, but “first” needs proof.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H1·K0·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Private and Stable Test-Time Adaptation with Differential Privacy
The paper reformulates Tent, EATA, SAR, DeYO, and COME as DP-TTA methods using per-sample gradient clipping and Gaussian noise, and reports adequate privacy on ImageNet-C with a small accuracy cost and modest computational overhead.
#Fine-tuning#Safety#Vision#Research release
why featured
HKR-K passes for concrete DP-TTA mechanics and ImageNet-C evaluation; HKR-H and HKR-R are weak. The topic is research-heavy and product impact is not disclosed, so it stays below featured.
editor take
DP-TTA covers Tent through COME on ImageNet-C; privacy is not free here, and epsilon and accuracy deltas are undisclosed.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
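The two ingredients named in the DP-TTA item above, per-sample gradient clipping and Gaussian noise, follow the standard DP-SGD recipe. A dependency-free sketch follows; `max_norm` and `noise_multiplier` are illustrative parameters, and the privacy accounting that turns them into an epsilon is not shown.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each sample's gradient, sum, add Gaussian noise, then average.

    The noise standard deviation is noise_multiplier * max_norm, matching
    the sensitivity of the clipped sum.
    """
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(clipped) for x in noisy]


# With noise_multiplier=0 the result is just the mean of clipped gradients.
update = private_mean_gradient([[3.0, 4.0], [0.0, 0.5]], 1.0, 0.0, random.Random(0))
```

Clipping bounds any single test sample's influence on the adaptation step; the noise then masks what remains.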
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting
FinTSB introduces a benchmark for financial time series forecasting with four stock movement pattern categories and standardized metrics across three dimensions. The benchmark models trading constraints such as transaction fees, and the code is available in the TongjiFinLab GitHub repository.
#Benchmarking#TongjiFinLab#Benchmark#Research release
why featured
HKR-K passes: FinTSB offers a financial time-series benchmark with 4 stock movement patterns, 3 evaluation dimensions, and open code. HKR-H and HKR-R are weak, so it stays below featured.
editor take
FinTSB covers 4 movement types, 3 metric dimensions, and fees; finance forecasting benchmarks need this anti-backtest-bloat pressure.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
RobustModelMaker: Coupling Bootstrap Stability Selection with Leakage-Safe Nested Cross-Validation for Scientific Machine Learning
RobustModelMaker couples bootstrap stability selection with strict nested cross-validation, keeping preprocessing and selection inside each fold, and supports nine algorithms across binary classification, multiclass classification, and regression. The paper verifies behavior with deterministic unit, performance, and reproducibility tests on three scientific datasets against ANOVA F-test, RFECV, and Boruta using predictive score and Jaccard stability.
#Benchmarking#RobustModelMaker#PLCO Trial#UCI
why featured
HKR-K and HKR-R pass: the post gives a leakage-safe nested-CV mechanism plus 9 algorithms and 3 task types. It remains a niche scientific-ML methods paper, not a product or model release.
editor take
RobustModelMaker supports 9 algorithms and 3 task types; I buy the leakage discipline, but 3 datasets don’t prove framework status.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R1
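"Jaccard stability" in the RobustModelMaker item above is commonly computed as the mean pairwise Jaccard similarity between the feature sets selected on different resamples. The sketch below assumes that definition; the paper's exact variant is not given in the snippet, and the feature names are made up.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(selections):
    """Mean pairwise Jaccard similarity across resampled feature selections."""
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two selections")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical selections from three bootstrap resamples.
stability = selection_stability([{"age", "bmi"}, {"age", "bmi"}, {"age"}])
```

A score near 1.0 means the selector picks the same features regardless of which resample it sees, which is a separate question from predictive score.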
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks
The authors release TabPrep, a lightweight preprocessing pipeline with feature generators for three structural data patterns, and report consistent gains on TabArena across tree-based, neural, linear, and foundation models; the arXiv snippet does not disclose exact score deltas or dataset counts.
#Benchmarking#TabPrep#TabArena#Research release
why featured
HKR-K passes via the three structural-pattern generators in TabArena, but HKR-H is a standard benchmark-paper hook and HKR-R is narrow. No hard exclusion, yet missing gain numbers keeps it below the interesting-news band.
editor take
TabPrep claims gains across four model families on TabArena; no deltas disclosed, so don’t confuse preprocessing lift with architecture progress.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Why Do Time Series Models Need Long Context Windows?
The paper splits grouped time-series forecasting into generative process identification and conditional forecasting, then proves that even when a process has memory length P, the input window must be strictly larger than P to reach the minimum attainable error.
#Reasoning#Benchmarking#Research release
why featured
HKR-K passes because the paper states a concrete condition for long-context necessity. The topic is narrow time-series theory with no product impact or industry debate, so it stays in the low-value research band.
editor take
The paper proves windows must exceed memory P; long context earns its keep by identifying the generator, not dependency length.
HKR breakdown
hook knowledge resonance
open source
57
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
CART: Context-Anchored Recurrent Transformer with Learned Stability
CART reuses one shared core block R times and freezes K/V from a multi-layer prelude; across 36 configurations trained for 30,500 steps, its LTI gate kept spectral radius at 0.79-0.83, but at d=1024 it failed to beat a parameter-matched dense baseline.
#Reasoning#Inference-opt#Benchmarking#Research release
why featured
HKR-K passes with CART’s shared core, frozen K/V, and 0.79-0.83 spectral-radius result. HKR-H and HKR-R are weak because the d=1024 matched test did not beat the dense baseline.
editor take
CART holds rho=0.79-0.83 across 36 runs, then loses 1-10% to dense at d=1024; recurrence efficiency still owes proof.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
GuidaPA: Privacy-Preserving Chatbot for Public Administration via Federated Learning
GuidaPA trains an Italian public-administration chatbot with 15 federated QLoRA rounds on about 8 SIGESON pages and 31 SIDFORS manual/FAQ pages, reporting a best federated model with ROUGE-1/2/L of 61.10/55.77/59.44, BLEU-4 of 45.02, and METEOR of 63.94 while keeping data on-site.
#Fine-tuning#Safety#RAG#GuidaPA
why featured
HKR-K is solid and HKR-R is moderate: the paper gives federated QLoRA setup, corpus sizes, and ROUGE scores. The scope is narrow public administration research with no product adoption or broader industry impact.
editor take
GuidaPA runs 15 federated QLoRA rounds on 39 pages; the metrics look nice, but this is a compliance demo, not PA chatbot proof.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Density-Aware Translation of Spurious Correlations in Zero-Shot VLMs
The paper proposes Density-Aware Translation, which rescales CLIP image-text similarity with a local density term from group reference sets; the abstract reports improved worst-group and average accuracy on benchmarks, but does not disclose exact numbers.
#Multimodal#Vision#Benchmarking#Research release
why featured
HKR-K/R pass: DAT adds a local-density recalibration mechanism for CLIP and targets VLM robustness. HKR-H fails, and the abstract gives no gain numbers, so this stays in 40-59.
editor take
DAT rescales CLIP similarity with group-set density; no gains disclosed, so I’d file it as calibration patchwork.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Human in the Loop Adaptive Optimization for Improved Time Series Forecasting
The paper introduces a post-training adaptive optimization framework that corrects time-series forecast outputs using reinforcement learning, contextual bandits, or genetic algorithms, and reports consistent accuracy gains across electricity, weather, and traffic benchmarks with minimal computational overhead.
#Agent#Reasoning#Tools#Research release
why featured
HKR-K passes with a concrete post-training optimization mechanism and electricity, weather, and traffic benchmarks. HKR-H and HKR-R are weak, so this is a browseable research item, not featured.
editor take
The paper uses RL, bandits, and genetic algorithms for post-hoc forecast correction; no gain numbers disclosed, so I file it as calibration plumbing.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
NestRL: A Nested Training Regime for Mutual Adaptation in Human-AI Teaming
The paper proposes NestRL, a finite-level I-POMDP nested training regime for human-AI teaming, and evaluates it in Overcooked against state-of-the-art baselines; the snippet says it improves performance with unseen adaptive agents and real human teammates, but does not disclose sample sizes or scores.
#Agent#Reasoning#Benchmarking#NestRL
why featured
HKR-K passes via a concrete training mechanism and Overcooked benchmark. HKR-H/R miss: no metrics, sample size, production tie-in, or practitioner pain point, so it stays in the lower research-signal band.
editor take
NestRL gives Overcooked plus I-POMDP mechanics, but no sample size or scores; don't trust the human-teammate win yet.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
COLLIE: Guiding Skill Discovery in Semantically Coherent Latent Space
COLLIE builds a semantically coherent skill latent space from dense unsupervised data, uses sparse online feedback to create training-free guidance signals, and reports better downstream performance across state-based and pixel-based tasks while reducing hazardous behaviors.
#Robotics#Alignment#Reasoning#COLLIE
why featured
HKR-K and HKR-R pass, but the item gives only title-level claims and a summary mechanism, with no code, benchmark numbers, or reproducible setup; RL skill discovery is narrow, so it stays in the lower research band.
editor take
COLLIE turns sparse online feedback into training-free guidance across state and pixel tasks; hazard reduction sounds good, but the abstract gives no rate.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R1
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
ISOMORPH: A Supply Chain Digital Twin for Simulation, Dataset Generation, and Forecasting Benchmarks
ISOMORPH introduces the first public digital twin of a multi-echelon logistics network, releasing datasets at catalogue scales C=50 and C=200, with six scenario sweeps and 20 Latin-hypercube perturbations for time-series forecasting benchmarks.
#Benchmarking#ISOMORPH#Chronos#TimesFM
why featured
HKR-K passes via concrete benchmark settings; HKR-H/R fail because the angle is a niche supply-chain forecasting paper, not a model or product update. No hard exclusion, but audience fit stays low.
editor take
ISOMORPH adds C=50/200 logistics twins to TSF; useful benchmark, but “MASE above GIFT-Eval” is a soft flex.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
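The "20 Latin-hypercube perturbations" in the ISOMORPH summary refer to a standard space-filling design. A minimal standard-library sketch of Latin hypercube sampling follows; the three-dimensional parameter space is a hypothetical stand-in, since the summary does not say which scenario parameters are perturbed.

```python
import random

def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        # Draw one point uniformly inside each of the n_samples equal-width strata.
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(strata)
        columns.append(strata)
    # Transpose per-dimension columns into per-sample rows.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

rng = random.Random(0)
# 20 perturbations (count taken from the summary) over 3 hypothetical parameters.
samples = latin_hypercube(20, 3, rng)
```

Each dimension ends up with exactly one sample in each of its 20 strata, which is what distinguishes this design from plain uniform sampling.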
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Introduction to Graph Neural Networks for Machine Learning Engineers
arXiv:2412.19419v2 presents a graph neural network survey for machine learning engineers, using an encoder-decoder framework and experiments on homogeneous graphs to examine training size, graph complexity, oversmoothing, and oversquashing.
#Benchmarking#arXiv#Research release
why featured
HKR-K passes because it gives ML engineers a GNN framing and two concrete failure modes. HKR-H/R fail: this is an arXiv survey v2, not a new model, tool, or reproducible breakthrough.
editor take
arXiv:2412.19419v2 sticks to homogeneous graphs; useful GNN catch-up for ML engineers, but coverage of heterogeneous and dynamic graphs is undisclosed.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Context-Aware Child-Directed Speech Detection from Long-Form Recordings
The authors fine-tuned six self-supervised models on a multilingual dataset of 182 children and found that adding surrounding context improved average F1 by 13.8 absolute points.
#Audio#Fine-tuning#Benchmarking#arXiv
why featured
HKR-K passes via sample size, model count, and F1 gain. HKR-H/R are weak: this is specialized speech research, far from mainstream AI products or practitioner concerns, so it stays in the lower research-news band.
editor take
Six SSL models gain 13.8 F1 points with context on 182 children; isolated utterance benchmarks look too clean here.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Confidence-Adaptive SwiGLU for Mixture-of-Experts
The paper proposes κ-SwiGLU for MoE models, making expert gate sharpness a learnable function of token-level router logits, and evaluates it on FineWeb-Edu across 8- to 28-layer Transformer MoE models with negligible parameter growth and small compute overhead.
#Inference-opt#Benchmarking#FineWeb-Edu#Research release
why featured
HKR-K passes via a concrete mechanism and FineWeb-Edu 8-28-layer test setup. HKR-H/R fail because the title is specialist and the post gives no gain numbers; no hard-exclusion rule triggered.
editor take
κ-SwiGLU tests 8–28-layer MoEs; CORE gains lack numbers, so the gate-sharpness idea is neat but underproven.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
DVD: Discrete Voxel Diffusion for 3D Generation and Editing
DVD models voxel occupancy as a native discrete variable for first-stage sparse voxel priors in SLat-based 3D generative pipelines, using predictive entropy to identify ambiguous voxel regions and block-structured perturbation fine-tuning to support inpainting and editing within a single sampling round.
#Multimodal#Vision#Fine-tuning#Research release
why featured
HKR-K passes on concrete modeling and uncertainty mechanisms. HKR-H/R fail, with no metrics, open artifact, or mainstream-tool impact, so this stays at the low end of the all tier.
editor take
DVD makes voxel occupancy discrete; I buy the angle, since threshold hacks in 3D generation needed killing.
HKR breakdown
hook knowledge resonance
open source
54
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Partial Fairness Awareness: Belief-Guided Strategic Mechanism for Strategic Agents
The paper proposes Partial Fairness Awareness, which releases a candidate set of fairness constraints while hiding the grounding constraint, and lets strategic agents iteratively update beliefs from system feedback; experiments on real-world and synthetic datasets report lower group fairness gaps and more stable outcomes than fully public or private regimes.
#Alignment#Benchmarking#Research release#Safety/alignment
why featured
HKR-K passes via a concrete fairness mechanism and real/synthetic experiments. HKR-H/R are weak: the paper is niche mechanism design, with no deployment case, benchmark number, or product impact disclosed.
editor take
PFA exposes only candidate fairness constraints. It beats full disclosure on manipulation, but sample size and feedback cost are undisclosed.
HKR breakdown
hook knowledge resonance
open source
54
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
TimeBlocks: Foundational and Continual Time-Series Blockbase -- Extended Version
The paper proposes TimeBlocks for time-series streams, using a pool of modular model blocks and an iterative routing strategy to build lightweight task-specific models, with StreamCore maintaining a small representative stream subset for continual calibration.
#Inference-opt#TimeBlocks#StreamCore#Research release
why featured
HKR-K passes because the paper states concrete TimeBlocks and StreamCore mechanisms. HKR-H/R miss: the title is academic, and the practical impact for general AI practitioners is narrow.
editor take
TimeBlocks uses modular blocks plus StreamCore for streams; metrics and latency are undisclosed, so the “foundational” label feels inflated.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Repurposing Adversarial Perturbations for Continual Learning: From Defense to Active Alignment
AdvCL repurposes adversarial perturbations as a geometric control signal for continual learning, combining three plug-in modules—Intra-Smooth, Proto-Clip, and Inter-Align—while the abstract does not disclose specific datasets or numerical gains.
#Alignment#Safety#AdvCL#Research release
why featured
HKR-K passes because the paper reframes adversarial perturbations as a continual-learning control signal and names three modules. No datasets, gains, or reproduction conditions are disclosed, so HKR-H/R stay weak.
editor take
AdvCL offers 3 anti-forgetting plugins but no datasets or gains; adversarial noise as geometry control beats another vague CL loss.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Multi-Objective Reinforcement Learning for Tactical Decision Making for Trucks in Highway Traffic
The paper presents a PPO-based multi-objective reinforcement learning framework for tactical truck decisions in highway traffic, learning Pareto-optimal policies on a scalable simulation platform across three objectives: safety via collisions and completion, energy efficiency via energy cost, and time efficiency via driver cost.
#Robotics#Reasoning#Research release
why featured
HKR-K passes for the PPO multi-objective setup and 3-objective tradeoff. HKR-H and HKR-R fail; this is narrow simulation research with no deployment, code, or industry validation disclosed.
editor take
PPO optimizes three truck-driving objectives; no road tests disclosed, so the Pareto frontier is not deployment evidence.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
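"Pareto-optimal" in the truck summary has a precise meaning: no other policy is at least as good on all three objectives and strictly better on one. A small sketch of that filter, using made-up (collision rate, energy cost, driver cost) tuples where lower is better; none of these numbers come from the paper.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical (collision_rate, energy_cost, driver_cost) per policy; lower is better.
policies = [
    (0.02, 10.0, 5.0),
    (0.01, 12.0, 6.0),
    (0.03, 11.0, 5.5),  # worse than the first policy on all three objectives
    (0.02, 9.0, 7.0),
]
front = pareto_front(policies)
```

Three of the four hypothetical policies survive; the frontier says which trade-offs are available, not which one to deploy.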
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
ChurnNet: An Optimized Modern AI for Churn Prediction
The study compares Random Forests, XGBoost, SVM, and a Unified Multi-Task Time Series Model for binary time-series churn prediction, finding that conventional methods perform better across multiple datasets and churn labeling techniques in predictive accuracy, data efficiency, and training or deployment resource needs.
#Benchmarking#Research release#Benchmark
why featured
HKR-H and HKR-K pass via the benchmark result: classic ML beats a unified multi-task time-series model across datasets and labels. Narrow churn-prediction scope and no product or agent impact keep it in the low-value research band.
editor take
ChurnNet compares RF, XGBoost, SVM, and UMTTSM; scores are undisclosed. Churn prediction still rewards feature work over temporal-model swagger.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H1·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations
UrbanFusion integrates street-view imagery, remote sensing data, cartographic maps, and POIs through Stochastic Multimodal Fusion, and the paper evaluates the spatial representation model on 41 tasks across 56 cities worldwide.
#Multimodal#Vision#Embedding#UrbanFusion
why featured
HKR-K passes on method and evaluation scale, but HKR-H/R are weak. This is specialized geospatial representation research; the post gives no product, open-source, or reproducibility hook.
editor take
UrbanFusion reports 56 cities and 41 tasks; stochastic missing-modality training is the useful bit, not four encoders glued together.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
RefDiffNet: Learning to Expose Subtle PCB Defects Before Detection
RefDiffNet adds a reference-image enhancement block before the detector backbone and reports up to 18% relative mAP50:95 gain on HRIPCB and DeepPCB, with only 0.004–0.005M extra parameters and 0.7–0.8 GFLOPs across YOLOv8–YOLOv26, RT-DETR, and Faster R-CNN.
#Vision#Benchmarking#RefDiffNet#YOLOv8
why featured
HKR-K passes via a testable architecture change and benchmark gains; HKR-H/R are weak because the PCB-defect niche lacks a broader practitioner hook. No hard exclusion, but it sits in the lower research-release band.
editor take
RefDiffNet reports up to 18% relative mAP50:95 gain on two PCB sets for 0.004–0.005M extra params; the catch is aligned reference images.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
E4GEN: Event-level Explainable Extreme-Enhanced Time-series Generation
E4GEN uses an explainable diffusion framework for extreme event-aware time-series generation, with E-Activator, E-Predictor, and E-Control components, and the paper evaluates it on 6 datasets with 17 metrics across fidelity, extreme-event fidelity, and downstream utility.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes: E4GEN describes a diffusion-based extreme-event time-series generator with 6 datasets and 17 metrics. HKR-H and HKR-R are weak because the topic is narrow research, so it stays in the lower band.
editor take
E4GEN reports 6 datasets and 17 metrics; I trust the extreme-event tests more than the explainable-diffusion label.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
All Models Are Wrong, Knowing Where Is Useful: On Model Uncertainty in Reinforcement Learning
The paper presents an uncertainty-aware MBRL framework that handles probabilistic model inaccuracies to mitigate model exploitation, and it discusses recent results in direct hardware learning and safe exploration; the abstract does not disclose benchmark scores, robot platforms, or implementation details.
#Robotics#Safety#Reasoning#Research release
why featured
HKR-K passes for the uncertainty-aware MBRL mechanism and safe-exploration setting. HKR-H/R are weak, with no metrics, code, or product path, so it stays in all.
editor take
The paper claims uncertainty-aware MBRL, but gives no benchmarks or hardware; I don't buy safe exploration on abstract-only evidence.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
A Lightweight Hybrid MLP Framework for Real-Time Phishing URL Detection Using Structural URL Features
The paper proposes a hybrid phishing URL detection framework combining blacklist screening with an MLP using 16 structural URL features, and reports 99.24% accuracy, 99.34% F1, 99.65% ROC-AUC, 1.2 ms per-URL latency, and 4,200 URLs per second on the 235,795-sample PhiUSIIL dataset.
#Benchmarking#CyberGuard#Research release#Benchmark
why featured
HKR-K passes via concrete feature count, dataset size, accuracy, and latency. HKR-H/R are weak because this is a narrow phishing-URL classifier, far from mainstream AI products or model competition.
editor take
CyberGuard reports 99.24% accuracy on 235,795 PhiUSIIL URLs; I don’t buy deployment claims without temporal or domain-shift tests.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
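The phishing summary describes a two-stage flow: blacklist screening first, then an MLP over 16 structural URL features. The sketch below shows what "structural URL features" can look like using only the standard library. The eight features and the `screen` helper are hypothetical illustrations; the summary does not list the paper's 16 features, and the MLP stage is omitted.

```python
from urllib.parse import urlparse

def url_features(url):
    """Compute a few illustrative structural features (not the paper's feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }

def screen(url, blacklist):
    """Stage 1: exact-host blacklist check; otherwise return features for a downstream classifier."""
    host = urlparse(url).hostname or ""
    if host in blacklist:
        return "blocked"
    return url_features(url)

features = url_features("https://login.example-bank.com/a/b?id=42")
```

Features like these are cheap string operations, which fits the low per-URL latency the summary reports.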
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Machine Learning-Based Bitcoin Trading Under Transaction Costs: Evidence From Walk-Forward Forecasting
The paper evaluates XGBoost, LSTM, and iTransformer on about 70,000 hourly BTC-USDT observations from 2018-2026 using a 27-fold walk-forward protocol; a 10-basis-point transaction cost breaks naive sign-based strategies, while a cost-aware execution filter restores profitability in selected configurations.
#Benchmarking#XGBoost#LSTM#iTransformer
why featured
HKR-H and HKR-K pass: the paper gives a dataset, cost condition, and a concrete strategy failure result. HKR-R is weak because quant backtesting is outside the main AI product/model agenda.
editor take
XGBoost tops 65% annualized on 70k hourly BTC bars, but 10 bps kills naive signals; iTransformer isn’t the star here.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H1·K1·R0
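Two ideas in the Bitcoin summary are easy to make concrete: walk-forward evaluation and a cost-aware execution filter. The sketch below assumes a rolling train window and a simple "only trade when the forecast exceeds the cost" rule. The window sizes, the threshold rule, and the toy forecasts are assumptions; only the 27 folds, the roughly 70,000 observations, and the 10-basis-point cost come from the summary.

```python
def walk_forward_splits(n_obs, n_folds, train_size, test_size):
    """Yield (train_range, test_range) pairs that roll forward in time without overlap."""
    for k in range(n_folds):
        start = k * test_size
        train_end = start + train_size
        test_end = train_end + test_size
        if test_end > n_obs:
            break
        yield range(start, train_end), range(train_end, test_end)

COST = 0.0010  # 10 basis points per unit of position change

def strategy_return(forecasts, realized, threshold):
    """Follow the forecast sign when |forecast| > threshold, else hold; pay COST on position changes."""
    position, total = 0, 0.0
    for f, r in zip(forecasts, realized):
        target = (1 if f > 0 else -1) if abs(f) > threshold else position
        total -= COST * abs(target - position)
        position = target
        total += position * r
    return total

# Hypothetical window sizes chosen so that 27 folds fit into 70,000 hourly bars.
folds = list(walk_forward_splits(70_000, 27, train_size=16_000, test_size=2_000))

# Toy forecasts that get the sign right but predict moves smaller than the cost.
forecasts = [0.0004, -0.0003, 0.0005, -0.0002]
realized = [0.0003, -0.0002, 0.0004, -0.0001]
naive = strategy_return(forecasts, realized, threshold=0.0)
filtered = strategy_return(forecasts, realized, threshold=COST)
```

In the toy data the naive sign strategy loses money despite correct signs because every flip costs more than the move it captures, while the filtered strategy stays flat.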
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Generic Interpretation Approach for Transformer Models Incorporating Heterogeneous Attention Structures
The paper proposes an interpretation method for Transformer models with heterogeneous attention structures, classifies attention by input source into homogeneous and heterogeneous types, and reports experiments that perform semantic and logical interpretation on representative models.
#Interpretability#Multimodal#Research release
why featured
HKR-K passes for a concrete interpretability mechanism, but there are no numbers, artifacts, or industry implications. The academic framing keeps it in the lower research-news band without a hard exclusion.
editor take
The paper gives a heterogeneous-attention taxonomy, but no benchmark details; without reproducible tests, I’d treat the interpretability claim lightly.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
MINES: Explainable Anomaly Detection through Web API Invariant Inference
MINES infers explainable invariants from API signatures and database table structures, then evaluates web-tamper attack detection on five benchmarks including TrainTicket, Gitea, Mastodon, and NextCloud; the abstract claims high recall and almost zero false positives but does not disclose exact recall numbers.
#Reasoning#Code#MINES#Gitea
why featured
HKR-K passes: MINES gives a concrete invariant-inference mechanism, named benchmarks, and a near-zero-false-positive claim. HKR-H/R are weak, and recall is not disclosed, so this stays in all.
editor take
MINES tests web tampering on 5 benchmarks, but recall is undisclosed; near-zero false positives sound nice, but LLM-made invariants need attack tests.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
IstGPT: LLM-based Anomaly Detection for Spatial-Temporal Graph in Industrial Systems
IstGPT uses LLMs and graph learning to detect anomalies in industrial spatial-temporal graphs, evaluates against 12 baselines on 9 datasets, and reports the highest F1-scores and eTaF1 across all datasets.
#RAG#Multimodal#Benchmarking#IstGPT
why featured
A specialized arXiv paper with HKR-K from its 9-dataset, 12-baseline benchmark claim. HKR-H and HKR-R miss, so it stays in the 40–59 low-value band.
editor take
IstGPT beats 12 baselines on 9 datasets; 6 are simulated, so real ICS replication matters more than the LLM wrapper.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Adaptive Order Policies for Masked Diffusion
The paper adds a lightweight policy network to masked diffusion models to learn token unmasking order, and evaluates it in 2 settings: training only the policy with a frozen denoiser, and jointly training the policy and denoiser with a weighted loss.
#Reasoning#Research release
why featured
HKR-K passes: the post states a lightweight policy network plus frozen-denoiser and joint-training settings. HKR-H/R are weak; no product angle or numeric result keeps it in the lower research-news band.
editor take
Adaptive Order Policies learns unmasking order with a lightweight policy net; no benchmark numbers disclosed, so don't crown it broadly yet.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Entropy Minimization without Model Collapse: Mitigating Prediction Bias in Medical Imaging
The paper proposes DSBR, a test-time bias-correcting objective evaluated on four medical-imaging datasets and ImageNet-C, which equalizes each predicted class’s contribution to unsupervised entropy minimization loss to reduce prediction bias and prevent model collapse under distribution shifts.
#Vision#Inference-opt#Safety#Research release
why featured
HKR-K passes on a concrete mechanism and evaluation setup; HKR-H/R are weak because this is a narrow methods paper. No hard exclusion, but product and industry relevance are limited, so it stays in the 40–59 band.
editor take
DSBR stabilizes test-time adaptation on 4 medical sets plus ImageNet-C; I buy this failure story over generic EM-collapse handwaving.
HKR breakdown
hook knowledge resonance
open source
51
SCORE
H0·K1·R0
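The DSBR summary says the objective "equalizes each predicted class's contribution" to the entropy-minimization loss. One plain reading of that sentence is sketched below: average entropy within each predicted class first, then across classes. This is an assumption based on the summary alone, not the paper's actual objective, and the batch is made up.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy (nats) of one predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def plain_entropy_loss(batch):
    """Standard entropy minimization: mean entropy over samples."""
    return sum(entropy(p) for p in batch) / len(batch)

def class_balanced_entropy_loss(batch):
    """Mean entropy per predicted class, then mean over classes, so majority classes do not dominate."""
    by_class = defaultdict(list)
    for probs in batch:
        predicted = max(range(len(probs)), key=probs.__getitem__)
        by_class[predicted].append(entropy(probs))
    per_class_means = [sum(v) / len(v) for v in by_class.values()]
    return sum(per_class_means) / len(per_class_means)

# Hypothetical batch: three confident class-0 predictions, one uncertain class-1 prediction.
batch = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.4, 0.6]]
plain = plain_entropy_loss(batch)
balanced = class_balanced_entropy_loss(batch)
```

In the plain loss the three class-0 samples carry three quarters of the weight; in the balanced version each predicted class carries half, which is the kind of rebalancing that would slow a drift toward a single dominant class.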
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Interpretability in Deep Time Series Models Demands Semantic Alignment
The paper argues that interpretability for deep time series models should target semantic alignment, where predictions are expressed through end-user-meaningful variables and mediated by spatial and temporal mechanisms that preserve user-dependent constraints under temporal evolution.
#Interpretability#Research release
why featured
HKR-K passes on a testable semantic-alignment claim, but HKR-H and HKR-R fail: no numbers, artifact, model name, or product implication. This sits in low-value research-release territory, so tier is all.
editor take
This paper moves time-series interpretability to semantic variables, but discloses no experiments; useful framing, not a new method.
HKR breakdown
hook knowledge resonance
open source
50
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Beyond Visual Fidelity: Benchmarking Super-Resolution Models for Large-Scale Remote Sensing Imagery via Downstream Task Integration
GeoSR-Bench evaluates super-resolution models with co-located remote-sensing image pairs from about 36,000 locations, spans 500 m to 0.6 m resolutions, and reports 270 experimental settings across 2 cross-platform SR tasks, 9 SR models, 3 downstream task models, and 5 downstream tasks per SR task.
#Vision#Benchmarking#GeoSR-Bench#Research release
why featured
HKR-K passes because GeoSR-Bench gives dataset scale, resolution range, and experiment count. HKR-H and HKR-R are weak; this is niche remote-sensing SR evaluation, so it stays in the all tier, below featured.
editor take
GeoSR-Bench ran 270 settings; PSNR/SSIM often decouple from downstream gains, so remote-sensing SR needs task-first evals.
HKR breakdown
hook knowledge resonance
open source
50
SCORE
H0·K1·R0
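The 270 settings in the GeoSR-Bench summary match a full factorial grid over the four listed factors: 2 × 9 × 3 × 5 = 270. The enumeration below checks that arithmetic; that the benchmark is in fact a full grid is an inference from the counts, and the integer indices are placeholders for the unnamed tasks and models.

```python
from itertools import product

sr_tasks = range(2)           # cross-platform SR tasks
sr_models = range(9)          # SR models
downstream_models = range(3)  # downstream task models
downstream_tasks = range(5)   # downstream tasks per SR task

# One tuple per experimental setting in a full factorial design.
settings = list(product(sr_tasks, sr_models, downstream_models, downstream_tasks))
```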
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Learning Action-Conditional and Object-Centric Gaussian Splatting World Models for Rigid Objects
The paper proposes MRO-GWM, which represents multi-object 3D scenes with object-centric Gaussians and uses a spatio-temporal transformer to predict future rigid-body motion from object Gaussian histories and future actions.
#Agent#Robotics#Vision#Research release
why featured
HKR-K lands on a concrete mechanism, but HKR-H and HKR-R miss; the post gives no benchmark, code, or dataset, and Gaussian-splatting world models are niche for the general AI audience.
editor take
MRO-GWM predicts rigid-object dynamics with object-centric Gaussians; synthetic household scenes and sim MPC keep the robotics claim contained.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Domain Adaptation with a Single Vision-Language Embedding
The paper proposes PIN, a domain-adaptation framework that uses 1 target vision-language embedding to mine multiple visual styles, then evaluates zero-shot and one-shot unsupervised adaptation on semantic segmentation datasets including Cityscapes and ACDC.
#Vision#Multimodal#Fine-tuning#CLIP
why featured
HKR-K passes with a concrete PIN mechanism and Cityscapes/ACDC tests. HKR-H/R are weak because this is niche vision domain-adaptation research with limited practitioner resonance.
editor take
PIN adapts Cityscapes/ACDC from one CLIP target embedding; gains aren’t disclosed, so treat it as a low-target-data trick.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
TabChange: Precise Attribute Changes in Tabular Data
TabChange generates tabular counterfactuals by first measuring how strongly a target attribute relates to other attributes: it flips weakly related attributes directly and uses an adversarial framework for strong relationships to remove target-attribute information from the latent space; experiments across 7 datasets report comparable naturalness, closer proximity to original instances, more valid counterfactuals, and fewer invalid counterfactuals than baselines.
#Fine-tuning#Benchmarking#TabChange#Research release
why featured
HKR-K passes via a concrete mechanism and 7-dataset evaluation; HKR-H and HKR-R are weak. The topic is narrow, with no product, open-source artifact, or adoption signal, so it sits in the upper low-value research range.
editor take
TabChange tests 7 tabular datasets; its split-by-correlation edit path is a clean fix for CVAE-style latent label leakage.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Understanding Identity Continuity in Thermal Video through Scene-Level Consistency
The paper adds an identity-repair backend to a YOLOv8 and SORT baseline, using conservative tracklet relinking to raise IDF1 on the PBVS Thermal Pedestrian MOT benchmark from 82.25 to 84.93 while preserving MOTA.
#Vision#Benchmarking#YOLOv8#SORT
why featured
HKR-K passes with a concrete identity-repair backend and IDF1 gain; HKR-H/R miss because thermal pedestrian MOT is a narrow incremental CV paper with limited broader practitioner pull.
editor take
YOLOv8+SORT gains 2.68 IDF1 with repair; for thermal MOT, fix fragmentation before piling on ReID.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Disentanglement-Based Equivariant Learning for Compositional VQA
The paper introduces DEAL for compositional VQA, using only ground-truth answers for supervision and evaluating visual and linguistic generalization on two benchmarks, CLEVR-CoGenT and GQA-SGL.
#Vision#Multimodal#Reasoning#Research release
why featured
HKR-K passes with a new framework and two benchmark conditions. HKR-H and HKR-R are weak: the title is academic, and compositional VQA generalization is too niche for featured.
editor take
DEAL uses answer-only supervision on CLEVR-CoGenT and GQA-SGL; scores are undisclosed, so I don’t buy the SOTA claim yet.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
CityTrajBench: A Unified Benchmark for City-Scale Vehicle Trajectory Generation
CityTrajBench standardizes city-scale vehicle trajectory generation evaluation across 3 real-world urban datasets and heterogeneous generators, including statistical baselines, VAE, GAN, diffusion, and flow-matching models.
#Benchmarking#CityTrajBench#Research release#Benchmark
why featured
HKR-K passes because CityTrajBench discloses dataset count and model coverage. HKR-H and HKR-R fail: this is a narrow trajectory-generation benchmark with limited pull for general AI practitioners.
editor take
CityTrajBench covers 3 city datasets; Markov still holds coarse metrics, so diffusion is not the default answer here.
HKR breakdown
hook knowledge resonance
open source
47
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
UME: A Unified Meta-Generalization Framework for Cross-Domain ETA
UME uses a dual-branch architecture, hypernetwork-based meta-learning, and knowledge distillation for cross-domain ETA prediction, and the paper says it has been deployed on Meituan-keeta; the abstract does not disclose A/B test scale or exact performance gains.
#Fine-tuning#Meituan-keeta#Research release
why featured
HKR-K passes via concrete mechanisms and a production deployment clue; HKR-H/R fail because the angle is niche and lacks metrics. This is narrow applied ML, so it stays in the 40–59 band.
editor take
UME is deployed on Meituan-keeta, but A/B scale and gains are undisclosed; the cross-domain ETA cold-start idea is solid, but the evidence is thin.
HKR breakdown
hook knowledge resonance
open source
47
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Jointly Optimizing Debiased CTR and Uplift for Coupons Marketing: A Unified Causal Framework
The paper proposes UniMVT, a multi-valued treatment network that jointly reconstructs debiased base CTR and intensity-response curves for coupon marketing; the abstract reports experiments on synthetic and industrial datasets plus real-world A/B tests, but does not disclose dataset sizes, lift percentages, or production traffic scale.
#Benchmarking#UniMVT#Research release#Benchmark
why featured
HKR-K passes via UniMVT and real A/B-test evidence. HKR-H/R are weak because coupon marketing and causal recommender modeling are narrow, so this stays in the all tier, below featured.
editor take
UniMVT splits coupon CTR into base click and intensity response; A/B lift is claimed, but no sample size or lift is disclosed.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
VLBM: Variational Latent Basis Modeling for OOD-Robust Multivariate Time Series Forecasting
VLBM separates stable dynamics from OOD deviations in multivariate time-series forecasting and reports results on 12 benchmark tasks across transportation, weather, power systems, and other domains, with average MAE and MSE gains of 15.08% and 7.74% over the strongest baseline.
#Benchmarking#Research release#Open source#Benchmark
why featured
HKR-K passes on concrete benchmark deltas across 12 tasks. HKR-H and HKR-R are weak: this is a narrow time-series forecasting paper with no product or agent implication, so it stays in the low-value research band.
editor take
VLBM cuts average MAE 15.08% against the strongest baseline across 12 tasks; the subspace-residual split is credible, pending the new OOD traffic set details.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Richer Representations for Neural Algorithmic Reasoning via Auxiliary Reconstruction
The paper proposes an auxiliary reconstruction module that recovers input states from encoded representations, improving existing neural algorithmic reasoning architectures on standard benchmarks; the RSS snippet does not disclose benchmark names, model settings, or numerical gains.
#Reasoning#Benchmarking#Research release
why featured
HKR-K passes: the paper offers a testable auxiliary-reconstruction mechanism. HKR-H/R fail because the title is academic, gains and benchmark details are undisclosed, and the topic is narrow research.
editor take
The paper adds auxiliary reconstruction to the encoder, but discloses no gains; I buy the angle, since NAR has over-blamed processors.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Semi-Supervised Noise Adaptation: Transferring Knowledge from Noise Domain
The paper introduces SSNA and the Noise Adaptation Framework, using a synthetic noise domain to improve semi-supervised target-domain generalization; the abstract says code is available on GitHub but does not disclose dataset counts or concrete performance numbers.
#Fine-tuning#Benchmarking#AIResearch-Group#Research release
why featured
HKR-K passes because the post names a new task, framework, and open code. HKR-H/R fail: no metrics, scale, or practitioner-facing hook, so it stays in the lower-value band.
editor take
SSNA uses Gaussian noise as source domain; no datasets or gains disclosed, so I file it as a semi-supervised trick.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Machine Learning for Coding Retail Product Names to Consumer-Price Categories
The paper maps noisy retail product names to consumer-price categories with normalization, trie rules, and per-category binary confirmation; in a leakage-free one-category study with real positives, hard negatives, and five seeds, bag-of-words reached about 0.99 F1, a linear classifier matched an MLP, n-grams added nothing, and about 67 labeled examples were enough.
#Fine-tuning#Benchmarking#arXiv#UN COICOP
why featured
HKR-K passes on method and numbers, but HKR-H/R fail. This is narrow applied statistics with no agent, product, or model-ecosystem impact, so it stays in the low-value research band.
editor take
Bag-of-words hits ~0.99 F1 with 67 labels; using an MLP here smells like credentialed overkill.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Decision-Path Patterns as Tree Reliability Signals: Path-Based Adaptive Weighting for Random Forest Classification
The paper proposes using each random-forest tree’s root-to-leaf decision path as an instance-level reliability signal, and reports statistically significant accuracy gains over RF on 36 binary classification benchmarks with Wilcoxon p<0.0001.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via a concrete RF weighting mechanism and 36-benchmark result. HKR-H/R fail: the angle is niche classical ML, with weak relevance to LLM/product practice, so it sits in the low-value research band.
editor take
Path-level RF weighting reports p<0.0001 across 36 binary sets; +0.99pp is unsexy, but cleaner than another Transformer tweak.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
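The "Wilcoxon p<0.0001" claim in the random-forest summary comes from a paired signed-rank test across datasets. A standard-library sketch of that test follows, using the normal approximation without tie or continuity correction. The per-dataset correct-prediction counts are fabricated for illustration and have nothing to do with the paper's results.

```python
import math

def wilcoxon_signed_rank(a, b):
    """Paired Wilcoxon signed-rank test; returns (W+, W-, two-sided p via normal approximation)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute differences and give each the average rank.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p

# Fabricated correct-prediction counts on 36 hypothetical datasets.
baseline = [800 + 3 * i for i in range(36)]
weighted = [b + (i % 7) - 1 for i, b in enumerate(baseline)]
w_plus, w_minus, p_value = wilcoxon_signed_rank(weighted, baseline)
```

A small p-value here only says the direction of the difference is consistent across datasets; it says nothing about the size of the gain, which is why the per-dataset delta still matters.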
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Understanding-Enhanced Model Collaboration for Long-Tailed Egocentric Mistake Detection
UE-MCM detects action mistakes in egocentric video with two branches: a CLIP4CLIP-based small branch for workflow-level inconsistency and a Qwen3-VL Embedding large branch for fine-grained action errors, then fuses predictions through a lightweight collaboration gate.
#Vision#Multimodal#Benchmarking#Qwen
why featured
HKR-K passes because the summary gives UE-MCM’s dual-branch and lightweight gate mechanism. HKR-H/R are weak: this is a narrow vision paper with no product or agent implication, so it stays in the low-value research band.
editor take
UE-MCM combines CLIP4CLIP and Qwen3-VL, but the summary gives no dataset or scores; without ablations, the long-tail gain is just a claim.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
SentimentLens: Reconciling Sentiment and Ratings via Dual-Modality in the Hospitality Sector
SentimentLens analyzes more than 10,000 public hotel reviews by combining aspect-based sentiment analysis, numerical ratings, importance-performance analysis, and entropy-based analysis to produce region-level, hotel-level, and category-level evaluations.
#RAG#SentimentLens#Research release
why featured
HKR-K passes on the 10k+ review dataset and dual-modality evaluation flow. HKR-H/R fail because it is a niche applied analytics paper with no model, product, or agent implications.
editor take
SentimentLens runs on 10K+ hotel reviews; the useful bit is plumbing sentiment, ratings, IPA, and entropy into ops tables.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Echo State Networks for Time Series Forecasting: Hyperparameter Sweep and Benchmarking
The paper tests a first-order autoregressive ESN on monthly and quarterly univariate M4 time series, using separate parameter and forecast datasets and comparing MASE and sMAPE against ARIMA, ETS, Theta, and TBATS.
#Benchmarking#M4 Forecasting Competition#Research release#Benchmark
why featured
This is a narrow time-series benchmarking paper with concrete datasets, metrics, and baselines, so HKR-K passes. HKR-H and HKR-R miss because there is no product angle, model release, or practitioner debate hook.
editor take
ESN matches ARIMA/TBATS monthly and wins quarterly mean MASE; reservoir computing still steals cheap wins while Transformers hog the room.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Enhancing BiGRU with a KAN Block for Legal Document Classification and Summarization
The paper introduces a KAN-based BiGRU for Bangladeshi legal documents in Bengali, English, and transliterated Bengali, reporting 67.96% classification accuracy, 0.65 F1, and summarization ROUGE-1/2/L F1 scores of 0.38, 0.23, and 0.31.
#Reasoning#Manupatra#Research release#Benchmark
why featured
HKR-K passes because the paper gives classification and summarization metrics for KAN-BiGRU across three legal-document language forms. HKR-H and HKR-R are weak; this is a narrow model-tweak paper with no product or adoption signal.
editor take
KAN-BiGRU lifts accuracy from 57.34% to 67.96%; beating pretrained baselines in low-resource law is the useful signal.
HKR breakdown
hook knowledge resonance
open source
43
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Optimizing Accuracy and Diversity: A Multi-Task Approach to Forecast Combinations
The paper presents a multi-task deep learning approach for forecast combinations, using separate model selection and weight optimization modules, and evaluates point-forecast accuracy on M4 competition series and real road-traffic data.
#Benchmarking#arXiv#M4#Research release
why featured
HKR-K passes: the summary names model-selection and weight-optimization modules plus M4 and road-traffic evaluations. HKR-H/R are weak, and the paper shows no product or industry impact.
editor take
Tested on M4 and road-traffic series; no gain size disclosed, so don't crown a weighting network as forecasting progress.
HKR breakdown
hook knowledge resonance
open source
42
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
Cluster Analysis with Resampling for Validation and Exploration (CARVE)
CARVE is an open-source Python and R package that evaluates multiple clustering algorithms and hyperparameters and returns stability and generalizability diagnostics at global, cluster, and sample levels. The paper reports near-optimal clustering recovery across six synthetic benchmarks where classical validation indices degrade.
#Benchmarking#Tools#CARVE#scikit-learn
why featured
HKR-K passes with an open-source package, evaluation mechanism, and 6 synthetic benchmarks. HKR-H/R fail; clustering validation is niche academic tooling, so it stays in the lower non-featured band.
editor take
CARVE claims near-optimal recovery on 6 synthetic benchmarks; without quantified omics results, don’t treat it as a Seurat answer machine.
HKR breakdown
hook knowledge resonance
open source
42
SCORE
H0·K1·R0
04:00
7d ago
arXiv · cs.LG· atomEN04:00 · 06·02
ES-Merging: Biological MLLM Merging via Embedding Space Signals
The paper proposes ES-Merging, a biological MLLM merging framework that estimates layer-wise and element-wise coefficients from coarse- and fine-grained embedding-space signals; the abstract says it outperforms existing merging methods on cross-modal reasoning and single-modal knowledge preservation, but the snippet does not disclose benchmark names, model names, or numerical results.
#Multimodal#Reasoning#Research release#Benchmark
why featured
HKR-K passes for the ES-Merging mechanism, but HKR-H/R fail and the body discloses no experimental numbers. No hard exclusion, but this is a narrow research abstract with limited audience pull.
editor take
ES-Merging derives merge coefficients from embeddings; benchmarks, models, and scores are undisclosed, so treat it as a heuristic upgrade.
HKR breakdown
hook knowledge resonance
open source
42
SCORE
H0·K1·R0
