papers · 2026-05-19

490 papers · updated 3m ago
2026-05-19 · Tue
17:59
20d ago
arXiv · cs.AI · atom · EN · 17:59 · 05·19
Atoms of Thought: Universal EEG Representation Learning with Microstates
The paper clusters continuous EEG from a large medical dataset into discrete microstate sequences, builds a universal microstate tokenizer, and evaluates it on three downstream tasks: sleep staging, emotion recognition, and motor imagery classification.
#Embedding#Interpretability#Research release
why featured
Triggers hard-exclusion-4: AI representation learning for medical EEG signals, with no agent, product, or industry implication disclosed. HKR-H/K pass on hook and mechanism, but audience fit is narrow.
editor take
Atoms of Thought clusters medical EEG into microstate tokens and beats time/frequency features on 3 tasks; I buy the route, but dataset scale is undisclosed.
HKR breakdown
hook knowledge resonance
open source
51
SCORE
H1·K1·R0
17:59
20d ago
arXiv · cs.CL · atom · EN · 17:59 · 05·19
TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-Aware Expert Offload
TIDE uses interval-based expert refresh to reduce I/O traffic in MoE diffusion LLM inference, delivering throughput gains of up to 1.4× on LLaDA2.0-mini and 1.5× on LLaDA2.0-flash over prior baselines in a single GPU-CPU system.
#Inference-opt#TIDE#LLaDA#Research release
why featured
HKR-K/R pass: TIDE adds interval expert refresh and reports 1.4×/1.5× throughput on a single GPU-CPU setup, tying to inference cost. HKR-H misses; no open-source or production evidence is disclosed.
editor take
TIDE gets LLaDA2.0-mini to up to 1.4× baseline throughput; I buy I/O-aware lossless tricks over model mystique here.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
17:58
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 17:58 · 05·19
From Seeing to Thinking: Decoupling Perception and Reasoning Improves VLM Post-Training
The paper splits VLM post-training into visual perception, visual reasoning, and textual reasoning stages, and experiments across multiple VLMs show staged training raises reasoning accuracy by 1.5% while shortening reasoning traces by 20.8% versus merged training.
#Vision#Reasoning#Fine-tuning#Research release
why featured
HKR-H/K/R pass, but the gains are incremental: +1.5% accuracy and 20.8% shorter reasoning traces. No open weights, major lab deployment, or cross-source cluster is disclosed, so it stays at the high end of 60–71.
editor take
Staged VLM post-training adds 1.5% accuracy and cuts traces 20.8%; stop worshipping long CoT before fixing perception.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
17:58
20d ago
arXiv · cs.CL · atom · EN · 17:58 · 05·19
ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning
ClinSeekAgent actively retrieves evidence from EHRs, medical knowledge bases, and imaging tools on ClinSeek-Bench, raising Claude Opus 4.6 multimodal F1 from 47.5 to 62.6 and improving all evaluated models across three CXR task groups.
#Agent#Multimodal#Tools#ClinSeekAgent
why featured
HKR-H and HKR-K pass: the mechanism is active retrieval over EHRs, medical KBs, and imaging tools, with Claude Opus 4.6 F1 rising from 47.5 to 62.6. The clinical vertical narrows reach, so it stays in the all tier.
editor take
ClinSeekAgent lifts Claude Opus 4.6 multimodal F1 to 62.6; clinical agents are back to evidence hunting, not prompt polish.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
17:54
20d ago
arXiv · cs.AI · atom · EN · 17:54 · 05·19
A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents
The paper defines the stochastic-deterministic boundary as a four-part contract for production LLM agents, organizes runtime design into 3 concerns, and provides 6 composable patterns, a 5-step selection methodology, diagnostics for production failures, and 1 runnable reference implementation for a 90-day contract-renewal agent.
#Agent#Tools#Memory#Research release
why featured
HKR-K/R pass: it offers an agent-runtime taxonomy, patterns, and a reference implementation. HKR-H is weak, and a single arXiv methodology paper lacks validation numbers or open-source traction, so it stays in 60–71.
editor take
The paper gives a 4-part SDB contract and 6 patterns; I buy the framing—agent engineering needs failure-boundary language.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
17:51
20d ago
arXiv · cs.AI · atom · EN · 17:51 · 05·19
HaorFloodAlert Research Presents 72-Hour Flood Prediction Model for Bangladesh Wetlands
HaorFloodAlert forecasts 72-hour flood probability for the roughly 8,000 km² Sunamganj Haor wetlands, using a deseasonalized RF/XGBoost ensemble and 77 Sentinel-1 events to reach 89.6% LOOCV accuracy, 87.5% recall, and 0.943 AUC-ROC.
#Benchmarking#HaorFloodAlert#Sentinel-1#BRRI
why featured
Hard-exclusion-4 applies: remote-sensing disaster science uses AI as a tool, with no agent or product implication. HKR-K has concrete metrics, but HKR-H/R fail, so the score is capped below 40.
editor take
HaorFloodAlert forecasts 72 hours ahead on 77 Sentinel-1 events; 89.6% LOOCV is thin, but removing seasonal leakage is the right instinct.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
17:46
20d ago
arXiv · cs.AI · atom · EN · 17:46 · 05·19
Study Evaluates Visual Attribution Methods in Large Vision Language Models for Chest X-ray Reasoning
The paper evaluates visual attribution for chest X-ray VQA (CXR-VQA) with a causal framework covering 11 attribution methods, six open-source LVLMs, and two output modes. It proposes MedFocus, which uses unbalanced optimal transport and targeted interventions for spatial, concept-level, and token-level attribution.
#Vision#Multimodal#Interpretability#MedFocus
why featured
HKR-K is clear through the concrete evaluation grid; HKR-R comes from attribution trust in medical LVLMs. The topic remains niche medical-imaging research, with no product or general-model impact disclosed.
editor take
MedFocus tests 11 attribution methods on 6 open LVLMs; causal counterfactual filtering beats another pretty heatmap.
HKR breakdown
hook knowledge resonance
open source
65
SCORE
H0·K1·R1
17:40
20d ago
arXiv · cs.AI · atom · EN · 17:40 · 05·19
Less Back-and-Forth: A Comparative Study of Structured Prompting
The paper compares raw, checklist-improved, and clarifying-question prompts across summarization, planning, explanation, and coding tasks; checklist prompts scored 7.50/8 on average, above 5.67 for raw prompts and 6.67 for clarifying-question prompts.
#Reasoning#Code#Benchmarking#ChatGPT
why featured
HKR-H/K/R pass, but this is a single prompt-engineering comparison paper. The summary gives scores, not sample size, model versions, or full reproducibility, so it stays in the 60–71 band.
editor take
Checklist prompts scored 7.50/8 versus raw 5.67; sample size is undisclosed, so don't crown a prompting law yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
17:28
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 17:28 · 05·19
Repeating Smaller Datasets Accelerates Neural Network Learning via Sampling Biases
The paper studies the small-vs-large gap: repeating a smaller dataset can reduce training compute versus using a larger dataset under comparable tasks. The authors report the effect across algorithmic tasks, architectures, and optimizers, and attribute the speedup to sampling biases that enable layer-wise growth.
#Reasoning#Benchmarking#Research release
why featured
HKR-H/K/R all pass: the claim is counterintuitive, gives a sampling-bias mechanism, and touches training cost. Still, this is one training-dynamics paper without disclosed LLM-scale reproduction or production impact, so it stays at the top of 60–71.
editor take
Repeating smaller datasets cuts training compute; no multiplier disclosed. I buy the sampling-bias mechanism, not web-scale pretraining extrapolation.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
17:14
20d ago
arXiv · cs.AI · atom · EN · 17:14 · 05·19
Beyond Prediction Accuracy: Target-Space Recovery Profiles for Evaluating Model-Brain Alignment
The authors introduce target-space recovery profiles to identify reproducible brain-response dimensions from repeated fMRI, then compare brain-to-brain and vision-model predictions on a Natural Scenes Dataset subset where 8 subjects viewed the same natural images.
#Vision#Interpretability#Benchmarking#Natural Scenes Dataset
why featured
HKR-K passes via a new fMRI-based evaluation framework, while HKR-H/R are weak. The story triggers hard-exclusion-technical-accessibility and science-crossover: no agent or product implication, so the score is capped below 40.
editor take
Nakamura et al. use 8 NSD subjects for recovery profiles; same-accuracy models diverge, so brain alignment needs more than prediction scores.
HKR breakdown
hook knowledge resonance
open source
51
SCORE
H0·K1·R0
17:08
20d ago
arXiv · cs.AI · atom · EN · 17:08 · 05·19
Toto 2.0 releases five open-weight time series forecasting models
Toto 2.0 releases five Apache 2.0 open-weight forecasting models, using one training recipe that improves forecast quality from 4M to 2.5B parameters and sets state of the art on BOOM, GIFT-Eval, and TIME benchmarks.
#Benchmarking#Toto 2.0#Research release#Open source
why featured
HKR-H and HKR-K pass via 5 open-weight models, 4M–2.5B params, and 3 benchmark claims. The topic is still niche time-series forecasting with limited entity pull, so it stays in the 60–71 band.
editor take
Toto 2.0 ships 5 open models up to 2.5B; time-series forecasting is now eating scaling laws too.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
16:38
20d ago
arXiv · cs.CL · atom · EN · 16:38 · 05·19
BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation
BalanceRAG calibrates LLM-only and RAG fallback thresholds as points on a two-dimensional lattice, using sequential graphical testing to certify target risk. Experiments on three open-domain QA benchmarks across multiple LLM backbones report controlled risk, higher coverage, more accepted correct answers, and fewer unnecessary retrieval calls than always-on RAG.
#RAG#Benchmarking#Research release#Benchmark
why featured
HKR-K/R pass: the paper targets risk control and retrieval cost in cascaded RAG, tested on 3 QA benchmarks. HKR-H is weak, and the feed text gives no concrete cost-reduction number, so it stays in the normal research band.
editor take
BalanceRAG calibrates 2D thresholds on three QA benchmarks. Always-on RAG looks lazy when retrieval cost fits risk control.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
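sketch
The two-threshold cascade in the BalanceRAG item can be illustrated with a small sketch. It is not the paper's procedure: BalanceRAG certifies risk with sequential graphical testing, whereas this toy version only picks the lattice point with the best empirical coverage on a calibration set, with no statistical guarantee. The field names (`llm_conf`, `rag_conf`, `llm_correct`, `rag_correct`), the risk definition (error rate among answered questions), and the tie-break on retrieval count are all assumptions made for illustration.

```python
def cascade_outcome(example, t_llm, t_rag):
    """Return (answered, correct, retrieved) for one calibration example.

    Assumed cascade: answer with the LLM alone if its confidence clears t_llm;
    otherwise retrieve and answer with RAG if that confidence clears t_rag;
    otherwise abstain (the retrieval call has already been paid for).
    """
    if example["llm_conf"] >= t_llm:
        return True, example["llm_correct"], False
    if example["rag_conf"] >= t_rag:
        return True, example["rag_correct"], True
    return False, False, True


def pick_thresholds(calib, grid, alpha):
    """Scan a 2D lattice of (t_llm, t_rag) and keep the best feasible point.

    Feasible means empirical risk among answered examples is at most alpha.
    Best means highest coverage, then fewest retrieval calls.
    This is plain empirical selection, not a certified calibration.
    """
    best_key, best_point = None, None
    for t_llm in grid:
        for t_rag in grid:
            outcomes = [cascade_outcome(ex, t_llm, t_rag) for ex in calib]
            answered = [o for o in outcomes if o[0]]
            if not answered:
                continue
            risk = sum(1 for o in answered if not o[1]) / len(answered)
            if risk > alpha:
                continue
            coverage = len(answered) / len(calib)
            retrievals = sum(1 for o in outcomes if o[2])
            key = (coverage, -retrievals)
            if best_key is None or key > best_key:
                best_key, best_point = key, (t_llm, t_rag)
    return best_point


# Hypothetical calibration data, invented for this sketch.
calib = [
    {"llm_conf": 0.9, "llm_correct": True, "rag_conf": 0.9, "rag_correct": True},
    {"llm_conf": 0.8, "llm_correct": False, "rag_conf": 0.7, "rag_correct": True},
    {"llm_conf": 0.3, "llm_correct": False, "rag_conf": 0.6, "rag_correct": True},
    {"llm_conf": 0.2, "llm_correct": False, "rag_conf": 0.1, "rag_correct": False},
]

strict = pick_thresholds(calib, [0.5, 0.85], alpha=0.0)
loose = pick_thresholds(calib, [0.5, 0.85], alpha=0.5)
print(strict, loose)
```

On this toy data, a zero-risk target pushes the LLM-only threshold up so the overconfident wrong answer falls through to retrieval, while a looser target accepts it and saves a retrieval call.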
16:06
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 16:06 · 05·19
Language Mutations Sustain the Persistence of Conspiracy Theories on Social Media
The study analyzes a three-year dataset of conspiracy-related posts on X and finds that claims with greater semantic mutations have longer lifespans, including shifts in pronouns, social-reference words, cognitive-process terms, risk and health vocabulary, and actor-action-target categories.
#Safety#X#Research release#Safety/alignment
why featured
HKR-H and HKR-K pass: the causal hook is counterintuitive, and the post gives a 3-year X dataset claim. AI-industry relevance is thin, with no model or product mechanism, so it sits in the 60–71 band.
editor take
Three years of X data links semantic mutation to longer conspiracy lifespans; keyword moderation loses to simplification and assimilation.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
15:48
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:48 · 05·19
FlexDraft: Flexible Speculative Decoding via Attention Tuning and Bonus-Guided Calibration
FlexDraft introduces a lossless speculative decoding framework with three mechanisms for different batch sizes. Attention Tuning tunes only final-layer attention projectors on mask tokens. Bonus-guided Calibration uses a lightweight MLP conditioned on the resolved bonus token. Flex Decoding switches between parallel and sequential draft-verify modes while adjusting verification length by draft confidence.
#Inference-opt#FlexDraft#Research release
why featured
HKR-K and HKR-R pass: the paper names concrete decoding mechanisms tied to inference cost. HKR-H fails, and the post gives no speed, throughput, or memory numbers, so it stays mid-band in the all tier.
editor take
FlexDraft freezes the AR path and tunes final attention projectors; no throughput numbers disclosed, so it reads like an engineering patch.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
15:24
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:24 · 05·19
InterLight: Leveraging Intrinsic Illumination Priors for Low-Light Image Enhancement
InterLight proposes an illumination-aware low-light image enhancement pipeline using physics-guided augmentation, adaptive prompts, luminance-gated intrinsic memory, and a self-supervised consistency objective; the RSS snippet says experiments cover multiple benchmarks but does not disclose benchmark names or scores.
#Vision#InterLight#Research release#Open source
why featured
HKR-K passes via concrete vision mechanisms; HKR-H/R fail because the title is academic and the audience impact is narrow. No hard exclusion, but this is niche CV research, so it sits in the 40–59 band.
editor take
InterLight open-sources an LLIE pipeline, but names zero benchmarks or scores; I’d test dark-region noise and color shift first.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
15:17
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:17 · 05·19
Your Neighbors Know: Argus Backdoor Detection Method for Decentralized Learning
The paper introduces Argus, a decentralized-learning backdoor detector where nodes share suspected triggers with neighbors and filter updates using structural similarity; across three standard datasets, Argus cuts attack success rates by up to 90 percentage points versus no defense while keeping utility within 5 points of an omniscient oracle.
#Safety#Benchmarking#Argus#Research release
why featured
HKR-H/K/R pass, but this is niche decentralized-learning security research. The mechanism and 3-dataset result give signal, yet it stays in the 60–71 band rather than featured.
editor take
Argus cuts ASR by up to 90 points on 3 datasets; the wild part is it improves as heterogeneity rises.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
14:54
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:54 · 05·19
What Are LLMs Doing to Scientific Communication? Measuring Changes in Writing Practices and Reading Experience
The study measures LLM-related changes in NLP scientific communication using over 37,000 ACL Anthology papers from 2020-2024 and a synthetic dataset of 3,000 human-written passages plus LLM-generated improvements.
#Benchmarking#ACL Anthology#Research release
why featured
HKR-H/K/R pass, but the summary discloses corpus size and scope only, not the main findings or reproducible outcomes. This fits the upper end of ordinary research coverage, below featured.
editor take
This scans 37K ACL papers; sneering at AI prose is too easy when 20 experts rated LLM edits clearer and more exciting.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
14:47
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:47 · 05·19
JAXenstein: Accelerated Benchmarking for First-Person Environments
Researchers released the open-source JAXenstein benchmark, a JAX implementation of the Wolfenstein 3D rendering engine for visual first-person reinforcement-learning tasks, and the post says it runs several times faster than comparable vision-based benchmarks.
#Agent#Vision#Benchmarking#JAXenstein
why featured
HKR-H and HKR-K pass: a retro FPS engine as a first-person RL benchmark is clickable, and the JAX implementation plus multi-x speed claim adds substance. HKR-R is weak, so this stays in the 60–71 all tier.
editor take
JAXenstein fills JAX’s first-person visual RL gap; “several times faster” lacks tables, so treat it as throughput plumbing.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
14:08
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:08 · 05·19
Structural Energy Guidance for View-Consistent Text-to-3D Generation
SEGS constructs structural energy in the PCA subspace of U-Net features and injects its gradient into denoising, reducing Janus Rate by about 10% on average across baselines including DreamFusion, Magic3D, and LucidDreamer.
#Multimodal#Vision#SEGS#DreamFusion
why featured
HKR-K passes with a concrete mechanism, about 10% Janus Rate reduction, and named baselines. HKR-H and HKR-R are weak because text-to-3D consistency remains a narrow research lane.
editor take
SEGS cuts Janus Rate about 10%, but runtime is undisclosed; the training-free plug-in matters more than prettiness claims.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
13:42
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 13:42 · 05·19
CLIF: Concept-Level Influence Functions for Transparent Bottleneck Models
CLIF uses influence functions on CEBaB and Yelp to identify helpful and harmful training samples, then restores model performance to baseline without retraining by changing those samples’ labels and weights.
#Interpretability#Research release
why featured
HKR-K is clear: CLIF uses influence functions to find harmful samples and restores performance without retraining via relabeling/reweighting. HKR-H is weak and HKR-R is niche, so this stays in the all tier.
editor take
CLIF restores CEBaB/Yelp baselines without retraining; I want proof it survives messier real-world labels.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
12:18
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 12:18 · 05·19
CPC-VAR: Continual Personalized and Compositional Generation in Visual Autoregressive Models
CPC-VAR introduces GCNS and a context-aware composition strategy for VAR text-to-image models. It targets two failure modes: catastrophic forgetting during sequential personalized concept learning, and feature entanglement and attribute inconsistency during multi-concept synthesis. The post says experiments improve long-sequence continual personalization and multi-concept synthesis over baselines, but does not disclose exact metrics or datasets.
#Vision#Multimodal#Fine-tuning#Research release
why featured
HKR-K passes via two named mechanisms and a clear problem setting, but the body gives no metrics, effect size, or reproduction setup. HKR-H and HKR-R are weak, so this stays as niche research signal below featured.
editor take
CPC-VAR shows GCNS plus localized cross-attention, but no metrics; VAR personalization must beat diffusion LoRA on forgetting curves.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
12:03
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 12:03 · 05·19
LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models
LIFT and PLACE split diffusion distillation into coarse alignment and fine refinement, then use error-based groups for local adaptive guidance; with a 1.3M-parameter student at 1.6% of the teacher size, the method remains stable and reaches 15.73 FID while conventional KD degrades to 50–200+ FID.
#Fine-tuning#Inference-opt#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the mechanism and numbers are concrete, and diffusion compression maps to inference-cost concerns. This is still a single paper summary with no product adoption or open-source traction, so it stays in the 60–71 band.
editor take
LIFT and PLACE gets 15.73 FID with a 1.3M student; error-split distillation beats naïve teacher mimicry here.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
12:01
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 12:01 · 05·19
Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention
The paper introduces BA-Att, a pre-downsampled block-sparse attention method for diffusion language models; it reports up to 6.95x faster attention computation than FlashAttention and near full-attention performance at 50% sparsity across language, multimodal, and video generation models.
#Inference-opt#Multimodal#Research release
why featured
HKR-H/K/R pass, but diffusion LMs and sparse attention keep this research-heavy. The 6.95x speedup and 50% sparsity claim are testable; code, benchmark breadth, and transfer to mainstream LLMs are not disclosed, so it stays in 60–71.
editor take
BA-Att reports 6.95x attention speedup at 50% sparsity; DLM long-context needs data-driven sparsity, not brittle position priors.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
11:50
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 11:50 · 05·19
LLM-Based Financial Sentiment Analysis in Arabic: Evidence from Saudi Markets
The paper presents an Arabic financial sentiment framework for Saudi markets, using an 84K-sample corpus, five-class sentiment labels, and company entity linking to analyze sentiment dynamics relative to Saudi Exchange stock behavior.
#Embedding#Benchmarking#Saudi Exchange#Research release
why featured
HKR-K passes with 84K samples and five-class labels. HKR-H/R are weak; this is niche NLP research with no hard exclusion, so it sits in the 60–71 band.
editor take
The paper ships 84K Arabic finance samples; annotation agreement and return-prediction results are undisclosed, so don’t price this as alpha.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
11:04
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 11:04 · 05·19
Beyond Rational Illusion: Behaviorally Realistic Strategic Classification
The paper defines behaviorally realistic strategic classification and introduces Pro-SF, which adds three prospect-theory mechanisms to Stackelberg interactions: benefit-cost asymmetry, subjective reference points, and non-rational probability distortion.
#Benchmarking#Research release
why featured
HKR-K has concrete mechanisms, and HKR-R links to classifier gaming in deployment. HKR-H is weak; the post gives no experiment scale, datasets, or effect sizes, so it stays in the 60–71 research-signal band.
editor take
Pro-SF adds 3 prospect-theory mechanisms to Stackelberg classification; I buy the setup, but datasets and gains aren't disclosed.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
10:11
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 10:11 · 05·19
Paper Proposes Closed-form Predictive Coding via Hierarchical Gaussian Filters
The paper formulates predictive coding networks as deep hierarchical Gaussian filters, restoring precision-weighted message passing so activations, weights, and precisions train under one free-energy objective without global error signals, iterations, or automatic differentiation. On FashionMNIST, the method approaches backpropagation in epoch-level wall-clock cost, converges in fewer epochs, and performs better on online learning, data efficiency, and concept-drift tasks.
#Inference-opt#Interpretability#Benchmarking#Research release
why featured
HKR-K passes with a concrete mechanism and FashionMNIST runtime/convergence claim. HKR-H and HKR-R are weak, and the post lacks production-scale evidence that this challenges backprop, so it stays in the 60–71 research-signal band.
editor take
HGF-PC nears backprop epoch cost on FashionMNIST. I’d hold applause until depth, scale, and error bars are disclosed.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
09:47
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 09:47 · 05·19
Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution
The paper introduces Spectral Integrated Gradients, which builds baseline-to-input integration paths with SVD and activates singular components from largest to smallest; across multiple image classification datasets, SIG reports cleaner attribution maps and improved quantitative results versus existing path-based attribution methods.
#Interpretability#Vision#Research release#Open source
why featured
HKR-K passes: Spectral Integrated Gradients gives a concrete SVD path and vision attribution comparison. HKR-H/R are weak; no noise-reduction numbers or production implication are disclosed.
editor take
SIG changes IG paths with SVD; cleaner vision maps, but datasets and metrics aren't disclosed here, so don't equate pretty heatmaps with interpretability.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
09:31
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 09:31 · 05·19
SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects
SceneCode compiles a natural-language prompt into executable indoor-world programs, not static meshes. It uses a planner-designer-critic loop, routes each AssetRequest through five code-generation strategies, creates part-wise Blender Python assets, and exports SDF files for physics simulation.
#Agent#Code#Robotics#SceneCode
why featured
HKR-H/K pass: the prompt-to-executable-world-program angle is fresh and the mechanism is specific. HKR-R is weak; no benchmark, repo, or production-replacement evidence is disclosed, so it stays in the 60–71 band.
editor take
SceneCode routes assets through 5 code strategies into SDF; I buy this—embodied sim needs editable articulated assets, not prettier meshes.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
09:21
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 09:21 · 05·19
Lens Privacy Sealing: A New Benchmark and Method for Physical Privacy-Preserving Action Recognition
The researchers propose Lens Privacy Sealing, a hardware method that obscures camera lenses with adjustable laminating film, and release P³AR-NTU with 114K videos plus P³AR-PKU for privacy-preserving action recognition.
#Vision#Benchmarking#MSPNet#P³AR
why featured
HKR-H/K/R pass, but this is a niche computer-vision privacy benchmark, not a broad model or product release. The 114K-video dataset and physical occlusion mechanism make it useful signal in the 60–71 band.
editor take
LPS masks lenses before capture and ships 114K videos; I buy the hardware angle over betting privacy on post-processing.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
09:05
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 09:05 · 05·19
TORQ: Two-Level Orthogonal Rotation Improves MXFP4 Quantization
TORQ applies two-level orthogonal rotation to MXFP4 activation quantization without training. On Qwen3-32B, WikiText perplexity drops to 8.43, versus 7.61 for BF16, and average accuracy rises from 38.40% with direct RTN to 73.63%, versus 74.82% for BF16.
#Inference-opt#LLaMA3#Qwen3#Research release
why featured
HKR-K and HKR-R are strong: TORQ gives concrete quantization metrics tied to inference cost. HKR-H is narrow, and the paper lacks an artifact or production validation, so it stays in 60–71.
editor take
TORQ lifts Qwen3-32B MXFP4 accuracy from 38.40% with direct RTN to 73.63%; training-free near-BF16 MXFP4 smells hardware-ready, not benchmark theater.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
09:02
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 09:02 · 05·19
EgoCoT-Bench: Benchmarking Grounded and Verifiable Operation-Centric Chain of Thought Reasoning for MLLMs
EgoCoT-Bench provides 3,172 verifiable QA pairs over 351 egocentric videos, covering 4 task groups and 12 sub-task groups, with STSG-guided generation and human refinement for operation-centric grounded reasoning evaluation.
#Reasoning#Multimodal#Benchmarking#EgoCoT-Bench
why featured
HKR-K passes via concrete dataset size, task structure, and STSG plus human correction. HKR-H/R are weak, making this a useful but narrow multimodal benchmark below featured threshold.
editor take
EgoCoT-Bench adds 3,172 QA over 351 videos; its bite is catching MLLMs that answer right with bogus evidence.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
08:52
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:52 · 05·19
Self-Creative Text-to-Object Generation Using Semantic-Aware Spatial Weighting
The paper proposes SCDiff for text-to-image generation with two modules, LSW and VSML; the RSS snippet says experiments improve creativity, semantic alignment, and visual coherence, but the post does not disclose specific benchmark numbers.
#Multimodal#Vision#Research release
why featured
HKR-K barely passes because SCDiff, LSW, and VSML are new mechanism names. HKR-H/R fail: no metrics, no reproducible setup, and no practitioner nerve beyond a niche vision-paper abstract.
editor take
SCDiff adds LSW and VSML, but benchmark numbers are undisclosed; reducing “creativity” to center weighting plus diversity loss smells thin.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
08:46
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:46 · 05·19
Provable Fairness Repair Method for Deep Neural Networks
ProF repairs fairness issues in deep neural networks by combining interval bound propagation with a MILP constraint-solving formulation, and the paper reports results on four benchmark datasets with up to 95.93% generalization on full datasets, 93.16% on the entire input space, and around 90% fairness improvement under configurable sensitive attributes and fairness definitions.
#Safety#Alignment#Benchmarking#Research release
why featured
HKR-K passes with IBP+MILP, 4 benchmarks, 95.93% generalization, and ~90% fairness gains. HKR-H/R are weak: it reads as a narrow paper and lacks a mainstream LLM/agent practice hook.
editor take
ProF reports 95.93% full-dataset generalization on 4 benchmarks; I buy the proof angle, but MILP scaling is undisclosed.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
08:08
21d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:08 · 05·19
Are Watermarked Images Editable? SafeMark for Watermark-Preserving Text-Guided Image Editing
SafeMark adds a thresholded watermark-decoding loss to a diffusion editor’s training objective, preserving watermark bit accuracy after text-guided image edits without architectural changes.
#Vision#Multimodal#Safety#SafeMark
why featured
HKR-H/K/R pass, but the item discloses only the paper mechanism, not bit-accuracy numbers, datasets, or release status. Useful image-safety research, not same-day must-write.
editor take
SafeMark changes only the loss, not architecture; the snippet gives no bit-accuracy numbers, so don’t call editable watermarking solved.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
07:00
21d ago
HuggingFace Papers (takara mirror) · rss · EN · 07:00 · 05·19
Targeted Downstream-Agnostic Attack
The paper proposes Targeted DAA, using a threat image as a feature-level anchor to attack pre-trained encoders under unknown downstream tasks, with experiments on 10 self-supervised methods across 3 benchmark datasets.
#Vision#Embedding#Safety#Research release
why featured
HKR-K/R pass: Targeted DAA gives a concrete feature-anchor attack and tests it across 3 benchmarks and 10 SSL methods. HKR-H is weak, and the specialist security angle keeps it in the all tier.
editor take
Targeted DAA tests 3 datasets and 10 SSL methods; it smells like a red-team recipe for targeted vision-encoder poisoning.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
06:11
21d ago
HuggingFace Papers (takara mirror) · rss · EN · 06:11 · 05·19
Conflict-Resilient Multi-Agent Reasoning via Signed Graph Modeling
SIGMA models trust, conflict, and neutral relations among agents with a confidence-weighted signed relational graph, then uses conflict-aware message passing and weighted aggregation; the paper reports gains over state-of-the-art baselines on six benchmark datasets across multiple LLM backbones and multi-agent configurations.
#Agent#Reasoning#Benchmarking#SIGMA
why featured
HKR-H/K/R pass, but the post gives only abstract-level facts: no dataset names, effect sizes, code, or reproducible setup. That keeps it in the 60–71 research-signal band.
editor take
SIGMA beats baselines on 6 benchmarks; gains are undisclosed, so treat it as a multi-agent aggregation paper for now.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
06:10
21d ago
HuggingFace Papers (takara mirror)· rssEN06:10 · 05·19
LambdaPO: A Lambda Style Policy Optimization for Reasoning Language Models
LambdaPO replaces GRPO’s group-mean baseline with pairwise preference advantage estimation and adds a semantic density reward based on precision-recall alignment between reasoning traces and ground-truth solutions; the post does not disclose the exact datasets, model sizes, or performance gains.
#Reasoning#Alignment#Research release
why featured
HKR-K passes because it describes a concrete GRPO training change. HKR-H/R are weak: datasets, model scale, and gains are not disclosed, so this stays a normal research-release item.
editor take
LambdaPO tweaks GRPO advantage estimation, but datasets, scale, and gains are undisclosed; nice objective story, not yet a recipe.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
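The baseline swap described in the LambdaPO summary can be illustrated with a small sketch. `grpo_advantages` is the usual group-normalized form (reward minus the group mean, divided by the group standard deviation). `pairwise_advantages` is a hypothetical stand-in that scores each rollout by its net win rate against the rest of the group; the post does not give LambdaPO's actual estimator, so treat the second function as an assumption for illustration only.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r_i - group mean) / group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def pairwise_advantages(rewards):
    """Hypothetical pairwise variant (assumption, not the paper's formula):
    net win rate of each rollout against the other rollouts in its group."""
    n = len(rewards)
    if n < 2:
        return [0.0] * n
    advantages = []
    for i, r_i in enumerate(rewards):
        wins = sum(1 for j, r_j in enumerate(rewards) if j != i and r_i > r_j)
        losses = sum(1 for j, r_j in enumerate(rewards) if j != i and r_i < r_j)
        advantages.append((wins - losses) / (n - 1))
    return advantages


rewards = [1.0, 0.0, 0.0, 1.0]  # made-up verifier rewards for one group
print(grpo_advantages(rewards))
print(pairwise_advantages(rewards))
```

In this sketch both estimators return all-zero advantages when every rollout in the group gets the same reward.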
05:40
21d ago
HuggingFace Papers (takara mirror)· rssEN05:40 · 05·19
EmbGen: Teaching with Reassembled Corpora
EmbGen decomposes a corpus into entity-description pairs, reassembles them using embedding similarity, and generates QA pairs with proximity, intra-cluster, and inter-cluster sampling; under 5M and 20M token budgets, it improves Binary Accuracy on the most heterogeneous dataset by 12.5% and 88.9%, respectively, over the strongest baseline.
#Fine-tuning#Embedding#Benchmarking#EmbGen
why featured
HKR-H/K/R pass via a clear data-reassembly hook, concrete gains, and fine-tuning cost relevance. Still a single paper listing with missing model and dataset details, so it stays in the 60–71 band.
editor take
EmbGen gains 88.9% at 20M tokens on heterogeneous data; I buy the pipeline, but Binary Accuracy needs human audit.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
05:32
21d ago
HuggingFace Papers (takara mirror)· rssEN05:32 · 05·19
MatPhys: Learning Material-Aware Physics Parameters for Deformable Object Simulation from Videos
MatPhys predicts spring-mass parameters from single-view video, using DINO features for part decomposition and a learned material codebook for cross-scene consistency; experiments report reconstruction and future prediction matching per-scene optimization baselines, with stronger generalization to unseen interactions and objects, but the snippet does not disclose dataset size.
#Vision#Robotics#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper gives a concrete mechanism for learning deformable-object physics from monocular video and links to robotics simulation cost. HKR-H is weak and dataset size is not disclosed, so it sits in the 60–71 research band.
editor take
MatPhys predicts spring-mass parameters from monocular video; dataset size is undisclosed, but matching per-scene optimization deserves replication.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:41
21d ago
HuggingFace Papers (takara mirror)· rssEN04:41 · 05·19
SciCustom: A Framework for Custom Evaluation of Scientific Capabilities in Large Language Models
SciCustom builds custom scientific benchmarks from large-scale data using ontology-grounded knowledge units, voting-based multi-model consensus, binary-search retrieval, proxy subset selection, and data-grounded benchmark generation, with chemistry and healthcare experiments showing fine-grained LLM capability differences that standard benchmarks miss.
#Benchmarking#SciCustom#Research release#Benchmark
why featured
HKR-K and HKR-R pass: the paper offers concrete eval mechanisms and targets benchmark blind spots. HKR-H is weak, and the article shows no adoption signal or broad release impact, so it stays in all.
editor take
SciCustom uses ontology units and model voting for science evals; without model rankings, I’d audit its tagger bias first.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:39
21d ago
HuggingFace Papers (takara mirror)· rssEN04:39 · 05·19
CompoSE: 3D Shape Synthesis and Editing with Part-Aware Control
CompoSE synthesizes part-separated 3D objects from coarse geometric primitives, using a diffusion transformer that alternates local part processing with global context aggregation; the post says it outperforms existing methods on guided synthesis, but does not disclose specific metric values.
#Multimodal#Vision#CompoSE#Research release
why featured
HKR-K passes on the part-aware primitive-control mechanism; HKR-H and HKR-R are weak because the post lacks metrics, datasets, and any broader practitioner relevance. This fits a normal research update, not featured.
editor take
CompoSE controls 3D parts from coarse primitives; no metric values are disclosed, so don’t buy the “significantly outperforms” line yet.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:31
21d ago
HuggingFace Papers (takara mirror)· rssEN04:31 · 05·19
Retrieval-Augmented Linguistic Calibration
The paper introduces RALC, a lightweight post-hoc pipeline that uses retrieval-augmented rewriting to propagate calibrated confidence into language, improving in-domain faithfulness by up to 66% and calibration by up to 58% across three QA benchmarks and five LLM families.
#RAG#Alignment#Benchmarking#Research release
why featured
HKR-K/R pass: the method, test scope, and gains are concrete, and RAG reliability is a real practitioner pain. HKR-H is weak, and the post shows no code or production evidence, so it stays in 60–71.
editor take
RALC lifts faithfulness by up to 66% on 3 QA benchmarks; in-domain only, so don’t trust “probably” as calibrated UI yet.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:01
21d ago
HuggingFace Papers (takara mirror)· rssEN04:01 · 05·19
Exploring and Developing a Pre-Model Safeguard with Draft Models
The paper proposes a pre-model guard that uses SLM draft responses before target LLM inference to detect jailbreak prompts; the snippet says it lowers false negatives versus prompt-only guards but does not disclose numeric reductions.
#Safety#Alignment#Inference-opt#Research release
why featured
HKR-H/K/R pass through the draft-model-as-guard hook, the pre-inference mechanism, and safety/cost resonance, but the body gives no attack set, false-positive rate, or reduction figure.
editor take
SLM draft responses screen jailbreaks before target inference; no false-negative drop is disclosed, so I buy the mechanism, not the claim.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Genflow Ad Studio: A Compound AI Architecture for Brand-Aligned, Self-Correcting Video Generation
Genflow uses a retrieval-based Brand DNA module and an adversarial multi-agent QC loop to generate brand-aligned ad videos, raising brand-compliant output yield from 42% to 89% under the paper’s reported setup.
#Agent#RAG#Vision#Genflow
why featured
HKR-H and HKR-K pass: the paper gives a concrete agent/RAG mechanism and a 42%→89% metric. No major lab, open artifact, or cross-source debate is shown, so it stays at the top of 60–71.
editor take
Genflow lifts brand-compliant yield from 42% to 89%; I buy the direction, but the 6-page paper lacks dataset scale.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Distinguishable Deletion: Unifying Knowledge Erasure and Refusal for Large Language Model Unlearning
The paper proposes Distinguishable Deletion, constraining unlearned knowledge with energy boundaries in latent representations, then applying EUA during training and an energy-based refusal mechanism at inference; the arXiv abstract says the code is available on GitHub.
#Alignment#Safety#Research release#Open source
why featured
HKR-H/K/R all pass, but the post gives no benchmark numbers, author authority, or deployment result. This is useful safety research with code, not a must-write release.
editor take
Distinguishable Deletion (D²) unifies erasure and refusal via energy boundaries, but model scale is undisclosed; I don’t buy “significantly outperforms” before replication.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents
HINT-SD uses full-trajectory hindsight to select failure-relevant actions and applies feedback-conditioned distillation only to targeted action spans; on BFCL v3 and AppWorld, it improves over a dense per-turn feedback baseline by up to 18.80% while reducing time per training step by 2.26×.
#Agent#Fine-tuning#Reasoning#HINT-SD
why featured
HKR-H/K/R pass: targeted hindsight self-distillation gives clear agent-training signal with +18.80% and 2.26× claims, but it remains an arXiv benchmark paper rather than a broadly shipped tool.
editor take
HINT-SD gains up to 18.80% on BFCL v3/AppWorld and cuts step time 2.26×; long-horizon agents need fewer wasted targets.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
When Outcome Looks Right But Discipline Fails: Trace-Based Evaluation Under Hidden Competitor State
The paper introduces discipline stability, a trace-based evaluation paradigm, and shows in a two-hotel pricing benchmark and a compact hidden-budget bidding task that reward-only PPO variants can meet revenue-like outcomes while failing to align price or bid traces.
#Agent#Benchmarking#Alignment#Research release
why featured
HKR-H/K/R all pass, but this is a single arXiv methods paper whose impact depends on replication and adoption. Concrete mechanism and benchmarks make it useful, not same-day featured.
editor take
Reward-only PPO passes two KPI-like benchmarks while drifting off-trace; I buy the critique: deployment gates need behavior traces.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents
The paper proves that a broad class of work-conserving schedulers reaches maximum throughput for individual requests and AI-agent workloads with DAG or fork-join routing, and its evaluations identify Orca and Sarathi-Serve as throughput-optimal while FasterTransformer and vanilla vLLM are not maximally stable.
#Agent#Inference-opt#Orca#Sarathi-Serve
why featured
HKR-H/K/R all pass, but this is a theory-heavy scheduling paper with a narrow infra audience. It stays in the 60–71 band at 70 rather than featured.
editor take
The paper proves work-conserving schedulers are throughput-optimal for DAG agents; vanilla vLLM being non-maximally stable is the jab.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective
The paper proposes ConSPO as an RLVR framework that replaces GRPO’s clipped ratio scores with length-normalized sequence log-probabilities and a group-wise InfoNCE objective, and reports evaluations across multiple backbone models, parameter scales, and training datasets on mathematical reasoning benchmarks.
#Reasoning#Alignment#Benchmarking#Research release
why featured
HKR-K is strong: ConSPO replaces GRPO scoring with length-normalized log-prob plus group InfoNCE. HKR-H is weak, and metrics, code, and model names are not disclosed, so this stays in 60–71.
editor take
ConSPO swaps GRPO scores for length-normalized log-prob; I buy the target, but the snippet gives no math-gain numbers.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
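The two ingredients named in the ConSPO summary, length-normalized sequence log-probabilities and a group-wise InfoNCE objective, can be sketched as follows. The exact loss in the paper is not given in the snippet, so this assumes the common InfoNCE form in which verified-correct rollouts are positives and the whole rollout group forms the denominator. The function names, the `temperature` parameter, and the example numbers are hypothetical.

```python
import math


def length_normalized_logprob(token_logprobs):
    """Mean per-token log-probability of one sampled response."""
    return sum(token_logprobs) / len(token_logprobs)


def group_infonce_loss(scores, is_correct, temperature=1.0):
    """Assumed group-wise InfoNCE: -log(sum_pos exp(s/t) / sum_all exp(s/t)).

    scores: one length-normalized log-prob per rollout in the group.
    is_correct: verifier outcome per rollout (True marks a positive).
    """
    if not any(is_correct):
        raise ValueError("group has no positive response")
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    positive_mass = sum(e for e, ok in zip(exps, is_correct) if ok)
    return -math.log(positive_mass / sum(exps))


# Made-up rollouts: per-token log-probs and verifier outcomes.
group = [[-0.2, -0.1, -0.3], [-1.0, -1.4], [-0.6, -0.5, -0.7, -0.6]]
labels = [True, False, False]
scores = [length_normalized_logprob(tokens) for tokens in group]
print(group_infonce_loss(scores, labels))
```

Under this assumed form, raising the normalized log-probability of a correct rollout relative to the rest of its group lowers the loss, and response length does not scale the score directly because of the per-token mean.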
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Beyond Scaling: Agents Are Heading to the Edge
The position paper argues that personal-agent architectures should move to the edge, citing 3 structural reasons: high-fidelity local context, zero-latency execution loops, and real-time local interaction as the source of implicit preference data.
#Agent#Memory#Alignment#Research release
why featured
HKR-H/K/R all pass, but this is a position paper with mechanisms rather than experiments, code, benchmarks, or a major-lab release. It fits the 60–71 band as useful commentary, not featured news.
editor take
The paper gives 3 edge-agent reasons; I buy local context, not “must move edge”, because security and sync costs aren’t counted.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
D²Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning
D²Evo trains an RL framework with fewer than 2K real mathematical samples, mines medium-difficulty anchors based on the current Solver capability, and jointly optimizes the Questioner and Solver to improve reasoning on mathematical and general reasoning benchmarks.
#Reasoning#Fine-tuning#Benchmarking#Research release
why featured
HKR-K/R pass: <2K-sample RL, difficulty-aware self-evolution, and dual-role optimization are useful. HKR-H is weak, and gains, base models, and release status are not disclosed, so it stays below featured.
editor take
D²Evo uses under 2K real math samples; the medium-difficulty anchor loop beats another synthetic-data volume story.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Confidence Geometry Reveals Trace-Level Correctness in Large Language Model Reasoning
The paper uses token-level confidence trajectories to separate correct and incorrect reasoning traces across GSM8K, MATH, and MMLU, links Davies-Bouldin clustering strength to correctness-discrimination AUC, and proposes NeuralConf to improve confidence-weighted answer aggregation under a fixed trace budget.
#Reasoning#Benchmarking#Inference-opt#NeuralConf
why featured
HKR-K/R pass: the paper gives a testable confidence-trace mechanism for reasoning reliability and budgeted aggregation. HKR-H is weak, and the abstract does not disclose NeuralConf’s lift, so it stays in 60–71.
editor take
NeuralConf uses only token confidence traces; nice constraint, but no AUC numbers are disclosed, so don’t crown it a verifier replacement.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
LURE: Latent Space Unblocking for Multi-Concept Reawakening in Diffusion Models
The paper introduces LURE, a diffusion-model concept reawakening method that reconstructs latent space, applies Gradient Field Orthogonalization, and uses LSIS sampling to recover multiple erased concepts under diverse erasure tasks and methods.
#Vision#Safety#Alignment#Research release
why featured
HKR-H/K/R all pass, but the source gives only arXiv-summary detail: no metrics, code status, or affected model list. The diffusion-safety angle is real but narrow, so it sits high in 60–71.
editor take
LURE revives multiple erased concepts, metrics undisclosed; erasure-based safety needs to explain why latent space keeps a backdoor.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
LoopQ: Quantization for Recursive Transformers
LoopQ targets W4A4 post-training quantization for LoopLMs across seven benchmarks, improving average downstream accuracy by 68.8% and reducing average perplexity by 87.7% versus the strongest static PTQ baseline.
#Inference-opt#Benchmarking#LoopQ#Research release
why featured
HKR-K is solid with seven benchmarks, W4A4, +68.8% accuracy and -87.7% perplexity; HKR-R hits inference cost. HKR-H is weak, and LoopLMs are still niche, so it stays in all.
editor take
LoopQ lifts W4A4 accuracy 68.8% across 7 benchmarks; recursive block reuse is a nastier PTQ target than standard Transformers.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
TeleRAG uses lookahead retrieval to prefetch CPU data to GPU in parallel with LLM generation, and evaluations report up to 1.53x average end-to-end latency reduction for single-query inference and 1.83x higher average throughput for batched inference.
#RAG#Inference-opt#TeleRAG#Research release
why featured
HKR-K/R pass: the mechanism and numbers are concrete, and production RAG latency is a real pain point. HKR-H is weak; as a single arXiv paper with no disclosed code or deployment, it stays in the 60–71 band.
editor take
TeleRAG cuts single-query latency up to 1.53x. RAG speed is still a scheduler-and-memory fight.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Systematic Optimization of Real-Time Diffusion Model Inference on Apple M3 Ultra
The study tests 10 optimization phases on Apple M3 Ultra, and SDXS-512 with CoreML conversion plus a 3-thread camera pipeline reaches 22.7 FPS for real-time camera img2img at 512x512 resolution.
#Inference-opt#Vision#Apple#NVIDIA
why featured
HKR-H/K/R pass, but this is a hardware-specific inference-optimization paper, not a model or product launch. The 22.7 FPS result is useful; the audience is narrower, so it stays in 60–71.
editor take
SDXS-512 hits 22.7 FPS on M3 Ultra; quantization, parallel inference, and Neural Engine fail, so this beats leaderboard noise for Mac deployment.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Forgetting is Competition: Rethinking Unlearning as Representation Interference in Diffusion Models
The paper introduces SurgUn for concept unlearning in diffusion models, using distractor-conditioned gradient competition and pixel-grounded weight localization; it reports stronger erase-retain balance than baselines across Stable Diffusion v1.5, SDXL, SANA-1.5, and five benchmarks including UnlearnCanvas and EraseBench.
#Alignment#Safety#Vision#SurgUn
why featured
HKR-H/K/R pass: the title reframes unlearning as competition, and the summary gives SurgUn, 3 diffusion backbones and 5 benchmarks. Still an arXiv method paper with no code, adoption signal or community debate, so it stays in 60–71.
editor take
SurgUn spans 3 diffusion models and 5 benchmarks; I buy interference competition over pretending concept removal is surgery.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Exemplar Partitioning for Mechanistic Interpretability
The paper introduces Exemplar Partitioning, an unsupervised method that builds interpretable dictionaries from LLM activations using about 10^3 fewer tokens than comparable SAEs, and reports 0.881 mean AUROC on AxBench latent concept detection at Gemma-2-2B-it L20.
#Interpretability#Benchmarking#Gemma#GemmaScope
why featured
HKR-H/K/R all pass via the 10^3-token reduction, benchmark result, and safety/transparency angle. Scope is narrow mechanistic interpretability with no product adoption or source cluster, so it stays in the high 60–71 band.
editor take
EP hits 0.881 AUROC on Gemma-2-2B-it L20; 10^3 fewer tokens and near SAE-A is a clean shot at SAE cost.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
LaDi-RL: Latent Diffusion Reasoning Prevents Entropy Collapse in Reinforcement Learning
LaDi-RL uses diffusion latent trajectories and hierarchical latent-text rollouts, beating token-level RL by 9.4% on code and 5.7% on math pass@1.
#Reasoning#Code#Benchmarking#Research release
why featured
HKR-H is the latent-diffusion-versus-entropy-collapse hook, and HKR-K has a concrete rollout mechanism plus pass@1 gains. It remains a single arXiv method paper with no code, replication, or adoption signal, so it stays in 60–71.
editor take
LaDi-RL lifts pass@1 by 9.4% on code and 5.7% on math; I buy the reward aggregation, not the entropy-collapse headline.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
When a Zero-Shooter Cheats: Improving Age Estimation via Activation Steering
The paper finds that zero-shot VLM age estimation uses an “identity shortcut,” mapping recognized people to memorized ages instead of visual cues; activation steering intervenes in hidden states and reduces mean absolute error by up to 25% across popular benchmarks.
#Vision#Multimodal#Interpretability#Research release
why featured
HKR-H/K pass: the “cheating” frame is clickable, and the paper gives an identity-shortcut mechanism plus a 25% MAE drop. HKR-R is weak because age estimation is a narrow use case, so it stays in the interesting-not-featured band.
editor take
VLM age MAE drops up to 25%; the uglier finding is benchmarks mistaking identity memorization for visual robustness.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
GIM Benchmark Introduces 820 Problems to Evaluate Multi-Domain Cognitive Integration
GIM introduces 820 original problems, with 615 public and 205 private items, and calibrates a 2PL IRT model on over 200,000 prompt-response pairs from 28 models to evaluate multi-operation reasoning.
#Reasoning#Benchmarking#GIM#Research release
why featured
HKR-K and HKR-R pass: task counts, public/private split, 28 models, and 2PL IRT are concrete. HKR-H is weak, and this remains an arXiv benchmark release rather than a same-day industry story.
editor take
GIM ships 820 items and 200k responses; I buy integration tasks, but 28-model IRT won’t erase author-style bias.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
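For readers unfamiliar with the 2PL IRT model mentioned in the GIM summary, the standard two-parameter logistic item response function is sketched below: each item has a discrimination `a` and a difficulty `b`, and each model has an ability `theta`. The item parameters and responses here are made-up values for illustration, and nothing in this sketch reflects GIM's actual calibration procedure.

```python
import math


def p_correct_2pl(theta, a, b):
    """Standard 2PL response function: 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of 0/1 responses for one model; items are (a, b) pairs."""
    total = 0.0
    for (a, b), y in zip(items, responses):
        p = p_correct_2pl(theta, a, b)
        total += math.log(p) if y else math.log(1.0 - p)
    return total


def estimate_ability(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Coarse grid-search maximum-likelihood estimate of theta."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))


# Hypothetical items: (discrimination a, difficulty b).
items = [(1.2, -1.0), (0.8, 0.0), (1.5, 1.0), (1.0, 2.0)]
responses = [1, 1, 0, 0]
print(estimate_ability(items, responses))
```

The design point of 2PL over a raw accuracy score is that a correct answer on a hard, highly discriminating item moves the ability estimate more than a correct answer on an easy one.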
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
ESI-Bench benchmark for embodied spatial intelligence closes perception-action loop
ESI-BENCH introduces an OmniGibson-based benchmark with 10 task categories and 29 subcategories, and experiments on state-of-the-art MLLMs find active exploration outperforms passive observation while most failures come from action blindness rather than weak perception.
#Agent#Multimodal#Benchmarking#OmniGibson
why featured
HKR-K comes from the benchmark structure and findings; HKR-R comes from the embodied-agent failure mode. As a single arXiv paper with a narrow robotics-agent audience and weak HKR-H, it stays in all.
editor take
ESI-BENCH has 10 categories and 29 subcategories; action blindness is a cleaner diagnosis than feeding MLLMs more views.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Privacy Policy Enforcement Guardrails for Data-Sensitive Retrieval-Augmented Generation
The paper introduces a PPE framework for contextual leakage detection in RAG, and its T3+OCSVM detector reaches 0.93+ borderline AUROC on synthetic medicine, finance, and law data while reducing false positives by 44–55 percentage points.
#RAG#Embedding#Safety#Research release
why featured
HKR-K and HKR-R pass: the paper gives a concrete RAG privacy mechanism and metrics. As a single arXiv paper using synthetic data, with no major lab or deployment artifact, it stays in the 60–71 band.
editor take
T3+OCSVM hits 0.93+ AUROC on three synthetic RAG domains; I buy the direction, not real-world leakage proof.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Beyond Superficial Unlearning: Sharpness-Aware Robust Erasure of Hallucinations in Multimodal LLMs
The paper proposes SARE, which formulates hallucination unlearning in multimodal LLMs as targeted min-max optimization and uses Targeted-SAM to flatten the loss landscape around hallucinated concepts under simulated worst-case parameter perturbations.
#Multimodal#Vision#Safety#Research release
why featured
HKR-H/K/R pass: the paper has a clear hook, a concrete SARE/Targeted-SAM mechanism, and a safety-reliability angle. The post lacks model names, metrics, code, and effect size, so it stays below featured.
editor take
SARE uses Targeted-SAM for object hallucination erasure; models, datasets, and gains are undisclosed, so treat it as a robustness hypothesis.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Breaking Winner-Takes-All: Cooperative Policy Optimization Improves Diverse LLM Reasoning
The paper proposes GCPO, replacing independent rollout scoring with team-level credit assignment, where each rollout is rewarded by its marginal contribution to valid solution coverage, defined as determinant volume over reward-weighted semantic embeddings.
#Reasoning#Alignment#Benchmarking#Research release
why featured
HKR-H/K/R all pass, but the item only gives GCPO’s reward mechanism, not authors, model scale, benchmark gains, or release details. As a single arXiv reasoning-training paper, it lands high in the 60–71 band.
editor take
GCPO credits rollouts by marginal coverage; the snippet gives no scores, so I buy the idea only after code reproduces it.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
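The team-level credit idea in the GCPO summary, rewarding each rollout by its marginal contribution to a determinant volume over reward-weighted embeddings, can be sketched in plain Python. The snippet gives no formula, so this assumes a regularized volume `det(I + G)`, where `G` is the Gram matrix of reward-weighted embeddings, and a leave-one-out difference as the marginal contribution. Both choices, along with all function names and the toy embeddings, are assumptions for illustration.

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det


def coverage_volume(embeddings, rewards):
    """Assumed coverage score: det(I + G) for reward-weighted embeddings."""
    vectors = [[r * x for x in e] for e, r in zip(embeddings, rewards)]
    n = len(vectors)
    gram = [
        [sum(x * y for x, y in zip(vectors[i], vectors[j])) + (1.0 if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return determinant(gram)


def marginal_contributions(embeddings, rewards):
    """Leave-one-out credit: volume with rollout i minus volume without it."""
    full = coverage_volume(embeddings, rewards)
    credits = []
    for i in range(len(rewards)):
        rest_e = embeddings[:i] + embeddings[i + 1:]
        rest_r = rewards[:i] + rewards[i + 1:]
        credits.append(full - coverage_volume(rest_e, rest_r))
    return credits


# Toy group: two near-identical correct answers, one distinct correct answer,
# and one failed rollout (reward 0).
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
rewards = [1.0, 1.0, 1.0, 0.0]
print(marginal_contributions(embeddings, rewards))
```

In this toy group the distinct correct rollout earns more credit than either of the two duplicates, and the failed rollout earns none, which is the anti-winner-takes-all behavior the summary describes.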
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Right Predictions, Misleading Explanations: On the Vulnerability of Vision-Language Model Explanations
Narges Babadi and Hadis Karimipour introduce X-Shift, a grey-box attack on CLIP-based vision-language models. It perturbs patch-level visual representations to redirect explanation heatmaps on ImageNet-1k, MS-COCO, and Flickr30K while preserving the original prediction and without changing model parameters.
#Vision#Multimodal#Interpretability#Narges Babadi
why featured
HKR-H/K/R all pass, but this is a single arXiv paper with thin body detail. Code release, affected deployment scope, and broader model replication are not disclosed, so it stays in all at 70.
editor take
X-Shift shifts CLIP heatmaps on 3 datasets while preserving predictions; heatmap audits alone now smell like placebo.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Lever: Speculative LLM Inference on Smartphones
Lever optimizes flash-backed LLM inference on smartphones by keeping a small draft model in DRAM while a larger target model stays in flash, and its token-tree drafting, early-exit verification, and CPU-NPU execution mapping reduce average latency by 2.93x versus baseline flash-offloaded inference and 1.50x versus conventional speculative decoding.
#Inference-opt#Research release
why featured
HKR-H/K pass: the hook is smartphone LLM inference via flash-hosted speculative decoding, with 2.93× and 1.50× latency gains. As a single arXiv systems paper, its reach is too narrow for featured.
editor take
Lever cuts flash-backed phone LLM latency 2.93x; I want device and model details, and the snippet omits them.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding
Mistletoe attacks the acceptance mechanism in speculative decoding by jointly reducing drafter-target agreement and preserving the target model’s output distribution, using null-space projection to lower the average accepted length τ while maintaining output quality and perplexity.
#Inference-opt#Safety#Mistletoe#Research release
why featured
HKR-H/K/R pass, but this is a single arXiv technical security paper with a serving-infra audience. The summary lacks attack magnitude, affected models, and reproducible setup, so it stays in the 60–71 band.
editor take
Mistletoe lowers speculative decoding τ, with no effect size disclosed; acceleration layers are an attack surface, not plumbing.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
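Why lower drafter-target agreement collapses the speedup can be seen from the standard analysis of speculative sampling, sketched here. It assumes the usual acceptance rule (a drafted token is accepted with probability min(1, p/q), giving a per-token acceptance rate of the sum over tokens of min(p, q)) and an i.i.d. acceptance rate across positions. It is not Mistletoe's attack, and the distributions and draft length below are made up.

```python
def acceptance_rate(target_probs, draft_probs):
    """Probability one drafted token is accepted: sum_x min(p(x), q(x))."""
    return sum(min(p, q) for p, q in zip(target_probs, draft_probs))


def expected_tokens_per_step(alpha, draft_len):
    """Expected tokens emitted per verification step, assuming an i.i.d.
    acceptance rate alpha and draft_len drafted tokens per step."""
    if alpha >= 1.0:
        return float(draft_len + 1)
    return (1.0 - alpha ** (draft_len + 1)) / (1.0 - alpha)


target = [0.7, 0.2, 0.1]           # made-up target distribution
aligned_draft = [0.6, 0.3, 0.1]    # drafter mostly agrees with the target
shifted_draft = [0.1, 0.2, 0.7]    # drafter pushed away from the target

for name, draft in [("aligned", aligned_draft), ("shifted", shifted_draft)]:
    alpha = acceptance_rate(target, draft)
    print(name, alpha, expected_tokens_per_step(alpha, draft_len=4))
```

As the acceptance rate falls toward zero, the expected tokens per step fall toward 1, so each target forward pass yields roughly one token and the drafter's cost becomes pure overhead, while the target's output distribution is unchanged by construction of the acceptance rule.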
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation
The paper decouples prefix source from token-level KL direction and derives four LLM distillation objectives spanning SFT, DAgger-style on-policy SFT, offline-RL-style distillation, and OPD; its entropy-gated length curriculum raises Avg@k by 3.6 points, raises Pass@k by up to 5.8 points, and cuts average response length by roughly 3x versus fixed long-horizon training.
#Reasoning#Fine-tuning#Benchmarking#Research release
why featured
HKR-H/K/R pass, but this is a narrow arXiv training-method paper with SFT/DAgger/KL overhead. Concrete mechanism and numbers keep it near the top of the 60–71 band.
editor take
The paper decouples prefix source and token KL, adding 3.6 Avg@k; I buy the entropy-gated curriculum more, with 3x shorter outputs.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Geometric Scaling of Bayesian Inference in LLMs
The paper studies Pythia, Phi-2, Llama-3, and Mistral families and finds last-layer value representations align with a single dominant axis strongly correlated with predictive entropy; targeted Pythia-410M interventions disrupt local uncertainty geometry, while random-axis controls do not, indicating the axis is a privileged uncertainty readout rather than a singular computational bottleneck.
#Reasoning#Interpretability#Pythia#Llama-3
why featured
HKR-H/K/R all pass, but this is a technical arXiv interpretability paper without an artifact, production test, or cross-source momentum; it lands at the top of 60–71, tier all.
editor take
Pythia-to-Mistral shows an entropy axis, but Pythia-410M edits only damage local geometry; calling it Bayesian machinery feels overclaimed.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models
RAM corrects the pretraining regression target with rewards for diffusion and flow-matching RL post-training. On Stable Diffusion 3.5 Medium, it matches Flow-GRPO’s peak reward in up to 50× fewer training steps.
#Fine-tuning#Alignment#Inference-opt#Stable Diffusion
why featured
HKR-H/K/R pass via the 50x-step claim, RAM mechanism, and training-cost angle, but the diffusion/flow-matching RL niche narrows audience fit. This stays below featured despite a useful benchmark claim.
editor take
RAM matches Flow-GRPO on SD 3.5 Medium with up to 50× fewer steps; dragging RL back to regression beats rollout theater.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Where Pretraining Writes and Alignment Reads: The Asymmetry of Transformer Weight Space
The paper analyzes Transformer weight deltas with a relative-subspace-fraction probe and finds alignment deltas concentrate in the read pathway, W_Q and W_K, while cross-entropy pretraining forms prediction geometry in the write pathway, W_O and W_2.
#Alignment#Interpretability#Research release
why featured
HKR-H and HKR-K pass: the title has a real asymmetry hook, and the summary gives a testable weight-path claim. The item stays in all because it is niche interpretability research with no author signal, model scale, or replication setup disclosed.
editor take
The paper pins alignment deltas to W_Q/W_K; if the probe holds, RLHF edits reading more than knowledge.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
LLM Agents Are the Antidote to Walled Gardens
arXiv:2506.23978v3 argues that LLM agents can use AI-mediated adapters to let any two digital services exchange data, while the abstract flags security risks, technical debt, and legal frictions.
#Agent#Tools#Safety#Research release
why featured
HKR-H/K/R pass via the adapter thesis and lock-in angle, but the article gives no metrics, implementation detail, or deployment case. It stays in the 60–71 band.
editor take
arXiv 2506.23978v3 gives a thesis, not evidence; calling agents an antidote to walled gardens oversells it.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Stress-Testing Neural Network Verifiers with Provably Robust Instances
The paper introduces VeriStressGT, a framework that generates verification instances with known robustness labels via analytic construction, evaluates five state-of-the-art neural network verifiers, and reports multiple numeric tolerance concerns plus one implementation bug in popular verifiers.
#Safety#Benchmarking#VeriStressGT#arXiv
why featured
HKR-H/K/R pass via a concrete verifier-stress hook, 5-tool evaluation, and safety-tool trust angle. Importance stays below featured because neural-network verification is niche and carries a technical-accessibility penalty.
editor take
VeriStressGT tests 5 verifiers; honestly, ground-truth stress cases beat another leaderboard built on label-free heuristics.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Transformation-Augmented GRPO for Enhancing Large Language Model Reasoning Exploration
The paper proposes TA-GRPO to reduce zero gradients and diversity collapse in GRPO. It generates equivalent rephrasings for each training question, then pools responses and computes advantages over the expanded set. Experiments on four LLMs show gains on AMC, OlympiadBench, AIME24, AIME25, Minerva, and GPQA-Diamond. Qwen3-1.7B and Qwen3-4B average pass@32 rise by 4.97 and 4.34 points.
#Reasoning#Fine-tuning#Benchmarking#Qwen
why featured
HKR-K is solid via the TA-GRPO question-rewriting mechanism and Qwen3 pass@32 gains. HKR-R is present for small-model post-training teams, but HKR-H is weak and the single arXiv paper lacks ecosystem uptake.
editor take
TA-GRPO lifts Qwen3-1.7B pass@32 by 4.97 points; question rephrasing is blunt, but it hits GRPO’s zero-gradient dead zone.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
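A minimal sketch of the pooling idea in the TA-GRPO summary above, assuming GRPO-style advantages are simply rewards standardized within a group. The function names and the 0/1 rewards are hypothetical, and the paper's exact normalization is not disclosed in the feed. It shows why a group with identical rewards yields no signal, and how pooling responses across rephrasings of the same question restores one.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """Standardize rewards within one group (reward minus mean, over std)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # Every response scored the same, so the group carries no signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def pooled_advantages(rewards_per_variant):
    """Pool rewards from a question and its rephrasings, then standardize once."""
    pooled = [r for variant in rewards_per_variant for r in variant]
    flat = group_advantages(pooled)
    # Split the flat list back into one advantage list per variant.
    out, start = [], 0
    for variant in rewards_per_variant:
        out.append(flat[start:start + len(variant)])
        start += len(variant)
    return out


# Hypothetical correctness rewards: the original wording fails every time,
# while one rephrasing is solved in two of four samples.
original = [0.0, 0.0, 0.0, 0.0]
rephrased = [1.0, 0.0, 1.0, 0.0]

print(group_advantages(original))                    # all zeros
print(pooled_advantages([original, rephrased])[0])   # negative, no longer zero
```

Pooling changes only the baseline and scale used for normalization; the rewards themselves are untouched.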
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
PropGuard: Safeguarding LLM-MAS via Propagation-Aware Exploration and Remediation
PropGuard uses a dual-view spatio-temporal graph to trace malicious instruction propagation in LLM-based multi-agent systems, and experiments across 4 communication architectures and 5 attack settings report lower attack success while preserving task-level defense success.
#Agent#Safety#Memory#PropGuard
why featured
HKR-H/K/R all pass, but the feed gives only abstract-level facts; effect size, code, and benchmark details are not disclosed. Strong all-tier agent-safety research, below the 72 featured threshold.
editor take
PropGuard spans 4 architectures and 5 attacks; effect sizes are undisclosed, so I’d file it as MAS security provenance work.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
SE-GA: Memory-Augmented Self-Evolution for GUI Agents
SE-GA applies hierarchical memory and iterative self-improvement to GUI agents, using TTME for inference-time retrieval and MASE for training, and reports 89.0% success on ScreenSpot and 75.8% on AndroidControl-High.
#Agent#Memory#Benchmarking#SE-GA
why featured
HKR-K and HKR-R pass via a concrete mechanism and two benchmark numbers. Single arXiv paper, with no code, author authority, real-task evidence, or cross-source discussion, keeps it in the 60–71 band.
editor take
SE-GA reports 89.0% on ScreenSpot and 75.8% on AndroidControl-High; GUI agents are again gated by memory retrieval quality.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
ToolMATH: A Diagnostic Benchmark for Long-Horizon Tool Use under Systematic Tool-Catalog Constraints
ToolMATH converts stepwise MATH solutions into Python tools with natural-language descriptions and typed schemas, then evaluates language models under gold tools, graded distractors, and long executed tool-call chains across adaptability, robustness, and tool connectivity metrics.
#Agent#Tools#Benchmarking#ToolMATH
why featured
HKR-K and HKR-R pass for a concrete agent-tool benchmark, but the summary gives no model scores, failure rates, or release details. This fits a solid research item, not featured.
editor take
ToolMATH turns MATH solutions into Python tool chains; sample count is undisclosed, but catalog distractors beat final-accuracy toy evals.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression
Gated KalmaNet computes the exact Kalman gain with full error covariance and reports over 10% relative improvement over existing SSM layers on long-context RAG and LongQA up to 128k tokens.
#RAG#Inference-opt#Benchmarking#Liangzu Peng
why featured
HKR-K and HKR-R pass: the article gives a concrete mechanism and 128k RAG/LongQA numbers, with clear relevance to long-context engineering. HKR-H is weak, and the method is technical, so it stays in all.
editor take
Gated KalmaNet reports >10% gains at 128k RAG/LongQA; the Apache 2.0 Triton/vLLM code is the credibility check.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Diamond Maps: Efficient Reward Alignment via Stochastic Flow Maps
The paper proposes Diamond Maps, stochastic flow map models that amortize many simulation steps into a single-step sampler while preserving stochasticity for inference-time alignment to arbitrary rewards; experiments report efficient distillation from GLASS Flows and stronger reward alignment than existing methods.
#Alignment#Inference-opt#Diamond Maps#GLASS Flows
why featured
HKR-H and HKR-K pass: Diamond Maps claim to amortize multi-step simulation into a one-step stochastic sampler. The item is technical and lacks large-model results, open artifacts, or deployment evidence, so it stays in the 60–71 band.
editor take
Diamond Maps compress multi-step simulation into one-step sampling; task counts and baselines are undisclosed, so don’t buy “arbitrary rewards” yet.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
TIER: Trajectory-Invariant Execution Rewards for Multi-Step Tool Composition
TIER derives rewards from function schemas and runtime execution, not reference trajectories, and exceeds 90% accuracy on DepthBench tasks with 1 to 6 steps. Trajectory-supervised rewards collapse beyond step 4, while the paper reports gains on BFCL v3 and NestFUL plus ablations showing all reward components are necessary.
#Agent#Tools#Reasoning#TIER
why featured
HKR-K/R pass: it gives a concrete reward mechanism, DepthBench numbers, and a testable claim that trajectory supervision fails after 4 steps. It is a single arXiv paper with limited industry spillover, so it stays in the 60–71 band.
editor take
TIER tops 90% on DepthBench depth 1–6; stop treating one trajectory as gold, tool RL rewards should bind to execution.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction
The paper compares seven KV cache eviction policies and finds that, without structural protection, six pure-transformer models collapse to F1≤0.064; reserving 10% of cache at each boundary recovers 69–90% of the C=2,048 reference-ceiling quality at C=256.
#Inference-opt#Benchmarking#Qwen#Mistral
why featured
HKR-H/K/R pass: the paper has a contrarian KV-eviction hook, concrete benchmark numbers, and an inference-cost nerve. Its infra-heavy scope and lack of product impact keep it in the high end of all, not featured.
editor take
Across seven KV eviction policies, six pure-transformer models fall to F1≤0.064 without boundary guards; reserve 10% first, then debate H2O/SnapKV scoring.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
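A rough sketch of what "reserving 10% of cache at each boundary" could look like under a global cap, as described in the KV-eviction summary above. The function name, the `protect_frac` parameter, and the per-token scores are hypothetical; the paper's actual policies are not reproduced here. The earliest and most recent tokens are kept unconditionally, and only the remaining slots are filled by score.

```python
def evict_with_protection(scores, capacity, protect_frac=0.10):
    """Return sorted indices of the tokens kept under a global cache cap.

    scores: one importance score per cached token (higher means keep).
    capacity: total number of tokens the cache may hold.
    protect_frac: share of capacity reserved at each sequence boundary
                  (assumed to be at most 0.5).
    """
    n = len(scores)
    if n <= capacity:
        return list(range(n))
    k = int(capacity * protect_frac)        # slots reserved per boundary
    head = set(range(k))                    # earliest tokens
    tail = set(range(n - k, n))             # most recent tokens
    protected = head | tail
    budget = capacity - len(protected)      # slots left for score-based picks
    middle = [i for i in range(n) if i not in protected]
    best = sorted(middle, key=lambda i: scores[i], reverse=True)[:budget]
    return sorted(protected | set(best))


# Hypothetical scores that favor late tokens: without protection, a pure
# top-k policy would drop the first tokens entirely.
scores = list(range(40))
kept = evict_with_protection(scores, capacity=20)
print(kept[:3], kept[-3:], len(kept))
```

Setting `protect_frac=0` reduces this to plain score-based top-k, which is the unprotected baseline the summary says collapses.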
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Forecasting Downstream Performance of LLMs With Proxy Metrics
The paper proposes proxy metrics built from token-level statistics on expert-written solutions, ranking heterogeneous reasoning models with a mean Spearman's ρ of 0.81 versus 0.36 for cross-entropy loss.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K/R pass: the paper gives a concrete proxy-metric mechanism and 0.81 vs 0.36 correlation result, with relevance to eval cost. HKR-H is weak, and a single arXiv eval paper stays below featured.
editor take
Proxy metrics hit ρ=0.81 for model ranking; expert-solution token stats look like a better early picker than loss.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
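For readers unfamiliar with the ρ figure quoted in the proxy-metrics item above, here is a small self-contained sketch of Spearman's rank correlation. The proxy and accuracy values are made up for illustration and are not from the paper; only the statistic itself is standard.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical data: one proxy score and one downstream accuracy per model.
proxy = [0.2, 0.5, 0.4, 0.9, 0.7]
accuracy = [31, 48, 52, 77, 60]
print(spearman_rho(proxy, accuracy))  # 0.9 for these made-up numbers
```

A ρ of 1.0 would mean the proxy orders the models exactly as the downstream metric does; only the ordering matters, not the raw values.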
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
WinQ: Accelerating Quantization-Aware Training of Language Models Around Saddle Points
WinQ accelerates quantization-aware training with periodic interpolation resets between full-precision and quantized weights plus gradients from noise-injected weights, reaching up to 4x faster QAT and up to 8.8% better sub-4-bit quantization under the same training cost across 16 model, method, and bit-width settings.
#Fine-tuning#Inference-opt#Benchmarking#WinQ
why featured
HKR-K and HKR-R pass: the paper gives a concrete QAT mechanism, 16 settings, up to 4x speedup, and 8.8% sub-4-bit gain. HKR-H is weak; the angle is niche optimization, not a broad product/model release.
editor take
WinQ hits up to 4x faster QAT across 16 settings; sub-4-bit pain now has a Hessian-spectrum target, not folklore tuning.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment
AutoRubric-T2I synthesizes explicit rubrics from preference pairs and selects Top-N discriminative rules with an L1-regularized logistic regression refiner, producing interpretable reward signals with less than 0.01% of annotated preference data.
#Vision#Alignment#Reasoning#AutoRubric-T2I
why featured
HKR-K and HKR-R pass: the 0.01% preference-data claim and L1 rule-selection mechanism add testable signal, and T2I alignment cost resonates. Single arXiv paper and dry title keep it below featured.
editor take
AutoRubric-T2I uses <0.01% preference data; without MMRB2 scores, I don’t buy the claimed margin over baselines.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
PyHealth 2.0: A Comprehensive Open-Source Toolkit for Reproducible Clinical Deep Learning
PyHealth 2.0 unifies 15+ datasets, 20+ clinical tasks, and 25+ models for clinical deep learning, supports predictive modeling in as few as 7 lines of code, and reports up to 39x faster processing with 20x lower memory use.
#Multimodal#Interpretability#Benchmarking#PyHealth
why featured
HKR-H and HKR-K pass: PyHealth 2.0 provides testable scale and performance claims. Its clinical-ML scope limits practitioner resonance, so it stays in the 60–71 interesting band.
editor take
PyHealth 2.0 unifies 15+ datasets and 25+ models; clinical AI needs auditable data semantics more than 7-line training.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Geometry-Aware Attention Guidance for Diffusion Models via Modern Hopfield Dynamics
The paper proposes Geometry-Aware Attention Guidance, a training-free plug-and-play attention extrapolation rule for diffusion models, and reports improved generation quality across UNet, MMDiT, FLUX.1, FLUX.2, and Qwen-Image; the abstract does not disclose exact metric values or benchmark scores.
#Vision#Inference-opt#FLUX#Qwen-Image
why featured
HKR-K is clear through a testable mechanism and named model families; HKR-R is limited to image-generation practitioners. No metrics are disclosed, and the academic framing keeps it in the 60–71 band.
editor take
GAG claims training-free gains on UNet, MMDiT, FLUX, and Qwen-Image; no scores disclosed, so I’d file it as elegant attention-CFG theory.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Fidelity Probes for Specification-Code Alignment
The paper introduces fidelity probes for specification-code alignment and raises frozen-test specification fidelity from 0.63 to 0.94 over eight iterations on a 15-program, roughly 12k-line COBOL benchmark.
#Code#Benchmarking#Tools#AWS
why featured
HKR-K and HKR-R pass: the method, sample size, and 0.63→0.94 gain are concrete and relevant to coding-agent evaluation. HKR-H is weak; a single niche arXiv paper stays in the 60–71 band.
editor take
Fidelity probes lift COBOL spec fidelity from 0.63 to 0.94 on 15 programs; I buy this, legacy migration needs auditable specs.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
AMARIS: Memory-Augmented Rubric Improvement System for Reinforcement Learning
AMARIS analyzes individual rollouts at each training step, retrieves persistent evaluation memory via static recent-step and dynamic semantic matching, and updates rubrics asynchronously inside the RL loop with about 5% time overhead.
#Memory#Fine-tuning#Reasoning#AMARIS
why featured
HKR-K/R pass: the mechanism and ~5% overhead add usable signal, and RL evaluator drift is a real practitioner pain. Single arXiv paper with no disclosed gain numbers keeps it in the 60–71 band.
editor take
AMARIS adds persistent memory to RL rubrics at ~5% async overhead; I buy the direction, pending baselines and task details.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Capturing LLM Capabilities via Evidence-Calibrated Query Clustering
The paper proposes ECC, which calibrates semantic embeddings with limited posterior model comparisons and models cluster capability profiles using Bradley-Terry, improving LLM capability ranking quality by an average of 17.64 percentage points over human-labeled baselines and 18.02 points over embedding-based baselines.
#Benchmarking#Embedding#Inference-opt#Research release
why featured
HKR-K and HKR-R pass: the paper gives an ECC mechanism and a 17.64 pp gain for model capability ranking. HKR-H is weak, and this remains a niche arXiv evaluation method, so it stays in all.
editor take
ECC beats human labels by 17.64 points on ranking quality; I buy the premise—semantic clusters are too blunt for capability eval.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
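The ECC item above mentions Bradley-Terry as the model behind its capability profiles. As a hedged illustration only, the sketch below fits Bradley-Terry strengths from a pairwise win matrix for a single hypothetical query cluster, using the standard minorize-maximize update. The win counts are invented, and how ECC calibrates or forms its clusters is not shown.

```python
def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths from pairwise win counts.

    wins[i][j] is how many times model i beat model j.
    Returns strengths that sum to 1 (higher means stronger).
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iters):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(updated)
        p = [v / norm for v in updated]
    return p


# Hypothetical comparisons among three models on one cluster of queries.
wins = [
    [0, 8, 9],
    [2, 0, 6],
    [1, 4, 0],
]
strengths = bradley_terry(wins)
print([round(s, 3) for s in strengths])
```

Fitting one strength vector per cluster, rather than one global vector, is what lets a profile say that a model is strong on some query types and weak on others.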
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
MiniGPT: Rebuilding GPT from First Principles
MiniGPT implements a GPT-style autoregressive pipeline in one PyTorch notebook and trains on Tiny Shakespeare with character-level tokenization; a 0.83M-parameter baseline reaches 1.7236 validation loss after 3,000 iterations, while a 10.77M-parameter configuration reaches 1.4780 and generates recognizable Shakespeare-style dialogue.
#Code#Benchmarking#MiniGPT#Andrej Karpathy
why featured
HKR-H and HKR-K pass: the first-principles GPT rebuild is clickable and the post gives dataset, parameter counts, and losses. HKR-R is weak because this is an educational notebook, not a new model or capability release.
editor take
MiniGPT hits 1.4780 loss with 10.77M params on Tiny Shakespeare; honestly, an arXiv nanoGPT remake in 2026 reads like coursework.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
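The MiniGPT item above relies on character-level tokenization. A minimal sketch of that step, assuming the usual approach of mapping each distinct character to an integer id; the helper names and sample text are illustrative and not taken from the notebook.

```python
def build_vocab(text):
    """Map each distinct character to an integer id, and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(s, stoi):
    """Turn a string into a list of character ids."""
    return [stoi[ch] for ch in s]


def decode(ids, itos):
    """Turn a list of character ids back into a string."""
    return "".join(itos[i] for i in ids)


text = "To be, or not to be"
stoi, itos = build_vocab(text)
ids = encode(text, stoi)

# Next-character training pairs: the input is ids[:-1], the target is ids[1:].
inputs, targets = ids[:-1], ids[1:]
print(len(stoi), decode(ids, itos) == text)
```

The vocabulary is just the set of characters seen in the corpus, which keeps the embedding table tiny at the cost of much longer sequences than subword tokenizers produce.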
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Plan First, Diffuse Later: Extrinsic Graph Guidance for Long-Horizon Diffusion Planning
XDiffuser first computes a plan on a state-space graph and then uses it to guide denoising for one trajectory; the abstract says it outperforms diffusion-based baselines on long-horizon tasks, especially with low-quality data, unseen tasks, multi-agent coordination, and TSP-style reasoning.
#Agent#Reasoning#Robotics#XDiffuser
why featured
HKR-H/K pass: the title has a clean inversion, and the post gives a graph-planning-then-denoising mechanism across low-quality data, unseen tasks, multi-agent settings, and TSP. No major lab, artifact, or numbers; technical depth keeps it in all.
editor take
XDiffuser moves search outside denoising; no eval numbers in the abstract, but I buy the direction and want the low-quality-data curves.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
One Model, Two Roles: Emergent Specialization in a Shared Recurrent Transformer
The paper studies AIR, a two-state recurrent architecture that reuses one Transformer for L and H updates; on Sudoku-Extreme and Maze, decoded rollouts show L retains local uncertainty while H acts as a committed proposal state.
#Reasoning#Interpretability#Benchmarking#Research release
why featured
HKR-H/K pass: one shared model specializing into L/H roles is a fresh mechanism with Sudoku-Extreme and Maze evidence. HKR-R is weak because the arXiv item lacks product stakes, cost impact, or reproducibility details.
editor take
AIR reuses one Transformer for L/H states; neat, but Sudoku-Extreme and Maze are too narrow for general reasoning claims.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
OrbiSim: World Models as Differentiable Physics Engines for Embodied Intelligence
OrbiSim defines world models as a fully differentiable physics engine for embodied intelligence, covering the simulation loop from explicit state transitions to visual observation generation; the arXiv snippet does not disclose benchmark numbers, code availability, or training setup details.
#Robotics#Reasoning#Benchmarking#OrbiSim
why featured
HKR-H/K/R pass: the angle is clickable, the mechanism is specific, and robotics practitioners care about simulation cost. No benchmark numbers, code link, or reproducible setup are disclosed, so this stays in the 60–71 band.
editor take
OrbiSim claims end-to-end differentiable simulation; the RSS gives no benchmarks, code, or training setup, so I’d treat it as abstractware.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Charon: Unified Fine-Grained Simulator for Large-Scale LLM Training and Inference
Charon simulates LLM training and inference performance across models and configurations, with overall prediction error consistently below 5.35% and below 3.74% for training on a large-scale GPU cluster.
#Inference-opt#Charon#arXiv#Research release
why featured
HKR-K and HKR-R pass: the error rates are concrete, and GPU cost planning matters. HKR-H is weak, and this is a single arXiv systems paper with no disclosed open-source status or production adoption.
editor take
Charon reports <5.35% error; I buy the accuracy, not the “better config” claim without baseline details.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Active Budget Allocation for Efficient Scaling Law Estimation via Surrogate-Guided Pruning
The paper uses Successive Halving with parametric and non-parametric surrogate models to allocate training budgets for scaling-law estimation, reporting mean relative improvements up to 2.84% on real-world learning curves and 5.47% on synthetic datasets, with compute savings up to 98.7% versus exhaustive evaluation.
#Benchmarking#Inference-opt#Research release
why featured
HKR-K and HKR-R are strong: the paper gives a concrete allocation method and compute-savings numbers. Its niche scaling-law focus keeps it in the 60–71 band, below featured.
editor take
Successive Halving with surrogates saves up to 98.7% compute; 2.84% real-curve gain is modest, but exhaustive scaling-law sweeps look lazy.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
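A bare-bones sketch of Successive Halving, the allocation scheme named in the scaling-law item above. It ranks survivors by observed loss at each budget; the paper's surrogate models, which guide the pruning, are deliberately left out. The `evaluate` function and the toy loss curve are hypothetical.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of configs each round while growing the budget.

    evaluate(config, budget) returns a loss (lower is better).
    Returns the surviving config and the total budget spent.
    """
    survivors = list(configs)
    budget, spent = min_budget, 0
    while len(survivors) > 1:
        scored = []
        for config in survivors:
            scored.append((evaluate(config, budget), config))
            spent += budget
        scored.sort(key=lambda pair: pair[0])
        keep = max(1, len(survivors) // eta)
        survivors = [config for _, config in scored[:keep]]
        budget *= eta
    return survivors[0], spent


# Toy setup: each config is its asymptotic loss, and training longer
# closes the gap to that floor as 1 / budget.
configs = [0.9, 0.5, 0.7, 0.3, 0.8, 0.6, 0.4, 0.2]


def toy_loss(config, budget):
    return config + 1.0 / budget


best, spent = successive_halving(configs, toy_loss)
print(best, spent)
```

The savings come from never training weak configurations past the early rounds; a surrogate would replace the raw early loss with a prediction of where each curve is heading.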
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Dual-Rate Diffusion: Accelerating diffusion models with an interleaved heavy-light network
Dual-Rate Diffusion interleaves a heavy high-capacity context encoder with a light denoising model, reusing sparse high-dimensional features at each sampling step and reducing ImageNet computational cost by 2-4x while matching standard baseline quality.
#Inference-opt#Vision#Research release
why featured
HKR-K is strong: the paper gives a 2-4x compute-reduction claim and a concrete heavy-light mechanism. As a single arXiv methods paper with no disclosed deployment, code, or independent replication, it stays in the 60-71 band.
editor take
Dual-Rate Diffusion cuts ImageNet compute 2-4x; I’d test whether distillation hides quality debt in few-step sampling.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers
The paper proposes a symmetry-compatible optimizer principle that matches gradient updates to each weight block’s symmetry group, covering embeddings, LM heads, SwiGLU MLP projections, and MoE routers; pre-training runs on Qwen3-0.6B-style, Gemma 3 1B-style, OLMoE-1B-7B-style, and downsized gpt-oss architectures report lower final validation loss than corresponding AdamW baselines.
#Qwen#Gemma#OLMoE#Research release
why featured
HKR-K is solid: 4 parameter classes, Qwen3-0.6B/Gemma 3 1B/OLMoE tests, and AdamW comparison are concrete. HKR-R is narrow, and no code or large-scale replication is disclosed, so it stays in 60–71.
editor take
The paper swaps equivariant updates into 4 parameter blocks; it beats AdamW on Qwen3-0.6B-style runs, but RSS omits token budgets.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
MaskAttn-SDXL: Controllable Region-Level Text-To-Image Generation
MaskAttn-SDXL adds token-conditioned spatial gating to SDXL cross-attention logits before softmax, preserving the pretrained backbone and standard sampling process while requiring no external supervision or inference-time editing for structured, multi-object text-to-image prompts.
#Vision#Multimodal#MaskAttn-SDXL#SDXL
why featured
HKR-H and HKR-K pass: the mechanism is concrete and targets multi-object attribute and spatial errors. Scope stays limited to SDXL image-generation research, with no open-source status, benchmark numbers, or product adoption disclosed.
editor take
MaskAttn-SDXL only gates attention logits before softmax; I buy the direction, but the snippet gives no benchmark numbers.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
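To make "gating cross-attention logits before softmax" from the MaskAttn-SDXL item above concrete, here is a toy sketch for a single image location attending over three prompt tokens. It assumes the gate is a value in [0, 1] added to the logit as a log-bias; that form, the `eps` constant, and the numbers are assumptions, since the feed does not give the paper's exact gating function.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention(logits, gate, eps=1e-6):
    """Add log(gate) to each token's logit, then apply softmax.

    A gate of 1 leaves the token almost untouched; a gate of 0 suppresses it.
    """
    biased = [l + math.log(g + eps) for l, g in zip(logits, gate)]
    return softmax(biased)


# Hypothetical location inside the "red cube" region of the image.
tokens = ["red", "cube", "blue"]
logits = [2.0, 2.0, 0.0]
gate = [1.0, 1.0, 0.0]   # "blue" belongs to a different object

print(softmax(logits))                # "blue" still leaks some attention
print(gated_attention(logits, gate))  # "blue" is driven close to zero
```

Because the change is confined to the logits, the rest of the attention layer and the sampler are untouched, which matches the summary's point about preserving the pretrained backbone.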
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers
DiRotQ applies PCA-based rotation-aware activation quantization for W4A4 post-training quantization, reports FID 15.9 and PSNR 19.1 dB on PixArt-Σ over MJHQ-30K, and reduces 12B FLUX.1-dev memory use by 2.1x while delivering 2.3x speedup over BF16 on a 24 GB RTX 4090.
#Vision#Inference-opt#Benchmarking#Sayeh Sharify
why featured
HKR-H/K/R pass, but this is an arXiv inference-optimization paper with impact concentrated in diffusion deployment. The 2.1x memory cut and 2.3x speedup are useful, not broad enough for featured.
editor take
DiRotQ runs 12B FLUX.1-dev 2.3x faster on an RTX 4090; 4-bit DiT quantization now smells deployable.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
WELD: The First Naturalistic Long-Period Small-Team Workplace Emotion Dataset
WELD releases a 30.1-month workplace emotion dataset from 49 employees at a Chinese software company, with 733,780 per-frame seven-class facial-expression probability vectors, and public downloads are limited to aggregated probabilities under a four-tier access model.
#Vision#Benchmarking#Safety#WELD
why featured
HKR-H/K/R pass, but this is a niche affective-computing dataset, not a model or product shift. Public access is limited to aggregate probabilities, so reuse value stays modest.
editor take
WELD spans 49 workers for 30.1 months; AUC 0.79 with C-index 0.52 says don't sell turnover prediction as workplace truth.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Factored Causal Representation Learning for Robust Reward Modeling in RLHF
The paper proposes a factored causal representation learning framework for RLHF reward modeling, splitting contextual embeddings into causal and non-causal factors and using gradient reversal so the reward head depends only on the causal component.
#Fine-tuning#Alignment#Safety#Research release
why featured
HKR-K and HKR-R pass: the paper offers a concrete reward-modeling mechanism tied to RLHF robustness and alignment safety. HKR-H is weak, and the body gives no metrics, code, or benchmark results.
editor take
The paper splits embeddings into 2 factors for reward modeling; no gains disclosed, so treat it as anti-spurious regularization.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
The paper introduces PROF, a data curation method that uses PRM-ORM consistency for sample selection, keeping correct responses with strong process support and incorrect responses with weak process support under a balanced training ratio.
#Reasoning#Alignment#Fine-tuning#PROF
why featured
HKR-K and HKR-R pass: PROF gives a concrete RL training mechanism for reasoning models. HKR-H is weak, and the feed discloses no model scale, benchmarks, or gains, so it stays in 60–71.
editor take
PROF filters samples by PRM-ORM consistency; I like the direction, but no tasks, models, or gains are disclosed here.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
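The PROF item above describes a selection rule that is easy to state in code. This is a sketch under assumptions: each sample carries an outcome label and one aggregate process score, and "balanced" is read as keeping equal numbers of correct and incorrect responses. The field names and scores are hypothetical.

```python
def prof_select(samples, keep_per_class):
    """Keep samples whose process score agrees with their outcome.

    samples: dicts with 'correct' (bool, outcome reward) and
             'prm' (float, aggregate process-reward score).
    """
    correct = [s for s in samples if s["correct"]]
    wrong = [s for s in samples if not s["correct"]]
    # Consistent positives: right answer and strongly supported steps.
    correct.sort(key=lambda s: s["prm"], reverse=True)
    # Consistent negatives: wrong answer and weakly supported steps.
    wrong.sort(key=lambda s: s["prm"])
    n = min(keep_per_class, len(correct), len(wrong))  # equal count per class
    return correct[:n] + wrong[:n]


samples = [
    {"id": "a", "correct": True, "prm": 0.9},
    {"id": "b", "correct": True, "prm": 0.4},   # right answer, shaky steps
    {"id": "c", "correct": False, "prm": 0.8},  # wrong answer, plausible steps
    {"id": "d", "correct": False, "prm": 0.1},
    {"id": "e", "correct": True, "prm": 0.7},
]
print([s["id"] for s in prof_select(samples, keep_per_class=1)])  # ['a', 'd']
```

The dropped samples are the ones where the two reward signals disagree, such as a correct answer reached through poorly scored steps.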
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Geometry-aware 4D Video Generation for Robot Manipulation
The paper introduces a 4D video generation model for robot manipulation that uses cross-view pointmap alignment during training, generating future video sequences from novel viewpoints given one RGB-D image per view without camera poses as input.
#Robotics#Vision#Multimodal#Research release
why featured
HKR-H and HKR-K pass: the paper links 4D video generation to robot manipulation and names pointmap alignment with one RGB-D image per view as input. HKR-R is weak because metrics, code, and real-robot evidence are not disclosed.
editor take
The paper uses cross-view pointmap supervision for 4D prediction; metrics aren’t disclosed, but pose-free views make it closer to usable robotics.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Learning from Disagreement: Clinician Overrides as Implicit Preference Signals for Clinical AI in Value-Based Care
The paper treats clinician overrides of clinical AI recommendations as implicit preference data, proposes a five-category override taxonomy, and conditions preference learning on patient state, organizational context, and clinician capability while jointly training reward and capability models.
#Alignment#Fine-tuning#Reasoning#Research release
why featured
HKR-H and HKR-K pass: the paper turns clinician overrides into preference data and gives a 5-class taxonomy plus modeling path. No deployment results or broader product impact are disclosed, so it stays below featured.
editor take
The paper defines 5 override types; treating clinician pushback as RLHF data is tempting, but validation is undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
DynMuon: A Dynamic Spectral Shaping View of Muon
The paper proposes DynMuon, changing Muon-style updates from UΣVᵀ to UΣ^pVᵀ and scheduling p from positive to mildly negative during training, reaching the same target validation loss with 10.6%–26.5% fewer steps than Muon across model sizes, architectures, and training settings.
#Fine-tuning#Inference-opt#DynMuon#Muon
why featured
HKR-K/R pass: the paper gives a concrete update rule and a 10.6%-26.5% step reduction claim tied to training cost. As a single technical arXiv optimizer paper without cross-source validation, it stays in all.
editor take
DynMuon cuts 10.6%–26.5% steps to target loss; Muon’s spectral exponent p now looks like a cheap training knob.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis
GenoMAS uses six LLM agents for code-driven gene expression analysis, reaching 89.13% Composite Similarity Correlation on GenoTEX preprocessing and 60.48% F1 for gene identification, ahead of prior art by 10.61% and 16.85%, with code released on GitHub.
#Agent#Code#Benchmarking#GenoMAS
why featured
HKR-K is solid and HKR-H has a clear science-agent hook; HKR-R is weak because gene-expression analysis is niche for AI practitioners. The post gives benchmark numbers but not broader agent-engineering impact, so this stays in all.
editor take
GenoMAS uses 6 agents on GenoTEX and hits 60.48% gene-ID F1; agentic science still lives or dies by baselines.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Rethinking Generative Image Pretraining: How Far Are We From Scaling Up Next-Pixel Prediction?
The paper trains Transformer families with IsoFLOPs profiles up to 7e19 FLOPs and finds that, at 32x32 resolution, the generation-optimal setup requires data size to grow three to five times faster than the classification-optimal setup.
#Vision#Multimodal#Benchmarking#arXiv
why featured
HKR-H/K/R pass, but this is a single arXiv scaling paper centered on 32x32 images and IsoFLOPs conditions. Practical industry impact is limited, so it stays in the high 60-71 band.
editor take
The paper spends 7e19 FLOPs on 32x32 images; I don’t buy the five-year pixel-modeling extrapolation.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
SynCABEL: Synthetic Contextualized Augmentation for Biomedical Entity Linking
SynCABEL uses LLMs to generate context-rich training examples for candidate concepts in a target knowledge base, reaches state-of-the-art results on three multilingual biomedical entity linking benchmarks—MedMentions, QUAERO, and SPACCC—and matches full human supervision with up to 60% less annotated data.
#Fine-tuning#Inference-opt#Benchmarking#SynCABEL
why featured
HKR-K and HKR-R are solid: mechanism, three benchmarks, and 60% label savings are concrete. The biomedical entity-linking scope is narrow, with no product or general-model impact, so it stays in 60–71.
editor take
SynCABEL hits SOTA on 3 BEL benchmarks and matches full supervision with 60% less labeling; synthetic data is becoming real plumbing.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Prompt Reinforcing for Long-Term Planning of Large Language Models
The paper proposes a reinforcement-learning-inspired prompt optimization framework that modifies only the task instruction prompt, uses turn-by-turn feedback and experience replay for prompt rewriting, and reports improved performance on multi-turn tasks including text-to-SQL and task-oriented dialogue.
#Agent#Reasoning#Tools#Research release
why featured
HKR-H/K/R pass: the prompt-only planning angle is useful and practical. The article gives no gain size, model setup, or artifact, so it stays in the 60–71 “all” band.
editor take
It only rewrites the task instruction, with no gains disclosed; I’d discount “long-term planning” as prompt-memory patchwork.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
MLCommons Chakra Standardized Execution Traces Advance AI Performance Benchmarking
MLCommons Chakra defines open, portable graph-based execution traces for distributed AI/ML workloads. The traces capture compute, memory, communication, dependencies, timing, and resource constraints, with tools for collection, analysis, generation, and adoption across simulators, emulators, and replay tools; the paper cites production cluster case studies and industry participation from NVIDIA, AMD, and Meta.
#Benchmarking#Tools#Inference-opt#MLCommons
why featured
HKR-K is strong and HKR-R applies to AI infrastructure teams, with NVIDIA, AMD, and Meta adding credibility. HKR-H is weak and the ML-systems angle keeps it in the 60–71 band, below featured.
editor take
Chakra standardizes distributed-training traces as graphs; no speedup numbers disclosed, but NVIDIA, AMD, and Meta sharing a trace format matters.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Characterizing Paraphrase-Induced Failures in Lean 4 Autoformalization
The paper applies deterministic paraphrase rules to undergraduate and Olympiad math datasets and finds that, across four frontier models and three open-weight autoformalizers, Lean 4 autoformalization failures are dominated by code-generation errors rather than theorem semantics.
#Code#Reasoning#Benchmarking#Lean 4
why featured
HKR-H/K/R all pass, but the Lean 4 autoformalization focus is narrow. The summary lacks failure rates, model names, and reproducible details, keeping it in the 60–71 band.
editor take
Four frontier models and three open autoformalizers fail under paraphrases; Lean 4 autoformalization still has a codegen problem.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Activation Steering with a Feedback Controller
The paper proposes PID Steering for LLM activation steering, using proportional, integral, and derivative terms in a closed-loop controller. It frames existing steering methods as P controllers, reports tests across multiple LLM families and benchmarks, and publishes code, but the snippet does not disclose model names, benchmark counts, or numeric gains.
#Alignment#Safety#Interpretability#Research release
why featured
HKR-H/K/R all pass, but the post gives the mechanism and broad coverage only; exact model counts and effect sizes are not disclosed. Solid arXiv research signal, below featured threshold.
editor take
PID Steering casts activation steering as closed-loop control; model counts and gains are undisclosed, so the stability claim stays provisional.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
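The closed-loop framing in the PID Steering item above can be illustrated with a textbook discrete PID controller. This is a minimal sketch, not the paper's implementation: the scalar "projection onto the steering direction", the constant per-layer drift, and all gains are hypothetical stand-ins chosen for the demo.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    """Textbook discrete PID controller (illustrative, not the paper's code)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float) -> float:
        # The integral term accumulates past error; the derivative term reacts to its change.
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def steer(controller: PIDController, target: float, layers: int, drift: float) -> list:
    """Hypothetical toy loop: one scalar projection value per layer.

    At each layer the projection is pulled by a constant drift (a stand-in for
    the model's own dynamics) and pushed by the controller output.
    """
    projection = 0.0
    history = []
    for _ in range(layers):
        error = target - projection
        projection += controller.step(error) + drift
        history.append(projection)
    return history


p_only = steer(PIDController(kp=0.5, ki=0.0, kd=0.0), target=1.0, layers=40, drift=-0.2)
pid = steer(PIDController(kp=0.5, ki=0.1, kd=0.05), target=1.0, layers=40, drift=-0.2)
print(f"P-only final projection: {p_only[-1]:.3f}")
print(f"PID final projection:    {pid[-1]:.3f}")
```

In this toy setup the proportional-only controller settles short of the target because the drift is never fully cancelled, while the integral term removes that steady-state error. Whether the same effect holds on real activations is exactly what the undisclosed numbers would need to show.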
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
GIST: Targeted Data Selection for Instruction Tuning via Coupled Optimization Geometry
GIST recovers a task-specific subspace from validation gradients via SVD, projects training gradients into that coupled subspace, and scores examples by target-direction alignment; experiments report that it matches or exceeds the state-of-the-art baseline using 0.29% of storage and 25% of compute time under the same selection budget.
#Fine-tuning#Alignment#Inference-opt#GIST
why featured
HKR-K and HKR-R pass: the method and efficiency numbers are concrete for fine-tuning data selection. The paper is narrow and technically framed, so it stays in the lower research-release band, not featured.
editor take
GIST reports 0.29% storage and 25% compute time; for LoRA data selection, Adam’s diagonal proxy looks exposed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Data Presentation Over Architecture: Resampling Strategies for Credit Risk Prediction with Tabular Foundation Models
The paper benchmarks 4 classical models and 5 tabular foundation models on Home Credit and Lending Club; across 7 context-construction strategies and 1K–50K context sizes, sampling strategy explains more AUC-ROC variance than TFM family, with balanced and hybrid sampling adding 3–4 AUC points over uniform sampling.
#Benchmarking#Home Credit#Lending Club#Research release
why featured
HKR-H and HKR-K pass: the paper has a contrarian claim and concrete test numbers. HKR-R is weak because the use case is credit-risk tabular prediction, not a broad AI product or agent shift.
editor take
Seven context strategies beat five TFM families; for tabular FMs, sampling buys 3–4 AUC points before architecture does.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
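To make the uniform-versus-balanced distinction in the credit-risk item above concrete, here is a small sketch of two context-construction strategies. It assumes "balanced" means equal row counts per class, which the summary does not confirm; the synthetic table, the 8% default rate, and the function names are hypothetical.

```python
import random
from collections import Counter


def uniform_context(rows, size, rng):
    """Sample context rows uniformly, preserving the natural class imbalance."""
    return rng.sample(rows, size)


def balanced_context(rows, size, rng):
    """Sample an equal number of rows per class (assumed meaning of 'balanced')."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row["label"], []).append(row)
    per_class = size // len(by_label)
    context = []
    for label_rows in by_label.values():
        context.extend(rng.sample(label_rows, min(per_class, len(label_rows))))
    return context


rng = random.Random(0)
# Synthetic stand-in for a credit table: 8% of rows are defaults (label 1).
rows = [{"id": i, "label": 1 if i % 25 < 2 else 0} for i in range(10_000)]

uniform_counts = Counter(r["label"] for r in uniform_context(rows, 1_000, rng))
balanced_counts = Counter(r["label"] for r in balanced_context(rows, 1_000, rng))
print("uniform :", dict(uniform_counts))
print("balanced:", dict(balanced_counts))
```

With a 1K context, uniform sampling shows the model only a small number of default rows, while the balanced variant shows 500 of each class. That difference in what the model sees in-context is the lever the paper credits with the AUC gain.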
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Revisiting Long-term Time Series Forecasting: An Investigation on Linear Mapping
The paper evaluates LTSF models on simulated and real-world datasets, finding that affine mapping dominates common benchmark performance and learns similar input-to-output transition matrices; it works on periodic signals but struggles with non-periodic signals and time series whose periods vary across channels.
#Benchmarking#Research release#Benchmark#Open source
why featured
HKR-H and HKR-K pass: affine mapping beating richer LTSF models challenges the benchmark story. HKR-R is narrow beyond forecasting evaluation, with no product or agent implication disclosed.
editor take
Affine mapping dominates common LTSF benchmarks; before stacking architecture tricks, prove you beat linear periodic extrapolation.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models
LEAP replaces categorical mask parameterization with a per-weight Bernoulli-via-Gumbel-sigmoid relaxation for end-to-end unstructured pruning, and across five 0.5B to 8B LLM families at 50% and 60% sparsity, it improves six-task average zero-shot accuracy by 2.59 points over ADMM.
#Inference-opt#LEAP#ADMM#MaskLLM
why featured
HKR-K is strong: LEAP gives a testable pruning mechanism and cross-model numbers. HKR-R is moderate because inference cost matters, but the topic is narrow; no hard exclusion, so it sits in the 60–71 research-signal band.
editor take
LEAP beats ADMM by 2.59 points across five 0.5B–8B families. I buy end-to-end masks over OBS surrogates.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
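The per-weight Bernoulli-via-Gumbel-sigmoid relaxation named in the LEAP item above can be sketched in a few lines. This shows only the sampling step under assumed values: the weights, mask logits, and temperature are hypothetical, and a real training loop would backpropagate through the soft mask with an autograd framework, which is not modeled here.

```python
import math
import random


def gumbel_sigmoid(logit: float, temperature: float, rng: random.Random) -> float:
    """Relaxed Bernoulli sample: sigmoid((logit + logistic noise) / temperature)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
    # log(u) - log(1 - u) is Logistic(0, 1), the difference of two Gumbel samples.
    logistic_noise = math.log(u) - math.log(1.0 - u)
    return 1.0 / (1.0 + math.exp(-(logit + logistic_noise) / temperature))


def sample_mask(logits, temperature, rng, hard=False):
    """Draw one soft mask value per weight; optionally threshold to a hard 0/1 mask."""
    soft = [gumbel_sigmoid(logit, temperature, rng) for logit in logits]
    return [1.0 if m > 0.5 else 0.0 for m in soft] if hard else soft


rng = random.Random(0)
weights = [0.8, -0.05, 0.3, -1.2]
mask_logits = [2.0, -2.0, 0.0, 3.0]  # hypothetical learnable per-weight logits
soft_mask = sample_mask(mask_logits, temperature=0.5, rng=rng)
masked_weights = [w * m for w, m in zip(weights, soft_mask)]

# With logit 0 the keep probability is sigmoid(0) = 0.5, i.e. about 50% sparsity.
keep_rate = sum(sample_mask([0.0] * 20_000, 0.5, rng, hard=True)) / 20_000
print(f"empirical keep rate at logit 0: {keep_rate:.3f}")
```

Each weight gets its own logit, so the keep probability is sigmoid(logit) independently per weight. That per-weight independence is the contrast with categorical mask parameterizations that the summary highlights.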
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
LLMForge: Multi-Backend Hardware-Aware Neural Architecture Search with Infinite-Head Attention for Edge Language Models
LLMForge presents a hardware-aware NAS framework for edge language models; its Infinite-Head Attention expands the attention search space by about 400×, and its multi-backend search returns three 300M-scale Pareto variants on a multi-chip ring substrate.
#Inference-opt#Benchmarking#LLMForge#SmolLM2
why featured
HKR-H/K pass via a specific architecture hook and numbers; HKR-R is weak because hardware gains are not quantified. As an arXiv research release without deployment or artifact details, it stays in 60–71.
editor take
LLMForge reports three 300M ring-edge variants and loss 2.798; the 40% energy cut is the claim to reproduce.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Parallel Recursive LSTM
The paper introduces PR-LSTM, a hierarchical recurrent architecture that recursively merges token states over a balanced tree, reducing recurrent parallel depth from linear to logarithmic and solving more formal-language benchmark tasks than standard RNN, LSTM, and Transformer baselines without quadratic attention scaling.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-H/K/R pass, but this is an arXiv architecture paper with evidence centered on formal-language benchmarks, not a product or frontier-model release. That keeps it in the 60–71 “all” band.
editor take
PR-LSTM cuts recurrent depth to logarithmic; formal-language wins are nice, but don’t sell it as long-context RAG yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
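The linear-to-logarithmic depth claim in the PR-LSTM item above follows from merging states pairwise over a balanced tree. A minimal sketch of that reduction, assuming a placeholder merge function (string concatenation here, where the paper would use a learned recurrent cell):

```python
def tree_merge(states, merge):
    """Pairwise-merge adjacent states level by level.

    Returns (root_state, parallel_depth). All merges within one level are
    independent, so each level counts as a single parallel step.
    """
    states = list(states)
    depth = 0
    while len(states) > 1:
        merged = [merge(states[i], states[i + 1]) for i in range(0, len(states) - 1, 2)]
        if len(states) % 2:
            merged.append(states[-1])  # odd element carries over unchanged
        states = merged
        depth += 1
    return states[0], depth


# Placeholder merge: concatenation keeps token order visible in the result.
tokens = list("abcdefghijklmnop")  # 16 tokens
root, depth = tree_merge(tokens, lambda left, right: left + right)
print(root, "| tree depth:", depth, "| sequential steps:", len(tokens) - 1)

_, depth_1000 = tree_merge(["x"] * 1000, lambda left, right: left + right)
print("depth for 1000 tokens:", depth_1000)
```

A standard recurrence needs one step per token, while the tree needs one step per level. The merge must still preserve order, as the concatenation does here, which is the part a learned cell has to get right.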
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Continuous Diffusion Scales Competitively with Discrete Diffusion for Language
RePlaid achieves a 22.1 PPL bound on OpenWebText among continuous diffusion language models, keeps a 20× compute gap versus autoregressive models, uses fewer parameters than Duo, and outperforms MDLM under over-trained conditions.
#Benchmarking#Reasoning#RePlaid#Plaid
why featured
HKR-K is strong: PPL bound 22.1, a 20x compute gap, and MDLM comparison are testable. HKR-R comes from architecture-cost pressure; HKR-H is weak and the arXiv-only source keeps it in 60–71.
editor take
RePlaid hits 22.1 PPL bound on OpenWebText; continuous DLMs look viable, but the 20× AR compute gap still stings.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Assured Autonomy: How Operations Research Powers and Orchestrates Generative AI Systems
The paper proposes an operations-research framework for assured autonomy, using flow-based generative models and adversarial robustness constraints to address feasibility, distribution shift, and stress testing for agentic GenAI systems in high-consequence operational domains.
#Agent#Safety#Alignment#Research release
why featured
HKR-K/R pass: the paper frames OR as orchestration for assured agents, with robustness constraints, distribution shift, and stress testing. No numbers, artifact, or major-lab pull keeps it in all, not featured.
editor take
arXiv 2512.23978 gives a framework, no experiments; I don’t buy OR-as-GenAI-architect until reproducible stress tests appear.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
CooT: Learning to Coordinate In-Context with Coordination Transformers
CooT uses in-context learning for real-time partner adaptation on Overcooked and Google Research Football, requires no parameter updates, and outperforms population-based methods, gradient-based fine-tuning, and Meta-RL baselines under the reported evaluations.
#Agent#Reasoning#Fine-tuning#Google Research
why featured
HKR-H/K pass: CooT frames multi-agent coordination as in-context adaptation and names two testbeds plus baseline classes. HKR-R is weak because it lacks an artifact or production setting, so this stays below featured.
editor take
CooT adapts without updates on 2 multi-agent benchmarks; I’m skeptical until it leaves low-entropy Overcooked-style coordination.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
CoLLM: Continuous Adaptation for SLO-Aware LLM Serving on Shared GPU Clusters
CoLLM unifies FL PEFT and inference on shared edge replicas and model parameters, using unmerged inference, shadow adapters, and two-timescale inter-replica coordination to balance training and serving, with evaluations across multiple LLMs and real-world traces reporting up to 3x higher goodput than state-of-the-art LLM systems.
#Fine-tuning#Inference-opt#CoLLM#Research release
why featured
HKR-K/R pass: the paper gives a 3x goodput claim and three mechanisms, tied to LLM serving cost/SLO pressure. HKR-H is weak; this is niche systems research, not a product release, so it stays in 60–71.
editor take
CoLLM co-runs FL PEFT and inference for up to 3x goodput; edge clusters need this, but the baseline decides the hype.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?
The paper studies key components of JEPA-WMs for physical planning, using simulated environments and real-world robotic data to test architecture, training objective, and planning algorithm choices, and reports better navigation and manipulation results than DINO-WM and V-JEPA-2-AC.
#Agent#Robotics#Benchmarking#Meta AI
why featured
HKR-K and HKR-R pass: the paper gives real-robot evidence and ablations for JEPA world models. HKR-H is weak, and the arXiv-only, robotics-heavy scope keeps it in the 60–71 band.
editor take
JEPA-WMs beat DINO-WM and V-JEPA-2-AC on navigation and manipulation; gains are undisclosed, so trust the ablations first.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Compositional Adversarial Training for Robust Visual Watermarking
CAT formulates visual watermark robustness as a min-max problem over compositional transformations, using a differentiable sequential adversary to choose attack families; it improves overall watermark capacity by up to 63.5% in single-step attacks and 13.0% in compositional attacks.
#Vision#Safety#Alignment#Anirudh Satheesh
why featured
HKR-K and HKR-R pass: CAT’s min-max setup and 63.5%/13.0% gains are concrete, and watermark attacks matter for AI-media trust. HKR-H misses; single arXiv paper with limited deployment context stays in the 60–71 band.
editor take
CAT lifts watermark capacity up to 63.5% under single-step attacks. I buy the premise: random augmentation misses the nasty compositions.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
RLBFF: Binary Flexible Feedback to Bridge Human Feedback and Verifiable Rewards
RLBFF extracts binary principles from natural-language feedback to train reward models as entailment tasks, reaches 86.2% on RM-Bench and 81.4% on JudgeBench, and releases an open-source recipe with data for aligning Qwen3-32B.
#Alignment#Fine-tuning#Benchmarking#Nvidia
why featured
HKR-K and HKR-R pass: the paper offers a concrete reward-modeling mechanism, metrics, and an open recipe. HKR-H is weak, and without cross-source traction or product impact it stays in the 60–71 band.
editor take
RLBFF hits 86.2% RM-Bench and 81.4% JudgeBench; binary principles are practical, but off-benchmark generalization needs verification.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
A More Word-like Image Tokenization for MLLMs
DiVT clusters image patch embeddings into coherent semantic units and adapts the token budget to image complexity; the abstract says it modifies neither the vision encoder nor the language model and matches or surpasses baselines on diverse multimodal benchmarks with fewer visual tokens.
#Multimodal#Vision#Inference-opt#DiVT
why featured
HKR-H/K/R all pass, but this is a single arXiv methods paper; the body gives mechanism and benchmark claims, not token-reduction numbers or release details, so it stays in the 60–71 band.
editor take
DiVT clusters patch embeddings and adjusts token budgets; no reduction numbers in the snippet, so I’d file it under pragmatic vision compression.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Distilling Tabular Foundation Models for Structured Health Data
The paper distills tabular foundation models with stratified out-of-fold teacher labeling, testing 6 teachers and 4 student families across 19 healthcare datasets; the students retain at least 90% of teacher AUC, run at least 26x faster on CPU, and multi-teacher averaging does not consistently beat the best single teacher.
#Fine-tuning#Inference-opt#Benchmarking#arXiv
why featured
HKR-K is strong and HKR-R is real for cost-sensitive deployment, but this is a single arXiv paper in a narrower tabular-health lane. No open-source artifact, product adoption, or cross-source cluster is disclosed, so it stays in all.
editor take
Across 19 health datasets, students kept at least 90% of teacher AUC; leakage-aware distillation beats bigger TFM ensembles for deployment.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Memory-Efficient Differentially Private Training with Gradient Random Projection
DP-GRAPE replaces SVD subspaces with random Gaussian projections, privatizes gradients after projection, and applies projection during backpropagation, reducing memory by over 63% for ViT pre-training and over 70% for RoBERTa-Large fine-tuning versus DP-Adam while scaling to OPT models with up to 6.7 billion parameters.
#Fine-tuning#Safety#Inference-opt#DP-GRAPE
why featured
HKR-K is strong with a testable projection method and memory numbers; HKR-R touches DP training cost. HKR-H is weak, and the post lacks code, author authority, and reproducibility details, so it stays in all.
editor take
DP-GRAPE cuts DP training memory 63–70%; random projection replacing SVD is the practical lever for private LLM fine-tuning.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
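The project-then-privatize order described in the DP-GRAPE item above can be sketched with a random Gaussian projection followed by standard DP-SGD-style clipping and noising. This is an illustration under assumptions, not the paper's procedure: the dimensions, the clipping norm, and the noise multiplier are hypothetical, and how the update is mapped back to parameter space is not covered by the summary and is not modeled.

```python
import math
import random


def gaussian_projection(k, d, rng):
    """k x d matrix with N(0, 1/k) entries, so squared norms are preserved in expectation."""
    std = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(k)]


def project(matrix, grad):
    """Map a d-dimensional gradient to k dimensions."""
    return [sum(m * g for m, g in zip(row, grad)) for row in matrix]


def clip(vec, max_norm):
    """Scale the vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * factor for v in vec]


def private_projected_sum(per_example_grads, matrix, max_norm, noise_multiplier, rng):
    """Project each per-example gradient, clip it, sum, then add Gaussian noise."""
    total = [0.0] * len(matrix)
    for grad in per_example_grads:
        clipped = clip(project(matrix, grad), max_norm)
        total = [t + c for t, c in zip(total, clipped)]
    sigma = noise_multiplier * max_norm
    return [t + rng.gauss(0.0, sigma) for t in total]


rng = random.Random(0)
d, k, batch = 512, 32, 8  # hypothetical sizes
grads = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(batch)]
P = gaussian_projection(k, d, rng)
noisy = private_projected_sum(grads, P, max_norm=1.0, noise_multiplier=1.0, rng=rng)
print(f"per-example vector length after projection: {len(noisy)} (was {d})")
```

The memory argument is visible in the shapes: each per-example gradient is held as k numbers instead of d, and unlike an SVD subspace the projection matrix needs no decomposition of the gradients themselves.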
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
DISA: Offline Importance Sampling for Distribution-Matching LLM-RL
DISA moves partition-function estimation outside the RL loop and matches or exceeds FlowRL across two open-weight backbones, six math benchmarks, and three code benchmarks.
#Reasoning#Code#Benchmarking#DISA
why featured
HKR-K is clear: DISA gives an offline importance-sampling mechanism plus results on 2 open-weight backbones and 9 math/code benchmarks. HKR-H is weak, and HKR-R mainly reaches LLM-RL training practitioners.
editor take
DISA matches or beats FlowRL on 2 backbones and 9 benchmarks; freezing Z estimation is cleaner than co-training it.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Where Does Warm-Up Come From? Adaptive Scheduling for Norm-Constrained Optimizers
The paper proposes an adaptive learning-rate scheduler for norm-constrained optimizers such as Muon and Lion, derives warm-up followed by decay from a generalized smoothness assumption, and reports LLaMA pretraining results where automatic warm-up selection matches or beats the best manually tuned schedules without extra hyperparameter search.
#Fine-tuning#Benchmarking#Muon#Lion
why featured
HKR-H/K/R pass: the title has a training puzzle, and the post claims adaptive warm-up for Muon, Lion, and LLaMA pretraining. No effect sizes or reproducible setup are disclosed, and optimizer scheduling is narrow, so it stays in 60–71.
editor take
Warm-up gets a derivation, not a knob; LLaMA scale is undisclosed, so don’t retire manual schedules yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Coordinate Heterogeneity Governs Binary Quantization: From InfoNCE to Recall
The paper links Gaussian structure in InfoNCE-trained representations to binary quantization quality, deriving closed-form ranking-fidelity expressions and a two-parameter scaling law. Experiments on 13 datasets and 6 embedding families validate the predictions and explain when random rotation or coordinate-axis preservation fits.
#Embedding#Inference-opt#Benchmarking#arXiv
why featured
HKR-K is strong and HKR-R is moderate: the binary-quantization recall scaling law is useful for vector retrieval. HKR-H is weak, and this is a single arXiv paper with no product release, code, or cross-source debate, so it stays in all.
editor take
The paper tests BQ scaling on 13 datasets; coordinate heterogeneity is the useful lever, not default random rotation.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Forget-It-All: Multi-Concept Machine Unlearning via Concept-Aware Neuron Masking
The paper proposes Forget-It-All (FIA), a training-free framework for multi-concept unlearning in text-to-image diffusion models. It uses Contrastive Concept Saliency, Concept Sensitive Neurons, and a unified mask to prune concept-specific neurons while preserving general generation neurons. Experiments cover three unlearning tasks, and code is released on GitHub.
#Vision#Safety#Fine-tuning#Forget-It-All
why featured
HKR-H/K/R pass, but the article only discloses the framework and task categories, not metrics, code quality, or adoption. As a single arXiv research item, it stays in all.
editor take
FIA masks concept neurons across 3 task types; training-free is nice, but diffusion unlearning still lives or dies by eval design.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
TabH2O: A Unified Foundation Model for Tabular Prediction
TabH2O v1 uses 29.2M parameters for tabular classification and regression on the TALENT benchmark with 300 datasets, achieving an average rank of 2.55 among 6 methods and placing in the top three on 81% of test datasets.
#Reasoning#Benchmarking#TabH2O#TALENT
why featured
HKR-K and HKR-R pass: the paper gives concrete model size and 300-dataset benchmark results, with practical relevance to tabular AutoML. Single arXiv paper, no disclosed code or deployment detail, so it stays in 60–71.
editor take
TabH2O v1 runs 29.2M params on 300 tabular sets; it trails TabICL v2 but beats tuned CatBoost, so go easy on “foundation.”
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Bug or Feature²: Weight Drift, Activation Sparsity, and Spikes
The paper proves that MSE or cross-entropy induces negative downstream weight drift at initialization with positively biased activations, and reports across 79 configurations that GPT-nano with ReLU reaches up to 90% activation sparsity while accuracy drops sharply above about 70% sparsity.
#Interpretability#Benchmarking#Inference-opt#GPT-nano
why featured
HKR-H/K pass: the paper has a concrete hook and new testable numbers (79 configs, up to 90% activation sparsity, an accuracy cliff above about 70% sparsity). HKR-R is weak because the training-dynamics angle is niche, so it stays in 60–71 rather than featured.
editor take
GPT-nano ReLU hits 90% sparsity; accuracy cliffs past 70%, and ReLU² amplifies mid-layer spikes.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
ArtifactLinker: Linking Scientific Artifacts for Automatic State-of-the-Art Discovery
ArtifactLinker models HuggingFace as an artifact graph and uses a two-stage pipeline to discover SOTA models for datasets: rank unobserved model-dataset links with GNNs or graph-augmented LLMs, then verify top links through coding experiments with LLM-based agents. ArtifactBench contains 14,053 artifacts and 51,337 relations for evaluating both stages.
#Agent#Code#Benchmarking#HuggingFace
why featured
HKR-K and HKR-R pass: the artifact-graph mechanism and dataset scale are concrete, and SOTA tracking is a real workflow pain. It remains a narrow arXiv methods paper without product adoption or broad industry impact, so it stays in 60–71.
editor take
ArtifactBench has 14,053 artifacts and 51,337 relations; I like SOTA discovery framed as runnable graph link prediction.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap
The paper proposes selecting preference data by DPO implicit reward gap, choosing smaller-gap examples as harder cases, and reports better performance than five strong baselines across multiple datasets and alignment tasks using only 10% of the original data.
#Alignment#Fine-tuning#Research release
why featured
HKR-H/K/R all pass, but this is a niche arXiv alignment-data selection paper, not a model or product release. The 10% data vs. five baselines result lifts it to the upper 60–71 band.
editor take
DPO reward-gap selection uses 10% preference data; I buy the direction, but no models or margins are disclosed.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R1
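The selection rule in the DPO reward-gap item above can be sketched from the standard DPO implicit reward, r(x, y) = β · (log π(y|x) − log π_ref(y|x)). The sketch ranks pairs by the signed gap between chosen and rejected rewards and keeps the smallest 10%. Using the signed gap rather than its absolute value is an assumption, and the data class, field names, and log-probabilities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    pair_id: str
    policy_logp_chosen: float
    policy_logp_rejected: float
    ref_logp_chosen: float
    ref_logp_rejected: float


def implicit_reward_gap(pair: PreferencePair, beta: float = 0.1) -> float:
    """DPO implicit reward of the chosen response minus that of the rejected one."""
    reward_chosen = beta * (pair.policy_logp_chosen - pair.ref_logp_chosen)
    reward_rejected = beta * (pair.policy_logp_rejected - pair.ref_logp_rejected)
    return reward_chosen - reward_rejected


def select_hardest(pairs, fraction: float = 0.10, beta: float = 0.1):
    """Keep the given fraction of pairs with the smallest reward gap."""
    k = max(1, round(len(pairs) * fraction))
    return sorted(pairs, key=lambda p: implicit_reward_gap(p, beta))[:k]


# Toy data: pair i has the rejected response i nats less likely under the policy,
# so its gap is 0.1 * i and pair-0 is the hardest example.
pairs = [PreferencePair(f"pair-{i}", -10.0, -10.0 - i, -10.0, -10.0) for i in range(10)]
selected = select_hardest(pairs)
print([p.pair_id for p in selected])
```

A small gap means the model barely separates chosen from rejected, which is the sense in which the summary calls these examples harder. Which model supplies the log-probabilities is not stated in the summary.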
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Convex Dataset Valuation for Post-Training
The paper proposes a convex dataset-level valuation method using KMM in gradient space for budget-constrained LLM post-training, selecting and weighting auxiliary datasets while accounting for target-task alignment and redundancy; the abstract reports stronger performance than existing valuation baselines with low computational overhead, and the code is available on GitHub.
#Fine-tuning#Benchmarking#Research release#Open source
why featured
HKR-K/R pass: the paper offers a concrete mechanism for post-training data selection and cost control. HKR-H is weak, and the post gives no results, author signal, or real-task gains, so it stays in 60–71.
editor take
arXiv 2605.16704 prices post-training datasets with gradient-space KMM; I buy the problem, but the snippet gives no numbers.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
IVF-TQ: Streaming-Robust Approximate Nearest Neighbor Search via a Codebook-Free Residual Layer
IVF-TQ replaces the residual codebook with a fixed random rotation and Lloyd-Max scalar quantization, limiting the recall drop on streaming Deep-10M to 0.8 percentage points (87.4% to 86.6%) while IVF-PQ drops 3.23 percentage points.
#Embedding#Inference-opt#Benchmarking#arXiv
why featured
HKR-K and HKR-R pass: the method and Deep-10M numbers are concrete, and the use case maps to vector-db ingest. HKR-H is weak, and ANN quantization is narrow, so it stays in the 60–71 “all” band.
editor take
IVF-TQ drops only 0.80pp recall on streaming Deep-10M; I buy the ops win, not superiority over high-bit PQ.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
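The two ingredients named in the IVF-TQ item above, a fixed random rotation and Lloyd-Max scalar quantization, can be sketched without any learned codebook. This is a toy illustration under assumptions: the 8-dimensional residual, the 4 levels per coordinate, and fitting the levels once on synthetic N(0, 1) draws are all hypothetical choices, and the IVF coarse stage is not modeled.

```python
import math
import random


def random_rotation(dim, rng):
    """Random orthonormal basis via Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:
            basis.append([x / norm for x in v])
    return basis


def rotate(matrix, vec):
    return [sum(r * x for r, x in zip(row, vec)) for row in matrix]


def nearest(value, centers):
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


def lloyd_max(samples, levels, iters=30):
    """1-D Lloyd-Max: alternate nearest-level assignment and centroid update."""
    ordered = sorted(samples)
    # Initialise the levels at evenly spaced quantiles of the samples.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * levels)] for i in range(levels)]
    for _ in range(iters):
        buckets = [[] for _ in centers]
        for s in samples:
            buckets[nearest(s, centers)].append(s)
        centers = [sum(b) / len(b) if b else c for b, c in zip(buckets, centers)]
    return centers


rng = random.Random(0)
dim, levels = 8, 4  # hypothetical: 8-d residuals, 2 bits per coordinate
rotation = random_rotation(dim, rng)
# Levels are fit once on synthetic N(0, 1) draws, not on the indexed data.
centers = lloyd_max([rng.gauss(0.0, 1.0) for _ in range(4000)], levels)

residual = [rng.gauss(0.0, 1.0) for _ in range(dim)]
rotated = rotate(rotation, residual)
codes = [nearest(x, centers) for x in rotated]
decoded = [centers[c] for c in codes]
print("codes:", codes)
```

Because neither the rotation nor the levels depend on the stored vectors, nothing needs retraining when the data distribution shifts during ingest. That is one plausible reading of why the streaming recall holds up, not a claim taken from the paper.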
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning
The paper proposes SLIM, a dynamic skill lifecycle framework for agentic reinforcement learning that treats the active external skill set as an optimization variable and uses leave-one-skill-out validation; experiments report a 7.1 percentage-point average gain over the best baselines on ALFWorld and SearchQA.
#Agent#Reasoning#Tools#SLIM
why featured
HKR-K and HKR-R pass: the mechanism and +7.1-point result are concrete, and agent skill management is relevant. HKR-H is weak, and this is a single arXiv benchmark paper without disclosed code or production validation.
editor take
SLIM gains 7.1 points on ALFWorld and SearchQA; retiring weak skills is a saner agent recipe than hoarding tools forever.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
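The leave-one-skill-out validation mentioned in the SLIM item above can be sketched as a greedy pruning loop. This is a simplified illustration, not the paper's algorithm: the skill names, the additive toy scorer, and the retire-if-not-worse rule are hypothetical, and the surrounding reinforcement-learning loop is not modeled.

```python
def leave_one_out_prune(skills, validate, tolerance=0):
    """Retire each skill whose removal does not lower the validation score."""
    active = list(skills)
    baseline = validate(active)
    for skill in list(active):
        without = [s for s in active if s != skill]
        score = validate(without)
        if score >= baseline - tolerance:
            # Removing the skill did not hurt, so drop it and update the baseline.
            active, baseline = without, score
    return active, baseline


# Hypothetical per-skill effect on validation success, in percentage points.
CONTRIBUTION = {"search": 30, "calculator": 10, "stale_scraper": -5, "duplicate_search": 0}


def toy_validate(active_skills):
    """Toy scorer: base success rate plus the summed contribution of active skills."""
    return 40 + sum(CONTRIBUTION[s] for s in active_skills)


kept, final_score = leave_one_out_prune(list(CONTRIBUTION), toy_validate)
print("kept:", kept, "| validation score:", final_score)
```

In the toy run the harmful and redundant skills are retired and the score rises from 75 to 80. Real skills interact, so an additive scorer like this understates how noisy the leave-one-out signal can be.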
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
When Actions Disappear: Adversarial Action Removal in Self-Play Reinforcement Learning
The paper tests adversarial action masking in self-play reinforcement learning, where an attacker removes legal actions before a victim acts. Experiments span poker games from 6 to 5,531 information states and two non-poker domains, with stronger damage than random masking or learned perturbations.
#Agent#Reasoning#Safety#Research release
why featured
HKR-H/K pass: the paper studies removal of legal actions and gives concrete coverage numbers. HKR-R is weak because self-play RL robustness is niche for the broader AI-practitioner audience.
editor take
The paper tests poker games with 6 to 5,531 information states; action removal beats perturbation, so self-play agents still leak through action APIs.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
CLAP: Contrastive Latent-Space Prompt Optimization for End-to-End Autonomous Driving
CLAP adapts a frozen VLA driving model with per-roadblock soft prompts retrieved through V2X, and on NAVSIM it reduces challenging-scenario planning error by 24% with no regression on normal frames.
#Robotics#Vision#Fine-tuning#CLAP
why featured
A single arXiv methods paper with strong HKR-K: mechanism, benchmark, and a 24% number. HKR-R comes from AV safety and no-regression claims, but HKR-H is weak and validation is NAVSIM-only.
editor take
CLAP cuts NAVSIM hard-case error 24%; I buy roadblock prompts, but V2X retrieval hides the deployment bill.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
A Readiness-Driven Runtime for Pipeline-Parallel Training under Runtime Variability
RRFP changes pipeline schedules into hint-based ranking for currently ready work, and in a Megatron-based framework with up to 128 GPUs, it reports up to 1.77x speedup on language-only workloads and 2.77x on multimodal workloads.
#Inference-opt#Multimodal#RRFP#Megatron
why featured
HKR-K and HKR-R pass on concrete training speedups and GPU-cost relevance. HKR-H is weak, and the systems-paper scope lacks code or adoption signals, so it stays in all.
editor take
RRFP reports 2.77x on 128-GPU Megatron multimodal runs; I buy the direction, static pipelines are brittle under jitter.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Membership Inference Attacks on Discrete Diffusion Language Models
The paper studies membership inference attacks on fine-tuned MDLMs: a 46-dimensional reconstruction-loss feature vector with XGBoost reaches 0.878 mean AUC across six MIMIR text domains and peaks at 0.930 on Pile CC.
#Fine-tuning#Safety#Benchmarking#arXiv
why featured
HKR-K and HKR-R pass: the paper gives concrete attack features and AUC results, and it targets fine-tuning data leakage. HKR-H is weak because the angle stays specialist, so this fits the upper “all” band.
editor take
46 reconstruction-loss features hit 0.878 AUC, so MDLM privacy needs a recount; ELBO drives it, attention features add noise.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Enhancing LLM Code Reasoning via Consistency-Based Reinforcement Learning
The paper introduces CodeThinker, a consistency-driven reinforcement learning framework for code reasoning with three components, and reports a 4.3% accuracy gain over the strongest baseline on Qwen2.5-Coder-7B-Instruct.
#Reasoning#Code#Fine-tuning#Qwen
why featured
HKR-K is clear and HKR-R is modest, but HKR-H is weak: this is a single arXiv benchmark-improvement paper, not a model release or production pipeline replacement.
editor take
CodeThinker adds 4.3% on Qwen2.5-Coder-7B-Instruct. I don’t buy the SOTA gloss, but consistency rewards hit reward hacking cleanly.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Strategic Over-Parameterization for Generalizable Low-Rank Adaptation
LoRA-Over injects auxiliary parameters into low-rank adapters during training, then folds them back into a standard low-rank structure at inference; the paper evaluates it on GLUE, MT-Bench, GSM8K, and HumanEval with LLaMA 2-7B and LLaMA 3.1-8B.
#Fine-tuning#Inference-opt#Benchmarking#Research release
why featured
HKR-K is clear via the train-time over-parameterization and inference-time folding mechanism, and HKR-R lands on fine-tuning cost. HKR-H is weak, with no code, headline number, or production replacement claim disclosed.
editor take
LoRA-Over adds train-time parameters and folds to vanilla LoRA at inference; no code yet, so the benchmark win stays provisional.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Universal Adversarial Triggers
The paper proposes POS filtering plus a perplexity-based loss to generate natural-phrase universal triggers; on SST sentiment analysis, the triggers cut accuracy to 0.04 for positive-to-negative flips and 0.12 for negative-to-positive flips.
#Safety#Alignment#Benchmarking#arXiv
why featured
HKR-K and HKR-R pass: the post gives mechanisms and SST numbers, and it speaks to adversarial-trigger risk. Scope stays on sentiment benchmarks, so it remains in the 60–71 band.
editor take
POS filtering plus perplexity loss drives SST flip accuracy to 0.04/0.12; natural-phrase triggers belong in red-team suites.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Targeted Tests for LLM Reasoning: An Audit-Constrained Protocol
The paper proposes an audit-constrained protocol for LLM reasoning evaluation, using finite component grammars, deterministic rendering, and fixed query budgets; across three audited slices, CAPS did not improve audited yield or unique prompt-key discovery over uniform sampling.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K and HKR-R pass: the paper gives a reproducible audit protocol and a CAPS-vs-uniform negative result. Still, it is a single arXiv methods paper without product impact or broad industry stakes.
editor take
CAPS lost to uniform sampling across 3 audited slices; stop treating raw mismatches as reasoning-failure evidence.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
A Systematic Analysis of OOD Detection Under Representation and Training Paradigm Shifts
The paper benchmarks OOD detection confidence scoring functions (CSFs) across CNN and ViT backbones, four image-classification source datasets, and near, mid, and far OOD regimes defined by CLIP semantic distances. It finds detector rankings depend more on learned representations than score design alone, and proposes PCA projection filtering plus an NC-based detector shortlist method that needs no additional OOD data.
#Vision#Benchmarking#Research release#Benchmark
why featured
HKR-K is solid: 4 source datasets, three OOD distances, PCA projection filtering, and NC-based detector prediction are testable. HKR-H is weak, and the research angle keeps it below featured.
editor take
The paper tests 4 source datasets across near/mid/far OOD; NC-based shortlisting is the useful bit, not another score-function bakeoff.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Parallelizable Memory Recurrent Units
The paper introduces memory recurrent units that use multistability for persistent memory and derives BMRU as a proof of concept compatible with parallel scan; the abstract says BMRU performs well on long-term dependency tasks and can be combined with state-space models, but it does not disclose benchmark numbers in the snippet.
#Memory#Inference-opt#Benchmarking#Research release
why featured
HKR-K/R pass: the mechanism is concrete and tied to long-range memory plus inference efficiency; HKR-H is weak. A single arXiv abstract gives no benchmark names, gains, or code, so this sits in the 60–71 research-signal band.
editor take
BMRU adds bistable memory to parallel scan; no scores in the abstract, but it belongs on the SSM long-context shortlist.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
OSCAR uses offline attention-aware covariance estimates to derive fixed rotations and clipping thresholds for INT2 KV-cache quantization, reducing the BF16 accuracy gap to 3.78 and 1.42 points on Qwen3-4B-Thinking-2507 and Qwen3-8B across 5 tasks with reasoning traces up to 32k tokens.
#Inference-opt#Reasoning#Qwen#GLM
why featured
HKR-K/R are strong, and HKR-H works for inference engineers: OSCAR gives an offline rotation/clipping mechanism plus Qwen3 4B/8B numbers. The topic is specialized KV-cache quantization, so it stays in all rather than featured.
editor take
OSCAR cuts the INT2 KV-cache accuracy gap to 1.42 points on Qwen3-8B; I care whether its SGLang/vLLM kernel reproduces 7x throughput.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Flowing with Confidence
The paper proposes Flow Matching with Confidence, which injects input-dependent multiplicative noise at selected layers, propagates variance in closed form, and integrates it along the ODE trajectory to produce a per-sample confidence score at standard sampling cost.
#Inference-opt#Interpretability#Research release
why featured
HKR-K and HKR-R pass: the mechanism is specific and targets confidence plus sampling cost. HKR-H is weak, and the post lacks benchmark numbers or deployment evidence, so it stays in all.
editor take
FMwC gives per-sample confidence in one sampling run; I like the target, but the abstract gives no benchmark numbers.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Attention Sinks and Outliers in Attention Residuals
The paper proposes OASIS for AttnResidual architectures using a Softmax1 null space and an inter-layer null signal; experiments compare five baselines on three real-world datasets, reducing W8A8 perplexity by 75.85% and improving GSM8K Pass@1 under W4A4 by 12.42%.
#Inference-opt#Reasoning#Benchmarking#OASIS
why featured
HKR-K/R pass: the paper gives a concrete mechanism and quantization metrics tied to inference cost. HKR-H fails because the angle is technical and niche, so it stays in the 60–71 band.
editor take
OASIS cuts W8A8 perplexity 75.85% on 3 datasets; I want replication, but the AttnResidual quantization critique lands.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Reducing Hallucination in Vision-Language Models via Stage-wise Preference Optimization under Distribution Shift
The paper proposes a stage-wise preference optimization framework for VLM hallucination reduction. It trains DPO on four targeted preference-pair types: spatial orientation, object relationships, OCR uncertainty, and adversarial false premises, while the abstract does not disclose model names, dataset sizes, or benchmark scores.
#Multimodal#Vision#Alignment#Research release
why featured
HKR-K and HKR-R pass because the paper names a concrete DPO-based mechanism for VLM hallucination. HKR-H is weak, and the feed snippet lacks benchmark gains, scale, or an artifact, so it stays in the 60–71 research-signal band.
editor take
This trains DPO on four VLM hallucination types, but discloses no model names, data sizes, or scores; don't buy the frontier-VLM claim yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Spherical Steering: Geometry-Aware Activation Rotation for Language Models
Spherical Steering replaces inference-time activation addition with geodesic rotation and uses a confidence gate to modulate steering strength, outperforming addition-based baselines by 10% on TruthfulQA, COPA, and Storycloze while preserving open-ended generation quality.
#Inference-opt#Alignment#Benchmarking#Research release
why featured
HKR-K is clear: a new steering mechanism plus a 10% benchmark gain. HKR-R passes on inference-time control and alignment, but HKR-H is weak and the arXiv paper remains niche, so it fits the 60–71 band.
editor take
Spherical Steering beats activation addition by 10% on three benchmarks; norm-preserving rotation deserves a slot in steering toolkits.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
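The contrast between activation addition and geodesic rotation in the Spherical Steering card can be sketched in a few lines. This is a minimal, generic illustration of norm-preserving rotation on a sphere (spherical interpolation), not the paper's implementation; the function name `rotate_toward` and the parameter `t` are hypothetical, and the paper's confidence gate is not modeled here beyond noting that it would set `t`.

```python
import numpy as np


def rotate_toward(h, direction, t):
    """Rotate activation h toward `direction` along the great circle.

    t in [0, 1] is the fraction of the angle to traverse (assumption: a
    confidence gate would choose t). The norm of h is preserved, unlike
    additive steering h + alpha * direction.
    """
    h = np.asarray(h, dtype=float)
    direction = np.asarray(direction, dtype=float)
    norm = np.linalg.norm(h)
    u = h / norm
    v = direction / np.linalg.norm(direction)
    omega = np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))
    s = np.sin(omega)
    if s < 1e-8:
        # Parallel or antiparallel vectors: the geodesic is trivial or
        # not unique, so this sketch leaves the activation unchanged.
        return h.copy()
    # Spherical linear interpolation between the two unit vectors.
    rotated = (np.sin((1.0 - t) * omega) / s) * u + (np.sin(t * omega) / s) * v
    return norm * rotated


h = np.array([3.0, 4.0, 0.0])
steer = np.array([0.0, 0.0, 1.0])
half = rotate_toward(h, steer, 0.5)
added = h + 2.0 * steer  # additive steering changes the norm
print(np.linalg.norm(h), np.linalg.norm(half), np.linalg.norm(added))
```

The printed norms show the design difference: rotation keeps the activation on its original sphere, while addition moves it off.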
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Truthful Calibration Errors for Multi-Class Prediction
The paper introduces truthful calibration errors for multiclass prediction, covering full multiclass calibration, classwise calibration, and a truthful correction for confidence calibration, and reports that non-truthful confidence-based errors can reverse model rankings when the number of bins changes.
#Benchmarking#Haghtalab et al.#Hartline et al.#Research release
why featured
HKR-H and HKR-K pass: the ranking-flip claim is testable and the metric scope is specific. HKR-R is weak because calibration methodology is useful but narrow, with no product or safety spillover.
editor take
Haghtalab et al. add truthfulness to multiclass calibration error; bin-sensitive ECE rankings are too brittle for model selection.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
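The bin-sensitivity point in the Truthful Calibration Errors card is easy to reproduce with the standard equal-width binned ECE. The sketch below is that conventional estimator, not the truthful errors the paper introduces; the function name and the two-sample toy data are hypothetical.

```python
import numpy as np


def binned_ece(confidences, correct, n_bins):
    """Standard equal-width binned expected calibration error."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Map each confidence in [0, 1] to a bin index; 1.0 goes in the last bin.
    bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)


# Toy data (hypothetical): one correct low-confidence and one wrong
# high-confidence prediction.
conf = [0.55, 0.95]
hit = [1, 0]
for bins in (1, 10):
    print(bins, binned_ece(conf, hit, bins))
```

With one bin the two errors partly cancel; with ten bins they land in separate bins and the estimate grows, which is the kind of bin dependence that can reorder models.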
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
CausalSynth: Generating Structurally Sound Synthetic Data
CausalSynth generates causally valid synthetic data with a three-phase pipeline, preserving conditional independencies on ASIA, ALARM, and MIMIC-Struct with false-positive rates near alpha=0.05 and achieving above 96% realizability using 70B-parameter LLM backbones.
#Reasoning#Safety#Benchmarking#CausalSynth
why featured
HKR-K passes with a concrete method, benchmarks, and the >96% number. HKR-H/R are weak, and the arXiv summary gives no code, production replacement, or adoption evidence, so this stays in all.
editor take
CausalSynth keeps false-positive rates near α=0.05 across 3 benchmarks. Over 96% realizability on 70B-parameter backbones makes causal synthetic data auditable.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Video Reconstruction Using Diffusion-Based Image-to-Video Generation with Trajectory Guidance
The paper uses GPS telemetry and one reference frame to guide SG-I2V for reconstructing top-down drone video of maritime vessels without domain-specific fine-tuning, reporting BRISQUE 25.52 versus ground-truth 23.64 and stronger trajectory adherence than optical-flow and RIFE baselines.
#Multimodal#Vision#SG-I2V#RIFE
why featured
HKR-H and HKR-K pass: single-frame plus GPS video reconstruction offers a concrete mechanism and metric. HKR-R is weak; this is a narrow arXiv vision paper, so it stays in all below featured.
editor take
SG-I2V reconstructs drone maritime video from GPS plus one frame, BRISQUE 25.52; I trust trajectory constraints more than naturalness scores.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
f-OPD: Stabilizing Long-Horizon On-Policy Distillation with Freshness-Aware Control
The paper introduces f-OPD, which uses a sample-level freshness score to regulate stale-sample influence in asynchronous on-policy distillation and reports performance comparable to synchronous optimization across reasoning, tool-use, and coding-agent tasks with increasing interaction horizons.
#Agent#Reasoning#Code#Research release
why featured
HKR-K comes from the freshness-aware control mechanism, and HKR-R from stability in async long-horizon agent training. No result numbers or major-lab signal keeps it in the interesting-but-not-featured band.
editor take
f-OPD adds sample freshness to tame async OPD drift; throughput numbers aren't disclosed, but agent post-training gets a measurable knob.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
CADS: Conformal Adaptive Decision System for Cost-Efficient Image Classification
CADS uses conformal prediction to estimate image uncertainty at runtime and routes samples through a Scout-to-Oracle model cascade; on two datasets, the paper reports comparable or better accuracy with computational cost up to 12 times lower than heavy-model inference.
#Vision#Inference-opt#CADS#Research release
why featured
HKR-H/K/R pass on the up-to-12× cost-reduction claim, conformal routing mechanism, and inference-cost nerve. The scope is an arXiv image-classification optimization paper, not a broad LLM or agent product story, so it stays in 60–71.
editor take
CADS cuts cost by up to 12× versus heavy-model inference on two datasets; conformal routing is practical, but clinical reliability needs external validation.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
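The conformal routing idea in the CADS card can be sketched with split conformal prediction. This is a generic illustration under stated assumptions, not the paper's system: the nonconformity score (1 minus the probability of a class), the rule "accept the Scout only when its prediction set is a singleton", and all function names are hypothetical choices.

```python
import math

import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal quantile of the scores 1 - p(true class)."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: admit every class.
        return 1.0
    return float(np.sort(scores)[k - 1])


def prediction_set(probs, threshold):
    """Indices of classes whose score 1 - p is within the threshold."""
    return np.flatnonzero(1.0 - np.asarray(probs, dtype=float) <= threshold)


def route(scout_probs, threshold):
    """Assumed rule: keep the light model only for singleton sets."""
    return "scout" if len(prediction_set(scout_probs, threshold)) == 1 else "oracle"
```

A sample whose Scout output yields an empty or multi-class set is treated as uncertain and escalated, so the heavy model only runs where the light one cannot commit.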
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective
The paper compares FT and ICL using a formal-language task with controlled string sampling and no data contamination; FT shows stronger in-distribution generalization, both modes perform similarly out of distribution, and ICL varies more across model sizes, model families, and token vocabularies.
#Fine-tuning#Benchmarking#Research release#Benchmark
why featured
HKR-K and HKR-R pass: the FT/ICL generalization split and ICL sensitivity are useful. The academic formal-language setup limits reach, so it stays below featured.
editor take
FT beats ICL in-distribution on formal languages, ties OOD; I trust this cleaner testbed over messy natural-language leaderboards.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Thinking with Patterns: Breaking the Perceptual Bottleneck in Visual Planning via Pattern Induction
The paper proposes training-free Pattern Inference and Pattern Induction for VLM visual planning, evaluating them in three domains—FrozenLake, Crafter, and CubeBench—where reusable local visual patterns reduce reliance on repeated Thinking with Images operations, while the RSS snippet does not disclose exact accuracy or compute numbers.
#Vision#Reasoning#Agent#Research release
why featured
Single arXiv visual-planning paper with a clear mechanism and three eval environments, so HKR-K passes. No accuracy or delta is disclosed, keeping it below featured.
editor take
Pattern Induction spans FrozenLake, Crafter, and CubeBench; no accuracy or compute numbers, so I don’t buy the efficiency claim yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
LEAF: A Living Benchmark for Event-Augmented Forecasting
LEAF introduces a living benchmark for event-augmented forecasting across future event probabilities, trend forecasting, and time-series forecasting, using a recursive retrieval agent system plus dual-agent cross-validation to supply auxiliary text for evaluating proprietary and open-weight LLMs.
#Agent#RAG#Benchmarking#LEAF
why featured
HKR-K passes because LEAF introduces a living event-augmented forecasting benchmark with concrete agent mechanisms. HKR-H and HKR-R are weak, so this stays in the 60–71 all band.
editor take
LEAF spans probability, trend, and time-series forecasting; sample size and refresh cadence are undisclosed, so don’t overtrust “living” as contamination armor.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs
The paper proposes using theoretical computer science to synthesize paired Lean4 and Markdown theorem-proving tasks; DeepSeekProver-V2-671B reaches 57.5% success on Busy Beaver problems and 12% on Mixed Boolean Arithmetic problems.
#Reasoning#Benchmarking#Code#DeepSeekProver-V2
why featured
HKR-K passes with a reproducible Lean4/Markdown synthesis setup and DeepSeekProver-V2-671B results. The formal-proof/TCS angle is narrow and technically dense, so it stays below featured.
editor take
DeepSeekProver-V2-671B hits 57.5% on Busy Beaver, 12% on MBA; generated Lean tasks beat artisanal benchmarks for pressure-testing.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
LogRouter: Adaptive Two-Level LLM Routing for Log Question Answering in Big Data Systems
LogRouter routes log QA queries through four execution paths and selects 14B-class or 32B-class generators for semantic retrieval; on 70 LogHub questions, it reaches 88.4% mean router accuracy and cuts offline mean latency by 55% versus Fixed-32B, from 102.1 s to 46.3 s.
#RAG#Tools#Inference-opt#TUBITAK BILGEM
why featured
HKR-K and HKR-R pass: the item gives a test setup, accuracy, and latency numbers tied to production cost. HKR-H is weak and the log-QA scope is narrow, so it stays in the 60–71 band.
editor take
LogRouter cuts offline mean latency from 102.1 s to 46.3 s versus Fixed-32B on 70 questions; tiny benchmark, but routing beats blind bigger-model spending.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
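As a quick arithmetic check on the LogRouter card, the relative reduction implied by the two quoted latencies can be computed directly. The helper name below is illustrative only; the two latency values are the ones quoted in the summary.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Offline mean latencies quoted in the summary, in seconds.
fixed_32b_latency_s = 102.1
logrouter_latency_s = 46.3

reduction = relative_reduction(fixed_32b_latency_s, logrouter_latency_s)
print(f"{reduction:.1%}")
```

The result rounds to the 55% figure stated in the summary.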
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Probing for Representation Manifolds in Superposition
The paper introduces Manifold Probe, a supervised method that discovers representation manifolds in superposition, and demonstrates it on time and space representations in Llama 2-7b, where steering along the time manifold changes completions about release years for famous songs, movies, and books.
#Interpretability#Llama 2#Research release
why featured
HKR-K is solid: a named method, Llama 2-7b experiments, and steering conditions. HKR-R is present for interpretability/control, but the paper stays research-niche with no tool release or production claim.
editor take
Manifold Probe finds time/space linear manifolds in Llama 2-7b; I buy half, since supervised probes still need ablation baselines.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
An Amortized Efficiency Threshold for Comparing Neural and Heuristic Solvers in Combinatorial Optimization
The paper defines AET to compare neural and heuristic combinatorial-optimization solvers under matched solution quality; on CVRP with 50 customers, Kool et al.’s attention solver trained for 100 epochs on 20,000 instances crosses the HGS/PyVRP operational-energy baseline at about 4.56e3 deployed instances.
#Inference-opt#Benchmarking#Kool et al.#PyVRP
why featured
HKR-K/R pass: AET and the 4.56e3-deployment crossover are testable details, and cost payback matters to engineers. The niche combinatorial-optimization frame keeps it below featured.
editor take
AET pegs CVRP-50 break-even at 4.56e3 runs; calling neural solvers energy-wasteful without deployment volume is lazy.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
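The break-even logic behind the AET card can be sketched as a simple amortization: a one-off training energy cost is repaid by a per-instance saving at deployment. This is an assumed reading of "amortized efficiency threshold", not the paper's definition, and every number below is a hypothetical placeholder rather than a figure from the paper.

```python
import math


def break_even_instances(train_energy, neural_per_instance, heuristic_per_instance):
    """Deployed instances needed before training energy is repaid.

    Returns None when the neural solver saves nothing per instance,
    in which case the training cost is never amortized.
    """
    saving = heuristic_per_instance - neural_per_instance
    if saving <= 0:
        return None
    return math.ceil(train_energy / saving)


# Hypothetical energies in arbitrary units, for illustration only.
print(break_even_instances(1000.0, 0.25, 0.5))
```

Below the returned volume the heuristic is the cheaper choice overall; above it the neural solver is, which is why deployment volume matters to the comparison.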
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback
arXiv 2605.00155v2 proposes DRRO for RLHF, replacing worst-case value pessimization with worst-case regret under plausible reward perturbations; under an ℓ1-ground-cost Wasserstein ambiguity set, the promptwise inner problem has an exact solution and a water-filling policy structure, leading to a policy-gradient algorithm with minor changes to GRPO-style training.
#Alignment#Fine-tuning#Reasoning#Research release
why featured
HKR-K/R pass: the paper gives an exact inner solution for ℓ1 Wasserstein DRRO, a water-filling structure, and a GRPO-style training tweak. HKR-H is weak; no experiment numbers or code are disclosed, so reach stays niche.
editor take
DRRO swaps RLHF robustness to worst-case regret, with an exact ℓ1 Wasserstein inner solve; I buy the mechanism, but scale is undisclosed.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Position: Weight Space Should Be a First-Class Generative AI Modality
The position paper argues that neural network checkpoints should be treated as a generative AI modality and organizes existing methods into a five-stage pipeline; the abstract says adapter-scale and conditional generation are advancing, while unrestricted frontier-scale checkpoint synthesis remains open.
#Fine-tuning#Inference-opt#Reasoning#Research release
why featured
HKR-H and HKR-K pass: the checkpoint-as-modality framing is novel, and the paper adds a five-stage process plus an adapter/frontier-scale boundary. HKR-R is weak; near-term product impact is unclear.
editor take
The paper frames millions of checkpoints as a modality; I buy adapter-scale generation, not the frontier-model factory pitch.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
ECHO uses Direct Conditional Distillation for one-step-per-block diffusion inference in chest X-ray report generation, improving RaTE by 64.33% and SemScore by 60.58% over state-of-the-art autoregressive methods while reaching up to 8× inference speedup with negligible clinical-accuracy degradation.
#Vision#Multimodal#Inference-opt#ECHO
why featured
HKR-K is strong via a concrete mechanism and metrics; HKR-R lands through cost and latency for medical AI. The scope is still a vertical research paper, not a general model, product, or open framework, so it stays in all.
editor take
ECHO compresses CXR report diffusion to one step per block; 8× speed is nice, but “negligible” clinical loss needs tables.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement
ERFSL uses LLMs to search reward functions for custom multi-objective RL tasks without human feedback or reward examples. Its reward critic fixes reward code with one feedback instance per requirement, and when a weight is 500 times off, the framework averages 5.2 iterations to meet user requirements.
#Agent#Code#Reasoning#ERFSL
why featured
HKR-K/R pass via a concrete LLM reward-search mechanism and numbers, but this remains a niche RL research paper with no disclosed code, benchmark scale, or real-task deployment; importance stays in the interesting band.
editor take
ERFSL converges in 5.2 rounds with 500x weight error; I buy log-driven weight edits, not LLMs understanding RL.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Lost or Hidden? Concept-Level Forgetting in Supervised Continual Learning
arXiv:2605.16374 introduces an SAE-based diagnostic framework for concept-level forgetting in supervised continual learning. It decomposes forgetting into three cases: apparent concept deletion, recoverability, and decodability, and reports that much seemingly lost information is recoverable under a linearity assumption.
#Interpretability#Vision#Research release
why featured
HKR-H comes from the lost-vs-hidden framing, and HKR-K from the SAE diagnostic split into three forgetting types. As a single arXiv continual-learning paper with no disclosed scale or reproducible results here, it stays in all.
editor take
SAEs split forgetting into 3 cases; I buy the diagnostic angle, but “recoverable” leans on linearity, not a fix.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
A Comparative Study in Surgical AI: Potential and Limitations of Data, Compute, and Scaling
The paper tests neurosurgical tool detection with state-of-the-art 2026 AI methods. Multi-billion-parameter VLMs with extensive training still fall short, and larger models and longer training deliver diminishing metric gains.
#Vision#Multimodal#Benchmarking#arXiv
why featured
HKR-K passes on a concrete negative scaling result; HKR-R is modest because high-stakes VLM reliability matters. HKR-H is weak, and no product or open artifact keeps it in all.
editor take
Multi-billion-parameter VLMs still miss neurosurgical tools; surgical AI needs less scaling gospel and more task-specific proof.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Differentiable Optimization Layers for Guaranteed Fairness in Deep Learning
The paper introduces a fairness layer, a differentiable optimization layer appended to a model output layer, and an online primal-dual inference algorithm that provides provable aggregate fairness guarantees for streaming predictions with arbitrarily small batch sizes.
#Fine-tuning#Alignment#Safety#Research release
why featured
HKR-K/R pass: the mechanism is concrete and fairness guarantees matter for safety/compliance. But it is a single arXiv paper with a specialist title and no disclosed metrics, code, or adoption, so it stays in all.
editor take
Fairness layer guarantees aggregate parity in streaming inference; useful for tiny batches, but costs and accuracy tradeoffs hinge on experiments.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Causal Bias Detection in Generative Artificial Intelligence
The paper arXiv:2605.11365v2 proposes a causal fairness framework for generative AI, decomposes fairness effects across causal pathways and replacements of real-world mechanisms by model mechanisms, and applies efficient estimators to analyze race and gender bias in large language models across multiple datasets.
#Safety#Benchmarking#Research release#Safety/alignment
why featured
HKR-K and HKR-R pass: the paper offers a causal path decomposition and estimator for fairness testing. HKR-H is weak, and the post does not disclose metrics, model names, or an open artifact, so it stays in the 60–71 band.
editor take
arXiv:2605.11365v2 decomposes genAI fairness by causal paths and mechanism replacement; LLM names are undisclosed, so trust framework over findings.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Inducing Spatial Locality in Vision Transformers through the Training Protocol
The study compares Baseline and Modern training protocols for ViT across 3 datasets, and the minimum MAD on CIFAR-100 drops from 0.316 to 0.008. Ablations identify CutMix as the determining factor: conditions with CutMix show MAD 0.024, while conditions without CutMix remain at MAD 0.210.
#Vision#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the paper has a counterintuitive training-mechanism angle plus MAD and CutMix ablation numbers. HKR-R is weak because it is niche ViT training work, so it stays in the 60–71 band.
editor take
CutMix conditions show MAD 0.024 versus 0.210 without it; stop crediting early ViT locality purely to architecture bias.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Unifying Contrastive and Generative Objectives for Visual Understanding and Text-to-Image Generation
DREAM unifies text-image contrastive learning and T2I generation with Masking Warmup, then uses Semantically Aligned Decoding to score partial images after 12.5% decoding, improving over CLIP by 1.1% on ImageNet linear probing and 4.1% on 5-shot transfer, and over FLUID by 6.2% FID on CC12M while maintaining CLIP Score.
#Multimodal#Vision#Benchmarking#DREAM
why featured
HKR-K passes with a concrete mechanism and ImageNet, 5-shot, and CC12M FID numbers. HKR-H and HKR-R are weak; this is an arXiv research increment without product impact or major-lab release signal.
editor take
DREAM picks trajectories at 12.5% decoding; +1.1% linear probe and 6.2% FID are modest, but joint training didn’t collapse.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
When Is Rank-1 Steering Cheap? Geometry, Granularity, and Budgeted Search
The paper formulates rank-1 steering as budgeted optimization over layer and coefficient; GRACE uses activation geometry to guide search and reduces trials needed to recover 95% of best-found utility by 39.8% on average across three model families.
#Alignment#Interpretability#Inference-opt#GRACE
why featured
HKR-K passes with a concrete search mechanism and 39.8%/95% result. HKR-H and HKR-R are weak because rank-1 steering is specialized research with no product tie-in or visible debate.
editor take
GRACE cuts trials by 39.8% to hit 95% utility; framing rank-1 failures as search cost is a useful prior for inference-time control.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
CoX-MoE: CPU-GPU Co-Execution for High-Throughput MoE Inference with AMX
CoX-MoE uses AMX-enabled CPU-GPU co-execution for MoE inference, replacing micro-batched expert computation with ordinary batches and pre-assigning frequently activated experts to the GPU, achieving up to 7.1x higher throughput than FlexGen and 2.4x higher throughput than MoE-Lightning under the paper’s reported setup.
#Inference-opt#CoX-MoE#FlexGen#MoE-Lightning
why featured
HKR-K and HKR-R pass: the paper gives concrete mechanisms and 7.1x/2.4x throughput claims tied to MoE serving cost. HKR-H is weak and the systems focus keeps it below featured.
editor take
CoX-MoE claims 7.1x over FlexGen and 2.4x over MoE-Lightning; I buy AMX co-exec, but static hot experts hate drift.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
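The static hot-expert placement mentioned in the CoX-MoE card can be illustrated with a frequency count over a profiling trace. This is a hedged sketch of the general idea only: the function name, the trace format (a flat list of activated expert ids), and the slot count are hypothetical, and the paper's actual placement policy is not described in the summary.

```python
from collections import Counter


def assign_hot_experts(activation_trace, gpu_slots):
    """Pick the most frequently activated experts for GPU residency.

    activation_trace: iterable of expert ids seen during profiling.
    gpu_slots: how many experts fit on the GPU (assumed fixed).
    Returns (gpu_experts, cpu_experts) as sets of expert ids.
    """
    counts = Counter(activation_trace)
    gpu_experts = {expert for expert, _ in counts.most_common(gpu_slots)}
    cpu_experts = set(counts) - gpu_experts
    return gpu_experts, cpu_experts


# Hypothetical profiling trace over four experts.
trace = [0, 1, 1, 2, 2, 2, 3]
gpu, cpu = assign_hot_experts(trace, gpu_slots=2)
print(sorted(gpu), sorted(cpu))
```

Because the assignment is computed once from the trace, it does not follow later shifts in which experts are popular, which is the drift concern raised in the editor take.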
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Density-Ratio Weighted Behavioral Cloning: Learning Control Policies from Corrupted Datasets
The paper introduces Weighted BC, which trains a binary discriminator on a small verified clean reference set to estimate trajectory-level density ratios, clips them as behavioral cloning weights, and evaluates the method under reward, state, transition, and action poisoning on continuous-control benchmarks.
#Robotics#Alignment#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper gives a concrete density-ratio weighting mechanism for four poisoning settings. HKR-H is weak, and the offline-control framing limits general AI-practitioner reach, so it stays in all.
editor take
Weighted BC estimates trajectory density ratios from a small clean set; the hard part is verifying that set, not clipping weights.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
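The weighting step in the Weighted BC card follows a standard pattern: a binary discriminator's output is converted to a density ratio and clipped before it scales the cloning loss. The sketch below assumes a balanced discriminator, so the ratio is p / (1 - p); the function names, the clip value, and the normalization by the weight sum are hypothetical choices, not details taken from the paper.

```python
import numpy as np


def density_ratio_weights(p_clean, clip_max=5.0, eps=1e-6):
    """Turn discriminator probabilities P(clean | trajectory) into weights."""
    p = np.clip(np.asarray(p_clean, dtype=float), eps, 1.0 - eps)
    ratio = p / (1.0 - p)  # density ratio under a balanced discriminator
    return np.clip(ratio, 0.0, clip_max)  # clipping bounds any one trajectory's influence


def weighted_bc_loss(per_trajectory_loss, weights):
    """Weighted average of per-trajectory behavioral cloning losses."""
    per_trajectory_loss = np.asarray(per_trajectory_loss, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * per_trajectory_loss) / np.sum(weights))


# Hypothetical discriminator outputs for three trajectories.
w = density_ratio_weights([0.5, 0.9, 0.01])
print(w, weighted_bc_loss([1.0, 1.0, 10.0], w))
```

Trajectories the discriminator considers unlikely to be clean receive near-zero weight, so a large loss on a suspected poisoned trajectory barely moves the objective.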
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Prune, Update and Trim: Robust Structured Pruning for Large Language Models
Putri proposes three post-training pruning changes for LLMs: updating unpruned FFN weights, pruning FFN layers sequentially, and removing individual attention heads instead of full attention layers. The paper says Putri supports Grouped-Query Attention, tests multiple models, sparsity ranges, and datasets, and releases code on GitHub.
#Inference-opt#Putri#Research release#Open source
why featured
HKR-K/R pass: structured pruning and GQA support matter to inference readers. HKR-H is weak, and the summary lacks accuracy, speed, or memory numbers, so it stays in the 60–71 research band.
editor take
Putri changes 3 post-training pruning steps, but omits extreme-sparsity numbers; I’d verify GQA head pruning before buying the SOTA claim.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Leveraging Error Diversity in Group Rollouts for Reinforcement Learning
The paper proposes EDAS, a post-hoc advantage shaping method for RLVR that scales penalties for incorrect rollouts by intra-group error diversity, and reports a 6.29-point average gain over DAPO on Qwen3-8B across seven math benchmarks.
#Reasoning#Fine-tuning#Benchmarking#Qwen
why featured
HKR-K is clear: EDAS reweights erroneous rollouts in RLVR and reports +6.29 over DAPO on seven Qwen3-8B math benchmarks. HKR-H and HKR-R are weak because the angle stays inside reasoning-training research.
editor take
EDAS beats DAPO by 6.29 points on Qwen3-8B across seven math sets; feeding error diversity into advantage is simple and testable.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Reducing Credit Assignment Variance via Counterfactual Reasoning Paths
The paper introduces IBPO, which samples multiple reasoning trajectories for the same input and uses trajectory differences as an implicit process-level advantage estimator to convert sparse terminal rewards into step-sensitive learning signals for math and code reasoning benchmarks.
#Reasoning#Code#Fine-tuning#Research release
why featured
HKR-K and HKR-R pass: IBPO offers a concrete multi-path process-advantage mechanism for reasoning-model post-training. No result numbers are disclosed, and the RL method angle keeps it below featured.
editor take
IBPO samples multiple same-prompt trajectories for counterfactual advantages; no gains disclosed, so I file it as RL credit-assignment repair.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence
UxSID models ultra-long user sequences with Semantic IDs and dual-level attention, capturing target-aware preferences without item-specific model cost; the abstract reports state-of-the-art performance and a 0.337% revenue lift in a large-scale advertising A/B test.
#Memory#Inference-opt#UxSID#Research release
why featured
HKR-K and HKR-R pass: the paper gives a concrete mechanism and online A/B revenue number. The recommender-ad focus and academic title keep it below the featured threshold.
editor take
UxSID reports a 0.337% ad revenue lift; honestly, SID-shared memory smells more production-ready than another long-attention stack.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Identifiable Token Correspondence for World Models
The paper models next-frame prediction as structured inference with latent token correspondence variables and reports state-of-the-art results on 4 benchmarks, including 72.5% return and 35.6% score on Craftax-classic versus prior best 67.4% and 27.9%.
#Reasoning#Vision#Benchmarking#Research release
why featured
HKR-K passes with a concrete mechanism and Craftax numbers. HKR-H/R are weak: the title is dry and the audience impact stays inside world-model research, so this fits the 60–71 research-signal band.
editor take
ITC reports SOTA on 4 benchmarks, with 72.5% Craftax return; explicit token correspondence beats pretending frames are just text.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Universal Pose Pretraining for Generalizable Vision-Language-Action Policies
Pose-VLA separates VLA training into pose pretraining and robot-specific action alignment, achieving a 79.5% average success rate on RoboTwin 2.0 and 96.0% on LIBERO, with real-world tests using 100 demonstrations per task.
#Vision#Robotics#Multimodal#Pose-VLA
why featured
HKR-K/R pass: Pose-VLA gives a concrete pose-pretraining plus action-alignment recipe with RoboTwin 2.0 and LIBERO numbers. HKR-H is weak, and the robotics-paper scope keeps it below featured.
editor take
Pose-VLA hits 79.5% on RoboTwin 2.0; pretraining 3D pose looks more robot-native than piling on VQA backbones.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Aligned Training: A Parameter-Free Method to Improve Feature Quality and Stability of Sparse Autoencoders (SAE)
The paper proposes aligned training, a parameter-free SAE reparameterization that constrains each encoder–decoder inner product to 1, reporting Pareto improvements on SAEBench across multiple models, dictionary sizes, and sparsity levels while reducing dead features and seed instability.
#Interpretability#Benchmarking#SAEBench#Research release
why featured
HKR-K/R pass on a concrete SAE training mechanism and stability concern; HKR-H is weak because the title is a niche method paper. This sits in 60–71 as a useful but technical research release.
editor take
Aligned training fixes each SAE encoder–decoder inner product at 1; I buy the geometric patch, though SAEBench gains need ablations.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
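A minimal sketch of the constraint described above, assuming the simplest possible reparameterization: each encoder row is rescaled so that its inner product with the matching decoder direction equals 1. The summary does not say how the paper enforces the constraint, so `align_encoder`, the row-wise layout, and the toy weights are all hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch: rescale each encoder row so that
# <encoder_i, decoder_i> == 1 for every latent i.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def align_encoder(encoder_rows, decoder_rows, eps=1e-12):
    """Return encoder rows rescaled so each pairs with its decoder row at inner product 1."""
    aligned = []
    for e, d in zip(encoder_rows, decoder_rows):
        ip = dot(e, d)
        if abs(ip) < eps:
            # Assumption: a (near-)orthogonal pair cannot be rescaled, so fail loudly.
            raise ValueError("encoder and decoder directions are nearly orthogonal")
        aligned.append([x / ip for x in e])
    return aligned

# Toy weights: two latents in a 2-dimensional activation space.
encoder = [[0.5, 1.0], [2.0, -1.0]]
decoder = [[1.0, 1.0], [0.5, 0.0]]
aligned = align_encoder(encoder, decoder)
```

Because the rescaling is a deterministic function of the existing weights, it adds no trainable parameters, which is consistent with the "parameter-free" description.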
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
A Production-Ready RL Framework for Personalized Utility Tuning with Pareto Sweeping in Pinterest Recommender Systems
Pinterest proposes PRL-PUTS, a ranker-independent one-step value-based RL framework that selects utility-weight vectors per request. Homefeed online experiments report a 0.13% increase in successful sessions versus baseline, while the framework runs parallel to ranking inference without added serving latency.
#Agent#Inference-opt#Pinterest#Research release
why featured
HKR-K passes with a concrete production mechanism and online A/B number. HKR-H/R are weak: the angle is technical and mainly relevant to recommender-ranking teams, with no hard-exclusion trigger.
editor take
Pinterest turns utility-weight tuning into one-step RL and gets +0.13% successful sessions; useful governance, not a recommender leap.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
How Few-Shot Examples Add Up: A Causal Decomposition of Function Vectors in In-Context Learning
The paper decomposes an n-shot function vector into a linear combination of example-level sub-FVs and separates Query-Key routing from Value updates to explain attention reweighting in few-shot in-context learning.
#Reasoning#Interpretability#Research release
why featured
HKR-H/K pass: the title has an additive-mechanism hook, and the post states a sub-FV linear combination plus QK/Value separation. No model results or practitioner impact, so it stays in 60–71.
editor take
The paper decomposes n-shot FVs into per-example sums; I buy it because Q-K routing beats Value updates as a testable mechanism.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Goal-Conditioned Supervised Learning for LLM Fine-Tuning
The paper proposes goal-conditioned supervised learning for offline LLM fine-tuning: feedback signals are treated as explicit goals, and the model is trained with ordinary supervised learning. It is evaluated on three tasks (non-toxic generation, code generation, and LLM-based recommendation), where it outperforms standard offline fine-tuning baselines while keeping supervised learning’s simpler data and deployment requirements.
#Fine-tuning#Alignment#Code#arXiv
why featured
HKR-K passes via the feedback-as-goal mechanism and three task settings; HKR-R passes on post-training cost/control. HKR-H is weak, and the post lacks gains, model scale, or code artifacts, so this stays in all.
editor take
GCSL beats offline baselines on 3 tasks; gains aren’t disclosed, but it’s a practical detour around DPO data costs.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Position: AI Evaluations Should Be Grounded on a Theory of Capability
arXiv:2509.19590v2 argues that generative model evaluations should be framed as inference tasks grounded in an explicit theory of capability, and it proposes an Evaluation Card to document capability definitions, modeling assumptions, and evaluation decisions.
#Benchmarking#arXiv#Commentary#Benchmark
why featured
HKR-K and HKR-R pass: the paper offers a concrete Evaluation Card mechanism and targets eval validity. HKR-H fails, and the piece is methodological rather than event-driven, so it stays below featured.
editor take
The paper frames evals as inference tasks, but omits experiment scale; I buy it—leaderboards owe us capability assumptions.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
WriteSAE: Sparse Autoencoders for Recurrent State
WriteSAE decomposes and edits matrix-cache writes in state-space and hybrid recurrent language models, and atom substitution beats matched-norm ablation on 92.4% of 4,851 firings at Qwen3.5-0.8B L9 H4.
#Interpretability#Qwen#Mamba-2#RWKV
why featured
HKR-K passes on a concrete mechanism and numbers; HKR-H and HKR-R are weak because the title is dry and the audience is mostly interpretability researchers. Useful research signal, not a featured industry event.
editor take
WriteSAE wins 92.4% on Qwen3.5-0.8B firings; interpretability for recurrent models has to leave residual-stream comfort.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Interactive Benchmarks
The paper proposes Interactive Benchmarks to evaluate reasoning through budgeted multi-turn interaction; experiments cover two settings, Interactive Proofs and Interactive Games, with tasks including Logic, UI2Html, Mathematics, and long-horizon utility maximization.
#Reasoning#Benchmarking#Agent#Research release
why featured
A single arXiv benchmark paper with a clear evaluation mechanism but no disclosed model results, code, or adoption signal; HKR-K/R pass, HKR-H is weak, so it fits the 60–71 research-signal band.
editor take
Interactive Benchmarks test reasoning via budgeted multi-turn interaction; I buy the direction as static leaderboards rot under contamination.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
DPrivBench: Benchmarking Large Language Models' Differential Privacy Reasoning
The paper introduces DPrivBench, where each instance asks whether a function or algorithm satisfies a stated differential-privacy guarantee under specified assumptions; experiments show the strongest models handle textbook mechanisms, but all tested models struggle with advanced algorithms.
#Reasoning#Benchmarking#DPrivBench#Research release
why featured
HKR-K passes via a new benchmark and a concrete failure claim. The DP-algorithm focus is specialist and narrow for AI practitioners, so this stays in all.
editor take
DPrivBench tests per-case DP guarantees; models pass textbook mechanisms and fail advanced algorithms, so don't outsource privacy audits to general reasoning.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
HPC-LLM: Practical Domain Adaptation and Retrieval-Augmented Generation for HPC Support
HPC-LLM combines RAG, QLoRA fine-tuning, and local inference to support Slurm, MPI, GPU use, filesystem management, and cluster troubleshooting, using about 9,000 to 24,000 HPC-focused examples to adapt Llama 3.1 8B on JetStream2.
#RAG#Fine-tuning#Inference-opt#HPC-LLM
why featured
HKR-K/R pass: sample counts, Llama 3.1 8B, RAG+QLoRA, and local inference add usable detail. The HPC support niche limits reach, so it stays in the 60–71 band.
editor take
HPC-LLM tunes Llama 3.1 8B on 9k–24k samples; narrow RAG beats asking a general model to bluff Slurm.
HKR breakdown
hook knowledge resonance
open source
65
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
A No-Defense Defense Against Gradient-Based Adversarial Attacks on ML-NIDS: Is Less More?
The paper tests ML-NIDS robustness in about 2,200 experiments and finds that shallower networks, reduced feature sets, and ReLU jointly reduce vulnerability under FGSM, PGD, and BIM gradient-based attacks.
#Safety#Benchmarking#Research release#Safety/alignment
why featured
HKR-H and HKR-K pass: the title has a counterintuitive hook, and the post gives ~2,200 experiments with named attacks. HKR-R is weak because ML-NIDS robustness is narrow for the broader AI-practitioner audience.
editor take
About 2,200 runs favor shallow, low-dimensional ReLU NIDS against FGSM/PGD/BIM; useful, but dataset transfer is the trap.
HKR breakdown
hook knowledge resonance
open source
65
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck
The paper recasts CoT budget forcing as conditional information bottleneck optimization and identifies a Markov-property gap in naive information bottleneck use with transformer attention. It proposes a reinforcement learning objective that maximizes task reward while compressing reasoning traces under a prior, using token-level surprisal as semantic cost with negligible training-loop overhead.
#Reasoning#Inference-opt#Research release
why featured
HKR-K and HKR-R pass: the paper reframes CoT budget control with a conditional information bottleneck and token-surprisal pricing. It stays theory-heavy, with no disclosed empirical numbers or usable artifact, so it sits in 60–71.
editor take
CIB prices CoT by token surprisal; I buy the theory patch, but cross-model gains lack numbers here.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Ranking-Aware Calibration for Reliable Multimodal Reinforcement Learning
The paper introduces Ranking-Aware Calibration, a training-time framework that adds a ranking-aware group loss and a clean-corrupted pairwise loss to group-based RL, then evaluates Qwen2.5-VL and InternVL-3.5 on six multimodal reasoning benchmarks under clean and corrupted inputs.
#Multimodal#Vision#Alignment#Qwen
why featured
HKR-K and HKR-R pass: the method, models, and 6 benchmarks are concrete. HKR-H is weak, and the post gives no gain size or reproducibility details, so it stays mid-low research signal.
editor take
RAC tests six multimodal benchmarks with no new labels; useful trick, but “majority accuracy gains” needs effect sizes.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
FishBack: Pullback Fisher Geometry for Optimal Activation Steering in Transformers
FishBack replaces the Euclidean assumption in activation steering with a pullback Fisher metric; on GPT-2, the induced geometry deviates from Euclidean by over 97% in relative spectral norm, and its effective dimensionality is only 2–17% of the ambient space.
#Interpretability#Alignment#Reasoning#GPT-2
why featured
HKR-K and HKR-R pass: the paper gives testable GPT-2 geometry numbers and questions a common activation-steering assumption. HKR-H fails, and the math-heavy framing plus GPT-2 scope keep it in all.
editor take
FishBack shows 97% metric deviation on GPT-2; sharp result, but three verb-morphology concepts are too thin for alignment claims.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training
The paper proposes TabGRAA, a generate-score-align post-training method for tabular language models, and reports that across five mixed-type benchmarks it outperforms additional supervised fine-tuning and achieves a stronger average fidelity-utility trade-off than adapted DPO, KTO, and NPO while keeping empirical privacy diagnostics near the supervised baseline.
#Fine-tuning#Alignment#Benchmarking#TabGRAA
why featured
HKR-H and HKR-K pass: the paper provides a named method, a concrete training loop, and results on 5 benchmarks. HKR-R is weak because the topic is narrow and lacks product impact or a production-replacement claim.
editor take
TabGRAA beats extra SFT on five mixed-type table benchmarks; tabular generation is borrowing RLHF, but privacy rests on diagnostics.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
CoUn: Empowering Machine Unlearning via Contrastive Learning
CoUn adjusts retained-data representations with contrastive and supervised learning, training only on retain data; the arXiv abstract says it outperforms state-of-the-art machine unlearning baselines across multiple datasets and model architectures.
#Fine-tuning#Alignment#Benchmarking#CoUn
why featured
HKR-K passes for a testable retain-data-only unlearning mechanism; HKR-R is moderate via deletion compliance and safety. HKR-H fails because the title reads like a routine arXiv paper, so this stays in the 60–71 band.
editor take
CoUn trains only on retain data; I buy that constraint—MU touching forget data still smells like cheating.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
SignMuon: Communication-Efficient Distributed Muon Optimization
Sign-Muon compresses Muon-style polar directions into 1-bit signs and aggregates them by majority vote, requiring one integer sum-allreduce per iteration and reducing bandwidth by 32× versus float32.
#Fine-tuning#Inference-opt#Benchmarking#Sign-Muon
why featured
HKR-H/K/R pass, but this is a specialized distributed-optimization paper. The post gives a 32× bandwidth claim and mechanism, but no real training-cost or convergence comparison, so it stays in 60–71.
editor take
Sign-Muon needs one integer allreduce and cuts float32 bandwidth 32×; I buy the comms story, not CIFAR-10 as LLM evidence.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R1
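The sign-and-vote step in the summary can be sketched as follows. This is a single-process illustration only: a plain Python sum stands in for the integer sum-allreduce, the Muon polar directions are taken as given inputs rather than computed, and mapping ties to 0 is an assumption made here, not something the summary states. The names `to_signs` and `majority_vote` are hypothetical.

```python
# Hypothetical sketch of 1-bit sign compression with majority-vote aggregation.

def to_signs(direction):
    """Compress a float update direction to +1/-1 per coordinate (1 bit each)."""
    return [1 if x >= 0 else -1 for x in direction]

def majority_vote(worker_signs):
    """Sum integer signs across workers, then take the sign of each total.

    The per-coordinate sum is what a single integer sum-allreduce would produce.
    Ties map to 0 here (an assumption for this sketch).
    """
    totals = [sum(column) for column in zip(*worker_signs)]
    return [1 if t > 0 else (-1 if t < 0 else 0) for t in totals]

# Three workers, each holding a 3-coordinate update direction.
worker_directions = [
    [0.5, -0.2, 0.1],
    [0.3, 0.4, -0.9],
    [-0.1, -0.7, -0.3],
]
voted = majority_vote([to_signs(d) for d in worker_directions])

# Bandwidth ratio of float32 coordinates versus 1-bit signs.
compression_ratio = 32 // 1
```

The 32× figure in the summary follows directly from sending 1 bit instead of a 32-bit float per coordinate; it says nothing about convergence, which is the gap the "why featured" note points at.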
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Learning What Evaluators Value: A Reliable Approach to Modeling Evaluator Preferences
The paper proposes an evaluator-preference learning algorithm that assumes only coordinate-wise non-decreasing preference functions. It theoretically characterizes mismatch under common assumptions, proves the algorithm can learn any preference function without losing performance under linearity, and evaluates it on synthetic simulations and real-world data for LLM and human preferences.
#Alignment#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper offers a monotone preference assumption with several validations, tied to eval/alignment reliability. HKR-H fails; no benchmark numbers, open artifact, or production impact are disclosed.
editor take
The paper assumes only coordinate-wise monotonic preferences; I buy it—linear LLM-as-judge scoring keeps asking for trouble.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Perceptual implications of automatic anonymization in pathological speech
The study evaluated original and automatically anonymized recordings from 180 German speakers with 10 listeners, finding 91% zero-shot and 93% few-shot anonymization detection accuracy, a 30-point quality drop on a 0–100 scale, and preserved clinical severity ratings for Dysarthria, Dysglossia, and Dysphonia with kappa 0.87–0.94.
#Audio#Safety#Benchmarking#Research release
why featured
HKR-H/K/R pass, but the work is narrow pathological-speech anonymization rather than a mainstream model, product, or developer workflow story. Concrete experiment numbers keep it in all, not featured.
editor take
Ten listeners detected anonymized speech at 91% zero-shot; privacy metrics alone do not license clinical speech release.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
When Marginals Match but Structure Fails: Covariance Fidelity in Generative Models
The paper proposes D_Sigma = ||Sigma_P - Sigma_Q||_F, the Frobenius norm of the difference between two covariance matrices, to evaluate covariance-level structure in synthetic data, and validates it on Fashion-MNIST with 60,000 samples, TCGA-BRCA with 1,111 samples, and an Alzheimer’s gene-expression stress test with 113 samples.
#Benchmarking#arXiv#Fashion-MNIST#TCGA-BRCA
why featured
This is a modest generative-model evaluation paper: HKR-H comes from the title’s mismatch hook, and HKR-K from a concrete metric plus three datasets. With no product, tool release, or industry conflict, it stays in the 60–71 band.
editor take
D_Sigma tests covariance fidelity across 60,000 images and 113 gene samples; it attacks the false comfort of marginal-only evals.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
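A small worked example of the D_Sigma formula quoted in the summary, in pure Python. It uses the unbiased (n-1) sample covariance, which is an assumption; the paper may use a different estimator. The toy data is invented for illustration: both columns take the same values in each dataset, so the per-column marginals match exactly, while the correlation flips sign.

```python
import math

def covariance(rows):
    """Unbiased sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]

def d_sigma(real, synthetic):
    """Frobenius norm of the difference between the two covariance matrices."""
    sigma_p, sigma_q = covariance(real), covariance(synthetic)
    return math.sqrt(sum(
        (a - b) ** 2
        for row_p, row_q in zip(sigma_p, sigma_q)
        for a, b in zip(row_p, row_q)
    ))

# Same marginals per column, opposite dependence structure.
real = [[0, 0], [1, 1], [2, 2], [3, 3]]        # perfectly correlated
synthetic = [[0, 3], [1, 2], [2, 1], [3, 0]]   # perfectly anti-correlated
gap = d_sigma(real, synthetic)
```

Here the variances agree and only the off-diagonal covariance differs (+5/3 versus -5/3), so `gap` equals (10/3)·√2, about 4.71, even though a marginal-only check would report a perfect match.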
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models
UB-SMoE modifies heterogeneous federated fine-tuning with Dynamic Modulated Routing and Universal Pseudo-Gradient, reducing compute by up to 45.0% on low-resource clients and improving their performance by 8.7x over heterogeneous LoRA-rank methods.
#Fine-tuning#Inference-opt#UB-SMoE#Research release
why featured
HKR-K and HKR-R pass: the paper gives concrete compute and performance numbers tied to low-resource fine-tuning cost. HKR-H fails because the acronym-heavy title has no broad product or open-source hook.
editor take
UB-SMoE cuts low-resource client compute 45.0%; the 8.7x gain sounds strong, but model scale and benchmarks stay thin.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Agentic Cost-Aware Query Planning with Knowledge Distillation for Big Data Analytics
The paper presents an agentic query planning system that combines a rule-based teacher planner, UCB1 bandit search, cost prediction, and distillation, reducing latency by 23% versus default planners on NYC Taxi and IMDB while maintaining 94% constraint satisfaction.
#Agent#Inference-opt#Research release#Open source
why featured
HKR-K is strong on numbers and datasets, and HKR-R touches cost/latency pain in analytics. The work remains an academic query-planning paper without product traction, so it sits in the 60–71 band.
editor take
This planner cuts latency 23% on two datasets; honestly, the 15x student inference gain beats the agentic label.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
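For readers unfamiliar with the UCB1 component named in the summary, here is a textbook UCB1 loop over candidate plans. How the paper wires the bandit into its planner, teacher, and cost predictor is not disclosed in the summary, so the arm rewards below (a made-up score in [0, 1], higher meaning lower latency) and the function name `ucb1` are purely illustrative assumptions.

```python
import math

def ucb1(reward_fn, n_arms, rounds):
    """Standard UCB1: try every arm once, then pick the arm maximizing
    mean reward + sqrt(2 * ln(t) / pulls). Returns pull counts per arm."""
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        counts[arm] += 1
        totals[arm] += reward_fn(arm)
    return counts

# Hypothetical per-plan rewards; deterministic here to keep the example reproducible.
plan_rewards = [0.2, 0.8, 0.5]
pulls = ucb1(lambda arm: plan_rewards[arm], n_arms=3, rounds=500)
best_plan = pulls.index(max(pulls))
```

The exploration bonus shrinks as a plan is tried more often, so the search concentrates on the best-scoring plan while still occasionally revisiting the others.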
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Multi-Dimensional Behavioral Evaluation of Agentic Stock Prediction Systems Using LLM Judges
The paper proposes an evaluation framework for agentic stock prediction systems, scoring five-day behavioral traces across six dimensions with three LLM judges and reducing one-day MAPE from 0.61% to 0.54% after three fine-tuning cycles on the 2017–2025 held-out test period.
#Agent#Reasoning#Fine-tuning#Research release
why featured
HKR-H/K pass: stock-prediction agents create a hook, and the paper gives testable numbers. As a single arXiv method paper with a small MAPE gain and weak HKR-R, it stays in 60–71.
editor take
Three LLM judges score six process dimensions; MAPE drops 0.07 points. I buy the diagnostics, not trading alpha.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Researchers Propose Egalitarian Gradient Descent to Accelerate Grokking
The paper proposes Egalitarian Gradient Descent, which normalizes gradient dynamics to the same speed across principal directions, and reports that it removes grokking plateaus in classical arithmetic tasks including modular addition and sparse parity.
#Reasoning#Fine-tuning#Benchmarking#Research release
why featured
HKR-H/K pass: EGD equalizes principal gradient-direction speeds and removes grokking plateaus on modular addition and sparse parity. HKR-R is weak because no large-model or production-training impact is shown.
editor take
EGD removes plateaus on modular addition and sparse parity; I want to see what survives beyond toy grokking tasks.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
FlightSense: End-to-End MLOps Platform for Real-Time Flight Delay Prediction
FlightSense trains an XGBoost classifier on 7.07 million BTS 2018 records, raising AUC from 0.732 to 0.875 after adding 11 aircraft rotation-chain delay propagation features.
#Agent#Tools#FlightSense#AWS
why featured
HKR-K passes on dataset size, feature mechanism, and AUC lift, making it a useful applied ML/MLOps case. HKR-H and HKR-R are weak; one arXiv vertical use case stays below featured.
editor take
FlightSense gets AUC to 0.875 with 11 rotation-chain features; weather adds 0.004, so don't let Bedrock steal credit.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Could Large Language Models Work as Post-hoc Explainability Tools in Credit Risk Models?
The study evaluates GPT-4-turbo, Claude-Sonnet-4.5, and Gemini-2.5-Flash on a LendingClub dataset, finding that controlled prompts reproduce SHAP and coefficient-based feature rankings while autonomous explanations show limited alignment.
#Interpretability#Reasoning#OpenAI#Anthropic
why featured
HKR-K is clear: named models, LendingClub, and SHAP-alignment results. HKR-R is moderate for regulated AI explainability, but HKR-H is weak and there is no product or cross-source signal, so it stays in 60–71.
editor take
Three models on LendingClub mostly echo SHAP rankings; I don’t buy LLMs as autonomous credit explainers.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
DACA-GRPO: Denoising-Aware Credit Assignment for Reinforcement Learning in Diffusion Language Models
DACA-GRPO adds Denoising Progress Scores and Stratified Masking Likelihood to diffusion language model RL, improving three GRPO-style base methods across seven benchmarks, with reported gains up to 5.6pp in math reasoning, 7.4pp in code generation, 36.3pp in constraint satisfaction, and 5.9pp in JSON schema adherence.
#Reasoning#Code#Fine-tuning#Research release
why featured
HKR-K passes with concrete mechanisms, 7 benchmarks, and a +36.3pp gain. HKR-H/R are weak because diffusion-LM RL is still a niche research topic, so this stays in all.
editor take
DACA-GRPO reports up to 36.3pp on 7 benchmarks; diffusion LLM RL is still paying for sloppy denoising credit.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Adaptive Generate-Rank-Verify: Inference-Time Search with Costly Verification
The paper proposes ADAP, a shellwise adaptive generate-rank-verify algorithm that samples and verifies candidates when the score distribution and success function are unknown; under a monotonicity assumption, its expected cost stays within a constant factor of the distribution-aware optimal policy.
#Reasoning#Code#Inference-opt#Research release
why featured
HKR-K/R pass, but the item only provides an arXiv-level mechanism and theory guarantee, with no tasks, models, or cost numbers. It fits all, below the featured bar.
editor take
ADAP gives constant-factor cost under unknown distributions; I’d stress-test the monotonicity assumption, since hidden tests often punish reward scores.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Improving MLLM Training Efficiency via Stage-Aware Sparsity
The paper proposes a Sparse Training Scheme (STS) for MLLM training, using visual token compression during modality alignment and dynamic layer skipping during instruction tuning; the abstract does not disclose speedup ratios, compute savings, or benchmark scores.
#Multimodal#Vision#Inference-opt#Research release
why featured
HKR-K passes on a concrete sparsity mechanism and HKR-R on MLLM training cost. HKR-H is weak, and no speedup or benchmark numbers are disclosed, so this stays in the all band.
editor take
STS compresses visual tokens and skips layers by stage, but reports no speedup; without FLOPs accounting, I don't buy it yet.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
CarbonScaling: Extending Neural Scaling Laws for Carbon Footprint in Large Language Models
The paper introduces CarbonScaling, a hardware-aware analytical framework for estimating emissions from frontier LLM training, jointly modeling tensor, pipeline, data, and expert parallelism, with source code released on GitHub.
#Benchmarking#UnchartedRLab#Research release#Open source
why featured
HKR-K/R pass via a concrete framework and 4 parallelism strategies, plus cost/carbon-audit relevance. HKR-H is weak, and a single arXiv paper without headline emission numbers stays in the 60–71 band.
editor take
CarbonScaling models 4 parallelism modes and embodied carbon; stronger than regression carbon math, but fidelity gains stay undisclosed.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Locally Coherent Parallel Decoding in Diffusion Language Models
CoDiLA delegates local decoding to a 0.6B auxiliary autoregressive model over diffusion latents, preserving parallel generation and bidirectional block modeling while reducing syntactic inconsistency and broken multi-token structures in code generation benchmarks.
#Code#Inference-opt#Reasoning#CoDiLA
why featured
HKR-K and HKR-R pass: the 0.6B auxiliary AR mechanism is concrete and code-structure consistency matters to practitioners. HKR-H is weak, and no performance numbers are disclosed, so this stays in the 60–71 band.
editor take
CoDiLA uses a 0.6B AR helper for DLM parallel decoding; I buy it, since code latency dies on block-local syntax debt.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Minimal-Intervention KV Retention via Set-Conditioned Diversity
The paper tests seven KV-cache compression mechanisms on MATH-500 using Qwen-7B and Llama-8B DeepSeek-R1-Distill variants at budgets 64 and 128, rejects all seven, then reports an α scoring change to TriAttention that passes Bonferroni in two of four model-budget cells with λ=0.5.
#Reasoning#Inference-opt#Benchmarking#Qwen
why featured
HKR-K/R pass because the post names concrete KV-cache compression tests and budgets; HKR-H fails. The topic is useful for inference engineers but narrow, and no effect size is disclosed.
editor take
Seven KV-compression ideas fail; α passes Bonferroni in 2/4 cells. I buy the protocol, not a universal win.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
DashAttention replaces top-k KV-block selection with adaptive sparse α-entmax, keeps the sparse and dense hierarchy differentiable, reports near full-attention accuracy at 75% sparsity, and provides a Triton implementation; the abstract claims an inference speedup over FlashAttention-3, but the snippet does not disclose the exact multiplier.
#Inference-opt#Reasoning#DashAttention#FlashAttention-3
why featured
HKR-K passes with α-entmax KV-block selection, 75% sparsity, and a Triton artifact. HKR-H is weak, and no FlashAttention-3 speedup is disclosed, so this stays an interesting systems paper, not featured.
editor take
DashAttention keeps near full attention at 75% sparsity; the FlashAttention-3 speedup number is missing, so Triton repro decides this.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Long Context Modeling with Ranked Memory-Augmented Retrieval
The paper introduces ERMAR, a ranked memory-augmented retrieval framework that scores relevance and applies pointwise reranking to key-value embeddings; the abstract claims state-of-the-art results on standard benchmarks, but the snippet does not disclose benchmark names or scores.
#RAG#Memory#Benchmarking#Research release
why featured
HKR-K/R pass: ERMAR gives a concrete memory-reranking mechanism tied to long-context engineering pain. HKR-H is weak, and the post lacks exact SOTA scores, model scale, and reproducible conditions, so it stays in all.
editor take
ERMAR ranks memory with relevance scoring and pointwise reranking; no benchmark names or scores, so I don’t buy the SOTA claim yet.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Seeking the Unfamiliar but Memorable: Conceptual Creativity as Meta-Learning
The paper proposes a Creator-Appraiser framework where a Creator generates candidates, an Appraiser adapts for a few inner-loop steps, and the Appraiser’s improvement rewards a frozen diffusion Creator, tested with an autoencoder on MNIST and a CLIP Appraiser with a low-rank adapter on natural images.
#Fine-tuning#Multimodal#Reasoning#arXiv
why featured
HKR-H and HKR-K pass: the angle is novel and the post gives a testable Creator-Appraiser mechanism. With no product impact, benchmark result, or major-lab release, it stays in the 60–71 research band.
editor take
Creator-Appraiser rewards frozen diffusion via few-step appraiser gains; I buy the objective, not the MNIST-to-natural-image leap.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Cost-aware Duration Prediction for Software Upgrades in Datacenters
The paper introduces Acela for datacenter software-upgrade duration prediction. On Meta production systems, it improves upgrade-window utilization by 1.25x and increases completed upgrades by 41%.
#Benchmarking#Meta#Research release
why featured
HKR-K and HKR-R pass: Meta production metrics of 1.25x window utilization and 41% more upgrades are useful. HKR-H is weak, and the datacenter-ops scope keeps it in all.
editor take
Acela lifts completed Meta upgrades by 41%; I buy it because it optimizes misprediction cost, not another predictor flex.
HKR breakdown
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Language Game: Talking to Non-Human Systems
The paper proposes Language Game, freezing a system’s internal dynamics as the nonlinear core of a reinforcement-learning policy and training only linear input and output interfaces, then testing the framework on gene regulatory networks and reinforcement-learning tasks.
#Agent#Reasoning#Research release
why featured
HKR-H and HKR-K pass: the title has a novel non-human-systems hook, and the summary gives the frozen-dynamics plus linear-interface mechanism. No metrics or reproducible details are disclosed, and HKR-R is weak, so it stays in all.
editor take
Language Game trains only linear interfaces over frozen dynamics; I like the setup, but “fluent dialogue” lacks reproducible numbers here.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
TabKDE: Simple and Scalable Tabular Data Generation with Kernel Density Estimates
TabKDE generates tabular rows using copula transformations and kernel density estimates, aiming to match prior methods on accuracy and leakage avoidance; the paper says it runs on datasets orders of magnitude larger than prior state of the art on a laptop, with code released on GitHub.
#Fine-tuning#Benchmarking#TabKDE#arXiv
why featured
HKR-H/K pass: the simple KDE angle, copula mechanism, and laptop-scale claim add signal. It remains a single arXiv method paper with no adoption, product impact, or cross-source cluster, so it sits in 60–71.
editor take
TabKDE claims orders-larger tabular generation on a laptop; I like the direction, but accuracy, leakage, and memory numbers aren’t disclosed.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift
The paper introduces SeqRejectron for selective imitation under arbitrary dynamics shift, using labeled training demonstrations and unlabeled test trajectories to learn a stopping rule; for deterministic policies, it gives horizon-free Õ(log|Π|/ε²) sample complexity under sparse costs.
#Agent#Reasoning#SeqRejectron#Research release
why featured
HKR-H/K/R pass, but this is a theory-heavy imitation-learning paper with an algorithm and sample-complexity claim, not code, real-task evidence, or product impact; keep it in all below featured.
editor take
SeqRejectron gives Õ(log|Π|/ε²) samples; I buy the stop option—deployed agents need refusal more than bravado.
HKR breakdown
64
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
CATA: Continual Machine Unlearning via Conflict-Averse Task Arithmetic
The paper proposes CATA for continual machine unlearning in VLMs, representing each removal request as an unlearning task vector and using historical vectors with sign-aware conflict-averse aggregation under single-shot and continual experimental settings.
#Multimodal#Vision#Research release
why featured
HKR-K and HKR-R pass: CATA offers a concrete continual-unlearning mechanism for VLMs, but no metrics, benchmark results, or artifact are disclosed here; it stays in the 60–71 band.
editor take
CATA turns VLM deletion requests into task vectors; no benchmark numbers disclosed, so the “first attempt” claim stays provisional.
HKR breakdown
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
When Bits Break Recourse: Counterfactual-Faithful Quantization
The paper introduces CFQ, which trains quantizer parameters and mixed-precision bit allocation under a global bit budget, using Validity Drop and Counterfactual Recourse Gap to measure quantization-induced recourse failures on Adult, German Credit, and COMPAS.
#Inference-opt#Alignment#Benchmarking#Research release
why featured
HKR-H/K/R pass, but this is a single arXiv methods paper on tabular recourse benchmarks. It gives a useful deployment-risk claim, not a product or foundation-model capability update.
editor take
CFQ tests recourse failure on 3 datasets; VD/CRG numbers are missing, but low-bit fairness debt is the point.
HKR breakdown
64
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Tailored Agentic Reasoning for Few-Shot Multimodal Time Series Classification with VLMs
The paper proposes MarsTSC, a three-role agentic reasoning framework with a self-evolving knowledge bank, and evaluates few-shot multimodal time series classification across 12 time-series benchmarks and 6 VLM backbones.
#Agent#Reasoning#Multimodal#Research release
why featured
HKR-K is clear: 12 benchmarks, 6 VLMs, and a three-agent mechanism. HKR-H passes on the VLM-for-time-series angle, but the niche arXiv method lacks broad product or industry impact, so it stays in all.
editor take
MarsTSC tests 12 benchmarks and 6 VLMs; smells like test-time memory for time series, but gains aren’t disclosed here.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization
DyGRO-VLA introduces a two-stage optimization framework that uses information-theoretic latent representations and a mixture-of-RL-residuals to improve cross-task VLA training, with evaluations on LIBERO, RoboTwin2, and real-world settings under multi-task training and distribution shift.
#Robotics#Multimodal#Fine-tuning#DyGRO-VLA
why featured
HKR-K is clear: the paper names concrete mechanisms and three validation settings. HKR-R is limited to robotics/VLA specialists, and no result numbers are disclosed, so it stays in the interesting-but-not-featured band.
editor take
DyGRO-VLA reports 2-stage training and 3 eval settings; no gains disclosed, so I don’t buy the cross-task generalization story yet.
HKR breakdown
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Counterfactual Explanations Under Concept Drift
The paper proposes a model-agnostic CFE maintenance scheme that uses local sampling to repair explanations under online model concept drift; experiments on synthetic drifting streams show initial CFEs rapidly lose validity, while maintained CFEs preserve validity and local plausibility at lower cost than repeated regeneration.
#Interpretability#Research release
why featured
HKR-K and weak HKR-R pass: the paper gives a local-sampling mechanism for maintaining CFEs under drift and tests cost against regeneration. The academic framing, no major-lab hook, and no real production data keep it in all.
editor take
CFEs fail fast on synthetic drifting streams; this paper frames explanations as maintenance debt. The setup is narrow, but the cut is clean.
HKR breakdown
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Scale Determines Whether Language Models Organize Representation Geometry for Prediction
The paper introduces Subspace PGA to test whether layer distance geometry aligns with the unembedding readout subspace, and evaluates seven Pythia models from 70M to 6.9B plus three cross-family models, finding intermediate-layer predictive alignment with peak z-scores of 9–24.
#Interpretability#Benchmarking#Pythia#Research release
why featured
HKR-K passes with a new method, model set, and z-scores. HKR-H/R are weak because this is narrow interpretability research without a product hook or safety incident, so it sits in the 60–71 band.
editor take
Subspace PGA tests 10 models, peak z=9–24; I buy the angle: loss hides late-layer geometry drift.
HKR breakdown
64
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
LLM-Guided Communication for Cooperative Multi-Agent Reinforcement Learning
The paper introduces LMAC, an LLM-driven protocol design method for cooperative multi-agent reinforcement learning that iteratively optimizes communication with an explicit state-awareness criterion; experiments span multiple MARL benchmarks and report better state reconstruction and performance than prior baselines, but the snippet does not disclose exact gains.
#Agent#Reasoning#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the LLM-designed communication angle is novel and the LMAC mechanism is specific. No benchmark gains are disclosed, and MARL is narrow for general AI practitioners, so this stays in the 60–71 band.
editor take
LMAC uses an LLM to iteratively design MARL communication protocols; no gain numbers disclosed, so I’d treat it as protocol search.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Ready from Day 1: Population-Aware Coordination for Large-Scale Constrained Multi-Agent Systems
The paper proposes population-aware coordination interfaces that condition learned primal and dual maps on compact population summaries, cutting forecast error by 16–19% and capacity violations by 20–51% against population-unaware baselines in a supply-chain capacity-control case study.
#Agent#Tools#arXiv#Research release
why featured
HKR-K and HKR-R pass: the paper gives a concrete coordination mechanism and supply-chain numbers. HKR-H is weak, and the technical framing keeps it in the 60–71 band.
editor take
Population summaries let 20K agents coordinate 500K; I buy the direction—constrained agent systems need backtestable interfaces.
HKR breakdown
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
KIT-TIP-NLP at MultiPride: Continual Learning with Multilingual Foundation Model
KIT-TIP-NLP presents a multi-stage framework for detecting LGBTQ+-related reclaimed slurs in English, Spanish, and Italian tweets, evaluates eight multilingual embedding models, selects XLM-RoBERTa by macro-F1, and uses GPT-4o-mini back-translation to triple the training corpus while preserving class ratios.
#Embedding#Fine-tuning#Benchmarking#KIT-TIP-NLP
why featured
HKR-K and HKR-R pass: the paper gives reproducible details around 8 models and 3x back-translated data, and it maps to moderation safety. HKR-H is weak, so it stays in all rather than featured.
editor take
KIT-TIP-NLP triples data with GPT-4o-mini back-translation; I trust the 2–5% threshold gain more than foundation-model theater.
HKR breakdown
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
A Structural Threshold in Decision Capacity Governs Collapse in Self-Play Reinforcement Learning
The paper tests self-play reinforcement learning across poker variants, matrix games, a dice game, and multiple algorithms, finding that removing all positive-reach contingent decisions drives rapid convergence to a deterministic exploitation attractor at near-maximal loss.
#Agent#Benchmarking#Research release#Benchmark
why featured
HKR-H/K pass: the title has a collapse hook, and the summary gives a testable mechanism across poker, matrix games, and dice. No code, scale, or product/agent deployment impact is disclosed, so it stays in the lower research band.
editor take
The paper tests poker, matrix games, and dice; delete all positive-reach contingent decisions and self-play collapses. Clean zero-threshold probe for self-play safety.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
FediLoRA: Practical Federated Fine-Tuning of Foundation Models Under Missing-Modality Constraints
FediLoRA proposes a lightweight federated LoRA aggregation framework for VLLMs that handles two conditions together: imbalanced LoRA ranks across institutions and missing modalities from user errors or device failures, and the authors released code on GitHub.
#Fine-tuning#Multimodal#FediLoRA#Research release
why featured
HKR-K passes with a concrete mechanism and open-source code. HKR-H/R are weak: the title is academic, and the audience impact is mostly limited to federated multimodal fine-tuning researchers.
editor take
FediLoRA handles rank imbalance and missing modalities; no gains are disclosed, so I’d file it as a federated VLLM engineering patch.
HKR breakdown
64
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Unveiling Memorization-Generalization Coexistence: A Case Study on Arithmetic Tasks with Label Noise
The paper studies two-layer neural networks on modular arithmetic tasks with heavy label noise and finds that frequency-based extraction recovers internal generalization structure, achieving near-perfect test accuracy even with 80% label noise.
#Interpretability#Benchmarking#Research release
why featured
HKR-H/K pass: 80% noisy labels still allow structure extraction and near-perfect test accuracy. HKR-R fails because modular arithmetic is a toy setting with no product or engineering path.
editor take
Two-layer nets hide near-perfect modular arithmetic structure at 80% label noise; I want proof frequency extraction leaves toy tasks.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Prior Knowledge Makes It Possible: From Sublinear Graph Algorithms to LLM Test-Time Methods
The paper models multi-step reasoning as s-t connectivity on a knowledge graph; when the prior graph over n vertices is split into small components, augmentation needs Ω(√n) oracle queries, while after correct knowledge density crosses a giant-component threshold, paths can be found with an expected constant number of queries.
#RAG#Reasoning#Tools#Research release
why featured
HKR-K is strong because the paper gives a concrete query-complexity threshold; HKR-H/R come from the test-time cost angle. The graph-theory barrier and lack of an artifact keep it in all, not featured.
editor take
The paper shows an Ω(√n)-to-constant query phase change; I buy the abstraction, not RAG latency claims from it.
HKR breakdown
64
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Graph Hierarchical Recurrence for Long-Range Generalization
The paper introduces Graph Hierarchical Recurrence, which runs jointly on the input graph and a pooled hierarchical abstraction, and reports stronger long-range benchmark results than existing graph models while using as little as 1% of current state-of-the-art parameters.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-H and HKR-K pass on the 1% parameter claim and named hierarchy-recurrence mechanism, but HKR-R is weak: this is a niche graph-learning benchmark paper without product or market impact.
editor take
GHR claims long-range graph wins at 1% parameters; I like the bet, but no task table is disclosed here.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
RAP: Runtime Adaptive Pruning for LLM Inference
The paper proposes RAP, an RL-driven pruning framework for LLM inference that adapts compression to runtime memory budgets and tracks the ratio between model parameters and KV-cache; the RSS snippet does not disclose specific compression rates, latency gains, or benchmark numbers.
#Inference-opt#Research release
why featured
HKR-K and HKR-R pass: RAP targets inference memory/cost with an RL pruning mechanism. HKR-H is weak, and the post lacks compression, latency, or quality-loss numbers, so it stays in the mid-interest band.
editor take
RAP prunes by live memory budget with RL, but RSS gives no compression or latency numbers; I don’t buy the SOTA claim yet.
HKR breakdown
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
PH-Dreamer: Physics-Driven World Model Using Port-Hamiltonian Mechanisms
PH-Dreamer embeds a Port-Hamiltonian mechanism into recurrent state-space world models for visual control benchmarks, reducing latent phase-space volume by 4.18–8.41%, energy consumption by up to 7.80%, and mean squared jerk by up to 9.38% while aligning imagined and real rewards with lower variance.
#Robotics#Reasoning#Benchmarking#PH-Dreamer
why featured
HKR-K lands with a named mechanism and three benchmark deltas; HKR-R is limited to robotics/control. The technical title weakens HKR-H, so this stays in the 60–71 research-paper band without a hard exclusion.
editor take
PH-Dreamer cuts latent phase volume 4.18–8.41%; I care whether it survives contact-heavy robot tasks.
HKR breakdown
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Concordia: Self-Improving Synthetic Tables for Federated LLMs
Concordia trains federated LLMs for tabular tasks with a tri-level optimization loop: clients use LoRA on synthetic tables, learn utility scorers from private validation feedback, and refine local generators with GRPO, while sharing heterogeneous scorer ensembles rather than raw records, validation data, or generator parameters.
#Fine-tuning#Alignment#Benchmarking#Concordia
why featured
HKR-K and HKR-R pass: the article gives a concrete federated LLM training mechanism and privacy boundary. HKR-H is weak, and this is still a single arXiv method paper without benchmark numbers, code, or deployment proof.
editor take
Concordia shares scorer ensembles, not records, validation sets, or generators; I want privacy audits, and the abstract gives no numbers.
HKR breakdown
64
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
KamonBench: A Grammar-Based Dataset for Evaluating Compositional Factor Recovery in Vision-Language Models
KamonBench introduces 20,000 synthetic composite kamon images with known container, modifier, and motif factors, evaluating vision-language models through program-code factor metrics, recombination splits, counterfactual motif-sensitivity groups, and linear probes rather than caption accuracy alone.
#Vision#Multimodal#Benchmarking#KamonBench
why featured
HKR-K passes via 20,000 samples and three controlled factors for VLM evaluation. HKR-H/R are weak: no surprising result, release detail, or product implication, so this sits in the 60–71 research-benchmark band.
editor take
KamonBench ships 20k synthetic crests; I like the factor-recovery setup more than another caption-score benchmark.
HKR breakdown
63
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
DP-SelFT: Differentially Private Selective Fine-Tuning for Large Language Models
The paper proposes DP-SelFT for private LLM fine-tuning, using a lightweight DP synthetic dataset to select layers without extra privacy cost, then matching temporary layer training to downstream DP noise with same-scale worst-case perturbations, and reports better privacy-utility trade-offs than DP fine-tuning baselines under the same privacy guarantees.
#Fine-tuning#Safety#Benchmarking#Research release
why featured
HKR-K/R pass: DP-SelFT adds a concrete layer-selection mechanism and reports gains over DP fine-tuning baselines under the same privacy guarantee. HKR-H is weak, and the topic is niche research, so it stays in all.
editor take
DP-SelFT selects layers via DP synthetic data; I like the direction, but ε and task count are undisclosed.
HKR breakdown
63
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization
MARR assigns module-specific residual scaling coefficients for low-bit post-training quantization and updates them with PID feedback from reconstruction error. The paper reports results at ≤4-bit quantization, with up to 20.2% gains on LLMs and up to 4.6% relative gains on ViTs over residual reconstruction baselines.
#Inference-opt#MARR#Research release
why featured
HKR-K/R pass: the post gives a concrete mechanism and ≤4-bit gains, and it touches inference cost. HKR-H is weak, and low-bit PTQ is narrow, so it stays in the 60–71 band.
editor take
MARR reports 20.2% LLM gains at ≤4-bit PTQ; until code lands, treat the PID scaling as a paper trick.
HKR breakdown
63
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Foundation Models for Credit Risk Prediction: A Game Changer?
The paper benchmarks tabular foundation models on two credit-risk tasks, PD and LGD modeling, across multiple datasets, metrics, and experimental conditions, and reports that they generally perform best out of the box, with larger predictive gains as dataset size shrinks.
#Benchmarking#Research release#Benchmark
why featured
This is a narrow tabular-FM benchmark with concrete PD/LGD tasks and a low-data claim, so HKR-K passes. HKR-H/R miss: the title is academic packaging, and the post gives no production-changing evidence.
editor take
Paper tests PD and LGD; model names and datasets are undisclosed, so credit teams should not yell game changer.
HKR breakdown
63
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
SwordBench: Evaluating Orthogonality of Steering Image Representations
The authors introduce SwordBench to evaluate steering of image representations in vision models across multiple backbones and concept removal tasks, adding cross-concept robustness and collateral damage metrics to measure second-order effects of concept-vector orthogonalization.
#Vision#Interpretability#Safety#SwordBench
why featured
HKR-K and HKR-R pass: a new benchmark and second-order effect metrics are concrete, and model-editing safety matters. HKR-H fails because the angle is niche research jargon, so it stays in all.
editor take
SwordBench spans multiple backbones and concept removals; SVM separates well yet still causes collateral damage, so linear separability is a weak steering brag.
HKR breakdown
63
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Filter-then-Verify: A Multiphase GNN and ModernBERT Framework for Social Engineering Detection in Email Networks
The authors propose Filter-then-Verify, a two-stage framework that uses inductive GNNs to filter anomalous sender-receiver structures and a co-attention ModernBERT model to verify message content, reporting 86% recall in structural filtering and over 92% precision after BERT refinement on an augmented Enron dataset.
#Reasoning#Safety#Benchmarking#Enron
why featured
HKR-K/R pass: the paper gives a concrete GNN-to-ModernBERT pipeline and metrics on an Enron-derived dataset. Its scope is narrow email-security research, not a broad model or product update, so it stays in 60–71.
editor take
Filter-then-Verify reports 86% recall and 92%+ precision on augmented Enron; I’d audit the synthetic campaigns first.
HKR breakdown
63
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Learning Relative Representations for Fine-Grained Multimodal Alignment with Limited Data
The paper proposes a post-hoc multimodal alignment method that trains only learnable anchors and uses token-level similarities to align image and text encoders, reporting gains over existing methods on zero-shot classification, cross-modal retrieval, and zero-shot segmentation under limited paired data.
#Multimodal#Vision#Embedding#Research release
why featured
HKR-K passes: the method is specific and spans zero-shot classification, cross-modal retrieval, and zero-shot segmentation. HKR-H is weak; HKR-R is narrow without benchmark numbers or clear reproduction conditions.
editor take
The paper trains only learnable anchors; data scale is undisclosed, but token-level alignment smells like a cheap CLIP patch.
HKR breakdown
63
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
The MixCount Dataset: Bridging the Data Gap for Open-Vocabulary Object Counting
MixCount introduces a dataset and benchmark for mixed-object counting, using an automatic pipeline to generate images, fine-grained text descriptions, and pixel-perfect annotations, and training on its synthetic data reduces MAE by 20.14% on FSC-147 and 18.3% on PairTally.
#Vision#Benchmarking#MixCount#FSC-147
why featured
HKR-K is solid: MixCount adds generated images, fine-grained text, pixel labels, and two MAE gains. HKR-H/R are weak, so this is a useful but narrow vision benchmark paper with no hard-exclusion trigger.
editor take
MixCount cuts FSC-147 MAE by 20.14%; I buy the automatic pixel labels, not the “unlimited data” pitch.
HKR breakdown
63
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Learning Quantifiable Visual Explanations Without Ground Truth
The paper proposes an XAI quality metric based on continuous input perturbation, evaluating whether attributed information is sufficient and necessary for a model decision. It also trains an adapter with a differentiable approximation of the metric, producing causal explanations on top of black-box models without degrading performance.
#Vision#Interpretability#Fine-tuning#Research release
why featured
HKR-K passes via a testable metric and adapter mechanism. HKR-H/R are weak because there is no model release, code artifact, or production deployment hook, so this stays in the low research-story band.
editor take
2605.18681 scores explanations via continuous perturbations; I buy the metric, but “causal explanations” on black boxes gets a 50% discount.
HKR breakdown
63
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Improved Baselines with Representation Autoencoders
RAEv2 combines sums of the last k encoder layers, complementary REPA training, and DiT output reparameterization, reaching gFID 1.06 on ImageNet-256 in 80 epochs and an EP_FID@2 of 35 epochs, versus 177 for the original RAE.
#Vision#Fine-tuning#Benchmarking#arXiv
why featured
HKR-K passes with three RAEv2 mechanisms and ImageNet-256 gFID 1.06 after 80 epochs. HKR-H and HKR-R are weak, and the vision-baseline angle is too specialized for featured.
editor take
RAEv2 hits gFID 1.06 on ImageNet-256 in 80 epochs; I buy the boring baseline when it cuts convergence so cleanly.
HKR breakdown
63
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Research paper introduces Discrete Tilt Matching for diffusion language model fine-tuning
The paper introduces Discrete Tilt Matching, a likelihood-free fine-tuning method for masked diffusion LLMs, using weighted cross-entropy and control variates, and tests it on LLaDA-8B-Instruct across Sudoku, Countdown, MATH500, and GSM8K.
#Fine-tuning#Reasoning#Alignment#LLaDA
why featured
HKR-K passes: the item names a concrete fine-tuning mechanism for masked diffusion LLMs and test tasks. HKR-H and HKR-R are weak, and the available text is abstract-level only, so this stays in the mid all band.
editor take
DTM improves LLaDA-8B-Instruct on Sudoku and Countdown, scores undisclosed; diffusion LLM fine-tuning finally dodges intractable likelihoods.
HKR breakdown
63
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
KASER: Knowledge-Aligned Student Error Simulator for Open-Ended Coding Tasks
KASER trains a student code error simulator with a hybrid reinforcement-learning reward, evaluating code similarity, error matching, and prediction diversity on two real-world datasets.
#Code#Fine-tuning#Benchmarking#KASER
why featured
HKR-K passes: hybrid rewards and two real datasets give testable information. HKR-H and HKR-R are weak because this is a niche education-code evaluation paper, so it stays in all.
editor take
KASER beats baselines on 2 real datasets; I buy the education-code niche, not a broader coding-intelligence claim.
HKR breakdown
63
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Adaptive Control in Autonomous Driving via Real-Time Recurrent RL
The paper applies RTRRL to online fine-tune autonomous-driving control policies at every time step, and validates it in CarRacing simulation plus a 1:10-scale RoboRacer platform using event-camera observations.
#Robotics#Fine-tuning#Memory#RoboRacer
why featured
HKR-K passes via per-step RTRRL adaptation tested in CarRacing and 1:10 RoboRacer event-camera hardware. HKR-H is weak, and HKR-R stays niche to autonomy-control reliability.
editor take
RTRRL updates the policy every step and runs on CarRacing plus 1:10 RoboRacer; avoiding BPTT is the deployment hook.
HKR breakdown
63
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Time Series Foundation Models as Strong Baselines in Transportation Forecasting: A Large-Scale Benchmark Analysis
The paper benchmarks Chronos-2 zero-shot on 10 real-world transportation datasets and finds state-of-the-art or competitive accuracy on most tasks, with no task-specific fine-tuning, while also evaluating native probabilistic outputs through prediction-interval coverage and sharpness.
#Benchmarking#Chronos-2#Benchmark#Research release
why featured
HKR-K is solid: 10 real transport datasets and zero-shot conditions give testable signal. HKR-R is narrower, mostly for forecasting practitioners, with no broad product or model-release impact.
editor take
Chronos-2 runs zero-shot on 10 transport datasets and stays SOTA-competitive; papers omitting TSFM baselines now deserve reviewer pushback.
HKR breakdown
63
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Mitigating Extrinsic Gender Bias for Bangla Classification Tasks
The study builds four Bangla classification benchmarks for sentiment, toxicity, hate speech, and sarcasm, then uses gendered name and term perturbations to evaluate bias and tests RandSymKL, a training strategy combining symmetric KL divergence with cross-entropy loss.
#Alignment#Benchmarking#Fine-tuning#Research release
why featured
HKR-K is clear: 4 Bangla benchmarks and RandSymKL are concrete new facts. HKR-R lands on fairness, but the academic, narrow scope keeps it in the 60–71 band.
editor take
They released 4 Bangla classification benchmarks; without bias-accuracy curves, RandSymKL still reads like tidy low-resource fairness homework.
HKR breakdown
63
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Estimating Item Difficulty with Large Language Models as Experts
The study evaluates three off-the-shelf LLMs as difficulty raters for newly created items across six primary-school math domains, comparing LLM estimates with empirical difficulty via Spearman rank correlations; pairwise comparison outperformed absolute judgment, while token probabilities plus few-shot examples improved absolute judgment to moderate-to-high alignment.
#Benchmarking#Reasoning#Research release#Benchmark
why featured
HKR-K passes: the paper reports 3 off-the-shelf LLMs, 6 elementary math domains, and pairwise comparison outperforming absolute judgment. HKR-H/R are weak, so this stays in the lower interesting band.
editor take
Three off-the-shelf LLMs rated six primary-math domains; pairwise beats absolute scoring, and cheap expert calibration looks practical here.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
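For the item-difficulty paper above, a small sketch of the Spearman rank correlation it uses to compare LLM estimates with empirical difficulty. The scores below are invented for illustration; ties get average ranks, which is the usual convention.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical difficulty scores for five items.
llm_estimate = [0.2, 0.5, 0.4, 0.9, 0.7]
empirical = [0.1, 0.6, 0.3, 0.8, 0.95]
rho = spearman(llm_estimate, empirical)
```

Only the ordering matters here, which suits difficulty rating: an LLM can be badly calibrated in absolute terms and still rank items well.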
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Evaluating Inter-Column Logical Relationships in Synthetic Tabular Data Generation
The paper proposes three metrics for evaluating inter-column logical relationships in synthetic tabular data, validates them on a real-world industrial dataset, and reports that existing generators fail on hierarchical, temporal, and mathematical dependencies.
#Benchmarking#Research release#Benchmark#Open source
why featured
HKR-K passes: the paper offers 3 evaluation metrics and industrial-dataset validation for synthetic tabular data. HKR-H/R fail because the angle is narrow and lacks a practitioner nerve, so it sits in the 60–71 all band.
editor take
TabLogicEval adds 3 column-logic metrics; I buy the target, since joint-distribution scores let tabular generators fake realism.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
MCQ Difficulty Prediction via Modeling Learner Heterogeneity Using Data-Driven Cognitive Profiling
The researchers use EEDI interaction data and latent class analysis to build learner personas, condition an LLM to simulate MCQ response distributions, and feed aggregated signals plus topic context into Ridge Regression; under five-fold cross-validation, MSE drops from 0.367 to 0.274 and R² rises from 0.525 to 0.686.
#Reasoning#Benchmarking#EEDI#Research release
why featured
HKR-K passes with a clear method and five-fold validation metrics; HKR-H/R are weak because this is an edtech assessment paper, not a broad AI-practitioner event. No hard exclusion, so it lands in interesting-not-featured.
editor take
EEDI five-fold MSE drops to 0.274; LCA personas feeding an LLM beats hand-waving about learner heterogeneity.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
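For the MCQ difficulty item above, a stripped-down sketch of how five-fold MSE and R² for a ridge model are computed. The paper's real features (aggregated LLM signals plus topic context) are multi-dimensional; this assumes a single feature, an unpenalized intercept, and folds assigned by index modulo 5, all of which are simplifications for illustration.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit for one feature with an unpenalized intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / (sxx + alpha)
    return w, my - w * mx


def five_fold_scores(xs, ys, alpha=1.0, k=5):
    """Pooled out-of-fold MSE and R^2 under k-fold cross-validation."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        w, b = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], alpha)
        for i in held_out:
            preds[i] = w * xs[i] + b
    n = len(ys)
    mse = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - (mse * n) / ss_tot
    return mse, r2


# Synthetic, exactly linear data (hypothetical, not EEDI).
xs = [i / 10 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
mse_plain, r2_plain = five_fold_scores(xs, ys, alpha=0.0)
mse_ridge, r2_ridge = five_fold_scores(xs, ys, alpha=5.0)
```

On noiseless data the unregularized fit is exact and the penalty only adds shrinkage error; the paper's reported gains come from better features, not from the regression step itself.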
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL
Adaptive Layerwise Perturbation injects learnable perturbations into each layer’s hidden states during LLM RL updates and uses the perturbed policy as the importance-ratio numerator against the unchanged inference policy; experiments on single-turn math and multi-turn tool-integrated reasoning report lower ratio tails and KL spikes, but the abstract does not disclose model sizes, task counts, or numeric scores.
#Reasoning#Fine-tuning#Research release
why featured
HKR-K passes because ALP gives a concrete off-policy correction mechanism for LLM RL. HKR-H and HKR-R are weak, and model scale plus scores are not disclosed, so it stays in the lower research-interest band.
editor take
ALP perturbs every layer’s hidden states; no model sizes or scores disclosed, so don’t crown ratio-tail control yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
A Survey of On-Policy Distillation for Large Language Models
This arXiv survey formalizes On-Policy Distillation as f-divergence minimization over student-sampled trajectories and organizes related distillation, RLHF, and imitation-learning work along three design axes: the optimization target, the feedback source, and practical training stabilization.
#Fine-tuning#Alignment#Reasoning#arXiv
why featured
HKR-K passes: the article offers a concrete OPD formulation and 3-axis taxonomy for post-training readers. HKR-H/R fail because the title and abstract read like a standard survey, with no broader industry nerve.
editor take
This survey maps OPD across 3 axes; I buy the focus on quadratic exposure-bias growth.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting
arXiv:2508.04227v2 surveys continual learning for VLMs and MLLMs, proposes four method families, and frames evaluation as dual-track Domain CL and Ability CL with micro-diagnostic CoT tests.
#Multimodal#Vision#Memory#arXiv
why featured
HKR-K passes: the survey adds a VLM/MLLM continual-learning taxonomy and eval split. HKR-H and HKR-R are weak, with no experiment result, tool release, or industry event, so it fits the 60–71 research-signal band.
editor take
arXiv:2508.04227v2 names four VLM CL families; the Domain CL/Ability CL split is the sharper contribution.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
SHED: Style-Homogenized Embedding Alignment for Domain Generalization
SHED introduces a CLIP-based style-homogenized embedding alignment method for domain generalization. It removes source-domain style centroids during training, uses prompt-averaged text embeddings, and at inference projects textual domain centroids into visual space; experiments on five benchmarks report state-of-the-art results, including a 4.0% gain on DomainNet over standard fine-tuning.
#Embedding#Vision#Benchmarking#CLIP
why featured
HKR-K passes with a concrete mechanism and a +4.0% DomainNet result. HKR-H and HKR-R are weak; this is useful vision-generalization research but below the featured bar.
editor take
SHED reports SOTA on 5 DG benchmarks and +4.0% on DomainNet; CLIP generalization still pays the style-leakage tax.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
T-GEMs: Text-Guided Exit Modules for Decreasing CLIP Image Encoder Cost
The paper introduces T-GEMs and a rate-based regularizer to guide early exits in CLIP image encoders from text descriptions, controlling encoder usage cost while maintaining cross-modal understanding performance; the RSS snippet does not disclose benchmark numbers, datasets, or latency gains.
#Multimodal#Vision#Inference-opt#CLIP
why featured
This is an engineering-leaning CLIP inference-optimization paper with a concrete mechanism but no metrics in the feed; HKR-K/R pass, HKR-H fails, so it sits in the 60–71 band.
editor take
T-GEMs adds text-guided exits to CLIP; RSS gives no benchmarks or latency, so file it under early-exit papers.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Multi-task Learning on Partially Labeled Datasets via Invariant/Equivariant Semi-supervised Learning
The paper evaluates FixMatch and Dense FixMatch on Cityscapes and BDD100K for object detection and semantic segmentation, and reports that invariant and equivariant semi-supervised learning beat supervised baselines in most settings, with the largest gains when a task has fewer labeled samples.
#Vision#Fine-tuning#Cityscapes#BDD100K
why featured
HKR-K and HKR-R pass: the paper names a concrete semi-supervised mechanism, datasets, and low-label gains. HKR-H is weak, and the impact is narrow academic CV rather than a broad model or product release.
editor take
FixMatch/Dense FixMatch beat supervised baselines on Cityscapes and BDD100K; I care whether this survives outside low-label sweet spots.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
CoLLM-NAS: Collaborative Large Language Models for Efficient Knowledge-Guided Neural Architecture Search
CoLLM-NAS uses a stateful Navigator LLM, a stateless Generator LLM, and a Coordinator in a two-stage NAS framework, outperforming existing NAS methods on ImageNet and NAS-Bench-201 while reducing search costs by 4–10x.
#Agent#Reasoning#Benchmarking#CoLLM-NAS
why featured
HKR-K passes with a concrete mechanism and 4–10x cost reduction, but HKR-H and HKR-R are weak. The NAS focus is research-heavy and lacks a product, open-source, or broad practitioner hook.
editor take
CoLLM-NAS cuts ImageNet and NAS-Bench-201 search cost 4–10x; valid architectures are the real test, not LLM gloss.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
GenTS Comprehensive Benchmark Library for Generative Time Series Models Released
The paper introduces GenTS, an open-source benchmark library for generative time series models, covering synthesis, forecasting, and imputation tasks with a unified preprocessing pipeline, a model collection, panoramic evaluation metrics, and customizable datasets or models.
#Benchmarking#GenTS#Research release#Open source
why featured
HKR-K passes: GenTS adds task coverage, unified preprocessing, model collections, metrics, and open source. HKR-H/R are weak because generative time-series evaluation is vertical, so this fits all, not featured.
editor take
GenTS covers synthesis, forecasting, and imputation; model and dataset counts are undisclosed, so don't crown it Time-Series GLUE yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Trust the Uncertain Teacher: Distilling Dark Knowledge via Calibrated Uncertainty
The paper proposes Calibrated Uncertainty Distillation, which shapes the teacher’s predictive distribution before transfer; the abstract says students improve accuracy and calibration under distribution shift across diverse benchmarks, but the RSS snippet does not disclose specific benchmark names or numerical results.
#Fine-tuning#Alignment#Benchmarking#Research release
why featured
HKR-H comes from the counterintuitive “uncertain teacher” hook, and HKR-K from calibrating teacher distributions before distillation. No accuracy deltas or benchmark details are disclosed, so HKR-R stays weak.
editor take
CUD calibrates teacher distributions before distillation; no benchmarks or numbers disclosed, so I’d file it as incremental anti-overconfidence distillation.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Towards Migrating Neural Network Implementations
The paper proposes an automatic migration method for neural network code between PyTorch and TensorFlow using a pivot NN model, and validates it on five neural networks that the authors report as functionally equivalent to the originals.
#Code#PyTorch#TensorFlow#Research release
why featured
HKR-K is clear via the pivot-model mechanism and five-network test; HKR-R is limited to framework-migration pain. No hard exclusion, but the evidence is too small for featured.
editor take
The paper tests PyTorch/TensorFlow migration on 5 NNs; I don’t buy coverage for dynamic graphs or custom-op mess.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Joint Enhancement and Classification Using Coupled Diffusion Models of Signals and Logits
The paper proposes a coupled two-diffusion framework over input signals and classifier logits, requiring no classifier retraining or fine-tuning, introduces three strategies for joint distribution modeling, and evaluates the method on noisy image classification and automatic speech recognition, where it outperforms sequential enhancement baselines.
#Multimodal#Audio#Inference-opt#Research release
why featured
HKR-K passes on the coupled-diffusion mechanism and no-retraining condition. HKR-H/R are weak: no headline hook, no metrics, and limited practitioner debate value.
editor take
Coupled diffusion links signals and logits, but gains are undisclosed; I’d check inference cost before buying the no-retraining pitch.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning
The paper proposes Kernelized Advantage Estimation for RL-based LLM reasoning, using kernel smoothing to estimate value functions when only a small number of reasoning traces can be sampled per prompt, avoiding a trained value network while targeting lower-variance policy-gradient estimation.
#Reasoning#Fine-tuning#Research release
why featured
HKR-K passes because the mechanism targets variance and value estimation in LLM reasoning training. HKR-H/R are weak: no metrics, code, or reproducible setup are disclosed, so this stays in the normal research-release band.
editor take
KAE uses kernel smoothing with few traces per prompt; I like the no-value-network angle, but scale and cost baselines are undisclosed.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
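For the Kernelized Advantage Estimation item above, a hedged sketch of the general idea of kernel-smoothed value estimates. The summary does not disclose the paper's kernel, features, or estimator, so this assumes a Nadaraya-Watson estimator with a Gaussian kernel over hypothetical prompt feature vectors; the advantage is the observed reward minus the smoothed value.

```python
import math


def gaussian_kernel(a, b, bandwidth):
    """Gaussian similarity between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def kernel_value(query, feats, rewards, bandwidth=1.0):
    """Nadaraya-Watson estimate: kernel-weighted mean of observed rewards."""
    weights = [gaussian_kernel(query, f, bandwidth) for f in feats]
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


def kernel_advantages(feats, rewards, bandwidth=1.0):
    """Advantage of each trace = its reward minus the smoothed value."""
    return [r - kernel_value(f, feats, rewards, bandwidth)
            for f, r in zip(feats, rewards)]


# Hypothetical 1-D prompt features and binary rewards for three traces.
feats = [[0.0], [1.0], [5.0]]
rewards = [0.0, 1.0, 1.0]
adv_local = kernel_advantages(feats, rewards, bandwidth=1.0)
adv_global = kernel_advantages(feats, rewards, bandwidth=1e6)
```

The bandwidth is the knob: a very large value collapses to a single global mean baseline, while a small one borrows only from nearby prompts, which is how a smoother can stand in for a trained value network when traces per prompt are scarce.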
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Training data Attribution in Diffusion Models via Mirrored Unlearning and Noise-Consistent Skew
The paper proposes MUCS for training data attribution in diffusion models: it fine-tunes a second model with bounded mirrored gradient ascent and measures normalized skew against the original model using consistent noise samples. It reports large gains over existing methods on three datasets, but the abstract does not disclose exact metrics.
#Interpretability#Fine-tuning#Research release
why featured
HKR-K passes: a new method, mechanism, and 3-dataset result are disclosed. HKR-H is weak and HKR-R is limited; this is relevant diffusion attribution research but still a narrow technical paper, so it sits low in 60–71.
editor take
MUCS beats prior TDA on 3 datasets, but metrics aren’t disclosed; I trust noise-consistent skew more than “large margin.”
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
SuReNav: Superpixel Graph-based Constraint Relaxation for Navigation in Over-constrained Environments
SuReNav addresses over-constrained navigation with a three-part pipeline: superpixel graph map generation, GNN-based regional constraint relaxation trained on human demonstrations, and interleaved relaxation-planning-execution, evaluated on 2D semantic maps, OpenStreetMap 3D maps, and real-world urban navigation with a Spot quadruped robot.
#Robotics#Agent#Benchmarking#OpenStreetMap
why featured
HKR-K passes because the method and evaluation settings are concrete, including Spot urban tests. HKR-H/R are weak: the title is academic and the industry nerve is narrow, so this lands in the 60–71 band.
editor take
SuReNav learns constraint relaxation from human demos; Spot trials matter, but sample size and failure rate are undisclosed.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Scalable Knowledge Editing for Mixture-of-Experts LLMs via Tensor-Structured Updates
The paper proposes a MEMIT-like knowledge-editing framework for MoE LLMs, formulates edits at the per-expert level, and uses the Woodbury identity to avoid full stacked weight-matrix inversion, matching strong baselines on main KE metrics while accelerating editing by up to 6x without extra backward passes.
#Fine-tuning#Inference-opt#Research release
why featured
HKR-K passes with a concrete MoE editing mechanism and 6x speedup; HKR-H/R are weak because the title is dense and deployment impact is not shown, so this stays in all.
editor take
MoE knowledge editing gets up to 6x speedup; I care more about router drift, and the abstract doesn’t disclose it.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
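For the MoE knowledge-editing item above, the Woodbury identity it relies on, stated in its standard form. How the paper maps per-expert edits onto these matrices is not given in the summary, so the symbols below are generic: $A$ is an invertible $n \times n$ matrix, $U$ is $n \times k$, $C$ is an invertible $k \times k$ matrix, and $V$ is $k \times n$.

```latex
\[
(A + U C V)^{-1}
  = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}
\]
```

When $A^{-1}$ is already known or cheap (for example, block-diagonal) and $k \ll n$, the only new inversion is the $k \times k$ matrix $C^{-1} + V A^{-1} U$, which is the usual reason this identity avoids inverting a full stacked matrix.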
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Drift Flow Matching
The paper proposes Drift Flow Matching, connecting one-step Drift Models with multi-step Flow Matching so generation can use direct transport maps or multiple inference steps under different quality-efficiency requirements.
#Inference-opt#Research release
why featured
HKR-K and HKR-R pass, but the post only gives the method mechanism, with no benchmark numbers, code, or production replacement claim. It is useful research signal, not featured-level industry news.
editor take
DFM links one-step Drift to multi-step Flow; experiments are undisclosed, so judge it by the quality-compute curve.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
SeamCam: Quantifying Seamless Camouflage via Multi-Cue Visual Detectability
SeamCam frames camouflage evaluation as visual localization, scores one minus the maximum recoverable localization signal, and reaches 78.82% agreement with human judgments in a 94-participant, 2,390-comparison two-alternative forced-choice study, about 25% above prior state of the art.
#Vision#Benchmarking#Fine-tuning#SeamCam
why featured
HKR-H and HKR-K pass: the angle is unusual and the article gives concrete experiment counts and metrics. HKR-R fails because it stays in narrow vision benchmarking with no product, agent, or industry-competition tie.
editor take
SeamCam hits 78.82% human agreement over 2,390 choices; using localization residue for DPO beats vague vision-alignment talk.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Beyond Neural Incompatibility: Cross-Scale Knowledge Transfer in Language Models through Latent Semantic Alignment
The paper introduces SemAlign for cross-scale parametric knowledge transfer in language models, using activations rather than parameter blocks as the transfer medium. SemAlign has two stages, layer attribution and semantic alignment, trains only the frontier target layer during shallow-to-deep transfer, and reports evaluations on four benchmarks, but the snippet does not disclose model sizes or benchmark names.
#Fine-tuning#Reasoning#Benchmarking#SemAlign
why featured
HKR-K passes via SemAlign’s activation-transfer mechanism and two-stage design. HKR-H/R are weak: the title is academic, and no effect size or cost gain is disclosed, so this sits in the 60–71 band.
editor take
SemAlign trains only the frontier target layer via residual geometry; four benchmarks are unnamed, so don’t crown it a LoRA replacement.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)
RealUID incorporates real data into distillation for matching models without an extra GAN discriminator; the paper says the framework covers Flow Matching, Diffusion, Bridge Matching, and Stochastic Interpolants, and releases code at the listed GitHub repository.
#Inference-opt#RealUID#Research release#Open source
why featured
HKR-K passes because RealUID gives a concrete mechanism: real-data supervision for distillation without a GAN discriminator across several matching-model families. HKR-H/R are weak; this is a narrow research release, so it stays in all.
editor take
RealUID covers 4 matching families; don’t buy “universal” yet—the snippet gives no one-step quality or latency numbers.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
BoLT: A Benchmark to Democratize Black-box Optimization Research for Expensive LLM Tasks
BoLT introduces an LLM-centric black-box optimization benchmark for training and inference configurations, using lightweight surrogate models fitted on thousands of real LLM experiments and covering multi-fidelity, multi-objective, heteroscedastic-noise, and high-dimensional search settings.
#Benchmarking#Inference-opt#Fine-tuning#BoLT
why featured
HKR-K has concrete benchmark mechanics and experiment scale; HKR-R touches costly LLM tuning. HKR-H is weak, and black-box optimization is niche, so it stays in the 60–71 band.
editor take
BoLT fits surrogates on thousands of real LLM runs; good, BBO needs fewer toy functions and more ugly tuning reality.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
LLM-TabLogic: Preserving Inter-Column Logical Relationships in Synthetic Tabular Data via Prompt-Guided Latent Diffusion
LLM-TabLogic uses LLM reasoning to capture and compress inter-column constraints, then passes them into a score-based diffusion model, reaching over 90% accuracy on column reasoning for unseen tables.
#Reasoning#LLM-TabLogic#Research release#Open source
why featured
HKR-K passes via the mechanism and >90% result, while HKR-H/R miss because the tabular synthetic-data angle is narrow and lacks product or ecosystem pull. No hard exclusion; lower 60–71 band.
editor take
LLM-TabLogic tops 90% on unseen-table column reasoning; I buy the direction, not the “no domain knowledge” claim yet.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Kelvin v1.0: A Neural Pre-Encoder for H.264 with -27.62% BD-VMAF on UVG
Kelvin v1.0 adds a lightweight learned pre-encoder before unmodified libx264, bounds pixel adjustments to ±1/255 per channel, and reports -27.62% mean BD-VMAF across seven 1080p UVG sequences versus baseline libx264 preset medium.
#Vision#Inference-opt#Benchmarking#Kelvin
why featured
HKR-H and HKR-K pass: the mechanism and compression number are concrete, and “no codec change” is a real hook. HKR-R is weak because this is niche video-codec research, so it stays in all.
editor take
Kelvin v1.0 reports -27.62% BD-VMAF with a pre-encoder in front of unmodified libx264; don’t compare it to x265, compare H.264 lock-in costs.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Text2CAD-Bench: A Benchmark for LLM-based Text-to-Parametric CAD Generation
Text2CAD-Bench introduces 600 human-curated text-to-parametric-CAD examples across L1-L4, covering basic geometry, complex topology, freeform surfaces, and real-world domains beyond mechanical parts.
#Benchmarking#Code#Text2CAD-Bench#Research release
why featured
HKR-K passes because the benchmark adds 600 leveled Text-to-CAD samples. HKR-H/R stay weak: the topic is narrow, with no model results, release artifact details, or production-impact claim disclosed.
editor take
Text2CAD-Bench ships 600 four-level CAD tasks; L3/L4 will separate geometry reasoning from sketch-extrude cosplay.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Sequential Structure in Intraday Futures Data: LSTM vs Gradient Boosting on MNQ
The paper tests four LSTM and gradient-boosting configurations on 944 trading days of five-minute MNQ OHLCV data from 2021-2025, and no setup achieves statistically significant out-of-sample accuracy above the 51.8% base rate.
#Benchmarking#arXiv#Kronos#MNQ
why featured
HKR-H/K/R all pass, but this is a quant-finance ML paper rather than a model, tool, or product update. The concrete negative result is useful, so it lands in the 60–71 research-signal band.
editor take
944 MNQ trading days topped out at 50.89% OOS; Kronos-style candlestick models look dead on single-instrument small data.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H1·K1·R1
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Symphony for Speech-to-Text: Supporting Real-Time Medical Voice Interfaces
The paper introduces Symphony for Speech-to-Text, a medical speech recognition system that splits recognition, formatting, and contextual correction for real-time streaming and batch clinical transcription; the abstract says it outperforms state-of-the-art systems on public benchmark and medical speech datasets, but does not disclose exact error rates or dataset sizes.
#Audio#Multimodal#Benchmarking#Symphony
why featured
HKR-K passes: the paper offers a concrete component split, but the body does not disclose error rates, dataset size, or clinical deployment results. Useful niche research, not featured-level signal.
editor take
Symphony splits ASR into 3 layers; no WER or dataset size is disclosed, so don’t trust “substantially” yet.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Scalable and Verifiable Federated Learning for Cross-Institution Financial Fraud Detection
DSFL partitions participants into ephemeral clusters of fixed size m and reduces communication complexity to O(N*m); on 284,807 transactions across 10 simulated banking nodes, it reached 91.2% global fraud recall and, at N=1000, showed about 34x lower aggregation latency than Paillier-based secure aggregation via analytical extrapolation.
#Safety#Benchmarking#arXiv#Google
why featured
HKR-K passes with a concrete mechanism and metrics. HKR-H/R are weak because this is an academic federated-learning paper with no real institutional deployment or open artifact disclosed.
editor take
DSFL hits 91.2% recall on 10 simulated banks; I don’t buy the 34x at 1000 nodes until real banks show up.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Mind the Gap: Learning Modality-Agnostic Representations with a Cross-Modality UNet
The paper proposes cmUNet and MarrNet to learn modality-agnostic representations via cross-modality transformation, in-modality reconstruction, and adversarial/perceptual loss, and validates the method on five cross-modality matching tasks including spectrum matching, person re-identification, and heterogeneous face recognition.
#Multimodal#Vision#arXiv#Research release
why featured
This is a standard arXiv multimodal-representation paper with concrete mechanisms and 5 task tests, so HKR-K passes. HKR-H and HKR-R stay weak because there is no product, open-source artifact, or industry adoption signal.
editor take
MarrNet covers 5 cross-modal matching tasks; without metrics here, the SOTA claim gets a haircut, but occlusion robustness is a useful diagnostic.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Temporal Task Diversity: Inductive Biases Under Non-Stationarity in Synthetic Sequence Modelling
The paper tests changing task distributions during training in in-context linear regression sequence modelling, and reports that temporal task diversity increases small transformers’ inductive bias toward generalisation over memorisation.
#Reasoning#Benchmarking#Research release
why featured
HKR-K lands: non-stationary task distributions shift small transformers’ inductive bias toward generalization over memorization. HKR-H is weak and HKR-R is narrow, so this fits the lower all band.
editor take
The paper only covers small transformers on linear regression; I buy the direction, not any jump to pretraining.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
SIPO: Stabilized and Improved Preference Optimization for Aligning Diffusion Models
SIPO applies DPO-C&M to clip and mask uninformative diffusion timesteps, then adds timestep-aware importance reweighting, with experiments on SD1.5, SDXL, CogVideoX-2B/5B, and Wan2.1-1.3B for preference alignment.
#Alignment#Vision#Multimodal#arXiv
why featured
HKR-K passes: the post gives the DPO-C&M mechanism and tests on SD1.5, SDXL, CogVideoX, and Wan2.1. The method is specialist and lacks HKR-H / HKR-R, so it stays in all.
editor take
SIPO tests five diffusion backbones with timestep clipping; I buy the diagnosis—Diffusion-DPO’s variance problem needs timestep surgery.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
DARC: Disagreement-Aware Alignment via Risk-Constrained Decoding
DARC proposes a retraining-free inference-time reranking method that selects candidates with a KL-robust entropic satisfaction objective and constrains the entropic risk premium against the mean through explicit risk budgets.
#Alignment#Safety#Inference-opt#DARC
why featured
HKR-K passes: DARC frames alignment as inference-time candidate reranking with a KL-robust entropic satisfaction objective and an explicit entropic risk budget. HKR-H/R are weak because no results, code, or production impact are disclosed.
editor take
DARC only changes inference-time reranking, not training; no benchmark numbers disclosed, so I’d treat it as a risk knob, not alignment solved.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
ARROW: Augmented Replay for Robust World Models
ARROW extends DreamerV3 for continual reinforcement learning with short-term and long-term replay buffers, and evaluates forgetting and forward transfer on Atari tasks without shared structure and Procgen CoinRun variants with shared structure.
#Agent#Memory#ARROW#DreamerV3
why featured
HKR-K passes via the short/long-term replay buffers and Atari/Procgen CoinRun setup. HKR-H and HKR-R are weak, and the post gives no performance numbers or production claim, so this stays in the ordinary research-release band.
editor take
ARROW adds dual replay to DreamerV3 and tests Atari/CoinRun; I’d wait for same-memory curves before buying the bio-inspired pitch.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification
TailedTS introduces a 2024 Wikipedia hourly page-view benchmark with about 24.69 billion data points across roughly 3 million pages per month, where 5% of pages account for over 70% of views, and evaluates forecasting models with l1, Huber, quantile, and lp losses under heavy-tailed, zero-inflated, non-Gaussian conditions.
#Benchmarking#Wikipedia#TailedTS#Research release
why featured
HKR-K passes because the dataset scale, source, and evaluation losses are concrete. HKR-H and HKR-R are weak: this is a specialized time-series benchmark, not a model launch or product update, so it stays in all.
editor take
TailedTS ships 24.69B Wikipedia hourly points; 5% of pages drive 70% of views, so forecasting benchmarks finally get messy.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
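For the TailedTS item above, the four loss families it lists, written out per sample in their standard textbook forms. The default `delta`, `q`, and `p` values are placeholders, not the paper's settings.

```python
def l1_loss(y, y_hat):
    """Absolute error."""
    return abs(y - y_hat)


def huber_loss(y, y_hat, delta=1.0):
    """Quadratic near zero, linear beyond delta; less sensitive to outliers."""
    err = abs(y - y_hat)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def quantile_loss(y, y_hat, q=0.5):
    """Pinball loss: under-prediction costs q, over-prediction costs 1 - q."""
    err = y - y_hat
    return q * err if err >= 0 else (q - 1.0) * err


def lp_loss(y, y_hat, p=1.5):
    """Absolute error raised to the power p."""
    return abs(y - y_hat) ** p


# One under-prediction of size 2 scored by each loss.
scores = {
    "l1": l1_loss(3.0, 1.0),
    "huber": huber_loss(3.0, 1.0, delta=1.0),
    "quantile_0.9": quantile_loss(3.0, 1.0, q=0.9),
    "l2": lp_loss(3.0, 1.0, p=2.0),
}
```

On heavy-tailed, zero-inflated series the choice matters: squared error lets a few viral pages dominate, while l1, Huber, and quantile losses grow only linearly in large errors.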
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Does Weight Decay Enhance Training Stability?
The paper analyzes weight decay at the Edge of Stability and finds it slows progressive sharpening, dampens EoS oscillations in CNNs, and in MLPs induces a phase transition where sharpness stabilizes below the theoretical 2/η boundary.
#Reasoning#Benchmarking#Research release
why featured
HKR-K passes with a concrete mechanism claim, but HKR-H and HKR-R are weak: this is niche training-dynamics research with limited practitioner pull. Lower-band research item, not featured.
editor take
Weight decay triggers different stability mechanisms in CNNs and MLPs; the 2/η sharpness line looks brittle under regularization.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
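For the weight-decay item above, a toy illustration of where the 2/η boundary comes from. On a one-dimensional quadratic with curvature (sharpness) λ, gradient descent with learning rate η multiplies the iterate by (1 - ηλ) each step, so it shrinks only when λ < 2/η. This is a textbook toy, not the paper's CNN or MLP experiments, and weight decay is not modeled.

```python
def gd_on_quadratic(sharpness, lr, steps=50, x0=1.0):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2 and
    return the final distance from the minimum at 0."""
    x = x0
    for _ in range(steps):
        x -= lr * sharpness * x   # gradient of f is sharpness * x
    return abs(x)


lr = 0.1                 # stability threshold is 2 / lr = 20
below = gd_on_quadratic(19.0, lr)   # sharpness just under the threshold
above = gd_on_quadratic(21.0, lr)   # sharpness just over the threshold
```

Just below the threshold the iterate oscillates in sign but decays; just above it the same oscillation grows, which is the instability that Edge of Stability analyses track.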
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Anomaly-Preference Image Generation
The paper introduces Anomaly Preference Optimization, using real anomalies as positive references and deriving optimization signals from denoising trajectory deviations without human annotation; the RSS snippet does not disclose dataset counts or concrete metric values.
#Vision#Fine-tuning#Research release
why featured
HKR-K passes: the paper gives a concrete APO training-signal design. Dataset count and metrics are not disclosed, and the niche vision-QA angle keeps it in the interesting-not-featured band.
editor take
APO uses real anomalies as positives; metrics and dataset counts are undisclosed, so don’t cash the SOTA claim yet.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
KairosHope: A Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture
KairosHope replaces quadratic attention with a HOPE block that combines Titans short-term memory and CMS long-term memory, then adapts to UCR classification tasks after Monash pretraining using an LP-FT protocol.
#Memory#Fine-tuning#Benchmarking#KairosHope
why featured
HKR-K passes via concrete architecture details: HOPE, Titans memory, CMS, and LP-FT on UCR. HKR-H/R miss; no performance numbers or artifact are disclosed, and time-series classification is niche.
editor take
KairosHope swaps quadratic attention for HOPE, but no UCR scores are disclosed; I’d treat this as architecture pitch, not a win.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Venom: A PyTorch Generative Modeling Toolkit
Venom provides a unified MNIST-first PyTorch interface for generative modeling, covering 7 families including diffusion, score-based models, flow matching, VAEs, normalizing flows, GANs, and energy-based models.
#Fine-tuning#Inference-opt#Benchmarking#Venom
why featured
HKR-K passes: the paper describes a unified PyTorch toolkit spanning 7 generative families, including diffusion, flow matching, VAEs, GANs, and energy-based models. HKR-H and HKR-R are weak, so this stays below featured.
editor take
Venom covers 7 generative families but commits to MNIST-first; useful for teaching APIs, not judging production generative stacks.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
ClaHF: A Human Feedback-inspired Reinforcement Learning Framework for Classification Tasks
ClaHF converts text-classification labels into preference signals for RL optimization, evaluates the framework on eight classification tasks across three scenario categories, and reports improved classification performance and confidence calibration across diverse language models.
#Fine-tuning#Alignment#Benchmarking#ClaHF
why featured
HKR-K passes via a concrete mechanism and 8-task evaluation. HKR-H/R are weak: no major lab, no broad capability release, and limited practitioner urgency.
editor take
ClaHF turns labels into preferences across 8 tasks; smells like RLHF packaging for classification, with gains undisclosed here.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
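One plausible way to read "converts text-classification labels into preference signals" in the ClaHF item above is to prefer the gold label over every other label. The sketch below assumes exactly that; the summary does not disclose ClaHF's actual construction, so the function name, the dictionary keys, and the pairing rule are all hypothetical.

```python
def labels_to_preferences(examples, label_set):
    """Turn (text, gold_label) pairs into preference triples.

    Hypothetical rule: the gold label is 'chosen' and each other label is 'rejected'.
    """
    triples = []
    for text, gold in examples:
        for other in label_set:
            if other != gold:
                triples.append({"prompt": text, "chosen": gold, "rejected": other})
    return triples


# Toy data, invented for illustration.
examples = [("great movie", "positive"), ("dull plot", "negative")]
pairs = labels_to_preferences(examples, ["positive", "negative", "neutral"])
```

Each labeled example yields one triple per wrong label, which is the input shape that preference-based RL objectives usually expect.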
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
The Loupe: A Plug-and-Play Attention Module for Amplifying Discriminative Features in Vision Transformers
The Loupe raises Swin-Base accuracy on CUB-200-2011 from 88.36% to 91.72% by inserting a lightweight spatial gating module into an intermediate Vision Transformer feature stage, where a small CNN predicts a single-channel mask; the added parameters stay under 0.1%.
#Vision#Benchmarking#The Loupe#Swin
why featured
HKR-K passes via concrete benchmark gains and a spatial-mask mechanism; HKR-H/R are weak. This is a niche ViT module paper, not a product or foundation-model update, so it stays in the 60 band.
editor take
The Loupe adds <0.1% params and gives Swin-Base +3.36 points; old-school spatial gating still has bite in FGVC.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
CheckSupport: A Local LLM Tool for Automated Manuscript Submission Checklist Selection and Completion
CheckSupport uses locally run instruction-tuned LLMs to recommend and complete scientific reporting checklists, reaching 90% checklist recommendation accuracy and 88% item-level completion accuracy on a peer-reviewed manuscript corpus, with 12.5 seconds average wall-clock time per manuscript on CPU-only hardware.
#Tools#Inference-opt#CheckSupport#arXiv
why featured
HKR-K passes with concrete accuracy and CPU-latency numbers for a local LLM workflow. HKR-H and HKR-R are weak because the use case is narrow academic submission admin, so it stays in all.
editor take
CheckSupport hits 90% recommendation accuracy on peer-reviewed manuscripts; 12.5s CPU-local is nice, but corpus size is undisclosed.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
AdaGraph: A Graph-Native Clustering Algorithm That Overcomes the Curse of Dimensionality and Enables Scientific Discovery
AdaGraph performs clustering directly on kNN graph topology without a preset number of clusters k; the paper reports Graph-SCOPE mean ARI=0.900 on 10 synthetic benchmarks and correct k selection on 9 of 10 datasets.
#Benchmarking#AdaGraph#Graph-SCOPE#WGCNA
why featured
HKR-K is concrete and HKR-H has a real hook, but this remains niche clustering research with no code, production replacement, or effect on mainstream model workflows disclosed.
editor take
AdaGraph reports ARI=0.900 on 10 synthetic sets; “dissolves the curse of dimensionality” is too loud without replication.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H1·K1·R0
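For readers weighing the ARI=0.900 figure in the AdaGraph item above, the Adjusted Rand Index can be computed from pair counts in a contingency table. This is a minimal standard-library sketch of the usual ARI formula, not code from the paper; the function name is an assumption, and it expects at least two samples.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from pair counts (1.0 = identical partitions)."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))   # contingency table entries
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)      # chance-level agreement
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:                        # degenerate case, e.g. one cluster each
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])      # relabeled but identical
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])   # worse than chance
```

ARI ignores label names and corrects for chance, so a value of 0.900 means close but not perfect agreement with the reference partition.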
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Empirical Evaluation of Time Series Foundation Models for Day-Ahead and Imbalance Electricity Price Forecasting in Belgium
The study evaluates Chronos-2, Chronos-Bolt, and TimesFM 2.5 for Belgian day-ahead and imbalance electricity price forecasting; Chronos-2 in ARX mode achieves 5% lower MAE than the best machine-learning ensemble in the day-ahead market, but its imbalance-price MAE is 10% higher across horizons except two-hour-ahead.
#Benchmarking#Amazon#Google#Research release
why featured
HKR-K passes on concrete TSFM benchmark numbers, but HKR-H and HKR-R are weak: the scope is Belgian electricity pricing, with no product, agent, or general model-release signal.
editor take
Chronos-2 ARX cuts day-ahead MAE 5% but raises imbalance MAE 10%; TSFMs still flinch at power-market tails.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Federated Distillation on Edge Devices: Efficient Client-Side Filtering for Non-IID Data
EdgeFD uses a KMeans-based density-ratio estimator to filter in-distribution and out-of-distribution proxy data on clients, removing server-side filtering; the arXiv v2 paper evaluates strong non-IID, weak non-IID, and IID client distributions without requiring a pretrained teacher model on the server, and says code is available for reproducibility.
#Fine-tuning#Inference-opt#EdgeFD#arXiv
why featured
HKR-K passes via EdgeFD’s client-side filtering mechanism and three distribution settings. HKR-H/R are weak, and the post gives no accuracy, communication, or edge-cost gains, so it stays low-tier all.
editor take
EdgeFD moves filtering to client-side KMeans; no overhead numbers in the snippet, so I read it as engineering tradeoff work.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
DAASH: A Meta-Attack Framework for Synthesizing Effective and Stealthy Adversarial Examples
DAASH composes multiple Lp-constrained base attacks with learned adaptive weights across stages to generate perceptually aligned adversarial examples, and on CIFAR-10, CIFAR-100, and ImageNet it reports up to a 20.63% attack-success improvement over AdvAD plus SSIM, LPIPS, and FID gains.
#Vision#Safety#Benchmarking#DAASH
why featured
HKR-H and HKR-K pass via the stealthy attack hook and an attack-success gain of up to 20.63%. HKR-R is weak: this is academic robustness work, with no product impact, incident tie, or mainstream model deployment angle disclosed.
editor take
DAASH beats AdvAD by up to 20.63% on CIFAR/ImageNet; robustness evals need this kind of meta-attack, not single-Lp comfort tests.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H1·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
ZeroSiam: An Efficient Asymmetry for Test-Time Entropy Optimization without Collapse
ZeroSiam uses a learnable predictor and stop-gradient before the classifier to build an asymmetric Siamese architecture for test-time entropy minimization, preventing dominant-class one-hot collapse; the paper reports empirical and theoretical results on vision adaptation and LLM reasoning tasks, but the snippet does not disclose benchmark counts or exact gains.
#Reasoning#Vision#Inference-opt#ZeroSiam
why featured
HKR-K passes via a concrete test-time entropy optimization mechanism across vision and LLM reasoning. HKR-H/R are weak, and no effect sizes or reproducible setup are disclosed, so it stays in the lower research band.
editor take
ZeroSiam adds predictor plus stop-gradient to stop entropy-collapse; gains are undisclosed, so I’d treat it as a TTA stability patch.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Architecture-Aware Explanation Auditing for Industrial Visual Inspection
The paper audits heatmap explanations on 172k WM-811K wafer maps, where ViT-Tiny with Attention Rollout achieves a Deletion AUC of 0.211 versus 0.432-0.525 for Swin-Tiny, ResNet18+CBAM, and DenseNet121 with Grad-CAM under a three-seed zero-fill perturbation protocol.
#Vision#Interpretability#Benchmarking#WM-811K
why featured
HKR-K passes on dataset size and Deletion AUC comparisons. HKR-H and HKR-R are weak; the niche industrial-vision interpretability angle keeps it below the interesting-news band, with no hard-exclusion rule triggered.
editor take
ViT-Tiny+Attention Rollout hits 0.211 Deletion AUC on 172k wafer maps; RISE near 0.1 keeps native explainers humble.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
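The Deletion AUC figures in the wafer-map item above are areas under a curve of model confidence as the most salient pixels are removed; under the usual convention, a lower area means the explanation pointed at pixels the model actually relied on. The sketch below computes that area with the trapezoid rule on invented confidence curves. The function name and the sample curves are assumptions, and the paper's exact protocol (three seeds, zero-fill) is not reproduced here.

```python
def deletion_auc(scores):
    """Trapezoidal area under a deletion curve.

    scores[i] is the model's confidence after deleting the i-th evenly spaced
    fraction (from 0 to 1) of the most salient pixels.
    """
    if len(scores) < 2:
        raise ValueError("need at least two points on the curve")
    step = 1.0 / (len(scores) - 1)
    return sum((a + b) / 2.0 * step for a, b in zip(scores, scores[1:]))


# Invented curves: confidence collapses quickly vs. lingers.
faithful = deletion_auc([1.0, 0.4, 0.1, 0.0, 0.0])
unfaithful = deletion_auc([1.0, 0.9, 0.8, 0.6, 0.0])
```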
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Fine-grained List-wise Alignment for Generative Medication Recommendation
FLAME frames medication recommendation as sequential single-drug additions or removals. It uses step-wise GRPO with potential-based reward shaping to model DDIs and each drug’s prescription contribution, and the authors report state-of-the-art results on benchmark datasets with code released on GitHub.
#Alignment#Safety#Fine-tuning#FLAME
why featured
HKR-K passes via a concrete mechanism: sequential add/remove decisions, step-wise GRPO, and DDI rewards. HKR-H/R are weak because this is a domain-specific medical recommender paper, not a broad agent/product story.
editor take
FLAME uses single-drug edits plus step-wise GRPO; NeurIPS Spotlight is strong, but real EHR validation decides the value.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
When Does Non-Uniform Replay Matter in Reinforcement Learning?
The paper compares non-uniform replay with uniform sampling in off-policy reinforcement learning and identifies three drivers of gains: replay volume, expected recency, and sampling entropy; its Truncated Geometric replay improves sample efficiency in low-volume regimes across three modern algorithms and five RL benchmark suites.
#Reasoning#Benchmarking#arXiv#Research release
why featured
HKR-K passes with concrete mechanisms and test settings. HKR-H and HKR-R are weak because replay sampling is a narrow RL methods topic with limited practitioner resonance; no hard-exclusion rule is triggered.
editor take
Truncated Geometric replay gains across 3 algorithms and 5 suites at low replay volume; I buy it because recency and entropy are isolated.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
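The replay item above names a Truncated Geometric scheme without giving its parameterization. A minimal sketch, assuming the standard truncated geometric distribution over transition age (age 0 is the newest transition): the probability of age k is proportional to (1 − p)^k, renormalized over the buffer. The function names and the values of `p` and `buffer_size` are illustrative assumptions, not the paper's settings.

```python
import random


def truncated_geometric_probs(buffer_size, p):
    """P(age = k) proportional to (1 - p)**k for k = 0..buffer_size-1."""
    weights = [(1.0 - p) ** k for k in range(buffer_size)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_ages(buffer_size, p, batch_size, rng):
    """Draw a batch of transition ages, favoring recent transitions."""
    probs = truncated_geometric_probs(buffer_size, p)
    return rng.choices(range(buffer_size), weights=probs, k=batch_size)


probs = truncated_geometric_probs(buffer_size=1000, p=0.01)
batch = sample_ages(1000, 0.01, batch_size=32, rng=random.Random(0))
uniform = truncated_geometric_probs(buffer_size=4, p=0.0)   # p = 0 recovers uniform replay
```

A single parameter then trades off expected recency against sampling entropy, two of the drivers the summary lists.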
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
TPV: Parameter Perturbations Through the Lens of Test Prediction Variance
The paper introduces test prediction variance, a label-free first-order sensitivity measure for post-training robustness. TPV covers SGD noise, label noise, quantization, and pruning, proves training-set TPV converges to test-set TPV in the overparameterized limit, and yields JBR, a label-free pruning criterion with code released on GitHub.
#Fine-tuning#Inference-opt#Benchmarking#arXiv
why featured
HKR-K passes with TPV and the JBR pruning criterion; HKR-H is weak and HKR-R is narrow. The item is technical ML theory, not a hard-exclusion, so it sits in the low-value research band.
editor take
TPV unifies 4 perturbation types via first-order sensitivity; I buy JBR more, but model scales are undisclosed.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Visual Timelines of Police Encounters in Body-Worn Camera Footage for OpenBWC
The paper segments body-worn camera footage into 10-second windows, labels each window by operational context and motion intensity, and trains CLIP-frame and optical-flow models; the best test accuracy is 78.75% for context classification and 88.33% for activity intensity classification.
#Vision#Benchmarking#OpenBWC#CLIP
why featured
HKR-K passes on the 10-second windowing method and two accuracy figures. HKR-H/R miss: this is a vertical body-camera vision paper with no product release, open dataset, or practitioner workflow impact disclosed.
editor take
OpenBWC hits 78.75% context accuracy on 10-second windows; bodycam search is becoming engineering, but low-evidence windows decide usability.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
The Laplacian Keyboard: Beyond the Linear Span
The paper introduces Laplacian Keyboard, a hierarchical RL framework that builds a task-agnostic behavior library from Laplacian eigenvectors and trains a meta-policy to stitch behaviors, with theoretical bounds on zero-shot approximation error and empirical gains in sample efficiency over standard RL methods.
#Agent#Reasoning#Research release
why featured
HKR-K passes on a concrete mechanism and theory claim; HKR-H/R are weak. The item is theory-heavy RL with no product, open-source artifact, or reproducible experiment details, so it stays in the low-value research band.
editor take
Laplacian Keyboard builds behavior libraries from eigenvectors; I care about scale, and the RSS omits environments and baselines.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Residual Semantic Decomposition of Word Embeddings
The paper introduces Residual Semantic Decomposition for neural additive decomposition of word embeddings; each K=2 fit extracts one local semantic axis, while residuals expose information not absorbed by that axis.
#Embedding#Interpretability#Research release
why featured
HKR-K passes: RSD decomposes word embeddings via residual semantic axes and gives the K=2 fitting mechanism. HKR-H/R are weak; the post does not disclose scale, benchmark gains, or code, so it stays in all.
editor take
RSD splits GloVe with K=2 semantic axes, but the authors limit residual neighborhoods to diagnostics; don't sell it as sense prediction.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
FIM-LoRA: Task-Informative Rank Allocation for LoRA via Calibration-Time Gradient-Variance Estimation
FIM-LoRA uses eight calibration backward passes before fine-tuning to estimate LoRA-B gradient variance and reallocate rank per layer; on GLUE with DeBERTa-v3-base it scores 88.6 versus 88.7 for LoRA at the same parameter budget.
#Fine-tuning#Inference-opt#LoRA#DeBERTa
why featured
HKR-K passes on a concrete mechanism and reproducible condition, but the reported result does not beat the baseline and the angle is specialist. No hard exclusion; this is a low-value research increment for all.
editor take
FIM-LoRA spends 8 calibration backprops on rank allocation; GLUE 88.6 trails LoRA 88.7, so I don’t buy the upgrade story.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
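The FIM-LoRA item above says rank is reallocated per layer from gradient-variance estimates but does not give the allocation rule. The sketch below assumes the simplest option: keep the total rank budget fixed and split it in proportion to each layer's variance, with a floor of one rank per layer. The function name, the proportional rule, and the sample variances are all hypothetical.

```python
def allocate_ranks(variances, total_rank, min_rank=1):
    """Split a fixed rank budget across layers in proportion to gradient variance."""
    n = len(variances)
    spare = total_rank - min_rank * n
    if spare < 0:
        raise ValueError("total_rank is too small for the per-layer minimum")

    total_var = sum(variances)
    if total_var == 0:
        shares = [spare / n] * n                      # no signal: fall back to uniform
    else:
        shares = [spare * v / total_var for v in variances]

    ranks = [min_rank + int(s) for s in shares]       # floor of each share
    leftover = total_rank - sum(ranks)
    # Hand out the remaining ranks to the largest fractional parts.
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        ranks[i] += 1
    return ranks


ranks = allocate_ranks([1.0, 3.0, 4.0], total_rank=11)   # invented variances
flat = allocate_ranks([0.0, 0.0], total_rank=8)
```

Holding the budget fixed is what makes a comparison against uniform-rank LoRA at the same parameter count meaningful.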
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Automated Knowledge Component Generation for Interpretable Knowledge Tracing in Coding Problems
The paper presents KCGen-KT, an LLM-based pipeline for generating and tagging knowledge components for open-ended programming problems, and evaluates it on two real-world student code submission datasets, where it outperforms existing knowledge tracing methods and human-written KCs for future response prediction.
#Code#Benchmarking#Interpretability#Research release
why featured
HKR-K passes: the paper offers a new pipeline, two real datasets, and a comparison with human KCs. HKR-H/R are weak because knowledge tracing is niche edtech research, so this stays in all.
editor take
KCGen-KT beats human KCs on two real coding datasets; I want leakage checks and course transfer, not abstract confidence.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Sustainable Intelligence for the Wild: Knowledge-Adaptive Edge Expert Agents for Ecological Monitoring
Jiaxing Li and seven coauthors propose an edge expert-agent architecture for ecological monitoring, using a visual encoder plus a dynamic knowledge base instead of cloud-based model retraining; the abstract of the 10-page arXiv paper does not disclose benchmark results or deployment metrics.
#Agent#Vision#RAG#Jiaxing Li
why featured
HKR-K passes on a concrete edge-agent mechanism, but HKR-H/R are weak. The excerpt discloses no benchmark, code, or reproducible result, and ecological monitoring is peripheral for most AI practitioners.
editor take
Li’s 8-author edge-agent paper gives zero benchmarks in 10 pages; I don’t buy “sustainable intelligence” without field power and false-positive rates.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited
arXiv 2605.17017 formulates Behavior Foundation Model task inference as robust minimax optimization, adapting to worst-case dynamics shifts using only offline data from a single nominal environment. The abstract says it outperforms standard BFM and robust offline imitation-learning baselines, but the snippet does not disclose metrics, tasks, or effect sizes.
#Agent#Robotics#Benchmarking#arXiv
why featured
HKR-K passes: the method and perturbation setting are concrete, covering friction, actuator, and sensor noise. HKR-H and HKR-R are weak, so this stays in all rather than featured.
editor take
BFM task inference gets minimax robustness; only the abstract is disclosed, so I discount the “significant” win claims.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
An Assessment of Human vs. Model Uncertainty in Soft-Label Learning and Calibration
The paper re-annotates subsets of MNIST and a synthetic variant to isolate soft-label supervision from label mode shifts, and finds that human soft labels improve calibration on difficult samples and produce more stable convergence across training runs.
#Alignment#Benchmarking#Research release
why featured
HKR-K passes: the paper offers a testable setup and concrete calibration finding. HKR-H and HKR-R are weak, and the item only provides abstract-level detail, so it stays below featured.
editor take
The authors test re-annotated MNIST subsets; narrow scope, but decoupling calibration gains from mislabels is useful for RLHF label-noise audits.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Avoiding Structural Failure Modes in Tabular Fair SSL: Online Primal-Dual Allocation under Confidence Gating
The paper proposes OPDA, an online controller that schedules fairness and entropy stability penalties under confidence-gated pseudo-labeling, and evaluates it on three tabular benchmarks: Adult, ACSIncome, and COMPAS.
#Safety#Benchmarking#Research release#Benchmark
why featured
HKR-K passes: OPDA is a concrete mechanism tested on three tabular fairness benchmarks. HKR-H/R are weak because the title is academic and the practical stakes for AI practitioners are limited.
editor take
OPDA runs on 3 tabular benchmarks and avoids two collapses; I buy the diagnostic, not the calibration-free pitch.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Ordinal Adaptive Correction: A Data-Centric Approach to Ordinal Image Classification with Noisy Labels
The paper proposes ORDAC for correcting noisy labels in ordinal image classification; on Adience with 40% noise, ORDAC_R reduced mean absolute error from 0.86 to 0.62 and raised recall from 0.37 to 0.49.
#Vision#Fine-tuning#Benchmarking#arXiv
why featured
HKR-K passes via a concrete noisy-label correction result, but HKR-H and HKR-R fail: this is a narrow arXiv method paper with no product, open-source tool, or major-model implication.
editor take
ORDAC_R cuts Adience 40% noise MAE to 0.62; for ordinal labels, correcting distributions beats throwing samples away.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Elastic-dLLM: Position-Preserving Context Compression and Augmentation of Diffusion LLMs
Elastic-dLLM proposes position-preserving [MASK] token compression and terminal-aware augmentation for diffusion LLM decoding, targeting full-sequence dLLMs such as LLaDA-8B-Instruct and LLaDA-1.5 and block dLLMs such as LLaDA2.0-mini; the abstract does not disclose concrete speedup numbers or benchmark scores.
#Inference-opt#Reasoning#LLaDA-8B-Instruct#LLaDA-1.5
why featured
HKR-K passes via concrete compression and augmentation mechanisms; HKR-H/R fail because the title is niche and no speedup or cost gain is disclosed. Keep it in all, below featured threshold.
editor take
Elastic-dLLM compresses [MASK] compute across 3 LLaDA models; no speedup numbers, so treat it as an idea paper.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Uncertainty-Calibrated Recommendation Framework for Low-Active Users
The paper introduces an uncertainty-calibrated recommendation framework that applies risk-averse deboosting for LAUs and UCB exploration for HAUs; the abstract says it was validated on a major livestream platform, but the post does not disclose exact improvement numbers.
#Benchmarking#Research release
why featured
A narrow recommender-systems paper: HKR-K passes via the LAU/HAU uncertainty mechanism, while HKR-H and HKR-R are weak. The post says it was tested on a large live-streaming platform but gives no lift numbers, keeping it in the upper low-value band.
editor take
LAUs get deboosting and HAUs get UCB; no lift numbers disclosed, so I’d file this as sensible recsys plumbing, not proof.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
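The recommender item above pairs risk-averse deboosting for low-active users (LAUs) with UCB exploration for high-active users (HAUs). A minimal sketch of that asymmetry, assuming each candidate item has a predicted mean and an uncertainty estimate: subtract a multiple of the uncertainty for LAUs and add it for HAUs. The function name, the shared multiplier `k`, and the numbers are hypothetical; the paper's actual scoring rule is not disclosed in the summary.

```python
def adjusted_score(mean, std, is_low_active, k=1.0):
    """Uncertainty-adjusted ranking score (hypothetical form)."""
    if is_low_active:
        return mean - k * std   # risk-averse deboost: penalize uncertain items
    return mean + k * std       # UCB: optimism in the face of uncertainty


# Same item, same estimate, different user segment.
lau = adjusted_score(0.6, 0.2, is_low_active=True)
hau = adjusted_score(0.6, 0.2, is_low_active=False)
```

The same uncertain item drops in rank for a low-active user and rises for a high-active one, which is the behavior the summary describes.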
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Uncertainty Quantification as a Principled Foundation for Explainable AI: A Case Study of Counterfactual Explanations
The paper uses uncertainty quantification to express core counterfactual explanation properties and builds two explainer variants: one using uncertainty estimates only and one adding feature-space distance; the RSS abstract says experiments compare against many state-of-the-art methods, but it does not disclose datasets, metrics, or exact scores.
#Interpretability#Benchmarking#Research release#Benchmark
why featured
HKR-K passes for a concrete UQ framing and two variants. HKR-H/R are weak: the RSS gives no datasets, metrics, or scores, so this stays a niche academic research item.
editor take
The paper gives 2 UQ counterfactual explainers; datasets, metrics, and scores are undisclosed, so don’t buy “comprehensive experiments” yet.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
XCTFormer: Leveraging Cross-Channel and Cross-Time Dependencies for Enhanced Time-Series Analysis
XCTFormer models pairwise token dependencies across time and channels with CRAB, and on three time-series benchmarks it reports state-of-the-art imputation results, reducing MSE by 20.8% and MAE by 15.3% on average versus the second-best method.
#Reasoning#Benchmarking#XCTFormer#Research release
why featured
HKR-K passes via CRAB plus 3 benchmark gains, but HKR-H and HKR-R fail: this is a niche time-series imputation paper with no product, agent, or industry rivalry hook.
editor take
XCTFormer cuts imputation MSE 20.8% across 3 benchmarks; without latency and memory tables, CRAB still feels under-proven.
HKR breakdown
hook knowledge resonance
open source
54
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Bi-Level Chaotic Fusion Based Graph Convolutional Network for Stock Market Prediction Interval
The paper proposes a bi-level chaotic fusion graph convolutional network for stock-market prediction intervals, testing it on 43 NSE companies across eight sectors from 2016 to 2026 and reporting 96.6% PICP, a 0.0778 Winkler score, 0.1407 PIAW, and p < 0.001 significance versus LSTM, GRU, GCN, and HGNN baselines.
#Benchmarking#NSE#Research release#Benchmark
why featured
HKR-K passes on concrete method and metrics, but HKR-H is weak and HKR-R is narrow for AI practitioners. No hard exclusion is triggered, so it sits in the low-value research-update band.
editor take
BCF-GCN reports 96.6% PICP on 43 NSE stocks; I don’t buy finance forecasting papers without costs and rolling backtests.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
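The three interval metrics quoted in the stock-market item above have standard definitions: PICP is the fraction of targets that fall inside the interval, PIAW is the mean interval width, and the Winkler score is the width plus a penalty of 2/α times the miss distance when the target falls outside. The sketch below uses those textbook forms on invented data; the paper may use normalized variants, and the function name and `alpha` are assumptions.

```python
def interval_metrics(y_true, lower, upper, alpha=0.05):
    """Return PICP, PIAW, and mean Winkler score for (1 - alpha) prediction intervals."""
    n = len(y_true)
    covered = 0
    width_sum = 0.0
    winkler_sum = 0.0
    for y, lo, hi in zip(y_true, lower, upper):
        width = hi - lo
        width_sum += width
        if lo <= y <= hi:
            covered += 1
            winkler_sum += width                               # no penalty when covered
        elif y < lo:
            winkler_sum += width + (2.0 / alpha) * (lo - y)    # miss below the interval
        else:
            winkler_sum += width + (2.0 / alpha) * (y - hi)    # miss above the interval
    return {"picp": covered / n, "piaw": width_sum / n, "winkler": winkler_sum / n}


# Invented targets and intervals; the third target falls below its interval.
m = interval_metrics(
    y_true=[1.0, 2.0, 3.0, 4.0],
    lower=[0.5, 1.5, 3.2, 3.5],
    upper=[1.5, 2.5, 4.0, 4.5],
    alpha=0.1,
)
```

High coverage alone is easy to reach with wide intervals, so PICP is only informative when read together with width and the Winkler score.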
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Statistical Limits and Efficient Algorithms for Differentially Private Federated Learning
The paper proposes FedHybrid and FedNewton for differentially private federated M-estimation, gives finite-sample MSE upper bounds and a minimax lower bound as functions of client count, local sample size, privacy budget, and iterations, and evaluates logistic regression and neural networks on MNIST and CIFAR-10.
#Fine-tuning#Benchmarking#Research release
why featured
HKR-K passes via new algorithms and statistical bounds. HKR-H/R miss: the story is specialized learning theory with weak product implications, so it stays in the low-value research band.
editor take
FedHybrid and FedNewton get MSE bounds; FedNewton’s fewer-round claim hinges on slow client growth, but the snippet gives no threshold.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Research Presents Transformer Model for Unified Lagrangian Particle Dynamics Simulation
The paper presents a single Transformer-based particle simulator using a prediction-correction design to model six dynamics categories, including cloth, elastic solids, Newtonian and non-Newtonian fluids, granular materials, and molecular dynamics.
#Reasoning#Research release
why featured
Triggers hard-exclusion-4: a physics/molecular-dynamics simulation paper with no agent, product, or practitioner on-ramp disclosed. Only HKR-K passes, so the score is capped and excluded.
editor take
WorldParticle runs six particle dynamics classes with one Transformer; don’t retire solvers yet—the abstract gives no error or compute bill.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
UNR-Explainer: Counterfactual Explanations for Unsupervised Node Representation Learning Models
The paper introduces UNR-Explainer, a Monte Carlo Tree Search method for counterfactual explanations in unsupervised node representation learning; it identifies subgraphs whose perturbation changes a target node’s k-nearest neighbors in embedding space, and the abstract reports tests across diverse datasets for unsupervised GraphSAGE and DGI without disclosing dataset names or metrics.
#Interpretability#Embedding#Benchmarking#Research release
why featured
HKR-K passes through a concrete method and evaluation target; HKR-H and HKR-R are weak. The graph representation focus is specialized, so this stays as a low-weight research item.
editor take
UNR-Explainer uses MCTS to perturb subgraphs and track kNN shifts; no datasets or metrics disclosed, so “superior” is unearned.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Compositional Generalization in Continual Few-Shot Learning
The paper proposes a dual-phase framework for continual few-shot learning: training optimizes slot representations for holistic class identity, while inference dynamically composes preserved slots for novel scenes; the abstract claims state-of-the-art unseen-concept generalization and minimal forgetting, but the RSS snippet does not disclose benchmark names or numerical results.
#Vision#Fine-tuning#Benchmarking#Research release
why featured
HKR-K passes on the testable slot-training/inference mechanism, but benchmark names and scores are not disclosed. HKR-H and HKR-R are weak, so this stays a low-value research item.
editor take
The paper discloses a two-phase slot setup, but no benchmarks or numbers; I don’t buy the SOTA claim yet.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
SAS: Semantic-aware Sampling for Generative Dataset Distillation
The paper introduces SAS, a semantic-aware post-sampling method for generative dataset distillation, using CLIP as a semantic prior with 3 scoring functions and a two-stage selection strategy.
#Vision#Embedding#Fine-tuning#CLIP
why featured
HKR-K passes on a concrete mechanism, but the post gives no accuracy, compression, or cost numbers. As a niche algorithm paper with weak HKR-H/R, it stays in all.
editor take
SAS adds CLIP post-sampling to distilled image pools; gains are undisclosed, so I buy the filter—not a distillation breakthrough.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
AIM: Adversarial Information Masking for Faithfulness Evaluation of Saliency Maps
The paper proposes AIM, a saliency-guided adversarial feature replacement framework that evaluates saliency-map faithfulness and masking-operator reliability across image, audio, and EEG tasks, comparing degradation under complementary masking orders and measuring random-attribution bias plus stability of faithfulness rankings.
#Interpretability#Vision#Audio#Research release
why featured
HKR-K passes: AIM offers a testable saliency-faithfulness evaluation mechanism across image, audio, and EEG. HKR-H/R fail because the angle is niche research with no product or industry spread.
editor take
AIM tests masking bias across image, audio, and EEG; saliency papers still using zero masks now look lazy.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Universal Time-Series Representation Learning: A Survey
The arXiv survey organizes universal time-series representation learning methods around three fundamental design elements, reviews prior studies under that taxonomy, and summarizes common experimental setups, datasets, future research directions, and an associated GitHub resource.
#Benchmarking#arXiv#Research release
why featured
HKR-K passes because the survey packages a 3-element framework and resource list. HKR-H and HKR-R fail: it is a routine arXiv survey with no product impact, model release, or practitioner nerve beyond time-series specialists.
editor take
arXiv 2401.03717v4 uses a 3-part taxonomy; the GitHub list matters more, but benchmark coverage is undisclosed.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG· atomEN04:00 · 05·19
Towards Principled Test-Time Adaptation for Time Series Forecasting
The paper proposes a TSF-TTA protocol that uses only matured ground truth and introduces FAC, which parameterizes prediction corrections in the frequency domain; across datasets, forecasting horizons, and source forecasters, the abstract reports consistent competitive performance with substantially fewer trainable parameters.
#Fine-tuning#Inference-opt#Research release
why featured
HKR-K passes via the TSF-TTA protocol and FAC frequency-domain correction, but HKR-H and HKR-R miss: the angle is narrow research with no product or industry conflict. The lower-band default keeps it browseable under "all".
editor take
FAC uses only matured ground truth for TSF-TTA; parameter savings lack numbers, so the protocol cleanup is the useful part.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for MLLMs
The paper presents an OCR-aware multilingual multimodal training framework using synthetic OCR-to-translation data, LoRA-based supervised fine-tuning, and structured visual chain-of-thought prompting, but the RSS abstract does not disclose dataset size, benchmark scores, or numerical gains.
#Multimodal#Vision#Fine-tuning#LLaMA
why featured
HKR-K passes for a concrete method mix; HKR-H and HKR-R fail, and the summary gives no dataset size, metrics, or artifact. This is browseable multimodal OCR research, not a featured item.
editor take
LoRA SFT claims stronger multilingual OCR, but no data size or scores; I don’t buy qualitative GPT-5/Gemini comparisons.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
DAD4TS: Data-Augmentation-Oriented Diffusion Model for Time-Series Forecasting with Small-Scale Data
DAD4TS uses a diffusion model and reinforcement learning to generate augmented time-series samples for small-scale forecasting, and the paper evaluates it against 7 comparison methods across 6 real-world datasets and 8 time-series models, reporting gains on 5 of the datasets.
#Fine-tuning#Benchmarking#DAD4TS#Research release
why featured
HKR-K passes on a concrete benchmark setup: 6 datasets, 8 models, 7 baselines, with gains on 5 datasets. HKR-H and HKR-R miss; this is niche time-series augmentation research with no product, ecosystem, or open-source signal.
editor take
DAD4TS tests 8 models on 6 datasets; I’d inspect the 1 failure first—augmentation papers often hide there.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Research paper proposes nested spatio-temporal time series forecasting framework
The paper proposes a nested forecasting framework that uses spectral clustering to build macro regions and a progressive coarse-to-fine predictor to inject future trend signals into micro-level spatiotemporal time-series forecasts.
#Reasoning#Research release
why featured
HKR-K passes on the nested mechanism, but HKR-H/R fail: the title is dry and the post gives no metrics or deployment stakes. Narrow ML-research signal; no hard exclusion, so it stays in the 40–59 band.
editor take
NestedST uses spectral clustering for macro regions, but no datasets or gains are disclosed; I’d inspect the noise-filtering proof first.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Modelling Customer Trajectories with Reinforcement Learning for Practical Retail Insights
The paper proposes a maximum-entropy reinforcement-learning model for customer trajectories and evaluates it on real convenience-store trajectory data; actual customer paths deviate from shortest paths by 28% on average, and RL-generated paths outperform TSP and PNN for impulse purchase rates, shelf traffic density, and product repositioning decisions.
#Agent#Reasoning#arXiv#GitHub
why featured
HKR-K passes because the paper gives a testable 28% path-deviation result and code. HKR-H/R fail: it is a niche retail RL paper, with no foundation-model, agent-product, or broad practitioner impact.
editor take
Real paths deviate 28% from shortest paths, and RL beats TSP/PNN; single convenience-store data keeps the claim narrow.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
FlowMixer: A Depth-Agnostic Neural Architecture for Interpretable Spatiotemporal Forecasting
FlowMixer uses a single non-negative matrix mixing layer inside a reversible mapping framework to model spatiotemporal patterns, and its semi-group property supports algebraic prediction-horizon manipulation without retraining; the RSS abstract says experiments match state-of-the-art methods but does not disclose datasets, metrics, or numeric results.
#Interpretability#Reasoning#FlowMixer#Research release
why featured
HKR-K passes: semi-group-based horizon changes without retraining are testable. HKR-H/R fail; no experiment data is disclosed, and the niche forecasting angle keeps it in the low-value research band.
editor take
FlowMixer discloses one non-negative mixing layer and a semi-group trick; no datasets, metrics, or numbers, so don’t buy SOTA yet.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
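The "algebraic prediction-horizon manipulation" in the FlowMixer item above can be illustrated with a toy linear map. This is a minimal sketch, assuming the one-step predictor is a single non-negative matrix; `MIX`, `matpow`, and `apply_map` are hypothetical names, and FlowMixer's actual layer and reversible mapping are not described in the abstract.

```python
# Toy illustration of a semi-group property: k one-step predictions
# equal one application of the k-th matrix power. Not FlowMixer's code.

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(m, steps):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = matmul(result, m)
    return result

def apply_map(m, x):
    """Apply matrix m to state vector x."""
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]

# Hypothetical non-negative mixing matrix (assumption, for illustration only).
MIX = [[0.9, 0.1],
       [0.2, 0.8]]

state = [1.0, 0.0]

# Three one-step predictions applied in sequence...
stepwise = state
for _ in range(3):
    stepwise = apply_map(MIX, stepwise)

# ...match a single application of the 3-step operator, with no retraining.
direct = apply_map(matpow(MIX, 3), state)
```

Under this assumption, changing the horizon is a matter of choosing a different power of the same learned operator.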
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
MedMIX: Modality-Internal Expert Fusion for Multimodal Medical Diagnosis
MedMIX evaluates a multimodal medical prediction framework on three benchmarks—OpenI, MIMIC-IV-MM, and MMIST-ccRCC—using intra-modality small-expert embedding aggregation, learned fusion over available modalities, and training-only large-teacher collaboration with no added inference cost.
#Multimodal#Fine-tuning#Inference-opt#MedMIX
why featured
HKR-K passes because the paper names concrete fusion mechanisms and three benchmarks. HKR-H/R fail: it is a narrow medical ML paper with no disclosed gains and limited practitioner resonance.
editor take
MedMIX reports 3 medical benchmarks; gains are undisclosed, so I’d file it under robustness engineering, not diagnostic breakthrough.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
FLEX-MoE: Federated Mixture-of-Experts with Load-Balanced Expert Assignment for Edge Computing
FLEX-MoE jointly optimizes expert assignment and load balancing for federated MoE on edge networks, using client-expert fitness scores from training feedback and an optimization-based algorithm to enforce balanced expert utilization under limited client capacity.
#Fine-tuning#Inference-opt#Research release
why featured
HKR-K passes for a concrete federated MoE assignment mechanism, but HKR-H/R are weak: no result numbers, artifact, or broader practitioner stakes are disclosed.
editor take
FLEX-MoE assigns experts via training feedback; no accuracy numbers disclosed, so treat it as an edge-FL engineering candidate.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Tensor Cookbook: Mastering Tensors through Diagrams
arXiv 2605.16610v1 presents a self-contained tensor network guide that uses diagrams to express tensor contractions, decompositions, gradient derivations, and operations on high-dimensional probability distributions.
#Reasoning#arXiv#Research release
why featured
HKR-K passes because it offers concrete diagrammatic mechanisms for tensor networks. HKR-H/R fail: this is a niche math tutorial, with weak industry signal for AI practitioners.
editor take
arXiv 2605.16610v1 offers a self-contained tensor-network diagram guide; no experiments disclosed, but ML notation badly needs this cleanup.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
DeMa: Dual-Path Delay-Aware Mamba for Efficient Multivariate Time Series Analysis
DeMa applies a dual-path Mamba backbone to multivariate time series analysis across five task types, decomposing intra-series dynamics and inter-series interactions while using delay-aware linear attention to model cross-variate dependencies under Mamba’s linear-complexity design.
#Reasoning#Inference-opt#Benchmarking#DeMa
why featured
HKR-K passes because the paper states a concrete architecture and evaluation setup, but HKR-H/R fail: the angle is niche and lacks result numbers, code, or product implications. No hard-exclusion rule is strong enough to cap it below 40.
editor take
DeMa spans 5 MTS task types; no SOTA numbers are disclosed, so don’t crown dual-path Mamba over Transformers yet.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
S2Aligner: Efficient Transferable Pre-Training for Sparse Text-Attributed Graphs
S2Aligner decouples graph-text representations into semantic and structural components, then uses a global-domain density ratio and graph reliability estimation to reduce cross-domain risk for sparse text-attributed graphs.
#Embedding#Fine-tuning#S2Aligner#Research release
why featured
hard-exclusion-technical-accessibility applies: sparse text-attributed graph pre-training is specialist graph ML, with no product, agent, or industry hook disclosed. Only HKR-K passes, so the score is capped at 39.
editor take
S2Aligner tackles sparse TAG pretraining in 19 pages; gains are undisclosed here, so I’d test it on real missing-text graphs first.
HKR breakdown
hook knowledge resonance
open source
51
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
A Feature-Driven Framework for Software Fault Prediction
The study evaluates 4 feature-selection methods and 3 hyperparameter-tuning techniques for software fault prediction, where CFS plus GA with random forest reaches 88.40% accuracy, 18% above baselines without feature selection or tuning, with cross-validation variability within ±1.0%.
#Code#Benchmarking#Research release#Benchmark
why featured
HKR-K passes on concrete methods and an 88.40% result. HKR-H and HKR-R miss: this is a narrow software-fault-prediction benchmark, not a product, model, or developer-workflow story.
editor take
CFS+GA+RF hits 88.40% accuracy. For SFP, this is feature engineering doing the work, not a model leap.
HKR breakdown
hook knowledge resonance
open source
50
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
MSTN: A Lightweight and Fast Model for General TimeSeries Analysis
MSTN reports new best results on 21 of 27 time-series datasets, with about 0.40M parameters for MSTN-BiLSTM and about 1.06M for MSTN-Transformer, using a multi-scale convolutional encoder, recurrent or attention sequence modeling, and self-gated fusion.
#Benchmarking#Sumit S Shevtekar#Chandresh K Maurya#Research release
why featured
HKR-K passes on the 21/27 dataset result and 0.40M/1.06M parameter counts; HKR-H/R are weak. The paper is specialized time-series ML with no deployment, open-source, or LLM/agent link, so it stays in the low-value band.
editor take
MSTN claims SOTA on 21/27 datasets; at 0.40M params, time-series baselines look embarrassingly bloated.
HKR breakdown
hook knowledge resonance
open source
49
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning
The paper proposes AES, an adaptive entropy scheduling method that adjusts entropy coefficients or temperature online using observable drift proxies, and reports lower drift-induced performance degradation plus faster recovery across 4 algorithm variants, 12 tasks, and 4 drift modes.
#Reasoning#Benchmarking#arXiv#Research release
why featured
HKR-K passes because the summary gives a mechanism and test scope; HKR-H/R are weak. Non-stationary RL entropy scheduling is a narrow research item with no product or agent adoption angle, so it stays in the low-value research band.
editor take
AES tunes entropy across 4 algorithms, 12 tasks, 4 drift modes; I buy the direction, but gains are undisclosed.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Federated Learning by Utility-Constrained Stochastic Aggregation for Improving Rational Participation
The paper introduces FedUCA, a federated learning framework that models the server as an optimizer and uses utility-constrained stochastic aggregation to sustain rational client participation; the abstract says standard-dataset experiments improve client retention and global model performance, but the post does not disclose specific numbers.
#Fine-tuning#Benchmarking#FedUCA#Research release
why featured
HKR-K passes: FedUCA adds a concrete utility-constrained stochastic aggregation mechanism, but the abstract gives no retention or performance numbers. HKR-H and HKR-R are weak, so this stays a low-value research signal.
editor take
FedUCA puts client retention into aggregation constraints; no numbers disclosed, so I buy the setup, not the “significant” win.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
JSON-Bag: A Generic Game Trajectory Representation
The paper introduces JSON-Bag to represent game trajectories by tokenizing JSON descriptions, then evaluates JSD with prototype-based nearest-neighbor search across 6 tabletop games and 3 classification tasks.
#Benchmarking#Research release
why featured
Only HKR-K passes: the paper gives a concrete representation and evaluation setup, but the angle is a niche academic format proposal without product, open-source, or practitioner competition hooks.
editor take
JSON-Bag spans 6 tabletop games and 3 tasks; I like the ugly baseline, but token distance is not policy understanding.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
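The JSON-Bag pipeline described above (tokenize JSON, compare bags with JSD, classify by nearest prototype) can be sketched roughly as follows. The `path=value` tokenization and the toy trajectories are assumptions for illustration; the paper's actual tokenizer and prototype construction are not given in the summary.

```python
import json
import math
from collections import Counter

def json_tokens(obj, prefix=""):
    """Flatten a JSON value into 'path=value' string tokens (assumed scheme)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from json_tokens(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        for item in obj:
            yield from json_tokens(item, prefix)
    else:
        yield f"{prefix.rstrip('.')}={obj}"

def bag(trajectory_json):
    """Normalized token-frequency distribution for one JSON trajectory string."""
    counts = Counter(json_tokens(json.loads(trajectory_json)))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: c / total for tok, c in counts.items()}

def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two token distributions."""
    keys = set(p) | set(q)
    mix = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in keys}

    def kl(a):
        return sum(a[t] * math.log2(a[t] / mix[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

def nearest_prototype(query, prototypes):
    """Return the label whose prototype bag is closest to the query in JSD."""
    return min(prototypes, key=lambda label: jsd(query, prototypes[label]))

# Hypothetical toy trajectories, one prototype per class.
prototypes = {
    "aggressive": bag('[{"action": "attack"}, {"action": "attack"}, {"action": "draw"}]'),
    "passive": bag('[{"action": "draw"}, {"action": "pass"}, {"action": "pass"}]'),
}
query = bag('[{"action": "attack"}, {"action": "draw"}]')
label = nearest_prototype(query, prototypes)
```

Because the bag discards token order, this measures overlap in what happened, not in why, which is the limitation the editor take points at.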
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Investigation into In-Context Learning Capabilities of Transformers
The paper tests Transformer in-context learning on Gaussian-mixture binary classification tasks, controlling input dimension, number of in-context examples, and number of pre-training tasks.
#Reasoning#Benchmarking#Frei#Vardi
why featured
HKR-K passes via a concrete experimental setup and three controlled factors. HKR-H and HKR-R are weak, and Gaussian-mixture ICL mechanism work sits far from product practice, so it stays in the low-value research band.
editor take
This only sweeps three variables on Gaussian-mixture binary tasks; I wouldn’t generalize it to real ICL, but the failure map is useful.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
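A rough sketch of how the three controlled factors in the in-context-learning item above (input dimension, number of in-context examples, number of pre-training tasks) might enter a synthetic data generator. The specific construction here (labels in {-1, +1}, inputs drawn as label times a per-task mean plus unit Gaussian noise) is an assumption; the summary does not spell out the paper's exact mixture.

```python
import random

def sample_task(dim, n_context, rng, signal=1.0):
    """Draw one binary Gaussian-mixture task: context examples plus a query."""
    # Per-task mean direction, rescaled to length `signal` (assumed setup).
    mu = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(v * v for v in mu) ** 0.5
    mu = [signal * v / norm for v in mu]

    examples = []
    for _ in range(n_context + 1):
        y = rng.choice((-1, 1))
        x = [y * m + rng.gauss(0.0, 1.0) for m in mu]
        examples.append((x, y))

    context, query = examples[:-1], examples[-1]
    return context, query

def make_pretraining_set(n_tasks, dim, n_context, seed=0):
    """Build a list of independent tasks; each sweep varies one argument."""
    rng = random.Random(seed)
    return [sample_task(dim, n_context, rng) for _ in range(n_tasks)]

tasks = make_pretraining_set(n_tasks=4, dim=3, n_context=5, seed=0)
```

Sweeping `dim`, `n_context`, and `n_tasks` one at a time gives the kind of three-variable grid the editor take refers to.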
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Understanding Self-Supervised Learning via Latent Distribution Matching
The paper formulates self-supervised learning as latent distribution matching, using alignment to maximize latent log-probability and uniformity to maximize entropy, then derives a nonlinear sampling-free Bayesian filtering model with a Kalman-based predictor and proves predictive LDM identifies nonlinear latent representations under mild assumptions.
#Research release
why featured
HKR-K passes because the paper offers a concrete theoretical mechanism and identifiability claim. HKR-H/R fail: it is narrow SSL theory with no model release, tool, or industry-facing consequence, so it stays in the lower research band.
editor take
LDM unifies ICA, contrastive, non-contrastive, and predictive SSL; I buy the theory map, not the new-method guidance yet.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Automatic Unsupervised Ensemble Outlier Model Selection--Extended Version
MetaEns learns marginal ensemble gains from labeled meta-datasets, then combines that signal with diversity-aware discounting and family-level risk regularization at test time to greedily select compact outlier-detection ensembles across 39 real-world datasets without ground-truth labels.
#Benchmarking#MetaEns#Research release#Benchmark
why featured
HKR-K passes via concrete mechanisms and 39 real datasets; HKR-H/R fail because the title is academic and the use case is narrow. No hard exclusion, but this is niche ML research, so it stays in the low-value band.
editor take
MetaEns tests on 39 datasets with fewer detectors; I buy the direction, but no AP lift or ensemble size is disclosed.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Improving Random Forests by Smoothing
The paper proposes a kernel smoothing mechanism for piecewise-constant random forest outputs and releases code, datasets, and experiment results; its experiments report more consistent predictive performance in data-scarce settings.
#Benchmarking#Research release#Open source
why featured
HKR-K passes on a concrete smoothing mechanism plus code/data/results, but HKR-H and HKR-R fail: this is a niche classical-ML methods paper, not a model, agent, or product story. Score stays in the 40–59 band.
editor take
SmoothedRandomForest adds kernel smoothing to RF outputs; gains lack numbers in the snippet, so I file it as a useful old-model patch.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
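To make "kernel smoothing of piecewise-constant outputs" in the random-forest item above concrete, here is a minimal 1-D sketch. It uses a generic Gaussian-kernel weighted average over a grid of anchor points; the step function, grid, and bandwidth are all hypothetical, and the paper's actual kernel and bandwidth selection are not described in the snippet.

```python
import math

def step_predictor(x):
    """Stand-in for a piecewise-constant forest output on a 1-D input."""
    if x < 1.0:
        return 0.0
    if x < 2.0:
        return 1.0
    return 3.0

def smoothed(predict, x, anchors, bandwidth):
    """Gaussian-kernel weighted average of predict() over the anchor points."""
    weights = [math.exp(-0.5 * ((x - a) / bandwidth) ** 2) for a in anchors]
    total = sum(weights)
    return sum(w * predict(a) for w, a in zip(weights, anchors)) / total

anchors = [i / 10 for i in range(31)]  # hypothetical grid on [0, 3]

# The raw predictor jumps by a full unit across the split at x = 1...
raw_jump = step_predictor(1.01) - step_predictor(0.99)

# ...while the smoothed version changes only slightly over the same interval.
smooth_jump = (smoothed(step_predictor, 1.01, anchors, 0.3)
               - smoothed(step_predictor, 0.99, anchors, 0.3))
```

The bandwidth trades off fidelity to the forest's splits against smoothness; how the paper sets it in data-scarce regimes would need checking against the released code.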
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Boundedly Rational Meta-Learning in Sequential Consumer Choice
The researchers designed a hierarchical airline-route choice task and found that BRMDP(1), a boundedly rational meta dynamic programming policy using one hyper-posterior draw, fits trial-by-trial human choices better than both no-transfer and fully integrated Bayesian meta-learning benchmarks.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via a concrete experiment and baselines, while HKR-H/R fail. This is a niche academic paper summary with no product, agent, or industry consequence, so it lands in the low-value non-noise band.
editor take
BRMDP(1) beats no-transfer and full Bayes; I buy the coarse-transfer story, not the fantasy of exact integration.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Deep Reinforcement Learning Framework for Diversified Portfolio Management Across Global Equity Markets
The study uses Soft Actor-Critic to learn continuous portfolio weights and evaluates five configurations with walk-forward optimization across 16 out-of-sample folds from 2003 to 2026 on the Nasdaq-100, Nikkei 225, and Euro Stoxx 50.
#Agent#Reasoning#Benchmarking#Nasdaq-100
why featured
HKR-K passes on concrete method and evaluation details; HKR-H/R are weak. This is a niche quant-finance RL paper with no model, product, or open-source impact, so it sits in the 40–59 band.
editor take
SAC only clears Euro Stoxx 50 across 16 out-of-sample folds; the global-allocation story smells like regional overfit.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
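Walk-forward evaluation, as used in the portfolio item above, rolls a training window forward and tests on the period immediately after it. The sketch below is a generic fold generator; the 8-year training window and 1-year test window are assumptions chosen only so that the toy call yields 16 folds over 2003 to 2026, and the paper's real window lengths are not stated in the summary.

```python
def walk_forward_folds(start_year, end_year, train_years, test_years):
    """Return a list of (train_range, test_range) inclusive year tuples."""
    folds = []
    test_start = start_year + train_years
    while test_start + test_years - 1 <= end_year:
        train = (test_start - train_years, test_start - 1)
        test = (test_start, test_start + test_years - 1)
        folds.append((train, test))
        test_start += test_years  # roll forward by one test period
    return folds

# Hypothetical window sizes (assumption, not from the paper).
folds = walk_forward_folds(2003, 2026, train_years=8, test_years=1)
```

Each test period lies strictly after its training window, which is what makes the folds out-of-sample.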
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Preparation of Fractal-Inspired Computational Architectures for Advanced Large Language Model Analysis
The paper proposes FractalNet, a recursive fractal-template framework that generated and evaluated over 1,200 CNN architectures on CIFAR-10, using PyTorch SGD with AMP and gradient checkpointing, and reported 60-70% average validation accuracy and 80.18% peak accuracy after five training epochs.
#Vision#Benchmarking#Inference-opt#Research release
why featured
HKR-K passes with concrete counts and CIFAR-10 results, but HKR-H and HKR-R are weak. The CNN-on-small-benchmark angle is far from current LLM or agent product concerns, so it stays in the low-value research band.
editor take
FractalNet tested 1,200 CNNs and hit 80.18% on CIFAR-10 after 5 epochs; the LLM-analysis framing is unsupported.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
MV-Gate: Insider Threat Detection via Multi-View Behavioral Statistics and Semantic Modeling
MV-Gate builds three aligned behavioral sequences—activity tokens, multi-scale status signals, and frequency-deviation signals—and evaluates insider-threat detection on CERT r4.2, CERT r5.2, and ADFA-LD, with the RSS snippet claiming gains over classical, deep-learning, and domain-specific baselines but not disclosing exact metrics.
#Safety#Benchmarking#MV-Gate#CERT
why featured
HKR-K passes: the summary gives three modeling signals and CERT r4.2, CERT r5.2, ADFA-LD as evaluation settings. HKR-H/R are weak, and the item is a niche security paper, so it stays in all.
editor take
MV-Gate tests on CERT r4.2, r5.2, and ADFA-LD; no metrics disclosed, so I don’t buy the “notable gains” yet.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Stable Routing for Mixture-of-Experts in Class-Incremental Learning
The paper proposes StaR-MoE for expandable MoE in class-incremental learning, using sensitivity-aware routing alignment and asymmetric capacity regularization to preserve old-class routing and use new experts, with experiments on four standard CIL benchmarks reporting higher average and last accuracy than prior methods.
#Fine-tuning#Inference-opt#Benchmarking#Research release
why featured
HKR-K passes: the post names StaR-MoE, two routing/capacity mechanisms, and results on 4 CIL benchmarks. HKR-H/R are weak, so this stays a low-value research item rather than featured.
editor take
StaR-MoE improves average and last accuracy on 4 CIL benchmarks; routing drift is a real fix, but RSS gives no margins.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Federated Nested Learning: Collaborative Training of Self-Referential Memories for Test-Time Adaptation
The paper proposes FedNL, reformulating federated learning as a three-level nested optimization system with Titans-based linear attention, and tests it on Non-IID MMLU and long-context benchmarks; the abstract reports competitive short-context reasoning, improved long-context retrieval and streaming cross-entropy, and constant inference memory, but does not disclose exact scores.
#Memory#Reasoning#Inference-opt#FedNL
why featured
HKR-K passes for FedNL’s three-level nested optimization, but HKR-H/R are weak. The post gives no scores and stays in specialist federated-learning/test-time-adaptation territory, so it sits in the lower research band.
editor take
FedNL casts FL as three-level nested optimization; no scores disclosed, so I file it as neat framing, weak evidence.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Spherical Harmonic Optimal Transport for Climate Model Comparison
The paper proposes a spherical harmonic Sinkhorn algorithm for comparing measures on the 2-sphere, requiring O(n) memory and O(n^(3/2)) time per iteration, and validates its computational efficiency on synthetic data while discussing use in global climate model evaluation.
#Benchmarking#arXiv#Research release
why featured
HKR-K passes on algorithmic complexity, but HKR-H/R fail. hard-exclusion-1/4 applies: deep numerical methods plus climate-model comparison without agent or product implications, so the score is capped below 40.
editor take
Spherical harmonic OT claims O(n^(3/2)) time and O(n) memory per step; climate eval needs runnable sphere metrics, not prettier scores.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
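For context on the complexity claim in the item above, here is the textbook dense Sinkhorn iteration for entropy-regularized optimal transport. It is not the paper's spherical harmonic variant: this baseline stores a full n-by-m kernel, so memory and per-iteration time both scale with n times m, which is the cost the O(n) and O(n^(3/2)) figures are set against. The toy histograms and cost matrix are hypothetical.

```python
import math

def sinkhorn(a, b, cost, epsilon, iterations=200):
    """Entropy-regularized OT plan between histograms a and b (dense baseline)."""
    n, m = len(a), len(b)
    # Gibbs kernel: the dense n x m object a structured method would avoid storing.
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)]
              for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical two-point example.
a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0],
        [1.0, 0.0]]
plan = sinkhorn(a, b, cost, epsilon=0.5)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(2)]
```

At convergence the plan's row sums match `a` and its column sums match `b`.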
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Transfer Learning for Customized Car Racing Environments
The paper trains an agent on one OpenAI Car Racing circuit and evaluates customized target tracks through zero-shot transfer or additional fine-tuning; its abstract says model-based methods outperform and converge faster than model-free methods, but the post does not disclose lap-time numbers or benchmark tables.
#Agent#Fine-tuning#Benchmarking#OpenAI
why featured
HKR-K passes on a testable claim: model-based transfer performs better and converges faster on custom tracks. HKR-H and HKR-R fail, and the post lacks lap-time or convergence numbers, so it stays in the low-value keep band.
editor take
The paper gives a Car Racing transfer setup but no lap-time table; I wouldn’t overbuy the model-based win yet.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
A3B2: Adaptive Asymmetric Adapter for Branch Bias in Few-Shot Vision-Language Classification
The paper proposes A3B2, an adaptive asymmetric adapter that uses UAAD to suppress image-branch adaptation under high prediction uncertainty, and evaluates it on 3 few-shot image classification tasks across 11 datasets against 11 prompt- and adapter-based baselines.
#Vision#Multimodal#Fine-tuning#CLIP
why featured
A narrow VLM few-shot classification paper. HKR-K passes via the UAAD mechanism and 11-dataset evaluation; HKR-H/R fail because the title is academic and lacks product or industry stakes.
editor take
A3B2 tests 3 few-shot tasks across 11 datasets. UAAD’s uncertainty gate is a sane CLIP adapter default.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
An Efficient Machine Learning-based Framework for Detection and Prevention of Frauds in Telecom Networks
The paper evaluates telecom fraud detection on a Telecom CDR dataset with 101,174 customer records and 8,830 fraud cases; Random Forest reached 99.9% accuracy, precision, recall, and F1 after missing-value handling, Min-Max scaling, and SMOTE balancing.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via dataset size and Random Forest 99.9% metrics. HKR-H/R are weak: this is applied ML for telecom risk, with no LLM, agent, or product implication, so it stays in the low-value research band.
editor take
RF hit 99.9% F1 on 101,174 CDR records; after SMOTE, I’d audit leakage before trusting this.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
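The numbers in the telecom-fraud item above are easier to judge next to the class base rate. The sketch below computes the fraud rate from the two counts in the summary and shows how precision, recall, and F1 follow from confusion counts; the confusion counts themselves are hypothetical. A general caution, not a finding about this paper: if SMOTE is applied before the train/test split, synthetic points interpolated from test-set neighbors can leak into training, and the summary does not say which order was used.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts taken from the summary.
records, frauds = 101_174, 8_830
fraud_rate = frauds / records

# Accuracy of a trivial "never fraud" classifier on the unbalanced data.
majority_accuracy = 1 - fraud_rate

# Hypothetical confusion counts, only to show the metric arithmetic.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

With roughly 8.7% fraud cases, a classifier that never flags fraud already scores about 91% accuracy, so recall and F1 on untouched held-out data carry more weight than the accuracy figure.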
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Cross-modal Affinity-aligned Multimodal Learning Analytics for Predicting Student Collaboration Satisfaction in Game-Based Learning
The researchers propose AAMLA, using the CAMA module to align facial action units, head pose, eye gaze, and interaction logs on data from 50 middle school students to predict collaboration satisfaction in the EcoJourneys game-based learning environment.
#Multimodal#Embedding#Interpretability#EcoJourneys
why featured
A narrow arXiv learning-analytics paper: HKR-K passes via the AAMLA/CAMA mechanism and 50-student dataset, while HKR-H and HKR-R fail. No product, open-source artifact, or adoption signal keeps it in the 40–59 band.
editor take
AAMLA is tested on 50 students; education multimodal papers live or die on replication, and CAMA’s degradation gains aren’t disclosed.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Beyond the Next Port: A Multi-Task Transformer for Forecasting Future Voyage Segment Durations
The authors propose a multi-task Transformer for future voyage segment duration forecasting, using historical sailing durations, port congestion proxies, and vessel descriptors, and report on a 2021 global dataset that it reduces MAE by 4.70%, MAPE by 4.95%, and RMSE by 2.59% versus sequential deep learning baselines.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes on concrete benchmark deltas, while HKR-H and HKR-R fail because the topic is a narrow logistics-forecasting task with little practitioner pull; no hard-exclusion rule is triggered.
editor take
2021 global voyage data shows 4.70% lower MAE; I buy the framing: forecasting future segments without AIS beats another ETA leaderboard.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
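The three error metrics quoted in the voyage-duration item above, and what a "reduces MAE by 4.70%" style claim means as a relative reduction, can be written out as follows. The sample durations and the baseline figure of 100 are hypothetical; only the metric definitions are standard.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def relative_reduction(baseline_error, model_error):
    """Percent by which model_error undercuts baseline_error."""
    return 100 * (baseline_error - model_error) / baseline_error

# Hypothetical segment durations in hours.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]

errors = (mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))

# A baseline MAE of 100 and a model MAE of 95.3 would be a 4.7% reduction.
reduction = relative_reduction(100.0, 95.3)
```

RMSE penalizes large misses more heavily than MAE, which is one plausible reading of why the reported RMSE gain (2.59%) is smaller than the MAE gain.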
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Hierarchical Two-Stage Framework for Environment-Aware Long-Horizon Vessel Trajectory Prediction
The paper proposes a hierarchical two-stage vessel trajectory forecasting framework using 3-hour inputs for a 10-hour horizon. On Australian North West CTS data aligned with Copernicus Marine products, it reports 25% lower ADE and 17% lower FDE than the state of the art.
#Multimodal#Benchmarking#Australian Craft Tracking System#Copernicus Marine Service
why featured
HKR-K passes with a concrete 3-hour input, 10-hour forecast, and ADE/FDE gains. HKR-H/R are weak: the work is niche vessel-trajectory research with no agent or product implication.
editor take
This forecasts 10-hour vessel paths from 3-hour inputs with 25% lower ADE; I’d audit CTS splits and AIS noise first.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
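ADE and FDE, the two metrics in the vessel-trajectory item above, are the average and final point-wise distance between a predicted and a true path. A minimal sketch, assuming planar coordinates in arbitrary units; real vessel tracks in latitude and longitude would need a geodesic distance instead, and the toy points are hypothetical.

```python
import math

def displacement_errors(predicted, truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) points."""
    distances = [math.dist(p, t) for p, t in zip(predicted, truth)]
    ade = sum(distances) / len(distances)  # average over all forecast steps
    fde = distances[-1]                    # error at the final forecast step
    return ade, fde

# Hypothetical 3-step forecast that drifts steadily off the true track.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
predicted = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

ade, fde = displacement_errors(predicted, truth)
```

Over a 10-hour horizon, drift accumulates, so FDE is usually the harder number to move; that fits the smaller reported FDE gain (17%) against ADE (25%).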
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Robust Player-Conditional Champion Ranking for League of Legends: Style Similarity, Mastery Priors, and Archetype-Constrained Discovery
The paper presents a player-conditional champion recommender for League of Legends that combines four signals: population strength, player-style similarity, mastery priors, and archetype guardrails. Its prototype uses Python/Pandas, Supabase storage, and a web interface, with one 100-game case study for DIVINERAINRACCON; the post does not disclose large-scale evaluation results.
#Interpretability#Benchmarking#Research release
why featured
HKR-K passes on concrete signals and a prototype condition, but this is a niche game recommender paper with no product, agent, or major-model impact. No hard exclusion applies; it stays in the low-value research band.
editor take
The paper validates on one player’s 100 games; the interpretability is tidy, the recommender quality is still unproven.
HKR breakdown
hook knowledge resonance
open source
44
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
From Imitation to Interaction: Mastering Game of Schnapsen with Shallow Reinforcement Learning
The paper evaluates MLPBot and RLBot against RdeepBot, with RLBot trained via asynchronous Monte Carlo updates and experience replay; when its learned value function is combined with deeper lookahead at play time, RLBot achieves statistically higher win rates than the strongest evaluated RdeepBot baseline.
#Agent#Reasoning#RdeepBot#MLPBot
why featured
Only HKR-K passes: the paper gives concrete training and benchmark details, but the Schnapsen setting is too narrow for broad AI practitioners and lacks product, open-source, or general-agent impact.
editor take
RLBot beats RdeepBot with shallow nets plus deeper lookahead; win rates aren’t disclosed, so don’t sell this as general game reasoning.
HKR breakdown
hook knowledge resonance
open source
44
SCORE
H0·K1·R0
04:00
21d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·19
Attention-Aware Transformer-Based Aggregation Network for Video Periocular Recognition
The paper proposes a video periocular recognition framework that uses a CNN for frame-level embeddings and an encoder-only Transformer for aggregation, reporting 99.8% TPR@1e-1 and 96.6% Rank-5 in the best scenario on the COX Face dataset.
#Vision#Multimodal#Benchmarking#COX Face
why featured
HKR-K passes on the stated architecture and COX Face metrics. HKR-H and HKR-R fail because this is a narrow vision-recognition paper without product, tooling, or broad industry impact.
editor take
COX Face best case hits 99.8% TPR@1e-1; I want cross-camera splits, because single-dataset biometrics scores age badly.
HKR breakdown
hook knowledge resonance
open source
42
SCORE
H0·K1·R0
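The two biometric metrics in the periocular item above can be sketched as below. This assumes "TPR@1e-1" means true-positive rate at a false-accept rate of 0.1, which the summary does not spell out; the score lists are hypothetical.

```python
def tpr_at_far(genuine, impostor, far):
    """TPR at a threshold that accepts at most `far` of the impostor scores."""
    ranked = sorted(impostor, reverse=True)
    allowed = int(far * len(ranked))  # number of impostors we may accept
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    # Accept strictly above the threshold so at most `allowed` impostors pass.
    return sum(score > threshold for score in genuine) / len(genuine)

def rank_k_rate(score_rows, true_indices, k):
    """Fraction of probes whose true identity is among the k highest scores."""
    hits = 0
    for scores, true_idx in zip(score_rows, true_indices):
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += true_idx in top
    return hits / len(score_rows)

# Hypothetical similarity scores.
impostor = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.9]
genuine = [0.5, 0.6, 0.4, 0.95]
tpr = tpr_at_far(genuine, impostor, far=0.1)

# Two probes scored against a three-identity gallery.
rank2 = rank_k_rate([[0.2, 0.9, 0.5], [0.8, 0.1, 0.3]], true_indices=[2, 1], k=2)
```

A false-accept rate of 0.1 is a permissive operating point, so a high TPR there says little about behavior at stricter thresholds.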
03:19
21d ago
HuggingFace Papers (takara mirror) · rss · EN · 03:19 · 05·19
Research on Bidirectional Knowledge Distillation Between Random Forests and Deep Neural Networks
The paper studies bidirectional knowledge distillation between Random Forests and deep neural networks across 144 experiments on 6 datasets, reporting 98.13% classification accuracy for NN-COMPACT and 92.6% R² for NN-WIDE in regression.
#Fine-tuning#Inference-opt#Interpretability#Research release
why featured
HKR-K passes with concrete experiment count and accuracy. HKR-H/R fail because the paper is a niche method comparison, far from model launches, agents, or product impact, so it stays in the low-to-interesting band.
editor take
144 experiments report 98.13% accuracy; without baseline deltas disclosed, RF↔DNN distillation is not yet a compression win.
HKR breakdown
hook knowledge resonance
56
SCORE
H0·K1·R0
03:03
21d ago
HuggingFace Papers (takara mirror) · rss · EN · 03:03 · 05·19
Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection
The paper proposes LONSREX, a data synthesis pipeline for explainable misinformation detection that scores each verification step by its contribution to the final prediction. The snippet reports two failure modes of label-only filtering: insufficient rationales from coarse binary labels, and unnecessarily verbose rationales from stronger LLMs. It does not disclose dataset size or benchmark numbers.
#Reasoning#Fine-tuning#Safety#Research release
why featured
HKR-H/K/R all pass via a concrete rationale-eval question and safety resonance, but the work is narrow misinformation-detection research with no major-lab release, product impact, or disclosed open-source artifact.
editor take
LONSREX scores each verification step; dataset size and benchmarks are undisclosed, but label-only rationale filtering deserves retirement.
HKR breakdown
hook knowledge resonance
68
SCORE
H1·K1·R1
00:35
21d ago
HuggingFace Papers (takara mirror) · rss · EN · 00:35 · 05·19
Researchers propose worst-group equalized odds regularization for fair medical image classification
The paper proposes a worst-group equalized-odds margin regularizer that identifies subgroups with the largest margin deviations across attributes such as age, sex, and race, and reduces Equalized Odds and Equalized Opportunity disparities on two medical imaging datasets with minimal AUC impact.
#Vision#Alignment#Research release
why featured
HKR-K/R pass: the paper gives a concrete fairness mechanism evaluated on 2 medical-imaging datasets, with resonance around high-stakes bias. HKR-H fails, and single-paper impact keeps it in the 60-71 band.
editor take
Two imaging datasets show lower EO gaps; AUC loss is undisclosed, so I’d test fixed-threshold transfer across hospitals first.
HKR breakdown
hook knowledge resonance
62
SCORE
H0·K1·R1
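
The Equalized Odds and Equalized Opportunity disparities mentioned in the fairness entry above can be sketched as gaps in per-group error rates. This is only the evaluation metric, not the paper's margin regularizer, and the gap definition used here (largest spread in TPR or FPR across groups) is one common convention assumed for illustration. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return {group: (TPR, FPR)} for binary labels and predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        key = ("tp" if p else "fn") if t else ("fp" if p else "tn")
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["fp"] + c["tn"]
        tpr = c["tp"] / pos if pos else 0.0
        fpr = c["fp"] / neg if neg else 0.0
        rates[g] = (tpr, fpr)
    return rates


def equal_opportunity_gap(rates):
    """Spread in TPR across groups."""
    tprs = [r[0] for r in rates.values()]
    return max(tprs) - min(tprs)


def equalized_odds_gap(rates):
    """Larger of the TPR spread and the FPR spread across groups."""
    fprs = [r[1] for r in rates.values()]
    return max(equal_opportunity_gap(rates), max(fprs) - min(fprs))


# Toy data (illustrative only): group "a" is classified perfectly,
# group "b" has one missed positive and one false alarm.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a"] * 4 + ["b"] * 4

rates = group_rates(y_true, y_pred, groups)
print(equalized_odds_gap(rates))  # 0.5
```

Both gaps depend on the decision threshold that produced `y_pred`, which is why a fixed-threshold check across hospitals is a natural first test.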
