
papers · 2026-05-20 · Wed · 273 papers
HuggingFace Papers (takara mirror) · RSS · EN · 23:04 · 05·20
When Cases Get Rare: A Retrieval Benchmark for Off-Guideline Clinical Question Answering
OGCaReBench evaluates free-form clinical QA beyond guidelines using expert-validated case reports. GPT-5.2 answers 56% correctly as a baseline, specialized models reach 42%, and retrieval over medical articles raises GPT-5.2 performance to 82%.
#RAG#Reasoning#Benchmarking#Research release
why featured
HKR-H/K/R all pass, but this is a single clinical QA benchmark, narrower than a general model or tool release. The 56%-to-82% retrieval result places it at the top of the 60–71 band.
editor take
OGCaReBench lifts GPT-5.2 from 56% to 82% with RAG; clinical long-tail QA cannot lean on parametric memory.
SCORE 71 · HKR (hook · knowledge · resonance) H1·K1·R1
HuggingFace Papers (takara mirror) · RSS · EN · 18:17 · 05·20
UniVL: Unified Vision-Language Embedding for Spatially Grounded Image Generation
UniVL binds text semantics to spatial locations through one visual input, where instructions are rendered on the mask. On the 477K-image UniVL-ImgGen benchmark, it reduces FID from 14 to 11 and raises PSNR from 16 to 20. It removes the standalone T5-style text encoder, cutting inference TFLOPs by up to 52% and runtime by up to 44%.
#Multimodal#Vision#Embedding#UniVL
why featured
HKR-K and HKR-R pass: the item gives concrete benchmark and compute numbers tied to image-generation cost. As a single paper with no disclosed open-source release or major-lab backing, it stays in the 60–71 research-signal band.
editor take
UniVL cuts FID 14→11 on 477K masked images; rendering text into masks is clever, but the text interface is narrow.
SCORE 68 · HKR H0·K1·R1
HuggingFace Papers (takara mirror) · RSS · EN · 18:08 · 05·20
Benchmarking and Improving Monitors for Out-of-Distribution Alignment Failure in LLMs
MOOD evaluates LLM alignment-failure monitors with one restricted training set and seven out-of-distribution test sets. Combining a guard model with Mahalanobis-distance and perplexity-based OOD detectors raises recall from 39% to 45%.
#Alignment#Safety#Benchmarking#MOOD
why featured
HKR-K is solid: MOOD gives a concrete setup and a 39%→45% recall gain. HKR-R lands on guardrail failure risk, but HKR-H is weak and the source shows no broad industry discussion, so it stays in the 60–71 band.
editor take
MOOD tests 1 training set against 7 OOD sets; 39% to 45% recall says bigger guards are a weak safety crutch.
SCORE 67 · HKR H0·K1·R1
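The card does not say how MOOD combines the guard model with its two OOD detectors. A minimal sketch, assuming a simple OR rule over three monitors, a diagonal-covariance Mahalanobis distance on hypothetical feature vectors, and hand-picked thresholds (`dist_thr`, `ppl_thr`). Under an OR rule, adding monitors can only add flags, so recall cannot drop; the price is more false alarms.

```python
import math
from statistics import mean, pvariance

def fit_diag_gaussian(train_feats):
    """Per-dimension mean and variance of in-distribution feature vectors."""
    dims = list(zip(*train_feats))
    mus = [mean(d) for d in dims]
    # Variance floor avoids division by zero on constant dimensions.
    vars_ = [max(pvariance(d), 1e-6) for d in dims]
    return mus, vars_

def mahalanobis_diag(x, mus, vars_):
    """Mahalanobis distance, simplified (assumption) to a diagonal covariance."""
    return math.sqrt(sum((xi - m) ** 2 / v for xi, m, v in zip(x, mus, vars_)))

def flag(guard_unsafe, feat, ppl, mus, vars_, dist_thr, ppl_thr):
    """OR-combine monitors: fire if the guard, the distance check, or perplexity fires."""
    return (guard_unsafe
            or mahalanobis_diag(feat, mus, vars_) > dist_thr
            or ppl > ppl_thr)

def recall(flags, labels):
    """Fraction of true alignment failures (label True) that were flagged."""
    positives = sum(labels)
    hits = sum(1 for f, y in zip(flags, labels) if f and y)
    return hits / positives if positives else 0.0
```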
arXiv · cs.AI · Atom · EN · 17:59 · 05·20
Variance Reduction for Expectations with Diffusion Teachers
CARV uses a hierarchical Monte Carlo estimator to reuse expensive upstream computation, delivering 2-3x effective compute multipliers in text-to-3D distillation and attribution experiments without changing the objective.
#Inference-opt#Multimodal#CARV#Research release
why featured
HKR-K and HKR-R pass: CARV reuses upstream compute and reports 2-3x effective gains. The diffusion-distillation focus is narrow and technically dense, so limited accessibility keeps it in the 60-71 band.
editor take
CARV shows 2-3x effective compute on diffusion-teacher pipelines; single-step FID stays flat, so variance was not the bottleneck there.
SCORE 64 · HKR H0·K1·R1
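CARV's estimator is only described as hierarchical Monte Carlo that reuses upstream computation. The sketch below is a generic two-level estimator under that reading, not CARV itself: each expensive upstream draw (think: one costly teacher pass) is reused for several cheap downstream draws. Its variance is (Var[E[f|X]] + E[Var(f|X)] / m_inner) / n_outer, so reuse pays off when most of the variance comes from the cheap inner stage.

```python
import random

def naive_estimate(f, sample_upstream, sample_downstream, n, rng):
    """Plain Monte Carlo: each of the n samples pays for a fresh upstream draw."""
    total = 0.0
    for _ in range(n):
        x = sample_upstream(rng)           # expensive: runs n times
        total += f(x, sample_downstream(rng))
    return total / n

def two_level_estimate(f, sample_upstream, sample_downstream, n_outer, m_inner, rng):
    """Two-level Monte Carlo: reuse each upstream draw for m_inner cheap inner draws.

    Unbiased for E[f(X, Y)] when X and Y are sampled independently; costs
    n_outer * (c_upstream + m_inner * c_downstream) instead of n * (c_upstream + c_downstream).
    """
    total = 0.0
    for _ in range(n_outer):
        x = sample_upstream(rng)           # expensive: runs only n_outer times
        inner = sum(f(x, sample_downstream(rng)) for _ in range(m_inner))
        total += inner / m_inner
    return total / n_outer
```

With `f(x, y) = x + y` and standard normal draws for both stages, both estimators target 0; the two-level version with `n_outer=2000, m_inner=4` evaluates `f` 8,000 times but pays for only 2,000 upstream draws.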
arXiv · cs.AI · Atom · EN · 17:58 · 05·20
WikiVQABench Knowledge-Grounded Visual Question Answering Benchmark Released with Model Evaluations
WikiVQABench uses Wikipedia images, captions, and Wikidata to build a human-curated knowledge-grounded VQA benchmark, and evaluations of 15 VLMs from 256M to 90B parameters show accuracy ranging from 24.7% to 75.6%.
#Vision#Multimodal#Benchmarking#Wikipedia
why featured
HKR-K passes: WikiVQABench adds a testable benchmark and accuracy range across 15 models. HKR-H and HKR-R are weak, so this sits in the 60-71 research-release band.
editor take
WikiVQABench tests 15 VLMs from 256M-90B, scoring 24.7%-75.6%; Wikipedia plus Wikidata should punish synthetic-benchmark polish.
SCORE 66 · HKR H0·K1·R0
HuggingFace Papers (takara mirror) · RSS · EN · 17:55 · 05·20
Stream3D: Sequential Multi-View 3D Generation via Evidential Memory
Stream3D turns a frozen view-conditioned 3D generator into a streaming generator using a constant-size cross-chunk evidential memory. The memory caches a fixed number of informative historical frames, so it does not grow linearly with sequence length, and the method needs no retraining, architecture changes, or auxiliary losses.
#Vision#Memory#Multimodal#Stream3D
why featured
HKR-K lands via the fixed-memory mechanism, and HKR-R lands on 3D generation cost. No major lab, benchmark number, or runnable release is disclosed, so it stays in the 60–71 band.
editor take
Stream3D streams 3D generation with fixed-frame memory; frame count and metrics aren't disclosed, so don't overbuy the training-free claim.
SCORE 66 · HKR H0·K1·R1
HuggingFace Papers (takara mirror) · RSS · EN · 17:19 · 05·20
PALS: Power-Aware LLM Serving for Mixture-of-Experts Models
PALS integrates GPU power caps as a control knob inside vLLM and jointly tunes them with batch size, improving energy efficiency by up to 26.3% across multi-GPU dense and MoE serving while reducing QoS violations by 4x to 7x under power constraints.
#Inference-opt#PALS#vLLM#Research release
why featured
HKR-K/R pass: PALS has a concrete mechanism plus 26.3% and 4-7x results tied to LLM serving cost. HKR-H is weak and the systems angle is narrow, so it stays in all.
editor take
PALS tunes power caps and batch size in vLLM for 26.3% better efficiency; this ugly systems work will matter for MoE serving.
SCORE 70 · HKR H0·K1·R1
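PALS tunes power caps and batch size inside vLLM at serving time; the card gives no algorithm details. As a rough offline stand-in, assume you have profiled throughput, average power, and p99 latency for each (power cap, batch size) pair, then pick the feasible pair with the most tokens per joule. The `Profile` fields and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    power_cap_w: int       # GPU power cap in watts
    batch_size: int
    tokens_per_s: float    # measured throughput
    avg_power_w: float     # measured average GPU power draw
    p99_latency_ms: float  # measured tail latency

def pick_config(profiles, slo_ms, power_budget_w):
    """Return the profile with the best tokens/joule that meets the SLO and power budget."""
    feasible = [p for p in profiles
                if p.p99_latency_ms <= slo_ms and p.avg_power_w <= power_budget_w]
    if not feasible:
        return None  # caller should shed load or relax the SLO
    # tokens per joule = (tokens / s) / (joules / s)
    return max(feasible, key=lambda p: p.tokens_per_s / p.avg_power_w)
```

With the made-up profiles in the check below, a 200 ms SLO and 300 W budget select the 300 W / batch-64 point (about 31 tokens/J); a 250 ms SLO with a 250 W budget flips the choice to 250 W / batch-64 (about 33 tokens/J).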
HuggingFace Papers (takara mirror) · RSS · EN · 17:08 · 05·20
RoadTones: Tone-Controllable Text Generation from Road Event Videos
RoadTones introduces the RoadTones-51K dataset, the RoadTones-VL-CoT model, and the RoadTones-Eval suite for tone-controllable road-video captioning; data generation is human-validated, and evaluation covers factual consistency and tone adherence plus a user study.
#Multimodal#Vision#Interpretability#RoadTones
why featured
HKR-K passes because the post gives three traceable artifacts: a dataset, model, and evaluation suite. HKR-H and HKR-R are weak; road-event captioning is useful research signal but too narrow for featured.
editor take
RoadTones ships a 51K road-tone dataset; no baseline scores disclosed, so I read it as AD alert-copy research.
SCORE 52 · HKR H0·K1·R0
arXiv · cs.CL · Atom · EN · 16:45 · 05·20
Post-Hoc Understanding of Metaphor Processing in Decoder-Only Language Models via Conditional Scale Entropy
The paper introduces conditional scale entropy, a wavelet-derived measure, and finds metaphorical tokens show higher spectral breadth than literal tokens across tested decoder-only models from 124M to 20B parameters, including GPT-2, LLaMA-2 7B, and GPT-oss 20B.
#Interpretability#Reasoning#GPT-2#LLaMA-2
why featured
HKR-K passes with a new CSE metric and stated model range; HKR-H/R fail because the angle is academic and has little practitioner pull. No hard exclusion, but this stays in the 40-59 low-value band.
editor take
CSE flags higher spectral breadth for metaphor tokens across 124M–20B models; I buy the signal, not a causal circuit yet.
SCORE 46 · HKR H0·K1·R0
arXiv · cs.CL · Atom · EN · 16:35 · 05·20
Findings of the Fifth Shared Task on Multilingual Coreference Resolution: Expanding Datasets for Long-Range Entities
The CODI-CRAC 2026 fifth shared task on multilingual coreference resolution added 5 datasets and 2 languages, with 10 participating systems including 4 LLM-based approaches, while traditional systems still led the results.
#Reasoning#Fine-tuning#Benchmarking#CODI-CRAC
why featured
HKR-K passes with 5 new datasets, 2 languages, 10 systems, and 4 LLM methods. HKR-H/R are weak: this is a narrow NLP shared-task report with little product pull or practitioner nerve, so it stays in low-value research-news range.
editor take
CODI-CRAC 2026 had 10 systems, 4 LLM-based; traditional systems still led, so long-range coref resists prompt-only swagger.
SCORE 52 · HKR H0·K1·R0
HuggingFace Papers (takara mirror) · RSS · EN · 16:10 · 05·20
OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation
OcclusionFormer uses the SA-Z dataset to model explicit occlusion order, decouples instances with a Diffusion Transformer, and composites overlapping regions through volume rendering.
#Vision#Multimodal#OcclusionFormer#SA-Z
why featured
HKR-K/R pass: the paper gives a concrete occlusion-order mechanism for controllable image generation. It lacks release details, benchmark numbers, or product impact, so it stays in the 60–71 band.
editor take
OcclusionFormer adds explicit Z-order for overlapping boxes. SA-Z size and metrics are undisclosed, so don’t buy “substantial gains” yet.
SCORE 65 · HKR H0·K1·R1
HuggingFace Papers (takara mirror) · RSS · EN · 14:48 · 05·20
Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation
The paper proposes a structural latent points pretraining framework that inserts a point-wise latent VAE into a point-cloud autoencoder latent space and evaluates it on RLBench, ManiSkill2, and a real-robot platform.
#Robotics#Vision#Multimodal#RLBench
why featured
HKR-K passes with a concrete mechanism and three evaluation settings. HKR-H and HKR-R are weak, and the post gives no performance numbers or artifact, so this stays in all.
editor take
Structural latent points sit inside a point-cloud AE, but no success-rate numbers are disclosed; without result tables, treat it as a candidate 3D representation, not a proven one.
SCORE 60 · HKR H0·K1·R0
HuggingFace Papers (takara mirror) · RSS · EN · 14:19 · 05·20
Do LLMs Know What Luxembourgish Borrows? Probing Lexical Neology in Low-Resource Multilingual Models
LexNeo-Bench tests three multilingual LLMs on 3,050 Luxembourgish tokens across 34 prompt settings, and knowledge-graph prompts raise borrowing classification accuracy from 25–35% to 71–81% while neology detection remains sensitive to few-shot design.
#Benchmarking#RAG#Reasoning#LexNeo-Bench
why featured
HKR-H and HKR-K pass through the odd language hook and concrete benchmark numbers. HKR-R misses: the paper is useful NLP signal, but too niche for featured AI-industry discussion.
editor take
LexNeo-Bench tests 3,050 tokens on three multilingual LLMs; KG prompts hit 71–81%, so don’t raw-prompt low-resource lexicons.
SCORE 64 · HKR H1·K1·R0
HuggingFace Papers (takara mirror) · RSS · EN · 13:53 · 05·20
Semantic Granularity Navigation in Image Editing
NaviEdit decouples edit progress from model scale traversal with a training-free inference-time controller, reallocating a fixed step budget toward semantically responsive intermediate scales without changing the pretrained model; experiments report positive average gains across compatible editors and flow backbones, while the snippet does not disclose exact datasets or scores.
#Vision#Inference-opt#NaviEdit#Research release
why featured
HKR-K has a testable mechanism: training-free scale reallocation under a fixed step budget, and HKR-R fits image-editing cost/quality concerns. HKR-H is weak, and the post lacks concrete gain numbers, so this stays below featured.
editor take
NaviEdit reallocates fixed steps across intermediate scales; scores are undisclosed. Training-free is attractive, but portability still needs proof.
SCORE 66 · HKR H0·K1·R1
HuggingFace Papers (takara mirror) · RSS · EN · 13:49 · 05·20
Manga109-v2026: Revisiting Manga109 Annotations for Modern Manga Understanding
The authors built Manga109-v2026 with OCR-based issue detection and manual revision, revising about 29,000 dialogue annotations across five issue types, including transcription errors, missing text regions, overlapping dialogue and onomatopoeia, and under-segmented speech balloons.
#Multimodal#Vision#Benchmarking#Manga109
why featured
HKR-K passes with a concrete dataset update: ~29k annotation fixes and a reproducible OCR-plus-human workflow. HKR-H/R are weak because manga understanding is a narrow benchmark topic for most AI practitioners.
editor take
Manga109-v2026 revises ~29K dialogue labels; stop treating old Manga109 as clean ground truth for manga OCR.
SCORE 60 · HKR H0·K1·R0
HuggingFace Papers (takara mirror) · RSS · EN · 13:45 · 05·20
Metaphors in Literary Post-Editing: Opening Pandora's Box?
The paper studies post-editing of literary translations from NMT and LLMs, finding that post-editors changed one in three metaphors and rated the MT output as poor, with post-editing requiring more work than translating from scratch.
#Benchmarking#Research release
why featured
HKR-H/K/R pass, but the paper is narrow literary-translation research, not a model release, product mechanism, or broad benchmark. It fits the 60-71 band as useful but not feature-level signal.
editor take
Post-editors changed one in three metaphors; for literary translation, LLM drafts trap translators inside bad first passes.
SCORE 66 · HKR H1·K1·R1
HuggingFace Papers (takara mirror) · RSS · EN · 13:04 · 05·20
SurgOnAir: Hierarchy-Aware Real-Time Surgical Video Commentary
SurgOnAir processes surgical video frames sequentially and generates commentary without future access, using the SurgOnAir-11k dataset with action-, step-, and phase-level supervision; the paper says code and dataset will be public, but the RSS snippet does not disclose benchmark scores or release dates.
#Vision#Multimodal#SurgOnAir#Research release
why featured
HKR-H and HKR-K pass: the hook is real-time surgical commentary, and SurgOnAir-11k adds three-level labels. The niche medical-vision scope lacks product traction or industry controversy, so it stays in 60–71.
editor take
SurgOnAir streams surgical commentary on SurgOnAir-11k; RSS gives no scores or latency, so “real-time” is still unproven.
SCORE 68 · HKR H1·K1·R0
HuggingFace Papers (takara mirror) · RSS · EN · 10:31 · 05·20
CHOIR: Contact-aware 4D Hand-Object Interaction Reconstruction
CHOIR reconstructs 4D hand-object interactions from monocular open-world videos. It initializes a coarse sequence, predicts ray-depth corrections, derives per-frame contact correspondences, and jointly optimizes geometry, timing, and contact constraints for 6D object pose, articulated hand motion, and physical consistency.
#Vision#Robotics#CHOIR#Research release
why featured
HKR-K passes via the concrete reconstruction mechanism, while HKR-H and HKR-R are weak. This is a narrow vision/robotics paper with no product, open-source artifact, or adoption data.
editor take
CHOIR reconstructs 4D hand-object interaction from monocular video; metrics are undisclosed, so I file it under robot data mining, not deployable perception.
SCORE 48 · HKR H0·K1·R0
HuggingFace Papers (takara mirror) · RSS · EN · 09:49 · 05·20
Towards UAV Detection in the Real World: A New Multispectral Dataset UAVNet-MS and a New Method
UAVNet-MS includes 15,618 temporally synchronized RGB-MSI data cubes with bounding boxes, and MFDNet improves AP50 by 6.2% over the best RGB-only method in evaluations against 20 detectors under RGB-only, MSI-only, and RGB+MSI protocols.
#Vision#Multimodal#Benchmarking#UAVNet-MS
why featured
HKR-K passes on concrete dataset size and benchmark delta; HKR-H and HKR-R are weak because this is a niche CV research release with limited product or practitioner impact.
editor take
UAVNet-MS has 15,618 RGB-MSI cubes; with 93.7% of targets at or below 32×32 pixels, MFDNet’s +6.2 AP50 feels field-relevant.
SCORE 62 · HKR H0·K1·R0
HuggingFace Papers (takara mirror) · RSS · EN · 09:47 · 05·20
Preserve, Reveal, Expand: Faithful 4D Video Editing with Region-Aware Conditioning
PREX splits the target spatiotemporal volume into Preserve, Reveal, and Expand regions, then uses calibrated observation-backed cues and a region-aware adapter on a frozen video diffusion backbone to reduce preservation drift, ghosting, and unstable extrapolation.
#Multimodal#Vision#Benchmarking#PREX
why featured
HKR-K passes via a concrete region-conditioning mechanism for reducing drift and ghosts. HKR-H and HKR-R are weak, and no metrics, code, or product path are disclosed, so this stays in all.
editor take
PREX splits targets into three evidence roles; I like the framing, but no PREBench size or deltas are disclosed.
SCORE 65 · HKR H0·K1·R0
HuggingFace Papers (takara mirror) · RSS · EN · 09:45 · 05·20
JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media
The paper introduces JobArabi, an Arabic job-announcement corpus with 20,528 public X posts collected from January 2024 to October 2025 using 21 Arabic recruitment keyword families.
#Benchmarking#JobArabi#X#Research release
why featured
HKR-K passes on the 20,528-post Arabic corpus and date range. HKR-H and HKR-R fail: this is a niche dataset release with no product, model-capability, or industry-pressure angle.
editor take
JobArabi ships 20,528 X hiring posts; Arabic NLP needs more messy corpora like this, not another leaderboard.
SCORE 48 · HKR H0·K1·R0
HuggingFace Papers (takara mirror) · RSS · EN · 08:56 · 05·20
Research on Learning Action Duration in Fighting Games
The paper trains fighting-game RL agents in the open-source FightLadder environment to predict both an action and its duration, then tests different frame-skip settings; learned timing matches well-chosen fixed skips, but most high-skip agents perform best by repeating actions that exploit scripted built-in bots.
#Agent#Robotics#Benchmarking#FightLadder
why featured
HKR-H and HKR-K pass: the game framing is clickable, and the post gives testable action-duration and frame-skip mechanics. The niche RL benchmark has limited impact on mainstream AI products or practitioner workflows, so it sits in the 60-71 band.
editor take
FightLadder agents learn action duration, but high-skip agents mostly win by spamming actions against scripted bots, so I don’t buy it as robust timing.
SCORE 62 · HKR H1·K1·R0
HuggingFace Papers (takara mirror) · RSS · EN · 08:55 · 05·20
FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching
FlowLong generates long videos at inference time with overlapping sliding windows and Tweedie matching, requiring no extra training; the post says it reaches several times the native window length, but does not disclose exact frame counts.
#Multimodal#Vision#Inference-opt#FlowLong
why featured
HKR-H and HKR-K pass: the paper offers an inference-time mechanism for longer video without training. Frame counts, model comparisons, and release details are not disclosed, keeping it in the 60–71 band.
editor take
FlowLong extends video with sliding windows and Tweedie matching, no training; exact frames are missing, so don’t buy “several times” yet.
SCORE 68 · HKR H1·K1·R0
HuggingFace Papers (takara mirror) · RSS · EN · 08:47 · 05·20
JFAA: Technical Report for the EPIC-KITCHENS-100 Action Anticipation Challenge at EgoVis 2026
JFAA achieved first place in the EgoVis 2026 EK-100 Action Anticipation Challenge, using a frozen V-JEPA 2.1-style encoder and predictor, a lightweight attentive probe for verb, noun, and action logits, and a field-aware ensemble over selected epoch-level predictions.
#Vision#Benchmarking#JFAA#EPIC-KITCHENS-100
why featured
HKR-K passes through the concrete V-JEPA 2.1 frozen-stack method; HKR-H and HKR-R are weak outside vision benchmarking, so it stays in the 60–71 band.
editor take
JFAA won EK-100 anticipation, but scores are undisclosed; frozen V-JEPA 2.1 plus a small probe smells like representation wins, not architecture.
SCORE 62 · HKR H0·K1·R0
HuggingFace Papers (takara mirror) · RSS · EN · 08:31 · 05·20
FruitEnsemble: MLLM-Guided Arbitration for Heterogeneous Ensemble in Fine-Grained Fruit Recognition
FruitEnsemble builds a dataset with 306 fruit categories and 116,233 samples, then triggers MLLM arbitration when ensemble confidence falls below 0.6, achieving 70.49% classification accuracy in fine-grained fruit recognition.
#Multimodal#Vision#Reasoning#FruitEnsemble
why featured
HKR-K passes with dataset size, arbitration threshold, and accuracy; HKR-H/R are weak because the niche fruit-vision task has little product or industry impact. No hard exclusion, but it stays in the lower research-update band.
editor take
FruitEnsemble hits 70.49% on 306 fruit classes and 116,233 samples; the 0.6-confidence MLLM arbiter smells like an engineering patch.
SCORE 52 · HKR H0·K1·R0
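The card gives FruitEnsemble's trigger (ensemble confidence below 0.6) but not its fusion rule. A minimal sketch, assuming the ensemble averages member probabilities and hands the top-k candidate classes to an arbiter callback; `arbiter` stands in for the MLLM call and is hypothetical.

```python
from typing import Callable, Sequence

def ensemble_predict(prob_lists: Sequence[Sequence[float]],
                     arbiter: Callable[[list], int],
                     threshold: float = 0.6,
                     top_k: int = 3) -> tuple:
    """Average member probabilities; defer to an arbiter when confidence is low.

    Returns (class_index, used_arbiter).
    """
    n_members = len(prob_lists)
    avg = [sum(col) / n_members for col in zip(*prob_lists)]
    best = max(range(len(avg)), key=avg.__getitem__)
    if avg[best] >= threshold:
        return best, False
    # Low confidence: pass the top-k classes (highest first) to the arbiter.
    candidates = sorted(range(len(avg)), key=avg.__getitem__, reverse=True)[:top_k]
    return arbiter(candidates), True
```

Restricting the arbiter to a short candidate list keeps the expensive MLLM call on the rare low-confidence cases and limits it to choosing among classes the ensemble already considers plausible.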
HuggingFace Papers (takara mirror) · RSS · EN · 07:14 · 05·20
OSGNet with MLLM Reranking at Ego4D Episodic Memory Challenge 2026
The OSGNet team generated candidate segments with an existing localization model, then used an MLLM reranker to select the segment matching each query, achieving first place in both Natural Language Queries and GoalStep tracks at the CVPR 2026 Ego4D Episodic Memory Challenge.
#Multimodal#Vision#Reasoning#OSGNet
why featured
HKR-H and HKR-K pass: a lightweight reranking setup wins two tracks. HKR-R fails because Ego4D episodic memory is a niche vision benchmark with limited practitioner pull, so it stays in all.
editor take
OSGNet used MLLM reranking and won two Ego4D tracks; practical trick, but the snippet gives no lift numbers.
SCORE 62 · HKR H1·K1·R0
HuggingFace Papers (takara mirror) · RSS · EN · 06:14 · 05·20
VIHD: Visual Intervention-based Hallucination Detection for Medical Visual Question Answering
VIHD uses targeted visual token masking to calibrate semantic entropy for hallucination detection in medical VQA, and experiments cover three medical VQA benchmarks and two medical MLLMs; the post does not disclose exact scores or the names of the compared models.
#Multimodal#Vision#Safety#VIHD
why featured
HKR-H/K/R all pass: the mechanism, test setup, and medical-safety angle are clear. Missing scores and a niche medical VQA setting keep it in the 60–71 all band, not featured.
editor take
VIHD spans 3 medical VQA benchmarks and 2 MLLMs; no scores or model names disclosed, so the masking idea outruns the evidence.
SCORE 66 · HKR H1·K1·R1
HuggingFace Papers (takara mirror) · RSS · EN · 05:07 · 05·20
Rethinking Cross-Layer Information Routing in Diffusion Transformers
The paper proposes Diffusion-Adaptive Routing as a drop-in replacement for residual addition in DiTs, reducing SiT-XL/2 FID from 9.67 to 7.56 on ImageNet 256×256 and matching the baseline’s converged quality with 8.75× fewer training iterations.
#Vision#Inference-opt#Fine-tuning#Research release
why featured
HKR-K and HKR-R pass: DAR replaces DiT residual addition, with FID and 8.75x iteration deltas. HKR-H is narrow, and no code or broad replication is disclosed, so this stays in all.
editor take
DAR cuts SiT-XL/2 FID from 9.67 to 7.56; plain residual addition in DiTs was stale debt, now repaid as 8.75× fewer iterations.
SCORE 64 · HKR H0·K1·R1
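The card does not spell out Diffusion-Adaptive Routing's parameterization. As a hedged illustration of the general idea only: a plain residual stream feeds each block the sum of the embedding and all earlier block outputs, while a routed variant feeds it a learned, softmax-weighted mix of those same outputs. The function names and the `scale` convention are assumptions, not DAR's formulation.

```python
import math

def softmax(logits):
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def residual_input(history):
    """Plain residual stream: elementwise sum of the embedding and all earlier block outputs."""
    return [sum(vals) for vals in zip(*history)]

def routed_input(history, logits, scale=None):
    """Illustrative learned routing: scaled softmax-weighted mix of earlier outputs.

    With uniform logits and scale=len(history) this reproduces residual_input,
    so plain residual addition is one point in the routed family.
    """
    weights = softmax(logits)
    scale = len(history) if scale is None else scale
    return [scale * sum(w * v for w, v in zip(weights, vals)) for vals in zip(*history)]
```

Because uniform weights recover the residual sum, a routed model can start from the standard DiT behavior and learn to emphasize particular earlier layers.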
HuggingFace Papers (takara mirror) · RSS · EN · 04:41 · 05·20
Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
The paper proposes LFD, an LLM-assisted feature discovery method that screens lexical and semantic features with cross-LLM Cohen’s κ and residual held-out predictive gain. Across 10 text-classification tasks over 7 corpora, plus human audits with 232 raters, LFD matches a strong text bottleneck baseline while producing clearer, less label-entangled features.
#Interpretability#Alignment#Benchmarking#Research release
why featured
HKR-K passes with a concrete method, filtering mechanism, and validation scale. HKR-H and HKR-R are weak: the title is academic, and the article does not show a practitioner-facing impact path.
editor take
LFD screens features with cross-LLM Cohen’s κ; 10 tasks and 232 raters look solid, but show me failures, not averages.
SCORE 63 · HKR H0·K1·R0
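LFD screens candidate features with cross-LLM Cohen's κ, plus residual held-out predictive gain, which is not sketched here. A minimal version of the agreement filter, assuming two LLM annotators label every item for each feature; κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement from each rater's label frequencies. The 0.6 cutoff and the feature names are hypothetical.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two raters' labels over the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: product of each rater's marginal label frequencies.
    p_expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout; kappa undefined, treat as full agreement
    return (p_observed - p_expected) / (1.0 - p_expected)

def screen_features(annotations, min_kappa=0.6):
    """Keep features whose two LLM annotators agree at or above min_kappa."""
    return [name for name, (a, b) in annotations.items()
            if cohens_kappa(a, b) >= min_kappa]
```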
arXiv · cs.LG · Atom · EN · 04:00 · 05·20
Not All Tokens Are Worth Caching: Learning Semantic-Aware Eviction for LLM Prefix Caches
Shaoke Fang and coauthors propose SAECache, a semantic-adaptive eviction policy for LLM prefix KV caches, and report 1.4x-2.7x TTFT improvement over production-style baselines across heterogeneous workloads.
#Inference-opt#Shaoke Fang#Ziang Li#SAECache
why featured
HKR-H/K/R pass via the cache hook, semantic eviction mechanism, and 1.4-2.7x TTFT claim. It stays below featured because this is an arXiv-only inference paper with no disclosed code, deployment scale, or independent replication.
editor take
SAECache reports 1.4-2.7x TTFT gains; the 756x reuse gap makes LRU look painfully blunt.
SCORE 71 · HKR H1·K1·R1
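SAECache's scoring model is not described in the card. To show why a semantic signal can beat pure recency, here is a hypothetical prefix-cache policy that evicts the entry with the lowest predicted-reuse-per-token score, lightly penalized by age. `predicted_reuse` would come from some semantic model, and `alpha` is an assumed knob; neither is SAECache's API.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    key: str
    size_tokens: int
    predicted_reuse: float   # hypothetical semantic-model estimate of future hits
    last_access: int

@dataclass
class PrefixCache:
    capacity_tokens: int
    alpha: float = 0.1       # weight on age vs. predicted reuse (assumed)
    entries: dict = field(default_factory=dict)
    clock: int = 0

    def _score(self, e):
        # Higher = more worth keeping: predicted reuse per token, minus an age penalty.
        age = self.clock - e.last_access
        return e.predicted_reuse / e.size_tokens - self.alpha * age

    def get(self, key):
        self.clock += 1
        e = self.entries.get(key)
        if e is not None:
            e.last_access = self.clock   # a hit refreshes recency
        return e is not None

    def put(self, key, size_tokens, predicted_reuse):
        self.clock += 1
        self.entries[key] = Entry(key, size_tokens, predicted_reuse, self.clock)
        evicted = []
        while sum(e.size_tokens for e in self.entries.values()) > self.capacity_tokens:
            victim = min(self.entries.values(), key=self._score)
            del self.entries[victim.key]
            evicted.append(victim.key)
        return evicted
```

Inserting a 50-token system prompt, a 40-token one-off query, and a 30-token tool schema into a 100-token cache evicts the one-off query, whereas LRU would have evicted the system prompt.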
arXiv · cs.LG · Atom · EN · 04:00 · 05·20
Language Model Memory and Memory Models for Language
arXiv 2602.13466v2 reports that language model embeddings retain little input information across data and compute scales. Autoencoders trained for input regeneration form near-perfect memories, while combined causal and information-retention objectives train encoder-decoder memory models to store and decode information-rich memories.
#Memory#Embedding#Inference-opt#arXiv
why featured
HKR-H/K/R all pass: the memory claim is counterintuitive, the mechanism is concrete, and agent/RAG builders care. Single arXiv source with abstract-level detail only; no code, scale, or adoption disclosed, so it stays in the 60–71 band.
editor take
arXiv 2602.13466v2 says LM embeddings retain little input information; I buy the warning against betting memory compression on causal loss alone.
SCORE 70 · HKR H1·K1·R1
arXiv · cs.LG · Atom · EN · 04:00 · 05·20
The Routing and Filtering Structure of Attention
Shafayeth Jamil and Rehan Kapadia decompose 1,776 attention heads across five pretrained transformers, introduce S-Dattention to separate routing from filtering, and report that linearizing the first seven layers of a 125M S-Dattention model costs under 5% perplexity while standard attention collapses under the same intervention.
#Interpretability#Inference-opt#Benchmarking#Shafayeth Jamil
why featured
HKR-K is strong and HKR-H works for interpretability readers; HKR-R is weak with no cost, jobs, safety, or competition hook. The arXiv paper has concrete results, but limited author/institution pull and no clear deployment path keep it in 60-71.
editor take
The authors decompose 1,776 heads; linearizing the first seven layers of the 125M S-Dattention model costs under 5% PPL. I buy the compression signal, not the mysticism.
SCORE 70 · HKR H1·K1·R0
arXiv · cs.LG · Atom · EN · 04:00 · 05·20
EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design
EngiAI introduces a three-part benchmark and a LangGraph multi-agent reference system with seven specialized agents for simulation, RAG, HPC orchestration, and 3D printer control; proprietary models reach 96-97% average task completion on Beams2D, while open-source 4B models reach 55-78%.
#Agent#RAG#Benchmarking#EngiAI
why featured
HKR-H/K/R all pass, but this is an arXiv paper in a niche engineering-design benchmark, not a major lab release or broad product update. Concrete mechanisms and completion rates put it high in the 60-71 band.
editor take
EngiAI's reference system runs 7 specialized agents; don’t overread 96-97% on Beams2D when Photonics2D branching falls to 20-53%.
SCORE 70 · HKR H1·K1·R1
arXiv · cs.LG · Atom · EN · 04:00 · 05·20
Dr.LLM: Dynamic Layer Routing in LLMs
Dr.LLM adds lightweight per-layer routers to frozen pretrained LLMs, choosing whether to skip, execute, or repeat transformer blocks; on ARC and DART it improves accuracy by up to 3.4 percentage points while saving 5 layers per example on average, with code released on GitHub.
#Inference-opt#Reasoning#Tools#Dr.LLM
why featured
HKR-H/K/R pass, but this is a single arXiv research item with ARC/DART gains and five-layer savings only. Model scale, reproducibility, and production evidence are not disclosed, so it stays in all.
editor take
Dr.LLM gains up to 3.4pp on ARC/DART and saves 5 layers; MCTS-labeled routing is practical, but training cost matters.
SCORE 70 · HKR H1·K1·R1
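Dr.LLM's routers are small learned modules (the editor take mentions MCTS-derived labels); in this sketch the per-layer decisions are simply given, and the blocks are toy scalar functions rather than transformer layers. It only shows how skip / execute / repeat decisions change the forward pass and the count of blocks actually run.

```python
from typing import Callable, Sequence

SKIP, EXECUTE, REPEAT = "skip", "execute", "repeat"

def routed_forward(h, blocks: Sequence[Callable], decisions: Sequence[str]):
    """Run each block per its router decision; return (output, blocks actually run)."""
    if len(blocks) != len(decisions):
        raise ValueError("need exactly one decision per block")
    executed = 0
    for block, decision in zip(blocks, decisions):
        if decision not in (SKIP, EXECUTE, REPEAT):
            raise ValueError(f"unknown decision: {decision!r}")
        if decision == SKIP:
            continue                      # identity: hidden state passes through
        for _ in range(2 if decision == REPEAT else 1):
            h = block(h)                  # block is assumed to include its own residual
            executed += 1
    return h, executed
```

Under this accounting, "layers saved" per example would be `len(blocks) - executed`, with repeats counted as extra runs; whether the paper counts them the same way is not stated in the card.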
arXiv · cs.LG · Atom · EN · 04:00 · 05·20
WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions
WARC-Bench evaluates multimodal AI agents on 438 archived-web subtask executions, including date pickers and container scrolling; the best observed computer-use model reaches 64.8% success, supervised fine-tuning reaches 48.8%, and RLVR training over SFT checkpoints raises performance to 52.8% under data-scarce conditions.
#Agent#Multimodal#Benchmarking#WARC-Bench
why featured
HKR-K/R pass: WARC-Bench adds concrete GUI-agent task counts and success rates, with direct relevance to agent evaluation. HKR-H is weak, and this is a single arXiv benchmark, so it stays in the 60–71 band.
editor take
WARC-Bench tests 438 web subtasks, topping at 64.8%; archived replay makes GUI evals less hostage to live-site drift.
SCORE 70 · HKR H0·K1·R1
arXiv · cs.LG · Atom · EN · 04:00 · 05·20
TEMPO: Temporal Enforcement via Mode-Separated Policy Optimization for Trustworthy LLM Backtesting
TEMPO trains LLMs to enforce cutoff-date evidence selection in backtesting with a two-mode reward and a GRPO pipeline; across 3 prediction tasks and 2 models, it reduced post-cutoff leakage from 2–13% to 0.6–3.7% and improved task performance by 6–13% when strong pre-cutoff signals existed.
#Reasoning#Alignment#Benchmarking#TEMPO
why featured
HKR-K/R pass: the paper offers a concrete mechanism and leakage-rate numbers, and it touches evaluation trust. HKR-H is weak, and this is a single arXiv paper without code or visible industry debate, so it stays at the top of 60–71.
editor take
TEMPO cuts leakage from 2–13% to 0.6–3.7%; backtesting benchmarks need temporal discipline before accuracy claims.
SCORE 70 · HKR H0·K1·R1
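TEMPO's card reports post-cutoff leakage rates but not their exact definition. One plausible reading, assumed here: leakage is the share of cited evidence published after the backtest cutoff. The document IDs, dates, and cutoff are made up.

```python
from datetime import date

def split_by_cutoff(evidence, cutoff):
    """Partition (doc_id, published_date) pairs into admissible and leaked lists."""
    admissible = [e for e in evidence if e[1] <= cutoff]
    leaked = [e for e in evidence if e[1] > cutoff]
    return admissible, leaked

def leakage_rate(cited, cutoff):
    """Fraction of cited evidence published after the backtest cutoff."""
    if not cited:
        return 0.0
    return sum(1 for _, published in cited if published > cutoff) / len(cited)
```

A metric like this is cheap to compute on every backtest answer, which is what makes "temporal discipline before accuracy claims" enforceable in practice.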
arXiv · cs.LG · Atom · EN · 04:00 · 05·20
HoReN: Normalized Hopfield Retrieval for Large-Scale Sequential Model Editing
HoReN wraps a single MLP layer with discrete key-value memory for parameter-preserving model editing. On ZsRE, it scales to 50K sequential edits while keeping overall performance above 0.93; the abstract says prior editors collapse or degrade before 10K edits.
#Memory#Fine-tuning#Benchmarking#HoReN
why featured
HKR-K is solid via the mechanism and 50k-edit result; HKR-R lands for model editing and memory teams. HKR-H is weak, and the paper remains too specialized for featured.
editor take
HoReN stays above 0.93 after 50K ZsRE edits; wrapping one MLP layer looks more maintainable than parameter surgery.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
MoBayes: A Modular Bayesian Framework for Separating Reasoning from Language in Conversational Clinical Decision Support
MoBayes confines the LLM to a language interface, while a Bayesian module tracks posteriors, selects follow-up questions by expected information gain, and uses calibrated thresholds to decide when to stop or defer.
#Reasoning#Safety#Tools#MoBayes
why featured
HKR-H/K/R pass, but the item only discloses mechanisms, not results, code, or clinical validation conditions. As an arXiv methods paper, it is useful signal, not a featured-grade release.
editor take
MoBayes keeps LLMs as the chat layer and moves posteriors to Bayes; clinical AI shouldn't bet diagnosis on token sampling.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
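A minimal sketch of the question-selection step described above, assuming a discrete set of candidate diagnoses, yes/no follow-up questions with known per-diagnosis likelihoods, and a simple max-posterior stopping rule. The question names, likelihoods, and the 0.9 threshold are hypothetical; the summary does not disclose MoBayes's actual model or calibration procedure.

```python
import math
from typing import Dict, List


def entropy(p: List[float]) -> float:
    """Shannon entropy in nats, ignoring zero-probability entries."""
    return -sum(x * math.log(x) for x in p if x > 0)


def update(prior: List[float], lik_yes: List[float], answer: bool) -> List[float]:
    """Bayes update of the diagnosis posterior after a yes/no answer.

    lik_yes[d] is P(answer = yes | diagnosis d) (hypothetical values).
    """
    unnorm = [p * (l if answer else 1.0 - l) for p, l in zip(prior, lik_yes)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def expected_information_gain(prior: List[float], lik_yes: List[float]) -> float:
    """EIG = H(prior) - E_answer[H(posterior)] for one binary question."""
    p_yes = sum(p * l for p, l in zip(prior, lik_yes))
    eig = entropy(prior)
    for answer, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            eig -= p_answer * entropy(update(prior, lik_yes, answer))
    return eig


def pick_question(prior: List[float], questions: Dict[str, List[float]]) -> str:
    """Ask the question whose answer is expected to reduce entropy the most."""
    return max(questions, key=lambda q: expected_information_gain(prior, questions[q]))


def should_stop(posterior: List[float], threshold: float = 0.9) -> bool:
    """Hypothetical stopping rule: stop once one diagnosis is confident enough."""
    return max(posterior) >= threshold


prior = [0.5, 0.3, 0.2]               # three candidate diagnoses
questions = {
    "fever": [0.9, 0.1, 0.5],         # discriminates between diagnoses
    "fatigue": [0.5, 0.5, 0.5],       # uninformative: same likelihood everywhere
}
best = pick_question(prior, questions)
```

In this split, the LLM would only phrase the chosen question and parse the reply into a yes/no answer; the posterior arithmetic never depends on token sampling.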
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
In-Context Learning Operates as Concept Subspace Learning
Wei Tang and three coauthors frame in-context learning as concept subspace learning, showing that on CounterFact-derived multi-relation prompts with Llama-3-8B, a 68–73-dimensional subspace of the 4096-dimensional residual stream restores 78.8% of the clean–corrupted accuracy gap, while patching the complementary subspace restores 0%.
#Reasoning#Interpretability#Benchmarking#Wei Tang
why featured
HKR-H/K pass: the paper offers a clear ICL mechanism claim plus testable numbers, including 68–73D subspaces and 78.8% recovery. HKR-R is weak because it is mechanistic research, not a product or workflow shift.
editor take
Llama-3-8B recovers 78.8% with 68–73 dims; ICL circuits remain open, but subspace stories got harder to dismiss.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
ARC-RL Reinforcement Learning Playground Introduces Four MuJoCo Continuous Control Environments
ARC-RL introduces four MuJoCo continuous-control environments covering the 18-DoF Queen, 12-DoF Bastion, 18-DoF Tick, and 12-DoF Leaper, and compares SAC, SPEQ, SOPE-EO, plus prior-data variants under shared observations, actions, cadence, and a closed-form reward.
#Robotics#Benchmarking#ARC Raiders#MuJoCo
why featured
HKR-H/K pass: the title has a game-inspired benchmark hook, and the summary gives 4 MuJoCo envs, DoF counts, and algorithm comparisons. Audience fit is narrow for RL/control, so it stays below featured.
editor take
ARC-RL ships 4 MuJoCo tasks; game-creature RL benchmarks are fresh, but code availability is undisclosed.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding
Graft optimizes speculative decoding with a sequential prune-then-graft mechanism, reaching up to 5.41x speedup on short-context benchmarks and improving average speedup over EAGLE-3 by up to 21.8% on Qwen3-235B.
#Inference-opt#Benchmarking#Yuhao Shen#Tianyu Liu
why featured
HKR-K and HKR-R are solid: Graft has a concrete prune-then-graft mechanism and speedup numbers. HKR-H is niche; without code, production deployment, or a major-lab signal, this stays in all.
editor take
Graft hits 5.41x on short context; I trust training-free pruning tricks more than brute-force bigger draft trees.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts
The paper introduces ReElicit, a Bayesian optimization framework that uses an LLM to elicit feature spaces from task descriptions, prior prompts, and scalar scores; across 10 system-prompt optimization tasks, it reports the strongest aggregate performance among aggregate-only baselines under a 30-evaluation budget per task.
#Embedding#Tools#Benchmarking#arXiv
why featured
HKR-H and HKR-K pass: the mechanism is novel and the setup names 10 tasks with 30 evaluations each. HKR-R is weak because no performance lift is disclosed, keeping it in the upper 60–71 band.
editor take
ReElicit posts the strongest aggregate score across 10 tasks at 30 evaluations each; using LLMs as feature engineers beats treating them as prompt spammers.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
MaxShapley: Towards Incentive-Compatible Generative Search with Fair Context Attribution
MaxShapley computes fair attribution for generative search with a decomposable max-sum utility function, matching exact Shapley-level attribution quality on HotPotQA, MuSiQUE, and MS MARCO while reducing resource consumption by up to 9x versus prior state-of-the-art methods at the same accuracy.
#RAG#Benchmarking#MaxShapley#Research release
why featured
HKR-H/K/R all pass, but this is still a single arXiv RAG-attribution paper with no disclosed production deployment or artifact in the feed. Defaulting to the lower band keeps it in all.
editor take
MaxShapley cuts tokens up to 9x on 3 QA sets; engineering cost is the first wall fair search payouts hit.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
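To see why a max-sum utility can make exact Shapley attribution cheap, here is a sketch under one reading of "decomposable max-sum": the utility of a set of retrieved sources is the sum, over answer components, of the best nonnegative relevance score any source in the set reaches for that component. The score matrix is hypothetical, and this is not claimed to be MaxShapley's exact algorithm. Because Shapley values are additive across games, each component reduces to a single max game with a closed form after sorting; a brute-force enumeration is included only as a check.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Set


def max_game_shapley(values: List[float]) -> List[float]:
    """Exact Shapley values for U(S) = max_{i in S} values[i], U(empty) = 0.

    Assumes nonnegative values. Writing max as a sum of increments
    v_(r) - v_(r-1) over sorted values, each increment is earned by any coalition
    containing a player with value >= v_(r), so it splits equally among them.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # ascending by value
    phi = [0.0] * n
    running, prev = 0.0, 0.0
    for rank, i in enumerate(order):
        running += (values[i] - prev) / (n - rank)  # n - rank players share it
        prev = values[i]
        phi[i] = running
    return phi


def max_sum_shapley(scores: List[List[float]]) -> List[float]:
    """scores[c][i]: relevance of source i to answer component c (hypothetical)."""
    n = len(scores[0])
    total = [0.0] * n
    for row in scores:  # Shapley is additive, so sum the per-component games
        for i, p in enumerate(max_game_shapley(row)):
            total[i] += p
    return total


def brute_force_shapley(utility: Callable[[Set[int]], float], n: int) -> List[float]:
    """Exponential-time reference implementation, for checking only."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (utility(s | {i}) - utility(s))
    return phi


scores = [[0.2, 0.9, 0.5],   # component 0: source 1 is most relevant
          [0.7, 0.1, 0.0]]   # component 1: source 0 is most relevant


def utility(s: Set[int]) -> float:
    return sum(max((row[i] for i in s), default=0.0) for row in scores)


fast = max_sum_shapley(scores)
slow = brute_force_shapley(utility, 3)
```

The closed form costs O(n log n) per component instead of 2^n coalition evaluations, which is the kind of gap that turns exact attribution from a research toy into something a search system could run per query.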
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation
Prompt2Fingerprint reformulates LLM fingerprinting as conditional parameter generation, mapping textual identity descriptions to low-rank parameter increments in one forward pass. The abstract says P2F avoids separate fine-tuning for each new identity and reports high fingerprint accuracy, harmlessness, and robustness, but the RSS snippet does not disclose model sizes, datasets, or exact overhead numbers.
#Fine-tuning#Safety#Tools#Research release
why featured
HKR-H/K/R all pass, but the supplied facts stop at a title-level mechanism with no authors, metrics, artifact, or deployment case. This fits the upper 60–71 band for a single arXiv research release.
editor take
Prompt2Fingerprint generates LoRA-style deltas in one pass; no model sizes or overhead figures, so its robustness claim is unverified.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection
LiMA reformulates black-box attribution as submodular subset selection and reports 36.3% higher Insertion and 39.6% higher Deletion across eight foundation models. The paper also reports 1.6x faster attribution than naive greedy search, with code released on GitHub.
#Interpretability#Vision#Benchmarking#LiMA
why featured
HKR-K/R pass: the paper gives a concrete method, 8-model evaluation, and open code for interpretability work. HKR-H is weak, and as a single arXiv paper without deployment evidence it stays below featured.
editor take
LiMA reports +36.3% Insertion and +39.6% Deletion across 8 models; black-box attribution finally looks like optimization, not heatmap aesthetics.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
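Submodular subset selection is usually solved greedily, and a standard way to speed greedy up is lazy evaluation: marginal gains can only shrink as the subset grows, so stale gains kept in a priority queue remain valid upper bounds. The summary does not say whether LiMA's 1.6x speedup over naive greedy comes from this trick; the sketch below only contrasts naive and lazy greedy on a toy coverage objective, where the "regions" and their pixel sets are hypothetical stand-ins for image sub-regions.

```python
import heapq
from typing import Callable, List, Set

Objective = Callable[[Set[int]], float]


def naive_greedy(f: Objective, n: int, k: int) -> List[int]:
    """Re-evaluate every candidate's marginal gain at every step."""
    chosen: List[int] = []
    for _ in range(min(k, n)):
        base = f(set(chosen))
        # Largest gain wins; ties go to the smallest index.
        best = min((i for i in range(n) if i not in chosen),
                   key=lambda i: (-(f(set(chosen) | {i}) - base), i))
        chosen.append(best)
    return chosen


def lazy_greedy(f: Objective, n: int, k: int) -> List[int]:
    """Same selections as naive_greedy for submodular f, with fewer evaluations."""
    chosen: List[int] = []
    current = f(set())
    heap = [(-(f({i}) - current), i) for i in range(n)]  # stale upper bounds
    heapq.heapify(heap)
    while len(chosen) < k and heap:
        _, i = heapq.heappop(heap)
        gain = f(set(chosen) | {i}) - current  # refresh this candidate only
        # If the refreshed gain still beats every other (stale) bound, it is optimal.
        if not heap or (-gain, i) <= heap[0]:
            chosen.append(i)
            current += gain
        else:
            heapq.heappush(heap, (-gain, i))
    return chosen


regions = [{0, 1, 2}, {2, 3}, {3, 4, 5, 6}, {0, 6}]  # hypothetical pixel sets


def coverage(s: Set[int]) -> float:
    """Number of pixels covered: a classic monotone submodular function."""
    return float(len(set().union(*(regions[i] for i in s))))
```

Breaking ties on index in both versions keeps their outputs identical, so the speedup can be checked without changing which subset gets selected.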
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
PhyWorld Physics-Faithful World Model for Video Generation Research Paper Released
PhyWorld improves video continuation with two-stage post-training: flow-matching fine-tuning for stable motion, then DPO on physics preference pairs, reaching 0.769 average VBench score and 3.09 on its physical-faithfulness benchmark.
#Multimodal#Vision#Fine-tuning#PhyWorld
why featured
HKR-H and HKR-K pass: the title has a physics-faithfulness hook, and the post gives a two-stage training mechanism plus scores. As a single arXiv paper without product release, open weights, or major-lab signal, it stays in the interesting band.
editor take
PhyWorld scores 3.09 on physics, up 0.10; that margin cannot carry “world model” branding.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Protocol-Driven Development: Governing Generated Software Through Invariants and Continuous Evidence
The paper introduces Protocol-Driven Development, defining a protocol as P=(S,B,O) and admitting generated implementations only when they satisfy structural, behavioral, and operational invariants with a verifiable Evidence Chain.
#Code#Tools#Safety#Research release
why featured
HKR-K/R pass: PDD uses protocols, invariants, and Evidence Chains to govern generated software, a real AI-coding reliability issue. But it is an arXiv method paper with no benchmark, tool release, or production case disclosed.
editor take
PDD defines protocols as P=(S,B,O) and gates code via Evidence Chain; I buy the direction, but no evaluation scale is disclosed.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Beyond Majority Voting: LLM Aggregation by Leveraging Higher-Order Information
The paper introduces OW and ISP, two training-free aggregation algorithms that use first- and second-order information, and reports better performance than majority-voting baselines on synthetic data, UltraFeedback, MMLU, and ARMMAN.
#Agent#Reasoning#Benchmarking#arXiv
why featured
HKR-K/R pass: the paper gives named methods and benchmarks tied to LLM aggregation reliability. HKR-H is weak, and as a single arXiv method paper without production adoption evidence, it stays in the 60–71 band.
editor take
OW and ISP beat majority voting on 4 eval sets; gain sizes aren’t disclosed, so I’d test them on correlated model votes first.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Operationalizing Document AI: A Microservice Architecture for OCR and LLM Pipelines in Production
The paper presents a Document AI microservice architecture that processes thousands of multi-page documents per hour; batch profiling shows OCR, not LLM parsing, dominates end-to-end latency, and system saturation is determined by shared GPU inference capacity rather than worker count.
#Vision#Inference-opt#Tools#arXiv
why featured
HKR-K and HKR-R pass: it gives throughput, latency bottlenecks, and a concurrency mechanism. HKR-H is weak, and the arXiv architecture angle fits the 60–71 practical-signal band.
editor take
This runs thousands of multi-page docs per hour; OCR dominates latency, so stop blaming LLM parsing first.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Vision-OPD: Multimodal Large Language Model Improves Fine-Grained Vision Understanding via Self-Distillation
Vision-OPD uses the same MLLM to instantiate a crop-conditioned teacher and a full-image student, then minimizes token-level divergence on student on-policy rollouts; the method requires no external teacher, ground-truth labels, reward verifier, or inference-time tool use, and the abstract reports competitive or superior results on multiple fine-grained vision benchmarks.
#Multimodal#Vision#Fine-tuning#Vision-OPD
why featured
HKR-H/K/R pass, but the body gives only the method sketch; benchmark gains, code, affiliations, and reproducible setup are not disclosed. Solid arXiv method paper, below featured threshold.
editor take
Vision-OPD uses one MLLM as both crop teacher and full-image student; I buy the idea, but with no benchmark numbers the "competitive or superior" claim smells like SOTA framing.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Beyond Binary Success: A Diagnostic Meta-Evaluation Framework for Fine-Grained Manipulation
MetaFine decomposes fine-grained manipulation evaluation into understanding, perception, and controlled behavior, and the paper says binary success rates inflate reported embodied-AI capability by up to 70%. The framework rebuilds heterogeneous benchmarks into diagnostic scenarios, evaluates VLA models, identifies local spatial preservation in the visual encoder as a bottleneck, and plans a public release at metafine.github.io.
#Robotics#Vision#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the 70% inflation claim and 3-axis diagnostic frame add signal. Scope is narrow robotics evaluation, so it stays below featured.
editor take
MetaFine says binary success rates inflate capability by up to 70%; good cut, but model roster and replication details aren’t disclosed.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
When the Loop Closes: Architectural Limits of In-Context Isolation, Metacognitive Co-option, and the Two-Target Design Problem in Human-LLM Systems
The paper reports a single-subject autoethnographic case in which System A, a multimodal prompt-engineering setup for offloading self-regulation to an LLM, was followed within 48 hours by transferred decision authority, use of outputs to deflect criticism, and reduced self-initiated reasoning observed by two uninformed witnesses; System B used physical conversation isolation and avoided analogous failures.
#Safety#Multimodal#Memory#Research release
why featured
HKR-H/K/R all pass, but the evidence is a single self-report case plus two blinded observers. It is useful safety signal, not a strong empirical release for featured.
editor take
Single-subject autoethnography saw System A shift agency within 48 hours; thin evidence, but emotional context contamination is a real trap, and conversation isolation is the guard worth testing.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Where Not to Learn: Prior-Aligned Training with Subset-based Attribution Constraints for Reliable Decision-Making
The paper proposes attribution-based human prior alignment that encodes priors as input regions, penalizes off-prior evidence during training, and validates the method on image classification plus MLLM-based GUI agent click decision tasks.
#Interpretability#Alignment#Agent#Research release
why featured
HKR-H/K/R all pass, but this is a single arXiv methods paper with mechanism and task validation only; no code, scale result, or top-lab signal is disclosed, so it stays at the high end of 60–71.
editor take
They penalize off-region attribution with human priors, but disclose no gains; GUI-agent clicks make this more useful than another classifier paper.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Neuron Incidence Redistribution for Fairness in Medical Image Classification
The paper proposes NIR, a regularizer that needs no demographic labels during training; on HAM10000, it reduces TPR disparity from 10.81% to 0.93% across age groups and from 12.04% to 0.74% across gender, while improving AUC by 0.51 points.
#Vision#Safety#arXiv#HAM10000
why featured
Single arXiv medical-imaging fairness paper with a clear mechanism and HAM10000 gap reductions, so HKR-H/K/R pass lightly; narrow deployment scope and no code, product, or cross-source pickup keep it in 60–71.
editor take
NIR cuts HAM10000 age TPR gap to 0.93%; label-free fairness is neat, but multicenter clinical transfer needs proof.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients
The paper introduces noise-robust variants of GRPO and Dr.GRPO that model reward corruption as Bernoulli flip noise and apply a correction after estimating the flip probabilities; it reports gains of up to 6.7 percentage points on math tasks and 1.5 points on code tasks under realistic reward-model conditions.
#Reasoning#Alignment#Fine-tuning#Research release
why featured
HKR-H and HKR-K pass: noise-corrected GRPO gives a concrete mechanism and measured gains. HKR-R is weak because this is a niche training paper with abstract-level evidence only.
editor take
The noise-corrected variants report up to +6.7 math accuracy points; reward-noise correction looks like a cheaper gain than more prompt tuning.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R0
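A sketch of the standard affine correction for binary rewards under known flip rates, one concrete form of the "correction after estimating flip probabilities" described above. The paper's own estimator, and how it estimates the flip rates, are not disclosed in the summary; rho0, rho1, and the rollout rewards are hypothetical. Advantages are mean-centered without std division, in the Dr.GRPO style.

```python
import numpy as np


def debias_binary_rewards(observed, rho0: float, rho1: float) -> np.ndarray:
    """Unbiased estimate of clean 0/1 rewards under Bernoulli flip noise.

    rho0 = P(observed 1 | true 0), rho1 = P(observed 0 | true 1).
    Since E[observed | true] = rho0 + (1 - rho0 - rho1) * true, inverting that
    affine map gives an estimator whose expectation equals the true reward.
    """
    if rho0 + rho1 >= 1.0:
        raise ValueError("flip rates must satisfy rho0 + rho1 < 1")
    r = np.asarray(observed, dtype=float)
    return (r - rho0) / (1.0 - rho0 - rho1)


def mean_centered_advantages(rewards) -> np.ndarray:
    """Group-relative advantage without std normalization (Dr.GRPO style)."""
    r = np.asarray(rewards, dtype=float)
    return r - r.mean()


group = [1, 0, 1, 1]  # hypothetical verifier outputs for 4 rollouts of one prompt
adv = mean_centered_advantages(debias_binary_rewards(group, rho0=0.1, rho1=0.2))
```

Std-normalized GRPO advantages, (r - mean) / std, are invariant to any positive affine rescaling of a group's rewards, so applying this correction uniformly within a group would cancel there; with mean-centering alone, the 1 / (1 - rho0 - rho1) factor survives and rescales the gradient. How the paper handles the std-normalized case is not stated in the summary.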
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
OpenCompass: A Universal Evaluation Platform for Large Language Models
The paper proposes and open-sources OpenCompass, a general LLM evaluation platform with 5 components: configuration, task partitioning, execution and scheduling, task execution, and result visualization; it supports rule-based, LLM-as-a-Judge, and cascaded evaluators.
#Benchmarking#Reasoning#Code#OpenCompass
why featured
HKR-K and HKR-R pass: the paper gives a concrete evaluation architecture and targets a real LLM-eval pain point. HKR-H misses, and the article lacks a major result or cluster signal, so it stays in all.
editor take
OpenCompass ships a 5-part eval stack; benchmark coverage is undisclosed, and eval platforms win on dataset governance, not diagrams.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
TADA! Tuning Audio Diffusion Models through Activation Steering
TADA uses activation patching to identify a semantic bottleneck in audio diffusion models: a small shared set of consecutive attention layers controls concepts such as instruments, vocals, and genres, and the paper compares activation steering with prompt-level, score-space, and weight-space interventions on a new benchmark with a user study.
#Audio#Interpretability#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the counterintuitive hook is semantic control via a few layers, with activation patching, a benchmark, user study, and 4 interventions. HKR-R is limited; no product or platform impact, so it stays in 60–71.
editor take
TADA compares 4 audio steering methods; user-study size is undisclosed, so the SOTA claim needs replication.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Research proposes Pion optimizer to improve vision-language and reinforcement learning training
Chongyu Fan and coauthors propose Pion as a drop-in Muon replacement, using high-pass Newton-Schulz iterations to suppress noisy tail singular components; with VLA-Adapter on LIBERO Object, Pion reaches a 100% success rate after 1,500 training steps, versus 97.0% for Muon and 32.2% for AdamW.
#Fine-tuning#Robotics#Inference-opt#Chongyu Fan
why featured
HKR-H/K pass: the title frames a Muon failure and the post gives Pion plus a 1,500-step robotics result. HKR-R is narrow; spectral analysis and NS iteration limit reach, with no open-source or cross-source signal.
editor take
Pion hits 100% on LIBERO Object at 1,500 steps; I’d reproduce the RLVR Muon-to-zero collapse first.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
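The summary gives too little detail to reproduce Pion's high-pass iteration, so this sketch shows only the baseline it modifies: Newton-Schulz orthogonalization, which Muon-style optimizers use to push an update matrix G toward its polar factor U V^T. It uses the classic cubic polynomial for clarity (Muon's reference implementation uses a tuned quintic instead), and the 50-step count is an assumption.

```python
import numpy as np


def newton_schulz_orthogonalize(G: np.ndarray, steps: int = 50) -> np.ndarray:
    """Approximate U V^T from the SVD G = U S V^T via X <- 1.5 X - 0.5 X X^T X.

    Each step maps every singular value s to 1.5 s - 0.5 s^3, which drives all
    singular values in (0, sqrt(3)) toward 1 while leaving U and V unchanged.
    """
    X = G / (np.linalg.norm(G) + 1e-12)  # Frobenius scaling: singular values <= 1
    tall = X.shape[0] > X.shape[1]
    if tall:
        X = X.T  # iterate on the wide orientation so X X^T is the smaller Gram matrix
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X.T if tall else X


rng = np.random.default_rng(0)
G = rng.standard_normal((3, 5))  # stand-in for a gradient or momentum matrix
O = newton_schulz_orthogonalize(G)
U, s, Vt = np.linalg.svd(G, full_matrices=False)
```

Because every nonzero singular value is pushed toward 1, weak tail directions end up with the same magnitude as dominant ones; if those tail components are mostly gradient noise, that amplification is plausibly what a "high-pass" variant is meant to suppress.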
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
TwinRouterBench: Fast Static and Live Dynamic Evaluation for Realistic Agentic LLM Routing
TwinRouterBench provides two routing evaluation tracks. The static track includes 970 router-visible prefixes from 520 instances, and the dynamic track runs routers on the 500-case SWE-bench Verified suite with official task resolution and realized API spend.
#Agent#Benchmarking#Inference-opt#CommonstackAI
why featured
HKR-K/R pass: the two-track design and SWE-bench Verified 500 setup give practitioners concrete eval data. HKR-H is weak, and a single arXiv benchmark stays in the 60–71 band.
editor take
TwinRouterBench gives routers 970 mid-step prefixes; I like that it drops LLM judges and ties savings to task resolution.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Boosting Text-to-Image Diffusion Models via Core Token Attention-Based Seed Selection
The paper introduces ABSS, a training-free inference-time method that ranks candidate seeds using cross-attention to prompt core tokens during the first denoising steps, keeps only the top-k for full generation, and reports improved alignment and visual quality for Stable Diffusion variants across three benchmarks.
#Vision#Inference-opt#Multimodal#Stable Diffusion
why featured
HKR-H/K/R pass: ABSS gives a concrete early-denoising seed-selection mechanism across three benchmarks. Impact stays inside T2I diffusion workflows, with no code, major-lab release, or cross-source cluster.
editor take
ABSS filters seeds via early cross-attention; candidate count and extra compute are undisclosed, so don’t call it free quality.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Code-Guided Reasoning for Small Language Models: Evaluating Executable MCQA Scaffolds
The paper introduces CGR, an evaluation protocol for executable MCQA scaffolds, and reports 66.21% macro assisted accuracy versus 38.11% direct accuracy on 20,498 retained MCQA result rows, while assisted inference uses a larger solver-call budget and some generated programs violate the no-hard-coding instruction.
#Reasoning#Code#Tools#Research release
why featured
HKR-K and HKR-R pass: the paper gives concrete accuracy numbers and a budget caveat for code-guided SLM reasoning. HKR-H is weak, and as a single arXiv eval paper it stays below featured.
editor take
CGR gains 28.10 points on 20,498 MCQA rows, but with bigger solver-call budgets; audit hard-coding before celebrating.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Language Models Struggle with Compartmentalization
The paper shows that LLMs can learn parallel internal representations for different presentations of the same latent concept; in small models, early multilingual learning is nearly fully compartmentalized, and synthetic parallel data does not reliably fix the issue.
#Benchmarking#Interpretability#Reasoning#Research release
why featured
HKR-H/K pass: the paper has a counterintuitive representation-learning claim and testable findings on isolation plus parallel data. It remains a single arXiv research item with unclear practitioner impact, so it stays below featured.
editor take
Small models nearly fully compartmentalize early multilingual learning; parallel data is no magic glue for shared concepts.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Multi-axis Analysis of Image Manipulation Localization
The paper introduces AUDITS, an image manipulation detection benchmark with over 530K images from user and news photo sources, covering diffusion-based inpainting manipulations across types, sizes, and domain-shift evaluation conditions.
#Vision#Benchmarking#Research release#Benchmark
why featured
HKR-H/K/R pass, but this is a single arXiv vision benchmark with no disclosed open-source artifact, broad model impact, or cross-source pickup. It fits the 60–71 research-signal band.
editor take
AUDITS ships 530K images for manipulation localization; news-domain shift matters, but diffusion inpainting alone is a narrow threat model.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection
LYNX remaps low-affinity token-to-expert assignments within each batch using AffinityBinning, reducing invoked experts and improving throughput by up to 1.30x across four model families and nine benchmarks while keeping accuracy loss below 1 percentage point.
#Inference-opt#Benchmarking#LYNX#Research release
why featured
HKR-K/R pass: the 1.30x throughput and <1 percentage point accuracy loss are testable, and MoE serving cost matters. HKR-H is weak, and the systems-heavy mechanism keeps it in the 60–71 band.
editor take
LYNX gets up to 1.30x throughput on 4 model families and 9 benchmarks; batch-local routing surgery beats another MoE kernel chase.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Inferring Sensitive Attributes from Knowledge Graph Embeddings: Attack and Defense Strategies
The paper studies attribute inference attacks on knowledge graph embedding outputs and proposes post-processing sanitization as a defense. Preliminary results show the attacks succeed on KGE model outputs; the authors then evaluate the trade-off between recommendation quality and privacy protection under randomization-based approaches.
#Embedding#Reasoning#Safety#Research release
why featured
HKR-H/K/R all pass, but the body gives only abstract-level detail: no datasets, attack success rates, or utility-loss numbers. This is useful academic safety work, not a featured industry story.
editor take
KGE outputs leak sensitive attributes; datasets and attack rates are undisclosed. Don’t oversell sanitization when randomization taxes recommendation quality.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training
Hybrid-LoRA applies full fine-tuning to 10% of selected modules and LoRA to the remaining candidates, using a Hybrid-LoRA Score to rank low-rank sensitivity; experiments report performance close to full fine-tuning and gains of up to 5.65%, averaging 4.36%, over the best PEFT post-training baseline.
#Fine-tuning#Reasoning#Alignment#Research release
why featured
HKR-K is clear: 10% of modules get full fine-tuning while the rest use LoRA, with +5.65% max and +4.36% average over PEFT baselines. HKR-R hits the tuning cost/quality tradeoff, but this is a single arXiv method paper, so it stays in the 60–71 band.
editor take
Hybrid-LoRA fully tunes 10% of modules and beats PEFT by 4.36% average; I buy it, but memory costs are undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Fine-Grained Benchmark Generation for Comprehensive Evaluation of Foundation Models
The paper introduces an automated benchmark generation framework that grounds problems in reference materials such as textbooks, uses a multi-agent pipeline and solution-graph strategy, generates 3 benchmarks in machine learning, corporate finance, and personal finance, and evaluates 12 commercial and open-source models.
#Agent#Benchmarking#arXiv#MMLU
why featured
HKR-K and HKR-R pass: the paper gives a concrete benchmark-generation mechanism and evaluation scale, and it touches model-eval pain points. But it is a single arXiv paper with no disclosed result strength, so it stays in the 60–71 band.
editor take
The paper builds 3 fine-grained benchmarks for 12 models; no error-rate numbers disclosed, so don’t bank on the MMLU claim yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Theory-optimal Quantization Based on Flatness
The paper proposes BDQ, a post-training quantization framework, and reports under 1% accuracy drop for W4A4 quantization on LLaMA-3-8B.
#Inference-opt#LLaMA#DeepSeek#Research release
why featured
HKR-K and HKR-R pass: BDQ gives a testable LLaMA-3-8B W4A4 result and maps to inference cost. HKR-H fails, and the single arXiv quantization paper is technical, so it stays in the 60–71 band.
editor take
BDQ reports under 1% drop on LLaMA-3-8B W4A4; if reproducible, low-bit PTQ costs get repriced.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
SAGE: Shaping Anchors for Guided Exploration in RLVR of LLMs
SAGE reshapes the reverse-KL anchor distribution with a guide function q(x,y) for RLVR training, targeting the exploration constraint that keeps policies near the reference distribution; the paper reports consistent gains in both pass@1 and pass@k across challenging mathematical reasoning benchmarks and releases code at github.com/tally0818/SAGE.
#Reasoning#Alignment#Benchmarking#SAGE
why featured
HKR-K and HKR-R pass: the item gives a concrete RLVR mechanism and open code tied to reasoning gains. HKR-H is weak, and exact lift numbers are not disclosed, so it stays in all.
editor take
SAGE reshapes reverse-KL anchors via q(x,y); I buy the setup, since RLVR pass@k stalls don’t look like something temperature tuning alone fixes.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Descriptive versus Regulatory Uncertainty in Bounded Predictive Systems
The paper separates descriptive uncertainty from regulatory uncertainty and proves current transformers only have descriptive uncertainty at inference. The authors test three local language models with 3B, 8B, and 70B parameters; token entropy stays within 0.011–0.028 nats while task accuracy ranges from 0% to 100%.
#Reasoning#Interpretability#Benchmarking#Research release
why featured
HKR-H/K/R pass, but this is a single arXiv theory paper with no code, named lab, or deployment signal; the 60–71 band fits better than featured.
editor take
Authors test 3B/8B/70B models: entropy stays 0.011–0.028 nats. The energy-cost framing is wild, but hard to operationalize.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
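For readers who want to check numbers like 0.011–0.028 nats on their own models: per-token entropy comes from the next-token logits as H = -sum_v p_v log p_v with the natural log. How the paper averages over positions and prompts is not stated in the summary; this sketch returns one value per row of logits.

```python
import numpy as np


def token_entropy_nats(logits) -> np.ndarray:
    """Entropy in nats of softmax(logits) along the last axis (one value per row)."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))  # log-softmax
    p = np.exp(log_p)
    return -(p * log_p).sum(axis=-1)


batch = np.array([[0.0, 0.0, 0.0, 0.0],     # uniform over 4 tokens: ln 4 nats
                  [12.0, 0.0, 0.0, 0.0]])   # nearly one-hot: close to 0 nats
entropies = token_entropy_nats(batch)
```

A near-one-hot distribution scores near zero whether or not the chosen token is correct, which is consistent with the summary's point that tiny entropy can coexist with task accuracy anywhere from 0% to 100%.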
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Fine-tuning Large Language Models for Automated Algorithm Design
The paper fine-tunes Llama-3.2-1B-Instruct with DAR sampling and DPO across three algorithm-design tasks, reports gains over its off-the-shelf baseline, and matches Llama-3.1-8B-Instruct on the admissible set problem; the code is available on GitHub, while exact metric values are not disclosed in the RSS snippet.
#Fine-tuning#Code#Benchmarking#Llama
why featured
HKR-H/K/R pass via the 1B-vs-8B hook, DAR+DPO method, and cost angle. Single arXiv paper in a niche algorithm-design benchmark lacks broad product or ecosystem impact, so it stays in 60–71.
editor take
DAR+DPO-tuned Llama-3.2-1B beats its base on 3 algorithm tasks; exact metrics are missing, so no victory lap yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
ARM: Discovering Agentic Reasoning Modules for Generalizable Multi-Agent Systems
ARM introduces Agentic Reasoning Modules, discovered by tree search over code: the search starts from simple CoT modules and mutates them using reflection on execution traces. The abstract says ARM-based multi-agent systems outperform manual and automatic MAS designs across models and task domains, but the snippet does not disclose exact benchmark scores.
#Agent#Reasoning#Code#Research release
why featured
HKR-K/R pass on the code-space search mechanism and agent reliability angle; HKR-H is weak. No scores, artifact, or experiment detail are disclosed, so it stays in the 60–71 band.
editor take
ARM searches code trees to mutate CoT modules; no scores are disclosed, so don’t buy the “significantly outperforms” claim yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents
TSR moves lightweight tree-style search into training rollouts for multi-turn LLM agents, selects high-scoring actions per turn with state feedback, and reports up to 15% gains with PPO and GRPO on Sokoban, FrozenLake, and WebShop.
#Agent#Reasoning#Tools#Research release
why featured
HKR-K has a concrete rollout mechanism and 15% result; HKR-R hits multi-turn agent training quality. HKR-H is weak, and this is a single arXiv method without an artifact or adoption signal.
editor take
TSR adds tree search to training rollouts and reports 15% gains; I buy the direction, but “modest compute” lacks numbers.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
SEAL: Semantic Aware Image Watermarking
SEAL embeds semantic information from generated images into image watermarks and infers key patterns with locality-sensitive hashing, so verification does not require a database of used keys. The paper tests two attack conditions: reusing extracted initial noise to generate a new image, and inserting an unrelated object while preserving the watermark.
#Vision#Safety#Research release#Safety/alignment
why featured
HKR-K/R pass: the summary gives semantic watermarking, LSH key-pattern inference, and two attack settings. HKR-H is weak; no lab, metrics, or artifacts are disclosed, keeping it in the normal research band.
editor take
SEAL verifies watermarks via semantic embeddings and LSH, with no key database; only two attacks are tested, so it is still far from production forensics.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
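How an LSH key can be recomputed from an image's semantics instead of looked up in a database: a SimHash-style sketch, assuming a hypothetical shared seed for the random hyperplanes and an embedding from an unspecified semantic encoder. SEAL's actual encoder, bit count, and the way bits map into the initial noise are not disclosed in the summary.

```python
import random
from typing import List

def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian hyperplanes; embedder and verifier share the (hypothetical) seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def simhash(embedding: List[float], planes: List[List[float]]) -> List[int]:
    """One bit per hyperplane: 1 if the embedding lies on its non-negative side."""
    return [1 if sum(p * e for p, e in zip(plane, embedding)) >= 0 else 0 for plane in planes]

def hamming(a: List[int], b: List[int]) -> int:
    """Number of differing bits; a verifier would accept below some threshold."""
    return sum(x != y for x, y in zip(a, b))

planes = make_hyperplanes(dim=8, n_bits=16, seed=42)
emb = [0.3, -1.2, 0.7, 0.05, -0.4, 1.1, -0.9, 0.2]
key = simhash(emb, planes)
# Embeddings pointing the same way share bits; a positively scaled copy shares all of them.
print(hamming(key, simhash([2 * v for v in emb], planes)))  # 0
```

Because the key is a function of the embedding, a verifier only needs the hyperplanes, which is the property the summary highlights.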
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Position: The Turing-Completeness of Real-World Autoregressive Transformers Relies Heavily on Context Management
The paper separates two settings for Transformer Turing-completeness: a fixed autoregressive Transformer with fixed context management, and a scaling family with increasing context window or numerical precision; it argues existing proofs often cover the second setting, while real LLM deployment and the standard notion of Turing-completeness align with the first.
#Reasoning#Research release#Commentary
why featured
HKR-H/K/R all pass, but this is a theory-heavy arXiv position paper with only the argument frame disclosed, not experiments, author signal, or debate traction. It stays in the 60–71 band.
editor take
The paper splits Turing-completeness into 2 settings; I buy it—fixed model plus fixed context matches deployed LLMs.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Agentic Discovery of Cryomicroneedle Formulations
The study uses a closed-loop AI workflow to discover cryomicroneedle cryoprotectant formulations, starting from 198 mesenchymal stem-cell formulations across 42 studies and validating over 10 iterations with 106 wet-lab observations; batch RMSE fell from 41.21 to 6.86 percentage points, and the best formulation reached 95.15% post-thaw viability.
#Agent#Benchmarking#Research release#Open source
why featured
HKR-H/K/R pass, but the biomedical formulation domain is far from mainstream AI products and developer workflows. No hard exclusion applies because the core claim is an agentic closed-loop wet-lab mechanism.
editor take
10 rounds and 106 wet-lab runs cut RMSE from 41.21 to 6.86; call it closed-loop correction, not autonomous science.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
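The headline RMSE drop reads more clearly as a relative figure. A small arithmetic check, assuming batch RMSE is the usual root-mean-square of predicted minus observed viability in percentage points (the summary does not define how a batch is formed).

```python
import math
from typing import Sequence

def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error, in the inputs' units (here: viability percentage points)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))

def relative_reduction(before: float, after: float) -> float:
    """Fraction of the starting error removed."""
    return (before - after) / before

# Reported batch RMSE at the start and end of the 10 iterations.
print(f"{relative_reduction(41.21, 6.86):.1%}")  # 83.4%
```

So the loop removed roughly five-sixths of the initial prediction error.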
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Skill Neologisms: Towards Skill-based Continual Learning
The paper proposes skill neologisms, soft tokens added to the model vocabulary and optimized for one skill, and tests them as a continual-learning method without weight updates.
#Fine-tuning#Memory#Benchmarking#Research release
why featured
HKR-H/K/R pass, but the item only discloses the method idea, not datasets, metrics, or code. Useful continual-learning research signal, below featured because the practical evidence is missing.
editor take
Skill neologisms learn one skill via soft tokens, but model scale is undisclosed; this smells like memory-heavy prompt tuning.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
LoRA vs. Full Fine-Tuning: A Theoretical Perspective
The paper compares LoRA and full fine-tuning through excess risk in a simple linear regression setting, and predicts LoRA can achieve lower excess risk in both overdetermined and underdetermined regimes when the gap between pretraining and downstream tasks is effectively low-rank.
#Fine-tuning#Research release
why featured
HKR-H/K/R pass, but the claim is bounded to simple linear regression and excess risk. Strong for fine-tuning theory, not broad enough for featured.
editor take
This proves LoRA can beat full fine-tuning in linear regression under low-rank task gaps. Don’t sell it as an LLM law.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Quantifying the Pre-training Dividend: Generative vs. Latent SSL for Time Series Foundation Models
The paper compares generative SSL with time-series adaptations of LeJEPA and DINO, using DWT augmentations, and reports up to 375% gains for anomaly detection and classification while forecasting gains remain marginal.
#Benchmarking#LeJEPA#DINO#Research release
why featured
HKR-K is strong: the 375% gain and weak forecasting payoff are testable claims. HKR-R is niche to time-series model teams, while HKR-H is weak, so it stays in all.
editor take
SSL gains hit 375% on anomaly/classification, but forecasting barely moves; stop using forecasting as the judge for time-series pretraining.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Study shows volatility forecast accuracy does not guarantee better portfolio performance
The paper tests GraphSAGE volatility models on weekly realized volatility for 465 S&P 500 equities from 2015 to 2025, and finds that the lowest forecast MSE, the highest cross-sectional ranking accuracy, and the highest portfolio Sharpe ratio come from three different models, so forecast accuracy and portfolio performance are not interchangeable objectives.
#Benchmarking#S&P 500#Research release#Benchmark
why featured
HKR-H/K/R pass via a clear metric-vs-portfolio hook, concrete S&P 500 test setup, and practitioner evaluation resonance. Importance stays in the lower band because it is a niche finance-GNN paper, not a broad AI product or model release.
editor take
On 465 S&P stocks over 2015–2025, lowest MSE and highest Sharpe split across models; forecast-leaderboard alpha gets slapped here.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
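Why the lowest MSE and the best cross-sectional ranking can come from different models: a toy check with invented numbers (not from the paper), scoring two hypothetical forecasters on three stocks by MSE and by Spearman rank correlation.

```python
from typing import List, Sequence

def mse(pred: Sequence[float], true: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def ranks(values: Sequence[float]) -> List[int]:
    """1-based ranks; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def spearman(pred: Sequence[float], true: Sequence[float]) -> float:
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(true)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(pred), ranks(true)))
    return 1 - 6 * d2 / (n * (n * n - 1))

true_vol = [1.0, 2.0, 3.0]
model_a = [1.9, 2.0, 2.1]  # shrunk toward the mean, but correctly ordered
model_b = [1.2, 1.0, 3.0]  # closer in level, but swaps the two low-vol names
print(mse(model_a, true_vol) > mse(model_b, true_vol))           # True
print(spearman(model_a, true_vol), spearman(model_b, true_vol))  # 1.0 0.5
```

A portfolio rule that sorts stocks on predicted volatility depends on the ordering rather than the level, which is one way the two objectives can come apart.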
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
SAGA: A Sequence-Adaptive Generative Architecture for Multi-Horizon Probabilistic Forecasting
SAGA trains on the Swedish LISA register from 1990 to 2022, covering 2,143,817 individuals and 61,284,903 person-years, and reduces CRPS by 31.9% at the 10-year horizon and MAE by 37.7% at the 20-year horizon against parametric and neural baselines.
#Reasoning#Benchmarking#SAGA#Swedish LISA
why featured
HKR-H/K pass via the large Swedish longitudinal dataset and concrete error reductions. HKR-R is weak, and the specialist forecasting focus keeps it in the 60–71 all band.
editor take
SAGA cuts 10-year CRPS 31.9% on 61.3M person-years; I buy half, since raw LISA stays locked away.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
EVA-0: Test-Time Model Evolution with Only Two Forward Passes per Sample
EVA-0 performs inference and adaptation within two forward passes per sample without backpropagation; experiments on ImageNet-C with ViT-Base report higher performance than BP-based DeYO and BP-free FOA, plus a 14x speed-up over FOA.
#Inference-opt#Fine-tuning#Vision#EVA-0
why featured
HKR-H/K pass: two forwards, no backprop, and 14x speedup are concrete. But this is a narrow vision test-time adaptation arXiv paper, so it fits the 60–71 “interesting, not featured” band.
editor take
EVA-0 adapts in two forward passes and claims a 14x speed-up over FOA on ImageNet-C; I’d wait for code, since zeroth-order TTA gains often hinge on tuning.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Distribution-Free Uncertainty Quantification for Continuous AI Agent Evaluation
The paper applies split conformal prediction and adaptive conformal inference to continuous AI agent evaluation, reporting calibration error below 0.02 across all nominal levels at a 24-hour horizon and 35% interval widening after agent releases before reconvergence.
#Agent#Benchmarking#Research release#Benchmark
why featured
HKR-K lands via conformal prediction for continuous agent evals and <0.02 calibration error; HKR-R lands on eval reliability. HKR-H is weak, and this remains an arXiv methods paper below featured threshold.
editor take
50 agents get 18 hourly signals; I buy the calibration machinery, not the leaderboard-stability excitement.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
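The two calibration tools named above can be sketched in a few lines: the finite-sample split-conformal quantile of absolute residuals, and the adaptive conformal inference update of Gibbs and Candès, which nudges the working miscoverage level after each hit or miss. The nonconformity score, the step size `gamma`, and how the paper handles its 24-hour horizon are assumptions here.

```python
import math
from typing import Sequence

def conformal_quantile(residuals: Sequence[float], alpha: float) -> float:
    """Half-width of a split-conformal interval built from calibration residuals |y - y_hat|."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points (or alpha <= 0): unbounded interval
    if k <= 0:
        return 0.0       # alpha >= 1: degenerate, zero-width interval
    return sorted(residuals)[k - 1]

def aci_update(alpha_t: float, target_alpha: float, missed: bool, gamma: float = 0.01) -> float:
    """Adaptive conformal inference: widen after a miss, narrow slowly after a hit."""
    err = 1.0 if missed else 0.0
    return alpha_t + gamma * (target_alpha - err)

# Calibration residuals for one hypothetical agent score; 75% interval.
q = conformal_quantile([1, 2, 3, 4, 5, 6, 7, 8, 9], alpha=0.25)
print(q)  # 8, so the interval is y_hat +/- 8
```

After an agent release, repeated misses push `alpha_t` down and the interval out, which matches the widen-then-reconverge pattern the summary reports.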
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Heterogeneity-Aware Dataset Scheduling for Efficient Audio Large Language Model Training
The paper proposes Grouped Sequential Training for Audio Large Language Model training, and reports 30–40% faster convergence than standard parallel training across 14 AudioQA datasets covering speech, music, and environmental sounds.
#Audio#Fine-tuning#Inference-opt#Research release
why featured
HKR-K is strong with a concrete 30–40% convergence claim; HKR-R is cost-relevant. HKR-H is weak and the single arXiv audio-training method stays in the 60–71 band.
editor take
GST reports 30–40% faster convergence on 14 AudioQA sets; audio multitask training is paying a mixing tax.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
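The contrast between standard parallel mixing and grouped sequential training is a data-ordering question. A minimal sketch, assuming datasets have already been sorted into groups by some heterogeneity criterion; the summary does not say how GST forms groups, so the grouping below is hypothetical.

```python
from itertools import zip_longest
from typing import Dict, List, Sequence

_MISSING = object()  # sentinel so shorter datasets simply drop out of the interleave

def interleave(datasets: Dict[str, List[str]], names: Sequence[str]) -> List[str]:
    """Round-robin over the named datasets: the 'parallel' mixing baseline."""
    out: List[str] = []
    for row in zip_longest(*(datasets[n] for n in names), fillvalue=_MISSING):
        out.extend(x for x in row if x is not _MISSING)
    return out

def grouped_sequential(datasets: Dict[str, List[str]], groups: Sequence[Sequence[str]]) -> List[str]:
    """Mix only within a group, and visit groups one after another."""
    schedule: List[str] = []
    for group in groups:
        schedule.extend(interleave(datasets, group))
    return schedule

data = {"speech": ["s1", "s2"], "music": ["m1", "m2"], "env": ["e1"]}
print(interleave(data, ["speech", "music", "env"]))              # ['s1', 'm1', 'e1', 's2', 'm2']
print(grouped_sequential(data, [["speech"], ["music", "env"]]))  # ['s1', 's2', 'm1', 'e1', 'm2']
```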
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Inference-Time Scaling in Diffusion Models through Iterative Partial Refinement
The paper introduces Iterative Partial Refinement for sequential diffusion models, re-noising and regenerating selected regions without an external verifier, and reports that MNIST Sudoku valid solution rate rises from 55.8% to 75.0% under global constraint satisfaction tasks.
#Reasoning#Inference-opt#Research release#Open source
why featured
HKR-H/K pass: the mechanism is local re-noising/regeneration without an external verifier, with a 55.8%→75.0% MNIST Sudoku result. The audience fit is research-heavy, with no product adoption signal, so it stays in the 60–71 all band.
editor take
IPR lifts MNIST Sudoku validity from 55.8% to 75.0%; skipping the external verifier is solid, but don’t extrapolate to general reasoning yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models
HELLoRA attaches LoRA modules only to each layer’s most frequently activated MoE experts. On OLMoE, it uses 15.7% of LoRA’s trainable parameters, cuts adapter FLOPs by 38.7%, reaches 1.9x throughput, and improves accuracy by 9.2%.
#Fine-tuning#Inference-opt#Alignment#DeepSeek
why featured
HKR-H/K/R all pass, but this is a single arXiv method paper with evidence limited to OLMoE experiments and no adoption signal. Lower-band scoring puts it in all, not featured.
editor take
HELLoRA beats LoRA on OLMoE by 9.2% in accuracy with 15.7% of its trainable parameters; stop slapping adapters on cold MoE experts.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
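Picking the “hot” experts reduces to counting router decisions per layer. A sketch, assuming a hypothetical routing log of (layer, expert) pairs collected on calibration data and a hypothetical `top_k`; the paper's exact hotness threshold is not given. LoRA adapters would then be attached only to the returned experts.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

def hot_experts(routing_log: Iterable[Tuple[int, int]], top_k: int) -> Dict[int, List[int]]:
    """Return, per layer, the top_k experts by routing frequency (ties: first seen wins)."""
    counts: Dict[int, Counter] = {}
    for layer, expert in routing_log:
        counts.setdefault(layer, Counter())[expert] += 1
    return {layer: [e for e, _ in c.most_common(top_k)] for layer, c in sorted(counts.items())}

log = [(0, 3), (0, 3), (0, 1), (0, 2), (0, 1), (0, 3), (1, 5), (1, 5), (1, 4)]
print(hot_experts(log, top_k=2))  # {0: [3, 1], 1: [5, 4]}
```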
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning
ReCrit models critic interaction as inter-turn correctness transitions and raises average Critic accuracy on ChemBench, TRQA, and EarthSE from 38.15 to 51.49 for Qwen3.5-4B and from 45.40 to 55.59 for Qwen3.5-9B.
#Reasoning#Alignment#Benchmarking#Qwen
why featured
HKR-H and HKR-K pass: the paper gives a concrete mechanism and a 38.15→51.49 result. HKR-R is weak because this is a single arXiv method paper without production replacement or broad practitioner impact.
editor take
ReCrit lifts Qwen3.5-4B from 38.15 to 51.49; in science, resisting bogus critique beats first-turn cleverness.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Dynamic Model Merging Made Slim
DiDi-Merging compresses dynamic model merging with differentiable rank allocation and a data-free refinement step. It matches prior dynamic baselines at 1.24x the parameters of one fine-tuned model, surpasses them at 1.4x, and uses less storage than methods requiring over 2x.
#Fine-tuning#Inference-opt#Multimodal#Research release
why featured
HKR-H/K/R pass via a concrete compression hook, mechanism, and cost angle. It stays in 60–71 because this is a narrow arXiv methods paper without disclosed code, mainstream-model validation, or production replacement evidence.
editor take
DiDi-Merging matches dynamic merging baselines at 1.24x parameters; differentiable rank allocation beats treating expert capacity as free.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Drifting Objectives for Refining Discrete Diffusion Language Models
The paper introduces TokenDrift, a drifting objective that maps categorical predictions into soft-token features and applies anti-symmetric drifting in a frozen semantic space, reducing Gen.-PPL at 4 NFEs by 89% on MDLM and 86% on DUO against matched continuation baselines.
#Reasoning#Inference-opt#TokenDrift#MDLM
why featured
HKR-H/K pass via the 4-NFE 89%/86% drops and soft-token objective. HKR-R fails because diffusion LMs remain niche; no code, adoption data, or cross-source discussion is disclosed.
editor take
TokenDrift cuts MDLM Gen.-PPL 89% at 4 NFEs. I'd inspect samples first; lower PPL doesn't guarantee better text.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Learning from Language Feedback via Variational Policy Distillation
The paper proposes Variational Policy Distillation, framing language-feedback learning as variational EM with an E-step that updates the teacher and an M-step that trains the student; the abstract says VPD outperforms RLVR and self-distillation baselines on scientific reasoning and code generation tasks.
#Reasoning#Code#Fine-tuning#Research release
why featured
HKR-K/R pass: VPD frames language-feedback learning as variational EM and claims wins over RLVR/self-distillation. HKR-H is weak, and no scores, model size, code, or lab are disclosed, so it stays in 60–71.
editor take
VPD jointly trains teacher and student via variational EM; scores are undisclosed, so I’d file it as an RLVR sparse-reward patch.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Simply Stabilizing the Loop via Fully Looped Transformer
The paper proposes Fully Looped Transformer with two parameter-free changes, Fully Looped Architecture and Attention Injection, stabilizing training up to 12 loop iterations while baseline looped models collapse, and improving average downstream-task performance by up to 13.2% in milder settings.
#Inference-opt#Reasoning#Research release
why featured
HKR-K passes with a testable mechanism and numbers; HKR-H and HKR-R are weak. As a single arXiv architecture paper, it belongs in all, below featured.
editor take
Fully Looped Transformer trains stably for 12 loops; the 13.2% gain is nice, but compute just moves to inference.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Minimalist Visual Inertial Odometry
The paper presents planar odometry using four downward-facing photodiodes and an IMU, jointly optimizing Gabor mask parameters and a TCN in a physics-based simulator, then validating the prototype on a differential-drive robot across indoor and outdoor terrains without real-world fine-tuning.
#Robotics#Research release
why featured
HKR-H and HKR-K pass: the hardware-minimal setup and training mechanism are concrete. The topic is a niche robotics odometry paper, so it stays in the 60–71 band rather than featured.
editor take
Four downward photodiodes plus IMU handle planar odometry; I buy the direction—robots shouldn't default to burning camera compute.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment
The paper proposes ETS, a training-free inference method that estimates an energy term with online Monte Carlo and improves generation quality on MLM, reasoning, coding, and science benchmarks; the abstract states a provable convergence rate and released code, but does not disclose exact benchmark scores or latency numbers.
#Reasoning#Code#Alignment#Research release
why featured
HKR-H and HKR-K pass: the hook is training-free RL alignment, and the mechanism is online Monte Carlo energy estimation. HKR-R is weak because metrics, model scope, and reproducibility conditions are not disclosed.
editor take
ETS estimates energy via online Monte Carlo; scores and latency are undisclosed, so training-free RL alignment still lacks the bill.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
MO-CAPO: Multi-Objective Cost-Aware Prompt Optimization
MO-CAPO optimizes prompt performance and inference cost jointly, and the paper evaluates it on 4 tasks and 3 LLMs, where it beats the NSGA-II multi-objective baseline in 8 of 12 cases on noisy R2.
#Inference-opt#Tools#Benchmarking#MO-CAPO
why featured
HKR-K and HKR-R pass: the article gives a concrete evaluation setup and a cost-optimization angle. As a single arXiv methods paper, its practical impact remains unproven, so it fits the 60–71 interesting band.
editor take
MO-CAPO beats NSGA-II in 8/12 cases; prompt optimization finally prices inference cost, not just leaderboard points.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
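Optimizing performance and cost jointly means keeping the set of non-dominated prompts rather than a single winner. This generic Pareto filter is only that bookkeeping step, not MO-CAPO's search procedure or its R2-indicator evaluation; the prompt names and numbers are invented.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    prompt: str
    score: float  # task performance, higher is better
    cost: float   # e.g. input tokens per call, lower is better

def dominates(a: Candidate, b: Candidate) -> bool:
    """a is at least as good on both objectives and strictly better on one."""
    return a.score >= b.score and a.cost <= b.cost and (a.score > b.score or a.cost < b.cost)

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

cands = [
    Candidate("long-cot", 0.80, 100),
    Candidate("short", 0.70, 50),
    Candidate("verbose", 0.60, 80),            # dominated by "short"
    Candidate("long-cot+fewshot", 0.80, 120),  # dominated by "long-cot"
]
print([c.prompt for c in pareto_front(cands)])  # ['long-cot', 'short']
```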
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
INAR-VL: Input-Aware Routing for Edge-Cloud Vision-Language Inference
INAR-VL routes visual question answering requests using image and text complexity signals in a two-tier edge-cloud setup; it executes 36% of requests on the edge, cuts latency by 24%, lowers energy by 26%, and preserves 97% of cloud-level accuracy.
#Multimodal#Vision#Inference-opt#INAR-VL
why featured
HKR-K and HKR-R pass: INAR-VL gives a concrete routing mechanism and metrics, and it matters for edge-cloud VLM cost. As a single arXiv paper with a narrow scope, it stays below featured.
editor take
INAR-VL keeps 36% of VQA on edge and cuts latency 24%; I buy the idea, but hardware/dataset details matter.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
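Input-aware routing boils down to a score and a threshold. A sketch under stated assumptions: image and text complexity are hypothetical signals already normalized to [0, 1], and the weight and threshold are illustrative; the summary does not say how INAR-VL computes or combines its signals.

```python
from typing import Iterable, Tuple

def route(image_complexity: float, text_complexity: float,
          threshold: float = 0.5, w_image: float = 0.5) -> str:
    """Send easy requests to the edge model, hard ones to the cloud model."""
    score = w_image * image_complexity + (1.0 - w_image) * text_complexity
    return "edge" if score < threshold else "cloud"

def edge_fraction(requests: Iterable[Tuple[float, float]], **kw: float) -> float:
    decisions = [route(i, t, **kw) for i, t in requests]
    return decisions.count("edge") / len(decisions)

reqs = [(0.2, 0.4), (0.9, 0.6), (0.1, 0.1), (0.7, 0.8)]
print([route(i, t) for i, t in reqs])  # ['edge', 'cloud', 'edge', 'cloud']
print(edge_fraction(reqs))             # 0.5
```

Raising the threshold moves more traffic to the edge; the reported operating point (36% on edge, 97% of cloud-level accuracy) comes from the paper's own calibration, which this sketch does not reproduce.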
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
MIRO: Multi-Reward Conditioned Pretraining Improves T2I Quality and Efficiency
MIRO conditions text-to-image generators on multiple rewards during pretraining instead of using post-hoc image selection and one reward model; the arXiv abstract says it improves visual quality and training speed, and reaches state of the art on GenEval plus PickAScore, ImageReward, and HPSv2 user-preference scores.
#Multimodal#Fine-tuning#Benchmarking#MIRO
why featured
HKR-K passes via a concrete training mechanism and four benchmark claims. HKR-H and HKR-R are weak: this is a standard arXiv T2I training paper, with no product, open-source artifact, or practitioner-facing test details.
editor take
MIRO bakes multiple rewards into pretraining and claims 4 SOTAs; no base model or cost details, so I don’t buy the efficiency story yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data
EgoBabyVLM trains and evaluates VLMs on datasets with different levels of semantic alignment, including infant and adult egocentric videos. It introduces Machine-DevBench, which generates lexical and grammatical tests from each model’s training vocabulary across logarithmic frequency bins. The paper reports that current VLM paradigms depend on tightly aligned curated data and fail on weakly aligned egocentric input.
#Multimodal#Vision#Benchmarking#Research release
why featured
HKR-K passes: the paper offers a concrete frequency-binned evaluation mechanism and a claim about VLM reliance on curated alignment. HKR-H/R are weak because this is a niche benchmark paper, so it stays in all.
editor take
EgoBabyVLM tests training vocab by frequency bins; pull curated alignment away, and VLMs still crumble on egocentric video.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
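Grouping a model's training vocabulary into logarithmic frequency bins, as Machine-DevBench is described as doing before it generates tests per bin, can be sketched as follows. The base-10 bins and pre-tokenized words are assumptions; the summary does not give the bin edges.

```python
from collections import Counter
from typing import Dict, Iterable, List

def log_bin(count: int, base: int = 10) -> int:
    """floor(log_base(count)) via integer division, avoiding float rounding at 10, 100, ..."""
    b = 0
    while count >= base:
        count //= base
        b += 1
    return b

def log_frequency_bins(tokens: Iterable[str], base: int = 10) -> Dict[int, List[str]]:
    """Map each bin index to the words whose training-set counts fall in it."""
    bins: Dict[int, List[str]] = {}
    for word, c in Counter(tokens).items():
        bins.setdefault(log_bin(c, base), []).append(word)
    return bins

tokens = ["the"] * 120 + ["ball"] * 15 + ["dog"] * 3 + ["zebra"]
print(log_frequency_bins(tokens))  # {2: ['the'], 1: ['ball'], 0: ['dog', 'zebra']}
```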
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning
CoLD reduces length bias in process reward models with 3 components: a length-penalty adjustment, a learned bias estimator, and joint length-invariant training; experiments on MATH500 and GSM-Plus report higher step-selection accuracy and shorter logically valid reasoning outputs.
#Reasoning#Alignment#Benchmarking#CoLD
why featured
HKR-K/R pass: PRM length bias is a real reasoning-eval pain point, with CoLD, 3 components, and two benchmarks named. No effect sizes or released artifact are disclosed, so it stays in the normal research band.
editor take
CoLD attacks PRM length bias with 3 components; MATH500/GSM-Plus help, but no deltas, so “strong generalization” is oversold.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
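One way to picture a length-bias correction for a PRM: estimate how much the reward rises with length alone and subtract that trend. This is a stand-in, assuming a linear bias fitted by ordinary least squares; CoLD's learned estimator, counterfactual construction, and joint training are not reproduced, and the numbers are invented.

```python
from typing import Sequence, Tuple

def fit_length_bias(lengths: Sequence[float], scores: Sequence[float]) -> Tuple[float, float]:
    """OLS fit of score ~ slope * length + intercept."""
    n = len(lengths)
    mx = sum(lengths) / n
    my = sum(scores) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(lengths, scores))
    var = sum((x - mx) ** 2 for x in lengths)
    slope = cov / var
    return slope, my - slope * mx

def debias(score: float, length: float, slope: float) -> float:
    """Remove the part of the reward explained by length alone."""
    return score - slope * length

lengths = [1, 2, 3, 4]         # reasoning steps
scores = [0.5, 0.6, 0.7, 0.8]  # PRM scores that grow purely with length
slope, _ = fit_length_bias(lengths, scores)
print(round(slope, 3), [round(debias(s, l, slope), 3) for s, l in zip(scores, lengths)])
# 0.1 [0.4, 0.4, 0.4, 0.4]
```

Once the length trend is removed, the four outputs score the same, so a longer chain no longer wins just by being longer.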
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Locate-then-Sparsify: Attribution-Guided Sparse Strategy for Visual Hallucination Mitigation
LTS-FS computes hallucination relevance scores for each LVLM layer with causal interventions, then converts those scores into layerwise feature-steering intensities; the abstract says it was tested across multiple LVLMs and benchmarks, and the code is available on GitHub.
#Vision#Alignment#Interpretability#Research release
why featured
HKR-K/R pass: the paper offers a concrete mechanism and open code, and LVLM hallucination matters for reliability. HKR-H is weak, and the arXiv method focus keeps it in the 60–71 band.
editor take
LTS-FS steers layers by attribution scores; metrics and model names are missing, so I buy the mechanism, not the claim.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
STRIDE: Learnable Stepwise Language Feedback for LLM Reasoning
STRIDE co-trains a generator and a generative verifier using only outcome-based rewards, replacing scalar process rewards with stepwise language critiques; the abstract says it outperforms state-of-the-art baselines on diverse reasoning benchmarks and learns on zero-pass-rate problems, but the snippet does not disclose exact scores.
#Reasoning#Alignment#Benchmarking#STRIDE
why featured
HKR-K passes: STRIDE replaces scalar process rewards with stepwise language feedback and jointly trains a generator and verifier. No exact benchmark scores are disclosed, so the SOTA claim stays hard to assess.
editor take
STRIDE discloses no scores; I don’t buy “guarantees harmless improvement” until noisy-verifier replications land.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Auditing Reasoning-Trace Memorization Claims after Unlearning with Head-Conditioned Canaries
The authors audit NPO unlearning on DeepSeek-R1-Distill-Qwen-7B with LoRA-memorized fictional authors and a six-token canary head, finding that a positive parser-split bypass gap alone neither identifies nor rules out hidden weight-level memorization.
#Reasoning#Fine-tuning#Safety#DeepSeek
why featured
HKR-K/R pass: the paper supplies a named model, a 6-token canary-head test, and a limit on what NPO unlearning evidence can establish. HKR-H is weak, and with no cross-source pickup or broad product impact, it stays in the 60–71 band.
editor take
The DeepSeek-R1-Distill-Qwen-7B audit uses only two seeds, which looks underpowered for any claim linking the parser gap to weight-level memorization.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance
This arXiv position paper proposes synthetic sequences from defined random processes as data probes. The method targets training, tuning, alignment, and in-context learning, using LLM behavior on those probes to study how data characteristics affect performance, generalization, and robustness.
#Fine-tuning#Alignment#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper gives a concrete data-probe mechanism and targets data issues across training, fine-tuning, alignment, and ICL. HKR-H is weak; with no experiments or artifact disclosed, it stays in the 60–71 band.
editor take
Data probes span training to ICL here; I buy the direction—synthetic random processes beat another public-dataset sweep.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Feature-Space Smoothing: Certified Robustness of Deep Representations
The paper proposes Feature-space Smoothing, which gives a certified lower bound on cosine similarity between clean and adversarial features under l2-bounded perturbations; its plug-in Gaussian Smoothness Booster targets MLLMs and other encoders without extra retraining or alignment, while the RSS snippet does not disclose model names or benchmark numbers.
#Safety#Multimodal#Benchmarking#Research release
why featured
HKR-K/R pass via the certified feature-smoothing mechanism and MLLM safety/cost angle. HKR-H is weak, and the arXiv item lacks benchmark numbers or production evidence, so it stays in 60–71.
editor take
FS certifies feature cosine bounds under l2 attacks; no model names or scores disclosed. Treat GSB as a defense plugin, not MLLM safety solved.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
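The certificate rests on Gaussian smoothing in input space, with the smoothed encoder's output estimated by Monte Carlo. A minimal sketch of that estimate and of the cosine similarity being bounded, assuming a generic encoder callable and a toy two-feature encoder; it does not compute the paper's certified bound or implement the Gaussian Smoothness Booster.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]

def smoothed_features(encoder: Callable[[Vector], Vector], x: Sequence[float],
                      sigma: float, n_samples: int, seed: int = 0) -> Vector:
    """Monte Carlo estimate of E[encoder(x + delta)], delta ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    total: Vector = []
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        feats = encoder(noisy)
        total = feats[:] if not total else [t + f for t, f in zip(total, feats)]
    return [t / n_samples for t in total]

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def toy_encoder(v: Vector) -> Vector:
    """Hypothetical stand-in for an MLLM vision encoder."""
    return [math.tanh(v[0] + v[1]), math.tanh(v[0] - v[1])]

# Compare smoothed features of a clean input and a small l2 perturbation of it.
clean = smoothed_features(toy_encoder, [0.5, 0.2], sigma=0.25, n_samples=500)
adv = smoothed_features(toy_encoder, [0.55, 0.15], sigma=0.25, n_samples=500)
print(cosine(clean, adv))
```

A certified method would replace this empirical cosine with a lower bound that holds for every perturbation within the l2 radius.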
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
GRASP: Deterministic Argument Ranking in Interaction Graphs
The paper proposes GRASP, a deterministic framework that aggregates local attack-support judgments into global argument rankings using a convergent propagation operator. The authors report that local interaction judgments are more reproducible than holistic LLM-as-a-Judge rankings, and that GRASP scores do not correlate with human convincingness labels.
#Reasoning#Benchmarking#GRASP#Research release
why featured
HKR-K and HKR-R pass: the paper offers a graph-propagation ranking mechanism and tests holistic LLM judges on reproducibility. HKR-H is weak, and no code or large benchmark numbers are disclosed, so it stays in all.
editor take
GRASP ranks arguments with a convergent operator; sample counts undisclosed. I like the audit trail, not the human-label miss.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
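A convergent propagation operator can be illustrated with a damped fixed-point iteration: each argument's score is its local base score plus a scaled sum of support minus attack from its neighbours, clipped to [0, 1]. This operator is hypothetical and chosen for illustration; GRASP's actual update, weights, and base scores are not given in the summary. Convergence here relies on `lam` times the largest total absolute incoming weight being below 1, which makes the update a contraction.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (source, target, weight): > 0 support, < 0 attack

def propagate(base: Dict[str, float], edges: List[Edge], lam: float = 0.5,
              tol: float = 1e-9, max_iter: int = 1000) -> Dict[str, float]:
    """Deterministic fixed-point iteration; same inputs always give the same ranking."""
    scores = dict(base)
    for _ in range(max_iter):
        new = {}
        for node in base:
            influence = sum(w * scores[src] for src, dst, w in edges if dst == node)
            new[node] = min(1.0, max(0.0, base[node] + lam * influence))
        if max(abs(new[n] - scores[n]) for n in base) < tol:
            return new
        scores = new
    return scores

base = {"a": 0.5, "b": 0.5, "c": 0.5}
edges = [("a", "c", 1.0), ("a", "b", -1.0)]  # a supports c, a attacks b
final = propagate(base, edges)
print(sorted(final, key=final.get, reverse=True))  # ['c', 'a', 'b']
```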
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Compositional Literary Primitives in Instruction-Tuned LLMs: Cross-Architectural SAE Features for Self, Style, and Affect
The paper uses sparse autoencoders on mid-depth residual streams in Llama 3.1 8B-Instruct and Gemma 2 9B-IT, finding four literary feature classes; Llama covers 27/27 Cowen-Keltner emotion categories, Gemma covers 23/27 with adoration as the strict-fail case, and each emotion-feature discovery cycle uses one GPU for about 15 minutes.
#Interpretability#Alignment#Benchmarking#Llama
why featured
HKR-H/K pass: the self/style/affect feature angle is clickable, and the post gives concrete Llama/Gemma coverage plus a one-GPU condition. It remains niche SAE interpretability research, so it fits the 60–71 band.
editor take
SAEs hit 27/27 and 23/27 emotion coverage on Llama 3.1 8B and Gemma 2 9B; I buy the method, not the “literary primitives” label.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
No Hard Negatives Required: Concept-Centric Learning Gives Contrastive Models Compositionality Without Degrading Zero-Shot Capabilities
The paper proposes a concept-centric training method for contrastive vision-language models, using short concept caption parts, parameter-free cross-modal attention pooling, and auxiliary contrastive losses; it reports SOTA results on standard compositionality benchmarks while maintaining or improving zero-shot and retrieval performance, with no added inference cost.
#Multimodal#Vision#Benchmarking#arXiv
why featured
HKR-H and HKR-K pass: the paper offers a concrete CLIP compositionality training recipe and claims SOTA with no inference cost. As a single arXiv technical paper with narrow practitioner resonance, HKR-R fails and it stays in 60–71.
editor take
SAIC tweaks CLIP training with short concept captions and parameter-free pooling. Stop worshipping hard negatives; SOTA numbers are undisclosed here.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Learning When to Adapt
DISeL adds input-dependent gates over LoRA rank-one components and reduces forgetting versus LoRA on RoBERTa, Llama, and Mistral experiments.
#Fine-tuning#Interpretability#Code#RoBERTa
why featured
HKR-K is solid via the LoRA rank-one gating mechanism; HKR-R passes because forgetting affects adaptation reliability. The abstract lacks reduction numbers, so this stays in the 60–71 band.
editor take
DISeL gates LoRA rank-one components per input; parameter cost is undisclosed, so I read it as a forgetting patch.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
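Input-dependent gating over LoRA's rank-one components can be written as ΔW x = Σ_i g_i(x) b_i (a_i · x), where plain LoRA is the case g_i = 1. A dependency-free sketch, assuming a sigmoid gate on a linear function of the input; the summary does not specify DISeL's gate form, so `gate_w` and `gate_b` are hypothetical.

```python
import math
from typing import List, Sequence

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gated_lora_delta(x: Sequence[float], A: Sequence[Sequence[float]], B: Sequence[Sequence[float]],
                     gate_w: Sequence[Sequence[float]], gate_b: Sequence[float],
                     scale: float = 1.0) -> List[float]:
    """Sum of rank-one updates b_i (a_i . x), each scaled by its own input-dependent gate."""
    out = [0.0] * len(B[0])
    for a_i, b_i, w_i, c_i in zip(A, B, gate_w, gate_b):
        g = sigmoid(dot(w_i, x) + c_i)   # gate in (0, 1) for this component
        coeff = scale * g * dot(a_i, x)  # rank-one projection of the input
        for j, b_ij in enumerate(b_i):
            out[j] += coeff * b_ij
    return out

x = [1.0, 2.0]
A = [[1.0, 0.0], [0.0, 1.0]]            # down-projections a_i
B = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # up-projections b_i
W = [[0.0, 0.0], [0.0, 0.0]]            # gate weights (input-independent here, for the demo)
print(gated_lora_delta(x, A, B, W, [50.0, 50.0]))  # gates open: [1.0, 0.0, 2.0]
```

Closing a component's gate for a given input leaves the base weights untouched along that direction, which is the intuition behind reduced forgetting.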
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization
MOCHA optimizes six agent skills with Chebyshev scalarization and exponential annealing, improving mean correctness by 7.5% over the strongest baseline, with gains of 14.9% on FEVER and 10.4% on TheoremQA.
#Agent#Reasoning#Tools#MOCHA
why featured
HKR-K is clear: new mechanism plus benchmark numbers; HKR-R is moderate for agent reliability. As a regular arXiv methods paper with no disclosed open-source artifact or production replacement claim, it stays in the interesting band.
editor take
MOCHA beats the strongest baseline by 7.5% in mean correctness across six skills; I buy the Chebyshev angle over weighted-sum prompt tuning.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
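Why a Chebyshev scalarization can prefer different candidates than a weighted sum: it scores a candidate by its worst weighted shortfall from an ideal point, so a lopsided candidate that aces one skill and fails another is penalized. The two-skill numbers are invented; MOCHA's exponential annealing schedule and six-skill setup are not reproduced.

```python
from typing import Sequence

def weighted_sum(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Higher is better."""
    return sum(w * s for w, s in zip(weights, scores))

def chebyshev(scores: Sequence[float], weights: Sequence[float], ideal: Sequence[float]) -> float:
    """Worst weighted gap to the ideal point; lower is better."""
    return max(w * (z - s) for w, s, z in zip(weights, scores, ideal))

candidates = {"lopsided": [1.0, 0.0], "balanced": [0.45, 0.45]}
w, ideal = [0.5, 0.5], [1.0, 1.0]
print(max(candidates, key=lambda k: weighted_sum(candidates[k], w)))      # lopsided
print(min(candidates, key=lambda k: chebyshev(candidates[k], w, ideal)))  # balanced
```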
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
RecoAtlas: From Semantic Plausibility to Set-Level Utility in LLM Recommendation Agents
RecoAtlas introduces a shopping-agent benchmark that evaluates recommendation sets with behavior-grounded utility proxies for relevance, complementarity, and diversity learned from interaction data; its controlled tool environment tests semantic, behavior-aligned, and faulty tools to separate reasoning gains, signal quality, and tool-use policy effects.
#Agent#Benchmarking#Tools#RecoAtlas
why featured
HKR-K is clear: RecoAtlas offers a set-level utility benchmark and faulty-tool diagnostics for shopping agents. HKR-R is narrower, aimed at agent-eval and recommender teams; no hard exclusion applies, but missing numbers and limited traction keep it in 60–71.
editor take
RecoAtlas scores recommendation sets via learned utility proxies; dataset size is undisclosed. I buy it: plausible prose was a lazy metric.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Hybrid Training for Vision-Language-Action Models
The paper proposes HyT, a framework that trains VLA models to learn from CoT-style thoughts while allowing inference to skip CoT and predict actions directly; the abstract says it evaluates the method on simulated benchmarks and real-world experiments, but the post does not disclose exact scores.
#Robotics#Reasoning#Multimodal#Research release
why featured
HKR-H and HKR-K pass: the VLA train/infer split is a concrete mechanism. No scores, code, authors, or model scale are disclosed, so this stays in the 60–71 research-paper band.
editor take
HyT trains VLAs with CoT but skips it at inference; no scores disclosed, and robotics claims need latency numbers.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Reward-Conditioned Reinforcement Learning
The paper introduces RCRL, an off-policy method that recomputes counterfactual rewards from shared replay data, exposing agents to multiple reward objectives without extra environment interaction. Experiments cover single-task, multi-task, and vision-based benchmarks.
#Robotics#Reasoning#Vision#arXiv
why featured
HKR-K/R pass: RCRL offers a no-extra-interaction mechanism for multi-reward training and tests single-task, multi-task, and vision benchmarks. HKR-H is weak, and this is an arXiv method paper without product or major-lab adoption signal, so it sits in the 60–71 band.
editor take
RCRL reuses one replay buffer for many rewards; I buy the sample-efficiency angle, but the snippet gives no numbers.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
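The core idea in the RCRL summary is to recompute rewards for several objectives from one shared replay buffer. It can be sketched as follows. The transition format and the reward functions are hypothetical placeholders, not the paper's setup.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Transition:
    state: float
    action: float
    next_state: float


# Hypothetical reward objective: a function of (s, a, s').
RewardFn = Callable[[float, float, float], float]


def reach_goal(goal: float) -> RewardFn:
    """Hypothetical objective: negative distance of s' to a goal position."""
    return lambda s, a, s2: -abs(s2 - goal)


def relabel(buffer: List[Transition], rewards: List[RewardFn]) -> List[Tuple[int, Transition, float]]:
    """Recompute rewards for every objective from the same stored transitions.

    No new environment steps are taken. Each (s, a, s') is reused once per
    reward function and tagged with the index of the objective the agent
    would condition on.
    """
    batch = []
    for k, r in enumerate(rewards):
        for t in buffer:
            batch.append((k, t, r(t.state, t.action, t.next_state)))
    return batch


buffer = [Transition(0.0, 1.0, 1.0), Transition(1.0, 1.0, 2.0)]
objectives = [reach_goal(2.0), reach_goal(0.0), lambda s, a, s2: -a * a]
batch = relabel(buffer, objectives)
print(len(batch))  # 3 objectives x 2 transitions = 6
```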
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips
SuperInfer uses RotaSched and DuplexKV on NVIDIA GH200 to manage KV cache under high request rates. Evaluations report up to 74.7% higher TTFT SLO attainment, while keeping TBT and throughput comparable to state-of-the-art systems.
#Inference-opt#NVIDIA#SuperInfer#Supercomputing-System-AI-Lab
why featured
HKR-K and HKR-R pass on concrete serving mechanisms and the 74.7% TTFT SLO gain. HKR-H fails because the angle is niche infra; no hard exclusion, but audience scope keeps it in all.
editor take
SuperInfer lifts TTFT SLO attainment by up to 74.7% on GH200. What I care about is how much survives without NVLink-C2C.
HKR breakdown
hook knowledge resonance
open source
65
SCORE
H0·K1·R1
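TTFT SLO attainment, the metric behind SuperInfer's 74.7% figure, is commonly computed as the share of requests whose time to first token stays within the SLO. The sketch assumes that definition and uses made-up latencies. It does not reproduce SuperInfer's measurements. The summary also does not say whether "74.7% higher" is a relative ratio or a percentage-point gap, so the sketch computes both.

```python
def slo_attainment(ttft_seconds, slo_seconds):
    """Fraction of requests whose time-to-first-token meets the SLO."""
    if not ttft_seconds:
        raise ValueError("no requests")
    met = sum(1 for t in ttft_seconds if t <= slo_seconds)
    return met / len(ttft_seconds)


# Hypothetical TTFT samples (seconds) for a baseline and a new scheduler.
baseline = [0.2, 0.9, 1.5, 2.4, 0.4, 3.1, 0.8, 1.9]
candidate = [0.2, 0.5, 0.7, 1.2, 0.3, 2.2, 0.6, 0.9]
slo = 1.0

a_base = slo_attainment(baseline, slo)
a_new = slo_attainment(candidate, slo)
relative_gain = (a_new - a_base) / a_base  # "X% higher" read as a ratio
point_gain = a_new - a_base                # read as percentage points
print(a_base, a_new, relative_gain, point_gain)
```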
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Hierarchical Schedule Optimization for Fast and Robust Diffusion Model Sampling
The paper proposes Hierarchical-Schedule-Optimizer, a bi-level training-free schedule optimizer that reaches FID 11.94 on LAION-Aesthetics with Stable Diffusion v2.1 at NFE=5, using a one-time optimization cost below 8 seconds.
#Inference-opt#Stable Diffusion#LAION-Aesthetics#Research release
why featured
HKR-K passes with concrete experimental conditions and metrics. HKR-H/R are weak: this is a single arXiv diffusion-sampling paper with narrow practitioner reach, so it fits the 60–71 all band.
editor take
HSO hits FID 11.94 at NFE=5; an 8-second training-free schedule keeps diffusion sampling in the fight.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Class Unlearning via Depth-Aware Removal of Forget-Specific Directions
The paper introduces DAMP, a one-shot closed-form weight-surgery method for class unlearning that removes forget-specific directions without gradient-based optimization, using class prototypes, projection updates, and depth-aware scaling, and evaluates it on MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet across convolutional and transformer architectures.
#Fine-tuning#Interpretability#Safety#Research release
why featured
HKR-K is solid: DAMP gives a concrete closed-form unlearning mechanism and benchmark set. HKR-H is narrow and HKR-R is weak because the tests stay in vision classification, so this fits all rather than featured.
editor take
DAMP tests closed-form class removal on 4 vision datasets; honestly, class unlearning still lives in MNIST-to-Tiny-ImageNet land.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
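The DAMP summary says the method removes forget-specific directions with projection updates built from class prototypes, without gradients. Below is a generic closed-form version of that idea: it projects a weight matrix's input space off a single unit direction. Using the normalized forget-class prototype as that direction is an assumption, and the paper's depth-aware scaling is omitted. This is not the paper's exact update.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def project_out(weight_rows, direction):
    """Return W (I - u u^T): each row loses its component along unit vector u.

    Afterwards W' @ u == 0, so inputs aligned with u no longer move the output.
    """
    u = normalize(direction)
    out = []
    for row in weight_rows:
        coef = sum(w * ui for w, ui in zip(row, u))
        out.append([w - coef * ui for w, ui in zip(row, u)])
    return out


# Hypothetical 2x3 layer and a hypothetical "forget-class prototype" direction.
W = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]
forget_prototype = [0.0, 3.0, 4.0]
W_new = project_out(W, forget_prototype)
u = normalize(forget_prototype)
response = [sum(w * x for w, x in zip(row, u)) for row in W_new]
print(response)  # both entries are ~0
```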
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning
SACHI uses graph transformer convolutions over an inter-agent coordination graph to enrich each agent before action selection, and the paper evaluates it on 5 cooperative tasks against 12 baselines; the authors report that it matches or beats the best baseline on every task, with ablations tracing gains to content dependence in the message-passing operator.
#Agent#Reasoning#Benchmarking#SACHI
why featured
HKR-K passes via a concrete mechanism and benchmark setup; HKR-H is weak and HKR-R is narrow. No hard exclusion, but this is an incremental academic MARL result, so it fits the 60–71 band.
editor take
SACHI beats 12 baselines on 5 tasks; RSS lacks environment details, so I’d file it as comms-structure work, not agent breakthrough.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Beyond Match Maximization and Fairness: Retention-Optimized Two-Sided Matching
The paper introduces MRet, a dynamic learning-to-rank algorithm for two-sided matching platforms, which learns personalized retention curves from user profiles and interaction histories and allocates limited matching opportunities by estimated retention gains on both sides; evaluations use synthetic data and real-world data from a major online dating platform, while the RSS snippet does not disclose exact retention gains.
#Benchmarking#arXiv#Research release#Benchmark
why featured
HKR-K is strong, and HKR-H/R come from the retention-vs-fairness matching angle. This is a niche recommender-systems paper, not a model, agent, or platform update, so it lands in the 60–71 band.
editor take
MRet allocates matches by bilateral retention gain; exact lift is undisclosed, but optimizing match count or fairness as a stand-in for retention now looks lazy.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Learning Stable Predictors from Weak Supervision under Distribution Shift
The paper defines supervision drift as changes in P(y|x,c) across contexts and builds a non-IID benchmark on CRISPR-Cas13d transcriptomic data; ridge reaches in-domain R²=0.356, but temporal transfer drops to R²=-0.145 and Spearman ρ=0.008.
#Benchmarking#Research release#Benchmark
why featured
HKR-K is solid on mechanism and numbers, and HKR-R touches deployment risk under distribution shift. HKR-H is weak, and the CRISPR-Cas13d benchmark keeps it in the mid-interest band.
editor take
CRISPR weak supervision gets ridge R²=0.356 in-domain, then -0.145 over time; random splits are false comfort.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
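In the supervision-drift card, the temporal-transfer R² is negative. That simply means the predictor does worse than always predicting the test-set mean. Here is a minimal sketch of the standard coefficient of determination, with hypothetical numbers.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, [1.1, 1.9, 3.2, 3.8]))  # close fit: near 1
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))  # mean predictor: exactly 0
print(r_squared(y, [4.0, 1.0, 4.0, 1.0]))  # worse than the mean: negative
```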
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Cubit: Token Mixer with Kernel Ridge Regression
The paper proposes Cubit, a token mixer that replaces Transformer attention’s Nadaraya-Watson view with Kernel Ridge Regression. It adds Limited-Range Rescale for training stability, and the abstract says gains over Transformers increase as training sequence length grows, while exact benchmark numbers are not disclosed in the RSS snippet.
#Reasoning#Inference-opt#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the paper challenges attention with a KRR mixer and LRR stabilization. The lack of benchmark numbers, code, or production impact keeps it in the 60–71 research-interest band.
editor take
Cubit replaces attention mixing with KRR. The snippet gives no scores, so I’m filing this as math-flavored, not proven.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
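The Cubit summary frames standard attention as Nadaraya-Watson (NW) kernel regression and Cubit as kernel ridge regression (KRR). The contrast can be sketched for a single query. NW takes a kernel-weighted average of the values. KRR first solves (K + λI)α = V over the keys, then reads out k(q, keys)·α.

Several pieces are assumptions: the Gaussian kernel, λ, and the toy 1-D keys and values. Softmax attention would correspond to an exponential kernel instead. Cubit's actual kernel and its Limited-Range Rescale are not shown.

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian kernel (assumed here; not necessarily Cubit's kernel)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def nadaraya_watson(q, keys, values):
    """Kernel-weighted average of values (the attention-style readout)."""
    w = [rbf(q, k) for k in keys]
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)


def kernel_ridge(q, keys, values, lam=1e-3):
    """Solve (K + lam I) alpha = values, then predict k(q, keys) . alpha."""
    n = len(keys)
    K = [[rbf(keys[i], keys[j]) + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(K, values)
    return sum(rbf(q, k) * a for k, a in zip(keys, alpha))


keys = [[0.0], [1.0], [2.0]]
values = [0.0, 1.0, 4.0]
print(nadaraya_watson([1.0], keys, values))  # blends neighbors: well above 1.0
print(kernel_ridge([1.0], keys, values))     # nearly reproduces the stored value 1.0
```

At a stored key, KRR with small λ nearly reproduces that key's value, whereas NW smooths in its neighbors. This fitting-versus-smoothing difference is the usual motivation for KRR-style mixers.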
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Towards Discovery of Polymers for Insulin Delivery via Physics-Grounded Agentic Workflows
The study uses an LLM agent to call physics-based tools through MCP and search discrete PSMILES under an OpenMM/Packmol evaluation budget, with the best autonomous campaign reaching -2263 kJ/mol insulin-polymer interaction energy, 68% above reinforcement-learning baselines and 19% above Bayesian optimization under matched oracle budgets.
#Agent#Tools#OpenMM#Packmol
why featured
HKR-H and HKR-K pass: the paper puts an MCP agent inside physics-grounded search and reports quantified wins over RL/BO. HKR-R is weak; insulin-delivery polymers are niche, so no hard exclusion but it stays in the 60–71 band.
editor take
LLM agents hit -2263 kJ/mol, 19% above Bayesian optimization; I buy the workflow, not the wet-lab relevance yet.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Delta Attention Residuals
The paper proposes Delta Attention Residuals, which route sublayer deltas instead of cumulative hidden states; across 220M–7.6B parameter models, it reports 1.7–8.2% validation perplexity gains and higher-contrast attention with max weight around 0.6 versus around 0.2 for standard Attention Residuals.
#Inference-opt#Reasoning#Research release#Open source
why featured
HKR-K lands with a concrete routing mechanism and 220M–7.6B results. HKR-H and HKR-R are weak, and the architecture-paper angle keeps it in the 60–71 research-release band.
editor take
Delta Attention Residuals cuts perplexity 1.7–8.2% at 220M–7.6B; I buy routing deltas over redundant hidden states.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
EviTrack: Selection over Sampling for Delayed Disambiguation
EviTrack maintains competing latent trajectory hypotheses at test time and delays commitment using evidence- and likelihood-ratio-based selection; the paper evaluates it on a controlled synthetic benchmark with known latent ground truth and reports better performance than sampling baselines under a matched inference budget.
#Reasoning#Inference-opt#Benchmarking#EviTrack
why featured
HKR-K is clear: the article gives a mechanism and benchmark condition; HKR-R is moderate because equal-budget inference efficiency matters to practitioners. HKR-H is weak, and this remains an arXiv method paper without real-world task or product validation.
editor take
EviTrack beats sampling on synthetic delayed-disambiguation tasks; real-task evidence is undisclosed, so treat it as decoding hygiene, not reasoning lift.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
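The EviTrack card describes delayed commitment via likelihood ratios. A sequential-test style rule illustrates the idea. The rule keeps accumulating log-likelihood per hypothesis and commits only when the leader beats the runner-up by a log-ratio threshold. This is a generic sketch with hypothetical hypotheses, likelihoods, and threshold, not EviTrack's actual selection rule.

```python
import math


def select_with_delay(step_likelihoods, threshold=math.log(10)):
    """Accumulate log-likelihoods per hypothesis; commit when the best leads
    the runner-up by at least `threshold` nats (a likelihood ratio of 10 here).

    Returns (hypothesis, step) on commitment, or (None, steps_seen) if the
    evidence never separates the hypotheses.
    """
    scores = None
    for t, likes in enumerate(step_likelihoods, start=1):
        logs = [math.log(p) for p in likes]
        scores = logs if scores is None else [s + l for s, l in zip(scores, logs)]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        best, second = ranked[0], ranked[1]
        if scores[best] - scores[second] >= threshold:
            return best, t
    return None, len(step_likelihoods)


# Hypothetical per-step likelihoods of each observation under three hypotheses.
# Early steps are ambiguous; later evidence favors hypothesis 2.
observations = [
    [0.5, 0.4, 0.45],
    [0.3, 0.3, 0.35],
    [0.2, 0.1, 0.6],
    [0.1, 0.05, 0.7],
]
print(select_with_delay(observations))  # (2, 4)
```

A greedy rule that committed after the first step would have picked hypothesis 0. Waiting until the evidence separates the hypotheses picks hypothesis 2 at step 4.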
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
CEPO: RLVR Self-Distillation Using Contrastive Evidence Policy Optimization
CEPO assigns token-level credit in RLVR by contrasting correct-answer and wrong-answer teachers from rejected rollouts, adding no sampling cost. On five multimodal mathematical reasoning benchmarks, 2B and 4B models reach 43.43% and 60.56% average accuracy, compared with 41.17% and 57.43% for GRPO under identical training budgets.
#Reasoning#Multimodal#Alignment#CEPO
why featured
HKR-H and HKR-K pass: CEPO has a concrete contrastive credit-assignment mechanism and benchmark deltas over GRPO. HKR-R is weak, and the arXiv-only, narrow RLVR method keeps it in the 60–71 band.
editor take
CEPO beats GRPO by 2.26/3.13 points on five multimodal math benchmarks; I buy the credit signal, not extrapolation beyond 4B scale.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Chessformer: A Unified Architecture for Chess Modeling
Chessformer uses square tokens, GAB dynamic positional encoding, and an attention-based source-destination policy head for three chess tasks; its Maia-3 family reaches 57.1% move-matching accuracy, and integration into Leela Chess Zero adds more than 100 Elo while enabling square-level interpretability.
#Reasoning#Interpretability#Benchmarking#Chessformer
why featured
HKR-H/K pass: the paper has concrete mechanisms and Elo numbers, plus a Leela Chess Zero hook. HKR-R is weak because the impact stays inside chess modeling, not a core practitioner concern.
editor take
Chessformer adds 100+ Elo to Leela Chess Zero; square tokens look cleaner than text notation for structured reasoning.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
How Faithful Is Trajectory-Based Data Attribution? Error Sources, Remedies, and Practical Guidelines
The paper splits trajectory-based data attribution error into three categories: config-level, algorithm-level, and system-level, and proposes AdamW-influence to model AdamW dynamics; across four settings covering MLP, CNN, GPT-2, and Llama 3.2-1B, it reports 10% to over 300% gains in Spearman correlation against ground-truth influence.
#Fine-tuning#Interpretability#Benchmarking#GPT-2
why featured
HKR-K is solid: error taxonomy, AdamW-influence, and results across MLP/CNN/GPT-2/Llama 3.2-1B. HKR-H is weak and HKR-R is narrow, so this stays in the 60–71 research band.
editor take
AdamW-influence lifts Spearman correlation by 10% to 300%+ across 4 setups; using SGD math for AdamW-trained models looks reckless.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
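The data-attribution results are reported as Spearman correlation against ground-truth influence, which is Pearson correlation computed on ranks. Here is a minimal sketch that uses average ranks for ties, with hypothetical scores.

```python
def average_ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    return pearson(average_ranks(a), average_ranks(b))


true_influence = [0.9, 0.1, 0.5, -0.3, 0.2]
estimated = [2.0, 0.3, 1.1, -0.5, 0.6]  # different scale, same ordering
print(spearman(true_influence, estimated))  # 1.0: rank agreement is perfect
```

Only ranks matter, so an attribution method can be off in scale and still score perfectly.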
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference
Spherical KV frames KV allocation as a rate-distortion problem for long-context inference. ADA stores keys as a scalar radius plus compact angle codes and computes attention logits without dense-key reconstruction, while RDR selects keep/drop decisions and precision tiers per token and head under a fixed budget.
#Inference-opt#Research release
why featured
HKR-K/R pass: the mechanism is concrete and targets long-context inference cost. HKR-H is weak, and the body gives no throughput, memory-saving, or benchmark numbers, so this stays in all.
editor take
Spherical KV uses ADA+RDR for KV compression; no throughput or perplexity numbers yet, so don't buy the geometry pitch.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
PASC: Pipeline-Aware Conformal Prediction with Joint Coverage Guarantees for Multi-Stage NLP and LLM Pipelines
PASC reduces multi-stage joint coverage to one scalar conformal prediction problem, and on a three-stage CoNLL-2003 NER-to-NED-to-typing pipeline it achieves 96.4% end-to-end coverage versus 93.4% for Bonferroni and 86.5% for independent conformal prediction.
#RAG#Agent#Benchmarking#PASC
why featured
HKR-K/R pass: it gives a concrete mechanism and a 96.4% coverage result, tied to reliability concerns in multi-stage LLM pipelines. HKR-H is weak, and the arXiv-only technical angle keeps it in the 60–71 band.
editor take
PASC hits 96.4% coverage on a 3-stage CoNLL-2003 pipeline; the hard test is RAG/agents under calibration-set drift.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
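The PASC card compares against two baselines. Split conformal prediction calibrates a score threshold at level 1-α. The Bonferroni baseline runs each of m stages at α/m, and the union bound then guarantees joint coverage of at least 1-α under exchangeability. Below is a sketch of both thresholds on hypothetical nonconformity scores. PASC's own scalar reduction is not reproduced here.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split conformal: the ceil((n+1)(1-alpha))-th smallest calibration score.

    Including every label whose nonconformity score is <= this threshold covers
    a fresh exchangeable example with probability at least 1 - alpha.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: the set must be everything
    return sorted(cal_scores)[k - 1]


def bonferroni_thresholds(stage_scores, alpha):
    """Joint coverage >= 1 - alpha over m stages via alpha/m per stage (union bound)."""
    m = len(stage_scores)
    return [conformal_threshold(s, alpha / m) for s in stage_scores]


# Hypothetical calibration scores for a 3-stage pipeline (e.g. NER -> NED -> typing).
stages = [
    [0.1 * i for i in range(1, 40)],
    [0.05 * i for i in range(1, 40)],
    [0.2 * i for i in range(1, 40)],
]
alpha = 0.1
print([conformal_threshold(s, alpha) for s in stages])  # independent, per stage
print(bonferroni_thresholds(stages, alpha))            # stricter alpha/3 each
```

Calibrating each stage independently at α gives no joint guarantee. That is consistent with independent conformal prediction having the lowest end-to-end coverage in the reported results.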
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Reasoning Portability: Guiding Continual Learning for MLLMs in the RLVR Era
The paper proposes RDB-CL, using sample-level Reasoning Portability to modulate KL regularization in RLVR, and reports a +12.0% Last accuracy gain over the vanilla RLVR baseline.
#Reasoning#Multimodal#Fine-tuning#Research release
why featured
HKR-K passes: RDB-CL uses sample-level Reasoning Portability to modulate RLVR KL regularization and reports +12.0% Last accuracy. HKR-H and HKR-R are weak because this is a niche training paper, so it stays in the 60–71 band.
editor take
RDB-CL feeds sample-level RP into RLVR KL and reports +12.0% Last accuracy; I buy the direction, pending task order and baselines.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Fast Tensorization of Neural Networks via Slice-wise Feature Distillation
The paper proposes a slice-wise feature distillation framework that tensorizes individual layers, blocks, or small consecutive layer groups independently; ResNet-34 experiments report near-lossless compression at moderate rates, and GPT-2 XL results show scalability for large models in distributed settings.
#Fine-tuning#Inference-opt#ResNet#GPT-2 XL
why featured
HKR-K and HKR-R pass: the paper offers a concrete compression mechanism plus ResNet-34 and GPT-2 XL tests, touching inference cost. HKR-H is weak, and without an artifact or production data it stays in the 60–71 band.
editor take
The paper tensorizes ResNet-34 and GPT-2 XL by slices; no ratios or accuracy table in the snippet, so “near-lossless” stays unproven.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
EfficientTDMPC Improves MPC Objectives for Sample-Efficient Continuous Control
EfficientTDMPC improves TD-MPC for continuous control with dynamics-model ensembles, return estimates averaged across rollout depths, an optional uncertainty penalty, fresher replay data, and lower compute. The paper reports sample-efficiency SOTA on HumanoidBench-Hard and DMC hard in low-data settings while matching SOTA on DMC easy.
#Robotics#Reasoning#Inference-opt#EfficientTDMPC
why featured
HKR-K passes on objectives and benchmarks; HKR-R is limited to robotics/RL data cost, while HKR-H is weak. The topic is specialized and lacks product impact or release details, so it stays in the 60–71 all band.
editor take
EfficientTDMPC reports low-data SOTA on HumanoidBench-Hard and DMC hard; rollout-depth averaging is the part I buy.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
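One reading of EfficientTDMPC's "averaged return estimates across rollout depths" is an average of H-step model-based returns over several horizons H, each bootstrapped with a terminal value. The sketch below follows that reading with hypothetical rewards and values. The summary does not describe EfficientTDMPC's exact weighting, ensemble handling, or uncertainty penalty.

```python
def h_step_return(rewards, values, h, gamma):
    """Sum of h discounted model rewards plus the discounted value at step h.

    rewards[t] is the predicted reward at step t; values[t] is the value
    estimate of the state reached after t steps (values[0] = current state).
    """
    ret = sum((gamma ** t) * rewards[t] for t in range(h))
    return ret + (gamma ** h) * values[h]


def depth_averaged_return(rewards, values, depths, gamma=0.99):
    """Uniform average of h-step returns over the given rollout depths."""
    return sum(h_step_return(rewards, values, h, gamma) for h in depths) / len(depths)


# Hypothetical 3-step imagined rollout: rewards r_0..r_2 and values V_0..V_3.
rewards = [1.0, 1.0, 1.0]
values = [10.0, 9.5, 8.0, 7.5]
print(depth_averaged_return(rewards, values, depths=[1, 2, 3], gamma=1.0))
```

The usual motivation for averaging over depths is a trade-off. Short horizons lean on a possibly biased value estimate, while long horizons accumulate model error.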
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Safe Continual Reinforcement Learning under Nonstationarity via Adaptive Safety Constraints
LILAC+ combines 3 adaptive safety mechanisms for safe continual reinforcement learning under nonstationarity. The authors evaluate it in simulated driving across stationary, seen nonstationary, and unseen nonstationary conditions, where it reduces safety violations under distribution shift while keeping task performance competitive with unconstrained and fixed-constraint baselines.
#Agent#Robotics#Safety#Research release
why featured
HKR-K/R pass: the paper states a mechanism and simulated-driving test conditions, with relevance to agent safety. HKR-H is weak, and safe continual RL remains research-heavy with no real-system result disclosed.
editor take
LILAC+ uses 3 adaptive constraints; only the abstract is disclosed, no violation rates, so I read this as safety-RL engineering glue.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Don't Let Bandit Feedback Pull Continual LLM-Recommender Updates Off Target
The paper proposes ABPO for continual LLM-Rec updates, using a logged anchor, self-normalized inverse propensity scoring, and self-certainty-tempered no-response penalties, and reports consistent recommendation accuracy gains across five Amazon Reviews and MovieLens domains.
#Agent#Reasoning#Amazon#MovieLens
why featured
HKR-H/K/R pass, but the scope is niche: this is a specialized LLM recommender paper, and the body gives no exact gains or reproducible setup, so it stays below featured.
editor take
ABPO reports gains across 5 Amazon/MovieLens domains; anchor+SNIPS+confidence-tempered negatives smells like offline RL hygiene for LLM recommenders.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R1
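Self-normalized inverse propensity scoring (SNIPS) is one of ABPO's named ingredients. It reweights logged feedback by π_new(a|x)/π_log(a|x) and divides by the sum of weights rather than the sample count. Below is a minimal off-policy value estimate with hypothetical logs and a hypothetical target policy.

```python
def ips_and_snips(logs, target_prob):
    """Off-policy value estimates from logged bandit feedback.

    logs: list of (context, action, reward, logging_prob).
    target_prob(context, action): probability the new policy picks `action`.
    IPS divides the weighted reward sum by n; SNIPS divides by the weight sum,
    which keeps the estimate within the observed reward range.
    """
    weights = [target_prob(x, a) / p for x, a, _, p in logs]
    weighted = sum(w * r for w, (_, _, r, _) in zip(weights, logs))
    ips = weighted / len(logs)
    snips = weighted / sum(weights)
    return ips, snips


def uniform_policy(context, action):
    """Hypothetical target policy: uniform over two items."""
    return 0.5


# Hypothetical logs: the logging policy rarely showed item "b", which got a click.
logs = [
    ("u1", "a", 0.0, 0.9),
    ("u2", "a", 1.0, 0.9),
    ("u3", "b", 1.0, 0.1),
    ("u4", "a", 0.0, 0.9),
]
ips, snips = ips_and_snips(logs, uniform_policy)
print(ips, snips)  # IPS exceeds the max reward of 1.0; SNIPS does not
```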
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
How Does Overparameterization Affect Machine Unlearning of Deep Neural Networks?
The paper studies how DNN width affects machine unlearning across several validation-tuned methods; overparameterized models usually improve privacy or bias removal with limited generalization loss, while bias removal requires methods that explicitly use the unlearned examples.
#Fine-tuning#Safety#Benchmarking#Research release
why featured
HKR-K is the concrete link between overparameterization and unlearning outcomes; HKR-R comes from privacy deletion and debiasing. The academic framing lacks numbers, benchmarks, or artifacts, so it stays in the mid research band.
editor take
The paper ties unlearning to DNN width; local edits sound plausible, but models, datasets, and effect sizes are undisclosed.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency
The paper introduces LBW-Guard, a bounded training-control layer above AdamW; on Qwen2.5-7B with WikiText-103, it lowers final perplexity from 13.21 to 10.74 and reduces end-to-end time from 392.54 seconds to 357.02 seconds.
#Fine-tuning#Inference-opt#Safety#Qwen
why featured
HKR-K is supported by a concrete mechanism and metrics; HKR-R hits training cost. HKR-H is weak, and the training-control niche limits reach, so it lands in all rather than featured.
editor take
LBW-Guard cuts Qwen2.5-7B perplexity by 18.7%; WikiText-103 is too small to sell governance for large-scale training.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Spatial-MLLM: Boosting MLLM Capabilities in Visual-Based Spatial Intelligence
Spatial-MLLM uses a dual-encoder design to extract semantic features and 3D structure features from purely 2D images or videos, then merges them into visual tokens for spatial reasoning. The authors train it with supervised fine-tuning and GRPO, and the post does not disclose dataset size or benchmark scores.
#Multimodal#Vision#Reasoning#Spatial-MLLM
why featured
HKR-K passes because the post names the dual-encoder and SFT+GRPO mechanism, but HKR-H and HKR-R are weak. With no dataset size, scores, or product implication disclosed, this stays in the lower all band.
editor take
Spatial-MLLM does spatial reasoning from 2D images/videos; no dataset size or scores disclosed, so treat SOTA as arXiv self-report.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training
The paper compares five low-rank pre-training methods with full-rank training across 60M, 130M, and 350M models, using 16 metrics covering loss-landscape geometry, checkpoint interpolation, weight and update spectra, and activation similarity; it reports that close validation perplexity does not imply matching basins, representations, or downstream performance at every scale.
#Fine-tuning#Benchmarking#Interpretability#GaLore
why featured
HKR-K passes: the paper gives 5 methods, 3 model sizes, and 16 metrics for low-rank pre-training. HKR-H/R are weak because the angle is technical and lacks a product, cost, or safety decision hook, so it stays in all.
editor take
This 60M/130M/350M study punishes perplexity-only low-rank claims; GaLore tracks full-rank closest, yet later activations still drift.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Distilling Linearized Behavior for Effective Task Arithmetic
The paper proposes distilling hidden representations from a curvature-regularized linearized teacher into a non-linear student, preserving task-vector composition for merging and unlearning while avoiding inference-time overhead.
#Fine-tuning#Inference-opt#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the mechanism is specific and targets inference cost in task-vector composition. HKR-H is weak, and the arXiv item lacks benchmark numbers, so it stays in all rather than featured.
editor take
This distills a linearized teacher into a non-linear student; zero inference overhead is nice, but benchmark numbers are absent.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
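Task arithmetic, which the linearized-teacher paper aims to preserve, works as follows. A task vector is the difference between fine-tuned and base weights. Merging adds scaled task vectors back to the base, and negating one is the usual unlearning move. The sketch uses flat lists as stand-in parameters and hypothetical coefficients. It shows only the composition the paper aims to preserve, not its distillation method.

```python
def task_vector(finetuned, base):
    """tau = theta_finetuned - theta_0, elementwise."""
    return [f - b for f, b in zip(finetuned, base)]


def apply_task_vectors(base, vectors, coeffs):
    """theta = theta_0 + sum_i lambda_i * tau_i (a negative lambda removes a task)."""
    merged = list(base)
    for tau, lam in zip(vectors, coeffs):
        merged = [m + lam * t for m, t in zip(merged, tau)]
    return merged


base = [0.0, 1.0, 2.0]
ft_a = [0.5, 1.0, 2.0]  # hypothetical fine-tune on task A
ft_b = [0.0, 1.5, 1.0]  # hypothetical fine-tune on task B
tau_a, tau_b = task_vector(ft_a, base), task_vector(ft_b, base)
print(apply_task_vectors(base, [tau_a, tau_b], [1.0, 1.0]))  # merge A and B
print(apply_task_vectors(ft_a, [tau_a], [-1.0]))             # "unlearn" A
```

For a model linearized around θ0, outputs are linear in the weights, so these additions compose exactly in function space. A nonlinear network gives no such guarantee.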
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Projecting Latent RL Actions: Towards Generalizable and Scalable Graph Combinatorial Optimization
The paper introduces projection agents for RL-based graph combinatorial optimization, predicting latent actions in a continuous GNN action-embedding space and decoding them with nearest neighbors; across benchmarks, it reports up to 16.2x faster inference and up to 40% better generalization, and releases LaGCO-RL for latent action-space construction.
#Agent#Inference-opt#Benchmarking#Research release
why featured
HKR-K is solid with a new mechanism and two testable metrics; HKR-H/R are weak because the title is dense and the topic is narrow. This fits the 60s research-release band with no hard exclusion.
editor take
Projection agents report 16.2x faster inference and 40% better generalization; I’d test whether nearest-neighbor decoding breaks first on large graphs.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
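The projection-agents summary describes decoding a continuous latent action with nearest neighbors. In effect, the policy's output vector is mapped to the closest embedding among the currently valid discrete actions. The minimal sketch below assumes Euclidean distance and hypothetical 2-D node embeddings. The paper's GNN embeddings and distance choice are not specified here.

```python
import math


def decode_latent_action(latent, action_embeddings, valid):
    """Return the valid discrete action whose embedding is nearest to `latent`."""
    best, best_dist = None, math.inf
    for action in valid:
        d = math.dist(latent, action_embeddings[action])  # Euclidean distance
        if d < best_dist:
            best, best_dist = action, d
    return best


# Hypothetical embeddings for four graph nodes; node 2 is already chosen (invalid).
embeddings = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.9, 0.9), 3: (0.0, 1.0)}
print(decode_latent_action((0.8, 0.6), embeddings, valid=[0, 1, 3]))  # 1
```

The policy head outputs a fixed-size vector however many nodes the graph has. Only the lookup grows with the graph, and a linear scan like this one is the part the take suggests stress-testing on large graphs.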
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Robust Checkpoint Selection for Multimodal LLMs via Agentic Evaluation and Stability-Aware Ranking
The paper proposes a multi-stage MLLM checkpoint selection framework that uses pointwise filtering, listwise ranking, pairwise comparison, and subsampling-based confidence estimation to handle evaluation noise in OCR-heavy scenarios.
#Agent#Multimodal#Benchmarking#Research release
why featured
HKR-K passes because the post gives a concrete checkpoint-selection mechanism for noisy OCR evaluation. HKR-H and HKR-R are weak, and no metrics, model list, or artifact is disclosed, so this stays in all.
editor take
The paper uses three-stage ranking plus subsampled confidence; I buy it, because 0.3-point MLLM gains often smell like noise.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
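The checkpoint-selection card mentions subsampling-based confidence. A bootstrap illustrates the idea: resample evaluation items, re-rank checkpoints on each resample, and report how often each one comes out on top. The checkpoints, per-item scores, and resample count below are hypothetical. The paper's pointwise, listwise, and pairwise stages are not modeled.

```python
import random


def win_rates(per_item_scores, n_boot=1000, seed=0):
    """Share of bootstrap resamples in which each checkpoint has the top mean score.

    per_item_scores maps checkpoint name -> scores on the same eval items.
    Ties for first place split the credit evenly.
    """
    rng = random.Random(seed)
    names = list(per_item_scores)
    n_items = len(per_item_scores[names[0]])
    wins = dict.fromkeys(names, 0.0)
    for _ in range(n_boot):
        idx = [rng.randrange(n_items) for _ in range(n_items)]  # resample items
        means = {c: sum(per_item_scores[c][i] for i in idx) / n_items for c in names}
        top = max(means.values())
        leaders = [c for c in names if means[c] == top]
        for c in leaders:
            wins[c] += 1.0 / len(leaders)
    return {c: w / n_boot for c, w in wins.items()}


# Hypothetical 0/1 correctness on 20 OCR-heavy items; ckpt_a leads by one item.
scores = {
    "ckpt_a": [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1],
    "ckpt_b": [1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
}
print(win_rates(scores))
```

A one-item lead on 20 items is unlikely to win every resample. This is the kind of evaluation noise the take is wary of.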
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Borrowed Geometry: Cross-Distribution Head-Importance Fingerprints of Frozen Pretrained Gemma 4 31B
The paper studies frozen Gemma 4 31B across the L24–L29 slice of 192 attention heads and identifies four heads that rank top-tier on both a 95-sentence TxtCopy probe and four non-language token-pattern tasks, with hypergeometric significance at P=0.0013.
#Multimodal#Interpretability#Benchmarking#Gemma
why featured
HKR-H/K pass: the paper gives concrete evidence for cross-task attention-head overlap in Gemma 4 31B. Impact stays research-niche, with no product or safety consequence, so it belongs in all.
editor take
Frozen Gemma 4 31B shows 4 shared top heads across text and token tasks; I’d resist calling this general circuitry yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
D-PACE: Dynamic Position-Aware Cross-Entropy for Parallel Speculative Drafting
D-PACE derives per-position loss weights from a differentiable surrogate for expected accepted draft length, and the paper reports higher wall-clock speedup and average emitted length across six benchmarks, two Qwen3-4B drafter depths, two decoding temperatures, and two additional target models, with 2.3% measured training-time overhead and no architecture or inference changes.
#Inference-opt#Benchmarking#Research release#Benchmark
why featured
HKR-K passes with a concrete mechanism and test setup; HKR-R passes on serving cost and latency. The angle is narrow inference research, so it stays in the lower interesting band.
editor take
D-PACE adds 2.3% training overhead and zero inference changes; I buy this: speculative decoding needs better objective alignment.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
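The D-PACE card hinges on why per-position weights matter in speculative drafting. Suppose draft token k is accepted with probability p_k, given that all earlier draft tokens were accepted. Then the expected accepted length is the sum over k of p_1·…·p_k, so an early position gates every later one. The sketch computes that quantity and its gradient with respect to each p_k. It uses this simplified acceptance model (an assumption) and does not reproduce D-PACE's differentiable surrogate or its actual weighting.

```python
def expected_accepted_length(p):
    """E[#accepted draft tokens] = sum_k prod_{i<=k} p_i (chain stops at first rejection)."""
    total, prefix = 0.0, 1.0
    for pk in p:
        prefix *= pk
        total += prefix
    return total


def position_sensitivity(p):
    """dE/dp_j for each position j: sum over k >= j of prod_{i<=k, i != j} p_i."""
    grads = []
    for j in range(len(p)):
        g, prefix = 0.0, 1.0
        for k, pk in enumerate(p):
            if k != j:
                prefix *= pk
            if k >= j:
                g += prefix
        grads.append(g)
    return grads


p = [0.9, 0.8, 0.7, 0.6]  # hypothetical per-position acceptance rates
print(expected_accepted_length(p))
print(position_sensitivity(p))  # decreasing here: early positions gate later ones
```

Under this model a uniform cross-entropy treats positions as equally valuable, even though they are not. Position-aware loss weights are one way to reflect the asymmetry.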
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
FedMental: Evaluating Federated Learning for Mental Health Detection from Social Media Data
FedMental evaluates federated learning on depression detection from X and suicide crisis detection from Reddit; centralized training reaches F1 85.63, the best FL model reaches 83.16, and DP-FL drops by up to 27.01 F1 even at epsilon=50.
#Fine-tuning#Safety#Benchmarking#X
why featured
HKR-K and HKR-R pass: the paper gives concrete F1 tradeoffs for FL and DP in mental-health detection. HKR-H is weak, with no product angle or major lab hook, so it stays in the 60–71 band.
editor take
FedMental reports best FL F1 83.16, while DP-FL at ε=50 drops by up to 27.01 F1; sparse mental-health cues hate privacy noise.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Emergence of a Flow-Assisted Casting Strategy for Olfactory Navigation via Memory-Augmented Reinforcement Learning
The paper trains RL agents under varying memory lengths and unsteady flow conditions, finding that agents learn a flow-assisted casting strategy without predefined models and that average speed toward the odor source changes non-monotonically with memory length.
#Agent#Memory#Robotics#arXiv
why featured
HKR-H/K pass via emergent casting and concrete memory/flow experiments; HKR-R fails. The olfactory-navigation RL angle is narrow and lacks code, benchmark, or robot-deployment evidence, so it stays in all.
editor take
RL agents learn casting in unsteady flows; only the abstract is disclosed, so “emergence” deserves skepticism.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
SCAFDS: Edge-Feature Graph Attention for Interbank Fraud Detection with Attribution-Grounded SAR Generation
SCAFDS reports AUPRC of 0.515 and AUROC of 0.802 on 590,540 transactions and an 8,103-institution synthetic interbank network, improving over GraphSAGE-AML by 15.9 and 13.7 percentage points.
#Benchmarking#Interpretability#FinCEN#FDIC
why featured
HKR-K passes with concrete dataset size, institution count, and AUPRC gain. HKR-H and HKR-R are weak because this is a niche fintech-risk paper, so it sits in the 60–71 research-signal band.
editor take
SCAFDS hits 0.515 AUPRC on a synthetic 8,103-bank graph; I’d scrutinize the data before the SAR-generation wrapper.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Structured Style-Rewrite with Chain-of-Thought Planning for Low-Resource Character Dialogue
The paper proposes a structured style-rewrite framework that uses CoT supervision and CoT-shared DPO, enabling Qwen3-1.7B to reach a 0.632 Valid Style Score and 0.878 semantic fidelity across eight characters from four source domains.
#Fine-tuning#Reasoning#Alignment#Qwen
why featured
HKR-K passes because the summary gives testable metrics and scope. HKR-H/R are weak: this is a niche low-resource dialogue rewrite paper, not a broader AI-industry story.
editor take
Qwen3-1.7B hits 0.632 style score across 8 characters. For character rewrite, separating semantics from voice beats bigger-model theater.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
EUPHORIA: Efficient Universal Planning via Hybrid Optimization for Robust Industrial Robotic Assembly
EUPHORIA uses Graph Hypernetworks to generate policy parameters from a minimal support set without gradient-based retraining. It combines SAC-trained physics-informed graph planning with DEM contact-force attention and applies residual stability correction before execution. The abstract says it reduces energy use and improves success rates on unseen geometries, but the post does not disclose exact metrics.
#Robotics#Agent#Reasoning#EUPHORIA
why featured
HKR-K passes: the mechanism is concrete and targets generalization to unseen geometries. HKR-H/R are weak because success-rate and energy numbers are not disclosed.
editor take
EUPHORIA claims few-shot unseen-geometry assembly, but gives no success or energy numbers; I’d file this under tidy system, not robotics breakthrough.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Robustness and Regularization in Hierarchical Re-Basin
The paper proposes a hierarchical model merging scheme and compares it with MergeMany; its experiments find that Re-Basin increases adversarial and perturbation robustness as more models join the hierarchy, while causing a larger performance drop than the original authors reported.
#Fine-tuning#Alignment#Research release
why featured
HKR-K passes: the paper adds a concrete robustness-vs-performance tradeoff for Re-Basin model merging. HKR-H and HKR-R are weak, so it stays in all rather than featured.
editor take
Re-Basin gains robustness with more merged models, but scale is undisclosed; the larger performance hit kills the free-regularizer story.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
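Re-Basin-style merging relies on permutation symmetry: the hidden units of an MLP can be reordered without changing the function it computes, so two models are aligned before their weights are averaged. A minimal sketch, assuming a one-hidden-layer MLP stored as plain lists and a brute-force search over permutations (only workable for tiny widths; practical weight matching uses a linear-assignment solver). `align_and_average` and `merge_hierarchy` are hypothetical names, and the pairwise tree below is an illustration, not the paper's hierarchy or MergeMany.

```python
from itertools import permutations


def _flat(param):
    """Yield scalars from a vector (list of floats) or a matrix (list of lists)."""
    for item in param:
        if isinstance(item, list):
            yield from item
        else:
            yield item


def permute_hidden(model, perm):
    """Reorder hidden units: rows of W1 and b1, columns of W2. Function is unchanged."""
    W1, b1, W2 = model
    return ([W1[i] for i in perm], [b1[i] for i in perm], [[row[i] for i in perm] for row in W2])


def match_cost(a, b):
    """Squared parameter distance between two models with the same shapes."""
    return sum((xa - xb) ** 2 for pa, pb in zip(a, b) for xa, xb in zip(_flat(pa), _flat(pb)))


def _average(pa, pb):
    if pa and isinstance(pa[0], list):
        return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(pa, pb)]
    return [(x + y) / 2 for x, y in zip(pa, pb)]


def align_and_average(a, b):
    """Permute b's hidden units to best match a (brute force), then average weights."""
    width = len(a[1])
    best = min(permutations(range(width)), key=lambda p: match_cost(a, permute_hidden(b, p)))
    b_aligned = permute_hidden(b, best)
    return tuple(_average(pa, pb) for pa, pb in zip(a, b_aligned))


def merge_hierarchy(models):
    """Hypothetical pairwise tree: merge neighbours level by level until one model remains."""
    level = list(models)
    while len(level) > 1:
        nxt = [align_and_average(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd model out is carried to the next level
        level = nxt
    return level[0]
```

A plain pairwise mean weights models unequally when the count is not a power of two (with three models, the carried-over one gets half the weight), which is one of the design knobs a real hierarchy has to settle.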
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Composition of Memory Experts for Diffusion World Models
The paper introduces a diffusion world-model framework that composes 3 memory experts for short-term dynamics, long-term episodic history stored in external diffusion weights via test-time fine-tuning, and spatial coherence, and reports gains in temporal consistency, past-observation recall, and navigation performance across simulated and real-world benchmarks.
#Memory#Fine-tuning#Benchmarking#Research release
why featured
HKR-K passes via a concrete three-expert mechanism for diffusion world models and a navigation-performance claim. HKR-H/R are weak: no metrics, artifact, or broad practitioner trigger, so this stays in all.
editor take
The paper uses 3 memory experts to dodge quadratic attention; no benchmark numbers are disclosed, so treat it as memory engineering.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Olivia time series foundation model harmonizes cross-domain data with power spectral density
Olivia uses normalized power spectral density to harmonize heterogeneous time-series datasets during pretraining, adding a Harmonizer module and HarmonicAttention. The paper evaluates it on two large-scale benchmarks, TSLib and GIFT-Eval, plus 6 GluonTS datasets, and reports state-of-the-art results under zero-shot, few-shot, and full-shot forecasting settings; code is available on GitHub.
#Benchmarking#Research release#Open source#Benchmark
why featured
HKR-K passes: the paper gives a PSD harmonization mechanism and zero-, few-, and full-shot tests on TSLib, GIFT-Eval, and 6 GluonTS datasets. HKR-H and HKR-R are weak, so this stays low in the 60–71 band.
editor take
Olivia reports SOTA on TSLib, GIFT-Eval, and 6 GluonTS sets; PSD harmonization is elegant, but replication decides it.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
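The idea behind Olivia's harmonization, putting series with different units and scales into one comparable spectral representation, can be illustrated with a normalized periodogram. A minimal sketch, assuming a plain DFT periodogram normalized to sum to one; Olivia's Harmonizer module and HarmonicAttention are not reproduced, and `normalized_psd` is a hypothetical helper.

```python
import cmath
import math


def normalized_psd(series):
    """Periodogram over non-negative frequency bins, normalized to sum to 1.

    Mean-centering removes the offset and normalization removes the amplitude,
    so series recorded in different units map to comparable spectra.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    power = []
    for k in range(n // 2 + 1):
        # Naive O(n^2) DFT coefficient for bin k; fine for a sketch.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    return [p / total for p in power] if total > 0 else power
```

A sine with 5 cycles over 64 samples puts nearly all its mass in bin 5, and rescaling or shifting the series leaves the normalized spectrum unchanged.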
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Shaping the Prior: How Synthetic Task Distributions Determine Tabular Foundation Model Quality
The paper introduces O'Prior, a compositional realism prior with 4 coupled components for synthetic pretraining of tabular foundation models; experiments hold architecture, optimizer, and compute budget fixed while varying only the synthetic task distribution, and the abstract reports accuracy and robustness gains on real tabular benchmarks without disclosing exact improvement numbers.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes because the paper states a mechanism and controlled variable. HKR-H and HKR-R are weak; no accuracy gain is disclosed, so this is useful but narrow research in the 60–71 band.
editor take
O'Prior fixes architecture and compute, changing only a 4-part synthetic prior; no gain numbers, but tabular FM data design is the variable.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Dywave: Event-Aligned Dynamic Tokenization for Heterogeneous IoT Sensing Signals
Dywave applies wavelet-based hierarchical decomposition to build event-aligned representations for heterogeneous IoT sensing signals, and evaluations on five real-world datasets report up to 12% higher accuracy while reducing input token lengths by up to 75% across mainstream sequence models.
#Inference-opt#Dywave#Research release#Benchmark
why featured
HKR-K passes with a concrete mechanism and metrics; HKR-H/R are weak because IoT sensing tokenization is narrow and lacks product or agent pull. This fits the lower end of interesting research, not featured.
editor take
Dywave reports up to +12% accuracy and up to 75% fewer tokens on 5 IoT datasets; fixed-window sensing tokenizers look lazy here.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Robust Mitigation of Age-Dependent Confounding Effects via Sample-Difficulty Decorrelation
The paper proposes a sample-difficulty decorrelation framework for age-dependent confounding in medical image classification. After warm-up, it models label-conditioned age-difficulty trends, applies Huber-weighted affinity weights, and uses an Age Coverage Score based on minibatch age variance; across 2 radiology datasets, it reduces age-dependent true- and false-positive disparities with minimal AUC impact under train-test age shifts.
#Vision#Safety#Benchmarking#Research release
why featured
HKR-K comes from the sample-difficulty decorrelation mechanism and 2 radiology datasets; HKR-R comes from age-bias risk. The scope is narrow medical-imaging fairness, with no product or general-model impact.
editor take
The paper cuts age-linked TP/FP gaps on 2 radiology datasets; I don’t buy “minimal AUC impact” without AUC deltas or CIs.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R1
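The decorrelation pieces named in the summary (a trend of difficulty against age, Huber-style weights, and an age-variance coverage score) can be sketched with simple stand-ins. A minimal sketch, assuming per-sample loss as the difficulty measure, a linear age trend, and the standard Huber IRLS weight min(1, delta/|r|); all names are hypothetical, the trend would be fit separately for each label, and how the paper turns these into affinity weights is not disclosed.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def huber_weights(residuals, delta=1.0):
    """Full weight for small residuals, delta/|r| beyond delta (Huber IRLS weights)."""
    return [1.0 if abs(r) <= delta else delta / abs(r) for r in residuals]


def age_difficulty_weights(ages, losses, delta=1.0):
    """Hypothetical: residuals of per-sample loss around a linear age trend -> Huber weights.

    Call once per label to mimic a label-conditioned trend.
    """
    slope, intercept = linear_fit(ages, losses)
    residuals = [loss - (slope * a + intercept) for a, loss in zip(ages, losses)]
    return huber_weights(residuals, delta)


def age_coverage(batch_ages):
    """Population variance of ages in a minibatch; higher means broader age coverage."""
    m = sum(batch_ages) / len(batch_ages)
    return sum((a - m) ** 2 for a in batch_ages) / len(batch_ages)
```

Huber weights keep ordinary samples at full weight while damping the few whose difficulty departs sharply from the age trend, instead of discarding them.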
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Guiding Neuro-Symbolic Scenario Generation with Spatio-Temporal Logic
The paper introduces STRELGen, which optimizes a diffusion model’s latent space at inference time using differentiable STREL formula satisfaction to generate plausible safety-critical multi-agent driving scenarios for autonomous-driving stress tests.
#Agent#Reasoning#Safety#STRELGen
why featured
HKR-K passes with a concrete neuro-symbolic generation mechanism. HKR-H and HKR-R are weak; STREL-based driving scenario generation is niche, and no experiment numbers are disclosed.
editor take
STRELGen optimizes diffusion latents at inference time with differentiable STREL. No hit rate is disclosed; I don’t buy “efficient” yet.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Q-learning with Adjoint Matching
The paper proposes Q-learning with Adjoint Matching (QAM), which converts the critic’s action gradient into a step-wise objective to avoid unstable backpropagation through multi-step denoising, and reports stronger results than prior methods on hard sparse-reward tasks in offline and offline-to-online RL.
#Reasoning#Research release
why featured
HKR-K passes: QAM offers a testable training mechanism and claims gains on offline and offline-to-online sparse-reward tasks. No concrete numbers are disclosed, and the paper is too niche for featured.
editor take
QAM turns critic action gradients into step-wise targets; benchmarks aren’t disclosed, so I buy the mechanism, not “consistently outperforms.”
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Beyond Extrapolation: Knowledge Utilization Paradigm with Bidirectional Inspiration for Time Series Forecasting
The paper proposes KUP-BI, which distills a post-target continuation proxy from a train-only historical library and fuses it with the input stream through lightweight feature-level gating; experiments on six public datasets improve state-of-the-art time-series forecasters with small additional overhead.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via the KUP-BI mechanism and 6-dataset evaluation. HKR-H/R are weak: this is a narrow forecasting paper with incremental research value, below featured threshold.
editor take
KUP-BI improves SOTA on 6 datasets; I’d audit its train-only library for adjacent-trajectory leakage first.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
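KUP-BI's "lightweight feature-level gating" can be pictured as a per-feature convex mix between the input stream and the retrieved continuation proxy. A minimal sketch, assuming a sigmoid gate computed from each input feature and its proxy counterpart; the weights and the names `gated_fuse`, `w_in`, `w_proxy`, and `bias` are hypothetical, and the paper's gate may be parameterized differently.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_fuse(inputs, proxy, w_in, w_proxy, bias):
    """Per-feature gate g_i = sigmoid(w_in[i] * x_i + w_proxy[i] * p_i + bias[i]).

    Output is g_i * x_i + (1 - g_i) * p_i: a convex mix of the input feature
    and the retrieved continuation proxy, so each output stays between the two.
    """
    fused = []
    for x, p, a, b, c in zip(inputs, proxy, w_in, w_proxy, bias):
        g = sigmoid(a * x + b * p + c)
        fused.append(g * x + (1.0 - g) * p)
    return fused
```

With zero weights the gate sits at 0.5 and simply averages the two streams; a large positive bias passes the input through, a large negative one defers to the proxy.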
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
How Class Ontology and Data Scale Affect Audio Transfer Learning
The paper pre-trains multiple model states on ontology-based AudioSet subsets and fine-tunes them on 3 audio tasks: acoustic scene recognition, bird activity recognition, and speech command recognition; larger sample and class counts improve transfer, while similarity to the downstream task has a stronger effect.
#Audio#Fine-tuning#Benchmarking#Research release
why featured
HKR-K passes: the paper compares AudioSet ontology-subset pretraining and fine-tunes on soundscapes, bird calls, and speech commands. HKR-H/R are weak; this is useful niche research, not a featured AI-industry story.
editor take
AudioSet subsets transfer to 3 audio tasks; scale helps, but task similarity beats it. Bigger pretraining sets are not magic.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Iterative Compositional Data Generation for Robot Control
The paper proposes a semantic compositional diffusion transformer that factorizes transitions into robot, object, obstacle, and objective components, then validates synthetic data with offline reinforcement learning across iterative training rounds for unseen task combinations.
#Robotics#Fine-tuning#Agent#Research release
why featured
HKR-K passes because the summary gives a concrete mechanism for synthetic data training in robot control. HKR-H/R are weak, and no results numbers or release conditions are disclosed, so this stays in the lower research-release band.
editor take
ICDG factorizes transitions into 4 components; task counts and success rates are undisclosed, so “nearly all” stays simulator-only.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Research on ODE Perspective for Continual Model Merging Published
arXiv:2605.19409v1 proposes ODE-M for continual model merging (CMM), using a time-dependent velocity field and barrier constraints to avoid loss-increasing steps, and the abstract claims state-of-the-art results across mainstream CMM benchmarks without disclosing benchmark names or scores in the RSS snippet.
#Fine-tuning#Benchmarking#Research release#Benchmark
why featured
A narrow methods paper: HKR-K passes on ODE-M mechanics and benchmark claims, while HKR-H/R are weak. The ODE framing raises the access cost, so it stays in the lower research band.
editor take
ODE-M adds velocity fields and barrier constraints to CMM; the RSS gives zero benchmark names or scores, so hold the SOTA claim.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
HypergraphFormer: Learning Hypergraphs from LLMs for Editable Floor Plan Generation
HypergraphFormer trains an LLM with supervised fine-tuning to generate hypergraph-based text for editable floor plans, evaluates it on RPLAN and a newly released out-of-distribution dataset, and reports gains over raster and vector baselines plus better data efficiency, but the RSS snippet does not disclose metric values, model size, or release license.
#Fine-tuning#Research release
why featured
HKR-H/K pass: the LLM-to-hypergraph floor-plan angle is fresh and the mechanism is concrete. Metrics are not disclosed, and the use case is narrow, so it stays below featured.
editor take
HypergraphFormer tests RPLAN plus OOD floor plans, but no metrics disclosed; I buy the hypergraph interface, not the SOTA claim.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H1·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
INSIGHTS: Demonstration-Based Summaries of Time Series Predictors
INSIGHTS generates time-series sample summaries with utility functions balancing importance and diversity, then evaluates them through experiments, interviews, and a user study; the abstract does not disclose sample counts, model types, or concrete metric values.
#Interpretability#INSIGHTS#Research release
why featured
HKR-K passes because INSIGHTS adds a concrete sample-summary mechanism. HKR-H/R are weak, and the body lacks sample size, model types, and metrics, so this stays in all.
editor take
INSIGHTS targets global time-series explanations, but sample counts and metrics are absent; I don’t buy “expert preference” as evidence.
HKR breakdown
hook knowledge resonance
open source
57
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
A Reproducibility Analysis of PO4ISR: Diagnosing and Mitigating Semantic Drift in LLM-Based Session Recommendation
The authors reproduced PO4ISR on ML-1M, Games, and Bundle, then introduced PO4ISR++ with reflexive prompting and consistent rank detection to reduce semantic drift in long sessions, reporting stabilized gains of up to 54% on Games and 96% on Bundle.
#Reasoning#Benchmarking#PO4ISR#PO4ISR++
why featured
HKR-K passes on datasets, mitigation mechanisms, and reported gains; HKR-H/R are weak because this is niche session-recommendation research with limited broader industry pull.
editor take
PO4ISR++ gains up to 54% on Games and 96% on Bundle; LLM recommenders still bleed accuracy under long-session drift.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Diffusion-State Policy Optimization for Masked Diffusion Language Models
The paper introduces DiSPO, a plug-in credit-assignment layer for masked diffusion language models that branches at selected masked states, resamples currently masked positions from rollout-cached logits, and updates only newly filled tokens, improving over diffu-GRPO and SPG on math and planning benchmarks with matched rollout compute and optimizer steps on LLaDA-8B-Instruct.
#Reasoning#Fine-tuning#Benchmarking#LLaDA
why featured
HKR-K passes: DiSPO has a concrete training mechanism and beats diffu-GRPO/SPG on LLaDA-8B-Instruct under equal rollout compute and steps. HKR-H/R are weak, and the paper is specialist training research, so it stays in all.
editor take
DiSPO reuses rollout logits on LLaDA-8B-Instruct for mid-fill credit; I buy the direction, but gains are undisclosed.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
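The "update only newly filled tokens" rule in DiSPO comes down to a mask over positions that were masked in the branched state and got filled in the next one. A minimal sketch of that mask plus a REINFORCE-style loss restricted to it; the `MASK` id, the scalar advantage, and both function names are hypothetical, and this is not DiSPO's code or its logit-reuse scheme.

```python
MASK = -1  # hypothetical token id for a still-masked position


def newly_filled(state, next_state):
    """True where a position is masked in `state` and filled in `next_state`."""
    return [s == MASK and n != MASK for s, n in zip(state, next_state)]


def masked_pg_loss(token_logprobs, fill_mask, advantage):
    """REINFORCE-style loss averaged over newly filled positions only.

    Already-filled or still-masked positions contribute nothing, so this branch's
    advantage is credited only to the tokens it actually decided.
    """
    picked = [lp for lp, m in zip(token_logprobs, fill_mask) if m]
    if not picked:
        return 0.0
    return -advantage * sum(picked) / len(picked)
```

For example, going from `[5, MASK, MASK, 7]` to `[5, 3, MASK, 7]` credits only position 1.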
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Navigating the Emotion Tree: Hierarchical Hyperbolic RAG for Multimodal Emotion Recognition
The paper proposes HyperEmo-RAG for multimodal emotion recognition, using Poincaré-ball embeddings and hierarchical beam search to retrieve emotion evidence; the abstract says it outperforms existing methods on multiple datasets, but does not disclose metric values.
#RAG#Multimodal#Reasoning#Research release
why featured
HKR-K passes because the paper gives a concrete HyperEmo-RAG mechanism, but no metrics are disclosed and the use case is narrow. No hard exclusion applies; this sits in the lower band for niche research.
editor take
HyperEmo-RAG adds Poincaré-ball embeddings and hierarchical beam search. No metrics are disclosed, so I’d file this as architecture-first emotion RAG.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
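Retrieval in a Poincaré ball ranks evidence by hyperbolic distance rather than cosine similarity. The sketch below is the standard geodesic distance on the unit ball (curvature -1); it assumes embeddings are already inside the ball, and `poincare_distance` is a hypothetical helper, not HyperEmo-RAG's code or its beam search.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance in the Poincaré ball of curvature -1.

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    Both points must lie strictly inside the unit ball.
    """
    def sq(x):
        return sum(c * c for c in x)

    nu, nv = sq(u), sq(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sq([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))
```

Distances blow up toward the boundary, which is why tree-like hierarchies (coarse emotions near the origin, fine-grained ones near the rim) typically embed with low distortion in this space. From the origin, the formula reduces to 2·artanh(|v|).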
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Towards Family-Grouped Hierarchical Federated Learning on Sub-5KB Models for ECG Wearables
The paper proposes Family-FL, a three-tier federated learning architecture, and reports a 76.7% communication reduction versus FedAvg on MIT-BIH simulations with 47 subjects; its 669-parameter INT8 Tiny CNN-LSTM uses 4.65KB Flash and 2.95KB RAM, reaching 91.9% accuracy without hardware deployment or formal differential privacy guarantees.
#Fine-tuning#Inference-opt#Safety#MIT-BIH
why featured
A niche edge-FL paper with hard metrics: a sub-5KB model and 76.7% lower communication support HKR-H/K. Medical wearable scope is narrow, with no product or general AI-tooling impact, so it stays in 40–59.
editor take
Family-FL-Tiny cuts communication 76.7% on 47 MIT-BIH subjects; no hardware run or DP, so the privacy claim is thin.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H1·K1·R0
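A family-grouped tier means clients upload to an intermediate aggregator rather than straight to the server. A minimal sketch of two aggregation tiers (clients to family, family to global) with FedAvg sample-count weighting; the grouping and names are assumptions, not Family-FL's protocol. For scale, 669 INT8 parameters are 669 bytes of raw weights per upload (one byte each, ignoring quantization scales and metadata).

```python
def weighted_average(vectors, weights):
    """Element-wise average of equal-length vectors, weighted by `weights`."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(dim)]


def hierarchical_fedavg(families):
    """Two-tier aggregation: clients -> family -> global.

    `families` maps a (hypothetical) family id to a list of (params, num_samples).
    Weighting each family by its total sample count makes the result equal to
    flat FedAvg over all clients; only the communication pattern changes.
    """
    fam_params, fam_sizes = [], []
    for clients in families.values():
        sizes = [n for _, n in clients]
        fam_params.append(weighted_average([p for p, _ in clients], sizes))
        fam_sizes.append(sum(sizes))
    return weighted_average(fam_params, fam_sizes)
```

The equivalence to flat FedAvg is the point of the check below: grouping changes who talks to whom, not the aggregate.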
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
BERTO: Intent-Driven Network Time Series Forecasting via Natural Language Operator Preferences
BERTO uses a BERT-based forecasting framework and natural-language operator prompts to shift cellular traffic prediction bias without retraining. It combines a Balancing Loss Function with prompt conditioning to trade power savings against service quality across real-world datasets; experiments show about a 1.4 kW power-consumption range and a 9x variation in SLA violations.
#Reasoning#Fine-tuning#BERTO#Research release
why featured
HKR-K passes: it states a mechanism, no-retraining condition, 9x SLA variation, and a 1.4 kW range. HKR-H/R are weak because telecom time-series forecasting is niche for the AI-practitioner audience.
editor take
BERTO shifts forecast bias via prompts, spanning a 1.4 kW power range and 9x in SLA violations; I buy the mechanism, not the NL preference gloss.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
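The power-versus-SLA trade-off is an asymmetric-error problem: under-forecasting traffic risks SLA violations, over-forecasting keeps capacity (and power) on needlessly. A minimal sketch, assuming a pinball-style asymmetric loss as a stand-in for BERTO's Balancing Loss Function, whose exact form the summary does not give; `alpha` plays the role an operator prompt might steer, and all names are hypothetical.

```python
def balancing_loss(y_true, y_pred, alpha):
    """Asymmetric (pinball-style) loss, a stand-in for the paper's Balancing Loss.

    alpha near 1 punishes under-forecasts (risking SLA violations);
    alpha near 0 punishes over-forecasts (wasting power).
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        err = y - p  # positive means the forecast was too low
        total += alpha * err if err >= 0 else (alpha - 1.0) * err
    return total / len(y_true)


def best_constant_forecast(history, alpha):
    """Constant forecast (chosen from observed values) that minimizes the loss."""
    return min(sorted(set(history)),
               key=lambda c: balancing_loss(history, [c] * len(history), alpha))
```

The minimizer of this loss is the alpha-quantile of the data, so sliding alpha shifts forecasts up or down without retraining anything else; in this stand-in, that is the knob a preference prompt would turn.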
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Tail Annealing for Heavy-Tailed Flow Matching
The paper proposes Log-FM, which applies a coordinate-wise soft-log transform before flow-matching training and exponentiates generated samples afterward. On a 144-configuration multivariate benchmark with 3 copulas, dimensions up to 100, and 4 tail indices, Log-FM beats specialized baselines on W1, CVaR99, and extreme-quantile metrics, with zero severe divergences across 2,880 runs.
#Benchmarking#Research release#Benchmark
why featured
HKR-K lands through the Log-FM mechanism plus 144 benchmark configurations and 2,880 runs. HKR-H/R fail; heavy-tailed flow matching is specialized research, so the score stays in the lower band.
editor take
Log-FM reports zero severe divergences over 2,880 runs; I like the no-architecture hack, but Hill diagnostics can amplify messy real tails.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
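Log-FM's recipe is a transform sandwich: compress each coordinate before flow-matching training, then invert on generated samples. A minimal sketch, assuming a sign-preserving soft-log sign(x)·log(1 + |x|/scale) with a hypothetical `scale` parameter; the paper's exact transform is not given in the summary, and the flow-matching model itself is omitted.

```python
import math


def soft_log(x, scale=1.0):
    """Sign-preserving log compression; power-law tails become exponential-like tails."""
    return math.copysign(math.log1p(abs(x) / scale), x)


def soft_exp(y, scale=1.0):
    """Exact inverse of soft_log."""
    return math.copysign(scale * math.expm1(abs(y)), y)


def to_latent(sample, scale=1.0):
    """Coordinate-wise transform applied to training data before flow matching."""
    return [soft_log(v, scale) for v in sample]


def from_latent(sample, scale=1.0):
    """Coordinate-wise inverse applied to generated samples afterwards."""
    return [soft_exp(v, scale) for v in sample]
```

Because the transform is coordinate-wise and invertible, the generative model needs no architectural change; it just trains on a lighter-tailed target.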
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
EgoTraj: Real-World Egocentric Human Trajectory Dataset for Multimodal Prediction
EgoTraj releases 75 egocentric urban navigation sequences recorded with Meta Quest Pro, with synchronized RGB video, continuous 6-DoF head poses, per-frame 3D eye-gaze vectors, and scene annotations.
#Multimodal#Vision#Benchmarking#EgoTraj
why featured
HKR-K passes: EgoTraj provides 75 egocentric urban navigation sequences with multimodal annotations. HKR-H and HKR-R are weak because the dataset is niche, small-scale, and mainly relevant to vision/AR researchers.
editor take
EgoTraj ships 75 Meta Quest Pro urban sequences; the dataset is small, but gaze plus 6-DoF head-pose ground truth is useful for embodied prediction.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
TrajTok: Adaptive Spatial Tokenization for Trajectory Representation Learning
TrajTok converts noisy GPS traces into discrete tokens using learned multi-resolution hexagonal cells, then pretrains a factorized transformer with masked-token modeling; on the Porto dataset, a frozen encoder with lightweight adapters is evaluated on 4 tasks: trajectory similarity search, classification, ETA, and full travel-time regression.
#Embedding#Benchmarking#TrajTok#Research release
why featured
Single arXiv paper with a concrete tokenization mechanism and Porto evaluations, so HKR-K passes. The topic is narrow trajectory representation, with no product, model, or practitioner nerve, keeping it in the 40–59 band.
editor take
TrajTok reports 4 Porto tasks; I buy trajectory tokenization, but one-city evidence cannot carry the foundation-model label.
HKR breakdown
hook knowledge resonance
open source
54
SCORE
H0·K1·R0
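TrajTok learns its multi-resolution hexagonal cells; the fixed-grid version below only shows what hex tokenization does to a GPS trace. A minimal sketch, assuming coordinates already projected to planar meters, pointy-top hexagons in axial coordinates with cube rounding, and hypothetical cell sizes; none of this is the paper's learned grid.

```python
import math


def hex_cell(x, y, size):
    """Axial (q, r) of the pointy-top hexagon with circumradius `size` containing (x, y)."""
    qf = (math.sqrt(3) / 3 * x - y / 3) / size
    rf = (2 / 3 * y) / size
    # Cube rounding: round all three cube coordinates, then repair the one
    # with the largest rounding error so that x + y + z == 0 still holds.
    xf, zf = qf, rf
    yf = -xf - zf
    rx, ry, rz = round(xf), round(yf), round(zf)
    dx, dy, dz = abs(rx - xf), abs(ry - yf), abs(rz - zf)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)


def tokenize(trace, sizes=(50.0, 200.0, 800.0)):
    """One multi-resolution token per point, collapsing consecutive repeats."""
    tokens = []
    for x, y in trace:
        tok = tuple((s,) + hex_cell(x, y, s) for s in sizes)
        if not tokens or tokens[-1] != tok:
            tokens.append(tok)
    return tokens
```

Collapsing repeats is what turns a dense, noisy GPS stream into a short token sequence; jitter within a cell disappears.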
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Beyond Leakage and Complexity: Towards Realistic and Efficient Information Cascade Prediction
The paper proposes time-ordered splits, the Taoke e-commerce cascade dataset, and the CasTemp framework, then evaluates CasTemp under leak-free conditions across four datasets; the post does not disclose exact performance metrics or training-time numbers.
#Benchmarking#Taoke#CasTemp#arXiv
why featured
HKR-K passes because the paper names a new dataset, split method, and evaluation setup. HKR-H/R are weak, metrics are not disclosed, and cascade prediction is too niche for featured treatment.
editor take
CasTemp reports leak-free wins on 4 datasets; exact metrics and runtime are undisclosed, so treat SOTA speedup as unpaid debt.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
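The leakage fix here is procedural: split cascades by time rather than at random. A minimal sketch of one such time-ordered split, assuming cascades are keyed by start time and using hypothetical fractions; the paper's exact protocol is not disclosed in the summary.

```python
def time_ordered_split(cascades, train_frac=0.7, val_frac=0.15):
    """Split cascades by start time so no later cascade lands in training.

    `cascades` is a list of (start_time, cascade_id) pairs. A random split can
    put cascades that start after some test cascades into training, leaking
    future popularity signals into the model.
    """
    ordered = sorted(cascades)
    n = len(ordered)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test
```

Every training cascade starts no later than any validation cascade, and every validation cascade no later than any test cascade.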
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
FieldFormer: Locality-Aware Transformers for Spatio-Temporal Modeling on Sparse Sensor Networks
FieldFormer uses learnable velocity-scaled offsets to aggregate local sensor evidence for sparse spatio-temporal prediction. The paper evaluates it on five synthetic and real-world benchmarks, but the RSS snippet does not disclose exact error numbers.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via a concrete mechanism and 5 benchmarks, but error numbers are not disclosed and the use case is narrow. HKR-H/R fail, so this stays in the lower research-update band.
editor take
FieldFormer reports 5 benchmarks but no error figures in the RSS snippet; keeping reconstruction local to sensor support is the sane bet here.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings
The study tests federated learning for multi-label ICD classification on MIMIC-IV clinical notes, comparing six public embedding models, three MLP architectures, ICD-9 and ICD-10 coding, and ten stratified splits; it finds embedding quality matters more than classifier complexity and federated training closely matches centralized results under idealized conditions.
#Embedding#Fine-tuning#Benchmarking#MIMIC-IV
why featured
HKR-K passes because the paper gives concrete experimental conditions and a testable claim that FL nears centralized training. HKR-H/R are weak: ICD coding is narrow and not tied to a mainstream AI product or agent workflow.
editor take
MIMIC-IV tests 6 embeddings and 3 MLPs; useful takeaway: in clinical FL, embedding quality beats classifier tinkering.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
R³L: Reasoning 3D Layouts from Relative Spatial Relations
R³L improves multi-hop relative spatial reasoning for 3D layout generation with invariant spatial decomposition, an imagine-and-revise self-consistency loop, and global-to-local coordinate re-parameterization; the arXiv abstract says experiments across diverse scene types and instructions produced more physically feasible and semantically consistent layouts, but the snippet does not disclose benchmark numbers.
#Reasoning#Multimodal#Research release#Open source
why featured
HKR-K passes because the abstract names concrete mechanisms for multi-hop 3D relative spatial reasoning. HKR-H and HKR-R fail: no benchmark numbers, artifact details, or product angle are disclosed, so this stays in the low-value research band.
editor take
R³L targets accumulated frame errors, but the abstract gives no benchmark numbers. I buy the problem: 3D layout reasoning dies on reference drift.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
MAM-CLIP: Vision-Language Pretraining on Mammography Atlases for BI-RADS Classification
MAM-CLIP trains a vision-language model on 2,313 mammography atlas image-text pairs with PubMedBERT and contrastive learning, then fine-tunes the vision encoder for BI-RADS prediction, improving 3-class average F1 by 1% with 40K labeled samples and 14% with 1K labeled samples.
#Multimodal#Vision#Fine-tuning#MAM-CLIP
why featured
HKR-K passes via dataset size, task, and F1 gains. HKR-H/R are weak: this is narrow medical-imaging research with no deployment, product, or regulatory impact disclosed, so it stays in the lower band.
editor take
MAM-CLIP lifts 1K-sample F1 by 14% using 2,313 atlas pairs. For medical small data, captions beat label hoarding.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
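MAM-CLIP's pretraining stage is CLIP-style contrastive learning over atlas image-text pairs. A minimal sketch of the symmetric InfoNCE objective on precomputed embeddings; the PubMedBERT and vision encoders are omitted, the temperature of 0.07 is an assumed common default, and `clip_loss` is a hypothetical name rather than the paper's code.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: pair i's image and text are positives, all other pairs negatives."""
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
              for i in range(n)]

    def cross_entropy(rows):
        """Mean cross-entropy with the diagonal as target, via a stable log-sum-exp."""
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    cols = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(cols))
```

Each caption acts as a soft label for its image, which is one plausible reason atlas text helps most when labeled samples are scarce.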
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Concept-Guided Noisy Negative Suppression for Zero-Shot Classification and Grounding of Chest X-Ray Findings
CoNNS relabels cross-patient chest X-ray report pairs with a 41-concept clinical ontology, applies noisy negative filtering and hard negative mining, and outperforms prior models on five zero-shot classification datasets plus multi-granularity zero-shot grounding tasks.
#Vision#Multimodal#Benchmarking#CoNNS
why featured
HKR-K passes with 41 clinical concepts and 5 zero-shot datasets. HKR-H/R are weak, and the article gives no product, deployment, or industry adoption angle.
editor take
CoNNS relabels negatives with 41 clinical concepts. Medical VLM gains are moving from scale to label-noise control.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
RLFTSim: Realistic and Controllable Multi-Agent Traffic Simulation via Reinforcement Learning Fine-Tuning
RLFTSim fine-tunes a pre-trained traffic simulation model on the Waymo Open Motion Dataset, using a low-variance dense reward to jointly optimize rollout realism and goal-conditioned controllability.
#Agent#Fine-tuning#Robotics#RLFTSim
why featured
HKR-K passes: the summary gives Waymo data, RL fine-tuning, and a low-variance dense-reward mechanism. HKR-H/R are weak, and no metrics or deployment claim are disclosed.
editor take
RLFTSim uses RL fine-tuning on Waymo; no SOTA numbers are in the snippet, so don’t bank the sample-efficiency claim yet.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
IMLJD: A Computational Dataset for Indian Matrimonial Litigation Analysis
IMLJD releases 3,613 Indian matrimonial dispute judgments, covering 1,474 Supreme Court cases from 2000 to 2024 and 2,139 Karnataka High Court cases from 2018 to 2024, with outcome labels, metadata-derived indicators, and a knowledge graph published openly on GitHub and Hugging Face.
#Benchmarking#Supreme Court of India#Karnataka High Court#Hugging Face
why featured
HKR-K passes with dataset size, court sources, and year ranges. HKR-H/R are weak because this is a niche legal NLP corpus with limited AI-industry spillover, so it stays in the low-value research band.
editor take
IMLJD opens 3,613 judgments; a 57.6% SC quash rate gives legal NLP a needed non-US/UK stress test.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Automated Big Data Quality Assessment Using Knowledge Graph Embeddings
The paper proposes using knowledge graph embeddings to predict missing edges between dataset context and quality rules, then evaluates the method with AmpliGraph on a real-world radiation sensor dataset from LAEC-CNRS.
#Embedding#AccentureLabs#Lebanese Atomic Energy Commission#LAEC-CNRS
why featured
HKR-K passes because the paper gives a concrete KGE missing-edge mechanism and AmpliGraph evaluation; HKR-H and HKR-R fail, so this stays a low-value research signal rather than featured coverage.
editor take
The paper names one LAEC-CNRS sensor dataset and no metrics; KG embeddings for rule recall feels old, evidence thin.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Fast and Featureless Node Representation Learning with Partial Pairwise Supervision
The paper introduces Contrastive FUSE for graph node representation learning when node features are unavailable and only partial pairwise labels exist; it replaces the costly modularity gradient with a lightweight approximation and reports fast iterative updates on million-edge graphs.
#Embedding#Benchmarking#Contrastive FUSE#arXiv
why featured
HKR-K passes on a concrete graph-learning setup and mechanism. HKR-H/R fail: this is narrow node-representation research with no product, agent, or industry impact shown, so it stays in the low-value research band.
editor take
Contrastive FUSE targets featureless graphs with partial pair labels at million-edge scale; no runtime numbers disclosed, so treat it as graph-embedding plumbing.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
ExECG: An Explainable AI Framework for ECG Models
ExECG provides a three-stage Python pipeline for ECG model explainability, using Wrapper, Explainer, and Visualizer components, and demonstrates end-to-end reproducible usage with concise examples and two case studies.
#Interpretability#Tools#ExECG#Research release
why featured
HKR-K passes via a reproducible three-stage pipeline and cases; HKR-H/R are weak because ECG explainability is a narrow medical-AI tool with limited impact on general AI product or developer workflows.
editor take
ExECG packages ECG explainability into 3 stages; with only 2 case studies, the clinical-trust claim is thin.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
A Closed-loop, State-centric, Multi-agent Framework for Passenger Load Estimation from Heterogeneous Data Streams
The paper proposes a closed-loop, state-centric, multi-agent framework for estimating transit passenger load from heterogeneous data streams; its mechanisms include stop-by-stop inference, physical feasibility constraints, dynamic trust allocation across evidence sources, and optional trip-level macro-correction.
#Agent#Research release
why featured
HKR-K passes: the summary discloses stop-level reasoning, physical feasibility constraints, and evidence trust allocation. The transit-ops focus lacks HKR-H and HKR-R, so it stays in the low but browseable band.
editor take
The paper gives a transit-load multi-agent framework, with no metrics disclosed; physics constraints matter, but the agent label feels thin.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
From Sparsity to Simplicity: Enabling Simpler Sequential Replacements via Sparse Attention Distillation
The paper uses layer-wise distillation to replace attention in pretrained ViTs, showing under a fixed training budget that sparser attention layers cause substantially smaller accuracy drops than denser layers.
#Inference-opt#Vision#Research release
why featured
HKR-K passes with a concrete mechanism and comparison, but HKR-H/R are weak. This is a niche ViT attention-replacement paper with limited practitioner resonance and no hard-exclusion trigger.
editor take
This paper replaces ViT attention under a fixed budget; sparse layers degrade less, which makes it a useful incision map for attention surgery.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Quantized Machine Learning Models for Medical Imaging in Low-Resource Healthcare Settings
The paper validates a MobileNetV2 Float16 quantization pipeline for four-class brain tumor MRI classification, reaching 82.37% validation accuracy versus an 82.20% full-precision baseline and reducing model size from 35.34 MB to 5.76 MB.
#Vision#Inference-opt#TensorFlow Lite#MobileNetV2
why featured
HKR-K passes with concrete quantization and size metrics; HKR-H/R are weak because this is a narrow medical-imaging study, not an AI product or platform shift. No hard exclusion, but it stays in the low-value band.
editor take
MobileNetV2 hits 82.37% at 5.76MB after quantization; validation-only evidence makes the clinical claim too loud.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
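Float16 quantization stores each weight in 2 bytes instead of 4, so a pure cast accounts for at most a 2x cut in weight storage. The reported 35.34 MB to 5.76 MB drop is about 6.1x, and the summary does not say where the rest comes from. A minimal sketch of the cast and its round-trip error using Python's `struct` half-precision format; the weights are synthetic, not from the paper.

```python
import struct


def to_float16_bytes(weights):
    """Pack weights as IEEE 754 half precision (2 bytes each); |w| must be <= 65504."""
    return struct.pack(f"<{len(weights)}e", *weights)


def to_float32_bytes(weights):
    """Pack weights as single precision (4 bytes each)."""
    return struct.pack(f"<{len(weights)}f", *weights)


def max_roundtrip_error(weights):
    """Largest absolute error after a float16 round trip."""
    restored = struct.unpack(f"<{len(weights)}e", to_float16_bytes(weights))
    return max(abs(a - b) for a, b in zip(weights, restored))
```

For weights in [-1, 1], half precision keeps roughly three significant decimal digits, which is consistent with a quantized model landing within a fraction of a point of its full-precision accuracy.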
04:00
20d ago
arXiv · cs.LG· atomEN04:00 · 05·20
Bridge: Retrieval-Augmented Spatiotemporal Modeling for Urban Delivery Demand
Bridge combines an inductive contextual graph backbone with a time-aware memory of region-time windows for cold-start urban delivery forecasting. Experiments on four real-world delivery datasets show consistent gains over spatiotemporal baselines under within-city cold-start and cross-city transfer with partial observations.
#RAG#Memory#Benchmarking#Research release
why featured
HKR-K passes via a testable retrieval-and-gating method on four datasets. HKR-H and HKR-R fail; the logistics-forecasting scope is narrow and lacks agent or product implications.
editor take
Bridge improves cold-start forecasts on 4 delivery datasets; I buy the direction, but no gain sizes are disclosed.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
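The summary names retrieval from a time-aware memory of region-time windows plus gating, but gives no details. Here is a minimal sketch under stated assumptions: memory entries are (region embedding, time-window id, observed demand) tuples, retrieval averages the top-k cosine matches within the same window, and a fixed scalar gate blends the retrieved estimate with a backbone forecast. Every name and value is hypothetical, not Bridge's implementation.

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def retrieve(memory, query_vec, window, k=2):
    """Mean demand of the k stored regions most similar to query_vec in the same time window."""
    scored = [(cosine(query_vec, vec), demand) for vec, w, demand in memory if w == window]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    return sum(d for _, d in top) / len(top) if top else None

def gated_forecast(backbone_pred, retrieved, gate):
    """gate in [0, 1] weights the backbone; fall back to the backbone when nothing is retrieved."""
    if retrieved is None:
        return backbone_pred
    return gate * backbone_pred + (1.0 - gate) * retrieved

# Hypothetical memory: (region embedding, hour-of-week window, observed orders).
memory = [
    ([1.0, 0.0], 8, 120.0),
    ([0.9, 0.1], 8, 100.0),
    ([0.0, 1.0], 8, 10.0),
    ([1.0, 0.0], 20, 40.0),
]
cold_start_region = [1.0, 0.05]  # new region with no history of its own
retrieved = retrieve(memory, cold_start_region, window=8)
print(gated_forecast(backbone_pred=90.0, retrieved=retrieved, gate=0.5))
```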
04:00
20d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·20
SAGE: Scalable Automatic Gating Ensemble for Confident Negative Harvesting in Fraud Detection
SAGE combines SimHash-based stratified sampling with Mahalanobis-distance and k-NN-density gates to harvest confident negatives from unlabeled music-streaming fraud data. The abstract says it performs strongly in held-out, customer-level, and artist-level fraud settings, but the post does not disclose precision, recall, dataset size, or baseline numbers.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via a concrete negative-sample mining mechanism; HKR-H/R miss because this is a narrow fraud-modeling paper with no metrics or practitioner-wide cost/safety hook.
editor take
SAGE uses SimHash, Mahalanobis, and k-NN gates; no precision/recall is disclosed, so don’t buy the “strong” claim yet.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
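This is a hedged sketch of how the three named pieces could fit together. None of the following choices come from the paper: the Mahalanobis gate uses a diagonal covariance fitted on known fraud rows, the density gate is the mean distance to the k nearest unlabeled rows, both gates run before SimHash bucketing, and the thresholds are arbitrary. harvest_negatives and its parameters are hypothetical.

```python
import math
import random
from collections import defaultdict

def simhash(vec, planes):
    """One bit per random hyperplane: the sign of the dot product."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0) for plane in planes)

def diag_mahalanobis(vec, mean, var):
    """Mahalanobis distance under a diagonal covariance (a simplifying assumption)."""
    return math.sqrt(sum((v - m) ** 2 / s for v, m, s in zip(vec, mean, var)))

def knn_mean_dist(vec, pool, k):
    """Mean Euclidean distance from vec to its k nearest other rows in pool."""
    dists = sorted(math.dist(vec, other) for other in pool if other is not vec)[:k]
    return sum(dists) / len(dists)

def harvest_negatives(unlabeled, fraud, n_bits=4, min_maha=3.0, max_knn=1.0, k=3, per_bucket=2, seed=0):
    """Keep rows far from the fraud distribution and in dense regions, then sample per SimHash bucket."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in unlabeled[0]] for _ in range(n_bits)]
    cols = list(zip(*fraud))
    mean = [sum(c) / len(c) for c in cols]
    var = [max(sum((x - m) ** 2 for x in c) / len(c), 1e-6) for c, m in zip(cols, mean)]
    buckets = defaultdict(list)
    for vec in unlabeled:
        far_from_fraud = diag_mahalanobis(vec, mean, var) >= min_maha  # gate 1
        in_dense_region = knn_mean_dist(vec, unlabeled, k) <= max_knn  # gate 2
        if far_from_fraud and in_dense_region:
            buckets[simhash(vec, planes)].append(vec)
    # Stratified sampling: cap each bucket's contribution so no single bucket dominates.
    picked = []
    for key in sorted(buckets):
        picked.extend(rng.sample(buckets[key], min(per_bucket, len(buckets[key]))))
    return picked

fraud = [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1], [5.1, 5.2]]
normal = [[0.0, 0.0], [0.1, 0.2], [-0.1, 0.1], [0.2, -0.1], [0.05, 0.05]]
unlabeled = normal + [[10.0, -10.0], [5.0, 5.05]]  # one sparse outlier, one fraud-like row
picked = harvest_negatives(unlabeled, fraud)
print(picked)
```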
04:00
20d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·20
An Objective Performance Evaluation of LSTM Networks in Time Series Classification
The paper compares an LSTM classifier with a model-based EM classifier on 2 scalar linear Gaussian state-space models. The LSTM needs a larger separation in noise statistics to tell the models apart, and it stays below the Kalman likelihood-ratio reference when the models differ only in measurement noise.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes: the paper gives reproducible state-space setups and Kalman/EM baselines, showing that the LSTM lags when the models differ in measurement noise. HKR-H/R are weak; LSTM classification benchmarking is old and academic.
editor take
LSTM loses to EM on 2 scalar Gaussian state-space models; when structure is known, black-box sequence models overclaim.
HKR breakdown
hook knowledge resonance
open source
44
SCORE
H0·K1·R0
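The model-based reference the summary mentions can be sketched directly. With both scalar state-space models known, score a sequence under each with an exact Kalman-filter log-likelihood and pick the larger one, which is a likelihood-ratio test at threshold 0. This sketch assumes known parameters, whereas the paper's EM classifier estimates them; the model values are illustrative, not the paper's setups.

```python
import math
import random

def kalman_loglik(ys, a, q, r, x0=0.0, p0=1.0):
    """Exact log-likelihood of ys under x_t = a*x_{t-1} + w_t, y_t = x_t + v_t,
    with w_t ~ N(0, q), v_t ~ N(0, r), and prior x_0 ~ N(x0, p0)."""
    x, p, ll = x0, p0, 0.0
    for y in ys:
        x, p = a * x, a * a * p + q  # predict
        e, s = y - x, p + r          # innovation and its variance
        ll += -0.5 * (math.log(2.0 * math.pi * s) + e * e / s)
        gain = p / s
        x, p = x + gain * e, (1.0 - gain) * p  # update
    return ll

def simulate(n, a, q, r, rng):
    """Draw n observations from the same model, starting from x_0 ~ N(0, 1)."""
    x, ys = rng.gauss(0.0, 1.0), []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, math.sqrt(q))
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
    return ys

def lr_classify(ys, model_0, model_1):
    """Likelihood-ratio decision at threshold 0: 1 if model_1 fits ys better, else 0."""
    return int(kalman_loglik(ys, *model_1) > kalman_loglik(ys, *model_0))

# Illustrative (a, q, r) models that differ only in measurement noise r.
m0, m1 = (0.9, 0.5, 0.1), (0.9, 0.5, 2.0)
rng = random.Random(42)
print(lr_classify(simulate(500, *m0, rng), m0, m1), lr_classify(simulate(500, *m1, rng), m0, m1))
```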
04:00
20d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·20
Dimensional Balance Improves Large Scale Spatiotemporal Prediction Performance
Jing Chen and five coauthors propose ST-Balance, a framework that uses a low-rank spatial embedding and an extended temporal horizon to address a spatial-temporal complexity mismatch; experiments cover urban traffic, meteorological, and epidemic datasets, but the abstract does not disclose exact accuracy gains.
#Benchmarking#Jing Chen#Shixiang Pan#Yujie Fan
why featured
HKR-K passes because the paper states the ST-Balance mechanism. HKR-H and HKR-R fail: no concrete gains are disclosed, and niche spatiotemporal prediction lacks an industry-practitioner hook.
editor take
ST-Balance compresses space and extends time horizons; 6 authors test 3 domains, but no gain numbers are disclosed.
HKR breakdown
hook knowledge resonance
open source
43
SCORE
H0·K1·R0
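The summary does not say how ST-Balance parameterizes its low-rank spatial embedding. One common construction, assumed here for illustration only, factors an N × d node-embedding table into an N × r matrix and an r × d matrix. That shrinks the parameter count from N·d to r(N + d). The function names are hypothetical.

```python
import random

def low_rank_embedding(n_nodes, dim, rank, seed=0):
    """Return an n_nodes x dim table built as (n_nodes x rank) @ (rank x dim), plus its parameter count."""
    rng = random.Random(seed)
    left = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(n_nodes)]
    right = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rank)]
    table = [[sum(row[r] * right[r][j] for r in range(rank)) for j in range(dim)] for row in left]
    return table, n_nodes * rank + rank * dim

def full_param_count(n_nodes, dim):
    """Parameters in an unconstrained n_nodes x dim embedding table."""
    return n_nodes * dim

table, n_params = low_rank_embedding(n_nodes=50, dim=16, rank=4)
print(n_params, full_param_count(50, 16))  # 264 800
```

At a hypothetical 10,000 nodes with d = 64 and r = 8, the same formulas give 80,512 parameters instead of 640,000.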
04:00
20d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·20
Neural Network Models for Contextual Regression
The paper proposes SCtxtNN for contextual regression, separating context identification from context-specific regression, and reports numerical experiments where it achieves lower excess MSE and more stable performance than feed-forward networks with comparable parameter counts.
#Interpretability#Benchmarking#Research release
why featured
HKR-K passes on a concrete model mechanism and excess-MSE comparison. HKR-H/R are weak: this is a narrow arXiv methods paper with no production replacement claim, artifact, or major-lab tie.
editor take
SCtxtNN splits context ID from regression; experiments cite excess MSE, but datasets aren’t disclosed, so I’d treat it as inductive-bias work.
HKR breakdown
hook knowledge resonance
open source
42
SCORE
H0·K1·R0
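The sketch below illustrates the split between context identification and context-specific regression using deliberately simple stand-ins. It routes inputs by nearest centroid instead of using a learned identification network, and it uses one linear head per context instead of per-context neural regressors. The centroids, heads, and function names are hypothetical.

```python
import math

def identify_context(x, centroids):
    """Stand-in context identifier: index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))

def contextual_predict(x, centroids, heads):
    """Route x to one context, then apply that context's own linear regressor."""
    weights, bias = heads[identify_context(x, centroids)]
    return sum(w * v for w, v in zip(weights, x)) + bias

# Two hypothetical contexts, each with its own regression head (weights, bias).
centroids = [[0.0, 0.0], [10.0, 10.0]]
heads = [([1.0, 0.0], 0.0), ([0.0, 2.0], 1.0)]
print(contextual_predict([1.0, 2.0], centroids, heads))
print(contextual_predict([9.0, 11.0], centroids, heads))
```

Hard routing keeps the two roles cleanly separate but is not differentiable; a trainable version would typically use soft context probabilities instead.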
03:43
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 03:43 · 05·20
CoMET enables modular multimodal classification without fine-tuning
CoMET feeds PCA-compressed embeddings from frozen modality encoders into a Tabular Foundation Model, and the paper reports classification without fine-tuning on hierarchical datasets exceeding 500,000 samples and 2,000 classes.
#Multimodal#Fine-tuning#Benchmarking#CoMET
why featured
HKR-H/K/R all pass with a concrete no-fine-tuning mechanism and scale. It sits at the upper end of the 60–71 band because this is a single paper with no disclosed code, replication, or product adoption.
editor take
CoMET uses frozen encoders, PCA, and a TFM on 500k samples and 2,000 classes; I don't buy “no training” without TFM head details.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
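CoMET compresses each frozen encoder's embeddings with PCA before passing them to the Tabular Foundation Model. The sketch below covers only that feature-construction step, in pure Python with power-iteration PCA. The toy embeddings, k, and function names are assumptions, and the TFM itself is out of scope.

```python
import math

def pca_fit(rows, k, iters=100):
    """Top-k principal components via power iteration with deflation (SVD is the usual choice)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, mean)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    comps = []
    for _ in range(k):
        # Deterministic, non-uniform start; it can still fail if orthogonal to a component.
        v = [1.0 / (i + 1) for i in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return mean, comps  # no variance left along this start direction
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]  # deflate
        comps.append(v)
    return mean, comps

def pca_transform(row, mean, comps):
    """Project one embedding onto the fitted components."""
    centered = [v - m for v, m in zip(row, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in comps]

def tabular_row(sample, fitted):
    """Concatenate per-modality PCA codes into one feature row for a tabular model."""
    features = []
    for modality, (mean, comps) in fitted.items():
        features.extend(pca_transform(sample[modality], mean, comps))
    return features

# Toy "frozen encoder" outputs for 5 samples and two modalities (illustrative only).
image_emb = [[t, 2.0 * t, 2.0 * t] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
text_emb = [[t, -t] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
fitted = {"image": pca_fit(image_emb, k=1), "text": pca_fit(text_emb, k=1)}
row = tabular_row({"image": image_emb[3], "text": text_emb[3]}, fitted)
print(row)
```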
00:32
20d ago
HuggingFace Papers (takara mirror) · rss · EN · 00:32 · 05·20
The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models
The study tests three 1-3B instruction-tuned models on GSM8K and finds the last CoT number before the answer delimiter explains 54-92 percentage points of accuracy, with final answers matching that trailing number in 95-96% of incorrect items.
#Reasoning#Interpretability#Benchmarking#Qwen
why featured
HKR-H/K/R all pass, but the scope is 3 small 1-3B models on GSM8K, making it a useful reasoning paper rather than featured news. No hard-exclusion rule applies; score stays at the top of 60-71.
editor take
Three 1-3B models on GSM8K get 54-92 accuracy points from the last CoT number; small-model arithmetic CoT is answer transport.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
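The readout check is easy to reproduce on model outputs. This sketch extracts the last number before an answer delimiter and tests whether the final answer copies it. The "####" delimiter (GSM8K's reference-answer format) and the number regex are assumptions, since the summary does not say how the paper parses outputs.

```python
import re

# Integers or decimals, optionally with thousands separators (an assumed format).
NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")

def last_cot_number(output, delimiter="####"):
    """Last number in the reasoning text before the answer delimiter, or None."""
    found = NUMBER.findall(output.split(delimiter, 1)[0])
    return found[-1].replace(",", "") if found else None

def final_answer(output, delimiter="####"):
    """First number after the answer delimiter, or None."""
    if delimiter not in output:
        return None
    found = NUMBER.findall(output.split(delimiter, 1)[1])
    return found[0].replace(",", "") if found else None

def copy_rate(outputs, delimiter="####"):
    """Fraction of outputs whose final answer equals the trailing CoT number."""
    hits = 0
    for out in outputs:
        answer = final_answer(out, delimiter)
        hits += int(answer is not None and answer == last_cot_number(out, delimiter))
    return hits / len(outputs)

outputs = [
    "3 + 4 = 7, then 7 * 2 = 14 #### 14",
    "Total is 1,200 dollars. Half is 600 #### 500",
    "No arithmetic shown #### 9",
]
print(copy_rate(outputs))
```

Running copy_rate only on incorrectly answered items would mirror the 95-96% statistic in the summary.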
