ax@ax-radar:~/papers $ grep -E 'arxiv|paper' sources/tags
45 src · signal 72% · cycle 04:32

papers · 2026-05-25

200 papers · updated 3m ago
2026-05-25 · Mon
22:04
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 22:04 · 05·25
Research paper proposes Energy-Gated Attention and Wavelet Positional Encoding
The paper proposes Energy-Gated Attention and Morlet Positional Encoding for Transformer attention; combined, they lower TinyShakespeare validation loss by 0.119, though all experiments stay at small scale, with no more than 6M parameters and a single seed.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via two mechanisms and a TinyShakespeare number. HKR-H/R are weak: ≤6M params and one seed make this far from product impact or mainstream training decisions.
editor take
EGA+MoPE cuts TinyShakespeare val loss by 0.119; at ≤6M params and one seed, don't ship it into LLM attention yet.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
19:30
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 19:30 · 05·25
Evi-Steer: Learning to Steer Biomedical Vision-Language Models through Efficient and Generalizable Evidential Tuning
Evi-Steer fine-tunes BiomedCLIP by updating only 0.11% of parameters, adding evidential uncertainty estimates and Dempster-Shafer cross-modal confidence fusion, and evaluates few-shot learning and domain generalization on 15 biomedical imaging datasets covering 8 organs and 8 modalities.
#Multimodal#Vision#Fine-tuning#BiomedCLIP
why featured
HKR-K passes via the 0.11% parameter update and 15-dataset evaluation. HKR-H/R are weak, and biomedical VLM tuning is narrow for general AI practitioners, so this sits in the all band.
editor take
Evi-Steer tunes 0.11% of BiomedCLIP; 15 datasets are solid, but the clinical-deployment claim needs a haircut.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
18:57
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 18:57 · 05·25
Frequency-Guided Fusion for RGB-Thermal Semantic Segmentation
The paper proposes a dual-ConvNeXt V2 RGB-thermal segmentation architecture; its lightest variant reaches 61.73% mIoU on MFNet and 86.24% on PST900 with 35.43M parameters, using frequency-based early fusion, cross-modal late fusion, and a PANet-style bidirectional decoder.
#Multimodal#Vision#Research release#Open source
why featured
HKR-K passes via architecture, parameter count, and mIoU numbers; HKR-H and HKR-R fail because the angle is a niche vision-paper benchmark. No hard exclusion, but audience fit keeps it in the 40–59 band.
editor take
Lightest model hits 61.73 MFNet and 86.24 PST900 mIoU; I want memory and FPS, since 35.43M params isn't edge-friendly.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
18:12
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 18:12 · 05·25
LongAV-Compass: Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV
LongAV-Compass introduces a minute-scale audio-visual generation benchmark with 284 curated test cases across T2AV, I2AV, and V2AV, evaluating 11 representative models on more than 20 dimensions including narrative coherence, semantic alignment, and audio-visual synchronization.
#Multimodal#Audio#Benchmarking#LongAV-Compass
why featured
HKR-K is solid with 284 cases, 20+ dimensions, and 11 models; HKR-R fits AV-generation evaluation pain. HKR-H is weak and source impact is unclear, so this stays in the 60–71 band.
editor take
LongAV-Compass tests 11 models on 284 cases; minute-scale AV finally gets a ruler, but MLLM scoring needs auditing.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
17:59
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 17:59 · 05·25
AgentSociety: Incentivizing Agentic Social Intelligence
The paper proposes AgentSociety, a mechanism for multi-agent collaboration using liquid democracy and information diffusion, proves incentive-compatible delegation, and characterizes Nash equilibrium; the RSS snippet does not disclose dataset counts, model names, or benchmark scores.
#Agent#Reasoning#Benchmarking#AgentSociety
why featured
HKR-H and HKR-K pass: the mechanism is novel and makes testable theoretical claims. No dataset count or benchmark scores are disclosed, and the paper stays theoretical, so it fits the 60–71 band.
editor take
AgentSociety proves incentive-compatible delegation and characterizes Nash equilibria, but the snippet gives no model names or scores; elegant mechanism, weak evidence so far.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
17:56
14d ago
arXiv · cs.AI · atom · EN · 17:56 · 05·25
Beyond Summaries: Structure-Aware Labeling of Code Changes with Large Language Models
The paper presents a two-stage LLM pipeline for taxonomy-based code change labeling, evaluates four models on a manually curated benchmark of natural and synthetic patches, and reports up to 84% recall and 81% precision in its best configuration.
#Code#Tools#Benchmarking#Research release
why featured
HKR-K/R pass: the paper gives a concrete pipeline, 4-model evaluation, and 84%/81% metrics for code review. HKR-H is weak, and this is a single arXiv methods paper, not a product or market event.
editor take
Two-stage labeling hits 84% recall and 81% precision across 4 models; I buy structured review, not replacing static analysis.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
17:53
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 17:53 · 05·25
Pixel-Level Pavement Distress Assessment Using Instance Segmentation
The paper evaluates Mask R-CNN on UWGB-StreetCrack roadway images, and the ResNet-101 FPN variant reaches 84.23% precision, 90.04% recall, and 87.04% F1 under its project-specific bounding-box matching protocol.
#Vision#Benchmarking#Mask R-CNN#Detectron2
why featured
This is a narrow applied-vision paper: HKR-K passes on concrete metrics, while HKR-H and HKR-R fail. No product, platform, or general-model impact, so it stays in the low-value band.
editor take
Mask R-CNN hits 87.04% F1 on UWGB-StreetCrack; the catch is box matching, while mask-level evaluation is still missing.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
17:52
14d ago
arXiv · cs.AI · atom · EN · 17:52 · 05·25
Channel-wise Vector Quantization
The paper presents CVQ, which quantizes feature-map channels instead of patch feature vectors. Its CAR model uses next-channel prediction, reaches 100% codebook utilization with a 16K+ codebook, and reports DPG 86.7 and GenEval 0.79 for text-to-image generation.
#Vision#Multimodal#Benchmarking#Research release
why featured
HKR-K passes via a concrete mechanism and 16K+ codebook utilization. HKR-H and HKR-R are weak, and the paper targets specialist vision-tokenization readers without product or open-source impact, so it stays at 58.
editor take
CVQ reports 100% utilization on a 16K+ codebook; I buy the tokenization bet, not the “human artist” framing.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
17:37
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 17:37 · 05·25
WhoSaidIt Multilingual Speaker-Attribute Classification Dataset Released
The authors propose a human-LLM collaborative re-annotation framework and build WhoSaidIt, a multilingual dataset covering 9 speaker-attribute labels, then benchmark recent LLMs and analyze how explicit rationales affect model behavior.
#Alignment#Benchmarking#WhoSaidIt#Research release
why featured
HKR-K passes on a new multilingual dataset, 9 attribute labels, and LLM benchmarks. HKR-H and HKR-R are weak because the title is academic and the post gives no metrics or production stakes.
editor take
WhoSaidIt covers 9 speaker attributes; languages and sample size are undisclosed, so don’t treat it as a solid benchmark yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
17:08
14d ago
arXiv · cs.CL · atom · EN · 17:08 · 05·25
Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals
The paper evaluates six confidence-estimation methods for activation oracles with 6,000 samples per oracle, and bootstrap mode frequency is best calibrated among tested methods, with 5.7% ECE on Qwen3-8B versus 25.5% for answer-word log probability.
#Interpretability#Benchmarking#Qwen#Research release
why featured
HKR-K passes because the paper gives testable calibration numbers. HKR-H is weak and HKR-R is narrow to interpretability readers, with no hard exclusion; this fits a useful but non-featured research item.
editor take
Six confidence methods, 6,000 samples per oracle; bootstrap mode hits 5.7% ECE, making log-prob’s 25.5% look sloppy.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
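The ECE figures in the activation-oracle entry above (5.7% versus 25.5%) are easier to read with the metric spelled out. This is a minimal sketch of the common equal-width-bin estimator; the bin count and the function name are assumptions, and the paper may bin differently.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: the weighted mean of |accuracy - mean confidence| per bin.

    `confidences` are floats in [0, 1]; `correct` are booleans of the same length.
    n_bins=10 is an illustrative default, not a value taken from the paper.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A method that says "90% sure" and is right 75% of the time contributes a 0.15 gap for that bin; an ECE of 5.7% means the weighted gaps average under six points.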
17:05
14d ago
arXiv · cs.CL · atom · EN · 17:05 · 05·25
Peak-Then-Collapse and the Four Interface Channels of Knowledge-Graph Tool Use
The study trains Qwen2.5-7B-Instruct with GRPO on a four-verb Freebase API, raising tool-grounded answer rate from 3.8% to 9.6% over 250 steps before it falls to 0% within 50 steps across four seeds. One-iteration self-distillation reaches 40.0% EM at 7B, while 14B improves by only 0.25 percentage points.
#Agent#RAG#Reasoning#Qwen
why featured
HKR-H/K/R all pass, but this is a single arXiv paper on KG tool use, not a major model or product release. The collapse and distillation numbers are useful, yet the reach stays below featured.
editor take
GRPO lifts Qwen2.5-7B to 9.6% in 250 steps, then zeroes it; sparse KG APIs expose RLVR’s feedback debt.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
16:15
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 16:15 · 05·25
Causal Methods for LLM Development and Evaluation
The paper makes three contributions and argues that causal methods should be used across pretraining, alignment, routing, agentic workflows, and evaluation to handle confounding, distribution shifts, biased learned judges, and non-stationary deployment environments.
#Reasoning#Alignment#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper applies causal methods to pretraining, alignment, routing, agents, and evals with concrete failure modes. HKR-H fails; no artifact, benchmark delta, or major-lab release, so it stays in all.
editor take
The paper claims 3 contributions across pretraining to eval; causal framing is right, but no experiments or identification conditions are disclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
15:29
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:29 · 05·25
QUIET: Multi-Blank Cascaded Story Cloze Benchmark for LLM Creative Generation
QUIET proposes a multi-blank cascaded Story Cloze benchmark for LLM creative generation, placing 10-20 constrained blanks in each story and scoring answers automatically with score=satisfy*(1+lambda*surprise), where lambda is 1.0.
#Benchmarking#Reasoning#QUIET#Zou & Xu
why featured
HKR-K passes: QUIET has a concrete multi-blank setup and scoring formula. HKR-H/R are weak, and this is a regular research benchmark rather than a major model or product release.
editor take
QUIET uses 10–20 cascaded blanks per story; I don’t buy “objective creativity scoring” without disclosed surprise judging details.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
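The QUIET scoring rule quoted above, score = satisfy * (1 + lambda * surprise) with lambda = 1.0, can be sketched as follows. How `satisfy` and `surprise` are measured is not disclosed in the snippet, so the value ranges and the per-story mean below are assumptions for illustration only.

```python
def quiet_score(satisfy, surprise, lam=1.0):
    """Score one blank as satisfy * (1 + lam * surprise).

    Assumed here: satisfy is 0 or 1 (constraint met or not) and
    surprise is a non-negative float. Neither range is confirmed by the source.
    """
    return satisfy * (1.0 + lam * surprise)


def story_score(blanks, lam=1.0):
    """Average the per-blank scores of one story.

    `blanks` is a list of (satisfy, surprise) pairs. Averaging is a
    hypothetical aggregation, not taken from the paper.
    """
    return sum(quiet_score(s, u, lam) for s, u in blanks) / len(blanks)
```

The multiplicative form means a surprising answer that violates its constraint scores zero, so surprise can only amplify answers that already satisfy the blank.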
15:20
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:20 · 05·25
LRDDv3: High-Resolution Long-Range Drone Detection Dataset with Range Information and Thermal Data
LRDDv3 provides 102,532 long-range RGB drone images sampled at 5 FPS from 128 video clips across 17 collection days over 8 months, with range annotations and 29,630 paired 640x512 IR images.
#Vision#Benchmarking#Drexel University#Research release
why featured
HKR-K passes: the post gives concrete dataset scale and modality details. HKR-H/R are weak because this is a narrow vision benchmark, not a platform product, model release, or broad practitioner debate.
editor take
LRDDv3 ships 102,532 long-range RGB frames; honestly, drone detection needs this range-labeled messy data more than cleaner demos.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
14:22
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:22 · 05·25
D²-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing
D²-Monitor uses hesitation steps near a probe decision boundary to route D-LLM safety checks, evaluating the method on 3 datasets and 4 diffusion LLMs with a parameter footprint of no more than 0.85M and comparisons against 8 baselines.
#Safety#Inference-opt#Benchmarking#OpenAI
why featured
HKR-H/K pass: hesitation-aware routing is a concrete mechanism, and the evaluation setup has numbers. The D-LLM safety angle is research-heavy; deployment impact, cost delta, and mainstream model relevance are not disclosed, so this stays all.
editor take
D²-Monitor routes heavy probes by hesitation steps across 3 datasets and 4 D-LLMs; clean idea, but D-LLM safety ops still feels unproven.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
14:19
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:19 · 05·25
SP-MoMamba: Superpixel-driven Mixture of State Space Experts for Efficient Image Super-Resolution
SP-MoMamba replaces fixed-grid Mamba scanning with superpixel-level tokens for image super-resolution, then uses MSS-MoE dynamic routing to assign scale-specific state-space experts and LSME for local high-frequency detail; the snippet says standard benchmarks show better fidelity and efficiency trade-offs, but it does not disclose PSNR, runtime, parameter count, datasets, or code availability.
#Vision#Inference-opt#Research release#Benchmark
why featured
HKR-K passes via the superpixel scan and MSS-MoE routing mechanism. PSNR, speed, and parameter count are not disclosed, and this is a narrow super-resolution paper.
editor take
SP-MoMamba swaps fixed scans for superpixels; no PSNR, latency, or params disclosed, so I’d file it as a clever architecture paper.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
14:04
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:04 · 05·25
DyCoRM: Dynamic Criterion-Aware Reward Modeling for Text-to-Image Generation
DyCoRM introduces a dynamic criterion-aware reward model for text-to-image generation, builds DyCoDataset-20K with criterion-level annotations, and derives DyCoBench-1K to evaluate reward models under task-relevant dynamic criteria.
#Vision#Alignment#Benchmarking#DyCoRM
why featured
HKR-K and HKR-R pass via named datasets and an alignment/eval bottleneck. The abstract lacks performance gains, release status, or reproducible setup, so it stays in the 60–71 research-release band.
editor take
DyCoRM adds criterion-level labels for T2I reward models; DyCoDataset-20K and DyCoBench-1K matter more than the “first framework” claim.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
13:56
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 13:56 · 05·25
Study of timing dependencies of trust in human-AI teams: speed, accuracy, and neuro-decoupling
Seventeen operators tested Fast/Less-Accurate and Slow/Accurate AI teammates in a VR drone search task: fast AI drove human accuracy under deception down to 50.2%, while slow AI caused hesitation but let N=8 behavioral teams recover to 100.0%.
#Agent#Robotics#Benchmarking#Research release
why featured
HKR-H/K/R all pass, but the study has 17 participants and a VR drone setup, so product impact is not established. This fits the 60–71 research-interest band.
editor take
17 operators tested AI timing in VR drones; fast-wrong AI cut deception accuracy to 50.2%. The bigger hazard is blind compliance, not the AI's error rate.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
13:50
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 13:50 · 05·25
SAM3-Assisted Training of Lightweight YOLO Models for Precision Pig Farming
The paper uses SAM 3 as an offline zero-shot pseudo-labeler to train YOLOv8 detectors, and on PigLife a SAM 3-supervised YOLOv8m reaches 79.4% mAP without human labels while cutting inference latency by about 200× versus the teacher model.
#Vision#Fine-tuning#Inference-opt#SAM 3
why featured
HKR-K is solid with 79.4% mAP and 200x latency reduction; HKR-H/R pass mainly on the odd vertical and labeling-cost angle. The pig-farming niche keeps it below featured.
editor take
SAM 3 pseudo-labels train YOLOv8m to 79.4% mAP; farm-edge vision still lives or dies on low occlusion.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
13:38
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 13:38 · 05·25
On the Limits of Model Merging for Multilinguality in Pre-Training
The paper compares mixed, merged, and monolingual pre-training setups, finding that merging monolingual models causes performance collapse from interference, while representational similarity is required for model merging to work.
#Fine-tuning#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the paper makes a testable negative claim against direct monolingual-to-multilingual merging. It stays in pre-training research, with no production replacement, major model result, or tool release, so it lands below featured.
editor take
The paper tests mixed, merged, and monolingual pre-training; monolingual model merging collapses, so fine-tune merging lore fails here.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
12:41
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 12:41 · 05·25
When Can We Trust Early Warnings? Leakage-Excluded Early Outcome Prediction from LMS Interaction Logs
The paper introduces LEAP, a cutoff-first protocol for LMS early outcome prediction, and evaluates it on OULAD across weekly cutoffs; performance rises as the observation window expands, with a clear gain around week 3, using ROC-AUC, PR-AUC, Brier score, and F1@0.5.
#Benchmarking#Open University Learning Analytics Dataset#Research release#Benchmark
why featured
HKR-K passes: LEAP, weekly OULAD truncation, and ROC-AUC/PR-AUC/Brier/F1@0.5 give reproducible detail. The LMS education-data angle lacks product or industry impact, so it stays low-value signal.
editor take
LEAP cuts OULAD logs weekly; week 3 jumps. For early-warning papers, audit assessment leakage before trusting AUC.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
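The "cutoff-first" idea in the LEAP entry above amounts to truncating the interaction log before any feature is computed, so nothing recorded after the cutoff week can leak into the prediction. A minimal sketch, assuming hypothetical event fields `week` and `clicks` and an inclusive cutoff; the paper's actual feature set and OULAD column names are not given in the snippet.

```python
def features_at_cutoff(events, cutoff_week):
    """Build features from events observed up to and including cutoff_week.

    `events` is a list of dicts with hypothetical keys "week" and "clicks".
    Filtering happens first, so later activity cannot leak into the features.
    """
    visible = [e for e in events if e["week"] <= cutoff_week]
    return {
        "n_events": len(visible),
        "total_clicks": sum(e["clicks"] for e in visible),
    }
```

Running the same function at each weekly cutoff gives one feature table per observation window, which is the setup under which the entry reports performance rising as the window expands.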
12:00
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 12:00 · 05·25
DeGRe: Dense-supervised Generative Reranking for Recommendation
DeGRe uses a Lookahead Evaluator to mine high-value sequences offline, distills step-wise value estimates into a lightweight Online Generator, and requires one greedy decoding pass during online inference. The paper says DeGRe outperforms baselines on public and industrial datasets and is deployed on Taobao Flash Shopping, but the snippet does not disclose exact gains.
#Reasoning#Inference-opt#Taobao Flash Shopping#Research release
why featured
DeGRe clears HKR-K/R via a concrete reranking mechanism and Taobao Flash Shopping deployment. No uplift numbers are disclosed, and the recsys scope keeps it in the upper 60–71 band.
editor take
DeGRe runs one greedy pass online; Taobao deployment is claimed, but no lift numbers are disclosed, so treat it as offline-search distillation.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
11:09
14d ago
HuggingFace Papers (takara mirror) · rss · EN · 11:09 · 05·25
CMAP: Cross-Modal Adaptive Prompting for Multi-Domain Task-Incremental Learning
CMAP uses frozen CLIP text prototypes for task routing, multi-prototype visual-textual confidence, and symmetric cross-modal gating; on the MTIL benchmark with 11 datasets and 1,201 classes, it reaches 74.2% Transfer, 80.5% Average, and 88.7% Last with 2.5M trainable parameters and no external data.
#Multimodal#Vision#Fine-tuning#CLIP
why featured
HKR-K passes on concrete benchmark scale, parameter count, and metrics. HKR-H/R are weak: this is a narrow multimodal continual-learning paper without open-source detail, replication conditions, or a production-impact claim.
editor take
CMAP hits 80.5% Average on MTIL with 2.5M parameters; using CLIP text space for routing exposes a PEFT blind spot.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
08:26
15d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:26 · 05·25
AnE: Pushing the Reasoning Frontier of Multimodal LLMs via Anchor Evolution
AnE trains multimodal LLMs with Truth Anchor Expansion and a Scaffold-Stripping Mechanism, improving the base model by 10.3% across eight multimodal reasoning benchmarks while the post says the code will be made public.
#Reasoning#Multimodal#Fine-tuning#Research release
why featured
HKR-H and HKR-K pass: the method names, training mechanisms, and +10.3% on 8 benchmarks add signal. HKR-R is weak, with no major-lab tie or reproducible artifact disclosed, so this stays below featured.
editor take
AnE gains 10.3% on eight multimodal reasoning benchmarks. Anchor retrieval beats synthetic self-talk, but the base model is undisclosed and the code is promised, not yet released.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
06:59
15d ago
HuggingFace Papers (takara mirror) · rss · EN · 06:59 · 05·25
Full-4D: Generating Full-Scope 4D Scenes from a Single-View Video
Full-4D converts a single-view video into a full-scope 4D scene through multi-view video synthesis followed by 4DGS reconstruction, using the Real-MV-4D synchronized multi-view dataset, fused time-view attention with reprojection priors, and a Flow Matching Distillation loss for novel-view rendering.
#Vision#Multimodal#Full-4D#Real-MV-4D
why featured
HKR-H/K pass: the single-view-to-4D hook is clear and the post names a dataset plus methods. HKR-R is weak, with no metrics, release status, or major-lab angle, so it stays in all.
editor take
Full-4D claims single-view video to full-scope 4D; dataset scale is undisclosed, so I trust Real-MV-4D before “full-scope.”
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
15d ago
● P1 · arXiv · cs.LG · atom · EN · 04:00 · 05·25
Research Shows Weak Teachers Can Effectively Distill Larger Student Models in LLM Pretraining
The arXiv paper tests strong-to-weak, same-level, and weak-to-strong teacher-student setups by varying architecture size and token budgets, and finds that small or undertrained teachers can improve larger students when language modeling and distillation losses are mixed properly.
#Fine-tuning#Benchmarking#arXiv#Research release
why featured
HKR-H/K/R all pass: the title has a counterintuitive hook, and the summary gives a test setup plus mixed LM/distillation loss. Single arXiv paper without cross-source uptake or deployment proof, so it stays in the quality-research band.
editor take
Only an arXiv dual-listing title is disclosed, no experiments. If weak-teacher pretraining distillation holds, big-teacher API lock-in takes a hit.
sharp
Both sources are the same arXiv title cross-listed in cs.CL and cs.LG, so the coverage is aligned but single-chain. The disclosed text gives no model sizes, data budget, loss setup, or benchmarks, only the claim that weak teachers can work in LLM pretraining. The sharp part is the target: it attacks the default engineering belief that distillation needs a stronger teacher. If weak-teacher signals help during pretraining, the gain is not cheap labels; it is denser distributional guidance for the student. Open-weight teams like DeepSeek and Qwen already showed that data recipe can beat brand-name model strength. If this only holds on small students or narrow corpora, the claim shrinks fast. Until the tables are visible, I read it as a serious challenge to distillation economics, not a settled result.
HKR breakdown
hook knowledge resonance
open source
90
SCORE
H1·K1·R1
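The weak-teacher entry above turns on mixing a language-modeling loss with a distillation loss. The source does not disclose the paper's loss setup, so the sketch below is only the textbook form for a single token position: a weight `alpha` blends cross-entropy on the true token with KL(teacher || student). The weight, the KL direction, and all function names are assumptions.

```python
import math


def cross_entropy(student_probs, target_idx):
    """Language-modeling loss for one token: -log p_student(target)."""
    return -math.log(student_probs[target_idx])


def kl_divergence(p, q):
    """KL(p || q) over one token distribution; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mixed_loss(student_probs, teacher_probs, target_idx, alpha=0.5):
    """(1 - alpha) * LM loss + alpha * distillation loss.

    alpha=0.5 is illustrative; the paper's mixing rule is not disclosed.
    """
    lm = cross_entropy(student_probs, target_idx)
    kd = kl_divergence(teacher_probs, student_probs)
    return (1.0 - alpha) * lm + alpha * kd
```

With `alpha = 0` the student ignores the teacher entirely, which is the safety valve a weak or undertrained teacher would need: the ground-truth term keeps the student from inheriting the teacher's errors wholesale.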
04:00
15d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·25
Evaluating Memory Structure in LLM Agents
The paper proposes StructMemEval to test whether LLM agents organize long-term memory, not just recall facts. It uses tasks such as transaction ledgers, to-do lists, and trees. Initial experiments find simple retrieval-augmented LLMs struggle, while memory agents solve them reliably when prompted with the target memory structure.
#Agent#RAG#Memory#StructMemEval
why featured
HKR-H/K/R pass: StructMemEval reframes agent memory as structured state maintenance, with ledger/todo/tree tasks. No authors, model list, or scores are disclosed, so it stays in the 60–71 band.
editor take
StructMemEval tests structured memory, scores undisclosed; simple RAG failing ledgers and trees is the right wound to press.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
04:00
15d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·25
Tensor Cache: Eviction-conditioned Associative Memory for Transformers
Tensor Cache uses sliding-window attention as L1 and writes evicted KV pairs into a fixed-size L2 outer-product memory; the paper says it improves the memory-quality frontier over bounded-state baselines across four evaluation settings, including long-context language modeling.
#Memory#Inference-opt#Reasoning#Kabir Swain
why featured
HKR-H/K/R land: the paper gives a concrete L1/L2 memory design and claims wins across four long-context-related evaluations. Single arXiv paper, no code, cost numbers, or external replication, so it stays below the featured threshold.
editor take
Tensor Cache catches evicted KV in fixed L2 outer-product memory; the sharp bit is exposing C²-C fake cross-token terms in chunked-mean training.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
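The L1/L2 layout described in the Tensor Cache entry above can be pictured with a toy two-level cache: a sliding window holds recent key-value pairs exactly, and each evicted pair is folded into a fixed-size matrix as an outer product. This is a generic linear associative memory under assumed names (`TwoLevelKVCache`, `read_l2`); the paper's actual write rule, any normalization, and how L1 and L2 outputs are combined are not described in the snippet.

```python
from collections import deque


class TwoLevelKVCache:
    """Toy sketch: exact sliding-window L1 plus an outer-product L2 memory."""

    def __init__(self, window, d_k, d_v):
        self.window = window
        self.l1 = deque()  # recent (key, value) pairs, kept exactly
        self.l2 = [[0.0] * d_k for _ in range(d_v)]  # fixed-size d_v x d_k matrix

    def append(self, key, value):
        """Add a pair to L1; if the window overflows, write the oldest pair to L2."""
        self.l1.append((key, value))
        if len(self.l1) > self.window:
            old_key, old_value = self.l1.popleft()
            # Accumulate the outer product value * key^T into the L2 matrix.
            for i, v_i in enumerate(old_value):
                row = self.l2[i]
                for j, k_j in enumerate(old_key):
                    row[j] += v_i * k_j

    def read_l2(self, query):
        """Retrieve from L2 as the matrix-vector product M @ query."""
        return [sum(m * q for m, q in zip(row, query)) for row in self.l2]
```

L2 memory stays at `d_v * d_k` floats no matter how many pairs are evicted, which is the "bounded-state" property; the cost is that non-orthogonal keys interfere with one another on readout.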
04:00
15d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·25
Goal-Conditioned Agents that Learn Everything All at Once
The paper introduces LEO, which outputs values and actions for every goal in one network pass; it outperforms comparison methods on goal-conditioned Craftax and runs over 250 times faster than all-goals relabelling.
#Agent#Reasoning#Inference-opt#arXiv
why featured
HKR-H and HKR-K pass: the title has an “all at once” hook, and the summary gives LEO’s mechanism plus a 250x efficiency claim. Impact stays academic-RL-heavy, so it falls below featured.
editor take
LEO emits all-goal values and actions in one pass, >250x faster; strong on Craftax, merely competitive on control.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·25
CapTrack: Multifaceted Evaluation of Forgetting in LLM Post-Training
CapTrack evaluates forgetting in LLM post-training across algorithms, domains, and model families up to 80B parameters, finding that drift extends beyond factual knowledge into robustness and default behaviors.
#Fine-tuning#Benchmarking#Alignment#CapTrack
why featured
HKR-K/R pass: 80B coverage plus robustness and default-behavior drift give post-training teams concrete checks. HKR-H is weak, and this is a single arXiv benchmark without disclosed tooling or discussion, so it stays at all.
editor take
CapTrack tests forgetting up to 80B; robustness and default-behavior drift belong in evals, not another factual-QA leaderboard.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·25
Moonwalk: Inverse-Forward Differentiation
Moonwalk uses vector-inverse-Jacobian products and fragmental gradient checkpointing to reconstruct parameter gradients without storing activations, matching backpropagation runtime while training networks more than twice as deep under the same memory budget.
#Fine-tuning#Inference-opt#Moonwalk#Research release
why featured
HKR-K and HKR-R pass: the paper gives a concrete autodiff mechanism and a >2x depth claim. HKR-H is weak, and this is a single arXiv item with no code, adoption, or reproduction scope disclosed.
editor take
Moonwalk trains over 2× deeper nets at fixed memory; the catch is submersive layers, so Transformer proof matters.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·25
HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval
HARNESS-LM distills a billion-parameter SLM teacher retriever, including Qwen3-Embedding-4B/8B-class models, into a sub-600M student encoder through three phases, recovering over 98% of teacher precision on Bing Ads benchmarks while cutting online query-encoder latency by up to 27x on NVIDIA A100 GPUs.
#Embedding#Fine-tuning#Inference-opt#Qwen
why featured
HKR-H/K/R all pass, but this is a single niche retrieval paper focused on ads and embedding compression. No open-source artifact or production rollout is disclosed, so it stays at the top of the 60–71 band.
editor take
HARNESS-LM’s 190M student drove +1% revenue in Bing Ads A/B; ad retrieval keeps proving distillation beats shipping 4B encoders.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
15d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·25
Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning
SymNoise raises AlpacaEval on LLaMA-2-7B fine-tuned with Alpaca from 29.79% under standard training to 69.04% with symmetric noisy embeddings, versus 64.69% for NEFTune; the paper also reports consistent gains over NEFTune on Evol-Instruct, ShareGPT, and OpenPlatypus, while arguing uniform and Gaussian noise show comparable performance.
#Embedding#Fine-tuning#Benchmarking#SymNoise
why featured
HKR-H/K/R all pass, but this is a single arXiv fine-tuning technique tested on LLaMA-2-7B+Alpaca and AlpacaEval, without cross-model production evidence; 70 keeps it in all.
editor take
SymNoise hits 69.04% AlpacaEval on LLaMA-2-7B+Alpaca. I'd verify eval setup first; gains that large on old 7B bases often inflate.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
15d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·25
Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
The paper models RLVR verifier errors as a stochastic reward channel with FP rate ρ0 and FN rate ρ1, then derives backward and forward corrections; the forward variant only needs the FN rate and is more stable under heavier synthetic and real verifier noise.
#Reasoning#Alignment#Inference-opt#arXiv
why featured
HKR-H/K/R pass, but this is still an arXiv methods paper: clear mechanism, no disclosed benchmark gain, code, or production validation. It fits all, below the featured threshold.
editor take
The paper splits RLVR verifier noise into FP ρ0 and FN ρ1; forward only needs FN, a cleaner GRPO patch.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
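For the noisy-verifier entry above, the "backward" correction can be illustrated with the standard debiasing for a binary asymmetric noise channel. If the verifier flips a true 0 to 1 with probability ρ0 (false positive) and a true 1 to 0 with probability ρ1 (false negative), then (observed − ρ0) / (1 − ρ0 − ρ1) is an unbiased estimate of the true reward. Whether the paper uses exactly this estimator is an assumption; the forward variant is not sketched here.

```python
def backward_corrected_reward(observed, fp_rate, fn_rate):
    """Unbiased reward estimate from a noisy binary verifier output.

    observed: 0 or 1 as reported by the verifier.
    fp_rate (rho0): P(observed = 1 | true = 0).
    fn_rate (rho1): P(observed = 0 | true = 1).
    """
    denom = 1.0 - fp_rate - fn_rate
    if denom <= 0:
        raise ValueError("fp_rate + fn_rate must be below 1 for the correction to exist")
    return (observed - fp_rate) / denom


def expected_corrected_reward(true_reward, fp_rate, fn_rate):
    """Expectation of the corrected reward given the true reward (sanity check)."""
    p_observe_one = (1.0 - fn_rate) if true_reward == 1 else fp_rate
    return (
        p_observe_one * backward_corrected_reward(1, fp_rate, fn_rate)
        + (1.0 - p_observe_one) * backward_corrected_reward(0, fp_rate, fn_rate)
    )
```

The estimator needs both rates and divides by 1 − ρ0 − ρ1, so its variance grows as the noise gets heavier. That is one plausible reading of why the entry reports the forward variant, which needs only the FN rate, as the more stable of the two.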
04:00
15d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·25
Reading Calibrated Uncertainty from Language Model Trajectories
Aliai Eusebi and five coauthors propose extracting 11 scale-invariant geometric features from per-layer MLP update trajectories, then feeding them to a sparse linear probe; under selective abstention, the probe outperforms maximum softmax probability, with gains scaling with baseline miscalibration up to 21 AURC points.
#Interpretability#Benchmarking#Alignment#Aliai Eusebi
why featured
HKR-H/K/R pass, but this is an arXiv research paper centered on trajectory geometry and sparse probes, with no production replacement claim or major-lab release; it fits the upper 60–71 band.
editor take
Eusebi’s 11 geometric MLP-trajectory features add up to 21 AURC points; I buy the signal, not yet open-generation proof.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
15d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·25
TingIS Enterprise Risk Event Discovery System Research Published
TingIS processes more than 2,000 messages per minute at peak and 300,000 messages per day in production, with 3.5-minute P90 alert latency and a 95% discovery rate for high-priority incidents.
#RAG#Tools#Benchmarking#TingIS
why featured
HKR-K and HKR-R pass via production-scale throughput, latency, and discovery metrics tied to incident detection. HKR-H is weak, and this is not a top-lab release or widely clustered product update, so it stays in the 60–71 band.
editor take
TingIS handles 300K daily messages with 3.5-min P90 alerts; I trust these LLM-plus-index-plus-rules dirty-work systems more.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
PACE: Two-Timescale Self-Evolution for Small Language Model Agents
PACE evaluates frozen 4B–14B small language models on four controlled benchmarks, ranks best across all 12 backbone-benchmark pairs, and improves over vanilla SLM agents by up to 9.2% relative without weight updates or frontier-model teachers.
#Agent#Tools#Benchmarking#PACE
why featured
HKR-H/K/R pass on a concrete SLM-agent efficiency claim, but this is a single arXiv method paper with no released artifact, production case, or top-lab signal; impact stays below featured.
editor take
PACE wins 12/12 settings, up to +9.2% relative; I buy the engineering angle: frozen SLMs still have juice with validation-gated evolution.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention
ThriftAttention computes 5% of query-key blocks in FP16 and the rest in FP4, then merges both paths with online softmax. Across long-context benchmarks and model families, it recovers 89.1% of the FP4-to-FP16 performance gap on average, and its reported advantage grows with sequence length.
#Inference-opt#Benchmarking#Research release#Open source
why featured
HKR-K and HKR-R are strong: mechanism, number, and open code are clear. HKR-H is weak, and the low-level inference-optimization scope keeps it in all rather than featured.
editor take
ThriftAttention promotes 5% of QK blocks to FP16. If 89.1% recovery reproduces, FP4 long-context gets much less scary.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
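The "merge both paths with online softmax" step can be sketched as below, assuming the usual log-sum-exp rescaling used in blockwise attention. Both paths use ordinary Python floats here, so this only illustrates the merge, not FP4 or FP16 arithmetic or the paper's block-selection rule. All names are hypothetical, and each path is assumed to hold at least one key.

```python
import math


def partial_attention(scores, values):
    """Summarize one subset of keys as (max score, normalizer, weighted value sum)."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
    return m, sum(weights), acc


def merge_paths(path_a, path_b):
    """Combine two partial results into the softmax-weighted output over all keys."""
    m_a, l_a, acc_a = path_a
    m_b, l_b, acc_b = path_b
    m = max(m_a, m_b)
    scale_a, scale_b = math.exp(m_a - m), math.exp(m_b - m)
    normalizer = l_a * scale_a + l_b * scale_b
    return [(a * scale_a + b * scale_b) / normalizer for a, b in zip(acc_a, acc_b)]


def full_attention(scores, values):
    """Reference: plain softmax attention for a single query."""
    _, normalizer, acc = partial_attention(scores, values)
    return [a / normalizer for a in acc]


scores = [0.5, 2.0, -1.0, 3.0]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
high_path = partial_attention([scores[1], scores[3]], [values[1], values[3]])
low_path = partial_attention([scores[0], scores[2]], [values[0], values[2]])
merged = merge_paths(high_path, low_path)
```

Because each path carries its own running max and normalizer, the two can be computed independently and at different precisions, then combined without a second pass over the keys.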
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Understanding Goal Generalisation in Sequential Reinforcement Learning
The paper studies over 100 sequential RL training pipelines across more than 250 out-of-distribution environments, and introduces latent policy gradients to predict which out-of-distribution behaviors a training pipeline induces.
#Agent#Reasoning#Interpretability#Research release
why featured
HKR-K/R pass: the scale and latent policy gradients are concrete, and agent safety is relevant. HKR-H is weak, and this single arXiv paper lacks tooling or visible industry debate, so it stays in 60–71.
editor take
This paper tests 100+ RL pipelines; early goals persist, which makes single-task OOD evals look too clean.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning
FuRA uses a block tensor-train factorization, W = LSR, for full-rank adaptation. It fixes the pretrained block-wise SVD basis L, optimizes compact R and singular values S, reports +1.37 over Full FT on LLaMA-3-8B commonsense reasoning, and says 4-bit QFuRA also beats QLoRA.
#Fine-tuning#Inference-opt#Benchmarking#Yequan Zhao
why featured
HKR-H/K/R pass, but this is still a method paper with evidence centered on LLaMA-3-8B commonsense tests and 4-bit comparisons. No broad reproduction or toolchain adoption is disclosed, so it stays in 60–71.
editor take
FuRA beats Full FT by 1.37 on LLaMA-3-8B commonsense; I buy the spectral preconditioning angle, pending larger-data fine-tunes.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception
CVSearch introduces a training-free adaptive framework for high-resolution image perception, using an Assess-then-Search workflow to schedule expert-assisted search and semantic-aware scanning; the abstract reports state-of-the-art accuracy on HR benchmarks, but does not disclose dataset names or numeric gains.
#Multimodal#Vision#Inference-opt#CVSearch
why featured
HKR-K/R pass: the training-free high-res visual search mechanism is useful and relevant to multimodal builders. HKR-H is weak, and the post gives no concrete accuracy numbers, code status, or reproducibility details, so it stays in the 60–71 band.
editor take
CVSearch makes HR vision a training-free router; no benchmark names or gains disclosed, so I read it as inference plumbing.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection
ConjNorm uses a Bregman-divergence framework for density-based OOD scoring and estimates the partition function with Monte Carlo importance sampling; on CIFAR-100 and ImageNet-1K FPR95 benchmarks, it outperforms the current best method by up to 13.25% and 28.19%.
#Benchmarking#ConjNorm#Research release#Benchmark
why featured
HKR-K and HKR-R pass: the method and CIFAR-100/ImageNet-1K numbers are concrete, and OOD detection maps to reliability. HKR-H is weak, and this is a single arXiv paper with no adoption artifact, so it stays in 60–71.
editor take
ConjNorm cuts FPR95 by up to 13.25%/28.19% on CIFAR-100/ImageNet-1K; I’d audit sampling cost before buying the SOTA table.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
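FPR95 is the metric behind those numbers. The sketch below shows one common way to compute it, assuming higher scores mean "more in-distribution". Threshold conventions vary between codebases, and the function name is illustrative.

```python
def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted at the threshold that keeps 95% of ID samples."""
    ranked = sorted(id_scores, reverse=True)
    keep = (95 * len(ranked) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    threshold = ranked[keep - 1]
    accepted = sum(1 for s in ood_scores if s >= threshold)
    return accepted / len(ood_scores)
```

Lower is better: the metric asks how many OOD inputs slip through when the detector is tuned to reject only about 5% of in-distribution inputs.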
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Google Introduces Orbax Distributed Checkpointing Library for JAX
Google introduces Orbax as a JAX-native distributed checkpointing library, reporting up to 3.5x faster saving and 2x faster loading than comparable PyTorch checkpointing alternatives.
#Tools#Inference-opt#Google#JAX
why featured
HKR-H/K pass via the PyTorch comparison and concrete speedups, but the JAX checkpointing topic is narrow ML infrastructure. Google source and numbers keep it useful, not featured.
editor take
Orbax claims 3.5x faster saves than PyTorch rivals; the bigger test is ending JAX’s DIY checkpoint mess.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
DynMuon: A Dynamic Spectral Shaping View of Muon
DynMuon replaces Muon-style updates with UΣ^pV^T and schedules p from positive to mildly negative during training. The paper reports lower validation loss than Muon across model sizes, architectures, and training settings, and reaches the same target loss with 10.6–26.5% fewer steps.
#Fine-tuning#Inference-opt#Benchmarking#DynMuon
why featured
HKR-K/R pass: new optimizer mechanism and 10.6–26.5% fewer steps. HKR-H fails; niche spectral-shaping optimizer work keeps it in all, not featured.
editor take
DynMuon schedules UΣ^pV^T and cuts 10.6–26.5% steps; I’d test whether big batches and long runs erase the gain.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Instance-Optimal Estimation with Multiple LLM Judges on a Budget
The paper formalizes LLM-as-a-judge evaluation as budgeted heteroskedastic multi-judge estimation with K prompt-response pairs and J judges. It proposes EST-IVWE, an adaptive allocation algorithm using optimistically biased variance estimates, and proves it matches the oracle IVWE rate up to lower-order budget terms, with validation on synthetic data and HelpSteer2.
#Benchmarking#Research release#Benchmark
why featured
HKR-K and HKR-R pass: the paper adds a concrete K/J budget-allocation mechanism for LLM judges. HKR-H is weak, and the item lacks scale, code, or deployment evidence, so it stays in the 60–71 band.
editor take
EST-IVWE makes K-sample, J-judge eval budgeting provably near-oracle; I buy the move from judge voting vibes to variance allocation.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
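A minimal sketch of the oracle baseline, reading IVWE as the inverse-variance-weighted estimator (an assumption; the summary does not expand the acronym). It assumes each judge's noise variance is known. The adaptive allocation and optimistic variance estimates that define EST-IVWE are not shown.

```python
def ivwe(judge_means, judge_variances):
    """Combine per-judge mean scores, weighting each by 1 / variance.

    Returns the combined estimate and its variance under independent noise.
    """
    weights = [1.0 / v for v in judge_variances]
    total = sum(weights)
    estimate = sum(w * m for w, m in zip(weights, judge_means)) / total
    return estimate, 1.0 / total


estimate, variance = ivwe([1.0, 3.0], [1.0, 3.0])
```

The noisier judge still contributes, just with less weight, which is the difference from majority voting or plain averaging.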
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Compression as Adaptation: Implicit Visual Representation with Diffusion Foundation Models
The paper proposes encoding visual signals as low-rank adaptation functions attached to a frozen diffusion generative model, then hashing an 81-frame video into one compact vector for perceptual video compression at extremely low bitrates.
#Vision#Multimodal#Inference-opt#Research release
why featured
HKR-H and HKR-K pass: 81 frames hashed into one vector and low-rank adapters on frozen diffusion models are concrete. The paper lacks disclosed reproducible metrics or production impact, so it stays in all.
editor take
The paper hashes 81 video frames into one vector; I want reconstruction metrics before trusting generative-prior compression.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Transcoders Trace Visual Grounding and Hallucinations in Vision-Language Models
The authors apply Transcoders to Gemma 3-4B-IT to decompose MLP computation paths linking image patches to token directions, and a logistic classifier using graph features from circuit traces predicts hallucinations with AUC 0.68.
#Multimodal#Vision#Interpretability#Gemma
why featured
HKR-H/K/R pass, but this is a single arXiv interpretability paper with a modest AUC 0.68 hallucination signal. Technical accessibility keeps it below the featured band.
editor take
Transcoders hit AUC 0.68 on Gemma 3-4B-IT; promising interpretability, still weak as hallucination detection.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Graph Learning via Logic-Based Weisfeiler-Leman Variants and Tabularization
The paper proposes tabularizing graph data with logic-based Weisfeiler-Leman variants and tests the method on 14 datasets; with up to 40,000 samples, it generally matches GNNs and graph transformers without a GPU, and remains 5–20× faster even when its tuning time is included.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K is solid: the paper gives dataset count, sample scale, speedups, and comparisons to GNNs/graph Transformers. HKR-H has a clear replacement-style hook, but graph learning is too niche for broad HKR-R, so it stays in all.
editor take
WL tabularization matches GNNs on 14 graph datasets and runs 5–20× faster; I’d bet it eats mid-size graph baselines first.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Dithering Defense: Adversarial Robustness of Vision Foundation Models via Multi-Level Floyd-Steinberg Dithering
The paper evaluates multi-level Floyd-Steinberg dithering as a model-agnostic defense across 6 vision tasks, 2 model families, 3 attack types, and an adaptive straight-through-estimator attacker. Intermediate quantization levels with post-processing blur match or exceed tested baselines, including diffusion-based denoising, while causing less degradation on clean inputs.
#Vision#Multimodal#Safety#DINOv2
why featured
HKR-H comes from the old dithering method used against new VFM attacks, and HKR-K has concrete tasks, model families, attacks, and adaptive tests. The work is niche vision-robustness research, not a production-pipeline replacement or major model update.
editor take
Floyd-Steinberg dithering spans 6 tasks, 2 model families, 3 attacks; cheap preprocessing matches or beats diffusion denoising here.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
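For reference, a sketch of multi-level Floyd-Steinberg dithering on a grayscale image stored as rows of floats in [0, 1]. The error-diffusion weights are the classic 7/16, 3/16, 5/16, 1/16. The function name and image representation are assumptions, and the post-processing blur mentioned in the summary is left out.

```python
def floyd_steinberg(image, levels):
    """Quantize each pixel to one of `levels` evenly spaced values,
    pushing the rounding error onto pixels not yet visited."""
    if levels < 2:
        raise ValueError("need at least 2 quantization levels")
    height, width = len(image), len(image[0])
    work = [list(row) for row in image]  # copy so the input is left untouched
    step = levels - 1
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = min(max(round(old * step), 0), step) / step
            work[y][x] = new
            err = old - new
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return work
```

With `levels=2` this is ordinary black-and-white dithering; the intermediate settings the paper favors would correspond to larger values of `levels`.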
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
BarrierSteer: LLM Safety via Learning Barrier Steering
BarrierSteer applies hidden-state safety classifiers as CBF constraints at inference time and steers latent trajectories without changing LLM parameters; the paper says experiments across multiple model families and datasets reduce attack success rates and unsafe generations, but the snippet does not disclose exact reductions.
#Safety#Inference-opt#Alignment#BarrierSteer
why featured
HKR-H/K/R all pass, but the post lacks attack-success-rate deltas, model list, and reproduction conditions. This is a useful safety paper, not a same-day must-write.
editor take
BarrierSteer steers hidden states with CBFs at inference; no reductions disclosed, so latency versus refusal-head baselines is the tell.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Memorization Dynamics of Fill-in-the-Middle Pretraining
The study pretrains matched Llama 3.2 models on repeated Gutenberg excerpts, comparing FIM with left-to-right training. FIM recovers more short or partial spans, LTR favors long exact continuations, and FIM verbatim extraction grows roughly linearly with repetitions while recall stays prefix-anchored.
#Safety#Benchmarking#arXiv#Llama 3.2
why featured
HKR-K and HKR-R pass: the paper gives a testable FIM-vs-LTR setup and speaks to leakage risk. HKR-H is weak, and as a single arXiv study without product impact it stays in 60–71/all.
editor take
FIM memorization rises roughly linearly on repeated Gutenberg; LTR-style long-continuation tests undercount short-span leakage.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Do Language Models Know What Not to Say? Causal Evidence for Statistical Preemption in LLMs
The paper tests 120 English verb-construction pairings across four experiments. LLM surprisal correlates with human acceptability judgments at r = 0.79, and controlled fine-tuning shows that changing competing-form frequencies shifts statistical preemption behavior.
#Reasoning#Fine-tuning#Benchmarking#Research release
why featured
HKR-H/K pass: the title has a counterintuitive question, and the paper reports experiments, sample size, correlation, and a fine-tuning causal intervention. HKR-R is weak; this is academic mechanism work, not same-day industry news.
editor take
Four experiments cover 120 pairings with r=0.79; don’t mistake LLM error-avoidance for explicit grammar knowledge.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
MadEvolve: Evolutionary Optimization of Trading Systems with Large Language Models
The paper applies MadEvolve to Bitcoin trading strategy optimization, covering signal feature evolution, strategy-component tuning, and joint feature-pipeline plus execution-strategy evolution, while comparing against Claude Code and evaluating p-hacking probabilities in the simulation setup.
#Agent#Code#Benchmarking#MadEvolve
why featured
HKR-H and HKR-K pass: LLM-evolved trading systems and p-hacking checks are concrete. Single arXiv source, no return numbers or reproducible setup disclosed, so it stays below featured.
editor take
MadEvolve optimizes three Bitcoin backtest tasks; no return numbers are disclosed, so I file this under suspicious quant backtest papers.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Robots That Know What to Ask: Recovering Misaligned Rewards through Targeted Explanations
The paper proposes a framework that detects underspecified features in demonstrations, has a robot explain uncertainty in natural language, and requests corrective demonstrations; evaluation covers a simulated tabletop manipulation task and a real Franka robot user study, where targeted explanation-guided queries outperform random querying and passive data collection for reward recovery.
#Robotics#Alignment#Agent#Franka
why featured
HKR-H/K/R all pass, but this is a single arXiv robotics-alignment paper with no reported metrics, code, or cross-source pickup. The real Franka user study adds signal, keeping it in the 60–71 research band.
editor take
The method uses feature variance to find underspecified rewards and is tested on a real Franka; results beat baselines, but sample counts are undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
WMAttack: Automated Attack Search for Adversarial Evaluation of World-Model Agents
WMAttack searches attack configurations for world-model agents across Atari and DeepMind Control tasks; it raises normalized reward drop from 0.497 to 1.034 on DreamerV3 Atari and from 0.319 to 0.682 on DMC under fixed evaluation budgets.
#Agent#Safety#Benchmarking#WMAttack
why featured
HKR-H and HKR-K pass via automated attack search and concrete reward-drop numbers. HKR-R is weak because the Atari/DMC world-model setting is narrow for AI practitioners, so this stays in the 60–71 band.
editor take
WMAttack pushes DreamerV3 reward drop to 1.034; manual attack tuning now looks indefensible for world-model robustness claims.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Cost-Effective Model Evaluation with Meta-Learning
The paper presents MetaEvaluator, a model-agnostic framework that uses meta-learning over a reference model pool to evaluate unseen models on unlabeled datasets, avoiding per-model retraining while amortizing evaluation cost across the pool.
#Benchmarking#Fine-tuning#MetaEvaluator#Research release
why featured
HKR-K and HKR-R pass: the mechanism targets unlabeled evaluation and avoids per-model retraining. The arXiv summary gives no metrics, model scope, or artifact, so this stays in all.
editor take
MetaEvaluator scores unseen models on unlabeled data via a reference pool; no cost multiple is disclosed, so “no retraining” isn’t free.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Self-Improving In-Context Learning
The paper proposes optimizing continuous embeddings of a fixed few-shot prompt at test time, using output log-probabilities from a single forward pass as a self-supervised confidence proxy. The method requires no finetuning, token generation, predefined label set, or external data, and applies to classification and free-form generation tasks.
#Reasoning#Embedding#Inference-opt#arXiv
why featured
HKR-H and HKR-K pass: the paper has a self-improving ICL hook and a concrete test-time embedding mechanism without labels or external data. No metrics, artifact, or production evidence keeps it below featured.
editor take
It optimizes few-shot prompt embeddings using log-probs from one forward pass; models and gains are undisclosed, so “self-improving” is doing PR work.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
SyMerge: From Non-Interference to Synergistic Merging via Single-Layer Adaptation
SyMerge jointly optimizes merging coefficients and one task-specific layer, reports state-of-the-art results across vision, dense prediction, and NLP benchmarks, and merges models trained from different initializations where standard methods break down.
#Fine-tuning#Vision#Benchmarking#SyMerge
why featured
This is a concrete model-merging paper: HKR-K passes via coefficient optimization plus one task-specific layer, and HKR-R touches fine-tune reuse cost. HKR-H is weak and the post gives no deployment numbers or artifact detail, so it stays in all.
editor take
SyMerge adapts one task layer and claims SOTA; I buy the lightweight bet, but the snippet gives no gain table.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning
The paper proposes ProxyCoT, which generates chain-of-thought traces from proxy contexts, then grounds them in full long contexts with supervised fine-tuning; the abstract says it outperforms strong baselines across multiple datasets with lower compute overhead, but the snippet does not disclose scores.
#Reasoning#Fine-tuning#Research release
why featured
HKR-H/K pass: the method is novel and testable as a tuning recipe. HKR-R is weak because the post gives no concrete scores, code, or adoption signal, so this stays in the 60–71 band.
editor take
ProxyCoT trains CoT on proxy contexts, then SFTs full contexts; without scores, stop equating 10M tokens with reasoning.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Every Component is a Lookup: Token Attribution and Composition from a Single Decomposition
Unpack decomposes Transformer credit paths from one forward pass, recovering all three IOI composition connections on GPT-2 small and reproducing duplicate-name suppression across Pythia models from 160M to 6.9B parameters without interventions, gradients, or auxiliary training.
#Interpretability#GPT-2#Pythia#Research release
why featured
HKR-H and HKR-K pass: the title has a concrete hook and the summary gives reproducible model ranges. The work stays in GPT-2/Pythia circuit analysis, with no product impact or broad practitioner controversy, so it fits 60–71.
editor take
Unpack traces credit paths in one forward pass; nice engineering, but GPT-2 IOI is still a narrow proof.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
The Attribution Contract: Feature Attribution for Generative Language Models
The paper introduces the Attribution Contract, a five-part specification for feature-attribution claims in generative language models, naming the output explained, eligible features, assumed generative process, fixed variables, and attributed model score; it uses autoregressive and diffusion language models as cases and argues that many disputes come from unstated contracts rather than attribution algorithms.
#Interpretability#Research release
why featured
HKR-K and HKR-R pass: the paper offers a concrete 5-part attribution framework for generative LMs. As an arXiv methods paper without benchmarks, code, or visible debate, it stays in the 60–71 band.
editor take
Attribution Contract makes attribution claims state 5 things up front; I buy the direction, since generative models don’t fit classifier-era explanations.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving
The paper proposes CoPhy, which distills VLM knowledge into a BEV encoder and removes the VLM at inference, then uses an auto-regressive BEV world model and GRPO dual rewards; it reports state-of-the-art results on NAVSIM v1 and v2.
#Robotics#Vision#Reasoning#CoPhy
why featured
HKR-H/K/R pass on the VLM-distillation-to-BEV-world-model angle, but this is a single arXiv AV benchmark paper. No code, real-road test, or major-lab product link is disclosed.
editor take
CoPhy drops the VLM after BEV distillation and claims NAVSIM v1/v2 SOTA; I trust the zero-cost semantics more than rollout-derived safety.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Entropy-Aware On-Policy Distillation of Language Models
The paper introduces Entropy-Aware On-Policy Distillation, adding forward KL on high-entropy teacher tokens while retaining reverse KL elsewhere; across six math reasoning benchmarks, it improves Pass@8 over baseline on-policy distillation by +1.37 for Qwen3-0.6B-Base, +2.39 for Qwen3-1.7B-Base, and +5.05 for Qwen3-4B-Base.
#Reasoning#Fine-tuning#Alignment#Qwen
why featured
HKR-K/R pass: the mechanism and six math-benchmark result are concrete, and small-model reasoning cost matters. HKR-H is weak; this remains a routine arXiv method paper below featured threshold.
editor take
Entropy-aware distillation adds +5.05 Pass@8 on Qwen3-4B; forward KL on high-entropy tokens beats squeezing reverse KL harder.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
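A rough per-token sketch of the entropy-aware switch, under two assumptions the summary does not confirm: a hard entropy threshold (the `entropy_threshold` parameter is hypothetical), and forward KL replacing reverse KL on high-entropy tokens rather than being added on top of it. Distributions are plain probability lists with strictly positive entries.

```python
import math


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)


def kl(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)


def token_loss(teacher, student, entropy_threshold):
    """Forward KL where the teacher is uncertain, reverse KL elsewhere."""
    if entropy(teacher) > entropy_threshold:
        return kl(teacher, student)   # forward KL: cover the teacher's spread
    return kl(student, teacher)       # reverse KL: mode-seeking


uniform_teacher = [0.25, 0.25, 0.25, 0.25]
peaked_teacher = [0.97, 0.01, 0.01, 0.01]
student = [0.7, 0.1, 0.1, 0.1]
```

The intuition is that reverse KL lets the student collapse onto one mode, which is fine when the teacher is confident but discards alternatives exactly where the teacher sees several plausible next tokens.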
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
LLAMA LIMA: A Living Meta-Analysis on the Effects of Generative AI on Learning Mathematics
LLAMA LIMA v3 analyzes 24 studies, including 3 newly added studies, and estimates a positive effect of generative AI interventions on mathematics learning at g=0.40 with a credible interval of [0.14, 0.67].
#Benchmarking#LLAMA LIMA#Research release#Benchmark
why featured
HKR-K is strong and HKR-R is present, but this is an arXiv meta-analysis update rather than a model, product, or market move. It fits the 60–71 band.
editor take
LLAMA LIMA v3 covers 24 studies, g=0.40; AI math tutoring helps, but replacing teachers lacks support here.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Are Targeted Data Poisoning Attacks as Effective as We Think?
This arXiv paper identifies the easiest and hardest test samples to poison using only clean model information, then stratifies targeted data poisoning vulnerability with clean training dynamics, poison distances, and poison budgets.
#Safety#Benchmarking#arXiv#Research release
why featured
HKR-H/K/R pass, but this is a single arXiv paper with only method framing disclosed; no author authority, experimental numbers, or reproducible setup are given. It fits the 60–71 research-signal band.
editor take
The paper stratifies poisoning targets from clean-model signals; datasets and ASR are undisclosed, but random-target averages look weak.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic
The paper trains a non-linear student by distilling hidden representations from a curvature-regularized linearized teacher, preserving task-vector composition for addition-based merging and subtraction-based unlearning across vision and language benchmarks, while avoiding the inference-time overhead of linearized fine-tuning; the RSS abstract does not disclose exact benchmark scores, model sizes, or training compute.
#Fine-tuning#Inference-opt#Benchmarking#Research release
why featured
HKR-K passes on curvature regularization, hidden-state distillation, and no inference overhead. HKR-R is modest for fine-tuning/model-merge cost; HKR-H fails because the title is specialist jargon and the summary gives no benchmark numbers.
editor take
This distills linear fine-tuning arithmetic into a non-linear student; scores and model sizes are undisclosed, so treat it as a merging/unlearning lead.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
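The task-vector composition this paper tries to preserve can be sketched as follows, with scalar "weights" in dicts standing in for tensors. This shows only the generic arithmetic (addition to merge, negation to unlearn), not the paper's linearized teacher or distillation step. All names are illustrative.

```python
def task_vector(pretrained, finetuned):
    """Difference between fine-tuned and pretrained weights, per parameter."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}


def apply_task_vectors(pretrained, vectors, coefficients):
    """Add scaled task vectors to the pretrained weights."""
    merged = dict(pretrained)
    for vector, coeff in zip(vectors, coefficients):
        for name, delta in vector.items():
            merged[name] += coeff * delta
    return merged


pretrained = {"w": 1.0, "b": 0.0}
tau_a = task_vector(pretrained, {"w": 1.5, "b": 0.2})
tau_b = task_vector(pretrained, {"w": 0.5, "b": -0.4})

multi_task = apply_task_vectors(pretrained, [tau_a, tau_b], [1.0, 1.0])  # merging
forget_a = apply_task_vectors(pretrained, [tau_a], [-1.0])               # unlearning
```

Whether the merged weights behave like the sum of the tasks depends on how linear the fine-tuned model is around the pretrained point, which is the gap the distillation is meant to close.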
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
IVF-TQ: Calibration-Free Streaming Vector Search via a Codebook-Free Residual Layer
IVF-TQ replaces residual codebooks with a fixed random rotation and Lloyd-Max scalar quantizer; across three 10M datasets and nine controlled cells, it keeps streaming recall drift between -0.80 and +0.56 percentage points without per-dataset bit-budget tuning or compression retraining.
#Embedding#Inference-opt#Benchmarking#IVF-TQ
why featured
HKR-K is solid and HKR-R reaches RAG/vector-DB infra teams. The arXiv-only method lacks code, deployment proof, or broad-source pickup, so its niche technical burden keeps it in the 60–71 band.
editor take
IVF-TQ caps recall drift at -0.80 to +0.56pp across nine 10M-scale cells; learned residual codebooks look stale for streaming ANN.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling
DiLaDiff proposes three components for masked diffusion language models: a continuous semantic latent space, a latent diffusion prior, and consistency distillation; the abstract says it outperforms the masked diffusion baseline and significantly accelerates inference, but it does not disclose benchmark names or numeric speedups.
#Reasoning#Inference-opt#DiLaDiff#Research release
why featured
HKR-K has concrete mechanisms and HKR-R touches inference cost, but the post only gives abstract-level claims with no speedup, model scale, or benchmark detail. This stays in the 60–71 research band.
editor take
DiLaDiff adds 3 parts to masked diffusion LMs; no benchmarks or speedup numbers are disclosed, so discount the “significant” claim.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
MedExpMem: Adapting Experience Memory for Differential Diagnosis
MedExpMem lets VLM-based diagnostic agents store failure-derived differential notes, and on a radiology benchmark spanning 11 subspecialties, it reports accuracy gains up to 7.0% across models and scales.
#RAG#Vision#Memory#Qianhan Feng
why featured
HKR-K is clear: failure-experience memory and a reported gain of up to 7.0% on a benchmark spanning 11 radiology subspecialties. HKR-H is weak, and no code, deployment, or major-lab signal is disclosed, so it stays in the 60–71 band.
editor take
MedExpMem reports up to 7.0% across 11 radiology subspecialties; failure memory is sane, but clinical safety remains undisclosed.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
D2 Actor Critic: Diffusion Actor Meets Distributional Critic
D2AC introduces a model-free reinforcement learning algorithm for online diffusion policies, using a distributional critic fused with clipped double Q-learning, and reports state-of-the-art results on 18 hard RL tasks including Humanoid, Dog, and Shadow Hand, with code released on GitHub.
#Robotics#Reasoning#Code#D2AC
why featured
HKR-K passes with a concrete mechanism, 18-task benchmark claim, and code. HKR-H and HKR-R are weak, and the arXiv RL-algorithm format has a high accessibility bar, so it stays in the 60–71 signal band.
editor take
D2AC claims SOTA on 18 hard RL tasks; I’d verify runs first, online diffusion-policy RL has plenty of benchmark theater.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Worse than Random: The Importance of a Baseline for Unsupervised Feature Selection
The paper proposes random feature selection as a baseline for unsupervised feature selection, and reports that many state-of-the-art methods are outperformed by the random baseline in both performance and efficiency.
#Benchmarking#Research release#Benchmark
why featured
HKR-H/K/R pass, but this is a specialized ML evaluation paper and the body does not disclose method names, datasets, or effect sizes. Useful signal, not a featured industry story.
editor take
Random feature selection beats multiple SOTA methods; dataset counts are undisclosed. Unsupervised feature selection needs this sanity check before new acronyms.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Paper Evaluates TabPFN Performance on Insurance Pricing Tasks
The paper evaluates TabPFN on two public MTPL datasets against GLM and XGBoost, and finds that it does not consistently outperform the baselines, has substantially longer inference times, and is sensitive to the in-context training set size.
#Inference-opt#Benchmarking#TabPFN#XGBoost
why featured
HKR-H/K/R pass: a concrete benchmark pushes back on TabPFN hype with two MTPL datasets and classic baselines. The insurance-pricing niche keeps it in the 60–71 band, not featured.
editor take
TabPFN fails to consistently beat GLM and XGBoost on 2 MTPL datasets; foundation-model hype hits actuarial pricing friction.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
FIRMA: Fibonacci Ring Model Aggregation for Privacy-Preserving Federated Learning
FIRMA proposes three server-free ring federated learning protocols with private classification heads and Fibonacci-weighted neighbor blending; across 28 experimental configurations, the full fibflpp system beats FedAvg in all 12 label-skew settings, with a peak +20.7 percentage-point gain on CIFAR-10 at K=1.
#Fine-tuning#Safety#Benchmarking#FIRMA
why featured
HKR-H comes from the Fibonacci ring setup, and HKR-K has concrete protocol counts, test configs, and a +20.7pp result. The federated-learning protocol angle is research-heavy, so it stays in all.
editor take
fibflpp beats FedAvg in 12/12 label-skew settings; privacy here means private classification heads, not a replacement for secure aggregation.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
LLM-driven design of physics-constrained constitutive models: two agents are better than one
The paper introduces a Creator-Inspector two-agent pipeline for CANN constitutive model generation, where proposals are checked against nine physical constraints; the Inspector raises valid exported models from 91% to 100% for Claude Opus 4.7 and from 37% to 56% for Kimi K2.5.
#Agent#Code#Benchmarking#Claude Opus
why featured
HKR-H and HKR-K pass: the dual-agent inspection setup and pass-rate gains are concrete. The constitutive-modeling domain is too narrow for broad practitioner resonance, so technical-accessibility drag keeps it below featured.
editor take
Two agents push Opus from 91% to 100%; Kimi lands at 56%, so inspection doesn’t rescue a weak backbone.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
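The Creator-Inspector loop in the card above reduces to propose, check, retry. A minimal sketch, assuming the Inspector returns the names of violated constraints as feedback; the two constraints, the toy creator, and the round limit are hypothetical stand-ins, since the paper's nine constraints are not listed in the summary:

```python
from typing import Callable, Dict, List, Optional

Model = Dict[str, float]

def inspect(model: Model, constraints: Dict[str, Callable[[Model], bool]]) -> List[str]:
    """Return the names of all constraints the candidate violates."""
    return [name for name, check in constraints.items() if not check(model)]

def create_and_inspect(creator, constraints, max_rounds: int = 3) -> Optional[Model]:
    """Ask the creator for candidates until one passes every check."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = creator(feedback)
        feedback = inspect(candidate, constraints)
        if not feedback:
            return candidate  # only fully valid models are exported
    return None

# Hypothetical physical checks standing in for the paper's nine constraints.
constraints = {
    "positive_stiffness": lambda m: m["mu"] > 0.0,
    "zero_energy_at_rest": lambda m: m["psi_rest"] == 0.0,
}

def toy_creator(feedback: List[str]) -> Model:
    """Stand-in for the LLM: repairs stiffness once it is told about it."""
    model = {"mu": -1.0, "psi_rest": 0.0}
    if "positive_stiffness" in feedback:
        model["mu"] = 1.0
    return model

exported = create_and_inspect(toy_creator, constraints)
```

Returning `None` when the rounds run out mirrors the gap between 56% and 100%: the Inspector can reject bad candidates, but it cannot make the Creator produce a valid one.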
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
What Linear Probes Miss: Multi-View Probing for Weight-Space Learning
Eunwoo Heo and two coauthors introduce MVProbe, a weight-space probing framework that fuses first-order signals with Gram-based interaction views. The ICML 2026 paper says MVProbe outperforms ProbeX on Model Jungle across ResNet, SupViT, MAE, DINO, and Stable Diffusion LoRA adapters, but the abstract does not disclose exact score margins.
#Benchmarking#Interpretability#Eunwoo Heo#Kyeongkook Seo
why featured
HKR-K is supported by the MVProbe mechanism and ProbeX comparison, and HKR-H has a modest title hook. The weight-space probing angle is specialized, with no disclosed engineering impact, so it stays in the 60–71 all band.
editor take
MVProbe beats ProbeX on Model Jungle, but margins are undisclosed; Gram views make sense, but this is not a weight-audit solution yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
ImProver 2 research on neurosymbolic proof optimization released
ImProver 2 optimizes formal proofs in Lean 4 with an expert-iteration pipeline and neurosymbolic scaffold, and its 7B-parameter model outperforms much larger models in the same family while matching mid-tier frontier models across structural proof metrics.
#Reasoning#Code#Benchmarking#ImProver 2
why featured
HKR-H and HKR-K pass: iterative proof optimization is a real hook, with Lean 4, a 7B model, and metric comparison. The formal-proof niche keeps it in the 60–71 band, below featured.
editor take
ImProver 2 trains a 7B Lean 4 proof optimizer; baselines are undisclosed, so treat “frontier-competitive” as pending replication.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Decomposing MXFP4 Quantization Error for LLM Reinforcement Learning
The paper decomposes MXFP4 quantization error into scale bias, deadzone truncation, and grid noise, then applies targeted corrections that recover BF16 accuracy within 0.7% on Qwen2.5-3B and exceed BF16 by 1.0% on Qwen3-30B-A3B-Base.
#Reasoning#Inference-opt#Fine-tuning#Qwen
why featured
HKR-K is clear: the paper decomposes MXFP4 error into three terms and reports Qwen2.5-3B/Qwen3-30B-A3B-Base results. HKR-R is cost/accuracy relevant, but the quantization-RL depth keeps it in the lower band.
editor take
Three MXFP4 fixes bring Qwen2.5-3B within 0.7% of BF16 and put Qwen3-30B-A3B-Base 1.0% above it; far sturdier than generic “4-bit training works” claims.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
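To see where a deadzone and grid error come from in MXFP4, here is a minimal sketch of block quantization as I understand the OCP Microscaling format: a shared power-of-two scale per block and FP4 (E2M1) element magnitudes. It does not implement the paper's decomposition or its corrections. The 4-element block (the format uses 32) and the tie-breaking rule are simplifications of mine:

```python
import math

# Magnitudes representable by an FP4 E2M1 element.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def mxfp4_quantize_block(block):
    """Quantize one block with a shared power-of-two scale; return (dequantized, scale)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block), 1.0
    # Shared exponent: floor(log2(amax)) minus the largest E2M1 exponent (2).
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    out = []
    for v in block:
        mag = min(abs(v) / scale, 6.0)  # clamp to the largest grid value
        # Nearest grid point; ties go to the smaller magnitude (a simplification).
        q = min(FP4_GRID, key=lambda g: abs(g - mag))
        out.append(math.copysign(q * scale, v))
    return out, scale

block = [1.0, 0.1, -0.3, 0.02]
quantized, scale = mxfp4_quantize_block(block)
errors = [q - v for q, v in zip(quantized, block)]
```

With this block the scale is 0.25, so anything below 0.0625 in magnitude rounds to zero (the deadzone), while 0.1 and -0.3 snap to 0.125 and -0.25 (grid rounding).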
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Diffusion Domain Expansion: Learning to Coordinate Pre-trained Diffusion Models
The paper proposes DDE, a compact trainable coordinator that combines denoised outputs from pre-trained diffusion models, and evaluates it on long audio track generation and conditional image generation.
#Multimodal#Audio#Vision#Research release
why featured
HKR-K passes with a concrete method and two evaluation settings: long audio and conditional image generation. HKR-H and HKR-R are weak; this is a single arXiv method paper without visible product impact or strong benchmark numbers.
editor take
DDE coordinates pretrained diffusion outputs with a compact net, but parameter count is undisclosed; long-audio extrapolation is nice, if baselines are fair.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
MirrorCheck: Efficient Adversarial Defense Method for Vision-Language Models
MirrorCheck detects adversarial attacks on vision-language models by regenerating images with T2I models and comparing feature-space embeddings; the arXiv abstract covers unimodal and multimodal settings but does not disclose specific benchmark numbers.
#Multimodal#Vision#Safety#MirrorCheck
why featured
HKR-K/R pass via the T2I-regeneration mechanism and multimodal safety relevance. HKR-H is weak, and the abstract lacks accuracy, overhead, or dataset details, so it stays in the 60–71 research-signal band.
editor take
MirrorCheck randomizes T2I and encoders for detection; no benchmark numbers are disclosed, so I’d treat it as a costly defense sketch.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Learned Relay Representations for Forward-Thinking Discrete Diffusion Models
The paper proposes Relay, a per-token differentiable channel for Masked Diffusion Models, and scales it to Fast-dLLM v2, where coding-task inference latency drops by up to 32% while outperforming standard supervised fine-tuning.
#Inference-opt#Fine-tuning#Code#Fast-dLLM v2
why featured
HKR-K is clear and HKR-R has a cost hook; HKR-H misses. The paper gives a 32% latency figure and mechanism, but discrete-diffusion scope is narrow and industry impact is not shown.
editor take
Relay cuts Fast-dLLM v2 coding latency by 32%; I buy it, because MDMs wasting hidden state was always odd.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs
GEMQ assigns expert-level bit-widths for MoE LLMs using global linear programming and router fine-tuning, then refines allocation through progressive quantization; the abstract says it reduces memory and accelerates inference with minimal accuracy loss, but the RSS snippet does not disclose compression ratios, speedup numbers, or benchmark scores.
#Inference-opt#Fine-tuning#GEMQ#Research release
why featured
HKR-K comes from the global-LP plus router-tuning mechanism, and HKR-R hits MoE serving cost. No compression, latency, or benchmark numbers are disclosed, so this stays in all.
editor take
GEMQ uses global linear programming for expert bit-widths; no compression or speedup numbers are disclosed, so park it as reproducibility bait.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
GILT: An LLM-Free, Tuning-Free Graph Foundational Model for In-Context Learning
GILT uses a token-based framework to unify node, edge, and graph classification for graph in-context learning on numerical features; the paper says it beats LLM-based or tuning-based baselines in few-shot settings, but the snippet does not disclose exact scores.
#Reasoning#GILT#Research release#Open source
why featured
HKR-H and HKR-K pass: the anti-LLM framing is clickable and the mechanism is concrete. Missing benchmark numbers and niche graph-ICL scope keep it in the 60–71 band.
editor take
GILT unifies node, edge, and graph classification, but exact scores are missing; LLM-free graph ICL is plausible, not proven here.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
XAttnMark: Learning Robust Audio Watermarking with Cross-Attention
XAttnMark uses partial generator-detector parameter sharing, cross-attention, temporal conditioning, and a psychoacoustic time-frequency masking loss for audio watermarking; the arXiv abstract claims state-of-the-art detection and attribution under audio transformations, including generative editing at varying strengths.
#Audio#Safety#XATTNMARK#WavMark
why featured
HKR-K and HKR-R pass via concrete watermarking mechanisms and provenance value. HKR-H is weak, and a single arXiv paper without deployment or major-lab backing stays in the 60–71 band.
editor take
XAttnMark claims SOTA detection and attribution, with no RSS metrics; I’m skeptical until generative-edit stress curves show up.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Task-Awareness Improves LLM Generations and Uncertainty
The paper models LLM outputs in a task-dependent latent structure and computes Bayes-optimal responses with a dissimilarity measure; the abstract says these responses outperform beam search across tasks, but the post does not disclose benchmark numbers.
#Reasoning#Benchmarking#Research release
why featured
HKR-K/R pass: the paper gives a decoding and uncertainty mechanism and claims multi-task gains over beam search. No benchmark numbers are disclosed, and HKR-H is weak, so it stays in the 60–71 all band.
editor take
The paper claims latent-structure decoding beats beam search; no benchmark numbers in RSS, so I file it as structured-output postprocessing.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees
DSR decomposes mathematical statements into logical components and maps them to operator trees, outperforming baselines under equal compute on PRIME, a benchmark of 156 undergraduate and graduate-level Lean 4 theorems.
#Reasoning#Code#Benchmarking#DSR
why featured
HKR-K passes via a concrete operator-tree mechanism and PRIME-156 Lean 4 result. HKR-H/R are weak, and Lean autoformalization is narrow for general AI practitioners, so this sits in the all band.
editor take
DSR beats baselines on 156 PRIME theorems; I buy operator trees, but this is too small to crown Lean automation.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Safe Reinforcement Learning with Preference-based Constraint Inference
The paper proposes PbCRL to infer safety constraints from preference data; the method adds a dead-zone mechanism, an SNR loss, and two-stage training, while the RSS snippet does not disclose the number of experiments.
#Reasoning#Safety#Alignment#Research release
why featured
HKR-K and HKR-R pass: the paper gives concrete mechanisms for preference-based constraint inference and touches safety/alignment. HKR-H is weak, and no experiment count or production-level claim is disclosed.
editor take
PbCRL infers safety constraints from preferences, but experiment count is undisclosed; I buy the Bradley-Terry (BT) critique, not the SOTA claim yet.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
FusionSense: Tri-Stage Near-Sensor Learning for Runtime-Adaptive Multimodal Edge Intelligence
FusionSense applies tri-stage near-sensor learning to an RGB+Depth/LiDAR SynDrone setup, cutting energy by up to 33x at 1% FoI prevalence and reducing quality loss by 92.3% at a fixed 30% data reduction.
#Multimodal#Inference-opt#Sanggeon Yun#Mohsen Imani
why featured
HKR-K is solid via mechanism and numbers; HKR-R lands on edge inference cost. The arXiv systems angle is specialized and lacks product or flagship-model spillover, so it stays in the 60–71 band.
editor take
FusionSense cuts energy 33x on SynDrone dual-modal sensing; the catch is 1% FoI prevalence, so deployment lives or dies on drift.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Steered Generation via Gradient-Based Optimization on Sparse Query Features
The paper introduces Prototype-Based Sparse Steering, which trains Sparse Autoencoders on attention query activations and uses gradient-based optimization at inference to align sparse features with target prototypes, then validates the method on Textualized Gridworld planning constraints and an educational feedback task using Bloom’s Taxonomy.
#Reasoning#Interpretability#Inference-opt#Research release
why featured
HKR-K is solid: Prototype-Based Sparse Steering and two evaluation settings are disclosed. HKR-R is present for controllability, but HKR-H is weak and the scope stays niche research.
editor take
The paper steers query activations with SAEs at inference; no model or overhead disclosed, so the control idea is cleaner than the engineering case.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
RelPrism: A Multi-Faceted Pre-training Framework with Self-Generated Tasks for Relational Databases
RelPrism builds pseudo-task pools from intrinsic, relational, and hybrid attributes for relational database pre-training; across 14 tasks on 5 real-world datasets, it improves classification ROC-AUC by 4.15% and reduces regression MAE by 10.75% versus state-of-the-art baselines.
#Embedding#Benchmarking#RelPrism#arXiv
why featured
HKR-K passes: RelPrism discloses a self-generated pseudo-task mechanism and concrete benchmark gains. The scope is relational-database pretraining research, not a product or foundation-model event.
editor take
RelPrism gains 4.15% ROC-AUC across 14 tasks; I’d stress-test whether pseudo-task pools just move RDB tuning pain upstream.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Complete-muE: Optimal Hyperparameter Transfer and Scaling for MoE Models
Complete-muE transfers hyperparameters from one dense reference to MoE configurations through two bridges: active-width μP with a normalized router scale, and activated-expert scaling in which the first-order SDE LR/WD corrections cancel. The paper reports language and diffusion pretraining experiments where optima stay relatively stable across architecture and parameter-count changes, with only minor residual σ0 drift.
#Inference-opt#Benchmarking#Complete-muE#Research release
why featured
HKR-K/R pass: the two-bridge transfer and scaling conditions add concrete signal, and MoE tuning cost resonates. The arXiv paper is narrow and not clicky, so it stays in the 60–71 band.
editor take
Complete-muE maps dense hyperparams to MoE via two bridges; I buy the pain point, but “tune once” needs code and scale tables.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Lost in the Folds: When Cross-Validation Is Not a Deep Ensemble for Uncertainty Estimation
The paper compares a standard 5-fold CV ensemble with a 5-member deep ensemble on three multi-rater segmentation datasets across three modalities. Deep ensembles matched segmentation accuracy and improved calibration and failure detection, while CV ensembles sometimes correlated more strongly with inter-rater variability.
#Benchmarking#nnU-Net#Research release#Benchmark
why featured
HKR-H/K pass: the paper tests a common ensemble shortcut with a 5-fold vs 5-member setup across 3 datasets. Its scope is narrow segmentation uncertainty, so it stays in the 60–71 band.
editor take
Passing off a 5-fold CV ensemble as a deep ensemble (DE) is sloppy; across 3 datasets, the results favor a 5-seed DE for reliability and the CV ensemble for tracking rater ambiguity.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R0
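The structural difference between the two ensembles in the card above is what each member trains on. A minimal sketch of the index sets, assuming a plain interleaved k-fold split; the function names and the 10-sample toy size are illustrative:

```python
def kfold_train_indices(n_samples, k):
    """CV ensemble: member i trains on everything except fold i."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    everything = set(range(n_samples))
    return [sorted(everything - set(fold)) for fold in folds]

def deep_ensemble_train_indices(n_samples, n_members):
    """Deep ensemble: every member trains on all data; only the seed differs."""
    return [list(range(n_samples)) for _ in range(n_members)]

cv_members = kfold_train_indices(n_samples=10, k=5)
de_members = deep_ensemble_train_indices(n_samples=10, n_members=5)
```

Each CV member sees 8 of the 10 samples and each sample is held out by exactly one member, so member disagreement mixes data variation with seed variation. Deep-ensemble members share the full training set.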
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Label-Efficient Dataset Pruning via Semi-Supervised Pseudo-Labeling
SemiPrune uses a small randomly labeled subset to generate pseudo-labels for unlabeled data, then estimates example difficulty from pseudo-label-driven training dynamics to select a coreset. The paper reports state-of-the-art results against label-free and label-efficient baselines on domain-specific, image-corrupted, and long-tailed datasets, but the snippet does not disclose label ratios or pruning rates.
#Benchmarking#Research release#Benchmark
why featured
HKR-K and HKR-R pass: the paper gives a concrete semi-supervised pruning mechanism and touches labeling cost. HKR-H fails, and the post does not disclose label ratios, pruning rates, or result numbers, so it stays in all.
editor take
SemiPrune says only that it uses a small labeled subset; without label ratios or pruning rates, I treat the SOTA claim as abstract-level.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Reflex: Reinforcement Learning with Reflection Symmetry Exploitation in State-Based Continuous Control
Reflex integrates axial and bilateral reflection symmetry into PPO and SAC for state-based continuous control, and the paper evaluates it on OpenAI Gym and DeepMind Control benchmarks with reported sample-efficiency gains over standard baselines.
#Reasoning#Robotics#Benchmarking#OpenAI
why featured
HKR-K passes with a concrete algorithmic mechanism and benchmark setting; HKR-H and HKR-R are weak. This is useful RL research, but the path to practitioner impact is narrow, so it stays in the 60–71 band.
editor take
Reflex adds reflection symmetry to PPO and SAC; gains lack numbers, but state control beats another image-rotation RL trick.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
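One common way to exploit reflection symmetry in state-based control is to mirror each transition before it reaches the learner. The summary does not say how Reflex wires this into PPO or SAC, so treat the following as a generic augmentation sketch. The sign masks and the two-dimensional toy state are assumptions:

```python
def reflect(vec, signs):
    """Apply a reflection given as a per-component sign mask (+1 keeps, -1 flips)."""
    return [s * v for s, v in zip(signs, vec)]

def mirror_transition(transition, state_signs, action_signs):
    """Map (s, a, r, s') to its mirror image; the reward is unchanged under the symmetry."""
    state, action, reward, next_state = transition
    return (
        reflect(state, state_signs),
        reflect(action, action_signs),
        reward,
        reflect(next_state, state_signs),
    )

# Toy left-right symmetric system: position and velocity flip, and so does the force.
state_signs, action_signs = [-1, -1], [-1]
original = ([0.5, -1.0], [2.0], 1.0, [0.4, -0.8])
mirrored = mirror_transition(original, state_signs, action_signs)
```

Applying the mirror twice returns the original transition, which is a quick sanity check that a sign mask really encodes a reflection.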
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Joint Model Parameter Scaling and Universal-Domain Data Integration for E-commerce Search Ranking
UniScale combines ES³ sample construction with an HHSFT fusion transformer for e-commerce search ranking, and online A/B tests on a large e-commerce search platform show a 1.70% purchase increase and a 2.04% GMV lift.
#Reasoning#Benchmarking#UniScale#ES³
why featured
HKR-K passes on ES³/HHSFT and A/B lift. HKR-H/R stay weak because this is a specialized e-commerce ranking paper, not a model release, tool, or broad AI workflow story.
editor take
UniScale lifts purchases 1.70% and GMV 2.04% online; I buy the data-scaling angle, but traffic, duration, and significance are undisclosed.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Adaptive Mass-Segmented KV Compression for Long-Context Reasoning
The paper proposes AMS KV Compression, which partitions KV cache by attention-mass distribution and uses EMA smoothing instead of global Top-k eviction, with experiments on MATH500, AIME, GSM8K, code completion, open-domain QA, and sparse retrieval.
#Reasoning#Inference-opt#Code#vLLM
why featured
HKR-K comes from a testable KV-compression mechanism and MATH500/AIME/GSM8K conditions; HKR-R comes from long-context inference cost pressure. No effect sizes or product path are disclosed, so it stays in the 60–71 band.
editor take
AMS preserves KV by attention-mass segments; no compression ratio disclosed, so don’t price “reasoning survives” as a serving win yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
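The summary gives the ingredients (EMA-smoothed attention mass, segments instead of global Top-k) but not the algorithm. The sketch below is my own guess at how those pieces could fit together, not the paper's method. The tier boundaries, keep fractions, and function names are hypothetical:

```python
import math

def ema_update(prev_mass, new_mass, alpha=0.5):
    """Smooth per-token attention mass across decoding steps."""
    return [alpha * n + (1.0 - alpha) * p for p, n in zip(prev_mass, new_mass)]

def mass_tiers(mass, boundaries=(0.5, 0.9)):
    """Group token indices into tiers by cumulative share of total mass."""
    order = sorted(range(len(mass)), key=lambda i: mass[i], reverse=True)
    total = sum(mass)
    tiers = [[] for _ in range(len(boundaries) + 1)]
    covered = 0.0
    for i in order:
        # Tier index = number of boundaries already covered by heavier tokens.
        tier = sum(covered >= b * total for b in boundaries)
        tiers[tier].append(i)
        covered += mass[i]
    return tiers

def keep_per_tier(tiers, keep_fractions=(1.0, 0.5, 0.0)):
    """Retain a different fraction of each tier instead of one global Top-k cut."""
    kept = []
    for tier, frac in zip(tiers, keep_fractions):
        kept.extend(tier[: math.ceil(frac * len(tier))])
    return sorted(kept)

smoothed = ema_update([16.0, 0.0, 2.0, 1.0, 1.0], [0.0, 8.0, 2.0, 1.0, 1.0])
tiers = mass_tiers(smoothed)
kept = keep_per_tier(tiers)
```

The smoothing matters in this toy case: token 0 got no attention at the latest step, yet its history keeps it in the top tier, which a Top-k cut on instantaneous scores would not do.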
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
An Open-Source Training Dataset for Foundation Models for Black-box Optimization
The paper introduces BBO-Pile, an open-source dataset with over 500,000 optimization trajectories across 3,095 black boxes and different optimizers. The authors train foundation models from 2M to 80M parameters on 200M to 2B tokens, then study compute scaling for imitating black-box optimization methods.
#Benchmarking#BBO-Pile#arXiv#Research release
why featured
HKR-K passes on dataset scale and scaling setup; HKR-H and HKR-R miss because this is a niche BBO dataset paper without product impact or a broad practitioner nerve.
editor take
BBO-Pile ships 500K trajectories; reproducibility improves, but 80M models still need proof against tuned BBO baselines.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Diffusion and Flow Matching Models for Tabular Data: A Survey
The survey reviews tabular diffusion and flow matching research from June 2015 to May 2026, covering synthesis, missing-value imputation, anomaly detection, privacy, fairness, benchmarking, and constraint-aware generation; the abstract says the authors maintain updates in a GitHub repository.
#Benchmarking#arXiv#GitHub#Research release
why featured
HKR-K passes because the survey has a defined 2015–2026 scope and concrete application areas. HKR-H and HKR-R are weak: no new model, test result, or production-impact claim, so this stays in the lower research-survey band.
editor take
This survey covers June 2015 to May 2026. Tabular generation needs shared evals before another CTGAN-vs-diffusion leaderboard.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection
The paper evaluates Android malware detection robustness across more than a decade of app slices, comparing same-year, cross-year, and expanding-window deployment protocols, and generating adversarial examples with FGSM and SPSA under feasibility constraints.
#Safety#Benchmarking#arXiv#Research release
why featured
HKR-K has concrete experimental setup and HKR-R touches security robustness. The Android malware focus is niche and technical, with no broad AI product or model impact, so it stays in all.
editor take
A decade-plus Android split hurts adversarial robustness; FGSM/SPSA feature-space attacks limit extrapolation to end-to-end detectors.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
MELT: A Behavioral Trace Dataset for High-Risk Memecoin Launch Detection
MELT covers more than 41,000 Solana memecoin launches and parses over 200 million transactions into typed behavioral records, providing 122 behavioral features and risk-level labels for supervised high-risk launch detection.
#Benchmarking#MELT#Solana#Research release
why featured
HKR-H and HKR-K pass: the crypto-fraud angle is unusual and the dataset numbers are concrete. HKR-R is weak because this is niche on-chain risk research, not a core AI product, model, or competition story.
editor take
MELT covers 41k launches and 200M transactions; its 36.5% bundled-supply signal beats rug-pull labels for live risk filters.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Investigating Robot Control Policy Learning for Autonomous X-ray-guided Spine Procedures
The researchers built an in silico sandbox for bi-planar X-ray-guided spine procedures and trained imitation-learning policies for visual planning and open-loop cannula control; the policy succeeded on the first attempt in 68.5% of cases, while entry-point precision remained a reported limitation.
#Robotics#Vision#Benchmarking#Research release
why featured
HKR-H/K/R all pass via the autonomous spine-procedure hook, concrete 68.5% result, and safety angle. The arXiv medical-robotics focus keeps it below featured for a general AI-practitioner feed.
editor take
The policy hits 68.5% first-try success, but entry precision lags; spine robotics still needs hard constraints before closed-loop trust.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting
Super-Linear replaces deep forecasting architectures with frequency-specialized linear experts and a lightweight spectral gate; the arXiv abstract says the implementation is available on GitHub, but it does not disclose model size or benchmark scores.
#Benchmarking#Super-Linear#Chronos#Time-MoE
why featured
HKR-K passes via a concrete architecture and open-source code, but HKR-H and HKR-R miss: no benchmark numbers, deployment claim, or major-lab context. This stays in the lower interesting band.
editor take
Super-Linear swaps deep TSF models for frequency-linear experts; no sizes or scores disclosed, so don’t crown it over Chronos yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Debiased Negative Mining Improves OOD Detection with Pre-trained Vision-Language Models
The paper proposes a debiased negative mining framework for OOD detection with pre-trained VLMs, converting bias correction into Monte Carlo sampling over ID labels and unlabeled corpus data; the abstract says experiments reach state-of-the-art across multiple OOD setups and the code is public.
#Vision#Multimodal#Benchmarking#Research release
why featured
HKR-K passes via a concrete debiased negative-mining mechanism, and HKR-R passes for VLM deployment reliability. HKR-H fails; this is a narrow single arXiv paper with no industry event, so it stays in the 60–71 band.
editor take
This turns VLM OOD negative-label bias into Monte Carlo sampling; gains are undisclosed, so don’t buy the SOTA line yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Assessing Predictive Models for Fairness Based on Movement Patterns
The paper proposes evaluating spatial fairness in predictive models using individuals’ movement patterns, not single residence locations; its method maps movements across multiple spatial partitions and applies a spatial scan statistic, with experiments on thousands of synthetic unfair datasets testing detection and localization performance.
#Safety#Benchmarking#arXiv#Research release
why featured
HKR-K passes because the method and test setup are concrete; HKR-H and HKR-R are weak due to an academic title and narrow application. No hard exclusion, so this stays in all.
editor take
This extends spatial fairness from residence to movement traces; thousands of synthetic tests pass, but real data and false positives are undisclosed.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Eye Gaze-Informed and Context-Aware Pedestrian Trajectory Prediction in Shared Spaces with Automated Shuttles
The study collected synchronized motion, eye-gaze, and head-orientation data in a VR setup with automated shuttles, and its multimodal model reduced final displacement error by 8.47% when combining gaze with situational context.
#Multimodal#Robotics#GazeX#Research release
why featured
HKR-K passes via the 8.47% final-displacement-error drop and gaze/context fusion mechanism. HKR-H and HKR-R are weak because the work is a narrow automated-shuttle trajectory paper, so it sits in the 60–71 band.
editor take
GazeX cuts FDE by 8.47% in VR; with only 45/90/135° approaches and 3/5s gaps, curbside transfer is unproven.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Certified Per-Instance Unlearning Using Individual Sensitivity Bounds
The paper proposes certified machine unlearning with per-instance noise calibration, derives high-probability individual sensitivity bounds for ridge regression trained via Langevin dynamics, and reports experiments in linear settings plus empirical evidence in deep learning settings.
#Alignment#Safety#Research release
why featured
HKR-K and HKR-R pass: the paper offers a concrete certified-unlearning mechanism, but the article is theory-heavy and discloses no production replacement or artifact. Defaulting to the lower mid band.
editor take
Per-instance unlearning cuts worst-case noise; the proof covers ridge-regression Langevin, while deep learning is still empirical.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction
The paper introduces VI-CuRL, a verifier-independent curriculum RL framework that uses intrinsic model confidence to prioritize high-confidence samples and reduce action and problem variance. The authors prove asymptotic unbiasedness for its estimator and report that it outperforms verifier-dependent and verifier-independent baselines on math and general reasoning benchmarks, with and without verifiers.
#Reasoning#Alignment#Benchmarking#VI-CuRL
why featured
HKR-K and HKR-R pass: verifier-free RL reasoning targets a real training-cost pain point and names a confidence-guided curriculum mechanism. HKR-H is weak, and the post gives no metric gains, so this stays in the normal research band.
editor take
VI-CuRL uses intrinsic confidence for verifier-free RL curricula; only the abstract is shown, no scores, so don’t buy the verifier-beating claim yet.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R1
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Uncovering the Latent Potential of Deep Intermediate Representations
The paper introduces LOES and GeoReg to select task-discriminative layers across multiple architectures, modalities, depths, and data regimes; the abstract does not disclose specific models, datasets, or numerical gains.
#Embedding#Fine-tuning#Interpretability#Research release
why featured
HKR-K passes via LOES, GeoReg, and a testable layer-selection mechanism across architectures and modalities. HKR-H/R are weak, and the abstract gives no models, datasets, or gains, so it stays in the lower research-release band.
editor take
LOES picks discriminative layers spectrally, GeoReg constrains class geometry; no models or gains disclosed, so treat it as a hypothesis.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
SeedER: Seed-and-Expand Retrieval from Knowledge Graphs
SeedER seeds core KG nodes with lightweight dense and entity-based retrieval, then expands them with a reinforcement-learned graph-aware policy; the abstract does not disclose recall numbers, candidate-set sizes, datasets, or runtime costs.
#RAG#Reasoning#Embedding#SeedER
why featured
HKR-K passes: SeedER’s seed-then-RL-expand retrieval flow gives RAG/KG readers a concrete mechanism. HKR-H and HKR-R miss because no recall numbers, candidate scale, datasets, or deployment stakes are disclosed.
editor take
SeedER splits KG retrieval into seeding plus RL expansion; I buy the route, but recall, candidate size, and datasets are undisclosed.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination
Dream-MPC optimizes a few policy-rolled trajectories with gradient ascent through a learned world model, reuses previously optimized actions over time, and outperforms gradient-free MPC and state-of-the-art baselines on 24 continuous control tasks.
#Robotics#Reasoning#Dream-MPC#Research release
why featured
HKR-K passes via the 24-task setup and gradient-based MPC mechanism. HKR-H/R are weak, and latent-control MPC is niche for general AI practitioners, so this stays in the low-60s.
editor take
Dream-MPC wins across 24 continuous-control tasks; gradient planning looks alive again, but real-robot latency is undisclosed.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
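Gradient-based planning through a model, as described in the card above, can be shown on a toy problem. A minimal sketch, assuming a hand-written 1-D model `x' = x + a` with an analytic gradient in place of a learned world model, and a zero plan in place of policy rollouts; every name and constant here is illustrative:

```python
def rollout_return(x0, actions, goal, cost=0.1):
    """Sum of rewards along a rollout: stay near the goal, keep actions small."""
    x, total = x0, 0.0
    for a in actions:
        x = x + a
        total -= (x - goal) ** 2 + cost * a * a
    return total

def return_gradient(x0, actions, goal, cost=0.1):
    """Analytic gradient of rollout_return with respect to each action."""
    states, x = [], x0
    for a in actions:
        x = x + a
        states.append(x)
    grads, tail = [], 0.0
    for k in reversed(range(len(actions))):
        tail += -2.0 * (states[k] - goal)  # action k moves every later state
        grads.append(tail - 2.0 * cost * actions[k])
    return grads[::-1]

def plan(x0, init_actions, goal, steps=200, lr=0.05):
    """Gradient ascent on the action sequence."""
    actions = list(init_actions)
    for _ in range(steps):
        grads = return_gradient(x0, actions, goal)
        actions = [a + lr * g for a, g in zip(actions, grads)]
    return actions

planned = plan(x0=0.0, init_actions=[0.0, 0.0, 0.0], goal=1.0)
# Reuse over time: drop the executed action and pad, then refine at the next step.
warm_start = planned[1:] + [0.0]
```

The warm start is the part that makes a handful of gradient steps per control cycle plausible, since each replan begins close to the previous optimum.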
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Hinge Regression Trees and HRT-Boost: Newton-Optimized Oblique Learning for Compact Tabular Models
The paper introduces HRT and HRT-Boost, reformulating oblique splits as nonlinear least squares over two linear predictors, with an O(δ²) approximation rate, an empirical risk reduction guarantee under squared loss, benchmark comparisons, and public code at the GitHub repository disclosed in the abstract.
#Benchmarking#Code#Hongyi Li#Research release
why featured
HKR-K is solid: a new algorithm, guarantees, and code. HKR-H/R are weak because compact tabular-model optimization is narrow and not an industry conversation driver, so this stays in all.
editor take
HRT-Boost claims O(δ²) approximation and squared-loss risk descent; I’d trust it after node-count wins over CatBoost.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Building a Privacy-Preserving Federated Recommender System for Mobile Devices
The paper presents a two-stage federated recommender pipeline: the cloud ranks candidates from non-sensitive app-context data, the device re-ranks them with sensitive mobile signals, and only updates or gradients leave the device, with validation on three datasets.
#Fine-tuning#MovieLens#UCI#Research release
why featured
HKR-K passes: the two-stage federated recommender design and 3-dataset validation add concrete information. HKR-H and HKR-R are weak, so it stays in the lower all band.
editor take
The paper validates on 3 datasets; the Kotlin library is practical, but accuracy, latency, and privacy budget are undisclosed.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
B-GRTO: Bootstrapped Group Relative Tool Optimization for Referring Segmentation
B-GRTO reuses GRPO rollouts to train a segmentation decoder alongside the policy. Across three referring segmentation settings, it improves over plain GRPO and matches or exceeds domain-specific state-of-the-art methods.
#Vision#Reasoning#Tools#Research release
why featured
HKR-K passes on a concrete training mechanism and 3 referring-segmentation settings. HKR-H/R are weak, and the niche vision-training focus limits general-practitioner relevance, so it stays in the upper low-value band.
editor take
B-GRTO reuses GRPO rollouts for the segmentation decoder across 3 referring-segmentation settings; scores aren’t disclosed, but tool gradients inside RL are practical.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Curriculum Reinforcement Learning with Measurable Task Representation Learning
The paper proposes a curriculum reinforcement learning method that uses a variational autoencoder to encode rewards and state transitions into a measurable latent task space. The method generates tasks increasingly similar to the target task and reports stronger results than interpolation-based and GAN-based CRL baselines on challenging navigation tasks.
#Agent#Benchmarking#Research release#Benchmark
why featured
HKR-K passes: the abstract gives a concrete VAE mechanism and automatic curriculum generation in navigation tasks. HKR-H/R are weak, so this stays as a niche RL research item below featured.
editor take
VAE encodes rewards and transitions for curricula; I buy the direction, but distance fidelity beyond navigation is undisclosed.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Contrast to Detect: Dynamic Graph Contrastive Regularization for Unsupervised Anomaly Detection in Multivariate Time Series
ContrastAD reports the highest mean F1 across five real-world multivariate time-series benchmarks and the top AUC on three datasets: SWaT 93.60, SMD 98.66, and PSM 97.79.
#Benchmarking#ContrastAD#Research release#Benchmark
why featured
HKR-K passes on concrete benchmark claims and a named mechanism. HKR-H/R are weak: this is a narrow research paper with no product, code, or production-replacement evidence, so it stays below the 60 band.
editor take
ContrastAD leads mean F1 on 5 MTS benchmarks; I want thresholding details and DTW batch-graph cost, undisclosed here.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
A Simple Plug-in for Improving Eviction-Based KV Cache Compression
VECTOR adds three-way token routing to eviction-based KV cache compression: retention, approximation, and eviction; the abstract reports better quality-memory trade-offs under medium-to-high compression, but the RSS snippet does not disclose model names, datasets, or numerical gains.
#Inference-opt#VECTOR#Research release
why featured
HKR-K/R pass: the routing mechanism matters for KV-cache compression and inference cost. Missing model names, compression ratios, and metrics keep it below featured despite practical relevance.
editor take
VECTOR adds retain/approximate/evict routing, but the snippet gives no models or numbers; treat it as a KV-cache eviction patch for now.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R1
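For the VECTOR item above, the retain/approximate/evict split can be pictured with a minimal sketch. Everything specific here is an assumption, since the snippet discloses no details: the per-token importance `scores`, the two budgets, and the rank-based rule are hypothetical stand-ins, not the paper's routing criterion, and the sketch does not model how approximated tokens are actually stored.

```python
def route_tokens(scores, keep_budget, approx_budget):
    """Split token indices into three groups by descending importance score.

    Hypothetical rule (not from the paper): the top `keep_budget` tokens are
    retained exactly, the next `approx_budget` are marked for approximation,
    and everything else is evicted.
    """
    if keep_budget < 0 or approx_budget < 0:
        raise ValueError("budgets must be non-negative")
    # Rank token positions from most to least important.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    cut = keep_budget + approx_budget
    return {
        "retain": sorted(order[:keep_budget]),
        "approximate": sorted(order[keep_budget:cut]),
        "evict": sorted(order[cut:]),
    }


routes = route_tokens([0.9, 0.1, 0.5, 0.7, 0.05], keep_budget=2, approx_budget=2)
```

The point of the middle tier is that a token too weak to keep exactly is not necessarily worthless; plain eviction would drop positions 1 and 2 here outright.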
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
RADAR: Relative Angular Divergence Across Representations
RADAR estimates cross-domain transferability by measuring angular alignment and distance changes along layer-to-layer representation trajectories, and the paper evaluates it against existing transferability metrics on multiple text embedding and foundation vision benchmarks.
#Embedding#Vision#Benchmarking#Research release
why featured
HKR-K passes via a concrete transferability metric tested on text embeddings and vision models. HKR-H/R are weak, and the work is niche representation analysis rather than broad practitioner news, so it stays in the 40–59 band.
editor take
RADAR scores transfer via layerwise geometry, but no benchmark numbers are disclosed; I buy the angle, not the smooth-domain caveat.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Advanced AI Service Provisioning in O-RAN through LLM Engine Integration
The paper presents a Dual-Brain architecture for O-RAN: an LLM orchestrator turns operator intents into data-collection policies and deployment code, while NeuralSmith trains lightweight classifiers on demand through an API, with the provisioning workflow tested in a containerized O-RAN 5G SA testbed.
#Agent#Code#Tools#O-RAN
why featured
HKR-K passes through a concrete Dual-Brain mechanism and testbed; HKR-H/R miss. The O-RAN 5G specialty barrier limits relevance for general AI practitioners, so it stays in the lower research-signal band.
editor take
Dual-Brain runs provisioning in a containerized O-RAN 5G SA testbed; I buy the split, but latency and isolation are undisclosed.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
CALAD: Channel-Aware Contrastive Learning for Multivariate Time Series Anomaly Detection
CALAD uses reconstruction errors from a transformer-based autoencoder to estimate channel relevance, then builds positive and negative samples by preserving or perturbing anomaly-relevant channels; the paper reports stronger results than existing methods on multiple real-world datasets, especially under distribution shift.
#Embedding#Benchmarking#CALAD#Research release
why featured
HKR-K passes for a concrete mechanism and evaluation setting. HKR-H/R are weak: this is a niche time-series anomaly-detection paper with no product, agent, or foundation-model impact, so it stays in the low browseable band.
editor take
CALAD selects channels via reconstruction error; dataset counts are undisclosed. I buy the bias, not the distribution-shift claim yet.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Decoupling Spatio-Temporal Adapter for Fine-Grained Badminton Action Localization
The paper introduces the Fine-Badminton dataset and DSTA for badminton temporal action localization, covering 31 matches, 29 stroke classes, 2,104 rallies, and 27,597 annotated actions.
#Vision#Fine-tuning#Benchmarking#Research release
why featured
HKR-K passes with concrete dataset scale and labels. HKR-H/R are weak: this is a narrow vision benchmark with no product, agent, or foundation-model impact, so it fits the 40–59 browseable band.
editor take
Fine-Badminton labels 27,597 actions; I buy the dataset, while DSTA’s SOTA margin is undisclosed.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
MARS: Magnitude-Aware Rank Statistics
The paper proposes MARS, a magnitude-aware rank statistic that weights discrete ranks with a relative margin coefficient; it targets magnitude-blindness in Critical Difference diagrams by scaling ranks using the distance between the best and worst performers.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes for a concrete benchmarking-statistics mechanism, but HKR-H/R are weak. The post discloses only the method summary, with no experiment scale or industry implication, so it stays in the lower band.
editor take
MARS reweights CD ranks by best-worst gaps; I buy the flaw, not the “more realistic” claim without reported experiments.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
World Machine: Towards Generative World Modeling for Time-Series
World Machine proposes a transformer-based time-series world-modeling architecture with latent states and validates it on the synthetic Toy1D dataset; the abstract says it adapts to different observed data amounts and contexts, but the post does not disclose concrete metrics.
#Reasoning#Benchmarking#World Machine#Research release
why featured
HKR-K passes via the latent-state transformer and Toy1D setup; HKR-H and HKR-R are weak. No metrics or production setting are disclosed, so this stays in the lower research-signal band.
editor take
World Machine only reports Toy1D validation, with no metrics disclosed; the world-modeling pitch is big, but this reads like a sketch.
HKR breakdown
hook knowledge resonance
open source
50
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Enhancing Deep Neural Network Reliability with Refinement and Calibration
RefCal jointly optimizes calibration, refinement, and accuracy, reaching 58.81 accuracy, 95.67 refinement, and 0.08 ECE on CIFAR-100-LT with 10 percent class imbalance, compared with Correctness Ranking Loss at 46.27 accuracy, 93.7 refinement, and 0.22 ECE.
#Alignment#Safety#Benchmarking#Ramya Hebbalaguppe
why featured
HKR-K passes because the paper gives a method and test numbers; HKR-H and HKR-R fail because the framing is a narrow academic benchmark. No hard exclusion, but audience fit is limited.
editor take
RefCal hits 58.81 accuracy on 10% imbalanced CIFAR-100-LT; chasing low ECE alone should be retired.
HKR breakdown
hook knowledge resonance
open source
49
SCORE
H0·K1·R0
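For readers weighing the ECE figures in the RefCal item above, here is a minimal sketch of the standard equal-width-bin Expected Calibration Error. It assumes each prediction comes with a confidence in [0, 1] and a correct/incorrect flag; the bin count of 10 is an assumption, and the post does not say which ECE variant or scale the paper uses.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - mean confidence| over equal-width bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bins are [k/n_bins, (k+1)/n_bins); a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin contributes in proportion to the share of samples it holds.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, True, True, False])
```

ECE only measures the gap between confidence and accuracy, so a model can score well on it while still being inaccurate, which is why the accuracy and refinement numbers are reported alongside it.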
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Shallow ReLU^s Networks in L^p-Type and Sobolev Spaces: Approximation and Generalization
The paper analyzes shallow ReLU^s networks in L^p-type integral and Sobolev spaces, deriving approximation bounds via spherical harmonics and path-norm-regularized nonparametric regression rates including O(n^(-(d+2s+1)/(2d+2s+1)) log n) over B_s and O(n^(-2α/(2α+d)) log n) over W^{α,∞}.
#Reasoning#Benchmarking#arXiv#Research release
why featured
hard-exclusion-1 applies: the paper needs approximation theory, Sobolev spaces, and path-norm background with no generalist on-ramp. HKR-K passes on the stated rate, but accessibility caps it below 40.
editor take
Shallow ReLU^s gets L^p approximation O(m^(-1/p)). Useful theory; ℓ1 path-norm control is not an architecture trigger.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
When One Point Is Not Enough: Addressing Ambiguous Instances in Dimensionality Reduction by Splitting
The paper introduces a graph-based method that detects ambiguous instances in dimensionality reduction and replicates each instance as multiple projected points, with each copy placed in its corresponding neighborhood. The authors report UMAP-based experiments and quantitative analyses showing reduced partial neighborhood embedding, while stating the approach generalizes to other local graph-based dimensionality-reduction techniques.
#Embedding#Benchmarking#Research release
why featured
HKR-H and HKR-K pass, but this is a niche dimensionality-reduction visualization paper with no agent, product, or deployment angle. The body gives a method and UMAP result, not industry impact.
editor take
The paper splits ambiguous samples into multiple UMAP points; I buy the diagnosis, but copied points turn the map into an interpretation layer.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H1·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
X-TRACK: Physics-Aware xLSTM for Realistic Vehicle Trajectory Prediction
X-TRACK integrates vehicle kinematic constraints into xLSTM-based trajectory prediction and evaluates on two highway datasets, highD and NGSIM; the abstract says it beats state-of-the-art baselines on highD but does not disclose error metrics.
#Robotics#Benchmarking#X-TRACK#highD
why featured
HKR-K passes on a concrete mechanism and two datasets, but no error numbers are disclosed. HKR-H and HKR-R are weak, so this stays in all below the featured threshold.
editor take
X-TRACK reports highD and NGSIM only, with no error numbers disclosed; physics constraints sound sane, but don’t call this a driving breakthrough.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
CBANet: A Compact Attention-Based CNN-BiLSTM Network for Aggressive Driving Event Detection
CBANet detects aggressive driving events with a CNN-BiLSTM architecture, engineered vehicle-dynamics features, SMOTE-based oversampling, class-weighted loss, and class-specific threshold calibration; the paper reports higher minority-class recall and safety-critical F-score on a newly collected naturalistic driving dataset, but the RSS snippet does not disclose dataset size or metric values.
#Benchmarking#CBANet#Research release#Open source
why featured
This is an incremental applied ML paper: HKR-K passes on concrete mechanisms and dataset conditions, while HKR-H/R are weak. No hard exclusion applies, so it sits in the 40–59 low-value band.
editor take
CBANet claims better minority recall, but RSS gives no dataset size or scores; SMOTE plus threshold tuning needs harder evidence.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Interactive Query Answering on Knowledge Graphs with Soft Entity Constraints
The paper introduces query answering with soft constraints on incomplete knowledge graphs and proposes two lightweight methods; the methods tune only two parameters or train a small neural network, while the RSS abstract does not disclose specific benchmark scores.
#RAG#Reasoning#Research release#Benchmark
why featured
HKR-K passes on a new task and lightweight mechanisms. HKR-H/R are weak, and benchmark scores are not disclosed, leaving limited practical signal for AI practitioners.
editor take
Soft constraints enter KG QA with just two tuned parameters; without benchmark scores, don’t sell it as a RAG reasoning leap.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
Cascaded Transfer: Learning Many Tasks under Budget Constraints
The paper proposes Cascaded Transfer Learning, which cascades model parameters through a rooted task tree under a global training budget, and evaluates it on synthetic and real many-task settings, including time-series forecasting and image classification, against alternative approaches.
#Fine-tuning#Benchmarking#Research release
why featured
HKR-K lands because the paper names a concrete method: tree-structured parameter cascading under a global budget. HKR-H and HKR-R miss: no surprising result, no savings number, no product path; score stays in low all.
editor take
CTL routes fine-tuning through a task tree under one budget; no benchmark numbers disclosed, so treat it as scheduling work.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
GP2F: Cross-Domain Graph Prompting with Adaptive Fusion of Pre-trained Graph Neural Networks
GP2F proposes a dual-branch cross-domain graph prompting method: one frozen branch preserves pre-trained knowledge, one adapted branch uses lightweight adapters for task adaptation, and fusion is trained with contrastive and topology-consistent losses.
#Fine-tuning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes for a concrete cross-domain graph-prompting mechanism, but HKR-H/R fail. This is niche GNN research with no product, agent, or industry-deployment hook, so it stays in the low-value all band.
editor take
GP2F uses dual-branch cross-domain graph prompting, but datasets and gains are undisclosed; honestly, beating FT/LP is just table stakes.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
15d ago
arXiv · cs.LG· atomEN04:00 · 05·25
PaP-NF: Probabilistic Long-Term Time Series Forecasting via Prefix-as-Prompt Reprogramming and Normalizing Flows
PaP-NF aligns continuous time-series representations with a frozen LLM via Prefix-as-Prompt, then conditions a normalizing-flow decoder on LLM global context and evaluates predictive distributions with CRPS across multiple long-term forecasting benchmarks.
#Reasoning#Benchmarking#PaP-NF#Research release
why featured
HKR-K passes on the concrete method and CRPS setup; HKR-H/R are weak, and no benchmark numbers or release details are disclosed. This is a narrow time-series paper, so it sits in the low-value upper band.
editor take
PaP-NF freezes an LLM and adds flows, scored by CRPS; no model names or numbers, so don’t buy “LLMs understand time series” yet.
HKR breakdown
hook knowledge resonance
open source
42
SCORE
H0·K1·R0
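The PaP-NF item above scores predictive distributions with CRPS. A minimal sketch of the usual sample-based estimator follows, assuming the forecast is available as a list of draws for one scalar target; the post does not say how the paper computes CRPS, so this is the textbook form rather than the authors' evaluation code.

```python
def crps_from_samples(samples, observed):
    """Sample estimate of CRPS = E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    m = len(samples)
    if m == 0:
        raise ValueError("need at least one forecast sample")
    # Average distance between forecast draws and the observed value.
    term_obs = sum(abs(x - observed) for x in samples) / m
    # Average distance between pairs of draws, which rewards sharp forecasts.
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return term_obs - 0.5 * term_spread


point_like = crps_from_samples([2.0, 2.0, 2.0], observed=5.0)
spread_out = crps_from_samples([0.0, 2.0], observed=1.0)
```

When every draw is identical the spread term vanishes and CRPS reduces to absolute error, which is what makes it comparable across point and probabilistic forecasters.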
02:28
15d ago
HuggingFace Papers (takara mirror)· rssEN02:28 · 05·25
Learning to Route Languages for Multilingual Policy Optimization
LRPO treats language as a selectable variable, generates multilingual rollouts for each training question, and uses a trainable multi-armed bandit router to choose languages under a fixed rollout budget.
#Fine-tuning#Alignment#Reasoning#Research release
why featured
HKR-K passes with a concrete LRPO mechanism for language routing in multilingual policy optimization. HKR-H and HKR-R are weak: the angle is academic and narrow, so it stays in all below featured.
editor take
LRPO routes language inside RL; gains aren’t disclosed, but bandit selection under a fixed rollout budget beats hard-coded English supervision.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
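The LRPO item above describes a bandit router choosing languages under a fixed rollout budget. A minimal sketch of that idea, with UCB1 swapped in as the bandit rule: the post does not name the algorithm or the reward signal, so `LanguageRouter`, `allocate_rollouts`, and the fixed per-language rewards are all hypothetical.

```python
import math


class LanguageRouter:
    """UCB1 bandit over languages (an assumed stand-in for the paper's router)."""

    def __init__(self, languages):
        self.languages = list(languages)
        self.counts = {lang: 0 for lang in self.languages}
        self.mean_reward = {lang: 0.0 for lang in self.languages}

    def select(self):
        # Pull every arm once before trusting the confidence bounds.
        for lang in self.languages:
            if self.counts[lang] == 0:
                return lang
        total = sum(self.counts.values())

        def upper_bound(lang):
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[lang])
            return self.mean_reward[lang] + bonus

        return max(self.languages, key=upper_bound)

    def update(self, lang, reward):
        # Incremental running mean of the observed rewards for this language.
        self.counts[lang] += 1
        self.mean_reward[lang] += (reward - self.mean_reward[lang]) / self.counts[lang]


def allocate_rollouts(router, budget, reward_fn):
    """Spend exactly `budget` rollouts, one routed language at a time."""
    spent = {lang: 0 for lang in router.languages}
    for _ in range(budget):
        lang = router.select()
        router.update(lang, reward_fn(lang))
        spent[lang] += 1
    return spent


# Toy rewards: pretend rollouts in "de" help most on this question.
toy_reward = {"en": 0.2, "de": 0.9, "sw": 0.1}
spent = allocate_rollouts(LanguageRouter(["en", "de", "sw"]), 30, toy_reward.get)
```

The fixed budget is what makes routing matter: every rollout spent probing a weak language is one not spent on the language that is actually paying off.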
01:57
15d ago
HuggingFace Papers (takara mirror)· rssEN01:57 · 05·25
MATO: Multi-objective Personalized Alignment with Test-time Optimization for Large Language Models
MATO formulates personalized alignment as test-time optimization, using controllable weights during decoding to adjust multiple objectives without changing model parameters or requiring external reward models.
#Alignment#Inference-opt#MATO#Research release
why featured
HKR-K/R pass: the mechanism is concrete and relevant to personalization and inference control. No reported metrics, model scale, or reproducible setup are disclosed, so it stays in the 60–71 band.
editor take
MATO tunes objective weights at decoding, with no finetune or reward model; compute cost is undisclosed, so steerability isn’t free.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
