ax@ax-radar:~/papers $ grep -E 'arxiv|paper' sources/tags
45 src · signal 72% · cycle 04:32

papers · 2026-06-04

239 papers · updated 3m ago
2026-06-04 · Thu
17:59
4d ago
arXiv · cs.AI · atom · EN · 17:59 · 06·04
Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution
Code2LoRA uses a hypernetwork to generate repository-specific LoRA adapters for 604 Python repositories, reaching 63.8% cross-repo exact match on the static track and 60.3% on the evolution track with GRU state updates per code diff.
#Code#Fine-tuning#RAG#Code2LoRA
why featured
HKR-K is strong with a clear mechanism and numbers; HKR-R lands for code-model maintenance under repo evolution. It remains an arXiv research/benchmark item without major-tool adoption, so it fits the 60–71 band.
editor take
Code2LoRA hits 63.8% cross-repo EM on 604 Python repos; I buy it as an adapter factory, not a RAG replacement.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
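A minimal sketch of the hypernetwork-to-LoRA idea, to make "repository-specific adapters" concrete. It assumes a repository is summarized by a small embedding vector and that the hypernetwork is a single fixed random linear map (both hypothetical; Code2LoRA's real hypernetwork and its GRU state updates per diff are not shown). The frozen base weight is shared, and only the generated low-rank factors change per repo.

```python
import random


def matvec(M, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def hypernetwork(repo_emb, d_in, d_out, rank, seed=0):
    """Hypothetical hypernetwork: one fixed random linear map from a repository
    embedding to the flattened LoRA factors A (rank x d_in) and B (d_out x rank)."""
    rng = random.Random(seed)
    n_a, n_b = rank * d_in, d_out * rank
    H = [[rng.gauss(0.0, 0.1) for _ in repo_emb] for _ in range(n_a + n_b)]
    flat = matvec(H, repo_emb)
    A = [flat[i * d_in:(i + 1) * d_in] for i in range(rank)]
    B = [flat[n_a + j * rank:n_a + (j + 1) * rank] for j in range(d_out)]
    return A, B


def lora_forward(W, A, B, x, alpha=1.0):
    """y = W x + (alpha / rank) * B (A x): frozen base weight plus a low-rank delta."""
    rank = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / rank) * d for b, d in zip(base, delta)]


# Two repositories get different adapters from the same hypernetwork.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x = [1.0, 2.0, 3.0]
for repo_emb in ([1.0, -0.5], [0.2, 0.9]):
    A, B = hypernetwork(repo_emb, d_in=3, d_out=3, rank=2)
    print(lora_forward(W, A, B, x))
```

A zero repository embedding yields a zero adapter, so the layer falls back to the base model; that is one reason an "adapter factory" composes cleanly with an unchanged backbone.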
17:59
4d ago
arXiv · cs.AI · atom · EN · 17:59 · 06·04
TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies
TempoVLA controls a single VLA policy with an explicit speed condition, while VSTA re-times demonstrations by merging or splitting actions; experiments in simulation and real-world tasks show bidirectional speed control and improved default 1× performance.
#Robotics#Vision#Multimodal#TempoVLA
why featured
HKR-H/K pass: TempoVLA offers speed-conditioned control and a VSTA retiming mechanism across sim and real tasks. As a single robotics arXiv paper with limited entity pull and sparse reproducibility detail, it stays in all.
editor take
TempoVLA conditions one VLA on speed, but task counts and success rates are undisclosed; I buy the problem, not the evidence yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
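A rough sketch of what "re-timing demonstrations by merging or splitting actions" can mean, assuming actions are per-step deltas and speed factors are integers (both assumptions; VSTA's actual merge/split rules are not disclosed in the post). Merging sums consecutive deltas to go faster; splitting divides each delta into equal sub-steps to go slower, and both keep the total displacement unchanged.

```python
def speed_up(actions, k):
    """Merge every k consecutive delta actions into one by summing them, so the
    same displacement is covered in about 1/k as many control steps."""
    merged = []
    for i in range(0, len(actions), k):
        chunk = actions[i:i + k]
        merged.append([sum(dim) for dim in zip(*chunk)])
    return merged


def slow_down(actions, k):
    """Split each delta action into k equal sub-steps (k times as many steps)."""
    return [[a / k for a in act] for act in actions for _ in range(k)]


def total_displacement(actions):
    """Sum of all delta actions, per dimension."""
    return [sum(dim) for dim in zip(*actions)]


demo = [[1.0, 0.0], [1.0, 2.0], [0.0, 2.0], [2.0, 0.0]]
fast = speed_up(demo, 2)   # 2 steps instead of 4
slow = slow_down(demo, 2)  # 8 steps instead of 4
print(fast, len(slow), total_displacement(slow))
```

A speed-conditioned policy would then be trained on such re-timed copies with the factor (here `k`) fed in as the speed condition.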
17:58
4d ago
arXiv · cs.AI · atom · EN · 17:58 · 06·04
Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection
OpAI-Bench constructs nine sequential revisions per human-written sample across five AI edit operations and four domains, preserving authorship provenance at document, sentence, token, and span levels for evaluating 8 document detectors, 7 sentence detectors, and 2 fine-grained detectors.
#Benchmarking#VILA-Lab#OpAI-Bench#Research release
why featured
HKR-K is solid because the benchmark has concrete structure; HKR-R applies through AI-text detection and provenance pressure. HKR-H is weak, and this is a single arXiv benchmark without adoption or cross-source pull.
editor take
OpAI-Bench makes 9 AI revision steps per human text; mixed-authorship middle states are where detector benchmarks break.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
17:57
4d ago
arXiv · cs.AI · atom · EN · 17:57 · 06·04
Pretraining Recurrent Networks without Recurrence
The paper proposes Supervised Memory Training for nonlinear RNNs, reducing training to supervised one-step memory transition labels and using a Transformer encoder to obtain them, with an O(1) gradient path between any two tokens.
#Memory#Reasoning#Inference-opt#Research release
why featured
HKR-H comes from the paradox title, and HKR-K from SMT plus an O(1) gradient path. No benchmark, code, or measured Transformer replacement value is disclosed, so this stays in the all band.
editor take
SMT turns RNN training into one-step memory supervision with O(1) gradients; the catch is its Transformer labeler may eat the savings.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
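A toy sketch of why one-step memory supervision removes backpropagation through time. It assumes a scalar tanh RNN and uses a known "teacher" RNN as a stand-in labeler (the paper uses a Transformer encoder to produce the labels; that part is not reproduced). Each loss term feeds the label h*_{t-1} rather than the model's own previous state, so every gradient path is one step long and the terms can be computed in parallel.

```python
import math
import random


def step(params, h_prev, x):
    """Scalar nonlinear RNN cell: h_t = tanh(w * h_{t-1} + u * x_t + b)."""
    w, u, b = params
    return math.tanh(w * h_prev + u * x + b)


def smt_loss_and_grad(params, labels, inputs):
    """One-step supervised loss. Each term feeds the *label* h*_{t-1} into the
    cell instead of the model's own previous state, so no gradient flows
    through time and all T terms are independent (parallelizable)."""
    n = len(inputs)
    loss, gw, gu, gb = 0.0, 0.0, 0.0, 0.0
    for t in range(n):
        h_prev, x, target = labels[t], inputs[t], labels[t + 1]
        pred = step(params, h_prev, x)
        err = pred - target
        loss += err * err / n
        dpre = 2.0 * err * (1.0 - pred * pred) / n  # dLoss / d(pre-activation)
        gw += dpre * h_prev
        gu += dpre * x
        gb += dpre
    return loss, (gw, gu, gb)


def train(steps=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    inputs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    # Stand-in labeler: roll out a known "teacher" cell to produce memory labels.
    teacher = (0.5, 1.0, -0.2)
    labels = [0.0]
    for x in inputs:
        labels.append(step(teacher, labels[-1], x))
    params = (0.0, 0.0, 0.0)
    initial, _ = smt_loss_and_grad(params, labels, inputs)
    for _ in range(steps):
        _, grads = smt_loss_and_grad(params, labels, inputs)
        params = tuple(p - lr * g for p, g in zip(params, grads))
    final, _ = smt_loss_and_grad(params, labels, inputs)
    return initial, final, params


initial_loss, final_loss, learned = train()
print(initial_loss, final_loss, learned)
```

The catch flagged in the editor take still applies: in this sketch the labels are free, but in the paper someone has to run the labeler.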
17:56
4d ago
arXiv · cs.AI · atom · EN · 17:56 · 06·04
RREDCoT: Segment-Level Reward Redistribution for Reasoning Models
The paper introduces RREDCoT, which redistributes rewards at the CoT segment level and uses the model itself to approximate the optimal allocation without extra generation during training.
#Reasoning#Fine-tuning#Alignment#Research release
why featured
HKR-K passes: the paper offers a testable training mechanism, but the feed lacks benchmark gains, model scale, or a reproducible setup. It is narrow research with no hard-exclusion trigger, so it stays below featured.
editor take
RREDCoT pushes CoT rewards to segments without extra train-time generation; if variance drops cleanly, GRPO patches get copied fast.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
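A minimal sketch of segment-level reward redistribution, assuming each CoT segment already has a credit score (hypothetical input; RREDCoT has the model itself approximate the optimal allocation, which is not shown). The outcome reward is split by a softmax over those scores, so the total reward is preserved and only its placement across segments changes.

```python
import math


def redistribute(outcome_reward, segment_scores, temperature=1.0):
    """Split one sequence-level reward across CoT segments in proportion to
    softmax(scores / temperature). The per-segment rewards sum to the original,
    so the total learning signal is unchanged; only its placement moves."""
    m = max(segment_scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in segment_scores]
    z = sum(exps)
    return [outcome_reward * e / z for e in exps]


# Hypothetical per-segment credit scores for a 3-segment chain of thought.
print(redistribute(1.0, [0.0, 1.0, 2.0]))
```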
17:55
4d ago
arXiv · cs.AI · atom · EN · 17:55 · 06·04
PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training
The paper proposes PC Layer, a low-degree polynomial weight preconditioner that reshapes singular-value spectra during LLM pre-training, reports gains over standard Transformers in Llama-1B runs with AdamW and Muon, and merges the trained weights back into the original architecture with no inference overhead.
#Inference-opt#Llama#Research release#Open source
why featured
HKR-K/R pass: PC Layer has a concrete mechanism, and “merge after training with no inference overhead” maps to training-cost concerns. HKR-H is weak; no perplexity, token-cost, or wall-clock gains are disclosed, so it stays in 60–71.
editor take
PC Layer is tested in Llama-1B pretraining with AdamW and Muon; zero inference cost is nice, but the size of the gains is undisclosed, so no free lunch yet.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
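A small sketch of how a low-degree polynomial in W reshapes the singular-value spectrum and can be merged back into a plain weight. It assumes an odd cubic p(W) = a·W + b·W Wᵀ W with hand-picked coefficients (hypothetical; the paper's degree, coefficients, and training-time parameterization are not disclosed). Because W Wᵀ W = U Σ³ Vᵀ, the singular vectors stay fixed and each singular value s maps to a·s + b·s³.

```python
import math


def matmul(X, Y):
    """Plain matrix product for lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def poly_precondition(W, a, b):
    """Odd cubic in W: a*W + b*W W^T W. It keeps W's singular vectors and maps
    each singular value s to a*s + b*s**3."""
    WWtW = matmul(matmul(W, transpose(W)), W)
    return [[a * w + b * c for w, c in zip(rw, rc)] for rw, rc in zip(W, WWtW)]


# W = R @ diag(2.0, 0.5): singular values 2.0 and 0.5, rotated by R.
theta = 0.3
R = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
W = matmul(R, [[2.0, 0.0], [0.0, 0.5]])
W_merged = poly_precondition(W, a=1.0, b=-0.1)  # singular values -> 1.2, 0.4875
print(W_merged)
```

After training, only `W_merged` would be stored, so the deployed layer is an ordinary matrix multiply with no extra inference cost.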
17:53
4d ago
HuggingFace Papers (takara mirror) · rss · EN · 17:53 · 06·04
Research paper compares active exploration abilities of human adults and large language models
The paper compares adult participants with multiple large language models on a modified blicket detector task, where learners actively intervene under conjunctive or disjunctive causal rules. Active exploration improves adults’ conjunctive causal reasoning, although conjunctive rules still require more tests. Some state-of-the-art models approach human inference accuracy but use less efficient exploration strategies.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-H/K/R all pass, but this is a single cognitive-science-style LLM benchmark with no model list, sample size, or reproducibility detail disclosed; it stays below featured.
editor take
The paper tests adults and multiple LLMs on active blicket tasks; sample sizes are undisclosed. Human-like accuracy hides wasteful exploration.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
17:44
4d ago
HuggingFace Papers (takara mirror) · rss · EN · 17:44 · 06·04
NF-CoT Enables Latent Reasoning with Normalizing Flows
NF-CoT inserts a TARFlow-style normalizing flow into the LLM backbone, replacing explicit CoT with continuous thoughts while preserving left-to-right sampling, KV-cache decoding, and exact likelihood estimation.
#Reasoning#Code#Inference-opt#Research release
why featured
HKR-H/K pass: the mechanism is novel and targets CoT replacement. HKR-R fails because the post gives abstract-level detail only, with no gains, code, or reproducible setup, so it stays in the 60–71 band.
editor take
NF-CoT keeps KV cache and exact likelihood; that beats vague latent-thought claims, but no pass-rate numbers are disclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
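A minimal sketch of why an autoregressive affine flow keeps exact likelihood and left-to-right sampling. It assumes a 1-D sequence and hand-written conditioners (`shift_fn`, `log_scale_fn`) that see only the prefix x_<t; in a TARFlow-style model these would come from a causal Transformer, which is not shown. The change-of-variables formula gives log p(x) = Σ_t [log N(z_t; 0, 1) − s_t].

```python
import math
from statistics import NormalDist

LOG_2PI = math.log(2 * math.pi)


def affine_ar_flow_logprob(x, shift_fn, log_scale_fn):
    """Exact log-likelihood of sequence x under an affine autoregressive flow:
    z_t = (x_t - mu_t) * exp(-s_t), with mu_t and s_t computed from x_<t only.
    log p(x) = sum_t [log N(z_t; 0, 1) - s_t] by change of variables."""
    total = 0.0
    for t in range(len(x)):
        prefix = x[:t]
        mu, s = shift_fn(prefix), log_scale_fn(prefix)
        z = (x[t] - mu) * math.exp(-s)
        total += -0.5 * (z * z + LOG_2PI) - s
    return total


def shift(prefix):
    """Hypothetical conditioner: mean depends on the previous value."""
    return 0.8 * prefix[-1] if prefix else 0.0


def log_scale(prefix):
    """Hypothetical conditioner: constant scale of 0.5."""
    return math.log(0.5)


x = [0.5, -1.0, 2.0]
flow_lp = affine_ar_flow_logprob(x, shift, log_scale)
# Cross-check: this flow is equivalent to x_t ~ N(0.8 * x_{t-1}, 0.5 ** 2).
direct = sum(math.log(NormalDist(shift(x[:t]), 0.5).pdf(x[t])) for t in range(len(x)))
print(flow_lp, direct)
```

Sampling runs the inverse left to right, x_t = mu_t + exp(s_t)·z_t, which is why prefix caching (the KV cache in a Transformer conditioner) still applies.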
17:42
4d ago
arXiv · cs.CL · atom · EN · 17:42 · 06·04
USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding
USAD 2.0 integrates knowledge from SSL and supervised foundation models using domain-aware distillation, extends coverage to music, adds second-stage supervised distillation for downstream use, and scales the encoder to one billion parameters through depth scaling. Experiments report strong or state-of-the-art results across probing and LLM-based evaluations, but the RSS snippet does not disclose datasets or exact benchmark scores.
#Audio#Embedding#Benchmarking#Research release
why featured
HKR-K passes on mechanism and 1B-parameter scale; HKR-H and HKR-R are weak. This is useful audio-understanding research, but lacks product impact, a major lab signal, or disclosed reproducible results, so it sits in 60–71.
editor take
USAD 2.0 scales its audio encoder to 1B parameters; no datasets or scores disclosed, so discount the SOTA claim.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
17:41
4d ago
arXiv · cs.CL · atom · EN · 17:41 · 06·04
Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions
The paper proposes counterfactual context revision to audit LLM-based stance simulation in online discussions, evaluating text-only and meme-based multimodal revisions with two metrics: average directional stance shift and stance transition rate.
#Multimodal#Benchmarking#Safety#Research release
why featured
HKR-K and HKR-R pass: the paper offers an audit mechanism and metrics for LLM stance simulation reliability. No experiment scale or headline result is disclosed, and HKR-H is weak, so it stays in the 60–71 all band.
editor take
Only two metrics are disclosed, with no model names or sample size; this tests prompt steerability, not user-belief simulation.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
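One plausible reading of the two audit metrics, assuming stances are coded as integers (−1 against, 0 neutral, +1 favor) before and after a context revision. These definitions are hypothetical; the paper's exact formulas are not disclosed in the post.

```python
def average_directional_shift(before, after):
    """Mean signed change in simulated stance after the context revision."""
    return sum(a - b for b, a in zip(before, after)) / len(before)


def transition_rate(before, after):
    """Fraction of simulated users whose stance label changed at all."""
    return sum(b != a for b, a in zip(before, after)) / len(before)


before = [-1, 0, 1, 1]
after = [1, 0, 0, 1]
print(average_directional_shift(before, after), transition_rate(before, after))
```

The two numbers can diverge: opposite flips cancel in the directional shift but both count toward the transition rate, which is why reporting both is useful.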
16:30
4d ago
HuggingFace Papers (takara mirror) · rss · EN · 16:30 · 06·04
An Infectious Disease Spread Simulation Based on Large Language Model Decision Making
The paper proposes a spatial agent-based simulation framework that uses LLM-generated decisions for self-reported influenza-like illness, compares three decision scenarios in San Francisco and Atlanta, and finds income and education dominate variation in reporting rates.
#Agent#Reasoning#Research release
why featured
HKR-H and HKR-K pass: the angle is fresh, and the post gives cities, scenarios, and a variable-level finding. It stays in all because it is an applied public-health simulation paper with no product, open-source artifact, or reproducibility detail.
editor take
Two cities and three scenarios are thin evidence; I don’t buy LLM agents as a substitute for behavioral data.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
15:41
4d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:41 · 06·04
Tangram: Non-Uniform KV Cache for Efficient Multi-turn LLM Serving
Tangram implements non-uniform KV cache serving with deterministic per-head budget allocation, Head Group Page management, and ahead-of-time load balancing, reporting up to 2.6x higher throughput than existing baselines while preserving model accuracy; the authors also released the implementation at the aiha-lab/TANGRAM GitHub repository.
#Inference-opt#Memory#aiha-lab#Research release
why featured
HKR-K/R pass: 2.6x throughput and concrete KV-cache mechanisms are useful for inference-cost work. HKR-H is weak, and the source/body detail is thin, so this stays in the high all band.
editor take
Tangram reports up to 2.6x throughput; static per-head budgets are clean, but multi-model serving will stress the scheduler first.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
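A sketch of what "deterministic per-head budget allocation" could look like, assuming each attention head has a positive importance score and the KV budget is counted in tokens (both assumptions; Tangram's actual scoring and Head Group Page layout are not described in the post). Largest-remainder rounding makes the split reproducible and guarantees it sums exactly to the budget.

```python
def allocate_kv_budget(head_scores, total_tokens, min_per_head=1):
    """Deterministically split a KV-token budget across heads in proportion to
    (hypothetical, positive) importance scores, using largest-remainder rounding."""
    n = len(head_scores)
    spare = total_tokens - min_per_head * n
    if spare < 0:
        raise ValueError("budget too small for the per-head minimum")
    total_score = sum(head_scores)
    quotas = [spare * s / total_score for s in head_scores]
    alloc = [min_per_head + int(q) for q in quotas]
    leftover = total_tokens - sum(alloc)
    # Hand out remaining tokens by largest fractional part; ties go to the lower index.
    order = sorted(range(n), key=lambda i: (-(quotas[i] - int(quotas[i])), i))
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


print(allocate_kv_budget([1.0, 1.0, 2.0], total_tokens=10))
```

Because the allocation depends only on the scores and budget, it can be computed ahead of time, which fits the post's "ahead-of-time load balancing" framing.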
14:47
4d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:47 · 06·04
Benchmarking Open-Source Layout Detection Models for Data Snapshot Extraction from Institutional Documents
The authors introduce a data snapshot extraction benchmark covering three institutional document types (humanitarian reports, World Bank policy research working papers, and project appraisal documents), and they release source PDFs, annotations, metadata, and code for evaluating open-source layout detection models.
#Vision#Benchmarking#World Bank#Hugging Face
why featured
HKR-K is clear: a new open benchmark with artifacts. HKR-R applies for document extraction and RAG practitioners, but HKR-H is weak and the niche scope keeps it in all, not featured.
editor take
The benchmark spans 3 institutional document types, including World Bank papers; I like the dirty layout work, which is closer to real RAG than academic-PDF scores.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
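Layout detection benchmarks are usually scored by matching predicted and annotated boxes on intersection-over-union. The post does not say which metric this benchmark uses, so the sketch below only shows the standard IoU building block, assuming boxes are given as (x0, y0, x1, y1) page coordinates.

```python
def box_area(r):
    """Area of an axis-aligned box (x0, y0, x1, y1)."""
    return (r[2] - r[0]) * (r[3] - r[1])


def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


# A predicted table box counts as a hit at a chosen threshold, e.g. IoU >= 0.5.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```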
13:52
4d ago
HuggingFace Papers (takara mirror) · rss · EN · 13:52 · 06·04
Ouvia: A User-centered Framework for Measuring Usability of Speech Translation in Real-World Communication Scenarios
Ouvia evaluates four speech translation systems using more than 1,750 English-to-Portuguese one-to-one interactions in healthcare and everyday scenarios, and users rate only around half of the interactions as usable.
#Audio#Benchmarking#Ouvia#Research release
why featured
HKR-H/K/R pass, but this is a vertical speech-translation usability benchmark, not a major model or platform release. Concrete sample size and outcome make it useful, but not featured-level.
editor take
Ouvia ran more than 1,750 English-Portuguese interactions; users rated only about half of them usable across four ST systems, which makes decontextualized ST scores look thin.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
13:03
4d ago
HuggingFace Papers (takara mirror) · rss · EN · 13:03 · 06·04
Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback
The paper introduces Structured Defect Grounding, modeling text-to-image defects as location, type, reason, and importance tuples, and releases SDG-30K with 30K images annotated with boxes across four modern T2I generators.
#Vision#Multimodal#Alignment#Research release
why featured
HKR-H/K pass: SDG-30K adds a concrete 30K-image, 4-generator benchmark and a four-field defect schema. Reach stays narrow to multimodal evaluation, with no product launch or cross-source debate, so it fits 60–71.
editor take
SDG-30K adds box-level defects on 30K images; I buy the interface, because heatmaps don’t bind “where” to “why.”
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
12:45
4d ago
HuggingFace Papers (takara mirror) · rss · EN · 12:45 · 06·04
MS-DKC: A Dataset Knowledge Card Framework for Designing and Adapting Medical Image Segmentation Models
The paper introduces MS-DKC, a Medical Segmentation Dataset Knowledge Card framework, and evaluates it on DRIVE, ISIC2018, and ACDC by linking dataset descriptors to failure modes, design priors, and risk criteria; on DRIVE, SA-UNetv2-DKC-AmbRef reports Dice 0.8141, IoU 0.6865, sensitivity 0.8265, specificity 0.9804, and AUC 0.9853.
#Vision#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via a concrete framework and metrics, but HKR-H and HKR-R are weak because the item is a narrow medical-imaging paper. No hard exclusion applies, so it stays in all at the low-value research band.
editor take
MS-DKC runs on 3 medical segmentation sets; I buy dataset cards, but DRIVE Dice 0.8141 needs stronger baselines.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
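The DRIVE numbers above are standard confusion-matrix metrics. A short sketch of their definitions from pixel counts, plus the identity IoU = Dice / (2 − Dice) that holds for any single confusion matrix; the reported pair is consistent with it, since 0.8141 / (2 − 0.8141) ≈ 0.6865. The counts in the example are made up for illustration.

```python
def seg_metrics(tp, fp, fn, tn):
    """Binary segmentation metrics from pixel-level confusion counts."""
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def dice_to_iou(dice):
    """IoU implied by a Dice score computed on the same confusion matrix."""
    return dice / (2 - dice)


m = seg_metrics(tp=80, fp=10, fn=20, tn=890)  # hypothetical pixel counts
print(m, round(dice_to_iou(0.8141), 4))
```

If Dice and IoU were averaged per image instead, the identity would only hold approximately, so the match is a quick sanity check on how the scores were aggregated.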
09:51
4d ago
HuggingFace Papers (takara mirror) · rss · EN · 09:51 · 06·04
Learning Robot Safety Policies via Adversarial Synthetic Scenarios
The paper proposes a robot safety framework where a Red Team generates hazardous scenarios and a Blue Team iteratively refines policies; the post states this is ongoing work and discloses only a problem formulation plus proposed architecture.
#Agent#Robotics#Safety#Research release
why featured
HKR-H/K/R barely pass because the paper offers an adversarial robot-safety training mechanism. The body only gives a problem framing and architecture, with no metrics or reproducible experiment, so it stays in the 60–71 band.
editor take
The paper only gives a red-team/blue-team architecture; no metrics yet, so treat it as a robotics safety roadmap.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R1
08:58
4d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:58 · 06·04
GLASS: GRPO-Trained LoRA for Acoustic Style Steering in Zero-Shot Text-to-Speech
GLASS freezes the TTS backbone and trains one LoRA per acoustic control axis. It uses GRPO with speech-token length, mean F0, and WER rewards to steer speaking rate and pitch in zero-shot TTS while preserving speaker similarity, naturalness, and intelligibility.
#Audio#Fine-tuning#Alignment#GLASS
why featured
HKR-K passes via the concrete GRPO+LoRA reward setup for zero-shot TTS control. HKR-H and HKR-R are weak, and the post lacks result numbers, model size, or release status, so it stays in the normal research-update band.
editor take
GLASS uses one LoRA per acoustic axis for rate and pitch; metrics are undisclosed, but LoRA arithmetic beats style-label catalogs.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
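GRPO replaces a learned value baseline with a group-relative one: several outputs are sampled for the same prompt, and each reward is standardized against its own group. A minimal sketch of that advantage step, assuming the scalar rewards (for GLASS, some mix of speech-token length, mean F0, and WER terms) are already computed; population std is used here, and some implementations use sample std instead.

```python
import statistics


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize each sampled output's reward
    against the mean and (population) std of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Rewards for four sampled utterances of one prompt (hypothetical values).
print(grpo_advantages([0.2, 0.5, 0.9, 0.4]))
```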
08:47
4d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:47 · 06·04
QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving
QCFuse uses chunk-anchor query probing and critical-layer profiling in SGLang to select recomputation tokens for RAG cache fusion, reaching full-prefill-level quality across 4 open-weight LLMs and 6 datasets while averaging 1.7x prefill-time speedup over full prefill and 1.5x over ProphetKV.
#RAG#Inference-opt#QCFuse#SGLang
why featured
HKR-H/K/R pass, but this is a systems paper for RAG serving with no disclosed broad adoption or major-lab push. The 1.7x prefill speedup is useful, so it sits high in the 60–71 band.
editor take
QCFuse gets 1.7x prefill speedup across 4 models and 6 datasets; RAG serving gains still come from KV plumbing.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
08:46
4d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:46 · 06·04
Entropy-Based Evaluation of AI Agents: A Lightweight Framework for Measuring Behavioral Patterns
The paper proposes EEA, a lightweight framework that evaluates agent behavior with six entropy-based metrics, and provides a Python implementation for LangChain, Google ADK, custom agent loops, and stored observability traces.
#Agent#Tools#Benchmarking#LangChain
why featured
HKR-H/K/R all pass, but this is a single lightweight evaluation-framework paper without major-lab backing, benchmark impact, or production replacement evidence. It fits the upper 60–71 band, not featured.
editor take
EEA adds six entropy metrics for agents; I buy the lens, but trajectory variety is not capability.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
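The post names six entropy-based metrics without defining them. As an illustration of the kind of signal involved, here is one plausible metric: Shannon entropy of an agent's action or tool-call distribution over a trace, optionally normalized to [0, 1]. This is a hypothetical example, not EEA's definition.

```python
import math
from collections import Counter


def action_entropy(actions, normalize=True):
    """Shannon entropy (bits) of the empirical action distribution in a trace;
    normalized by log2(#distinct actions) when requested."""
    counts = Counter(actions)
    n = len(actions)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    if normalize and len(counts) > 1:
        h /= math.log2(len(counts))
    return h


trace = ["search", "search", "read", "search", "answer"]
print(action_entropy(trace))
```

As the editor take notes, a high value only says the trajectory is varied; it says nothing about whether the varied actions were useful.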
08:39
4d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:39 · 06·04
Analysis of the Neglect-Zero Effect in Large Language Models
The paper tests two neglect-zero inference types in LLMs using a structural priming paradigm, with primes designed to force zero-model consideration and targets used to check transfer; the authors report that the analyzed models did not show the neglect-zero effect and released code at github.com/ynklab/neglect_zero.
#Reasoning#Interpretability#Benchmarking#ynklab
why featured
HKR-K passes: the paper offers a concrete experimental setup, two test types, released code, and a negative result. HKR-H and HKR-R are weak, so it fits the 60–71 research-signal band.
editor take
The paper tests two neglect-zero inference types; models didn’t show the bias. Model list and sample size aren’t disclosed, so treat it as a small probe.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
08:11
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:11 · 06·04
Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models
GeoVR trains geometric representations for MLLMs using only 2D video sequences, with four targets: inter-frame camera pose estimation, dense depth regression, metric scale prediction, and multi-scale 3D feature distillation from pretrained 3D foundation models. The snippet says experiments on spatial reasoning benchmarks report state-of-the-art performance but does not disclose datasets, model size, or scores.
#Multimodal#Vision#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: training spatial geometry from 2D video is a concrete mechanism. HKR-R is weak, and the post lacks model scale, benchmark gains, or reproducible results, so it stays in the 60–71 band.
editor take
GeoVR trains on 2D video with 4 geometry targets; no scores or datasets are disclosed, so treat the SOTA claim as abstract PR.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
07:07
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 07:07 · 06·04
Beyond Absolute Scores: Relative Edit-induced Difference for Generalizable Image Aesthetic Assessment
RED-Aes trains image aesthetic assessment through controllable image edits, not absolute MOS regression. The paper introduces RED-20k with edit-based image pairs, quantitative aesthetic differences, and CoT rationales, then applies three-stage training with a relative ranking consistency reward across multiple public benchmarks.
#Vision#Reasoning#Benchmarking#Research release
why featured
HKR-K passes because the post names RED-20k and its relative-supervision setup. HKR-H and HKR-R are weak, making this a narrow vision-evaluation research item below the featured bar.
editor take
RED-20k has 20k edit pairs; relative aesthetic deltas beat MOS regression, but the SOTA proof is undisclosed here.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
06:23
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 06:23 · 06·04
MARDoc: A Memory-Aware Refinement Agent Framework for Multimodal Long Document QA
MARDoc splits multimodal long-document QA into three agents—Explorer, Refiner, and Reflector—and uses dynamically updated structured memory instead of full interaction history, with experiments on MMLongBench-Doc and DocBench showing gains over same-backbone baselines.
#Agent#Multimodal#Memory#MARDoc
why featured
HKR-K and HKR-R pass: the item names a three-agent mechanism and two benchmark wins, relevant to document agents. The post lacks gain sizes, release status, and reproducible details, so it stays in the normal research-release band.
editor take
MARDoc beats same-backbone baselines on two long-doc QA benchmarks; no margins disclosed, so I read it as context diet, not agent novelty.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
06:09
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 06:09 · 06·04
AdaPLD: Adaptive Retrieval and Reuse for Efficient Model-Free Speculative Decoding
AdaPLD improves model-free speculative decoding with semantic-similarity retrieval and branched reuse hypotheses, preserving lexical reuse while recovering matches missed by surface-form variation; across diverse benchmarks, the method reduces target-model forward passes and reports up to 3.10× decoding speedup, while the snippet does not disclose model sizes or per-benchmark latency numbers.
#Inference-opt#Research release
why featured
HKR-K and HKR-R are strong, with HKR-H from the 3.10× speedup hook. The post is paper-summary level, with no code, model scale, or reproducible setup disclosed, so it stays in the 60–71 band.
editor take
AdaPLD reports up to 3.10× speedup; no model sizes or latency table disclosed, so I read it as a ceiling.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
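For context on what AdaPLD extends, here is the plain lexical form of model-free drafting (prompt lookup): match the last n tokens against earlier context and propose what followed, then keep the longest prefix the target model agrees with. This sketch covers only that lexical baseline; AdaPLD's semantic-similarity retrieval and branched hypotheses are not shown, and the function names are hypothetical.

```python
def prompt_lookup_draft(tokens, ngram=3, max_draft=5):
    """Model-free drafting: find the most recent earlier occurrence of the last
    `ngram` tokens and propose up to `max_draft` tokens that followed it."""
    if len(tokens) < ngram:
        return []
    tail = tokens[-ngram:]
    # Scan backwards, skipping the tail's own position.
    for start in range(len(tokens) - ngram - 1, -1, -1):
        if tokens[start:start + ngram] == tail:
            return tokens[start + ngram:start + ngram + max_draft]
    return []


def accepted_prefix(draft, target_next):
    """Number of leading draft tokens that match the target model's own choices;
    all of them are confirmed by a single target forward pass."""
    n = 0
    for d, t in zip(draft, target_next):
        if d != t:
            break
        n += 1
    return n


context = "the cat sat on the mat . the cat".split()
draft = prompt_lookup_draft(context, ngram=2)
print(draft, accepted_prefix(draft, ["sat", "on", "a", "rug", "."]))
```

Every accepted draft token is a target-model forward pass saved, which is where "up to 3.10×" style ceilings come from; the real speedup depends on how often drafts are accepted.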
04:52
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 04:52 · 06·04
Critic-Guided Heterogeneous Multi-Agent Reasoning for Reliable Mathematical Problem Solving
The study introduces a critic-guided heterogeneous multi-agent framework for mathematical reasoning, using generator-validator feedback on intermediate steps, and reports up to 13% accuracy improvement on GSM8K over single-shot and non-critic models.
#Agent#Reasoning#Benchmarking#Research release
why featured
HKR-K passes with a concrete critic-guided multi-agent mechanism and a 13% GSM8K gain. HKR-H and HKR-R are weak; this is a single reasoning paper without code, real-world tasks, or production impact, so it fits 60–71.
editor take
GSM8K gains reach 13%, but baseline details are undisclosed; this smells like buying accuracy with extra inference budget.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:49
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 04:49 · 06·04
Seeing Time: Benchmarking Chronological Reasoning and Shortcut Biases in Vision-Language Models
The paper introduces ChronoVision, a benchmark with three datasets for testing chronological reasoning in VLMs across similar historical objects, event and object categories, and image-news text pairs; experiments find that models often use superficial cues such as grayscale versus color filters instead of genuine chronological features.
#Vision#Multimodal#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: ChronoVision adds 3 datasets and a testable shortcut-bias claim for VLMs. The post stays at abstract level and does not disclose model rankings or tooling, so it remains below featured.
editor take
ChronoVision tests VLM time reasoning on 3 datasets; grayscale shortcuts show up, which is basically annotation leakage in visual form.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R0
04:35
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 04:35 · 06·04
PerceptUI: LLM Agents as Human-Aligned Synthetic Users for UI/UX Evaluation
PerceptUI predicts persona-conditioned UI/UX answers for specific users and trains in two stages: contrastive reflection fine-tuning and reflective prompt evolution from failure traces.
#Agent#Multimodal#Fine-tuning#PerceptUI
why featured
HKR-H/K/R pass, but the body only gives a method sketch; dataset size, metrics, and artifacts are not disclosed. Useful applied-agent research, not a must-write release.
editor take
PerceptUI uses two-stage training for persona feedback; sample size is undisclosed, so don’t treat “human-level realism” as UX evidence.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·04
Few Tokens, Big Leverage: Preserving Safety Alignment by Constraining Safety Tokens during Fine-tuning
The paper proposes PACT, which constrains safety-token confidence at each response step during downstream fine-tuning to match an aligned reference model, leaves non-safety tokens mostly unconstrained for task adaptation, and releases code on GitHub; the abstract does not disclose model sizes or benchmark scores.
#Fine-tuning#Safety#Alignment#PACT
why featured
HKR-H/K/R pass: the hook, mechanism, and deployment risk are clear. Importance stays in the 60–71 band because model scale, baselines, and evaluation scores are not disclosed.
editor take
PACT constrains only safety-token confidence; no model sizes or scores in the abstract. Clean idea, but don't assume generalization yet.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·04
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
The paper compares models with identical architectures and fine-tuning data, and finds that stronger long-context capacity before SFT yields higher accuracy on reasoning benchmarks, with gains persisting on short-input tasks.
#Reasoning#Fine-tuning#Benchmarking#Research release
why featured
HKR-H/K/R pass: the paper makes a testable claim that pre-SFT long-context ability correlates with reasoning accuracy and transfers to short inputs. No concrete deltas, author context, or replication details are disclosed, so it stays below featured.
editor take
Same architecture and SFT data: stronger pre-SFT long context wins on reasoning; no effect size disclosed, so treat it as recipe evidence.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·04
Cross-Prompt Generalization in Detecting AI-Generated Fake News Using Interpretable Linguistic Features
The paper trains a random forest on AI-generated articles from three distinct prompts plus real news, then tests six cross-prompt train-test combinations with AUC ranging from 0.988 to 1.000.
#Benchmarking#Interpretability#Research release#Benchmark
why featured
HKR-H/K/R pass, but this is a single arXiv paper with only the experiment summary visible; dataset scale and real-platform replication are not disclosed, so it stays at the top of 60–71.
editor take
Random forest hits 0.988-1.000 AUC across 3 prompts; I don't buy it without generator and external-news details.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
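One reading of the "six cross-prompt train-test combinations": with three generation prompts, training on one prompt's articles and testing on a different prompt's articles gives 3 × 2 = 6 ordered pairs. The sketch enumerates them and includes a rank-based AUC (Mann-Whitney form) of the kind reported; the prompt names are placeholders and the pairing scheme is an assumption.

```python
from itertools import permutations


def cross_prompt_splits(prompts):
    """All ordered (train_prompt, test_prompt) pairs with different prompts."""
    return list(permutations(prompts, 2))


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative;
    ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


splits = cross_prompt_splits(["prompt_A", "prompt_B", "prompt_C"])  # hypothetical names
for train_p, test_p in splits:
    print(f"train on {train_p} + real news, test on {test_p} + real news")
```

An AUC of 1.000 only means every AI article outscored every real one in that split; it does not show robustness to unseen generators or outside news sources.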
04:00
5d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·04
ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models
ZeroUnlearn reframes machine unlearning as model editing, maps sensitive inputs to a neutral target state, enforces representational orthogonality through a closed-form multiplicative parameter update, and adds a gradient-based variant for multi-sample unlearning.
#Fine-tuning#Safety#ZeroUnlearn#XMUDeepLIT
why featured
HKR-H/K/R pass, but the post gives only a method summary with no metrics, model scale, or reproducible repo. Treat it as a normal arXiv safety/unlearning paper: all tier, below featured.
editor take
ZeroUnlearn uses closed-form multiplicative updates for few-shot unlearning; no benchmark numbers here, so don’t equate it with compliant deletion.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
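A toy illustration of a closed-form multiplicative edit that enforces orthogonality: right-multiplying a layer's weight by (I − k kᵀ / kᵀk) sends a sensitive input direction k to zero while leaving inputs orthogonal to k untouched. It assumes a single sensitive direction and a zero "neutral target"; ZeroUnlearn's actual update, target state, and multi-sample gradient variant are not reproduced here.

```python
def rank1_unlearn(W, k):
    """Return W (I - k k^T / k^T k): the edited layer maps k (and its multiples)
    to zero and leaves inputs orthogonal to k unchanged."""
    kk = sum(x * x for x in k)
    n = len(k)
    proj = [[(1.0 if i == j else 0.0) - k[i] * k[j] / kk for j in range(n)]
            for i in range(n)]
    return [[sum(row[m] * proj[m][j] for m in range(n)) for j in range(n)] for row in W]


def apply(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


W = [[1.0, 2.0], [3.0, 4.0]]
W_edit = rank1_unlearn(W, [1.0, 1.0])
print(apply(W_edit, [1.0, 1.0]), apply(W_edit, [1.0, -1.0]), apply(W, [1.0, -1.0]))
```

Zeroing one direction in one layer is a long way from verified deletion, which is the editor take's point.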
04:00
5d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·04
Be Fair! Can Machine Learning Engineering Agents Adhere to Fairness Constraints?
The paper evaluates two MLE agents on melanoma classification and finds their generated pipelines show high variance and underperform manual baselines on both predictive quality and skin-tone fairness, even with fairness-oriented prompts.
#Agent#Safety#Benchmarking#Research release
why featured
HKR-H/K/R all pass, but this is a single arXiv paper and the feed does not disclose agent names, dataset size, or reproducibility details. It is useful agent-safety signal, not same-day must-write news.
editor take
Two MLE agents lost to manual baselines on melanoma; fairness prompts still failed to control pipeline search.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·04
Fixed Aggregation Features Can Rival GNNs
The paper introduces training-free Fixed Aggregation Features that convert graph tasks into tabular tasks, and across 14 benchmarks, MLPs trained on FAFs match or outperform state-of-the-art GNNs and graph transformers on 12 tasks.
#Benchmarking#Interpretability#Research release#Benchmark
why featured
HKR-H and HKR-K pass: fixed features plus MLP challenging GNNs is a concrete mechanism with 14 benchmarks. HKR-R is weak because the impact is mostly graph-ML-specific, with no deployment, cost, or mainstream model angle.
editor take
FAF matches or beats GNNs on 12 of 14 benchmarks; many graph papers look under-baselined without strong tabular checks.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
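A minimal sketch of training-free aggregation features: each node's raw features are concatenated with one or more rounds of neighbor mean-aggregation, producing a fixed-length tabular row that any MLP (or tree model) can consume. It assumes mean aggregation over an adjacency list; the exact aggregators FAF uses are not listed in the post.

```python
def mean_aggregate(adj, feats):
    """One round of neighbor mean aggregation; no learned weights involved."""
    dim = len(feats[0])
    out = []
    for nbrs in adj:
        if nbrs:
            out.append([sum(feats[j][d] for j in nbrs) / len(nbrs) for d in range(dim)])
        else:
            out.append([0.0] * dim)  # isolated node: zero aggregate
    return out


def fixed_aggregation_features(adj, feats, hops=2):
    """Concatenate raw features with 1..hops rounds of mean aggregation,
    turning each node into a fixed-length tabular row."""
    layers = [feats]
    for _ in range(hops):
        layers.append(mean_aggregate(adj, layers[-1]))
    return [sum((layer[i] for layer in layers), []) for i in range(len(feats))]


# Path graph 0 - 1 - 2 with one scalar feature per node.
adj = [[1], [0, 2], [1]]
rows = fixed_aggregation_features(adj, [[1.0], [2.0], [3.0]], hops=2)
print(rows)
```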
04:00
5d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·04
Reinforcement Learning from Rich Feedback with Distributional DAgger
The paper introduces DistIL, a Distributional DAgger method for learning from rich feedback such as execution traces, tool outputs, expert corrections, and self-evaluations. The authors prove forward cross-entropy gives monotonic policy improvement and regret guarantees, then report gains over RLVR and self-distillation baselines on scientific reasoning, coding, and hard math tasks.
#Reasoning#Code#Fine-tuning#Research release
why featured
HKR-K/R pass: the paper offers a new algorithm, a proof, and science/code/math tests. As a single arXiv item without disclosed gain sizes, model scale, or reproduction detail, it stays high in the all band.
editor take
DistIL applies DAgger to trajectory feedback; model scale is undisclosed, so the theory looks cleaner than the evidence.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·04
GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling
GeoMin models global feature distributions on labeled data to assess self-reward reliability in semi-supervised RLVR; experiments show it beats the strongest baselines by 4.1% and surpasses fully supervised models using only 10% of the annotations.
#Reasoning#Fine-tuning#GeoMin#Research release
why featured
Single arXiv training-method paper: HKR-K and HKR-R pass via the 4.1% gain and 10% labeled-data claim. HKR-H is weak, and there is no product release or major-lab signal, so it stays in all.
editor take
GeoMin beats the strongest baselines by 4.1% and tops full supervision with 10% of labels; RLVR data efficiency looks legit, pending code and the task list.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Research proposes pre-deployment verification framework for enterprise AI agents using ontology-grounded simulation
The paper proposes a pre-deployment verification framework for enterprise AI agents, combining an operational envelope, ontology-to-scenario generation, and machine-verifiable trust certificates; its pilot across four regulated industries generated 1,800 scenarios, tested 125 regulatory requirements and 25 injected faults, and found ontology-grounded generation reached 48.3% regulatory coverage versus 33.1% for a persona baseline.
#Agent#Safety#Benchmarking#Claude
why featured
HKR-K/R pass: the paper gives concrete scenario counts and maps to enterprise agent assurance pain. HKR-H is weak, and as a single arXiv paper without deployment results or adoption, it stays below featured.
editor take
G4 ran 1,800 scenarios and hit 48.3% vs 33.1% coverage; don’t call it certification when a Bonferroni correction weakens the edge.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
LoopMoE: Unifying Iterative Computation with Mixture-of-Experts for Language Modeling
LoopMoE compares a looped MoE language model with Vanilla MoE under identical total parameters, per-token FLOPs, and active sublayer ratios; at 3B scale, it outperforms Vanilla MoE on 8 of 9 downstream benchmarks, with an average gain above 1 point.
#Reasoning#Benchmarking#LoopMoE#Vanilla MoE
why featured
HKR-K/R pass: the equal-parameter/equal-FLOPs 8-of-9 benchmark result is concrete and cost-relevant. HKR-H is weak; this is one arXiv architecture paper with no adoption or release artifact, so it stays in the high all band.
editor take
LoopMoE beats Vanilla MoE on 8/9 benchmarks at 3B; I buy the controls, not the one-point victory lap.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Study Finds Anomalies in Multivariate Time Series Benchmarks Are Mostly Univariate
The study evaluates eight public MTSAD benchmarks and finds no cross-channel rupture without a univariate deviation under reasonable thresholds; in six benchmarks, at least half of labeled anomaly segments deviate univariately on 89% to 100% of timesteps.
#Benchmarking#arXiv#Research release#Benchmark
why featured
HKR-H/K/R pass: the paper challenges MTSAD benchmark validity with concrete numbers across 8 datasets. Impact stays mostly with anomaly-detection and benchmark users, so it remains in the 60–71 band.
editor take
Eight MTSAD benchmarks show no cross-channel-only anomalies; many channel-dependent (CD) model wins are probably univariate detection in disguise.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Training-Free Lexical-Dense Fusion for Conversational-Memory Retrieval
The paper replicates Nano-Memory late interaction and adds BM25 score fusion, improving LoCoMo Hit@1 by 8.8 to 17.2 points across six encoders and reaching Hit@1 0.752 with e5-large-v2.
#RAG#Memory#Benchmarking#Nano-Memory
why featured
HKR-K/R pass: the paper gives measurable LoCoMo gains and a training-free BM25+dense mechanism. HKR-H is weak, and the work is incremental retrieval research, so it stays in the 60–71 all band.
editor take
BM25 fusion lifts LoCoMo Hit@1 to 0.752; I like this CPU-only recipe, especially since the reranker loses 6.9 points.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
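A minimal sketch of training-free lexical-dense score fusion, in the spirit of the BM25 add-on described above. It assumes Okapi BM25 with k1 = 1.5 and b = 0.75 plus a linear blend of min-max-normalized scores under a hypothetical weight `alpha`; the summary does not state the paper's fusion rule, the Nano-Memory late-interaction scorer is not reproduced, and the dense scores are made-up stand-ins for encoder similarities.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores

def minmax(xs):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def fuse(lexical, dense, alpha=0.5):
    """Linear fusion of normalized scores; alpha is a hypothetical weight."""
    return [alpha * l + (1 - alpha) * d for l, d in zip(minmax(lexical), minmax(dense))]

memories = [
    "alice adopted a beagle named max last spring".split(),
    "bob moved to lisbon for a new job".split(),
    "alice said max hates thunderstorms".split(),
]
query = "where did bob move for his job".split()
dense = [0.55, 0.52, 0.30]  # made-up cosine similarities standing in for an encoder
fused = fuse(bm25_scores(query, memories), dense)
best = max(range(len(memories)), key=lambda i: fused[i])
print(best, [round(s, 3) for s in fused])
```

In this toy case the dense scores alone would rank memory 0 first; the exact matches on "bob", "for", and "job" pull memory 1 to the top after fusion. Min-max blending is only one option (reciprocal rank fusion is another), and neither is confirmed as the paper's choice.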
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Model-Preserving Adaptive Rounding
YAQA directly optimizes network-output error for quantization and provides the first end-to-end error bounds for quantization algorithms; the paper reports about 30% lower error than GPTQ/LDLQ and no added inference overhead.
#Inference-opt#YAQA#GPTQ#LDLQ
why featured
HKR-K and HKR-R pass: YAQA gives a concrete error-bound claim and ~30% lower error tied to deployment cost. HKR-H is weak, and this is an arXiv paper rather than a same-day must-write release.
editor take
YAQA targets output error and reports ~30% lower error than GPTQ/LDLQ; I buy the direction, pending reproduction.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
LimiX-2M Mitigates Low-Rank Collapse and Attention Bottlenecks in Tabular Foundation Models
LimiX-2M uses 2M parameters with RaBEL scalar RBF tokenization and S→N→F bidirectional routing, outperforming larger TabPFN-v2 and TabICL baselines on widely used tabular benchmarks while reducing training and inference costs; checkpoints and inference code are available on GitHub.
#Embedding#Inference-opt#Benchmarking#LimiX
why featured
HKR-H/K/R pass, but this is still a niche tabular-foundation-model paper rather than a broad LLM or agent update. Open code and benchmark claims make it useful signal, but not featured-level.
editor take
LimiX-2M beats larger TabPFN-v2 with 2M params; tabular FMs need better scalar tokenization, not fatter attention.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Rethinking Continual Experience Internalization for Self-Evolving LLM Agents
The paper finds existing experience-internalization methods suffer progressive capability collapse under multi-iteration learning, not compounding gains. It analyzes three factors: principle-level experience beats instance-level experience, step-wise injection beats global injection for long-horizon tool use, and off-policy context distillation on high-quality teacher trajectories gives a stabler signal than on-policy distillation.
#Agent#Fine-tuning#Tools#Research release
why featured
HKR-K/R pass because the paper targets self-evolving agents and names a training recipe. HKR-H is weak, and the post gives no metrics, lab, or reproducible setup, so it stays in the 60–71 band.
editor take
Multi-iteration experience internalization causes capability collapse; useful 3-axis recipe, but no model, task, or drop size disclosed.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
OpenRFM: Dissecting Relational In-Context Learning
OpenRFM proposes a dual-stage ICL architecture and mixed pre-training scheme for relational foundation models, improves average task performance by about 30% over the RT backbone, and surpasses the commercial KumoRFMv1 model on a large evaluation set.
#Reasoning#Benchmarking#OpenRFM#KumoRFMv1
why featured
HKR-K is clear and HKR-R is present via open-vs-commercial replacement pressure. The arXiv relational-ICL focus is narrow and HKR-H is weak, so it stays at the high end of 60–71.
editor take
OpenRFM beats RT by ~30%; the useful bit is turning KumoRFMv1’s black-box edge into a reproducible label-scarcity diagnosis.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Rollout-Level Advantage-Prioritized Experience Replay for GRPO
The paper proposes a rollout-level replay buffer for GRPO, removes samples older than tau_max training steps, keeps fresh on-policy rollouts in each batch, and reports gains across three Qwen3-Base scales on five math benchmarks, with the largest five-benchmark average gain of +4.35 percentage points at 4B.
#Reasoning#Fine-tuning#Benchmarking#Qwen
why featured
HKR-K and HKR-R pass: the paper reports a concrete replay mechanism and benchmark lift. HKR-H is weak, and a single arXiv GRPO training trick lacks broad product or adoption impact, so it stays in 60–71.
editor take
GRPO replay with tau_max eviction lifts the Qwen3-Base 4B math average by 4.35 pp; don't generalize yet, since non-math results aren't disclosed.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
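A rough sketch of a rollout-level replay buffer with tau_max eviction that always keeps the fresh on-policy rollouts in the batch. Sampling replayed rollouts with probability proportional to |advantage| + eps is an assumption read off the paper's title, not a formula from the summary; `Rollout`, `ReplayBuffer`, and `build_batch` are hypothetical names.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Rollout:
    prompt_id: int
    advantage: float
    step: int  # training step at which this rollout was generated

@dataclass
class ReplayBuffer:
    tau_max: int
    eps: float = 1e-3
    items: list = field(default_factory=list)

    def add(self, rollouts):
        self.items.extend(rollouts)

    def evict(self, current_step):
        # Drop rollouts more than tau_max training steps old.
        self.items = [r for r in self.items if current_step - r.step <= self.tau_max]

    def sample(self, k, rng):
        # Assumed prioritization: P(i) proportional to |A_i| + eps, with replacement.
        if not self.items or k <= 0:
            return []
        weights = [abs(r.advantage) + self.eps for r in self.items]
        return rng.choices(self.items, weights=weights, k=k)

def build_batch(fresh, buffer, batch_size, current_step, rng):
    """Keep every fresh on-policy rollout, then top the batch up from replay."""
    buffer.evict(current_step)
    replayed = buffer.sample(batch_size - len(fresh), rng)
    buffer.add(fresh)  # fresh rollouts become replay candidates for later steps
    return list(fresh) + replayed

rng = random.Random(0)
buf = ReplayBuffer(tau_max=2)
buf.add([Rollout(0, 0.9, step=0), Rollout(1, 0.0, step=0), Rollout(2, -0.7, step=2)])
fresh = [Rollout(3, 0.5, step=3), Rollout(4, -0.2, step=3)]
batch = build_batch(fresh, buf, batch_size=4, current_step=3, rng=rng)
print([r.prompt_id for r in batch])
```

Replayed rollouts come from an older policy, so a real GRPO update would also need their behavior-policy log-probabilities for importance ratios; that step is omitted here.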
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
LoopFM: Learning from Historical Representations of Foundation Models for Recommendation
LoopFM feeds foundation-model intermediate embeddings into downstream recommendation models without real-time FM inference or FM-VM architectural coupling; across three public benchmarks it improves AUC, including over 6% on TaobaoAd, and in billion-example industrial systems with trillion-parameter FMs it roughly doubles the knowledge transfer ratio on top of KD.
#Embedding#Fine-tuning#LoopFM#TaobaoAd
why featured
HKR-K and HKR-R pass: the paper gives a concrete embedding mechanism plus TaobaoAd and KD comparison numbers. HKR-H is weak, and this is a single arXiv recommender paper, so it stays below featured.
editor take
LoopFM lifts TaobaoAd AUC by 6%+; offline intermediate embeddings beat KD’s scalar bottleneck for recommender transfer.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Revisiting Model Stitching in the Foundation Model Era
The paper tests stitching across heterogeneous VFMs including CLIP, DINOv2, and SigLIP 2, introduces VFM Stitch Tree to share early layers, and reports that deep stitch points can exceed either constituent model with only the stitch-layer inference overhead.
#Vision#Multimodal#Inference-opt#CLIP
why featured
HKR-H/K/R pass, but this is a specialized vision-model stitching paper with no disclosed tool release, replication artifact, or production proof, so it stays in the 60–71 research-signal band.
editor take
Deep stitch points across CLIP, DINOv2, and SigLIP 2 can beat either source model; no gain numbers are disclosed, so VFM Stitch Tree isn't a free lunch yet.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Customizing the Inductive Biases of Softmax Attention Using Structured Matrices
The paper proposes attention scoring functions based on BTT and contiguous MLR structured matrices, reporting better high-dimensional in-context regression under any fixed compute budget and improved language-modeling scaling laws versus standard attention and sliding-window variants.
#Reasoning#Inference-opt#Research release
why featured
HKR-K passes via named BTT/contiguous-MLR mechanisms and fixed-compute comparison claims. HKR-H and HKR-R are weak: the angle is academic, and deployment conditions are not disclosed.
editor take
BTT/MLR attention beats standard attention at fixed compute, but no margins disclosed; I’d audit the LM scaling curves first.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Curated Synthetic Data Doesn't Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences
arXiv 2605.07724v2 shows that curation with multiple reward functions can mitigate collapse in recursive generative retraining under specified conditions, leading to a stable distribution that allocates probability across competing high-reward regions and satisfies a weighted Nash bargaining solution.
#Alignment#Fine-tuning#Safety#Research release
why featured
HKR-H/K/R all pass, but the post offers an arXiv theory result only: no experiments, code, or production validation. The technical barrier keeps it in the 60–71 research-signal band.
editor take
2605.07724v2 proves multi-reward curation can reduce collapse; conditions are unspecified here, so engineering reproducibility is still open.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Validity Threats for Foundation Model Research
The arXiv paper frames foundation model research as a causal inference problem and evaluates three compute-saving strategies—proxy experiments, observational studies, and single-run designs—against four validity types: statistical, internal, external, and construct validity.
#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: it frames foundation-model research validity across four categories and three study designs. HKR-H is weak, and the post lacks authorship signal, concrete experiments, or industry impact, so it stays in all.
editor take
The paper audits 3 compute-saving designs across 4 validity types; I buy the frame—many scaling-law claims need causal accounting.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Why Muon Outperforms Adam: A Curvature Perspective
The paper says Muon improves large language-model training efficiency over Adam by about 2x, attributing the gap to lower Normalized Directional Sharpness rather than different update norms.
#Fine-tuning#Inference-opt#Benchmarking#Research release
why featured
HKR-H/K/R all land for optimizer-focused readers: the ~2x efficiency claim and lower-NDS mechanism are concrete. Curvature/NDS framing and single arXiv sourcing keep it in 60–71.
editor take
Muon gets a 2x efficiency story via lower NDS, not update size; I buy the mechanism, not broad pretraining claims.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents
The paper proposes LifeSkill, a two-stage reinforcement learning framework for online lifelong learning agents, and reports a 7-point absolute average performance gain over existing lifelong agent baselines on LifelongAgentBench.
#Agent#Reasoning#Fine-tuning#LifeSkill
why featured
HKR-H/K/R pass, but this is a single arXiv paper with evidence centered on a +7-point LifelongAgentBench gain. No open-source artifact or production replacement claim is disclosed, so it stays in all.
editor take
LifeSkill gains 7 points on LifelongAgentBench; parameter updates beat retrieval bloat, but online update cost is undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
LLM Compression with Jointly Optimizing Architectural and Quantization Choices
The paper introduces a differentiable NAS framework that jointly optimizes LLM architectural configurations and mixed-precision quantization for linear layers, achieving up to 1.4x faster inference than sequential NAS-then-quantization baselines at comparable accuracy, or up to 6% higher average accuracy across seven reasoning tasks at equivalent latency.
#Inference-opt#Reasoning#Research release
why featured
HKR-K and HKR-R pass: the mechanism and numbers are concrete, and they map to inference cost. As an arXiv compression paper without a notable lab, artifact, or cross-source pickup, it stays in the 60–71 band.
editor take
Joint NAS plus mixed precision gives up to 1.4x speedup; I want search cost, and the abstract omits it.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Fog of Love: Engineering Virtuous Agent Behavior with Affinity-based Reinforcement Learning in a Game Environment
The paper introduces a two-player multi-agent environment based on Fog of Love and tests affinity-based reinforcement learning on competitive and cooperative objectives; the abstract says localized affinities improve overall scores in both domains.
#Agent#Reasoning#Interpretability#arXiv
why featured
HKR-H/K/R all pass, but this is a single arXiv game-environment paper. The post gives the mechanism and directional result, not benchmark strength, code, or real-agent transfer, so it stays in the 60–71 band.
editor take
Fog of Love adds a two-agent testbed. Scores aren’t disclosed; don’t stretch affinity RL into alignment yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
BiasGRPO paper proposes method for stabilizing bias mitigation in high-variance reward settings
The paper proposes BiasGRPO, using GRPO to normalize rewards across a group of sampled completions and replace the value function with a group-relative baseline; the abstract says it outperforms DPO and PPO across multiple benchmarks, but does not disclose benchmark names or scores.
#Alignment#Safety#Fine-tuning#Research release
why featured
HKR-K is clear via the GRPO mechanism, and HKR-R fits bias-mitigation/post-training concerns. HKR-H is weak, and the body lacks benchmark names, effect sizes, or code, so this stays in all.
editor take
BiasGRPO swaps the value function for a group-relative baseline; no benchmark names or scores disclosed, so don't buy the DPO/PPO win yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
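The group-relative baseline described above can be sketched in a few lines. This assumes the common GRPO recipe of subtracting the group mean and dividing by the group standard deviation; the summary does not say whether BiasGRPO keeps the standard-deviation term, and the rewards below are made-up bias-mitigation scores.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one prompt's group of sampled completions.

    The group mean replaces a learned value-function baseline; dividing by the
    group standard deviation puts high- and low-variance groups on one scale."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Made-up rewards for two prompts whose reward scales differ tenfold.
low_scale = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
high_scale = group_relative_advantages([10.0, 0.0, 0.0, 10.0])
print([round(a, 3) for a in low_scale])
print([round(a, 3) for a in high_scale])
```

Both groups map to nearly identical advantages despite the scale gap, which is one intuition for why group normalization can steady training when rewards are high-variance.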
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Stateful Visual Encoders for Vision-Language Models
The paper introduces a Stateful Visual Encoder that conditions each image representation on prior visual features; after supervised fine-tuning, VLMs with the encoder improve on cross-image spatial aggregation, multi-object visual differencing, and visual trajectory behavior cloning across resolutions, model sizes, and VLM backbones.
#Vision#Multimodal#Fine-tuning#Research release
why featured
HKR-H/K pass: the paper proposes stateful cross-image visual encoding and tests spatial aggregation, difference detection, and trajectory imitation. No concrete gains, product path, or open-source artifact are disclosed, so it stays in all at 68.
editor take
Stateful Visual Encoder feeds prior visual features into each image embedding; I buy the direction, but no gains are disclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Building the Ph(ysical)AI Layer of Machine Intelligence
The authors propose principle-driven foundation models and report that a 1.99M-parameter frozen RF encoder reaches 77.7% average accuracy across 15 linear-probe tasks, with no encoder fine-tuning on target domains.
#Multimodal#Embedding#Benchmarking#Research release
why featured
HKR-H/K/R all pass, but this is a single arXiv paper with no named-lab signal, open artifact, or production replacement proof. It stays in the informative all band.
editor take
A 1.99M RF encoder hits 77.7% on 15 linear probes; I don’t buy PhAI hype past its 70.0% semantic ceiling.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning
The paper introduces Outcome-grounded Advantage Reshaping for GRPO in mathematical reasoning, replacing uniform sequence-level credit with token-level advantage redistribution; OAR-P uses counterfactual token perturbations as a high-fidelity attribution signal, while OAR-G uses an input-gradient proxy with one backward pass, and the abstract reports benchmark gains over a strong GRPO baseline without disclosing exact scores.
#Reasoning#Fine-tuning#Benchmarking#Research release
why featured
HKR-K/R pass: OAR targets token-level credit assignment in GRPO with counterfactual and one-backward-pass variants. HKR-H fails, and the feed gives no gain numbers, code, or top-lab signal, so it stays in 60–71.
editor take
OAR adds token-level attribution to GRPO; scores are undisclosed, so I buy one-backward-pass OAR-G, not “significant gains.”
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
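The summary does not give OAR's redistribution rule, so the sketch below is only one hypothetical way to turn token attributions into token-level advantages: spread the sequence advantage A over T tokens in proportion to clipped, nonnegative attribution scores, scaled so the per-token mean still equals A. Producing the scores themselves (counterfactual perturbations for OAR-P, an input-gradient proxy for OAR-G) is not shown, and the attributions in the example are made up.

```python
def reshape_advantage(seq_advantage, attributions):
    """Hypothetical token-level redistribution of one sequence-level advantage.

    a_t = A * T * w_t / sum(w), with negative attributions clipped to zero,
    so the mean of the token advantages still equals A. Uniform attributions
    (or all-zero ones) fall back to uniform GRPO credit, a_t = A."""
    weights = [max(w, 0.0) for w in attributions]
    total = sum(weights)
    n = len(weights)
    if total == 0.0:
        return [seq_advantage] * n
    return [seq_advantage * n * w / total for w in weights]

# Made-up attribution scores: token 2 (say, the decisive algebra step) dominates.
token_adv = reshape_advantage(0.8, [0.1, 0.1, 0.6, 0.2])
print([round(a, 2) for a in token_adv])
```

With a negative A, this particular rule places the largest penalty on the most-attributed token, a design choice a real method would need to justify.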
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
VentAgent: When LLMs Learn to Breathe — Multi-Objective Arbitration for ARDS Ventilation
VentAgent reformulates ARDS mechanical ventilation as multi-objective arbitration with three stages, Perception, Planning, and Orchestration, and evaluations on a high-fidelity physiological simulator report better results than state-of-the-art RL and classical control baselines.
#Agent#Reasoning#Interpretability#VentAgent
why featured
HKR-H/K/R pass, but this is a single arXiv summary in a specialist medical-control setting with simulator-only claims and no clinical validation or reproducibility details disclosed; keep it as interesting research, below featured.
editor take
VentAgent beats RL in simulation, not clinic; putting LLMs in ventilator control needs evidence beyond readable reasoning chains.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Sparse Mixture-of-Experts Reward Models Learn Interpretable Experts for Personalized Preference Modeling
The paper proposes a sparse MoE reward model trained on binary preference data with sparse routing and expert diversity, and reports controlled and real-world experiments where it learns interpretable routing patterns, specialized experts, and improves test-time personalization.
#Alignment#Fine-tuning#Interpretability#Research release
why featured
HKR-H and HKR-K pass: the interpretable-expert angle is specific, and the summary gives sparse routing plus diversity training. No numbers, artifact, or product impact keeps it in the 60–71 research band.
editor take
Sparse MoE trains reward models on binary preferences; no extra annotation cost is the hook, but baseline gains are undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
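A toy sketch of a sparse MoE reward model scored with the standard Bradley-Terry pairwise loss on one binary preference. It assumes linear router and expert heads over a fixed response embedding and hard top-k routing; the sparsity and expert-diversity regularizers the summary mentions are not shown, and all weights are hand-picked toy values rather than anything from the paper.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_k_gate(logits, k):
    """Keep the k largest router logits, renormalize them, zero out the rest."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gates = [0.0] * len(logits)
    for i, p in zip(top, softmax([logits[i] for i in top])):
        gates[i] = p
    return gates

def moe_reward(features, router, experts, k=1):
    """Scalar reward: gated sum of linear expert heads over a response embedding."""
    gates = top_k_gate([dot(w, features) for w in router], k)
    return sum(g * dot(e, features) for g, e in zip(gates, experts) if g > 0.0)

def bradley_terry_loss(r_chosen, r_rejected):
    """-log sigmoid(r_chosen - r_rejected), computed without overflow."""
    x = r_chosen - r_rejected
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

# Toy 2-D embeddings and hand-picked weights: each expert reads one feature.
router = [[1.0, 0.0], [0.0, 1.0]]
experts = [[2.0, 0.0], [0.0, -1.0]]
r_chosen = moe_reward([1.0, 0.2], router, experts, k=1)    # routed to expert 0
r_rejected = moe_reward([0.1, 1.0], router, experts, k=1)  # routed to expert 1
loss = bradley_terry_loss(r_chosen, r_rejected)
print(r_chosen, r_rejected, round(loss, 4))
```

The routing decision is inspectable per example, which is the kind of interpretable-expert signal the paper claims to learn at scale.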
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
CRAFT: Cost-aware Refinement And Front-aware Tuning of Prompts
CRAFT frames prompt optimization as a Pareto-front search over accuracy and prompt-token cost, using target-LLM validation calls as a scarce resource and covering high-accuracy and low-cost regions across six classification and reasoning benchmarks.
#Reasoning#Inference-opt#Benchmarking#CRAFT
why featured
HKR-K and HKR-R pass: cost-aware prompt optimization is practical, and the post gives a 6-benchmark setup. No savings rate, code artifact, or production replacement claim is disclosed, so this stays in the 60–71 band.
editor take
CRAFT searches accuracy-token Pareto fronts on 6 benchmarks; I buy the framing—single winning prompts are the wrong ops target.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
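A small sketch of the Pareto-front view of prompt optimization: keep every candidate prompt that no other candidate beats on both validation accuracy and prompt-token cost, then pick from the front under a budget. The candidate names and numbers are made up, `Candidate` and `best_under_budget` are hypothetical helpers, and CRAFT's actual search procedure and its rationing of target-LLM validation calls are not modeled.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    accuracy: float  # validation accuracy measured with target-LLM calls
    tokens: int      # prompt length in tokens (the cost axis)

def dominates(a, b):
    """a dominates b: no worse on both axes and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.tokens <= b.tokens
    strictly_better = a.accuracy > b.accuracy or a.tokens < b.tokens
    return no_worse and strictly_better

def pareto_front(cands):
    """Non-dominated candidates, sorted from cheapest to most expensive."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: c.tokens)

def best_under_budget(front, max_tokens):
    """Highest-accuracy front member that fits the token budget, or None."""
    fitting = [c for c in front if c.tokens <= max_tokens]
    return max(fitting, key=lambda c: c.accuracy, default=None)

candidates = [
    Candidate("zero-shot", 0.71, 40),
    Candidate("padded-zero", 0.70, 60),
    Candidate("short-cot", 0.78, 120),
    Candidate("few-shot-3", 0.83, 620),
    Candidate("verbose", 0.77, 900),
    Candidate("few-shot-8", 0.84, 1500),
]
front = pareto_front(candidates)
print([c.name for c in front])
print(best_under_budget(front, 700).name)
```

Returning the whole front rather than a single winning prompt lets an operator move along the accuracy-cost trade-off as token budgets change.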
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
FactoryNet: A Large-Scale Dataset toward Industrial Time-Series Foundation Models
FactoryNet introduces 51M industrial time-series datapoints across 23k task executions, six embodiments, and 27 annotated anomaly types, using an S-E-F-C schema for zero-shot cross-embodiment transfer and parameter-efficient anomaly detection.
#Robotics#Benchmarking#FactoryNet#arXiv
why featured
HKR-H and HKR-K pass via the rare factory dataset and concrete scale figures. Impact is narrower than a mainstream model/tool release, so it stays in the 60–71 band.
editor take
FactoryNet ships 51M industrial time-series points; S-E-F-C is clever, but six embodiments is thin for “industrial foundation model.”
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Efficient Reasoning on the Edge
The paper tests LoRA adapters, supervised fine-tuning, and reinforcement-learning budget forcing on Qwen2.5-7B, reducing reasoning length, KV-cache pressure, and time-to-first-token for on-device inference under strict resource constraints.
#Reasoning#Fine-tuning#Inference-opt#Qwen
why featured
HKR-H/K/R all register, but the body gives mechanisms and goals without metrics, device conditions, or baselines. As a research release, it stays useful but below featured.
editor take
The Qwen2.5-7B runs report shorter traces and TTFT gains, but no deltas; I’d file this as engineering glue, not a capability jump.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Vision Transformer Finetuning Benefits from Non-Smooth Components
The paper reports over 1,000 finetuning runs on large-scale Vision Transformers and finds that high-plasticity attention modules and feedforward layers deliver better adaptation performance, challenging the assumption that smoother components are preferable.
#Vision#Fine-tuning#Research release#Open source
why featured
HKR-H has a counterintuitive title and HKR-K has 1,000+ runs, but this is a narrow ViT finetuning paper with no product or broad practitioner pain point. Lower-band all.
editor take
The paper ran 1,000+ ViT finetunes: prioritize high-plasticity attention and FFN, stop treating smoothness as a default virtue.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
CounterFace: A Synthetic Face Dataset for Fine-Grained Counterfactual Evaluation of Face Recognition Systems
CounterFace provides 11,821 counterfactual face pairs covering 20 facial attributes and 8 demographic factors, and evaluates six face recognition systems across 160 attribute-demographic combinations, with occluding attributes such as facemasks and facial hair degrading performance across all tested systems.
#Vision#Benchmarking#AWS Rekognition#Face++
why featured
HKR-K and HKR-R pass: the dataset size and evaluation setup are concrete, and face-recognition fairness has practitioner relevance. The arXiv benchmark is too vertical and lacks HKR-H, so it stays below featured.
editor take
CounterFace tests 11,821 pairs across 160 slices; citing LFW averages for robustness now looks conveniently blind.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
PerchRL: Vision-Based Agile Perching of Quadrotors on Rapidly Moving Inclined Surfaces
PerchRL trains quadrotors for vision-based perching on rapidly and irregularly moving inclined platforms, using a two-stage RL pipeline with state-based pre-training, vision-based fine-tuning, randomized trajectories, temporal augmentation, and active perception rewards under intermittent visual loss.
#Robotics#Vision#Agent#PerchRL
why featured
HKR-H and HKR-K pass: the robotics setup is concrete and the RL recipe is specific. HKR-R is weak; a single arXiv control paper lacks product pull, named lab weight, or broad practitioner stakes.
editor take
PerchRL targets vision perching, but the snippet gives no success rate; the two-stage RL recipe is practical, not proven robust.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Stochastic Sparse Attention for Memory-Bound Inference
SANTA samples S ≪ n_k value rows during Llama-3.1-8B-Instruct decoding at 32k-token contexts, matches baseline accuracy, and reports up to 1.5x attention-kernel speedup over FlashInfer and FlashDecoding on an NVIDIA RTX 6000 Ada, with up to 1.25x end-to-end decode-latency speedup in batched long-context generation.
#Inference-opt#OPUSLab#Llama#NVIDIA
why featured
HKR-K and HKR-R pass: the paper offers a testable sparse-attention mechanism and concrete speedups. HKR-H is weaker, and the low-level kernel focus keeps it below featured.
editor take
SANTA gives 1.25x end-to-end decode speed at 32k on Llama-3.1-8B; useful trick, not a stack-changing result yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
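A minimal sketch of the idea of reading only S ≪ n_k value rows during decoding. It assumes row indices are drawn i.i.d. from the softmax attention distribution and the drawn rows are averaged, which gives an unbiased Monte Carlo estimate of the exact attention output; the summary does not state SANTA's actual sampling distribution, estimator, or kernel design. All scores are still computed against every key here, so the sketch only illustrates the saving on value-row reads.

```python
import math
import random

def attention_probs(q, keys):
    """Scaled dot-product softmax weights of one query over all cached keys."""
    scale = 1.0 / math.sqrt(len(q))
    logits = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def exact_attention(q, keys, values):
    p = attention_probs(q, keys)
    return [sum(p_i * v[j] for p_i, v in zip(p, values)) for j in range(len(values[0]))]

def sampled_attention(q, keys, values, s, rng):
    """Estimate the attention output by reading only s value rows.

    Rows are drawn i.i.d. with probability p_i, so the plain mean of the drawn
    rows is an unbiased estimate of sum_i p_i * v_i."""
    p = attention_probs(q, keys)
    idx = rng.choices(range(len(values)), weights=p, k=s)
    return [sum(values[i][j] for i in idx) / s for j in range(len(values[0]))]

rng = random.Random(0)
n_k, d = 64, 4
keys = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_k)]
values = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_k)]
q = [rng.gauss(0.0, 1.0) for _ in range(d)]
exact = exact_attention(q, keys, values)
approx = sampled_attention(q, keys, values, s=16, rng=rng)  # reads at most 16 of 64 rows
print([round(x, 3) for x in exact])
print([round(x, 3) for x in approx])
```

The estimator's variance falls as 1/S, so the sample count sets the accuracy-versus-bandwidth trade-off.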
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration
The paper introduces TIDE, a template-guided iterative framework that discovers multiple hidden problems from context, grounds them in evidence, and pairs them with actions, with validation on personal workspaces and software repositories across four model backbones against single-shot and parallel multi-agent baselines.
#Agent#Reasoning#Tools#TIDE
why featured
HKR-K and HKR-R pass: the paper gives a concrete mechanism and evaluation settings, and maps to agent deployment pain. No performance numbers, artifact, or visible debate, so it stays in the 60–71 band.
editor take
TIDE beats single-shot and multi-agent baselines across 2 settings and 4 backbones; agent work is moving toward proactive bug-hunting.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Policy Improvement Reinforcement Learning
The paper introduces PIRL and PIPO for RLVR, using a sliding-window historical baseline to verify each update retrospectively, and reports better stability and performance than GRPO and its variants on mathematical reasoning benchmarks.
#Reasoning#Alignment#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the mechanism and GRPO comparison matter to RLVR readers. The post does not disclose exact scores, model scale, or reproducible setup, so it stays in the regular research band.
editor take
PIPO checks each RLVR update against a sliding-window baseline; it beats GRPO on math, but model size and gain sizes are undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark
VAMPS introduces 1,168 bilingual multimodal multiple-choice items for graph-assisted algebra and calculus, testing whether models construct useful plots and ground answers in visual outputs; across tested models, direct analytical solving outperformed tool-enabled visual solving even when plotting was a natural strategy.
#Multimodal#Reasoning#Tools#VAMPS
why featured
HKR-H/K pass: VAMPS has a concrete visual-then-solve math setup and 1,168 bilingual items. It remains a single arXiv benchmark with no disclosed major-model results or adoption signal, so it stays in the 60–71 band.
editor take
VAMPS has 1,168 graph-aided math items; tool-enabled plotting lost to direct solving, so tool use still isn’t tool competence.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Position: Deployed Reinforcement Learning Should Be Continual
Parnian Behdin and two coauthors argue that deployed RL agents should keep learning after release, citing four sources of post-deployment non-stationarity and framing evaluative reward signals as a continual RL condition; the paper was accepted to the ICML 2026 Position Paper Track.
#Agent#Reasoning#Parnian Behdin#Kevin Roice
why featured
HKR-K/R pass: the ICML 2026 position paper frames deployed RL around 4 non-stationarity sources, relevant to agents and online policies. No experiments, artifact, or major deployment case, so it stays in the 60–71 band.
editor take
Three authors frame deployed RL as continual learning; I buy the direction, but online safety bounds are the hard part.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Self-Distilled Policy Gradient
The paper proposes SDPG, combining group-relative verifier advantages, exact full-vocabulary on-policy self-distillation, and reference-policy KL regularization; the code is available on GitHub, while the snippet does not disclose benchmark names or scores.
#Reasoning#Fine-tuning#Alignment#Research release
why featured
HKR-K and HKR-R pass via a concrete SDPG recipe and open code, but HKR-H is weak and the feed text gives no benchmark numbers, model scale, or comparison setup. Interesting research release, not featured.
editor take
SDPG adds self-distillation to policy gradient, but benchmarks and scores aren’t disclosed; don’t retire RLVR yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Attention-Based Sampler for Diffusion Language Models
The paper proposes Attn-Sampler, a training-free sampler for diffusion language models that orders tokens by attention-matrix column sums, proves the original sampling-order selection problem is NP-hard, and reports higher generation quality with greater sampling parallelism across multiple benchmarks.
#Inference-opt#Reasoning#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the mechanism and NP-hard claim are concrete. The article gives only abstract-level detail, with no speed, quality, or reproducible numbers, so this specialized dLLM sampling paper stays in all.
editor take
Attn-Sampler orders sampling by attention column sums; no gains disclosed, so I’d treat it as a neat dLLM inference hack.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
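The Attn-Sampler ordering rule can be sketched in a few lines. This assumes a single square attention matrix (rows are queries, columns are keys) and a greedy "unmask the top-k masked positions" step. How the paper aggregates heads and layers, and how many tokens it reveals per step, are not disclosed, so both helpers are hypothetical.

```python
def attention_column_sums(attn):
    """Sum each column of a square attention matrix (rows = queries, cols = keys)."""
    n = len(attn)
    return [sum(attn[i][j] for i in range(n)) for j in range(n)]


def pick_positions_to_unmask(attn, masked, k):
    """Return the k still-masked positions that receive the most total attention.

    Assumption: a greedy top-k rule per diffusion step; ties keep input order.
    """
    scores = attention_column_sums(attn)
    ranked = sorted(masked, key=lambda j: scores[j], reverse=True)
    return ranked[:k]


attn = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.7, 0.2]]
print(pick_positions_to_unmask(attn, masked=[0, 1, 2], k=2))  # [1, 0]
```

No training is involved: the score comes straight from the model's own attention, which is what makes the sampler training-free.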
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Activation Steering of Video Generation Models via Reduced-Order Linear Optimal Control
The paper proposes LA-LQR, which models T2V inference as a dynamical system and solves a latent LQR problem in a low-dimensional subspace to produce timestep- and layer-specific activation steering signals while penalizing unnecessary perturbations.
#Safety#Vision#Alignment#Research release
why featured
HKR-H/K pass: LA-LQR treats T2V inference as a dynamical system and emits layer/timestep steering signals. No metrics, artifact, or product tie-in are disclosed, so HKR-R is weak; specialist control theory keeps it in all.
editor take
LA-LQR treats T2V inference as control over activations. No benchmark numbers disclosed; I’d treat it as a reproducible safety knob over prompt filters.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
STaR-Quant Method for State-Time Consistent Post-Training Quantization of Diffusion Language Models
The paper proposes STaR-Quant for post-training quantization of diffusion large language models. It targets state-dependent activation disparity and temporal error accumulation. SGAT separates masked and unmasked token activation spaces. TAC corrects quantized attention with a block-diagonal affine mapping. Experiments report up to 1.69x speedup and 3.14x memory savings versus FP16 deployment.
#Inference-opt#STaR-Quant#Research release
why featured
HKR-K and HKR-R pass: the paper offers concrete STaR-Quant mechanisms plus speed and memory numbers. HKR-H is weak, and the topic remains an inference-optimization paper below the featured bar.
editor take
STaR-Quant reports 1.69x speedup and 3.14x memory savings; dLLM quantization is finally treating iterative error as first-class.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
QuBLAST: Quantizing Large Language Models with Block-Level Compression and Activation Scaling
QuBLAST applies block-level mixed-precision PTQ and activation scaling maps to Qwen3-8B, Llama3-8B, Mistral v0.1-8B, and Falcon H1R-7B, reducing model size by 40%-45.2% while keeping perplexity increases within 5% on WikiText-2 and WikiText-103.
#Inference-opt#Qwen#Meta#Mistral AI
why featured
HKR-K/R pass: QuBLAST offers testable compression and perplexity claims tied to inference cost. HKR-H is weak, and the quantization-paper framing keeps it in the 60-71 band.
editor take
QuBLAST shrinks four 7B/8B models by 40%-45.2%; WikiText perplexity alone doesn’t sell real inference robustness.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Expectations vs. Realities: The Cost of MSE-Optimal Forecasting Under Conditional Uncertainty
The paper shows on nine real-world forecasting benchmarks that relaxing MSE by ≤5% often yields a median 17.3% improvement in marginal realism, with gains above 30% in some datasets.
#Benchmarking#Research release#Benchmark
why featured
HKR-K has concrete benchmark numbers, and HKR-R speaks to metric-vs-realism tradeoffs. The topic is still niche forecasting research, with no product or major model impact, so it stays in all.
editor take
Across 9 forecasting benchmarks, ≤5% MSE slack buys 17.3% median realism gain; long-horizon MSE worship rewards under-dispersion.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
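The MSE-versus-realism tradeoff has a simple closed form in a toy setting, sketched below. This is an illustration, not the paper's realism metric. It assumes the target Y and an independent draw Y' share the same conditional distribution with variance `var_y`. The forecast is F = mu + lam * (Y' - mu), so lam = 0 is the MSE-optimal conditional mean, which has zero spread.

```python
def mse_and_spread(var_y, lam):
    """Exact moments for the toy forecast F = mu + lam * (Y' - mu).

    Assumption: Y and Y' are independent with the same conditional variance var_y.
    Returns (expected squared error E[(Y - F)^2], forecast variance Var(F)).
    """
    # Y - F = (Y - mu) - lam * (Y' - mu); the cross term vanishes by independence.
    mse = var_y + lam ** 2 * var_y
    spread = lam ** 2 * var_y
    return mse, spread


for lam in (0.0, 0.2, 1.0):
    mse, spread = mse_and_spread(1.0, lam)
    print(lam, round(mse, 3), round(spread, 3))
```

With lam = 0.2, expected MSE rises by 4%, but the forecast's standard deviation goes from zero to 20% of the target's. This is the kind of under-dispersion that strict MSE optimization rewards.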
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Data Attribution in Large Language Models via Bidirectional Gradient Optimization
The paper proposes training data attribution for auto-regressive LLMs using bidirectional gradient optimization: it perturbs a base model with gradient ascent and descent on a generated text sample, then measures loss changes across training samples to attribute factual and stylistic influence.
#Interpretability#Reasoning#Research release
why featured
HKR-K passes with a concrete attribution mechanism, and HKR-R connects to compliance and debugging. HKR-H is weak, and the post gives no metrics, code, or deployment conditions, so this stays mid-band.
editor take
The paper uses bidirectional gradients for LLM attribution; the abstract omits model scale, so don’t treat metrics as audit evidence.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
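The bidirectional perturbation idea can be shown on a toy one-parameter model. A linear model y = w * x stands in for the LLM here, and that substitution is an assumption. The sketch takes one gradient-descent step and one gradient-ascent step on the generated sample, then scores each training sample by how much its loss differs between the two perturbed models. The exact scoring rule and step size in the paper are not disclosed, so `bidirectional_scores` is hypothetical.

```python
def loss(w, x, y):
    """Squared error of the toy model y_hat = w * x."""
    return (w * x - y) ** 2


def grad(w, x, y):
    """d loss / d w for the toy model."""
    return 2.0 * (w * x - y) * x


def bidirectional_scores(w, generated, train, lr=0.1):
    """Score training samples by their loss gap between ascent and descent models.

    Assumption: score = loss under ascent minus loss under descent, so a positive
    score means the sample benefits when the model fits the generated text more.
    """
    xg, yg = generated
    g = grad(w, xg, yg)
    w_down = w - lr * g  # descent: reinforce the generated sample
    w_up = w + lr * g    # ascent: unlearn the generated sample
    return [loss(w_up, x, y) - loss(w_down, x, y) for x, y in train]


scores = bidirectional_scores(0.0, (1.0, 2.0), [(1.0, 2.0), (1.0, -2.0), (0.0, 5.0)])
print([round(s, 3) for s in scores])  # [3.2, -3.2, 0.0]
```

An aligned sample scores positive, a contradicting one scores negative, and a sample the parameter cannot affect scores zero.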
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Distributional Approximate Nearest Neighbour Search for Uncertainty-Aware Retrieval
The paper proposes DINOSAUR, which samples S_i embeddings per item, builds an ANN index over the augmented set, and samples the user embedding at query time; this two-sided stochastic retrieval process models embedding uncertainty without changing the model architecture or ANN index infrastructure, and the abstract reports larger coverage with small offline recall losses.
#RAG#Embedding#DINOSAUR#arXiv
why featured
HKR-K and HKR-R pass: DINOSAUR's multi-sampled embeddings are practical and avoid model/ANN infra changes. No results, code, or major-lab adoption are disclosed, so it stays in the 60–71 band.
editor take
DINOSAUR indexes S_i embeddings per item in ANN; I buy the idea, but index bloat and latency are undisclosed.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
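DINOSAUR's two-sided sampling can be sketched with the standard library. Isotropic Gaussian noise around each item and user embedding is an assumption, since the summary does not give the noise model. A brute-force dot-product scan stands in for a real ANN index. The point it shows is that the index only receives extra rows (S_i per item), not a new data structure. All function names are hypothetical.

```python
import random


def sample_embeddings(mean, sigma, n, rng):
    """Draw n noisy copies of an embedding (assumed isotropic Gaussian noise)."""
    return [[m + rng.gauss(0.0, sigma) for m in mean] for _ in range(n)]


def build_augmented_index(items, s_per_item, sigma, rng):
    """Flatten S_i sampled embeddings per item into (item_id, vector) rows."""
    index = []
    for item_id, mean in items.items():
        for vec in sample_embeddings(mean, sigma, s_per_item[item_id], rng):
            index.append((item_id, vec))
    return index


def query(index, user_mean, sigma, k, rng):
    """Sample one user embedding, then return the top-k distinct items by dot product."""
    user = sample_embeddings(user_mean, sigma, 1, rng)[0]
    ranked = sorted(index, key=lambda row: -sum(u * v for u, v in zip(user, row[1])))
    results = []
    for item_id, _ in ranked:
        if item_id not in results:  # several rows may belong to the same item
            results.append(item_id)
        if len(results) == k:
            break
    return results


rng = random.Random(0)
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
index = build_augmented_index(items, {"a": 3, "b": 3, "c": 3}, 0.1, rng)
print(len(index), query(index, [1.0, 0.0], 0.1, 2, rng))
```

The index grows by a factor of S_i per item, which is the bloat and latency cost the editor take flags as undisclosed.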
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
What Structural Inductive Bias Helps Transformers Reason Over Knowledge Graphs? A Study with Tabula RASA
Tabula RASA tests KGQA multi-hop reasoning with four independently removable transformer components, and sparse adjacency masking accounts for most gains: +72.5pp on 3-hop MetaQA, +45.5pp on WebQSP, and +53.9pp on CWQ, while learned relation parameters add limited refinement.
#Reasoning#Benchmarking#Tabula RASA#Research release
why featured
HKR-H/K pass: the paper names a mechanism and a +72.5pp ablation on 3-hop MetaQA. HKR-R is weak because KGQA inductive bias remains research-centric with no product or agent impact shown.
editor take
Tabula RASA gains 72.5pp on 3-hop MetaQA; for KGQA, add adjacency masks before piling on relation parameters.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
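Sparse adjacency masking, the component Tabula RASA credits with most of the gains, can be sketched as a masked softmax. In this version, each entity token may attend only to its graph neighbors and itself. Treating edges as undirected and adding self-loops are assumptions, as is applying the mask directly to raw attention scores.

```python
import math


def adjacency_mask(n, edges, self_loops=True):
    """Boolean n x n mask: True where node i may attend to node j."""
    mask = [[False] * n for _ in range(n)]
    for u, v in edges:
        # Assumption: edges are treated as undirected for attention.
        mask[u][v] = True
        mask[v][u] = True
    if self_loops:
        for i in range(n):
            mask[i][i] = True
    return mask


def masked_attention_weights(scores, mask):
    """Row-wise softmax over allowed positions; disallowed positions get weight 0.

    Each row needs at least one allowed position (guaranteed here by self-loops).
    """
    weights = []
    for row, allowed in zip(scores, mask):
        peak = max(s for s, ok in zip(row, allowed) if ok)  # numerical stability
        exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Path graph 0 - 1 - 2 with uniform raw scores.
mask = adjacency_mask(3, [(0, 1), (1, 2)])
for row in masked_attention_weights([[0.0] * 3] * 3, mask):
    print([round(w, 3) for w in row])
```

In this simplified view, one masked layer moves information one hop. A 3-hop MetaQA question therefore needs at least three stacked layers for the answer entity to see the topic entity.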
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations
STRIDE models training data attribution as sparse recovery in activation space, learns lightweight steering operators to perturb test predictions, and reports state-of-the-art LLM pre-training attribution with a 13× speedup over prior methods.
#Interpretability#Inference-opt#STRIDE#Research release
why featured
HKR-K/R pass: the 13x speedup and sparse-recovery mechanism add substance, and data attribution matters for compliance and debugging. The arXiv angle is narrow and technically dense, so it stays below featured.
editor take
STRIDE moves TDA into activation space and claims 13× speedup; I buy the direction, pending subset scale and attribution stability.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Dive into the Scene: Breaking the Perceptual Bottleneck in Vision-Language Decision Making via Focus Plan Generation
SceneDiver builds a holistic scene graph, iteratively decomposes tasks through recognition, understanding, and analysis, and distills focus ability into VLAs with a lightweight adapter; the abstract reports reduced visual hallucinations on embodied AI benchmarks but does not disclose exact scores.
#Vision#Robotics#Agent#SceneDiver
why featured
HKR-K/R pass: the paper offers a concrete VLA perception mechanism using scene graphs, focus-plan iteration, and adapter distillation. No benchmark scores, major-lab signal, or adoption data are disclosed, so it stays in the 60–71 band.
editor take
SceneDiver uses scene graphs and iterative focus plans to cut hallucination; no scores disclosed, so “substantially” gets no pass.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers
The paper proposes on-the-fly repulsion in multimodal attention channels during the Diffusion Transformer forward pass, intervening between blocks after text conditioning gains image structure and before composition is fixed; the abstract claims richer T2I diversity with small overhead, but the post does not disclose numeric overhead or benchmark scores.
#Multimodal#Vision#Inference-opt#Research release
why featured
HKR-H/K pass: it has a concrete inference-time intervention for T2I diversity. Metrics, overhead, and reproducible setup are not disclosed, and the DiT-specific angle keeps it in all.
editor take
DiT attention repulsion runs during the forward pass; overhead and scores are undisclosed, so don’t buy “small overhead” yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Good Reasoning Makes Good Demonstrations: Implicit Reasoning Quality Supervision via In-Context Reinforcement Learning
The paper introduces In-Context RLVR, which prepends demonstrations before each rollout and uses Evidence Gain to approximately reweight rewards, reporting consistent gains in accuracy and reasoning quality over standard RLVR baselines on mathematical reasoning benchmarks.
#Reasoning#Alignment#Fine-tuning#Research release
why featured
HKR-H and HKR-K pass: the paper states a concrete training mechanism and math-benchmark improvement claim. It remains an arXiv method without lab-scale adoption, release traction, or production evidence, so it stays in 60–71.
editor take
In-Context RLVR prepends demos before every rollout; I buy the direction, but the snippet gives no benchmark numbers.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Supportive Token Revealing for Fast Diffusion Language Model Decoding
The paper proposes AXON, a training-free module that selects anchor tokens with attention, uncertainty, and confidence signals, and experiments across multiple diffusion language models show fewer function evaluations while maintaining or improving accuracy on reasoning and code-generation benchmarks.
#Inference-opt#Reasoning#Code#AXON
why featured
HKR-K and HKR-R pass: AXON provides a training-free decoding mechanism and targets lower inference cost. It remains a niche arXiv inference-optimization paper, so it stays in the 60–71 band.
editor take
AXON picks anchor tokens via attention, uncertainty, and confidence; NFE gains aren't disclosed, so I read it as a diffusion-LM decoding patch, not a model leap.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Geometry-Aware Hallucination Detection in Large Language Models
The paper proposes GA-ICL, a geometry-aware in-context demonstration sampler that uses latent representations from frozen LLMs, and reports better results than standard ICL selection baselines across most FEVER and HaluEval settings. Extended evaluations cover Phi-14B and Qwen3-32B, though the snippet does not disclose exact metric values.
#RAG#Reasoning#Benchmarking#Phi
why featured
HKR-K and HKR-R pass: the method and eval setup are concrete, and hallucination detection is a real deployment pain. HKR-H is weak; a single arXiv benchmark paper lacks production proof or broad replication.
editor take
GA-ICL beats ICL baselines on most FEVER/HaluEval settings; metrics are undisclosed, so I’d file this as sampling-heuristic progress.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Platonic Transformers: A Solid Choice for Equivariance
Platonic Transformer defines attention relative to reference frames from Platonic solid symmetry groups, preserving the standard Transformer architecture and computational cost while providing equivariance to translations and Platonic symmetries, and the paper evaluates it on CIFAR-10, ScanObjectNN, QM9, and OMol25.
#Reasoning#Vision#Research release#Benchmark
why featured
HKR-H and HKR-K pass: the Platonic-solid attention mechanism and four benchmarks are concrete. The topic stays niche geometric deep learning, so it fits the 60–71 band.
editor take
Platonic Transformer tests equivariant attention on 4 task types; if zero extra cost holds, it beats another expert-module stack.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Provably Reduced Sample Cost in Prior-Guided Hyperparameter Optimization
The paper gives distribution-dependent sample-complexity bounds for prior-guided multi-fidelity HPO, models priors over arm means in fixed-budget best-arm identification, and validates the theory on a synthetic benchmark and LCBench with up to 90% budget reduction while retaining solution quality.
#Fine-tuning#Benchmarking#LCBench#Research release
why featured
HKR-K is strong: 90% budget reduction plus LCBench validation is concrete. Kept in all because this is a niche theoretical HPO paper, not a broad product or lab release.
editor take
Prior-guided HPO cuts up to 90% budget on LCBench; I buy the theory, but production hinges on having good priors.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
FedMental: Evaluating Federated Learning for Mental Health Detection from Social Media Data
FedMental evaluates FL on depression detection from X and suicide crisis detection from Reddit, with centralized training at 85.63 F1, the best FL model at 83.16 F1, and DP-FL losing up to 27.01 F1 even at epsilon 50.
#Fine-tuning#Safety#Benchmarking#FedMental
why featured
HKR-K and HKR-R pass: the paper gives concrete F1 and DP-FL tradeoff numbers for mental-health detection. It remains a niche applied-research benchmark without product or agent implications, so it stays in all.
editor take
FedMental gets FL to 83.16 F1, 2.47 below centralized; DP-FL at ε=50 loses 27.01, a brutal privacy bill.
HKR breakdown
hook knowledge resonance
open source
65
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Worker Utility as Hysteresis: A Preisach Model of Transaction Acceptance in Gig Labour Markets
The paper models worker acceptance in 36,891 gig transactions with a Preisach hysteresis pipeline, using a dual-output neural network and XGBoost to reach Jaccard 0.827 and ROC AUC 0.799, with recommendations that reduce the total wage bill by 21.3% and raise expected fill rate by 9.7 percentage points.
#Benchmarking#arXiv#XGBoost#Research release
why featured
HKR-K/R pass with sample size, metrics, and wage-bill/fill-rate deltas, plus an algorithmic labor-pricing nerve. HKR-H is weak; this is niche gig-market modeling, not a model or product release, so it stays in the 60-71 band.
editor take
Preisach hits 0.799 AUC on 36,891 gigs; 21.3% wage savings plus a higher fill rate smells overfit without external validation.
HKR breakdown
hook knowledge resonance
open source
65
SCORE
H0·K1·R1
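The hysteresis idea behind the gig-market model can be illustrated with the classical Preisach construction: a weighted sum of two-threshold relays. A relay switches on when the input reaches alpha, switches off when it falls to beta, and otherwise keeps its state. Treating the offered wage as the input and the weighted fraction of "on" relays as acceptance propensity is an assumption. The paper's dual-output network and XGBoost pipeline are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Relay:
    alpha: float   # switch-on threshold: input >= alpha sets state to 1
    beta: float    # switch-off threshold: input <= beta sets state to 0 (beta <= alpha)
    weight: float
    state: int = 0

    def update(self, u):
        if u >= self.alpha:
            self.state = 1
        elif u <= self.beta:
            self.state = 0
        # Between beta and alpha the relay remembers its previous state.
        return self.state


def preisach_output(relays, u):
    """Weighted fraction of relays that are on after applying input u."""
    total = sum(r.weight for r in relays)
    return sum(r.weight * r.update(u) for r in relays) / total


relays = [Relay(alpha=20, beta=10, weight=1.0), Relay(alpha=30, beta=15, weight=1.0)]
for wage in [12, 25, 35, 25]:
    print(wage, preisach_output(relays, wage))
```

The same offer of 25 yields 0.5 on the way up and 1.0 on the way down. That path dependence is what a hysteresis model of acceptance is meant to capture.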
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Spatially Grounded Concept Bottleneck Models via Part-Factorized Attention
The paper proposes a part-factorized CBM built on frozen DINOv3, reaching 88.6% top-1 and about 70% pointing accuracy on CUB-200-2011 without per-image supervision.
#Vision#Interpretability#DINOv3#Research release
why featured
HKR-K passes with a concrete mechanism and benchmark numbers. HKR-H and HKR-R are weak because this is a niche vision-interpretability paper with no product or agent impact.
editor take
Part-factorized CBM hits 88.6% top-1 on CUB; the wild bit is that 27 images suffice for the spatial prior.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Explainably Safe Reinforcement Learning
The paper proposes an explainable safe RL method that represents a shielding policy as hierarchical decision trees; in experiments, the explanation trees are several orders of magnitude smaller than the original shield.
#Reasoning#Safety#Interpretability#Research release
why featured
HKR-K passes on a concrete mechanism and result. HKR-H is weak, and HKR-R is limited because safe RL is narrow; the post does not disclose code or production use.
editor take
Hierarchical trees shrink shield explanations by orders of magnitude; I buy the direction, but experiment scale is undisclosed.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
SFMP: Fine-Grained, Hardware-Friendly, Search-Free Mixed-Precision Quantization for LLMs
SFMP proposes four mechanisms for compressing large language models: fractional bit-width, block-wise mixed precision, row-column weight reordering, and a unified GEMM kernel, with code released on GitHub.
#Inference-opt#SFMP#Research release#Open source
why featured
HKR-K lands with four concrete quantization mechanisms and open code; HKR-R is infra-specific, while no compression, throughput, or accuracy numbers are disclosed.
editor take
SFMP uses 4 mechanisms for search-free quantization; without latency tables, the unified GEMM claim carries the paper.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
TANDEM: Bi-Level Data Mixture Optimization with Twin Networks
Jiaxing Wang and 11 coauthors propose TANDEM, a twin-network method that optimizes LLM training data mixture ratios by comparing a proxy model trained on primary data with a dynamically updated reference model trained with additional data; the abstract says experiments cover data-restricted and supervised fine-tuning settings, but the post does not disclose exact performance gains.
#Fine-tuning#Benchmarking#Jiaxing Wang#arXiv
why featured
This is relevant LLM training research: HKR-K has a clear mechanism and HKR-R hits cost/data-mix concerns. No concrete gain is disclosed and HKR-H is weak, so it stays in the 60–71 band.
editor take
TANDEM uses twin networks for data mixing, but no gains are disclosed; I don’t buy “significant” without the tables.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Invariant Gradient Alignment for Robust Reasoning Distillation
The paper introduces Invariant Gradient Alignment, a distillation training framework that aligns gradients across logically isomorphic examples in mathematics, medicine, law, and science; across four benchmarks, IGA beats eight baselines, improves accuracy by up to 14.3 percentage points over ERM-SFT, and reports a Logical Consistency Score of 0.031 versus 0.142.
#Reasoning#Fine-tuning#Benchmarking#Research release
why featured
HKR-K is strong: the method and +14.3-point gain are concrete. HKR-R is moderate for reasoning distillation, but this is a single arXiv method paper without product impact or broad debate, so it stays in 60–71.
editor take
IGA beats ERM-SFT by up to 14.3 points; the gradient-conflict mask is useful, but the cost of building isomorphic example sets decides adoption.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
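The editor take mentions a gradient-conflict mask. One known way to build such a mask is an AND-mask-style sign-agreement rule: average the gradients of logically isomorphic examples, but zero every coordinate where their signs disagree. This rule is swapped in as an illustration. It is not confirmed to be IGA's exact mechanism, and the `threshold` parameter is hypothetical.

```python
def sign(x):
    """Return 1, -1, or 0."""
    return (x > 0) - (x < 0)


def agreement_masked_gradient(grads, threshold=1.0):
    """Average per-example gradients, zeroing coordinates whose signs conflict.

    grads: equal-length gradient vectors, one per logically isomorphic example.
    A coordinate survives if |mean sign| >= threshold (1.0 means unanimous).
    """
    n = len(grads)
    masked = []
    for d in range(len(grads[0])):
        column = [g[d] for g in grads]
        agreement = abs(sum(sign(v) for v in column)) / n
        mean = sum(column) / n
        masked.append(mean if agreement >= threshold else 0.0)
    return masked


print([round(g, 3) for g in agreement_masked_gradient([[0.2, -0.1, 0.3], [0.4, 0.1, 0.1]])])  # [0.3, 0.0, 0.2]
```

The cost concern in the editor take shows up directly: the mask needs a gradient for every member of an isomorphic set, so building those sets dominates the bill.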
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems
The paper proposes constraint injection for verifying VRP constraint modeling, releases the 8B VRPCoder model and a 21-variant expert-verified benchmark, and reports that VRPCoder-GRPO reaches 93% average Pass@1 across four VRP benchmarks.
#Code#Reasoning#Benchmarking#VRPCoder
why featured
HKR-K is strong with model size, benchmark count, and Pass@1. HKR-H/R are weak because VRP constraint modeling is too narrow, so this is useful research signal but not featured.
editor take
VRPCoder-GRPO hits 93% Pass@1 on four VRP benchmarks; constraint injection is a sharper OR-code eval than answer agreement.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Deliberate Evolution: Agentic Reasoning for Sample-Efficient Symbolic Regression with LLMs
Deliberate Evolution decouples symbolic generation from search control for LLM-based symbolic regression. On LLM-SRBench, it outperforms representative LLM-based SR baselines across scientific domains while using 40% of the standard sample budget.
#Agent#Reasoning#Memory#arXiv
why featured
HKR-K passes with a concrete mechanism and a 40% sample-budget result on LLM-SRBench. HKR-H/R are weak because symbolic regression is a narrow research topic, so it stays in all.
editor take
Deliberate Evolution beats LLM-SR baselines on LLM-SRBench at 40% sample budget; splitting MSE feedback into diagnosis and memory is the useful part.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Adaptive Head Budgeting for Efficient Multi-Head Attention
BudgetFormer dynamically allocates attention heads per input, learning a head budget and relevance distribution; on text classification tasks, the paper says it reduces FLOPs and memory usage while matching or surpassing standard multi-head attention.
#Inference-opt#BudgetFormer#Research release
why featured
HKR-K/R pass: BudgetFormer offers a dynamic head-budgeting mechanism targeting FLOPs and memory cost. HKR-H is weak, and the post does not disclose reduction size, model scale, or reproducibility details.
editor take
BudgetFormer budgets heads per input; no FLOPs delta is disclosed, so I’d file this as text-classification efficiency work.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Scaling Datasets for Multi-Sensor, Multi-Agent, and Multi-Domain Learning in Autonomous Systems
R. Spencer Hallyburton and two coauthors present a modular dataset generation pipeline that uses AVstack and CARLA to create terabyte-scale ground-truth-labeled data for ground, aerial, and infrastructure-based autonomous systems.
#Agent#Robotics#Vision#R. Spencer Hallyburton
why featured
HKR-K passes: TB-scale ground-truth data and an AVstack+CARLA pipeline are concrete. HKR-H/R are weak because the paper is niche autonomous-systems dataset work, not a broad AI-practitioner story.
editor take
Hallyburton’s 3-author pipeline makes TB-scale CARLA/AVstack labels; the old sim-to-real gap remains, with no real-vehicle validation disclosed.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Fast & Faithful Function Vectors
The paper studies two Function Vector design choices for LLM steering: which attention heads to select, and how the resulting vector is applied during steering. It reports that LRP-based gradient attribution improves efficiency and accuracy, while distributed steering outperforms simple aggregation; the abstract says the code is public but does not disclose benchmark numbers.
#Reasoning#Tools#Interpretability#Research release
why featured
HKR-K passes via concrete mechanisms and public code; HKR-H and HKR-R are weak. No hard exclusion applies, but the post discloses no result numbers, so it stays in the mid research band.
editor take
LRP head selection and distributed FV steering are the hook; no benchmark numbers disclosed, so treat it as reproducibility fodder, not capability news.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
ClustRecNet: A Novel End-to-End Deep Learning Framework for Clustering Algorithm Recommendation
ClustRecNet trains a clustering algorithm recommender on 34,000 synthetic tabular datasets, evaluates 10 clustering algorithms, and uses ARI as labels; on real-world benchmarks, it reports a 44.16% average ARI improvement over ML2DAC.
#Benchmarking#ClustRecNet#ML2DAC#AutoCluster
why featured
HKR-K passes with concrete dataset scale, algorithm count, and ARI gain. HKR-H and HKR-R are weak; this is a niche arXiv AutoML paper with limited product or model-ecosystem impact.
editor take
ClustRecNet trains on 34k synthetic tables; 44.16% ARI over ML2DAC is strong, but synthetic-to-real leakage needs checking first.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
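ClustRecNet labels each synthetic dataset using ARI. Below is a standard Adjusted Rand Index computed from the pair-counting contingency table, plus a labeling step that picks the algorithm with the highest ARI. The algorithm names and the `best_algorithm` helper are hypothetical. The paper's 10-algorithm pool and 34,000 datasets are not reproduced.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from the contingency table of two labelings."""
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    index = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are one cluster
        return 1.0
    return (index - expected) / (max_index - expected)


def best_algorithm(labels_true, candidate_outputs):
    """Label a dataset with the algorithm whose clustering scores the highest ARI."""
    return max(candidate_outputs,
               key=lambda name: adjusted_rand_index(labels_true, candidate_outputs[name]))


print(best_algorithm([0, 0, 1, 1], {"kmeans": [1, 1, 0, 0], "dbscan": [0, 1, 0, 1]}))  # kmeans
```

ARI ignores cluster IDs, so [1, 1, 0, 0] matches [0, 0, 1, 1] perfectly. Chance-level agreement lands near zero or below, which is why it makes a reasonable supervision target.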
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Constrained Adaptive Rejection Sampling
The paper introduces CARS, which records constraint-violating prefixes in a trie and subtracts their probability mass from later draws, improving acceptance rates monotonically while preserving the exact constrained distribution in experiments on program fuzzing and molecular generation.
#Inference-opt#Code#Research release
why featured
HKR-K passes on a concrete constrained-sampling mechanism and tests in fuzzing/molecule generation. HKR-H/R are weak because the title is dry and no numbers tie it to cost, safety, or competitive stakes.
editor take
CARS subtracts invalid-prefix mass via a trie; elegant, but the snippet omits trie memory and constraint-check costs.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
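The CARS mechanism can be sketched with a dictionary keyed by prefix tuples standing in for the trie. This sketch makes several assumptions: fixed-length sequences, a `next_probs(prefix)` function returning next-token probabilities, and a prefix-closed `violates(prefix)` check, meaning every extension of a violating prefix also violates. `removed[prefix]` stores the fraction of that prefix's mass already known to be invalid. Each later draw samples from the model with that mass subtracted. Because the remaining mass is renormalized consistently at every node, accepted samples still follow the exact constrained distribution.

```python
import random


class CARSSampler:
    """Rejection sampler that remembers invalid prefixes (a sketch, not the paper's code)."""

    def __init__(self, next_probs, violates, length, seed=0):
        self.next_probs = next_probs  # prefix tuple -> {token: probability}
        self.violates = violates      # prefix tuple -> bool, assumed prefix-closed
        self.length = length
        self.removed = {}             # prefix -> fraction of its mass known invalid
        self.rng = random.Random(seed)

    def _mark_invalid(self, prefix):
        # The whole subtree under `prefix` is invalid; recompute each ancestor's
        # removed fraction as the probability-weighted sum over its children.
        self.removed[prefix] = 1.0
        while prefix:
            parent = prefix[:-1]
            probs = self.next_probs(parent)
            self.removed[parent] = sum(
                p * self.removed.get(parent + (t,), 0.0) for t, p in probs.items())
            prefix = parent

    def sample(self):
        while True:
            prefix, rejected = (), False
            while len(prefix) < self.length:
                probs = self.next_probs(prefix)
                tokens = list(probs)
                # Subtract mass already known to be invalid below each child.
                weights = [probs[t] * (1.0 - self.removed.get(prefix + (t,), 0.0))
                           for t in tokens]
                prefix = prefix + (self.rng.choices(tokens, weights=weights)[0],)
                if self.violates(prefix):
                    self._mark_invalid(prefix)
                    rejected = True
                    break
            if not rejected:
                return prefix


def uniform(prefix):
    return {"a": 0.5, "b": 0.5}


def starts_with_aa(prefix):
    return prefix[:2] == ("a", "a")


sampler = CARSSampler(uniform, starts_with_aa, length=2, seed=1)
print([sampler.sample() for _ in range(5)])
```

Once ("a", "a") has been rejected, its mass can never be drawn again, so the acceptance rate can only rise. The memory cost the editor take asks about is the size of `removed`, which grows with the number of distinct violating prefixes seen.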
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
The Differentiable Auditory Loop (DAL): An ML Framework for Hyper-Personalized Hearing Aids
Researchers introduced the open-source DAL framework for personalized hearing aid fitting, using a JAX port of CARFAC and a SEANet waveform-to-waveform UNet to train against subject-specific impaired-hearing models; the DAL-optimized SEANet outperformed the tested MHA baselines on neural-representation and signal-fidelity metrics.
#Audio#Fine-tuning#arXiv#CARFAC
why featured
HKR-H and HKR-K pass: the applied hearing-aid angle is novel, with an open-source DAL framework using JAX CARFAC, SEANet, and MHA baselines. No metric values or major product tie-in, so it stays mid-band all.
editor take
DAL trains SEANet with JAX-CARFAC for personalized hearing aids; sample size and latency are undisclosed, so clinical claims wait.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
It's TIME: Towards the Next Generation of Time Series Forecasting Benchmarks
The paper introduces TIME, a time-series forecasting benchmark with 50 fresh datasets and 98 forecasting tasks, designed for leakage-free zero-shot evaluation of 12 time-series foundation models with a human-in-the-loop construction pipeline.
#Benchmarking#TIME#Hugging Face#Real-TSF
why featured
HKR-K is concrete: 50 datasets and 98 tasks create a testable benchmark update. HKR-R is limited to time-series/eval practitioners, so it stays below featured.
editor take
TIME adds 50 fresh datasets and 98 tasks; TSFM benchmarking needed this cleanup, but leakage-proof claims need reproducible audits.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
NLLog: Lightweight, Explainable SOC Anomaly Detection via Log-to-Language Rewriting
NLLog deterministically rewrites parsed log templates into WHO-WHAT-SEVERITY sentences, pools them with TF-IDF, classifies sessions using tree ensembles, and back-projects evidence with TreeSHAP across HDFS, BGL, and the AIT Alert Data Set.
#Interpretability#Safety#Benchmarking#NLLog
why featured
HKR-K passes with a concrete mechanism and test sets; HKR-H and HKR-R are weak. Security-log anomaly detection is vertical, not a broad AI-industry research release, so it stays in all.
editor take
NLLog reports low false positives on HDFS, BGL, and AIT; deterministic rewrites plus TreeSHAP beat another LLM-shaped SOC pitch.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
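The first two NLLog stages can be sketched with the standard library: a deterministic template-to-sentence rewrite, then TF-IDF pooling per session. The template table below is hypothetical, and the smoothed IDF formula (log((1 + N) / (1 + df)) + 1) is an assumption. The tree-ensemble classifier and TreeSHAP back-projection are not shown.

```python
import math
from collections import Counter

# Hypothetical mapping from parsed log-template IDs to (who, what, severity).
TEMPLATES = {
    "E1": ("datanode", "received block", "info"),
    "E2": ("datanode", "failed to write block", "error"),
    "E3": ("namenode", "allocated block", "info"),
}


def rewrite(event_id):
    """Deterministically turn a template ID into a WHO-WHAT-SEVERITY sentence."""
    who, what, severity = TEMPLATES[event_id]
    return f"{who} {what} severity {severity}"


def tfidf_session_vectors(sessions):
    """Pool each session's rewritten sentences into one sparse TF-IDF vector (dict)."""
    docs = [" ".join(rewrite(e) for e in events).split() for events in sessions]
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


for vec in tfidf_session_vectors([["E1", "E3"], ["E1", "E2", "E3"]]):
    print(sorted(vec, key=vec.get, reverse=True)[:3])
```

Because the rewrite is a fixed lookup, the same session always yields the same features. That determinism is what lets evidence be traced back from a tree model's attributions to specific log lines.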
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs
The paper proposes Time-R1, a two-stage reinforcement fine-tuning framework for time series forecasting, using supervised fine-tuning for warmup, then reinforcement learning with fine-grained multi-objective rewards and GRIP to optimize reasoning paths; the abstract says experiments improve performance across diverse datasets, but does not disclose benchmark names or numeric gains.
#Reasoning#Fine-tuning#OpenAI#Research release
why featured
HKR-H and HKR-K pass: the title reframes forecasting as reasoning, and the summary gives Time-R1’s training recipe. No benchmark numbers, code, or production claim are disclosed, so this stays in the mid research band.
editor take
Time-R1 uses SFT plus RL for forecasting; no datasets or gains disclosed, so I’d treat “slow-thinking TSF” as training plumbing.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
dMX: Differentiable Mixed-Precision Assignment for Low-Precision Floating-Point Formats
Giuseppe Franco and four coauthors introduce dMX, a differentiable mixed-precision quantization framework that uses a continuous per-layer offset, temperature annealing, and target-aware regularization to assign MXFP bit-widths for Llama, Qwen3, and SmolLM2, with evaluation on WikiText-2 perplexity and four zero-shot reasoning benchmarks.
#Inference-opt#Fine-tuning#Benchmarking#Giuseppe Franco
why featured
HKR-K is solid and HKR-R is narrow: dMX has a concrete mechanism and model benchmarks, but low-level inference optimization lacks product impact or a broad discussion hook, so it fits 60-71.
editor take
dMX assigns per-layer MXFP bits for Llama, Qwen3, SmolLM2; I buy the direction, but hardware latency is missing.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
KITE: Kernelized and Information Theoretic Exemplars for In-Context Learning
KITE models ICL example selection as a query-specific optimization problem. It uses an approximately submodular surrogate, greedy selection, kernelization, and an optimal-design regularizer. The paper reports significant gains over nearest-neighbor retrieval methods such as KATE across multiple classification tasks, but the RSS abstract does not disclose exact datasets, model names, or numerical scores.
#RAG#Reasoning#Benchmarking#KITE
why featured
HKR-K and HKR-R pass: the method is specific and relevant to ICL exemplar selection. It stays in the 60–71 band because the article gives no gain size, code, or production validation.
editor take
KITE frames ICL selection as per-query optimization; scores, models, and datasets are undisclosed, so don’t overread its KATE win.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
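Greedy selection under a submodular surrogate can be sketched as follows. A weighted facility-location function is swapped in as the objective. It is a standard monotone submodular function for non-negative similarities, and it is not confirmed to be KITE's surrogate. `weights[j]` stands in for a query-specific relevance score for pool example j, and `sim` is a pairwise similarity matrix; both are hypothetical inputs. KITE's kernelization and optimal-design regularizer are not reproduced.

```python
def facility_location(selected, sim, weights):
    """f(S) = sum_j weights[j] * max_{s in S} sim[j][s]; 0 for the empty set."""
    if not selected:
        return 0.0
    return sum(w * max(sim[j][s] for s in selected) for j, w in enumerate(weights))


def greedy_select(sim, weights, k):
    """Standard greedy: repeatedly add the candidate with the largest marginal gain."""
    selected = []
    remaining = list(range(len(weights)))
    for _ in range(min(k, len(remaining))):
        base = facility_location(selected, sim, weights)
        best = max(remaining,
                   key=lambda c: facility_location(selected + [c], sim, weights) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


# Examples 0 and 1 are near-duplicates; example 2 covers something different.
sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
weights = [1.0, 0.9, 0.8]
print(greedy_select(sim, weights, 2))  # [0, 2]
```

A nearest-neighbor rule in the style of KATE would take the two most relevant examples, 0 and 1. Diminishing returns make the greedy step skip the near-duplicate in favor of example 2.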
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Tuning the Implicit Regularizer of Masked Diffusion Language Models: Enhancing Generalization via k-Parity
The paper decomposes the Masked Diffusion objective into Signal and Noise regimes, then reports peak gains of 8.8% for pre-training and 5.8% for supervised fine-tuning on 8B-parameter models.
#Reasoning#Fine-tuning#Benchmarking#arXiv
why featured
HKR-K passes on the Signal/Noise mechanism and 8B-model gains; HKR-H and HKR-R fail because the angle is niche ML theory with limited practitioner buzz. This fits the 60–71 all band.
editor take
The paper reports 8.8% pretraining gains on 8B models; I buy the mechanism, not the peak-gain scaling story.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Testing Neural Networks via Bayesian-Guided Exploration of Decision Landscapes
The paper introduces BayesWarp, a neural network testing framework evaluated on MNIST, CIFAR-10, ImageNet, and six models; it mutates saliency-identified decision-critical regions and uses uncertainty-aware Bayesian optimization to guide test generation under a fixed mutation budget.
#Vision#Safety#Interpretability#BayesWarp
why featured
HKR-K passes: BayesWarp gives a testable mechanism across MNIST, CIFAR-10, ImageNet, and 6 models. HKR-H/R are weak; this is useful academic testing work, not a same-day industry story.
editor take
BayesWarp covers 3 vision datasets and 6 models; saliency plus Bayesian search is neat, but multimodal transfer is unproven.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text
The paper introduces eXTC, a three-stage text classifier that learns a natural-language SOP via structured prompt optimization, distills SOP-grounded reasoning from a large teacher LLM into a compact LM, and applies reinforcement learning; the abstract says it improves classification and explanation quality across benchmarks, but the snippet does not disclose exact scores.
#Interpretability#Reasoning#Fine-tuning#Research release
why featured
HKR-K passes because the paper states a concrete three-stage eXTC mechanism. HKR-H/R are weak: no exact scores are disclosed, and the angle is too niche for broad practitioner debate.
editor take
eXTC uses three-stage SOP distillation plus RL for explainable classification; no scores disclosed, so I don’t buy “significant” yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Towards Efficient and Evidence-Grounded Mobility Prediction with LLM-Driven Agent
AgentMob formulates next-location prediction as adaptive evidence-controlled decision making and evaluates it on three mobility datasets; GPT-5.4 reaches 71.42% Acc@1 on BW, 33.14% on YJMob100K, and 33.50% on Shanghai ISP, with code released on GitHub.
#Agent#Tools#Reasoning#Linyao Chen
why featured
HKR-K passes: AgentMob provides a mechanism, datasets, Acc@1, and public code. HKR-H and HKR-R are weak because the title is academic and the use case is narrow, so it sits in the 60–71 band.
editor take
AgentMob lifts BW non-fast-path Acc@1 from 30.65% to 48.62%; agent value here is evidence routing on low-confidence cases.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Dual Advantage Fields
Dual Advantage Fields turns a bilinear dual value model into a local advantage signal by scoring action-effect feature displacement against the goal direction, and the paper reports improved aggregate RLiable metrics on OGBench locomotion, manipulation, and puzzle tasks.
#Reasoning#Robotics#Benchmarking#arXiv
why featured
HKR-K passes for a concrete mechanism and OGBench RLiable claim; HKR-H/R are weak because the title is abstract and broader impact is unclear. No hard exclusion, but it stays in the 60–71 niche research band.
editor take
DAF improves aggregate RLiable metrics on three OGBench task groups, but no effect size is disclosed; the useful idea is turning dual values into local action rankings.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
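The sketch below follows one reading of the summary; it is an assumption, not taken from the paper. Suppose the bilinear dual value is V(s, g) = φ(s)ᵀψ(g). Then the value change caused by an action's effect is exactly the feature displacement φ(s') - φ(s) dotted with the goal direction ψ(g). The code scores candidate actions that way. `phi`, `psi`, and the toy transitions are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def local_advantage(phi, psi, state, next_state, goal):
    """(phi(s') - phi(s)) . psi(g), which equals V(s', g) - V(s, g) for a bilinear V."""
    displacement = [b - a for a, b in zip(phi(state), phi(next_state))]
    return dot(displacement, psi(goal))

def rank_actions(phi, psi, state, successors, goal):
    """Order actions by local advantage, best first; `successors` maps action -> next state."""
    scores = {a: local_advantage(phi, psi, state, s2, goal) for a, s2 in successors.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

With identity features and a goal to the right, moving right scores +1, moving up scores 0, and moving left scores -1.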
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
SymTRELLIS: Symmetry-Enforced Voxel Latents for 3D Generation
SymTRELLIS enforces finite point-group symmetries during TRELLIS.2 flow-based 3D generation, evaluated on 266 strictly symmetric objects spanning 2- to 20-fold rotations and polyhedral symmetry groups.
#Multimodal#Vision#SymTRELLIS#TRELLIS.2
why featured
HKR-K passes with a concrete mechanism and dataset scope; HKR-H and HKR-R are weak because the angle is academic and narrow. Useful 3D-generation research, but not featured-level.
editor take
SymTRELLIS tests on 266 symmetric objects; no retraining, just ODE-step velocity averaging, so it is more an engineering patch than a model leap.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
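The editor take describes the mechanism as velocity averaging at each ODE step, with no retraining. Below is a minimal 2D sketch of that idea under stated assumptions:

- It uses a cyclic group C_n of rotations and a plain Euler integrator.
- TRELLIS.2 works on voxel latents, where a group element would instead permute and rotate voxels.

Averaging R_g^{-1} v(R_g x) over the group yields a field that commutes with every rotation in the group. A symmetric configuration therefore stays symmetric after each step. The toy velocity field and step size are illustrative.

```python
import math

def rotate(p, theta):
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def symmetrize(velocity, n):
    """C_n-equivariant field: mean over g of R_g^{-1} v(R_g x)."""
    def v_sym(p):
        acc = [0.0, 0.0]
        for g in range(n):
            theta = 2 * math.pi * g / n
            w = rotate(velocity(rotate(p, theta)), -theta)  # R_g^{-1} v(R_g x)
            acc[0] += w[0] / n
            acc[1] += w[1] / n
        return tuple(acc)
    return v_sym

def euler_step(p, velocity, dt):
    v = velocity(p)
    return (p[0] + dt * v[0], p[1] + dt * v[1])
```

Stepping a rotated point then gives the same result as rotating the stepped point, which is the property that keeps generated shapes on the symmetry group.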
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Global Sketch-Based Watermarking for Diffusion Language Models
The paper proposes a global vector-valued sketch watermark for masked diffusion language models, using additive statistics over the full sequence for order-agnostic detection and analyzing distortion, soundness, and robustness properties.
#Safety#Alignment#Research release#Safety/alignment
why featured
HKR-K and HKR-R pass, but this is a niche arXiv watermarking paper. The summary gives mechanism only, with no numbers, artifact, or product path, so it sits in the 60–71 research-signal band.
editor take
This paper targets masked diffusion LMs with sketch watermarks; the RSS text gives theory, not empirical false-positive rates.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series
HEPA pretrains a causal Transformer with JEPA for multivariate time-series event prediction, then freezes the encoder and fine-tunes only the predictor; across 14 benchmarks in 11 domains, it outperforms PatchTST, iTransformer, MAE, and Chronos-2 on at least 10 benchmarks with an order of magnitude fewer tuned parameters.
#Reasoning#Fine-tuning#Benchmarking#HEPA
why featured
HKR-K passes with a concrete 14-benchmark claim and named baselines. HKR-H and HKR-R are weak, and there is no product, open-source ecosystem, or major-lab pull, so it stays in the mid-low research band.
editor take
HEPA wins at least 10 of 14 benchmarks; frozen encoder plus predictor tuning is a clean small-parameter bet for time series.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
CADET: A Modular Platform for Evaluating Distributed Cooperative Autonomy in Connected Autonomous Vehicles
CADET decouples the autonomous-vehicle stack into composable modules and evaluates distributed cooperative autonomy under V2V, V2I, RSU, edge, and cloud conditions, with open-source code and a demo available.
#Robotics#Inference-opt#Benchmarking#CADET
why featured
HKR-K passes via a concrete modular evaluation platform, deployment conditions, and open artifacts. HKR-H and HKR-R are weak because this is a niche CAV research platform, not a broad model or product story.
editor take
CADET open-sources V2V/V2I evaluation; the useful jab is cloud perception losing on safety, not another AV benchmark.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
GENEB: Why Genomic Models Are Hard to Compare
GENEB evaluates frozen representations from 40 genomic foundation models across 100 tasks in 13 functional categories under one probing protocol, including few-shot settings; the study finds aggregate leaderboards unstable, with rankings shifting by task category and architecture or pretraining alignment often outweighing parameter count.
#Benchmarking#GENEB#Research release#Benchmark
why featured
HKR-H/K pass: GENEB evaluates 40 genomic foundation models on 100 tasks and claims leaderboards are unstable while architecture/pretraining fit beats scale. The genomics focus limits HKR-R, so it stays in all.
editor take
GENEB tests 40 genomic FMs on 100 tasks; unstable leaderboards make parameter-count bragging look weak here.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization
The paper introduces Policy Split, splitting a shared-parameter policy into normal and high-entropy modes; the normal mode optimizes task correctness, the high-entropy prompt drives exploration, and the post does not disclose baseline names or exact scores.
#Reasoning#Alignment#Research release
why featured
HKR-K passes via a testable post-training mechanism, but baseline names and scores are not disclosed. HKR-H/R are weak, so this fits all rather than featured.
editor take
Policy Split separates correctness and exploration via dual-mode entropy regularization; no baselines or scores disclosed, so I don't buy “consistently outperforms.”
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
RePercENT framework extends disentangled representation learning to multiple modalities
The paper proposes RePercENT, a self-supervised framework that performs plug-and-play pairwise disentanglement on pre-extracted embeddings and targets the scalability bottleneck that keeps existing multimodal disentanglement methods mostly limited to two modalities.
#Multimodal#Embedding#RePercENT#arXiv
why featured
HKR-K passes: the paper names RePercENT and its disentanglement mechanism, but the feed gives only framework-level detail with no metrics or product path. No hard exclusion; this sits in the 60–71 research-signal band.
editor take
RePercENT targets 3+ modality embeddings; dataset scale and complexity gains are undisclosed, so don’t overbuy the claim yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers
LARM makes recurrent ASR encoder depth a controllable test-time compute axis and reduces WER on LibriSpeech as inference loops increase, using sparse CTC checkpoints, supervision-clock embeddings, FiLM depth conditioning, and delayed soft-posterior feedback; the abstract does not disclose exact WER values or loop counts.
#Audio#Inference-opt#LARM#LibriSpeech
why featured
HKR-H/K pass: test-time compute is moved into ASR via loop depth, with a LibriSpeech condition. No exact WER numbers or product impact are disclosed, so it stays in the lower research band at 62.
editor take
LARM lowers LibriSpeech WER as loops increase; exact numbers are missing, so treat this as an early test of test-time compute for ASR.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Learning to Evaluate: Cost-Effective Model Evaluation on Unlabeled Data with Meta-Learning
The paper introduces MetaEvaluator, a model-agnostic meta-learning framework that evaluates unseen models on unlabeled datasets using a pool of reference models; the code is available on GitHub, while the abstract does not disclose experiment numbers, cost reduction ratios, or specific modalities.
#Benchmarking#Fine-tuning#MetaEvaluator#Research release
why featured
HKR-K passes on the unlabeled-data meta-evaluation mechanism, and HKR-R is limited to evaluation cost. No experimental numbers, cost reduction, or modality are disclosed, so this stays in the 60–71 band.
editor take
MetaEvaluator scores unseen models via reference-model meta-learning; no error bars or cost ratio disclosed, so label-free evaluation stays unproven.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
HYolo: Hypergraph Learning Applied to Object Detection
HYolo integrates hypergraph learning into the YOLO architecture and reports about a 12% mAP@50 improvement over baseline YOLO models on COCO, using high-order feature relationships to model object and contextual dependencies in IoT vision settings.
#Vision#Benchmarking#HYolo#YOLO
why featured
HKR-K passes with a concrete mechanism and about +12% mAP@50 on COCO. HKR-H/R miss: this is a specialized vision paper with no product angle or practitioner debate hook, so it sits in the 60–71 all band.
editor take
HYolo reports +12% mAP@50 on COCO; no YOLO version, compute, or latency disclosed, so discount the IoT angle.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
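The summary does not specify HYolo's hypergraph layer. The sketch below uses the standard HGNN normalized hypergraph convolution, Dv^{-1/2} H W De^{-1} Hᵀ Dv^{-1/2} X, as an illustrative assumption.

- Nodes could be feature-map cells, and hyperedges could group cells that share context.
- The learned projection is omitted.
- Every node is assumed to belong to at least one hyperedge.

```python
import math

def hypergraph_conv(features, incidence, edge_weights=None):
    """One HGNN-style propagation step without the learned projection.

    incidence[v][e] is 1 if node v belongs to hyperedge e, else 0.
    """
    n, m = len(incidence), len(incidence[0])
    w = edge_weights or [1.0] * m
    dv = [sum(w[e] * incidence[v][e] for e in range(m)) for v in range(n)]  # node degree
    de = [sum(incidence[v][e] for v in range(n)) for e in range(m)]  # hyperedge size
    f = len(features[0])
    scaled = [[features[v][c] / math.sqrt(dv[v]) for c in range(f)] for v in range(n)]
    # Nodes -> hyperedges: average the normalized member features
    edge_msg = [[sum(incidence[v][e] * scaled[v][c] for v in range(n)) / de[e]
                 for c in range(f)] for e in range(m)]
    # Hyperedges -> nodes: weighted sum, then symmetric degree normalization
    out = []
    for v in range(n):
        row = [sum(incidence[v][e] * w[e] * edge_msg[e][c] for e in range(m)) for c in range(f)]
        out.append([x / math.sqrt(dv[v]) for x in row])
    return out
```

With one hyperedge spanning every node, each node receives the group mean. This is the high-order "shared context" signal a pairwise graph edge cannot express in one hop.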
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
A Latent Variable Framework for Scaling Laws in Large Language Models
The paper proposes a latent-variable statistical framework for LLM scaling laws and evaluates it on 12 Open LLM Leaderboard v1/v2 benchmarks, using a family-level latent variable plus observable model features to explain performance differences across model families and tasks.
#Benchmarking#Reasoning#Open LLM Leaderboard#Research release
why featured
HKR-K passes with a concrete framework and 12-benchmark setup; HKR-H/R are weak, and the post does not disclose key results or practical impact. This is relevant academic signal in the 60–71 band.
editor take
The paper fits latent-variable scaling laws on 12 Open LLM Leaderboard tasks; single-curve scaling is dead, but leaderboard contamination can swallow elegant stats.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Spectral Scaling Laws of Muon
The paper tracks Muon momentum singular-value quantiles across 77M to 2.8B-parameter models and finds that after burn-in they stabilize by layer type and model size, following power-law scaling. Early to mid-late layers scale around M^-0.25, so 5-step Newton-Schulz remains adequate, while some late layers scale up to M^-0.96 and require more NS iterations or tuned coefficients at frontier scale.
#Fine-tuning#Inference-opt#Benchmarking#Muon
why featured
HKR-K is clear: the paper reports Muon spectral scaling numbers across 77M–2.8B models and NS iteration conditions. HKR-H/R are weak, and the niche optimizer focus keeps it in all.
editor take
Muon momentum spectra scale from 77M to 2.8B; late layers hit M^-0.96, so 5-step NS needs layer-aware treatment.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
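Why do small late-layer singular values strain 5-step Newton-Schulz? Muon first normalizes the momentum matrix X by its Frobenius norm. It then iterates X ← aX + b(XXᵀ)X + c(XXᵀ)²X, which maps every singular value σ independently to aσ + bσ³ + cσ⁵.

The sketch tracks a single singular value using the quintic coefficients from the public Muon reference implementation (3.4445, -4.7750, 2.0315). The 0.5 threshold for "close enough to orthogonal" is an assumption chosen for illustration, not the paper's criterion.

```python
# Quintic Newton-Schulz coefficients from the public Muon reference implementation.
A, B, C = 3.4445, -4.7750, 2.0315

def ns_step(sigma):
    """NS5 acts on each singular value independently: s -> a*s + b*s^3 + c*s^5."""
    return A * sigma + B * sigma ** 3 + C * sigma ** 5

def run_ns(sigma, steps=5):
    for _ in range(steps):
        sigma = ns_step(sigma)
    return sigma

def steps_to_reach(sigma, threshold=0.5, max_steps=20):
    """Iterations until a Frobenius-normalized singular value reaches the threshold."""
    for step in range(1, max_steps + 1):
        sigma = ns_step(sigma)
        if sigma >= threshold:
            return step
    return None
```

Under these assumptions, a normalized singular value of 0.05 enters the band in 2 steps. One at 1e-3 is still about 0.47 after 5 steps and needs a sixth. This is consistent with the paper's point that the smallest late-layer spectra need more iterations or tuned coefficients.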
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Post-Training Corrections for Improved Time-Series Forecasting
The paper introduces post-training corrections for time-series forecasters, applying selected corrections sequentially after training and reporting up to 30% higher forecasting accuracy across benchmark datasets with minimal computational overhead.
#Fine-tuning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes with a concrete post-training correction method and up to 30% benchmark gain. HKR-H/R are weak: the title is academic and the use case is narrow, so this sits in the lower 60–71 band.
editor take
Post-training corrections report up to 30% accuracy gains; smells like residual patching for forecasters, cheap but benchmark-sensitive.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Geospatial Foundation Models to Enable Progress on Sustainable Development Goals
The paper introduces SustainFM, a benchmark framework that evaluates geospatial foundation models against 17 Sustainable Development Goals, with tasks spanning asset wealth prediction to environmental hazard detection.
#Benchmarking#SustainFM#Research release#Benchmark
why featured
HKR-K passes because the paper names a concrete benchmark and 17-SDG evaluation frame. HKR-H and HKR-R are weak: the article lacks rankings, adoption data, or a practitioner-facing product hook.
editor take
SustainFM tests geospatial models on 17 SDGs; energy and domain-shift metrics are the part that makes this useful.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Activation-Based Active Learning for In-Context Learning: Challenges and Insights
The paper tests MLP activation-based sampling on Llama-3.2-3B and Qwen2.5-3B for in-context example selection, finding an absolute Spearman correlation of at most 0.33 across tested tasks and models, so these activation signals do not track example quality or task performance.
#Reasoning#Interpretability#Benchmarking#Llama
why featured
HKR-K passes: two 3B models, MLP activation sampling, and a 0.33 correlation ceiling give a testable negative result. HKR-H/R are weak, so this stays an all-tier niche research item.
editor take
Llama-3.2-3B and Qwen2.5-3B top out at |ρ| = 0.33; MLP activations are a weak hook for ICL selection.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
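The negative result rests on rank correlation between an activation-derived score and downstream quality. Below is a stdlib sketch of Spearman's ρ, computed as the Pearson correlation of average ranks so that ties are handled. It shows what the |ρ| ≤ 0.33 ceiling measures. The example inputs are hypothetical, not the paper's data.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

For example, `spearman(activation_scores, icl_accuracy_gains)` (hypothetical names) near 1 would mean the activation signal orders examples the way their quality does. At 0.33 it barely does.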
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction
The paper benchmarks Qwen2.5-0.5B for leader-follower role classification in HRI, comparing prompt engineering and fine-tuning under zero-shot and one-shot modes against an untrained baseline. Zero-shot fine-tuning reaches 86.66% accuracy with 22.2 ms per-sample latency, while one-shot modes degrade as longer context strains model capacity.
#Robotics#Fine-tuning#Benchmarking#Qwen
why featured
HKR-K passes: the paper gives testable accuracy and latency numbers for a small model in HRI role classification. HKR-H/R are weak because the topic is narrow and not tied to a broader agent or robotics product release.
editor take
Qwen2.5-0.5B fine-tuning hits 86.66% at 22.2ms; longer one-shot context hurts, so edge SLMs still hate in-context tricks.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Semiparametric Preference Optimization: Your Language Model Is Secretly a Single-Index Model
The paper proposes semiparametric preference optimization for policy alignment under an unknown, unrestricted preference link function, derives link-agnostic convergence guarantees using generic function complexity measures, and releases code at causalml/spo; the RSS snippet does not disclose benchmark names or quantitative empirical results.
#Alignment#Fine-tuning#Benchmarking#Research release
why featured
HKR-H/K pass: the title has a counterintuitive hook and the abstract gives an unknown-link method, convergence guarantees, and code. It remains a technical arXiv method without major-model results or production impact, so tier is all.
editor take
SPO drops the Bradley-Terry link assumption; no benchmarks or scores are disclosed, so I read it as a robustness patch for preference optimization.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Scaling Novel Graph Generation via Lightweight Structure-Guided Autoregressive Models
The paper proposes a lightweight autoregressive graph generation framework that serializes graphs into regular edge sequences with structure-guided topological ordering, targets near log-linear generation, and reports higher novelty while preserving validity and uniqueness on molecular and non-molecular benchmarks.
#Reasoning#Benchmarking#arXiv#Research release
why featured
HKR-K passes because the paper states a concrete mechanism and testable efficiency claim. HKR-H/R are weak, and graph generation is too niche for featured.
editor take
The paper claims near log-linear graph generation; no scaling curve disclosed, so novelty gains stay untrusted until reproduced.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
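The abstract does not spell out its structure-guided ordering. As a hedged illustration only, the sketch uses breadth-first search as the topological ordering.

- Nodes are relabeled by BFS rank.
- Each edge is emitted once, pointing back to an earlier node.
- The result is the kind of regular edge sequence an autoregressive model can consume.
- The functions assume a connected graph given as an adjacency dict.

```python
from collections import deque

def bfs_order(adj, start=0):
    """Breadth-first visiting order; neighbors are expanded in sorted order for determinism."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order

def serialize(adj, start=0):
    """Relabel nodes by BFS rank and emit each edge once as (new_i, new_j) with new_j < new_i."""
    order = bfs_order(adj, start)
    rank = {node: i for i, node in enumerate(order)}  # assumes every node is reachable
    edges = []
    for node in order:
        i = rank[node]
        for nb in sorted(adj[node], key=rank.get):
            j = rank[nb]
            if j < i:
                edges.append((i, j))
    return edges
```

Because BFS keeps each new node's back-edges close to recent nodes, the sequence length equals the edge count. A model decodes it edge by edge rather than filling an n×n adjacency matrix.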
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
ProtoAda: Prototype-Guided Adaptive Adapter Expansion for Multimodal Continual Instruction Tuning
ProtoAda uses format-aware task prototypes to improve MCIT routing, targeting cases where image-text similarity assigns VQA and grounding tasks to the same LoRA expert; the abstract reports gains across multiple benchmarks but does not disclose benchmark counts or exact scores.
#Multimodal#Fine-tuning#Vision#ProtoAda
why featured
HKR-K passes via a concrete mechanism and testable routing problem, but benchmark count and scores are undisclosed. The narrow technical scope lacks HKR-H/R, so it stays in all.
editor take
ProtoAda fixes LoRA routing with format prototypes; no scores are disclosed, so treat “multiple benchmarks” as a claim.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
LaVIDE: Language-Prompted Satellite Change Detection via Map-Image Alignment
LaVIDE aligns map semantics with satellite image content using restricted prompt learning and object-aware embedding enhancement, and reports 18.4% higher IoU for multi-class change detection and 5.2% higher IoU for single-class detection across four benchmarks: DynamicEarthNet, HRSCD, BANDON, and SECOND.
#Vision#Multimodal#Embedding#LaVIDE
why featured
HKR-K passes via concrete mechanisms and benchmark gains, while HKR-H and HKR-R miss. The niche remote-sensing scope and lack of product or practitioner impact keep it below the interesting-news band.
editor take
LaVIDE reports +18.4%/+5.2% IoU on four remote-sensing benchmarks; language as map-image glue beats pixel matching here.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Test-time reward-guided alignment of language models by importance sampling on pre-logit space
The paper proposes AISP, a test-time alignment method that adds Gaussian perturbations to pre-logits from the penultimate layer, estimates the optimal mean with importance sampling over sampled rewards, and reports higher rewards than best-of-n under the same sample count.
#Alignment#Inference-opt#Research release
why featured
HKR-K passes: AISP adds a concrete test-time alignment mechanism and a same-sample reward comparison. HKR-H/R are weak because the item is a specialized arXiv method with no disclosed model scale, datasets, or artifact.
editor take
AISP perturbs penultimate-layer pre-logits and importance-samples the mean; it beats best-of-n, but model, tasks, and latency are undisclosed.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
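AISP's core step, as summarized, estimates a better mean perturbation from rewards of Gaussian samples. The sketch below is hedged on several points:

- It shows one self-normalized importance-sampling update.
- The proposal is N(0, σ²I) over a low-dimensional stand-in for the pre-logit vector.
- A toy quadratic reward replaces a reward model scoring decoded text.
- The temperature `beta`, the sample count, and any iteration schedule are assumptions.

Unlike best-of-n, which keeps only the top sample, every sample contributes in proportion to exp(reward / beta).

```python
import math
import random

def is_mean_update(reward, dim, n=2000, sigma=1.0, beta=1.0, seed=0):
    """Self-normalized importance-sampling estimate of the reward-tilted mean perturbation.

    Proposal: delta ~ N(0, sigma^2 I). Target: proposal density times exp(reward(delta) / beta),
    so each sample's importance weight is proportional to exp(reward / beta).
    """
    rng = random.Random(seed)
    samples = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n)]
    logw = [reward(d) / beta for d in samples]
    m = max(logw)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in logw]
    total = sum(w)
    return [sum(wi * d[j] for wi, d in zip(w, samples)) / total for j in range(dim)]
```

For the reward -‖δ - (1, 0)‖² with σ = β = 1, the tilted distribution has mean (2/3, 0), so the estimate should land near that point.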
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
When Offline Selectors Cannot Beat the Best Single Model: A Diagnostic Study on edX Dropout Prediction
The study evaluates selectors across five edX clickstream dropout predictors and 16 windows; the oracle beats the best single base model by 9.7 accuracy points on average, while BC, DQN, and CQL remain below the oracle under a tenfold buffer sweep and 2,000 held-out examples, pointing to state ambiguity rather than offline learner tuning.
#Benchmarking#Reasoning#edX#Research release
why featured
HKR-H and HKR-K pass: the negative result is concrete, with an oracle gap and state-ambiguity mechanism. edX dropout prediction is far from AI products, agents, or model-lab news, so it stays below featured.
editor take
Five edX dropout models leave a 9.7-point oracle gap; BC/DQN/CQL miss it, so stop blaming offline-RL tuning.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H1·K1·R0
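The oracle in this kind of diagnostic picks, for each case, any base model that is correct. Its accuracy is therefore an upper bound for every selector. Below is a sketch of the gap computation with hypothetical predictions. The study averages over 16 windows; this toy example covers one.

```python
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def oracle_gap(model_preds, labels):
    """Return (best single-model accuracy, oracle accuracy, gap in percentage points)."""
    best_single = max(accuracy(p, labels) for p in model_preds)
    # The oracle is right whenever at least one base model is right.
    oracle = sum(any(p[i] == y for p in model_preds) for i, y in enumerate(labels)) / len(labels)
    return best_single, oracle, 100 * (oracle - best_single)
```

A selector (BC, DQN, or CQL) that cannot tell which cases each model gets right will sit between the two numbers. That is the state-ambiguity reading.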
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Uncertainty-Aware (Un)Supervised Few-Shot User Adaptation for On-Device Personalized HAR
The paper presents a gradient-free HAR user adaptation framework that uses only 3 seconds of calibration data per activity, improving supervised macro-F1 by 2.76 to 33.44 points and unsupervised macro-F1 by 0.56 to 32.13 points across four datasets.
#Fine-tuning#Inference-opt#Research release
why featured
HKR-K passes with concrete calibration conditions and F1 gains. HKR-H/R are weak, and HAR user adaptation is a narrow research item with no product, open-source tool, or foundation-model impact.
editor take
3 seconds per class lifts macro-F1 by up to 33.44 points; gradient-free prototypes look more deployable than on-device finetuning.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
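The summary says adaptation is gradient-free and uses about 3 seconds of calibration data per activity. A common gradient-free approach is nearest-class-mean (prototype) classification; it is assumed here rather than confirmed as the paper's method.

- Average the calibration embeddings for each activity.
- Label new windows by the closest prototype.
- Use the distance margin between the two nearest prototypes as a stand-in for the paper's uncertainty signal, which the summary does not describe.

The feature vectors and labels are hypothetical.

```python
import math

def build_prototypes(features, labels):
    """Class prototype = mean feature vector of that class's calibration windows."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def classify(x, prototypes):
    """Nearest prototype by Euclidean distance, plus a margin usable as a confidence signal."""
    dists = sorted((math.dist(x, p), y) for y, p in prototypes.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin
```

Adapting to a new user only means recomputing a few means, so nothing is backpropagated on-device.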
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Crafting Your Evolving Dreams: Concept-Incremental Versatile Customization
The paper proposes CCDM for continual customization in diffusion models, using AD-LoRA aggregation and controllable regional context synthesis to reduce catastrophic forgetting and concept neglect; the abstract says experiments improve over baselines, but the post does not disclose metrics or dataset details.
#Multimodal#Vision#Fine-tuning#Research release
why featured
HKR-K passes with concrete mechanisms and a testable claim; HKR-H and HKR-R are weak, and no experiment numbers are disclosed. This is useful but narrow diffusion-customization research, below featured threshold.
editor take
CCDM uses AD-LoRA plus regional synthesis against forgetting; metrics are undisclosed, so I don't buy “significant improvements” yet.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Research presents PaCX-MAE physiology-augmented chest X-ray masked autoencoder model
PaCX-MAE distills ECG and laboratory embeddings into a chest X-ray encoder while keeping inference image-only, and evaluation across nine benchmarks reports gains over domain-specific MAE, including +2.7 AUROC on MedMod and +6.5 F1 on VinDr.
#Multimodal#Vision#Embedding#PaCX-MAE
why featured
HKR-K passes with a concrete distillation setup and MedMod AUROC +2.7 / VinDr F1 +6.5 gains. HKR-H/R are weak because this is a niche medical-imaging paper, not a broad AI product or agent story.
editor take
PaCX-MAE beats MAE on 9 benchmarks; training with ECG and labs while inferring CXR-only is a practical trick.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Stationarity-Aware Retrieval-Augmented Time Series Forecasting
SARAF adapts retrieval for time-series forecasting with dataset-level stationarity, testing on eight real-world datasets and using diversity-aware selection plus stationarity-aware aggregation to reduce redundancy from similarity-only historical segments.
#RAG#SARAF#Research release#Open source
why featured
HKR-K passes for a concrete retrieval mechanism and 8 real datasets. HKR-H and HKR-R miss: this is a niche forecasting-method paper, with no product impact or practitioner-wide nerve.
editor take
SARAF tests stationarity-aware retrieval on 8 datasets; similarity-only history is the weak link in time-series RAG.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
MimeLens: Position-Agnostic Content-Type Detection for Binary Fragments
MimeLens pretrains small BERT-style encoders on binary windows sampled from random file offsets and classifies chunks into 125 MIME labels; it beats Magika v1.1 by 10.7 percentage points top-1 on clean complete-file heads, but runs one to two orders of magnitude slower per CPU sample.
#Benchmarking#Google#Hugging Face#MimeLens
why featured
HKR-K passes with a concrete mechanism and benchmark numbers. HKR-H/R are weak because binary-fragment MIME detection is niche and far from AI product or model competition themes.
editor take
MimeLens beats Magika by 10.7pp on 125 MIME labels; 10–100× CPU latency makes it for forensics, not hot paths.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
ChessMimic: Per-Rating Transformer Models for Human Move, Clock, and Outcome Prediction in Online Blitz Chess
ChessMimic trains three small encoder-only Transformers per 100-Elo band for move, clock, and outcome prediction, and on a held-out month of Lichess Rated Blitz games its move predictor beats Maia-2 in every band while the 9M-parameter model lands between Maia-3-5M and Maia-3-23M accuracy.
#Benchmarking#ChessMimic#Maia#Lichess
why featured
HKR-K passes with concrete segmentation, test conditions, and Maia comparisons. HKR-H and HKR-R are weak because this is a niche chess-modeling benchmark with little product or agent spillover.
editor take
ChessMimic trains a 9M model per 100 Elo band; beating Maia-2 is nice, but calibration is bought with duplication.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
When Do Fewer Coordinates Suffice in DP-SGD?
The paper proposes TP-TopK, a two-phase private warm-up method that selects k coordinates for DP-SGD so the relevant noise term scales with active dimension k instead of full parameter dimension d, with experiments on MNIST, FMNIST, and CIFAR-10.
#Fine-tuning#Safety#Benchmarking#Research release
why featured
HKR-K passes: the paper gives TP-TopK and tests on MNIST, FMNIST, and CIFAR-10. HKR-H/R are weak because DP-SGD coordinate selection is niche and has no product or mainstream training-pipeline impact.
editor take
TP-TopK cuts DP-SGD noise from d to k; I buy the direction, but CIFAR-10 doesn't justify LLM-finetuning hype.
HKR breakdown
hook knowledge resonance
open source
54
SCORE
H0·K1·R0
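The summary's key claim is that restricting DP-SGD to k coordinates makes the noise term scale with k rather than d. Below is a minimal sketch of one noisy step, assuming the coordinate subset was already chosen.

- TP-TopK chooses the subset in a private warm-up phase, which is not reproduced here.
- A data-dependent choice made without privacy would break the guarantee.
- Per-example gradients are clipped on the k coordinates only.
- Gaussian noise of scale noise_mult × clip_norm is added only on those coordinates.

```python
import math
import random

def clip(vec, c):
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]

def dp_step_topk(per_example_grads, coords, clip_norm, noise_mult, rng):
    """Noisy mean gradient restricted to a fixed coordinate subset chosen beforehand.

    Clipping and Gaussian noise touch only len(coords) = k entries, so the expected
    noise norm grows with sqrt(k) instead of sqrt(d); other coordinates stay zero.
    """
    d = len(per_example_grads[0])
    summed = [0.0] * len(coords)
    for g in per_example_grads:
        g_k = clip([g[i] for i in coords], clip_norm)  # sensitivity bounded by clip_norm
        for j, v in enumerate(g_k):
            summed[j] += v
    n = len(per_example_grads)
    update = [0.0] * d
    for j, i in enumerate(coords):
        update[i] = (summed[j] + rng.gauss(0.0, noise_mult * clip_norm)) / n
    return update
```

Whether k coordinates suffice then depends on how much gradient signal lies outside them. That is the question the paper tests on MNIST, FMNIST, and CIFAR-10.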
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
An Open-Source Two-Stage Computer Vision Pipeline for Fine-Grained Vehicle Classification Using Vision Transformers
The paper releases an open-source two-stage vehicle classification pipeline using RT-DETR for localization and ViT-Base/16 for six body-type classes, with predictions abstained as unknown below 0.60 softmax confidence; it reports 0.94 accuracy on 3,805 Ann Arbor overtaking events and 0.89 accuracy on 311 out-of-distribution cycling events.
#Vision#Fine-tuning#Benchmarking#arXiv
why featured
HKR-K passes via reproducible pipeline details and accuracy numbers. HKR-H/R are weak: fine-grained vehicle classification is narrow, with no product deployment or competitive industry hook; no hard exclusion applies.
editor take
RT-DETR+ViT-Base/16 hits 0.94 on 3,805 events; the 0.60 abstention gate is the deployable safety detail.
HKR breakdown
hook knowledge resonance
open source
54
SCORE
H0·K1·R0
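The 0.60 abstention gate is simple to state precisely: take the softmax over the six body-type logits and return "unknown" whenever the top probability falls below 0.60. The class names below are hypothetical placeholders, since the summary does not list them. The RT-DETR localization stage is out of scope.

```python
import math

CLASSES = ["sedan", "suv", "pickup", "van", "bus", "truck"]  # hypothetical label names

def softmax(logits):
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify_or_abstain(logits, threshold=0.60):
    """Return the top class, or "unknown" when its softmax probability is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best] if probs[best] >= threshold else "unknown"
```

Abstained events then drop out of the accuracy count rather than adding confident errors. This is why the gate matters for the out-of-distribution cycling events.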
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Learning Empirically Admissible Neural Heuristics for Combinatorial Search
The paper introduces validation-calibrated admissible neural heuristics using an Admissible Bellman Operator, asymmetric loss, and a validation safety offset; under its evaluation protocol, it reports no observed admissibility violations and reduces search node expansions by up to 83.0% on a 2x2 Rubik's Cube.
#Reasoning#Benchmarking#arXiv#DeepCubeA
why featured
HKR-K passes with a concrete mechanism and 83.0% node-reduction claim. HKR-H/R are weak; combinatorial-search research is niche, so this stays in the lower-value all tier.
editor take
It cuts 2x2 Cube expansions by 83.0%; validation-calibrated “no violations” is useful, but still not admissibility proof.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
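Besides the Admissible Bellman Operator, the summary names two ingredients: an asymmetric loss and a validation safety offset. Below is a hedged sketch of both; the paper's exact forms are not in the summary.

- The loss is assumed to be squared error with a heavier weight on overestimates.
- The offset is assumed to equal the largest overestimate seen on validation data.

Subtracting the offset removes every violation on the validation set by construction. That is exactly why this is empirical admissibility rather than a proof.

```python
def asymmetric_loss(pred, target, over_weight=10.0):
    """Squared error that penalizes overestimates (pred > target) more than underestimates."""
    err = pred - target
    return (over_weight if err > 0 else 1.0) * err * err

def safety_offset(preds, true_costs):
    """Largest overestimate seen on validation data (0 if the model never overestimates)."""
    return max(0.0, max(p - c for p, c in zip(preds, true_costs)))

def calibrated_heuristic(pred, offset):
    """Shift the raw prediction down by the offset, never below zero."""
    return max(0.0, pred - offset)
```

States outside the validation distribution can still be overestimated, so search guarantees hold only as far as the validation set is representative.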
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Analysis-Driven Procedural Generation of an Engine Sound Dataset with Embedded Control Annotations
The paper presents an engine-sound generation framework that expands 5–10 minutes of source audio per engine by 15–30x, producing the 19.0-hour Procedural Engine Sounds Dataset with 5,935 files and sample-accurate RPM and torque annotations.
#Audio#Fine-tuning#arXiv#Research release
why featured
HKR-H and HKR-K pass: the engine-sound angle is unusual and the dataset numbers are concrete. HKR-R fails because this is narrow audio-data research, not a broad product, model, or market move.
editor take
5–10 minutes per engine becomes 19 hours; sample-accurate RPM/torque labels make this useful, not another generic audio demo.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H1·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
SpurAudio: A Benchmark for Studying Shortcut Learning in Few-Shot Audio Classification
SpurAudio evaluates few-shot audio classification with controlled foreground-event and background-environment shifts; the post does not disclose dataset size, model names, or exact performance drops.
#Audio#Benchmarking#SpurAudio#Research release
why featured
HKR-K passes for a concrete benchmark mechanism, but the body lacks sample size, tested models, and measured drops. HKR-H and HKR-R are weak, so this stays in the upper 40–59 band.
editor take
SpurAudio controls foreground-background shifts, but no sizes or drops disclosed; few-shot audio leaderboards need a leakage audit.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Neetyabhas Framework Optimizes Public Policy with Reinforcement Learning Under Uncertainty
Neetyabhas models 1,000 individuals making mask, vaccination, and shopping decisions, while hierarchical reinforcement learning with DQN, DDPG, and TD3 optimizes lockdowns and mandates under measurement and implementation uncertainty.
#Agent#Reasoning#WHO#Neetyabhas
why featured
HKR-K passes via the 1,000-agent simulation and named RL methods. HKR-H and HKR-R are weak, with no product, code release, or production-replacement claim, so this stays below featured.
editor take
Neetyabhas runs only 1,000 simulated agents; DQN/DDPG/TD3 for lockdown policy is a sandbox, not evidence.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
An Empirical Study of Data Scale, Model Complexity, and Input Modalities in Visual Generalization
The paper compares training data scale, model architectures, and input modalities on CIFAR-10 and CIFAR-100; results show larger training sets consistently improve generalization, while higher model complexity does not deliver stable gains.
#Vision#Benchmarking#Research release
why featured
HKR-K passes for a concrete empirical claim across data scale, model complexity, and modalities. HKR-H and HKR-R miss: CIFAR-10/100 visual generalization is incremental and has little product or practitioner urgency.
editor take
CIFAR-10/100 says data scale wins reliably and complexity doesn’t; don’t read small benchmarks as a law of vision generalization.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
RIDE: An Open Dataset and Benchmark for Train Delay Prediction
RIDE introduces an open Belgian nationwide train-delay prediction dataset and benchmark covering 94.5 million train events, 3.6 million journeys, and 35.7 million weather records from 2023 to 2025.
#Benchmarking#RIDE#Research release#Benchmark
why featured
HKR-K passes on the dataset scale and benchmark facts, while HKR-H and HKR-R are weak. No hard exclusion applies, but the domain-specific rail ML angle keeps it in the lower research-dataset band.
editor take
RIDE covers 94.5M events and 3.6M journeys; GNNs lead, but other learned models stay close enough to temper leaderboard hype.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Towards Pretraining Text Encoders for TabPFN
The paper introduces TabPFN Text Adapter, freezing both the sentence encoder and TabPFN while training only a lightweight adapter that maps text embeddings into a short token sequence in TabPFN’s embedding space, avoiding the PCA compression bottleneck used in standard text-tabular pipelines.
#Embedding#Fine-tuning#TabPFN#LLaVA
why featured
HKR-K passes for a concrete adapter mechanism, but there are no result numbers, artifact details, or product implications. HKR-H/R are weak, so this fits the upper 40–59 low-value band.
editor take
TabPFN Text Adapter trains only a small adapter and freezes both ends; I buy this over end-to-end text-tabular pretraining.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Graph Set Transformer
The paper introduces Graph Set Transformer, which interleaves node-level propagation and cross-graph contextual modeling at each layer with a gating mechanism; evaluation covers one synthetic suite and three real-data benchmarks under matched parameter budgets.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes: the paper gives a concrete Graph Set Transformer mechanism and evaluation on 1 synthetic suite plus 3 real benchmarks. HKR-H/R are weak; this is a narrow methods paper without product or industry stakes.
editor take
GST beats baselines on 1 synthetic suite and 3 real benchmarks; I buy the setup, but no margins are disclosed.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents
The paper proposes simplicial embedding layers that constrain representations to simplicial structures and reports better sample efficiency on FastTD3, FastSAC, and PPO, while the RSS snippet does not disclose the number of environments, baselines, or gain sizes.
#Agent#Embedding#Research release
why featured
HKR-K passes via a new representation mechanism and tests on three actor-critic methods. HKR-H/R are weak, and the post lacks environment count or gain size, so it stays in all.
editor take
Simplicial embeddings plug into FastTD3, FastSAC, and PPO; no env count or gains disclosed, so I suspect small-benchmark wins.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
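The summary does not define the layer. A common formulation of a simplicial embedding, carried over from earlier self-supervised-learning work and assumed here rather than confirmed for this paper, splits a feature vector into groups and applies a temperature softmax to each group, so every group lies on a probability simplex. A minimal plain-Python sketch; `group_size` and `tau` are illustrative parameters:

```python
import math


def simplicial_embedding(z, group_size, tau=1.0):
    """Map a flat feature vector onto a product of simplices.

    Assumed formulation: split z into consecutive groups of `group_size`
    values and apply a softmax with temperature `tau` to each group.
    """
    if group_size <= 0 or len(z) % group_size:
        raise ValueError("len(z) must be a positive multiple of group_size")
    out = []
    for start in range(0, len(z), group_size):
        logits = [v / tau for v in z[start:start + group_size]]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        out.extend(e / total for e in exps)
    return out


# Hypothetical usage: 6 encoder features become 2 simplices of size 3.
emb = simplicial_embedding([1.0, 2.0, 3.0, 0.0, 0.0, 0.0], group_size=3)
print([round(v, 3) for v in emb])
```

In an actor-critic agent such a layer would sit between the encoder and the actor/critic heads; where this paper places it is not stated in the summary.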
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
A Geometric View of Counterfactual Behavior: Interaction of Boundary Proximity and Local Support
arXiv 2606.04209 compares several pretrained encoders and linear classifier heads with a standardized local search probe, finding that changing only the classifier head alters counterfactual outcomes while leaving accuracy largely unchanged.
#Interpretability#Vision#Multimodal#arXiv
why featured
HKR-K passes: the paper offers a testable counterfactual-analysis setup and a concrete finding. HKR-H/R are weak, and the work is niche interpretability research, so it stays in all.
editor take
2606.04209 changes linear heads, keeps accuracy, and shifts counterfactuals; accuracy-only model audits look fragile here.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Geometry-Aware Distillation for Prompt Tuning Biomedical Vision-Language Models
The paper proposes Omni-Geometry Knowledge Distillation for prompt tuning biomedical VLMs, reporting average absolute accuracy gains of 1.7–2.8 percentage points over prior VLM adaptation methods across 11 medical datasets.
#Vision#Multimodal#Fine-tuning#Research release
why featured
HKR-K passes with a named method, 11 datasets, and accuracy gains; HKR-H/R fail because the title is routine and the audience impact is narrow. No hard exclusion, but this stays in the low-value research band.
editor take
OGKD gains 1.7–2.8 points on 11 medical datasets; I buy the angle: medical VLM tuning needs graded wrong classes.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Variance Reduction for Heavy-Tailed Monetization Metrics in Ranking Experiments via Post-Stratification
Neeti Pokharna and coauthors present a variance-reduction framework that combines post-stratification with CUPED for online ranking and retrieval experiments, using pre-experiment covariates to improve sensitivity for heavy-tailed monetization metrics; deployed at ShareChat, the method reached equivalent statistical confidence with about 45% less traffic than analysis on the standard metrics.
#Benchmarking#ShareChat#Neeti Pokharna#ACM SIGIR
why featured
HKR-K passes on the 45% traffic-saving claim and post-stratification+CUPED mechanism. HKR-H is weak and HKR-R is narrow; no hard exclusion, but the niche experimentation angle keeps it in all.
editor take
ShareChat cuts required traffic by ~45% with post-stratification+CUPED; monetization A/B tests shouldn’t brute-force heavy tails with raw means.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
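The summary names the two ingredients but not how the paper combines them. A minimal sketch, assuming one simple combination: CUPED-adjust the metric with a pre-experiment covariate, then average within-stratum treatment-control differences weighted by stratum size. The function names, quartile strata, and synthetic Pareto data are illustrative assumptions, not ShareChat's pipeline:

```python
import random
from collections import defaultdict
from statistics import fmean, variance


def cuped_adjust(y, x):
    """CUPED: subtract the part of y linearly explained by a pre-experiment covariate x."""
    x_bar, y_bar = fmean(x), fmean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)  # least-squares slope of y on x
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


def post_stratified_effect(y, treated, strata):
    """Treatment-minus-control mean per stratum, weighted by each stratum's share of all units.

    Assumes every stratum contains at least one treated and one control unit.
    """
    groups = defaultdict(lambda: ([], []))  # stratum -> (treated values, control values)
    for yi, ti, si in zip(y, treated, strata):
        groups[si][0 if ti else 1].append(yi)
    n = len(y)
    return sum((len(t) + len(c)) / n * (fmean(t) - fmean(c)) for t, c in groups.values())


# Synthetic heavy-tailed example (not data from the paper).
rng = random.Random(0)
n = 2000
pre = [rng.paretovariate(1.5) for _ in range(n)]        # pre-period revenue per user
treated = [i % 2 == 0 for i in range(n)]                 # alternating assignment
post = [1.1 * p + rng.gauss(0, 1) + (0.2 if t else 0.0)  # true lift = 0.2
        for p, t in zip(pre, treated)]

# Strata: quartiles of the pre-period covariate.
rank = {i: r for r, i in enumerate(sorted(range(n), key=pre.__getitem__))}
strata = [4 * rank[i] // n for i in range(n)]

adjusted = cuped_adjust(post, pre)
print("raw variance:", variance(post), "CUPED variance:", variance(adjusted))
print("estimated lift:", post_stratified_effect(adjusted, treated, strata))
```

Because θ is the least-squares slope, the adjusted metric keeps the same mean while its sample variance drops by cov(x, y)²/var(x); post-stratification then reduces the variance caused by chance imbalance in stratum composition between arms.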
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
How Do Machines Learn? Evaluating the AIcon2abs Method
The study evaluated AIcon2abs with 34 Brazilian participants in a six-hour remote course, using WiSARD, a weightless neural network that runs without Internet access and can learn from a single example.
#Benchmarking#AIcon2abs#WiSARD#UFRJ
why featured
HKR-K passes via participant count, course length, and the WiSARD mechanism; HKR-H and HKR-R are weak. This is niche AI-education evaluation, with limited product or industry relevance, so it stays in the 40–59 band.
editor take
AIcon2abs tested 34 people in a 6-hour remote course; offline one-shot WiSARD is neat pedagogy, not evidence of learning gains.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
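WiSARD itself is a classic weightless architecture, which explains the summary's one-example claim: training only writes binary addresses into lookup tables, with no weights or gradients, so it is cheap enough to run locally and can learn from a single example. A minimal sketch of the standard algorithm; it is not the AIcon2abs implementation, and the tuple size, seed, and 3×3 bar patterns are illustrative assumptions:

```python
import random


class WiSARD:
    """Minimal WiSARD classifier for binary inputs.

    Each class gets a discriminator made of RAM nodes; here a RAM node is a
    Python set of the addresses it has seen. Every RAM node reads a fixed,
    pseudo-randomly chosen tuple of input bits.
    """

    def __init__(self, n_bits, tuple_size, seed=0):
        order = list(range(n_bits))
        random.Random(seed).shuffle(order)  # fixed input-to-RAM mapping
        self.tuples = [order[i:i + tuple_size] for i in range(0, n_bits, tuple_size)]
        self.discriminators = {}  # class label -> list of RAM nodes (sets)

    def _addresses(self, bits):
        return [tuple(bits[i] for i in t) for t in self.tuples]

    def train(self, bits, label):
        rams = self.discriminators.setdefault(label, [set() for _ in self.tuples])
        for ram, addr in zip(rams, self._addresses(bits)):
            ram.add(addr)  # "write a 1" at this address: no weights, no gradients

    def scores(self, bits):
        addrs = self._addresses(bits)
        return {label: sum(addr in ram for ram, addr in zip(rams, addrs))
                for label, rams in self.discriminators.items()}

    def classify(self, bits):
        s = self.scores(bits)
        return max(s, key=s.get)


# One training example per class on a 3x3 pixel grid (flattened row by row).
vertical = [0, 1, 0,
            0, 1, 0,
            0, 1, 0]
horizontal = [0, 0, 0,
              1, 1, 1,
              0, 0, 0]
net = WiSARD(n_bits=9, tuple_size=3)
net.train(vertical, "vertical")
net.train(horizontal, "horizontal")
print(net.classify(vertical), net.scores(vertical))
```

A class's score is the number of its RAM nodes that have seen the input's address, so one stored example per class is already enough to separate these two patterns.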
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
LastAct paper on trajectory-guided smart-home activity recognition published
LastAct targets streaming smart-home HAR on four public datasets under mixed-activity sliding windows, using floorplan-aligned trajectory images, a contamination gate, boundary localization, and template caching; the abstract reports competitive or superior pure-window results and substantial Macro-F1 gains on cross/mixed windows, but does not disclose exact scores.
#Vision#Inference-opt#LastAct#arXiv
why featured
HKR-K passes with 4 datasets and testable mechanisms; HKR-H and HKR-R miss. The paper is narrow activity-recognition research, far from general AI products or agent practice, so it stays in the low browseable band.
editor take
LastAct uses 4 smart-home datasets, but exact Macro-F1 is undisclosed; don’t bank the mixed-window robustness claim yet.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
The Right Measure for Physics-Constrained Generation: A Co-Area Correction for Posterior-Consistent PDE Inverse Problems
The paper shows that diffusion and flow-matching methods with hard PDE constraints sample the wrong posterior by omitting the co-area Jacobian factor, raising posterior error up to 20 times the sampling-noise floor, and introduces CoCoS to match the gold-standard posterior within sampling noise.
#Reasoning#Benchmarking#CoCoS#Research release
why featured
Hard-exclusion-1 and hard-exclusion-4 apply: PDE inverse problems and co-area Jacobians are narrow, with no agent or product angle. The 20x error claim and CoCoS mechanism give HKR-K, but audience fit stays low.
editor take
CoCoS adds the co-area factor; without it, the paper reports posterior error up to 20× the sampling-noise floor, so physics-constrained uncertainty needs auditing.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Dynamic Multi-Pair Trading Strategy in Cryptocurrency Markets with Deep Reinforcement Learning
The paper proposes a DRL execution overlay for multi-pair cryptocurrency trading, using a PPO agent with an LSTM layer on 1-hour Binance USD-M Futures data; the out-of-sample policy beat a heuristic baseline, with stationary circular block bootstrap showing risk-adjusted outperformance significant at the 10% level but not the 5% level.
#Agent#Reasoning#Binance#Research release
why featured
HKR-K passes via concrete method, dataset, and significance details. HKR-H/R are weak because crypto DRL trading is a narrow quant-finance paper, not a core AI-industry update.
editor take
PPO+LSTM beat the baseline on Binance 1h futures, but only at 10% significance; quants should not hype this yet.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
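The summary does not specify the test statistic. A minimal sketch of a stationary (Politis–Romano) bootstrap with circular wrap-around, assuming the statistic is the mean per-bar return difference between strategy and baseline; `mean_block=24` (one day of 1-hour bars), `reps`, and the synthetic returns are illustrative choices, not the paper's settings:

```python
import random
from statistics import fmean


def stationary_bootstrap_indices(n, mean_block, rng):
    """One stationary-bootstrap resample of 0..n-1.

    Blocks have geometric lengths with mean `mean_block`; a block that runs
    past the last observation wraps to the first (circular).
    """
    p_new_block = 1.0 / mean_block
    idx = []
    i = rng.randrange(n)
    while len(idx) < n:
        idx.append(i)
        if rng.random() < p_new_block:
            i = rng.randrange(n)  # start a new block at a random position
        else:
            i = (i + 1) % n       # extend the current block, wrapping around
    return idx


def bootstrap_pvalue(diff, mean_block=24, reps=2000, seed=0):
    """One-sided p-value for H0: mean(diff) <= 0, where diff = strategy minus baseline per bar."""
    rng = random.Random(seed)
    observed = fmean(diff)
    exceed = 0
    for _ in range(reps):
        resampled = fmean(diff[j] for j in stationary_bootstrap_indices(len(diff), mean_block, rng))
        # Recentre the bootstrap distribution at zero to mimic the null hypothesis.
        if resampled - observed >= observed:
            exceed += 1
    return exceed / reps


# Hypothetical per-bar return differences (synthetic, not Binance data).
data_rng = random.Random(1)
diff = [0.0002 + data_rng.gauss(0, 0.01) for _ in range(2000)]
print(bootstrap_pvalue(diff, reps=500))
```

Resampling blocks rather than single bars preserves short-range autocorrelation in returns; a p-value between 0.05 and 0.10 is what "significant at the 10% level but not the 5% level" means.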
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Policy Gradient Algorithms for Continuous-Time Robust Markov Decision Processes
The paper proposes policy-gradient algorithms for continuous-time robust Markov decision processes, deriving pathwise and adjoint gradients and giving double-loop optimizers with linear oracle convergence and Õ(1/ε²) sample complexity.
#Agent#Reasoning#Research release
why featured
HKR-K passes, but this is theory-heavy continuous-time robust MDP work with no generalist on-ramp. hard-exclusion-technical-accessibility-fail caps it below 40.
editor take
arXiv v2 gives continuous-time RMDP policy gradients with Õ(1/ε²) sample complexity; Neural ODE tests exist, but code is undisclosed.
HKR breakdown
hook knowledge resonance
open source
47
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
UniFair: A Unified Fair Clustering Approach Based on Separation and Compactness
UniFair jointly optimizes two criteria, separation fairness and social fairness, and extends unified k-means objectives to deep clustering by enforcing the same criteria in an autoencoder latent space.
#Embedding#Fine-tuning#Benchmarking#UniFair
why featured
Only HKR-K passes: the paper offers a unified fairness objective, but the headline is dry and the post gives no results, code, or deployment hook. This sits in the 40–59 low-value band for a niche clustering paper.
editor take
UniFair constrains boundary distance and within-cluster distortion. Dataset count is undisclosed; fair clustering is finally touching decision boundaries.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Adaptive Patching Is Harder Than It Looks for Time-Series Forecasting
The paper models time-series Transformer patching as budgeted bitrate allocation and tests three architectures with fixed backbones, data, and training protocols; on standard long-horizon forecasting benchmarks, validation-selected uniform baselines match dynamic patching in aggregate, with effects concentrated near zero.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes with a controlled comparison and a contrarian result. HKR-H/R are weak because the topic is niche forecasting methodology with limited practitioner resonance, so it stays in the low browseable band.
editor take
The paper tests 3 architectures; dynamic patching fails to beat tuned uniform baselines, so “adaptive” isn’t a free lunch here.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
OA-CutMix: Correcting the Label Bias of CutMix
OA-CutMix replaces CutMix’s area-based label weight with precomputed segmentation-mask weights, and reports the highest accuracy across 4 architectures and 6 datasets against more than 10 static and dynamic mixing methods.
#Vision#Fine-tuning#Benchmarking#Research release
why featured
HKR-K passes because OA-CutMix states a concrete mechanism and evaluation setup. HKR-H/R are weak: CutMix label bias is a niche vision-training issue with limited product or industry resonance.
editor take
OA-CutMix measures CutMix label error at 21.5%; fixing labels without touching images beats fancier mixing tricks.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
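Standard CutMix pastes a box from image B into image A and sets A's label weight to λ = 1 − (box area / image area). The summary says OA-CutMix swaps this for weights from precomputed segmentation masks but gives no formula; the object-aware rule below (the share of each image's object pixels that stay visible, normalized) is a hypothetical stand-in for illustration only, using toy 4×4 masks:

```python
def cutmix_box_mask(h, w, box):
    """Binary h x w mask: 1 inside the pasted box (y0, x0, y1, x1), half-open, 0 elsewhere."""
    y0, x0, y1, x1 = box
    return [[1 if y0 <= r < y1 and x0 <= c < x1 else 0 for c in range(w)] for r in range(h)]


def area_lambda(box_mask):
    """Standard CutMix weight for image A: share of pixels NOT covered by the pasted box."""
    total = sum(len(row) for row in box_mask)
    return 1 - sum(map(sum, box_mask)) / total


def object_aware_lambda(obj_a, obj_b, box_mask):
    """Hypothetical object-aware weight: visible object share of A vs. of B, normalized."""
    vis_a = sum(a * (1 - m) for ra, rm in zip(obj_a, box_mask) for a, m in zip(ra, rm))
    vis_b = sum(b * m for rb, rm in zip(obj_b, box_mask) for b, m in zip(rb, rm))
    frac_a = vis_a / max(sum(map(sum, obj_a)), 1)
    frac_b = vis_b / max(sum(map(sum, obj_b)), 1)
    if frac_a + frac_b == 0:
        return area_lambda(box_mask)  # assumed fallback when no object pixel is visible
    return frac_a / (frac_a + frac_b)


def mix_labels(y_a, y_b, lam):
    """Soft label lam * y_a + (1 - lam) * y_b."""
    return [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]


# Toy case: both objects sit in the top-left 2x2 corner, and the pasted box covers exactly that corner.
box = cutmix_box_mask(4, 4, (0, 0, 2, 2))
obj = [[1 if r < 2 and c < 2 else 0 for c in range(4)] for r in range(4)]
print("area-based:", mix_labels([1.0, 0.0], [0.0, 1.0], area_lambda(box)))
print("object-aware:", mix_labels([1.0, 0.0], [0.0, 1.0], object_aware_lambda(obj, obj, box)))
```

The toy case shows the bias the paper targets: the area rule still assigns 75% of the label to image A even though A's object is fully hidden, while the mask-based rule assigns it all to B.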
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Ternary Decision Trees with Locally-Adaptive Uncertainty Zones
The paper introduces ternary decision trees that add a half-width δ uncertainty zone to each split node, and reports significant decided-accuracy gains over standard CART across 71 OpenML-CC18 datasets using 5-fold cross-validation.
#Reasoning#Benchmarking#OpenML#CART
why featured
HKR-K passes via a concrete mechanism and 71 OpenML-CC18 experiments. HKR-H/R fail: this is an academic algorithm tweak far from LLMs, agents, or product updates, so it stays in the low browseable band.
editor take
Ternary trees beat CART on 71 OpenML sets at p<0.001; I buy the trick, but the accuracy gain comes from flagging uncertain cases instead of deciding them.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
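A minimal sketch of prediction with a per-node uncertainty zone, assuming a sample whose split feature falls within threshold ± δ is flagged (abstained) rather than routed, and that "decided accuracy" means accuracy over non-flagged samples. The tree, δ value, and data are hypothetical, and how the paper learns δ per node is not shown:

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class Leaf:
    label: int


@dataclass
class Node:
    feature: int
    threshold: float
    delta: float      # half-width of this node's uncertainty zone
    left: "Tree"      # taken when x[feature] < threshold - delta
    right: "Tree"     # taken when x[feature] > threshold + delta


Tree = Union[Leaf, Node]


def predict(tree: Tree, x: Sequence[float]) -> Optional[int]:
    """Return a class label, or None if x lands in some node's uncertainty zone."""
    while isinstance(tree, Node):
        v = x[tree.feature]
        if abs(v - tree.threshold) <= tree.delta:
            return None  # too close to the threshold: flag instead of deciding
        tree = tree.left if v < tree.threshold else tree.right
    return tree.label


def decided_accuracy(tree, X, y):
    """(accuracy on decided samples, coverage = fraction of samples decided)."""
    pairs = ((predict(tree, xi), yi) for xi, yi in zip(X, y))
    decided = [(p, t) for p, t in pairs if p is not None]
    coverage = len(decided) / len(X)
    accuracy = sum(p == t for p, t in decided) / len(decided) if decided else float("nan")
    return accuracy, coverage


# Hypothetical one-split tree: threshold 0.5 with uncertainty zone [0.4, 0.6].
tree = Node(feature=0, threshold=0.5, delta=0.1, left=Leaf(0), right=Leaf(1))
X = [[0.1], [0.45], [0.9], [0.55]]
y = [0, 1, 1, 0]
print(decided_accuracy(tree, X, y))
```

In this toy run both decided samples are correct but half the inputs are never decided, which is why decided accuracy should be read alongside coverage.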
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Generating Financial Time Series by Matching Random Convolutional Features
The paper introduces SOCK, a fully differentiable random convolutional feature map, and trains financial time-series generators by matching SOCK features; across multiple small-sample financial datasets, the authors report consistent gains over signature and diffusion baselines, with extra tests on two-sample hypothesis testing and classification.
#Fine-tuning#Benchmarking#SOCK#Rocket
why featured
HKR-K passes via the SOCK method and baseline comparison. HKR-H/R fail: the topic is niche financial time-series generation with no product, agent, or industry-impact hook, so it stays in the low-value research band.
editor take
SOCK trains generators on differentiable random convolutional features; for one-path finance data, that beats letting GAN discriminators memorize.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
AI from Concrete to Abstract: Demystifying Artificial Intelligence to the General Public
The paper presents AIcon2abs, a methodology combining visual programming with WiSARD weightless neural networks, and places training and classification as blocks inside the main program rather than external AI modules.
#WiSARD#Research release
why featured
HKR-K passes via the AIcon2abs teaching mechanism, but HKR-H/R are weak: this is not a product, model, or industry shift, and has limited practitioner pull.
editor take
AIcon2abs puts training and classification inside program blocks; I’d trust this visual route over another chatty AI-literacy course.
HKR breakdown
hook knowledge resonance
open source
42
SCORE
H0·K1·R0
04:00
5d ago
arXiv · cs.LG· atomEN04:00 · 06·04
Symbolic Regression for Shared Expressions: Introducing Partial Parameter Sharing
The paper proposes a symbolic regression method for shared expressions with multiple categorical variables and partially shared parameters; it tests the setup on a synthetic fitting-only case and one astrophysics dataset used in a prior single-category study.
#Reasoning#Interpretability#Research release
why featured
HKR-K passes for the partial-parameter-sharing mechanism and test setup. HKR-H/R fail: this is a niche symbolic-regression paper with no agent, product, or mainstream model implication.
editor take
The paper tests 1 synthetic case and 1 astrophysics dataset; partial parameter sharing is useful, but still method-demo evidence.
HKR breakdown
hook knowledge resonance
open source
41
SCORE
H0·K1·R0
01:09
5d ago
HuggingFace Papers (takara mirror)· rssEN01:09 · 06·04
Representation Learning Enables Scalable Multitask Deep Reinforcement Learning
The paper presents MR.Q, a model-free actor-critic method that combines predictive representations with high-capacity value functions and runs without planning; it outperforms a recent world-model method and several deep RL baselines on multitask continuous-control tasks, while the post does not disclose the number of tasks or the exact compute reduction.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes because MR.Q adds a concrete mechanism and a world-model comparison. HKR-H/R fail; task count, cost reduction, and reproducible conditions are not disclosed, so it stays low-value research signal.
editor take
MR.Q beats a world-model baseline without planning; RSS omits task count and compute delta, so I’d treat this as ablation signal first.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
00:59
5d ago
HuggingFace Papers (takara mirror)· rssEN00:59 · 06·04
Multilingual Detection of Alzheimer's Disease from Speech: A Cross-Linguistic Transfer Learning Approach
The study trained transformer-based speech models on English, Chinese, Arabic, and Hindi datasets for binary Alzheimer’s Disease classification. The cross-language approach reached 82% F1 across all four languages with a reported inference time of 0.5 seconds, while the snippet does not disclose dataset sizes, model names, or validation splits.
#Audio#Benchmarking#Research release#Benchmark
why featured
HKR-H and HKR-K pass: the speech-based Alzheimer’s angle is clickable, and the post gives languages, F1, and latency. With no product launch, open artifact, or major lab, HKR-R is weak and the item stays in all.
editor take
Four-language speech AD classification hits 82% F1. No dataset sizes or splits disclosed, so “global deployment” is premature.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
