ax@ax-radar:~/papers $ grep -E 'arxiv|paper' sources/tags
45 src · signal 72% · cycle 04:32

papers · 2026-05-28

251 papers · updated 3m ago
2026-05-28 · Thu
22:48
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 22:48 · 05·28
CSULoRA: Closest Safe Update Low-Rank Adaptation
CSULoRA corrects trained LoRA adapters post hoc by estimating a safety-aligned subspace from the weight displacement between an aligned model and its base checkpoint, then solving a closed-form penalized minimum-change problem that reduces adversarial fine-tuning attack success rate while preserving most utility gains.
#Fine-tuning#Safety#Alignment#Research release
why featured
HKR-K and HKR-R pass: the piece names a concrete LoRA safety-correction mechanism tied to adversarial fine-tuning risk. No reduction numbers, code, or test setup are disclosed, so it stays at the top of the “all” band rather than featured.
editor take
CSULoRA estimates safety subspaces from aligned-base weight deltas; no ASR numbers disclosed, so I’d treat it as a LoRA safety patch.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
17:59
11d ago
arXiv · cs.AI · atom · EN · 17:59 · 05·28
Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection
The researchers built VisAnomBench and fine-tuned VisAnomReasoner for time-series anomaly detection, improving precision and F1 on VisAnomBench by at least 21.23 and 23.87 percentage points over all baselines.
#Vision#Reasoning#Fine-tuning#VisAnomBench
why featured
HKR-H and HKR-K pass: cross-modal anomaly detection is novel, and the paper gives VisAnomBench plus concrete gains. The topic is narrow, with no major lab or production-replacement evidence, so it stays in 60–71.
editor take
VisAnomReasoner gains 23.87 F1 points on VisAnomBench; I trust the 13.39-point TSB-AD-U gain more than synthetic rationales.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
17:59
11d ago
arXiv · cs.CL · atom · EN · 17:59 · 05·28
Working Memory of Large Language Models for Latent Reasoning
The paper introduces Reasoning in Memory, a latent reasoning method that replaces autoregressive thought generation with fixed special-token memory blocks processed in one forward pass; it uses a two-stage curriculum, but the RSS snippet does not disclose specific model names, benchmark scores, or compute-cost numbers.
#Reasoning#Memory#Inference-opt#Research release
why featured
HKR-H/K pass: the latent-memory mechanism is novel for reasoning readers. Missing model names, benchmark scores, and overhead keeps it in the 60–71 research-release band.
editor take
RiM swaps chain-of-thought for fixed memory blocks, but gives no models or scores; saving tokens is not saving compute.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
17:58
11d ago
arXiv · cs.CL · atom · EN · 17:58 · 05·28
COMPOSE: Composing Future Theorems from Citations and Formal Structure
COMPOSE generates future theorem-like claims from both scientific citation graphs and formal theorem dependency graphs, using 108K paired arXiv-Mathlib graph examples and a benchmark of 47K future papers from 2024–2025.
#Reasoning#Benchmarking#arXiv#Mathlib
why featured
HKR-H/K pass: the future-theorem hook and 108K paired graph samples plus a 47K-paper benchmark add real signal. HKR-R is weak because the article lacks product impact, adoption data, or a workflow consequence.
editor take
COMPOSE bets on future theorems with 108K paired graphs; solid setup, but LLM-judged math novelty gets a discount.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
17:53
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 17:53 · 05·28
Archon: A Unified Multimodal Model for Holistic Digital Human Generation
Archon uses a unified autoregressive multimodal model for digital human generation across 7 modalities and 72 tasks, with semantic video reparameterization reducing high-fidelity talking-video tokens by 4x while preserving fine-grained dynamics.
#Multimodal#Vision#Audio#Archon
why featured
HKR-H and HKR-K pass: the unified autoregressive model, 7 modalities, 72 tasks, and 4x token reduction add concrete signal. HKR-R is weak without a major lab, open release, or deployment claim, so this stays in all.
editor take
Archon spans 7 modalities and 72 tasks; 4x video-token compression is solid, but unified avatar claims need open weights.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
17:42
11d ago
arXiv · cs.CL · atom · EN · 17:42 · 05·28
MedCase-Structured: A Text-to-FHIR Dataset for Benchmarking Diagnostic Reasoning in Clinically Realistic EHR Settings
The authors introduce MedCase-Structured, a synthetic Text-to-FHIR benchmark built from MedCaseReasoning with staged LLM generation plus terminology-grounded validation and repair; the pipeline produces valid HL7 FHIR R4 bundles for 82.5% of cases, and LLMs show lower diagnostic accuracy on structured FHIR inputs than on plain text.
#Reasoning#Benchmarking#MedCaseReasoning#MedCase-Structured
why featured
HKR-H and HKR-K pass: the paper adds a dataset, FHIR R4 pipeline, 82.5% validity, and a counterintuitive text-vs-structured result. HKR-R is weak because EHR benchmarking is vertical and unlikely to drive broad AI-practitioner discussion.
editor take
MedCase-Structured gets valid FHIR for 82.5% of cases; plain-text clinical LLM scores deserve a deployment-format discount.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
17:00
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 17:00 · 05·28
Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning
GASP injects geometric priors into LLM transformer layers using a correspondence head, contrastive point-correspondence loss, and depth consistency supervision; the paper reports peak internal correspondence accuracy rising from often below 5% to over 70%, over 85% temporal robustness, and downstream gains of +18.2% on All-Angles Bench and +29.0% on VSI-Bench without 3D VQA training data.
#Vision#Multimodal#Reasoning#Research release
why featured
HKR-H/K pass: the paper offers a concrete mechanism and a large reported metric jump. HKR-R is weak because this remains a VLM research item without product adoption or competitive impact disclosed.
editor take
GASP lifts correspondence from under 5% to 70%+ without 3D VQA data; I buy geometry supervision over benchmark drilling.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
17:00
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 17:00 · 05·28
IP-Adapter Is All You Need: Towards Fine-Tuning-Free Diffusion-Based Talking Face Generation
The paper uses pretrained Stable Diffusion and IP-Adapter weights for talking face generation without task-specific fine-tuning; experiments report at least a 0.16 PCLD gain in lip-sync accuracy and at least a 0.7 FID improvement in visual fidelity.
#Multimodal#Vision#Fine-tuning#Stable Diffusion
why featured
HKR-K passes with a concrete no-tuning mechanism and reported metric gains. HKR-H and HKR-R are weak because this is a niche vision-generation paper, so it fits the 60–71 research-signal band.
editor take
The paper reports gains of at least 0.16 PCLD and 0.7 FID; fine-tuning-free is useful, but inference cost is undisclosed.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
16:05
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 16:05 · 05·28
AnomalyAgent: Training-Free Agentic Models for Zero-/Few-Shot Anomaly Detection
AnomalyAgent proposes a training-free anomaly detection framework that uses an anomaly-centric toolset and a memory module for zero- and few-shot reasoning; the snippet reports stronger results than training-free VLM baselines and generic agents, but does not disclose specific metrics or the code URL.
#Agent#Multimodal#Memory#AnomalyAgent
why featured
HKR-H/K pass: the training-free agentic anomaly-detection angle is fresh, and the toolset-plus-memory mechanism is concrete. Metrics, code, and deployment evidence are not disclosed, keeping it in the mid research-release band.
editor take
AnomalyAgent turns AD into training-free tool use; metrics and code are missing, so I don’t buy “substantially better” yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
16:01
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 16:01 · 05·28
CorPipe at CRAC 2026: Empty Nodes and Cross-Lingual Transfer in Multilingual Coreference Resolution
CorPipe 26 won the CRAC 2026 Shared Task on Multilingual Coreference Resolution, leading the LLM track by 2.8 percentage points and the unconstrained track by 9.5 percentage points, with source code and trained models released on GitHub.
#Reasoning#Benchmarking#Code#CorPipe
why featured
HKR-K passes on concrete leaderboard margins and open-sourced artifacts. HKR-H and HKR-R are weak because multilingual coreference shared-task news is narrow and unlikely to spark broad AI-practitioner discussion.
editor take
CorPipe 26 wins both tracks by 2.8/9.5 points; for multilingual coreference, specialized systems still beat generative LLMs.
HKR breakdown
hook knowledge resonance
open source
47
SCORE
H0·K1·R0
15:37
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:37 · 05·28
Geometry Matters: 3D Foundation Priors for Learning Semantic Correspondence
GenIntel introduces a 3D-aware post-training framework that uses SAM3D for geometry and pose estimation, renders PartField descriptors, filters matches with geodesic distances, and trains a lightweight adapter on DINO and Stable Diffusion features for semantic correspondence.
#Vision#Fine-tuning#GenIntel#SAM3D
why featured
HKR-K passes because the post states a concrete 3D-aware training mechanism. HKR-H and HKR-R are weak: no surprising hook, no benchmark lift or deployment impact, and the audience is mostly CV correspondence researchers.
editor take
GenIntel adds SAM3D and PartField supervision to DINO/SD; no metrics disclosed, but symmetric-part mismatches are the right target.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
15:21
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:21 · 05·28
Native Audio-Visual Alignment for Generation
NAVA uses 6.3B parameters for joint audio-video generation, first building audio-video correspondence in a dedicated interaction space and then conditioning joint denoising with external context.
#Multimodal#Audio#Vision#NAVA
why featured
HKR-K is solid: NAVA discloses 6.3B scale and a joint denoising mechanism; HKR-R fits multimodal generation competition. Sparse sourcing and no benchmark numbers keep it in the 60–71 band.
editor take
NAVA does joint audio-video at 6.3B; decoupling sync from semantic conditioning via Align-then-Fuse is a bet I buy.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
14:39
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:39 · 05·28
Test-Time Training for Supervised Causal Learning
The paper proposes TTT-SCL, which dynamically generates a training set for each test instance; the snippet says it outperforms existing SCL and traditional causal discovery methods on synthetic, pseudo-real, and real-world datasets.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via a test-time per-sample training mechanism and benchmark claim. HKR-H/R are weak: the angle is academic, narrow, and lacks product, open-source, or deployment impact, so it sits in the upper 40–59 band.
editor take
TTT-SCL generates a training set per test instance. No dataset counts or metrics disclosed; causal discovery generalization claims need proof.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
12:00
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 12:00 · 05·28
Harnessing Non-Adversarial Robustness in Large Language Models
The paper proposes debiasing-based fine-tuning to improve LLM robustness against semantically neutral prompt perturbations. It identifies perturbation-induced bias in neural network module outputs as the key mechanism, but the RSS snippet does not disclose the evaluated models, datasets, metrics, or experiment scale.
#Fine-tuning#Safety#Benchmarking#Research release
why featured
HKR-K passes because the paper offers a debiasing fine-tuning mechanism for prompt-perturbation robustness. HKR-H and HKR-R are weak, and model, dataset, and experiment scale are not disclosed.
editor take
The paper claims debiasing fine-tuning improves prompt-perturbation robustness; models, datasets, and scale are undisclosed, so don’t ship on “certified” yet.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
11:19
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 11:19 · 05·28
Energy-Aware NECO for Single-Pass Pixel-wise Out-of-Distribution Detection in Semantic Segmentation
Energy-Aware NECO achieves 0.8539 AUROC on miniMUAD with true pixel-level OOD labels, above NECO-only at 0.8280, Energy-only at 0.8171, and an ensemble predictive-entropy baseline at 0.8124.
#Vision#Robotics#Benchmarking#Energy-Aware NECO
why featured
HKR-K passes via concrete AUROC comparisons. HKR-H/R are weak: pixel-wise OOD segmentation is narrow, and the post does not disclose code, data access, or production impact.
editor take
Energy-Aware NECO hits 0.8539 AUROC on miniMUAD; single-pass OOD beats MC Dropout for edge robot deployment.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
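For readers comparing the AUROC figures in the Energy-Aware NECO card, here is a minimal sketch of how a pixel-level AUROC can be computed from per-pixel OOD scores and binary OOD labels. It uses the rank-based (pairwise) formulation; the function name `auroc` and the toy scores are illustrative assumptions, not taken from the paper.

```python
def auroc(scores, labels):
    """Rank-based AUROC: the probability that a randomly chosen OOD pixel
    (label 1) gets a higher score than a randomly chosen in-distribution
    pixel (label 0). Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): flattened per-pixel OOD scores
# and ground-truth OOD labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(round(auroc(scores, labels), 4))
```

The pairwise loop is quadratic in the number of pixels, so it only suits small examples; a full segmentation map would normally use a sort-based implementation.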
11:13
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 11:13 · 05·28
From XXLTraffic to EvoXXLTraffic: Scaling Traffic Forecasting to Sensor-Evolving Networks
The authors introduce XXLTraffic and EvoXXLTraffic, covering up to 27 years of California PeMS and Transport for NSW data, with yearly active sensors, traffic-flow matrices, and graph snapshots across nine PeMS districts for a streaming forecasting protocol.
#Benchmarking#RAG#California PeMS#Transport for NSW
why featured
HKR-K passes via the 27-year traffic-sensor corpus and evolving graph snapshots. HKR-H/R are weak because this is a narrow traffic-forecasting benchmark, not a general model, agent, or product update.
editor take
EvoXXLTraffic spans 27 years; at +10,000% sensor growth, static-GNN traffic leaderboards look badly miscalibrated.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
10:13
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 10:13 · 05·28
User-Aware Active Knowledge Acquisition for Emotional Support Dialogue
The paper introduces UKA, a gradient-free active dialogue learning framework that uses Theory-of-Mind uncertainty estimation to select responses and elicit user feedback; experiments span multiple dialogue benchmarks and model architectures, but the post does not disclose exact scores.
#Agent#Reasoning#Alignment#Research release
why featured
HKR-K passes: UKA uses Theory-of-Mind uncertainty for active knowledge acquisition. No exact scores are disclosed, the title is academic, and HKR-H/R do not clear the featured threshold.
editor take
UKA selects replies via ToM uncertainty, but no scores are disclosed; “strong baselines” without tables is paper-abstract theater.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
10:04
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 10:04 · 05·28
BitTP: The Lightweight Trajectory Prediction Model with BitLLM for Edge Devices
BitTP converts an LLM-based trajectory predictor into a bitlinear architecture for edge devices, using 1.58-bit weight-only quantization while keeping activations full precision, and reports average ADE reductions of 14.29% and FDE reductions of 20.97% versus a BF16 baseline.
#Robotics#Reasoning#Inference-opt#BitTP
why featured
HKR-H/K pass: ultra-low-bit quantization improving trajectory metrics is concrete and testable. HKR-R is weak because edge trajectory prediction is niche, so this stays in the interesting research band.
editor take
BitTP cuts ADE 14.29% with 1.58-bit weights; full-precision activations make the edge-device claim feel stretched.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
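The "1.58-bit" in the BitTP card refers to ternary weights: three levels carry log2(3) ≈ 1.58 bits. A minimal sketch of weight-only ternary quantization with full-precision activations follows, assuming BitNet-b1.58-style absmean scaling. Whether BitTP uses exactly this scheme is not stated in the snippet, and the names `ternary_quantize` and `bitlinear_dot` are hypothetical.

```python
def ternary_quantize(weights, eps=1e-8):
    """Map float weights to {-1, 0, 1} plus one float scale (absmean scaling)."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to the nearest integer, then clip into the ternary range.
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale


def bitlinear_dot(q, scale, activations):
    """Dot product with ternary weights; activations stay in full precision."""
    return scale * sum(qi * x for qi, x in zip(q, activations))


# Toy weights and activations (hypothetical values).
w = [0.8, -0.05, 0.3, -1.2, 0.02]
q, s = ternary_quantize(w)
print(q, round(s, 3))
print(bitlinear_dot(q, s, [1.0, 2.0, 0.5, -1.0, 3.0]))
```

Because only the weights are ternary, the multiply-accumulate reduces to additions and subtractions of activations, but the activations themselves still occupy full-precision memory and bandwidth.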
09:04
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 09:04 · 05·28
Predicting Causal Effects from Natural Language Queries Using Structured Representations
The authors introduce Query2Effect, a benchmark with more than 72,000 natural-language questions aligned to experiment descriptions, and test a two-step framework that creates a synthetic structured query representation before supervised effect-size prediction; finetuning reduces absolute error by 27% to 71% versus prompted out-of-the-box LLMs.
#Reasoning#Fine-tuning#Benchmarking#Query2Effect
why featured
HKR-K passes with a 72K-question benchmark and a 27% to 71% error reduction claim. HKR-H and HKR-R are weak because this is an academic benchmark, so it fits all rather than featured.
editor take
Query2Effect has 72K questions; finetuning cuts error 27–71%. Bare prompting is the wrong tool for causal effect estimates.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
09:02
11d ago
HuggingFace Papers (takara mirror) · rss · EN · 09:02 · 05·28
Entity-Collision: A Stratified Protocol for Attributing Retrieval Lift in Agent Memory
Entity-Collision fixes the BM25 floor by making every distractor share answer entity tokens, then attributes retrieval lift across 5 tags, 3 embedders, and 5 collision degrees; MiniLM-384 leads both axes, while 2.7x-parameter BGE-large wins on intent queries but loses on lexical ones.
#Agent#RAG#Embedding#BM25
why featured
HKR-K and HKR-R pass: the paper decomposes retrieval lift by collision level, label type, and embedder, with a MiniLM-384 vs BGE-large result. HKR-H fails because the title is narrow and academic.
editor take
MiniLM-384 leads across the 5×3×5 setup; stop using parameter count as a proxy for RAG embedder quality.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
08:16
12d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:16 · 05·28
From General Vision to Reliable Traversability Estimation: Adapting Vision Foundation Models for Unstructured Outdoor Environments
The paper proposes ViTA on SAM2 for traversability estimation in unstructured outdoor environments. It uses learnable traversability prompts, Perspective-Diversified Training, and geometric distillation to infer slope and elevation risk from RGB at inference, while the post does not disclose exact IoU, Precision, or false-positive reduction numbers.
#Vision#Robotics#Benchmarking#Research release
why featured
HKR-K passes because the post gives concrete ViTA mechanisms around SAM2, PDT, and geometric distillation. HKR-H/R are weak, and no IoU result is disclosed, keeping this robotics-vision paper in all.
editor take
ViTA adapts SAM2 for RGB traversability, but exact IoU and false-positive cuts are undisclosed; I trust the distillation idea, not the SOTA claim.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
07:29
12d ago
HuggingFace Papers (takara mirror) · rss · EN · 07:29 · 05·28
ESAM++: Efficient Online 3D Perception on the Edge
ESAM++ replaces ESAM’s 3D sparse UNet with a 3D Sparse Feature Pyramid Network for streaming point clouds, and reports competitive segmentation accuracy on ScanNet, ScanNet200, SceneNN, and 3RScan with up to 3x faster inference and a 2x smaller model for edge devices without GPU acceleration.
#Vision#Robotics#Inference-opt#ESAM++
why featured
HKR-K/R pass: the article gives a concrete architecture swap and benchmark numbers across ScanNet and three other benchmarks. The 3D perception focus is useful but narrow, so it stays in the 60–71 band.
editor take
ESAM++ reports up to 3x speed and 2x smaller model on four benchmarks; no absolute latency, so edge deployment is unproven.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
07:15
12d ago
HuggingFace Papers (takara mirror) · rss · EN · 07:15 · 05·28
AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling
AnyMo uses the OmniHuMo dataset to train a unified multimodal motion generation framework, with over 5,000 hours of motion and 3.2 million sequences aligned to text, speech, music, and trajectory annotations.
#Multimodal#Robotics#AnyMo#OmniHuMo
why featured
HKR-H and HKR-K pass: the title has a unified any-modality motion hook, and the body gives dataset scale plus annotation details. The impact is niche motion-generation research, so it stays in the 60–71 band.
editor take
OmniHuMo ships 5,000 hours and 3.2M sequences; AnyMo’s arbitrary-conditioning pitch is strong, but RSS gives no benchmark numbers.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
07:06
12d ago
HuggingFace Papers (takara mirror) · rss · EN · 07:06 · 05·28
MOOSE-Copilot: A Web-Based Interactive Assistant for Scientific Hypothesis Discovery
MOOSE-Copilot connects exploratory ideation and fine-grained refinement through three expert signals: initial blueprints, inter-stage routing, and regenerative feedback; the RSS snippet says quantitative evaluations beat purely autonomous baselines, but the post does not disclose datasets, metrics, scores, model choices, or release details.
#Agent#Tools#Reasoning#Research release
why featured
HKR-K passes via the 3 expert-signal mechanism and autonomous-baseline comparison. HKR-H and HKR-R are weak, and missing datasets, metrics, and scores keep it in the lower research-release band.
editor take
MOOSE-Copilot has 3 expert signals; datasets, metrics, and scores are undisclosed, so I don’t buy the baseline win yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics
MathlibLemma uses an LLM-based pipeline to mine missing folklore lemmas for Mathlib, producing 1,506 Lean-checked proofs that pass a proof-bypass screen and building a benchmark of 4,028 non-trivial type-checked Lean statements across mathematical domains.
#Reasoning#Code#Benchmarking#MathlibLemma
why featured
HKR-H and HKR-K pass: the angle is novel and the paper gives concrete counts plus screening. HKR-R is weak because Lean formal math is niche for most AI practitioners, so it stays in the 60–71 band.
editor take
MathlibLemma ships 1,506 Lean-checked proofs; I care how many survive Mathlib maintainer review after the small merge.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Continual Model Routing in Evolving Model Hubs
The paper defines Continual Model Routing, introduces CMRBench to simulate model hub expansion with over 2,000 candidate models, and proposes CARvE, a contrastive embedding method using checkpoint-based anchoring and structured replay for continual routing.
#Agent#Embedding#Benchmarking#arXiv
why featured
HKR-K and HKR-R pass: the paper formalizes routing under growing model hubs and adds a 2,000+ model benchmark plus CARvE. Without a major lab, open-source adoption, or production replacement claim, it stays in the 60–71 research-signal band.
editor take
CMRBench covers 2,000+ models; CARvE beats retrieval and fine-tuning baselines, but the abstract omits margins, so hold the SOTA talk.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Reasoning on the Manifold: Bidirectional Consistency for Self-Verification in Diffusion Language Models
The paper proposes Bidirectional Manifold Consistency, a training-free unsupervised metric for diffusion language models that checks reasoning-trace stability through a forward-masking and backward-reconstruction cycle; the authors evaluate it across three stages: diagnosis without ground-truth answers, inference via rejection resampling, and alignment with dense geometric rewards.
#Reasoning#Alignment#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: BMC gives a training-free verification mechanism across diagnosis, inference, and alignment. HKR-H is weak; the post discloses no result numbers, model list, or artifact, so it stays in the 60–71 band.
editor take
BMC checks dLLM traces with one mask-reconstruct loop; diagnosis sounds useful, alignment reward claims need benchmarks first.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Decoupling Reasoning and Confidence: Resurrecting Calibration in RLVR
The paper proposes DCPO to decouple reasoning and calibration objectives in RLVR; its theoretical analysis reports a gradient conflict between maximizing policy accuracy and minimizing calibration error, and the abstract says experiments match GRPO accuracy while improving calibration, without disclosing benchmark names in the snippet.
#Reasoning#Alignment#Benchmarking#arXiv
why featured
HKR-H/K/R pass, but the article only gives abstract-level facts and no results, model scale, or reproducible gain. Treat as a regular research release in the 60–71 band.
editor take
DCPO claims GRPO-level accuracy with better calibration, but benchmarks aren’t disclosed; single-objective RLVR looks shakier after this.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
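The DCPO card talks about "calibration error" without naming the metric. One common choice is expected calibration error (ECE), sketched below under that assumption; the paper may use a different measure. The function name, bin count, and toy values are illustrative.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by stated confidence, then take the
    sample-weighted average of |mean confidence - accuracy| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Toy example (hypothetical): a model that says 0.9 but is right 3 times in 4.
confs = [0.9, 0.9, 0.9, 0.9]
hits = [1, 1, 1, 0]
print(round(expected_calibration_error(confs, hits), 4))
```

Two policies with identical accuracy can differ sharply on this number, which is the kind of gap a claim of "matching GRPO accuracy while improving calibration" would need to show.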
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems
The paper introduces AgensFlow, an open-source framework that treats multi-agent coordination as online policy learning under partial observability, and evaluates it on two corpora: distributed-systems incident tasks and security-advisory tasks.
#Agent#Reasoning#Tools#AgensFlow
why featured
HKR-K/R pass because the paper offers an open-source coordination framework and two evaluation settings. HKR-H fails; metrics, repo maturity, and deployment evidence are not disclosed, so it stays below featured.
editor take
AgensFlow reports two corpora but no absolute scores in the snippet; auditable online routing beats yet another agent pile.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Survey of Determinism in Financial AI Systems from Accuracy to Auditability
The arXiv survey analyzes reproducibility failures in three financial AI modalities: tabular models, graph networks, and LLM-based agentic workflows, and validates audit metrics including RBO, D_cos, TDI, and PSD on public financial datasets for credit scoring, fraud detection, and entity extraction.
#Agent#Reasoning#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper offers concrete failure classes and audit metrics, and finance compliance is a real practitioner nerve. As an arXiv survey without a product release or broad discussion, it stays in the 60–71 band.
editor take
This survey splits financial AI reproducibility into 3 failure modes; I buy the angle: audit metrics beat accuracy theater for deployment.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
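Of the audit metrics named in the survey card, RBO is the easiest to illustrate, assuming it stands for rank-biased overlap as in common usage. The sketch below computes the truncated form, without the extrapolation term, for two ranked lists, for example the top features or flagged entities from two runs of the same pipeline. How the survey applies the metric is not stated in the snippet, and the function name and inputs are hypothetical.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap of two rankings of unique items.

    At each depth d, the agreement is |top-d(A) & top-d(B)| / d, weighted
    by p**(d-1). Without extrapolation, identical lists of length k score
    1 - p**k rather than 1.
    """
    k = min(len(ranking_a), len(ranking_b))
    total = 0.0
    for d in range(1, k + 1):
        overlap = len(set(ranking_a[:d]) & set(ranking_b[:d]))
        total += (p ** (d - 1)) * overlap / d
    return (1 - p) * total


# Toy example (hypothetical): the same ranking from two runs,
# with the top two items swapped in the second run.
run_1 = ["income", "age", "tenure"]
run_2 = ["age", "income", "tenure"]
print(round(rbo_truncated(run_1, run_2), 4))
```

A lower `p` puts more weight on the top of the list, so a swap at rank 1 is penalized more than a swap at rank 10.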
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
LLMs are not consistently Bayesian: Quantifying internal inconsistencies in probabilistic beliefs
The paper introduces the information processing gap to measure internal inconsistencies in how LLMs update probabilistic beliefs from evidence, and its experiments across multiple evidence-incorporation methods find that some updates are nearly Bayesian while others follow a learned heuristic.
#Reasoning#Benchmarking#Interpretability#Research release
why featured
HKR-H/K/R pass, but this is a single arXiv evaluation/interpretability paper. The provided text gives the metric and finding, not model lists, scale numbers, or adoption signal, so it stays in the 60–71 band.
editor take
Information processing gap tests LLM belief updates; don't fetishize Bayes here, since model list and task count aren't disclosed.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
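As a reference point for what "nearly Bayesian" means in the card above, this is the exact update for a binary hypothesis. It does not reproduce the paper's information processing gap, whose definition the snippet does not give; the function name and the probabilities are illustrative assumptions.

```python
def bayes_posterior(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | E) from the prior P(H) and the two likelihoods
    P(E | H) and P(E | not H), via Bayes' rule."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    if evidence == 0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return prior * likelihood_if_true / evidence


# Toy example (hypothetical numbers): a rare hypothesis and fairly strong evidence.
posterior = bayes_posterior(prior=0.01, likelihood_if_true=0.9, likelihood_if_false=0.05)
print(round(posterior, 4))
```

A model whose stated belief after seeing the evidence lands far from this value, given its own stated prior and likelihoods, is internally inconsistent in the sense the card describes.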
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Persuade Me if You Can: A Framework for Evaluating LLM Persuasion and Susceptibility
PMIYC evaluates LLM persuasiveness and susceptibility through automated multi-agent, multi-turn conversations; Llama-3.3-70B and GPT-4o show similar persuasion effectiveness, outperforming Claude 3 Haiku by 30%, while GPT-4o shows over 50% higher misinformation resistance than Llama-3.3-70B.
#Agent#Alignment#Safety#Llama
why featured
HKR-H/K/R pass, but this is still a single arXiv evaluation framework with no disclosed artifact adoption or wider debate. It fits the 60–71 “interesting, not featured” band.
editor take
PMIYC runs multi-turn agent chats; GPT-4o resists misinformation 50%+ better than Llama-3.3-70B. Persuasion scores are nice, but gullibility is the safety metric.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Alibaba proposes IB-TPO algorithm for tree-based LLM reasoning policy optimization
Alibaba researchers propose IB-TPO, a tree-based online RL framework that uses IB-Score to optimize the exploration-exploitation balance in LLM reasoning training. Under the same token budget, its IB-guided tree sampling collects 50% more trajectories, reuses the tree for Monte Carlo estimation, and beats GRPO by 2.9% to 3.6% across standard benchmarks.
#Reasoning#Fine-tuning#Benchmarking#Alibaba
why featured
HKR-K/R are strong, and HKR-H comes from the token-budget efficiency hook. A single arXiv training-method paper stays in all because code release and production-scale validation are not disclosed.
editor take
IB-TPO collects 50% more trajectories under the same token budget; a 2.9%–3.6% GRPO gain reads like sampling efficiency, not a new RL path.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Learning Deliberately, Acting Intuitively: Enabling Test-Time Reasoning in Multimodal LLMs
The arXiv paper proposes D2I for multimodal LLMs, using rule-based format rewards during training and removing explicit reasoning strategies at inference, with no extra annotations or complex rewards required; the abstract says D2I outperforms baselines on in-domain and out-of-domain benchmarks, but does not disclose model names or benchmark scores.
#Reasoning#Multimodal#Benchmarking#Research release
why featured
HKR-H/K/R pass, but this is a single arXiv paper with mechanism only; benchmark gains, model scale, and code are not disclosed. Lower-band score: 70.
editor take
D2I trains with format rewards and drops explicit strategies at inference; no model names or scores, so I don’t buy the generalization claim yet.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Ω-QVLA: Robust Quantization for Vision-Language-Action Models via Composite Rotation and Per-step Scaling
Omega-QVLA quantizes both the language backbone and DiT action head of Pi 0.5 and GR00T N1.5 to uniform W4A4, reaching 98.0% and 87.8% task success on LIBERO while reducing static memory footprint by 71.3%.
#Vision#Robotics#Inference-opt#Omega-QVLA
why featured
HKR-K and HKR-R pass: W4A4 full quantization and a 71.3% memory cut are useful for VLA deployment. HKR-H is weak because the title is dense; scope is narrower than a mainstream model release.
editor take
Omega-QVLA pushes Pi 0.5 and GR00T N1.5 to W4A4; beating FP16 on LIBERO punctures the DiT-action-head taboo.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Beyond External Monitors: Enhancing Transparency of Large Language Models for Easier Monitoring
arXiv:2502.05242v3 proposes TELLME, a method that makes LLMs themselves easier to monitor instead of adding external modules, and reports consistent gains on detoxification tasks across multimodal test sets, distinct architectures, and varying parameter scales; the abstract does not disclose exact model names, dataset names, or numerical scores.
#Interpretability#Safety#Multimodal#Research release
why featured
HKR-H/K/R pass, but the article only gives an arXiv method-and-evaluation sketch with no code, headline metric, or major-lab signal. This fits an interesting safety/interpretability research release, so it scores 70 and stays in all.
editor take
TELLME moves monitoring into the model, but names zero models, datasets, or scores; safety claims without numbers smell thin.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Extracting Small Translation Specialists from LLMs by Aggressively Pruning Experts
The paper presents a method to prune translation-irrelevant experts from MoE LLMs without retraining, removing 50% of experts with negligible translation degradation, or 75% when followed by a short SFT that recovers baseline performance.
#Inference-opt#Fine-tuning#Research release
why featured
HKR-H/K/R all pass, but this is a single arXiv methods paper with narrow translation scope and no major entity. The 50%/75% pruning claims make it useful signal, not featured-level news.
editor take
This prunes 50% of MoE experts while preserving translation; if reproducible, translation stacks are carrying dead generalist weight.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Structure-Guided Visual Perturbation Neutralization for LVLMs
The paper proposes SIGN, a plug-and-play defense for adversarial visual perturbations in LVLMs, using Prior Structural Extraction and Dynamic Guided Neutralization; experiments report over 87% defense success with 0.5% pixel modification and 0.16 seconds per image, while the abstract says benign task performance and original visual representations are nearly preserved.
#Vision#Multimodal#Safety#Research release
why featured
HKR-K and HKR-R pass: the paper gives testable numbers and targets LVLM visual-attack defense. Single arXiv source, technical framing, and no disclosed code or independent replication keep it in the 60–71 band.
editor take
SIGN reports 87% defense success with 0.5% pixel edits; I want the attack suite and LVLM list before trusting it.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
IRDS: Interpretable RLVR Data Selection via Verifier-Coupled Sparse Autoencoder Coverage
IRDS selects RLVR training instances using SAE clusters and a verifier-coupled coverage objective, then solves selection with greedy log-determinant maximization; across three instruction-tuned models and six math reasoning benchmarks, it beats the strongest baseline by +3.9/+4.0 pp on two Qwen models and +0.5 pp on Llama-3.1-8B while running about one order of magnitude cheaper than a trajectory-based baseline.
#Reasoning#Fine-tuning#Interpretability#Qwen
why featured
HKR-K is solid: 3 instruction models, 6 math benchmarks, and Qwen +3.9/+4.0 pp make the claim testable. HKR-H is weak and HKR-R is limited to training teams, so it stays below featured.
editor take
IRDS wins on 3 models and 6 math sets, +4pp on Qwen; +0.5pp on Llama keeps the SAE-selection hype contained.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
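The IRDS summary above names greedy log-determinant maximization as the selection step. The following is a minimal sketch of that generic technique only, not the paper's pipeline: the `features` vectors, the `ridge` term, and the dot-product kernel are hypothetical stand-ins for whatever SAE-cluster and verifier-coupled quantities the authors actually use.

```python
import math

def log_det(matrix):
    """Log-determinant of a small symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            acc = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(acc)
                total += 2.0 * math.log(lower[i][j])
            else:
                lower[i][j] = acc / lower[j][j]
    return total

def greedy_log_det(kernel, budget):
    """Greedily add the item that maximizes log det of the selected kernel submatrix."""
    selected = []
    remaining = set(range(len(kernel)))
    for _ in range(budget):
        def gain(i):
            idx = selected + [i]
            return log_det([[kernel[a][b] for b in idx] for a in idx])
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical instance features: items 0 and 1 are near-duplicates, item 2 differs.
features = [(1.0, 0.0), (0.99, 0.1), (0.0, 1.0)]
ridge = 1e-3  # assumption: small diagonal term to keep the kernel positive definite
kernel = [
    [sum(a * b for a, b in zip(u, v)) + (ridge if i == j else 0.0)
     for j, v in enumerate(features)]
    for i, u in enumerate(features)
]
selected = greedy_log_det(kernel, budget=2)
```

With these toy features the second pick skips the near-duplicate, which is the coverage behavior a log-determinant objective is meant to reward.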
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Law of Neural Interaction: Depth-Width Shape, Interaction Efficiency, and Generalization
The paper defines neural interaction by extending superposition from parameter space to gradient space, and reports that adjusting the depth-width ratio R_D/W can place a fixed-budget model in an efficient interaction interval, with small dense LLMs near that interval performing better on MMLU-Pro.
#Reasoning#Benchmarking#arXiv#MMLU-Pro
why featured
HKR-K and HKR-R pass: the paper offers a testable R_D/W depth-width mechanism and small dense-LLM MMLU-Pro evidence. Impact stays in arXiv research scope, so it lands below the featured band.
editor take
R_D/W is pitched as fixed-budget generalization control; no model list in the snippet, so treat it as shape intuition, not a scaling law.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Regression Language Models for Code
The paper introduces a 300M-parameter Regression Language Model using a frozen LLM encoder to predict code execution metrics, reporting over 0.9 Spearman rank on APPS memory-footprint tasks and over 0.5 average Spearman rank across 17 CodeNet languages.
#Code#Benchmarking#arXiv#T5Gemma
why featured
HKR-K has concrete mechanism and numbers, and HKR-R touches code-model evaluation and cost. Still, this is a narrow arXiv methods paper without product impact or a strong click hook, so it stays in the 60–71 band.
editor take
A 300M T5Gemma RLM hits >0.9 Spearman on APPS memory; I care whether it resists benchmark shortcuts, and leakage checks aren’t disclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
The Well-Tempered Classifier: Some Elementary Properties of Temperature Scaling
The paper proves that higher temperature increases classifier entropy, challenges the common claim that higher LLM temperature increases diversity, and gives two characterizations: an information-projection view and a linear-scaling result where temperature scaling uniquely preserves hard predictions.
#Inference-opt#Reasoning#Research release
why featured
HKR-H/K/R all pass, but this is a theory-heavy single paper with proofs and conceptual correction, not a model release, tool, or production result, so it stays in the 60–71 band.
editor take
The paper proves higher temperature raises classifier entropy, but questions LLM diversity claims; entropy alone is a weak proxy for sampling quality.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
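Two properties named in the temperature-scaling summary above can be checked numerically: entropy rises with temperature, and the hard prediction (argmax) is unchanged. This is a small sketch assuming the standard softmax with logits divided by a temperature; the `logits` values are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, computed with a max shift for stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

logits = [2.0, 1.0, 0.1]  # hypothetical classifier logits
temperatures = [0.5, 1.0, 2.0, 4.0]
entropies = [entropy(softmax_with_temperature(logits, t)) for t in temperatures]
hard_predictions = [argmax(softmax_with_temperature(logits, t)) for t in temperatures]
```

Here `entropies` increases along `temperatures` while `hard_predictions` stays on the same class, which illustrates the two statements on one example and says nothing about sample diversity in LLM decoding.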
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
GradientStabilizer: Fix the Norm, Not the Gradient
GradientStabilizer replaces update magnitude with a stabilized estimate from running gradient-norm statistics while preserving gradient direction, and the paper reports lower divergence across LLM pre-training, FP4 quantization-aware pre-training, ImageNet classification, reinforcement learning, and time-series forecasting versus clipping baselines.
#Fine-tuning#Inference-opt#Benchmarking#GradientStabilizer
why featured
HKR-K/R pass: the mechanism and test settings are clear, and training stability maps to cost. HKR-H is weak; the body gives no effect size, model scale, or code, so this stays in the 60–71 band.
editor take
GradientStabilizer spans LLM, FP4, ImageNet, and RL tests; without code, don't crown it a clipping replacement yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Mind Dreamer: Untethering Imagination via Active Causal Intervention on Latent Manifolds
Mind Dreamer samples initial states from an adversarial generator instead of observed histories, creating non-continuous latent jumps to epistemic blind spots; on DeepMind Control Suite, it reports a 1.67× average speedup over DreamerV3 and up to 8.8× in sparse-reward tasks.
#Agent#Reasoning#Benchmarking#Mind Dreamer
why featured
HKR-H/K pass on the concrete mechanism and 1.67×/8.8× results. HKR-R fails because this is still a narrow arXiv RL benchmark paper, far from products or mainstream agent workflows.
editor take
Mind Dreamer reports 1.67× faster DMC learning, 8.8× on sparse rewards; I’d audit whether generated anchors fool the world model.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns
The paper maps Tree-of-Thoughts to three classical search components—state representation, successor generation, and heuristic evaluation—and separates design patterns for Best-First Search, DFS, and MCTS under shallow deterministic tasks or deeper multi-step reasoning.
#Reasoning#Agent#Research release
why featured
HKR-H/K/R pass, but the post discloses a framework only; no results, code, or production replacement claim. This stays in the upper 60–71 band, not featured.
editor take
ToT gets reduced to 3 search components; good, because prompt mysticism belongs back in BFS, DFS, and MCTS knobs.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
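The three components the Tree-of-Thoughts summary above lists (state representation, successor generation, heuristic evaluation) are exactly the arguments of a classical best-first search. A minimal sketch follows; the integer "thoughts", the successor rule, and the distance heuristic are toy stand-ins, where a real ToT setup would presumably use LLM calls for generation and scoring.

```python
import heapq
from itertools import count

def best_first_search(start, successors, heuristic, is_goal, max_expansions=1000):
    """Expand the most promising state first; return the path to a goal or None."""
    tie = count()  # tie-breaker so the heap never compares states or paths
    frontier = [(heuristic(start), next(tie), start, [start])]
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (heuristic(nxt), next(tie), nxt, path + [nxt]))
    return None

# Toy problem: a state is an integer, successors add 1 or double it,
# and the heuristic is the distance to the target.
TARGET = 10
path = best_first_search(
    start=1,
    successors=lambda s: [s + 1, s * 2],
    heuristic=lambda s: abs(TARGET - s),
    is_goal=lambda s: s == TARGET,
)
```

Swapping the priority queue for a stack gives DFS with the same three callbacks, which is the kind of knob the entry refers to.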
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Research paper proposes retiring positive backdoor label for secret alignment evaluation
The position paper argues that the AI/ML community should retire the “positive backdoor” label and evaluate trigger-activated hidden behaviors as Secret Alignment, covering three applications across six properties: effectiveness, harmlessness, persistence, efficiency, robustness, and reliability.
#Alignment#Safety#Benchmarking#Research release
why featured
HKR passes on a niche safety-taxonomy hook, but the post only discloses summary-level claims. No author authority, experiments, or discussion signal is given, so it stays in the 60–71 band.
editor take
The paper covers 3 Secret Alignment uses across 6 properties; I buy retiring “positive backdoor”—without standard evals, it’s security theater.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
High Performance, Low Reliability: Uncertainty Benchmarking for Tabular Foundation Models
The paper compares tabular foundation models (TFMs), gradient-boosted decision trees (GBDTs), and classical baselines on 112 TALENT benchmark datasets, finding that TFMs achieve the highest AUC but lower SSCS conditional coverage under conformal prediction than GBDTs.
#Benchmarking#TALENT#Research release#Benchmark
why featured
HKR-H/K/R pass, but uncertainty benchmarking for tabular foundation models is narrower than mainstream LLM product news. The 112-dataset TALENT result gives real signal, placing it in the 60–71 research band.
editor take
TFMs top AUC on 112 TALENT datasets; SSCS coverage trails GBDTs, so tabular leaderboard wins still need calibration checks.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Interpretability-Guided Layer Selection over Subspace Projection: SAEs as Stethoscopes, Not Scalpels, for Raw Task Vector Model Editing
The paper evaluates SAE-guided model editing on Gemma-3-4B-IT and finds that projecting task vectors into SAE feature subspaces discards about 97% of modification energy, with no statistically significant gains across seven math subjects; using SAEs for layer selection instead raises Minerva Number Theory accuracy from 29.6% to 39.4%, with 5 of 7 subjects significantly improved.
#Interpretability#Reasoning#Fine-tuning#Gemma
why featured
HKR-H and HKR-K pass: the title has a contrarian hook, and the post gives concrete Gemma-3-4B-IT results. The SAE/task-vector editing scope is narrow, so it stays in the 60–71 band.
editor take
SAE projection drops 97% of edit energy on Gemma-3-4B-IT; using it for layer diagnosis lifts 29.6% to 39.4%.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
SaFeR-Steer: Evolving Multi-Turn MLLMs via Synthetic Bootstrapping and Feedback Dynamics
SaFeR-Steer trains Qwen2.5-VL-3B/7B with staged synthetic bootstrapping and tutor-in-the-loop GRPO, and its STEER dataset contains 18,161 multimodal safety dialogues spanning 2–10 turns across SFT, RL, and benchmark splits.
#Multimodal#Safety#Alignment#Qwen
why featured
HKR-K is supported by dataset size and training setup, and HKR-R fits multimodal safety deployment concerns. HKR-H is weak, and this is a single arXiv paper without visible industry pickup, so it stays in 60–71.
editor take
SaFeR-Steer pushes Qwen2.5-VL-7B multi-turn safety to 64.89; TCSR is a sane fix for single-turn safety theater.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Integrating Inductive Biases in Transformers via Distillation for Financial Time Series Forecasting
TIPS trains bias-specialized Transformer teachers with attention masking and distills them into one student Transformer; across four major equity markets, it exceeds strong ensemble baselines by 55% in annual return, 9% in Sharpe ratio, and 16% in Calmar ratio while using 38% of the inference-time computation.
#Reasoning#Inference-opt#TIPS#Research release
why featured
HKR-K/R pass: TIPS distills biased teachers into one Transformer and gives market and compute numbers. HKR-H is weak; the finance-forecasting angle is vertical, with no code, deployment, or independent replication disclosed.
editor take
TIPS beats ensembles by 55% annual return across four markets; I’d inspect trading costs and walk-forward setup first.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Test-Time Collective Action: Proxy-Based Perturbations for Correcting Algorithmic Harms
The paper proposes Test-Time Collective Action, where users pool black-box API queries to extract a proxy model and optimize per-class universal perturbations applied at submission time; experiments on CIFAR-10, CIFAR-100, and FairFace report smaller subgroup accuracy gaps, transfer from small proxies to larger platforms, improved worst-group metrics, and lower pooled query cost than per-user attacks.
#Safety#Benchmarking#Research release#Safety/alignment
why featured
HKR-H/K/R pass, but the evidence is still an arXiv paper with CIFAR-10, CIFAR-100, and FairFace tests. No production deployment or broad debate is disclosed, so it stays in 60–71.
editor take
TTCA tests pooled black-box fixes on 3 datasets; honestly, this smells like fairness as jailbreak, and platforms will patch perturbations first.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Transformers Provably Learn to Internalize Chain-of-Thought
The paper proves that an L-layer transformer trained with the Log-ICoT curriculum learns k-parity using poly(n) samples, with L = log₂ k training stages, matching explicit CoT sample efficiency while removing explicit reasoning tokens at inference.
#Reasoning#Benchmarking#Interpretability#Research release
why featured
HKR-H/K/R all pass, but this is a theory-heavy arXiv proof with no code, model eval, or product impact disclosed. It fits the 60–71 research-signal band, not featured.
editor take
Log-ICoT learns k-parity in L = log₂ k stages; clean proof, but parity still sits far from real reasoning.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR
The paper introduces REFT, which uniformly samples the first token after the reasoning marker from the policy’s top-N candidates and allocates rollouts evenly, improving aggregate Pass@1, Pass@8, and Pass@64 over DAPO and GRPO across four 0.5B-7B base models and three difficulty regimes.
#Reasoning#Alignment#Benchmarking#REFT
why featured
HKR-K is solid: REFT gives a concrete sampling point, top-N mechanism, 0.5B–7B bases, and Pass@1/8/64 claims. HKR-H/R pass for RLVR practitioners, but the single arXiv item is narrow, so it stays in 60–71.
editor take
REFT changes only first-token sampling after the reasoning marker, beating DAPO/GRPO on 0.5B-7B; I buy this cheap lever.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
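One way to read the REFT mechanism described above is: take the policy's top-N first-token candidates after the reasoning marker and split the rollout budget evenly across them. The sketch below follows that reading under stated assumptions; the function name, the `probs` table, and the deterministic round-robin assignment (used here in place of random uniform sampling) are all hypothetical.

```python
def allocate_first_tokens(first_token_probs, top_n, num_rollouts):
    """Return one forced first token per rollout, spread evenly over the top-N."""
    ranked = sorted(first_token_probs, key=first_token_probs.get, reverse=True)
    candidates = ranked[:top_n]
    # Round-robin assignment gives each candidate the same share of rollouts.
    return [candidates[i % len(candidates)] for i in range(num_rollouts)]

# Hypothetical first-token distribution right after the reasoning marker.
probs = {"First": 0.55, "Let": 0.25, "We": 0.12, "To": 0.05, "Hmm": 0.03}
plan = allocate_first_tokens(probs, top_n=4, num_rollouts=8)
```

Each rollout would then continue decoding normally from its forced first token, so the change is confined to a single sampling step.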
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Beyond Binary Moral Judgment: Modeling Ethical Pluralism in AI
The paper proposes an ethical pluralism framework that models moral reasoning as a distribution over normative theories. It uses 450 natural-language dilemma cases across 15 subtheories, a two-stream normative-semantic architecture, and stacked ensemble learning to classify consequentialism, virtue ethics, and deontology with 88.89% accuracy.
#Reasoning#Alignment#Benchmarking#Research release
why featured
HKR-K passes with concrete dataset and accuracy numbers, and HKR-R connects to alignment value conflicts. HKR-H is weak, with no major lab, released artifact, or production-impact claim, so it stays in all.
editor take
450 dilemmas yield 88.89% accuracy; I don’t buy “human-like moral reasoning”—this smells like a small ethical-label classifier.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling
TRACES learns prefix-level trajectory risk states from an observer LLM’s hidden representations for multi-turn tool-using agents. The paper says weak trajectory-level supervision yields dense prefix-level risk estimates and improves safety prediction across multiple agent safety benchmarks, but the RSS snippet does not disclose benchmark names, dataset counts, or improvement sizes.
#Agent#Safety#Interpretability#TRACES
why featured
HKR-K and HKR-R pass: the mechanism targets prefix-level risk estimates for multi-turn agents. HKR-H is weak, and benchmark count plus gain size are not disclosed, so it stays in all.
editor take
TRACES estimates prefix risk via trajectory-level weak labels; benchmarks and gains aren’t disclosed, so buy the direction, not the result.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Disentangling Language Roles in Multilingual LLM Task Execution
The paper introduces MTM-Bench, a benchmark that crosses instruction, content, and response languages across English, Spanish, and Chinese into 27 triplets, evaluating 20 frontier and open-weight LLMs on 2,430 instances per model with decomposed metrics and a targeted human audit.
#Benchmarking#MTM-Bench#Research release#Benchmark
why featured
HKR-K has concrete benchmark scale and setup, and HKR-R fits multilingual LLM deployment concerns. The post discloses design and size, not key results or model rankings, so it stays in all.
editor take
MTM-Bench tests 20 models across 27 language triplets; I buy the role-split, especially response-language failure.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
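The MTM-Bench counts above follow from crossing three roles with three languages. A short sketch of that arithmetic is below; the even split of instances across triplets is an assumption, since the summary gives only the per-model total.

```python
from itertools import product

LANGUAGES = ["English", "Spanish", "Chinese"]
ROLES = ("instruction", "content", "response")

# Every assignment of a language to each of the three roles.
triplets = [dict(zip(ROLES, combo)) for combo in product(LANGUAGES, repeat=len(ROLES))]

INSTANCES_PER_MODEL = 2430  # figure quoted in the summary
# Assumption: instances are spread evenly over the triplets.
instances_per_triplet = INSTANCES_PER_MODEL // len(triplets)
```

Under that assumption the total divides cleanly, giving 90 instances for each of the 27 triplets.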
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Generalized Holographic Reduced Representations
The paper proposes GHRR, extending FHRR with a flexible non-commutative binding operation. The authors replace Transformer attention with a GHRR-equivalent mechanism and report better language-modeling performance than a vanilla Transformer, while proving HDC property preservation and testing compositional decoding accuracy.
#Reasoning#Benchmarking#Interpretability#Research release
why featured
HKR-H/HKR-K pass: the hook is an attention replacement, and the new mechanism is non-commutative binding. It stays in the all band because the post lacks authors, metrics, datasets, and code, and the model-architecture topic is specialized.
editor take
GHRR beats vanilla Transformer after replacing attention; no task scale or numbers disclosed, so I’m treating this as HDC revival work.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
ASTRA: Communication-Efficient Acceleration for Multi-Device Transformer Inference
ASTRA combines sequence parallelism with mixed-precision attention, sending non-local token embeddings as low-bit vector-quantized codes, and reports up to 2.64× speedup over single-device inference and 15.25× over prior multi-device baselines at bandwidths as low as 10 Mbps across ViT and GPT2.
#Inference-opt#ASTRA#GPT2#Llama-3-8B
why featured
HKR-H/K/R pass, but this is a single arXiv inference-optimization paper aimed at systems readers, not a broad product or model event. Solid numbers keep it in the 60–71 band.
editor take
ASTRA reports 2.64× single-device speedup at 10 Mbps; I buy the edge-inference angle more than GPT2-to-Llama-3-8B extrapolation.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Bias Leaves a Gradient Trail: Label-Free Bias Identification via Gradient Probes on Concept Decompositions
The paper presents a post-hoc bias identification method for frozen vision classifiers that uses only standard class labels from a held-out audit set, ranks NMF-derived concept vectors with gradients from misclassified examples, and improves worst-group accuracy by up to 17.9 percentage points on Waterbirds and 10.4 on CelebA without retraining or parameter updates.
#Vision#Interpretability#Safety#Research release
why featured
HKR-K is solid: label-free gradient probes and a 17.9-point Waterbirds worst-group gain are testable. HKR-H/R pass, but frozen-vision-classifier auditing is too narrow and technical for featured.
editor take
Gradient probes find bias in frozen vision models and add 17.9 points on Waterbirds worst-group accuracy; I like that it skips group labels.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Restoring the Sweet Spot: Pass-Rate Weighted Self-Distillation for LLM Reasoning
Zehao Liu and coauthors introduce SC-SDPO, which weights each question’s SDPO loss by [p̂(1−p̂)]^(1/2) from on-policy rollouts, and report gains of +3.2 mean@16 and +4.3 maj@16 on Qwen3-8B, plus +1.8 and +3.0 on OLMo-3-7B.
#Reasoning#Alignment#Tools#Zehao Liu
why featured
HKR-H and HKR-K pass: the pass-rate weighting hook is clear and the Qwen3-8B gains are concrete. HKR-R is weak; this is a single arXiv method paper, not a product or market event.
editor take
SC-SDPO lifts Qwen3-8B mean@16 by 3.2 points; explicit mid-difficulty weighting beats another vague RL slogan.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
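The pass-rate weight in the SC-SDPO entry above is the square root of p̂(1−p̂), which peaks for mid-difficulty questions and vanishes when every rollout passes or fails. A minimal sketch of just that weight follows; the function name and the choice of 16 rollouts per question are illustrative assumptions, and the SDPO loss itself is not reproduced.

```python
import math

def pass_rate_weight(num_correct, num_rollouts):
    """Weight sqrt(p_hat * (1 - p_hat)) from an empirical on-policy pass rate."""
    p_hat = num_correct / num_rollouts
    return math.sqrt(p_hat * (1.0 - p_hat))

# Hypothetical setting: 16 rollouts per question, varying numbers of correct ones.
weights = {k: pass_rate_weight(k, 16) for k in (0, 4, 8, 12, 16)}
```

The weight is symmetric around a pass rate of one half, so a question solved 4 times out of 16 counts as much as one solved 12 times out of 16.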
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Reevaluating Policy Gradient Methods for Imperfect-Information Games
The paper releases exact exploitability computations for five large imperfect-information games and reports that, across more than 7,000 training runs, FP-, DO-, and CFR-based deep reinforcement learning methods did not outperform generic policy gradient methods such as PPO.
#Benchmarking#Reasoning#arXiv#Research release
why featured
HKR-H and HKR-K pass: 7,000+ runs across 5 games make the claim testable. The topic stays niche RL/game benchmarking, with weak HKR-R and no product or model-release impact, so it fits 60–71.
editor take
7,000+ runs found FP/DO/CFR-style DRL failed to beat PPO; imperfect-information RL has a baseline debt problem.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Learning to Translate from Soft to Hard LLM Prompts
The paper trains a soft-prompt-to-natural-language translation model and reports better quantitative and qualitative results than InSPEcT across multiple DoD datasets, with translated prompts from small open-source models transferring to larger closed-API models and sometimes outperforming few-shot learning.
#Fine-tuning#Interpretability#InSPEcT#Research release
why featured
HKR-H and HKR-K pass: the soft-to-hard prompt angle is novel, and the summary gives an InSPEcT comparison plus a transferability claim. Impact stays research-heavy with no artifact or production evidence, so it fits 60–71.
editor take
A trained soft-prompt translator beats InSPEcT on DoDs; if reproducible, small-model tuning can leak into closed-model prompting.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Rethinking Layer Redundancy: Calibration Matters More Than Search in LLM Depth Pruning
The paper evaluates depth pruning across multiple LLM families and calibration settings, finding that calibration choices produce different layer-removal patterns; under a fixed calibration setup, complex search algorithms deliver only marginal gains over simple one-shot methods and converge on similar pruned layer subsets.
#Inference-opt#Benchmarking#Research release
why featured
HKR-H/K/R pass, but this is an LLM depth-pruning paper with impact concentrated in inference optimization research. No model release, open-source tool, or production-replacement evidence, so it stays in the 60–71 all tier.
editor take
This paper tests multiple LLM families: with fixed calibration, complex search barely beats one-shot; depth pruning needs cleaner calibration, not fancier search.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
How VLAs Fail Differently: Black-Box Action Monitoring Reveals Architecture-Specific Failure Signatures
The study evaluates VQ-BeT, Diffusion Policy, and ACT across 450 PushT and ALOHA 14-DOF episodes, finding direction reversal rate predicts failures across all three VLA architectures with AUROC scores of 0.93, 0.79, and 0.91, while velocity-only checks provide weak or zero signal despite common use in deployment code.
#Robotics#Safety#Benchmarking#VQ-BeT
why featured
HKR-H/K/R all pass, but this is a robotics safety evaluation rather than a broad model or product release. The concrete AUROC results and black-box mechanism put it at the high end of 60–71.
editor take
Across 450 episodes, direction reversals hit 0.93 AUROC; teams still guarding VLAs with velocity thresholds need new monitors.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
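The entry above does not define "direction reversal rate", so the following is only one plausible black-box reading: the fraction of consecutive action vectors whose dot product is negative. The function name, that definition, and the two toy action traces are assumptions, not the paper's metric.

```python
def direction_reversal_rate(actions):
    """Fraction of consecutive action pairs that point in opposing directions."""
    pairs = list(zip(actions, actions[1:]))
    if not pairs:
        return 0.0
    reversals = sum(
        1 for prev, cur in pairs
        if sum(p * c for p, c in zip(prev, cur)) < 0.0  # negative dot product
    )
    return reversals / len(pairs)

# Hypothetical 2-D action traces: one smooth, one flipping direction every step.
smooth = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2), (0.7, 0.3)]
jittery = [(1.0, 0.0), (-1.0, 0.1), (0.9, 0.0), (-0.8, -0.1)]
smooth_rate = direction_reversal_rate(smooth)
jittery_rate = direction_reversal_rate(jittery)
```

A monitor of this kind needs only the emitted actions, which is what makes it usable across architectures without access to model internals.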
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Analyzing Quality-Latency-Resource Trade-offs in a Technical Documentation RAG Assistant Using LoRA Adaptation
The paper evaluates 20 LoRA configurations on a 5,144-pair Kubernetes documentation QA benchmark, using fixed hybrid retrieval and Llama-3.2-3B-Instruct or Llama-3.1-8B-Instruct, and finds q/v-only attention adapters consistently dominate the Pareto front across quality, latency, memory, and training cost.
#RAG#Fine-tuning#Benchmarking#Kubernetes
why featured
HKR-K and HKR-R pass: the paper gives concrete sample size, config count, and a q/v-adapter finding. It remains a niche engineering evaluation rather than a broad industry event, so it stays in 60–71.
editor take
5,144 Kubernetes QA pairs and 20 LoRA runs put q/v-only on the Pareto front; full-module tuning loses its default excuse.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
SPARD: Defending Harmful Fine-Tuning Attacks via Safety Projection and Relevance-Diversity Data Selection
SPARD tests four harmful fine-tuning attacks on GSM8K and OpenBookQA, combining SPAG safety-projected alternating optimization with relevance-diversity DPP safe-data selection; the paper reports the lowest average attack success rates versus state-of-the-art defenses while maintaining task accuracy, but the snippet does not disclose exact ASR or accuracy numbers.
#Fine-tuning#Safety#Alignment#SPARD
why featured
HKR-K/R pass: the paper gives attack count, benchmarks, and a defense mechanism, and fine-tuning safety matters to practitioners. HKR-H is weak; single arXiv paper with no lab-scale release or production evidence.
editor take
SPARD covers 2 tasks and 4 attacks; without ASR numbers, the safety projection is a lead, not a result.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Meta-Attention: Bayesian Per-Token Routing for Efficient Transformer Inference
Meta-Attention uses a Bayesian Meta-Controller to route each token to full softmax, linear, or sliding-window attention, and its Phase 1 Tiny LM results report 25.1% projected normalized FLOP cost under hard routing versus 59.3% for the prior-free baseline.
#Inference-opt#Reasoning#Meta-Attention#Research release
why featured
HKR-K is solid: the post gives a concrete routing mechanism and FLOP numbers. HKR-R is present on inference cost, but single arXiv evidence and Tiny LM Phase 1 keep it below featured.
editor take
Meta-Attention cuts Tiny LM hard-routing FLOPs to 25.1%; Phase 1 is neat, but real long-context throughput is unproven.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Structured Agent Distillation for Large Language Model
The paper proposes Structured Agent Distillation, which splits trajectories into [REASON] and [ACT] spans and evaluates against token-level distillation and imitation learning baselines on ALFWorld, HotPotQA-ReAct, and WebShop.
#Agent#Reasoning#Fine-tuning#Research release
why featured
HKR-K is clear via the structured [REASON]/[ACT] distillation setup and three benchmarks; HKR-R lands on agent cost/control. HKR-H is weak, and no result numbers or artifact details are disclosed.
editor take
Structured Agent Distillation reports 3 benchmarks; no compression ratio or score drop is disclosed, so don’t crown span loss yet.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
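The span split in the Structured Agent Distillation entry above can be illustrated with a small parser that separates a trajectory into [REASON] and [ACT] segments, so that each kind could receive its own loss term. The inline tag format, the regular expression, and the sample trajectory are assumptions; the summary does not say how spans are marked in the data.

```python
import re

# Capture a tag and its text up to the next tag or the end of the string.
SPAN_PATTERN = re.compile(
    r"\[(REASON|ACT)\]\s*(.*?)(?=\[(?:REASON|ACT)\]|\Z)", re.DOTALL
)

def split_spans(trajectory):
    """Return (kind, text) pairs in the order they appear in the trajectory."""
    return [(kind, text.strip()) for kind, text in SPAN_PATTERN.findall(trajectory)]

# Hypothetical ReAct-style trajectory.
trajectory = (
    "[REASON] The mug is probably in the kitchen. "
    "[ACT] go to kitchen "
    "[REASON] The mug is on the counter. "
    "[ACT] take mug"
)
spans = split_spans(trajectory)

# Group span texts by kind, e.g. to apply separate distillation losses.
by_kind = {"REASON": [], "ACT": []}
for kind, text in spans:
    by_kind[kind].append(text)
```

Grouping by kind is the point of the structure: reasoning and action tokens can then be weighted or supervised differently instead of sharing one token-level objective.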
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning
RankTuner introduces the Relative Rank Indicator, comparing the ground-truth token rank with its expected rank under the prediction distribution, then uses the inverse signal as a token-wise Relative Scale for supervised fine-tuning; the abstract reports gains across multiple backbones on math reasoning, out-of-distribution reasoning transfer, and code generation versus probability-only or entropy-only reweighting baselines.
#Fine-tuning#Reasoning#Code#RankTuner
why featured
HKR-K/R pass: RankTuner/RRI gives a concrete weighting mechanism and claims gains on math, OOD reasoning, and code. No metrics, artifact details, or broad hook are disclosed, so this stays in the 60–71 research-release band.
editor take
RankTuner calibrates true-token rank against expected rank; I buy the signal, but the snippet omits backbones, deltas, and reproducibility details.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Heterogeneous Parallelism for Multimodal Large Language Model Training
The paper presents heterogeneous parallelism for multimodal LLM training. Modules use independent layouts and rank placements in one graph. Boundary communicators transform forward activations and backward gradients. Colocated heterogeneity improves TFLOPS/GPU by up to 49.3%. Non-colocated heterogeneity improves aggregate token throughput by 13.0% and TFLOPS/GPU by 9.6%.
#Multimodal#Inference-opt#Tools#Megatron-LM
why featured
HKR-K and HKR-R pass: the paper gives a boundary-communicator mechanism, a 49.3% TFLOPS/GPU figure, and a clear training-cost angle. HKR-H fails because the title reads like a niche systems paper.
editor take
Heterogeneous parallelism lifts colocated TFLOPS/GPU by 49.3%; multimodal training pain is back at communication boundaries.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Guaranteed Optimal Compositional Explanations for Neurons
The paper introduces a framework for computing guaranteed optimal compositional explanations for neurons across the assumed state space, and reports that 10-40% of prior beam-search explanations are suboptimal when concepts overlap.
#Interpretability#Research release
why featured
HKR-K is clear and HKR-R lands on interpretability/safety concerns, but HKR-H is weak. A single arXiv paper with technical framing and no tool or industry adoption fits the 60–71 all band.
editor take
Beam-search explanations are 10-40% suboptimal under overlapping concepts; interpretability needs fewer pretty rules and more guarantees.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Cyclical Entropy Eruption: Entropy Dynamics in Agent Reinforcement Learning
The paper identifies cyclical entropy eruption in agent RL, where training shows recurring entropy spikes and gradual subsidence, and proposes SEAL, a lightweight auxiliary loss that separates correct and incorrect trajectories in representation space; the abstract says experiments span multiple benchmarks, models, and RL algorithms, but does not disclose exact counts.
#Agent#Reasoning#Alignment#SEAL
why featured
HKR-H/K/R pass via the entropy-cycle claim and SEAL loss mechanism. The item stays in all because this is an arXiv training-diagnostics paper with no disclosed scale, benchmark gain, or ready artifact.
editor take
Agent RL shows recurring entropy spikes; exact experiment counts are undisclosed, so SEAL lives or dies on suppressing duplication and hallucination.
HKR breakdown
66
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management
The paper separates fixed-system and scaling-family settings for autoregressive Transformers, arguing that many existing Turing-completeness proofs hold in the latter and do not establish Turing-completeness for real-world LLM deployment with fixed context management.
#Reasoning#Research release#Commentary
why featured
HKR-K/R pass while HKR-H is weak; the paper adds a theory claim about LLM capability limits, but no experiment, code, or deployment impact is disclosed, so it stays in the 60–71 band.
editor take
This paper splits fixed systems from scaling families; proving Turing-completeness with growing context does not cover deployed LLMs.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression
HQMQ compresses KV cache by combining the 24-element Hurwitz group with S random unit quaternions per layer and head, matching fp16 perplexity on Mistral-7B and Qwen3-8B within 0.02–0.03 points at about 5 bits.
#Inference-opt#Mistral#Meta#Qwen
why featured
HKR-K and HKR-R pass: KV-cache compression ties to inference cost, with testable 5-bit results on Mistral-7B and Qwen3-8B. HKR-H fails and technical accessibility keeps it below featured.
editor take
HQMQ keeps Qwen3-8B within 0.03 ppl of fp16 at ~5 bits; calibration-free random codebooks make it feel deployable, not another int4 patch.
HKR breakdown
66
SCORE
H0·K1·R1
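A small sketch of the group structure the HQMQ item refers to: it builds the 24 unit Hurwitz quaternions and checks that they are closed under the Hamilton product. It does not reproduce HQMQ's quantizer or its per-layer random quaternions, which the summary does not specify; the helper names `qmul` and `hurwitz_units` are illustrative.

```python
from itertools import product


def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def hurwitz_units():
    """Return the 24 unit Hurwitz quaternions as tuples of floats."""
    units = []
    # 8 units with a single nonzero component: +/-1, +/-i, +/-j, +/-k.
    for axis in range(4):
        for sign in (1.0, -1.0):
            q = [0.0, 0.0, 0.0, 0.0]
            q[axis] = sign
            units.append(tuple(q))
    # 16 units of the form (+/-1 +/-i +/-j +/-k) / 2.
    for halves in product((0.5, -0.5), repeat=4):
        units.append(halves)
    return units


UNITS = hurwitz_units()
UNIT_SET = set(UNITS)

# Closure check: the product of any two units is again one of the 24 units.
# All components are 0, +/-0.5 or +/-1, so the float arithmetic here is exact.
CLOSED = all(qmul(a, b) in UNIT_SET for a in UNITS for b in UNITS)
```

Because the set is a finite group, multiplying any stored quaternion by a group element stays on the same 24-point codebook, which is the property a multiplicative codebook would rely on.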
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
E^3-Agent: An Executable and Evolving Agent for Resource Management of Edge Generative Inference
E^3-Agent manages edge generative inference with a millisecond fast-path router and an event-driven LLM meta-controller, and in three dynamic regimes it reduces average latency by 65%-73% versus the best static baseline while staying within 7%-10% of a full-information Oracle.
#Agent#Inference-opt#Tools#Rui Bao
why featured
HKR-K is strong via the fast/slow-path mechanism and latency numbers; HKR-R is real for edge-inference cost and latency. HKR-H is weak, and this is a single arXiv systems paper, so it stays in 60–71.
editor take
E^3-Agent cuts simulated latency 65%-73%; I’d demand real edge-cluster replication before buying the 7%-10% Oracle gap.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Segment to Focus: Guiding Latent Action Models in the Presence of Distractors
MaskLAM restricts the reconstruction objective to agent pixels and obtains zero-shot masks from segmentation models such as SAM, requiring no architecture changes, auxiliary losses, or action labels during pre-training; on Distracting Control Suite and Distracting Meta-World, it reduces normalized linear-probe MSE by up to 3.51x and improves normalized return by up to 4.97x over LAPO.
#Robotics#Vision#Agent#SAM
why featured
HKR-K is strong with MaskLAM, SAM masks, and the 3.51x/4.97x results; HKR-H has a clear method twist. The robotics-representation niche keeps it in the 60–71 band, not featured.
editor take
MaskLAM gets 4.97x return via SAM masks; I buy the distractor setup, but real robot mask stability is still undisclosed.
HKR breakdown
66
SCORE
H1·K1·R0
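The MaskLAM item describes restricting the reconstruction objective to agent pixels. A minimal sketch of that idea, assuming flat pixel lists and a binary mask (in the paper the mask would come from a segmentation model such as SAM; here it is passed in by hand, and `masked_mse` is a hypothetical name):

```python
def masked_mse(pred, target, mask):
    """Mean squared error over the pixels where mask is truthy.

    pred, target: flat lists of pixel values.
    mask: flat list of 0/1 flags, 1 marking an agent pixel.
    """
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target and mask must have the same length")
    # Keep only squared errors on masked (agent) pixels.
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not errors:
        return 0.0  # no agent pixels in this frame
    return sum(errors) / len(errors)


# The middle pixel is a distractor: its large error is ignored.
loss = masked_mse([1.0, 5.0, 2.0], [0.0, 0.0, 0.0], [1, 0, 1])
```

With the mask applied, the distractor pixel's error of 25 does not enter the loss, so the value is the mean of 1 and 4.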
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
ROSD: Reflective On-Policy Self-Distillation for Cross-Domain Language Model Reasoning
ROSD uses a self-reflector to extract a corrective idea and locate the first erroneous span, then limits distillation to that span; the paper reports stronger in-domain reasoning and better out-of-domain generalization than standard OPSD across multiple reasoning benchmarks, but the RSS snippet does not disclose model sizes, datasets, or numeric scores.
#Reasoning#Fine-tuning#Research release#Open source
why featured
HKR-K passes on targeted span distillation; HKR-R is modest because reasoning fine-tuning is practitioner-relevant. HKR-H misses: arXiv method title, no numbers or model names, so it stays in all.
editor take
ROSD distills only the first wrong span. Scores and model sizes are undisclosed; I buy the mechanism, not the generalization claim yet.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Explaining Is Harder Than Predicting Alone: Evaluating Concept-Based Explanations of MLLMs as ICL Visual Classifiers
The paper evaluates four frozen MLLMs under five few-shot ICL conditions and finds that requiring formally structured, concept-based explanations reduces visual classification accuracy from 93.8% to 90.1%, while high-quality class-discriminative explanations correlate with correct predictions when the models can produce them.
#Multimodal#Vision#Interpretability#Research release
why featured
HKR-H and HKR-K pass: the title has a counterintuitive hook, and the abstract gives testable settings plus an accuracy delta. The MLLM interpretability benchmark is useful but too narrow for featured.
editor take
Four frozen MLLMs drop from 93.8% to 90.1% under structured explanations; readable reasoning is not a free accuracy gain.
HKR breakdown
66
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
A Structural Theory of Position Bias in Transformers
The paper proposes residual-aware cumulative attention rollout to explain position bias in causal Transformers, showing that finite depth, causal masking, and residual connections induce broad U-shaped influence profiles, with empirical profiles matching measured input-token influence in pretrained language models.
#Interpretability#Reasoning#Research release
why featured
HKR-H and HKR-K pass: U-shaped positional influence and residual-aware rollout are concrete. HKR-R is weak; the post lacks model names, scale, or reproduction details, so this stays in the 60–71 band.
editor take
This pins Lost-in-the-Middle on causal masks, residuals, and finite depth; the tested model list is undisclosed.
HKR breakdown
66
SCORE
H1·K1·R0
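To make the position-bias item concrete, here is a toy sketch of the classic attention rollout with an identity residual term, applied to uniform causal attention. This is not the paper's residual-aware cumulative variant, whose exact form the summary does not give; the 0.5 residual weight and the uniform attention pattern are assumptions chosen for illustration.

```python
def causal_uniform_attention(n):
    """Row i attends uniformly to positions 0..i (causal mask)."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square matrix product on nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rollout(attentions, residual_weight=0.5):
    """Multiply residual-mixed attention matrices from first to last layer."""
    n = len(attentions[0])
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in attentions:
        # Mix each layer's attention with the identity to model the skip path.
        mixed = [[(1.0 - residual_weight) * attn[i][j]
                  + (residual_weight if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        result = matmul(mixed, result)
    return result


# Influence of each input position on the last token after two layers.
last_row = rollout([causal_uniform_attention(4)] * 2)[-1]
```

In this 4-token, 2-layer toy case the last token's row is larger at both ends than in the middle: early positions accumulate weight through the causal mask and the final position keeps weight through the residual path.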
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL
The paper trains code RL checkpoints with nested unit-test coverage and observes a correctness-efficiency frontier across 32B and 7B models and three inference settings: pure reasoning, tool use, and agentic coding; extrapolative weight averaging extends the frontier and raises pass@250 on LCB/hard by 3.3% over the best single checkpoint at a matched sample budget.
#Code#Reasoning#Agent#arXiv
why featured
HKR-K and HKR-R pass: the paper gives concrete model sizes, settings, and a +3.3% pass@250 gain, with relevance to code-model cost tradeoffs. HKR-H is weak, and this is a single arXiv paper, so it stays below featured.
editor take
EWA lifts LCB/hard pass@250 by 3.3% at matched budget; the useful bit is new complementary policies without extra RL.
HKR breakdown
66
SCORE
H0·K1·R1
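A generic sketch of what extrapolating between two checkpoints can look like, assuming the simple linear rule theta_a + alpha * (theta_b - theta_a) with alpha allowed outside [0, 1]. The summary does not state the paper's exact rule or coefficients, so the function name, the dict-of-lists checkpoint format, and the alpha values are all illustrative.

```python
def extrapolate_weights(theta_a, theta_b, alpha):
    """Return theta_a + alpha * (theta_b - theta_a), parameter by parameter.

    theta_a, theta_b: dicts mapping parameter names to flat lists of floats.
    alpha in [0, 1] interpolates; alpha < 0 or alpha > 1 extrapolates.
    """
    if theta_a.keys() != theta_b.keys():
        raise ValueError("checkpoints must share parameter names")
    merged = {}
    for name in theta_a:
        a, b = theta_a[name], theta_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [x + alpha * (y - x) for x, y in zip(a, b)]
    return merged


ckpt_a = {"w": [0.0, 1.0]}
ckpt_b = {"w": [1.0, 3.0]}
# alpha = 2 steps past ckpt_b along the line from ckpt_a to ckpt_b.
beyond = extrapolate_weights(ckpt_a, ckpt_b, alpha=2.0)
```

No gradient step is involved: the new policy is produced purely by arithmetic on existing checkpoints, which matches the editor's point about getting complementary policies without extra RL.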
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Locality-Aware Redundancy Pruning for LLM Depth Compression
The paper introduces LoRP, a training-free one-shot depth pruning method that uses a small calibration set to compute pairwise hidden-state similarity, cluster layers by representation similarity, and allocate pruning by residual intra-cluster redundancy; the abstract says experiments across multiple LLM families improve perplexity and downstream task accuracy, but it does not disclose model names or exact scores.
#Inference-opt#Benchmarking#Research release
why featured
HKR-K has a concrete mechanism and HKR-R touches deployment cost, but the item gives only abstract-level claims with no numbers, code, or major-lab signal, so it stays in all.
editor take
LoRP does one-shot depth pruning with a small calibration set; no model names or scores, so good idea, weak evidence.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation
Vision-OPD builds a crop-conditioned teacher and a full-image student from the same MLLM, then minimizes token-level divergence on student on-policy rollouts, requiring no external teacher, ground-truth labels, reward verifier, or inference-time tool use.
#Multimodal#Vision#Fine-tuning#Vision-OPD
why featured
HKR-K passes on the concrete self-distillation setup; HKR-R is modest because fine-detail vision is a real MLLM pain point. No benchmark gain, model scale, or artifact is disclosed, so this stays in the 60–71 band.
editor take
Vision-OPD uses one MLLM as crop-teacher and full-image student; no benchmark numbers disclosed, so I’d file it as a cheap training trick.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Noise Scheduling as Information-Guided Allocation in Diffusion Training
InfoNoise estimates a conditional-entropy-rate profile from denoising losses during diffusion training and changes only the training noise distribution, while keeping the objective, weighting, and parameterization fixed; on DNA and language generation tasks, it reaches target quality with up to 3x less training compute than fixed and adaptive baselines.
#Inference-opt#InfoNoise#arXiv#Research release
why featured
HKR-K and HKR-R pass: the 3x training-compute claim and noise-only intervention are concrete. HKR-H fails, and the entropy-rate diffusion method is specialist, keeping it in 60–71.
editor take
InfoNoise changes only training-noise sampling and saves up to 3x compute on DNA/language; image gains are modest, so don’t oversell it.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Stay Fair! Ensuring Group Fairness in Diffusion Models Across Guidance Scales
The paper introduces StayFair, which decomposes diffusion-model bias into model bias and guidance bias, then modifies only the guidance step under classifier guidance and classifier-free guidance to keep the target distribution’s group ratio stable across guidance scales.
#Multimodal#Vision#Alignment#StayFair
why featured
HKR-K is supported by a concrete mechanism, and HKR-R touches bias governance in generative models. HKR-H is weak, and the article is a single arXiv paper without code, benchmark numbers, or product impact.
editor take
StayFair only changes guidance to preserve group ratios; monotonic bias at high guidance matches how users actually run diffusion models.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
How Far Can Disaggregation Go? Attention-FFN Disaggregation for Efficient MoE LLM Serving
The paper evaluates Attention-FFN disaggregation for MoE inference and reports about 4k tokens/s system throughput on DeepSeek-V3.2 under strict TTFT/TPOT SLOs across chat, coding, and agentic-coding workloads.
#Inference-opt#Benchmarking#DeepSeek#arXiv
why featured
HKR-K and HKR-R pass via concrete throughput, SLOs, and DeepSeek-V3.2 conditions. The MoE serving-systems angle is specialized, so technical-accessibility pressure keeps it in the lower all band.
editor take
AFD hits ~4k tokens/s on DeepSeek-V3.2; I want the SLO cutoff where non-AFD becomes infeasible.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Neural Weight Compression for Language Models
The paper proposes Neural Weight Compression, a neural codec framework trained on pretrained weight datasets, and reports competitive accuracy-compression tradeoffs in the 4-6 bit regime without rigid handcrafted components such as the Hadamard transform.
#Inference-opt#Research release
why featured
HKR-K/R pass: the paper adds a neural-codec mechanism for pretrained weights and reports 4–6 bit results tied to inference cost. HKR-H fails; the title is plain. Sparse numbers keep it in the 60–71 band.
editor take
NWC reports strong 4–6 bit compression; treating weights as codec data looks saner than hand-tuned Hadamard tricks.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
A Unified Structured Query Understanding Framework for Industrial Semantic Search
The paper proposes one schema-constrained SLM for query understanding and deploys it in LinkedIn Job Search. Query Illuminator handles auto-annotation, distillation, and evaluation; the abstract does not disclose exact engagement or cost numbers.
#RAG#Fine-tuning#Inference-opt#LinkedIn
why featured
HKR-K and HKR-R pass: the paper gives a LinkedIn Job Search deployment plus Query Illuminator for labeling, distillation, and evaluation. No uplift numbers are disclosed, and HKR-H is weak, so it stays in the 60–71 band.
editor take
LinkedIn folds query understanding into one schema-constrained SLM; no lift or cost numbers disclosed, so I buy the direction, not the claim.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
DecomposeRL: Learning to Ask Useful, Informative, and Diverse Questions for Semi-Supervised, Traceable Claim Verification
DecomposeRL trains a 7B decomposition policy with GRPO, curates 115K fact-verification claims down to 5K, and reports 86.3 in-domain and 69.8 out-of-domain balanced accuracy across 11 claim-verification benchmarks.
#Reasoning#Alignment#Benchmarking#DecomposeRL
why featured
HKR-K has concrete mechanisms and numbers, and HKR-R fits fact-checking and traceability. This remains a narrow arXiv methods paper with no product impact or top-lab spread, so it stays in 60–71.
editor take
DecomposeRL-7B hits 86.3/69.8 from 5K claims; I buy the training funnel, not traceability-as-trust.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Residualized Temporal Sparse Autoencoders for Interpreting Diffusion Models
The paper introduces residualized temporal SAEs for diffusion activation trajectories, representing each trajectory with an initial activation and residuals after linear prediction between neighboring denoising steps, then evaluates the method on Stable Diffusion 1.5 through reconstruction, ablation studies, spatiotemporal feature analysis, and qualitative steering experiments.
#Vision#Interpretability#Stable Diffusion#Research release
why featured
HKR-K/R pass: the method is specific and tested on Stable Diffusion 1.5, with relevance to interpretability and steering. HKR-H is weak, and this is an arXiv research release without product impact or cross-source heat.
editor take
Residualized temporal SAE is tested on SD 1.5; I buy the direction, but qualitative steering is not an interpretability proof.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
ProgVLA: Progress-Aware Robot Manipulation Skill Learning
ProgVLA uses a 0.1B-parameter VLA model for robot manipulation, compressing visual, language, and proprioceptive streams with two-stage Perceiver resampling while training progress heads with offline RL targets; the paper reports competitive success rates on two multi-task manipulation benchmarks and stronger results on long-horizon and harder tiers versus larger pretrained baselines.
#Robotics#Multimodal#Vision#Research release
why featured
HKR-H and HKR-K pass: the small VLA, progress modeling, and benchmark claims are concrete. HKR-R is weak because this remains an arXiv robot-manipulation paper, far from product impact, so it sits in the 60–71 band.
editor take
ProgVLA runs manipulation at 0.1B params; I buy the Perceiver compression, while progress heads read like a long-horizon patch.
HKR breakdown
66
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Where LLM Annotators Fail: Label-Free Learning on Graphs with LLMs
The paper proposes CANE, a label-free graph learning framework that estimates cluster-conditional LLM reliability without ground-truth labels, then selects pseudo-labels to trust or correct, and reports gains over the strongest label-free baselines across multiple graph benchmarks and GNN backbones, with largest improvements under stronger cluster-conditional noise.
#RAG#Fine-tuning#Benchmarking#Research release
why featured
HKR-H/K/R pass, but the story is niche graph-learning research. The summary gives a mechanism and benchmark scope, not code, scale numbers, or production impact, so it stays in the 60–71 band.
editor take
CANE models cluster-conditional LLM label noise; gain sizes are undisclosed, but regional reliability beats global confidence for graph labels.
HKR breakdown
66
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Can Entry-Wise Clipping Give Spectral Control of Stochastic Gradients?
The paper proposes entry-wise smooth shrinkage for heavy-tailed stochastic gradient noise, proves an O(ε^-4) convergence guarantee under Cauchy-contaminated noise, and reports about 7% token savings over Adam on NanoGPT pretraining plus about 2% additional savings when applied before Muon spectral normalization.
#Fine-tuning#Inference-opt#Benchmarking#NanoGPT
why featured
HKR-K and HKR-R pass via the mechanism, convergence bound, and NanoGPT token number. HKR-H is weak because the title is niche stochastic optimization, so it stays in the lower 60–71 band.
editor take
Entry-wise smooth shrinkage saves ~7% tokens on NanoGPT; I buy the direction, but Cauchy noise still needs real pretraining evidence.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
A Simple State Space Model Excels at Multivariate Time Series Classification
Hassan Saadatmand and coauthors compare S4D with Mamba-family models across 59 MONSTER and UEA datasets against 15 baselines; their MS4 and MS4N variants outperform Mamba-based models in accuracy and efficiency, while MS4N matches or exceeds deep learning competitors with roughly 2x and 10x more parameters.
#Benchmarking#Inference-opt#Hassan Saadatmand#Geoffrey I. Webb
why featured
HKR-H and HKR-K pass: the title has a small-model-beats-large-model hook, and the post gives 59 datasets plus 15 baselines. The topic is multivariate time-series classification, far from AI product or agent workflows, so it stays in 60–71.
editor take
MS4N beats Mamba variants on 59 TSC datasets; for time series, input-dependent transitions look overbuilt.
HKR breakdown
66
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Max-Window Scale Estimation for Near-Lossless HiF8 W8A8 Quantization-Aware Training
The paper studies HiF8 W8A8 QAT on OpenPangu-Embedded-1B across eight controlled experiments, identifies amax saturation and catastrophic forgetting, and uses a 64-step max-algorithm DTS plus a 500-step BF16 warmup before lr=1e-5 QAT to limit the MMLU drop to 0.43% versus a matched BF16 baseline.
#Fine-tuning#Inference-opt#Benchmarking#OpenPangu
why featured
HKR-K and HKR-R pass: the paper gives reproducible training settings and an accuracy-loss number tied to cheaper deployment. HKR-H fails, and the HiF8/W8A8 QAT scope keeps it in the lower all band.
editor take
OpenPangu-Embedded-1B loses 0.43% MMLU with 64-step max DTS and 500-step BF16 warmup; QAT loss-only checks are broken.
HKR breakdown
64
SCORE
H0·K1·R1
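One way to picture the max-window scale estimation named in the title: keep a short history of per-step absolute maxima and derive the quantization scale from the largest one. This is a sketch under assumptions, not the paper's recipe: the summary does not expand the DTS acronym, `format_max` is a hypothetical placeholder for the target format's largest representable magnitude (the HiF8 value is not given in the summary), and the class name is invented.

```python
from collections import deque


class MaxWindowScale:
    """Track amax over the last `window` steps and derive a scale from it."""

    def __init__(self, window=64, format_max=1.0):
        # deque(maxlen=...) drops the oldest amax once the window is full.
        self.history = deque(maxlen=window)
        self.format_max = format_max  # hypothetical placeholder value

    def update(self, tensor_values):
        """Record this step's absolute maximum and return the current scale."""
        self.history.append(max(abs(v) for v in tensor_values))
        return self.scale()

    def scale(self):
        """Scale that maps the window's largest magnitude onto format_max."""
        if not self.history:
            return 1.0
        window_amax = max(self.history)
        return self.format_max / window_amax if window_amax > 0 else 1.0


est = MaxWindowScale(window=2, format_max=8.0)
s1 = est.update([1.0, -2.0])  # window amax is 2.0
s2 = est.update([0.5])        # 2.0 is still inside the window
s3 = est.update([1.0])        # 2.0 has dropped out; window amax is 1.0
```

Taking the maximum over the window makes the scale react slowly to a shrinking range and immediately to a growing one, which is the conservative choice when saturation is the failure mode being avoided.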
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Safe In-Context Reinforcement Learning
The paper introduces SCARED for in-context reinforcement learning, using a constrained Markov decision process and exact-penalty dual method to keep accumulated cost within a user-specified safety budget during parameter-update-free adaptation, while the abstract does not disclose benchmark names or numerical results.
#Agent#Reasoning#Safety#SCARED
why featured
HKR-K and HKR-R pass: SCARED gives a concrete safety-budget mechanism for ICRL agents. HKR-H is weak, and no experiment numbers or artifact are disclosed, keeping it in the 60–71 band.
editor take
SCARED constrains ICRL test-time cost to a user budget; benchmarks and numbers are undisclosed, so I don't buy the “first method” framing yet.
HKR breakdown
64
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Aligning LLMs with Human Uncertainty: A Beta-Bernoulli Calibrator for LLM Forecasting
The paper proposes Beta-Bernoulli Calibrator, which converts any model’s point forecast into a Beta distribution and trains with both binary outcomes and human forecasts, using variance as epistemic uncertainty.
#Alignment#Benchmarking#Research release
why featured
HKR-H and HKR-K pass because the paper has a concrete uncertainty-calibration mechanism. HKR-R is weak: the abstract gives no effect size, benchmark spread, or deployment path, so it stays in the lower research-release band.
editor take
BBC turns point forecasts into Beta distributions; I buy the direction: stop trusting verbal confidence in LLM forecasting.
HKR breakdown
64
SCORE
H1·K1·R0
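A sketch of how a point forecast can be turned into a Beta distribution whose variance serves as an uncertainty signal. It assumes the standard mean-and-concentration parameterization; how the paper obtains or trains the concentration is not stated in the summary, so `concentration` here is simply an input and the function names are illustrative.

```python
def beta_from_forecast(p, concentration):
    """Map a point forecast p in (0, 1) to Beta(alpha, beta) with mean p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if concentration <= 0.0:
        raise ValueError("concentration must be positive")
    return p * concentration, (1.0 - p) * concentration


def beta_mean_var(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total * total * (total + 1.0))
    return mean, var


# Same point forecast, two confidence levels.
_, var_unsure = beta_mean_var(*beta_from_forecast(0.7, 2.0))
_, var_sure = beta_mean_var(*beta_from_forecast(0.7, 50.0))
```

Both distributions have mean 0.7, but the low-concentration one is much wider, so the variance separates "70% and unsure" from "70% and confident" in a way a single number cannot.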
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Hybrid Neural World Models
The paper presents Hybrid Neural World Models, using one continuously horizon-conditioned network to predict any future physical state in one forward pass. On PDE environments, the surrogate reports 26x to 72x CPU speedups versus textbook solvers. Its per-trajectory error map gates reference-solver fallback and roughly halves residual error at the default operating point.
#Reasoning#Inference-opt#Benchmarking#Research release
why featured
HKR-K passes with a concrete mechanism and 26x-72x speedup. HKR-H and HKR-R are weak, and the PDE/numerical-simulation setting keeps it relevant but not featured.
editor take
Hybrid Neural World Models reports 26x-72x CPU speedups; I trust the fallback gate more than pure surrogates around shocks.
HKR breakdown
64
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Efficient Pre-Training of LLMs through Truncated SVD Layers
The paper introduces TSVD, a pretraining framework that keeps LLM layers low-rank and strictly orthonormal during training, using a spectral-energy heuristic for adaptive rank selection and a caching mechanism for orthonormality; the abstract says TSVD matches or exceeds full-parameter baselines and reduces compute, but the snippet does not disclose exact model sizes or compute-reduction numbers.
#Inference-opt#Research release
why featured
HKR-K passes on concrete mechanisms and HKR-R passes on training-cost relevance, while HKR-H is weak. With no compute-reduction ratio or large-scale reproduction details, this stays in the ordinary research-release band.
editor take
TSVD claims full-parameter parity, but model sizes and compute cuts are undisclosed; low-rank pretraining again hits the reproducibility ledger.
HKR breakdown
64
SCORE
H0·K1·R1
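The TSVD item mentions a spectral-energy heuristic for adaptive rank selection. A common form of that heuristic, sketched here as an assumption, keeps the smallest rank whose squared singular values cover a target share of the total energy. The threshold value and the function name are hypothetical; the summary gives neither the paper's threshold nor its exact criterion.

```python
def select_rank(singular_values, energy_threshold=0.9):
    """Smallest rank whose cumulative squared singular values reach the threshold."""
    if not 0.0 < energy_threshold <= 1.0:
        raise ValueError("energy_threshold must be in (0, 1]")
    # Energy of each component, largest first.
    energies = sorted((s * s for s in singular_values), reverse=True)
    total = sum(energies)
    if total == 0.0:
        return 0
    running = 0.0
    for rank, energy in enumerate(energies, start=1):
        running += energy
        if running / total >= energy_threshold:
            return rank
    return len(energies)


# Energies are 9, 4, 1, 0.01: two components already hold about 93%.
rank = select_rank([3.0, 2.0, 1.0, 0.1], energy_threshold=0.9)
```

A layer whose spectrum decays quickly gets a small rank under this rule, while a flatter spectrum keeps more components for the same threshold.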
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
When Do Complex-Valued Neural Networks Help? A Study of Representation, Geometry, and Optimization
The paper compares CVNNs with six real-valued baselines across RF, quantum-wavefunction, and EEG analytic-signal tasks; on RadioML 2018.01A, a CReLU complex model leads the best real baseline by 22.94 percentage points under matched shared-trial selection, but the gap falls to 2.46 points under independent per-family tuning with the same 16-trial search space.
#Benchmarking#RadioML#Research release#Benchmark
why featured
HKR-H/K/R pass, but the topic is academic and centered on CVNNs and RadioML, far from most AI practitioners' product decisions. This fits the 60–71 band, not featured.
editor take
CVNN’s RadioML lead drops from 22.94 to 2.46 points; smells like benchmark tuning failure, not a complex-network win.
HKR breakdown
64
SCORE
H1·K1·R1
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN
Researchers partially reverse-engineered a convolutional RNN trained with model-free reinforcement learning on Sokoban, finding that hidden-state “path channels” store future moves and that convolutional kernels between those channels encode position changes for each action, while negative activations at obstacles propagate backward to prune invalid plan steps.
#Interpretability#Reasoning#Research release
why featured
HKR-H/K pass: the title and summary state a testable planning mechanism, but the subject is a Sokoban conv-RNN far from frontier models or products. Technical specificity keeps it in the 60–71 band.
editor take
A Sokoban RNN stores future moves in hidden-state path channels; small-world circuit work beats vague RL reasoning claims.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Assessing Factual Music Comprehension in Large Audio Language Models
The paper introduces a factual music evaluation protocol for LALMs, defines six information-retrieval tasks across MusicNet, Free Music Archive, and OverClocked ReMix, and benchmarks nine models, including Gemini and Music Flamingo, using Precision, Recall, and F1.
#Audio#Multimodal#Benchmarking#Gemini
why featured
HKR-K passes because the paper gives a reproducible music-fact evaluation setup. HKR-H and HKR-R are weak; the summary names only two of the nine models and does not disclose result gaps or failure cases, so this stays niche.
editor take
This tests 9 LALMs on 6 music retrieval tasks; MusicQA gets called out, and audio eval finally retreats to verifiable facts.
HKR breakdown
64
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
HGMEM: Hypergraph-Based Working Memory to Improve Multi-Step RAG
HGMEM represents working memory as a hypergraph, with hyperedges acting as memory units for multi-step RAG in long-context relational modeling; the abstract says it outperforms strong baselines across several global sense-making benchmarks, but the post does not disclose exact scores.
#RAG#Memory#Reasoning#HGMem
why featured
HKR-H and HKR-K pass: the title has a hypergraph-memory hook and the summary gives the hyperedge memory mechanism. No benchmark numbers, artifact, or deployment condition keeps it in the normal research band.
editor take
HGMEM turns RAG memory into hypergraphs, but exact scores are absent; nice idea, not SOTA until tables land.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Learning in the Fisher Subspace: A Guided Initialization for LoRA Fine-Tuning
arXiv:2605.01046v3 proposes a Fisher-guided LoRA initialization method that uses downstream-data-induced curvature to select low-rank adaptation directions, and the abstract says it improves performance across tasks and modalities over existing approaches, but the post does not disclose metric values or model names.
#Fine-tuning#Multimodal#Benchmarking#Research release
why featured
HKR-K passes on a concrete Fisher-subspace mechanism for LoRA direction choice. HKR-H/R are weak because the summary gives no metrics, code, or cost impact, so this stays in the interesting band.
editor take
Fisher-LoRA picks low-rank directions via downstream curvature; no metrics disclosed, so I buy the mechanism, not the “significant” claim.
HKR breakdown
64
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Simulation-Informed Diffusion for Decentralized Multi-robot Motion Planning
The paper introduces SID, a decentralized multi-robot motion planning framework that uses CADM to simulate neighboring robots’ future trajectories and constrain each robot’s own plan, with experiments scaling to 108 robots and 160 obstacles while reporting better planning effectiveness and constraint satisfaction than baselines.
#Robotics#Reasoning#Research release
why featured
HKR-K passes with a concrete mechanism and 108-robot, 160-obstacle setup. HKR-H/R are weak: the title is academic and the robotics-planning audience is narrow, so it stays in all.
editor take
SID scales to 108 robots and 160 obstacles; simulation constraints beat local snapshots, but real communication noise is undisclosed.
HKR breakdown
63
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Soft Specialists: α-Rényi Ensembles for Uncertainty-Aware LLM Post-Training
The paper proposes an α-Rényi variational framework for LLM post-training. It learns an ensemble of LoRA adapters on a shared frozen base model, softly routes training examples across members, and covers supervised fine-tuning plus preference optimization.
#Fine-tuning#Alignment#Research release
why featured
Single arXiv method paper with concrete HKR-H/HKR-K hooks, but no result numbers, code, or production case. HKR-R misses, so it stays in the lower generic research band.
editor take
Soft Specialists trains softly routed LoRA ensembles; scale is undisclosed, so I’d file it as a framework bet on post-training uncertainty.
HKR breakdown
63
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse
The paper proposes sink-aware training with an auxiliary load-balancing loss for attention layers, testing it under three mechanisms: Vanilla Attention, Sink Attention, and Gated Attention, while arguing that attention sinks naturally form an MoE structure and explain head collapse.
#Reasoning#Inference-opt#Benchmarking#GPT-OSS
why featured
HKR-H and HKR-K pass through the architecture hook and named mechanism, but HKR-R fails. The article gives no metrics, model scale, or reproducibility details, so it sits in the lower 60–71 band.
editor take
Sink-aware training adds load-balancing loss; experiment scale is undisclosed, so I’d treat it as a head-collapse diagnostic lead.
HKR breakdown
63
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
AdaDPO: Self-Adaptive Direct Preference Optimization with Balanced Gradient Updates
AdaDPO outperforms DPO on Llama-3-8B-Instruct trained with UltraFeedback, achieving higher length-controlled win rates in 81% of hyperparameter combinations on AlpacaEval 2 and a best LC score of 48.3%.
#Alignment#Fine-tuning#Benchmarking#Llama
why featured
HKR-K and HKR-R pass via concrete AlpacaEval 2 numbers and DPO tuning pain. HKR-H fails; this is a narrow preference-optimization paper without an artifact or production-level claim, so it stays in the lower interesting band.
editor take
AdaDPO beats DPO in 81% of hyperparameter settings; loss-only changes make it a cheap default candidate for preference tuning.
HKR breakdown
63
SCORE
H0·K1·R1
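For reference, this is the standard DPO loss that AdaDPO is compared against, written for a single preference pair. It is the baseline only: the summary does not describe AdaDPO's adaptive or gradient-balancing terms, so none are sketched here. Argument names and the default `beta` are illustrative.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    # How much more the policy prefers chosen over rejected than the reference does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) written as a numerically stable softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy equal to the reference: zero margin, so the loss is log(2).
neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)
# Policy already favors the chosen answer: the loss drops below log(2).
better = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss depends on `beta` and on the log-probability margins, which is why win rates across a hyperparameter grid, as in the 81% figure above, are a reasonable way to compare a modified loss against this baseline.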
04:00
12d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·28
Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning
The paper introduces b1, a post-training framework that uses a Monotonic Entropy Descent objective and reinforcement learning to learn dynamic-size reasoning blocks for diffusion LLMs, reporting consistent gains over fixed-size block baselines while releasing code on GitHub.
#Reasoning#Fine-tuning#Benchmarking#arXiv
why featured
HKR-K passes with a concrete mechanism and open code. HKR-H/R are weak because this is a niche dLLM post-training paper with limited product or practitioner impact, so it stays in the all band.
editor take
b1 trains dynamic reasoning blocks via monotonic entropy descent; gain sizes aren’t disclosed, so I read this as a dLLM decoding patch.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Sign-Aware Gated Sparse Autoencoders: Modeling Anticorrelated Features with Bi-Jump-ReLU Activations
The paper proposes SA-GSAE, using two-sided gated sparsity, a signed-magnitude path, and auxiliary reconstruction; across six activation cells from Pythia-1B and SmolLM3-3B, the half-width model strictly Pareto-dominates a full-width 2H Gated SAE on three cells and matches R² within 0.025 on the other three.
#Interpretability#Pythia#SmolLM3#Research release
why featured
HKR-K passes on mechanism and six activation tests; HKR-R is limited to interpretability/safety specialists, and HKR-H suffers from jargon. A single arXiv method paper without a repo or production claim stays in all.
editor take
SA-GSAE wins 3 of 6 activation cells; splitting opposite-sign concepts into two latents is real SAE capacity waste.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Multi-Mixer Models: Flexible Sequence Modeling with Shared Representations
Oryx switches between attention and linear recurrent mixers within a sequence while sharing at least 90% of parameters; at 1.4B scale, every Oryx instance beats its corresponding baseline by at least 0.7 percentage points on averaged language modeling tasks.
#Reasoning#Inference-opt#Benchmarking#Oryx
why featured
HKR-H and HKR-K pass via the hybrid mixer mechanism and concrete numbers: at least 90% parameter sharing, 1.4B scale, and gains of at least 0.7pp. HKR-R is weak because there is no major lab, code artifact, or product implication, so this stays in all.
editor take
Oryx 1.4B shares ≥90% weights and still gains 0.7pp; <10% attention matching Transformer retrieval is the compute story.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Singular Vectors of Attention Heads Align with Features
The paper tests whether singular vectors of attention matrices align with features in a model with directly observable features, derives conditions under which alignment is expected, and uses sparse attention decomposition as a testable prediction for real language models where feature representations are not directly observable.
#Interpretability#Research release
why featured
HKR-K is clear: the paper offers a testable claim about attention singular vectors aligning with features. HKR-R is limited to interpretability/safety readers, while HKR-H is weak, so this stays in all.
editor take
The paper gives theory for attention singular vectors aligning with features; I buy half, since real-model evidence stays indirect.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
CAREF: Calibration-Aware Regularization for Explanation Faithfulness Without Rationale Supervision
CAREF evaluates Flan-T5 on four NLE benchmarks, and the lightweight CAREF-AQ variant reaches 89.04 average accuracy and 81.00 nBERT explanation alignment with 6.43% trainable parameters, outperforming LoRA and AdaLoRA without rationale supervision.
#Fine-tuning#Alignment#Interpretability#CAREF
why featured
HKR-K passes with concrete setup and metrics; HKR-H is weak because the headline reads like a methods paper; HKR-R is limited to the interpretability niche. No hard exclusion applies, so this sits in the interesting-but-not-featured band.
editor take
CAREF-AQ hits 89.04 accuracy with 6.43% trainable params; I buy the direction, but nBERT faithfulness is thin proof.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
QuITE Query-Based Irregular Time Series Embedding Method Released
QuITE uses learnable query tokens and one self-attention layer to aggregate irregular multivariate time-series observations, producing backbone-compatible latent representations without interpolation or architecture changes; experiments on real-world benchmarks report average relative gains up to 54.7% in forecasting and 15.8% in classification across datasets and backbone architectures.
#Embedding#Benchmarking#arXiv#GitHub
why featured
HKR-K and HKR-R pass: the post gives a query-token/self-attention mechanism and average relative forecasting gains of up to 54.7%. HKR-H fails because the title is niche and low-drama, so this stays in all.
editor take
QuITE reports up to 54.7% forecasting gains; I like pushing irregular-time handling into embeddings, but baselines need code-level scrutiny.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Unified Framework for Robust Supervised Learning Optimization
The paper decomposes robust supervised learning into four sequential stages and uses joint hyperparameter optimization across tabular, image, and reward-modeling benchmarks, where the unified design space is competitive with the best single-method baseline in each setting; the abstract does not disclose model sizes, datasets, or compute costs.
#Fine-tuning#Benchmarking#arXiv#Research release
why featured
HKR-K passes on the 4-stage mechanism and cross-benchmark optimization result. HKR-H/R are weak: the title is paper-like, with no industry hook or debate trigger; no hard-exclusion rule applies.
editor take
The paper unifies robust training into 4 stages; datasets and compute are undisclosed, so treat it as tuning infrastructure, not a new robustness method.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning
The paper proposes a multi-AUV multi-agent reinforcement learning method for multi-day Douro River plume mapping, using intermittent central coordination, spatiotemporal GPR, and a multi-head Q-network controller; Delft3D simulations show that doubling the AUV count can more than double endurance in some cases while maintaining or improving accuracy.
#Agent#Robotics#Reasoning#Douro River
why featured
HKR-K passes via concrete MARL/AUV mechanisms and simulation results. HKR-H/R are weak because the angle is niche ocean robotics, with no hard-exclusion trigger, so it sits in the 60–71 band.
editor take
Multi-AUV MARL maps plumes for days in Delft3D simulation; doubling the AUV count sometimes more than doubles endurance, but sea-trial proof is absent.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
DSSE: A Drone Swarm Search Environment
DSSE provides a PettingZoo-based drone swarm search environment for single-agent or multi-agent reinforcement learning, where drones search for shipwrecked people without knowing target positions or receiving distance-based rewards, and instead receive cell-level target probabilities as dynamic inputs; a peer-reviewed paper describing software version 2 has been published in JOSS with DOI 10.21105/joss.06746.
#Agent#Robotics#DSSE#PettingZoo
why featured
HKR-K passes on concrete artifacts: PettingZoo env, cell-level target probabilities, and JOSS v2. HKR-H and HKR-R are weak, so this stays in the lower 60s as niche multi-agent robotics infrastructure.
editor take
DSSE v2 landed in JOSS; no distance reward forces policies to use probability maps, which makes this less toy-like.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
ECHO: Entropy-Confidence Hybrid Optimization for Test-Time Reinforcement Learning
ECHO controls branch width in test-time reinforcement learning using local entropy and group-level confidence, then prunes persistently low-confidence branches online; the abstract says it improves results on multiple mathematical and visual reasoning benchmarks, but the post does not disclose exact scores or benchmark tables.
#Reasoning#Vision#Benchmarking#ECHO
why featured
This reasoning-optimization paper hits HKR-K with a concrete branching/pruning mechanism. Exact scores are not disclosed, and HKR-H/R are weak, so it fits all rather than featured.
editor take
ECHO gates test-time branches with entropy and confidence; no scores disclosed, so I read it as budget control, not reasoning progress.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
ReSAE: Residualized Sparse Autoencoders for Multi-Layer Transformer Interventions
ReSAE fits affine maps between selected transformer layers and trains later-layer SAEs on unexplained residuals; on Pythia-1.4B and Gemma-2-9B, it reduces decoder redundancy and recovers more cross entropy under multi-layer replacement despite reconstructing less raw activation variance.
#Interpretability#Pythia#Gemma#Research release
why featured
HKR-K passes for a concrete ReSAE mechanism and evaluation on Pythia-1.4B/Gemma-2-9B. HKR-H and HKR-R are weak; this is a specialist arXiv interpretability method, so it stays in the 60–71 band.
editor take
ReSAE improves multi-layer cross-entropy recovery on Pythia-1.4B and Gemma-2-9B; layerwise SAE training deserved this hit.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
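The residualization step in the ReSAE summary above can be illustrated with a toy. This is a minimal sketch, assuming scalar "activations" and an ordinary least-squares affine fit; the paper's actual maps act on vector activations, and the names `fit_affine` and `residuals` are hypothetical, not taken from the paper or any library.

```python
# Toy illustration: fit an affine map from an earlier layer's activation (x)
# to a later layer's activation (y), then keep only what the map cannot explain.
# Scalars stand in for activation vectors; this is NOT the paper's implementation.

def fit_affine(xs, ys):
    """Least-squares fit of y ~ a * x + b for scalar inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

def residuals(xs, ys, a, b):
    """Part of the later-layer activation not predicted by the affine map."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]

early = [0.0, 1.0, 2.0, 3.0]   # hypothetical earlier-layer activations
late = [1.0, 3.5, 4.5, 7.0]    # hypothetical later-layer activations
a, b = fit_affine(early, late)
res = residuals(early, late, a, b)  # a later-layer SAE would train on these
```

By construction the residuals sum to zero and are uncorrelated with the earlier-layer input, which is the sense in which the later-layer dictionary no longer has to re-encode what the earlier layer already explains.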
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Rethinking Calibration for Early-Exit Neural Networks
The paper introduces Early-Exit Failure Prediction (EEFP) for early-exit neural networks, combining prediction correctness with the cost of further computation, and reports better cost-accuracy trade-offs than calibration; the RSS snippet names no datasets, model architectures, or numeric results, while code is available on GitHub.
#Inference-opt#Benchmarking#Research release#Open source
why featured
HKR-K is clear and HKR-R is weak but present: EEFP reframes early-exit calibration as joint prediction of correctness and continuation cost, with code. The topic is specialized and lacks product pull, so it stays in the 60–71 band.
editor take
EEFP scores correctness plus continuation cost; no datasets or numbers in the snippet, so don’t retire calibration baselines yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
SARAD: LLM-Based Safety-Aware Hybrid Reinforcement Learning with Collision Prediction for Autonomous Driving
The paper proposes SARAD, a hybrid autonomous-driving framework that replaces DRL random exploration with RAG-enhanced LLM-guided decisions and adds a fine-tuned collision predictor; the abstract reports Highway-Env experiments but does not disclose exact performance numbers.
#RAG#Agent#Fine-tuning#SARAD
why featured
HKR-K/R pass: SARAD gives a mechanism and Highway-Env test condition, but no lift numbers, code, or road validation. HKR-H is weak, so this stays in the all tier.
editor take
SARAD tests on Highway-Env, but gives no gains; I don't buy “LLM replaces exploration” until latency is priced.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Sentence Curve Language Models
The paper proposes SCLM, a diffusion language model that predicts spline-based sentence curves instead of static target word embeddings, and reports state-of-the-art results among DLMs on IWSLT14 and WMT14 while maintaining stable training without burdensome knowledge distillation, with additional comparison against discrete DLMs on LM1B.
#Benchmarking#Research release#Benchmark
why featured
HKR-H and HKR-K pass: the mechanism and benchmark claims are concrete. HKR-R is weak; DLM and spline embeddings stay research-heavy, with no product impact or reproducibility details disclosed.
editor take
SCLM tops DLMs on IWSLT14 and WMT14; clever spline targets, but DLM-only wins don't threaten autoregressive LMs yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Researchers propose Graph Memory Transformer architecture to replace FFN sublayers
Graph Memory Transformer replaces the FFN sublayer in a decoder-only Transformer with an explicit learned memory graph. The studied v7 model has 16 blocks, 128 centroids per block, and 82.2M trainable parameters; it trails a 103.0M dense GPT-style baseline on both validation loss (3.5995 versus 3.2903) and perplexity (36.58 versus 26.85).
#Memory#Interpretability#Benchmarking#Graph Memory Transformer
why featured
HKR-H/K pass: the mechanism is concrete, and the 82.2M GMT losing to a 103.0M dense GPT gives real signal. No production claim, open-source impact, or major-lab weight keeps it in the ordinary research band.
editor take
GMT v7 drops FFNs at 82.2M params, but perplexity 36.58 trails 103.0M GPT’s 26.85; interpretability pays, performance doesn’t yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
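The loss and perplexity pairs quoted for GMT v7 and the dense baseline are mutually consistent if perplexity is the exponential of the mean per-token cross-entropy in nats. That reading is an assumption, since the post does not state the unit, but the arithmetic below matches the reported figures to two decimals.

```python
import math

def perplexity(loss_nats):
    """Perplexity from mean per-token cross-entropy, assuming natural-log units."""
    return math.exp(loss_nats)

gmt_ppl = perplexity(3.5995)    # reported validation loss for GMT v7
dense_ppl = perplexity(3.2903)  # reported validation loss for the dense baseline

print(f"GMT v7: {gmt_ppl:.2f}, dense baseline: {dense_ppl:.2f}")
```

A loss gap of about 0.31 nats therefore corresponds to GMT v7 having roughly 1.36 times the baseline's perplexity.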
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Explicit Critic Guidance for Aligning Diffusion Models
The paper proposes a state-aligned latent actor-critic framework for diffusion post-training, where the diffusion model predicts timestep-conditioned values on noisy latent states and uses trajectory-level PPO, with experiments covering UNet- and DiT-based backbones on single-reward and multi-reward benchmarks.
#Fine-tuning#Alignment#Inference-opt#Research release
why featured
HKR-K passes on a concrete post-training mechanism, but the post gives no result numbers, model scale, or artifact. HKR-H and HKR-R are weak, so this fits the 60–71 research-release band.
editor take
The paper makes diffusion models value noisy latent states; I buy the direction, but RSS omits benchmarks and gains.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Augmenting Attention with Exponentially Decaying Memory Improves Query-Aware KV Sparsity
The paper evaluates RAT+’s exponentially decaying memory with Quest, MoBA, and SnapKV, reporting accuracy gains over standard attention across sparse budgets on eight needle-in-a-haystack tasks and on OLMo2-7B after 10B-token continued pretraining.
#Inference-opt#Memory#Benchmarking#RAT+
why featured
HKR-K passes on the RAT+ exponentially decaying memory mechanism and 8 needle tasks across sparse budgets. HKR-H is weak; no latency, cost, or deployment numbers are disclosed, so it stays in all.
editor take
RAT+ improves Quest, MoBA, and SnapKV on 8 needle tasks; the OLMo2-7B result comes after 10B-token continued pretraining, so don't extrapolate to real long-doc workloads.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Trust Region Continual Learning as an Implicit Meta-Learner
The paper proposes trust region continual learning, combining generative replay with a Fisher-metric constraint; on task-incremental diffusion image generation and continual diffusion-policy control, it reports better final performance, retention, and faster early-task recovery than EWC, replay, and continual meta-learning baselines.
#Fine-tuning#Memory#Benchmarking#Research release
why featured
HKR-K passes because the mechanism and test settings are concrete. HKR-H and HKR-R are weak: the title is dry, and the impact is mostly confined to continual-learning and diffusion-control researchers, so it lands in the 60–71 band.
editor take
TRCL beats EWC and replay on diffusion generation and diffusion-policy streams; I buy the mechanism, not broad transfer claims.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Hierarchical Synthetic Tabular Data Generation: A Hybrid Top-Down and Bottom-Up Framework
The paper proposes H-TDBU for synthetic tabular data generation, combining top-down logical constraints with bottom-up lightweight tabular generators, and reports improved train-synthetic-test-real performance over neural baselines on weak multimodal financial benchmarks using tabular and sentiment-text data.
#Multimodal#Benchmarking#Research release#Benchmark
why featured
HKR-K/R pass: the abstract gives the H-TDBU mechanism and TSTR setup, and synthetic data touches privacy and data scarcity. No improvement size, code artifact, or production replacement claim is disclosed, so it stays in the 60–71 research-tail band.
editor take
H-TDBU beats neural baselines on weak financial multimodal TSTR; I want ablations and data scale, both undisclosed in the abstract.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Bilinear Coordinate Alignment for Training-Free Task-Vector Transfer
BiCo formulates task-vector transfer as dual-space alignment and estimates orthogonal Procrustes mappings on both activation and gradient sides with one forward-backward pass over a small calibration set, without any parameter updates.
#Fine-tuning#Benchmarking#BiCo#arXiv
why featured
HKR-K passes because the method gives a concrete mechanism for training-free task-vector transfer. HKR-H/R are weak: it is a single arXiv technical paper with no benchmark numbers or production-replacement claim.
editor take
BiCo estimates dual Procrustes maps in one forward-backward pass; no gap numbers disclosed, but it looks like a serious task-vector baseline.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
LiDDA: Data Driven Attribution at LinkedIn
LinkedIn presents LiDDA, a unified transformer-based attribution method for member-level data, aggregate-level data, and external macro factors; the abstract says it was implemented at large scale, but the post does not disclose impact metrics or deployment details.
#Reasoning#LinkedIn#Research release
why featured
HKR-K passes: LiDDA uses one Transformer over member-level, aggregate, and macro signals, with claimed LinkedIn-scale deployment. HKR-H/R are weak, and metrics are not disclosed, so it stays in the ordinary research-release band.
editor take
LinkedIn unifies three attribution data types with a Transformer; no lift or A/B disclosed, so treat the ad-attribution paper as PR for now.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Energy-Structured Low-Rank Adaptation for Continual Learning
The paper proposes E²-LoRA for continual learning, preserving parameters along principal directions of output feature drift and using dynamic rank allocation to balance stability and plasticity across multiple benchmarks.
#Fine-tuning#Reasoning#Benchmarking#Research release
why featured
HKR-K passes: E²-LoRA gives a testable mechanism for parameter retention and dynamic rank allocation. HKR-H/R are weak; no benchmark numbers, code, or production impact are disclosed.
editor take
E²-LoRA allocates rank by output-drift directions; benchmarks and model sizes are undisclosed, so task-order robustness is the test.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Mahalanobis PatchCore: Covariance-Aware and Streaming-Compatible Industrial Anomaly Detection
Mahalanobis PatchCore implements Mahalanobis retrieval by whitening embeddings with a regularized covariance model, evaluates on a 15-category public benchmark and three industrial datasets, cuts peak memory from 5.41 GB to 2.78 GB, and raises selected industrial mean image-level AUROC from 0.981 to 0.986.
#Vision#Embedding#Inference-opt#PatchCore
why featured
HKR-K passes with concrete benchmark, memory, and AUROC numbers. HKR-H is weak, HKR-R is narrow to industrial anomaly detection; no hard exclusion, so it stays in the interesting band.
editor take
Mahalanobis PatchCore cuts peak memory from 5.41 GB to 2.78 GB; AUROC rises only 0.005, so the win is streaming training.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
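The summary above says Mahalanobis retrieval is implemented by whitening embeddings with a regularized covariance model. The standard identity behind that idea is that Euclidean distance after whitening equals Mahalanobis distance before it. Below is a minimal 2-D sketch in plain Python; the ridge value, the toy points, and all function names are illustrative assumptions, not details from the paper.

```python
import math

def mean_and_cov(points, ridge=1e-3):
    """Mean and ridge-regularized covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n + ridge
    syy = sum((p[1] - my) ** 2 for p in points) / n + ridge
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)

def whiten(p, mean, cov):
    """Map p to z = L^{-1} (p - mean), where cov = L L^T (Cholesky factor)."""
    sxx, sxy, syy = cov
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 * l21)
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    z1 = dx / l11                   # forward substitution, first row
    z2 = (dy - l21 * z1) / l22      # forward substitution, second row
    return (z1, z2)

def mahalanobis_sq(a, b, cov):
    """Squared Mahalanobis distance via the explicit 2x2 inverse covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = a[0] - b[0], a[1] - b[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

# Hypothetical "memory bank" embeddings and two query points.
bank = [(0.0, 0.1), (1.0, 1.2), (2.0, 1.9), (3.0, 3.3), (4.0, 3.8)]
mean, cov = mean_and_cov(bank)
a, b = (0.5, 2.0), (3.5, 1.0)

za, zb = whiten(a, mean, cov), whiten(b, mean, cov)
d_whitened = (za[0] - zb[0]) ** 2 + (za[1] - zb[1]) ** 2
d_direct = mahalanobis_sq(a, b, cov)
```

Because the two distances agree, a whitened bank can be searched with ordinary nearest-neighbor code, which is one plausible reason the approach fits existing retrieval pipelines.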
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
TinyDéjàVu: Smaller RAM and Faster Inference with Neural Networks on MCUs for Sensor Data Streams
TinyDéjàVu reduces RAM usage by up to 90% versus StreamiNNC on overlapping sliding-window sensor streams, while keeping equal compute latency in reproducible benchmarks on Arm Cortex-M microcontroller hardware.
#Inference-opt#TinyDéjàVu#Arm#StreamiNNC
why featured
HKR-K is solid: Arm Cortex-M sensor-stream inference gets up to 90% lower RAM with unchanged latency. HKR-H and HKR-R are weak because the topic stays inside embedded inference optimization.
editor take
TinyDéjàVu saves up to 90% RAM on Arm Cortex-M; on 128KB MCUs, memory dies before FLOPs.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Learning Compositional Latent Structure with Vector Networks
The paper introduces Vector Network, a hierarchical recurrent architecture that replaces fixed weight matrices with reusable rank-1 weight atoms. It is evaluated on four compositional benchmarks, and its out-of-distribution error is often about one order of magnitude lower when familiar factors are recombined in novel ways.
#Reasoning#Inference-opt#Benchmarking#Research release
why featured
HKR-K passes: Vector Networks add a testable rank-1 weight-atom mechanism and 4 compositional benchmarks. HKR-H and HKR-R are weak, so this sits in the 60–71 band.
editor take
VN uses rank-1 weight atoms across 4 compositional benchmarks; OOD error that is often about 10x lower is tasty, pending code and tougher baselines.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Evaluating Local Explainability Metrics for Machine Learning Models on Tabular Data
The paper evaluates LIME, Kernel SHAP, and Feature Ablation on 32 tabular classification datasets. It measures local explanation faithfulness, robustness, and complexity, then compares consensus-correct and consensus-wrong samples across multiple machine-learning models.
#Interpretability#Benchmarking#LIME#SHAP
why featured
HKR-K passes: 32 tabular classification datasets and three local explainability methods give testable detail. HKR-H/R are weak, making this a narrow research benchmark below featured.
editor take
This tests LIME, Kernel SHAP, and Feature Ablation on 32 tabular datasets; don’t let explanation scores launder model quality.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
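Of the three explanation methods named in the card above, Feature Ablation is the simplest to state: replace one feature with a baseline value and record how much the model output changes. A minimal sketch follows, assuming a hypothetical linear `toy_model` and an all-zeros baseline; neither comes from the paper, and this is not the API of any attribution library.

```python
def feature_ablation(model, x, baseline):
    """Attribution of feature i = model(x) - model(x with feature i set to baseline)."""
    full_output = model(x)
    attributions = []
    for i in range(len(x)):
        ablated = list(x)
        ablated[i] = baseline[i]
        attributions.append(full_output - model(ablated))
    return attributions

def toy_model(x):
    # Hypothetical scorer: the third feature is deliberately ignored.
    return 2.0 * x[0] - 1.0 * x[1] + 0.0 * x[2]

sample = [1.0, 3.0, 5.0]
attrs = feature_ablation(toy_model, sample, baseline=[0.0, 0.0, 0.0])
print(attrs)
```

For a linear model with a zero baseline, each attribution is just weight times feature value, so the ignored third feature gets zero credit.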
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning
SAME targets router drift and expert drift in multimodal continual instruction tuning, using orthogonal-subspace routing, curvature-aware scaling, and adaptive expert freezing; the abstract says code is available, but it does not disclose model size, task count, or exact benchmark scores.
#Multimodal#Fine-tuning#Benchmarking#LAMDA-CL
why featured
HKR-K passes via three named MCIT mechanisms and released code, but HKR-H and HKR-R miss. The abstract lacks model scale, task count, and scores, so this sits in the lower research-release band.
editor take
SAME targets MCIT router/expert drift, but gives no scale, task count, or scores; I’d treat SOTA as unverified.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Causal Machine Learning: A Survey and Open Problems
The survey defines CausalML as machine learning methods based on structural causal models and compares work across five problem groups, with applications in computer vision, NLP, graph representation learning, benchmarks, and open problems.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes: this is a useful CausalML survey with a 5-part problem frame. HKR-H and HKR-R fail because the title lacks a fresh claim and the post has no product, safety, cost, or competitive hook.
editor take
CausalML survey maps SCM work into 5 groups; useful for LLM causal eval framing, not a new method.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing
The paper introduces FinTexTS, a financial text-paired stock-price dataset built with SEC-filing context, embedding-based news retrieval, and LLM classification into four levels: macro, sector, related company, and target company; the abstract reports improved stock-price forecasting, but the RSS snippet does not disclose dataset size or benchmark numbers.
#Embedding#Benchmarking#FinTexTS#SEC
why featured
HKR-K passes: FinTexTS adds SEC semantic matching and 4-level news pairing. HKR-H and HKR-R are weak, and dataset scale is not disclosed, keeping it in the upper low-value band.
editor take
FinTexTS uses SEC context plus 4-level news pairing, but gives no scale or benchmark numbers; for finance forecasting, that’s half a dataset card.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
The Principles of Diffusion Models
arXiv 2510.21890v2 updates a book manuscript on diffusion models. The abstract covers three views—variational, score-based, and flow-based—and frames sampling as solving a differential equation that transports noise to data along a continuous trajectory, with sections on guidance, efficient numerical solvers, and flow-map models.
#Inference-opt#Research release
why featured
HKR-K passes because the manuscript update lists 3 perspectives plus continuous reverse process and solver content. HKR-H/R are weak: it is a textbook-style research resource, not a product, model release, or industry conflict.
editor take
arXiv 2510.21890v2 updates a diffusion-model book; three views collapse into velocity fields—useful math base, not new SOTA.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
UniMaia: Steering Chess Policies with Language for Human-like Play
UniMaia modulates a frozen Lc0-based chess policy network with a parameter-efficient text encoder and a ControlNet-style conditioning mechanism for prompt control over openings and player strength; the arXiv abstract reports state-of-the-art expected accuracy on several prompt-conditioned benchmarks, but the RSS snippet does not disclose dataset size or exact accuracy numbers.
#Agent#Fine-tuning#Benchmarking#UniMaia
why featured
HKR-H/K pass: the language-controlled chess-policy angle is fresh, and the article gives a ControlNet-style Lc0 conditioning mechanism. Missing dataset size, accuracy, and product implications keep it below featured.
editor take
UniMaia freezes Lc0 and adds text conditioning; exact accuracy is undisclosed, but this beats making general LLMs play chess.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Decision-focused Learning for Optimal PV-Battery Scheduling
The study trains an LSTM photovoltaic forecaster with decision-focused learning for battery scheduling, and over a 14-month evaluation across 20 buildings it reduces average electricity costs by 3.6% versus a standard two-phase approach after normalization against perfect-forecast and no-optimization bounds.
#Reasoning#arXiv#Research release
why featured
HKR-K passes with a testable setup and cost-reduction number; HKR-H and HKR-R are weak. The topic is a narrow PV-battery scheduling application, far from core AI products, models, or tooling.
editor take
DFL-LSTM cut bills 3.6% across 20 buildings, 14 months; RMSE worsened 8.2% to 19.9%, another loss for forecast-first evals.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
How the Optimizer Shapes Learned Solutions in Equivariant Neural Networks
The paper compares Muon and Adam across point-cloud and molecular learning settings; on ModelNet40, Muon outperforms Adam across all evaluated equivariant and geometric architectures, with checkpoints showing higher stable and effective ranks plus more regular loss surfaces.
#Reasoning#Benchmarking#arXiv#Muon
why featured
HKR-K and HKR-R pass, but the work is centered on equivariant networks plus point-cloud/molecular tasks, far from product or mainstream LLM practice. No hard exclusion; score stays in the lower research band.
editor take
Muon beats Adam across ModelNet40 architectures; I’d reproduce first, since effect sizes and variance are not disclosed.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R1
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
SYNAPSE: Neuro-Symbolic Visual Thought-to-Text Decoding via Topological Semantic Denoising
SYNAPSE uses commonsense graph structure and latent exemplars at inference time to denoise EEG-derived semantic candidates, improving stability across multiple EEG decoding benchmarks and frozen LLM backends, while the abstract does not disclose exact scores or model names.
#Reasoning#Multimodal#Safety#SYNAPSE
why featured
HKR-H and HKR-K pass, but benchmark scores are not disclosed and EEG decoding is academic rather than product-relevant. This stays in the upper low-value band, not featured.
editor take
SYNAPSE only denoises EEG candidates at inference; no scores or backends disclosed, so I don’t buy the stability claim yet.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Patched-DeltaNet: Token-Level Event-Driven Memory for Linear-Time Anomaly Detection
Patched-DeltaNet reports 0.957 ROC-AUC on the SMD benchmark. It reaches 0.822 PA-F1 and reduces complexity to O(L/P).
#Memory#Inference-opt#Benchmarking#Research release
why featured
HKR-K passes on concrete benchmark scores and a complexity claim. HKR-H/R fail because this is a niche anomaly-detection paper with limited industry conversation value, so it stays in the lower research-release band.
editor take
Patched-DeltaNet reports 0.957 ROC-AUC on SMD; O(L/P) is appealing, but RSS lacks the unified-eval details.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Affective Music Recommendation: A Rollout-Based World Model for Offline Preference Optimization
LUCID deployed AMRS on health-and-wellness platforms for clinical users and consumer-wellness modes, using a causal Transformer world model to predict engagement, binary rating, valence, and arousal from logged listening data. Under a strict cold-start protocol, DPO improves predicted valence and arousal over behavior cloning while preserving diversity; the abstract does not disclose dataset size or deployment metrics.
#Agent#Reasoning#LUCID#AMRS
why featured
HKR-K passes with a concrete mechanism and cold-start condition. HKR-H/R are weak: sample size is not disclosed, and wellness music recommendation is too narrow for featured coverage.
editor take
AMRS predicts four signals with a causal Transformer; no sample size disclosed, so I don’t buy “deployed validation” yet.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Latent Diffusion for Missing Data
The paper proposes a two-stage missing-data framework that uses a robust VAE imputer to learn latent features, then trains diffusion in that latent space, and reports stable sample quality under MCAR corruption with training missing rates up to 50%.
#Multimodal#Benchmarking#Research release#Benchmark
why featured
HKR-K passes because the paper states a concrete VAE-plus-diffusion mechanism and MCAR 50% condition. HKR-H and HKR-R fail: this is a narrow missing-data paper with no product, agent, or industry hook.
editor take
Latent diffusion stays stable at 50% MCAR missingness; I buy the direction, but datasets and metrics are undisclosed.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
A Methodology to Assess Power Modeling in Energy-Aware Federated Learning on Heterogeneous Mobile Devices
The paper proposes a CPU power estimation methodology for heterogeneous ARM mobile devices and evaluates it on two Android devices: the analytical model keeps prediction error below 10%, while the approximate model reaches up to 959% error.
#Benchmarking#ARM#Android#AnycostFL
why featured
HKR-H/K pass: the 959% error and 2-Android-device test add a hook and concrete numbers. HKR-R fails because mobile FL power modeling is niche and lacks product, capability, or competitive stakes.
editor take
Analytical power modeling stayed under 10% error on two Android phones. A 959% approximate-model miss breaks FL energy scheduling claims.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Quality-constrained Entropy Maximization Policy Optimization for LLM Diversity
The paper proposes QEMPO, which maximizes output entropy under a quality constraint and supports online and offline training; the abstract does not disclose benchmark names, model sizes, or specific diversity and quality gains.
#Alignment#Fine-tuning#Reasoning#Research release
why featured
HKR-K passes for a concrete optimization mechanism, but the post gives no benchmarks, model sizes, or gains. HKR-H and HKR-R are weak, so this stays in the 40–59 low-value research band.
editor take
QEMPO maximizes entropy under a quality constraint, but discloses no benchmarks or gains; don’t buy diversity-without-quality-loss yet.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Tackling Multimodal Learning Challenges with Mixture-of-Experts: A Survey
arXiv 2605.27431 surveys MoE for multimodal learning through three roles: an efficient multimodal engine, a representation learner, and an adapter for imperfect data such as modality imbalance and missing modalities.
#Multimodal#Inference-opt#Interpretability#Liangwei Nathan Zheng
why featured
HKR-K passes for a concrete 3-part taxonomy of multimodal MoE. HKR-H and HKR-R fail: no new model, benchmark, artifact, or practitioner nerve beyond a standard arXiv survey.
editor take
This IJCAI 2026 survey splits multimodal MoE into 3 roles; useful map, but no experiments, so don’t infer winners.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Graph Neural Networks for Source Detection: A Review and Benchmark Study
The paper reproduces four representative GNN architectures for epidemic source detection and benchmarks them against traditional and MLP baselines under controlled, comparable settings. Experiments report GNNs outperform all tested alternatives across multiple network topologies, while the authors release code and data on GitHub for reproducibility.
#Benchmarking#arXiv#GitHub#Shah and Zaman
why featured
HKR-K passes: 4 GNN architectures, comparable baselines, and GitHub code/data give testable value. HKR-H and HKR-R are weak, so this stays in the all band, below the featured threshold.
editor take
The paper reproduces 4 GNNs for source detection; I buy the benchmark, but “substantially outperform” holds only within the released topology and epidemic settings.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Benchmarking Inductive Biases for Multivariate Time-Series Anomaly Detection with a Robust Multi-View Channel-Graph Detector
The paper benchmarks 10 multivariate time-series anomaly detectors on five datasets with unified windowing, scoring, hardware, and metrics, and introduces a multi-view channel-graph detector that reaches 0.675 macro-average VUS-ROC, 5.1 points above LSTM-AE.
#Benchmarking#arXiv#MSDS#LSTM-AE
why featured
HKR-K passes with concrete benchmark setup and a reported VUS-ROC score. HKR-H and HKR-R are weak: the item is a narrow research metric story, not a product, ecosystem, or practitioner-wide debate.
editor take
This benchmarks 10 MTS anomaly detectors; 0.675 VUS-ROC is modest, but the MSDS event-density finding is the useful warning.
HKR breakdown
hook knowledge resonance
open source
54
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
PINE: Pruning Boosted Tree Ensembles with Conformal In-Distribution Prediction Equivalence
PINE prunes boosted tree ensembles by preserving prediction equivalence inside an in-distribution region, with its size controlled by one conformal calibration parameter, α. On 12 public tabular datasets, the method improves compression ratio by up to 30% while keeping prediction preservation comparable to existing faithful pruning methods.
#Inference-opt#Benchmarking#Research release#Benchmark
why featured
HKR-K passes on a clear mechanism and experiment numbers. HKR-H/R fail because boosted-tree pruning is narrow and distant from the LLM, agent, or product-deployment agenda.
editor take
PINE gets up to 30% more compression on 12 tabular sets; I buy the α knob, but OOD consistency is surrendered.
HKR breakdown
hook knowledge resonance
open source
54
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Do We Really Need Quantum Machine Learning? A Multidimensional Empirical Study
The paper benchmarks CSVM, QSVM, CCNN, and QCNN on MNIST across accuracy, runtime, parameters, and memory: QSVM reaches about 0.90 accuracy versus CSVM’s about 0.85 at 1,000 samples, while QCNN uses about 94% fewer parameters and 75% less memory than CCNN at higher feature counts.
#Vision#Benchmarking#Research release#Benchmark
why featured
HKR-H and HKR-K pass: the anti-hype question is clickable and the MNIST comparison gives concrete numbers. HKR-R fails because quantum ML remains niche with no product or engineering path disclosed.
editor take
QSVM hits 0.90 on 1k MNIST samples; I don’t buy “need QML” when runtime cost is the paper’s brake.
HKR breakdown
hook knowledge resonance
open source
54
SCORE
H1·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Falsification-Driven Reinforcement Learning for Maritime Motion Planning
The paper proposes falsification-driven RL for maritime motion planning, generating adversarial training scenarios where a vessel violates signal temporal logic traffic rules, and tests the method on open-sea navigation with two vessels for more consistent rule compliance.
#Agent#Robotics#Safety#Research release
why featured
HKR-K passes: falsification-driven training plus STL rules are concrete mechanisms. HKR-H/R are weak, and the maritime-navigation setting is narrow, so this stays in the lower research band.
editor take
Two-vessel open-sea tests keep this clean; STL falsification for RL is neat, but crowded ports remain unproven.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
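The falsification idea in this card can be illustrated with a toy rule. The sketch below assumes a single signal temporal logic requirement, "always keep at least D_MIN separation", and a brute-force falsifier over a few candidate velocities for the other vessel. The rule, the geometry, and every name here are hypothetical; the paper's actual traffic rules and search method are not given in the summary.

```python
import math

D_MIN = 1.0        # assumed minimum safe separation
STEPS = range(11)  # discrete time steps 0..10


def ego_position(t):
    """Ego vessel sails along the x-axis at unit speed."""
    return (float(t), 0.0)


def other_position(t, velocity):
    """Other vessel starts at (5, 5) and moves with a constant velocity."""
    vx, vy = velocity
    return (5.0 + vx * t, 5.0 + vy * t)


def robustness(velocity):
    """STL robustness of 'always (distance >= D_MIN)':
    the worst-case separation margin over the horizon.
    A negative value means the rule is violated."""
    margins = []
    for t in STEPS:
        ex, ey = ego_position(t)
        ox, oy = other_position(t, velocity)
        margins.append(math.hypot(ex - ox, ey - oy) - D_MIN)
    return min(margins)


# Falsifier: pick the candidate scenario with the lowest robustness.
candidates = [(vx, vy) for vx in (-1, 0, 1) for vy in (-1, 0, 1)]
worst = min(candidates, key=robustness)
worst_rho = robustness(worst)
print(f"most adversarial velocity: {worst}, robustness = {worst_rho:.2f}")
```

Scenarios with negative robustness are the ones a falsification-driven loop would feed back into training.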
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Research Shows Adversarial Fine-tuning Improves Robustness and Efficiency of Compressed Neural Networks
The paper evaluates adversarial fine-tuning for compressed neural networks and reports robustness comparable to adversarially trained models across several benchmark datasets while improving computational efficiency; the abstract does not disclose model architectures, dataset names, or numeric robustness gains, but it provides an open-source GitHub repository.
#Fine-tuning#Safety#Benchmarking#arXiv
why featured
HKR-K passes because the paper gives a concrete mechanism, benchmark evaluation, and code. HKR-H/R are weak: this is specialized robustness/compression work, useful but not a featured AI-industry story.
editor take
Compressed-model adversarial fine-tuning claims near adversarial-training robustness; architectures, datasets, and gains are undisclosed, so treat it as a reproducibility lead.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
SmartIterator: Visual Analytics Workflows for Supervising Unsupervised Data Grouping
SmartIterator presents a six-phase visual analytics workflow for supervising unsupervised grouping across topic modeling, partition-based clustering, and density-based clustering, with IteraScope combining metric charts, Sankey-style transitions, embeddings, confidence plots, and HDBSCAN archetypes across three demonstrations.
#Benchmarking#Tools#SmartIterator#IteraScope
why featured
HKR-K passes with a six-phase workflow, 3 task types, and 3 demonstrations. HKR-H and HKR-R are weak; this is academic clustering visual analytics with limited near-term product signal for AI practitioners.
editor take
SmartIterator turns 3 clustering families into a six-phase review loop. I buy it: parameter sweeps beat single “best cluster” theater.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Local MDI+: Local Feature Importances for Tree-Based Models
The paper proposes Local MDI+, a sample-level feature importance method for tree-based models, and reports across 12 real-world benchmark datasets that using only its selected features yields an average 10% improvement in predictive performance.
#Interpretability#Benchmarking#Research release#Benchmark
why featured
HKR-K passes with a named method, 12 datasets, and a 10% average gain. HKR-H/R are weak because local feature importance for tree models is niche traditional ML research with limited immediate industry pull.
editor take
Local MDI+ reports 10% gains on 12 datasets; TreeSHAP finally gets a structure-aware rival for tabular trees.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
AOE: Exhaustive Out-of-Distribution Detection via Recalibrating Outlier Labels
The paper proposes Adaptive Confidence Outlier Exposure, using a learnable temperature to convert model predictions on OOD samples into adaptive soft targets that retain class-wise relations while raising entropy; the abstract says experiments across multiple benchmarks show effectiveness, but the post does not disclose benchmark names, metric values, model backbones, or dataset counts.
#Safety#Benchmarking#Research release#Safety/alignment
why featured
HKR-K passes via a concrete label-recalibration mechanism; HKR-H/R are weak and no metrics are disclosed. This is specialist OOD research, so it stays in the low-value all band.
editor take
AOE recalibrates OOD soft labels with learnable temperature; no benchmarks or numbers disclosed, so I file it as an OE patch.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
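The mechanism in this card, a temperature that turns predictions on outliers into higher-entropy soft targets while keeping class-wise relations, can be sketched with a plain temperature-scaled softmax. This is an illustration under assumptions: the temperature is fixed by hand here rather than learned, and the logits are made up. It is not the paper's training procedure.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def ranking(probs):
    """Class indices ordered from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: -probs[i])


# Hypothetical logits for one outlier sample.
logits = [4.0, 2.0, 0.5]
sharp = softmax(logits, temperature=1.0)  # the model's own prediction
soft = softmax(logits, temperature=4.0)   # softened target

print("entropy at T=1:", round(entropy(sharp), 3))
print("entropy at T=4:", round(entropy(soft), 3))
print("class order preserved:", ranking(sharp) == ranking(soft))
```

Raising the temperature flattens the distribution toward uniform without reordering classes, which matches the "retain class-wise relations while raising entropy" description.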
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Research Paper Proposes Insurance Pricing Optimization via Off-Policy Evaluation
The paper formulates insurance pricing as a decision-making problem, proposes a kernelized inverse propensity score estimator for variance reduction, and evaluates two pricing-rule methods—data-shared Lasso and neural-network policy parameterization—in a controlled synthetic travel insurance environment.
#Reasoning#Benchmarking#Research release
why featured
HKR-K passes via a concrete estimator and test setup, but HKR-H/R fail. Kernelized IPS for insurance pricing is a narrow statistical-actuarial topic, so hard-exclusion-technical-accessibility caps it below 40.
editor take
The paper optimizes insurance pricing with off-policy evaluation; validation is synthetic travel insurance, so NN gains need discounting.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
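For readers unfamiliar with kernelized inverse propensity scoring, here is a minimal sketch of the general technique for a continuous action such as a price. It assumes a Gaussian kernel, a known logging density, and a noise-free toy revenue function; all names and the synthetic setup are hypothetical, and the paper's exact estimator and environment are not disclosed in the summary.

```python
import math
import random


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_ips_value(contexts, actions, rewards, propensities, policy, bandwidth):
    """Kernel-smoothed IPS estimate of a deterministic policy's value.
    Logged actions close to the policy's action get weight; the logging
    density in the denominator corrects for how often they were tried."""
    total = 0.0
    for x, a, r, q in zip(contexts, actions, rewards, propensities):
        weight = gaussian_kernel((policy(x) - a) / bandwidth) / (bandwidth * q)
        total += weight * r
    return total / len(actions)


rng = random.Random(0)
n = 20000
contexts = [rng.random() for _ in range(n)]
actions = [rng.random() for _ in range(n)]  # logged prices ~ Uniform(0, 1)
propensities = [1.0] * n                    # density of Uniform(0, 1)
rewards = [a * (1.0 - a) for a in actions]  # toy revenue: price * (1 - price)


def target_policy(x):
    """Hypothetical pricing rule: always charge 0.5."""
    return 0.5


estimate = kernel_ips_value(contexts, actions, rewards, propensities,
                            target_policy, bandwidth=0.05)
print(f"estimated value = {estimate:.3f} (true value of price 0.5 is 0.25)")
```

The bandwidth trades bias for variance: a wider kernel uses more logged samples but blurs the reward around the target price.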
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Semantic-Aware Interpretable Multimodal Music Auto-Tagging
The paper presents a multimodal music auto-tagging framework that semantically clusters musically meaningful features and uses expectation maximization to assign weights to each group; the RSS snippet does not disclose dataset size or concrete performance numbers.
#Multimodal#Interpretability#Research release
why featured
HKR-K passes because the paper states a concrete mechanism, but dataset size and performance are not disclosed. The music-tagging angle is niche, so HKR-H and HKR-R fail and the item stays in the 40–59 band.
editor take
The paper uses semantic clustering plus EM weights for music tagging; no dataset or scores in RSS, so I don’t buy “competitive” yet.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Multi-Teacher Knowledge Distillation via Teacher-Informed Mixture Priors
The paper introduces MT-BKD, a Bayesian multi-teacher distillation method where a student learns from multiple teachers using teacher-informed priors and entropy-based weighting; experiments cover synthetic tasks, protein subcellular localization, and image classification, while the abstract does not disclose model sizes or exact accuracy gains.
#Fine-tuning#Inference-opt#Interpretability#Research release
why featured
HKR-K passes because the paper states a concrete mechanism and test domains. HKR-H/R are weak: the title is academic, and no metrics, code, or production relevance are disclosed.
editor take
MT-BKD spans 3 task types, but reports no sizes or gains; I don’t buy the generalization pitch without ablations.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Cost-Sensitive Evaluation for Binary Classifiers
The paper defines Weighted Accuracy and a reweighting framework for binary classifiers, proving that maximizing WA equals minimizing Total Classification Cost when unit classification costs are example-independent.
#Benchmarking#Research release#Benchmark
why featured
Niche classifier-evaluation theory paper: HKR-K has a new WA/reweighting framework and equivalence proof, but HKR-H is dry and HKR-R is limited to eval specialists; no product, model release, or broad industry trigger.
editor take
WA equals TCC minimization under example-independent unit costs; the useful punch is pushing class-imbalance fixes back to cost assumptions.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
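The equivalence claimed in this card can be checked numerically on a toy example. The sketch assumes one plausible form of Weighted Accuracy, where each class is weighted by the cost of misclassifying it; the paper's precise definition is not given in the summary, so the formula, costs, and data below are assumptions for illustration.

```python
def confusion(labels, scores, threshold):
    """Confusion counts when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 0 and pred == 1:
            fp += 1
        elif y == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def total_cost(fp, fn, cost_fp, cost_fn):
    """Total classification cost with example-independent unit costs."""
    return cost_fp * fp + cost_fn * fn


def weighted_accuracy(tp, fp, tn, fn, cost_fp, cost_fn):
    """Assumed WA: positives weighted by cost_fn, negatives by cost_fp."""
    correct = cost_fn * tp + cost_fp * tn
    everything = cost_fn * (tp + fn) + cost_fp * (tn + fp)
    return correct / everything


# Hypothetical data and costs: a missed positive costs 5x a false alarm.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.35, 0.2, 0.1]
cost_fp, cost_fn = 1.0, 5.0
thresholds = [0.0, 0.3, 0.5, 0.7, 1.0]

results = {}
for th in thresholds:
    tp, fp, tn, fn = confusion(labels, scores, th)
    results[th] = (weighted_accuracy(tp, fp, tn, fn, cost_fp, cost_fn),
                   total_cost(fp, fn, cost_fp, cost_fn))

best_by_wa = max(thresholds, key=lambda th: results[th][0])
best_by_cost = min(thresholds, key=lambda th: results[th][1])
print("best threshold by WA:", best_by_wa, "| by cost:", best_by_cost)
```

Under this definition WA equals 1 minus total cost divided by a constant that depends only on the class counts and costs, so the two criteria always pick the same operating point.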
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
Comparative Analysis of Liquid Neural Networks and LSTM for Sequential Pattern Recognition
Ye Kyaw Thu and coauthors compare CfC Liquid Neural Networks with LSTM across four sequential modalities and use temporal dropout to test robustness under missing data conditions.
#Benchmarking#Ye Kyaw Thu#Thazin Myint Oo#Thepchai Supnithi
why featured
HKR-K passes because the post names CfC vs. LSTM and temporal-dropout tests on 4 sequence data types. HKR-H/R fail: it is a niche academic benchmark with no product, open-source, or adoption hook.
editor take
CfC beats LSTM across 4 sequence modalities; effect sizes aren't disclosed here, so the clinical-utility claim stays undercooked.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the Edge
The paper proposes XTransfer for few-shot transfer of pretrained models across human-sensing modalities, using model repairing to adapt pretrained layers with limited sensor data and layer recombining to search and restructure source-model layers, but the abstract does not disclose dataset counts, accuracy numbers, or cost reductions.
#Multimodal#Fine-tuning#Inference-opt#XTransfer
why featured
HKR-K passes on the proposed mechanism, but the post gives no experimental numbers. The topic is niche academic ML, with no hard-exclusion trigger, so it stays in the low-value research band.
editor take
XTransfer uses repair and layer recombination for few-shot transfer; no datasets, accuracy, or cost numbers, so discount the SOTA claim.
HKR breakdown
hook knowledge resonance
open source
42
SCORE
H0·K1·R0
04:00
12d ago
arXiv · cs.LG· atomEN04:00 · 05·28
STARS: Spike Tail-Aware Relational Synthesis for ANN-to-SNN Data-Free Knowledge Distillation
STARS adds relational consistency alignment and tail-aware regularization to ANN-to-SNN data-free distillation, using teacher-derived thresholds and soft exceedance to synthesize batches, and reports gains up to 4.6% on CIFAR-10 and 6.7% on CIFAR-100 across multiple ANN-SNN pairs.
#Fine-tuning#Inference-opt#Benchmarking#STARS
why featured
HKR-K passes with concrete mechanisms and CIFAR gains. HKR-H/R fail: ANN-to-SNN distillation is specialist research with high access cost and no product or industry hook, so it stays in the low-value band.
editor take
STARS reports +6.7% on CIFAR-100; I buy the tail-constraint idea, but Tiny-ImageNet gains are undisclosed.
HKR breakdown
hook knowledge resonance
open source
42
SCORE
H0·K1·R0
01:20
12d ago
● P1HuggingFace Papers (takara mirror)· rssEN01:20 · 05·28
Research paper proposes method to infer large language model size from popular text memorization
The paper proposes a black-box method that uses only text fragments and next-token predictions to infer conservative lower bounds on LLM parameter counts from memorization of popular texts.
#Benchmarking#Interpretability#Research release
why featured
HKR-H/K/R all pass: the paper offers a testable black-box route to lower-bound LLM size from memorization. Missing model names and error numbers keep it in the 78–84 research band, not same-day P1.
editor take
Three sources all trace to one arXiv paper; closed labs should hate this because parameter secrecy is turning into a measurable side channel.
sharp
All 3 sources point to the same arXiv:2605.29223 paper, so this is attention around a method, not independent validation. The paper uses next-token memorization on popular texts to infer conservative parameter lower bounds, with fragment lengths, accuracy profiles, PCA latent index, and pairwise tests. I buy the attack surface, not the casual “it reveals true model size” reading. This measures a lower bound tied to memorized canonical text, so deduping, anti-memorization training, MoE routing, and distillation all distort it. Still, it hits a sore spot for closed labs: after GPT-4, parameter counts became product theater. Turning API completions into an audit probe makes that secrecy less durable, even if the estimates are noisy.
HKR breakdown
hook knowledge resonance
open source
92
SCORE
H1·K1·R1
