papers · 2026-06-08

173 papers · updated 3m ago
2026-06-08 · Mon
17:59
14h ago
NEW · 2 sources · arXiv · cs.AI · atom · EN · 17:59 · 06·08
OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics
OmniGameArena evaluates VLM game agents across 12 newly built UE5 games: 7 Solo, 3 PvP, and 2 Coop, while IDC tracks score changes and held-out variant behavior for 4 top agents after multiple reflection rounds.
#Agent#Vision#Benchmarking#OmniGameArena
why featured
HKR-H and HKR-K pass: the UE5 game setup and reflection-dynamics metric add concrete signal. HKR-R is weak, and this is a single arXiv benchmark without adoption, release details, or cross-source traction, so it stays in 60-71.
editor take
OmniGameArena tests 12 UE5 games and 12 VLMs; IDC reflection curves beat another cold-start leaderboard.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
17:55
14h ago
NEW · arXiv · cs.AI · atom · EN · 17:55 · 06·08
AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing
AHA-WAM uses a dual-DiT design to decouple low-frequency world planning from high-frequency action execution, reaching 92.80% average success on RoboTwin, 78.3% success across 4 real-world manipulation tasks, and 24.17 Hz closed-loop control with a 4.59x speedup over Fast-WAM.
#Robotics#Vision#Agent#AHA-WAM
why featured
HKR-K and HKR-R pass: the mechanism and metrics are concrete, and real-robot results matter. HKR-H is weak, and this is a single arXiv robotics paper with no product launch or source cluster, so it stays in the 60–71 band.
editor take
AHA-WAM hits 92.80% on RoboTwin, but only 4 real tasks; I'd inspect failure videos before buying the SOTA claim.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
17:53
14h ago
NEW · arXiv · cs.AI · atom · EN · 17:53 · 06·08
FASE: Fast Adaptive Semantic Entropy for Code Quality
FASE approximates code functional correctness with minimum spanning trees over structural and semantic dissimilarity graphs, and on HumanEval and BigCodeBench it improves Spearman correlation by 25% and ROCAUC by 19% versus LLM-entailment semantic entropy when using Qwen3-Embedding-8B.
#Agent#Code#Benchmarking#Qwen
why featured
HKR-K/R pass: FASE gives an MST approximation plus two testable benchmark gains, and code-agent evaluation is a real practitioner pain. HKR-H is weak, and this remains an arXiv benchmark paper without tooling or production proof.
editor take
FASE lifts Spearman 25% on HumanEval/BigCodeBench at 0.3% runtime cost; code-agent QA finally gets a cheap ruler.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
17:29
15h ago
NEW · 2 sources · arXiv · cs.CL · atom · EN · 17:29 · 06·08
Data Synthesis and Parameter-Efficient Fine-Tuning for Low-Resource NMT: A Case Study on Q'eqchi' Mayan
The study converts community-sourced dictionaries into synthetic corpora and fine-tunes mT5-base with LoRA adapters; in-domain evaluation reaches BLEU 42.02, while an organic glossary test falls to BLEU 0.59.
#Fine-tuning#Benchmarking#Q'eqchi' Mayan#mT5
why featured
HKR-K and HKR-R pass: the paper gives a concrete PEFT setup and a sharp BLEU gap, 42.02 in-domain vs 0.59 organic vocab. HKR-H is weak; the scope is a niche NMT case study with limited product spillover.
editor take
mT5-base+LoRA hits BLEU 42.02 in-domain, 0.59 on organic glossary; synthetic data taught form, not language.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
17:11
15h ago
NEW · arXiv · cs.CL · atom · EN · 17:11 · 06·08
Collaborative Human-Agent Protocol (CHAP)
CHAP defines a shared workspace protocol for human-agent collaboration, using a Core with workspaces, participants, tasks, artifacts, and an append-only evidence log, while profiles add review, routing, handoff, identity, signatures, and transparency-backed audit.
#Agent#Tools#Memory#BrightbeamAI
why featured
HKR-K/R pass: CHAP offers concrete workspace and append-only evidence-log mechanics for human-agent collaboration. HKR-H is weak; adopters, benchmarks, and implementation maturity are not disclosed, so it stays in 60–71.
editor take
CHAP records human edits as diff, rationale, and hash; solid direction, but adoption hinges on MCP/A2A vendors.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
MAGE: All-[MASK] Block Already Knows Where to Look in Block Diffusion LLM
MAGE runs one exact attention pass at the first denoising step and reuses top-k index sets, matching Exact Attention at k=512 across three block-diffusion families on LongBench and reaching up to 6.82x end-to-end speedup at 128K context.
#Inference-opt#Benchmarking#MAGE#Quest
why featured
HKR-H/K/R pass, led by a concrete 6.82x 128K inference claim. The narrow block-diffusion-LLM scope keeps it below featured despite clear practitioner value.
editor take
MAGE hits 6.82x at 128K; the wild part is one All-[MASK] attention pass replaces later search.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Perplexity Can Miss SAE Feature Damage Under Quantization
The paper uses a frozen SAE to compare RTN-quantized activations on Pythia-70M and Gemma-2-2B, finding that Gemma-2-2B at INT7 improves perplexity while degrading 18.7% of active SAE features, and under sliding-window INT6 evaluation only 51.3% of active features survive.
#Interpretability#Inference-opt#Benchmarking#Pythia
why featured
HKR-H/K/R pass: the title has a counterintuitive metric failure, with 18.7% and 51.3% as testable numbers. Single arXiv paper plus SAE/RTN specificity keeps it below featured.
editor take
Gemma-2-2B INT7 improves perplexity yet damages 18.7% of SAE features; PPL is a poor proxy for interpretability under quantization.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
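For readers unfamiliar with the RTN (round-to-nearest) quantization named in the summary above, a minimal sketch may help. It assumes symmetric, per-tensor scaling over a plain Python list; the summary does not state the paper's quantization granularity or its SAE pipeline, and `rtn_quantize` is a hypothetical helper name used only for illustration.

```python
def rtn_quantize(values, bits):
    """Round-to-nearest quantization, then dequantization back to floats.

    Assumption: symmetric per-tensor scaling (one scale for all values).
    This is an illustrative sketch, not the paper's implementation.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed grid")
    qmax = 2 ** (bits - 1) - 1          # e.g. 63 for INT7, 31 for INT6
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)             # nothing to scale
    scale = max_abs / qmax              # width of one quantization step
    quantized = []
    for v in values:
        # Python's round() sends exact ties to the nearest even integer.
        level = max(-qmax, min(qmax, round(v / scale)))
        quantized.append(level * scale)
    return quantized


if __name__ == "__main__":
    activations = [0.13, -0.7, 0.42, 1.0]
    for bits in (8, 7, 6):
        deq = rtn_quantize(activations, bits)
        worst = max(abs(a - b) for a, b in zip(activations, deq))
        print(bits, "bits, worst absolute error:", worst)
```

The worst-case error per value is half a quantization step, so each bit removed roughly doubles it. An aggregate metric such as perplexity can stay flat while individual features shift.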
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
PandaAI: A Practical Agent CQ2 for Neuro-symbolic Data Analysis and Decision-Making in Quantitative Finance
PandaAI tests a closed-loop neuro-symbolic LLM agent on CSI 300 stock data, reporting 18.2% higher Rank IC and 25.7% lower maximum drawdown than state-of-the-art time-series models.
#Agent#Reasoning#Fine-tuning#PandaAI
why featured
HKR-H/K/R pass, but this is a single arXiv quant-finance paper with limited authority and reproducibility detail. Defaulting to the lower band gives 70 and keeps it in all.
editor take
PandaAI reports 18.2% higher Rank IC on CSI 300; hold the finance-agent hype until splits and costs are disclosed.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions
CrowdMath contains 164 expert-annotated progress chains from the 2016-2025 MIT PRIMES-AoPS CrowdMath program, and six frontier models reach 83-88% accuracy on next-post prediction while the best model scores only 0.42 macro-F1 on post-role classification.
#Reasoning#Benchmarking#MIT PRIMES#Art of Problem Solving
why featured
CrowdMath adds a concrete reasoning benchmark with 164 progress chains and two model-result contrasts, so HKR-K is strong and HKR-R is moderate; the dry paper framing keeps it below featured.
editor take
CrowdMath has 164 chains, yet role classification tops out at 0.42 macro-F1; MATH-style scores miss collaboration literacy.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Data-Constrained Language Model Pretraining: Improved Regularization and Scaling Laws
The paper studies data-constrained pretraining with MIR on 72M to 1.4B parameter models and proposes SoftQ; SoftQ fits repeated-data experiments better than additive scaling laws and estimates MIR’s gain as roughly 1.3x more unique training data.
#Benchmarking#Research release#Open source
why featured
HKR-K is solid: 72M–1.4B models, MIR, SoftQ, and a 1.3x-data-equivalence claim. HKR-R hits data scarcity and training cost, while HKR-H is weak and the paper remains specialist, so it stays in all.
editor take
SoftQ prices MIR at 1.3x unique data; capped at 1.4B, this is not a rescue plan for frontier pretraining.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
TRACE: Trajectory Reasoning through Adaptive Cross-Step Evidence Aggregation for LLM Agents
The paper proposes TRACE for monitoring long-horizon LLM agent trajectories, using a Triage-Inspect-Judge loop and reporting 0.713 aggregate F1 and 0.844 recall across ten SHADE-Arena task domains.
#Agent#Reasoning#Safety#TRACE
why featured
HKR-K and HKR-R pass: the paper gives a concrete mechanism and metrics, and agent monitoring matters to builders. It stays below featured because this is a single arXiv paper with no code or production validation disclosed.
editor take
TRACE hits 0.713 F1 on 10 SHADE-Arena domains; long-horizon agent monitoring is finally aggregating evidence across steps.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Self-Evolving LLM Agents with In-Distribution Optimization
Q-Evolve evaluates a self-evolving LLM agent framework on AlfWorld, WebShop, and ScienceWorld; it trains an in-distribution critic from expert demonstrations plus agent trajectories, derives step-wise process rewards through advantage estimation, and reports stronger sample efficiency, robustness, and task performance than unnamed strong baselines.
#Agent#Reasoning#Research release#Benchmark
why featured
HKR-H/K/R all pass, but the article only gives arXiv-summary facts and no gain numbers, task difficulty, or lab authority. Defaulting to the lower band keeps it in all, not featured.
editor take
Q-Evolve tests 3 environments and labels step rewards via an IQL critic; unnamed strong baselines make “self-evolving” hard to buy.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Multi-Agent Reasoning with Consistency Verification Improves Uncertainty Calibration in Medical MCQA
The paper evaluates four Qwen2.5-7B-Instruct specialist agents on high-disagreement MedQA and MedMCQA subsets; on MedQA-250, the full system reaches ECE 0.091, a 74.4% reduction versus the single-specialist baseline, with AUROC 0.630 and 59.2% accuracy.
#Agent#Reasoning#Benchmarking#Qwen
why featured
HKR-K and HKR-R pass: 4 Qwen2.5-7B specialists and ECE 0.091 give testable signal, and medical calibration hits safety. HKR-H is weak, and this remains a single arXiv benchmark paper.
editor take
Four Qwen2.5-7B specialists cut MedQA-250 ECE to 0.091; at 59.2% accuracy, clinical deferral talk is premature.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
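The ECE figure in the card above can be unpacked with a short sketch of the standard equal-width binned expected calibration error. The bin count of 10 and the helper names are assumptions; the summary does not say how the paper bins confidences. If the 74.4% figure is a relative reduction, the implied single-specialist baseline is roughly 0.36, which is arithmetic on the reported numbers rather than a value quoted from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE: weighted mean of |accuracy - confidence|.

    Assumption: 10 equal-width bins over [0, 1]; the paper's binning
    scheme is not stated in the summary.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    if not confidences:
        raise ValueError("need at least one prediction")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


def implied_baseline(ece_after, relative_reduction):
    """Baseline ECE implied by a relative reduction (hypothetical helper)."""
    return ece_after / (1.0 - relative_reduction)


if __name__ == "__main__":
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9],
                                     [True, True, True, False]))
    print(implied_baseline(0.091, 0.744))
```

A low ECE only says stated confidence tracks accuracy; it says nothing about how high that accuracy is, which is why the 59.2% figure still matters.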
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
SEAM: Shortcut-Aware Real-Time Detection of Scripted vs. Spontaneous Speech for Interview Guardrails
SEAM detects scriptedness in interview speech using 8-second windows, reaches 0.971±0.004 ROC-AUC on an external interview-domain evaluation set, and reduces the quantized model footprint to 41.8MB.
#Audio#Benchmarking#Inference-opt#SEAM
why featured
HKR-H/K/R pass, but this is a single arXiv paper with metrics and size only; deployment cost, false-positive burden, and real platform validation are not disclosed, so it stays at the top of 60–71.
editor take
SEAM hits 0.971 AUC on 8-second audio; I like the shortcut-learning ablation more than another inflated audio benchmark.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models
The paper studies step-wise refusal dynamics in autoregressive and diffusion language models, showing that diffusion remasking can recover from harmful intermediate generations and that switching from AR to diffusion sampling improves jailbreak robustness under fixed weights; its SRI detector trains only on benign signals, while the abstract does not disclose sample size.
#Safety#Alignment#Benchmarking#Research release
why featured
HKR-H/K/R pass, but this is a single arXiv paper with no sample size disclosed and no cross-source debate shown. Research-release signal fits 70, below featured.
editor take
Diffusion remasking recovers from harmful intermediates, but sample size is undisclosed; fixed-weight robustness would push safety work past token text.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication
The paper tests evidence-order sensitivity on 3,059 grounded items from FEVER, HotpotQA, NQ-Open, PopQA, and Controls, introducing QMV bounds and an ISR=1 answer/abstain gate; in a 528-item held-out audit, the gate reports 0.0-0.7% hallucination and 20.6-27.9% abstention with 95% confidence intervals.
#Reasoning#Alignment#Benchmarking#arXiv
why featured
HKR-K is strong with concrete numbers and mechanisms; HKR-R applies to evidence compression and hallucination tradeoffs. A single arXiv paper on binary adjudication is useful but not same-day featured material.
editor take
ISR=1 reports 0.0–0.7% hallucination on 528 audits; the 20.6–27.9% abstention makes it a verifier tool, not open-gen safety.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Closed-Form Spectral Regularization for Multi-Task Model Merging
The paper proposes SWUDI and SWUDI-A for training-data-free multi-task model merging, replacing iterative solvers with closed-form spectral filtering; across four general benchmarks and one multimodal merging benchmark covering VQA, Geometry, Chart, OCR, Grounding, and modality merging, the methods cut wall-clock time by 28-72x and peak GPU memory by up to 50%.
#Multimodal#Inference-opt#Benchmarking#arXiv
why featured
HKR-H/K/R pass on the 28–72x speed claim, closed-form mechanism, and GPU-memory cost angle. The topic is still a niche model-merging method paper, so it stays below featured.
editor take
SWUDI turns per-layer merging into one eigendecomposition and cuts time 28-72x; model merging finally looks deployable.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
SafeGene: Reusable Adapters for Transferable Safety Alignment
SafeGene represents safety as a reusable adapter, recalibrates layer-wise coefficients with few-shot data, and reduces harmful response rates across multiple model families and downstream tasks while preserving task performance.
#Fine-tuning#Alignment#Safety#SafeGene
why featured
HKR-H/K/R pass, but the body only gives the mechanism outline; reduction size, model list, and reproducible setup are not disclosed. Treat it as an interesting arXiv safety paper, not featured.
editor take
SafeGene makes safety a reusable adapter; no reduction numbers disclosed, but the engineering angle beats re-aligning after every fine-tune.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Bit-Exact AI Inference Verification Without Performance Tradeoffs
arXiv:2606.00279v2 proposes bit-exact re-computation for AI inference verification across vLLM, HF transformers, and multiple NVIDIA GPU variants, under the condition that the backend calls no atomic functions and the auditor has the right information for re-computation.
#Inference-opt#Safety#arXiv#vLLM
why featured
HKR-H/K/R pass via a concrete no-latency verification claim, stack coverage, and operator trust costs. Single arXiv source and low-level inference focus keep it below featured.
editor take
The paper gets bit-exact recomputation on vLLM and HF transformers only when the backend calls no atomic functions; governance hype should wait on backend constraints.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Reinforcement Learning from Rich Feedback with Distributional DAgger
The paper introduces Distributional DAgger for training reasoning models from rich feedback, replacing RLVR’s one-bit final-answer reward. It reports improvements over RLVR and self-distillation baselines across three domains: scientific reasoning, coding, and hard math.
#Reasoning#Code#Fine-tuning#Research release
why featured
HKR-H/K/R pass, but the article gives no result numbers, release artifact, or reproducibility details. This is useful training-method research, not a same-day must-write item.
editor take
Distributional DAgger replaces 1-bit RLVR rewards with rich feedback; I buy it, RLVR’s signal poverty needed a formal teardown.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Stable Reasoning, Unstable Responses: Mitigating LLM Deception via Stability Asymmetry
arXiv:2603.26846v2 proposes Stability Asymmetry Regularization, which penalizes the distributional gap between internal CoT stability and external response stability under perturbation; the abstract says experiments identify and suppress intrinsic deception, but the RSS snippet does not disclose benchmark names or metric values.
#Reasoning#Alignment#Safety#Research release
why featured
HKR-H/K/R pass, but the body gives the SAR mechanism without metrics, model scale, or reproducible setup. A useful arXiv alignment paper, not enough for featured.
editor take
SAR penalizes CoT/response stability gaps under perturbation, but no benchmarks or metrics are disclosed; treat it as a testable safety-signal hypothesis.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
BigMac: Breaking the Pareto Frontier of Compute and Memory in Multimodal LLM Training
BigMac uses a dependency-safe nested pipeline for multimodal LLM training, reduces encoder and generator activation memory complexity to O(1), keeps LLM activation memory unchanged, and reports 1.08×-1.9× training speedups over baseline systems across multiple MLLMs and workloads.
#Multimodal#Inference-opt#BigMac#Research release
why featured
HKR-H/K/R pass, but this is an arXiv training-systems paper with mechanism and speedup numbers only; no open-source artifact, replication details, or adoption signal, so it stays in all.
editor take
BigMac cuts encoder/generator activation memory to O(1); 1.08×-1.9× speedup is modest, but the systems trick looks usable.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
When Surface Form Changes Moderation Decisions: A Paired Study of Code-Mixed Workflow Instability
The paper evaluates hate moderation on paired English and Tamil-English code-mixed content, where thresholds tuned on clean English produce a 0.265 decision flip rate and raise review rate from 0.138 to 0.297.
#Safety#Benchmarking#Research release#Safety/alignment
why featured
HKR-H/K/R all pass: paired tests and flip-rate numbers give the paper concrete value for moderation teams. It remains a single arXiv study in a narrow workflow, below the featured threshold.
editor take
Code-mixing drives 0.265 action flips and 0.297 review rate; English-tuned moderation thresholds dump multilingual risk into human queues.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
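The flip-rate and review-rate numbers in the card above can be illustrated with a small sketch. It assumes a three-way workflow (allow, review, remove) driven by two score thresholds and paired scores for the same content in clean English and code-mixed form. The thresholds, action names, and helper functions are all hypothetical; the summary does not describe the paper's actual workflow.

```python
def action(score, review_at, remove_at):
    """Map a moderation score to a workflow action (hypothetical workflow)."""
    if score >= remove_at:
        return "remove"
    if score >= review_at:
        return "review"
    return "allow"


def flip_and_review_rates(clean_scores, mixed_scores, review_at, remove_at):
    """Return (flip_rate, clean_review_rate, mixed_review_rate).

    A flip is a pair whose action differs between the clean and the
    code-mixed version, with thresholds held fixed.
    """
    if len(clean_scores) != len(mixed_scores):
        raise ValueError("scores must be paired one-to-one")
    if not clean_scores:
        raise ValueError("need at least one pair")
    clean_actions = [action(s, review_at, remove_at) for s in clean_scores]
    mixed_actions = [action(s, review_at, remove_at) for s in mixed_scores]
    n = len(clean_actions)
    flips = sum(1 for c, m in zip(clean_actions, mixed_actions) if c != m)
    return (
        flips / n,
        clean_actions.count("review") / n,
        mixed_actions.count("review") / n,
    )


if __name__ == "__main__":
    clean = [0.1, 0.2, 0.9, 0.5]    # made-up scores, for illustration only
    mixed = [0.1, 0.6, 0.9, 0.95]
    print(flip_and_review_rates(clean, mixed, review_at=0.4, remove_at=0.8))
```

Because the thresholds stay fixed, any shift in the score distribution under code-mixing shows up directly as flipped actions and a different share of items routed to human review.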
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Scalable GANs with Transformers
The paper introduces GAT, a pure transformer GAN trained in a VAE latent space, and stabilizes S-to-XL scaling with lightweight intermediate supervision and width-aware learning-rate adjustment; GAT-XL/2 reaches 2.18 FID on class-conditional ImageNet-256 generation in 60 epochs, reported as 4x fewer epochs than strong baselines.
#Vision#Multimodal#Benchmarking#arXiv
why featured
HKR-H and HKR-K pass: the GAN comeback angle is clickable, and the post gives FID 2.18 plus training mechanisms. HKR-R is narrow, and this is a single arXiv paper, not same-day must-write news.
editor take
GAT-XL/2 hits 2.18 FID on ImageNet-256 in 60 epochs; GANs aren’t dead, but VAE latents carry a lot here.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R0
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
RePo: Language Models with Context Re-Positioning
RePo continues pre-training on OLMo-2 1B and 7B, using a differentiable module f_phi to assign token positions, and reports gains on noisy-context, structured-data, and longer-context tasks while keeping competitive short-context performance.
#Reasoning#Memory#Benchmarking#SakanaAI
why featured
HKR-H/K/R pass: the mechanism is novel, model sizes are concrete, and long-context reliability matters. It stays in 60–71 because the abstract gives no code, gain sizes, or production evidence.
editor take
RePo is tested only via OLMo-2 1B/7B continued pretraining; learnable positions look sane, but costs and strong baselines are missing.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs
MoDA improves visual grounding in instructional MLLMs with instruction-guided channel-wise multiplicative modulation, not token-level additive selection. The paper evaluates it on 12 benchmarks across LLaVA-1.5, LLaVA-MoRE, and Qwen3-VL, reporting +12.0 MMVP for LLaVA-1.5 and under 1% extra FLOPs.
#Multimodal#Vision#Fine-tuning#LLaVA
why featured
HKR-K and HKR-R pass: the paper gives a concrete mechanism and efficiency numbers. HKR-H fails, and the item remains a specialized architecture paper without product impact or external replication.
editor take
MoDA gains across 12 benchmarks at <1% FLOPs; channel-wise modulation looks like a cheap visual-attention brake for MLLMs.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models
The paper proposes MOPO, a constrained KL-regularized framework that maximizes a primary objective while enforcing lower bounds on secondary objectives through tunable safety thresholds, using pairwise preferences without point-wise rewards. Experiments show MOPO recovers Pareto-optimal policies on synthetic benchmarks and Pareto-dominates baselines when fine-tuning multi-billion-parameter models on human-preference data.
#Alignment#Fine-tuning#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: MOPO has a concrete mechanism and test claims for RLHF/alignment design. HKR-H is weak, and this is a single arXiv paper without code, top-lab backing, or cross-source discussion, so it stays in 60–71.
editor take
MOPO constrains secondary goals with thresholds and claims Pareto wins over DPO/IPO; I buy the setup, not the undisclosed dataset details.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
TALAN: Task-Aligned Latent Adaptation Networks for Targeted Post-Training of Large Language Models
TALAN inserts a sequence-conditioned latent side path into the transformer residual stream and co-trains it with LoRA or DoRA in one SFT loop. Across four Qwen3 backbones and four STEM/code benchmarks, it adds +1.41 points over LoRA and +1.85 over DoRA, with under 1% trainable parameters and 1.01-1.02x inference overhead versus matched LoRA.
#Fine-tuning#Reasoning#Code#Qwen
why featured
HKR-H/K/R pass on the LoRA-overhead comparison and concrete benchmark numbers, but this is still a single PEFT paper with +1.41 average gain and no disclosed open-source or adoption signal, so it stays in all.
editor take
TALAN is nonnegative across 16 Qwen3 cells and +1.41 over LoRA; seed variance says don’t bury LoRA yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Detecting and Mitigating Bias by Treating Fairness as a Symmetry Operation
The paper formalizes bias as symmetry breaking and applies loss-based regularization on four synthetic datasets, reducing fairness violations by more than 90% with about a 5% accuracy cost.
#Alignment#Safety#Benchmarking#arXiv
why featured
HKR-H/K/R all pass, but the evidence is limited to 4 synthetic datasets with no real-world model validation. Solid safety/alignment research signal, not a same-day must-write.
editor take
The paper cuts violations over 90% on 4 synthetic sets. Bit-flip fairness is neat, but causal confounding remains untouched.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Leveraging Error Diversity in Group Rollouts for Reinforcement Learning
The paper proposes EDAS, a post-hoc advantage-shaping method for RLVR that adjusts incorrect rollouts using intra-group error diversity, and reports a 6.29-point average gain over DAPO on Qwen3-8B across seven math benchmarks.
#Reasoning#Alignment#Benchmarking#Qwen
why featured
HKR-K is clear: EDAS reweights erroneous rollout advantage by within-group error diversity and beats DAPO by 6.29 points on seven Qwen3-8B math benchmarks. The scope is narrow RLVR training, with no product or cost hook, so it stays in the interesting band.
editor take
EDAS beats DAPO by 6.29 points on Qwen3-8B across seven math sets; using error distribution for advantage shaping is pragmatic.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Does Topic Sentiment Cause Perceived Ideology? Comparing Human and LLM Annotations in Political News Articles
The study compares four ideology-annotation paradigms on AllSides articles using Llama-3.3-70B sentiment labels; fine-tuned GPT-4o-mini reaches the highest F1 at 72.48, yet uniquely produces significant community-level treatment effects and direct effects absent from human annotations.
#Fine-tuning#Benchmarking#Alignment#AllSides
why featured
HKR-H/K/R pass: the paper links sentiment to perceived ideology and reports F1=72.48 plus an LLM-only coupling. It stays in 60–71 because this is a single arXiv study, with no product, model, or deployment change.
editor take
Fine-tuned GPT-4o-mini hits F1=72.48, then invents sentiment–ideology coupling humans lack; silver-label evals need causal checks.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
MACD: Model-Aware Contrastive Decoding via Counterfactual Data
MACD uses a Video-LLM’s feedback to locate object regions linked to hallucination. It reduces hallucination on EventHallusion, MVBench, Perception-test, and Video-MME while maintaining or improving accuracy.
#Multimodal#Inference-opt#Benchmarking#Qwen
why featured
HKR-K/R pass: the paper offers a concrete decoding mechanism and a 4-benchmark test claim, with relevance to multimodal reliability. HKR-H is weak and effect sizes are not disclosed, so it stays in the 60–71 band.
editor take
MACD cuts hallucination on 4 video benchmarks, but deltas are undisclosed; model-feedback object targeting beats random CD noise.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
The Identity Trap in EEG Foundation Models: A Diagnostic Audit
The paper introduces FMScope to audit three EEG foundation models across four datasets, finding subject variance at 13-89x a random null in 12/12 pairs. Fine-tuning raises it by 10-63 percentage points, while erasing the linear subject axis improves label decoding by 6-12 points in primary within-subject cells.
#Benchmarking#Fine-tuning#Interpretability#LaBraM
why featured
HKR-H/K/R pass: the hook is identity leakage, and the paper gives 12/12 pairs plus 13-89x subject variance. EEG foundation models are vertical, so impact stays in 60-71 rather than featured.
editor take
FMScope audits 3 EEG FMs: subject variance hits 13-89x null in 12/12 pairs; treat high EEG scores as identity leakage first.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
From Sampled Outcomes to Capability Distributions: Rethinking Supervision for LLM Routing
The paper proposes DARS, a distribution-aware supervision framework for LLM routing. It replaces single-response labels with observations over semantically equivalent query formulations and stochastic generations, and experiments across diverse tasks show single-shot labels mislead model selection while distribution-aware labels make learned routing behavior more stable.
#Inference-opt#Benchmarking#Research release
why featured
HKR-H/K/R are present but modest: DARS reframes routing supervision from one sampled output to capability distributions. The post gives the mechanism, but not experiment scale, model list, or gains, so it stays in the 60-71 all band.
editor take
DARS labels routing via query rewrites and stochastic generations; no task count or lift disclosed, so I read it as anti-single-shot eval ammo.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Reinforcement Learning from Denoising Feedback
The paper introduces RLDF for estimating policy loss in diffusion language models using rollout and training feedback, and evaluates it on two DLM architectures, LLaDA and Dream, across multiple reasoning benchmarks.
#Reasoning#Benchmarking#LLaDA#Dream
why featured
HKR-H and HKR-K pass: RLDF gives a concrete DLM policy-loss mechanism and tests it on LLaDA, Dream, and reasoning benchmarks. HKR-R is weak, and the item stays in the 60–71 research-signal band.
editor take
RLDF reports gains on LLaDA and Dream, but no deltas in the snippet; DLM RL still lives or dies on loss estimation.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Adaptive Pluralistic Alignment: A Pipeline for Dynamic Artificial Democracy
The paper introduces APA, a three-stage alignment pipeline using low-rank reward basis decomposition, social-choice voting, and new annotator weights over fixed bases; it tests a proof of concept on the PRISM multi-user alignment dataset and releases code and preference datasets.
#Alignment#Fine-tuning#PRISM#RachelFreedman
why featured
HKR-H/K/R all pass, but this is an arXiv proof of concept on PRISM with no production replacement claim or major-model result; keep it in all below the 72 featured line.
editor take
APA tests on PRISM; I buy the low-rank jury mechanism, but “artificial democracy” is still lab governance.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
Scalable Joint Resource Allocation for SLO-Constrained LLM Inference in Heterogeneous GPU Clouds
The paper presents an SLO-constrained LLM inference allocation framework that jointly optimizes model choice, GPU provisioning, parallelism, and routing; on Azure LLM Inference Trace experiments, GH finds feasible solutions within 1 second, while AGH reaches near-optimal results within 3 seconds and remains lower-cost under up to 1.5x delay and accuracy inflation.
#Inference-opt#Benchmarking#Azure#Research release
why featured
HKR-K/R pass and HKR-H fails. The paper gives testable claims on LLM inference cost and SLOs: Azure LLM Inference Trace experiments, near-optimal results within 3 seconds, and lower cost under up to 1.5x delay and accuracy inflation. Its academic infra angle keeps it below featured.
editor take
AGH hits near-optimal on Azure Trace in 3 seconds; I buy the setup—MILP is too slow as an online scheduler baseline.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·08
On the Importance of Multiple Training Seeds for Evaluating Machine Unlearning
The paper argues that machine unlearning evaluations need multiple training seeds; experiments on image classification, federated learning-to-rank, and large language models show that single training-seed setups can produce non-representative results.
#Safety#Benchmarking#Research release#Benchmark
why featured
HKR-H and HKR-K pass: seed sensitivity in machine-unlearning eval is a useful methodological warning across three settings. The post gives no effect sizes or reproducible setup, so it stays in the 60–71 band.
editor take
Single training seeds skew unlearning evals; extra unlearning seeds on one trained model don't fix that, so stop using them to launder benchmark confidence.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
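The multi-seed point in the machine-unlearning item above comes down to reporting spread across independently trained models rather than a single run. A minimal sketch, assuming one scalar metric per training seed; the function name and the sample scores are hypothetical and do not come from the paper:

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def summarize_seeds(scores: Sequence[float]) -> Dict[str, float]:
    """Aggregate one evaluation metric over independently trained seeds."""
    if len(scores) < 2:
        raise ValueError("need at least two training seeds to estimate spread")
    return {
        "mean": mean(scores),
        "stdev": stdev(scores),  # sample standard deviation across seeds
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical forget-set accuracies from three training seeds.
summary = summarize_seeds([0.70, 0.80, 0.90])
```

A single-seed evaluation would report only one of those three values, with no indication of how far it sits from the others.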
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Sparsely gated tiny linear experts
The paper proposes sgatlin, replacing transformer feedforward layers with sparsely gated linear single-neuron experts, and reports lower language-model perplexity under an isoflop comparison across compute budgets.
#Inference-opt#Interpretability#Research release
why featured
HKR-H/K/R pass via the tiny-expert mechanism and compute angle, but the item gives no perplexity delta, model scale, code, or replication details; a single arXiv paper stays in the 60–71 band.
editor take
sgatlin replaces every FFN with single-neuron linear experts and lowers isoflop perplexity; I’d wait for replication before burying MoE.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
TabSwift: An Efficient Tabular Foundation Model with Row-Wise Attention
TabSwift uses a row-wise attention-only backbone for tabular in-context learning, adds gated attention stabilization, learnable register tokens, and adaptive layer-wise early exit for latency-sensitive inference.
#Reasoning#Inference-opt#TabSwift#TabPFN
why featured
HKR-K and HKR-R pass: the mechanisms are concrete, and efficient tabular foundation models matter to some practitioners. No benchmark numbers, open-source artifact, or production-replacement claim, so it stays in the 60–71 band.
editor take
TabSwift adds row-wise attention and layer-wise early exit, but gives no latency numbers here; I don’t buy “more efficient” yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio
The paper benchmarks LM-based lossless compression on full-fidelity audio across music, speech, and bioacoustics, with sampling rates from 16 kHz to 48 kHz and bit depths of 8, 16, and 24. Trilobyte changes token vocabulary scaling from O(2^b) to O(1), making 24-bit LM-based compression tractable, while gains shrink beyond 8-bit.
#Audio#Benchmarking#Trilobyte#FLAC
why featured
HKR-H and HKR-K pass: the audio-compression use case is novel, with sample-rate, bit-depth, and Trilobyte scaling details. The topic stays niche research, not a product or competitive industry move, so it sits in all.
editor take
Trilobyte cuts 24-bit vocab from 2^24 (about 16.8M) to O(1); gains shrink with bit depth, so don't bury FLAC yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
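The O(2^b) to O(1) vocabulary claim in the audio-compression item above can be illustrated with a byte-level split. This is only a sketch under the assumption that each sample is tokenized as fixed-size bytes; the snippet does not describe Trilobyte's actual tokenizer, and every name below is hypothetical:

```python
from typing import List

BYTE_VOCAB = 256  # constant vocabulary when every token is one byte


def sample_vocab_size(bit_depth: int) -> int:
    """Vocabulary size when each audio sample is a single token: O(2^b)."""
    return 2 ** bit_depth


def to_byte_tokens(sample: int, bit_depth: int) -> List[int]:
    """Split an unsigned sample into big-endian byte tokens (assumed scheme)."""
    n_bytes = (bit_depth + 7) // 8
    return list(sample.to_bytes(n_bytes, "big"))


def from_byte_tokens(tokens: List[int]) -> int:
    """Invert to_byte_tokens, so the split stays lossless."""
    return int.from_bytes(bytes(tokens), "big")


# A 24-bit sample becomes three tokens drawn from a 256-entry vocabulary.
tokens = to_byte_tokens(0xABCDEF, 24)
```

The trade-off in this assumed scheme is sequence length: a 24-bit sample costs three tokens instead of one, in exchange for a vocabulary that no longer grows with bit depth.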
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Discovering Interpretable Algorithms by Decompiling Transformers to RASP
The paper presents a method for extracting RASP programs from trained Transformers by faithfully re-parameterizing the model and applying causal interventions to find a small sufficient sub-program. Experiments on small Transformers trained on algorithmic and formal-language tasks often recover simple interpretable RASP programs from length-generalizing models.
#Interpretability#Reasoning#Research release
why featured
HKR-H and HKR-K pass: the decompilation angle is novel, and the paper gives a concrete reparameterization plus causal-intervention pipeline. HKR-R is weak because evidence is limited to small Transformers trained on algorithmic and formal-language tasks.
editor take
This decompiles small Transformers into RASP subprograms; narrow algorithmic tasks, but far stronger than attention-map interpretability.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
GRASP: Geometry-aware Residual Alignment for Scalable Pretraining Data Attribution
The paper introduces GRASP, which reframes data attribution as subset-level counterfactual utility prediction and models interactions with a quadratic geometric penalty; subset-retraining evaluations report over 2× higher task-level rank correlation and nearly 10× lower upfront artifact construction cost than scalable baselines.
#Benchmarking#GRASP#arXiv#Research release
why featured
HKR-K and HKR-R pass: the paper gives concrete mechanisms plus 2x/10x numbers and maps to pretraining data cost. HKR-H is weak, and a single arXiv paper stays in the lower all band.
editor take
GRASP reports over 2× rank-correlation gains on subset counterfactuals; I buy the setup, because single-example attribution is tired.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Standard vs. Modular Sampling: Best Practices for Reliable LLM Unlearning
The paper evaluates single-neighbor retain sets, 1:1 sampling, and cyclic sampling in LLM unlearning, then proposes MELU, a modular entity-level strategy, with diverse neighbor sets to balance forget efficacy and model utility.
#Fine-tuning#Safety#Benchmarking#Research release
why featured
HKR-K has concrete sampling mechanisms and the MELU strategy; HKR-R connects to LLM deletion, compliance, and safety governance. HKR-H is weak, and no experimental numbers or code are disclosed, so it stays in the 60–71 band.
editor take
MELU attacks single-neighbor retain sets and 1:1 sampling; unlearning benchmarks need fewer toy retain splits.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Causal Evaluation of Membership Inference Attacks
The paper frames membership inference attack evaluation as causal inference, defines memorization as the causal effect of including a point in training, identifies interference in one-run protocols and distribution-shift confounding in zero-run protocols, and proposes estimators for multi-run, one-run, and zero-run settings with non-asymptotic consistency guarantees.
#Safety#Benchmarking#Research release#Safety/alignment
why featured
HKR-K is strong and HKR-R is moderate: the paper gives MIA evaluation a testable causal frame, but only the abstract is available and experiment scale, benchmark results, and adoption signals are absent.
editor take
MIA evaluation becomes causal effect estimation; one-run has interference, zero-run has shift, so privacy papers should lean less on shiny AUC.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
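The causal framing in the membership-inference item above treats memorization as the effect of including a point in training. In the multi-run setting, the simplest version of that idea is a difference in means between runs trained with and without the point. The sketch below is that plain estimator, not the paper's estimators (which the abstract says carry non-asymptotic guarantees); names and numbers are hypothetical:

```python
from statistics import mean
from typing import Sequence


def memorization_effect(scores_in: Sequence[float],
                        scores_out: Sequence[float]) -> float:
    """Mean score for one point over runs that trained on it,
    minus the mean over runs that did not."""
    if not scores_in or not scores_out:
        raise ValueError("need runs both with and without the point")
    return mean(scores_in) - mean(scores_out)


# Hypothetical per-run confidence scores for a single data point.
effect = memorization_effect([0.9, 0.8], [0.5, 0.4])
```

The one-run and zero-run protocols named in the summary lack this clean with/without comparison, which is where the interference and distribution-shift problems come in.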
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Spectral Scaling Laws of Muon
The paper tracks Muon momentum singular-value quantiles in 77M to 2.8B-parameter models and finds mid-early layers scale mildly at about M^-0.25, while some late layers scale up to M^-0.96, putting the standard 5-step Newton-Schulz setup into a failure regime at frontier scale.
#Fine-tuning#Inference-opt#Benchmarking#Muon
why featured
HKR-K is strong, while HKR-H and HKR-R are weak; the Muon scaling result helps training researchers, but reads like numerical optimization for most AI practitioners. Keep it in the 60–71 band, not featured.
editor take
Muon late-layer singular values fall as M^-0.96; 5-step NS breaks at frontier scale, so layer-aware optimizer tuning stops being optional.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
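One plausible reading of the Newton-Schulz failure regime in the Muon item above: the iteration acts on each singular value of the normalized momentum independently, so very small singular values may not reach 1 in five steps. The sketch below tracks that scalar map. The quintic coefficients are an assumption taken from widely used public Muon code, not from the paper snippet, and the function name is hypothetical:

```python
# Assumed quintic Newton-Schulz coefficients (a, b, c) from public Muon code.
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def ns_singular_value(s: float, steps: int = 5) -> float:
    """Track one singular value through the iteration
    X <- a*X + b*(X X^T) X + c*(X X^T)^2 X, which maps s to a*s + b*s^3 + c*s^5."""
    a, b, c = NS_COEFFS
    for _ in range(steps):
        s = a * s + b * s ** 3 + c * s ** 5
    return s


moderate = ns_singular_value(0.5)   # ends up near 1 after five steps
tiny = ns_singular_value(1e-6)      # grows roughly 3.4445x per step, stays far below 1
```

Under these assumptions, a singular value that starts around 1e-6 after normalization is amplified by only a few hundred times in five steps, so the output is far from orthogonal in that direction.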
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Where Rectified Flows Leak: Characterising Membership Signals Along the Interpolation Path
The paper analyzes the Rectified Flow interpolation path X_λ and reports a bell-shaped reconstruction gap between train and test samples, validated on audio and images, then uses the λ-resolved signal for a membership inference attack.
#Safety#Benchmarking#Research release#Safety/alignment
why featured
HKR-H/K/R pass, but this is a technical arXiv privacy paper for generative-model safety readers. No tool release, incident, or flagship model impact keeps it in the 60–71 band.
editor take
Rectified Flows leak membership signals along X_λ; the bell-shaped reconstruction gap is a sharper privacy probe than final samples.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Elmes*: Automated Construction of Fine-Grained Evaluation Rubrics for LLMs in Long-Tail Education
Elmes* builds Edu-330 for educational LLM evaluation, covering 330 scenarios across 11 subjects, 3 grade bands, and 10 task types, with more than 1,000 second-level indicators and a multi-agent teacher-student-judge evaluation engine.
#Agent#Benchmarking#Reasoning#Tao Liu
why featured
HKR-K and HKR-R pass: the paper gives a reusable benchmark scale and addresses LLM evaluation in education. Single arXiv paper, non-major lab, and a dry academic title keep it in the 60–71 band.
editor take
Elmes* covers 330 education scenarios; the LLM-judge self-preference is the part that should make evaluators pause.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
On the Geometry of On-Policy Distillation
The paper compares OPD, SFT, and RLVR with parameter-space diagnostics, finding that OPD updates fewer weights than SFT and rapidly locks cumulative updates into a narrow low-dimensional subspace.
#Reasoning#Fine-tuning#Research release
why featured
This is a useful training-methods paper: HKR-K lands via a concrete geometry claim, and HKR-R lands for fine-tuning/RL practitioners. HKR-H is weak, and the available feed gives only abstract-level detail, so it stays below featured.
editor take
OPD locks early into a low-rank update channel; SFT degrades under the same constraint. I buy this over hand-wavy reasoning distillation talk.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
The Geometry of Last-Layer Model Stealing
arXiv:2606.06854 states exact conditions for perfectly copying a transformer network’s final layer. The paper also proves that a hidden network cannot be fully reverse engineered from final outputs alone.
#Safety#Interpretability#arXiv#Research release
why featured
HKR-H/K/R all pass, but this is a single theoretical arXiv paper with no disclosed experiment scale, code, or real API reproduction setup. Model-stealing security is relevant, yet not featured-level.
editor take
2606.06854 gives exact final-layer stealing conditions; the sharper claim is the proof that outputs alone cannot recover hidden layers.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
DiBS: Diffusion-Informed Branch Selection
DiBS uses a diffusion model to order branches for a complete symbolic Sudoku solver, and on the Royle 17-clue benchmark it reduces nodes, backtracks, and long-tail search cost versus strong heuristic baselines.
#Reasoning#DiBS#Research release#Open source
why featured
HKR-H and HKR-K pass: diffusion-guided symbolic search has a concrete mechanism and benchmark metrics. The claim stays on Sudoku, with no production solver or agent transfer result, so it remains interesting but not featured.
editor take
DiBS cuts nodes and backtracks on Royle 17-clue; I buy learned ordering plus completeness, but the snippet omits effect sizes.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Making the Most of Limited Data: Score-Aware Training for Text-to-Music Generation
The authors propose score-aware training for text-to-music generation, using audio-caption alignment scores as supervision; their 450M-parameter FluxAudio-based system ranked 2nd in objective evaluation across both ICME 2026 ATTM tracks and 3rd in the Efficiency Track final MOS evaluation.
#Audio#Fine-tuning#Benchmarking#FluxAudio
why featured
HKR-K is solid with a concrete mechanism and benchmark rank; HKR-R lands on training cost for audio-generation teams. HKR-H is weak, and a single arXiv competition paper stays below featured.
editor take
The 450M FluxAudio-based system took 3rd in the Efficiency Track final MOS evaluation; text-to-music needs cleaner supervision, not bigger private piles.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
TRUE: A Trustworthy Unified Explanation Framework for Large Language Model Reasoning
The paper proposes TRUE, a framework for explaining LLM reasoning through executable reasoning verification, feasible-region DAG modeling, and causal failure mode analysis with Shapley values. Experiments span multiple reasoning benchmarks, while the RSS abstract does not disclose the tested model list, dataset names, or numerical scores.
#Reasoning#Interpretability#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the mechanism mix has substance and maps to reasoning-interpretability concerns. Model names and scores are not disclosed, and HKR-H fails, so this stays in the 60–71 research-signal band.
editor take
TRUE claims a 3-level explanation stack; no models or scores disclosed, so don’t treat “verifiable” as reliability evidence yet.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
The Dual Mechanisms of Spatial Variable Binding in Vision-Language Models
The paper shows VLMs use two mechanisms for spatial variable binding: intermediate language-model layers encode content-independent spatial relations, while the dominant spatial signal comes from vision encoders, with global enhancement across all image tokens improving performance on complex natural images from COCO.
#Multimodal#Vision#Interpretability#COCO
why featured
HKR-K passes: the paper offers a mechanism-level claim and COCO validation for spatial variable binding in VLMs. HKR-H and HKR-R are weak, so this stays in the all band, below featured.
editor take
VLM spatial binding leans on the vision encoder; COCO gains from global image-token enhancement make LM-layer probes the smaller story.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
SecretFan: Synthesizing Realistic Data without Breaking Privacy
SecretFan reframes synthetic data generation as adequacy-guided search-based testing, uses a fuzzer for sample generation and a discriminator for selection, and reports good average utility and similarity scores across eight datasets used in prior evaluations.
#Safety#Benchmarking#SecretFan#Research release
why featured
HKR-K and HKR-R pass: the paper gives a concrete mechanism and 8-dataset evaluation, with privacy-compliance relevance. It is still a single arXiv paper without a major benchmark delta or production proof, so it sits in 60–71.
editor take
SecretFan reports good utility and similarity on 8 datasets; MIA and reconstruction metrics aren’t disclosed, so the privacy claim gets a haircut.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
A Dynamic Self-Evolving Extraction System
DySECT uses an LLM to extract triples into an incremental knowledge base, then feeds graph reasoning, probabilistic knowledge, few-shot examples, or KB-derived synthetic data back into extraction.
#RAG#Reasoning#Fine-tuning#DySECT
why featured
HKR-H and HKR-K pass: the paper names a concrete self-evolving loop for knowledge extraction. With no metrics, datasets, or production-replacement evidence disclosed, it stays in the 60–71 research-release band.
editor take
DySECT loops LLM triple extraction into a KB, but gives no eval numbers; I’m filing this under classic IE with an LLM shell.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization
AAAC replaces fixed 4-bit scalar codebooks with two learned 64-byte scalar codebooks per layer. Each weight group selects the codebook minimizing activation-weighted reconstruction error, stores the choice in an unused sign bit, finishes quantization in 3–30 minutes on one GPU, and adds no memory beyond the model.
#Inference-opt#AAAC#AWQ#GPTQ
why featured
HKR-K and HKR-R pass: the paper gives a concrete mechanism and runtime, and it maps to inference cost. But it is a technical arXiv quantization paper without a major lab release, OSS adoption, or production replacement claim.
editor take
AAAC uses two 64-byte codebooks per layer for 4-bit weights; 3–30 minutes on one GPU is a direct shot at OmniQuant.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
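The per-group codebook choice described in the AAAC item above can be sketched as follows. Assumptions, none confirmed by the snippet: each codebook holds 16 scalar entries (4-bit indices; 64 bytes would match 16 four-byte values), weights snap to the nearest entry, and the activation weighting is a per-weight scale on squared error. All names are hypothetical:

```python
from typing import List, Sequence, Tuple


def quantize_group(weights: Sequence[float],
                   act_scales: Sequence[float],
                   codebook: Sequence[float]) -> Tuple[List[int], float]:
    """Snap each weight to its nearest codebook entry and return
    (indices, activation-weighted squared reconstruction error)."""
    indices: List[int] = []
    err = 0.0
    for w, a in zip(weights, act_scales):
        idx = min(range(len(codebook)), key=lambda i: (w - codebook[i]) ** 2)
        indices.append(idx)
        err += a * (w - codebook[idx]) ** 2
    return indices, err


def select_codebook(weights: Sequence[float],
                    act_scales: Sequence[float],
                    codebooks: Sequence[Sequence[float]]) -> Tuple[int, List[int]]:
    """Pick the codebook with the lowest activation-weighted error for this group."""
    results = [quantize_group(weights, act_scales, cb) for cb in codebooks]
    best = min(range(len(codebooks)), key=lambda k: results[k][1])
    return best, results[best][0]


# Toy 3-entry codebooks for readability; a 4-bit codebook would have 16 entries.
choice, idx = select_codebook([0.09, -0.11], [1.0, 2.0],
                              [[-1.0, 0.0, 1.0], [-0.1, 0.0, 0.1]])
```

With two codebooks, the selection fits in a single bit per group, which is consistent with the summary's note that the choice is stored in an unused sign bit.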
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Generalization of Diffusion Models Arises with a Balanced Representation Space
The paper analyzes memorization and generalization in diffusion models using a two-layer ReLU DAE, proves that spiky representations correspond to memorization while balanced representations correspond to generalization, and validates the pattern on unconditional and text-to-image diffusion models.
#Multimodal#Vision#Interpretability#Research release
why featured
HKR-K is solid: the paper proposes a concrete representation mechanism for diffusion memorization versus generalization. HKR-R lands on IP and safety risk, but HKR-H is weak and the theory-heavy format keeps it below featured.
editor take
A two-layer ReLU DAE links spiky reps to memorization; diffusion leakage checks need representation probes, not just loss curves.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
ChronoForest: Closed-Loop Multi-Tree Diffusion Planning for Efficient Bridge Search and Route Composition
ChronoForest reaches 99.8%, 99.3%, and 99.5% success on the medium, large, and giant OGBench AntMaze-Stitch splits, and improves giant-stitch success by up to 34.5 points over previously reported diffusion-based results.
#Agent#Robotics#Reasoning#ChronoForest
why featured
HKR-H/K pass: the paper gives concrete OGBench success rates and a +34.5 pp giant-stitch gain. HKR-R fails because the work is narrow planning research, so it stays in the 60–71 band.
editor take
ChronoForest hits 99.5% on AntMaze-Stitch giant; diffusion planning’s bottleneck is moving from samples to closed-loop route evidence.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
FIGMA: Towards Fine-Grained Music Retrieval
FIGMA uses a multi-view contrastive architecture for fine-grained music retrieval, with FGMCaps providing 380K training music-caption pairs and a 10K test set annotated for tempo, key, chord progression, beat count, genre, and mood, reaching up to 73.3% relative improvement over CLAP-based baselines.
#Audio#Embedding#Benchmarking#FIGMA
why featured
HKR-K is solid with dataset size, annotation fields, and a 73.3% reported gain. HKR-H and HKR-R are weak: this reads like a normal arXiv paper for audio retrieval and embedding specialists.
editor take
FIGMA beats CLAP baselines by up to 73.3% on FGMCaps; music retrieval is finally punishing lazy first-token-ish alignment.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Skip a Layer or Loop It? Learning Program-of-Layers in LLMs
The paper proposes PoLar, a program-of-layers method that skips or repeats pretrained LLM layers per input. The abstract says it improves mathematical reasoning accuracy over standard inference and prior dynamic-depth methods, but the post does not disclose the tested models, benchmark count, or gain sizes.
#Reasoning#Inference-opt#Research release
why featured
HKR-H and HKR-K pass: PoLar’s per-input layer skipping/looping is a concrete inference idea. Missing models, benchmark count, uplift size, and code keep it in the interesting-but-not-featured band.
editor take
PoLar skips or loops layers per input, but gains are undisclosed; I don’t buy the latent-reasoning claim before reproduction.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
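The program-of-layers idea in the PoLar item above can be pictured as executing a list of layer indices instead of a fixed 0..N-1 pass. The sketch below uses toy scalar functions in place of transformer layers. How PoLar chooses the program per input is not described in the snippet, so the programs here are hand-written and all names are hypothetical:

```python
from typing import Callable, Sequence

Layer = Callable[[float], float]


def run_program(x: float, layers: Sequence[Layer], program: Sequence[int]) -> float:
    """Apply layers in the order given by `program`; an index may be
    omitted (skip) or repeated (loop)."""
    for idx in program:
        x = layers[idx](x)
    return x


# Toy stand-ins for three pretrained layers.
layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]

standard = run_program(1.0, layers, [0, 1, 2])     # ordinary forward pass
skipped = run_program(1.0, layers, [0, 2])         # layer 1 skipped
looped = run_program(1.0, layers, [0, 1, 1, 2])    # layer 1 applied twice
```

The weights stay frozen in all three cases; only the execution order changes.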
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
MidSteer: Optimal Affine Framework for Steering Generative Models
The paper introduces MidSteer, an affine framework for concept manipulation, proves standard behavior removal is a LEACE special case, and evaluates it across vision diffusion models and large language models.
#Alignment#Safety#Multimodal#MidSteer
why featured
HKR-K/R pass: the paper offers a concrete mechanism and cross-model tests, and model control resonates with safety work. HKR-H is weak, with no metrics, code, or production-level practical claim disclosed.
editor take
MidSteer reduces behavior removal to LEACE; closed-form affine steering is auditable, but the snippet hides experiment scale.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Synthetic Benchmarks Overstate Forward-Forward Scaling: Real-Data Limits of Layer-Local Training
DTG-FF sets new FF-family results across nine real-data benchmarks, including 91.8% on CIFAR-10 and the first FF baseline on ImageNet-100 at 224x224, but BP-DeepSup still leads by 2.40 points on CIFAR-10 and DTG-FF reaches only 49.4% at 224x224.
#Benchmarking#Vision#Geoffrey Hinton#Research release
why featured
HKR-H comes from the contrarian claim that synthetic benchmarks overstate FF scaling; HKR-K has 9 real-data benchmarks and accuracy figures. HKR-R is real for benchmark trust, but the layer-local training topic is niche, so it stays in all.
editor take
DTG-FF hits 91.8% on CIFAR-10 but only 49.4% at 224x224; real images and 8GB GPUs puncture the FF pitch.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
GraphWalker: Patient Analogy Meets Information Gain for Clinical Reasoning with Large Language Models
GraphWalker lets frozen LLMs reason by analogy over retrieved patient cases without task-specific parameter updates. The framework combines data-driven and model-driven signals, patient cohort structure, and lazy greedy search with frontier expansion; the abstract says it outperforms demonstration-selection baselines on multiple real-world EHR benchmarks and remains more robust under cross-dataset shift.
#RAG#Reasoning#Agent#GraphWalker
why featured
HKR-K/R pass: the mechanism is concrete and clinical risk gives it relevance. No exact gains, artifact details, or major-lab signal are disclosed, so this stays in all rather than featured.
editor take
GraphWalker keeps LLMs frozen for patient-analogy retrieval; gains aren’t disclosed in the snippet, so verify EHR shift before buying the agentic framing.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Automatic Causal Fairness Analysis with LLM-Generated Reporting
FairMind analyzes dataset-level fairness in a zero-shot setup, computes counterfactual causal effects under the standard fairness model, and uses LLMs to generate reports; the abstract does not disclose benchmark scores or release details.
#Alignment#Safety#FairMind#Plečko
why featured
HKR-K and HKR-R pass: FairMind links causal fairness computation with LLM-generated audit reports. HKR-H is weak, and deployment details are not disclosed, so this stays in the interesting all band.
editor take
FairMind computes counterfactual causal fairness zero-shot; scores and release are undisclosed. I trust closed-form effects, not LLM prose as audit.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling
AdaJudge modifies reward modeling with gated refinement blocks and adaptive multi-view pooling, and the abstract reports stronger results than off-the-shelf reward models and traditional pooling baselines on RM-Bench and JudgeBench.
#Alignment#Benchmarking#AdaJudge#Research release
why featured
HKR-K and HKR-R pass: the post gives mechanisms and benchmarks, but this is a single arXiv method paper with no production replacement, released artifact, or cross-source debate.
editor take
AdaJudge beats off-the-shelf RMs on RM-Bench and JudgeBench; I buy the architecture, but RSS omits margins and release terms.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces
The paper characterizes reasoning on large-label multi-label tasks as two phases: broad shortlisting from hundreds of thousands to millions of candidate labels, then fine-grained reasoning over the shortlist. Using this mechanism, the authors develop a distillation strategy that consistently outperforms standard distillation across multiple datasets, while the RSS snippet does not disclose model names, benchmark scores, or code availability.
#Reasoning#Fine-tuning#Interpretability#Research release
why featured
HKR-K passes because the paper offers a two-stage mechanism and a distillation comparison for large output spaces. HKR-H and HKR-R are weak, and no concrete gain numbers are disclosed, so this stays in all.
editor take
The paper turns a shortlist-then-reason split into a distillation recipe; no scores or model names in RSS, but the angle beats leaderboard theater.
HKR breakdown
hook knowledge resonance
open source
65
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices
SigmaScale learns row and column diagonal scaling matrices from two vector sets, then evaluates SVD-based low-rank LLM compression on Llama 3.1 8B Instruct and Qwen3-8B under perplexity and zero-shot benchmarks.
#Inference-opt#Fine-tuning#Benchmarking#Llama
why featured
HKR-K and HKR-R pass: SVD low-rank compression plus learned scaling matrices is a concrete mechanism and targets inference cost. The post lacks compression ratio, speed, and quality-loss numbers, so it stays in the 60–71 band.
editor take
SigmaScale reports competitiveness on two 8B models; no compression ratio is disclosed, so SVD-compression hype stays capped.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
AI Level of Detail: Distance-Aware ML Model Precision Selection for Real-Time Human Motion Prediction in Games
The paper proposes AI LOD, which routes NPC motion prediction to FP32, FP16, or INT8 ONNX Runtime model variants based on distance from the player camera; evaluation on CMU Mocap reports negligible perceptual degradation within assigned distance ranges.
#Inference-opt#ONNX Runtime#CMU Mocap#arXiv
why featured
HKR-H/K/R pass, but this is a single arXiv systems paper for real-time game motion prediction. No release artifact, product adoption, or cross-source cluster is shown, so it stays in the 60–71 band.
editor take
AI LOD routes FP32/FP16/INT8 by camera distance; neat idea, but CMU Mocap isn’t a frame-budget proof.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R1
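The distance-based routing in the AI LOD item above amounts to a threshold lookup per NPC. A minimal sketch, assuming two distance cut-offs; the threshold values and function name are hypothetical, and loading or running the actual ONNX Runtime model variants is left out:

```python
import bisect
from typing import Sequence

TIERS = ("fp32", "fp16", "int8")  # nearest to farthest from the camera


def select_precision(distance: float,
                     thresholds: Sequence[float] = (10.0, 30.0)) -> str:
    """Return the model-variant tier for an NPC at `distance` from the camera.
    `thresholds` are assumed cut-offs in world units, sorted ascending."""
    return TIERS[bisect.bisect_right(thresholds, distance)]


near = select_precision(5.0)    # closest band uses full precision
mid = select_precision(20.0)
far = select_precision(50.0)    # most distant band uses the INT8 variant
```

Whether such routing fits a frame budget depends on per-tier inference cost in an actual engine, which this sketch does not measure.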
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders
The paper formalizes SAE concept learning as set alignment, defines three learning levels—detection, separation, and approximation—and validates the theory with synthetic ReLU and Top-K SAE experiments that test how SAE size and sparsity affect concept learning.
#Interpretability#Research release
why featured
HKR-K passes: the paper gives a set-alignment frame, three learning levels, and ReLU/Top-K synthetic tests. HKR-H and HKR-R are weak, so this stays in all rather than featured.
editor take
The paper splits SAE concept learning into 3 levels, but tests only synthetic ReLU/Top-K; I buy the frame, not the generalization.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
SCALE: Scalable Cross-Attention Learning with Extrapolation for Agentic Workflow Scheduling
SCALE trains on 16 nodes and tests directly on 32 and 48 nodes, using Structured Representation Regularization to stabilize attention feature statistics; at N=48, it reduces average response time by 8.9% versus the same cross-attention pointer architecture without SRR.
#Agent#Reasoning#SCALE#Research release
why featured
HKR-K/R pass: SRR, 16→48-node extrapolation, and 8.9% latency reduction are concrete, and agent scheduling costs matter. HKR-H is weak; as a single arXiv paper without adoption or code signal, it fits the 60–71 band.
editor take
SCALE trains on 16 nodes and tests at 48, cutting latency 8.9%; good problem, but beating its own no-SRR ablation is thin.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
ADAGE: Active Defenses Against GNN Extraction
ADAGE monitors GNN query diversity and progressively perturbs outputs as accumulated leakage grows. The paper evaluates it on six benchmark datasets, four GNN models, and three adaptive attacker types, reporting that it blocks common extraction setups while preserving downstream predictive performance.
#Safety#Benchmarking#ADAGE#Research release
why featured
HKR-K passes with a concrete mechanism and test scale; HKR-R passes on model stealing and IP security. HKR-H is weak, and GNN defense is too niche for featured.
editor take
ADAGE keys perturbation to query diversity across 6 datasets, 4 GNNs, 3 attacker types; “impossible to steal” needs code, not trust.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
OPTIMUS-Prime: Minimal and Sufficient Concept Explanations for Deep Vision Models
OPTIMUS generates concept-based heatmaps for deep classification models, using prime implicants to guarantee sufficiency and minimality; the paper says it validates the method on a visual classification benchmark, but the snippet does not disclose the benchmark name.
#Vision#Interpretability#Benchmarking#Research release
why featured
HKR-K passes: prime implicants provide sufficiency and minimality guarantees for concept heatmaps. HKR-H/R are weak; the post mentions only a visual classification benchmark, with no benchmark name or deployment evidence.
editor take
OPTIMUS adds sufficiency and minimality guarantees via prime implicants; benchmark details are undisclosed, so don’t crown it saliency’s killer yet.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
pTNAS: Progressive Neural Architecture Search for Tabular Data
pTNAS searches tabular neural architectures with a filter-and-refine NAS pipeline, using the zero-cost pTProxy for initial filtering and fixed-budget scheduling for refinement; experiments report up to 82.75x less time to reach the globally best architecture versus other NAS methods and up to 4.78x higher end-to-end efficiency than TabPFN.
#Benchmarking#Inference-opt#TabPFN#Research release
why featured
HKR-K passes with a concrete mechanism and speed claims, making it useful research-feed signal. HKR-H and HKR-R are weak: tabular NAS is narrow and not featured-level for this audience.
editor take
pTNAS reports 82.75x faster tabular architecture search; I buy the efficiency angle, but TabPFN task scale is undisclosed.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Local Guidance, Global Impact: Gaussian-Reshaped Trust Region Unlocks Behavior Transitions
The paper proposes Gaussian Trust Region Policy Optimization, which reshapes PPO’s trust region with a Gaussian kernel; the released code accompanies experiments across games, simulated robotic control, open-world exploration, and language model post-training.
#Agent#Robotics#Fine-tuning#Research release
why featured
HKR-K passes: GTR provides a testable PPO trust-region mechanism, public code, and experiments across games, robotics, open-world tasks, and LLM post-training. HKR-H/R are weak, so this stays in all.
editor take
GTR reshapes PPO’s trust region with a Gaussian kernel; the non-monotonic constraint is sharp, but baselines and LM details are undisclosed.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Architecturally Significant MLOps Guidelines for ML Model Integration and Deployment
The paper reviews 103 web sources and synthesizes 25 architecturally significant MLOps guidelines for ML model integration and deployment, grouping them into five categories and describing their impact on overall system architecture.
#Fine-tuning#arXiv#Research release
why featured
HKR-K has concrete counts and categories, and HKR-R maps to model-deployment pain. HKR-H is weak, and this is a review paper rather than a same-day industry trigger.
editor take
103 web sources yielded 25 MLOps guidelines; useful as a checklist, weak as architecture guidance without validation.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees
InvEvolve uses a reinforcement-learning-trained LLM to generate white-box inventory policies for online non-stationary demand, applies confidence-interval-based certification for statistical safety guarantees, and reports stronger performance than classical inventory policies and deep-learning methods on synthetic and real-world retail data.
#Agent#Reasoning#Safety#InvEvolve
why featured
HKR-H and HKR-K pass: the paper offers LLM-generated white-box policies with performance guarantees and retail-data tests. HKR-R is weak because inventory optimization is a narrow OR topic for AI practitioners.
editor take
InvEvolve adds confidence-interval certification to inventory policies; I buy the white-box angle, but margins are not disclosed.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Certified Robustness to Data Poisoning in Gradient-Based Training
The paper presents a certification framework that does not modify the model or learning algorithm, using convex relaxations to over-approximate reachable parameters under poisoning threat models for gradient-based training.
#Safety#Alignment#Research release#Safety/alignment
why featured
HKR-K and HKR-R pass: the paper states a concrete certification mechanism and targets training-time poisoning risk. HKR-H is weak, and the post lacks scale, benchmarks, or code, so it stays mid-band.
editor take
This certifies poisoning robustness for gradient-based training across targeted, untargeted, and backdoor attacks; no scale disclosed, so LLM training claims wait.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Learning Explicit Behavioral Models with Adaptive Questions and World-Model Probes
Hikaru Shindo and seven coauthors introduce ESBM, a behavioral model using typed predicates, weighted rules, bounded options, and mechanism memory. After each Atari-style rollout, adaptive questions and world-model probes convert QA and transition-prediction errors into local edit constraints.
#Agent#Reasoning#Interpretability#Hikaru Shindo
why featured
HKR-K passes because ESBM gives a concrete modeling mechanism, converting QA and transition errors into local edit constraints. HKR-H and HKR-R are weak: the angle is academic, and Atari rollouts are distant from production agent pain points.
editor take
ESBM edits rules after each rollout using QA and transition errors; I buy the supervision signal, not the Atari-to-agent leap.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Calibrating Uncertainty for Zero-Shot Adversarial CLIP
The paper proposes an adversarial fine-tuning objective for CLIP that reparameterizes outputs as Dirichlet concentration parameters, aligning distributions under perturbations and reporting improved uncertainty calibration with competitive adversarial robustness across multiple zero-shot benchmarks while preserving clean accuracy.
#Vision#Fine-tuning#Safety#CLIP
why featured
HKR-K passes: the method is concrete and claims better calibration across zero-shot benchmarks while preserving clean accuracy. HKR-H and HKR-R are weak; no code, effect size, or production setting is disclosed.
editor take
Only the abstract is available; no benchmark counts disclosed. Dirichlet calibration for adversarial CLIP is plausible, but tables decide.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R0
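A minimal sketch of what "outputs as Dirichlet concentration parameters" can mean for the adversarial CLIP item above. The softplus-plus-one mapping and the `K / sum(alpha)` uncertainty follow the common evidential-learning convention and are assumptions here; the abstract does not disclose the paper's exact parameterization or loss. Function names are hypothetical.

```python
import math

def dirichlet_from_logits(logits):
    """Map logits to Dirichlet concentrations via softplus(logit) + 1 (assumed mapping)."""
    return [math.log1p(math.exp(z)) + 1.0 for z in logits]

def summarize(alpha):
    """Return expected class probabilities and the uncertainty mass K / sum(alpha)."""
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    uncertainty = len(alpha) / strength
    return probs, uncertainty

# Hypothetical zero-shot logits for three classes.
confident_probs, confident_u = summarize(dirichlet_from_logits([6.0, 0.0, -2.0]))
vague_probs, vague_u = summarize(dirichlet_from_logits([-3.0, -3.0, -3.0]))
print(round(confident_u, 3), round(vague_u, 3))
```

Low total evidence pushes the uncertainty toward 1, which is the quantity a calibration objective could try to keep stable under perturbation.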
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
REMEDI: A Benchmark for Retention and Unlearning Evaluation in Multi-label Clinical Disease Inference
The authors introduce REMEDI, a machine-unlearning benchmark for clinical disease inference built on the MIMIC-III clinical database. It covers multi-label and multiclass tasks, diverse forget-instance setups, and metrics for both retained utility and achieved unlearning, while experiments show existing methods trade off utility against forgetting and fit multi-label classification poorly.
#Benchmarking#Safety#REMEDI#MIMIC-III
why featured
HKR-K is clear: REMEDI defines a MIMIC-III clinical unlearning benchmark, and HKR-R lands on privacy/compliance. The work is still a narrow research benchmark with weak HKR-H, so it stays in all.
editor take
REMEDI tests clinical unlearning on MIMIC-III; I buy the direction, since utility collapse in multi-label disease tasks is the hard part.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
TEVI: Text-Conditioned Editing of Visual Representations via Sparse Autoencoders for Improved Vision-Language Alignment
TEVI trains a masking module over sparse-autoencoder image embeddings to reconstruct CLIP representations conditioned on captions, improving retrieval on MS COCO, Flickr, IIW, and DOCCI, with stronger gains for richer captions and better robustness on RoCOCO.
#Vision#Multimodal#Embedding#CLIP
why featured
HKR-K passes via a concrete mechanism and MS COCO, Flickr, IIW, DOCCI, and RoCOCO evals. HKR-H/R are weak, and gains are not disclosed, so this stays browseable research signal.
editor take
TEVI filters CLIP image embeddings with captions; gains are undisclosed, so I’d file it as retrieval post-processing for now.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
CF-JEPA: Mask-free forward prediction with asymmetric encoder utilization for time-series representation learning
CF-JEPA replaces masking with multi-horizon forward prediction for time-series representation learning, using random crops as context views and predicting short-, mid-, and long-horizon future representations. Across 126 UCR and 26 UEA classification datasets, eight electricity transformer forecasting benchmarks, and KPI/Yahoo anomaly detection, it leads self-supervised baselines on UCR/UEA and reduces multivariate forecasting MSE by 27%.
#Benchmarking#University of California, Riverside#University of East Anglia#Yahoo
why featured
HKR-K passes with a concrete CF-JEPA mechanism, 152 benchmark datasets, and a 27% MSE reduction. HKR-H/R are weak because this is a narrow time-series representation paper, not a broad model or product story.
editor take
CF-JEPA leads on 152 classification sets; the online/EMA split is the sharp bit, with 27% lower MSE for free.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
AdaGRPO: A Capability-Aware Adaptive Enhancement for Flow-based GRPO
AdaGRPO adds two components to improve GRPO training for T2I flow models. It selects prompts through online curriculum filtering and fuses intra-group and global advantage estimates.
#Alignment#Fine-tuning#Research release
why featured
HKR-K passes because the summary names two testable mechanisms in AdaGRPO. HKR-H and HKR-R are weak: the title is academic, no result number is disclosed, and the topic is niche T2I post-training.
editor take
AdaGRPO discloses 2 training components, not metrics; I’d treat it as a Flow-GRPO patch, not a new T2I RL lane.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
GlucoFM-Bench: Benchmarking Time-Series Foundation Models for Blood Glucose Forecasting
GlucoFM-Bench evaluates eight architectures for blood glucose forecasting across 15 public diabetes-related datasets covering 1,117 people, and the best zero-shot model performs within 5% of the best full-shot supervised model.
#Benchmarking#GlucoFM-Bench#Chronos-2#TimesFM
why featured
HKR-K passes with concrete benchmark scale and a testable zero-shot claim. HKR-H and HKR-R are weak because medical time-series forecasting is vertical and not a broad AI-practitioner conversation starter.
editor take
GlucoFM-Bench covers 1,117 people; Chronos-2 lands within 5% zero-shot, but full-data LSTM wins by 4–21%.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Limitations of Normalization in Attention Mechanism
The paper analyzes limits of softmax normalization in attention and validates the theory with pre-trained GPT-2 experiments: as the number of selected tokens increases, the model’s ability to distinguish informative tokens declines, and low-temperature settings create gradient-sensitivity challenges during training.
#Reasoning#Interpretability#GPT-2#Research release
why featured
HKR-K passes: the paper names concrete softmax-attention failure conditions and tests them on pre-trained GPT-2. HKR-H and HKR-R stay weak, so this remains an all-tier research item.
editor take
GPT-2 tests show selected-token growth dilutes attention selectivity; the useful bit is testable softmax bounds, not the diagnosis.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
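A toy illustration of the softmax-normalization item above, not the paper's analysis: one informative token with a fixed score gap loses attention weight as more competing tokens enter the sum, and a lower temperature sharpens it again. The gap, token counts, and temperature are made-up values.

```python
import math

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of attention scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def informative_weight(n_tokens, gap=2.0, temperature=1.0):
    """Attention weight on one token whose score beats the other n_tokens - 1 by `gap`."""
    scores = [gap] + [0.0] * (n_tokens - 1)
    return softmax(scores, temperature)[0]

for n in (8, 64, 512):
    # Columns: token count, weight at temperature 1.0, weight at temperature 0.25.
    print(n, round(informative_weight(n), 4), round(informative_weight(n, temperature=0.25), 4))
```

The sharper low-temperature distribution is also where the summary's gradient-sensitivity concern arises, since small score changes then move the weights a lot.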
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Direct 3D-Aware Object Insertion via Decomposed Visual Proxies
The paper introduces DIRECT, a framework that decomposes object-insertion conditions into three separate pathways—appearance, geometry, and context—so users can adjust a 3D proxy to control pose, while experiments report better geometric controllability and visual quality than prior methods.
#Vision#Multimodal#DIRECT#Research release
why featured
HKR-K passes: DIRECT gives a testable mechanism via 3-way condition decomposition and 3D proxy pose control. HKR-H and HKR-R are weak; this is a single arXiv vision method without product or market spread yet.
editor take
DIRECT splits insertion into 3 pathways; it’s cleaner control than 2D inpainting, but the snippet hides the metrics.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
TrioPose: Native Triple-Stream Diffusion Transformers for Pose-Guided Text-to-Image Generation
TrioPose builds a TSPA-DiT triple-stream pose-aware architecture on SD3.5M and reports 64.33 AP on Human-Art, a 30% improvement over prior methods.
#Multimodal#Vision#TrioPose#SD3.5M
why featured
HKR-K passes with a named architecture and Human-Art AP result; HKR-H/R are weak. This is a niche vision-generation paper with no hard exclusion, so it sits in the interesting-but-not-featured band.
editor take
TrioPose hits 64.33 AP on Human-Art; treating pose as its own stream beats another brittle DiT conditioning hack.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Accelerating Reproducible Research in Synthetic EHR Generation
The paper introduces a synthetic EHR benchmarking framework that unifies data ingestion, model training, and evaluation, covering five baselines: MedGAN, CorGAN, PromptEHR, HALO, and GPT-2.
#Benchmarking#PyHealth#MedGAN#GPT-2
why featured
HKR-K passes: the framework unifies ingestion, training, and evaluation across MedGAN, CorGAN, PromptEHR, HALO, and GPT-2. HKR-H and HKR-R are weak, so this stays browseable rather than featured.
editor take
This framework unifies 5 synthetic EHR baselines; it targets ICD-9 diagnosis codes, so don’t sell it as broad medical generation eval.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Learning All-Terrain Locomotion for a Planetary Rover with Actively Articulated Suspension
ERNEST uses one neural-network controller to drive a four-wheeled rover with a 2-DoF Active Gimbal Suspension, trained in DARTS with rigid-contact dynamics and Bekker-Wong terramechanics; on a 20° dry sandy slope, the learned controller cuts cost of transport by 37%, while the passive suspension becomes immobilized on wet sand.
#Robotics#Agent#Research release
why featured
Niche robotics paper: HKR-H has the planetary-rover active-suspension hook, HKR-K gives a 37% transport-cost result on a 20° dry-sand slope. HKR-R is weak because it lacks a broad AI tooling or market stake.
editor take
ERNEST cuts transport cost 37% on a 20° dry sand slope. I buy this: one less terrain classifier, one less rover failure mode.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Predictive Statistics Shape Emergent World Representations of Grid Walkers
The authors train decoder-only transformers and recurrent networks on constrained random walks over a two-dimensional lattice, finding that the first attention block extracts a sufficient statistic while later layers convert it into next-step predictive geometry.
#Reasoning#Interpretability#Research release
why featured
HKR-K passes via a concrete toy-model mechanism in Transformers/RNNs. HKR-H and HKR-R are weak, so this is useful research-feed signal but below featured.
editor take
On 2D endpoint walks, the first Transformer attention block reads sufficient statistics; narrow toy setup, cleaner than world-model handwaving.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Textual Supervision Enhances Geospatial Representations in Vision-Language Models
The paper evaluates ViT, CLIP, LLaVA, Qwen, and Gemma model families across image clusters such as people, landmarks, and everyday objects grouped by localizability, and finds that textual supervision improves geospatial representations.
#Multimodal#Vision#Benchmarking#CLIP
why featured
HKR-K passes because the paper adds a cross-family VLM geospatial evaluation and a textual-supervision claim. HKR-H/R are weak: no metric, artifact, or product path is disclosed, so this stays a narrow research item.
editor take
The paper tests ViT, CLIP, LLaVA, Qwen, and Gemma; I want leakage controls, not another language-helps-geo claim.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
arXiv:2604.10098v2 surveys Attention Sink in Transformers across three dimensions: fundamental utilization, mechanistic interpretation, and strategic mitigation. The abstract says Attention Sink concentrates attention on small uninformative token subsets, affects training and inference dynamics, and worsens hallucinations; the survey also links a related paper list on GitHub.
#Interpretability#Inference-opt#Safety#arXiv
why featured
HKR-K passes: the three-part survey taxonomy is useful for attention-sink work tied to long-context and inference behavior. HKR-H/R are weak, and it is an arXiv survey without a new model, dataset, or production result.
editor take
Attention Sink survey groups work into 3 tracks; I don’t buy the “first survey” pitch, but the GitHub list is useful for long-context inference.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Bootstrap Theory of Representational Emergence: Explanatory Insufficiency as a Driver of Representation Learning and World Models
arXiv:2606.07303 introduces TBER, a framework that formalizes representational transition into five stages: stabilized observation, anomaly detection, explanatory insufficiency, representational emergence, and provisional stabilization.
#Reasoning#Memory#Research release
why featured
HKR-K passes because the post gives a new TBER framing and five stages. HKR-H and HKR-R are weak: the title is academic, and there is no product, benchmark, or industry conflict, so it fits the 60–71 research band.
editor take
TBER offers a 5-stage representation-transition frame, but no experiments are disclosed; smells like theory scaffolding, not a world-model roadmap.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Aumann-SHAP: The Geometry of Counterfactual Interaction Explanations in Machine Learning
The paper introduces Aumann-SHAP, which discretizes a counterfactual hypercube into a micro-player cooperative game; on German Credit, interaction geometry changes feature-priority rankings in 12.3% of instances.
#Interpretability#Benchmarking#UCI#Research release
why featured
HKR-K passes with a concrete mechanism and a 12.3% result; HKR-H and HKR-R are weak because the angle is academic and validated on one dataset. Useful but narrow interpretability research, so tier all.
editor take
Aumann-SHAP flips 12.3% of German Credit rankings; attribution methods are finally treating interaction geometry as first-class.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training
The paper proposes NMP-QAT, where each neuron learns a discrete precision during training. Evaluations cover telecom and non-telecom datasets across MLP and tabular foundation-model architectures, but the abstract does not disclose exact compression ratios or accuracy numbers.
#Inference-opt#Fine-tuning#Research release
why featured
HKR-K passes because neuron-level mixed-precision QAT is a concrete mechanism for inference optimization. HKR-H and HKR-R are weak: no compression, accuracy, code, or deployment result is disclosed, so this stays in the lower all band.
editor take
NMP-QAT learns discrete precision per neuron, but the abstract gives no compression or accuracy numbers; discount the 6G-edge framing.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Building Better Activation Oracles
The paper improves Activation Oracle training in four areas and open-sources AObench; capability gains are marginal, while quality-of-life improvements are substantial.
#Interpretability#Benchmarking#AObench#Research release
why featured
HKR-K passes via AObench and four training-stage changes. HKR-H/R are weak because activation-oracle work is narrow interpretability tooling, so this stays in all rather than featured.
editor take
The paper tweaks AO training in 4 places and ships AObench; small capability gain, useful interpretability plumbing.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Twin: Tuning Learning Rate and Weight Decay of Deep Homogeneous Classifiers without Validation
Twin selects learning rate and weight decay without validation data by using training loss in the non-separable regime and parameter norm in the separable regime, reporting 1.28% mean absolute error versus an Oracle test-accuracy selector across 37 image-classification dataset-architecture configurations.
#Fine-tuning#Benchmarking#Twin#Research release
why featured
HKR-K passes with a concrete no-validation tuning method and a 37-configuration result. HKR-H is weak and HKR-R is narrow, so this stays in the lower all tier rather than featured.
editor take
Twin is 1.28% off Oracle across 37 image setups; I don’t buy validation-free tuning beyond homogeneous classifiers yet.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Towards Efficient and Exact Forgetting Services in Pre-Trained-Model-based Continual Learning
The paper proposes Analytic Continual Unlearning for PTM-based continual learning, deriving gradient-free closed-form least-squares updates for each unlearning request. ACU supports both sample-level and class-level forgetting, while the abstract claims gains in unlearning effectiveness, model fidelity, and system efficiency without disclosing benchmark numbers in the snippet.
#Fine-tuning#Interpretability#Safety#Research release
why featured
HKR-K comes from the ACU mechanism, and HKR-R from privacy/compliance pressure. The item stays at abstract level: no benchmark numbers, artifact, or production replacement claim, so it lands in the lower research-signal band.
editor take
ACU uses closed-form least squares for continual unlearning; no benchmark numbers are disclosed, so don’t treat “exact forgetting” as deployable yet.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R1
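Why closed-form least squares makes exact forgetting plausible, as a generic sketch rather than ACU's actual update (which the snippet does not give): a ridge solution depends on the data only through additive sufficient statistics, so subtracting one sample's contribution matches retraining without it. The 1-D model, the `RidgeStats` class, and the data are hypothetical.

```python
class RidgeStats:
    """Sufficient statistics for 1-D ridge regression y ≈ w * x."""

    def __init__(self, lam=1.0):
        self.a = lam   # lam + sum of x * x
        self.b = 0.0   # sum of x * y

    def learn(self, x, y):
        self.a += x * x
        self.b += x * y

    def unlearn(self, x, y):
        # Gradient-free removal: subtract the sample's contribution.
        self.a -= x * x
        self.b -= x * y

    @property
    def weight(self):
        return self.b / self.a

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

full = RidgeStats()
for x, y in data:
    full.learn(x, y)
full.unlearn(*data[1])            # forget the second sample

retrained = RidgeStats()
for i, (x, y) in enumerate(data):
    if i != 1:
        retrained.learn(x, y)

print(abs(full.weight - retrained.weight) < 1e-9)
```

The open question the editor take raises still stands: whether this exactness survives when the features come from a large pre-trained backbone.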
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
SERNF: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows
SERNF fine-tunes real-world dexterous manipulation policies with normalizing flows and action-chunked critics, using exact likelihoods for multimodal action chunks and evaluating two hardware tasks: cutting tape with scissors retrieved from a case and palm-down in-hand cube rotation.
#Robotics#Fine-tuning#Research release
why featured
HKR-K passes because the method and two real-world tasks are concrete. HKR-H and HKR-R are weak: this is a specialized robot-learning paper, not a broad product, open-source, or benchmark event.
editor take
SERNF reports 2 hardware tasks; exact likelihoods for action chunks make conservative dexterous fine-tuning less hand-wavy.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Performance Variation in Deep Reinforcement Learning
The paper proposes min-max IPR and run-wise percentile highlighting to evaluate run-to-run variation in deep reinforcement learning, using three case studies covering PPO, SAC, TD-MPC, TD-MPC2, DQN, and Rainbow.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes with two evaluation mechanisms and 3 cases. HKR-H and HKR-R are weak because the story stays in DRL reproducibility, far from mainstream AI product or model competition.
editor take
Three case studies target RL run variance; I buy the angle: mean CIs have hidden PPO/SAC reproducibility pain for too long.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
A machine-learning-assisted progressive digit-randomness screening framework for detecting non-random patterns in raw numerical research data
Zhuphua Cao proposed FDRS, a digit-randomness screening framework for raw numerical research data, and evaluated it on RawData with n=253 and ErrData with n=255; Elastic-net Logistic Regression reached an AUC of 0.98395, while Random Forest reached 0.926667 accuracy.
#Benchmarking#Zhuphua Cao#arXiv#Research release
why featured
HKR-K passes with a named framework, dataset sizes, and AUC. HKR-H and HKR-R are weak: this is research-data auditing, not an AI product, model-capability, or industry-competition story; no hard exclusion applies.
editor take
FDRS hits 0.98395 AUC on 253/255 samples; I worry less about the model than its misuse as misconduct proof.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference
MACS addresses the straggler effect in multimodal MoE expert-parallel inference with a training-free framework, using two mechanisms: entropy-weighted load for visual-token semantic value and dynamic modality-adaptive capacity for real-time modal composition.
#Multimodal#Inference-opt#MACS#Research release
why featured
A niche multimodal MoE inference paper: HKR-K comes from two concrete mechanisms, and HKR-R from cost/latency pain. No throughput or latency numbers are disclosed, and technical depth keeps it below 60.
editor take
MACS discloses 2 training-free mechanisms but no speedup number; multimodal MoE inference still bleeds at EP stragglers.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R1
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
OffQ: Taming Structured Outliers in LLM Quantization by Offsetting
OffQ uses top-1 PCA to identify a low-dimensional activation outlier subspace, rotates high-magnitude activations into 1 channel, and converts that channel into a shared offset to support W4A4KV4 uniform-grid quantization.
#Inference-opt#OffQ#Research release
why featured
HKR-K and HKR-R pass: the piece names a concrete quantization mechanism and W4A4KV4 target. HKR-H fails; no accuracy, throughput, or memory numbers are disclosed, and the technical bar keeps it in the lower interesting band.
editor take
OffQ funnels outlier activations into 1 channel, then offsets it; if W4A4KV4 holds, mixed precision loses an excuse.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R1
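A rough illustration of the last step in the OffQ summary above, the "shared offset". This is not OffQ's method: the PCA and rotation stages are skipped, and the activations are made up. It only shows why removing a large shared component before a 4-bit uniform grid can cut rounding error, assuming the offset itself is stored in higher precision.

```python
def quantize_uniform(values, bits=4):
    """Symmetric uniform quantization on a grid scaled by the largest magnitude."""
    levels = 2 ** (bits - 1) - 1                      # 7 positive levels for 4 bits
    scale = max(abs(v) for v in values) / levels or 1.0
    return [round(v / scale) * scale for v in values]

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical activations: small variation riding on a large shared offset.
acts = [50.0 + d for d in (-0.9, -0.4, 0.0, 0.3, 0.8)]
offset = sum(acts) / len(acts)

plain = quantize_uniform(acts)
shifted = [q + offset for q in quantize_uniform([a - offset for a in acts])]

print(mse(acts, plain) > mse(acts, shifted))
```

Without the offset, the grid spends its few levels covering the range up to the outlier magnitude, so the small variation collapses onto one level.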
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Lighting-Aware Representation Learning under Controllable Lighting Variation
The paper proposes a lighting-aware representation learning framework that uses illumination variation as an explicit training signal. It evaluates image classification and object detection on ImageNet, ExDark, and PASCAL VOC, reporting gains over standard contrastive learning baselines under the same architecture and training budget.
#Vision#Benchmarking#arXiv#ImageNet
why featured
HKR-K passes: it gives a concrete training mechanism and ImageNet, ExDark, PASCAL VOC evaluation settings. HKR-H/R are weak, and the post gives no gain numbers, so this stays in all.
editor take
Lighting-aware loss wins on three vision benchmarks; no gain sizes disclosed, so I’d treat it as a low-light robustness patch.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning
ULPS integrates a calibrated BERT-based language model into PPO training, using A*-generated symbolic trajectories and Monte Carlo dropout uncertainty, and reports over 9% execution-accuracy improvement after fine-tuning on MiniGridUnlockPickup.
#Agent#Reasoning#Fine-tuning#arXiv
why featured
HKR-K passes via a testable setup, mechanism, and >9% gain. HKR-H/R miss; this is a niche RL paper rather than a product, open-source framework, or broad agent update.
editor take
ULPS gains over 9% on MiniGridUnlockPickup; I don’t buy the LLM-guided framing, since BERT trained on A* smells like distilled control.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
An Adaptive Data Cleaning Framework for Noisy Label Detection
The paper proposes an adaptive data-cleaning framework that detects noisy labels using local, global, and learning-dynamics features; on ImageNet-100 with 40% symmetric label noise, it reports recall of at least 98%.
#Benchmarking#Research release#Benchmark
why featured
HKR-K has a concrete mechanism and ImageNet-100 result; HKR-R touches data-quality pain for training teams. HKR-H is weak, and this is a single arXiv paper without code or production evidence, so it stays in the upper low-value research band.
editor take
The framework hits ≥98% recall on ImageNet-100 at 40% symmetric noise; I want precision, because high-recall cleaners often purge hard samples too.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R1
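The precision question in the noisy-label item above can be made concrete with a small sketch. The sample counts are hypothetical and the detector is a deliberately bad one (flag everything), not the paper's framework.

```python
def detection_metrics(is_noisy, flagged):
    """Precision and recall of a noisy-label detector from boolean lists."""
    tp = sum(1 for n, f in zip(is_noisy, flagged) if n and f)
    fp = sum(1 for n, f in zip(is_noisy, flagged) if not n and f)
    fn = sum(1 for n, f in zip(is_noisy, flagged) if n and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 100 hypothetical samples, 40 with corrupted labels (a 40% noise rate).
is_noisy = [True] * 40 + [False] * 60
flag_everything = [True] * 100

print(detection_metrics(is_noisy, flag_everything))  # (0.4, 1.0)
```

Perfect recall is reachable while discarding every clean sample, so a recall figure alone does not say how much good data the cleaner removes.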
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Principles and Practice of Deep Representation Learning: or a Mathematical Theory of Memory
arXiv:2606.06624 releases a nine-chapter book manuscript on deep representation learning. It frames large deep networks through representation learning, optimization, and information theory, then discusses interpretable and controllable model design.
#Interpretability#Memory#arXiv#Research release
why featured
HKR-H passes because the title has a “mathematical theory of memory” hook. HKR-K and HKR-R are weak: the post gives scope only, with no new mechanism, experiment, or industry impact.
editor take
arXiv posted a 9-chapter manuscript on representation learning; I’d audit Chapters 2-6 before buying the “undergrad math” claim.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H1·K0·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Forecasting as Rendering: A 2D Gaussian Splatting Framework for Time Series Forecasting
TimeGS reframes time series forecasting as 2D generative rendering, adds MB-GKG and MP-CCR blocks, and reports state-of-the-art or competitive results on standard benchmark datasets.
#Benchmarking#TimeGS#Research release#Open source
why featured
HKR-H and HKR-K pass via the unusual rendering angle and named mechanisms, but HKR-R is weak. This is a niche methods paper, far from agents, products, or flagship model updates, so it stays in the 40–59 band.
editor take
TimeGS casts forecasting as 2D Gaussian rendering; SOTA is claimed on standard benchmarks, but datasets and error tables are undisclosed here.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H1·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Synthics: Synthetic Physics-like Datasets for Machine Learning
Jari Vepsäläinen presents Synthics, a Bayesian probabilistic context-free grammar method for generating physics-like synthetic regression datasets, matching the Feynman equation corpus on all 8 studied structural features and selecting the 6th-best configuration out of 20 in a downstream gradient-boosted regressor tuning task.
#Benchmarking#Jari Vepsäläinen#Research release
why featured
HKR-K passes for a testable generator and 8 matched structural features, while HKR-H and HKR-R fail. The physics-like regression benchmark is useful to a niche ML audience, with no product, agent, or market impact.
editor take
Synthics matches Feynman on 8 structural features; I buy the direction, but 20 tuning configs don’t prove transfer.
HKR breakdown
hook knowledge resonance
open source
54
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
WAV: Multi-Resolution Block Residual Routing for Deep Decoder-Only Transformers
Kehan Wang proposes WAV v1, adding phase and split detail bases to block residual summaries in decoder-only Transformers; at 48 layers, it reduces TinyStories validation loss from 0.4960 to 0.4738 versus Block AttnRes, while the 12-layer setting is not consistently better.
#Reasoning#Inference-opt#Kehan Wang#arXiv
why featured
HKR-K passes via a concrete mechanism and TinyStories metric; HKR-H/R do not. The work is a niche transformer-architecture paper with limited practitioner pull, so it stays in the low-value research band.
editor take
WAV v1 cuts 48-layer TinyStories loss to 0.4738; I’d file it as a residual-routing trick, since 12-layer gains fail.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Uncertainty-Guided Label Rebalancing for CPS Safety Monitoring
U-Balance rebalances CPS telemetry labels using behavioral uncertainty, relabeling high-uncertainty safe windows as unsafe; on a UAV benchmark with a 46:1 safe-to-unsafe ratio, it reaches a 0.806 F1 score and beats the strongest baseline by 14.3 percentage points.
#Safety#Benchmarking#U-Balance#GatedMLP
why featured
HKR-K passes with a concrete mechanism and UAV benchmark numbers. HKR-H/R miss: this reads like a narrow arXiv method paper, not a broadly resonant AI product or model story.
editor take
U-Balance hits 0.806 F1 on 46:1 UAV data; relabeling uncertain safe windows works, but label trust becomes the attack surface.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Federated Foundation Models over Vehicular Networks
The paper proposes M3T FedFMs for vehicular networks, evaluates a case study on the Waymo Open Dataset, and releases implementation code in a GitHub repository for reproducibility.
#Multimodal#Fine-tuning#Waymo#Research release
why featured
HKR-K passes via a named method, dataset case study, and code release; HKR-H/R are weak because the angle is niche vehicular FL. No hard exclusion, so it lands as a low-mid research release.
editor take
M3T FedFMs runs a Waymo Open Dataset case study and releases code; the vehicle-side FL bandwidth cost is undisclosed.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis
LoRA-DA derives a data-aware LoRA initialization from an objective with bias and variance terms, using Fisher-gradient approximation and Fisher information; the abstract says it improves final accuracy across multiple benchmarks, but the snippet does not disclose exact scores.
#Fine-tuning#Benchmarking#LoRA-DA#Research release
why featured
HKR-K passes for a new LoRA initialization mechanism; HKR-H/R are weak because no accuracy numbers, code status, or reproducible setup are disclosed. Technical but relevant to fine-tuning, so it stays in all.
editor take
LoRA-DA initializes LoRA with Fisher terms, but no scores are disclosed; I buy the theory, not the win yet.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Learning Fair Demand Models
The paper studies fairness in a two-stage pricing pipeline with linear demand estimation followed by price optimization. It compares fairness constraints on training loss, prices, and demand under parity-wise and Rawlsian views, then tests the model with a real-world vaccine pricing case study.
#Alignment#Research release#Safety/alignment
why featured
HKR-K passes because the paper adds three fairness-constraint placements and a vaccine pricing case. HKR-H and HKR-R are weak: the title is academic, and the post gives no product deployment or industry conflict, so this stays in the lower research band.
editor take
The paper shows loss parity gives multiple optima; in pricing systems, fairness-in-the-loss is the lazy, dangerous fix.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
TargetSEC: Plug-and-Play In-the-Wild Speech Emotion Conversion via Arousal-Conditioned Latent Style Diffusion
TargetSEC generates emotion-focused style embeddings with latent diffusion conditioned on speaker identity and continuous emotion, and experiments on MSP-Podcast show higher conversion accuracy than non-duration baselines while matching duration-prediction systems without explicit temporal modeling.
#Audio#TargetSEC#MSP-Podcast#Research release
why featured
HKR-K passes via a concrete dataset and modeling mechanism. HKR-H/R are weak: this is narrow audio research with no product path or broader industry pressure, so it stays in the low-value research band.
editor take
TargetSEC beats non-duration MSP-Podcast baselines; matching duration-prediction systems without temporal modeling is the sharp claim.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Bias in Filter Feature Selection Evaluation: A Meta-Analysis of Datasets, Baselines, and Experimental Design Choices
The paper analyzes 28 high-profile filter feature selection studies published from 1994 to 2025. A multivariate linear regression using dataset count, baseline count, and new-method count explains 33% of the variance in win rate against chosen baselines.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via concrete sample size, time span, and the 33% variance claim. HKR-H/R are weak: this is niche classical ML evaluation methodology, useful to benchmark specialists but below featured threshold.
editor take
28 filter feature selection (FFS) papers show evaluation bias: dataset, baseline, and new-method counts explain 33% of win-rate variance; even small benchmarks are design-shaped.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Self-Supervised Learning for Android Malware Detection on a Time-Stamped Dataset
The paper constructs a time-stamped Android app dataset and uses BYOL self-supervised pre-training for malware detection, reporting 98% accuracy and 89% F1 under time-aware evaluation with timestamp verification.
#Fine-tuning#Benchmarking#VirusTotal#MITRE ATT&CK
why featured
HKR-K passes with a timestamped dataset, BYOL pretraining, and temporal-evaluation metrics. HKR-H and HKR-R are weak because this is a narrow security-detection paper, below featured threshold.
editor take
BYOL hits 98% accuracy and 89% F1 under time-aware testing; for Android malware, fixing temporal leakage is the useful part.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Model Recycling Framework for Multi-Source Data-Free Supervised Transfer Learning
The paper proposes a model recycling framework for source-free supervised transfer learning, selecting subsets of related pre-trained models for reuse across multiple sources under white-box and black-box access, with parameter-efficient training as the stated mechanism.
#Fine-tuning#Research release
why featured
HKR-K passes for the data-free multi-source model reuse mechanism. HKR-H/R miss: no metrics, code, or production impact are disclosed, so this stays a narrow research update.
editor take
The paper proposes source-free model recycling under white-box and black-box access; no benchmark numbers are disclosed, so the setup is useful but the evidence is thin.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
1d ago
STILL DEVELOPING · 1darXiv · cs.LG· atomEN04:00 · 06·08
MVCL-DAF++: Enhancing Multimodal Intent Recognition via Prototype-Aware Contrastive Alignment and Coarse-to-Fine Dynamic Attention Fusion
MVCL-DAF++ improves rare-class recognition on MIntRec and MIntRec2.0 by +1.05% and +4.18% WF1, using prototype-aware contrastive alignment plus coarse-to-fine attention fusion, and the authors released source code on GitHub.
#Multimodal#Benchmarking#MVCL-DAF++#MIntRec
why featured
HKR-K passes with concrete WF1 gains and GitHub code. HKR-H and HKR-R are weak; the paper-style framing is niche for general AI practitioners, so it stays in the low-value research-update band.
editor take
MVCL-DAF++ gains 4.18% rare-class WF1 on MIntRec2.0. Nice small-benchmark SOTA; inspect the noise setup before buying it.
HKR breakdown
hook knowledge resonance
open source
50
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
DEFINED: A Data-Efficient Computational Framework for Fine-Grained Creativity Assessment in Debate Scenarios
DEFINED assesses debate creativity with an eight-dimensional hierarchy, using a pretrained autoregressive language model and hierarchical scoring head. The abstract says it beats prompt-based LLM evaluators, but does not disclose dataset size or exact scores.
#Benchmarking#Fine-tuning#DEFINED#arXiv
why featured
HKR-K passes via the 8-dimension creativity rubric and hierarchical scoring head. HKR-H and HKR-R are weak, and missing dataset size or results keeps this in all, below featured.
editor take
DEFINED scores debate creativity on 8 dimensions, but dataset size and scores are undisclosed; I don’t buy the LLM-evaluator win yet.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Position: A Dynamical Systems Perspective Is Needed to Advance Time Series Modeling
arXiv:2602.16864v2 argues that time-series modeling needs a dynamical-systems perspective, covering DSR, long-term statistics prediction, performance upper bounds, generalization to unseen regimes such as tipping points, and potential control strategies.
#Reasoning#Benchmarking#arXiv#Research release
why featured
HKR-K passes, but there is no new model, metric, or reproducible artifact. The dynamical-systems angle is narrow time-series research, so it stays in the low-value/all band.
editor take
arXiv:2602.16864v2 calls out TS foundation-model hype; I buy it: black-box forecasting hits dynamical-systems ceilings fast.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
A Rolling-Window Framework for Churn Prediction and Behavioral Driver Identification
The study proposes a rolling-window churn prediction framework that separates behavioral evidence and outcomes with a 30-day observation window and a 30-day future evaluation window, reporting 87.6% accuracy and 0.94 ROC-AUC for the feature-based model.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via reproducible windows and metrics. HKR-H/R are weak: this is conventional churn-prediction modeling, distant from core AI-industry concerns, so it sits in the low-value browseable band.
editor take
A 30-day window hitting 0.94 AUC is fine; without platform details and baselines, don’t treat it as a churn benchmark.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Phonetic Error Analysis of Raw Waveform Acoustic Models
The paper analyzes error patterns of raw-waveform acoustic models on TIMIT phone recognition, where WSJ transfer learning reduces Dev/Test PER from 13.9%/15.3% to 11.3%/12.3%.
#Audio#Benchmarking#TIMIT#WSJ
why featured
HKR-K passes via concrete TIMIT/WSJ transfer conditions and PER numbers. HKR-H and HKR-R are weak because this is narrow speech-recognition research, so it stays in all rather than featured.
editor take
WSJ transfer cuts TIMIT Test PER to 12.3%; the useful bit is phonetic error anatomy, not another tiny ASR leaderboard win.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Modeling Nonlinear Feature Interactions with Product-Unit Residual Networks
The paper proposes PURe, a residual network with multiplicative product units, and evaluates it on one synthetic interaction benchmark plus two real-world datasets for accuracy, Gaussian-noise robustness, and low-data performance.
#Interpretability#Benchmarking#Research release#Benchmark
why featured
HKR-K passes because the paper gives a concrete architecture mechanism and evaluation setup. HKR-H/R fail: the angle is dry and has little practitioner resonance, so this stays in the low-value research band.
editor take
PURe has 1 synthetic and 2 real datasets; multiplicative residuals are neat, but the evidence is thin.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling
The author evaluates a frozen pop-jazz Music Transformer on 11 target genres. A 165-cell grid shows five adaptation methods improve held-out chord prediction by +2.89 to +3.61 macro points, while corrected Wilcoxon tests find no decisive winner between LoRA and IA3.
#Fine-tuning#Benchmarking#Music Transformer#Research release
why featured
HKR-K passes with concrete experiment counts and gains. HKR-H and HKR-R are weak because chord-symbol genre modeling is niche and distant from mainstream AI products or practitioner workflows.
editor take
Across a 165-cell grid, the five adaptation methods gain only 2.89–3.61 macro points; chord-symbol adaptation is useful, but not a genre-modeling win.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Attention-Guided Autoencoder Fusion for Insulator Defect Detection Using UAV Transmission-Line Imaging
The paper proposes AE-YOLO, adding lightweight autoencoders and CBAM to the FPN-PAN neck for UAV insulator defect detection; with an EfficientNetV2 backbone, it reports 95.10% mAP@0.5, 96.40% precision, and 93.80% recall on the Insulator-Defect Detection dataset.
#Vision#Benchmarking#Research release#Benchmark
why featured
HKR-K passes because the paper gives a concrete architecture and mAP number; HKR-H and HKR-R fail. This is a narrow industrial-vision benchmark, so it sits in the 40–59 low-value band for the broader AI-practitioner feed.
editor take
AE-YOLO reports 95.10% mAP@0.5; WBF fuses YOLOv8/10/11, so don't read this as a clean single-model win.
HKR breakdown
hook knowledge resonance
open source
44
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Trio: Learning Time-Series Forecasting with Temporal-Spatial-Sample Attention and Structural Causal Priors
Trio applies temporal, spatial, and sample attention to multivariate time-series forecasting. Its TS-SCM generator creates synthetic tasks with dynamic lags, cross-variable interactions, noise, feedback, and distributional drift; experiments cover synthetic, industrial, and public benchmarks, while fully general PFN-style forecasting remains open.
#Reasoning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via the attention design and TS-SCM setup; HKR-H/R fail, and the post gives no result numbers, code, or production claim. This is a niche forecasting paper, so it stays low in all.
editor take
Trio adds sample attention to forecasting; tests span synthetic, industrial, and public sets, but zero-shot is exploratory and PFN-style forecasting remains unsolved.
HKR breakdown
hook knowledge resonance
open source
42
SCORE
H0·K1·R0
04:00
1d ago
arXiv · cs.LG· atomEN04:00 · 06·08
Are You Sure? A Survey of Uncertainty Quantification in Symbolic Regression
Julia Reuter and Fabricio Olivetti de Franca survey uncertainty quantification in symbolic regression, grouping the literature into three directions: frequentist methods, Bayesian methods, and model selection.
#Benchmarking#Julia Reuter#Fabricio Olivetti de Franca#arXiv
why featured
HKR-K passes via the 3-part uncertainty-quantification taxonomy, but HKR-H and HKR-R are weak. This is a narrow research survey with no product, agent, or frontier-model impact.
editor take
Reuter and de Franca group SR uncertainty into 3 tracks; interpretable equations are not trustworthy equations without UQ.
HKR breakdown
hook knowledge resonance
open source
42
SCORE
H0·K1·R0
