ax@ax-radar:~/papers $ grep -E 'arxiv|paper' sources/tags
45 src · signal 72% · cycle 04:32

papers · 2026-05-21

240 papers · updated 3m ago
2026-05-21 · Thu
21:00
18d ago
HuggingFace Papers (takara mirror) · rss · EN · 21:00 · 05·21
Multilingual Steering by Design: Multilingual Sparse Autoencoders and Principled Layer Selection
The paper evaluates multilingual sparse autoencoders on LLaMA-3.1-8B and Gemma-2-9B, using an intersection of multilingual alignment and language separability to choose steering layers, then tests machine translation and CrossSumm with SpBLEU, ROUGE-L, COMET, and LaSE; the reported result is a more stable trade-off between language identification accuracy and generation quality, without exhaustive layerwise search.
#Interpretability #Multimodal #Reasoning #LLaMA
why featured
Only HKR-K lands: the post gives a concrete multilingual SAE layer-selection rule, but HKR-H is dry and HKR-R is narrow. No hard exclusion; this fits the lower end of research-release signal.
editor take
LLaMA-3.1-8B and Gemma-2-9B get multilingual SAEs; useful layer-search shortcut, but gains are undisclosed.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
17:53
18d ago
arXiv · cs.AI · atom · EN · 17:53 · 05·21
The Matching Principle: Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning
The paper proposes the Matching Principle, which estimates label-preserving deployment nuisance covariance and regularizes the encoder Jacobian along its covered range; 12 of 13 pre-registered experimental blocks pass, including tests up to Qwen2.5-7B, while Office-31 fails under a pre-named eigengap condition.
#Reasoning #Alignment #Benchmarking #Qwen2.5-7B
why featured
hard-exclusion-technical-accessibility applies: the core claim depends on covariance, Jacobians, and geometric loss theory with no generalist on-ramp. Only HKR-K passes, so the item is capped and excluded.
editor take
Rajput folds robustness losses into covariance matching; 12/13 blocks pass, but I’d reproduce TDI before trusting it.
HKR breakdown
hook knowledge resonance
open source
51
SCORE
H0·K1·R0
17:49
18d ago
arXiv · cs.AI · atom · EN · 17:49 · 05·21
Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models
The paper proposes a conservative drifting method for one-step generative modeling, replacing displacement velocity with a KDE-gradient velocity, and proves continuous-time finite-particle bounds with a root residual-velocity rate of N^{-1/(d+4)} under an additional h-uniform quadrature regularity condition.
#Reasoning #Research release
why featured
Hard-exclusion-1 applies: this is a KDE-gradient finite-particle convergence proof with no product, model, or reproducible practitioner hook. HKR-K passes only, so it stays excluded.
editor take
The paper proves N^{-1/(d+4)} finite-particle rates for conservative drifting; useful theory, but dimension makes it far from deployable one-step generation.
HKR breakdown
hook knowledge resonance
open source
47
SCORE
H0·K1·R0
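To make the "dimension makes it far from deployable" point concrete, here is a minimal sketch of what an N^{-1/(d+4)} rate implies for particle counts. It ignores every constant in the bound, so the numbers are orders of magnitude only; `particles_needed` and the 0.1 target are illustrative names and values, not anything from the paper.

```python
import math


def particles_needed(target_error: float, dim: int) -> float:
    """Smallest N with N ** (-1 / (dim + 4)) <= target_error (constants ignored)."""
    return target_error ** (-(dim + 4))


# Hypothetical target: drive the rate term down to 0.1 in a few dimensions.
for d in (1, 2, 8, 32):
    print(f"d={d:>2}  N ~ {particles_needed(0.1, d):.1e}")
```

Each extra dimension multiplies the required particle count by 1/target_error, which is the usual curse-of-dimensionality shape for kernel-density-based estimators.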
17:48
18d ago
● P1 · arXiv · cs.AI · atom · EN · 17:48 · 05·21
MOSS autonomous agent system achieves self-evolution through source-level code rewriting
MOSS raises the four-task mean grader score on OpenClaw from 0.25 to 0.61 in one source-level self-rewriting cycle, with candidate code verified by replaying curated failure batches in ephemeral trial workers before an in-place container swap.
#Agent #Code #Tools #MOSS
why featured
HKR-H/K/R all pass: self-rewriting agents are clickable, the 0.25→0.61 gain is concrete, and runtime self-modification hits agent safety nerves. Single arXiv source keeps it below P1.
editor take
MOSS pushes agent self-evolution into source rewrites, and 0.25→0.61 is eye-catching; four OpenClaw tasks is not proof of production autonomy.
sharp
All 3 entries trace to the same arXiv paper, so the agreement is ingestion overlap, not independent confirmation. MOSS’s sharp move is source-level rewriting: it targets routing, hook order, state invariants, and dispatch, instead of prompts, skill files, memory schemas, or workflow graphs. I buy the problem framing, but not the “production self-evolution” strength yet. The hard number is a four-task OpenClaw mean grader jump from 0.25 to 0.61 in one autonomous cycle, with ephemeral trial workers, replay verification, user-consent promotion, container swap, and rollback probes. That sounds less like an autonomous organism and more like a coding-agent-driven CI/CD loop. The deciding variable is replay-batch coverage, not the headline phrase “rewrites its own source.”
HKR breakdown
hook knowledge resonance
open source
92
SCORE
H1·K1·R1
17:44
18d ago
arXiv · cs.AI · atom · EN · 17:44 · 05·21
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
Gated DeltaNet-2 separates linear-attention memory editing with channel-wise erase gate b_t and write gate w_t; under a 1.3B-parameter, 100B FineWeb-Edu-token setup, it reports the strongest overall results versus Mamba-2, Gated DeltaNet, KDA, and Mamba-3 variants.
#Reasoning #Inference-opt #Memory #NVlabs
why featured
HKR-K is strong and HKR-R is moderate: beating Mamba-2/KDA matters for cheaper long-sequence models. HKR-H is narrow, and the post gives abstract-level facts without code or broad reproduction details.
editor take
Gated DeltaNet-2 trains at 1.3B/100B tokens; splitting erase/write gates makes its RULER gains look like mechanism, not tuning luck.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
17:42
18d ago
arXiv · cs.AI · atom · EN · 17:42 · 05·21
LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems
LCGuard transforms shared KV caches before transmission in multi-agent LLM systems, treating cache artifacts as latent working memory. The paper defines unsafe sharing through adversarial reconstruction of agent-specific sensitive inputs, and reports lower reconstruction-based leakage and attack success rates across multiple model families and multi-agent benchmarks while keeping competitive task performance versus standard KV-sharing baselines.
#Agent #Safety #Memory #Research release
why featured
HKR-K/R pass: KV-cache leakage and LCGuard’s mitigation are useful for agent safety. The post gives no reduction numbers, model scale, or reproduction details, so it stays in the mid research-release band.
editor take
LCGuard filters shared KV caches; no deltas disclosed, but anchoring multi-agent privacy to adversarial reconstruction is the useful move.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
17:33
18d ago
arXiv · cs.AI · atom · EN · 17:33 · 05·21
MambaGaze: Bidirectional Mamba with Explicit Missing Data Modeling for Cognitive Load Assessment from Eye-Gaze
MambaGaze achieves 76.8% and 73.1% accuracy on CLARE and CL-Drive under leave-one-subject-out evaluation, using XMD encoding for blink and tracking-failure missingness, while Jetson edge benchmarks report 43-68 FPS real-time inference below 7.5W power consumption.
#Multimodal #Inference-opt #Benchmarking #NVIDIA
why featured
HKR-K passes with benchmark results, an explicit missing-data mechanism, and edge FPS/power. HKR-H and HKR-R are weak because gaze-based cognitive-load assessment is useful but narrow, so it stays in all.
editor take
MambaGaze hits 76.8%/73.1% LOSO accuracy; I buy the XMD trick, not stable cognitive-load inference yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
17:32
18d ago
arXiv · cs.CL · atom · EN · 17:32 · 05·21
Reducing Political Manipulation with Consistency Training
The paper introduces Political Consistency Training, an RL method with two paradigms that reduces covert political bias in LLMs, and defines two metrics: Sentiment Consistency and Helpfulness Consistency.
#Alignment #Safety #Benchmarking #Research release
why featured
HKR-H/K/R pass: the title ties political manipulation to consistency training, and the summary gives two RL paradigms plus two metrics. No result numbers, model list, or artifact details are disclosed, so it stays in the 60–71 band.
editor take
PCT uses 2 RL paradigms to curb political bias; models and effect sizes aren’t disclosed, so I don’t buy the helpfulness claim yet.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
17:09
18d ago
HuggingFace Papers (takara mirror) · rss · EN · 17:09 · 05·21
Research paper introduces ProxySHAP for approximating higher-order Shapley and Banzhaf interactions
The paper introduces ProxySHAP, which approximates higher-order Shapley and Banzhaf interactions using tree-based proxy models plus residual correction, and reports lower error than ProxySPEX and KernelSHAP-IQ on benchmarks that include large-scale settings with thousands of features.
#Interpretability #Benchmarking #ProxySHAP #ProxySPEX
why featured
HKR-K passes, but HKR-H/R fail. The item is a specialized interpretability-method paper with only an error claim versus ProxySPEX and KernelSHAP-IQ, triggering technical-accessibility fail.
editor take
ProxySHAP uses tree proxies plus residual correction; benchmarks claim wins on thousands of features, but code disclosure is absent here.
HKR breakdown
hook knowledge resonance
open source
50
SCORE
H0·K1·R0
17:04
18d ago
arXiv · cs.CL · atom · EN · 17:04 · 05·21
ChronoMedKG: A Temporally Grounded Biomedical Knowledge Graph and Benchmark for Clinical Reasoning
ChronoMedKG introduces 460,497 evidence-linked triples across 13,431 diseases, ties associations to onset windows or progression stages, and adds ChronoTQA with 3,341 questions to test temporal clinical reasoning under retrieval conditions.
#RAG #Reasoning #Agent #ChronoMedKG
why featured
HKR-K is clear via dataset scale, and HKR-R is moderate for medical AI evaluation trust. The topic is vertical, and the body gives no model comparisons or deployment mechanism, so it stays in all.
editor take
ChronoMedKG keeps 460,497 evidence-linked triples; a 30-point temporal drop says clinical RAG still mishandles time.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
16:52
18d ago
arXiv · cs.CL · atom · EN · 16:52 · 05·21
AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild
AnyMo pre-trains a graph encoder with dense body-surface IMU simulation and paired placement views, then improves average HAR Accuracy/F1 by 11.7%/11.6% across 14 unseen downstream datasets.
#Multimodal #Embedding #Benchmarking #AnyMo
why featured
HKR-K passes via a concrete mechanism and 14 unseen-dataset gains. The human-motion/HAR scope is narrow for AI Radar, with weak HKR-H and HKR-R, so it stays in the lower research-signal band.
editor take
AnyMo gains 11.7% Accuracy on 14 unseen HAR sets; IMU generalization finally escapes fixed placement assumptions.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
16:51
18d ago
● P1 · arXiv · cs.CL · atom · EN · 16:51 · 05·21
AMEL: Study of Accumulated Message Effects on LLM Judgments
AMEL tests 11 models across 75,898 API calls and finds that prior evaluation polarity shifts later LLM judgments in the same direction; negative histories induce 1.62x more bias than positive histories, while 5 and 50 prior turns produce the same shift.
#Reasoning #Benchmarking #Safety #OpenAI
why featured
HKR-H/K/R all pass: the paper claims conversation history systematically biases LLM judgments, backed by 75,898 API calls across 11 models. It affects eval reliability, safety review, and agent memory design, fitting the 78–84 research band.
editor take
11 models and 75,898 calls show polarity drag; if your LLM judge batches items in one chat, rerun your evals.
sharp
All 3 arXiv entries carry the same title and point to one v2 paper, so this is visibility across categories, not independent corroboration. The paper’s hook is strong: 75,898 API calls across 11 models (from OpenAI, Anthropic, and Google, plus four open-source models) show prior judgment polarity pulling later judgments with d=-0.17, rising to d=-0.34 on high-entropy items. I’d treat this as a direct hit on LLM-as-judge batching, not a cute bias artifact. Five prior turns and 50 prior turns produce the same shift, so longer context is not the culprit. Negative histories create 1.62x more bias than positive ones. Scaling trims the damage but leaves it: OpenAI Nano at -0.34, GPT-5.2 at -0.17; Anthropic Haiku at -0.22, Opus at -0.17. Fresh context per item is boring, expensive, and now hard to dodge.
HKR breakdown
hook knowledge resonance
open source
92
SCORE
H1·K1·R1
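For readers who want to check their own judge pipeline against the d values quoted above, here is a minimal sketch of a standardized effect size. It assumes the card's "d" is Cohen's d with a pooled standard deviation, which the card does not state; `cohens_d` and the sample scores are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(treated, control):
    """Cohen's d with pooled sample standard deviation (assumed definition of 'd')."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled_var)


# Hypothetical judge scores for the same items, scored after a negative
# history versus in a fresh context.
after_negative_history = [6.0, 5.5, 7.0, 6.5, 5.0]
fresh_context = [6.5, 6.0, 7.0, 7.0, 5.5]
print(round(cohens_d(after_negative_history, fresh_context), 2))
```

A negative d here means scores given after the history are lower than fresh-context scores, matching the sign convention the card appears to use.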
16:46
18d ago
arXiv · cs.CL · atom · EN · 16:46 · 05·21
Tokenization with Split Trees
ToaST optimizes token counts with binary split trees and IP-based vocabulary selection, reducing token counts by over 11% versus BPE, WordPiece, and UnigramLM at vocabulary sizes of 40,960 and above.
#Inference-opt #Benchmarking #Research release #Benchmark
why featured
HKR-H/K/R pass, but this is a single arXiv tokenizer method with token-count results only; no open-source artifact, deployment path, or major-model adoption is disclosed, so it stays in the interesting research band.
editor take
ToaST cuts 11%+ tokens at 40,960 vocab; 1.5B runs gain 2.6–7.6%, so tokenizer work still has teeth.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
15:21
18d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:21 · 05·21
Enhancing Gaze Reasoning in Vision Foundation Models for Gaze Following
The paper proposes head-conditioned local LoRA and an out-of-cone penalty to improve gaze reasoning in vision foundation models for gaze following, reports state-of-the-art results on GazeFollow and VAT, highlights stronger gains when gaze targets are not semantically salient, and says the code will be released after paper acceptance.
#Vision #Reasoning #Fine-tuning #Research release
why featured
HKR-K passes with two concrete mechanisms and GazeFollow/VAT evaluation. HKR-H/R are weak, and the post gives no gain numbers or usable code, so this stays in the lower research band.
editor take
The paper claims SOTA on GazeFollow and VAT, but code waits for acceptance; I don’t buy gaze-following gains without repro.
HKR breakdown
hook knowledge resonance
open source
54
SCORE
H0·K1·R0
15:18
18d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:18 · 05·21
Decoupling Ego-Motion from Target Dynamics via Dual-Interval Motion Cues for UAV Detection
The paper proposes a vision-only UAV video detection framework that aligns adjacent frames with homography-based GMC, extracts short- and long-term motion cues through dual-interval differencing, and adds an MGA module to a Feature Pyramid Network, reporting consistent gains over a YOLOv8 baseline on VisDrone-VID without disclosing exact metrics in the snippet.
#Vision #YOLOv8 #VisDrone-VID #Research release
why featured
HKR-K passes via concrete mechanisms and a benchmark setup, but HKR-H and HKR-R are weak. This is a narrow vision-detection paper, not hard-excluded, but below featured threshold.
editor take
The authors modify YOLOv8 on VisDrone-VID, but exact gains are undisclosed; until numbers land, GMC plus dual-interval differencing smells incremental.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
13:28
18d ago
HuggingFace Papers (takara mirror) · rss · EN · 13:28 · 05·21
MaSC: A Masked Similarity Metric for Evaluating Concept-Driven Generation
MaSC uses externally provided foreground concept masks to separate subject and background evaluation, reaching Krippendorff alpha 0.471 for concept preservation on DreamBench++ and AUC 0.992 for identity preservation on ORIDa.
#Vision #Multimodal #Benchmarking #MaSC
why featured
HKR-K passes with a testable evaluation mechanism and two metrics. HKR-H/R are weak: the title reads like a paper name, and the impact is concentrated in image-generation evaluation, so it fits the 60–71 research-signal band.
editor take
MaSC hits 0.471 alpha on DreamBench++; external foreground masks are the catch, so don't sell it as label-free evaluation.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
11:24
18d ago
HuggingFace Papers (takara mirror) · rss · EN · 11:24 · 05·21
Meta-Soft: Leveraging Composable Meta-Tokens for Context-Preserving KV Cache Compression
Meta-Soft compresses KV cache with a learnable orthogonal meta-library, a Gumbel-Softmax selector that synthesizes k prompt-specific Soft Tokens, and an attention-flow integration mechanism that moves information from removed tokens into retained tokens; the snippet says experiments on multiple datasets outperform existing eviction methods, but it does not disclose model sizes, compression ratios, latency numbers, or dataset names.
#Inference-opt #Memory #Research release #Benchmark
why featured
HKR-K and HKR-R pass: the item gives concrete compression mechanisms and a multi-dataset claim over eviction baselines. HKR-H is weak, and the post lacks code, throughput/memory numbers, or production evidence, so it stays in the interesting band.
editor take
Meta-Soft synthesizes k soft tokens via Gumbel-Softmax; no compression or latency numbers, so I’d treat it as idea-stage.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
05:37
19d ago
HuggingFace Papers (takara mirror) · rss · EN · 05:37 · 05·21
FRED: A Multi-Modal Autonomous Driving Dataset for Flooded Road Environments
FRED releases a multimodal autonomous driving dataset for flooded road environments, covering five locations with a 2.3 MP camera, 64-beam 360° LiDAR, IMU, and RTK GNSS data.
#Multimodal #Vision #Robotics #FRED
why featured
HKR-H and HKR-K pass: flooded roads are a concrete autonomy edge case, and the post gives sites plus sensors. HKR-R is weak because there is no benchmark result, license, adoption, or broader practitioner consequence.
editor take
FRED covers five flooded sites; sample count is undisclosed, but water-hazard labels beat another sunny-road dataset.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
05:18
19d ago
HuggingFace Papers (takara mirror) · rss · EN · 05:18 · 05·21
Rethinking Token Reduction for Diffusion Models via Output-Similarity-Awareness
DiTo changes token reduction for Diffusion Transformers from input-similarity matching to output-similarity-aware matching, reusing prior-step correspondences across reduction timesteps and reporting 1.6-3.9 dB higher PSNR than existing token reduction methods at comparable speedups.
#Vision #Inference-opt #DiTo #Research release
why featured
HKR-K/R pass: the item gives a concrete mechanism and a 1.6-3.9 dB PSNR gain tied to diffusion inference cost. HKR-H is weak, and this is a narrow single-paper summary, so it stays in all.
editor take
DiTo reports 1.6–3.9 dB PSNR gains at matched speedups; I buy the pivot from ViT-style input similarity to output-aware matching.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
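A quick way to read the 1.6–3.9 dB figure: PSNR is a log of inverse mean squared error, so a dB gain maps to a multiplicative drop in MSE. The sketch below uses the standard PSNR definition; the function names are illustrative, and it assumes both methods are scored against the same reference with the same peak value.

```python
import math


def psnr_db(mse: float, max_value: float = 1.0) -> float:
    """Standard PSNR in dB for a given mean squared error and peak signal value."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks for a given PSNR gain at a fixed peak value."""
    return 10.0 ** (-gain_db / 10.0)


for gain in (1.6, 3.9):
    print(f"+{gain} dB  ->  MSE x {mse_ratio_for_gain(gain):.2f}")
```

Under those assumptions, the reported range corresponds to roughly 31% to 59% lower MSE at comparable speedups.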
04:08
19d ago
HuggingFace Papers (takara mirror) · rss · EN · 04:08 · 05·21
Format-Constraint Coupling in Knowledge Graph Construction from Statistical Tables
The study tests knowledge graph construction on 6 statistical CSV datasets and finds serialization format plus extraction schema has a joint effect up to +1.180, while schema-format mismatch drops fact coverage below the unconstrained baseline on 4 of 6 datasets through entity inflation or extraction refusal.
#RAG #Benchmarking #CSVFidelity-Bench #Research release
why featured
HKR-H/K/R pass, but this is a narrow benchmarking paper, not a model or product release. Useful for table-to-KG/RAG pipelines, with limited industry spread, so it stays in 60–71.
editor take
CSVFidelity-Bench tests 6 CSV sets; schema mismatch undercuts unconstrained extraction on 4/6, so GraphRAG evals need direct graph access.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
Matryoshka Concept Bottleneck Models
MCBM uses one nested concept hierarchy for multi-granularity inference without retraining separate models for concept budgets, reducing expected test-time intervention cost from linear order to O(log K) while guaranteeing monotonic performance improvement.
#Interpretability #Inference-opt #Research release
why featured
HKR-H and HKR-K pass via nested CBMs and the O(log K) intervention-cost claim; HKR-R is weak because impact centers on interpretability specialists. Single arXiv sourcing keeps it below featured.
editor take
MCBM claims O(log K) intervention cost. Experiments are undisclosed, so I’d treat it as a CBM deployment-cost paper.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
SAVER: Selective As-Needed Vision Evidence for Multimodal Information Extraction
SAVER uses a Conformal Groundability Gate to decide whether MNER spans or MRE entity pairs should consult visual evidence, then calibrates activation thresholds on a held-out split with Clopper-Pearson upper bounds. Experiments report higher F1 than text-only and always-on multimodal baselines, while reducing FLOPs and P90 latency.
#Multimodal #Vision #Benchmarking #SAVER
why featured
HKR-H/K/R all pass: selective vision is a clean hook, with a concrete calibration mechanism and cost-latency angle. The MNER/MRE scope is niche and exact F1/FLOPs/P90 numbers are not disclosed, so it stays in the 60–71 band.
editor take
SAVER gates vision per span with CGG and reports F1/FLOPs/P90 wins; datasets and margins aren’t disclosed here, so trust the routing idea, not the victory lap.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
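The Clopper-Pearson piece is standard statistics and easy to sanity-check. Below is a minimal sketch of a one-sided exact upper bound on an error rate from a held-out split, solved by bisection on the binomial CDF. How SAVER feeds this bound into its gate is not described in the card, so the calibration numbers here are purely hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson_upper(k: int, n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) upper bound on p after observing k events in n trials."""
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # the CDF is decreasing in p, so bisection converges
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical calibration split: 3 gate errors observed in 200 held-out spans.
print(round(clopper_pearson_upper(3, 200), 4))
```

The bound is conservative by construction, which is what you want when a threshold decides whether the visual branch is skipped.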
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving
CoPhy distills VLM knowledge into a BEV encoder and removes the VLM at inference, uses an auto-regressive BEV world model to predict future semantic maps conditioned on candidate actions, and optimizes the driving policy with GRPO using physical rewards from BEV rollouts and cognitive rewards from a language-aligned scorer.
#Robotics #Vision #Reasoning #CoPhy
why featured
HKR-K/R pass: CoPhy gives a VLM-to-BEV distillation path, VLM-free inference, a BEV world model, and dual-reward GRPO. No results, code, or road-test evidence are disclosed, so this stays in the 60–71 band.
editor take
CoPhy claims SOTA on NAVSIM v1/v2, but RSS gives no scores; verify the BEV-distilled, VLM-free inference path first.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
Bayesian Preference Learning for Test-Time Steerable Reward Models
ICRM models latent preference probabilities with a Bradley-Terry likelihood and a conjugate Beta prior, then steers reward models at test time using in-context preference demonstrations. The paper reports RM-Bench accuracy rising from 60.5 to 70.8 with more demonstrations, lower calibration error than a generative judge on moral dilemmas, broader Pareto frontiers under conflicting preferences, and stronger math reasoning rewards than a conventional reward model.
#Alignment #Reasoning #Benchmarking #Research release
why featured
HKR-K passes with a concrete mechanism and RM-Bench gain; HKR-R passes for alignment/eval relevance. As a single arXiv paper with a narrow technical title, it stays below the featured threshold.
editor take
ICRM lifts RM-Bench from 60.5 to 70.8; I buy test-time preference demos, but RSS omits model size and demo count.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
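The two ingredients named in the summary are textbook pieces, sketched below: a Bradley-Terry preference probability and a conjugate Beta update on a latent preference probability from observed demonstrations. How ICRM actually couples them is not disclosed in the card, so treat this as background, not the paper's method; all names are illustrative.

```python
import math


def bradley_terry_prob(reward_a: float, reward_b: float) -> float:
    """P(a preferred over b) under Bradley-Terry: sigmoid of the reward gap."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def beta_posterior(alpha: float, beta: float, wins: int, losses: int):
    """Conjugate update of a Beta(alpha, beta) prior with Bernoulli outcomes."""
    return alpha + wins, beta + losses


def posterior_mean(alpha: float, beta: float) -> float:
    return alpha / (alpha + beta)


# Hypothetical: uniform prior, then 3 demonstrations favour the trait and 1 does not.
a, b = beta_posterior(1.0, 1.0, wins=3, losses=1)
print(round(posterior_mean(a, b), 3))
```

More demonstrations tighten the posterior, which is at least consistent with the card's claim that RM-Bench accuracy rises with demonstration count.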
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
Understanding and Improving Communication Performance in Multi-node LLM Inference
The paper introduces NVRAR, a hierarchical all-reduce algorithm using NVSHMEM, and reports 1.9–3.6x lower latency than NCCL for 128KB–2MB messages, plus up to 1.72x lower end-to-end batch latency for Llama 3.1 405B in multi-node decode-heavy tensor-parallel inference.
#Inference-opt #YALIS #NVRAR #NCCL
why featured
HKR-H/K/R pass: NVRAR vs NCCL and Llama 3.1 405B latency numbers are concrete. The topic is narrow distributed inference plumbing, so it stays below featured.
editor take
NVRAR cuts 128KB–2MB all-reduce latency 1.9–3.6x; for 405B decode, the ugly comms work is the bottleneck.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
DECO matches dense Transformer performance under the same total parameter budget and training tokens, activates 20% of routed experts, and delivers a 2.93x inference speedup over dense inference on Jetson AGX Orin.
#Inference-opt #Tsinghua NLP #DECO #Jetson AGX Orin
why featured
HKR-K/R are strong: 20% expert activation and 2.93x Jetson AGX Orin speedup are concrete. The arXiv architecture angle is narrow for general AI pros, so it stays in the 60–71 band.
editor take
DECO activates 20% experts and runs 2.93x faster on Jetson AGX Orin; edge MoE finally tackles memory traffic head-on.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search
Lean Refactor uses a retrieval-augmented agentic framework to refactor Lean proofs, achieving over 70% token-level compression on competition benchmarks, over 20% on research repositories, and up to 60% compilation-time reduction while using version-filtered strategy retrieval for Lean/Mathlib compatibility.
#Agent #RAG #Code #Lean Refactor
why featured
HKR-K is strong and HKR-H comes from the concrete agentic proof-compression result; HKR-R is weak because Lean is niche. The practical numbers help, but the technical-accessibility drag keeps it in 60–71.
editor take
Lean Refactor cuts competition proofs by 70%+ tokens; I trust version-filtered retrieval more than the agentic-search wrapper.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
PlexRL: Cluster-Level Orchestration of Serviceized LLM Execution for RLVR
PlexRL multiplexes unified LLM services across RLVR jobs with centralized model placement, state transitions, and function-level scheduling under affinity constraints, reducing user GPU-hour cost by up to 37.58% while preserving algorithmic flexibility and adding minimal per-job overhead.
#Reasoning #Inference-opt #PlexRL #Research release
why featured
HKR-K/R pass: the 37.58% GPU-hour cost cut and cluster orchestration mechanism are concrete and relevant to RLVR compute budgets. HKR-H is weak, and a single arXiv abstract keeps it below featured.
editor take
PlexRL cuts RLVR GPU-hour cost up to 37.58% via cluster scheduling; I buy it, but cluster scale is undisclosed.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
Efficient Numeracy in Language Models Through Single-Token Number Embeddings
The paper introduces BitTokens, which encodes any number as one token using its IEEE 754 floating-point representation, and reports that small language models learned basic arithmetic algorithms with near-perfect accuracy in experiments.
#Reasoning #Research release
why featured
HKR-H/K/R all have signal: single-token number embeddings are novel and tied to LLM numeracy pain. The post only gives basic arithmetic results, with no model size, error rate, code, or replication, so it stays in 60–71.
editor take
BitTokens packs any number into one token; near-perfect results cover basic arithmetic, not numeric reasoning broadly.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
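To see what "its IEEE 754 floating-point representation" gives a model to work with, here is a minimal sketch that turns a number into its bit string and back. It assumes 64-bit doubles, which the card does not specify, and it stops at the bits; how BitTokens maps them into an embedding is not shown here.

```python
import struct


def float_to_bits(x: float) -> str:
    """64-character bit string: 1 sign bit, 11 exponent bits, 52 mantissa bits."""
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    return f"{as_int:064b}"


def bits_to_float(bits: str) -> float:
    """Inverse of float_to_bits."""
    (x,) = struct.unpack(">d", struct.pack(">Q", int(bits, 2)))
    return x


bits = float_to_bits(-3.75)
print(bits[0], bits[1:12], bits[12:])  # sign, exponent, mantissa
print(bits_to_float(bits))
```

The appeal is a fixed-width input for any representable number, versus a digit-by-digit token sequence whose length grows with magnitude and precision.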
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
Runtime-Certified Bounded-Error Quantized Attention
The paper proposes a tiered KV cache architecture that stores INT8 keys and INT4 values in GPU memory while retaining FP16 originals in system RAM, computing per-head, per-step error bounds and fallbacks on LLaMA 3.1-8B with contexts up to 128K.
#Inference-opt #Safety #Benchmarking #LLaMA
why featured
HKR-K/R pass with a concrete KV-cache design, bit widths, and 128K test setup. HKR-H is weak, and this is a single arXiv paper without code or production evidence, so it stays in 60–71.
editor take
INT8/INT4 KV gets per-step error bounds plus FP16 fallback; don’t sell this as speed, it sells recoverability.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H0·K1·R1
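The "error bound plus fallback" pattern can be illustrated on a single attention score. This is a minimal sketch, not the paper's certificate: it uses symmetric per-vector INT8 quantization of one key, bounds the dot-product error by half a quantization step times the query's L1 norm, and recomputes from the original key when the bound exceeds a tolerance. The names, the tolerance, and the choice of bound are all assumptions.

```python
def quantize_int8(vec):
    """Symmetric per-vector INT8 quantization; returns integer codes and the scale."""
    amax = max(abs(x) for x in vec) or 1.0  # avoid a zero scale for all-zero input
    scale = amax / 127.0
    return [round(x / scale) for x in vec], scale


def score_with_bound(query, key, tolerance):
    """Return (score, error_bound, used_fallback) for one query-key dot product."""
    codes, scale = quantize_int8(key)
    approx = sum(q * c * scale for q, c in zip(query, codes))
    # Each dequantized element is off by at most scale / 2, so the dot-product
    # error is at most (scale / 2) * sum(|q_i|).
    bound = 0.5 * scale * sum(abs(q) for q in query)
    if bound > tolerance:
        exact = sum(q * k for q, k in zip(query, key))  # stands in for the FP16 copy
        return exact, 0.0, True
    return approx, bound, False


q = [1.0, -2.0, 0.5]
k = [0.3, -0.7, 1.2]
print(score_with_bound(q, k, tolerance=0.05))
print(score_with_bound(q, k, tolerance=1e-6))
```

The second call trips the fallback, which is the recoverability property the editor take points at: the quantized path is only trusted when its own bound says it can be.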
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction
The paper proposes parallel chunk-level LLM reasoning with evidence-anchored consolidation, and experiments across multiple model types and sizes report about 84% lower omission error, up to 130% higher evidence traceability, and up to 91% fewer unsupported claims.
#Reasoning #RAG #Research release
why featured
HKR-H/K/R pass, but this is a single arXiv methods paper with no named lab weight, artifact, or production replacement claim. Research-release signal fits 70 and tier all, below featured.
editor take
Parallel chunking cuts omissions 84%, but datasets and baselines aren’t disclosed here; don’t crown it a long-context fix yet.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
Optimization Hyper-parameter Laws for Large Language Models
Opt-Laws predicts final LLM training loss from the LR schedule, model size, and data size; on held-out configurations, it achieves a 94% Top-2 hit rate for near-optimal schedule candidates and detects training divergence with F1=0.92.
#Reasoning #Benchmarking #Research release #Benchmark
why featured
HKR-K/R pass: the summary gives a testable mechanism and two metrics, tied to training-run failure cost. It stays all because this is a niche arXiv optimization paper with no code, author signal, or production validation disclosed.
editor take
Opt-Laws hits 94% Top-2 on held-out configs; I’d judge it by avoided full runs, not elegant loss prediction.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
Multimodal LLMs under Pairwise Modalities
The paper proposes a two-stage framework for training MLLMs with pairwise modality data, using latent representation alignment and cross-modal recomposition; it evaluates the method by adding 3D point clouds and tactile modalities to pre-trained MLLMs with three modality pairs, while the RSS snippet does not disclose benchmark names or exact scores.
#Multimodal #Embedding #Research release
why featured
HKR-H and HKR-K pass: the paper offers a pairwise-modality training mechanism and 3 modality pairs. Without benchmarks, artifacts, or product impact, it stays in the lower research-release band.
editor take
It adds 3D point clouds and touch via 3 modality pairs; no benchmarks or scores disclosed, so treat it as a data-curation bet.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
DIVE: Embedding Compression via Self-Limiting Gradient Updates
DIVE compresses embeddings with a self-limiting hinge triplet loss and head-wise NT-Xent contrastive loss, and the 14M-parameter open-source adapter beats Matryoshka-Adaptor, Search-Adaptor, and SMEC across six BEIR datasets at every evaluated compression ratio.
#Embedding #RAG #Fine-tuning #DIVE
why featured
HKR-K has concrete mechanisms and BEIR comparisons; HKR-R hits RAG cost and latency. Still, this is a single arXiv compression method with benchmark wins, below the featured threshold.
editor take
DIVE uses a 14M adapter for embedding compression; it beats three baselines on six BEIR sets, but no absolute scores disclosed.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
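A plain hinge triplet loss already has a self-limiting flavour: once the negative is farther than the positive by the margin, the loss is exactly zero and contributes no gradient. The sketch below shows that generic loss only. Whether DIVE's "self-limiting" term matches this reading is an assumption, and the margin and vectors are made up.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hinge_triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, margin + d(anchor, positive) - d(anchor, negative))."""
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))


# Satisfied triplet: zero loss, so it stops pushing on the compressed embedding.
print(hinge_triplet_loss([0.0, 0.0], [0.0, 0.1], [1.0, 0.0]))
# Violated triplet: positive is farther than the negative, so the loss is active.
print(hinge_triplet_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.1]))
```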
04:00
19d ago
arXiv · cs.LG · atom · EN · 04:00 · 05·21
Research paper analyzes MXFP4 quantization error decomposition and recovery methods for LLM reinforcement learning
The paper decomposes MXFP4 quantization error into scale bias, deadzone truncation, and grid noise, then applies macro-block scaling, outlier fallback, and adaptive quantization noise on Qwen2.5-3B and Qwen3-30B-A3B-Base, recovering BF16 accuracy to within 0.7% and 3.0%, respectively.
#Reasoning#Fine-tuning#Inference-opt#Qwen
why featured
HKR-K is strong: the paper gives MXFP4 error mechanisms and Qwen experiment numbers. HKR-H/R are real for quantization and RL-tuning teams, but the low-level training focus keeps it in the 60–71 band.
editor take
MXFP4 lands within 0.7% of BF16 on Qwen2.5-3B; this error decomposition beats another mystery tuning recipe.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
STELLAR: Scaling 3D Perception Large Models for Autonomous Driving
STELLAR trains a 500M-parameter 3D perception model on 50 million driving examples. The model extends Sparse Window Transformer inputs to LiDAR, radar, cameras, and map priors, and reports a new state of the art on the Waymo Open Dataset challenge.
#Multimodal#Vision#Robotics#STELLAR
why featured
HKR-K/R pass on concrete scale, multimodal fusion, and Waymo benchmarking; HKR-H is weak. As a single AV perception paper rather than a product or foundation-model release, it stays in the 60–71 band.
editor take
STELLAR trains 500M parameters on 50M driving examples; autonomy perception is finally doing its scaling-law homework.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models
AVIS uses autoregressive video diffusion models for streaming video restoration, reducing initial latency from 114 seconds to 4 seconds and raising throughput from 0.71 to 1.18 FPS versus leading non-autoregressive solvers.
#Vision#Inference-opt#AVIS#AVIS Flash
why featured
HKR-H/K pass on the concrete latency/FPS gains and autoregressive streaming mechanism. HKR-R is weak: this remains a niche arXiv video inverse-problem paper, so it stays below featured.
editor take
AVIS Flash hits 5.91 FPS on one RTX 4090; video inverse solvers are starting to look deployable, not just publishable.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
TelecomTS: A Multi-Modal Observability Dataset for Time Series and Language Analysis
TelecomTS introduces an observability dataset derived from a 5G telecommunications network, preserving de-anonymized covariates and absolute scale information while covering anomaly detection, root cause analysis, and multi-modal question-answering tasks.
#Multimodal#Reasoning#Benchmarking#TelecomTS
why featured
Single arXiv dataset paper with concrete data shape and task setup, so HKR-K/R pass. The topic is narrow and lacks model or product impact, keeping it in the interesting-but-not-featured band.
editor take
TelecomTS keeps absolute 5G metric scale; normalized time-series benchmarks are a bad proxy for observability agents.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
WestWorld: A Knowledge-Encoded Scalable Trajectory World Model for Diverse Robotic Systems
WestWorld pretrains on 89 simulation and real-world environments, using Sys-MoE and structural embeddings to improve zero-shot and few-shot trajectory prediction across diverse robot morphologies.
#Robotics#Reasoning#WestWorld#Unitree Go1
why featured
HKR-K and HKR-R pass: 89 environments plus Sys-MoE give concrete research signal, and cross-embodiment generalization matters for robotics teams. Single arXiv source and a jargon-heavy title keep it below featured.
editor take
WestWorld pretrains on 89 environments; Sys-MoE plus structural embeddings is practical for cross-morphology robots, but gains aren't disclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
MeMo: Memory as a Model
MeMo encodes new knowledge into a dedicated memory model while keeping LLM parameters unchanged, and the paper evaluates it on three benchmarks: BrowseComp-Plus, NarrativeQA, and MuSiQue.
#RAG#Memory#Tools#MeMo
why featured
HKR-H/K/R pass, but the post gives only the mechanism and 3 benchmark names; no metrics, code, or model scale are disclosed. Interesting research signal, below featured threshold.
editor take
MeMo reports 3 benchmarks and corpus-size-independent retrieval cost; I’m waiting on update cost and latency, both absent here.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Informationally Compressive Anonymization for Privacy-Preserving Supervised Machine Learning
The paper introduces ICA and the VEIL architecture, which encode raw inputs inside a trusted Source Environment into low-dimensional, task-aligned latent vectors; the abstract says the method avoids noise budgets, gradient clipping, and encryption at inference time.
#Fine-tuning#Inference-opt#Safety#arXiv
why featured
HKR-K/R pass: the paper offers a concrete privacy mechanism and a non-degradation claim. As a single arXiv item with no disclosed metrics or artifact in the summary, it stays in the lower interesting band.
editor take
ICA compresses raw inputs into latent vectors; no benchmarks disclosed, so treat “zero reconstruction” as a theorem setup.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Towards Autonomous Mechanistic Reasoning in Virtual Cells
The paper introduces VCR-Agent, a multi-agent framework that uses mechanistic action graphs, biologically grounded retrieval, and verifier-based filtering to generate and validate virtual-cell explanations, and releases VC-TRACES from the Tahoe-100M atlas.
#Agent#RAG#Reasoning#VCR-Agent
why featured
HKR-H and HKR-K pass via the virtual-cell agent hook and concrete framework/dataset details. HKR-R is weak because the biology setting is niche and no product or general-agent impact is disclosed.
editor take
VCR-Agent derives VC-TRACES from Tahoe-100M; size is undisclosed, so the verifier’s hallucination filter is the bet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory
BudgetMem structures runtime agent memory as modules with Low, Mid, and High budget tiers, then trains a compact reinforcement-learning router to choose tiers per query; across LoCoMo, LongMemEval, and HotpotQA, it beats strong baselines in the high-budget setting and improves accuracy-cost frontiers under tighter budgets.
#Agent#Memory#Reasoning#BudgetMem
why featured
HKR-K/R pass: agent-memory cost control is useful, and the post names the RL routing mechanism plus benchmarks. No accuracy/cost numbers or artifact are disclosed, so it stays in the 60–71 all band.
editor take
BudgetMem tests three memory-budget tiers on 3 benchmarks; I like the setup, but RSS gives no cost numbers.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Statistical Guarantees in the Search for Less Discriminatory Algorithms
The paper formalizes LDA search as an optimal stopping problem and proposes an adaptive stopping algorithm that gives a high-probability upper bound on disparate-impact gains from continued retraining.
#Safety#Benchmarking#arXiv#Black et al.
why featured
HKR-K is clear: optimal stopping plus high-probability bounds. HKR-R lands on fairness-audit cost, but the academic framing and narrow scope keep it below the 72 featured line.
editor take
Black et al. turn LDA search into a stopping rule; dataset sizes aren’t disclosed, but legal audit teams will want this certificate.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Diffusion Models Memorize in Training -- and Generalize in Inference
The paper analyzes diffusion models’ denoising objective and finds a validation-training generalization gap most pronounced at intermediate noise levels, while inference does not reproduce training samples because sampling trajectories move far from the noisy training-sample distribution used during training.
#Multimodal#Benchmarking#Interpretability#Research release
why featured
HKR-H/K/R pass, but this is a single arXiv paper on training dynamics with no product, artifact, or cross-source debate. It fits the 60–71 band for useful but non-featured research.
editor take
Diffusion overfits hardest at intermediate noise; the wild part is that model error blocks recall once sampling leaves training-noise support.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
DelTA: Discriminative Token Credit Assignment for Verifiable Reward Reinforcement Learning
DelTA reweights a self-normalized RLVR surrogate with discriminative token coefficients, and on seven math benchmarks it improves over the strongest same-scale baselines by 3.26 points on Qwen3-8B-Base and 2.62 points on Qwen3-14B-Base.
#Reasoning#Fine-tuning#Alignment#Qwen
why featured
HKR-K is strong and HKR-R is moderate: concrete RLVR mechanism and Qwen3 math gains, but it is still an arXiv training paper with no product impact or cross-source cluster.
editor take
DelTA adds 3.26 points on Qwen3-8B across 7 math benchmarks; I like that it attacks RLVR’s formatting-token noise directly.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Code Generation by Differential Test Time Scaling
DiffCodeGen selects code-generation candidates with coverage-guided differential analysis, without public tests or extra LLM calls for selection. The paper evaluates it across 4 large language models and reports consistent gains over baselines, with competitive or better performance than state-of-the-art test-time scaling methods while using fewer time and token resources.
#Code#Inference-opt#Agent#DiffCodeGen
why featured
HKR-H/K/R pass, but the body gives only the mechanism and a 4-model evaluation, not gains, datasets, or artifacts. A single arXiv codegen method fits the 60–71 band.
editor take
DiffCodeGen selects candidates across 4 LLMs without extra LLM calls; code TTS needs execution traces, not more sampler spam.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data
Cormac Cureton and Narges Armanfard propose TabPFN-MT for multi-target tabular in-context learning, evaluating it on 344 datasets with fewer than 1,000 samples on average and reducing inference for T tasks from O(T) to O(1) forward passes.
#Reasoning#Inference-opt#Cormac Cureton#Narges Armanfard
why featured
HKR-H and HKR-K pass: TabPFN-MT gives a 344-dataset setup and an O(T)-to-O(1) multitask inference claim. The tabular small-sample focus narrows HKR-R, keeping it in the 60–71 research-signal band.
editor take
TabPFN-MT cuts T-task inference to O(1). For small tabular data, PFNs still look cleaner than general LLMs.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Research paper introduces Spectral Souping framework for online preference alignment
The paper introduces Spectral Souping, which learns an offline basis of specialized policies and merges outputs or parameters at inference time, adapting LLMs to individual preferences without costly online retraining against tailored preference rewards.
#Alignment#Fine-tuning#Inference-opt#Research release
why featured
HKR-H/K/R pass, but the post gives only the mechanism summary; authors, benchmark numbers, scale, and code are not disclosed. This is useful alignment research, not a same-day industry story.
editor take
Spectral Souping uses a two-phase offline-basis, inference-merge setup for preference alignment. No gains disclosed; “universal spectral representation” needs proof beyond soup demos.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Calibration vs Decision Making: Revisiting the Reliability Paradox in Unlearned Language Models
The paper evaluates unlearned language models on TOFU multiple-choice QA and finds that models retain low calibration error around ECE 0.04 after unlearning, while forget-split accuracy drops and attribution with Integrated Gradients and Local Mutual Information shows greater reliance on correlation-based tokens.
#Alignment#Interpretability#Benchmarking#arXiv
why featured
HKR-H/K/R pass on a concrete evaluation paradox, ECE number, and safety-eval relevance. Single arXiv paper, narrow scope, no artifact or broad discussion, so it stays in all.
editor take
Unlearned models keep ECE≈0.04 while losing TOFU forget accuracy; calibration is a bad proxy for unlearning reliability.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
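The calibration point in the unlearning item is easier to see with the metric written out. Below is a minimal sketch of equal-width-bin expected calibration error (ECE); the function name and the 10-bin default are assumptions, and the paper's exact binning is not stated in the summary.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |accuracy - confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# A model that is right 1 time in 4 and says "25% sure" every time
# is perfectly calibrated: ECE is 0.0 even though accuracy is only 0.25.
low_accuracy_ece = expected_calibration_error([0.25] * 4, [1, 0, 0, 0])
```

ECE only measures whether stated confidence matches observed accuracy, so it can stay low while accuracy on a split falls.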
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Efficient Table QA via TableGrid Navigation and Progressive Inference Prompting
The paper proposes two training-free Table QA prompting frameworks, TGN and PIP, and evaluates 17 LLMs against 6 baselines on TableBench and FeTaQA; TGN scores 3.8 points above the strongest TableBench baseline, while PIP reports SOTA over ReAct and Chain-of-Thought on FeTaQA.
#Reasoning#Tools#Fine-tuning#arXiv
why featured
HKR-K and HKR-R pass: the paper gives training-free mechanisms, a 17-model evaluation, and a +3.8-point gain. HKR-H fails because the angle is dry, so this stays in the 60–71 research-signal band.
editor take
TGN gains 3.8 on TableBench; training-free is not cheap until token cost and table-size limits are disclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval
The paper benchmarks four local LLMs for EHR GraphRAG on one 8 GB VRAM consumer GPU; Llama 3.1 builds the richest graph with 1,172 entities, Qwen 2.5 scores highest on answer quality at 3.3/5, and 3.8B Phi-4-mini fails the pipeline because of structured-output errors.
#RAG#Benchmarking#Reasoning#Microsoft
why featured
HKR-K and HKR-R are clear: 8GB VRAM, four local models, and structured-output failure are testable details. The healthcare EHR niche limits reach, so it stays in the 60–71 band.
editor take
Four local models ran EHR GraphRAG on one 8 GB GPU; Qwen 2.5 tops out at 3.3/5. Offline compliance, not cheap reliability.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
roto 2.0: The Robot Tactile Olympiad
roto 2.0 introduces a GPU-parallelized tactile RL benchmark across four robotic morphologies with 16–24 DOF; its blind agents use only proprioception and tactile sensing, without state information or distillation, and achieve 13 Baoding ball rotations in 10 seconds.
#Robotics#Benchmarking#roto#Research release
why featured
This arXiv robotics benchmark clears HKR-H/K with concrete mechanisms and numbers. It lacks HKR-R beyond a narrow robotics RL crowd and has no product or platform impact, so it stays in the 60–71 band.
editor take
roto 2.0 spans four 16–24 DOF hands and hits 13 rotations in 10s; tactile RL finally gets a usable arena.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Winfree Oscillatory Neural Network
The paper proposes WONN, a neural architecture using generalized Winfree dynamics to evolve representations on a torus, and evaluates it on CIFAR, ImageNet, Maze-hard, and Sudoku, with Maze-hard reaching 80.1% accuracy using 1% of prior state-of-the-art parameters.
#Reasoning#Vision#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: 1% parameters and 80.1% on Maze-hard create a real hook, with Winfree torus dynamics and multiple benchmarks disclosed. A single niche arXiv architecture paper stays below featured.
editor take
WONN hits 80.1% on Maze-hard with 1% parameters; ImageNet details aren’t disclosed, so I’d file it under strong inductive bias.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs
The paper introduces CSA, a deployment-side wrapper for RLVR-trained local LLMs, and reports pathwise validity plus non-refusing deployment across 480 specialist streams, 160 adversarial shift streams, and 10,300 online LoRA rounds.
#Safety#Fine-tuning#Alignment#Research release
why featured
HKR-K/R pass: CSA plus three concrete test scales, tied to RLVR deployment risk. HKR-H is weak, and the conformal-risk framing is specialist, so this stays in all.
editor take
CSA stayed non-refusing across 480 specialist streams, 160 shift streams, and 10,300 LoRA rounds; regulated local LLMs need wrappers like this.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Frontier: Towards Comprehensive and Accurate LLM Inference Simulation
Frontier simulates modern LLM inference serving with disaggregated execution and stateful workloads, achieving below 4% average throughput error on a 16-H800 GPU testbed and reducing end-to-end latency error from 44.9% to 6.4% under co-location.
#Inference-opt#Agent#Reasoning#Frontier
why featured
HKR-H/K/R all pass, but this is an arXiv inference-simulation paper for infra readers, with no major-lab release or adoption signal, so it stays in 60–71.
editor take
Frontier gets under 4% throughput error on 16 H800s; inference simulation is finally catching up to PDD, AFD, and agent workloads.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Pseudo-Formalization for Automatic Proof Verification
The paper proposes Pseudo-Formalization and Block Verification, decomposing natural-language proofs into modules with premises, conclusions, and proofs, then evaluating PF+BV on 2 olympiad and research-level math benchmarks where it outperforms LLM-as-judge baselines on error-finding precision and recall.
#Reasoning#Benchmarking#ArxivMathGradingBench#Research release
why featured
HKR-K is clear: a new verification mechanism plus 2 benchmark comparisons. HKR-R is present around evaluation reliability, but the arXiv-only summary lacks effect sizes, dataset details, and reproducibility conditions, so it stays in all.
editor take
PF+BV beats LLM-as-judge on 2 math-verification benchmarks; I buy weak formalization before forced Lean translation.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance
TimeRewarder models temporal distances between frame pairs from robot demonstrations and human videos, supplying step-wise proxy rewards that reached near-perfect success on 9 of 10 Meta-World tasks with 200,000 environment interactions per task.
#Robotics#Vision#Fine-tuning#TimeRewarder
why featured
HKR-H and HKR-K pass: passive-video reward learning is a clear hook, with 10 tasks, 200k interactions, and 9 near-full-success results. As a single robotics paper with limited product immediacy, it stays in the 60–71 band.
editor take
TimeRewarder nearly solved 9/10 Meta-World tasks at 200k interactions each; I don’t buy real-robot generalization from this benchmark yet.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models
PlanningBench abstracts real planning scenarios into more than 30 task types, subtasks, constraint families, and difficulty factors, then uses a constraint-driven synthesis pipeline to generate verifiable data for LLM evaluation and reinforcement-learning training.
#Reasoning#Benchmarking#Fine-tuning#PlanningBench
why featured
HKR-K and HKR-R pass: the paper offers a concrete verifiable planning-data pipeline. It stays in the 60–71 band because it is a single arXiv paper with no disclosed model gains or adoption.
editor take
PlanningBench spans 30+ planning factors; I buy the verifiable synthesis angle, but model roster and gains are undisclosed.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Domain-Adaptable Reinforcement Learning for Code Generation with Dense Rewards
The paper introduces a PPO fine-tuning framework for code-generating LLMs, using execution-aware rewards for syntax, correctness, style, security, and simulator executability; it reports a 19% absolute pass@1 gain on MBPP and a 51% reduction in execution failures on RoboEval.
#Code#Fine-tuning#Robotics#Research release
why featured
HKR-K has concrete benchmark deltas, and HKR-R maps to code generation and robotics reliability. But this is a single arXiv paper with an academic title and no disclosed artifact or major-lab signal, so it stays in all.
editor take
PPO lifts MBPP pass@1 by 19% and cuts RoboEval failures 51%; I want the post-toy-benchmark survival rate.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Praxium: Diagnosing Cloud Anomalies with AI-based Telemetry and Dependency Analysis
Praxium detects cloud microservice anomalies with over 0.97 macro-F1 across 75 trials and four synthetic anomaly types, then uses causal impact analysis over recent software installations to infer the root cause under increasingly short package-install intervals.
#Agent#Reasoning#Praxium#PraxiPaaS
why featured
HKR-K is strong on metrics and attribution mechanism; HKR-R hits cloud incident triage. HKR-H is weak, and synthetic anomalies keep it in the 60–71 all band.
editor take
Praxium hits >0.97 macro-F1 across 75 synthetic trials; the SRE sell is causal install attribution under compressed rollout intervals.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
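Since the Praxium headline number is macro-F1, here is a minimal sketch of how that metric is computed: the unweighted mean of per-class F1. The function name and the toy labels are assumptions; nothing here reproduces Praxium's data or pipeline.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); treated as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: one "anomaly" is missed, so that class drags the mean down.
toy_score = macro_f1(
    ["normal", "normal", "anomaly", "anomaly"],
    ["normal", "normal", "anomaly", "normal"],
)
```

Because every class counts equally, a rare anomaly type that is detected poorly lowers macro-F1 as much as a common one would.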
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation
arXiv:2605.20189 proposes SOLAR, an autonomous agent for test-time adaptation using parameter-level meta-learning, multi-level reinforcement learning, and a knowledge base of valid modification strategies; the abstract says experiments cover six reasoning categories—commonsense, math, medical, coding, social, and logical—but does not disclose scores.
#Agent#Reasoning#Memory#SOLAR
why featured
HKR-H and HKR-K pass: the lifelong-agent hook is clear, and the summary gives three mechanisms plus six task categories. No scores, code, or production-replacement evidence are disclosed, so it stays in the 60–71 band.
editor take
SOLAR spans 6 reasoning categories, but scores are undisclosed; treating weights as an RL environment is clever, lifelong learning is unproven.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification
The paper tests LLaMA-3.1 8B across 8-, 4-, 3-, and 2-bit quantization on 82 interview transcripts, and proposes multi-pass prompt verification to reduce hallucinations and unstable qualitative-analysis outputs under low-bit settings.
#Inference-opt#Alignment#LLaMA#Research release
why featured
HKR-K and HKR-R pass: the paper gives a concrete setup and a verification mechanism, and it speaks to low-cost deployment reliability. The use case is narrow and HKR-H fails, so it stays in the 60–71 band.
editor take
LLaMA-3.1 8B ran on 82 transcripts; 8-bit holds up, 4-bit needs verification, 2/3-bit is risky for qualitative coding.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison
ClaimDiff-RL uses reference-conditioned atomic visual claim differences as the reward unit for caption RL, separating hallucinated claims from omitted salient facts; on a 160-image human-labeled diagnostic benchmark, public captioning benchmarks, and VQA benchmarks, it improves the hallucination–missing-fact balance and surpasses Gemini-3-Pro-Preview on several fine-grained capability dimensions.
#Vision#Multimodal#Fine-tuning#ClaimDiff-RL
why featured
HKR-K/R pass: the paper offers a concrete reward mechanism and a 160-image diagnostic set for VLM hallucinations. As a single arXiv paper with limited scale, it stays in the 60–71 band.
editor take
ClaimDiff-RL rewards atomic visual claims; 160 diagnostic images is thin, but splitting hallucination from omission beats scalar caption scores.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding
Chronicle trains a 324M-parameter decoder-only Transformer from scratch for text and time series, uses one shared backbone, and reports evaluation on 19 NLU tasks, 24 UCR/UEA datasets, and Time-MMD multimodal forecasting.
#Multimodal#Benchmarking#Paul Quinlan#Gemma
why featured
HKR-H and HKR-K pass: a 324M decoder-only backbone spans text and time series with concrete benchmark settings. It remains a single arXiv research prototype without product impact or major-lab pull, so it stays in the 60–71 band.
editor take
Chronicle runs text and time series through one 324M backbone; I buy the setup, not the implicit scratch-training victory lap.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback
AGPO uses group-level statistics to control clipping and decoding temperature, and Qwen2.5-14B trained with AGPO beats PPO and GRPO on nine English and Chinese math/STEM benchmarks under the same generated-token budget, reaching 67.3% on GSM8K and 40.5% on MATH; gains also transfer to Llama-3-8B and Gemma-2-9B.
#Reasoning#Fine-tuning#Benchmarking#Qwen
why featured
HKR-K is solid: AGPO gives a testable mechanism and Qwen2.5-14B math results. HKR-R is narrow to reasoning fine-tuners, and this is a single arXiv paper, so it stays in the 60–71 band.
editor take
AGPO beats PPO/GRPO on 9 math/STEM benchmarks; I buy the mechanism, not broad claims from 67.3% GSM8K.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Memory-Efficient Partitioned DNN Inference on Resource-Constrained Android Crowds
CROWDio runs partitioned ONNX inference for a 67M-parameter DistilBERT across five Android handsets, holding peak per-device RSS at 43±2 MB and reducing streaming-concurrency batch latency by 34% versus barrier synchronization.
#Inference-opt#CROWDio#DistilBERT#Android
why featured
HKR-K is strong and HKR-H has a concrete Android-crowd hook, but the item is a narrow systems-optimization arXiv paper with limited practitioner reach, so it stays in all.
editor take
CROWDio runs 67M DistilBERT on five Androids at 43±2MB RSS; neat, but the comms bill is still underexplained.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Spectral Structural Distortion Reveals Redundant Neurons in Neural Networks
The paper proposes a spectral structural importance score that compares neuron-level graphs before and after each layer transformation to identify redundant units; pruning recomputes scores after each structural change, performs no intermediate parameter updates, and applies one recovery fine-tuning stage after reaching the target reduction.
#Inference-opt#Interpretability#Fine-tuning#Research release
why featured
HKR-K and HKR-R pass via a concrete pruning mechanism and cost angle. HKR-H is weak, and the article lacks compression ratios, accuracy loss, or benchmark details, so it stays in all.
editor take
This scores pruning via graph-spectral distortion, but reports no compression ratios or baselines here; for now, it's an interpretable-pruning candidate.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation
CAdam reframes 3DGS densification as signal verification and reduces Gaussian counts by 85%-97% across SDS, ISM, and VFDS objectives while preserving comparable perceptual quality in optimization-based generative distillation.
#Vision#Inference-opt#Research release
why featured
HKR-K is strong via the 85%-97% Gaussian reduction and a clear densification mechanism; HKR-H comes from the efficiency contrast. The SDS/ISM/VFDS context is narrow, so it stays in all rather than featured.
editor take
CAdam cuts Gaussian counts 85%-97% under SDS, ISM, VFDS; the SNR gate is the sane part—stop densifying noise.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization
HeadQ corrects KV-cache quantization error with a low-rank residual side code in a learned query basis; across six models, K-only WikiText-103 decode experiments with dense values removed about 84%–94% of excess perplexity on the strongest 2-bit rows.
#Inference-opt#Benchmarking#HeadQ#Pythia
why featured
HKR-K is strong and HKR-R is limited: the paper gives a concrete correction mechanism and 84%–94% reductions, but HKR-H is weak and there is no product release or broad sourcing. This stays in the 60–71 all band.
editor take
HeadQ removes 84–94% excess perplexity in six-model 2-bit K-only decode; KV quantization needs logits, not MSE worship.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs
Sangwoo Park and eight coauthors propose SELFCI, a complementary self-distillation framework that uses two independent reverse KL objectives over feedback-derived teacher distributions to separate task-relevant information preservation from minimal disclosure; the 28-page paper includes 16 figures, but the abstract does not disclose exact improvement numbers over GRPO or other baselines.
#Alignment#Safety#Agent#Sangwoo Park
why featured
HKR-K/R pass: SELFCI adds a two-teacher reverse-KL self-distillation setup for retention vs disclosure. HKR-H is weak, and the excerpt gives no gains or reproducible result, so it stays in all.
editor take
SELFCI splits privacy and utility with two reverse-KL losses; no gains disclosed, so the GRPO-beating claim stays soft.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Single-Thread JPEG Decoder Benchmarks Mis-Evaluate ML Data Loaders
The paper benchmarks 13 Python-accessible JPEG decode paths on five matched 16-vCPU Google Cloud CPUs, using the 50,000-image ImageNet validation split to compare single-thread throughput with PyTorch DataLoader throughput at 0, 2, 4, and 8 workers.
#Benchmarking#Tools#PyTorch#TensorFlow
why featured
HKR-H/K/R pass, but this is an ML data-pipeline benchmark with impact mainly for vision-training engineers. No model release, product capability, or industry-level event, so it stays in the 60–71 band.
editor take
13 JPEG paths across five 16-vCPU CPUs show single-thread decode charts mislead PyTorch DataLoader choices.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Multi-step likelihood-ratio correction for reinforcement learning with verifiable rewards
The paper proposes NFPO, which augments PPO for RLVR with the cumulative likelihood ratio over the next N-1 tokens, and reports consistent gains on reasoning benchmarks; the snippet does not disclose benchmark names or exact scores.
#Reasoning#Alignment#Benchmarking#Research release
why featured
HKR-K is clear: NFPO adds a concrete likelihood-ratio correction to PPO. HKR-R applies for RLVR stability, but no gain size, model scale, or reproduction detail is disclosed, so this stays in all.
editor take
NFPO adds next-N-1-token likelihood ratios to PPO; scores aren’t disclosed, so RLVR is back to bias-variance bookkeeping.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
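For the NFPO entry above, a rough sketch of the quantity the summary names: a likelihood ratio accumulated over the next N-1 tokens. The function name `future_ratio`, the choice to exclude the current token from the window, and the truncation at the end of the sequence are assumptions; the snippet does not say how the paper plugs this ratio into the PPO objective.

```python
import math


def future_ratio(logp_new, logp_old, n):
    """For each position t, the product of pi_new/pi_old over tokens t+1 .. t+n-1."""
    log_r = [a - b for a, b in zip(logp_new, logp_old)]
    out = []
    for t in range(len(log_r)):
        # Next n-1 tokens; the slice is simply shorter near the end of the sequence.
        window = log_r[t + 1 : t + n]
        out.append(math.exp(sum(window)))
    return out
```

With `n = 1` the window is empty and every ratio is 1.0, so this extra factor vanishes and only the usual per-token PPO ratio would remain.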
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory
DODOCO instruments five MoE checkpoints across five sequence-mixer designs and an expert-parallel (EP) scan from 4 to 32 H100 ranks. The study finds EP scaling changes each architecture’s per-expert max/mean token ratio by at most 5%, while mock tokens overestimate routing Gini by up to 2.35× and create a batch-size trend that disappears with real text.
#Inference-opt#Benchmarking#DeepSeek#Qwen
why featured
HKR-K/R pass: it gives test scale and Gini-bias numbers, and MoE serving cost matters to infra teams. HKR-H is weak; EP dispatch diagnostics are narrow, so this stays in the 60–71 all band.
editor take
DODOCO tests 5 MoEs on 4–32 H100 EP ranks; mock tokens inflate routing Gini 2.35×, so many AlltoAll papers rest on sand.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
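For the DODOCO entry above, the two load-balance statistics it quotes can be computed from per-expert token counts as follows. This is a generic sketch using the standard Gini coefficient formula; the function names and the example counts are illustrative, and the paper's exact estimator may differ.

```python
def max_mean_ratio(counts):
    """Tokens on the busiest expert divided by the mean load per expert."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


def gini(counts):
    """Gini coefficient of non-negative counts (0.0 means perfectly balanced)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


balanced = [10, 10, 10, 10]  # hypothetical tokens routed to each of 4 experts
collapsed = [0, 0, 0, 40]    # all tokens routed to a single expert
```

Here `gini(balanced)` is 0.0 and `gini(collapsed)` is 0.75, the maximum for four experts, while the max/mean ratio goes from 1.0 to 4.0.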
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Gated Normalization Removal and Scale Anchoring in Pre-Norm Transformers
The paper introduces TaperNorm, which tapers RMSNorm or LayerNorm into sample-independent linear or affine maps, and reports up to 1.18× higher throughput after folding, measured on a KV-cached autoregressive decoding benchmark.
#Inference-opt#Research release
why featured
HKR-K is clear: TaperNorm tapers RMSNorm/LayerNorm into a sample-independent mapping and reports up to 1.18× higher throughput. HKR-R is cost-relevant, but HKR-H is weak and the feed only gives abstract-level detail, so it stays in 60–71.
editor take
TaperNorm reports up to 1.18× decoding throughput; I trust foldable inference knives more than another architecture slogan.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
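For the TaperNorm entry above, a small sketch of why a sample-independent map can be folded away at inference time: once a normalization layer has become a fixed per-channel scale, that scale can be absorbed into the columns of the next weight matrix. The helper names and the toy numbers are illustrative assumptions; how TaperNorm actually tapers the norm during training is not described in the snippet.

```python
def matvec(w, x):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def fold_scale(w, scale):
    """Absorb a fixed per-channel input scale into the columns of w."""
    return [[wij * sj for wij, sj in zip(row, scale)] for row in w]


w = [[1.0, 2.0], [3.0, 4.0]]
scale = [0.5, 2.0]  # hypothetical frozen per-channel scale left after tapering
x = [4.0, 1.0]

unfolded = matvec(w, [s * xi for s, xi in zip(scale, x)])  # scale, then project
folded = matvec(fold_scale(w, scale), x)                   # one fused projection
```

Both paths give the same output, so the scaling step costs nothing once folded. A data-dependent RMSNorm cannot be folded this way because its scale changes with every input.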
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
rePIRL: Learn PRM with Inverse RL for LLM Reasoning
rePIRL trains process reward models with a dual learning loop that alternately updates the policy and PRM, and the arXiv abstract says it outperforms existing methods on standardized math and coding reasoning datasets.
#Reasoning#Alignment#Fine-tuning#arXiv
why featured
HKR-K and HKR-R pass: the paper gives a concrete inverse-RL PRM training mechanism tied to reasoning reliability. No gains, model scale, or reproducibility details are disclosed, so it stays in the 60–71 research band.
editor take
rePIRL alternates policy and PRM updates; no scores in the snippet, so treat the generalization claim as unverified.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization
OCTOPUS compresses transformer KV cache with joint quantization of rotated coordinate triplets; across text, video, and audio, it matches or beats prior rotation codecs at every reported bit width and metric, and a fused Triton path reconstructs keys online without materializing uncompressed keys.
#Inference-opt#Multimodal#OCTOPUS#TurboQuant
why featured
HKR-K/R pass: KV-cache quantization is practical for serving cost and memory. HKR-H fails because the angle is a dense arXiv method, and the snippet lacks speedup, memory numbers, code status, or adoption, so this stays in all.
editor take
OCTOPUS beats TurboQuant at every reported bit width; KV-cache compression is now fighting over geometry, not just kernels.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
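For the OCTOPUS entry above, a sketch of the standard octahedral unit-vector encoding from computer graphics, which maps the direction of a 3D vector to a point in a 2D square. It is shown only to illustrate what an octahedral parametrization of a coordinate triplet can look like; whether OCTOPUS uses this exact map, and how it quantizes the result, is not stated in the snippet.

```python
import math


def _sign(v):
    return 1.0 if v >= 0.0 else -1.0


def oct_encode(x, y, z):
    """Map the direction of a non-zero 3-vector to a point in [-1, 1]^2."""
    l1 = abs(x) + abs(y) + abs(z)
    u, v = x / l1, y / l1
    if z < 0.0:  # fold the lower hemisphere outward over the diagonals
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    return u, v


def oct_decode(u, v):
    """Inverse map: recover the unit direction from the 2D point."""
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:  # undo the fold
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    norm = math.sqrt(u * u + v * v + z * z)
    return u / norm, v / norm, z / norm
```

Only the direction survives the round trip, so a codec built on this map would have to store the triplet's magnitude separately.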
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
TimeSRL: Generalizable Time-Series Behavioral Modeling via Semantic RL-Tuned LLMs
TimeSRL uses a two-stage LLM pipeline to convert passive-sensing time series into natural-language abstractions before predicting mental-health outcomes, and under a leave-one-dataset-out protocol it reduces anxiety MAE by 3.1–44.1% versus non-LLM and LLM baselines.
#Reasoning#Fine-tuning#Benchmarking#TimeSRL
why featured
HKR-H and HKR-K pass: the cross-modal framing is fresh, and the leave-one-dataset-out protocol plus MAE reductions are concrete. It remains a vertical arXiv paper with no artifact or deployment, so it stays in the 60–71 band.
editor take
TimeSRL cuts anxiety MAE 3.1–44.1% under leave-one-dataset-out; I buy the semantic bottleneck, but mental-health cohorts leak easily.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA
JUDO uses normal images as visual domain context to segment defect regions, injects domain knowledge through SFT, and guides reasoning with GRPO rewards; the paper reports higher MMAD benchmark performance than Qwen2.5-VL-7B and GPT-4o, while the RSS abstract does not disclose exact scores.
#Multimodal#Vision#Reasoning#JUDO
why featured
HKR-H/K pass: JUDO uses normal images as visual context plus SFT and GRPO, claiming MMAD gains over Qwen2.5-VL-7B and GPT-4o. Single arXiv paper and niche inspection scope keep it in all.
editor take
JUDO beats GPT-4o on MMAD, exact scores undisclosed; in industrial QA, normal-image context still trumps generic vision muscle.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Mechanisms of Misgeneralization in Physical Sequence Modeling
The paper defines physical misgeneralization, where generated trajectories look plausible individually but their aggregate physical-quantity distribution is wrong. It uses a data deviation kernel to predict mass shifts across synthetic, maze-navigation, and double-pendulum tasks.
#Robotics#Benchmarking#Research release
why featured
HKR-K passes via a named failure mode and prediction mechanism; HKR-H passes on the plausible-trajectory/wrong-distribution hook. The arXiv paper is niche research, not a product, safety incident, or broad tooling release, so it stays in 60–71.
editor take
Physical misgeneralization names a nasty failure: valid-looking trajectories, shifted energy distributions. For robotics, that beats another planner score.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding
NeuroQA introduces 56,953 image-grounded 3D brain MRI QA pairs from 12,977 subjects across 12 datasets, and the best zero-shot vision-language model reaches 47.5% accuracy on closed-format public test items, below the 49.4% text-only majority-template floor.
#Vision#Multimodal#Benchmarking#NeuroQA
why featured
HKR-H/K pass: the dataset scale and 47.5% zero-shot result are concrete. Scope is narrow medical 3D MRI benchmarking, with no product or major-model release, so it stays in the 60–71 research-signal band.
editor take
NeuroQA has 56,953 3D MRI QAs; best zero-shot hits 47.5%, below the 49.4% text-only majority floor.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Compute Only Once: UG-Separation for Efficient Large Recommendation Models
ByteDance presents UG-Sep for TokenMixer-based recommendation models, reusing user-side computation through separated user and item flows, then adding W8A16 weight-only quantization; online A/B tests across Douyin Feed, Hongguo Feed, Chuanshanjia Ads, and Qianchuan Ads report up to 20% lower inference latency without adverse business-metric changes.
#Inference-opt#ByteDance#Douyin#TokenMixer
why featured
HKR-K/R pass via a concrete mechanism and online A/B latency number. HKR-H is weak because UG-Separation for TokenMixer is vertical infra research, with no product or open-source hook for a broader AI audience.
editor take
UG-Sep cuts online A/B latency up to 20%; TokenMixer recommenders finally get reusable user-side compute across ads and feeds.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
CompilerKV: Risk-Adaptive KV Compression via Offline Experience Compilation
CompilerKV compiles KV-retention correction tables offline from a calibration corpus and reaches state-of-the-art results among compressed-cache methods on four backbones under a 512-token budget, beating the strongest prefill-only baseline by 1.67 points on average with a 95% CI of [+1.08, +2.37].
#Inference-opt#CompilerKV#LongBench#SnapKV
why featured
HKR-K/R pass: 512-token budget, four backbones, and +1.67 avg over the strongest prefill-only baseline. HKR-H fails on a narrow arXiv title; no deployment or open-source hook, so it stays in 60–71.
editor take
CompilerKV beats the best prefill-only baseline by 1.67 at 512 tokens; 0.4–0.8 cross-model loss makes online SnapKV-style estimation look shaky.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
How Many Human Survey Respondents Is a Large Language Model Worth? An Uncertainty Quantification Perspective
The paper proposes a framework that converts LLM-simulated survey responses into confidence sets for human population parameters and adaptively selects the simulation sample size; the abstract does not disclose specific model names, dataset counts, or coverage numbers.
#Benchmarking#Research release#Benchmark
why featured
HKR-H and HKR-K pass: the title has a sharp hook and the paper offers confidence sets plus adaptive sample sizing. Missing models, datasets, coverage rates, and respondent-equivalence numbers keep it in all.
editor take
This frames LLM survey simulation as coverage control; no model names or rates disclosed, so stop treating 10k synthetic answers as sample size.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning
The paper proposes ProxyCoT, a training framework that generates chain-of-thought traces on proxy contexts via reinforcement learning or teacher distillation, then grounds them in full long contexts with supervised fine-tuning; the abstract says it outperforms strong baselines across datasets with lower computational overhead.
#Reasoning#Fine-tuning#Research release
why featured
HKR-K is clear via the ProxyCoT training mechanism, and HKR-R hits long-context cost concerns. The post does not disclose scores, dataset names, cost reduction, or code, so it stays in the 60–71 research-release band.
editor take
ProxyCoT trains CoT on proxy contexts, then SFTs full contexts; 10M-token windows still fail at retrieval-conditioned reasoning.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models
The paper introduces PolyNeXt, replacing ReLU, GELU, and softmax in MLPs, convolutions, and attention with Hadamard-product polynomial modules, and reports matching or exceeding activation-based MetaFormer models on ImageNet classification, ADE20K segmentation, and out-of-distribution robustness.
#Vision#Benchmarking#MetaFormer#PolyNeXt
why featured
HKR-H/K pass: PolyNeXt has a counterintuitive activation-free vision design and tests on ImageNet, ADE20K, and OOD robustness. HKR-R is weak; no deployment, cost, open-weight, or flagship-model impact is disclosed.
editor take
PolyNeXt swaps ReLU, GELU, and softmax for Hadamard products; I buy the direction, but scores are undisclosed here.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU
DASH searches hybrid attention architectures on Qwen2.5-3B-Instruct using 12.3 million tokens per run and finishes in about 20 minutes on a single RTX Pro 6000 GPU.
#Inference-opt#Reasoning#Benchmarking#Qwen
why featured
HKR-H and HKR-K pass: the title has a one-GPU minutes-level search hook, and the post gives hardware/token conditions. Still, architecture search is specialist research, below featured threshold.
editor take
DASH searches Qwen2.5-3B with 12.3M tokens in 20 minutes; Jet-Nemotron’s 200B-token search bar just got embarrassing.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Mitigating Label Bias with Interpretable Rubric Embeddings
The paper proposes rubric embeddings to replace black-box embeddings with expert-defined criteria, evaluates them on a new dataset of applications to a large master's program, and reports reduced group disparities plus improved cohort quality measures under biased-label conditions.
#Embedding#Interpretability#Alignment#Research release
why featured
HKR-K and HKR-R pass: the paper offers a concrete mechanism and admissions-data test, with fairness relevance. Single arXiv source and no disclosed effect sizes keep it in the 60–71 band.
editor take
Rubric embeddings reduce disparities on master's admissions data; sample size is undisclosed, so interpretability is no bias waiver.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
DEL: Digit Entropy Loss for Numerical Learning of Large Language Models
DEL trains numerical prediction with digit conditional probability and binary cross-entropy, and reports higher overall accuracy and numerical-distance results across seven mathematical reasoning benchmarks and four LLM families: CodeLlama, Mistral, DeepSeek, and Qwen-2.5.
#Reasoning#Code#Fine-tuning#CodeLlama
why featured
HKR-K/R pass: the mechanism and evaluation setup are concrete, and LLM numeracy is a real practitioner pain. This is still a single arXiv method paper with no major model release, product impact, or cross-source cluster, so it stays in 60–71.
editor take
DEL wins on 7 math benchmarks and 4 model families; I want stress tests on long decimals and unit conversions.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Residual Paving: Diagnosing the Routing Bottleneck in Selective Refusal Editing
Bryce Hinkley and Peyman Najafirad introduce Residual Paving, a routed residual editing method that cuts edit-prompt refusal on the Gemma-3-4B-IT held-out split from 88.6% to 4.0%, while refusal on harmful keep-side prompts falls to 65.3%, below the frozen baseline's 81.6%.
#Alignment#Safety#Interpretability#Bryce Hinkley
why featured
HKR-K and HKR-R pass: the paper gives testable refusal metrics and a concrete safety trade-off. Single arXiv paper, high jargon, and no product impact keep it in the 60–71 band.
editor take
Residual Paving cuts Gemma edit refusal to 4.0%, but harmful refusal drops to 65.3%; the router fix still bleeds safety.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
TRAM: Test-Time Risk Adaptation with Mixture of Agents
TRAM reuses a fixed library of risk-neutral policies at test time, scores each source policy by target reward and occupancy-based deployment risk, and reduces deployment risk without parameter updates in gridworlds, MuJoCo Reacher, Safety-Gymnasium, and an LLM alignment setting.
#Agent#Alignment#Safety#TRAM
why featured
HKR-K/R pass: the mechanism and test settings are concrete, and the safety angle matters for agent deployment. HKR-H is weak; no major-lab or discussion signal, so it stays in the 60–71 research-release band.
editor take
TRAM mixes fixed policies with zero test-time updates; I buy the engineering, but source-hull mismatch is the deployment bill.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation
DISC uses a hypernetwork to generate task-specific visuomotor policy parameters from instructions, outperforming entangled baselines on LIBERO-90, Meta-World, and a real-world benchmark with identical visual contexts; the authors say it also surpasses pretrained π0 without external pretraining data, and the code is available on GitHub.
#Robotics#Vision#Fine-tuning#DISC
why featured
HKR-K passes: DISC gives a concrete instruction-to-policy-parameters mechanism, reports wins on LIBERO-90, Meta-World, and a real-world benchmark with identical visual contexts, and releases code. No quantified gains or broad product impact, so it stays in 60–71.
editor take
DISC compiles instructions into full policy weights; wins on LIBERO-90 and Meta-World, but the π0 claim needs replication.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals
AVSD trains Qwen3-8B and Qwen3-4B with multi-view privileged self-distillation on AIME24, AIME25, and HMMT25, separating cross-view consensus from view-specific residuals and improving Avg@8 over the strongest baselines by 3.1% and 2.2%, while Qwen3-8B gains 2.4% on Codeforces and LiveCodeBench v6.
#Reasoning#Code#Fine-tuning#Qwen
why featured
HKR-K is clear: multi-view self-distillation reports 3.1%/2.2% gains on AIME24/AIME25/HMMT25. HKR-R is present for small-model training costs, but HKR-H is weak and the story stays in the 60–71 research band.
editor take
AVSD adds 3.1% Avg@8 on Qwen3-8B; gating privileged-view residuals is a cleaner bet than single-teacher distillation.
HKR breakdown
hook knowledge resonance
open source
65
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Effective Model Pruning: Measure the Redundancy of Model Components
The paper proposes Effective Model Pruning, which computes Neff from importance-score distributions via the inverse Simpson index and removes the N-Neff lowest-scoring components; experiments cover MLPs, CNNs, Transformers, LLMs, KAN, and criteria including weight magnitude, attention score, and image pixels.
#Inference-opt#Benchmarking#Research release
why featured
HKR-K is clear: EMP gives a reproducible pruning rule across MLP, CNN, Transformer, LLM, and KAN. HKR-R comes from cost compression; HKR-H is weak, so a single arXiv method paper stays in 60–71.
editor take
EMP sets pruning count via inverse Simpson index; it spans 5 architectures, but LLM size and compression ratios are undisclosed.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
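For the Effective Model Pruning entry above, the rule in the summary can be sketched directly: normalize the importance scores into a distribution, take the inverse Simpson index as the effective count Neff, and drop the N - Neff lowest-scoring components. The function names are illustrative, and rounding Neff to the nearest integer is an assumption, since the snippet does not state the paper's rounding rule. Scores are assumed non-negative with a positive sum.

```python
def effective_count(scores):
    """Inverse Simpson index of the normalized score distribution."""
    total = sum(scores)
    probs = [s / total for s in scores]
    return 1.0 / sum(p * p for p in probs)


def prune(scores):
    """Indices kept after dropping the N - Neff lowest-scoring components."""
    n_keep = max(1, round(effective_count(scores)))  # rounding rule is assumed
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])
```

Uniform scores give Neff = N, so nothing is pruned; a single dominant score gives Neff close to 1, so almost everything is. For example, `prune([4, 4, 2, 1])` has Neff of about 3.27 and keeps indices 0, 1, and 2.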
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection
The paper proposes TSRL for deepfake detection, modeling training as an MDP in which a PPO Tutor assigns each sample's loss a continuous weight between 0 and 1 using visual features, EMA loss, and forgetting counts.
#Vision#Agent#Safety#Research release
why featured
HKR-K and HKR-R pass: the paper gives a concrete training mechanism and touches deepfake safety. No metrics, code detail, or production-replacement claim is disclosed, so it stays in the 60–71 research band.
editor take
TSRL uses PPO to assign 0–1 loss weights; without cross-dataset metrics, this smells like curriculum overfitting.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
CASCADE Conformal Prediction for Two-Stage Clinical Decision Support
CASCADE propagates epistemic uncertainty from a screening classifier into a downstream dose-change regressor, using Venn-Abers multi-probabilistic uncertainty to scale conformal intervals and producing 38.9% narrower intervals than standard conformal baselines for confident Parkinson's Disease patients.
#Reasoning#Safety#CASCADE#Research release
why featured
HKR-K is strong via the uncertainty-transfer mechanism and 38.9% interval reduction; HKR-R is limited to safety-minded practitioners. The clinical conformal-prediction niche lacks product or platform impact, so this stays in all.
editor take
CASCADE narrows confident PD dose intervals by 38.9%; I buy the mechanism if coverage isn't hidden behind averages.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R1
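For the CASCADE entry above, a sketch of uncertainty-scaled split conformal prediction, the general technique behind intervals that narrow for confident cases. Using a single per-patient value `sigma` as a stand-in for the screening-stage uncertainty is an assumption; the snippet does not describe how the paper turns Venn-Abers outputs into a scale. All names and numbers are illustrative.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample (1 - alpha) quantile used in split conformal prediction."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def scaled_interval(y_hat, sigma, calib, alpha=0.1):
    """calib holds (y_true, y_pred, sigma) triples from a held-out set."""
    scores = [abs(y - p) / s for y, p, s in calib]
    q = conformal_quantile(scores, alpha)
    return y_hat - q * sigma, y_hat + q * sigma
```

Because residuals are divided by `sigma` before the quantile is taken, a patient with a small `sigma` gets a proportionally narrower interval than one with a large `sigma`, using the same calibrated quantile.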
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
LLM Pretraining Shapes a Generalizable Manifold: Insights into Cross-Modal Transfer to Time Series
The paper argues that language pretraining gives time-series training a reusable manifold; a linear probe decodes realistic trajectories from frozen LLM states without paired supervision, while projected-space retrieval yields competitive forecasts and finetuning behaves as low-dimensional alignment.
#Reasoning#Fine-tuning#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the cross-modal transfer angle is novel, and the frozen-state linear-probe claim is testable. Impact stays paper-level, with no product, code, or benchmark traction, so it sits in 60–71.
editor take
The paper claims frozen LLM states linearly decode time series. Models and benchmarks are undisclosed, so treat it as mechanism, not capability.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Dynamic Shapley Computation
The paper introduces D-Shap, which represents Shapley data valuation as a player-by-task matrix, updates new task valuations in milliseconds, and reduces new-player update cost by up to three orders of magnitude while matching full recomputation quality across tested models.
#Fine-tuning#Benchmarking#Research release
why featured
HKR-K is solid: D-Shap has a concrete matrix mechanism plus millisecond task updates and up to 1000× lower new-player update cost. HKR-H and HKR-R are weak; no hard-exclusion trigger, so it fits the 60–71 research-signal band.
editor take
D-Shap makes Shapley updates millisecond-level via a player-by-task matrix; the bet lives or dies on locality holding in real data.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Consistently Informative Soft-Label Temperature for Knowledge Distillation
The paper proposes CIST, assigning separate sample-wise adaptive temperatures to teacher and student models, reweighting the distillation objective by teacher confidence and student learning difficulty, and reporting consistent gains over standard KD and strong baselines on vision and language distillation tasks with negligible computational overhead.
#Fine-tuning#Inference-opt#arXiv#Research release
why featured
HKR-K passes on a concrete distillation mechanism, and HKR-R passes on deployment-cost relevance. No results, model scale, or artifact are disclosed, so this stays in the 60–71 arXiv-method band.
editor take
CIST gives teacher and student separate sample-wise temperatures; gains are undisclosed, but fixed-temperature KD deserves this cut.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
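For the CIST entry above, a sketch of a distillation loss where teacher and student each get their own softmax temperature. Only the two-temperature structure comes from the summary; the function names are illustrative, and the paper's rule for choosing per-sample temperatures and its confidence/difficulty reweighting are not disclosed, so both temperatures are plain arguments here.

```python
import math


def softmax(logits, temperature):
    """Temperature-scaled softmax with max-subtraction for stability."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, t_teacher, t_student):
    """KL(teacher || student) with a separate temperature on each side."""
    p = softmax(teacher_logits, t_teacher)
    q = softmax(student_logits, t_student)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
```

Standard KD is the special case `t_teacher == t_student`. With identical logits the loss is then zero, while mismatched temperatures give a positive loss even when the logits agree.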
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
ECUAS_n: A family of metrics for evaluating uncertainty-augmented systems
The paper proposes the ECUAS_n metric family for uncertainty-augmented (UA) systems that output predictions and uncertainty scores, using proper scoring rules and a parameter n that controls the trade-off between incorrect prediction costs and imperfect uncertainty costs under application-specific decision settings.
#Benchmarking#Safety#Research release#Benchmark
why featured
HKR-K passes: ECUAS_n gives a concrete metric mechanism for uncertainty-augmented systems. HKR-H and HKR-R are weak, and the feed only gives abstract-level detail with no deployment or tooling impact.
editor take
ECUAS_n scores predictions and uncertainty with proper scoring rules; I buy the direction, but choosing n is the trap.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees
The paper proposes a Learning-to-Defer framework that routes extractive QA queries to specialized experts, provides theoretical guarantees for optimal deferral, and evaluates reliability and computational cost on SQuADv1, SQuADv2, and TriviaQA; the abstract does not disclose exact overhead-reduction percentages or model counts.
#Reasoning#Inference-opt#Research release#Benchmark
why featured
HKR-K passes for a concrete allocation mechanism and benchmark setup; HKR-R passes on cost/reliability for query routing. HKR-H is weak, and the extractive-QA research scope keeps it in the 60–71 band.
editor take
Learning-to-Defer tests 3 QA sets, but gives no overhead cut; I’d worry about tail-query routing outside benchmarks.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
The Economics of AI Inference: Inflation Dynamics, Welfare Costs, and Optimal Monetary Policy under the Inference-Cost Phillips Curve
The paper introduces an Inference-Cost Phillips Curve that adds AI inference marginal costs to a New Keynesian Phillips curve, then estimates an empirical slope of 0.087 using U.S. monthly data from 2022M01 to 2026M04.
#Inference-opt#Research release
why featured
HKR-H and HKR-K pass: it links inference cost to the Phillips Curve and reports a 0.087 slope from 2022M01-2026M04 US data. HKR-R is weak because macro policy modeling sits far from product and engineering practice.
editor take
Inference cost enters a Phillips curve with slope 0.087; the macro leap is bold, but identification has to survive first.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
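For the Inference-Cost Phillips Curve entry above, the textbook New Keynesian Phillips curve in marginal-cost form, followed by one illustrative way an inference-cost term could enter. The second equation, the symbol \hat{c}^{\mathrm{inf}}_t, and the coefficient \gamma are assumptions for orientation only; the snippet does not give the paper's specification or say which coefficient the reported 0.087 slope refers to.

```latex
% Textbook New Keynesian Phillips curve (marginal-cost form):
\pi_t = \beta\,\mathbb{E}_t[\pi_{t+1}] + \lambda\,\widehat{mc}_t

% Assumed illustrative extension with an AI inference-cost term:
\pi_t = \beta\,\mathbb{E}_t[\pi_{t+1}] + \lambda\,\widehat{mc}_t
        + \gamma\,\hat{c}^{\,\mathrm{inf}}_t
```

Here \pi_t is inflation, \beta the discount factor, \widehat{mc}_t the deviation of real marginal cost from steady state, and \lambda the slope on marginal cost.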
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment
The paper proposes CLAIR for federated LoRA fine-tuning, using structured low-rank plus block-sparse decomposition to recover the shared LoRA subspace and detect contaminated clients under heterogeneous client conditions.
#Fine-tuning#Alignment#Research release
why featured
HKR-K and HKR-R pass: the mechanism is concrete and tied to private fine-tuning risk. HKR-H fails, and this is a single arXiv paper without production replacement or large-scale deployment evidence.
editor take
CLAIR detects contaminated clients in federated LoRA; the experiment is only text-copying, far from real instruction tuning.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Preference-aware Influence-function-based Data Selection Method for Efficient Fine-Tuning
The paper introduces PRISM, a data selection method that weights target examples using the current model’s preferences, builds a preference-aware target representation, and scores candidate training samples by alignment; the abstract says experiments across model families and scales improved efficient fine-tuning and safety-oriented SFT repair, but it does not disclose datasets, model names, or exact gains.
#Fine-tuning#Alignment#Safety#PRISM
why featured
HKR-K and HKR-R pass: PRISM offers a testable data-selection mechanism tied to fine-tuning efficiency and preference alignment. HKR-H is weak, and this is a single arXiv method paper without code or production evidence.
editor take
PRISM weights targets by current-model preference; datasets and gains are undisclosed, so I’d treat it as a testable SFT data-selection trick.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
PULSE achieves state-of-the-art results on non-stationary time series forecasting
PULSE uses a Disentangle-Evolve-Simulate framework for non-stationary time series forecasting, combines phase-anchored disentanglement, a Phase Router, and Statistic-Aware Mixup, and reports state-of-the-art or competitive results with a simple MLP backbone across 12 real-world benchmarks.
#Reasoning#Benchmarking#PULSE#Research release
why featured
HKR-K passes with a concrete framework, 12 benchmarks, and open code. HKR-H and HKR-R are weak because this is a specialized forecasting paper without a product or industry-conflict hook, so it fits the 60–71 band.
editor take
PULSE hits near-SOTA on 12 benchmarks; I buy small MLP plus phase bias over another Transformer flex.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
When AI Gets It Wrong: Reliability and Risk in AI-Assisted Medication Decision Systems
An arXiv paper evaluates AI-assisted medication decision systems using controlled simulated scenarios covering drug interactions and dosage decisions; the post does not disclose the number of scenarios, model names, or quantitative failure rates.
#Safety#Benchmarking#Research release#Safety/alignment
why featured
HKR-H and HKR-R pass on high-stakes medication safety, but HKR-K is weak: no model names, sample size, or result numbers are disclosed. This stays in the interesting research band.
editor take
This paper tests AI medication systems, but scenario count and model names are undisclosed; useful failure taxonomy, weak benchmark.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K0·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency
CALIPER tests whether post-drift data is sufficient for retraining using single-pass weighted local regression, and across four domains, three learner families, and two detectors it matches or exceeds the best fixed retraining window with low per-update time and memory.
#Benchmarking#CALIPER#Research release#Benchmark
why featured
HKR-H and HKR-K pass: the retraining trigger is concrete, with a named method and test matrix. HKR-R is weak because this is niche concept-drift research, so it stays in the 60–71 “interesting” band.
editor take
CALIPER gates retraining data with one-pass local regression; across 4 domains and 3 learner families, it beats fixed windows.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Epistemic Uncertainty Quantification for Pre-trained VLMs via Riemannian Flow Matching
REPVLM quantifies epistemic uncertainty with negative log-density on the hyperspherical manifold of VLM embeddings, and the abstract says it achieves near-perfect correlation with prediction error, but the post does not disclose the correlation coefficient or evaluation setup.
#Vision#Multimodal#Benchmarking#REPVLM
why featured
HKR-K/R pass: the mechanism is clear and relevant to VLM reliability. HKR-H is weak, with abstract-level detail only; correlation coefficient, datasets, and reproduction details are not disclosed.
editor take
REPVLM uses hyperspherical negative log-density for uncertainty; “near-perfect correlation” lacks coefficients, so I don’t buy it yet.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics
The paper shows, across 11 conditions and models from 0.82M to 85M parameters, that weight decay separates memorization, developmental grokking, and collapse, with a memorization-to-development boundary at λc=0.0158.
#Interpretability#Benchmarking#Research release
why featured
HKR-H/K pass: the paper offers a concrete diagnostic angle and testable numbers. HKR-R is weak, and the training-dynamics focus keeps it in the all band, below featured.
editor take
Across 11 conditions, λc=0.0158 is useful; don’t launder modular-arithmetic grokking into language-model claims.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
When Fairness Metrics Disagree: Evaluating the Reliability of Demographic Fairness Assessment in Machine Learning
The paper evaluates multiple demographic fairness metrics in face recognition and introduces the Fairness Disagreement Index to measure cross-metric inconsistency; the abstract says disagreements remain high across thresholds and model configurations, while the RSS snippet does not disclose dataset names or exact numeric results.
#Safety#Benchmarking#Research release#Safety/alignment
why featured
HKR-H/K/R all pass, but this is a single arXiv fairness-evaluation paper. It offers a metric and experiment result, not a production replacement or major model update, so it stays in the 60–71 band.
editor take
The paper adds FDI for fairness-metric disagreement, but gives no datasets or numbers; single-metric fairness claims look weak.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Quant.npu: Enabling Efficient Mobile NPU Inference for On-Device LLMs via Fully Static Quantization
Quant.npu adapts on-device LLM inference to mobile NPU constraints with integer-only fully static quantization, using learnable quantization parameters, rotation matrices, a two-stage quantization pipeline, and sensitivity-guided mixed precision; experiments on real-world mobile NPUs report accuracy comparable to state-of-the-art PTQ methods and up to 15.1% lower inference latency.
#Inference-opt#Quant.npu#Research release
why featured
HKR-K is solid with a concrete mechanism and 15.1% latency figure; HKR-R applies to on-device deployment pain. HKR-H is weak, and NPU quantization is niche, so it stays in 60–71.
editor take
Quant.npu cuts real mobile NPU latency by 15.1%; I care if it survives long context, but the abstract omits that.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Can Conversational XAI Improve User Performance? An Experimental Study
The researchers tested conversational XAI against Q&A-based assistance with 42 participants; both treatment groups significantly outperformed the model, but the preliminary results showed no performance difference between assistance types and only modest engagement.
#Interpretability#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper gives a 42-person experiment and a testable “no difference between aid types” result for XAI design. HKR-H is weak, and the small sample keeps it in the middle of the all band.
editor take
With 42 participants, conversational XAI failed to beat Q&A help; don’t sell a chat wrapper as performance gain yet.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Multi-Agent Reinforcement Learning for Safe Autonomous Driving Under Pedestrian Behavioral Uncertainty
The authors co-trained one self-driving car and 12 pedestrians with MAPPO; in 500 evaluation episodes, the co-trained SDC reached 78% of goals with a 14% collision rate, versus 35% goals and 33% collisions for the best rule-based baseline.
#Agent#Robotics#Safety#Prakash Aryan
why featured
HKR-K/R pass: the paper gives test settings and baseline numbers, and AV safety has practitioner pull. It remains an arXiv research item with no product or code impact disclosed, so it stays in all.
editor take
MAPPO trains 1 car and 12 pedestrians, yet 500 runs still hit 14% collisions; I’d call this a stress-test generator, not safety.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Cumulative Meta-Learning from Active Learning Queries for Robustness to Spurious Correlations
The paper proposes CAML, which treats each active-learning round as a meta-learning task, uses the current labeled set for adaptation and the newly queried batch for generalization evaluation, and reports minority-group accuracy gains of up to 27.8% on Dominoes, 29.9% on Waterbirds, 14.3% on SpuCo, and 24.0% on CivilComments.
#Fine-tuning#Alignment#Benchmarking#Research release
why featured
HKR-K is strong via a named method and four gain figures; HKR-R lands for robustness practitioners. HKR-H is weak, and this remains an academic arXiv paper, so it sits in the interesting-not-featured band.
editor take
CAML turns active-learning rounds into meta-learning tasks and reports up to 29.9% minority accuracy gain; I buy the mechanism, not the missing cost details.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX
Mahjax implements a fully vectorized Riichi Mahjong environment in JAX and reaches 2 million steps per second under no-red rules and 1 million steps per second under red rules on eight NVIDIA A100 GPUs.
#Agent#Robotics#Benchmarking#Mahjax
why featured
HKR-H comes from the Mahjong+GPU+JAX angle, and HKR-K has concrete 8xA100 throughput numbers. HKR-R is weak because it lacks product impact or broad developer-tool relevance, so it stays in the 60–71 band.
editor take
Mahjax hits 2M steps/sec on 8 A100s; Riichi RL needs tougher self-play evaluation more than another fast env.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization
The paper proposes TextReg, a regularization framework for text-space prompt optimization, and reports out-of-distribution accuracy gains up to 11.8% over TextGrad and 16.5% over REVOLVE across multiple reasoning benchmarks.
#Reasoning#Alignment#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper gives a regularized text-space optimization method plus two gain figures, and prompt overfitting is practitioner-relevant. HKR-H is weak, and a single arXiv paper stays in the interesting band.
editor take
TextReg beats TextGrad by 11.8% OOD on reasoning benchmarks. Prompt optimization needed this anti-bloat regularizer badly.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation
JoyAI-Image v2 proposes a unified MLLM plus MMDiT architecture for visual understanding, text-to-image generation, and instruction-guided editing, with training signals for long-text rendering, spatial grounding, and general and spatial edits; the abstract says it reaches state-of-the-art or highly competitive results across multiple benchmarks, but does not disclose exact scores.
#Multimodal#Vision#Reasoning#JoyAI-Image
why featured
HKR-H/K pass: the unified multimodal setup and MLLM+MMDiT mechanism add some signal. HKR-R fails because the post gives no scores, artifact, or major-lab context, so this stays in the normal research-release band.
editor take
JoyAI-Image v2 couples MLLM with MMDiT, but scores are undisclosed; treat the SOTA claim as unverified.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Reviving Error Correction in Modern Deep Time-Series Forecasting
The paper proposes UEC-STD, an architecture-agnostic error corrector that plugs into existing time-series forecasters without retraining and tests it across 4 backbones and 10 datasets.
#Inference-opt#arXiv#Research release#Open source
why featured
HKR-K passes via a concrete mechanism and evaluation scale. HKR-H and HKR-R are weak; with no major lab, product tie-in, or cross-source discussion, this sits in the all research stream.
editor take
UEC-STD plugs into 4 backbones and 10 datasets without retraining; I buy the angle—fixing inference drift beats swapping models.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Towards the Anonymization of Language Modeling
The paper proposes anonymization methods for BERT-style MLM and GPT-style CLM specialization, evaluates them on one medical dataset against baselines, and targets memorization of direct and indirect identifiers; the RSS snippet does not disclose concrete privacy or utility metrics.
#Fine-tuning#Safety#Research release
why featured
This is a privacy/safety research item with HKR-K/R: it covers anonymization training for BERT-style MLM and GPT-style CLM on medical-data memorization. HKR-H is weak, and metrics are not disclosed, so it stays in all.
editor take
The paper tests one medical dataset but discloses no metrics; without attack success rates, I don't buy the privacy claim.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Research Paper Explores Military Object Detection in Multi-Spectrum Drone Imagery
The paper builds four KIIT-MiTA-derived datasets—Gray Scale, Thermal Vision, Night Vision, and Obscura Vision—and trains YOLOv11-small to detect military objects in drone imagery under low-visibility, heat-based, and nighttime conditions.
#Vision#KIIT-MiTA#YOLOv11-small#Research release
why featured
HKR-H/K/R all pass via the drone-defense hook and concrete dataset/model setup, but this is a single arXiv vision paper with no disclosed metric leap, artifact, or product impact, so it stays in all.
editor take
The paper trains YOLOv11-small on 4 KIIT-MiTA variants; mAP is undisclosed, so don’t buy the military-detection claim yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Conditioning Gaussian Processes on Almost Anything
The paper recasts Gaussian Processes as a class of linear diffusion models, recovers standard GP conditioning exactly in the linear-Gaussian setting, and supports conditioning statements with point-wise likelihood evaluation, including nonlinear physics and natural language via large language models.
#Reasoning#Research release
why featured
HKR-H/K pass: the title has a curiosity hook and the summary gives the GP↔linear-diffusion mechanism. HKR-R misses; this is a niche cs.LG theory paper, so it stays in 60–71.
editor take
They cast GP conditioning as a diffusion ODE; exact for linear-Gaussian, but LLM-based language likelihoods deserve skepticism.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
FAIR-Pruner: A Flexible Framework for Automatic Layer-Wise Pruning via Tolerance of Difference
FAIR-Pruner uses Tolerance of Difference to assign non-uniform layer-wise pruning depths from two within-layer rankings, and evaluates accuracy–compression trade-offs on CIFAR-10, ImageNet, five vision architectures, and prune-only routed-expert Qwen1.5-MoE-A2.7B-Chat experiments.
#Vision#Inference-opt#Qwen#Research release
why featured
HKR-K and HKR-R pass: it offers a named pruning mechanism and Qwen/MoE experiments tied to inference cost. HKR-H is weak, and a single arXiv compression paper fits the 60–71 band.
editor take
FAIR-Pruner allocates per-layer pruning via ToD; Qwen1.5-MoE-A2.7B is prune-only, so don't infer LLM serving wins yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Same Target, Different Basins: Hard vs. Soft Labels for Annotator Distributions
The paper compares multipass and stochastic label sampling on CIFAR-10H, finding hard-label delivery outperforms soft-label training when only a small number of annotations per example is available, while both hard-label methods match soft-label training when full annotator distributions are available.
#Fine-tuning#Benchmarking#CIFAR-10H#SVHN
why featured
HKR-H and HKR-K pass: the paper offers a counterintuitive CIFAR-10H result under sparse annotation. HKR-R is weak because the impact stays within labeling/training methodology, so it fits the 60–71 all band.
editor take
Hard labels beat soft labels with few CIFAR-10H votes; multipass looks practical, but the OOD evidence is only descriptive.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment
PREFINE adapts DPO to trajectory-level preferences in continuous control, using a small set of low-cost and high-cost trajectories to fine-tune a reward-optimized RL policy while reducing constraint violations and catastrophic failures by over 60%.
#Fine-tuning#Alignment#Safety#PREFINE
why featured
HKR-K and HKR-R pass: the item has a concrete mechanism and a >60% result, and it touches safety alignment. HKR-H is weak, and the single arXiv RL-control scope keeps it in the 60–71 band.
editor take
PREFINE ports DPO to continuous-control trajectories and cuts violations over 60%; its counterfactual sampling may hide the real safety cost.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
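For readers wondering what "DPO on trajectories" could look like, here is a minimal sketch. It assumes the standard DPO loss is applied to the summed action log-probabilities of a low-cost (preferred) and a high-cost (rejected) trajectory. The function name, the `beta` value, and the inputs are illustrative assumptions; the feed does not disclose PREFINE's actual objective.

```python
import math

def trajectory_dpo_loss(logp_pi_low, logp_ref_low,
                        logp_pi_high, logp_ref_high, beta=0.1):
    """Standard DPO loss applied to whole-trajectory log-probabilities.

    Hypothetical helper: each argument is the summed action log-probability
    of one trajectory under the trainable policy (pi) or the frozen
    reference policy (ref). "low" is the low-cost, preferred trajectory.
    """
    margin = beta * ((logp_pi_low - logp_ref_low)
                     - (logp_pi_high - logp_ref_high))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss equals log 2 when the policy and reference agree, and falls as the policy shifts probability toward the low-cost trajectory.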
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning
Yang Liu and coauthors propose CP-MoE, a continual learning framework that uses a transient expert, consistency-preserving routing bias, and transient expert-guided regularization to reduce forgetting in LLM/VLM MoE models; the paper reports validation on SuperNI and VQA v2, but the arXiv abstract does not disclose exact scores.
#Fine-tuning#Multimodal#RAG#Yang Liu
why featured
HKR-K passes through a concrete mechanism and benchmarks; HKR-H is weak and HKR-R is limited by missing scores, code, and deployment evidence. This is a normal arXiv methods paper, so it stays in all.
editor take
CP-MoE claims SOTA on SuperNI and VQA v2, but no scores are disclosed; I don’t buy anti-forgetting from abstracts.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Learning-to-Defer with Expert-Conditional Advice
The paper proposes Learning-to-Defer with advice, models expert and advice as a composite action space, proves H-consistency and an excess-risk transfer bound, and reports gains over standard Learning-to-Defer across tabular, language, and multimodal tasks.
#RAG#Tools#Multimodal#Research release
why featured
HKR-K passes via a concrete L2D-with-advice mechanism and tests on 3 task types. HKR-H/R are weak, with no major lab, artifact, or production-replacement claim, so it stays in the lower research-release band.
editor take
Composite expert-advice actions beat standard deferral on 3 task types; the useful bit is proving split routing/advice heads inconsistent.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining
The authors introduce SpectralEarth-FM and SpectralEarth-MM, pairing HSI from three spaceborne sensors with Sentinel and Landsat data, then pretraining on about 2 million locations, 25 million georeferenced patches, and over 40 TB of data.
#Multimodal#Vision#Benchmarking#SpectralEarth-FM
why featured
HKR-K passes on concrete scale and multimodal pretraining setup. HKR-H and HKR-R are weak because the story is a niche Earth-observation foundation-model paper, so it stays in all.
editor take
SpectralEarth-MM hits 40TB and 25M patches; I buy HSI fusion, but PANGAEA-only SOTA leaves generalization under-proven.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning
SMoA partitions each layer into multiple aligned spectral blocks and adds a Hadamard-modulated low-rank branch to every diagonal block, reporting higher average performance than LoRA and LoRA-style baselines under a lower-budget setting across multiple tasks.
#Fine-tuning#SMoA#LoRA#Research release
why featured
HKR-K/R pass: SMoA adds spectral blocks plus Hadamard-modulated low-rank branches for cheaper PEFT. HKR-H fails and the feed gives no parameter or benchmark numbers, so this stays in all.
editor take
SMoA claims better average scores than LoRA via spectral blocks plus Hadamard branches; no models, tasks, or parameter counts disclosed.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Paper proposes proactive client selection method for fair and efficient federated learning
The paper proposes a proactive federated-learning client selection framework that optimizes fixed-size client sets before training, using mutual information from differentially private contingency tables and simulated annealing over a Potential Federation Loss objective; experiments on four benchmarks report faster convergence, better fairness, and higher accuracy than uniform sampling, including when adaptive aggregation or sampling baselines are used.
#Fine-tuning#Safety#Benchmarking#Research release
why featured
HKR-K passes with a concrete mechanism and 4 benchmarks. HKR-H/R are weak: this is niche federated-learning optimization, far from mainstream model or agent product news, so it stays in all.
editor take
DP contingency tables preselect clients and beat uniform sampling on 4 benchmarks; I worry PFL tuning eats the saved rounds.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Mechanistic Interpretability for Learning Assurance of a Vision-Based Landing System
The authors train a vision transformer on LARDv2 for runway keypoint regression, decompose per-patch embeddings with K-SVD sparse dictionary learning, and propose OOMS runtime monitoring to provide representation-level evidence requested by EASA learning-assurance guidance.
#Vision#Interpretability#Safety#EASA
why featured
HKR-K and HKR-R pass: the mechanism and certification target are concrete, but this is a narrow aviation-safety interpretability paper with high reading cost and no broad product or agent impact.
editor take
LARDv2 runway regression gets OOMS monitoring; K-SVD content/style splits are qualitative, still far from aviation-grade evidence.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Complementing Reinforcement Learning with SFT Through Logit Averaging in LLM Post-Training
The paper introduces logit averaging between a frozen SFT reference policy and a trainable policy inside GRPO, without KL regularization or a critic, and evaluates it on MATH, cn-k12, and MMLU against canonical KL-regularized GRPO.
#Reasoning#Fine-tuning#Alignment#Research release
why featured
HKR-K passes via a concrete GRPO post-training mechanism and MATH/cn-k12/MMLU comparisons. HKR-H and HKR-R are weak because this is a specialist paper, so it stays in all.
editor take
Logit-averaging frozen SFT with trainable GRPO matches or beats KL-GRPO on 3 benchmarks; small trick, very reproducible-looking.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
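The logit averaging in the item above can be sketched in a few lines. This is a sketch under stated assumptions, not the paper's implementation: the mixing `weight` (0.5 here) and the function name are hypothetical, and the feed does not say how the average is weighted or where it sits in the GRPO loop.

```python
import math

def averaged_policy_probs(ref_logits, policy_logits, weight=0.5):
    """Next-token distribution from averaged logits (hypothetical helper).

    ref_logits come from the frozen SFT reference, policy_logits from the
    trainable policy; `weight` is the assumed share given to the reference.
    """
    mixed = [weight * r + (1.0 - weight) * p
             for r, p in zip(ref_logits, policy_logits)]
    top = max(mixed)  # subtract the max for a stable softmax
    exps = [math.exp(m - top) for m in mixed]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the reference is mixed in at the logit level, the sampled distribution stays anchored to the SFT model without a separate KL penalty term, which is the trade the summary describes.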
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
NeighborDiv: Training-free Zero-shot Generalist Graph Anomaly Detection via Neighbor Diversity
NeighborDiv detects graph anomalies using the variance of inter-neighbor feature similarities, replacing node-to-neighbor consistency with a neighbor-to-neighbor diversity signal, and reports relative gains over the second-best baseline of 10.25% average AUC and 17.78% average AP under SDIT, plus 6.89% AUC and 9.58% AP under UMDT.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes with a training-free mechanism and two benchmark gains. HKR-H/R are weak because graph anomaly detection is narrow research, so this fits all rather than featured.
editor take
NeighborDiv reports +10.25% AUC and +17.78% AP under SDIT; I buy the training-free angle, but “zero volatility” needs dataset receipts.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
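A minimal sketch of the neighbor-to-neighbor signal described above, assuming cosine similarity between neighbor feature vectors and a plain population variance. The data layout, function names, and the zero score for nodes with too few neighbors are illustrative assumptions; the feed does not specify NeighborDiv's exact similarity measure or how scores are thresholded.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def neighbor_diversity(features, neighbors):
    """Variance of pairwise similarities among each node's neighbors.

    features:  dict mapping node id -> feature vector
    neighbors: dict mapping node id -> list of neighbor node ids
    """
    scores = {}
    for node, nbrs in neighbors.items():
        sims = [cosine(features[a], features[b])
                for a, b in combinations(nbrs, 2)]
        if len(sims) < 2:
            scores[node] = 0.0  # assumption: too few pairs to define a variance
            continue
        mean = sum(sims) / len(sims)
        scores[node] = sum((s - mean) ** 2 for s in sims) / len(sims)
    return scores
```

Note that the node's own features never enter the score; only its neighbors are compared with each other, and no training step is involved.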
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Why Ask One When You Can Ask k? Learning-to-Defer to the Top-k Experts
The paper introduces Top-k Learning-to-Defer, assigning each query to the k most cost-effective experts, and proposes a k-independent consistent surrogate loss that supports one-stage and two-stage settings.
#Reasoning#Benchmarking#Research release
why featured
This is a method-heavy ML paper: HKR-H comes from the top-k expert deferral setup, and HKR-K from the consistent surrogate-loss claim. No experiment numbers, code, or production use case are disclosed, so it stays in the 60–71 band.
editor take
Top-k L2D routes each query to k experts; experiment scale is undisclosed, so the k-independent loss is the claim to test.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
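The routing step in the item above, sending a query to the k most cost-effective experts, can be illustrated as follows. This sketch assumes a per-expert expected cost for the query is already available; the function name and inputs are hypothetical, and the paper's surrogate loss for learning those costs is not shown here.

```python
def top_k_experts(expected_costs, k):
    """Return indices of the k experts with the lowest expected cost.

    expected_costs: list where entry j is the assumed cost (error plus
    consultation cost) of deferring the current query to expert j.
    """
    ranked = sorted(range(len(expected_costs)),
                    key=lambda j: expected_costs[j])
    return ranked[:k]
```

With k=1 this reduces to ordinary single-expert deferral, which is why a loss that does not depend on k is the interesting part of the claim.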
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Dynamic TMoE: A Drift-Aware Dynamic Mixture of Experts Framework for Non-Stationary Time Series Forecasting
Dynamic TMoE detects distribution shifts with MMD and dynamically adds or prunes heterogeneous experts, while a temporal memory router uses recurrent states and an anomaly repository; experiments on nine benchmarks report 10.4% lower MSE and 7.8% lower MAE without test-time updates.
#Reasoning#Memory#Dynamic TMoE#arXiv
why featured
HKR-K passes via a concrete mechanism and benchmark numbers. HKR-H and HKR-R are weak because this is niche time-series forecasting research without product or agent impact, so it stays in the lower all band.
editor take
Dynamic TMoE cuts MSE 10.4% on 9 benchmarks. I buy drift-aware experts, but latency and expert-growth caps are undisclosed.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
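For context on the MMD shift test mentioned above, here is a minimal sketch of a biased squared-MMD estimate with an RBF kernel, comparing a reference window against a recent window. The kernel choice, `bandwidth`, `threshold`, and function names are assumptions for illustration; the feed does not disclose how Dynamic TMoE configures its detector.

```python
import math

def rbf(x, y, bandwidth):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    def mean_kernel(a, b):
        return sum(rbf(u, v, bandwidth) for u in a for v in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

def drift_detected(reference, recent, threshold=0.1, bandwidth=1.0):
    """Flag a shift when squared MMD exceeds an assumed fixed threshold."""
    return mmd_squared(reference, recent, bandwidth) > threshold
```

In a drift-aware mixture of experts, a positive flag is the kind of event that could trigger adding an expert, though the growth and pruning policy is left open here.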
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction
The paper trains an LLM-based survey framework on 1972–2021 General Social Surveys data to predict missing opinions; retrodiction performs strongly under cross-validation, while prediction of entirely unasked opinions remains modest.
#Embedding#Benchmarking#arXiv#General Social Surveys
why featured
HKR-H/K/R all pass, but this is a methods paper, not a product launch, major-lab move, or reusable tool release. It fits the 60–71 band for interesting but not featured research.
editor take
This uses 1972–2021 GSS to fill missing opinions; unasked-opinion prediction stays modest, so don’t sell retrodiction as simulation.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H1·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Adaptive Signal Resuscitation: Channel-wise Post-Pruning Repair for Sparse Vision Networks
The paper proposes ASR, a training-free channel-wise post-pruning repair method; on ResNet-50 at 90% sparsity, it recovers 55.6% CIFAR-10 top-1 accuracy, compared with 41.0% for layer-wise repair and 28.0% for BatchNorm-only recalibration.
#Vision#Inference-opt#ASR#ResNet-50
why featured
HKR-K lands with a concrete pruning-repair result, and HKR-R is modest through inference-cost relevance. HKR-H misses because the title is specialist; no product, open-source release, or major-lab signal.
editor take
ASR lifts ResNet-50 at 90% sparsity to 55.6% on CIFAR-10; training-free pruning repair needs less BatchNorm folklore.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Musical Attention Transformer: Music Generation Using a Music-Specific Attention Model
The paper proposes Musical Attention, a Transformer attention mechanism that adds metadata including bar numbers, keys, time signatures, and tempos, representing each note with five events plus three metadata elements to model correlations across eight features.
#Audio#Research release
why featured
HKR-K passes because the paper specifies a music-aware attention mechanism with bar, key, meter, tempo, and eight features. HKR-H and HKR-R are weak: no product angle, major lab, or practitioner-level tension.
editor take
Musical Attention uses 8 note features, but no metrics are disclosed; I don’t buy “significantly reduces repetition” without code and listening tests.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
FedCoE: Bridging Generalization and Personalization via Federated Coordinated Dual-level MoEs
FedCoE maintains multiple global experts and a shared gating network for federated learning, reaching 78.00% average global accuracy, 89.32% personalized accuracy, and 77.27% cold-start accuracy without local fine-tuning.
#Fine-tuning#Inference-opt#FedCoE#Research release
why featured
HKR-K passes because the paper gives a concrete mechanism and benchmark numbers. HKR-H/R are weak: this is a niche federated-learning method with no product rollout, open-source artifact, or broad industry trigger.
editor take
FedCoE reports 78.00% global and 89.32% personalized accuracy; federated MoE looks sane, but datasets and baselines aren't disclosed here.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
NaP-Control: Navigating Diffusion Prior for Versatile and Fast Character Control
NaP-Control uses reinforcement learning to manipulate the latent noise of a task-agnostic diffusion policy prior, replacing gradient-based test-time guidance for physics-based whole-body character control; the arXiv abstract says experiments show higher success rates and faster inference across diverse tasks, but the RSS snippet does not disclose exact metrics or benchmark settings.
#Robotics#Inference-opt#Research release
why featured
HKR-K passes on the latent-noise RL mechanism, but success-rate and speed gains lack numbers. The character-control angle is narrow, so it lands in the low 60s as a standard research release.
editor take
NaP-Control predicts diffusion noise with RL and skips test-time guidance; no success or latency numbers, so I don’t buy “fast” yet.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
FBOS-RL: Feedback-Driven Bi-Objective Synergistic Reinforcement Learning
Xikai Zhang and eight coauthors propose FBOS-RL, a feedback-driven reinforcement learning framework that uses environment feedback for exploration enhancement and combines two objectives, EPA and ECC, to improve training efficiency over GRPO under the same number of rollouts.
#Reasoning#Alignment#Xikai Zhang#Yongzhi Li
why featured
HKR-K passes with a concrete mechanism and GRPO comparison condition. HKR-H is weak and HKR-R lacks disclosed effect size, code, or model impact, so this sits in the lower all band.
editor take
FBOS-RL adds EPA and ECC to GRPO sampling, but exact gains aren’t disclosed; I don’t buy the flywheel claim without same-rollout replication.
HKR breakdown
hook knowledge resonance
open source
61
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Differentially Private Model Merging
The paper proposes two post-processing methods, random selection and linear combination, to generate models for any target differential privacy parameter from existing models trained on the same dataset with different privacy-utility tradeoffs, without additional training.
#Fine-tuning#Safety#Research release
why featured
HKR-K passes: the paper names two post-processing mechanisms for DP model merging without extra training. HKR-H and HKR-R fail because this is a dry single arXiv item with unproven practitioner impact.
editor take
The paper merges existing DP models via random selection or linear combinations; useful trick, but the cost hides in pretraining multiple privacy tiers.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
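The two post-processing operations named above are mechanically simple, as this sketch shows for two models stored as flat parameter lists. It covers only the mechanics: how `prob_a` or `weight_a` maps to the privacy parameter of the merged model is the paper's contribution and is not reproduced here. All function and parameter names are hypothetical.

```python
import random

def merge_by_random_selection(params_a, params_b, prob_a, rng=None):
    """Release model A with probability prob_a, otherwise model B."""
    rng = rng or random.Random()
    return list(params_a) if rng.random() < prob_a else list(params_b)

def merge_by_linear_combination(params_a, params_b, weight_a):
    """Convex combination of two parameter vectors of equal length."""
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(params_a, params_b)]
```

Neither operation touches the training data again, which is why both count as post-processing of already-private models.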
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Secure, Verifiable, and Scalable Multi-Client Data Sharing via Consensus-Based Privacy-Preserving Data Distribution
The paper proposes the CPPDD framework for secure multi-client data aggregation, using per-client affine masking and sequential consensus locking, and reports linear scaling to N=500 on MNIST-derived vectors with sub-millisecond per-client computation.
#Safety#CPPDD#Research release
why featured
HKR-K and HKR-R pass via concrete protocol details and privacy relevance, but HKR-H fails. A single arXiv paper on privacy-preserving aggregation lacks product pull, so it stays below featured.
editor take
CPPDD reports N=500 MNIST vectors and sub-ms clients; I don’t buy the N-1 collusion claim without disclosed baselines.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R1
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
The General Theory of Localization Methods
The paper proposes the localization method, a machine learning framework built on localization kernels and local means, and relates it to self-attention, kernel methods, MeanShift, Hopfield networks, LLE, denoising autoencoders, and Transformer construction via hierarchical local models.
#Reasoning#Research release
why featured
HKR-H and HKR-K pass, but this is a theory-heavy arXiv paper with only a unifying-framework claim disclosed; no experiments, code, or production payoff are given, so it stays in all.
editor take
This ties 7 listed method families to localization kernels; no experiment numbers, so I’d file it as theory synthesis, not a new architecture.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H1·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models
The paper proposes Linear-DPO for text-to-image preference optimization, using a unified reverse-time SDE objective for diffusion and flow-matching models and testing it on SD1.5, SDXL, and SD3-Medium against existing baselines.
#Alignment#Multimodal#Research release
why featured
HKR-K passes: new method, unifying mechanism, and tests on SD1.5, SDXL, and SD3-Medium. HKR-H/R are weak, and the item is an arXiv abstract-level paper, so it stays in all.
editor take
Linear-DPO tests SD1.5, SDXL, and SD3-Medium; the sharp claim is that sigmoid DPO mismatches image regression.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Study Compares Automated ICD Classification for Psychiatric Diagnoses Across NLP Approaches
The study evaluates automated ICD coding on 145,513 Spanish psychiatric descriptions, comparing BoW, TF-IDF, e5_large, BioLORD, and Llama-3-8B; end-to-end fine-tuned e5_large achieves the top micro-F1 score of 0.866 and outperforms classical text representations.
#Embedding#Fine-tuning#Benchmarking#e5_large
why featured
HKR-K passes with dataset size and micro-F1. HKR-H is weak because the angle reads like a routine medical NLP paper; HKR-R is limited without a product, open model, or broad industry deployment hook.
editor take
e5_large hits 0.866 micro-F1 on 145,513 Spanish psychiatry notes; Llama-3-8B losing here is a size-scaling warning.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Closed-Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training
The paper proposes AutoScale, a closed-loop data engine that uses Graph-RAE, Cluster-GA, and cluster-guided vector retrieval to optimize real-synthetic driving data mixtures, and reports higher NavSim performance than vanilla co-training and cross-domain baselines with fewer synthetic samples under constrained budgets.
#Robotics#Benchmarking#AutoScale#NavSim
why featured
HKR-K passes via the closed-loop data-mixture mechanism and NavSim condition. HKR-H/R are weak, and the post gives no exact lift or sample-saving rate, keeping it a niche research item.
editor take
AutoScale beats baselines on NavSim with fewer synthetic samples. No gains disclosed, so don’t crown a driving-data flywheel yet.
HKR breakdown
hook knowledge resonance
open source
57
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Sequential Data Augmentation for Generative Recommendation
The paper introduces GenPAS for generative recommendation, modeling data augmentation as stochastic sampling over input-target pairs with 3 bias-controlled steps, and evaluates it against existing strategies on benchmark and industrial datasets.
#Fine-tuning#Benchmarking#Snap Research#Research release
why featured
HKR-K passes via a named mechanism and test settings; HKR-H/R fail because the angle is narrow recommender-system research. No hard exclusion, but general AI-practitioner value stays in the 40–59 band.
editor take
GenPAS frames recsys augmentation as 3-step sampling. The useful part is treating sample construction as a first-order training knob.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Learning Incentive Structures for Cooperative Resilience in Multi-Agent Systems under Social Dilemmas
The paper proposes a multi-agent reinforcement learning framework that ranks trajectories with a resilience metric and infers reward functions, then evaluates three incentive structures in disrupted resource-sharing environments under social dilemmas.
#Agent#Reasoning#Research release
why featured
HKR-K passes via a concrete MARL mechanism and 3 incentive structures. HKR-H/R are weak: this is specialized multi-agent RL research, not a product or practice-shaping release.
editor take
The paper tests 3 incentive schemes; hybrid rewards reduce collapse, but RSS omits environment scale and baseline strength.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
A Mechanistic Study of Tabular Foundation Models
The paper analyzes tabular foundation models across classification and regression tasks, finding that different architectures converge in accuracy while using distinct similarity-based readouts; the authors validate these mechanisms with causal interventions, trace permutation invariances to removable positional parameters, and reproduce predicted failures using engineered perturbations plus hub and rank attacks.
#Interpretability#Benchmarking#arXiv#Research release
why featured
HKR-K passes: the paper reports tabular foundation-model readout mechanisms and reproducible intervention tests. HKR-H/R are weak; this is useful method signal, not a broad industry story.
editor take
Tabular FMs converge on accuracy but split in readout mechanics; causal interventions and hub/rank attacks expose failures leaderboards miss.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry
Geometry-Lite evaluates prompt-level safety probes across nine instruction-tuned backbones from 1.2B to 70B and seven safety benchmarks, mapping each layer’s final prompt-token representation to signed margins from centroid, local-neighborhood, and supervised linear-boundary readouts; the paper finds persistent boundary-position geometry drives pooled AUROC, while finite-difference drift adds only small recall-oriented corrections under shifted low-FPR thresholds.
#Safety#Interpretability#Benchmarking#Woo Seob Sim
why featured
HKR-K passes via concrete test setup and mechanism claims; HKR-H/R are weak because the angle is niche and highly technical. No hard exclusion applied, but accessibility keeps it in the lower research-signal band.
editor take
Geometry-Lite tests 9 models on 7 safety sets; I buy the punchline: safety signal looks positional, not layer-drift.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
TreeText-CTS: Compact, Source-Traceable Tree-Path Evidence for Irregular Clinical Time-Series Prediction
TreeText-CTS converts irregular EHR trajectories into deterministic tree-path evidence units and reports the best AUROC and AUPRC among evaluated text-based EHR time-series interfaces across three clinical prediction tasks, improving AUPRC by 6.0 to 9.7 absolute percentage points over the strongest prior text-based interface.
#Interpretability#Benchmarking#TreeText-CTS#PhysioNet
why featured
HKR-K passes with a concrete mechanism and AUPRC gains, but HKR-H and HKR-R are weak. The topic is niche clinical time-series modeling, so it stays in all rather than featured.
editor take
TreeText-CTS adds 6.0–9.7 AUPRC points on 3 EHR tasks; I trust tree-path evidence over free-form clinical summaries.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Time-Prompt: Integrated Heterogeneous Prompts for LLMs in Time Series Forecasting
The paper introduces Time-Prompt, a framework that combines learnable soft prompts, textual hard prompts, semantic-space embeddings, and cross-modal alignment for LLM-based time-series forecasting, with evaluations on 6 public datasets and 3 carbon-emission datasets.
#Fine-tuning#Multimodal#Embedding#Time-Prompt
why featured
HKR-K passes via concrete prompt components and evaluation on 6 public plus 3 carbon-emission datasets. HKR-H/R are weak: this is a routine arXiv methods paper with no deployment or production-replacement claim.
editor take
Time-Prompt tests 9 datasets; without SOTA deltas in the abstract, I file it as prompt-engineering incrementalism for LLM forecasting.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression
The paper introduces Distribution-Aware Reward, an on-policy RL objective that treats multiple decoded samples as an empirical predictive distribution and scores it with CRPS, reporting a 6-point Spearman gain on KBSS and competitive MoleculeNet results using only SMILES strings.
#Reasoning#Fine-tuning#Benchmarking#arXiv
why featured
HKR-K passes via a concrete mechanism and KBSS number; HKR-H/R are weak because the angle is academic and narrow. No hard exclusion, but technical accessibility keeps it in the 40–59 low-value research band.
editor take
Distribution-Aware Reward trains multi-sample distributions with CRPS and gains 6 Spearman points on KBSS; I like the move, but MoleculeNet splits matter.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
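A minimal sketch of the scoring idea in the Distribution-Aware Reward summary above, assuming the standard sample-based CRPS estimator (mean absolute error to the target minus half the mean pairwise spread of the samples). The function names `empirical_crps` and `crps_reward` are hypothetical, and negating CRPS to get a reward is an assumption; the paper's exact objective may differ.

```python
def empirical_crps(samples, target):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one decoded sample")
    # How far the samples sit from the true value on average.
    accuracy_term = sum(abs(x - target) for x in samples) / n
    # How spread out the samples are among themselves.
    spread_term = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy_term - 0.5 * spread_term


def crps_reward(samples, target):
    """Assumed reward shape: higher is better, so negate the CRPS."""
    return -empirical_crps(samples, target)


if __name__ == "__main__":
    # Illustrative numbers only: decoded numeric predictions vs. a target.
    print(empirical_crps([1.0, 3.0], 2.0))
    print(crps_reward([2.0], 2.0))
```

With a single sample the estimate reduces to plain absolute error, which is why scoring several decoded samples together carries extra information about spread.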
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
IMPACT: Influence Modeling for Open-Set Time Series Anomaly Detection
IMPACT uses an influence function to estimate each training sample’s effect, then uses the influence scores to create realistic unseen time-series anomalies and to repurpose high-influence samples for decontamination; the abstract reports tests across multiple OSAD settings and contamination rates, but the RSS snippet does not disclose dataset counts, metric values, or baseline names.
#Benchmarking#Research release#Open source#Benchmark
why featured
HKR-K passes on a concrete mechanism: influence-function scores generate unseen anomalies, with OSAD settings, contamination rates, and code. HKR-H/R fail because the title is academic and the audience impact is narrow; no hard-exclusion rule triggered.
editor take
IMPACT generates unseen anomalies via influence scores; RSS omits datasets and metrics, so treat “SOTA” as unverified.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
A Rigorous, Tractable Measure of Model Complexity
The paper introduces a model-complexity measure based on gradient similarities across inputs, applies it to parametric models and kernel-based non-parametric models, and proves it generalizes mechanisms such as polynomial degree, Matérn length scale, kNN neighbor count, decision-tree split count, and random-forest tree count.
#Interpretability#Benchmarking#Research release
why featured
HKR-K passes because the paper gives a testable gradient-similarity complexity measure across kernels, kNN, trees, and forests. HKR-H and HKR-R are weak, so this stays in all below featured.
editor take
Gradient-similarity complexity spans five classic mechanisms; I want the LLM-scale run, not another elegant theorem zoo.
HKR breakdown
hook knowledge resonance
open source
54
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Towards Understanding Self-Pretraining for Sequence Classification
The paper replicates and ablates Amos et al. 2024 on self-pretraining, finding that the bottleneck is learning useful query-key attention patterns from random initialization under label supervision alone, while masked reconstruction detects attention-score directions that supervised labels miss.
#Reasoning#Benchmarking#Amos et al.#Research release
why featured
HKR-K passes for a concrete SPT replication/ablation claim, but HKR-H and HKR-R are weak. The topic is narrow training theory with no product or engineering hook, so it stays in the low-value band.
editor take
SPT boosts LRA classification by learning query-key patterns from scratch; labels are blind where masked reconstruction sees signal.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Why Aggregate Accuracy Is Inadequate for Evaluating Fairness in Law Enforcement Facial Recognition Systems
The paper analyzes facial recognition systems in law enforcement and security, arguing that aggregate accuracy can hide subgroup FPR and FNR disparities; the RSS snippet does not disclose a specific dataset, benchmark, or numerical error rates.
#Vision#Safety#Benchmarking#Research release
why featured
HKR-R passes because law-enforcement face recognition carries safety and compliance stakes. HKR-H/K are weak: no dataset, error rates, or reproducible setup are disclosed, so this stays in the low-value research-summary band.
editor take
The paper flags subgroup FPR/FNR gaps but gives no dataset or error rates; correct claim, thin evidence.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K0·R1
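To make the aggregate-accuracy argument above concrete, here is a small sketch that computes accuracy, FPR, and FNR per subgroup. The records are hypothetical toy data invented for illustration, not figures from the paper, and the helper name `subgroup_rates` is likewise an assumption.

```python
from collections import defaultdict


def subgroup_rates(records):
    """records: iterable of (group, y_true, y_pred) with binary labels (1 = match)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        # "t"/"f" = prediction correct or not; "p"/"n" = predicted class.
        key = ("t" if y_true == y_pred else "f") + ("p" if y_pred == 1 else "n")
        counts[group][key] += 1
    out = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        out[group] = {
            "accuracy": (c["tp"] + c["tn"]) / (positives + negatives),
            "fpr": c["fp"] / negatives if negatives else float("nan"),
            "fnr": c["fn"] / positives if positives else float("nan"),
        }
    return out


# Hypothetical toy data: a large group A and a small group B.
toy = (
    [("A", 1, 1)] * 45 + [("A", 0, 0)] * 45      # group A: all correct
    + [("B", 1, 1)] * 5 + [("B", 0, 0)] * 2      # group B: some correct
    + [("B", 0, 1)] * 3                          # group B: false matches
)
overall = subgroup_rates([("all", t, p) for _, t, p in toy])["all"]
per_group = subgroup_rates(toy)

if __name__ == "__main__":
    print(overall)
    print(per_group)
```

In this toy set the pooled accuracy stays high because group A dominates the count, while group B's false positive rate is far worse; that gap is exactly what a single aggregate number hides.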
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
A Dialogue between Causal and Traditional Representation Learning: Toward Mutual Benefits in a Unified Formulation
The paper proposes a unified formulation that splits representation learning into a task component and a constraint component, then tests how different tasks interact with causal constraints on CausalVerse.
#Reasoning#Benchmarking#CausalVerse#Research release
why featured
HKR-K passes via the unified formulation and CausalVerse test setup. HKR-H/R fail: no result numbers, artifact, or practical stake, so this stays a low-value research item.
editor take
CausalVerse shows causal constraints are task-dependent. Scores aren't disclosed; without a reproducible task-constraint matrix, this risks taxonomy cosplay.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
CIG: Exploration via Conditional Information Gain
The paper introduces CIG, an intrinsic reward that approximates trajectory-level information gain with a log-determinant objective over an ensemble disagreement kernel, and evaluates it against prior exploration methods on 12 MiniGrid and OGBench tasks under clean and stochastic-distractor settings.
#Reasoning#CIG#MiniGrid#OGBench
why featured
HKR-K passes through a concrete intrinsic-reward mechanism and 12-task evaluation. HKR-H/R miss: the title is a standard RL paper and the audience impact is narrow, so this stays in the 40–59 band.
editor take
CIG tests log-det ensemble disagreement on 12 tasks; I buy the idea, but short-rollout model-based setup limits extrapolation.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
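A rough sketch of the log-determinant idea named in the CIG summary above. Everything here is an assumption for illustration: the kernel is built from scalar ensemble predictions, the gain is the Gaussian-style quantity 0.5 * logdet(I + K / noise), and the function names are hypothetical. It does not model the conditioning step or the paper's exact objective.

```python
import math


def disagreement_kernel(ensemble_preds):
    """ensemble_preds[k][i]: scalar prediction of ensemble member k at state i."""
    k = len(ensemble_preds)
    n = len(ensemble_preds[0])
    means = [sum(member[i] for member in ensemble_preds) / k for i in range(n)]
    devs = [[member[i] - means[i] for i in range(n)] for member in ensemble_preds]
    # K[i][j] = average product of deviations from the ensemble mean.
    return [[sum(d[i] * d[j] for d in devs) / k for j in range(n)] for i in range(n)]


def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][t] * lower[j][t] for t in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(s)
                log_det += 2.0 * math.log(lower[i][j])
            else:
                lower[i][j] = s / lower[j][j]
    return log_det


def trajectory_information_gain(ensemble_preds, noise=1.0):
    """Assumed intrinsic reward: 0.5 * logdet(I + K / noise) over visited states."""
    kernel = disagreement_kernel(ensemble_preds)
    n = len(kernel)
    shifted = [
        [(1.0 if i == j else 0.0) + kernel[i][j] / noise for j in range(n)]
        for i in range(n)
    ]
    return 0.5 * log_det_spd(shifted)
```

Scoring the whole trajectory jointly means a revisited state adds less than a fresh one: two identical high-disagreement states score below twice the single-state gain, which a per-state disagreement bonus would not capture.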
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Optimized Federated Knowledge Distillation with Distributed Neural Architecture Search
FedKDNAS lets each client select a lightweight architecture under accuracy-resource constraints, and evaluations on six datasets against six FL baselines report up to 15% higher accuracy, about 28% lower client CPU usage, and up to 44x lower communication overhead under non-IID conditions.
#Fine-tuning#Inference-opt#Research release#Benchmark
why featured
HKR-K passes with concrete benchmark scale and resource gains. HKR-H/R are weak because this is niche federated-learning research with no product rollout or broad practitioner controversy.
editor take
FedKDNAS beats 6 FL baselines on 6 datasets; the up-to-15% accuracy and up-to-44x comms gains hinge on per-client architectures.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Divide and Contrast: Learning Robust Temporal Features without Augmentation
Di-COT trains time-series representations by randomly partitioning each window into a small number of overlapping sub-blocks per iteration, uses a contrastive loss whose cost depends on batch size and sub-block count rather than sequence length, and reports tests on six real-world datasets plus UCR and UEA benchmarks.
#Embedding#Benchmarking#Di-COT#UCR
why featured
HKR-K passes via a concrete training mechanism and benchmark scope. HKR-H/R are weak: this is a niche time-series representation paper with no product release, deployment claim, or reported performance number.
editor take
Di-COT removes sequence length from loss cost; six real datasets plus UCR/UEA is solid, but training-time gains lack numbers here.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent
The paper introduces SMFP, a one-step generative policy that maps Gaussian noise to actions via a MeanFlow transform, trains it with off-policy mirror descent and an entropy surrogate, and reports better results than Gaussian and generative baselines across seven MuJoCo benchmarks.
#Agent#Inference-opt#Benchmarking#MuJoCo
why featured
Triggers hard-exclusion-technical-accessibility: MeanFlow, entropic mirror descent, and MuJoCo need RL/optimization context. HKR-K passes on the 7-benchmark claim; HKR-H/R fail, so score is capped.
editor take
SMFP beats baselines on 7 MuJoCo tasks; one-step sampling is the hook, but I’d wait for code and ablations.
HKR breakdown
hook knowledge resonance
open source
50
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models
The paper proposes CA-LIG, a framework that computes layer-wise Integrated Gradients inside each Transformer block and fuses them with class-specific attention gradients, with evaluations across BERT, XLM-R, AfroLM, and a Masked Autoencoder vision Transformer.
#Interpretability#Vision#Benchmarking#BERT
why featured
HKR-K passes because the article names a concrete CA-LIG mechanism and model coverage. HKR-H/R are weak, and no metrics or production impact are disclosed, so it stays in the low-value but non-noise band.
editor take
CA-LIG spans 4 Transformer families, but the snippet gives no metrics; “clearer explanations” needs code and faithfulness numbers.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Pseudo-Siamese Network for Planning in Target-Oriented Proactive Dialogues
The paper proposes FF-BPSN for target-oriented proactive dialogue path planning, using two transformer-based decoders for forward and backward planning, then evaluating it on DuRecDial and DuRecDial 2.0.
#Agent#Reasoning#arXiv#DuRecDial
why featured
HKR-K passes on a concrete planning mechanism and datasets, but HKR-H/R fail. This is narrow dialogue-planning research with no product tie-in, major-lab signal, or practitioner-facing experiment, so it sits in the 40–59 band.
editor take
FF-BPSN uses dual decoders for bidirectional planning; DuRecDial-only evals confine the SOTA claim to dialogue routing, not agents.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Towards Resilient and Autonomous Networks: A BlueSky Vision on AI-Native 6G
The paper proposes an AI-native 6G vision that uses one foundation model and collaborative multi-agent systems to unify network management; the abstract does not disclose experiments, datasets, or a deployment timeline.
#Agent#Multimodal#Research release
why featured
HKR-K passes on the proposed one-foundation-model plus multi-agent architecture; HKR-H/R are weak, and the body discloses no experiments, dataset, or deployment timeline.
editor take
The vision has one foundation model managing 6G networks; no experiments, datasets, or timeline are disclosed, so this reads like roadmap staking.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for VLMs and Autonomous Agents
WildRoadBench evaluates VLM grounding and LLM-driven agents on the same professionally annotated UAV road-damage corpus, using per-class AP_50 under two protocols. The abstract says closed-source frontier models lead but still leave more than half of the metric's range unattained; the post does not disclose dataset size, model names, or the fixed interaction-budget value.
#Agent#Vision#Benchmarking#WildRoadBench
why featured
HKR-K passes via the two-track AP_50 setup, but HKR-H/R are weak. The abstract omits scale, model list, and interaction budget, so this stays in the 40–59 low-value band.
editor take
WildRoadBench tests VLMs and agents on identical UAV images; dataset size, model names, and budget stay undisclosed, so agent failures sting most.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Tunable MAGMAX: Preference-Aware Model Merging for Continual Learning
The paper proposes Tunable MAGMAX, a continual-learning model-merging framework that uses a preference vector to control how many elements are selected from each task vector and automatically constructs that vector from small amounts of target-environment data plus training-task datasets.
#Fine-tuning#Inference-opt#Benchmarking#MAGMAX
why featured
HKR-K passes for a concrete mechanism, but the post lacks experiment scale, benchmark gains, or reproducible conditions. The angle is too niche for HKR-H/R, so it stays in all.
editor take
Tunable MAGMAX controls per-task vector element counts with one preference vector. Benchmarks and sample sizes are undisclosed; deployment claims feel early.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
STM3: Mixture of Multiscale Mamba for Long-Term Spatio-Temporal Time-Series Prediction
STM3 combines Multiscale Mamba, a Disentangled MoE framework, and an adaptive graph causal network for long-term spatio-temporal prediction, reports state-of-the-art results on 10 real-world benchmarks, and beats the second-best model on PEMSD8 by 7.1% MAE, 8.5% RMSE, and 15.9% MAPE.
#Benchmarking#STM3#Mamba#Research release
why featured
HKR-K passes via concrete mechanisms and PEMSD8 gains; HKR-H/R fail because this is a narrow spatio-temporal forecasting paper with little practitioner resonance.
editor take
STM3 claims SOTA on 10 benchmarks and -7.1% MAE on PEMSD8; long-sequence compute cost is undisclosed.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Lowering the Barrier to IREX Participation: Open-Source Algorithms, Toolkit, and Benchmarking for Iris Recognition
The paper releases 2 open-source iris recognition algorithms with IREX-compliant C++ implementations, evaluates 4 methods under IREX X protocols, and reports tests across 8 academic iris benchmarks.
#Vision#Benchmarking#IREX#arXiv
why featured
HKR-K passes on concrete artifacts and benchmark counts; HKR-H/R are weak because iris-recognition evaluation is niche and far from mainstream AI product or model competition.
editor take
The paper open-sources 2 iris algorithms and an IREX C++ template; CRYPTS hit 1:N latency, so the win is lower entry friction.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
MoRe: Modular Representations for Continual Learning on Sequential Data
MoRe decomposes knowledge into two module levels, fundamental and specific, and tests the framework on synthetic benchmarks and real-world LLM activations; the abstract reports better plasticity-stability trade-offs but does not disclose metric values.
#Memory#Interpretability#MoRe#Research release
why featured
HKR-K passes via a modular representation mechanism and LLM-activation tests. HKR-H/R are weak, and metrics are not disclosed, so this stays in the low-value research band.
editor take
MoRe splits representations into fundamental/specific modules, but gives no metrics; using LLM activations beats another parameter-tuning CL recipe.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Augmented Analytics and Decision Quality: The Role of Trust among Non-Technical BI Users
The paper surveys 250 business professionals and uses PLS-SEM to analyze how augmented analytics, trust, BI adoption, and decision quality relate among non-technical BI users.
#Research release
why featured
HKR-K passes via the 250-person survey and PLS-SEM method. HKR-H/R are weak: this is academic BI-adoption work with no product mechanism, model capability, or industry shock.
editor take
The paper surveys 250 BI users; self-reports plus PLS-SEM don't prove decision quality, and trust may just mean compliance.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Lightweight Low-Light Image Enhancement via Distribution-Normalizing Preprocessing and Depthwise U-Net
The paper presents a two-stage low-light image enhancement framework using frozen algorithmic preprocessing and a compact depthwise-separable U-Net, reporting 3rd place in the CVPR 2026 NTIRE Efficient Low-Light Image Enhancement Challenge; the abstract says it includes extended benchmarks and ablations but does not disclose parameter counts in the snippet.
#Vision#Inference-opt#Benchmarking#CVPR
why featured
HKR-K passes via the named method and NTIRE ranking; HKR-H/R fail because the angle is technical and far from model, agent, or product stakes. No hard exclusion, but it sits in the low-value research band.
editor take
This took 3rd at NTIRE 2026; parameter counts aren't disclosed, so the lightweight claim stays unproven.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Graph Transductive Sharpening: Leveraging Unlabeled Predictions in Node Classification
Brown Zaz and four coauthors propose Transductive Sharpening, a loss-level change that minimizes prediction entropy on unlabeled nodes while counterbalancing it on labeled nodes, and the 19-page arXiv paper reports node-classification gains across benchmarks with 4 figures and 17 tables.
#Benchmarking#Brown Zaz#Mar Gonzàlez I Català#Moshe Eliasof
why featured
HKR-K passes for a concrete mechanism and reported experiments. HKR-H/R fail because the story is narrow graph-learning research with no product, open-source tool, or industry impact hook, so it sits in the low-value band.
editor take
Transductive Sharpening changes only the loss, with 17 tables; I buy the angle, pending low-label-rate robustness.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
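One possible reading of the loss-level change described in the Transductive Sharpening summary above, sketched under loud assumptions: cross-entropy on labeled nodes, plus a weighted term that lowers mean prediction entropy on unlabeled nodes and offsets it with the mean entropy on labeled nodes. The exact form of the counterbalancing term, the `weight` parameter, and the function names are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one node's class logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def sharpening_loss(logits, labels, weight=0.5):
    """logits: per-node logit lists; labels: class index, or None if unlabeled."""
    cross_entropy, labeled_entropy, unlabeled_entropy = [], [], []
    for node_logits, label in zip(logits, labels):
        probs = softmax(node_logits)
        if label is None:
            unlabeled_entropy.append(entropy(probs))
        else:
            # Floor avoids log(0) when a class probability underflows.
            cross_entropy.append(-math.log(max(probs[label], 1e-12)))
            labeled_entropy.append(entropy(probs))
    # Assumed counterbalance: subtract the labeled-node entropy.
    sharpening = _mean(unlabeled_entropy) - _mean(labeled_entropy)
    return _mean(cross_entropy) + weight * sharpening
```

Because only the loss changes, a term like this could in principle sit on top of any node classifier that outputs logits for every node in the graph.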
04:00
19d ago
arXiv · cs.LG· atomEN04:00 · 05·21
Ensemble RL through Classifier Models: Enhancing Risk-Return Trade-offs in Trading Strategies
The paper evaluates ensemble RL trading strategies combining A2C, PPO, and SAC with SVM, decision trees, and logistic regression, comparing them against base RL models on cumulative returns, Sharpe ratio, Calmar ratio, and maximum drawdown; the RSS snippet does not disclose the dataset, backtest period, or exact return figures.
#Agent#Reasoning#Benchmarking#Research release
why featured
HKR-K passes on method detail, but the post lacks dataset, return numbers, and reproducible conditions. The quant-finance angle sits far from core AI product or model-industry concerns, so it stays in the low-value band.
editor take
A2C/PPO/SAC get three classifiers; no dataset or returns disclosed, so don’t buy “consistently outperform” yet.
HKR breakdown
hook knowledge resonance
open source
43
SCORE
H0·K1·R0
03:44
19d ago
HuggingFace Papers (takara mirror)· rssEN03:44 · 05·21
Bounding-Box Trajectories Matter for Video Anomaly Detection
TrajVAD models multi-class bounding-box trajectories with normalizing flows; TrajVAD-T reaches 87.7% AP on ShanghaiTech without pose estimation, while TrajVAD-P adds pose features and reports 88.6% AUROC and 90.9% AP on ShanghaiTech.
#Vision#Benchmarking#TrajVAD#ShanghaiTech
why featured
HKR-K passes on a concrete method and benchmark numbers. HKR-H/R are weak because this is niche video-anomaly research with no product rollout, open-source artifact, or broad practitioner debate hook.
editor take
TrajVAD-P reports 90.9% AP on ShanghaiTech; box trajectories beating pose-heavy baselines is a useful slap at feature bloat.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
01:17
19d ago
HuggingFace Papers (takara mirror)· rssEN01:17 · 05·21
Learning Emergent Modular Representations in Multi-modality Medical Vision Foundation Models
DEX trains multi-modality medical vision foundation models with expert pools, image-wise activation, and a group EMA director; its Medical Vision Universe benchmark contains over 4 million images across 10 modalities, and evaluations cover 26 downstream tasks.
#Multimodal#Vision#Benchmarking#DEX
why featured
HKR-K passes: the paper gives DEX mechanics, 4M images, 10 modalities, and 26 evaluated tasks. HKR-H/R are weak, so this is an informative but narrow research release with no hard exclusion.
editor take
DEX trains on 4M medical images across 10 modalities; I buy expert pools, but 26-task gains lack numbers here.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
