ax@ax-radar:~/papers $ grep -E 'arxiv|paper' sources/tags

papers · 2026-06-03

215 papers · updated 3m ago
2026-06-03 · Wed
17:27
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 17:27 · 06·03
Self-Evaluation Is Already There: Eliciting Latent Judge Calibration in Base LLMs with Minimal Data
The paper introduces SEE, a method that uses 160 unique examples to elicit a base model’s ability to predict external judges’ multi-attribute scores, improving held-out calibration across three benchmarks while preserving answer quality.
#Alignment #Benchmarking #Fine-tuning #Research release
why featured
HKR-H/K/R all pass: latent self-evaluation is a neat hook, and the summary gives 160 samples plus 3 benchmarks. As a single calibration paper with no model names, benchmark names, or code status disclosed, it stays below featured.
editor take
SEE improves calibration on three benchmarks with 160 examples; I buy elicitation, but cross-judge stability is the hard signal.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
15:58
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 15:58 · 06·03
MetaPoint: Precise Spatial Control in Agentic Visual Generation
MetaPoint represents a continuous 2D coordinate as one special token and a bounding box as two tokens, while using existing positional encodings without new architecture or custom attention masks.
#Agent #Vision #Multimodal #MetaPoint
why featured
HKR-H/K/R all pass, but the post gives only the title and mechanism summary, with no benchmarks, code, or reproduction setup. Useful research signal, below featured threshold.
editor take
MetaPoint encodes a 2D point in 1 token; I buy the no-architecture-change part, but pixel-level claims lack benchmarks.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
14:56
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:56 · 06·03
SAID: Accelerating Diffusion-Based Language Models via Scaffold-Aware Iterative Decoding
SAID accelerates diffusion language model inference on LLaDA-8B and LLaDA 1.5 by spending denoising steps on scaffold tokens first and assigning extra steps only to low-confidence tokens, reaching a maximum 9.1x speedup across math, coding, and knowledge benchmarks.
#Inference-opt #Reasoning #Code #TH-AI-Lab-PKU
why featured
HKR-H/K/R all pass: 9.1x, scaffold tokens, and CHLG are concrete, and inference cost matters. The score stays in all because this is a single niche DLLM paper, not a broad product or lab release.
editor take
SAID hits 9.1x on LLaDA-8B/1.5; diffusion LMs need this inference bill fixed before AR displacement talk.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
14:52
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:52 · 06·03
Plan, Watch, Recover: A Benchmark and Architectures for Proactive Procedural Assistance
The paper releases EgoProactive and extends five existing datasets into Pro²Bench, using a unified schema to evaluate proactive guidance and recovery when users deviate from the expected procedure.
#Agent #Multimodal #Vision #Llama
why featured
HKR-H/K pass: the paper frames off-track procedural recovery as a benchmark and names EgoProactive, Pro²Bench, and 5 source datasets. HKR-R is weak, and the feed does not disclose results or code, so this stays below featured.
editor take
Pro²Bench extends 5 existing datasets; sample counts aren’t disclosed, so I’d audit OOP labels and recovery injection first.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
14:19
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:19 · 06·03
Scene-Centric Unsupervised Video Panoptic Segmentation
VideoCUPS introduces the first unsupervised video panoptic segmentation method, generating temporally consistent pseudo-labels from depth, motion, and visual cues, and the paper adds an evaluation protocol plus 4 competitive baselines.
#Vision #Benchmarking #VideoCUPS #Research release
why featured
HKR-K passes: VideoCUPS gives a pseudo-label mechanism, an evaluation protocol, and 4 baselines for unsupervised VPS. HKR-H/R are weak; the topic is narrow CV research with no product or practitioner nerve, so it stays in all.
editor take
VideoCUPS defines unsupervised VPS with 4 baselines; I buy the task, not the win—RSS gives no dataset or scores.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
14:06
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 14:06 · 06·03
BreastGPT: A Multimodal Large Language Model for Breast Cancer Clinical Routine
BreastGPT achieves 75.66% closed-ended accuracy and an 89.92% open-ended score on BreastStage-Bench, using BreastStage, a corpus with 1.86 million instruction-following pairs from 17 sub-datasets, 5 imaging modalities, and 136 task templates.
#Multimodal #Vision #Benchmarking #BreastGPT
why featured
HKR-K is solid because the paper gives dataset scale, modality count, and benchmark scores. HKR-H/R are weak: this is a breast-cancer clinical vertical, not a broad AI product or competitive industry event.
editor take
BreastGPT hits 75.66% closed-ended accuracy on BreastStage-Bench with a 1.86M-pair corpus; don’t sell clinic impact until external validation and prospective trials show up.
HKR breakdown
hook knowledge resonance
open source
65
SCORE
H0·K1·R0
11:53
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 11:53 · 06·03
NextMotionQA: Benchmarking and Judging Human Motion Understanding with Vision-Language Models
NextMotionQA evaluates 12 VLMs on multiple-choice QA, video captioning, and fine-grained error correction, with tasks organized across three semantic axes and three complexity levels; VLM judges align with experts on coarse criteria at Cohen’s κ=0.70, but fall to κ=0.10 on part-level judgments.
#Multimodal #Vision #Benchmarking #NextMotionQA
why featured
HKR-H and HKR-K pass: the paper gives a concrete VLM failure gap in fine-grained motion judging. HKR-R is weak because the niche eval topic lacks a broad practitioner nerve.
editor take
NextMotionQA tests 12 VLMs; part-level κ drops to 0.10. Using VLMs as motion judges breaks at fine granularity.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
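For readers weighing the κ=0.70 versus κ=0.10 gap in the NextMotionQA item, here is a minimal sketch of how Cohen's κ is computed for two raters. The labels are made up for illustration and the helper name `cohens_kappa` is hypothetical; the feed does not say how the paper aggregates judgments.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Made-up labels, purely illustrative (not data from the paper).
expert = ["ok", "ok", "bad", "bad", "ok", "bad", "ok", "bad"]
judge = ["ok", "ok", "bad", "ok", "ok", "bad", "ok", "bad"]
kappa = cohens_kappa(expert, judge)
print(round(kappa, 2))
```

Because κ subtracts chance agreement, a judge can agree with experts on most items and still score near zero when one label dominates.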
11:38
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 11:38 · 06·03
Archi: Agentic Operations at the CMS Experiment
Archi has run for CERN LHC’s CMS Computing Operations team since February 2026, combining documentation, historical data, and live monitoring systems to provide retrieval and analysis support for technical operators.
#Agent #RAG #Reasoning #Archi
why featured
HKR-H/K/R pass via the CERN CMS production-ops hook, Feb 2026 deployment, and real agent operations angle. The high-energy-physics ops setting and summary-level detail keep it in the 60–71 band.
editor take
Archi has run in CERN CMS ops since February; no eval size disclosed, but local open-weight parity is the punchline.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R1
11:19
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 11:19 · 06·03
Research identifies trace-mediated peak bias in deep reinforcement learning agents
The paper identifies Trace-Mediated Peak Bias in deep reinforcement learning: at intermediate eligibility trace depths, agents prefer trajectories with high reward peaks over alternatives with higher cumulative returns.
#Reasoning #Alignment #Research release
why featured
HKR-H/K pass: the paper has a counterintuitive RL-bias hook and a concrete mechanism around eligibility-trace depth. Impact stays narrow: no product tie-in, code artifact, or measured deployment effect is disclosed, so it lands in all.
editor take
TMPB appears at intermediate trace depths; I buy the optimizer mechanism, not the leap to human Peak-End Rule.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R0
10:38
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 10:38 · 06·03
VISTA: Vision-Grounded and Physics-Validated Adaptation of UMI Data for VLA Training
VISTA adapts UMI data for VLA training with three components: UMI-VQA for wrist-mounted fisheye VQA supervision, a physical-validation pipeline scoring trajectory continuity, self-collision risk, and execution fidelity, and a two-stage co-training recipe for vision-language grounding plus action prediction; the authors release the pipeline, dataset, validated trajectories, and pretrained model.
#Robotics #Vision #Multimodal #VISTA
why featured
HKR-K and HKR-R pass: the paper gives concrete training components and open artifacts, tied to robotics data scarcity. HKR-H is weak, and no performance numbers or broad lab signal are disclosed, so it stays in the interesting-but-not-featured band.
editor take
VISTA puts 3 gates on UMI data; no metric numbers disclosed, and the physical-validation filter is the part I trust.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
08:50
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:50 · 06·03
Research on Spectral Diagnostics of Modality Imbalance in Medical Vision-Language Models
The paper introduces Spectral Alignment Score and evaluates 15 VLMs with 6 alignment metrics and bidirectional retrieval, finding that medical images retain richer structural information than paired clinical reports and that SAS has the strongest zero-label correlation with medical-domain retrieval performance.
#Multimodal #Vision #Benchmarking #Research release
why featured
HKR-K is solid: a new metric and a 15-VLM evaluation setup are concrete. HKR-R passes narrowly via medical multimodal safety, but HKR-H is weak and there is no product or wider industry trigger, so it stays in the 60–71 band.
editor take
The paper tests 15 VLMs with 6 alignment metrics; I buy the asymmetric diagnostic, because one alignment score hides medical mismatch.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
08:43
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:43 · 06·03
COMBINER: Composed Image Retrieval Guided by Attribute-Based Neighbor Relations
COMBINER addresses composed image retrieval with attribute prototypes, using three modules: Adaptive Semantic Disentanglement, Unified Prototype-based Composition, and Dual Relations Modeling, and the paper reports experiments on three benchmark datasets, but the RSS snippet does not disclose metric values, dataset names, model size, or release timing beyond a planned GitHub implementation link.
#Multimodal #Vision #Embedding #COMBINER
why featured
HKR-K passes via a concrete mechanism and 3 benchmark datasets; HKR-H/R fail because the title is technical and no metrics are disclosed. This fits a low-value research brief, not featured.
editor take
COMBINER tests attribute prototypes on 3 CIR benchmarks; metrics and dataset names are missing, so I don’t buy the “first study” framing yet.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
08:34
5d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:34 · 06·03
A Systematic Evaluation of Positional Bias in Multi-Video Summarization with MLLMs
The researchers build a benchmark from ActivityNet and news videos and evaluate nine MLLMs for positional bias in multi-video summarization under two-video and four-video input settings.
#Multimodal #Vision #Benchmarking #ActivityNet
why featured
HKR-H and HKR-K pass: positional bias in multi-video summarization is a fresh eval angle, with 9 MLLMs and two-/four-video setups. Impact stays in the 60–71 band because effect sizes and model rankings are not disclosed.
editor take
Nine MLLMs show slot bias in 2- and 4-video summarization; average scores hide an input-order bug.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
08:27
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 08:27 · 06·03
VCIFBench: Evaluating Complex Instruction Following for Video Understanding
VCIFBench evaluates complex instruction following in video understanding with 306 satisfiable test instructions, a 540-pair DPO preference dataset, and a 30-item conflict diagnostic subset, and experiments on 10 MLLMs show joint constraint satisfaction remains difficult.
#Multimodal #Vision #Benchmarking #VCIFBench
why featured
HKR-K and HKR-R pass: the dataset size and diagnostics are concrete for video-MLLM evaluation. It remains a single benchmark paper with an academic title and no broader industry hook, so it sits in the 60–71 band.
editor take
VCIFBench tests 10 MLLMs on 306 video instructions; its conflict subset is the useful jab at shallow video QA.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
06:38
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 06:38 · 06·03
Self-Evolving Deep Research via Joint Generation and Evaluation
The paper introduces SCORE, a co-evolutionary training framework that jointly trains an evaluator and a solver inside one shared-parameter model, using a meta-harness to dynamically control the evaluation environment based on solver performance for deep research report generation.
#Agent #Reasoning #Benchmarking #Research release
why featured
HKR-H and HKR-K pass: SCORE uses one shared-parameter model for evaluator and solver, with a meta-harness controlling evaluation. No results, code, or major-lab backing are disclosed, so it stays in the 60–71 research band.
editor take
SCORE shares weights between judge and solver; no benchmark numbers disclosed, so this smells like reward hacking with nicer branding.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
05:25
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 05:25 · 06·03
Learning What to Learn: Stage-Specific Data Sets for SFT-then-RL in Small Language Model Reasoning
The paper proposes a difficulty-aware SFT-then-RL framework for small language model reasoning and reports tests on 2 SLMs across 5 reasoning benchmarks against SFT, distillation, and RL baselines; the post does not disclose model names, benchmark names, or scores.
#Reasoning #Fine-tuning #Benchmarking #Research release
why featured
HKR-K and HKR-R pass: the paper offers a concrete training mechanism and test setup for small-model reasoning. Model names and scores are not disclosed, and HKR-H is weak, so it stays in all.
editor take
The paper tests 2 SLMs on 5 reasoning benchmarks; no names or scores disclosed, so “consistent gains” needs proof.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R1
04:47
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 04:47 · 06·03
RowNet: A Memory Transformer for Tabular Regression
RowNet predicts real estate price per square meter with two retrieval layers, multi-head attention, and a mixture-of-experts module; the post does not disclose dataset size, baseline results, or error metrics.
#Memory #Reasoning #RowNet #Research release
why featured
HKR-K passes on RowNet’s two-stage retrieval and multi-head attention mechanism. HKR-H and HKR-R are weak, and the post lacks dataset size, baselines, and error metrics, so it stays in the lower research-release band.
editor take
RowNet uses two retrieval layers for price regression, but reports no error metrics; without GBDT baselines, I don't buy the tabular-neural pitch.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
ProjQ: Project-and-Quantize for Adapter-Aware LLM Compression
ProjQ constrains quantization noise to a low-rank manifold via orthogonal subspace projection, and experiments on LLaMA-2, Qwen2.5, and Qwen3 report up to 2× lower evaluation loss for compensation and 3-bit language modeling performance matching standard 4-bit baselines.
#Fine-tuning #Inference-opt #LLaMA-2 #Qwen2.5
why featured
HKR-H/K/R pass, but this is a single arXiv compression paper with no disclosed code, cost benchmark, or cross-source uptake. It sits at the high end of 60–71, below featured.
editor take
ProjQ matches 4-bit baselines at 3 bits; I buy this path—shape noise for LoRA, don't just crush weights.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
ReLoRA: Knowledge-Reusing Adaptation for Fast Rollout of Evolving LLM Services
ReLoRA re-adapts LoRA adapters after base-model updates using Bayesian compatibility-aware initialization and scheduled regularization, reducing time-to-readiness by up to 8.9x and improving accuracy by up to 4.6% versus baselines.
#Fine-tuning #Inference-opt #Yang Xu #Zihuai Xu
why featured
HKR-K and HKR-R are strong: concrete mechanism and rollout numbers. HKR-H is narrower, and a single arXiv paper without code, benchmark details, or independent replication keeps it below featured.
editor take
ReLoRA cuts LoRA re-adaptation time by up to 8.9x; I buy the pain, adapter drift is an ops tax.
HKR breakdown
hook knowledge resonance
open source
71
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
When Should the Teacher Move? Temporal Coupling and Stability in Self On-Policy Distillation
The authors sweep teacher update schedules on Qwen3-8B and find that complete teacher-freezing isolation periods, not teacher age, drive stable self on-policy distillation; their CGTR method gates refreshes on reward improvement and length-tail safety, achieving zero collapse and the best final score across four tasks.
#Reasoning #Fine-tuning #Alignment #Qwen
why featured
HKR-H and HKR-K pass: the Qwen3-8B self-distillation study gives a concrete stability mechanism and 4-task result. HKR-R is narrow, mainly for post-training/alignment practitioners, so it stays below featured.
editor take
Qwen3-8B shows isolation periods stop collapse; I buy the mechanism, because clock refresh can canonize a drifting student.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Right Makes Might: Aligning Verified Hidden States Empowers RL Reasoning
Hidden-Align aligns last-layer hidden states of correct rollouts at the anchor token during RL training, improving average pass@1 over DAPO by 3.8, 6.2, and 5.4 percentage points on Qwen3-1.7B, 4B, and 14B across eight math reasoning benchmarks.
#Reasoning #Alignment #Benchmarking #Qwen
why featured
HKR-H/K pass: the mechanism is specific and the benchmark gains are concrete. It remains a training-research arXiv paper with limited spillover beyond math benchmarks, so it stays in the 60–71 band.
editor take
Hidden-Align adds 3.8/6.2/5.4 pass@1 points on Qwen3; hidden-state geometry as RL regularization beats squeezing one reward bit harder.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models
HARVE introduces RewardHackBench with 13 reward-hacking patterns, evaluates eight reward models, and proposes a training-free reward-head vector editing method that removes components aligned with a multidirectional hacking subspace.
#Alignment #Safety #Interpretability #HARVE
why featured
HKR-H/K/R all pass, but this is still a single arXiv item with abstract-level facts only; no code, effect size, or cross-source discussion is disclosed, so it stays at the upper end of 60–71.
editor take
HARVE tests 8 reward models on 13 hacking patterns; training-free reward-head editing smells like targeted desensitization for RMs.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
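The HARVE summary describes removing reward-head components aligned with a hacking subspace. A minimal sketch of that kind of edit, assuming the subspace is given as a handful of direction vectors and the head is a single weight vector (both assumptions; the feed does not say how HARVE finds the directions or which weights it touches), is an orthogonal projection:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(directions, eps=1e-12):
    """Gram-Schmidt: turn possibly correlated directions into an orthonormal basis."""
    basis = []
    for d in directions:
        residual = list(d)
        for b in basis:
            coeff = dot(residual, b)
            residual = [x - coeff * y for x, y in zip(residual, b)]
        norm = dot(residual, residual) ** 0.5
        if norm > eps:  # skip directions already spanned by the basis
            basis.append([x / norm for x in residual])
    return basis


def remove_subspace(head, directions):
    """Project `head` onto the orthogonal complement of span(directions)."""
    edited = list(head)
    for b in orthonormalize(directions):
        coeff = dot(edited, b)
        edited = [x - coeff * y for x, y in zip(edited, b)]
    return edited


# Hypothetical 3-d reward head and two hypothetical "hacking" directions.
head = [3.0, 4.0, 5.0]
hack_dirs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
edited_head = remove_subspace(head, hack_dirs)
print(edited_head)
```

The edit needs no gradient steps, which is consistent with the training-free framing, but whether it preserves reward quality depends on how cleanly the directions isolate hacking behavior.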
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
FLIPS: Instance-Fingerprinting for LLMs via Pseudo-random Sequences
FLIPS distinguishes 237 deployed configurations of the same LLM by exploiting biases in generated binary random sequences, reporting 96% closed-set accuracy and 90% open-set accuracy, compared with 35% for an adapted LLMmap baseline.
#Safety #Benchmarking #FLIPS #LLMmap
why featured
HKR-H/K pass: the mechanism and numbers are concrete, and LLM instance fingerprinting has security value. HKR-R is weak; as a single arXiv paper with no adoption signal, it stays in all.
editor take
FLIPS reports 96% closed-set accuracy across 237 same-model configs; regulators checking only weights are missing sampling and quantization drift.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Exploiting Verification-Generation Gap: Test-Time RL with Confidence-Conditioned Verification
The paper proposes TTRL-CoCoV, a confidence-conditioned test-time RL framework that changes verification for high-, medium-, and low-confidence samples, and reports average absolute gains over TTRL of 9.8% in Pass@1 and 18.7% in Pass@16 across 6 reasoning benchmarks.
#Reasoning #Benchmarking #Alignment #TTRL-CoCoV
why featured
HKR-H and HKR-K pass: the mechanism and six-benchmark gains are concrete. It is a single arXiv research item with no deployment data in the supplied text, so it stays in the 60–71 band.
editor take
TTRL-CoCoV lifts Pass@16 by 18.7% on 6 reasoning benchmarks; test-time RL is moving from first-shot accuracy to coverage.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
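The Pass@1 versus Pass@16 split in the TTRL-CoCoV item is easier to read with the metric spelled out. Below is a sketch of the commonly used unbiased pass@k estimator (n samples per problem, c of them correct); the feed does not say which estimator or sample count the paper uses, so treat this as background rather than the paper's protocol.

```python
from math import comb


def pass_at_k(n, c, k):
    """Probability that at least one of k samples drawn from n is correct,
    given that c of the n samples are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few wrong samples to fill a draw of size k
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical problem: 16 samples, 4 correct.
print(pass_at_k(16, 4, 1))
print(pass_at_k(16, 4, 16))
```

Pass@1 tracks first-shot accuracy, while Pass@16 only asks that some sample succeeds, which is why the two can move by different amounts.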
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Speedrunning Tabular Foundation Model Pretraining
Researchers introduced a nanoTabPFN pretraining speedrun where contributors edit a single-file training script and target a fixed downstream ROC AUC on subsampled TabArena using one NVIDIA L40S GPU; the current record reaches the target in 0.92 minutes, 81x faster than the 74.32-minute baseline with 22x fewer synthetic datasets.
#Benchmarking #nanoTabPFN #NVIDIA #TabArena
why featured
HKR-H/K/R pass: the speedrun framing is clickable and the 0.92-minute, 81x claim is concrete. Scope is still tabular FM pretraining, so it stays in the 60–71 band.
editor take
nanoTabPFN hits target in 0.92 minutes on one L40S; great for training hacks, not proof of broad tabular generalization.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction
SEAOTTER combines a sensor-embedded autoencoder with one-time transcoding to standard JPEG, and at a 200:1 compression ratio it reports 7x faster encoding, 3.5x faster decoding, and +8% ImageNet top-1 accuracy versus AVIF while retaining JPEG infrastructure compatibility.
#Robotics #Vision #Inference-opt #SEAOTTER
why featured
HKR-H/K pass: SEAOTTER has concrete compression and speed numbers plus JPEG infrastructure compatibility. A single arXiv vision-compression paper remains niche, with no disclosed open-source details, author authority, or production replacement evidence.
editor take
SEAOTTER beats AVIF at 200:1: 7x encode, 3.5x decode, +8% ImageNet; cloud robotics benefits more than photo storage.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Outsmarting the Chameleon: Counterfactual Decoupling for Tactical OOD Shifts in Live Streaming Risk Assessment
The paper proposes LPCD, a plug-in framework for live-streaming risk assessment that models intent and narrative variation at the latent level, enforces latent counterfactual consistency, and adds parameter-free calibration at inference time; experiments on large-scale industrial datasets and online production traffic report consistent gains over state-of-the-art baselines, while the snippet does not disclose dataset sizes or metric values.
#Reasoning #Safety #Benchmarking #Research release
why featured
HKR-H/K/R pass: tactical OOD in livestream risk has a clear adversarial hook, and LPCD plus online traffic tests add substance. The scope is niche, with no open artifact or business metric disclosed, so it stays in 60–71.
editor take
LPCD beats SOTA on industrial data and live traffic; metrics are undisclosed. I don't buy deployment claims without ablations.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning
LatentChem replaces explicit Chain-of-Thought with continuous thought vectors for chemical reasoning, reports a 59.88% non-tie win rate against a strong CoT baseline on ChemCoTBench, and reduces average reasoning-step overhead by 10.84× with a 5.96× wall-clock speedup across evaluated benchmarks.
#Reasoning #Benchmarking #Inference-opt #LatentChem
why featured
HKR-H comes from latent vectors replacing text CoT; HKR-K has a 59.88% non-tie win rate and a 10.84× lower reasoning-step overhead. Chemical reasoning is narrow, with no code or major-lab backing disclosed, so it stays in all.
editor take
LatentChem cuts CoT overhead 10.84×; 59.88% non-tie wins isn’t a rout, but it dents the “reasoning must be written” dogma.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Lethe Method Achieves Persistent Knowledge Erasure in Federated Unlearning
Lethe addresses knowledge resurfacing after federated unlearning by using a Reshape-Rectify-Restore pipeline with a temporary adapter, gradient-ascent updates, layer-wise dual-stream rectification, and a short recovery stage; experiments report resurfacing rates below 1% in most cases after many follow-up training rounds.
#Fine-tuning #Alignment #Lethe #Research release
why featured
HKR-H/K/R all pass, but this is a single arXiv paper in a narrow federated-unlearning niche; code, benchmark setup, and adoption signals are not disclosed, so it stays in all at 70.
editor take
Lethe reports sub-1% resurfacing in most FU cases, but datasets and follow-up rounds aren’t disclosed; don’t buy persistent deletion yet.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Building Reliable Long-Form Generation via Hallucination Rejection Sampling
The paper proposes SHARS, an inference-time framework that uses any hallucination detector to reject and resample hallucinated segments during long-form generation, with code released on GitHub; the abstract says standardized benchmarks show reduced hallucinations, but the snippet does not disclose specific scores.
#Inference-opt #Safety #Alignment #Research release
why featured
HKR-H/K/R all pass, but the article gives mechanism and open code without benchmark numbers. Useful hallucination-control research, not a top-lab or product release, so it stays in all.
editor take
SHARS rejects hallucinated segments at inference; scores aren't disclosed. Detector calibration and resample cost decide whether this survives.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
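The reject-and-resample loop in the SHARS item can be sketched in a few lines. Everything named here (`generate_with_rejection`, the proposer, the detector, the retry cap and the keep-last-draw fallback) is hypothetical scaffolding for illustration, not the released SHARS code.

```python
def generate_with_rejection(propose_segment, is_hallucinated, num_segments, max_retries=4):
    """Build a long-form answer segment by segment, resampling flagged segments."""
    accepted = []
    for _ in range(num_segments):
        candidate = propose_segment(accepted)
        retries = 0
        while is_hallucinated(accepted, candidate) and retries < max_retries:
            candidate = propose_segment(accepted)  # resample this segment only
            retries += 1
        # Assumed fallback: once retries run out, keep the last draw.
        accepted.append(candidate)
    return accepted


def make_toy_proposer():
    """Toy generator: every odd-numbered draw is deliberately marked unsupported."""
    calls = {"n": 0}

    def propose(accepted):
        calls["n"] += 1
        tag = "[unsupported] " if calls["n"] % 2 == 1 else ""
        return f"{tag}segment {len(accepted) + 1}"

    return propose


def toy_detector(accepted, candidate):
    """Toy detector standing in for any hallucination detector."""
    return candidate.startswith("[unsupported]")


result = generate_with_rejection(make_toy_proposer(), toy_detector, num_segments=3)
print(result)
```

In this toy run every segment costs two generator calls, which is the resample cost the editor take flags; a poorly calibrated detector would raise that cost or let bad segments through.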
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale
The paper tests cross-modal representational convergence at million-sample scale and finds mutual-nearest-neighbor alignment holds on about 1K samples, then drops sharply for text-image, text-audio, and text-video settings.
#Multimodal #Embedding #Benchmarking #arXiv
why featured
HKR-H/K pass: the paper gives a million-scale cross-modal representation test and a ~1K-sample boundary. As arXiv representation research with no tool, model release, or production claim, it stays in the 60–71 band.
editor take
Million-scale samples break the ~1K mutual-neighbor alignment story; stop treating Platonic convergence as settled multimodal doctrine.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R0
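For context on the mutual-nearest-neighbor alignment named in the Plato's Cave item, here is a minimal sketch of one common formulation: for each paired sample, compare its k nearest neighbors in the two embedding spaces and average the overlap. The distance choice, the function names and the toy vectors are assumptions; the snippet does not give the paper's exact variant.

```python
def knn_indices(vectors, k):
    """For each row, the set of indices of its k nearest other rows
    (squared Euclidean distance)."""
    neighbors = []
    for i, v in enumerate(vectors):
        dists = [
            (sum((a - b) ** 2 for a, b in zip(v, w)), j)
            for j, w in enumerate(vectors)
            if j != i
        ]
        neighbors.append({j for _, j in sorted(dists)[:k]})
    return neighbors


def mutual_knn_alignment(feats_a, feats_b, k):
    """Average fraction of shared k-nearest neighbors across two paired embedding sets."""
    if len(feats_a) != len(feats_b):
        raise ValueError("embedding sets must be paired row by row")
    if not 1 <= k < len(feats_a):
        raise ValueError("require 1 <= k < number of samples")
    nn_a, nn_b = knn_indices(feats_a, k), knn_indices(feats_b, k)
    return sum(len(a & b) for a, b in zip(nn_a, nn_b)) / (k * len(feats_a))


# Toy paired embeddings (hypothetical): same neighborhood structure vs. scrambled.
text_feats = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]]
image_same = [[0.0], [2.0], [20.0], [22.0]]
image_scrambled = [[0.0], [20.0], [2.0], [22.0]]
print(mutual_knn_alignment(text_feats, image_same, k=1))
print(mutual_knn_alignment(text_feats, image_scrambled, k=1))
```

The score depends on the candidate pool each neighbor search runs over, which is the knob the paper turns when it moves from about 1K samples to million scale.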
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Tool-Aware Optimization with Entropy Guidance for Efficient Agentic Reinforcement Learning
TAO-RL optimizes agentic reinforcement learning with trajectory filtering and a tool-aware entropy bonus, and the paper reports better results than existing methods across 7 reasoning benchmarks and 3 model scales.
#Agent #Tools #Reasoning #Research release
why featured
This Agent RL paper has a concrete mechanism and evaluation setup, but only title-level and summary-level facts are disclosed; no code, cost numbers, or production evidence. HKR-K/R pass, HKR-H is weak, so it stays in all.
editor take
TAO-RL reports 7 benchmarks and 3 scales; I trust the trajectory filtering more than the entropy bonus story.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Synthetic Hallucinations, Real Gains: Hard Negatives from Frontier Models for FIM Hallucination Mitigation
The paper uses three frontier code models to generate FIM hard negatives across eight languages, then fine-tunes Qwen2.5-Coder-7B-Instruct on a 100K-row subset, raising Delulu exact match by 18.8 points and edit similarity by 0.22 across every language and hallucination type.
#Code #Fine-tuning #Benchmarking #Qwen2.5-Coder
why featured
HKR-H/K/R all pass, but this is a single arXiv code fine-tuning paper with subfield impact. The +18.8-point Delulu gain is concrete, yet not a model release or major product update, so it stays in the 60–71 band.
editor take
Qwen2.5-Coder-7B gains 18.8 EM from 100K hard negatives; for IDE hallucinations, SFT is still very alive.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
The Shape of Addition: Geometric Structures of Arithmetic in Large Language Models
The paper analyzes residual stream geometry during multi-operand addition, proposes the Iso-Raw-Sum Trajectory and Noisy Quantization Model, and validates a geometric consistency check that detects and corrects quantization failures during inference.
#Reasoning #Interpretability #Inference-opt #Research release
why featured
HKR-H and HKR-K pass: the title has a clear twist, and the post names residual-stream analysis plus inference-time correction. The topic is narrow mechanistic interpretability, so it stays below featured.
editor take
This pins multi-operand addition errors on residual-stream quantization geometry; I buy the direction, but model sizes and fix rates are undisclosed.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
SeeTraceAct: Visibility-Aware Latent Planning from Cross-Embodiment Demonstration Videos
SeeTraceAct conditions a VLA robot policy on one unseen-task demonstration video, predicts visibility-aware future end-effector traces for spatial grounding, and achieves the best success rate across all four RoboCasa-DC settings plus a 12.5 percentage-point average success gain on a real-world Franka Panda benchmark with human demonstrations.
#Robotics #Vision #Multimodal #SeeTraceAct
why featured
HKR-H and HKR-K pass: cross-embodiment demos and a +12.5 pp real-robot gain are concrete. As a single arXiv robotics paper, it is distant from mainstream AI workflows, so HKR-R fails and the item stays in all.
editor take
SeeTraceAct lifts Franka Panda real success by 12.5 points; visible trace prediction beats black-box VLA localization here.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Reasoning Structure of Large Language Models
The paper introduces a scalable logic-puzzle benchmark and a pipeline that converts unstructured reasoning traces into verifiable claim-dependency graphs, then defines a reasoning-efficiency metric; its experiments on open-source reasoning models show structural measures distinguish behaviors that token count and final-answer accuracy conflate.
#Reasoning #Benchmarking #Interpretability #Research release
why featured
HKR-K and HKR-R pass: the paper offers a new metric and verifiable graph structure for reasoning traces. It lacks model names, scores, or a debate-driving result, so it stays in the 60–71 band.
editor take
The paper maps traces into claim-dependency graphs; with only open models tested, I’d trust it for diagnosis, not accuracy replacement.
HKR breakdown
hook knowledge resonance
open source
69
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Learning without Training: The Implicit Dynamics of In-Context Learning
arXiv:2507.16003v4 shows that one self-attention layer stacked with an MLP can make a standard forward pass with context mathematically equivalent to a no-context forward pass with a minimal low-rank update to the MLP weights, offering a mechanism for LLM in-context learning without weight updates.
#Reasoning#Interpretability#Research release
why featured
HKR-H and HKR-K pass: the title has a real hook, and the summary gives a testable mechanism. It remains theory-heavy arXiv work without numbers, model names, or product impact, so it stays below featured.
editor take
One attention layer plus MLP equals a low-rank update; I buy the mechanism, not yet a GPT-5-scale ICL explanation.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
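The equivalence claimed in this card can be checked numerically for a single linear map. What follows is a minimal sketch, assuming the MLP's first layer is a plain matrix `W`, with `a_x` standing in for the attention output without context and `a_cx` for the output with context. The rank-1 form of the update and every name here are illustrative assumptions, not notation or code from the paper.

```python
def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rank1_context_update(W, a_x, a_cx):
    """Return dW = (W @ delta) a_x^T / ||a_x||^2, where delta = a_cx - a_x.

    By construction dW @ a_x == W @ delta, so (W + dW) @ a_x == W @ a_cx.
    """
    delta = [c - x for c, x in zip(a_cx, a_x)]
    w_delta = matvec(W, delta)
    norm_sq = sum(x * x for x in a_x)
    return [[wd * x / norm_sq for x in a_x] for wd in w_delta]


# Hypothetical toy values, chosen only for illustration.
W = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]
a_x = [1.0, 2.0, 2.0]    # attention output without context (assumed)
a_cx = [1.5, 1.0, 2.5]   # attention output with context (assumed)

dW = rank1_context_update(W, a_x, a_cx)
W_updated = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, dW)]

with_context = matvec(W, a_cx)                # original weights, context in the input
no_context_updated = matvec(W_updated, a_x)   # updated weights, no context
```

Both vectors agree up to floating-point error, which is the sense in which the context can be absorbed into a low-rank weight change for this one layer. Whether the same picture explains in-context learning in deep models is a separate question the sketch does not address.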
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Neuron Populations Exhibit Divergent Selectivity with Scale
The paper studies language models up to 30B parameters and vision models up to 5B parameters, finding that Rosetta Neurons grow in absolute count under a sublinear power law while taking a smaller share of all neurons; the authors also report higher selectivity, greater monosemanticity, and stronger domain specialization with scale.
#Interpretability#Benchmarking#arXiv#Research release
why featured
HKR-K is strong: the paper gives scale, a power-law claim, and selectivity changes. HKR-R lands for interpretability/safety, but with only arXiv-level detail and no tool or deployment angle, it stays in 60–71.
editor take
Rosetta Neurons shrink in share but sharpen by 30B; interpretability looks less like coverage, more like sparse experts.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
R2IF: Aligning Reasoning with Decisions via Composite Rewards for Interpretable LLM Function Calling
R2IF optimizes LLM function calling with format/correctness constraints, CER, SMV composite rewards, and GRPO, and reports up to 34.62% improvement over baselines on BFCL/ACEBench with Llama3.2-3B.
#Reasoning#Tools#Alignment#R2IF
why featured
HKR-K and HKR-R pass: the paper states a concrete reward design and benchmark gain, and function-calling reliability matters to agent builders. HKR-H is weak, and this is a single arXiv paper without external validation, so it stays in 60–71.
editor take
R2IF lifts Llama3.2-3B by up to 34.62% on BFCL/ACEBench; I’d audit reward leakage before buying the interpretability claim.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Visual Instruction Tuning Aligns Modalities through Abstraction
The paper analyzes multiple vision-language architectures and finds that visual instruction tuning embeds visual features into intermediate semantic layers of the LLM, while fine-tuning only those layers preserves performance on vision-centric benchmarks and reduces training time.
#Multimodal#Vision#Fine-tuning#Research release
why featured
HKR-H and HKR-K pass: the middle-layer alignment claim is novel and testable. Single-source arXiv coverage lacks model list, training-time delta, or code details, so it stays in all.
editor take
Visual instruction tuning mainly hits middle LLM layers; middle-layer tuning preserves vision benchmarks, but training-time savings are undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Aligning Data-Driven Predictors with Allocation: A Decision-Focused Approach to Survival Analysis
The paper proposes optimizing survival models with NDCG for organ allocation; on historical US heart-transplant data, its bootstrapping method raises baseline-model NDCG by 50-100%, which the authors report translates into tens of thousands of additional life-years per year under transplant allocation.
#Benchmarking#Alignment#arXiv#Research release
why featured
HKR-H/K/R all pass, but this is specialized survival-analysis work, not an LLM, agent, or product update. The post lacks reproduction detail and external validation, so it stays in the 60-71 research-signal band.
editor take
The bootstrapping method lifts baseline survival models’ NDCG by 50-100%; the “tens of thousands of life-years” claim rests on replay, with clinical constraints undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
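For readers unfamiliar with the metric in this card, NDCG scores a ranking by how much of the available benefit it places near the top. This is a minimal sketch of the standard linear-gain definition. Treating each candidate's "relevance" as a survival benefit, and the toy numbers, are assumptions made for illustration; the card does not say how the paper defines gains.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: position 1 is undiscounted, later ones shrink by log2."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(true_relevance, predicted_scores):
    """Rank items by predicted score (descending) and normalise by the ideal ordering."""
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevance[i] for i in order]
    ideal_dcg = dcg(sorted(true_relevance, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical benefit per candidate (e.g. life-years gained) and two model rankings.
benefit = [8.0, 3.0, 5.0]
good_model = ndcg(benefit, [0.9, 0.1, 0.5])   # ranks candidates in the ideal order
bad_model = ndcg(benefit, [0.1, 0.9, 0.5])    # ranks the best candidate last
```

A model that orders candidates exactly by true benefit scores 1.0; any misordering near the top is penalised more than one near the bottom, which is why the metric suits allocation-style decisions.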
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Mitigating Spurious Correlations with Memorization-Guided Dataset De-Biasing
The paper proposes a two-stage sample scoring function that separates learning dynamics for core and spurious features, then trains standard ERM on selected samples; experiments report stronger performance than state-of-the-art debiasing methods while using as little as 10% of the original training data.
#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the 10% data result is testable, and dataset debiasing matters in practice. HKR-H fails, and without code, uptake, or production evidence, it stays in the 60–71 band.
editor take
ERM wins with 10% data here; I buy the setup, but cross-dataset scoring stability is undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression
The paper proposes sign lock-in theory for the one-bit wall in sub-bit compression: most weights keep their initialization signs, and effective sign flips under SGD noise follow a geometric-tail bound under bounded updates and rare near-zero re-entry.
#Inference-opt#Fine-tuning#Research release
why featured
HKR-H/K/R pass, but this is a single arXiv theory paper with only mechanism-level detail; model list, scale, and reproducible evidence are not disclosed, so it stays in the 60–71 band.
editor take
Sign lock-in blames the one-bit wall on initialization signs; the geometric-tail claim is crisp, but accuracy evidence is undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for VLMs and Autonomous Agents
WildRoadBench evaluates VLMs and LLM-driven agents on the same professionally annotated UAV road-damage corpus using per-class AP_50 under two protocols. Closed-source frontier models lead the VLM track but leave more than half the metric unused, open-source grounders plateau lower, and several agents fail to submit valid predictions within the fixed budget.
#Vision#Agent#Benchmarking#WildRoadBench
why featured
HKR-H and HKR-K pass: aerial road damage tests VLMs/agents outside toy tasks, with AP_50 and budget-failure results. The domain is academic and narrow, so it stays below featured.
editor take
WildRoadBench tests VLMs and agents on one UAV corpus; closed VLMs still lose over half AP_50, and agents trail despite tools.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Automatic Layer Selection for Hallucination Detection
The paper proposes FEPoID for automatic layer selection in hallucination detection across question-answering and summarization benchmarks, covering multiple LLM architectures and scales. The method is training-free, adds negligible computational overhead, and outperforms the other tested criteria and existing baselines; the authors publish code on GitHub.
#Safety#Interpretability#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: the paper gives a testable training-free mechanism, low-overhead claim, and open code. It remains a single arXiv paper without adoption or broad discussion, so it stays in all.
editor take
FEPoID selects layers via the first intrinsic-dimension peak; I buy the direction, but model lists and gains are undisclosed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation
FiRe-OPD filters low-quality rollout samples at the trajectory level and applies soft token reweighting inside retained traces; the paper reports gains of 6.25 on AIME 2024 in a strong-to-weak setting and 18.81 on Miner in a multi-teacher setting.
#Fine-tuning#Alignment#Reasoning#FiRe-OPD
why featured
HKR-K/R pass: FiRe-OPD gives a concrete two-level optimization recipe and two benchmark gains. HKR-H is weak; a single arXiv post-training paper lacks broad pull, so it stays in all.
editor take
FiRe-OPD reports +6.25 on AIME 2024 and +18.81 on Miner; full-trace KL looks increasingly lazy.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
MLSkip: Data Skipping for ML Filters via Lightweight Metadata
MLSkip uses Parquet min-max metadata to prune ML filter predicates; on TPC-H and TPC-DS tables with selectivity below 0.1%, its average pruning effectiveness reaches 27.4%. A size-bounded 2D convex-hull metadata structure raises pruning effectiveness to 38.31%, costs at most 45 bytes per row group and column pair, and shows a 1.07× end-to-end speedup over PyTorch in DuckDB.
#Inference-opt#MLSkip#DuckDB#PyTorch
why featured
HKR-K/R pass: the paper gives reproducible benchmarks, pruning rates, and metadata overhead, tied to inference cost. HKR-H is weak, and the database-systems angle lacks open-source or adoption signals, so it stays in all.
editor take
MLSkip reaches 38.31% pruning effectiveness below 0.1% selectivity; a 1.07× end-to-end speedup keeps this firmly early-stage.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
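The min-max idea in this card can be illustrated with interval arithmetic. The sketch below assumes the ML filter is a linear model `w.x + b > threshold` and that each row group exposes per-column minimum and maximum values, as Parquet statistics do. The function names and numbers are hypothetical, and the paper's convex-hull structure is not reproduced.

```python
def score_upper_bound(weights, bias, col_min, col_max):
    """Largest value w.x + b can take when each x_i lies in [col_min[i], col_max[i]]."""
    return bias + sum(max(w * lo, w * hi)
                      for w, lo, hi in zip(weights, col_min, col_max))


def can_skip_row_group(weights, bias, threshold, col_min, col_max):
    """True if no row in the group can satisfy w.x + b > threshold."""
    return score_upper_bound(weights, bias, col_min, col_max) <= threshold


# Hypothetical two-feature linear filter: 2*x0 - x1 > 10.
weights, bias, threshold = [2.0, -1.0], 0.0, 10.0

# Row group A: x0 in [0, 3], x1 in [0, 5]; the best case is 2*3 - 0 = 6.
skip_a = can_skip_row_group(weights, bias, threshold, [0.0, 0.0], [3.0, 5.0])

# Row group B: x0 in [4, 8], x1 in [0, 1]; the best case is 2*8 - 0 = 16.
skip_b = can_skip_row_group(weights, bias, threshold, [4.0, 0.0], [8.0, 1.0])
```

Group A is skipped without running the model, while group B must still be scanned. The bound is loose when columns are correlated, because the per-column extremes may never occur in the same row.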
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
WaterSIC: Information-Theoretically (Near) Optimal Linear Layer Quantization
WaterSIC assigns different quantization rates to weight-matrix columns for dense linear layers, stays within a 0.255-bit rate gap to the information-theoretic limit under any input-activation covariance matrix, and reports new state-of-the-art results on Llama and Qwen models at 1 to 4 bits.
#Inference-opt#Llama#Qwen#WaterSIC
why featured
HKR-K/R pass via the 0.255-bit optimality gap and Llama/Qwen 1–4 bit results tied to inference cost. HKR-H is weak, and the information-theoretic framing earns a technical-accessibility penalty, so it stays in all.
editor take
WaterSIC gets column-wise quantization within 0.255 bits of the limit; GPTQ’s worst-case gap now has a cleaner target.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Solipsistic Superintelligence Is Unlikely to Be Cooperative
The paper argues that solipsistic AI design creates a train-test-deploy gap through endogenous non-stationarity, and its abstract names three directions: dynamic evaluation testbeds with adaptive counterparties, institutions as design primitives, and human agency as a structural feature.
#Agent#Alignment#Benchmarking#Research release
why featured
HKR-H/K/R are present but thin: the item offers an abstract-level alignment claim, not experiments, author context, reproducible evals, or debate signal. Mid-high for safety research, below featured.
editor take
This pins cooperation failure on endogenous deployment drift; only the abstract is disclosed, with no dynamic-eval benchmark.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Clustered Self-Assessment: A Simple yet Effective Method for Uncertainty Quantification in Large Language Models
The paper proposes Clustered Self-Assessment for LLM uncertainty quantification: it clusters sampled generations into semantic groups, turns them into multiple-choice options, and uses the model’s option probabilities as confidence estimates, reporting competitive results with as few as 2 additional samples.
#Reasoning#Alignment#Benchmarking#Research release
why featured
HKR-H/K/R pass, but only abstract-level facts are available: authors, experiment scale, and baseline deltas are not disclosed. Useful UQ paper, not same-day must-write.
editor take
Clustered Self-Assessment needs just 2 extra samples for confidence; simple idea, strong fit for production refusal thresholds.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Link Prediction or Perdition: the Seeds of Instability in Knowledge Graph Embeddings
The paper analyzes the stability of multiple KGEMs across several datasets and finds that initialization, triple ordering, negative sampling, dropout, and hardware each induce instability of comparable magnitude in link prediction results.
#Embedding#Benchmarking#Research release#Benchmark
why featured
HKR-H/K/R all pass: the title has a hook, the abstract gives five instability sources, and reproducibility matters to evaluators. Importance stays in 60–71 because KG embeddings are niche and model/dataset counts are not disclosed.
editor take
KGEM paper isolates 5 stochastic sources with comparable instability; I’d discount any link-prediction leaderboard reporting only MRR.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
DriftSched: Adaptive QoS-Aware Scheduling under Runtime Token Drift for Multi-Tenant GPU Inference
DriftSched applies feedback-driven compensation to runtime token drift in multi-tenant LLM inference on NVIDIA L4 GPUs, reducing workload estimation error by 38.8% MAE and 40.5% RMSE on average; under sustained GPU contention, SJF beats FIFO with about 42% lower median end-to-end latency and about 16% lower P99 latency.
#Inference-opt#Benchmarking#NVIDIA#Research release
why featured
HKR-K/R pass: the paper gives NVIDIA L4 multi-tenant inference numbers and hits latency/cost nerves; HKR-H is weak because the angle is a systems-paper title. Specialized infra research fits the 60-71 band, not featured.
editor take
DriftSched cuts workload-estimation MAE by 38.8% on L4 GPUs; inference schedulers need token-drift control, not another throughput victory lap.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
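The SJF-versus-FIFO comparison in this card has a simple intuition that a toy queue makes visible. The sketch assumes a single non-preemptive server and jobs that all arrive at time zero, so latency equals completion time. The durations are hypothetical, and nothing here models DriftSched's feedback compensation.

```python
from statistics import median


def completion_times(durations):
    """Run jobs back to back in the given order and return each job's finish time."""
    finished, clock = [], 0.0
    for duration in durations:
        clock += duration
        finished.append(clock)
    return finished


def fifo(durations):
    """First-in, first-out: keep arrival order."""
    return completion_times(durations)


def sjf(durations):
    """Shortest job first: sort by (estimated) duration before running."""
    return completion_times(sorted(durations))


# Hypothetical per-request decode times in seconds, in arrival order.
jobs = [9.0, 1.0, 7.0, 2.0, 3.0]
fifo_latency = fifo(jobs)   # [9.0, 10.0, 17.0, 19.0, 22.0]
sjf_latency = sjf(jobs)     # [1.0, 3.0, 6.0, 13.0, 22.0]
```

Median latency drops from 17.0 to 6.0 while the last job finishes at 22.0 either way. SJF needs a duration estimate for every job, so its ordering is only as good as those estimates.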
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Alignment-Aware Decoding
The paper introduces alignment-aware decoding to improve LLM alignment at inference time; AAD requires only a standard DPO setup and outperforms strong baselines across diverse alignment benchmarks and model scales.
#Alignment#Inference-opt#Benchmarking#Research release
why featured
HKR-K/R pass: AAD moves alignment intervention into decoding and claims wins across benchmarks and model scales. Single arXiv paper lacks exact gains, code, or major-lab backing, so it stays in the 60–71 band.
editor take
AAD needs only a standard DPO setup; I buy inference-time alignment, but the snippet omits latency cost and decoding details.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Exact Equivariance, Kept Through Training, Buys Zero-Shot Generalisation Across the Symmetry Group
The paper proves that an equivariant encoder and predictor make one-step relMSE exactly invariant over group G. In tests, the non-equivariant baseline’s out-of-distribution error rises by 13.8x in 2D, 17.2x in 3D, and 157x across the SE(3) ladder.
#Robotics#Benchmarking#Reasoning#Sutton
why featured
HKR-H/K pass: the title has a concrete exact-equivariance-to-zero-shot hook, and the summary gives relMSE invariance plus 13.8/17.2/157x OOD errors. Niche geometric ML limits HKR-R; technical accessibility keeps it below featured.
editor take
Equivariance holds SE(3) OOD error at 1.00x; the baseline hits 157x, a clean win for hard structure over scale.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks
KVarN applies Hadamard rotation and dual-axis variance normalization across K and V matrices for calibration-free KV-cache quantization, targeting autoregressive decoding where token-scale errors accumulate, and reports 2-bit state-of-the-art results on MATH500, AIME24, and HumanEval with a vLLM implementation released.
#Reasoning#Inference-opt#Benchmarking#Huawei
why featured
HKR-K/R pass: 2-bit KV-cache, calibration-free design, and MATH500/AIME24/HumanEval are concrete. HKR-H is weak; this remains a specialist arXiv method with no disclosed deployment or major-model adoption, so it stays in the interesting band.
editor take
KVarN reports 2-bit KV-cache wins on MATH500, AIME24, and HumanEval; I trust decoding-error analysis over prefill-only quant papers.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Assistax: A Multi-Agent Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics
Assistax introduces an open-source reinforcement learning benchmark for assistive robotics tasks, using JAX hardware acceleration in physics-based simulation and reporting up to 370× faster open-loop wall-clock time for vectorized training runs than CPU-based alternatives.
#Agent#Robotics#Benchmarking#Assistax
why featured
HKR-H/K pass via the 370x speedup and open-source JAX mechanism. HKR-R is weak because this is a specialized RL/robotics benchmark with limited spillover for general AI practitioners, so it stays in 60–71.
editor take
Assistax claims 370× faster JAX vectorized RL for assistive robotics; speed is real value, patient realism remains the hard gap.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance
The paper proposes Flow Map Reward Guidance, a training-free single-trajectory method that recasts generative guidance as deterministic optimal control; at text-to-image scale, it matches or exceeds baselines on inverse problems and reward-guided generation with as few as 3 NFEs, and the code is released on GitHub.
#Alignment#Inference-opt#Vision#Research release
why featured
HKR-K and HKR-R pass: concrete mechanism, 3-NFE result, and open code. HKR-H is weak, and this is a single arXiv method paper, below the featured bar.
editor take
FMRG claims image guidance at 3 NFEs, training-free; memory cost is undisclosed, but slow diffusion guidance looks exposed.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning
GRZO improves zeroth-order fine-tuning with group-relative normalization, increasing effective gradient-direction count from one to batch size at no extra forward cost; on Llama3-8B it beats MeZO by 3.0 average accuracy while using 23% lower peak GPU memory.
#Fine-tuning#Inference-opt#arXiv#RoBERTa
why featured
HKR-K/R pass: the paper reports a concrete GRZO normalization mechanism and Llama3-8B gains over MeZO. HKR-H fails because this is a niche optimizer paper, so it stays in the 60–71 research band.
editor take
GRZO beats MeZO by 3.0 on Llama3-8B with 23% less peak memory; zeroth-order fine-tuning finally looks engineerable.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Multi-Segment Attention: Efficient KV-Cache Management for Faster LLM Serving
AsymCache uses Multi-Segment Attention to process non-contiguous KV contexts and make latency-aware cache residency decisions; in common LLM serving workloads, it reduces TTFT by 1.90-2.03x and TPOT by 1.62-1.71x over recent baselines, while cutting average job latency by up to 18.1% in Continuum-style agent serving.
#Inference-opt#Agent#AsymCache#Continuum
why featured
HKR-K and HKR-R pass: the paper states a concrete Multi-Segment Attention mechanism and latency figures tied to serving cost. HKR-H is weak, and a single arXiv systems paper stays below featured threshold.
editor take
AsymCache cuts TTFT by 1.90–2.03x; I trust KV work that attacks attention-kernel constants over vague memory-saving claims.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
PURGE: Projected Unlearning via Retain-Guided Erasure
PURGE adapts A-GEM gradient projection for machine unlearning, constraining each erasure step to avoid increasing retain-set loss; across 5 datasets and 22 class-level forgetting tasks, it keeps retain accuracy above 96% and brings membership-inference AUROC close to 0.5.
#Fine-tuning#Safety#Benchmarking#A-GEM
why featured
HKR-K is strong: mechanism and evaluation numbers are concrete. HKR-R comes from privacy/compliance relevance, but no major-lab signal, artifact, or production replacement claim keeps it in the high all band.
editor take
PURGE keeps retain accuracy above 96% across 22 class-forgetting tasks; retain-confusion is the clever bit, since uniform targets leak to MIA.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
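The projection step this card describes follows the usual A-GEM pattern, which is short enough to show directly. This is a minimal sketch, assuming `g_forget` is the gradient of whatever erasure objective is being minimised and `g_retain` is the gradient of the retain-set loss. The card does not say how PURGE defines either, so both names and the toy vectors are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_erasure_gradient(g_forget, g_retain):
    """A-GEM-style projection of the erasure gradient.

    A descent step along -g_forget changes the retain loss by roughly
    -lr * dot(g_forget, g_retain). If that dot product is negative the step
    would raise the retain loss, so the conflicting component is removed.
    """
    overlap = dot(g_forget, g_retain)
    if overlap >= 0:
        return list(g_forget)  # no conflict: use the gradient unchanged
    scale = overlap / dot(g_retain, g_retain)
    return [gf - scale * gr for gf, gr in zip(g_forget, g_retain)]


# Conflicting case: the erasure step would increase the retain loss.
projected = project_erasure_gradient([1.0, -2.0], [0.0, 1.0])

# Non-conflicting case: the gradient passes through untouched.
unchanged = project_erasure_gradient([1.0, 1.0], [0.0, 1.0])
```

After projection the gradient is orthogonal to the retain gradient, so the step leaves the retain loss unchanged to first order. That guarantee is local: large learning rates or a stale `g_retain` can still hurt retain accuracy.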
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
TadA-Bench: A Million-Variant Benchmark for Future-Round Discovery Toward Agentic Protein Engineering
TadA-Bench builds a million-variant wet-lab replay benchmark from 31 TadA directed-evolution rounds, where models receive earlier experimental rounds and rank variants that appear only in later rounds.
#Agent#Benchmarking#TadA-Bench#Hugging Face
why featured
HKR-H/K pass: a million variants and 31 wet-lab replay rounds give concrete benchmark value. HKR-R is weak because protein engineering is niche for general AI practitioners, so it stays in 60–71.
editor take
TadA-Bench uses 31 wet-lab rounds and 1M variants to punish interpolation; random-split wins look cheap here.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
A Close Look at World Model Recovery in Supervised Fine-Tuned LLM Planners
The paper tests supervised fine-tuned LLM planners with interpretability experiments and finds that training on valid action sequences lets models linearly encode action validity and some state predicates.
#Reasoning#Interpretability#Fine-tuning#Research release
why featured
HKR-K and HKR-R pass: the paper offers a concrete testable claim about SFT planner representations and feeds the world-model debate. HKR-H is weak, and a single arXiv technical paper stays below featured.
editor take
SFT makes LLM planners linearly encode action validity. No model scale disclosed; I don't buy broad generalization yet.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching
dLLM-Cache accelerates diffusion LLM inference with training-free adaptive caching, combining long-interval prompt caching and feature-similarity response updates, and reports up to 9.1x FLOPs reduction on LongBench-HotpotQA for LLaDA 8B and Dream 7B.
#Inference-opt#LLaDA#Dream#LongBench
why featured
HKR-H/K/R all pass: the paper gives a concrete 9.1x FLOPs result and targets inference cost. It stays in all because this is a single arXiv inference-optimization paper for a niche dLLM stack.
editor take
dLLM-Cache cuts HotpotQA FLOPs by up to 9.1x; I buy this route, because diffusion LLMs owe an inference bill.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
How Quantization Changes Interpretable Features: A Sparse Autoencoder Analysis of Language Models
The study uses a frozen SAE to compare full-precision and RTN-quantized activations on Pythia-70M and Gemma-2-2B, finding 62.4% and 51.3% active-feature survival at INT6, while Gemma-2-2B INT7 improves perplexity but degrades 18.7% of features.
#Interpretability#Inference-opt#Safety#Pythia
why featured
HKR-H/K/R pass via a concrete quantization–interpretability hook and INT6 survival rates. Score stays below featured because it is a narrow arXiv paper with small models and no disclosed production impact.
editor take
Gemma-2-2B INT7 improves perplexity while damaging 18.7% of features; metric-only quantization signoff is unsafe.
HKR breakdown
hook knowledge resonance
open source
68
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
CauTion: Knowing When to Trust LLMs for Ensemble Causal Discovery
CauTion integrates LLM domain knowledge into an ensemble causal discovery pipeline with three stages: consensus voting resolves up to 96% of agreed edges, annotation-free trust calibration restricts LLM arbitration to unreliable algorithmic evidence, and cycle repair enforces an acyclic graph; experiments cover six datasets and report stronger gains on larger graphs.
#Reasoning#Tools#Benchmarking#OpenCausaLab
why featured
HKR-H/K/R all pass because the paper has a trust-calibration hook, a concrete 3-stage method, and numbers. The causal-discovery focus is niche, with no product impact or artifact disclosed, so it stays in the 60–71 band.
editor take
CauTion resolves up to 96% consensus edges across six datasets; limiting LLMs to weak-evidence edges feels engineering-real.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Multi²: Hierarchical Multi-Agent Decision-Making with LLM-Based Agents in Interactive Environments
Multi² splits LLM-based agents into a high-level sub-goal generator trained with SFT and a low-level atomic-action executor trained with offline-to-online RL, and the paper releases three hierarchical benchmark datasets; the abstract does not disclose the number of environments, baseline names, or scores.
#Agent#Reasoning#Benchmarking#Multi²
why featured
HKR-K/R pass through the agent hierarchy mechanism and 3 benchmarks. Single arXiv source with no environment count, baselines, or scores keeps it in the 60–71 research-signal band.
editor take
Multi² splits SFT subgoals from RL actions and ships 3 benchmarks; scores aren’t disclosed, so I don’t buy stable long-horizon control yet.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
SketchSong: Hierarchical Song Generation with Sketch Planning and Fine-Grained Multi-Track Modeling
SketchSong predicts high-level sketch tokens before generating audio tokens, and explicitly models four tracks: vocals, bass, drums, and other instruments.
#Audio#Multimodal#Benchmarking#SketchSong
why featured
HKR-K is clear: sketch tokens precede audio tokens, with vocals, bass, drums, and other instruments modeled as four tracks. HKR-R is absent; the post gives no access path, benchmark result, or workflow-cost hook.
editor take
SketchSong models 4 tracks and plans sketch tokens first. Metrics are undisclosed; don't sell this as a Suno-class leap.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Training a Predictive Coding Network on ImageNet Using Equilibrium Propagation
The authors train a 10-layer convolutional PCN, VGG10, on full-size ImageNet using an EP-based method, reaching a 13.23% top-5 test error rate versus a 12.2% backpropagation baseline.
#Vision#Benchmarking#ImageNet#Research release
why featured
HKR-H and HKR-K pass: full-size ImageNet and 13.23% top-5 error give a testable result. As a single arXiv training-method paper with limited product impact, it fits the interesting all band.
editor take
EP trains VGG10 on ImageNet to 13.23% top-5 error; 1.03 points off backprop, so stop laughing at physics training.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Learning Self-Interpretation from Interpretability Artifacts: Training Lightweight Adapters on Vector-Label Pairs
The paper trains scalar affine adapters on vector-label interpretability artifacts while keeping the LM frozen; with d_model+1 parameters, the adapters raise generation scoring from 50% to 70% at 70B scale and reach 94% recall@1 for topic identification.
#Interpretability#Fine-tuning#Reasoning#Research release
why featured
HKR-K/R pass on concrete adapter size and 70B metrics; HKR-H is weak because the title is specialist. No code, lab name, or independent uptake is disclosed, so this stays in all rather than featured.
editor take
A d_model+1 affine adapter lifts 70B self-interpretation scoring from 50% to 70%; 85% gain from bias smells like representation priors.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Flicker-DDPM: Accelerating Denoising Diffusion with 1/f Colored Noise
Flicker-DDPM replaces white noise in the forward process with 1/f colored noise, uses a spatial correlation kernel σ(d)=(d+1)^-η, and matches or exceeds a standard DDPM baseline on CIFAR-10 with 3.33 times fewer sampling steps and negligible extra compute per step.
#Inference-opt#Flicker-DDPM#Research release
why featured
HKR-H and HKR-K pass: the mechanism and 3.33x step reduction are concrete. HKR-R is weak because validation is limited to CIFAR-10 and standard DDPM, not production diffusion workloads.
editor take
Flicker-DDPM matches DDPM on CIFAR-10 with 3.33× fewer steps; I’d wait for ImageNet before buying the speedup.
HKR breakdown
hook knowledge resonance
open source
67
SCORE
H1·K1·R0
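To make the kernel in this card concrete, the sketch below draws Gaussian noise on a 1-D line of pixels whose pairwise correlation follows σ(d) = (d+1)^-η. Using a Cholesky factor of the correlation matrix is a swapped-in textbook technique chosen for brevity; the card does not say how Flicker-DDPM actually samples its noise, and the grid size, η value, and function names are all assumptions.

```python
import math
import random


def kernel(d, eta):
    """Correlation between two pixels at distance d: sigma(d) = (d + 1) ** -eta."""
    return (d + 1) ** (-eta)


def correlation_matrix(n, eta):
    """n x n matrix with entry (i, j) equal to sigma(|i - j|)."""
    return [[kernel(abs(i - j), eta) for j in range(n)] for i in range(n)]


def cholesky(C):
    """Lower-triangular L with L @ L.T == C, for a positive-definite C."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L


def correlated_noise(n, eta, rng):
    """Colour white Gaussian noise so its covariance equals the kernel matrix."""
    L = cholesky(correlation_matrix(n, eta))
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[i][k] * white[k] for k in range(i + 1)) for i in range(n)]


C = correlation_matrix(4, eta=1.0)   # neighbours correlate at 0.5, then 1/3, then 0.25
L = cholesky(C)
sample = correlated_noise(4, eta=1.0, rng=random.Random(0))
```

Larger η makes correlations decay faster with distance and pushes the noise back toward white. For image-sized grids a frequency-domain construction would be far cheaper than a dense Cholesky factor.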
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Locality Does Not Imply Reachability: Boundary Repair in Block-Sparse Causal Attention
The paper shows that fixed block causal attention has boundary reachability failures, derives a top-1 accuracy upper bound of 1/K on a constructed K-way boundary-copy distribution, and validates the coverage mismatch in controlled 1024-token experiments plus an 8K-token Qwen2.5-7B probe.
#Reasoning#Inference-opt#Benchmarking#Qwen2.5-7B
why featured
HKR-K is strong via the 1/K bound and reproducible probes; HKR-R lands on long-context reliability. The topic is still a specialized attention-engineering paper, below featured threshold.
editor take
Fixed block causal attention is capped at 1/K top-1 accuracy on boundary copy; this reads more like a structural bug report than another sparse-attention patch.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Visual Graph Scaffolds for Structural Reasoning in Large Language Models
The paper rewrites teacher-provided reasoning traces into graph mind maps for multi-hop question answering; the visual graph guidance remains effective when direct answer clues are removed and under both supervised fine-tuning and KL-based distillation.
#Reasoning#Vision#Fine-tuning#Research release
why featured
HKR-H/K pass: the visual scaffold and answer-clue ablation create a clear research hook. No model names, dataset names, or result numbers are disclosed, so this stays a mid-band arXiv reasoning paper.
editor take
The paper trains multi-hop QA with visual mind maps; no models or scores disclosed, so I read it as a leakage-control probe.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Sample-Size Scaling of the African Languages NLI Evaluation
The paper tests NLI sample-size scaling on 16 African languages in AfriXNLI with 50 to 500 labeled examples, using XLM-R Large fine-tuned on XNLI and AfroXLM-R Large, and finds language-sensitive, often non-monotonic performance rather than steady gains from more annotations.
#Fine-tuning#Benchmarking#AfriXNLI#XLM-R
why featured
HKR-H and HKR-K pass: non-monotonic scaling in low-resource NLI is a real hook with testable sample ranges and model names. Industry impact is narrow, so it stays in the 60–71 band.
editor take
AfriXNLI scaling hits 500 labels across 16 languages and still goes non-monotonic; more annotation is a weak default here.
HKR breakdown
hook knowledge resonance
open source
66
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Dynamic Short Convolutions Improve Transformers
The paper adds dynamic short convolutions to language models from 150M to 2B parameters, reporting a 1.33x compute advantage over compute-matched Transformers when applied to K/Q/V vectors and 1.60x when added after every linear layer.
#Reasoning#Inference-opt#Mamba-2#Gated DeltaNet
why featured
HKR-K/R pass: the paper reports 150M-2B tests, K/Q/V dynamic short convolutions, and a 1.33x iso-compute edge. HKR-H is weak; this remains a specialist architecture paper, not same-day industry news.
editor take
Dynamic short convolutions claim 1.33x compute savings at 150M–2B; I’d distrust extrapolation, but the K/Q/V locality bet is sharp.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Forgetting is Not Erasure: Recovering Latent Knowledge via Transport Keys
The paper tests catastrophic forgetting with stitched evaluation and compact task-specific transport keys, finding on split CIFAR-100 with a ResNet-style network that the keys recover most original Task A performance after sequential training on Task B.
#Memory#Vision#Interpretability#Research release
why featured
HKR-H and HKR-K pass: the title has a counterintuitive claim, and the post gives transport keys plus split CIFAR-100 conditions. HKR-R is weak; this is an arXiv-only result far from products or frontier models.
editor take
Transport keys recover most Task A performance on split CIFAR-100; no numbers disclosed, so don’t generalize this to LLM forgetting.
HKR breakdown
66
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Fast-dLLM++: Fréchet Profile Decoding for Faster Diffusion LLM Inference
Fast-dLLM++ uses Fréchet profile decoding to select parallel commit sets for diffusion LLM inference, leaves the model and cache unchanged, and reports up to 37% higher throughput at comparable accuracy on LLaDA-8B across GSM8K, MATH, HumanEval, and MBPP.
#Inference-opt#Reasoning#Code#Fast-dLLM++
why featured
HKR-K is solid: 37% throughput gain on LLaDA-8B across four benchmarks. HKR-R touches inference cost, but HKR-H is weak and diffusion-LLM decoding is niche, so this stays in all.
editor take
Fast-dLLM++ reports up to 37% throughput gain on LLaDA-8B; I buy it, dLLM inference is commit-policy bound.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Study compares prompting strategies for African language natural language inference
The paper evaluates NLI prompting on Swahili, Yoruba, and Hausa with AfriXNLI, comparing five prompt strategies across Llama3.2-3B and Gemma3-4B. It removes few-shot examples and Chain-of-Thought to isolate prompt design, and reports contrastive prompting as the most reliable strategy across languages and models.
#Reasoning#Benchmarking#Llama#Gemma
why featured
HKR-K passes with a concrete dataset, languages, prompting strategies, and model set; HKR-R passes on low-resource evaluation gaps. The topic is academic and narrow, so it stays in all.
editor take
AfriXNLI tests 3 languages, 2 models, 5 prompts; no scores disclosed, but contrastive wins because label skew still dominates.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset
KITScenes Multimodal presents a European autonomous driving dataset with synchronized high-resolution global-shutter cameras, lidar beyond 400 meters, 4D imaging radar, redundant GNSS/INS, 3D-mapped traffic elements, and four benchmarks for online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving.
#Multimodal#Vision#Robotics#KITScenes
why featured
HKR-H and HKR-K pass: KITScenes gives a concrete sensor stack and four benchmarks. A single arXiv dataset release is vertical, with limited general AI product or model impact, so it sits in 60–71.
editor take
KITScenes ships 400m+ lidar and 4 benchmarks; I buy the sensor stack, but the “most complete maps” claim needs annotation specs.
HKR breakdown
66
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Compress then Merge: From Multiple LoRAs into One Low-Rank Adapter
The paper proposes Compress-then-Merge, which maps T LoRAs into shared r-dimensional subspaces before merging and directly produces a rank-r LoRA; experiments across multiple models and tasks report better results than existing single-LoRA-output baselines.
#Fine-tuning#Inference-opt#Benchmarking#Research release
why featured
HKR-H/K/R pass, but the post gives only the mechanism and a baseline claim; datasets, effect size, and code are not disclosed. This is useful fine-tuning research, not a featured-level industry update.
editor take
CtM compresses T LoRAs into r-dimensional subspaces before merging. Model and task names are undisclosed; I buy the ordering flip.
HKR breakdown
66
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Whom to Query for What: Adaptive Group Elicitation via Multi-Turn LLM Interactions
The paper proposes an adaptive group elicitation framework that selects both questions and respondents under explicit query and participation budgets, combining an LLM-based expected information gain objective with heterogeneous graph neural network propagation, and reports improved population-level response prediction across three real-world opinion datasets, including over 12% relative gain on CES at a 10% respondent budget.
#Agent#Reasoning#Research release
why featured
HKR-H/K pass: joint question-and-respondent selection is a concrete mechanism with 3 datasets and 12%+ gain. HKR-R is weak because this is an academic opinion-prediction paper, not a mainstream model or agent workflow story.
editor take
Three opinion datasets improve; CES gains >12% at 10% respondent budget, but LLM-EIG cost is undisclosed.
HKR breakdown
66
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Explainable Forecasting of Scientific Breakthroughs from Concept Network Dynamics
The paper introduces a two-stage LightGBM method that uses 59 semantic and topological features to predict OpenAlex concept-pair link formation and future weight; validation across four technology and biomedical domains reports ROC-AUC of 0.954–0.967 without re-tuning, versus roughly 0.90 for prior models, and RMSLE of 0.45–0.6 over one- to five-year horizons.
#Benchmarking#Interpretability#OpenAlex#Research release
why featured
HKR-H and HKR-K pass: the title has a breakthrough-forecasting hook, and the summary gives model design, feature count, and metrics. HKR-R fails; this is a single arXiv paper with no product or industry move.
editor take
LightGBM hits 0.954–0.967 AUC with 59 features; I’d trust “breakthrough forecasting” only after seeing negatives and time splits.
HKR breakdown
66
SCORE
H1·K1·R0
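For readers unfamiliar with the RMSLE figure quoted in the concept-network forecasting item above, here is a minimal sketch of the standard metric. The function name `rmsle` and the toy numbers are illustrative assumptions and do not come from the paper.

```python
import math


def rmsle(predicted, actual):
    """Root mean squared logarithmic error for non-negative values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy link weights, invented purely for illustration (not paper data).
toy_predicted = [3.0, 10.0, 0.0]
toy_actual = [4.0, 8.0, 1.0]
print(round(rmsle(toy_predicted, toy_actual), 3))
```

Because the error is taken on log(1 + x), RMSLE penalizes relative rather than absolute misses, which suits heavy-tailed link weights.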
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Analyzing Stream Collapse in Hyper-Connections: From Diagnosis to Mitigation
The paper analyzes multi-stream residual connections in HC-based language models: after an early seeding stage, residual mixing often stays close to identity, both signal and interpretable features concentrate in a dominant stream, and symmetry breaking at stream initialization reduces dominant-stream behavior and improves performance across mHC variants; the authors state that the code is publicly available.
#Interpretability#Benchmarking#Research release#Open source
why featured
HKR-H and HKR-K pass: the paper names a concrete Hyper-Connections failure mode, mitigation path, and public code. The work is architecture-internal, so reach stays below featured.
editor take
HC streams often collapse into one dominant stream; no scale numbers disclosed. Symmetry-broken init helps, but multi-stream isn't free capacity.
HKR breakdown
66
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Representational Capacity: Geometric Limits on Feature Representation in Transformer Language Models
The paper uses embedding matrices to estimate near-orthogonality deviation ε, separates dozens of open-source models into high-ε and low-ε classes, and replaces raw vector count with k/d in an adjusted capacity formula that reduces prediction error by two orders of magnitude without extra parameters.
#Interpretability#Benchmarking#Research release
why featured
HKR-K passes with testable ε estimation and a k/d correction result. HKR-H/R are weak, and this is a single theoretical arXiv paper, so it fits all rather than featured.
editor take
This estimates ε across dozens of models; k/d cuts error 100x, but “capacity” still needs causal feature evidence.
HKR breakdown
66
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Aligned Training: A Parameter-Free Method to Improve Feature Quality and Stability of Sparse Autoencoders (SAE)
The paper proposes aligned training for SAEs, enforcing an encoder-decoder inner product of 1 for every feature to improve reconstruction, remove dead features, and increase stability across training seeds without adding hyperparameters or computational cost.
#Interpretability#arXiv#Research release
why featured
HKR-K and HKR-R pass: SAE stability and dead features are real interpretability pains, with a concrete parameter-free constraint. The paper is technical and lacks broad product impact, so it stays in the 60–71 band.
editor take
Aligned training fixes SAE encoder-decoder inner products at 1; zero hyperparams and compute makes this cleaner than another sparsity-loss hack.
HKR breakdown
66
SCORE
H0·K1·R1
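The aligned-training item above describes a constraint (encoder-decoder inner product of 1 per feature) without saying how it is enforced. The sketch below shows one hypothetical way to satisfy such a constraint, by rescaling each encoder vector after the fact. The helper names and the rescaling approach are assumptions for illustration, not the paper's procedure.

```python
def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def align_encoder(encoder_rows, decoder_rows, eps=1e-12):
    """Rescale each encoder vector so that <encoder_i, decoder_i> == 1.

    Hypothetical illustration: encoder_rows[i] and decoder_rows[i] are the
    encoder and decoder directions of feature i.
    """
    aligned = []
    for enc, dec in zip(encoder_rows, decoder_rows):
        inner = dot(enc, dec)
        if abs(inner) < eps:
            raise ValueError("encoder and decoder directions are orthogonal")
        aligned.append([x / inner for x in enc])
    return aligned


# Toy two-feature example with invented weights.
toy_encoder = [[2.0, 0.0], [1.0, 3.0]]
toy_decoder = [[1.0, 0.0], [0.5, 0.5]]
for enc, dec in zip(align_encoder(toy_encoder, toy_decoder), toy_decoder):
    print(dot(enc, dec))
```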
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation
Vision-OPD instantiates a crop-conditioned teacher and a full-image student from the same MLLM, then minimizes token-level divergence on the student’s on-policy rollouts. The method targets the regional-to-global perception gap and uses no external teacher, ground-truth labels, reward verifier, or inference-time tool use.
#Multimodal#Vision#Fine-tuning#Vision-OPD
why featured
HKR-K and HKR-R pass: the mechanism is concrete and the problem matters for multimodal deployments. The post gives no benchmark numbers, model scale, or release details, so it stays in the ordinary research band.
editor take
Vision-OPD uses one MLLM as crop teacher and full-image student; I buy the mechanism, focus beats tool-stacking here.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Neural Fields as World Models
The paper proposes isomorphic world models and implements them with motor-gated neural fields, testing the same architecture across three experiments: ballistic prediction without teleporting, offline improvement of a catching policy through a frozen learned world model, and body-selective motor channels without body labels.
#Reasoning#Robotics#Research release
why featured
HKR-H/K pass: the paper offers a world-model angle plus motor-gated neural fields tested in 3 tasks. HKR-R is weak because it has no platform, cost, or practitioner workflow hook, so it stays in all.
editor take
Motor-gated neural fields pass 3 experiments; I buy the spatial-topology bet, but “preliminary evidence” is far from robot-ready world models.
HKR breakdown
66
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data
The paper introduces Asymmetric Langevin Unlearning, which uses public data to reduce certified unlearning cost by O(1/n_pub^2), analyzes utility under distribution mismatch between public and private sources, and reports evaluations with variational Rényi divergence and membership inference attacks.
#Safety#Alignment#Research release
why featured
Single arXiv unlearning paper that passes all HKR axes, but it stays theory-heavy: the post gives the algorithm, asymptotic cost, and distribution-shift analysis, with no code, scale, or product artifact.
editor take
ALU cuts certified unlearning cost by O(1/n_pub^2); I buy public-data noise buffering, but mismatch bounds decide deployment.
HKR breakdown
66
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Constitutional On-Policy Safe Distillation
The paper introduces COPSD, which calibrates the teacher with a Cross-SFT cold start before constitution-conditioned on-policy distillation, and reports a stronger safety-helpfulness trade-off across 12 benchmarks while reducing the safety tax on general reasoning.
#Alignment#Safety#Fine-tuning#Research release
why featured
HKR-K is supported by Cross-SFT cold start, constitutional on-policy distillation, and 12 benchmarks; HKR-R lands on safety-helpfulness tradeoffs. HKR-H is weak, with no code, author signal, or outside discussion disclosed.
editor take
COPSD reports 12 benchmarks; the useful part is admitting OPSD can compress safety into terse refusals.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Pruning Deep Neural Networks via the Marchenko–Pastur Distribution
The paper proposes a Marchenko–Pastur random-matrix pruning method for deep neural networks, and on ImageNet-1k ViT-B/16 reaches 83.41% top-1 after only 3 distillation epochs while reducing sparse-execution MACs by 59.81%.
#Inference-opt#Fine-tuning#arXiv#Research release
why featured
HKR-K and HKR-R pass: the paper gives testable ImageNet-1k numbers and targets inference cost. HKR-H is weak, and the method is technical, so it stays in the 60–71 band.
editor take
MP pruning gets ViT-B/16 to 83.41% after 3 distill epochs, but A40 gains only 1.388×; training budget wins, hardware payoff stays thin.
HKR breakdown
66
SCORE
H0·K1·R1
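As background for the pruning item above, the Marchenko-Pastur law gives the range in which eigenvalues of a pure-noise covariance matrix are expected to fall. The sketch below computes those bulk edges and keeps only eigenvalues above the upper edge. Treating the bulk as noise is a common random-matrix heuristic; whether the paper's pruning rule works this way is an assumption, and the function names are hypothetical.

```python
import math


def mp_edges(n_rows, n_cols, sigma2=1.0):
    """Bulk edges of the Marchenko-Pastur law for eigenvalues of (1/n_rows) * X^T X,
    where X is n_rows x n_cols with i.i.d. entries of variance sigma2."""
    gamma = n_cols / n_rows
    lower = sigma2 * (1.0 - math.sqrt(gamma)) ** 2
    upper = sigma2 * (1.0 + math.sqrt(gamma)) ** 2
    return lower, upper


def keep_above_bulk(eigenvalues, n_rows, n_cols, sigma2=1.0):
    """Keep eigenvalues that exceed the upper noise edge (assumed pruning heuristic)."""
    _, upper = mp_edges(n_rows, n_cols, sigma2)
    return [lam for lam in eigenvalues if lam > upper]


# Toy spectrum, invented for illustration.
print(mp_edges(400, 100))
print(keep_above_bulk([0.3, 2.0, 2.5, 9.0], 400, 100))
```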
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Fast Unlearning at Scale via Margin Self-Correction
The paper introduces MArgin Self-Correction, an unlearning method that stops online without downstream validation and reports competitive forget-retain trade-offs on TOFU, MUSE News, and MUSE Books, but the abstract does not disclose the exact compute-cost fraction versus baselines.
#Fine-tuning#Alignment#Benchmarking#MASC
why featured
HKR-K and HKR-R pass: MASC offers a testable mechanism and benchmarks, but compute-cost ratios are not disclosed and the title is paper-like. No hard exclusion; this stays useful but not featured.
editor take
MASC stops on logit-gap criteria across TOFU and MUSE; cost is only called a fraction, so I don’t buy the scale claim yet.
HKR breakdown
66
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Names Don’t Matter: Symbol-Invariant Transformer for Open-Vocabulary Learning
The paper proposes a symbol-invariant Transformer that uses parallel embedding streams and aggregated attention to handle interchangeable tokens, and reports experiments confirming renaming invariance on open-vocabulary tasks requiring generalization to novel symbols.
#Reasoning#Benchmarking#Research release
why featured
HKR-H/K pass: the title has a counterintuitive hook, and the summary names parallel embedding streams plus aggregated attention. No metrics, code, or production evidence, so it stays in all.
editor take
The paper reports renaming invariance; experimental details are undisclosed here, so don’t read open-vocab generalization as broader reasoning gain.
HKR breakdown
66
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Multiple Choice Learning of Low-Rank Adapters for Language Modeling
The paper proposes LoRA-MCL, a Low-Rank Adaptation training scheme using Multiple Choice Learning and winner-takes-all loss, and evaluates it on audio captioning, visual captioning, and machine translation to produce diverse and relevant continuations at inference time.
#Fine-tuning#Audio#Vision#Research release
why featured
HKR-K has a concrete training mechanism, and HKR-R fits LoRA fine-tuning users. HKR-H is weak; this is a single method paper with no disclosed code, benchmark numbers, or production case, so it stays in 60–71.
editor take
LoRA-MCL trains multiple LoRA branches with winner-takes-all loss; metrics and model sizes are undisclosed, so diversity isn’t quality yet.
HKR breakdown
65
SCORE
H0·K1·R1
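The winner-takes-all loss named in the LoRA-MCL item above can be illustrated in a few lines: among several hypotheses (here, one loss value per LoRA branch), only the best one contributes to the objective for a given example. This is a generic Multiple Choice Learning sketch with hypothetical function names and toy numbers, not the paper's implementation.

```python
def winner_takes_all(per_head_losses):
    """Return (index of the best head, its loss); only that head would be updated."""
    winner = min(range(len(per_head_losses)), key=per_head_losses.__getitem__)
    return winner, per_head_losses[winner]


def batch_wta_loss(batch_losses):
    """Mean winning loss over a batch; batch_losses[i][k] is head k's loss on example i."""
    return sum(winner_takes_all(row)[1] for row in batch_losses) / len(batch_losses)


# Toy per-head losses for two examples and two heads (invented values).
toy_losses = [[2.0, 0.5], [1.0, 3.0]]
print([winner_takes_all(row)[0] for row in toy_losses])
print(batch_wta_loss(toy_losses))
```

Because each example only trains its winning head, heads can specialize on different modes of the output distribution, which is where the diversity claim comes from.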
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Training-Free Multi-Concept LoRA Composition with Prompt-Aware Weighting
The paper proposes W-Switch and W-Composite, two training-free methods that weight multiple LoRA modules by the semantic influence of trigger tokens in the target prompt, and evaluates them on the ComposLoRA testbed with image-based similarity metrics, LLM-based assessment, and a user study.
#Multimodal#Vision#Fine-tuning#LoRA
why featured
HKR-H and HKR-K pass: training-free multi-concept LoRA composition is a useful hook, with two named methods and a testbed. HKR-R is weak because this is a niche image-customization paper, so it stays in the mid research band.
editor take
W-Switch weights multiple LoRAs by trigger-token influence; I buy the training-free angle, but no gains are disclosed.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
TimeOmni-VL: Unified Models for Time Series Understanding and Generation
TimeOmni-VL uses Bi-TSI for bidirectional mapping between time series and images, then evaluates unified modeling on TSUMM-Suite with six understanding tasks and two generation tasks.
#Multimodal#Reasoning#TimeOmni-VL#TSUMM-Suite
why featured
HKR-K passes with Bi-TSI and the TSUMM-Suite task setup; HKR-H/R are weak. This is useful arXiv research, but niche time-series scope keeps it in the 60–71 band.
editor take
TimeOmni-VL tests Bi-TSI on 8 TSUMM tasks; without metrics, “near-lossless” is the bet to verify.
HKR breakdown
64
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Curriculum-Adapted Robust Reinforcement Learning for UAV Deconfliction in Adversarial Environments
The paper proposes a curriculum-guided robust RL framework for UAV deconfliction that increases adversarial observation perturbation intensity and aligns TD-error distributions across stages. In fixed GNSS spoofing tests, the adapted policy reached near-perfect mission success, while standard and robust RL baselines achieved 20-56%.
#Robotics#Reasoning#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the paper has a concrete adversarial UAV hook and measured baselines. It stays in the 60–71 band because the topic is specialized and lacks product, open-source, or major-lab relevance.
editor take
Curriculum robust RL nears perfect success under fixed GNSS spoofing; 20-56% baselines are weak, so inspect the TD-distance metric.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Low-Frequency Shortcuts in Texture-Driven Visual Learning
The paper analyzes shortcut learning in texture-driven visual domains and finds that models rely on a few low-frequency components; pruning those components raises ID accuracy by up to 8% and improves robustness to low-frequency corruptions by up to 40%.
#Vision#Benchmarking#Research release
why featured
HKR-H and HKR-K pass: the title has a counterintuitive shortcut-learning hook and the summary gives 8% and 40% results. HKR-R is weak, so this stays in the 60–71 research-signal band.
editor take
Pruning low-frequency components lifts ID accuracy by up to 8%; texture-heavy vision models are overusing the wrong spectrum.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
How Visible Are Silent Manipulation Failures? Observability Study of False-Success Detection in Simulated Robot Episodes
The paper tests false-success detection on 2 simulated bimanual ALOHA tasks, keeping only episodes the robot marked successful and relabeling them with privileged simulator state. Cube transfer failures are almost fully recoverable from joint data, while peg insertion needs vision to close most of the gap; the authors say proprioceptive separability depends on velocity differences below realistic sensor noise, making the result an optimistic simulator upper bound.
#Robotics#Vision#Benchmarking#ALOHA
why featured
HKR-H and HKR-K pass: the hook is silent robot failure detection, and the summary gives testable results across two ALOHA tasks. The scope is narrow and simulation-heavy, so HKR-R is weak and the item stays in all.
editor take
Two simulated ALOHA tasks expose false-success detection limits; I’d treat noiseless proprioception gains as benchmark inflation.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Multi-component Causal Tracing in Large Language Models
The paper proposes a multi-component causal tracing framework for LLMs, intervening on attention heads and MLP neurons together, using soft interventions and metric transformation to convert combinatorial component selection into constrained continuous optimization.
#Interpretability#Reasoning#Research release#Open source
why featured
HKR-K/R pass: the paper offers a concrete multi-component causal tracing mechanism for interpretability and safety debugging. HKR-H is weak, and no metrics, artifact details, or lab authority are disclosed.
editor take
The paper traces attention heads and MLP neurons jointly. No models or benchmarks disclosed; I don't buy the baseline win yet.
HKR breakdown
64
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Calibration Data Trade-offs Across Capability Dimensions: Why Multi-Source Mixing Matters for High-Sparsity LLM Pruning
The paper analyzes 15 calibration sources for high-sparsity LLM pruning and finds calibration perplexity correlates positively with General retention at ρ=+0.71, but negatively with Math and Code retention at ρ=-0.53 and -0.59; on LLaMA-3.1-8B with SparseGPT 60% sparsity, a uniform multi-source mix reaches 58.8% total retention.
#Inference-opt#Code#Benchmarking#Research release
why featured
HKR-K and HKR-R pass: 15 calibration sources and ρ=+0.71 give a testable pruning claim tied to capability retention. HKR-H is weak, and the topic is narrow implementation research, so it stays in all.
editor take
15 calibration sources show opposite correlations; for 60% SparseGPT pruning, source mixing beats MetaMath by 8.8 points.
HKR breakdown
64
SCORE
H0·K1·R1
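The ρ values in the calibration-data item above read like rank correlations. Assuming ρ denotes Spearman's rank correlation (the summary does not say which coefficient is used), a dependency-free sketch of the computation looks like this. All names and the toy inputs are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks, with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy perplexity/retention pairs, invented for illustration (not paper data).
print(spearman([12.0, 15.0, 20.0, 30.0], [0.50, 0.55, 0.53, 0.70]))
```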
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Geometry-Aware Tabular Diffusion
GATD adds pairwise angles and lengths from column value differences to tabular diffusion denoisers, achieving 8/10 Shape wins, 7/10 Trend wins, and 9/10 downstream utility wins across ten datasets.
#Fine-tuning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes: the mechanism and 10-dataset results are concrete enough for synthetic-tabular-data practitioners. HKR-H and HKR-R are weak, so this stays in the all tier as a niche research release.
editor take
GATD wins utility on 9/10 tabular datasets; I buy the claim because ablations pin gains on geometry supervision.
HKR breakdown
64
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Neural Attention Search Linear: Towards Adaptive Token-Level Hybrid Attention Models
NAtS-L selects Gated DeltaNet linear attention or softmax attention per token within the same layer, targeting the quadratic-complexity bottleneck of long-context transformers while preserving tokens needed for long-term retrieval; the abstract does not disclose benchmark numbers, training scale, or exact latency gains.
#Inference-opt#Research release
why featured
HKR-K and HKR-R pass: the routing mechanism is concrete and long-context cost matters. HKR-H is weak, and benchmarks, code, and latency numbers are not disclosed, so this stays in all.
editor take
NAtS-L switches Gated DeltaNet/softmax per token. No scores or latency disclosed; I don’t buy “efficient” yet.
HKR breakdown
64
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Distribution-Calibrated Inference-Time Compute for Thinking LLM-as-a-Judge
The paper proposes a distribution-calibrated aggregation scheme for LLM-as-a-Judge, using n independent thinking-rating samples per item and a Bradley-Terry-Davidson count model that combines polarity with the non-tie rate for three-way preferences.
#Reasoning#Benchmarking#Inference-opt#Research release
why featured
HKR-K and HKR-R pass: the paper gives a concrete aggregation mechanism for LLM-as-judge reliability. No lab backing, benchmark gains, or click hook are disclosed, so it stays mid-band.
editor take
The paper uses n independent judge samples; without benchmark deltas disclosed here, “beats individual humans” is not a free pass.
HKR breakdown
64
SCORE
H0·K1·R1
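The Bradley-Terry-Davidson model mentioned in the LLM-as-a-Judge item above extends pairwise preference modeling with an explicit tie outcome. The sketch below shows the textbook Davidson parameterization of the three outcome probabilities. How the paper fits or aggregates these over n judge samples is not described in the summary, so the function name and toy values are assumptions.

```python
import math


def davidson_probs(strength_a, strength_b, tie_param):
    """Return (P(A wins), P(tie), P(B wins)) under the Davidson tie model.

    strength_a and strength_b are positive Bradley-Terry strengths;
    tie_param >= 0 controls how much probability mass goes to ties.
    """
    tie_mass = tie_param * math.sqrt(strength_a * strength_b)
    total = strength_a + strength_b + tie_mass
    return strength_a / total, tie_mass / total, strength_b / total


# Toy strengths, invented for illustration.
print(davidson_probs(2.0, 1.0, 0.5))
```

With `tie_param` set to 0 the model reduces to plain Bradley-Terry, so the tie parameter is what lets a three-way preference carry a non-tie rate alongside polarity.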
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
CADFit: Precise Mesh-to-CAD Program Generation with Hybrid Optimization
CADFit reconstructs editable CAD construction sequences from meshes using IoU-driven hybrid optimization over structured programs. It supports extrusions, revolutions, fillets, and chamfers; the abstract says it beats prior mesh-to-CAD methods on multiple benchmarks but does not disclose exact scores.
#Multimodal#Vision#Code#CADFit
why featured
HKR-H and HKR-K pass: mesh-to-editable-CAD is a concrete hook, and the mechanism lists IoU optimization plus CAD operations. HKR-R is weak; scores are not disclosed, so this stays in all.
editor take
CADFit supports 4 CAD operations, but no scores are disclosed; I don’t buy the SOTA claim before Invalid Ratio lands.
HKR breakdown
64
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Learning Unmasking Policies for Diffusion Language Models
The paper trains unmasking policies for diffusion language models with reinforcement learning, using a single-layer transformer that maps token confidences to decisions. Experiments show parity with state-of-the-art heuristics in semi-autoregressive block generation and better results in full-diffusion sampling.
#Inference-opt#Reasoning#Research release#Benchmark
why featured
HKR-K passes because the paper adds a concrete training mechanism for diffusion LM unmasking. HKR-H and HKR-R are weak; the post lacks benchmark numbers, model scale, and reproducible conditions, so it fits all rather than featured.
editor take
A single-layer transformer learns unmasking and beats heuristics in full diffusion; hand-tuned thresholds look tired for dLLM inference.
HKR breakdown
63
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Self-Soupervision: Cooking Model Soups without Labels
Self-Soupervision extends model soups to self-supervised learning, using unlabeled data and mixed SSL ingredients such as MAE, MoCoV3, MMCR, and LeJEPA, and reports robustness gains of 3.5% on ImageNet-C and 7% on LAION-C.
#Fine-tuning#Vision#Benchmarking#arXiv
why featured
HKR-K is solid with two reported robustness gains, and HKR-H has a niche tuning hook. This remains an arXiv training-method paper with no code, setup detail, or product impact disclosed, so it stays in all.
editor take
Self-Soupervision gains 3.5% on ImageNet-C and 7% on LAION-C; wild part: MAE, MoCoV3, MMCR, LeJEPA all mix.
HKR breakdown
63
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
MAVEN-T: Reinforced Heterogeneous Distillation for Real-Time Multi-Agent Trajectory Prediction
MAVEN-T trains a compact trajectory-prediction student with heterogeneous distillation and PPO rewards for collision avoidance, comfort, and progress, reporting 6.2× parameter compression, 3.7× inference acceleration, and 14.6 ms latency on an NVIDIA Jetson AGX Orin across five driving datasets.
#Robotics#Inference-opt#Fine-tuning#NVIDIA
why featured
HKR-K and HKR-R pass: 14.6 ms latency and 3.7× speedup on Jetson AGX Orin are concrete. The topic is narrow trajectory-prediction research, so it stays in the interesting band.
editor take
MAVEN-T hits 14.6 ms on Jetson Orin; I trust the 6.2× compression more than PPO fixing teacher bias.
HKR breakdown
63
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Human-Like Goalkeeping in a Realistic Football Simulation: A Sample-Efficient Reinforcement Learning Approach
The paper proposes a sample-efficient DRL method for goalkeeper agents in EA SPORTS FC 25, where its agent achieved a 10% higher ball-saving rate than the built-in AI, while ablations showed 50% faster training than standard DRL methods.
#Robotics#Benchmarking#EA SPORTS FC 25#Research release
why featured
HKR-H and HKR-K pass: a football-game goalkeeper beats built-in AI, with 10% save-rate and 50% training-speed figures. HKR-R is weak because this RL game paper is far from model or product news, so it sits in the 60–71 band.
editor take
EA SPORTS FC 25’s DRL goalkeeper saves 10% more; the 50% faster training via pre-collected data makes it production-plausible.
HKR breakdown
63
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Before Fusion, Ask What to Keep: Contextual Calibration of Multimodal Signals
The paper proposes a pre-fusion calibration module for language, audio, and visual streams, evaluated on five benchmarks covering sentiment understanding, action recognition, audio-visual event detection, and audio-visual emotion classification. The module compares modalities at the summary level, generates instance-wise and dimension-wise modulation for original modality features, and plugs into different fusion backbones without changing prediction heads.
#Multimodal#Audio#Vision#Research release
why featured
HKR-H and HKR-K pass, but this is a single arXiv methods paper with no production replacement, code artifact, or broad industry spillover. It fits the 60–71 research-signal band, so tier all.
editor take
The paper tests pre-fusion calibration on 5 multimodal benchmarks; no gains table disclosed, so I’d treat it as a noise-control plug-in.
HKR breakdown
63
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Effect of Demographic Bias on Skin Lesion Classification
The study uses linear programming to build controlled demographic datasets and evaluates three ResNet-based skin lesion classification strategies, finding that sex bias mainly comes from data imbalance while age bias consistently favors younger groups across training distributions.
#Vision#Benchmarking#Alignment#arXiv
why featured
Single arXiv medical-imaging fairness paper. HKR-K/R pass: it gives an LP dataset-control method and concrete sex/age bias results; HKR-H fails, and no product or industry adoption signal keeps it in all.
editor take
Linear-programmed splits across 3 ResNet setups make the age result sting: sex bias tracks imbalance, age bias survives distribution fixes.
HKR breakdown
63
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Testing the Test: Score-Direction Instability in Class-Split Anomaly Detection
Alejandro Ascarate and four coauthors show that within-dataset class-split anomaly detection becomes ill-posed when the held-out anomaly class overlaps the normal mixture in representation space, with scores collapsing toward chance or inverting; they introduce a training-free neighborhood class leakage diagnostic and test it on Fashion-MNIST, CIFAR-10, and Imagenette.
#Benchmarking#Alejandro Ascarate#Leo Lebrat#Rodrigo Santa Cruz
why featured
HKR-H and HKR-K pass: the paper claims class-split anomaly tests can reverse score direction and proposes a no-training leakage diagnostic across 3 datasets. HKR-R is weak because the scope is niche ML evaluation, so it stays in all.
editor take
Ascarate et al. show score inversion on 3 datasets; single-AUROC class-split AD papers now smell like geometry leakage.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H1·K1·R0
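To make the score-direction point in the Ascarate et al. entry concrete, here is a minimal sketch in plain Python. It is not the paper's leakage diagnostic; it only shows that negating anomaly scores maps an AUROC of a to 1 − a, so a value well below 0.5 indicates an inverted score direction rather than a merely weak detector. The function name and the toy scores are illustrative assumptions.

```python
def auroc(scores, labels):
    """Probability that a random anomaly (label 1) outscores a random normal (label 0).

    Ties count as half a win. Pairwise O(n^2) version, fine for small examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data: the anomalies mostly score LOWER than the normals,
# which is the inverted-direction failure mode.
labels = [0, 0, 0, 1, 1]
scores = [0.9, 0.8, 0.7, 0.2, 0.75]

a = auroc(scores, labels)
a_flipped = auroc([-s for s in scores], labels)  # same detector, direction reversed
```

Reporting both `a` and `a_flipped` makes it harder to mistake an inversion for a chance-level result.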
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
QuITE: Query-Based Irregular Time Series Embedding
QuITE uses learnable query tokens and one self-attention layer to aggregate irregular observations, producing backbone-compatible representations without interpolation and reporting average relative gains up to 54.7% for forecasting and 15.8% for classification across real-world benchmarks.
#Embedding#Benchmarking#Research release#Open source
why featured
Only HKR-K lands: the mechanism and benchmark numbers are concrete, but irregular time-series embedding is niche research with a low-click title, so it stays in the 60 band.
editor take
QuITE reports up to +54.7% on forecasting with one attention-layer embedding; the smart bet is fixing irregular time series at the embedding stage, before the backbone.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
CL-DMDF: Dynamic Multimodal Data Fusion Model Based on Contrastive Learning
The paper proposes CL-DMDF for multimodal fusion with uncertain or missing modalities, using feature- and modality-level attention, an entity-centroid contrastive learning module, and adaptive fusion, with experiments reported on 3 datasets; the RSS snippet does not disclose dataset names or exact metrics.
#Multimodal#Research release
why featured
HKR-K passes: the paper gives concrete mechanisms for missing-modality fusion and tests on 3 datasets. HKR-H and HKR-R are weak because the title is academic and lacks product, open-source, or performance numbers.
editor take
CL-DMDF reports 3 datasets; names and metrics are missing, so don’t buy the missing-modality claim yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
DAD4TS: Data-Augmentation-Oriented Diffusion Model for Time-Series Forecasting with Small-Scale Data
DAD4TS uses a diffusion model and reinforcement learning to generate augmented time-series samples for small-scale forecasting, and the paper evaluates it against seven comparative methods across six real-world datasets and eight time-series models, with effectiveness validated on five datasets.
#Fine-tuning#Reasoning#DAD4TS#Research release
why featured
HKR-K and HKR-R pass: the mechanism and evaluation setup are concrete, and small-data forecasting is a real practitioner pain. The topic remains a niche time-series research paper, not a product or foundation-model update.
editor take
DAD4TS worked on 5 of 6 real datasets; small-data time-series augmentation gets evidence, but the RL controller needs ablation.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Causal Neural Probabilistic Circuits
The paper proposes CNPC, combining a neural attribute predictor with a causal probabilistic circuit compiled from a causal graph, and evaluates it on five benchmark datasets in in-distribution and out-of-distribution settings against five baseline models.
#Interpretability#Reasoning#Benchmarking#Research release
why featured
HKR-K passes: the post gives a concrete mechanism and benchmark setup. HKR-H and HKR-R are weak; this is specialized ML research, not a broader practitioner story, so it stays in the 60–71 band.
editor take
CNPC beats five baselines on five datasets; I buy causal circuits for CBMs, but graph quality is the fragile part.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
ASymPO: Asymmetric-Scale Policy Optimization for Asynchronous LLM Post-Training Without Behavior Information
The paper proposes ASymPO, which normalizes each response’s token loss by the current average token negative log-probability, so asynchronous mathematical reasoning post-training can use current-policy probabilities without behavior-policy probabilities, importance ratios, or clipping.
#Reasoning#Fine-tuning#Alignment#Research release
why featured
HKR-K passes: ASymPO gives a concrete loss-normalization mechanism for asynchronous math-reasoning post-training. HKR-H/R are weak, so this stays in the 60s as a niche research release.
editor take
ASymPO normalizes token loss by current average NLL; no metrics shown here, but dropping behavior logprobs is a serious cut.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Wavelet as Tokenizer: Preliminary Results on a Shared Wavelet Token Schema for Natural Signals
The paper proposes a shared wavelet token schema using a one-level Haar DWT/IDWT frontend, and reports 39.92 dB audio, 29.37 dB image, and 23.93 dB video PSNR on Speech Commands, EuroSAT RGB, and DAVIS 2017.
#Multimodal#Audio#Vision#Research release
why featured
HKR-H and HKR-K pass: the hook is wavelets as tokenizers, and the post gives Haar DWT/IDWT plus three PSNR numbers. HKR-R is weak; this is preliminary arXiv work without model, product, or workflow impact.
editor take
Haar DWT shares one schema across audio, images, video; the wild part is 50% dense video tokens hitting 34.45 dB.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
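For readers who want the mechanics behind the wavelet entry, the sketch below shows a one-level Haar DWT, its inverse, and a PSNR calculation on a 1D signal. It is a minimal illustration, not the paper's token schema: the toy signal, the choice to zero the detail coefficients, and the peak value of 1.0 are all assumptions made here.

```python
import math


def haar_dwt(x):
    """One-level orthonormal Haar transform of an even-length sequence."""
    if len(x) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def psnr(reference, reconstruction, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite for a perfect reconstruction."""
    mse = sum((r - q) ** 2 for r, q in zip(reference, reconstruction)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Made-up signal with values in [0, 1].
signal = [0.10, 0.12, 0.50, 0.46, 0.90, 0.94, 0.30, 0.20]
approx, detail = haar_dwt(signal)

exact = haar_idwt(approx, detail)                 # keeps every coefficient
lossy = haar_idwt(approx, [0.0] * len(detail))    # drops the detail band
lossy_psnr = psnr(signal, lossy)
```

Dropping the detail band replaces each sample pair with its mean, which is the simplest way to see how discarding coefficients trades token count against PSNR.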
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
When Graph Tokens Sink: A Mechanistic Analysis of Graph Language Models
The paper analyzes graph-token behavior in representative Graph Language Models and finds graph sink tokens show large activations on a small set of hidden-state dimensions, with a bias toward early graph-token positions. Pruning, repositioning, and swapping interventions show these sinks are not the most important semantic or structural tokens for downstream prediction.
#Interpretability#Reasoning#Research release
why featured
HKR-K passes via concrete activation patterns and three interventions. HKR-H/R are weak because graph-language-model interpretability is narrow, so this is useful research signal but below featured threshold.
editor take
GLM graph sinks spike on few hidden dimensions; activation saliency is a bad proxy for topology use.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
CoMPAS3D: A Dataset and Benchmark for Interactive Motion
CoMPAS3D provides 3 hours of improvised partner salsa motion capture from 18 dancers, with over 2,800 expert-annotated segments, and defines benchmarks for move classification, proficiency estimation, and follower generation under objective and subjective evaluation metrics.
#Robotics#Multimodal#Benchmarking#CoMPAS3D
why featured
HKR-K passes with concrete dataset scale and benchmark tasks. HKR-H/R are weak: this is a niche motion-generation benchmark, not a broad practitioner conversation, so it stays in the 60–71 band.
editor take
CoMPAS3D ships 3 hours, 18 dancers, 2,800+ labels; salsa exposes interaction failures that FID and beat alignment politely ignore.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
High-Precision APT Malware Attribution with Out-of-Scope Resilience
The paper presents ranked binary classifiers with explicit abstention for APT malware attribution; in the hardest setting, where 87% of test samples came from 60 APT groups excluded from training, the method abstained on 94% of out-of-scope samples and maintained 92% precision and 95% selective accuracy on classified samples.
#Benchmarking#Safety#Research release#Benchmark
why featured
HKR-K is strong and HKR-R lands on security reliability, but this is a niche APT-attribution paper with no product or general AI workflow impact, so it stays in the lower band.
editor take
The method abstains on 94% of out-of-scope samples when 87% of the test set is out-of-scope; for APT attribution, refusing to guess is the feature.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
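The two headline numbers in the APT attribution entry (abstention rate on out-of-scope samples, selective accuracy on classified samples) can be computed as in the sketch below. The single confidence threshold is an assumption used for illustration; the feed does not describe how the paper's ranked binary classifiers decide to abstain, and the group names and scores are made up.

```python
ABSTAIN = None


def predict_with_abstention(group_scores, threshold):
    """Return the top-scoring group, or ABSTAIN if its score is below the threshold."""
    best_group = max(group_scores, key=group_scores.get)
    return best_group if group_scores[best_group] >= threshold else ABSTAIN


def evaluate(predictions, truths, known_groups):
    """Abstention rate on out-of-scope samples and accuracy on non-abstained samples."""
    pairs = list(zip(predictions, truths))
    out_of_scope = [(p, t) for p, t in pairs if t not in known_groups]
    classified = [(p, t) for p, t in pairs if p is not ABSTAIN]
    abstain_rate = (
        sum(p is ABSTAIN for p, _ in out_of_scope) / len(out_of_scope)
        if out_of_scope else float("nan")
    )
    selective_accuracy = (
        sum(p == t for p, t in classified) / len(classified)
        if classified else float("nan")
    )
    return abstain_rate, selective_accuracy


# Hypothetical data: groups "A" and "B" were seen in training, "C" and "D" were not.
known = {"A", "B"}
samples = [
    ({"A": 0.95, "B": 0.10}, "A"),
    ({"A": 0.20, "B": 0.90}, "B"),
    ({"A": 0.40, "B": 0.35}, "C"),
    ({"A": 0.85, "B": 0.30}, "D"),
    ({"A": 0.30, "B": 0.45}, "C"),
]
preds = [predict_with_abstention(scores, threshold=0.8) for scores, _ in samples]
abstain_rate, selective_accuracy = evaluate(preds, [t for _, t in samples], known)
```

Both metrics are needed together: a model that abstains on everything gets a perfect out-of-scope abstention rate and no useful attributions.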
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Spike-Aware C++ INT8 Inference for Sparse Spiking Language Models on Commodity CPUs
The paper runs SymbolicLight V1 with a C++ INT8 CPU runtime on an AMD Ryzen 7 5800X, reaching 22.63 tokens/s single-thread decoding for the 874M-parameter export. It reports a WikiText-2 perplexity of 24.80 and does not disclose measured CPU energy.
#Inference-opt#SymbolicLight#TinyLlama#Qwen
why featured
HKR-H comes from the odd pairing of spiking LMs and commodity CPUs; HKR-K has reproducible hardware and speed/perplexity numbers. The low-level inference angle narrows the audience, so it stays in 60–71.
editor take
SymbolicLight 874M hits 22.63 tok/s single-thread, but PPL is 24.80; sparse CPU inference works, quality still bites.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Social Caption: Evaluating Social Understanding in Multimodal Models
The paper introduces SOCIAL CAPTION, a framework that evaluates MLLM social understanding across three dimensions: Social Inference, Holistic Social Analysis, and Directed Social Analysis, while analyzing how scale, architecture, and spoken context affect performance; the RSS abstract does not disclose dataset size, model list, or benchmark scores.
#Multimodal#Vision#Benchmarking#Research release
why featured
HKR-K passes because the paper introduces a named benchmark and concrete evaluation variables. HKR-H/R miss: the abstract gives no surprising result, ranking, deployment impact, or practitioner-pressure hook.
editor take
SOCIAL CAPTION discloses 3 axes only; no model list or scores, so don’t trust the social-understanding benchmark yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
DECA: Decentralizing Block-Wise Adam for Efficient LLM Full-Parameter Fine-Tuning on Non-IID Data
DECA partitions LLM parameters into disjoint blocks and runs sequential block-wise Adam for decentralized full-parameter fine-tuning on non-IID client data without a central server; the abstract claims faster convergence, stronger downstream performance, and resource efficiency, but the RSS snippet does not disclose concrete memory, communication, or benchmark numbers.
#Fine-tuning#Research release
why featured
HKR-K/R pass: the mechanism is relevant to full-parameter LLM tuning, but no memory, communication, or gain numbers are disclosed. The academic optimizer framing keeps it in the interesting band.
editor take
DECA uses serverless block-wise Adam; RSS gives no memory or communication numbers, so don’t buy the FPFT efficiency claim yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
BYORn: Bootstrap Your Own Responses to Defend Large Vision-Language Models Against Backdoor Attacks
BYORn identifies semantically misaligned responses during supervised fine-tuning and replaces them with model-generated alternatives to break the trigger-target correlation in vision-language backdoor attacks. The abstract does not disclose datasets, attack success rates, or model sizes.
#Multimodal#Vision#Fine-tuning#BYORn
why featured
Single arXiv safety paper with a concrete defense mechanism, but no datasets, attack-success rates, or model scale disclosed. HKR-K/R pass while HKR-H is weak, so it stays in all.
editor take
BYORn swaps misaligned SFT targets with self-generated replies; no ASR, datasets, or model sizes disclosed, so the frontier claim is thin.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R1
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
When Model Merging Breaks Routing: Training-Free Calibration for MoE
arXiv:2606.03391 introduces HARC, a training-free calibration method that uses second-order curvature information to realign merged MoE routers and solves the closed-form objective with matrix-free conjugate gradient; experiments cover mathematical reasoning and code generation, but the snippet does not disclose exact scores.
#Reasoning#Code#Inference-opt#Research release
why featured
HKR-H/K pass: the title names a MoE routing failure, and the summary gives a concrete calibration mechanism. Single arXiv paper, no reported scores, code link, or production gain, so it stays in all.
editor take
HARC calibrates merged MoE routers with second-order curvature, but no scores are disclosed; I buy routing breakdown, not “substantial” gains.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H1·K1·R0
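The HARC entry mentions matrix-free conjugate gradient. As a generic illustration of that solver (not HARC's objective or code), the sketch below solves A x = b for a symmetric positive-definite A using only a function that returns A·v, so the matrix is never stored. The toy operator stands in for a curvature-vector product and is an assumption.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive-definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x, with x = 0
    p = list(r)                      # first search direction
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # residual norm small enough: done
            break
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def toy_matvec(v):
    """Acts like the SPD matrix [[4, 1], [1, 3]] without building it."""
    return [4.0 * v[0] + 1.0 * v[1], 1.0 * v[0] + 3.0 * v[1]]


solution = conjugate_gradient(toy_matvec, [1.0, 2.0])
```

The appeal for second-order methods is that a curvature-vector product is usually far cheaper than forming the full matrix.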
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Re-Evaluating Continual Learning with Few-Shot Adaptation
The paper replaces standard 0-shot forgetting evaluation in continual learning with few-shot assessment and tests it on continual image classification task sequences, introducing a per-shot plasticity metric to measure adaptation across shots.
#Fine-tuning#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via a concrete evaluation change and metric, but result numbers are not disclosed and HKR-H/R are weak. This is useful niche research, so it stays in the lower interesting band.
editor take
This paper swaps 0-shot forgetting for few-shot evaluation; I buy it, continual learning has overfit to perfect-recall scoring.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
IdEst: Assessing Self-Supervised Learning Representations via Intrinsic Dimension
IdEst estimates the intrinsic dimension of self-supervised representations with a Minimum Spanning Tree dimension estimator, and the paper reports strong correlation with downstream linear probe performance across multiple datasets, architectures, and SSL pretraining objectives.
#Benchmarking#Research release#Benchmark
why featured
HKR-K passes with a testable representation-evaluation mechanism, but HKR-H and HKR-R are weak. The post gives no correlation numbers, cost savings, or production replacement evidence, so it stays in all.
editor take
IdEst uses MST intrinsic dimension for SSL reps; correlation and compute savings are undisclosed, so don’t retire linear probes yet.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Physics-Guided Policy Optimization with Self-Distillation
PGPO modulates policy-optimization step size using a mutual-information estimate between student predictions and a feedback-conditioned teacher, and on Science-QA it outperforms SDPO in 3 of 4 domains with gains up to 4.5 points while staying stable where SDPO collapses late in training.
#Fine-tuning#Alignment#Reasoning#Research release
why featured
HKR-K passes with a concrete mechanism and Science-QA numbers. HKR-H/R are weak: this is a single arXiv post-training method without code, scale, or production-replacement evidence, so it stays in all.
editor take
PGPO beats SDPO on 3/4 Science-QA domains, up to +4.5 points; ignore the physics gloss, MI-gated step size is the payload.
HKR breakdown
hook knowledge resonance
open source
62
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
CoralBay: A Self-Supervised CT Foundation Model
CoralBay extends DINO with a hierarchical 3D Swin backbone and self-distillation over concatenated multi-scale features for CT representation learning; the paper also adds a public reproducible 3D radiology leaderboard to the open-source eva framework, while the RSS abstract does not disclose dataset counts or metric values.
#Vision#Benchmarking#CoralBay#DINO
why featured
HKR-K passes via the training mechanism and reproducible leaderboard, while HKR-H and HKR-R are weak. A CT foundation-model paper has research value, but its audience is narrow, so it stays in the lower interesting band.
editor take
CoralBay extends DINO with 3D Swin; RSS lacks dataset counts and metrics, so the leaderboard deserves replication first.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
A Robust and Explainable Transformer-Based Framework for Phishing Email Detection
The paper proposes a DistilBERT-based phishing email detection framework with Fast Gradient Method adversarial training and stochastic character-level perturbations. It integrates LIME, SHAP, and Integrated Gradients, then uses Flan-T5-Small with a rule-based prompt to generate evidence-based explanations.
#Safety#Interpretability#Benchmarking#Research release
why featured
HKR-K comes from concrete robustness and explanation mechanisms, and HKR-R from phishing defense and compliance needs. No metrics, dataset results, or artifact are disclosed, so this stays a narrow research signal.
editor take
DistilBERT gets FGM, char noise, and three XAI tools; no dataset or metrics in the abstract, so trust the explanation layer lightly.
HKR breakdown
hook knowledge resonance
open source
60
SCORE
H0·K1·R1
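The two robustness ingredients named in the phishing entry can be sketched in a few lines. The Fast Gradient Method perturbation is the standard r = ε · g / ‖g‖₂ applied to an embedding gradient; the character-level noise shown here (random adjacent swaps) is an assumed stand-in, since the feed does not say which perturbations the paper uses.

```python
import math
import random


def fgm_perturbation(grad, epsilon):
    """FGM step: scale the gradient to L2 norm epsilon (zero vector if grad is zero)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def swap_adjacent_chars(text, prob, rng):
    """Assumed character noise: swap neighbouring characters with probability prob."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if rng.random() < prob:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2                   # do not re-swap the pair just exchanged
        else:
            i += 1
    return "".join(chars)


# Toy gradient with respect to a 2-dimensional embedding (made up).
r_adv = fgm_perturbation([3.0, 4.0], epsilon=0.5)

original = "verify your account"
noisy = swap_adjacent_chars(original, prob=0.2, rng=random.Random(0))
```

In adversarial training the perturbation `r_adv` would be added to the input embeddings before a second forward pass; that training loop is omitted here.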
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
FGRPO: Federated GRPO with Adaptive Aggregation on Non-IID Data
The paper introduces FGRPO, a federated GRPO framework that decentralizes reasoning-model fine-tuning across heterogeneous data owners and uses adaptive aggregation based on relative performance gain; the abstract does not disclose benchmark numbers, client counts, or privacy mechanism details.
#Reasoning#Fine-tuning#Alignment#Research release
why featured
HKR-K passes: FGRPO adds federated GRPO and relative-performance-gain aggregation. HKR-H/R are weak; no metrics, code, or production claim is disclosed, so it stays below featured.
editor take
FGRPO aggregates federated GRPO by relative gain, but no clients, privacy mechanism, or benchmarks are disclosed; I don’t buy the privacy claim yet.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Zero-Shot 3D Question Answering via Hierarchical View-to-Token Transportation
KeyVT selects 3D question-answering context at both view and token levels, using pixel features, camera parameters, and optimal transport, and the paper reports evaluation on three benchmarks with gains over existing tuning-free methods.
#Vision#Multimodal#Reasoning#KeyVT
why featured
HKR-K passes via a concrete mechanism and 3-benchmark claim. HKR-H/R are weak: this is niche 3D QA research, and the post does not disclose margins, code, or reproduction details.
editor take
KeyVT beats tuning-free baselines on 3 benchmarks; 3D QA is still context-budget bound, and OT token pruning is a practical lever.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Grounding Functional Similarity by Invariance-Aware Model Stitching
The paper introduces invariance-aware model stitching with a forward-backward compatibility requirement, arguing that standard stitching can mislabel independently trained models as functionally similar when their representations align despite using different information cues.
#Benchmarking#Interpretability#Research release
why featured
HKR-K passes on a concrete mechanism for model-stitching evaluation. HKR-H and HKR-R miss: the angle is narrow and academic, so this stays in the lower research-news band.
editor take
This pins model-stitching false similarity on invariance blindness; experiments aren’t disclosed, but the forward-backward test is the right cut.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Easy-to-Use Shielding for Reinforcement Learning
The paper introduces tempestpy, a Python library that connects Tempest-based shield synthesis to the Gymnasium API, and adds MiniGridSafe for safety-oriented RL scenarios; the RSS abstract says shielded and unshielded RL are evaluated across multiple environments, but it does not disclose environment counts or scores.
#Agent#Safety#Tools#Tempest
why featured
HKR-K passes: the paper names tempestpy and a Gymnasium integration as a testable mechanism. HKR-H/R are weak; environment counts, benchmark scores, and deployment path are not disclosed.
editor take
tempestpy plugs Tempest shields into Gymnasium; counts and scores are undisclosed, so I buy tooling, not safety claims.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Position: Prioritize Identifying Structure, Not Complex Models, for Scientific Discovery
This position paper proposes standards for mechanistic ML and argues that, in high-dimensional proxy regimes, many incompatible mechanisms can induce the same observational relationships, so predictive success and fluent LLM explanations do not provide sufficient evidence for mechanism discovery.
#Reasoning#Interpretability#Safety#Research release
why featured
HKR-H and HKR-K pass, but this is an arXiv position paper with methodology claims only and no disclosed experiment numbers or product impact. Lower-band research commentary, not featured.
editor take
The paper says LLMs collapse many valid mechanisms into one story; I buy the warning: high predictive scores are not discovery.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H1·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit
The paper proposes a RAG-enhanced LLM recommender framework for CTR prediction, combining GCN-based retrieval with a multi-head early-exit architecture. The abstract says inference stops dynamically using real-time confidence across multiple heads, but the post does not disclose concrete latency, accuracy, or compute-saving numbers.
#RAG#Inference-opt#Research release
why featured
HKR-K passes for the GCN retrieval plus multi-head early-exit mechanism. HKR-H and HKR-R miss: no result numbers, narrow recommender context, and no practitioner debate hook, so this stays in the lower all band.
editor take
The abstract gives GCN retrieval plus multi-head early exit, but no latency, AUC, or compute savings; CTR claims need numbers.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
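A confidence-gated early exit of the kind the recommender entry describes can be sketched as follows. This is a generic illustration under assumptions: each exit head emits logits for (no-click, click), the first head whose top softmax probability clears a fixed threshold wins, and the deepest head is the fallback. The feed does not say how the paper combines confidence across heads.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(head_logits, threshold):
    """Return (predicted class, index of the head used).

    head_logits is ordered from the shallowest to the deepest exit head.
    """
    if not head_logits:
        raise ValueError("need at least one exit head")
    for idx, logits in enumerate(head_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            return probs.index(confidence), idx
    # No head was confident enough: fall back to the deepest head's prediction.
    return probs.index(confidence), idx


# Made-up logits for three exit heads on one (user, item) example.
heads = [[0.1, 0.3], [0.2, 1.0], [-1.0, 2.5]]

strict = early_exit_predict(heads, threshold=0.9)    # needs the deepest head
lenient = early_exit_predict(heads, threshold=0.6)   # exits one head earlier
```

The threshold is the efficiency/accuracy dial: lowering it saves compute on easy examples at the cost of trusting shallower heads.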
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Attribution via Distributional Paths for Information Revelation
The paper introduces Reveal-IG, which moves path attribution from input-space trajectories to structured probe distributions, preserves completeness for expected model response, and reports more stable signed attributions across ImageNet classification and tabular regression, while the abstract does not disclose exact metric values.
#Interpretability#Vision#Reveal-IG#ImageNet
why featured
HKR-K passes with a new attribution mechanism and two test settings. HKR-H/R are weak; this is a narrow interpretability-method paper, so it stays below featured.
editor take
Reveal-IG keeps completeness for expected response; no metric values in the abstract, so I’d file it as an IG path-artifact fix.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
AugMask: Training Diffusion Models on Incomplete Tabular Data via Stochastic Augmentation and Masking
AugMask separates conditional stochastic augmentation from denoising supervision on observed coordinates, so missing entries act as uncertain conditioning context rather than targets; the abstract says standard diffusion-based tabular generators outperform specialized missing-aware baselines across multiple datasets and missingness regimes, but it does not disclose dataset names or exact scores.
#Fine-tuning#Inference-opt#AugMask#arXiv
why featured
HKR-K passes via a concrete mechanism and cross-dataset performance claim. HKR-H/R are weak because the angle is technical and niche; no hard exclusion, so it lands in the 40–59 research-release band.
editor take
AugMask trains only observed coordinates; datasets and scores are undisclosed, so don’t buy the tabular-diffusion win yet.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
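The core idea in the AugMask entry, supervising only observed coordinates while missing ones serve as noisy context, can be illustrated with a masked loss. The sketch below is a simplification under assumptions: a squared-error loss, Gaussian fill-in for missing entries, and made-up values. It does not reproduce the paper's diffusion objective.

```python
import random


def fill_missing(row, observed, rng):
    """Assumed augmentation: replace missing entries with random draws used only as context."""
    return [v if seen else rng.gauss(0.0, 1.0) for v, seen in zip(row, observed)]


def masked_mse(prediction, target, observed):
    """Mean squared error over observed coordinates only; missing targets are ignored."""
    errors = [
        (p - t) ** 2
        for p, t, seen in zip(prediction, target, observed)
        if seen
    ]
    if not errors:
        raise ValueError("row has no observed coordinates to supervise")
    return sum(errors) / len(errors)


# One tabular row with the middle value missing (None marks the gap).
target = [1.5, None, 2.0]
observed = [True, False, True]

context = fill_missing(target, observed, random.Random(0))  # model input, not a target
prediction = [1.0, 2.0, 3.0]                                # hypothetical model output
loss = masked_mse(prediction, target, observed)
```

Because the loss never touches the missing coordinate, whatever value was sampled into `context` cannot be learned as if it were ground truth.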
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Laplacian Representations for Decision-Time Planning
The paper introduces ALPS, a hierarchical planning algorithm that uses Laplacian representations to capture state-space distances across multiple time scales, and reports better results than commonly used baselines on selected offline goal-conditioned RL tasks from OGBench.
#Reasoning#Benchmarking#OGBench#Research release
why featured
HKR-K passes: it names a new algorithmic mechanism and OGBench test setting. HKR-H/R are weak, and the post gives no effect sizes, authorship signal, or artifact, so it stays in all.
editor take
ALPS beats common baselines on selected OGBench offline goal-RL tasks; RSS gives no task count or margin.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Human-in-the-Loop Contextual Bandits for Short-Term Rental Dynamic Pricing
Oleg Miroshnichenko proposes HITL-GB, a framework for short-term rental dynamic pricing in which a contextual bandit recommends prices and a human accepts, modifies, or rejects them. Historical warm-up is validated on 1,461 nightly pricing episodes from 2 rooms between April 2022 and April 2026, reducing HF-TS cold start from about 150 episodes to about 30.
#Agent#Oleg Miroshnichenko#Research release
why featured
HKR-K passes with concrete sample size and cold-start reduction, making it a narrow methods reference. HKR-H/R miss because short-term rental pricing is too niche and lacks model, tool, or platform impact.
editor take
HITL-GB cuts HF-TS cold start to 30 episodes on 1,461 nights; the 2-room base makes clinical-credit claims too loud.
HKR breakdown
hook knowledge resonance
open source
58
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
AnchorMoE: Interpretable Time Series Classification via Anchor-Routed Mixture of Experts
The paper proposes AnchorMoE, an MoE-based time-series classifier that routes local patches to specialized experts and expresses each prediction as an exact additive decomposition over input segments.
#Interpretability#AnchorMoE#Research release
why featured
HKR-K passes for the anchor-routed MoE and additive attribution mechanism, but HKR-H and HKR-R are weak. With no reported metrics or practical replacement claim, this stays in the lower all band.
editor take
AnchorMoE decomposes each prediction into patch-level additive terms; no benchmark numbers disclosed, so the safety pitch is premature.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Fast and Expressive Multi-Byte Prediction with Probabilistic Circuits
Andreas Grivas and eight coauthors propose MTPC, a probabilistic-circuit framework for modeling joint distributions over future bytes, and test it by retrofitting EvaByte and byte-fied Llama3.2 3B with speculative decoding.
#Inference-opt#Andreas Grivas#EvaByte#Llama3.2 3B
why featured
HKR-K passes: MTPC’s mechanism and test targets are concrete for decoding-optimization watchers. HKR-H and HKR-R are weak; no speed gain, open artifact, or production-replacement claim is disclosed.
editor take
MTPC retrofits EvaByte and Llama3.2 3B for multi-byte prediction; nice abstraction, but speedup numbers aren't disclosed here.
HKR breakdown
hook knowledge resonance
open source
56
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
ParaBlock: Communication-Computation Parallel Block Coordinate Federated Learning for Large Language Models
The paper proposes ParaBlock, which uses two parallel threads for communication and computation in federated block-coordinate LLM fine-tuning; the authors prove the same convergence rate as standard federated block coordinate descent and evaluate it on general instruction following and mathematical reasoning tasks.
#Fine-tuning#Inference-opt#Reasoning#ParaBlock
why featured
HKR-K passes with a concrete mechanism and test settings. HKR-H/R are weak: this is a federated-optimization paper with a high barrier for practitioners, but it does not trigger hard exclusion.
editor take
ParaBlock overlaps communication and compute with 2 threads; convergence is claimed intact, but latency gains lack numbers here.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
Distill-then-Replace: Efficient Task-Specific Hybrid Attention Model Construction
The paper proposes DtR, which transfers pretrained full-attention weights to linear-attention counterparts via blockwise local distillation, then greedily replaces full-attention layers while monitoring target-task validation performance in a single pass without retraining or neural architecture search.
#Inference-opt#Fine-tuning#Research release
why featured
HKR-K passes because the summary discloses DtR’s two-step construction. HKR-H/R are weak, with no speed, accuracy, model scale, or dataset details, so this stays a narrow model-efficiency paper.
editor take
DtR builds hybrid attention models in one greedy pass. No speed numbers disclosed; I don't buy “efficient” without them.
HKR breakdown
hook knowledge resonance
open source
55
SCORE
H0·K1·R0
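The "replace" half of the DtR entry is a single greedy pass guarded by validation performance. The sketch below shows one plausible reading: layers are tried in index order and a swap is kept if the validation score stays within a tolerance of the all-full-attention baseline. The layer order, the tolerance rule, and the toy scoring function are assumptions, not details from the paper.

```python
def greedy_replace(num_layers, evaluate, tolerance):
    """Single pass over layers; keep a linear-attention swap only if validation holds up.

    evaluate takes a frozenset of replaced layer indices and returns a validation score.
    """
    replaced = set()
    baseline = evaluate(frozenset())             # score with every layer left as full attention
    for layer in range(num_layers):
        candidate = replaced | {layer}
        if evaluate(frozenset(candidate)) >= baseline - tolerance:
            replaced = candidate                 # swap is cheap enough: keep it
    return sorted(replaced)


# Hypothetical per-layer accuracy cost of switching that layer to linear attention.
LAYER_COST = [0.001, 0.030, 0.002, 0.050]


def toy_validation_score(replaced_layers):
    """Stand-in for running the hybrid model on a target-task validation set."""
    return 0.90 - sum(LAYER_COST[i] for i in replaced_layers)


kept = greedy_replace(len(LAYER_COST), toy_validation_score, tolerance=0.01)
```

A single pass needs only one evaluation per layer plus the baseline, which is where the "no retraining or architecture search" framing comes from.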
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
COD10K-C: Benchmarking Robustness of Camouflaged Object Detection Under Natural Image Corruptions
COD10K-C builds a robustness benchmark from COD10K with 8 corruption types, 5 severity levels, 40 conditions, and 81,040 evaluation pairs; RobustCODLite retains 92.3% of its clean Dice score under corruption, versus 87.7% for SINet-v2, 84.8% for ZoomNet, and 84.1% for PFNet.
#Vision#Benchmarking#COD10K-C#SINet-v2
why featured
HKR-K passes on concrete benchmark size and RobustCODLite retention. HKR-H/R miss: this is niche camouflaged-object robustness research with no product, cost, safety, or competitive angle, so it stays in all.
editor take
COD10K-C adds 8 corruption types and 81,040 pairs; camouflaged detection is finally paying its real-camera debt.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
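The COD10K-C numbers can be cross-checked with simple arithmetic. The sketch below assumes every corruption condition covers the same number of image pairs and that "retains X% of clean Dice" means corrupted Dice divided by clean Dice. The per-condition count is derived here and is not stated in the feed.

```python
corruption_types = 8
severity_levels = 5
total_pairs = 81_040

# 8 corruption types x 5 severities gives the 40 reported conditions.
conditions = corruption_types * severity_levels
pairs_per_condition, remainder = divmod(total_pairs, conditions)


def retention_percent(clean_dice, corrupted_dice):
    """Share of the clean Dice score that survives corruption, in percent."""
    return 100.0 * corrupted_dice / clean_dice


# Retention figures as reported in the entry above (percent of clean Dice).
reported = {"RobustCODLite": 92.3, "SINet-v2": 87.7, "ZoomNet": 84.8, "PFNet": 84.1}
best_baseline = max(v for name, v in reported.items() if name != "RobustCODLite")
gap_points = reported["RobustCODLite"] - best_baseline
```

The total divides evenly across the 40 conditions, which is consistent with one fixed evaluation set being corrupted 40 ways.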
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
What Do Students Learn? A Feature-Level Analysis of Dark Knowledge
The paper analyzes knowledge distillation with the Interaction Tensor framework and proposes teacher-free Confusion Distillation, which uses evolving confusion patterns as soft targets and beats CS-KD and PS-KD by 1.2% on CIFAR-100 with ResNet-34 and ResNet-50.
#Fine-tuning#Benchmarking#arXiv#ResNet
why featured
HKR-K passes with a named mechanism and testable number. HKR-H/R are weak because the impact stays inside CIFAR-100 and ResNet-34/50 distillation experiments, so this fits the lower all band.
editor take
Confusion Distillation gains 1.2% on CIFAR-100, but only ResNet-34/50; I’d treat this as distillation-regularization evidence.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG· atomEN04:00 · 06·03
FinStressTS: A Parametric Synthetic Benchmark for Time-Series Forecasting in Finance
FinStressTS provides 30 diagnostic environments across six financial mechanisms and benchmarks 15 time-series models with NMAE for point forecasting and CRPS for probabilistic forecasting.
#Benchmarking#FinStressTS#Research release#Benchmark
why featured
HKR-K passes on concrete benchmark scope and tested models. HKR-H/R are weak, and finance time-series forecasting is a vertical research topic with limited spillover for general AI practitioners.
editor take
FinStressTS tests 15 models in 30 settings; Transformers lose to HAR/VAR on volatility, tails, and jumps, so keep the boring baselines.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
TiWeaver: Unified Temporal Dynamics Modeling via Contextual Patching
The paper introduces TiWeaver, a unified multivariate time-series forecasting framework that uses G²AT for adaptive contextual patching and FADE for fine-grained asynchronous inter-channel dependencies, reporting state-of-the-art results on 12 real-world datasets with up to 25% improvement over existing methods.
#Benchmarking#TiWeaver#Research release#Benchmark
why featured
HKR-K passes on concrete mechanisms and a 25% benchmark claim. The story is a niche time-series modeling paper with no product, open-source tool, or adoption angle, so it stays in the low-value research band.
editor take
TiWeaver claims up to 25% on 12 datasets; I’d check ablations first—G²AT/FADE matter only if gains survive beyond tail cases.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Learn When and Where to Connect: Adaptive Virtual Nodes for Dynamic Message Passing on Graphs
MAVN selects needed virtual nodes from a candidate pool at each layer, connects each chosen VN to a nonempty node subset, and improves backbone MPNNs by up to 46.5% across nine real-world datasets.
#Reasoning#arXiv#MAVN#Research release
why featured
HKR-K passes with a concrete mechanism and 46.5% result; HKR-H/R fail because this is a narrow graph-ML paper. No hard exclusion, but it stays in the 40–59 low-value band.
editor take
MAVN reports up to 46.5% gains on 9 graph datasets; adaptive virtual nodes make old-school MPNNs look under-tuned.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
PSViT: Structured Pruning Method for Spiking Vision Transformers
PSViT compresses Spiking Vision Transformers with channel-wise structured pruning and reports 22.4% memory savings from single-shot pruning on ImageNet-1K, with accuracy dropping from 73.3% to 70.3% without fine-tuning and reaching 72.8% after fine-tuning.
#Vision#Inference-opt#PSViT#SViT
why featured
HKR-K passes with a concrete pruning mechanism and ImageNet-1K metrics. HKR-H/R are weak because this is a narrow model-compression paper with limited general-practitioner pull.
editor take
PSViT saves 22.4% memory in one prune; 73.3% to 72.8% after tuning makes structured pruning the deployable SViT bet.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Annot-Mix: Learning with Noisy Class Labels from Multiple Annotators via a Mixup Extension
Annot-Mix extends mixup to handle multiple class labels per instance while tracking which annotator produced each label, and it outperforms 11 methods, most of them state of the art, on 11 datasets with noisy labels from human or simulated annotators.
#Fine-tuning#Benchmarking#Research release#Open source
why featured
HKR-K passes via a concrete method and 11-by-11 evaluation claim. HKR-H and HKR-R fail; this is a niche supervised-learning paper with no product, agent, or industry consequence, so it stays in the 40–59 band.
editor take
Annot-Mix beats 11 methods on 11 noisy-label datasets; treating annotator identity as signal is cleaner than flattening workers into vote noise.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
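For readers who have not met the base technique: standard mixup blends two training examples and their label vectors with a Beta-distributed weight. The sketch below shows only that standard form, not Annot-Mix; the function name, the `alpha` value, and the toy inputs are illustrative assumptions, and the paper's handling of several annotator labels per instance is not reproduced here.

```python
import random

def mixup_pair(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Standard mixup: convex combination of two examples and their labels."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# Toy two-feature, two-class example with one-hot labels (hypothetical data).
rng = random.Random(0)
x, y, lam = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=rng)
```

The mixed label stays a valid probability vector because both inputs sum to 1 and the weights `lam` and `1 - lam` sum to 1.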
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Towards Fair Graph Prompting: A Dual-Prompt Mechanism for Mitigating Attribute and Structural Bias
Yuhan Yang and coauthors propose ADPrompt, a fairness-aware graph prompting framework with two modules for attribute prompts and layer-wise structure prompts, and evaluate it on four benchmark datasets against seven baselines for node classification.
#Fine-tuning#Alignment#Benchmarking#Yuhan Yang
why featured
HKR-K passes because the mechanism and evaluation setup are concrete. HKR-H and HKR-R are weak; fair graph prompting is narrow for general AI practitioners, so this stays in the lower research band.
editor take
ADPrompt splits fairness into 2 prompt modules; 4 datasets and 7 baselines are fine, but gains are undisclosed here.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Estimating Central, Peripheral, and Temporal Visual Contributions to Human Decision Making in Atari Games
The paper uses Atari-HEAD eye-tracking data to train six action-prediction network settings, and across 20 games, removing peripheral visual information reduces median prediction accuracy by 35.27-43.90%.
#Vision#Benchmarking#Atari-HEAD#Research release
why featured
HKR-K passes via concrete experimental setup and effect size; HKR-H/R are weak because this is a narrow academic vision/cognition result with no product, model release, or practitioner workflow hook.
editor take
On Atari-HEAD, median action-prediction accuracy drops 35.27-43.90% without peripheral vision; gaze-map-only imitation is too narrow.
HKR breakdown
hook knowledge resonance
open source
52
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
FlashbackCL: Mitigating Temporal Forgetting in Federated Learning
FlashbackCL improves Flashback by 6.9% to 10.0% on CIFAR-10 with 50 clients and three controlled temporal shift modes, and reduces temporal forgetting by up to 68%; a 5-variant ablation identifies Class-Balanced Reservoir Sampling replay as the critical component.
#Fine-tuning#Memory#Benchmarking#Research release
why featured
HKR-K passes on concrete benchmark conditions and gains; HKR-H/R fail because the topic is narrow and lacks product, agent, or foundation-model impact. This fits a low-value research brief, not featured.
editor take
FlashbackCL gains 6.9%-10.0% on 50-client CIFAR-10; CBRS replay looks like the payload, decayed counts like plumbing.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Speech emotion recognition using attention-based LSTM with residual connections
ResLSTM-SA achieves 0.6517 maximum UAR on RAVDESS under strict speaker-independent partitioning, with the ResLSTM-SA-h64 variant using only 46.8k trainable parameters and outperforming attention-LSTM baselines plus several reported CNN and CNN-LSTM systems.
#Audio#Benchmarking#RAVDESS#Research release
why featured
HKR-K passes via concrete UAR and parameter counts, but HKR-H and HKR-R fail: this is an incremental speech-emotion benchmark paper with no product, tooling, or adoption angle.
editor take
ResLSTM-SA hits 0.6517 UAR on RAVDESS; 46.8k params is neat, but one SER dataset can't sell deployment.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
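UAR (unweighted average recall) is the mean of per-class recall, so a small emotion class counts as much as a large one. A minimal sketch of the metric follows; the labels and predictions are made-up toy data, not RAVDESS results.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; every class counts equally regardless of size."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Toy, imbalanced example (hypothetical labels).
y_true = ["angry", "angry", "angry", "angry", "calm", "calm"]
y_pred = ["angry", "angry", "angry", "calm", "calm", "angry"]
uar = unweighted_average_recall(y_true, y_pred)  # (3/4 + 1/2) / 2 = 0.625
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 4/6
```

On this toy data plain accuracy (about 0.667) is higher than UAR (0.625) because the majority class dominates accuracy, which is why UAR is the usual headline number on imbalanced emotion sets.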
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases
RelGT-AC is evaluated on 7 autocomplete tasks across 3 RelBench v2 datasets. It adds column masking, a unified head for classification and regression, and a TF-IDF text encoder; it beats the GraphSAGE baseline on all 3 regression autocomplete tasks and gains up to 10 AUROC points on text-heavy eligibility tasks.
#Reasoning#Embedding#Benchmarking#RelGT-AC
why featured
HKR-K passes: the paper provides RelBench v2 scope, column masking, and TF-IDF encoder details. HKR-H/R are weak because the topic is narrow database/GNN research, so it stays in all.
editor take
RelGT-AC runs 7 RelBench v2 tasks and wins via TF-IDF text columns; honestly, GraphSAGE is a soft target.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Optimizing Random Forest Tree Count with Plateau Search and Optuna Integration
The authors propose a triplet-based plateau-search algorithm that removes tree count from the TPE search space and uses relative OOB-score changes across three forest sizes to choose a near-minimal sufficient Random Forest size.
#Benchmarking#Optuna#Research release#Open source
why featured
HKR-K passes because the paper gives a concrete tuning mechanism. HKR-H and HKR-R are weak: classic random-forest sizing is narrow, and the feed text gives no measured gain.
editor take
Triplet OOB plateau search picks tree counts outside TPE; small idea, useful fix for Optuna's right-boundary bias.
HKR breakdown
hook knowledge resonance
open source
48
SCORE
H0·K1·R0
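The plateau idea in the Random Forest item above can be sketched roughly as follows. This is not the authors' algorithm: the doubling schedule, the tolerance, the stopping rule, and the toy score curve are all assumptions made for illustration, and `oob_score` is a hypothetical callable that returns a positive out-of-bag score for a given tree count.

```python
def plateau_tree_count(oob_score, start=25, factor=2, tol=0.002, max_trees=3200):
    """Return the smallest tested size whose next two larger forests barely improve the score."""
    cache = {}

    def score(n):
        # Each forest size is scored once, even when triplets overlap.
        if n not in cache:
            cache[n] = oob_score(n)
        return cache[n]

    n = start
    while n * factor * factor <= max_trees:
        small, mid, large = score(n), score(n * factor), score(n * factor * factor)
        rel_1 = abs(mid - small) / abs(small)  # relative change, first step
        rel_2 = abs(large - mid) / abs(mid)    # relative change, second step
        if rel_1 < tol and rel_2 < tol:
            return n  # both steps are flat: treat n as sufficient
        n *= factor
    return n  # no plateau found within max_trees

# Toy saturating curve standing in for a real OOB score (assumption).
def toy_oob(n):
    return 0.90 - 0.5 / n

chosen = plateau_tree_count(toy_oob)
```

With scikit-learn, `oob_score` could wrap `RandomForestClassifier(n_estimators=n, oob_score=True).fit(X, y).oob_score_`; the remaining hyperparameters would then be left to the TPE search, with tree count chosen separately.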
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Cooperation of Experts: Fusing Heterogeneous Information with Large Margin
CoE encodes multi-typed information into heterogeneous multiplex networks, uses domain-specific encoders to learn relational patterns in separate semantic spaces, and coordinates experts through a large-margin mechanism; the abstract says the code is available on GitHub, but the RSS snippet does not disclose benchmark counts or scores.
#Embedding#Benchmarking#CoE#Research release
why featured
HKR-K passes on mechanism and open code, but HKR-H/R fail. The arXiv abstract gives no benchmark count, effect size, or deployment use, so this stays low-value research signal.
editor take
CoE ships code, but RSS gives no benchmark count or scores; large-margin experts sound plausible, but without tables it’s still just a claim.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Localized, High-resolution Geographic Representations with Slepian Functions
The paper proposes a geographic location encoder built from spherical Slepian functions and reports stronger results than baselines across five classification, regression, and image-augmented prediction tasks.
#Embedding#Benchmarking#Research release#Benchmark
why featured
HKR-K passes via a named mechanism and five-task claim. HKR-H/R are weak, and Slepian-function geospatial encoding is too specialized without product or agent implications.
editor take
Slepian geo-encodings beat baselines on 5 tasks; I buy the bias—local capacity fits real GIS better than uniform global features.
HKR breakdown
hook knowledge resonance
open source
46
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Lingo_Research_Group at SemEval-2026 Task 9: Evaluating Prompt Variants for Polarization Detection
Lingo_Research_Group tested 12 prompt variants with aya-101 and Gemma3-27B for SemEval-2026 Task 9, covering binary polarization detection, type classification, and manifestation identification, with official 22-language test macro F1 scores of 0.762, 0.587, and 0.444.
#Benchmarking#Lingo_Research_Group#SemEval#Gemma
why featured
HKR-K passes because the paper gives testable prompt counts, language coverage, and F1 scores. HKR-H/R are weak: this is a narrow SemEval system submission with little product or competitive signal for AI practitioners.
editor take
Gemma3-27B hits only 0.444 F1 on 22-language fine-grained labels; prompt tweaking runs out of road fast here.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
04:00
6d ago
arXiv · cs.LG · atom · EN · 04:00 · 06·03
Privacy-Robust Incrementality Measurement for Advertising Systems under Signal Loss
The paper formulates privacy-constrained advertising incrementality measurement as a robust causal decision problem and tests it on 2.0M Criteo Uplift rows and 64K Hillstrom email rows, where clean conversion lifts are 0.00112 and 0.00495 respectively.
#Benchmarking#Criteo#Hillstrom#Research release
why featured
HKR-K passes with dataset sizes and lift numbers. HKR-H is weak and HKR-R is narrow: this is a niche ad causal-measurement paper, useful to a small slice of AI practitioners.
editor take
The paper tests 2.0M Criteo and 64K Hillstrom rows; finite-sample cases stay unresolved, so ads attribution precision looks fake.
HKR breakdown
hook knowledge resonance
open source
45
SCORE
H0·K1·R0
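To give the lift figures in the advertising item some scale, here is a small sketch that assumes "conversion lift" means the absolute difference between treated and control conversion rates; the feed text does not state the definition, and the counts below are invented for illustration, not drawn from Criteo Uplift or Hillstrom.

```python
def conversion_lift(treated_conversions, treated_n, control_conversions, control_n):
    """Absolute lift: treated conversion rate minus control conversion rate."""
    return treated_conversions / treated_n - control_conversions / control_n

# Illustrative counts only (assumption): 0.31% treated vs 0.20% control.
lift = conversion_lift(310, 100_000, 200, 100_000)  # 0.0031 - 0.0020
```

Effects of this size are a fraction of a percentage point, so small amounts of signal loss or sampling noise can be comparable to the quantity being measured.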
03:56
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 03:56 · 06·03
CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding
CleanCodec reframes audio tokenization as a selective information bottleneck and encodes speech at 12.5 tokens per second, improving speaker similarity and intelligibility over existing codecs; downstream text-to-speech and voice conversion evaluations show up to 17x faster inference.
#Audio#Inference-opt#CleanCodec#Research release
why featured
HKR-H/K/R all pass, but this is a narrow speech-codec paper for TTS and voice-conversion builders. The post does not disclose open-source code, model size, or product adoption, so it stays in the 60–71 band.
editor take
CleanCodec runs speech coding at 12.5 tokens/s; 17x speedup is spicy, but baselines and noise conditions are undisclosed.
HKR breakdown
hook knowledge resonance
open source
70
SCORE
H1·K1·R1
03:22
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 03:22 · 06·03
Read the Trace, Steer the Path: Trajectory-Aware Reinforcement Learning for Diffusion Language Models
CAPR compresses dLLM denoising traces into path states, uses cached sibling continuations to train a block-level value head, and reduces rollout-generation cost to about 0.75x that of flat rollouts and 0.6x that of tree rollouts under standard settings.
#Reasoning#Fine-tuning#Inference-opt#LLaDA
why featured
HKR-K passes: CAPR adds path-state compression, sibling-continuation caching, and rollout-cost numbers. HKR-H and HKR-R are weak because this remains a niche dLLM training paper without deployment scale or product impact.
editor take
CAPR cuts dLLM rollout cost to 0.75x that of flat rollouts; I buy the premise—diffusion LMs need their own RL machinery.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H0·K1·R0
02:51
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 02:51 · 06·03
DLLG: Dynamic Logit-Level Gating of LLM Experts
DLLG uses a lightweight gating module to predict step-wise fusion weights, learning token-level expert fusion from sparse response-level supervision without token-level labels or expert retraining.
#Reasoning#Code#Inference-opt#Research release
why featured
HKR-H and HKR-K pass: the paper proposes expert fusion without token labels or retraining. The topic is niche model-fusion research, with no disclosed code, scale test, or production replacement claim, so it stays in all.
editor take
DLLG learns token fusion from response labels, but no scores are disclosed; I don’t buy “scalable” before latency costs.
HKR breakdown
hook knowledge resonance
open source
64
SCORE
H1·K1·R0
01:35
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 01:35 · 06·03
Federated Learning for Privacy-Preserving Multi-Center Sepsis Early Prediction
The study evaluates horizontal federated learning for early sepsis prediction on 648 clinically screened samples from three tertiary hospitals in China, reports accuracy comparable to a centralized baseline, and finds that attackers cannot reconstruct original patient records from transmitted model parameters under its privacy analysis.
#Fine-tuning#Safety#Research release
why featured
HKR-K and HKR-R pass on the 3-hospital, 648-case FL privacy result. HKR-H is weak, and the item stays in the 60–71 band because it is a clinical prediction paper with no product path or open artifact disclosed.
editor take
Three hospitals, 648 cases, near-centralized accuracy; no external validation disclosed, so the FL privacy win outruns the evidence.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
01:01
6d ago
HuggingFace Papers (takara mirror) · rss · EN · 01:01 · 06·03
Measuring What Matters: Synthetic Benchmarks for Concept Bottleneck Models
The paper introduces synthetic benchmarks for concept bottleneck models across two use cases, decision support and automation, and the benchmarks generate labeled datasets while controlling data modality, concept choice, annotation quality, and completeness.
#Interpretability#Benchmarking#Research release#Benchmark
why featured
HKR-K/R pass: the benchmark design and controlled variables are concrete, and interpretability evaluation is a real trust issue. HKR-H is weak, and the CBM focus is academic with no product adoption signal, so it stays in the 60–71 band.
editor take
CBM gets synthetic benchmarks with 4 controlled variables; I buy it, because real concept labels are scarce.
HKR breakdown
hook knowledge resonance
open source
63
SCORE
H0·K1·R1
