FEATURED · HuggingFace Papers (takara mirror) · rss · EN · 21:09 · 05·12
→State-Centric Decision Process
State-Centric Decision Process (SDP) makes an agent commit to natural-language predicates, act, and verify observations, producing certified states, mappings, transitions, and termination criteria. The paper evaluates SDP on five benchmarks spanning planning, scientific exploration, web reasoning, and multi-hop QA, where it reports the best training-free results on all five and larger gains as horizon length increases.
#Agent #Reasoning #Benchmarking #Research release
why featured
HKR (hook, knowledge, resonance) all pass, but the post gives only the mechanism and benchmark categories, with no scores, code, or reproducibility setup. This fits the 72–77 research-update band, and no hard-exclusion rule is triggered.
editor take
SDP makes agents expose state, not vibes; that is a cleaner attack on long-horizon failure than another prompt scaffold.
sharp
SDP’s useful move is forcing language agents into an auditable state machine instead of piling on another CoT scaffold. At each step the agent commits to a natural-language predicate, takes an action, then checks the observation. Predicates that pass become certified states, which in turn yield the state space, observation mapping, transitions, and termination criteria.
The paper reports the best training-free results on five benchmarks, with larger gains as horizon length increases. Long-horizon breakdown is the right failure mode to attack. ReAct-style agents often bury the breakage halfway through a trajectory; SDP at least gives per-predicate credit assignment and failure localization. The missing detail is serious: the abstract gives no scores, base models, or verifier cost. If the same LLM both proposes and “certifies” predicates, certification starts to smell like self-grading.
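The commit, act, verify loop described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the names `Step`, `Trace`, and `run` are hypothetical, the "environment" is a plain dictionary, and each predicate is checked by a hand-written Python function where the real method would presumably involve an LLM. It shows only how a per-predicate check lets a run stop at, and report, the first predicate that fails.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Observation = Dict[str, object]  # assumption: observations are simple key/value facts


@dataclass
class Step:
    predicate: str                                # natural-language claim the agent commits to
    action: Callable[[Observation], Observation]  # hypothetical environment step
    check: Callable[[Observation], bool]          # hypothetical verifier for the predicate


@dataclass
class Trace:
    certified: List[str] = field(default_factory=list)  # predicates that passed verification
    failed_at: Optional[int] = None                      # index of the first failed predicate


def run(steps: List[Step], obs: Observation) -> Tuple[Trace, Observation]:
    """Commit to each predicate, act, then verify the new observation."""
    trace = Trace()
    for i, step in enumerate(steps):
        obs = step.action(obs)
        if step.check(obs):
            trace.certified.append(step.predicate)  # predicate becomes a certified state
        else:
            trace.failed_at = i                     # failure is localized to this step
            break
    return trace, obs


def open_drawer(o: Observation) -> Observation:
    return {**o, "drawer_open": True}


def grab_key(o: Observation) -> Observation:
    return dict(o)  # simulated failure: the action has no effect


def unlock_door(o: Observation) -> Observation:
    return {**o, "door_unlocked": bool(o.get("has_key", False))}


steps = [
    Step("the drawer is open", open_drawer, lambda o: o.get("drawer_open") is True),
    Step("the agent holds the key", grab_key, lambda o: o.get("has_key") is True),
    Step("the door is unlocked", unlock_door, lambda o: o.get("door_unlocked") is True),
]

trace, final_obs = run(steps, {})
print(trace.certified, trace.failed_at)
```

In this toy run the second predicate fails, so the trace certifies only the first one and points at step index 1 instead of letting the error surface later in the trajectory. The checks here are deterministic code, which sidesteps the self-grading concern only within the sketch; the abstract does not say how SDP's verifier is built.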
HKR breakdown
hook ✓ · knowledge ✓ · resonance ✓