20:03
28d ago
Variational Linear Attention: Stable Associative Memory for Long-Context Transformers
VLA reformulates linear-attention memory updates as online regularized least squares. At T=1,000 it reduces the state norm by 109× relative to standard linear attention and maintains 62% accuracy at the per-head capacity boundary. Its Triton-fused kernel becomes faster than softmax attention at about 43,000 tokens.
73
SCORE
H0·K1·R1