r/LocalLLaMA · rss · EN · 03:38 · 06·07
→GraphKV, KV Cache Optimization Based on Graph Embedding Models
According to the post, GraphKV compressed the KV cache for Qwen2.5-7B (NF4 weights) in a 32k-token next-token decode test from 1,879,048,192 bytes to 558,530,560 bytes. The post reports 3.36x compression, a cosine similarity of 0.990316, a top10 score of 1.00, and a matching argmax.
#Inference-opt #Embedding #GraphKV #Qwen
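The byte counts above can be sanity-checked with plain arithmetic. The sketch below is not GraphKV code. It assumes the publicly documented Qwen2.5-7B attention shape (28 layers, 4 KV heads, head dimension 128), a 16-bit KV cache, and that "32k" means 32,768 tokens; none of those values are stated in the post. Under those assumptions the uncompressed figure is reproduced exactly, which suggests the baseline is an unquantized 16-bit cache even though the weights are NF4.

```python
# Sanity check of the reported KV-cache sizes. All model-shape constants are
# assumptions taken from the public Qwen2.5-7B config, not from the post.
NUM_LAYERS = 28        # assumed: transformer layers
NUM_KV_HEADS = 4       # assumed: grouped-query attention KV heads
HEAD_DIM = 128         # assumed: per-head dimension
BYTES_PER_VALUE = 2    # assumed: fp16/bf16 cache entries
SEQ_LEN = 32 * 1024    # assumed: "32k" means 32,768 tokens

REPORTED_BASELINE = 1_879_048_192    # bytes, from the post
REPORTED_COMPRESSED = 558_530_560    # bytes, from the post


def kv_cache_bytes(seq_len: int) -> int:
    """Bytes for keys plus values across all layers at the given length."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * seq_len


baseline = kv_cache_bytes(SEQ_LEN)
ratio = REPORTED_BASELINE / REPORTED_COMPRESSED
compressed_per_token = REPORTED_COMPRESSED / SEQ_LEN

print(f"computed baseline: {baseline:,} bytes")
print(f"matches reported baseline: {baseline == REPORTED_BASELINE}")
print(f"compression ratio: {ratio:.2f}x")
print(f"compressed bytes per token: {compressed_per_token:,.0f}")
```

The check only confirms that the reported sizes and the 3.36x ratio are internally consistent. It says nothing about how GraphKV achieves the compression or whether the quality metrics hold up.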
why featured
HKR (hook, knowledge, resonance) all pass: the post gives a concrete KV-cache compression claim. Single-source Reddit sourcing and no disclosed paper, repo, or third-party replication keep it in the 60–71 band.
editor take
GraphKV claims 3.36x KV-cache compression in a 32k-token decode test; the Reddit post body returned HTTP 403, and no code or reproduction details are available.
HKR breakdown
hook ✓ knowledge ✓ resonance ✓