Releases: CrazyAngelm/megaquant-kv-cache
v0.1.0 — Initial research PoC release
Initial public research PoC release.
Highlights:
- Metadata-aware low-bit KV-cache compression benchmark.
- Main point: 5.11x modeled KV payload compression vs FP16.
- In a local GPT-2 CPU/Python fake-quant benchmark: +11.0% attention-output cosine similarity vs a local RotorQuant-3b baseline, at a cost of +4.35% modeled memory.
- Includes conservative scope notes: this is not a production runtime, and it makes no CUDA, real-VRAM, or throughput claims.
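
To make the headline numbers easier to sanity-check, here is a minimal sketch that converts a modeled compression ratio into average bits per cached element. The function names are hypothetical and not taken from the repository. The "implied baseline" line additionally assumes that the +4.35% figure is measured against the baseline's modeled payload; the release does not say so explicitly.

```python
FP16_BITS = 16.0  # bits per element in the uncompressed FP16 reference


def effective_bits(compression_ratio, baseline_bits=FP16_BITS):
    """Average bits per cached element implied by a compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return baseline_bits / compression_ratio


def relative_overhead(bits, reference_bits):
    """Fractional extra memory of `bits` relative to `reference_bits`."""
    return bits / reference_bits - 1.0


# 5.11x modeled compression vs FP16, as stated in the highlights.
megaquant_bits = effective_bits(5.11)

# ASSUMPTION: +4.35% modeled memory is relative to the baseline's payload.
implied_baseline_bits = megaquant_bits / 1.0435

print(f"effective bits per element: {megaquant_bits:.3f}")
print(f"implied baseline bits:      {implied_baseline_bits:.3f}")
```

Under that assumption the implied baseline lands close to 3 bits per element, which would be consistent with a 3-bit baseline, but that reading should be confirmed against the repository's own accounting.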
Feedback wanted on:
- effective-bit accounting,
- metadata overhead modeling,
- benchmark scope and baselines,
- next experiments worth running.
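
As a starting point for discussing effective-bit accounting and metadata overhead, the sketch below shows one common way to model it: per-group metadata (for example a scale and a zero-point) is amortized over the elements in the group. The group sizes, the 3-bit code width, and the 32 metadata bits per group are illustrative assumptions, not the layout this project uses.

```python
def modeled_bits_per_element(code_bits, group_size, metadata_bits_per_group):
    """Payload bits plus per-group metadata amortized over the group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return code_bits + metadata_bits_per_group / group_size


def modeled_compression(code_bits, group_size, metadata_bits_per_group,
                        baseline_bits=16.0):
    """Compression ratio vs a baseline of `baseline_bits` per element."""
    bits = modeled_bits_per_element(code_bits, group_size,
                                    metadata_bits_per_group)
    return baseline_bits / bits


# HYPOTHETICAL layout: 3-bit codes, one FP16 scale and one FP16 zero-point
# (32 metadata bits) per group. Smaller groups pay more overhead.
for group_size in (32, 64, 128):
    bits = modeled_bits_per_element(3, group_size, 32)
    ratio = modeled_compression(3, group_size, 32)
    print(f"group={group_size:4d}  bits/elem={bits:.3f}  ratio={ratio:.2f}x")
```

A model like this makes the trade-off explicit: the nominal bit width alone overstates compression, and the gap grows as groups shrink or as more metadata is stored per group.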