
Releases: CrazyAngelm/megaquant-kv-cache

v0.1.0 — Initial research PoC release

24 Apr 14:51


Initial public research PoC release.

Highlights:

  • Metadata-aware low-bit KV-cache compression benchmark.
  • Main point: 5.11x modeled KV payload compression vs FP16.
  • In a local GPT-2 CPU/Python fake-quant benchmark: +11.0% attention-output cosine similarity vs. a local RotorQuant-3b baseline, at a cost of +4.35% modeled memory.
  • Includes conservative scope notes: this is not a production runtime, and it makes no CUDA, real-VRAM, or throughput claims.
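
For readers unfamiliar with the metric, here is a minimal sketch of what an "attention-output cosine" under fake quantization can look like. It is not the benchmark code from this release: the per-vector min-max uniform quantizer, the random Gaussian tensors, and all function names (`fake_quant`, `attention_output`, `cosine`) are assumptions chosen for illustration, and it implements neither MegaQuant nor RotorQuant.

```python
import math
import random


def fake_quant(vec, bits):
    """Quantize to 2**bits uniform levels over [min, max], then dequantize.

    Generic min-max quantizer used only for illustration (an assumption,
    not the scheme benchmarked in this release).
    """
    lo, hi = min(vec), max(vec)
    if hi == lo:
        return list(vec)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    return [lo + round((x - lo) / scale) * scale for x in vec]


def attention_output(q, keys, values):
    """Single-query softmax attention: softmax(q . K^T / sqrt(d)) V."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy tensors: 32 cached tokens, head dimension 16.
rng = random.Random(0)
d, n = 16, 32
q = [rng.gauss(0, 1) for _ in range(d)]
keys = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
values = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]

# Full-precision reference vs. attention over a fake-quantized KV cache.
reference = attention_output(q, keys, values)
results = {}
for bits in (3, 8):
    keys_q = [fake_quant(k, bits) for k in keys]
    values_q = [fake_quant(v, bits) for v in values]
    results[bits] = cosine(reference, attention_output(q, keys_q, values_q))
    print(f"{bits}-bit KV: attention-output cosine = {results[bits]:.4f}")
```

Fake quantization keeps everything in floating point and only simulates the rounding error, which is why a benchmark of this kind can say something about output fidelity but nothing about real memory use or speed.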

Feedback wanted on:

  • effective-bit accounting,
  • metadata overhead modeling,
  • benchmark scope and baselines,
  • next experiments worth running.
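
To make the first two feedback items concrete, the sketch below shows one common way to count effective bits: payload bits per element plus per-group metadata amortized over the group. The group size, the FP16 scale, and the function names are hypothetical and are not taken from this release. The only figure reused from the notes is 5.11x, which, if the ratio is simply 16 divided by bits per element, corresponds to roughly 3.13 modeled bits per element.

```python
FP16_BITS = 16


def effective_bits(payload_bits, metadata_bits_per_group, group_size):
    """Bits stored per KV element once per-group metadata is amortized."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return payload_bits + metadata_bits_per_group / group_size


def compression_vs_fp16(bits_per_element):
    """Modeled payload compression ratio relative to FP16 storage."""
    return FP16_BITS / bits_per_element


# Hypothetical configuration (an assumption, not the release's layout):
# 3-bit codes with one FP16 scale shared by every 128 elements.
bits = effective_bits(payload_bits=3, metadata_bits_per_group=16, group_size=128)
ratio = compression_vs_fp16(bits)
print(f"{bits:.3f} effective bits -> {ratio:.2f}x vs FP16")

# Inverting the reported 5.11x figure under the same simple model.
implied_bits = FP16_BITS / 5.11
print(f"5.11x vs FP16 implies about {implied_bits:.2f} bits per element")
```

Under this model, smaller groups or richer metadata (zero points, rotation parameters, outlier indices) raise the effective bit count, so any reported ratio depends on which metadata is counted.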