
Commit fa35fc0

Add plain-language benchmark comparisons
1 parent 1bd26fc commit fa35fc0

1 file changed

Lines changed: 10 additions & 0 deletions


README.md

@@ -24,6 +24,16 @@ score_cos_mean = 0.997257

Within this repository's current benchmark setup, this is the best observed quality/compression tradeoff among the tested methods.

## Plain-language comparison

For readers who just want the headline numbers:

- vs **FP16 KV-cache payload**: `affine_seven_level_3bit_g64_meta4` uses about **19.6%** of the modeled payload (`3.130399` vs `16` effective bits/dim), i.e. about **80.4% memory saving** and about **5.11x compression**.
- vs the local **RotorQuant-3b** baseline in this benchmark: it uses about **4.35% more modeled memory** (`3.130399` vs `3.000000` effective bits/dim) but gives about **11.03% higher attention-output cosine** (`0.942270` vs `0.848665`).
- if you want a lower-memory operating point rather than the main quality point, `hadamard_affine_four_level_2bit_g64_meta8` uses about **24.8% less modeled memory** than the local **RotorQuant-3b** baseline (`2.255399` vs `3.000000` effective bits/dim) while showing slightly higher attention-output cosine in this benchmark (`0.851023` vs `0.848665`).

These are benchmark-local modeled payload comparisons, not production VRAM or kernel-throughput claims.
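The percentages above follow directly from the reported effective bits/dim and cosine values. The minimal Python sketch below recomputes them as a sanity check. The helper name `pct_change` and the constant names are illustrative and not part of this repository, and the 16 bits/dim FP16 baseline is simply the width of an FP16 value, not a number read from the benchmark output.

```python
# Recompute the plain-language comparison from the reported benchmark values.
# Names here are illustrative; they are not part of this repository.

FP16_BITS_PER_DIM = 16.0  # an FP16 value occupies 16 bits

# (effective_bits_per_dim, attention-output cosine) as reported in the README
AFFINE_3BIT = (3.130399, 0.942270)    # affine_seven_level_3bit_g64_meta4
ROTORQUANT_3B = (3.000000, 0.848665)  # local RotorQuant-3b baseline
HADAMARD_2BIT = (2.255399, 0.851023)  # hadamard_affine_four_level_2bit_g64_meta8


def pct_change(new: float, old: float) -> float:
    """Signed percentage change from `old` to `new`."""
    return (new / old - 1.0) * 100.0


share = AFFINE_3BIT[0] / FP16_BITS_PER_DIM        # fraction of the FP16 payload
saving = 1.0 - share                               # memory saving vs FP16
compression = FP16_BITS_PER_DIM / AFFINE_3BIT[0]   # compression ratio vs FP16

mem_vs_rotor = pct_change(AFFINE_3BIT[0], ROTORQUANT_3B[0])
cos_vs_rotor = pct_change(AFFINE_3BIT[1], ROTORQUANT_3B[1])
hadamard_mem_vs_rotor = pct_change(HADAMARD_2BIT[0], ROTORQUANT_3B[0])

print(f"vs FP16: {share:.1%} of payload, {saving:.1%} saving, {compression:.2f}x compression")
print(f"vs RotorQuant-3b: {mem_vs_rotor:+.2f}% memory, {cos_vs_rotor:+.2f}% cosine")
print(f"2-bit Hadamard vs RotorQuant-3b: {hadamard_mem_vs_rotor:+.1f}% memory, "
      f"cosine {HADAMARD_2BIT[1]:.6f} vs {ROTORQUANT_3B[1]:.6f}")
```

Rounded the same way as the bullets, the printed figures match them: 19.6%, 80.4%, 5.11x, +4.35%, +11.03%, and -24.8%.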
## What `effective_bits_per_dim` means

`effective_bits_per_dim` is a **theoretical accounting model** for quantized code bits plus declared metadata bits. It is not measured packed tensor storage, kernel layout overhead, or actual GPU VRAM usage.
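To make the accounting idea concrete, here is a minimal sketch of one common way to amortize per-group metadata over the dimensions it covers. The formula, the function names, and the parameters (3-bit codes, groups of 64 dimensions, 8 metadata bits per group) are hypothetical and are not this repository's implementation. In particular, the sketch does not reproduce reported values such as `3.130399`. Like the model described above, it counts only code and metadata bits, with no packing, alignment, or kernel-layout overhead.

```python
# Hypothetical bits-per-dimension accounting: code bits plus per-group
# metadata amortized over the group. Not this repository's formula.

FP16_BITS_PER_DIM = 16.0


def effective_bits_per_dim(code_bits: float, group_size: int, meta_bits_per_group: float) -> float:
    """Bits per dimension for `code_bits`-bit codes plus metadata shared by a group.

    Ignores packing, alignment, and kernel layout overhead, i.e. it is a
    theoretical accounting model rather than measured storage.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return code_bits + meta_bits_per_group / group_size


def compression_vs_fp16(bits_per_dim: float) -> float:
    """How many times smaller the modeled payload is than FP16."""
    return FP16_BITS_PER_DIM / bits_per_dim


# Hypothetical configuration: 3-bit codes, 64-dim groups, 8 metadata bits per group.
bits = effective_bits_per_dim(code_bits=3, group_size=64, meta_bits_per_group=8)
print(f"{bits:.6f} bits/dim, {compression_vs_fp16(bits):.2f}x vs FP16")
```

Under this kind of accounting, larger groups lower the overhead: doubling `group_size` halves the metadata contribution per dimension.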
