Commit 70a313e
feat: full TurboQuant with rotation matrices in Metal kernels PrismML-Eng#21
Embedded pre-computed 128×128 rotation and QJL matrices (256KB constant
memory) directly in the Metal shader. Both quantize and dequantize now
perform the full TurboQuant algorithm:
Quantize: normalize → rotate → codebook → inverse rotate → residual → QJL
Dequantize: codebook → inverse rotate → QJL correction → rescale
Previous version (no rotation) produced garbage. This should produce
meaningful output since the rotation Gaussianizes the KV distribution.
Note: dequantize does full 128-element rotation per chunk (8× work).
Optimization possible with caching or restructured kernel in follow-up.
Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>1 parent 9f3771a commit 70a313e
2 files changed
Lines changed: 8403 additions & 104 deletions
0 commit comments