
Commit 70a313e

TheTom and claude committed
feat: full TurboQuant with rotation matrices in Metal kernels PrismML-Eng#21
Embedded pre-computed 128×128 rotation and QJL matrices (256KB constant memory) directly in the Metal shader. Both quantize and dequantize now perform the full TurboQuant algorithm:

Quantize: normalize → rotate → codebook → inverse rotate → residual → QJL
Dequantize: codebook → inverse rotate → QJL correction → rescale

Previous version (no rotation) produced garbage. This should produce meaningful output since the rotation Gaussianizes the KV distribution.

Note: dequantize does full 128-element rotation per chunk (8× work). Optimization possible with caching or restructured kernel in follow-up.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
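The two pipelines named in the message can be sketched on the CPU to make the order of steps concrete. This is a minimal pure-Python illustration, not the Metal kernel from this commit. The 2-bit codebook (Lloyd-Max levels for a Gaussian), the Gram-Schmidt rotation, the Gaussian QJL matrix, the stored residual norm, and every function name below are assumptions made for the example; the commit's embedded matrices and bit layout are not shown on this page.

```python
import math
import random

DIM = 128  # vector length implied by the 128x128 matrices in the commit message

# Assumed 2-bit codebook: Lloyd-Max levels for a unit Gaussian, scaled to N(0, 1/DIM),
# which is roughly how coordinates of a rotated unit vector are distributed.
CODEBOOK = [c / math.sqrt(DIM) for c in (-1.510, -0.4528, 0.4528, 1.510)]


def random_orthogonal(n, rng):
    """Random rotation via Gram-Schmidt on Gaussian rows (stand-in for the embedded matrix)."""
    rows = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for q in rows:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def matvec(m, x):
    """Compute m @ x for a row-major list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def matvec_t(m, x):
    """Compute m^T @ x; for an orthogonal m this is the inverse rotation."""
    out = [0.0] * len(m[0])
    for row, xi in zip(m, x):
        for j, a in enumerate(row):
            out[j] += a * xi
    return out


def codebook_decode(codes, rot):
    """Look up codebook levels, then rotate back to the original basis."""
    return matvec_t(rot, [CODEBOOK[c] for c in codes])


def quantize(x, rot, qjl):
    norm = math.sqrt(sum(a * a for a in x)) or 1.0
    unit = [a / norm for a in x]                                   # normalize
    y = matvec(rot, unit)                                          # rotate
    codes = [min(range(4), key=lambda k: abs(CODEBOOK[k] - v)) for v in y]  # codebook
    approx = codebook_decode(codes, rot)                           # inverse rotate
    resid = [a - b for a, b in zip(unit, approx)]                  # residual
    resid_norm = math.sqrt(sum(a * a for a in resid))
    signs = [1 if s >= 0 else -1 for s in matvec(qjl, resid)]      # QJL: 1-bit sign sketch
    return norm, codes, resid_norm, signs


def dequantize(packed, rot, qjl):
    norm, codes, resid_norm, signs = packed
    approx = codebook_decode(codes, rot)                           # codebook, inverse rotate
    scale = math.sqrt(math.pi / 2) / len(signs) * resid_norm
    corr = [scale * a for a in matvec_t(qjl, signs)]               # QJL correction
    return [norm * (a + c) for a, c in zip(approx, corr)]          # rescale
```

To try it, build `rot` with `random_orthogonal(DIM, random.Random(0))`, fill a `DIM`×`DIM` list of `rng.gauss(0.0, 1.0)` values for `qjl`, and pass a length-128 vector through `quantize` and then `dequantize`. Dropping the `matvec(rot, ...)` and `matvec_t(rot, ...)` calls gives a version without rotation, the case the message reports as producing garbage.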
1 parent 9f3771a commit 70a313e

2 files changed

Lines changed: 8403 additions & 104 deletions


0 commit comments
