Commit 70a313e

feat: full TurboQuant with rotation matrices in Metal kernels PrismML-Eng#21

Embedded pre-computed 128×128 rotation and QJL matrices (256KB constant memory) directly in the Metal shader. Both quantize and dequantize now perform the full TurboQuant algorithm:

Quantize: normalize → rotate → codebook → inverse rotate → residual → QJL
Dequantize: codebook → inverse rotate → QJL correction → rescale

Previous version (no rotation) produced garbage. This should produce meaningful output since the rotation Gaussianizes the KV distribution.

Note: dequantize does full 128-element rotation per chunk (8× work). Optimization possible with caching or restructured kernel in follow-up.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
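The quantize and dequantize step order named in the commit message can be sketched in plain Python. This is an illustration only, not the Metal kernel from this commit: the toy dimension `D = 8`, the 2-bit `CODEBOOK`, the helper names, and the `sqrt(pi/2)/d` scaling of the QJL correction are all assumptions made for the example (the scaling follows a common formulation of sign-based QJL estimators; the shader may differ).

```python
import math
import random

D = 8  # toy dimension for readability; the commit works on 128-element chunks
# Hypothetical 2-bit codebook: Lloyd-Max levels for N(0, 1), scaled to N(0, 1/D),
# which is roughly how coordinates of a rotated unit vector are distributed.
CODEBOOK = [c / math.sqrt(D) for c in (-1.5104, -0.4528, 0.4528, 1.5104)]


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def matvec(m, x):
    """Compute m @ x for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def matvec_t(m, x):
    """Compute m.T @ x; for an orthogonal m this is the inverse rotation."""
    return [sum(m[i][j] * x[i] for i in range(len(m))) for j in range(len(m[0]))]


def random_rotation(d, rng):
    """Random orthogonal matrix built by Gram-Schmidt on Gaussian rows."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0, 1) for _ in range(d)]
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        n = norm(v)
        if n > 1e-9:
            rows.append([a / n for a in v])
    return rows


def quantize(x, rot, qjl):
    """Assumes x is non-zero. Returns (scale, indices, residual norm, signs)."""
    scale = norm(x)
    u = [a / scale for a in x]                                  # normalize
    y = matvec(rot, u)                                          # rotate
    idx = [min(range(len(CODEBOOK)), key=lambda k: abs(CODEBOOK[k] - c))
           for c in y]                                          # codebook
    u_hat = matvec_t(rot, [CODEBOOK[k] for k in idx])           # inverse rotate
    resid = [a - b for a, b in zip(u, u_hat)]                   # residual
    signs = [1 if p >= 0 else -1 for p in matvec(qjl, resid)]   # QJL sign bits
    return scale, idx, norm(resid), signs


def dequantize(scale, idx, gamma, signs, rot, qjl):
    u_hat = matvec_t(rot, [CODEBOOK[k] for k in idx])           # codebook, inverse rotate
    k = math.sqrt(math.pi / 2) / len(signs) * gamma             # assumed QJL scaling
    corr = [k * c for c in matvec_t(qjl, signs)]                # QJL correction
    return [scale * (a + b) for a, b in zip(u_hat, corr)]       # rescale


rng = random.Random(0)
ROT = random_rotation(D, rng)
QJL = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(D)]
x = [rng.gauss(0, 3) for _ in range(D)]
packed = quantize(x, ROT, QJL)
x_hat = dequantize(*packed, ROT, QJL)
```

In this sketch the dequantizer multiplies by the full transposed rotation for every vector, which mirrors the per-chunk rotation cost the note in the commit message flags as a follow-up optimization target.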

1 parent 9f3771a commit 70a313e

2 files changed

ggml/src/ggml-metal
- ggml-metal.metal
- turbo-matrices.h
