TurboQuant has been rejected by gg. It does not meaningfully outperform llama.cpp's current implementation, which uses Hadamard rotations, while being dramatically slower. The same goes for SpinQuant, RotorQuant, and all the other techniques that use Hadamard rotations underneath. MTP is currently parked in a PR and will hopefully be merged into mainline soon.
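For anyone wondering what the shared Hadamard-rotation trick actually buys you: multiplying the weights by an orthonormal Hadamard matrix spreads any outlier across all coordinates, which shrinks the quantization scale, and because the rotation is orthogonal it can be undone exactly after dequantization. A minimal numpy sketch of the idea (illustrative only; this is not llama.cpp's actual kernel, and `quantize_rtn` is a toy round-to-nearest quantizer made up for the demo):

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # orthonormal: H @ H.T == I

def quantize_rtn(x, bits=4):
    # Toy symmetric round-to-nearest quantization, returned dequantized.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
n = 256
w = rng.normal(size=n)
w[7] = 25.0  # one outlier inflates the quantization scale for the whole vector

H = hadamard(n)
plain_err = np.abs(quantize_rtn(w) - w).mean()
# Rotate, quantize, rotate back: H is orthogonal, so H.T undoes the rotation.
rot_err = np.abs(H.T @ quantize_rtn(H @ w) - w).mean()
print(plain_err, rot_err)
```

With the outlier smeared over all 256 coordinates, the rotated vector's max magnitude drops sharply, so the 4-bit scale is finer and the reconstruction error is much smaller, at the cost of two matrix multiplies (which fast Hadamard transforms reduce to O(n log n)).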
It's been an either/or thing. I'm having great success with the TurboQuants, especially on my laptop with limited VRAM, but I know there's performance left on the table by not being able to use the model's native MTP. The two forks have diverged so much that I couldn't figure out how to patch them together, but it would be really awesome to be able to use both at the same time. Thanks, y'all, for the awesome work you're doing; we appreciate it.