TurboQuant has been rejected by gg. It does not meaningfully outperform llama.cpp's current implementation, which uses Hadamard rotations, while being dramatically slower. The same goes for SpinQuant, RotorQuant, and all the other techniques that use Hadamard rotations underneath. MTP is currently parked in a PR and will hopefully be merged into mainline soon.
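For anyone wondering what the shared Hadamard-rotation trick actually buys you: multiplying the weights by an orthonormal Hadamard matrix spreads any outlier across all coordinates, which shrinks the quantization scale, and because the rotation is orthogonal it can be undone exactly after dequantization. A minimal numpy sketch of the idea (illustrative only; this is not llama.cpp's actual kernel, and `quantize_rtn` is a toy round-to-nearest quantizer made up for the demo):

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # orthonormal: H @ H.T == I

def quantize_rtn(x, bits=4):
    # Toy symmetric round-to-nearest quantization, returned dequantized.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
n = 256
w = rng.normal(size=n)
w[7] = 25.0  # one outlier inflates the quantization scale for the whole vector

H = hadamard(n)
plain_err = np.abs(quantize_rtn(w) - w).mean()
# Rotate, quantize, rotate back: H is orthogonal, so H.T undoes the rotation.
rot_err = np.abs(H.T @ quantize_rtn(H @ w) - w).mean()
print(plain_err, rot_err)
```

With the outlier smeared over all 256 coordinates, the rotated vector's max magnitude drops sharply, so the 4-bit scale is finer and the reconstruction error is much smaller, at the cost of two matrix multiplies (which fast Hadamard transforms reduce to O(n log n)).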
It's been an either/or thing. I'm having great success with the TurboQuants, especially on my laptop with limited VRAM, but I know there's performance left on the table by not being able to use the model's native MTP. The two forks have diverged so much that I couldn't figure out how to patch them together, but it would be really awesome to be able to use both at the same time. Thanks, y'all, for the awesome work you're doing; we appreciate it.