Replies: 2 comments
Small fix proposal as preliminary groundwork to improve MTP on Gemma 4
Here is PR #24277, which addresses and fixes the described issue.
Hi,
Thank you @am17an and everyone for your hard work on getting MTP working with Gemma 4 (#23398). It increases tg/s significantly. However, I’ve found that tg/s drops for agentic workloads, and the slowdown is 100% reproducible.
To reproduce:
Start llama-server with Gemma 4 + it-assistant-MTP as usual, then paste the test file into the chat (I used the web browser UI).
Test model:
Test file:
1.1 Without MTP:
1.2 With MTP:
It appears that the decoding path diverges from the one used when generating long, high‑entropy narratives.
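For anyone who wants to put a number on the two runs above, here is a minimal sketch of the comparison. It assumes each run is summarised by a generated-token count and the generation time in milliseconds; the key names `predicted_n` and `predicted_ms` are chosen to mirror the `timings` object I believe llama-server reports, so treat them as an assumption. The helper names are hypothetical and the numbers are placeholders, not my measurements.

```python
def tokens_per_second(predicted_n: int, predicted_ms: float) -> float:
    """Generation speed (tg/s) from a token count and elapsed milliseconds."""
    if predicted_ms <= 0:
        raise ValueError("predicted_ms must be positive")
    return predicted_n / (predicted_ms / 1000.0)


def mtp_speedup(baseline: dict, with_mtp: dict) -> float:
    """Ratio of MTP tg/s to baseline tg/s; a value below 1.0 means MTP is slower."""
    base = tokens_per_second(baseline["predicted_n"], baseline["predicted_ms"])
    mtp = tokens_per_second(with_mtp["predicted_n"], with_mtp["predicted_ms"])
    return mtp / base


# Placeholder values for illustration only (assumed, not measured).
baseline = {"predicted_n": 500, "predicted_ms": 10000.0}
with_mtp = {"predicted_n": 500, "predicted_ms": 12500.0}

print(f"MTP speedup: {mtp_speedup(baseline, with_mtp):.2f}x")
```

A ratio below 1.0 on the agentic test file, alongside a ratio above 1.0 on ordinary long-form generation, would match the slowdown described here.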