Gemma4 26B on Radeon 780M iGPU via llama.cpp Vulkan — ~23–25 tok/s #24222
l1v1ngth3dr34m
started this conversation in
Show and tell
Field report, not a bug: Gemma4 26B Q4_K_M is running well on a Radeon 780M iGPU via llama.cpp Vulkan.
I wanted to share this because I expected the Radeon 780M path to be marginal, but it turned out to be a real win.
Hardware / software
llama.cpp build flag: -DGGML_VULKAN=ON
Model
Repo: ggml-org/gemma-4-26B-A4B-it-GGUF
File: gemma-4-26B-A4B-it-Q4_K_M.gguf
Tools: llama-cli / llama-bench
GPU offload: -ngl 99
Benchmark result
CLI smoke test:
A second no-reasoning smoke test with:
returned final-answer-only output at about:
Comparison with Ollama on the same box
Earlier, ollama run gemma4:26b on the same machine was around:
So the llama.cpp Vulkan path is roughly a 5x–6x generation uplift in my local testing.
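The uplift figure is simply the ratio of the two generation rates. A minimal sketch of that arithmetic, assuming the ~23–25 tok/s range from the title and the 5x–6x factor quoted above; the function name is made up for illustration, and the back-solved baseline is an inference from those two figures, not a measured number.

```python
def implied_baseline(new_tok_s, speedup):
    """Baseline generation rate implied by a new rate and a speedup factor."""
    return new_tok_s / speedup

# Assumed inputs: 23-25 tok/s on llama.cpp Vulkan, 5x-6x uplift over Ollama.
low = implied_baseline(23.0, 6.0)   # slowest new rate, largest factor
high = implied_baseline(25.0, 5.0)  # fastest new rate, smallest factor

print(f"implied baseline: {low:.1f} to {high:.1f} tok/s")
```

Plugging in your own llama-bench and Ollama numbers the other way round (new rate divided by old rate) gives the speedup directly.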
Important caveat: Ollama blob reuse did not work
The installed Ollama gemma4:26b model blob is GGUF, but it did not load as a standalone upstream llama.cpp model file.
It failed with:
The same general pattern happened with the installed Ollama gemma4:e4b blob.
So the working path was not “point llama.cpp at the Ollama blob.” The working path was using the upstream llama.cpp-facing GGUF from ggml-org.
What I tested before 26B
I first verified the stack with smaller GGUFs:
Then Gemma4 26B was the real surprise:
Conclusion
For this machine, the split is now pretty clear:
ggml-org Gemma4 26B Q4_K_M + Vulkan is the performance path.
Curious if anyone else has Radeon 780M / 890M / Phoenix / Strix Point numbers for Gemma4 26B or similar MoE models.
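For anyone who wants to post comparable numbers, here is a rough pre-flight sketch. It checks that a file starts with the GGUF magic bytes and then prints a llama-bench command with full GPU offload. The helper names and the default model path are hypothetical, and the header read assumes a little-endian GGUF file. As the Ollama blob case above shows, a valid GGUF header does not guarantee that upstream llama.cpp will load the model, so treat the check as a quick sanity test only.

```python
import shlex
import struct
import sys

def read_gguf_version(path):
    """Return the GGUF version if the file starts with the GGUF magic, else None."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    # Bytes 4..8 hold the format version as a little-endian uint32.
    return struct.unpack("<I", header[4:8])[0]

def bench_command(model, ngl=99, binary="llama-bench"):
    """Build a llama-bench argv that offloads up to `ngl` layers to the GPU."""
    return [binary, "-m", model, "-ngl", str(ngl)]

if __name__ == "__main__":
    # Hypothetical default path; pass your own file as the first argument.
    model = sys.argv[1] if len(sys.argv) > 1 else "gemma-4-26B-A4B-it-Q4_K_M.gguf"
    try:
        version = read_gguf_version(model)
    except OSError as exc:
        print(f"cannot read {model}: {exc}")
    else:
        print(f"GGUF version: {version}")
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(bench_command(model)))
```

The script only prints the command; run it from a llama.cpp build configured with -DGGML_VULKAN=ON, adjusting the binary path to wherever your build places llama-bench.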