Gemma4 26B on Radeon 780M iGPU via llama.cpp Vulkan — ~23–25 tok/s #24222

l1v1ngth3dr34m · 2026-06-06T00:50:29Z

Field report, not a bug: Gemma4 26B Q4_K_M is running well on a Radeon 780M iGPU via llama.cpp Vulkan.

I wanted to share this because I expected the Radeon 780M path to be marginal, but it turned out to be a real win.

Vulkan0: AMD Radeon 780M Graphics (RADV PHOENIX) (33081 MiB, 32998 MiB free)

ggml-org Gemma4 26B Q4_K_M via llama.cpp Vulkan (pp128 = prompt processing, 128 tokens; tg64 = text generation, 64 tokens):

pp128: ~208.72 t/s
tg64:  ~25.00 t/s
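
For anyone who wants to reproduce a run of this shape, here is a minimal sketch that builds a llama-bench command line. The exact invocation is not shown above, so this assumes the numbers came from llama-bench with a 128-token prompt, 64 generated tokens, and 99 layers offloaded. The model filename and the `llama_bench_cmd` helper are hypothetical.

```python
import shlex


def llama_bench_cmd(model_path, n_prompt=128, n_gen=64, n_gpu_layers=99,
                    binary="llama-bench"):
    """Build an argument list for one llama-bench run.

    -m   model file
    -p   prompt length in tokens (reported as pp<N>)
    -n   number of generated tokens (reported as tg<N>)
    -ngl number of layers to offload to the GPU
    """
    return [
        binary,
        "-m", str(model_path),
        "-p", str(n_prompt),
        "-n", str(n_gen),
        "-ngl", str(n_gpu_layers),
    ]


# Hypothetical filename; substitute the path of the downloaded ggml-org GGUF.
cmd = llama_bench_cmd("gemma4-26b-Q4_K_M.gguf")
print(shlex.join(cmd))
```

The list form can be passed straight to `subprocess.run(cmd)` on a machine where a Vulkan build of llama.cpp is on the PATH.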

CLI smoke test:

Prompt:     ~37.3 t/s
Generation: ~23.4 t/s

A second no-reasoning smoke test with:

--reasoning off --reasoning-budget 0

returned final-answer-only output at about:

Prompt:     ~117.6 t/s
Generation: ~24.4 t/s

Earlier, ollama run gemma4:26b on the same machine was around:

~4.5 t/s generation

So the llama.cpp Vulkan path is roughly a 5.2x–5.6x generation uplift in my local testing (23.4–25.0 t/s against ~4.5 t/s).
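
The arithmetic behind that uplift figure, as a small sketch using only the generation rates quoted in this post (the dictionary labels are my own naming):

```python
# Generation rates in tokens per second, copied from the numbers in this post.
OLLAMA_TPS = 4.5
LLAMA_CPP_TPS = {
    "llama-bench tg64": 25.0,
    "CLI smoke test": 23.4,
    "CLI, reasoning off": 24.4,
}


def speedup(new_tps, old_tps):
    """Ratio of two throughput figures."""
    return new_tps / old_tps


ratios = {name: speedup(tps, OLLAMA_TPS) for name, tps in LLAMA_CPP_TPS.items()}

for name, tps in LLAMA_CPP_TPS.items():
    ms_per_token = 1000.0 / tps
    print(f"{name}: {ratios[name]:.2f}x over Ollama, {ms_per_token:.1f} ms/token")
```

These are single local runs with approximate rates, so the ratios are indicative rather than precise.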

The installed Ollama gemma4:26b model blob is GGUF, but it did not load as a standalone upstream llama.cpp model file.

It failed with:

wrong number of tensors; expected 1014, got 658

I saw the same kind of failure with the installed Ollama gemma4:e4b blob.

So the working path was not “point llama.cpp at the Ollama blob.” The working path was using the upstream llama.cpp-facing GGUF from ggml-org.
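
One quick way to compare two GGUF files is to read the tensor count each one declares in its header. The sketch below assumes a little-endian GGUF file of version 2 or later, where the header is the 4-byte magic `GGUF`, a 32-bit version, a 64-bit tensor count, and a 64-bit metadata key/value count. The helper name `read_gguf_header` is hypothetical, and the demo uses a synthetic header rather than a real model. It does not explain why the Ollama blob and the ggml-org file differ, and it does not establish which number in the error message comes from the header.

```python
import os
import struct
import tempfile


def read_gguf_header(path):
    """Return version, tensor count and metadata KV count from a GGUF header.

    Assumes little-endian layout and GGUF version >= 2 (64-bit counts).
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not start with a GGUF header")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts; not handled in this sketch")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Demo on a synthetic header (not a real model): version 3, 3 tensors, 5 KV pairs.
with tempfile.NamedTemporaryFile(suffix=".gguf", delete=False) as tmp:
    tmp.write(struct.pack("<4sIQQ", b"GGUF", 3, 3, 5))
    demo_path = tmp.name

demo_header = read_gguf_header(demo_path)
os.remove(demo_path)
print(demo_header)
```

Running `read_gguf_header` on the Ollama blob and on the ggml-org file would show whether the two files declare different tensor counts.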

I first verified the stack with smaller GGUFs:

TinyLlama 1.1B Q4_K_M:
ngl 0:  tg128 ~85.06 t/s
ngl 99: tg128 ~108.20 t/s

Mistral 7B Q4_K_M:
ngl 0:  tg128 ~12.97 t/s
ngl 99: tg128 ~18.18 t/s
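
To put the offload gain on the smaller models in one place, a short sketch that computes the ngl 99 over ngl 0 ratio from the tg128 figures above (variable names are my own; ngl 0 means no layers offloaded to the GPU):

```python
# tg128 rates in tokens per second, copied from the figures above:
# (ngl 0, ngl 99) per model.
TG128 = {
    "TinyLlama 1.1B Q4_K_M": (85.06, 108.20),
    "Mistral 7B Q4_K_M": (12.97, 18.18),
}

# Ratio of fully offloaded throughput to the no-offload baseline.
offload_gain = {model: gpu / cpu for model, (cpu, gpu) in TG128.items()}

for model, gain in offload_gain.items():
    print(f"{model}: {gain:.2f}x with ngl 99 ({(gain - 1) * 100:.0f}% faster)")
```

That works out to about 1.27x for TinyLlama and about 1.40x for Mistral 7B.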

Then Gemma4 26B was the real surprise:

Gemma4 26B Q4_K_M:
ngl 99: tg64 ~25.00 t/s

For this machine, the split is now pretty clear:

Ollama remains the convenient daemon/API path.
llama.cpp + standard ggml-org Gemma4 26B Q4_K_M + Vulkan is the performance path.
Reusing Ollama’s installed Gemma4 blob directly in upstream llama.cpp was not viable.
Radeon 780M is much more useful than I expected for Gemma4 26B when using the right GGUF and runtime path.

Curious if anyone else has Radeon 780M / 890M / Phoenix / Strix Point numbers for Gemma4 26B or similar MoE models.