
Eval bug: Gemma 4 generates <unused> tokens in infinite loop #21516

@CSCSoftware

Description


Bug Description

Gemma 4 models generate an infinite stream of <unused> tokens (token ID 14 = <unused8>) on the Vulkan backend, both with GPU offloading (-ngl 99) and CPU-only (-ngl 0). No valid text is produced; the model runs until MaxTokens is exhausted.

This happens despite having all known Gemma 4 fixes applied:

Environment

Steps to Reproduce

  1. Build llama.cpp from current master with Vulkan enabled
  2. Load gemma-4-E2B-it-Q4_K_M.gguf with -ngl 99
  3. Send any prompt (e.g., "Hello")
  4. Observe: the model generates 18,000+ tokens of <unused8> (token ID 14) without producing any readable text or emitting an EOG token

Diagnostic Data

Token sampling output (first 10 tokens):

Token[0] id=14
Token[1] id=14
Token[2] id=14
Token[3] id=14
Token[4] id=14
Token[5] id=14
Token[6] id=14
Token[7] id=14
Token[8] id=14
Token[9] id=14

Token 14 in Gemma 4 vocab = <unused8>.
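A test harness can catch this failure mode automatically by checking for a long run of one repeated token ID, which is exactly what the sampled output above shows. The sketch below is a minimal Python illustration that assumes the sampled IDs are available as a plain list. `longest_repeat`, `looks_degenerate`, and the threshold of 8 are hypothetical names and values chosen for illustration. They are not part of llama.cpp.

```python
from itertools import groupby


def longest_repeat(token_ids):
    """Return (token_id, run_length) for the longest run of identical consecutive IDs.

    Returns (None, 0) for an empty list.
    """
    best_id, best_len = None, 0
    for tok, group in groupby(token_ids):
        run = sum(1 for _ in group)  # length of this run of identical IDs
        if run > best_len:
            best_id, best_len = tok, run
    return best_id, best_len


def looks_degenerate(token_ids, threshold=8):
    """Flag output whose longest identical run reaches `threshold`.

    The default threshold is a hypothetical cutoff, not a llama.cpp setting.
    """
    _, run = longest_repeat(token_ids)
    return run >= threshold


# First 10 sampled IDs from the Vulkan run reported above.
sampled = [14] * 10
print(longest_repeat(sampled))    # (14, 10)
print(looks_degenerate(sampled))  # True
```

The same check also flags the CPU-only case, where the run consists of id=5 instead of id=14, because it looks only for repetition and does not match specific IDs.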

Generation stats: 183 tok/s, 18432 tokens generated, 44.7 seconds; no EOG token was emitted.

Additional Testing

  • CPU-only (-ngl 0): Also broken. The model loops infinitely on a different token, [multimodal] (id=5), instead of <unused8>.
  • PR #21506 (models : set gemma 4 FFN MoE prec to F32) applied: No improvement on either CPU or Vulkan.
  • Ollama: The same model works correctly in Ollama (which uses its own llama.cpp fork) and produces valid responses.

Init Logs (successful)

Model handle: OK
Vocab size: 262144
Layers: 35
Context size: 32768
Gemma tokens: start_of_turn=105, end_of_turn=106, bos=2
System tokens: 11
Initialization complete!

The model loads correctly, and the context, sampler, and batch all initialize. The issue therefore appears to lie in inference or sampling rather than in model loading.
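No EOG token is ever sampled, so a decode loop that stops only on EOG or on a token budget runs all the way to MaxTokens. This matches the 18432-token runs reported above. The sketch below is a generic Python decode loop, not llama.cpp's sampling API. `generate`, `broken_step`, and the optional `max_repeat` guard are hypothetical. The stub model always returns token 14 to mimic the reported behaviour. The EOG set uses only end_of_turn=106 and the prompt uses only bos=2 from the init log. A real EOG set may contain additional IDs.

```python
from typing import Callable, List, Set


def generate(step: Callable[[List[int]], int], prompt: List[int], eog_ids: Set[int],
             max_tokens: int, max_repeat: int = 0) -> List[int]:
    """Hypothetical decode loop.

    Stops when `step` returns an ID in `eog_ids`, when `max_tokens` tokens have been
    produced, or (if max_repeat > 0) when the last `max_repeat` tokens are identical.
    """
    out: List[int] = []
    context = list(prompt)
    while len(out) < max_tokens:
        tok = step(context)
        if tok in eog_ids:  # normal termination: model emitted end-of-generation
            break
        out.append(tok)
        context.append(tok)
        # Optional guard: bail out on a run of one repeated token.
        if max_repeat and len(out) >= max_repeat and len(set(out[-max_repeat:])) == 1:
            break
    return out


# Stub standing in for the broken model: always samples token 14 (<unused8>).
def broken_step(ctx: List[int]) -> int:
    return 14


EOG = {106}  # end_of_turn from the init log (assumed to be the only EOG ID here)

print(len(generate(broken_step, [2], EOG, max_tokens=18432)))                 # 18432
print(len(generate(broken_step, [2], EOG, max_tokens=18432, max_repeat=32)))  # 32
```

A repetition guard like this only limits wasted compute. It does not fix the underlying numerical or code-path problem that causes the model to sample <unused8> in the first place.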

Related Issues

The root cause may be Vulkan-specific numerical precision issues beyond what #21506 addresses, or a different code path in the Vulkan compute shaders.
