Skip to content

Gemma4 think tags not detected #2092

@LukeyGuy47

Description

@LukeyGuy47

Description:
Currently, when using Gemma 4 models with native thinking enabled, frontends like SillyTavern fail to stream the output correctly because the raw <|channel>thought tags are passed directly into the text stream, causing the frontend parsers to buffer the entire generation before rendering.

Upstream llama.cpp recently added support for the peg-gemma4 chat template, which natively translates Gemma 4's channel tags into the standard OpenAI reasoning_content field over the API. Porting this upstream handling to KoboldCpp would fix the streaming lock-up in compatible frontends.

Steps to Reproduce:

  1. Load a Gemma 4 model (e.g., google_gemma-4-31B-it) in KoboldCpp v1.111.
  2. Launch with the argument: --jinja_kwargs='{"enable_thinking":true}'
  3. Connect a frontend like SillyTavern using the OpenAI-compatible API endpoint (/v1/chat/completions).
  4. Generate a response.
  5. Result: The backend generates tokens normally (streaming is active in the console), but the frontend UI freezes/buffers until the generation is 100% complete due to the custom <|channel> tags in the raw text block.

Expected Behavior:
The thoughts should be stripped from the main text stream and sent as reasoning_content in the API payload, allowing frontends to stream both the reasoning and the response seamlessly.

Additional Context:
Testing the exact same model and prompt directly in standard llama.cpp server results in flawless streaming. The llama.cpp logs show it correctly identifying the template and activating the reasoning budget, which KoboldCpp does not currently do:

Plaintext

srv params_from_: Chat format: peg-gemma4
init: chat template, thinking = 1
reasoning-budget: activated, budget=2147483647 tokens

Environment:

  • OS: Windows 11
  • GPU: RTX 4080 + RTX 3090 (Tensor Split)
  • KoboldCpp Version: 1.111

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions