fix Gemma 4 multimodal chat-template markers in processor_gemma4

Google-ML-Automation · Google-ML-Automation · commit 5f3dc2b45cb9 · 2026-06-12T16:44:54.000-07:00
The Gemma 4 multimodal SFT path was emitting Gemma 3 chat-template markers
("<start_of_turn>", "<end_of_turn>"), which are NOT special tokens in the
Gemma 4 tokenizer. Each one BPE-tokenizes into a 7-token noise sequence, so a
training label like "A<end_of_turn>" became an 8-token sequence
([236776 'A', 236820 '<', 643 'end', 236779 '_', 1340 'of', 236779 '_',
887 'turn', 236813 '>']).
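<br>
To illustrate why an unregistered marker fragments while a registered one stays whole, here is a minimal sketch. It is NOT the Gemma 4 tokenizer: the fallback split (runs of letters, or single non-letter characters) is an assumed toy stand-in for BPE, chosen because it happens to reproduce the 8-piece split quoted above. toy_tokenize and _FALLBACK are hypothetical names; only the special-token strings and ids come from this commit message.

```python
import re

# Gemma 4 special tokens and ids, as listed in this commit message.
GEMMA4_SPECIAL_TOKENS = {"<bos>": 2, "<|turn>": 105, "<turn|>": 106}

# Toy fallback split standing in for BPE (an assumption, NOT the real
# Gemma 4 tokenizer): runs of ASCII letters, or single non-letter,
# non-whitespace characters. Whitespace is dropped for brevity.
_FALLBACK = re.compile(r"[A-Za-z]+|[^A-Za-z\s]")


def toy_tokenize(text, special_tokens=GEMMA4_SPECIAL_TOKENS):
  """Splits text into pieces, keeping registered special tokens atomic."""
  if not special_tokens:
    return _FALLBACK.findall(text)
  # Longest-first alternation so a longer marker wins over a shorter prefix.
  ordered = sorted(special_tokens, key=len, reverse=True)
  pattern = "|".join(re.escape(tok) for tok in ordered)
  pieces = []
  pos = 0
  for match in re.finditer(pattern, text):
    # Text between special tokens goes through the toy fallback split.
    pieces.extend(_FALLBACK.findall(text[pos:match.start()]))
    pieces.append(match.group(0))
    pos = match.end()
  pieces.extend(_FALLBACK.findall(text[pos:]))
  return pieces
```

Under this toy split, toy_tokenize("A<end_of_turn>") yields the same 8 pieces as the real label above, while toy_tokenize("A<turn|>") yields just ['A', '<turn|>'].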

With sft_train_on_completion_only=true, the model learned to reproduce this
noise sequence after every answer, producing severe response-format collapse
post-SFT (e.g. "A<B<C<D<...").

The Gemma 4 chat template uses different special tokens:
  <bos>    (id 2)
  <|turn>  (id 105)
  <turn|>  (id 106)
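This CL switches the prompt formatter (reformat_prompt_gemma4 in
processor_gemma4.py) and the Gemma 4 branch of the response formatter
(reformat_response in processor.py) to use them.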
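As a sketch of what a formatted Gemma 4 SFT example looks like after this change, the helper below mirrors the two format strings from the diff and adds a check for leftover Gemma 3 markers, which could serve as a regression guard. format_gemma4_example and stale_markers are hypothetical names, and the image-placeholder handling in reformat_prompt_gemma4 is deliberately omitted.

```python
# Gemma 3 markers that are NOT special tokens in the Gemma 4 tokenizer.
GEMMA3_MARKERS = ("<start_of_turn>", "<end_of_turn>")


def format_gemma4_example(prompt, response):
  """Hypothetical helper mirroring the format strings in this CL.

  Image-placeholder insertion from reformat_prompt_gemma4 is omitted.
  """
  formatted_prompt = f"<bos><|turn>user\n{prompt}<turn|>\n<|turn>model\n"
  formatted_response = f"{response}<turn|>"
  return formatted_prompt, formatted_response


def stale_markers(text):
  """Returns any Gemma 3 markers left in text; empty for correct Gemma 4 input."""
  return [marker for marker in GEMMA3_MARKERS if marker in text]
```

For example, format_gemma4_example("Which option?", "A") returns a response of "A<turn|>", and stale_markers on the concatenated prompt and response returns an empty list, whereas stale_markers("A<end_of_turn>") flags the old terminator.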

PiperOrigin-RevId: 931396545
diff --git a/src/maxtext/multimodal/processor.py b/src/maxtext/multimodal/processor.py
@@ -135,7 +135,7 @@ def reformat_response(response, model_name):
     formatted_response = f"{response}<end_of_turn>"
     return formatted_response
   elif model_name in ["gemma4-26b", "gemma4-31b", "gemma4-e2b", "gemma4-e4b"]:
-    formatted_response = f"{response}<end_of_turn>"
+    formatted_response = f"{response}<turn|>"
     return formatted_response
   elif model_name in ["qwen3-omni-30b-a3b", "qwen3.5-35b-a3b", "qwen3.5-397b-a17b"]:
     formatted_response = f"{response}<|im_end|>"
diff --git a/src/maxtext/multimodal/processor_gemma4.py b/src/maxtext/multimodal/processor_gemma4.py
@@ -94,7 +94,7 @@ def reformat_prompt_gemma4(prompt, image_placeholder, num_images):
   image_placeholder_count = prompt.count(GEMMA4_IMAGE_PLACEHOLDER_IN_PROMPT)
   if image_placeholder_count < num_images:
     prompt = GEMMA4_IMAGE_PLACEHOLDER_IN_PROMPT * (num_images - image_placeholder_count) + prompt
-  formatted_prompt = f"<start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n"
+  formatted_prompt = f"<bos><|turn>user\n{prompt}<turn|>\n<|turn>model\n"
   return formatted_prompt