Skip to content

[RL] Honor tokenizer chat templates for base models that lack one#4049

Open
dasoto wants to merge 8 commits into
mainfrom
davidsotomora-rl-chat-template
Open

[RL] Honor tokenizer chat templates for base models that lack one#4049
dasoto wants to merge 8 commits into
mainfrom
davidsotomora-rl-chat-template

Conversation

@dasoto
Copy link
Copy Markdown
Collaborator

@dasoto dasoto commented Jun 3, 2026

Description

Problem statement

Tokenizers loaded via AutoTokenizer.from_pretrained for some model families (Gemma 4 base/IT, Llama 2 base, several others) have chat_template = None. utils_rl.process_data calls apply_chat_template(...) unconditionally, causing a hard ValueError crash with no actionable error message.

The SFT pipeline solves this by reading config.chat_template (a Jinja string) or loading from config.chat_template_path via load_chat_template_from_file, then attaching the result via validate_and_configure_sft_columns. The RL pipeline doesn't do any of this.

Solution Proposed:

This PR clarifies and decouples the configuration fields for chat templates used during dataset preprocessing versus those loaded into the model tokenizer during Reinforcement Learning (RL) training workloads.

  1. Decoupled Chat Templates:
  • chat_template_path: Retargeted to load the model tokenizer's chat template file (when a base model lacks one).
  • data_template_path: Introduced as a new configuration parameter to specify the dataset formatting template (defining how system prompts, questions, and answers are structured).
  1. Codebase Integration:
  • Updated rl_train and the data preparation pipeline in train_rl.py to load dataset formats from data_template_path and fallback tokenizer chat templates from chat_template_path.
  • Updated the RL configuration registry (types.py and base.yml) to include the new data_template_path field and standard defaults.
  • Migrated rl.yml to use the new parameter names.
  1. Testing:
  • Updated unit tests in train_rl_test.py to reflect the renamed fields.
  • Added the TokenizerChatTemplateTest suite to verify that tokenizers are successfully populated via Jinja strings or file paths.

FIXES: b/519224406

Tests

  • Unit test pass
  • Running RL workloads for models and tokenizers that have and don't have a chat template.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@google-cla
Copy link
Copy Markdown

google-cla Bot commented Jun 3, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 81.81818% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/maxtext/trainers/post_train/rl/train_rl.py 81.81% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

@dasoto dasoto force-pushed the davidsotomora-rl-chat-template branch from 6e6275f to 4957551 Compare June 3, 2026 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant