[RL] Honor tokenizer chat templates for base models that lack one by dasoto · Pull Request #4049 · AI-Hypercomputer/maxtext

dasoto · 2026-06-03T05:28:06Z

Description

Problem statement

Tokenizers loaded via AutoTokenizer.from_pretrained for some model families (Gemma 4 base/IT, Llama 2 base, several others) have chat_template = None. utils_rl.process_data calls apply_chat_template(...) unconditionally, causing a hard ValueError crash with no actionable error message.

The SFT pipeline solves this by reading config.chat_template (a Jinja string) or loading from config.chat_template_path via load_chat_template_from_file, then attaching the result via validate_and_configure_sft_columns. The RL pipeline doesn't do any of this.

Solution Proposed:

This PR clarifies and decouples the configuration fields for chat templates used during dataset preprocessing versus those loaded into the model tokenizer during Reinforcement Learning (RL) training workloads.

Decoupled Chat Templates:

chat_template_path: Retargeted to load the model tokenizer's chat template file (when a base model lacks one).
data_template_path: Introduced as a new configuration parameter to specify the dataset formatting template (defining how system prompts, questions, and answers are structured).

Codebase Integration:

Updated rl_train and the data preparation pipeline in train_rl.py to load dataset formats from data_template_path and fallback tokenizer chat templates from chat_template_path.
Updated the RL configuration registry (types.py and base.yml) to include the new data_template_path field and standard defaults.
Migrated rl.yml to use the new parameter names.

Testing:

Updated unit tests in train_rl_test.py to reflect the renamed fields.
Added the TokenizerChatTemplateTest suite to verify that tokenizers are successfully populated via Jinja strings or file paths.

FIXES: b/519224406

Tests

Unit test pass
Running RL workloads for models and tokenizers that have and don't have a chat template.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

… to specify one

… template

google-cla · 2026-06-03T05:28:24Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

codecov · 2026-06-03T05:33:13Z

Codecov Report

❌ Patch coverage is 81.81818% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/trainers/post_train/rl/train_rl.py	81.81%	2 Missing ⚠️

📢 Thoughts on this report? Let us know!

…-template

dasoto added 4 commits June 2, 2026 17:14

Adding option for rl workloads that use models without chat_templates…

6c8a1d4

… to specify one

Adding tokenizer_chat_template_path to valid inputs for config

ab3b935

Adding data_template_path, and using chat_template_path for tokenizer…

f3e108a

… template

removing old key

4957551

dasoto force-pushed the davidsotomora-rl-chat-template branch from 6e6275f to 4957551 Compare June 3, 2026 18:36

dasoto added 4 commits June 3, 2026 18:57

Linting

6aeaba5

Apply pyink formatting

885d96b

Merge remote-tracking branch 'origin/main' into davidsotomora-rl-chat…

fd9717a

…-template

Fix pylint protected-access on _rl_train_impl test from merged main

792a8ee

dasoto marked this pull request as ready for review June 3, 2026 19:41

dasoto requested review from A9isha, NicoGrande, NuojCheng, RissyRan, SurbhiJainUSC, abhinavclemson, aireenmei, bvandermoon, gagika, gobbleturk, hengtaoguo, jesselu-google, jiangjy1982, khatwanimohit, richjames0, shralex, suexu1025 and vipannalla as code owners June 3, 2026 19:42

dasoto requested review from darisoy, dipannita08 and igorts-git as code owners June 3, 2026 19:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RL] Honor tokenizer chat templates for base models that lack one#4049

[RL] Honor tokenizer chat templates for base models that lack one#4049
dasoto wants to merge 8 commits into
mainfrom
davidsotomora-rl-chat-template

dasoto commented Jun 3, 2026 •

edited

Loading

Uh oh!

google-cla Bot commented Jun 3, 2026

Uh oh!

codecov Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dasoto commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Problem statement

Solution Proposed:

Tests

Checklist

Uh oh!

google-cla Bot commented Jun 3, 2026

Uh oh!

codecov Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dasoto commented Jun 3, 2026 •

edited

Loading

codecov Bot commented Jun 3, 2026 •

edited

Loading