
Add open-r1/OpenR1-Math-220k and bethgelab/CuratedThoughts datasets to RL #3509

Closed
SurbhiJainUSC wants to merge 16 commits into main from rl_dataset

Conversation

Collaborator

@SurbhiJainUSC commented Mar 26, 2026

Description

  • Adds support for open-r1/OpenR1-Math-220k and bethgelab/CuratedThoughts.
  • Renamed prepare_openinstructmath2_dataset() to prepare_train_and_eval_dataset() to handle multiple datasets generically.
  • Updated process_data() to find questions and answers across various keys (e.g., "problem", "prompt", "solution", "expected_answer").
  • Added process_answer() and process_mcq() to parse MCQ options directly from the question text.
  • For MCQs, the training pipeline now accepts both the option letter (e.g., "A") and the corresponding value as correct answers.
  • Removed the check_answer() reward function and consolidated its overlapping logic into check_numbers().
  • Fixed a bug where math_verify_func() failed during training: the underlying Math-Verify 'parse' function uses signal.alarm(), which only works in the main thread and is therefore incompatible with threaded environments. Replaced it with a manual math verification function with a 5-second timeout that runs safely in a multithreaded context.
  • Fixed normalize_final_answer() to handle mixed numbers (e.g., converting "3 \frac{1}{2}" to "3+\frac{1}{2}").
  • Added more unit tests for the new MCQ logic, verify_math capabilities, and enhanced normalization rules.
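The MCQ handling described above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: `parse_mcq_options` and `accepted_answers` are invented names, and the real parser in process_mcq() may handle more option formats than the simple "(A) … (B) …" pattern assumed here.

```python
import re

def parse_mcq_options(question):
    # Pull "(A) foo (B) bar"-style options out of the question text.
    # Each option value runs until the next parenthesized letter.
    return {m.group(1): m.group(2).strip()
            for m in re.finditer(r"\(([A-D])\)\s*([^()]*)", question)}

def accepted_answers(question, letter):
    # The training pipeline accepts both the option letter (e.g. "A")
    # and the corresponding option value as correct answers.
    opts = parse_mcq_options(question)
    return {letter, opts.get(letter, letter)}
```

For example, for the question "What is 2+2? (A) 3 (B) 4" with golden letter "B", both "B" and "4" would be treated as correct.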
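The signal.alarm() incompatibility arises because POSIX signals are only delivered to the main thread. One common thread-safe way to bound a verification call, shown here as a sketch (the PR's actual implementation may differ; `verify_with_timeout` and `check` are hypothetical names), is to run it in a worker and cap the wait:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

def verify_with_timeout(check, golden, guess, timeout_s=5.0):
    # signal.alarm() only fires in the main thread, so instead run the
    # (possibly slow) check in a worker thread and bound how long we
    # wait for its result. Note: a timed-out worker keeps running in
    # the background until it finishes; only the wait is bounded.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(check, golden, guess)
        try:
            return future.result(timeout=timeout_s)
        except FuturesTimeout:
            return False  # treat a hung verification as "not verified"
```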
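The mixed-number fix in normalize_final_answer() amounts to rewriting an implicit sum as an explicit one. A minimal sketch of that single rule (hypothetical helper name; the real function applies many more normalizations):

```python
import re

def normalize_mixed_number(answer):
    # Convert LaTeX mixed numbers like "3 \frac{1}{2}" into the
    # explicit sum "3+\frac{1}{2}" so downstream math parsers read
    # them as one value rather than two adjacent tokens.
    return re.sub(r"(\d+)\s+\\frac", r"\1+\\frac", answer)
```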

Tests

Tested on v5p-8:

python3 -m maxtext.trainers.post_train.rl.train_rl \
model_name=llama3.1-8b-Instruct \
run_name=$RUN_NAME \
base_output_directory=$BASE_OUTPUT_DIRECTORY \
batch_size=5 \
dataset_name=bethgelab/CuratedThoughts \
train_split=train \
hf_train_files=OpenR1-Math-220k-default/train-*.parquet  \
num_batches=5 \
scan_layers=True \
hbm_utilization_vllm=0.4 \
rollout_data_parallelism=2 \
rollout_tensor_parallelism=2 \
allow_split_physical_axes=true \
load_parameters_path=$MAXTEXT_CKPT_PATH \
skip_jax_distributed_system=True

python3 -m maxtext.trainers.post_train.rl.train_rl \
model_name=llama3.1-8b-Instruct \
run_name=$RUN_NAME \
base_output_directory=$BASE_OUTPUT_DIRECTORY \
batch_size=4 \
dataset_name=open-r1/OpenR1-Math-220k \
train_split=train \
hf_train_files=data/train-*.parquet  \
num_batches=5 \
scan_layers=True \
hbm_utilization_vllm=0.4 \
rollout_data_parallelism=2 \
rollout_tensor_parallelism=2 \
allow_split_physical_axes=true \
load_parameters_path=$MAXTEXT_CKPT_PATH

python3 -m maxtext.trainers.post_train.rl.train_rl \
model_name=llama3.1-8b-Instruct \
load_parameters_path=$MAXTEXT_CKPT_PATH \
run_name=$RUN_NAME \
base_output_directory=$BASE_OUTPUT_DIRECTORY \
dataset_name=nvidia/OpenMathInstruct-2 \
train_split=train_1M \
steps=10 \
rollout_tensor_parallelism=1 \
hf_train_files=data/train_1M-*.parquet \
skip_jax_distributed_system=True

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.


codecov Bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 9.18367% with 178 lines in your changes missing coverage. Please review.

Files with missing lines                            Patch %   Lines
src/maxtext/trainers/post_train/rl/utils_rl.py      10.76%    116 Missing ⚠️
src/maxtext/trainers/post_train/rl/evaluate_rl.py    7.69%     36 Missing ⚠️
src/maxtext/trainers/post_train/rl/train_rl.py       3.70%     26 Missing ⚠️


@SurbhiJainUSC force-pushed the rl_dataset branch 22 times, most recently from 6ec6d77 to dbd0cd4 on April 1, 2026 at 18:22
@SurbhiJainUSC SurbhiJainUSC marked this pull request as ready for review April 1, 2026 18:43
@SurbhiJainUSC SurbhiJainUSC changed the title Add open-r1/OpenR1-Math-220k dataset to RL Add open-r1/OpenR1-Math-220k and bethgelab/CuratedThoughts datasets to RL Apr 2, 2026
@A9isha A9isha requested a review from jacoguzo as a code owner April 8, 2026 21:24
Collaborator

@A9isha left a comment


Let's remember to add the condition that even if dataset_name and eval_dataset_name are the same, we still need to check whether the splits are also the same, and only then do a split of the dataset.
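The reviewer's suggested condition could be sketched like this. All names here (`needs_internal_split` and its parameters) are hypothetical, chosen only to illustrate the logic, not taken from the PR's config schema:

```python
def needs_internal_split(dataset_name, eval_dataset_name,
                         train_split, eval_split):
    # Only carve an eval set out of the training data when BOTH the
    # dataset name AND the split match. The same dataset with a
    # different eval split already provides a disjoint eval set, so
    # no internal split is needed in that case.
    return dataset_name == eval_dataset_name and train_split == eval_split
```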

@A9isha A9isha requested a review from abhinavclemson as a code owner April 9, 2026 20:53


3 participants