
Add open-r1/OpenR1-Math-220k and bethgelab/CuratedThoughts datasets to RL #3509

Closed
SurbhiJainUSC wants to merge 16 commits into main from rl_dataset

Conversation

Collaborator

@SurbhiJainUSC commented Mar 26, 2026

Description

  • Adds support for open-r1/OpenR1-Math-220k and bethgelab/CuratedThoughts.
  • Renamed prepare_openinstructmath2_dataset() to prepare_train_and_eval_dataset() to handle multiple datasets generically.
  • Updated process_data() to find questions and answers across various keys (e.g., "problem", "prompt", "solution", "expected_answer").
  • Added process_answer() and process_mcq() to parse MCQ options directly from the question text.
  • For MCQs, the training pipeline now accepts both the option letter (e.g., "A") and the corresponding value as correct answers.
  • Removed the check_answer() reward function and consolidated its overlapping logic into check_numbers().
  • Fixed a bug where math_verify_func() failed during training: the underlying Math-Verify 'parse' function uses signal.alarm(), which only works in the main thread and is therefore incompatible with threaded environments. Replaced it with a manual math verification function with a 5-second timeout that runs safely in a multithreaded context.
  • Fixed normalize_final_answer() to handle mixed numbers (e.g., converting "3 \frac{1}{2}" to "3+\frac{1}{2}").
  • Added more unit tests for the new MCQ logic, verify_math capabilities, and enhanced normalization rules.
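The MCQ handling described above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: `parse_mcq_options` and `accepted_answers` are invented names, and the real parser in process_mcq() may handle more option formats than the simple "(A) … (B) …" pattern assumed here.

```python
import re

def parse_mcq_options(question):
    # Pull "(A) foo (B) bar"-style options out of the question text.
    # Each option value runs until the next parenthesized letter.
    return {m.group(1): m.group(2).strip()
            for m in re.finditer(r"\(([A-D])\)\s*([^()]*)", question)}

def accepted_answers(question, letter):
    # The training pipeline accepts both the option letter (e.g. "A")
    # and the corresponding option value as correct answers.
    opts = parse_mcq_options(question)
    return {letter, opts.get(letter, letter)}
```

For example, for the question "What is 2+2? (A) 3 (B) 4" with golden letter "B", both "B" and "4" would be treated as correct.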
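The signal.alarm() incompatibility arises because POSIX signals are only delivered to the main thread. One common thread-safe way to bound a verification call, shown here as a sketch (the PR's actual implementation may differ; `verify_with_timeout` and `check` are hypothetical names), is to run it in a worker and cap the wait:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

def verify_with_timeout(check, golden, guess, timeout_s=5.0):
    # signal.alarm() only fires in the main thread, so instead run the
    # (possibly slow) check in a worker thread and bound how long we
    # wait for its result. Note: a timed-out worker keeps running in
    # the background until it finishes; only the wait is bounded.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(check, golden, guess)
        try:
            return future.result(timeout=timeout_s)
        except FuturesTimeout:
            return False  # treat a hung verification as "not verified"
```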
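The mixed-number fix in normalize_final_answer() amounts to rewriting an implicit sum as an explicit one. A minimal sketch of that single rule (hypothetical helper name; the real function applies many more normalizations):

```python
import re

def normalize_mixed_number(answer):
    # Convert LaTeX mixed numbers like "3 \frac{1}{2}" into the
    # explicit sum "3+\frac{1}{2}" so downstream math parsers read
    # them as one value rather than two adjacent tokens.
    return re.sub(r"(\d+)\s+\\frac", r"\1+\\frac", answer)
```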

Tests

Tested on v5p-8:

python3 -m maxtext.trainers.post_train.rl.train_rl \
model_name=llama3.1-8b-Instruct \
run_name=$RUN_NAME \
base_output_directory=$BASE_OUTPUT_DIRECTORY \
batch_size=5 \
dataset_name=bethgelab/CuratedThoughts \
train_split=train \
hf_train_files=OpenR1-Math-220k-default/train-*.parquet  \
num_batches=5 \
scan_layers=True \
hbm_utilization_vllm=0.4 \
rollout_data_parallelism=2 \
rollout_tensor_parallelism=2 \
allow_split_physical_axes=true \
load_parameters_path=$MAXTEXT_CKPT_PATH \
skip_jax_distributed_system=True

python3 -m maxtext.trainers.post_train.rl.train_rl \
model_name=llama3.1-8b-Instruct \
run_name=$RUN_NAME \
base_output_directory=$BASE_OUTPUT_DIRECTORY \
batch_size=4 \
dataset_name=open-r1/OpenR1-Math-220k \
train_split=train \
hf_train_files=data/train-*.parquet  \
num_batches=5 \
scan_layers=True \
hbm_utilization_vllm=0.4 \
rollout_data_parallelism=2 \
rollout_tensor_parallelism=2 \
allow_split_physical_axes=true \
load_parameters_path=$MAXTEXT_CKPT_PATH

python3 -m maxtext.trainers.post_train.rl.train_rl \
model_name=llama3.1-8b-Instruct \
load_parameters_path=$MAXTEXT_CKPT_PATH \
run_name=$RUN_NAME \
base_output_directory=$BASE_OUTPUT_DIRECTORY \
dataset_name=nvidia/OpenMathInstruct-2 \
train_split=train_1M \
steps=10 \
rollout_tensor_parallelism=1 \
hf_train_files=data/train_1M-*.parquet \
skip_jax_distributed_system=True

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.


codecov Bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 9.18367% with 178 lines in your changes missing coverage. Please review.

Files with missing lines                            Patch %   Lines
src/maxtext/trainers/post_train/rl/utils_rl.py      10.76%    116 Missing ⚠️
src/maxtext/trainers/post_train/rl/evaluate_rl.py    7.69%     36 Missing ⚠️
src/maxtext/trainers/post_train/rl/train_rl.py       3.70%     26 Missing ⚠️


@SurbhiJainUSC force-pushed the rl_dataset branch 22 times, most recently from 6ec6d77 to dbd0cd4 on April 1, 2026 at 18:22
@SurbhiJainUSC SurbhiJainUSC marked this pull request as ready for review April 1, 2026 18:43
@SurbhiJainUSC SurbhiJainUSC changed the title Add open-r1/OpenR1-Math-220k dataset to RL Add open-r1/OpenR1-Math-220k and bethgelab/CuratedThoughts datasets to RL Apr 2, 2026
@A9isha A9isha requested a review from jacoguzo as a code owner April 8, 2026 21:24
Collaborator

@A9isha left a comment


Let's remember to add the condition that even if dataset_name and eval_dataset_name are the same, we still need to check whether the splits are also the same, and only then do a split of the dataset.
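The reviewer's suggested condition could be sketched like this. All names here (`needs_internal_split` and its parameters) are hypothetical, chosen only to illustrate the logic, not taken from the PR's config schema:

```python
def needs_internal_split(dataset_name, eval_dataset_name,
                         train_split, eval_split):
    # Only carve an eval set out of the training data when BOTH the
    # dataset name AND the split match. The same dataset with a
    # different eval split already provides a disjoint eval set, so
    # no internal split is needed in that case.
    return dataset_name == eval_dataset_name and train_split == eval_split
```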

@A9isha A9isha requested a review from abhinavclemson as a code owner April 9, 2026 20:53


3 participants