
Add open-r1/OpenR1-Math-220k dataset and nvidia/OpenMathReasoning to RL and fix reward function #3629

Merged
copybara-service[bot] merged 1 commit into main from rl-debug on Apr 23, 2026
Conversation

@SurbhiJainUSC (Collaborator) commented Apr 10, 2026

Description

Co-authored with @A9isha

This PR introduces significant improvements to the RL reward and evaluation logic for math-based datasets.

  • Added multiprocessing for math_verify to prevent trainer and evaluation hangs.

  • Added support for new datasets (OpenR1-Math-220k, OpenMathReasoning).

  • Reworked answer extraction and normalization to handle multiple-choice questions, accepting both the option letter and its value as correct.

  • Added evaluation modes such as majority voting (maj@K) and pass@1 estimation.

  • Synced weights to vLLM prior to the pre-RL evaluation.

  • Added are_equal_under_sympy() for symbolic equality checking as a fallback before math_verify.
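The multiprocessing change above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation: `grade` is a hypothetical stand-in for the real math_verify-based comparison, and the sketch assumes a POSIX host (it uses the "fork" start method). The point is the isolation pattern — a verification call that hangs on a pathological symbolic expression is killed after a timeout instead of stalling the trainer.

```python
import multiprocessing as mp


def grade(gold: str, pred: str) -> bool:
  # Hypothetical stand-in; the real code would call math_verify here.
  return gold.strip() == pred.strip()


def _worker(conn, gold: str, pred: str) -> None:
  # Runs in the child process; any error counts as incorrect.
  try:
    conn.send(grade(gold, pred))
  except Exception:
    conn.send(False)
  finally:
    conn.close()


def grade_with_timeout(gold: str, pred: str, timeout_s: float = 5.0) -> bool:
  """Run the grading call in a child process; kill it if it hangs."""
  ctx = mp.get_context("fork")  # POSIX-only; avoids re-importing __main__
  recv_end, send_end = ctx.Pipe(duplex=False)
  proc = ctx.Process(target=_worker, args=(send_end, gold, pred))
  proc.start()
  send_end.close()
  # Wait up to timeout_s for a result; a silent child means a hang.
  result = recv_end.recv() if recv_end.poll(timeout_s) else False
  proc.join(timeout=1.0)
  if proc.is_alive():
    proc.kill()  # hung symbolic comparison: reap it, score as incorrect
    proc.join()
  return result
```

A pooled version would keep long-lived worker processes instead of forking per call, but the timeout-and-kill contract is the same.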

Tests

Tested on the OpenMathInstruct and OpenR1 datasets.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov codecov Bot commented Apr 10, 2026

@SurbhiJainUSC force-pushed the rl-debug branch 8 times, most recently from ffcdb16 to 5b5bcfd on April 15, 2026
@SurbhiJainUSC force-pushed the rl-debug branch 2 times, most recently from def1309 to cf7471d on April 20, 2026
@SurbhiJainUSC changed the title from "Rl debug" to "Add open-r1/OpenR1-Math-220k dataset and nvidia/OpenMathReasoning to RL and fix reward function" on Apr 20, 2026
@SurbhiJainUSC force-pushed the rl-debug branch 6 times, most recently from 50b9d63 to 0f10655 on April 20, 2026
@github-actions github-actions Bot commented: 🤖 Hi @SurbhiJainUSC, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions github-actions Bot left a comment
## 📋 Review Summary

This Pull Request introduces significant improvements to the Reinforcement Learning (RL) reward and evaluation logic, specifically for math-based datasets. Key highlights include the addition of multiprocessing for math_verify to prevent trainer hangs, support for new datasets (OpenR1-Math-220k, OpenMathReasoning), and more robust evaluation modes like majority voting and pass@1 estimation. The inclusion of comprehensive unit tests for the new grading logic and multiprocessing pool is a strong positive.

🔍 General Feedback

  • Multiprocessing for Math Verification: The move to a process-isolated, timeout-bounded math_verify pool is a critical improvement for stability, especially when dealing with complex symbolic computations that might hang.
  • Improved Answer Extraction: The new answer extraction and normalization logic is more robust and handles Multiple Choice Questions (MCQ) better by allowing both option letters and values as correct.
  • Evaluation Modes: The addition of maj@K and pass@1 metrics provides a more complete picture of model performance beyond simple pass@K.
  • LaTeX Normalization: The fix_latex_escaping utility addresses common issues with improperly escaped LaTeX strings, although it requires some tuning to avoid over-correcting common English words.
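The maj@K and pass@1 modes mentioned above can be sketched as follows. This is an illustrative sketch, not the PR's actual code: function names are made up, and `pass_at_k` is the standard unbiased estimator (pass@1 reduces to the fraction of correct samples, c/n).

```python
from collections import Counter
from math import comb


def maj_at_k(sampled_answers: list[str], gold: str) -> bool:
  """maj@K: True if the most common of K sampled answers matches gold."""
  most_common, _count = Counter(sampled_answers).most_common(1)[0]
  return most_common == gold


def pass_at_k(n: int, c: int, k: int) -> float:
  """Unbiased pass@k estimate from n samples of which c are correct.

  This is the probability that at least one of k answers drawn without
  replacement from the n samples is correct; pass@1 is the k=1 case.
  """
  if n - c < k:  # fewer than k incorrect samples: a correct one is guaranteed
    return 1.0
  return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 samples of which 5 are correct, pass@1 is 0.5, while maj@K depends on which answer the incorrect samples cluster on.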

@SurbhiJainUSC force-pushed the rl-debug branch 4 times, most recently from ef788e9 to 7815e66 on April 21, 2026
@SurbhiJainUSC force-pushed the rl-debug branch 8 times, most recently from 6476df0 to 7668316 on April 22, 2026
@NicoGrande (Collaborator) left a comment

Looks good overall! In general, can we add type hints to arguments / return types in the new functions we are adding? It makes it much easier to understand the intent of the function and reason about correctness!

@SurbhiJainUSC force-pushed the rl-debug branch 3 times, most recently from a8611e6 to f1002cf on April 22, 2026
@SurbhiJainUSC (Collaborator, Author) replied:

Looks good overall! In general, can we add type hints to arguments / return types in the new functions we are adding? It makes it much easier to understand the intent of the function and reason about correctness!

That's a great suggestion.

@NicoGrande (Collaborator) left a comment

LGTM

…RL and fix reward function

Co-authored-by: A9isha <mazumdera@google.com>
copybara-service[bot] merged commit c54d88f into main on Apr 23, 2026
41 of 43 checks passed
copybara-service[bot] deleted the rl-debug branch on April 23, 2026 at 22:24