Commit 87c9f3f
Pooya Moradi
Add eval_batch_size knob for faster post-train RL evaluation
Post-train RL evaluation is batched at trainer_config.batch_size, which
for GRPO is intentionally small (e.g. 4 prompts per training step ×
8 generations = 32 trajectories per step — set by the GRPO recipe to
keep trainer HBM workable for the backward pass). At eval time this
is wasteful: eval is greedy decode only (no backward), so the trainer
budget doesn't apply, and vLLM rollout has many DP replicas sitting
idle when only 4 prompts are dispatched per batch.
Add an `eval_batch_size` knob (default -1 = use batch_size, preserving
old behavior) that overrides the batch dimension during dataset
preparation for the test split. Setting it to e.g. 128 on a sampler
with 8 DP replicas gives a ~32x eval throughput improvement on TPU
without affecting training behavior.
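
The fallback rule described above can be sketched as follows. This is a minimal illustration, not the code in this commit: the helper names `resolve_eval_batch_size` and `batch_size_for_split` and the `"test"` split label are assumptions made for the example.

```python
def resolve_eval_batch_size(batch_size: int, eval_batch_size: int = -1) -> int:
    """Hypothetical helper: return the batch size to use for evaluation.

    -1 is the sentinel meaning "use the training batch_size".
    """
    if eval_batch_size == -1:
        return batch_size
    if eval_batch_size <= 0:
        raise ValueError(f"eval_batch_size must be -1 or positive, got {eval_batch_size}")
    return eval_batch_size


def batch_size_for_split(split: str, batch_size: int, eval_batch_size: int = -1) -> int:
    """Hypothetical helper: only the test split sees the override."""
    if split == "test":
        return resolve_eval_batch_size(batch_size, eval_batch_size)
    return batch_size
```

With `batch_size=4`, leaving the knob at -1 keeps eval at 4 prompts per batch, while `eval_batch_size=128` changes only the test split and leaves the training split at 4.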
Total eval examples = num_test_batches * eval_batch_size, so users
should adjust num_test_batches when increasing eval_batch_size to keep
total eval set size constant.

1 parent c2d7758 commit 87c9f3f
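
The sizing rule in the commit message (total eval examples = num_test_batches * eval_batch_size) implies a simple rescaling when the batch size grows. The sketch below is an illustration only; `rescale_num_test_batches` is a hypothetical helper, not part of this change, and it rounds up so the eval set never shrinks.

```python
import math


def rescale_num_test_batches(
    num_test_batches: int, old_batch_size: int, new_eval_batch_size: int
) -> int:
    """Hypothetical helper: keep the total eval example count (roughly) constant."""
    total_examples = num_test_batches * old_batch_size
    # Round up: if the sizes do not divide evenly, evaluate slightly more
    # examples rather than silently dropping some.
    return math.ceil(total_examples / new_eval_batch_size)


# 64 batches of 4 prompts is 256 examples; at 128 prompts per batch, 2 batches cover them.
new_num_test_batches = rescale_num_test_batches(64, 4, 128)
```

When the old total is not a multiple of the new batch size, the rounded-up value evaluates a few more examples than before.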
3 files changed
Lines changed: 21 additions & 2 deletions
File 1 of 3: @@ -123,6 +123,12 @@ 6 lines added (new lines 126-131); diff content not preserved.
File 2 of 3: @@ -2020,6 +2020,7 @@ 1 line added (new line 2023); diff content not preserved.
File 3 of 3: @@ -343,11 +343,23 @@ 12 lines added (new lines 346-357) and 2 lines modified (old 348 and 350, new 360 and 362); diff content not preserved.