Commit a5c61ef
Pooya Moradi
Add eval_batch_size knob for faster post-train RL evaluation
Post-train RL evaluation is batched at trainer_config.batch_size, which
for GRPO is intentionally small (e.g. 4 prompts per training step ×
8 generations = 32 trajectories per step, set by the GRPO recipe to
keep trainer HBM workable for the backward pass). At eval time this
is wasteful: eval is greedy decode only (no backward pass), so the
trainer memory budget doesn't apply, and the vLLM rollout has many DP
replicas sitting idle when only 4 prompts are dispatched per batch.
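
To make the idle-replica point concrete, here is a rough sketch of how a batch of prompts spreads over data-parallel replicas. It assumes 8 DP replicas (the figure used later in this message), one prompt per replica slot, and an even split; the real vLLM dispatch logic may differ, and the function names are hypothetical.

```python
def prompts_per_replica(num_prompts: int, num_replicas: int) -> list[int]:
    """Evenly split prompts over replicas (assumed dispatch policy, not vLLM's actual one)."""
    base, extra = divmod(num_prompts, num_replicas)
    # The first `extra` replicas take one additional prompt.
    return [base + (1 if i < extra else 0) for i in range(num_replicas)]


def idle_replicas(num_prompts: int, num_replicas: int) -> int:
    """Count replicas that receive no prompt for this batch."""
    return sum(1 for n in prompts_per_replica(num_prompts, num_replicas) if n == 0)


# With the GRPO training batch size of 4 and 8 replicas, half the replicas get nothing.
small_batch_idle = idle_replicas(4, 8)
# With an eval batch of 128, every replica gets 16 prompts.
large_batch_idle = idle_replicas(128, 8)
```

Under these assumptions, a batch of 4 leaves 4 of the 8 replicas without work, while a batch of 128 keeps all of them busy.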
Add an `eval_batch_size` knob (default -1 = use batch_size, preserving
old behavior) that overrides the batch dimension during dataset
preparation for the test split. Setting it to e.g. 128 on a sampler
with 8 DP replicas gives a ~32x eval throughput improvement on TPU
without affecting training behavior.
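
The override described above could look roughly like the following. This is a minimal sketch, not the code from this commit: the helper names and the "train"/"test" split labels are hypothetical, and only the `-1` sentinel and the test-split-only scope come from the message.

```python
def resolve_eval_batch_size(batch_size: int, eval_batch_size: int = -1) -> int:
    """Return the batch size for the test split; -1 falls back to batch_size."""
    if eval_batch_size == -1:
        return batch_size
    if eval_batch_size <= 0:
        # Assumed validation; the commit message does not say how invalid values are handled.
        raise ValueError(f"eval_batch_size must be -1 or positive, got {eval_batch_size}")
    return eval_batch_size


def batch_size_for_split(split: str, batch_size: int, eval_batch_size: int = -1) -> int:
    """Only the test split sees the override; training keeps batch_size."""
    if split == "test":
        return resolve_eval_batch_size(batch_size, eval_batch_size)
    return batch_size


default_eval = batch_size_for_split("test", batch_size=4)                         # old behavior
large_eval = batch_size_for_split("test", batch_size=4, eval_batch_size=128)      # override applies
train_unchanged = batch_size_for_split("train", batch_size=4, eval_batch_size=128)
```

Keeping the fallback in one helper means existing configs that never set the knob keep evaluating at `batch_size`.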
Total eval examples = num_test_batches * eval_batch_size, so users
should adjust num_test_batches when increasing eval_batch_size to keep
total eval set size constant.
1 parent 493fba6 commit a5c61ef
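
The num_test_batches adjustment from the commit message can be worked through as below. The helper is a hypothetical illustration, and the starting value of 64 test batches is an assumed example rather than a value from the repository.

```python
def rescale_num_test_batches(num_test_batches: int, old_batch_size: int, new_batch_size: int) -> int:
    """Pick a batch count that covers the same number of eval examples at the new batch size."""
    total_examples = num_test_batches * old_batch_size
    # Ceiling division: round up so no examples are dropped.
    return -(-total_examples // new_batch_size)


# 64 batches of 4 = 256 examples; at eval_batch_size=128 that is 2 batches.
exact = rescale_num_test_batches(64, old_batch_size=4, new_batch_size=128)
# 10 batches of 4 = 40 examples; one batch of 128 is the smallest count that covers them.
rounded_up = rescale_num_test_batches(10, old_batch_size=4, new_batch_size=128)
```

When the old total is not a multiple of the new batch size, rounding up means the eval set grows (40 examples become up to 128 in the second case), so an exact match needs a batch size that divides the original total.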
3 files changed
Lines changed: 21 additions & 2 deletions