Add eval_batch_size knob for faster post-train RL evaluation
Post-train RL evaluation is batched at trainer_config.batch_size, which
for GRPO is intentionally small (e.g. 4 prompts per training step ×
8 generations = 32 trajectories; anything larger blows out the KV cache
for the trainer). At eval time this is wasteful: the vLLM rollout has many
DP replicas sitting idle when only 4 prompts are dispatched per batch.
Add an `eval_batch_size` knob (default -1 = use batch_size, preserving
old behavior) that overrides the batch dimension during dataset
preparation for the test split. Setting it to e.g. 128 on a sampler
with 8 DP replicas gives a ~32x eval throughput improvement on TPU
without affecting training behavior.
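
The default described above can be sketched as follows. This is an illustration only, not the code in this commit: the helper names `resolve_eval_batch_size` and `batch_size_for_split` and the split labels "train" and "test" are assumptions made for the example.

```python
def resolve_eval_batch_size(batch_size: int, eval_batch_size: int = -1) -> int:
    """Return the batch size to use for the test split.

    Hypothetical helper: -1 is the sentinel meaning "reuse batch_size",
    which keeps the old behavior when the knob is left unset.
    """
    if eval_batch_size == -1:
        return batch_size
    if eval_batch_size <= 0:
        raise ValueError(f"eval_batch_size must be -1 or positive, got {eval_batch_size}")
    return eval_batch_size


def batch_size_for_split(split: str, batch_size: int, eval_batch_size: int = -1) -> int:
    """Pick the batch dimension per split; only the test split is overridden."""
    if split == "test":
        return resolve_eval_batch_size(batch_size, eval_batch_size)
    return batch_size  # training batches are never affected by the knob
```

With batch_size=4 and eval_batch_size=128, each eval batch carries 128 / 4 = 32 times as many prompts as before, while the train split still uses 4.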
Total eval examples = num_test_batches * eval_batch_size, so users
should adjust num_test_batches when increasing eval_batch_size to keep
total eval set size constant.
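
A minimal sketch of that adjustment, assuming the relation above holds exactly. The helper name `rescale_num_test_batches` is hypothetical and not part of this change; it refuses sizes that do not divide evenly rather than silently changing the eval set.

```python
def rescale_num_test_batches(
    num_test_batches: int, old_batch_size: int, new_eval_batch_size: int
) -> int:
    """Return the num_test_batches that keeps the total eval set size unchanged.

    Hypothetical helper based on:
        total eval examples = num_test_batches * eval_batch_size
    """
    if new_eval_batch_size <= 0:
        raise ValueError("new_eval_batch_size must be positive")
    total_examples = num_test_batches * old_batch_size
    if total_examples % new_eval_batch_size != 0:
        raise ValueError(
            f"{total_examples} eval examples do not divide evenly into "
            f"batches of {new_eval_batch_size}"
        )
    return total_examples // new_eval_batch_size


# Example: 64 batches of 4 prompts = 256 examples, so at
# eval_batch_size=128 the same set needs 2 batches.
new_num_test_batches = rescale_num_test_batches(64, 4, 128)
```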