Add time/per_group_wall_time metric#1656
Conversation
…ime). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request introduces the time/per_group_wall_time metric, which calculates the mean generation time across a batch to provide a more consistent timing signal. The implementation updates the TokenStatistics data structure, the data loader's batch aggregation logic, and the training loop's metric logging. Review feedback suggests moving the changelog entry to the 'Added' section for better organization and optimizing the mean calculation by using a running sum instead of a list to improve memory efficiency and simplify the code.
…ntry to Added. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tchData. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ply@anthropic.com>
| total_tokens = result.token_statistics.num_prompt_tokens + result.token_statistics.num_response_tokens | ||
| step_metrics["val/actor_tokens_per_second"] = total_tokens / result.token_statistics.generation_time | ||
| step_metrics["time/getting_response"] = result.token_statistics.generation_time | ||
| step_metrics["time/per_group_wall_time"] = result.token_statistics.mean_per_group_wall_time |
There was a problem hiding this comment.
would median or mean be better? I worry about weird outlier group times?
There was a problem hiding this comment.
I like mean for exactly that reason, it'll capture the aggregate times. I worry with the median we won't see the worst times.
Summary
time/per_group_wall_timemetric: the mean of per-group generation wall-clock times across the batch.time/getting_response, which is the max of per-group times and gets dominated by tail-inflated groups when many requests are in flight (high--num_async_steps, many engines, etc.).metadata["start_time"]inadd_request;current_timeinfinalize_completed_request); per-groupgeneration_timeis already onTokenStatistics. This change just plumbs the mean throughmake_batch_from_groupsand emits it as a new metric.Applies to both
grpo_fast.pyandgrpo.pysince they shareDataPreparationActor.Test plan
scripts/train/debug/single_gpu_grpo.shsucceeds;time/per_group_wall_timeappears in wandb.scripts/train/qwen/qwen3_4b_dapo_math_oc.sh(50 steps): inspecttime/per_group_wall_timevstime/getting_response; mean should be <= max.🤖 Generated with Claude Code