add tracer in v1 to log generator perf metrics #720
Merged
@JenniferWang has exported this pull request. If you are a Meta employee, you can view the originating Diff in D91038187.
6ffde9c to
072695e
Compare
facebook-github-bot
pushed a commit
that referenced
this pull request
Jan 21, 2026
Summary:

## tl;dr

Add tracer in v1 to log perf metrics to wandb

## V0 vs V1 Metrics Parity Comparison

| Category | v0 Metric | v1 Metric | Parity |
|----------|-----------|-----------|--------|
| **Generate - Request Count** | `generator/generate/count_requests` (SUM) | `generator/generate/count_requests` (SUM) | ✅ Same |
| **Generate - Completion Count** | `generator/generate/count_sequences_completed` (SUM) | `generator/generate/count_sequences_completed` (SUM) | ✅ Same |
| **Generate - E2E Timing** | `generator_perf/generate/*` (Tracer, GPU) | `generator_perf/generate/*` (Tracer, GPU) | ✅ Same |
| **Update - Pending Requests** | `generator_perf/update_weights/sum_pending_gen_requests` (SUM) | N/A - AsyncLLM handles internally | ⚠️ Skip (by design) |
| **Update - Wait for Generation** | `generator_perf/update_weights/avg_waiting_for_generation_duration_s` (MEAN) | `generator_perf/update_weights/pause_generation_duration_s` (MEAN) | ✅ Equivalent - renamed for clarity |
| **Update - Fetch Weights** | `generator_perf/update_weights/wait_fetch_weights` (MEAN) | `generator_perf/update_weights/worker_load_weights_duration_s` (MEAN) | ✅ Equivalent - renamed for clarity |
| **Worker - Update Timing** | `generator_perf/update_weights/generator_worker_update/*` (trace, GPU) | `generator_perf/update_weights/generator_worker_update/*` (trace, GPU) | ✅ Same |

## Test Plan

Main GRPO app: `python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml`

```
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run drawn-waterfall-686
wandb: ⭐️ View project at https://meta.wandb.io/jiyue/grpo-training
wandb: 🚀 View run at https://meta.wandb.io/jiyue/grpo-training/runs/6pltx38p
wandb: Detected [openai] in use.
....
rvability.metric_actors.GlobalLoggingActor global_logger>] === [global_reduce] - METRICS STEP 1 ===
...
generator/generate/count_requests: 13.0
generator/generate/count_sequences_completed: 96.0
generator_perf/generate/total_duration_avg_s: 3.6518315022786463
generator_perf/generate/total_duration_max_s: 9.2080615234375
generator_perf/update_weights/pause_generation_duration_s: 2.8634108749683946
generator_perf/update_weights/resume_generation_duration_s: 1.918897032737732e-05
generator_perf/update_weights/worker_load_weights_duration_s: 3.506648204056546
...
```

Make sure integration tests that do not initialize the tracer still work:

`pytest tests/integration_tests/test_generator_lifecycle.py -v -s`

## Next Steps

[ ] implement the prefetch logic & shared memory
[-] Add metric similar to generator v0
[ ] Perf/Throughput testing compared to generator v0

Differential Revision: D91038187
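For anyone comparing v0 and v1 runs side by side, the renames in the parity table can be captured in a small lookup. The following is a minimal sketch, not code from this PR: the names `V0_TO_V1_RENAMES`, `V0_ONLY`, `to_v1_metric_name`, and `translate_metrics` are hypothetical helpers, and only the metric name strings come from the table above. Metrics marked "Same" pass through unchanged.

```python
from typing import Dict, Optional

# Renames copied from the parity table (v0 name -> v1 name).
# The dictionary itself is a hypothetical helper, not part of the PR.
V0_TO_V1_RENAMES: Dict[str, str] = {
    "generator_perf/update_weights/avg_waiting_for_generation_duration_s":
        "generator_perf/update_weights/pause_generation_duration_s",
    "generator_perf/update_weights/wait_fetch_weights":
        "generator_perf/update_weights/worker_load_weights_duration_s",
}

# v0 metrics that the table marks as skipped by design in v1.
V0_ONLY = frozenset({
    "generator_perf/update_weights/sum_pending_gen_requests",
})


def to_v1_metric_name(v0_name: str) -> Optional[str]:
    """Return the v1 name for a v0 metric, or None if v1 has no counterpart."""
    if v0_name in V0_ONLY:
        return None
    # Metrics listed as "Same" in the table keep their v0 name.
    return V0_TO_V1_RENAMES.get(v0_name, v0_name)


def translate_metrics(v0_metrics: Dict[str, float]) -> Dict[str, float]:
    """Re-key a dict of v0 metrics under v1 names, dropping v0-only entries."""
    translated: Dict[str, float] = {}
    for name, value in v0_metrics.items():
        v1_name = to_v1_metric_name(name)
        if v1_name is not None:
            translated[v1_name] = value
    return translated
```

Calling `translate_metrics` on a dict of v0 values keyed by metric name would let both generator versions be plotted under the v1 names. Whether the renamed metrics are numerically comparable is only asserted by the table's "Equivalent" label, so treat cross-version comparisons with care.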
allenwang28
approved these changes
Jan 22, 2026
Contributor
allenwang28
left a comment
Review automatically exported from Phabricator review in Meta.
facebook-github-bot
pushed a commit
that referenced
this pull request
Jan 23, 2026
Reviewed By: allenwang28
Differential Revision: D91038187
facebook-github-bot
pushed a commit
that referenced
this pull request
Jan 23, 2026
facebook-github-bot
pushed a commit
that referenced
this pull request
Jan 26, 2026
072695e to
dc35fed
Compare
allenwang28
approved these changes
Jan 26, 2026
felipemello1
approved these changes
Jan 26, 2026
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #720 +/- ##
==========================================
- Coverage 78.33% 71.40% -6.93%
==========================================
Files 36 41 +5
Lines 4209 4288 +79
==========================================
- Hits 3297 3062 -235
- Misses 912 1226 +314
☔ View full report in Codecov by Sentry.
dc35fed to
4f53917
Compare
HosseinKaviani-H
pushed a commit
to HosseinKaviani-H/forge
that referenced
this pull request
Feb 9, 2026