add tracer in v1 to log generator perf metrics #720

Merged: JenniferWang merged 1 commit into main from export-D91038187 on Jan 26, 2026
@JenniferWang
Contributor

Summary:

tl;dr

Add tracer in v1 to log perf metrics to wandb

V0 vs V1 Metrics Parity Comparison

| Category | v0 Metric | v1 Metric | Parity |
|----------|-----------|-----------|--------|
| **Generate - Request Count** | `generator/generate/count_requests` (SUM) | `generator/generate/count_requests` (SUM) | ✅ Same |
| **Generate - Completion Count** | `generator/generate/count_sequences_completed` (SUM) | `generator/generate/count_sequences_completed` (SUM) | ✅ Same |
| **Generate - E2E Timing** | `generator_perf/generate/*` (Tracer, GPU) | `generator_perf/generate/*` (Tracer, GPU) | ✅ Same |
| **Update - Pending Requests** | `generator_perf/update_weights/sum_pending_gen_requests` (SUM) | N/A - AsyncLLM handles internally | ⚠️ Skip (by design) |
| **Update - Wait for Generation** | `generator_perf/update_weights/avg_waiting_for_generation_duration_s` (MEAN) | `generator_perf/update_weights/pause_generation_duration_s` (MEAN) | ✅ Equivalent - renamed for clarity |
| **Update - Fetch Weights** | `generator_perf/update_weights/wait_fetch_weights` (MEAN) | `generator_perf/update_weights/worker_load_weights_duration_s` (MEAN) | ✅ Equivalent - renamed for clarity |
| **Worker - Update Timing** | `generator_perf/update_weights/generator_worker_update/*` (trace, GPU) | `generator_perf/update_weights/generator_worker_update/*` (trace, GPU) | ✅ Same |
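
For anyone comparing v0 and v1 dashboards, the renames in the parity table can be written down as a lookup. The sketch below is illustrative only: the metric names are copied from the table, while `V0_TO_V1` and `translate_v0_metrics` are hypothetical helpers and not part of this PR.

```python
# Hypothetical helper, not part of this PR. Metric names come from the parity
# table; the dict and function names are made up for illustration.
V0_TO_V1 = {
    # Skipped by design in v1 (AsyncLLM handles pending requests internally).
    "generator_perf/update_weights/sum_pending_gen_requests": None,
    # Renamed for clarity.
    "generator_perf/update_weights/avg_waiting_for_generation_duration_s":
        "generator_perf/update_weights/pause_generation_duration_s",
    "generator_perf/update_weights/wait_fetch_weights":
        "generator_perf/update_weights/worker_load_weights_duration_s",
}


def translate_v0_metrics(v0_metrics: dict) -> dict:
    """Rename v0 metric keys to their v1 equivalents, dropping skipped ones."""
    v1_metrics = {}
    for name, value in v0_metrics.items():
        # Names not listed above (the "Same" rows) pass through unchanged.
        v1_name = V0_TO_V1.get(name, name)
        if v1_name is not None:
            v1_metrics[v1_name] = value
    return v1_metrics
```

A v0 run's metrics dict passed through `translate_v0_metrics` can then be compared key by key against a v1 run.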

Test Plan

Main GRPO app: python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml

wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run drawn-waterfall-686
wandb: ⭐️ View project at https://meta.wandb.io/jiyue/grpo-training
wandb: 🚀 View run at https://meta.wandb.io/jiyue/grpo-training/runs/6pltx38p
wandb: Detected [openai] in use.
....
rvability.metric_actors.GlobalLoggingActor global_logger>] === [global_reduce] - METRICS STEP 1 ===
  ...
  generator/generate/count_requests: 13.0
  generator/generate/count_sequences_completed: 96.0
  generator_perf/generate/total_duration_avg_s: 3.6518315022786463
  generator_perf/generate/total_duration_max_s: 9.2080615234375
  generator_perf/update_weights/pause_generation_duration_s: 2.8634108749683946
  generator_perf/update_weights/resume_generation_duration_s: 1.918897032737732e-05
  generator_perf/update_weights/worker_load_weights_duration_s: 3.506648204056546
  ...
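
To check a run like the one above without reading the log by eye, the `name: value` lines can be parsed and compared against the expected v1 metric names. This is a rough sketch that assumes the log keeps the format shown here; `parse_metrics`, `missing_metrics`, and `EXPECTED_V1` are hypothetical names, not forge APIs.

```python
import re

# Matches lines such as "  generator/generate/count_requests: 13.0".
# The number pattern also accepts scientific notation (e.g. 1.9e-05).
METRIC_LINE = re.compile(
    r"^\s*([\w/]+):\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)\s*$"
)

# Metric names taken from the log excerpt above.
EXPECTED_V1 = {
    "generator/generate/count_requests",
    "generator/generate/count_sequences_completed",
    "generator_perf/generate/total_duration_avg_s",
    "generator_perf/generate/total_duration_max_s",
    "generator_perf/update_weights/pause_generation_duration_s",
    "generator_perf/update_weights/resume_generation_duration_s",
    "generator_perf/update_weights/worker_load_weights_duration_s",
}


def parse_metrics(log_text: str) -> dict:
    """Collect `name: number` lines; wandb banners and "..." lines are skipped."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.match(line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


def missing_metrics(log_text: str) -> set:
    """Return the expected metric names that never appear in the log."""
    return EXPECTED_V1 - parse_metrics(log_text).keys()
```

An empty set from `missing_metrics` means every metric listed in the excerpt was reported at least once.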

Make sure integration tests that do not initialize the tracer still work:
pytest tests/integration_tests/test_generator_lifecycle.py -v -s
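
The property this test protects is that traced code paths keep working when no metric logger has been set up. The sketch below shows one generic way to get that behavior. It is not forge's tracer: `trace`, `init_sink`, and the `_duration_s` suffix convention are assumptions made for illustration.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Optional

# Hypothetical sink. None models a process (e.g. an integration test) in which
# no metric logger was ever initialized.
_record: Optional[Callable[[str, float], None]] = None


def init_sink(record: Optional[Callable[[str, float], None]]) -> None:
    """Install (or clear) the callback that receives (metric_name, seconds)."""
    global _record
    _record = record


@contextmanager
def trace(name: str) -> Iterator[None]:
    """Time the wrapped block; do nothing if no sink is installed."""
    record = _record  # snapshot so enter and exit agree on the sink
    if record is None:
        yield
        return
    start = time.perf_counter()
    try:
        yield
    finally:
        record(f"{name}_duration_s", time.perf_counter() - start)
```

With no sink installed, `with trace("generator_perf/update_weights/pause_generation"): ...` runs the body and records nothing, which is the situation the lifecycle test exercises.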

Next Steps

[ ] implement the prefetch logic & shared memory
[-] Add metric similar to generator v0
[ ] Perf/Throughput testing compared to generator v0

Differential Revision: D91038187

@meta-codesync

meta-codesync Bot commented Jan 21, 2026

@JenniferWang has exported this pull request. If you are a Meta employee, you can view the originating Diff in D91038187.

@meta-cla meta-cla Bot added the CLA Signed label Jan 21, 2026
@JenniferWang JenniferWang linked an issue Jan 21, 2026 that may be closed by this pull request
2 tasks
facebook-github-bot pushed a commit that referenced this pull request Jan 21, 2026
Contributor

@allenwang28 allenwang28 left a comment


Review automatically exported from Phabricator review in Meta.

facebook-github-bot pushed a commit that referenced this pull request Jan 23, 2026
Reviewed By: allenwang28

Differential Revision: D91038187
facebook-github-bot pushed a commit that referenced this pull request Jan 23, 2026
facebook-github-bot pushed a commit that referenced this pull request Jan 26, 2026
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.40%. Comparing base (080770c) to head (dc35fed).
⚠️ Report is 14 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #720      +/-   ##
==========================================
- Coverage   78.33%   71.40%   -6.93%     
==========================================
  Files          36       41       +5     
  Lines        4209     4288      +79     
==========================================
- Hits         3297     3062     -235     
- Misses        912     1226     +314     
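
The percentages in the diff can be reproduced from the hit and line counts. The sketch below assumes coverage is hits divided by lines and that the displayed value is truncated to two decimals rather than rounded. That truncation is inferred from these numbers (3062/4288 is about 71.4086%, shown as 71.40%) and is not confirmed Codecov behavior.

```python
def coverage_hundredths(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated (assumption, see above)."""
    return hits * 10_000 // lines


# Counts taken from the diff: hits + misses equals lines on both sides.
assert 3297 + 912 == 4209
assert 3062 + 1226 == 4288

base = coverage_hundredths(3297, 4209)  # main
head = coverage_hundredths(3062, 4288)  # this PR
print(f"{base / 100:.2f}% -> {head / 100:.2f}% ({(head - base) / 100:+.2f}%)")
# prints "78.33% -> 71.40% (-6.93%)"
```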

@JenniferWang JenniferWang merged commit 58bf8e3 into main Jan 26, 2026
12 checks passed
HosseinKaviani-H pushed a commit to HosseinKaviani-H/forge that referenced this pull request Feb 9, 2026
Labels

CLA Signed, fb-exported, meta-exported

Development

Successfully merging this pull request may close these issues.

[vLLM v0.13] Re-architect forge's integration with vLLM (generator.py)

4 participants