prefetch weights while waiting for pending requests to complete #728

Merged
JenniferWang merged 1 commit into main from export-D91092833
Jan 27, 2026
@JenniferWang
Contributor

Summary:
Feature parity with v0: allow prefetching weights while waiting for the pending requests to finish.
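
As an illustration of the idea (not the code in this PR), the overlap can be sketched with `asyncio`: the fetch into a staging buffer starts immediately and runs concurrently with the wait for in-flight requests, instead of starting only after they finish. The names `drain_pending_requests`, `prefetch_weights`, and `update_weights`, and the sleep-based timings, are hypothetical stand-ins.

```python
import asyncio
import time


async def drain_pending_requests(drain_s: float) -> None:
    """Hypothetical stand-in: wait for in-flight requests to finish."""
    await asyncio.sleep(drain_s)


async def prefetch_weights(names: list[str], fetch_s: float) -> dict[str, bytes]:
    """Hypothetical stand-in: copy new weights into a staging buffer."""
    await asyncio.sleep(fetch_s)
    return {name: b"" for name in names}


async def update_weights(
    names: list[str], prefetch: bool, drain_s: float = 0.2, fetch_s: float = 0.2
) -> float:
    """Return the wall-clock seconds spent before the weights could be swapped in."""
    start = time.perf_counter()
    if prefetch:
        # Fetch and drain run concurrently, so the cost is roughly max(drain, fetch).
        _, staged = await asyncio.gather(
            drain_pending_requests(drain_s), prefetch_weights(names, fetch_s)
        )
    else:
        # Fetch starts only after the drain, so the cost is roughly drain + fetch.
        await drain_pending_requests(drain_s)
        staged = await prefetch_weights(names, fetch_s)
    assert set(staged) == set(names)  # every weight is staged before the swap
    return time.perf_counter() - start


if __name__ == "__main__":
    names = ["layer0.weight", "layer1.weight"]
    sequential = asyncio.run(update_weights(names, prefetch=False))
    overlapped = asyncio.run(update_weights(names, prefetch=True))
    print(f"sequential: {sequential:.2f}s, overlapped: {overlapped:.2f}s")
```

In this toy model the saving is bounded by the shorter of the two phases; how much of the real fetch can be hidden depends on how long the pending requests take to drain.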

Test Plan

Introduced a benchmark that simulates the on-going requests with actual weight sync logic.

Reference Group (V0)

================================================================================
WEIGHT SYNC BENCHMARK RESULTS
================================================================================
Model: Qwen/Qwen3-8B
Model size: 15.26 GB
Iterations: 3
Prefetch enabled: False
--------------------------------------------------------------------------------
Metric                         Time (s)        Throughput (GB/s)   
--------------------------------------------------------------------------------
Avg push_weights                      5.102 s         2.99 GB/s
Avg update_weights                   43.738 s         0.35 GB/s
Avg total (push + update)            48.840 s
================================================================================

================================================================================
WEIGHT SYNC BENCHMARK RESULTS
================================================================================
Model: Qwen/Qwen3-8B
Model size: 15.26 GB
Iterations: 3
Prefetch enabled: True
Fetcher procs: 8
--------------------------------------------------------------------------------
Metric                         Time (s)        Throughput (GB/s)   
--------------------------------------------------------------------------------
Avg push_weights                      5.208 s         2.93 GB/s
Avg update_weights                   29.602 s         0.52 GB/s
Avg total (push + update)            34.810 s
================================================================================

Test Group (V1)

================================================================================
WEIGHT SYNC BENCHMARK RESULTS
================================================================================
Model: Qwen/Qwen3-8B
Model size: 15.26 GB
Iterations: 3
Prefetch enabled: False
--------------------------------------------------------------------------------
Metric                         Time (s)        Throughput (GB/s)   
--------------------------------------------------------------------------------
Avg push_weights                      5.070 s         3.01 GB/s
Avg update_weights                   39.974 s         0.38 GB/s
Avg total (push + update)            45.044 s
================================================================================

================================================================================
WEIGHT SYNC BENCHMARK RESULTS
================================================================================
Model: Qwen/Qwen3-8B
Model size: 15.26 GB
Iterations: 3
Prefetch enabled: True
Fetcher procs: 8
--------------------------------------------------------------------------------
Metric                         Time (s)        Throughput (GB/s)   
--------------------------------------------------------------------------------
Avg push_weights                      5.055 s         3.02 GB/s
Avg update_weights                   28.730 s         0.53 GB/s
Avg total (push + update)            33.784 s
================================================================================
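
A quick check of the arithmetic in the tables above, as a sketch: the throughput column matches model size divided by time, and the relative effect of prefetch on `update_weights` can be derived from the reported averages. The timings are copied from the results; the helper names (`throughput_gb_s`, `update_time_reduction`, `total_speedup`) are illustrative only.

```python
MODEL_SIZE_GB = 15.26

# (avg push_weights seconds, avg update_weights seconds), keyed by
# (generator version, prefetch enabled), copied from the results above.
results = {
    ("v0", False): (5.102, 43.738),
    ("v0", True): (5.208, 29.602),
    ("v1", False): (5.070, 39.974),
    ("v1", True): (5.055, 28.730),
}


def throughput_gb_s(seconds: float) -> float:
    """Throughput as reported in the tables: model size over elapsed time."""
    return MODEL_SIZE_GB / seconds


def update_time_reduction(version: str) -> float:
    """Fraction of update_weights time saved by enabling prefetch."""
    baseline = results[(version, False)][1]
    prefetched = results[(version, True)][1]
    return (baseline - prefetched) / baseline


def total_speedup(version: str) -> float:
    """Ratio of total (push + update) time without prefetch to time with it."""
    return sum(results[(version, False)]) / sum(results[(version, True)])


if __name__ == "__main__":
    for version in ("v0", "v1"):
        print(
            f"{version}: update_weights time reduced by "
            f"{update_time_reduction(version):.1%}, "
            f"total speedup {total_speedup(version):.2f}x"
        )
```

On these numbers, prefetch cuts `update_weights` time by roughly 32% on V0 and 28% on V1, while `push_weights` is essentially unchanged. With only 3 iterations per configuration, small differences between V0 and V1 should be read with caution.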

Next Steps

[-] Implement the prefetch logic & shared memory
[-] Add metric similar to generator v0
[ ] Perf/Throughput testing compared to generator v0

Differential Revision: D91092833

@meta-codesync

meta-codesync Bot commented Jan 23, 2026

@JenniferWang has exported this pull request. If you are a Meta employee, you can view the originating Diff in D91092833.

@meta-cla meta-cla Bot added the CLA Signed label Jan 23, 2026
@codecov-commenter

Codecov Report

❌ Patch coverage is 0% with 167 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.73%. Comparing base (080770c) to head (0c99d56).
⚠️ Report is 14 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| benchmarks/generator/weight_sync.py | 0.00% | 167 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #728      +/-   ##
==========================================
- Coverage   78.33%   68.73%   -9.61%     
==========================================
  Files          36       42       +6     
  Lines        4209     4455     +246     
==========================================
- Hits         3297     3062     -235     
- Misses        912     1393     +481     

facebook-github-bot pushed a commit that referenced this pull request Jan 23, 2026

Contributor

@joecummings joecummings left a comment

I can't tell whether this is supposed to be a stacked diff. It contains more than just prefetch information.

Comment thread benchmarks/generator/weight_sync.py Outdated
"""TitanTrainer with weight modification capabilities for benchmarking."""

@endpoint
async def modify_weights(self, scale: float = 1.1):
Contributor

super nit: do we need to parameterize this? Can't we just 1) assume it's a floating point and 2) arbitrarily add or scale by X?

Contributor Author

@JenniferWang JenniferWang Jan 26, 2026

I'll simplify this. The rest of the test is quite essential.

logger.info(
    "[ForgeMonarchExecutor] Deserializing TorchStore Controller from environment..."
)
self.torchstore_controller = cloudpickle.loads(
Contributor

😅

model: str
iterations: int
prefetch_enabled: bool
n_fetcher_procs: int
Contributor

Can we test how this parameter affects throughput?

Contributor Author

n_fetcher_procs: setting it too high slightly degrades the overall time.
prefetch_enabled: this is the major toggle.

Contributor

Right, but is there a graph showing where "too high" is? I imagine that too low will also be suboptimal.
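
One way to answer this would be a sweep over the fetcher count, timing the same workload at each setting and plotting the result. The sketch below is a hypothetical harness only: it uses threads and a sleep-based `simulated_fetch` as stand-ins, so it does not reproduce the PR's benchmark or the contention that makes high values slower. In practice, `time_fetch` would be replaced by a call into the real benchmark with `n_fetcher_procs` set to each candidate.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_fetch(_shard: int, delay_s: float = 0.01) -> None:
    """Hypothetical stand-in for fetching one weight shard."""
    time.sleep(delay_s)


def time_fetch(n_fetchers: int, n_shards: int = 16) -> float:
    """Seconds needed to fetch all shards with n_fetchers concurrent workers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_fetchers) as pool:
        # Consume the iterator so every shard finishes before the clock stops.
        list(pool.map(simulated_fetch, range(n_shards)))
    return time.perf_counter() - start


def sweep(candidates: list[int], repeats: int = 3) -> dict[int, float]:
    """Best-of-N timing per candidate, which reduces scheduling noise."""
    return {
        n: min(time_fetch(n) for _ in range(repeats)) for n in candidates
    }


if __name__ == "__main__":
    for n, seconds in sweep([1, 2, 4, 8, 16]).items():
        print(f"n_fetchers={n:>2}: {seconds:.3f}s")
```

In this toy workload more workers always help up to the shard count; the real curve would presumably flatten and then degrade at high counts, as described above.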

JenniferWang added a commit that referenced this pull request Jan 26, 2026
facebook-github-bot pushed a commit that referenced this pull request Jan 26, 2026
JenniferWang added a commit that referenced this pull request Jan 26, 2026
facebook-github-bot pushed a commit that referenced this pull request Jan 27, 2026
facebook-github-bot pushed a commit that referenced this pull request Jan 27, 2026
@facebook-github-bot facebook-github-bot force-pushed the export-D91092833 branch 2 times, most recently from 6f9290d to f5fe7d6 Compare January 27, 2026 19:31
Contributor

@allenwang28 allenwang28 left a comment

Review automatically exported from Phabricator review in Meta.

@JenniferWang JenniferWang merged commit 2729bdc into main Jan 27, 2026
11 of 12 checks passed
@JenniferWang JenniferWang linked an issue Jan 28, 2026 that may be closed by this pull request
2 tasks
HosseinKaviani-H pushed a commit to HosseinKaviani-H/forge that referenced this pull request Feb 9, 2026
Labels

CLA Signed (this label is managed by the Meta Open Source bot), fb-exported, meta-exported

Development

Successfully merging this pull request may close these issues.

[vLLM v0.13] Re-architect forge's integration with vLLM (generator.py)

4 participants