[vLLM v0.13] Re-architect forge's integration with vLLM (`generator.py`) #669

## Context

forge's current integration with vLLM is essentially a bespoke fork of vLLM's `EngineCore`, adapted to Monarch's distributed process-mesh architecture. It manually recreates the `EngineCore` execution loop, replacing vLLM's multi-process ZMQ communication with Monarch RPC.
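
To make the maintenance burden concrete, here is a heavily simplified sketch of the loop shape a fork like this has to own. Every name in it (`Request`, `ToyScheduler`, `RemoteWorker`, `call_rpc`, `run_loop`) is a hypothetical stand-in, not a vLLM or Monarch API. The point is only that each stage (schedule, execute over RPC, update state) lives in forge's code rather than in vLLM's.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical request record: prompt token IDs plus generated tokens.
    request_id: str
    prompt: list[int]
    max_tokens: int
    output: list[int] = field(default_factory=list)

    @property
    def finished(self) -> bool:
        return len(self.output) >= self.max_tokens


class ToyScheduler:
    """Hypothetical scheduler: batches every unfinished request each step."""

    def __init__(self) -> None:
        self.running: dict[str, Request] = {}

    def add_request(self, req: Request) -> None:
        self.running[req.request_id] = req

    def schedule(self) -> list[str]:
        return [rid for rid, r in self.running.items() if not r.finished]

    def update_from_output(self, sampled: dict[str, int]) -> list[Request]:
        # Append each sampled token and evict requests that are now finished.
        done = []
        for rid, tok in sampled.items():
            req = self.running[rid]
            req.output.append(tok)
            if req.finished:
                done.append(self.running.pop(rid))
        return done


class RemoteWorker:
    """Hypothetical stand-in for a worker reached over RPC (Monarch in forge)."""

    def call_rpc(self, method: str, request_ids: list[str]) -> dict[str, int]:
        if method != "execute_model":
            raise ValueError(f"unknown method: {method}")
        # Fake "model": every request samples token id 7.
        return {rid: 7 for rid in request_ids}


def run_loop(scheduler: ToyScheduler, worker: RemoteWorker) -> list[Request]:
    """The schedule -> execute -> update cycle that a bespoke fork must own."""
    finished: list[Request] = []
    while True:
        batch = scheduler.schedule()
        if not batch:
            return finished
        sampled = worker.call_rpc("execute_model", batch)
        finished.extend(scheduler.update_from_output(sampled))
```

Each missing feature listed in this issue (async scheduling, batch queue, aborts) would have to be grafted onto a loop like `run_loop` by hand, which is the cost a refactor would remove.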

**However, [the attempt to upgrade from v0.10.0 to v0.13.0](https://github.com/meta-pytorch/torchforge/pull/668/changes) has made it clear that this engine-level fork is not sustainable. A complete refactor is required before we add any more optimization logic to `generator.py`.**

What's Implemented (Aligned with vLLM v0.13):

- Scheduler integration (request scheduling, KV cache management)
- `InputProcessor` (tokenization, validation)
- `OutputProcessor` (detokenization, output formatting)
- `StructuredOutputManager` (grammar-based constrained decoding)
- Two-phase execution (`execute_model` → `sample_tokens` for grammar constraints)
- Request block hashing (prefix caching)
- KV cache configuration and initialization
- Tensor-parallel (TP) model execution
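
Request block hashing is what lets the KV cache reuse blocks across requests that share a prefix. The sketch below shows the general idea under simplifying assumptions: each full block of token IDs is hashed together with its parent block's hash, so two requests produce identical hashes exactly up to the block where their prompts diverge. The helper name `hash_request_blocks`, the block size of 4, and the use of SHA-256 over `repr` are illustrative choices, not vLLM's exact hashing scheme.

```python
import hashlib


def hash_request_blocks(token_ids: list[int], block_size: int = 4) -> list[str]:
    """Return one chained hash per *full* block of token_ids.

    Hypothetical helper: each block hash covers the parent hash plus the
    block's tokens, so a hash identifies the entire prefix up to that block.
    A trailing partial block is not hashed, since it cannot be shared yet.
    """
    hashes: list[str] = []
    parent = ""  # the first block has no parent
    num_full = len(token_ids) // block_size
    for i in range(num_full):
        block = token_ids[i * block_size : (i + 1) * block_size]
        digest = hashlib.sha256(repr((parent, tuple(block))).encode()).hexdigest()
        hashes.append(digest)
        parent = digest
    return hashes
```

Because of the chaining, identical tokens in the same block position still hash differently when an earlier block differs: hashing `[0, 2, 3, 4, 5, 6, 7, 8]` gives a second-block hash that does not match the one for `[1, 2, 3, 4, 5, 6, 7, 8]`, even though both second blocks contain `[5, 6, 7, 8]`.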

What's NOT Implemented (vs. vLLM v0.13 `EngineCore`):

- Pipeline-parallel (PP) execution: Generator assumes single-rank workers
- Multi-modal input processing (`mm_receiver_cache` integration incomplete)
- Async scheduling (overlapping schedule/execute for throughput)
- Batch queue for pipeline parallelism (`batch_queue_size > 1`)
- Data-parallel coordination (`DPCoordinator` for multi-engine deployments)
- Abort queue (mid-generation request cancellation)
- `EngineCoreClient` abstraction (Generator calls workers directly instead)
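
As an illustration of what even one of these gaps involves, here is one way to implement an abort queue: cancellation requests may arrive from another thread at any time, so they are buffered in a thread-safe queue and applied only at a step boundary, when no batch is in flight. This is a hypothetical sketch using only the standard library; `AbortableLoop`, `request_abort`, and `step` are made-up names, not the vLLM `EngineCore` API.

```python
import queue


class AbortableLoop:
    """Hypothetical engine loop that applies cancellations between steps."""

    def __init__(self) -> None:
        self.running: dict[str, int] = {}  # request_id -> tokens generated so far
        self.aborts: "queue.Queue[str]" = queue.Queue()  # thread-safe inbox

    def add_request(self, request_id: str) -> None:
        self.running[request_id] = 0

    def request_abort(self, request_id: str) -> None:
        # Safe to call from any thread; takes effect at the next step boundary.
        self.aborts.put(request_id)

    def _drain_aborts(self) -> list[str]:
        aborted = []
        while True:
            try:
                rid = self.aborts.get_nowait()
            except queue.Empty:
                return aborted
            # Ignore aborts for requests that already finished or never existed.
            if self.running.pop(rid, None) is not None:
                aborted.append(rid)

    def step(self) -> list[str]:
        """Drop aborted requests, then 'generate' one token for the rest."""
        aborted = self._drain_aborts()
        for rid in self.running:
            self.running[rid] += 1
        return aborted
```

Applying aborts only between steps keeps the batch seen by the workers consistent for the duration of a step, at the cost of up to one extra step of wasted work per cancelled request.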

## Goal 

- [ ] Re-architect the integration point. At minimum, integrate at the `EngineCore` level so that forge no longer has to manage the scheduler, the KV cache, etc. manually.
- [ ] Upgrade to v0.13.0 or the latest RC.
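
The first goal amounts to shrinking what forge owns to a narrow, engine-level surface: submit requests, advance the engine, collect outputs. The sketch below expresses that boundary as a `typing.Protocol`. The protocol and its method names (`add_request`, `step`, `has_unfinished_requests`) are a hypothetical contract for discussion, not a claim about the signatures of vLLM's `EngineCore`, which should be checked against v0.13 directly. `FakeEngine` exists only so the generator-side code runs; in the intended design the scheduler and KV cache would sit behind this boundary, inside vLLM.

```python
from typing import Protocol


class EngineLike(Protocol):
    """Hypothetical engine-level contract forge would code against."""

    def add_request(self, request_id: str, prompt_token_ids: list[int]) -> None: ...

    def step(self) -> dict[str, list[int]]:
        """Run one scheduling + execution iteration; return finished outputs."""
        ...

    def has_unfinished_requests(self) -> bool: ...


class FakeEngine:
    """Toy engine: returns each prompt reversed after a single step."""

    def __init__(self) -> None:
        self._pending: dict[str, list[int]] = {}

    def add_request(self, request_id: str, prompt_token_ids: list[int]) -> None:
        self._pending[request_id] = prompt_token_ids

    def step(self) -> dict[str, list[int]]:
        done = {rid: list(reversed(p)) for rid, p in self._pending.items()}
        self._pending.clear()
        return done

    def has_unfinished_requests(self) -> bool:
        return bool(self._pending)


def generate(engine: EngineLike, prompts: dict[str, list[int]]) -> dict[str, list[int]]:
    """Generator-side code: no scheduler or KV-cache bookkeeping here."""
    for rid, tokens in prompts.items():
        engine.add_request(rid, tokens)
    results: dict[str, list[int]] = {}
    while engine.has_unfinished_requests():
        results.update(engine.step())
    return results
```

With this split, upgrading vLLM would mean adapting a thin wrapper that satisfies the contract, rather than re-porting the scheduling loop itself.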

(Diagram from https://www.ubicloud.com/blog/life-of-an-inference-request-vllm-v1)
<img width="1200" height="631" alt="Image" src="https://github.com/user-attachments/assets/ab1890a4-5812-48b5-985f-4bce7facea0f" />