Summary
Add a vLLM-backed target model for EAGLE-3 / P-EAGLE training, mirroring the vLLM-based target inference used by torchspec. The frozen target would be served through vLLM during training to produce draft supervision.
Background
Same as the SGLang backend: EAGLE-3 already abstracts target inference behind Eagle3TargetBackend (nemo_automodel/components/speculative/eagle/backend.py), with co-located (HFEagle3TargetModel) and remote (RemoteEagle3TargetModel) implementations. A vLLM backend is another implementation of the same contract.
Scope
- New Eagle3TargetBackend implementation backed by vLLM, returning the supervision tensors defined by Eagle3TargetBatch.
- Wire into recipes/llm/train_eagle3.py (e.g. target_model_backend: vllm) with endpoint/model/parallelism config.
- Unit tests for the client contract.
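To make the scope concrete, here is a minimal sketch of how a target_model_backend key could select a vLLM-backed implementation. Everything in it is an assumption for illustration: TargetBatchSketch stands in for Eagle3TargetBatch (whose real fields are not shown here), generate_batch is a hypothetical method name, the config keys other than target_model_backend are placeholders, and the stub returns zeros instead of querying vLLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for Eagle3TargetBatch; the real field names may differ.
@dataclass
class TargetBatchSketch:
    aux_hidden_states: List[List[float]]  # [seq_len][num_aux_layers * hidden_size]
    target_logits: List[List[float]]      # [seq_len][vocab_size]

# Registry mapping a config string to a backend factory (illustrative only).
_BACKENDS: Dict[str, Callable[..., object]] = {}

def register_backend(name: str):
    def decorator(factory):
        _BACKENDS[name] = factory
        return factory
    return decorator

@register_backend("vllm")
class StubVLLMBackend:
    """Placeholder backend: returns zeros where a real one would call vLLM."""

    def __init__(self, endpoint: str, model: str, tensor_parallel_size: int = 1,
                 hidden_size: int = 4, vocab_size: int = 8, num_aux_layers: int = 3):
        self.endpoint = endpoint
        self.model = model
        self.tensor_parallel_size = tensor_parallel_size
        self.hidden_size = hidden_size
        self.vocab_size = vocab_size
        self.num_aux_layers = num_aux_layers  # assumed count, not from the source

    def generate_batch(self, input_ids: List[int]) -> TargetBatchSketch:
        n = len(input_ids)
        width = self.num_aux_layers * self.hidden_size
        return TargetBatchSketch(
            aux_hidden_states=[[0.0] * width for _ in range(n)],
            target_logits=[[0.0] * self.vocab_size for _ in range(n)],
        )

def build_target_backend(config: dict):
    """Pick a backend from the target_model_backend key; pass the rest as kwargs."""
    name = config["target_model_backend"]
    if name not in _BACKENDS:
        raise ValueError(f"unknown target_model_backend: {name!r}")
    kwargs = {k: v for k, v in config.items() if k != "target_model_backend"}
    return _BACKENDS[name](**kwargs)

backend = build_target_backend({
    "target_model_backend": "vllm",
    "endpoint": "http://localhost:8000",  # placeholder endpoint
    "model": "dummy-target",              # placeholder model name
})
batch = backend.generate_batch([1, 2, 3])
```

A unit test for the client contract could then assert only on the shapes of the returned batch, which keeps the test independent of whether the backend is co-located, remote, or vLLM-backed.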
Environment constraint
Same isolation as the SGLang backend: vLLM will not be merged into the main training container. It will be pinned to a fixed version in a separate dedicated SD environment.
Open question
The backend must expose the intermediate auxiliary hidden states EAGLE-3 consumes, not only final logits. Validating that vLLM can surface those is the key feasibility item.
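One way to frame the feasibility check is as a validator run against whatever the vLLM side returns. The sketch below assumes a hypothetical payload shape (a dict from layer index to a per-position list of hidden vectors); it does not reflect any actual vLLM response format, and the layer indices are made up.

```python
from typing import Dict, List, Sequence

def check_aux_hidden_states(
    payload: Dict[int, List[List[float]]],
    expected_layers: Sequence[int],
    seq_len: int,
    hidden_size: int,
) -> List[str]:
    """Return a list of problems; an empty list means the payload looks usable."""
    problems: List[str] = []
    for layer in expected_layers:
        states = payload.get(layer)
        if states is None:
            problems.append(f"layer {layer}: missing")
            continue
        if len(states) != seq_len:
            problems.append(
                f"layer {layer}: got {len(states)} positions, expected {seq_len}"
            )
            continue
        bad = [i for i, row in enumerate(states) if len(row) != hidden_size]
        if bad:
            problems.append(f"layer {layer}: wrong hidden size at positions {bad}")
    return problems

# Hypothetical payload: layers 2 and 5 requested, only layer 2 returned.
partial = {2: [[0.0, 0.0], [0.0, 0.0]]}
issues = check_aux_hidden_states(partial, expected_layers=[2, 5], seq_len=2, hidden_size=2)
```

Here issues reports that layer 5 is missing, which is the failure mode this open question is concerned with: a server that returns final logits but not every intermediate layer the draft model needs.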