
feat(speculative): add vLLM target-model backend for EAGLE-3 training #2425

Description

@khazic

Summary

Add a vLLM-backed target model for EAGLE-3 / P-EAGLE training, mirroring the vLLM-based target inference used by torchspec. The frozen target would be served through vLLM during training to produce draft supervision.

Background

The background is the same as for the SGLang backend: EAGLE-3 already abstracts target inference behind Eagle3TargetBackend (nemo_automodel/components/speculative/eagle/backend.py), which has co-located (HFEagle3TargetModel) and remote (RemoteEagle3TargetModel) implementations. A vLLM backend would be one more implementation of the same contract.
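
To make "another implementation of the same contract" concrete, here is a minimal sketch. Everything in it is a hypothetical stand-in: `TargetBackendSketch`, `TargetBatchSketch`, the `run_target` method name, and the field names are assumptions for illustration, not the real Eagle3TargetBackend / Eagle3TargetBatch definitions. The stub returns zeros rather than calling vLLM.

```python
from dataclasses import dataclass
from typing import List, Protocol, runtime_checkable


@dataclass
class TargetBatchSketch:
    """Hypothetical stand-in for Eagle3TargetBatch (real fields may differ)."""
    aux_hidden_states: List[List[List[float]]]  # indexed [layer][token][hidden]
    logits: List[List[float]]                   # indexed [token][vocab]


@runtime_checkable
class TargetBackendSketch(Protocol):
    """Hypothetical stand-in for the Eagle3TargetBackend contract."""

    def run_target(self, input_ids: List[int]) -> TargetBatchSketch: ...


class StubVllmBackend:
    """Placeholder backend: fabricates zeros where a real one would query vLLM."""

    def __init__(self, num_aux_layers: int = 3, hidden_size: int = 4, vocab_size: int = 8):
        self.num_aux_layers = num_aux_layers
        self.hidden_size = hidden_size
        self.vocab_size = vocab_size

    def run_target(self, input_ids: List[int]) -> TargetBatchSketch:
        n = len(input_ids)
        # One [token][hidden] matrix per tapped layer.
        aux = [
            [[0.0] * self.hidden_size for _ in range(n)]
            for _ in range(self.num_aux_layers)
        ]
        logits = [[0.0] * self.vocab_size for _ in range(n)]
        return TargetBatchSketch(aux_hidden_states=aux, logits=logits)


backend = StubVllmBackend()
batch = backend.run_target([11, 12, 13])
# Structural check only: the protocol verifies that run_target exists.
print(isinstance(backend, TargetBackendSketch), len(batch.aux_hidden_states))
```

The point of the structural check is that the training loop only depends on the contract, so a vLLM-backed class could be swapped in alongside the co-located and remote implementations without touching the caller.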

Scope

  • New Eagle3TargetBackend implementation backed by vLLM, returning the supervision tensors defined by Eagle3TargetBatch.
  • Wire into recipes/llm/train_eagle3.py (e.g. target_model_backend: vllm) with endpoint/model/parallelism config.
  • Unit tests for the client contract.
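
A rough sketch of how the recipe wiring could look, assuming a string key selects the backend and a nested mapping carries the endpoint/model/parallelism settings. Only `target_model_backend: vllm` comes from this issue; the `target_model` section, the field names (`endpoint`, `model`, `tensor_parallel_size`), the example values, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class VllmTargetConfig:
    # Illustrative field names, not the recipe's real schema.
    endpoint: str
    model: str
    tensor_parallel_size: int = 1


class VllmTargetClientSketch:
    """Hypothetical client; a real one would talk to the vLLM server."""

    def __init__(self, cfg: VllmTargetConfig):
        self.cfg = cfg


# Registry mapping the backend name in the recipe to a factory.
_BACKENDS: Dict[str, Callable[[dict], object]] = {
    "vllm": lambda raw: VllmTargetClientSketch(VllmTargetConfig(**raw)),
}


def build_target_backend(recipe_cfg: dict):
    """Pick a backend from the recipe config, failing loudly on unknown names."""
    name = recipe_cfg["target_model_backend"]
    try:
        factory = _BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown target_model_backend: {name!r}") from None
    return factory(recipe_cfg.get("target_model", {}))


recipe_cfg = {
    "target_model_backend": "vllm",
    "target_model": {
        "endpoint": "http://localhost:8000",  # placeholder address
        "model": "my-target-model",           # placeholder model name
        "tensor_parallel_size": 2,
    },
}
backend = build_target_backend(recipe_cfg)
print(type(backend).__name__, backend.cfg.tensor_parallel_size)
```

A unit test for the client contract could then exercise `build_target_backend` with a fake server and assert on the returned batch fields, without needing vLLM installed in the training container.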

Environment constraint

Same isolation as the SGLang backend: vLLM will not be merged into the main training container. It will be pinned to a fixed version in a separate dedicated SD environment.

Open question

The backend must expose the intermediate auxiliary hidden states that EAGLE-3 consumes, not only the final logits. Whether vLLM can surface those states is the key feasibility item to validate.
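
One way to frame the feasibility check is as a shape test on whatever the backend returns. The sketch below assumes the auxiliary hidden states arrive as one [tokens, hidden] matrix per tapped layer and that three layers are expected; both are assumptions for illustration, as are the function names. Nested lists stand in for tensors so the check runs without any dependencies.

```python
from typing import List


def shape_of(nested) -> tuple:
    """Return the shape of a rectangular nested list, e.g. (tokens, hidden)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


def check_aux_hidden_states(
    aux: list, num_tokens: int, hidden_size: int, expected_layers: int = 3
) -> List[str]:
    """Return a list of problems; an empty list means the payload looks usable."""
    problems = []
    if len(aux) != expected_layers:
        problems.append(f"expected {expected_layers} auxiliary layers, got {len(aux)}")
    for i, layer in enumerate(aux):
        got = shape_of(layer)
        if got != (num_tokens, hidden_size):
            problems.append(
                f"layer {i}: expected shape {(num_tokens, hidden_size)}, got {got}"
            )
    return problems


# A response that carries all three layers for 5 tokens, hidden size 4.
good = [[[0.0] * 4 for _ in range(5)] for _ in range(3)]
# A logits-only response: no auxiliary hidden states at all.
logits_only = []

print(check_aux_hidden_states(good, num_tokens=5, hidden_size=4))
print(check_aux_hidden_states(logits_only, num_tokens=5, hidden_size=4))
```

Running a check like this against a real vLLM response would answer the open question directly: a logits-only response fails immediately, while a response with the right layers and shapes passes.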
