Commit 71d888b

add vllm_backend doc
1 parent bcea231 commit 71d888b

3 files changed

Lines changed: 25 additions & 14 deletions


docs/source/features/speculative_decoding/eagle/vlm_eagle.md

Lines changed: 23 additions & 13 deletions
@@ -81,29 +81,39 @@ bash scripts/speculative/generate_data_for_target_model.sh

 #### 2.2.2 Generate hidden states for the Eagle3 model

-Currently, hidden states can only be generated with the HF backend. Run the following script:
+Two backends are currently supported for generating hidden states: **the HF backend (torchrun)** and **the vLLM backend (Ray)**.
+
+> Note: generating hidden states for qwen3_vl series models requires transformers>=5.0.0,
+or a cherry-pick of https://github.com/huggingface/transformers/pull/42609;
+otherwise the captured hidden states are unusable!
+
+##### Option 1: HF backend (torchrun)
+
+Uses HuggingFace Transformers as the inference backend and runs distributed multi-GPU generation via torchrun. Suited to scenarios that require strong compatibility with the HF ecosystem.
+
+Run the following script:
 ```shell
 # For HunyuanOCR
 bash scripts/speculative/hunyuan_ocr/generate_vlm_hidden_for_draft_model.sh
 # For Qwen3-VL series
 bash scripts/speculative/qwen3_vl/generate_vlm_hidden_for_draft_model.sh
 ```

-> Note: generating hidden states for qwen3_vl series models requires transformers>=5.0.0,
-or a cherry-pick of https://github.com/huggingface/transformers/pull/42609;
-otherwise the captured hidden states are unusable!

-**Script parameters:**
+##### Option 2: vLLM backend (Ray)

-Before use, configure the following parameters in the script:
+Uses vLLM as the inference backend to accelerate sampling, with Ray handling distributed scheduling. **Recommended for multi-node, large-scale generation.**

-- `DATASET_PATH`: HF name or local path of the input dataset
-- `TARGET_MODEL_NAME_OR_PATH`: HF name or local path of the target model
-- `DRAFT_MODEL_CONFIG_PATH`: path to the draft model config
-- `TARGET_BACKEND`: target model backend; currently only HF is supported
-- `MODEL_MAX_LENGTH`: context length of the generated data
-- `CHAT_TEMPLATE_TYPE`: chat template type of the target model; currently supports qwen3_vl/hunyuan_vl
-- `OUTPUT_DIR`: output path for the generated dataset
+**Key advantages:**
+- Supports multi-node Ray clusters and manages inter-node communication automatically
+- Supports vLLM tensor parallelism to make full use of multiple GPUs
+- Automatically handles Ray cluster startup, task dispatch, and resource cleanup
+
+Run the following script:
+
+```shell
+bash scripts/speculative/qwen3_vl/generate_vlm_hidden_for_draft_model_ray.sh
+```


 ## 3. Train the Eagle3 model
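
The note in this hunk makes `transformers>=5.0.0` a precondition for capturing usable qwen3_vl hidden states. A pre-flight check along the following lines could catch an old install before a long generation run. This is a minimal sketch and not part of the commit: the helper names `parse_release`, `meets_minimum`, and `transformers_is_new_enough` are hypothetical, pre-release suffixes such as `rc1` or `.dev0` are ignored for simplicity, and a version check cannot detect an older install that carries the cherry-picked PR.

```python
import re
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Return the leading numeric release components, e.g. '5.0.0rc1' -> (5, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = "5.0.0") -> bool:
    """Compare release tuples, padding the shorter one with zeros."""
    have, need = parse_release(installed), parse_release(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def transformers_is_new_enough(minimum: str = "5.0.0") -> bool:
    """Return False when transformers is missing or older than `minimum`."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)
```

Calling `transformers_is_new_enough()` at the top of a generation script and aborting on `False` would fail fast instead of producing hidden states that later turn out to be unusable.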

tools/ray_generate_hidden_for_draft_model.py

Lines changed: 1 addition & 0 deletions
@@ -254,6 +254,7 @@ def main():
     # Read draft model config
     draft_vocab_size = None
     target_vocab_size = None
+    logger.info(f"args.draft_model_config_path: {args.draft_model_config_path}")
     if args.draft_model_config_path is not None:
         draft_config = DraftModelConfig.from_file(args.draft_model_config_path)
         draft_vocab_size = getattr(draft_config, "draft_vocab_size", None)
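
The pattern in this hunk, logging the config path and then reading an optional field with a `None` default, can be illustrated in isolation. The sketch below is an assumption-laden stand-in, not the repository's code: `load_config` is a hypothetical substitute for `DraftModelConfig.from_file` that simply parses JSON, and the value `32000` is illustrative only.

```python
import json
import logging
import tempfile
from pathlib import Path
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def load_config(path):
    """Hypothetical stand-in for DraftModelConfig.from_file: JSON file -> attribute object."""
    with open(path, encoding="utf-8") as f:
        return SimpleNamespace(**json.load(f))


def read_draft_vocab_size(config_path):
    """Log the path, then return draft_vocab_size, or None if the path or field is absent."""
    logger.info("draft_model_config_path: %s", config_path)
    if config_path is None:
        return None
    config = load_config(config_path)
    return getattr(config, "draft_vocab_size", None)


# Demonstration with a throwaway config file (the value 32000 is illustrative).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "draft_config.json"
    path.write_text(json.dumps({"draft_vocab_size": 32000}), encoding="utf-8")
    found = read_draft_vocab_size(str(path))
    path.write_text(json.dumps({}), encoding="utf-8")
    missing = read_draft_vocab_size(str(path))
```

Logging the path before the `None` check, as the commit does, means the log still records that no draft config was supplied, which is the case that is otherwise easy to overlook.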

tools/train_eagle3_offline.py

Lines changed: 1 addition & 1 deletion
@@ -302,7 +302,7 @@ def train():
     )

     # Create draft model
-    rank0_print("Loading draft model...")
+    rank0_print(f"Loading draft model: {args.draft_model_config_path}")
     draft_model_config = DraftModelConfig.from_file(args.draft_model_config_path)
     draft_model = create_draft_model(draft_model_config)
     draft_model.load_embed_weights(args.target_model_name_or_path, args.embed_weight_key)
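
The changed line goes through `rank0_print`, whose definition is not shown in this diff. A helper of this kind usually exists so that a message appears once per job, not once per process. The sketch below is a hypothetical reimplementation and may differ from the repository's version: it assumes the launcher exports a `RANK` environment variable (as torchrun does), and the `env` parameter is added here only to make the check easy to exercise.

```python
import os


def is_rank0(env=None) -> bool:
    """True for a single-process run (no RANK set) or for the process whose RANK is 0."""
    env = os.environ if env is None else env
    return int(env.get("RANK", "0")) == 0


def rank0_print(*args, **kwargs) -> None:
    """Hypothetical sketch: print only on rank 0 so multi-GPU logs are not duplicated."""
    if is_rank0():
        print(*args, **kwargs)
```

With a helper like this, the new message `Loading draft model: <path>` would be emitted a single time even when training runs on many GPUs.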
