Commit 5d76b11

update vlm & audio eagle3 doc & readme (Tencent#201)
1 parent 35b9525 commit 5d76b11

11 files changed

Lines changed: 700 additions & 14 deletions

README.md

Lines changed: 19 additions & 1 deletion
@@ -233,6 +233,8 @@ bash scripts/speculative/train_eagle3_online.sh
233233

234234
For detailed training configurations and vLLM performance benchmarks of Eagle3, please refer to the [Quick Start Guide for Speculative Sampling](https://angelslim.readthedocs.io/zh-cn/latest/getting_started/quickstrat.html#id5).
235235

236+
Eagle3 training and deployment guides for LLM, VLM, and Audio (ASR & TTS) models: [LLM](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/eagle.html) | [VLM](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/vlm_eagle.html) | [Audio(ASR)](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/audio_eagle.html) | [Audio(TTS)](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/audio_tts_eagle.html).
237+
236238
#### 2.2 LLM/VLM Model Quantization
237239

238240
After installing `AngelSlim`, you can launch static FP8 quantization for the Qwen3-1.7B model with the following one-command script:
@@ -507,6 +509,8 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o
507509
<th colspan="2">MATH-500</th>
508510
<th colspan="2">MMMU</th>
509511
<th colspan="2">MMStar</th>
512+
<th>Mean</th>
513+
<th></th>
510514
</tr></thead>
511515
<tbody>
512516
<tr>
@@ -526,6 +530,8 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o
526530
<td>accept length</td>
527531
<td>throughput (tokens/s)</td>
528532
<td>accept length</td>
533+
<td>throughput (tokens/s)</td>
534+
<td>accept length</td>
529535
</tr>
530536
<tr>
531537
<td rowspan="2">Qwen3-VL-2B-Instruct</td>
@@ -544,6 +550,8 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o
544550
<td>1</td>
545551
<td>81.63</td>
546552
<td>1</td>
553+
<td>234.24</td>
554+
<td>1</td>
547555
</tr>
548556
<tr>
549557
<td>Eagle3</td>
@@ -561,6 +569,8 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o
561569
<td>2.55</td>
562570
<td>139.73</td>
563571
<td>2.31</td>
572+
<td>415.76</td>
573+
<td>2.5</td>
564574
</tr>
565575
<tr>
566576
<td rowspan="2">Qwen3-VL-4B-Instruct</td>
@@ -579,6 +589,8 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o
579589
<td>1</td>
580590
<td>67.75</td>
581591
<td>1</td>
592+
<td>150.21</td>
593+
<td>1</td>
582594
</tr>
583595
<tr>
584596
<td>Eagle3</td>
@@ -596,6 +608,8 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o
596608
<td>2.05</td>
597609
<td>107.07</td>
598610
<td>2.1</td>
611+
<td>283.32</td>
612+
<td>2.41</td>
599613
</tr>
600614
<tr>
601615
<td rowspan="2">Qwen3-VL-30B-A3B-Instruct</td>
@@ -614,6 +628,8 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o
614628
<td>1</td>
615629
<td>30.93</td>
616630
<td>1</td>
631+
<td>115.33</td>
632+
<td>1</td>
617633
</tr>
618634
<tr>
619635
<td>Eagle3</td>
@@ -631,6 +647,8 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o
631647
<td>1.78</td>
632648
<td>52.57</td>
633649
<td>1.94</td>
650+
<td>166.17</td>
651+
<td>2.32</td>
634652
</tr>
635653
</tbody></table>
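The speedup implied by the new Mean column can be checked with a few lines of Python. This is a minimal sketch: the throughput figures are copied from the Mean cells in the hunks above, the first row of each model is assumed to be the baseline (its accept length is 1), and the `speedup` helper is hypothetical rather than part of AngelSlim.

```python
# Mean throughput (tokens/s) copied from the "Mean" column of the table above.
# Assumption: the row with accept length 1 is the non-speculative baseline.
MEAN_THROUGHPUT = {
    "Qwen3-VL-2B-Instruct": {"baseline": 234.24, "eagle3": 415.76},
    "Qwen3-VL-4B-Instruct": {"baseline": 150.21, "eagle3": 283.32},
    "Qwen3-VL-30B-A3B-Instruct": {"baseline": 115.33, "eagle3": 166.17},
}


def speedup(baseline: float, eagle3: float) -> float:
    """Return the throughput ratio of the Eagle3 run over the baseline run."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return eagle3 / baseline


for model, tput in MEAN_THROUGHPUT.items():
    print(f"{model}: {speedup(tput['baseline'], tput['eagle3']):.2f}x")
```

With these numbers the ratios come out to roughly 1.77x, 1.89x, and 1.44x for the 2B, 4B, and 30B-A3B models respectively.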
636654

@@ -684,7 +702,7 @@ Benchmark results for Qwen2-Audio using Eagle3 speculative decoding on vLLM (v0.
684702
<td>accept length</td>
685703
</tr>
686704
<tr>
687-
<td rowspan="2">Qwen2_Audio</td>
705+
<td rowspan="2">Qwen2-Audio</td>
688706
<td>Vanilla</td>
689707
<td>78.76</td>
690708
<td>1</td>

README_cn.md

Lines changed: 18 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -234,6 +234,7 @@ bash scripts/speculative/train_eagle3_online.sh
234234

235235
For detailed training configurations and vLLM performance benchmarks of `Eagle3`, see the speculative sampling [Quick Start Guide](https://angelslim.readthedocs.io/zh-cn/latest/getting_started/quickstrat.html#id5).
236236

237+
Eagle3 training and deployment guides for LLM, VLM, and Audio (ASR & TTS) models: [LLM](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/eagle.html) | [VLM](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/vlm_eagle.html) | [Audio(ASR)](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/audio_eagle.html) | [Audio(TTS)](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/audio_tts_eagle.html).
237238
#### 2.2 LLM/VLM Model Quantization
238239
After installing `AngelSlim`, you can get started quickly with the following script, which runs static `FP8` quantization for the `Qwen3-1.7B` model:
239240

@@ -511,6 +512,8 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
511512
<th colspan="2">MATH-500</th>
512513
<th colspan="2">MMMU</th>
513514
<th colspan="2">MMStar</th>
515+
<th>Mean</th>
516+
<th></th>
514517
</tr></thead>
515518
<tbody>
516519
<tr>
@@ -530,6 +533,8 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
530533
<td>accept length</td>
531534
<td>throughput (tokens/s)</td>
532535
<td>accept length</td>
536+
<td>throughput (tokens/s)</td>
537+
<td>accept length</td>
533538
</tr>
534539
<tr>
535540
<td rowspan="2">Qwen3-VL-2B-Instruct</td>
@@ -548,6 +553,8 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
548553
<td>1</td>
549554
<td>81.63</td>
550555
<td>1</td>
556+
<td>234.24</td>
557+
<td>1</td>
551558
</tr>
552559
<tr>
553560
<td>Eagle3</td>
@@ -565,6 +572,8 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
565572
<td>2.55</td>
566573
<td>139.73</td>
567574
<td>2.31</td>
575+
<td>415.76</td>
576+
<td>2.5</td>
568577
</tr>
569578
<tr>
570579
<td rowspan="2">Qwen3-VL-4B-Instruct</td>
@@ -583,6 +592,8 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
583592
<td>1</td>
584593
<td>67.75</td>
585594
<td>1</td>
595+
<td>150.21</td>
596+
<td>1</td>
586597
</tr>
587598
<tr>
588599
<td>Eagle3</td>
@@ -600,6 +611,8 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
600611
<td>2.05</td>
601612
<td>107.07</td>
602613
<td>2.1</td>
614+
<td>283.32</td>
615+
<td>2.41</td>
603616
</tr>
604617
<tr>
605618
<td rowspan="2">Qwen3-VL-30B-A3B-Instruct</td>
@@ -618,6 +631,8 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
618631
<td>1</td>
619632
<td>30.93</td>
620633
<td>1</td>
634+
<td>115.33</td>
635+
<td>1</td>
621636
</tr>
622637
<tr>
623638
<td>Eagle3</td>
@@ -635,6 +650,8 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
635650
<td>1.78</td>
636651
<td>52.57</td>
637652
<td>1.94</td>
653+
<td>166.17</td>
654+
<td>2.32</td>
638655
</tr>
639656
</tbody></table>
640657

@@ -689,7 +706,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
689706
<td>accept length</td>
690707
</tr>
691708
<tr>
692-
<td rowspan="2">Qwen2_Audio</td>
709+
<td rowspan="2">Qwen2-Audio</td>
693710
<td>Vanilla</td>
694711
<td>78.76</td>
695712
<td>1</td>
Lines changed: 190 additions & 0 deletions
@@ -0,0 +1,190 @@
1+
# EAGLE3 for Speech Understanding Models
2+
3+
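[Eagle3](https://arxiv.org/pdf/2503.01840) is currently the most widely used speculative sampling algorithm and the one with the best speedup.
4+
This project covers Eagle3 training and benchmarking, and open-sources [Eagle3 weights](https://huggingface.co/collections/AngelSlim/eagle3) for Qwen2Audio.
5+
6+
Benchmark results for our trained Qwen2Audio Eagle3 model are listed in [benchmarks](../../performance/speculative_decoding/benchmarks.md).
7+
All numbers were obtained with vLLM inference on a single H20 GPU.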
8+
9+
## 1. Supported Models
10+
- `Qwen2Audio`
11+
12+
## 2. Preparing Data
13+
14+
### 2.1 Data Layout
15+
16+
All data must be stored in JSONL files. For the training data format, see:
17+
18+
- Example data: AngelSlim/dataset/librispeech_test/librispeech_eval_10_test.jsonl
19+
20+
```json
21+
{"id": 5910, "conversations": [{"role": "user", "content": [{"type": "audio", "audio": "./audios/1580-141083-0008.flac"}, {"type": "text", "text": "Detect the language and recognize the speech: <|en|>"}]}, {"role": "assistant", "content": [{"type": "text", "text": "THE PROOF WAS IN THREE LONG SLIPS I HAD LEFT THEM ALL TOGETHER"}]}]}
22+
```
23+
24+
- Key fields:
25+
- id: unique identifier of the conversation
26+
- conversations: the conversation, in OpenAI chat format
27+
- audio: path to the corresponding audio file
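As a minimal sketch of reading this format, the following Python pulls the audio path, the user prompt, and the reference transcript out of one JSONL line. The helper name `parse_record` is hypothetical and not part of AngelSlim; the sample line is the example record shown above.

```python
import json


def parse_record(line: str) -> dict:
    """Extract audio paths, the user prompt, and the assistant reply from one JSONL line."""
    record = json.loads(line)
    audios, prompts, replies = [], [], []
    for turn in record["conversations"]:
        for part in turn["content"]:
            if part["type"] == "audio":
                audios.append(part["audio"])
            elif part["type"] == "text":
                # User text is the prompt; assistant text is the reference transcript.
                target = prompts if turn["role"] == "user" else replies
                target.append(part["text"])
    return {
        "id": record["id"],
        "audios": audios,
        "prompt": " ".join(prompts),
        "reply": " ".join(replies),
    }


SAMPLE = (
    '{"id": 5910, "conversations": [{"role": "user", "content": ['
    '{"type": "audio", "audio": "./audios/1580-141083-0008.flac"}, '
    '{"type": "text", "text": "Detect the language and recognize the speech: <|en|>"}]}, '
    '{"role": "assistant", "content": [{"type": "text", '
    '"text": "THE PROOF WAS IN THREE LONG SLIPS I HAD LEFT THEM ALL TOGETHER"}]}]}'
)

parsed = parse_record(SAMPLE)
print(parsed["audios"], parsed["prompt"])
```

A check like this is a cheap way to catch malformed records before a long training run starts.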
28+
29+
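### 2.2 Resampling the Training Data (Recommended)
30+
31+
To obtain high-quality SFT data for the target model, we recommend resampling the training data with the target model itself. Save the generated responses in a JSONL file and store the corresponding audio files in the same directory, using the layout described above.
32+
33+
You can generate training data to suit your own use case. The following vLLM-based data generation workflow is provided as a reference:
34+
35+
**Step 1: Start the vLLM server**
36+
37+
First, start a vLLM server to serve the model for inference: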
38+
39+
```shell
40+
bash scripts/speculative/run_vllm_server.sh
41+
```
42+
43+
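**Server configuration notes:**
44+
- The script starts a vLLM inference service for the target base model
45+
- Make sure the server has started successfully before moving on to data generation
46+
- To adapt to a different target model, edit the parameters in the script (for example, the vLLM launch arguments or the number of GPUs)
47+
48+
**Step 2: Generate the sampled data**
49+
50+
Once the vLLM server is running, use the `scripts/speculative/generate_data_for_target_model.sh` script to generate the training data: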
51+
52+
```shell
53+
bash scripts/speculative/generate_data_for_target_model.sh
54+
```
55+
56+
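**What the script does:**
57+
- Samples responses for the input data by calling the target base model through the vLLM server
58+
- Produces a training dataset in `.jsonl` format
59+
- The data is then used for online training of the Eagle model
60+
61+
**Script parameters:**
62+
63+
Before running the script, set the following parameters in it:
64+
65+
- `DATA_NAME_OR_PATH`: Hugging Face name or local path of the input dataset
66+
- `OUTPUT_DIR`: output path for the generated dataset
67+
- `DATA_FORMAT`: format of the input dataset (sharegpt|ultrachat)
68+
- `DATA_SHARD_SIZE`: size of each shard the generated dataset is split into
69+
- `BASE_PORT`: port of the vLLM server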
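To illustrate what a shard size means in practice, here is a minimal sketch. It is not the logic of `generate_data_for_target_model.sh`: it assumes `DATA_SHARD_SIZE` is the maximum number of records per output file, and both the `write_shards` helper and the `shard_00000.jsonl` naming scheme are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_shards(records, output_dir, shard_size):
    """Split records into JSONL files holding at most shard_size records each."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(records), shard_size):
        # Hypothetical naming scheme; the real script may name files differently.
        path = out / f"shard_{start // shard_size:05d}.jsonl"
        with path.open("w", encoding="utf-8") as f:
            for rec in records[start:start + shard_size]:
                f.write(json.dumps(rec, ensure_ascii=False) + "\n")
        paths.append(path)
    return paths


demo_records = [{"id": i, "conversations": []} for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    shard_paths = write_shards(demo_records, tmp, shard_size=2)
    shard_sizes = [len(p.read_text(encoding="utf-8").splitlines()) for p in shard_paths]
print(shard_sizes)
```

Under that assumption, five records with a shard size of 2 produce three files, the last one holding the single leftover record.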
70+
71+
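**Notes:**
72+
- Make sure the vLLM server has started successfully and is running normally
73+
- Data generation can take a long time, depending on the number of samples and the model size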
74+
75+
76+
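## 3. Training the Eagle3 Model
77+
78+
Qwen2Audio currently supports the online training mode. Online training suits cases where GPU memory is sufficient, the target model is not large, and the training context length does not need to be extremely long.
79+
80+
### 3.1 Online Training
81+
82+
Use the following script to train the Eagle3 model online: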
83+
84+
```shell
85+
bash scripts/speculative/qwen2_audio/train_eagle3_audio_online.sh
86+
```
87+
88+
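**Script parameters:**
89+
90+
Before running the script, set the following parameters in it:
91+
92+
- `TARGET_MODEL_NAME_OR_PATH`: Hugging Face name or local path of the target model
93+
- `DRAFT_MODEL_CONFIG_PATH`: path to the draft model config
94+
- `TRAIN_DATA_PATH`: path to the training data
95+
- `EVAL_DATA_PATH`: path to the validation data
96+
- `OUTPUT_DIR`: output path for the Eagle3 model
97+
- `MODEL_MAX_LENGTH`: maximum length of the training data
98+
- `CHAT_TEMPLATE_TYPE`: chat template type of the target model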
99+
100+
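## 4. Benchmarks
101+
102+
AngelSlim provides an Eagle3 benchmark script for the Qwen2Audio model on the vLLM backend, used to measure the performance gain from speculative sampling.
103+
104+
### 4.1 vLLM Benchmark
105+
106+
> vLLM integration reference: [Support Eagle3 for Qwen2Audio](https://github.com/vllm-project/vllm/pull/32230)
107+
108+
#### 4.1.1 Basic Usage
109+
110+
Use the `tools/vllm_offline_eagle3_qwen2_audio_bench.py` script to run the speculative sampling benchmark: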
111+
112+
```shell
113+
python3 tools/vllm_offline_eagle3_qwen2_audio_bench.py \
114+
--target_model ${BASE_MODEL_PATH} \
115+
--draft_model ${EAGLE_MODEL_PATH} \
116+
--output_file ${OUTPUT_FILE} \
117+
--use_eagle
118+
```
119+
120+
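#### 4.1.2 Parameters
121+
122+
**Model configuration:**
123+
- `--target_model`: path to the base model (required)
124+
- `--draft_model`: path to the Eagle draft model (required for Eagle3 runs; the baseline example in 4.1.3 omits it)
125+
126+
**Benchmark configuration:**
127+
- `--test_data_path`: path to the test JSONL file; defaults to "dataset/librispeech_test/librispeech_eval_10_test.jsonl"
128+
- `--use_eagle`: run Eagle3 inference; defaults to False
129+
- `--output_file`: path of the output results file
130+
- `--num_prompts`: number of test prompts; defaults to 100
131+
132+
**Generation parameters:**
133+
- `--temp`: sampling temperature; defaults to 0
134+
- `--max_model_len`: maximum context length; defaults to 16384
135+
- `--output_len`: maximum number of generated tokens; defaults to 1024
136+
- `--max_num_seqs`: maximum number of sequences per iteration; defaults to 1
137+
- `--num_spec_tokens`: number of speculative tokens proposed by the draft model; defaults to 2
138+
139+
**Hardware configuration:**
140+
- `--tp`: tensor parallel size; defaults to 1
141+
142+
**Other settings:**
143+
- `--seed`: random seed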
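The flags and defaults listed above can be summarized as an `argparse` sketch. This is not the source of `tools/vllm_offline_eagle3_qwen2_audio_bench.py`: the flag names and stated defaults come from this section, while the `None` defaults for `--output_file` and `--seed`, the argument types, and the rule that `--draft_model` is only enforced together with `--use_eagle` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented flags and defaults (a sketch, not the tool's source)."""
    parser = argparse.ArgumentParser(description="Eagle3 Qwen2-Audio benchmark flags (sketch)")
    # Model configuration
    parser.add_argument("--target_model", required=True)
    parser.add_argument("--draft_model", default=None)  # assumption: only needed with --use_eagle
    # Benchmark configuration
    parser.add_argument(
        "--test_data_path",
        default="dataset/librispeech_test/librispeech_eval_10_test.jsonl",
    )
    parser.add_argument("--use_eagle", action="store_true")
    parser.add_argument("--output_file", default=None)  # assumption: no default documented
    parser.add_argument("--num_prompts", type=int, default=100)
    # Generation parameters
    parser.add_argument("--temp", type=float, default=0.0)
    parser.add_argument("--max_model_len", type=int, default=16384)
    parser.add_argument("--output_len", type=int, default=1024)
    parser.add_argument("--max_num_seqs", type=int, default=1)
    parser.add_argument("--num_spec_tokens", type=int, default=2)
    # Hardware and other settings
    parser.add_argument("--tp", type=int, default=1)
    parser.add_argument("--seed", type=int, default=None)  # assumption: no default documented
    return parser


def parse_args(argv):
    """Parse argv and apply the assumed draft-model check."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.use_eagle and args.draft_model is None:
        parser.error("--draft_model is required when --use_eagle is set")
    return args


baseline = parse_args(["--target_model", "Qwen/Qwen2-Audio-7B-Instruct"])
eagle = parse_args([
    "--target_model", "Qwen/Qwen2-Audio-7B-Instruct",
    "--draft_model", "/path/to/eagle3",  # placeholder path
    "--use_eagle", "--num_spec_tokens", "4",
])
print(baseline.use_eagle, baseline.num_spec_tokens)
print(eagle.use_eagle, eagle.num_spec_tokens)
```

Parsing a baseline and an Eagle3 invocation side by side makes it easy to see which settings differ between the two runs.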
144+
145+
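#### 4.1.3 Usage Examples
146+
147+
Test data layout: all data must be stored in JSONL files, with the corresponding audio files in the same directory. The directory structure looks like this: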
148+
```
149+
└── librispeech_test
150+
├── librispeech_eval_10_test.jsonl
151+
├── audios
152+
│ ├── xxx.flac
153+
│ ├── xxx.flac
154+
```
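Before running the benchmark, it can help to confirm that every audio file referenced in the JSONL file exists. The sketch below assumes relative `audio` paths are resolved against the directory containing the JSONL file, which matches the layout above but is not confirmed by the benchmark script; the `missing_audio_files` helper is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def missing_audio_files(jsonl_path):
    """Return the audio paths referenced in a JSONL file that do not exist on disk."""
    jsonl_path = Path(jsonl_path)
    base = jsonl_path.parent  # assumption: audio paths are relative to the JSONL file
    missing = []
    with jsonl_path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            for turn in record["conversations"]:
                content = turn["content"]
                if isinstance(content, str):
                    continue  # plain-text turn with no audio parts
                for part in content:
                    if part.get("type") == "audio" and not (base / part["audio"]).is_file():
                        missing.append(part["audio"])
    return missing


def _record(audio):
    """Build a one-turn demo record that references a single audio file."""
    return {"id": 0, "conversations": [
        {"role": "user", "content": [{"type": "audio", "audio": audio}]},
    ]}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "librispeech_test"
    (root / "audios").mkdir(parents=True)
    (root / "audios" / "a.flac").write_bytes(b"")  # empty stand-in for a real audio file
    data_file = root / "demo.jsonl"
    lines = [json.dumps(_record("./audios/a.flac")), json.dumps(_record("./audios/b.flac"))]
    data_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    demo_missing = missing_audio_files(data_file)
print(demo_missing)
```

In the demo only `a.flac` is created, so the second record's `./audios/b.flac` is reported as missing.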
155+
156+
**Run speculative sampling:**
157+
```shell
158+
python3 tools/vllm_offline_eagle3_qwen2_audio_bench.py \
159+
--target_model Qwen/Qwen2-Audio-7B-Instruct \
160+
--draft_model "$EAGLE_DIR" \
161+
--use_eagle \
162+
--num_spec_tokens 4 \
163+
--num_prompts 10 \
164+
--temp 0 \
165+
--max_num_seqs 1 \
166+
--output_len 1024 \
167+
--output_file "$OUTPUT_FILE"
168+
```
169+
170+
**Run the baseline benchmark:**
171+
```shell
172+
python3 tools/vllm_offline_eagle3_qwen2_audio_bench.py \
173+
--target_model Qwen/Qwen2-Audio-7B-Instruct \
174+
--num_prompts 10 \
175+
--temp 0 \
176+
--max_num_seqs 1 \
177+
--output_len 1024 \
178+
--output_file "$OUTPUT_FILE"
179+
```
180+
181+
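#### 4.1.4 Performance Report
182+
183+
When the run finishes, the tool automatically generates a performance report that includes:
184+
- A performance comparison between speculative sampling and the baseline model
185+
- Speedup statistics
186+
- Generation quality metrics (if enabled)
187+
188+
The results are saved to the specified output path for later analysis and comparison.
189+
190+
The full vLLM benchmark results are available in [Benchmark](../../../performance/speculative_decoding/benchmarks.md).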
