Merged
20 changes: 8 additions & 12 deletions README.md
@@ -170,6 +170,7 @@ A more accessible, comprehensive, and efficient toolkit for large model compress
<ul style="padding-left: 0; list-style-position: inside;">
<li><a href="https://huggingface.co/collections/Qwen/qwen3-omni">Qwen3-Omni</a></li>
<li><a href="https://huggingface.co/collections/Qwen/qwen2-audio">Qwen2-Audio</a></li>
+<li><a href="https://huggingface.co/FunAudioLLM/Fun-CosyVoice3-0.5B-2512">Fun-CosyVoice3</a></li>
</ul>
</td>
<td>
@@ -341,7 +342,7 @@ For more detaileds, please refer to the [Deployment Documentation](https://angel

### 1. Speculative Decoding

-We evaluated the Eagle3 model trained by AngelSlim on tasks including code generation, mathematical reasoning, instruction following, text generation, and multimodal understanding using vLLM. The inference acceleration and context length performance of our trained model under the settings of num_speculative_tokens = 2 or 4 are presented as follows.
+We evaluated the Eagle3 model trained by AngelSlim on tasks including code generation, mathematical reasoning, instruction following, text generation, and multimodal understanding using vLLM. The inference acceleration and context length performance of our trained model under the settings of num_speculative_tokens = 2 or 4 are presented as follows, with an accept length of 1.8–3.5 and a maximum speedup of 1.4–1.9×.

<p align="center">
<picture>
@@ -636,13 +637,11 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o
##### 1.2.2 HunyuanOCR Model

Benchmark results for HunyuanOCR using Eagle3 speculative decoding on vLLM (v0.13.0) across OCR tasks, using a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).

<table><thead>
<tr>
<th>Model</th>
<th>Method</th>
-<th>OCR-Bench-Internal</th>
-<th></th>
+<th colspan="2">OCR-Bench-Internal</th>
</tr></thead>
<tbody>
<tr>
@@ -652,13 +651,12 @@ Benchmark results for HunyuanOCR using Eagle3 speculative decoding on vLLM (v0.1
<td>accept length</td>
</tr>
<tr>
-<td>Hunyuan-OCR</td>
+<td rowspan="2">Hunyuan-OCR</td>
<td>Vanilla</td>
<td>71.21</td>
<td>1</td>
</tr>
<tr>
-<td></td>
<td>Eagle3</td>
<td>120.75</td>
<td>2.2</td>
@@ -686,13 +684,12 @@ Benchmark results for Qwen2-Audio using Eagle3 speculative decoding on vLLM (v0.
<td>accept length</td>
</tr>
<tr>
-<td>Qwen2-Audio-7B-Instruct</td>
+<td rowspan="2">Qwen2_Audio</td>
<td>Vanilla</td>
<td>78.76</td>
<td>1</td>
</tr>
<tr>
-<td></td>
<td>Eagle3</td>
<td>146.66</td>
<td>3.51</td>
@@ -708,7 +705,7 @@ Benchmark results for Fun-CosyVoice3 using Eagle3 speculative decoding across **
<tr>
<th>Model</th>
<th>Method</th>
-<th colspan="2">LibriTTS</a></th>
+<th colspan="2">LibriTTS</th>
</tr></thead>
<tbody>
<tr>
@@ -718,21 +715,20 @@ Benchmark results for Fun-CosyVoice3 using Eagle3 speculative decoding across **
<td>accept length</td>
</tr>
<tr>
-<td>Fun-CosyVoice3</td>
+<td rowspan="2">Fun-CosyVoice3</td>
<td>Vanilla</td>
<td>-</td>
<td>1</td>
</tr>
<tr>
-<td></td>
<td>Eagle3</td>
<td>-</td>
<td>1.96</td>
</tr>
</tbody>
</table>

-> Adapted for Transformers backend inference, only displays accept length.
+> Adapted for Transformers backend inference, only displays accept length. vLLM speedup ~1.6×, estimated from baseline LLM speedup.

### 2. Quantization

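The speedup range added to the README text (1.4–1.9×) can be sanity-checked against the table rows visible in this diff. The sketch below is only an illustration: it assumes the first numeric column of the HunyuanOCR and Qwen2-Audio tables is throughput (the README_cn.md description of the HunyuanOCR table mentions throughput and accept length), and the `BENCHMARKS` layout and `speedup` helper are hypothetical names, not part of AngelSlim.

```python
# Values copied from the Vanilla / Eagle3 rows shown in this diff.
# Assumption: the first numeric column is throughput; its unit is not
# visible in the diff, and only the ratio matters here.
BENCHMARKS = {
    "Hunyuan-OCR": {"vanilla": 71.21, "eagle3": 120.75, "accept_length": 2.2},
    "Qwen2_Audio": {"vanilla": 78.76, "eagle3": 146.66, "accept_length": 3.51},
}


def speedup(vanilla: float, eagle3: float) -> float:
    """Ratio of Eagle3 throughput to baseline (Vanilla) throughput."""
    if vanilla <= 0:
        raise ValueError("baseline throughput must be positive")
    return eagle3 / vanilla


for model, row in BENCHMARKS.items():
    ratio = speedup(row["vanilla"], row["eagle3"])
    print(f"{model}: {ratio:.2f}x speedup, accept length {row['accept_length']}")
```

Under that assumption, both ratios fall inside the 1.4–1.9× range stated in the updated paragraph. The Fun-CosyVoice3 rows report no throughput, so no ratio can be derived for them from this diff.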
21 changes: 9 additions & 12 deletions README_cn.md
@@ -171,6 +171,7 @@
<ul style="padding-left: 0; list-style-position: inside;">
<li><a href="https://huggingface.co/collections/Qwen/qwen3-omni">Qwen3-Omni</a></li>
<li><a href="https://huggingface.co/collections/Qwen/qwen2-audio">Qwen2-Audio</a></li>
+<li><a href="https://huggingface.co/FunAudioLLM/Fun-CosyVoice3-0.5B-2512">Fun-CosyVoice3</a></li>
</ul>
</td>
<td>
@@ -345,7 +346,8 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta

### 1. Speculative Sampling

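-We used vLLM to evaluate the Eagle3 models trained by AngelSlim on tasks including code, math, instruction following, text generation, and multimodal understanding. With num_speculative_tokens set to 2 or 4, the speedup and accept length of our trained models are shown below.
+We used vLLM to evaluate the Eagle3 models trained by AngelSlim on tasks including code, math, instruction following, text generation, and multimodal understanding. With num_speculative_tokens set to 2 or 4, the speedup and accept length of our trained models are shown below, with accept lengths of 1.8-3.5 and maximum speedups of 1.4-1.9×.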


<p align="center">
<picture>
@@ -640,13 +642,11 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta

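We evaluated the HunyuanOCR Eagle3 model with vLLM (v0.13.0) on **OCR-Bench**, measuring accept length and throughput. The results were obtained on a single H20 GPU with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**.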


<table><thead>
<tr>
<th>Model</th>
<th>Method</th>
-<th>OCR-Bench-Internal</th>
-<th></th>
+<th colspan="2">OCR-Bench-Internal</th>
</tr></thead>
<tbody>
<tr>
@@ -656,13 +656,12 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
<td>accept length</td>
</tr>
<tr>
-<td>Hunyuan-OCR</td>
+<td rowspan="2">Hunyuan-OCR</td>
<td>Vanilla</td>
<td>71.21</td>
<td>1</td>
</tr>
<tr>
-<td></td>
<td>Eagle3</td>
<td>120.75</td>
<td>2.2</td>
@@ -690,13 +689,12 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
<td>accept length</td>
</tr>
<tr>
-<td>Qwen2-Audio-7B-Instruct</td>
+<td rowspan="2">Qwen2_Audio</td>
<td>Vanilla</td>
<td>78.76</td>
<td>1</td>
</tr>
<tr>
-<td></td>
<td>Eagle3</td>
<td>146.66</td>
<td>3.51</td>
@@ -711,7 +709,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
<tr>
<th>Model</th>
<th>Method</th>
-<th colspan="2">LibriTTS</a></th>
+<th colspan="2">LibriTTS</th>
</tr></thead>
<tbody>
<tr>
@@ -721,21 +719,20 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
<td>accept length</td>
</tr>
<tr>
-<td>Fun-CosyVoice3</td>
+<td rowspan="2">Fun-CosyVoice3</td>
<td>Vanilla</td>
<td>-</td>
<td>1</td>
</tr>
<tr>
-<td></td>
<td>Eagle3</td>
<td>-</td>
<td>1.96</td>
</tr>
</tbody>
</table>

-> Adapted for Transformers backend inference, only displays accept length.
+> Adapted for Transformers backend inference, only displays accept length. vLLM speedup ~1.6×, estimated from baseline LLM speedup.

### 2. Quantization
