README.md
Lines changed: 42 additions & 21 deletions
Original file line number
Diff line number
Diff line change
@@ -353,11 +353,7 @@ We evaluated the Eagle3 model trained by AngelSlim on tasks including code gener
353
353
354
354
#### 1.1 Qwen3 Series Models
355
355
356
-
**vLLM v0.11.2 Benchmark Results**
357
-
358
-
We report benchmark results of the Qwen3 series models using the Eagle3 speculative decoding algorithm across multiple evaluation suites, including **MT-bench**, **HumanEval**, **GSM8K**, and **Alpaca**.
359
-
All experiments were conducted on a single NVIDIA H20 GPU with the configuration:
We report benchmark results for Qwen3 series models using Eagle3 speculative decoding on vLLM (v0.11.2) across **MT-bench**, **HumanEval**, **GSM8K** and **Alpaca**, using a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=2, batch_size=1, output_len=1024**).
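The configuration in parentheses can be written down as engine settings. The following is a minimal sketch, not the benchmark script used for these results: it only builds plain dictionaries, the model paths are hypothetical placeholders, and the key names (`tensor_parallel_size`, `speculative_config`, `method`, `max_tokens`) are assumptions modeled on vLLM's offline inference options that should be checked against the vLLM version in use.

```python
def eagle3_engine_kwargs(target_model, draft_model, num_speculative_tokens=2):
    """Collect the benchmark settings as keyword arguments (plain dicts only).

    Key names are assumptions modeled on vLLM's offline API; verify them
    against the installed vLLM version before passing them to an engine.
    """
    return {
        "model": target_model,          # hypothetical path to the Qwen3 target model
        "tensor_parallel_size": 1,      # tp=1
        "speculative_config": {
            "method": "eagle3",
            "model": draft_model,       # hypothetical path to the Eagle3 draft model
            "num_speculative_tokens": num_speculative_tokens,
        },
    }


# output_len=1024 maps to a cap on generated tokens per request.
SAMPLING = {"max_tokens": 1024}

kwargs = eagle3_engine_kwargs("path/to/Qwen3-target", "path/to/Eagle3-draft")

# With k draft tokens, one verification step can emit at most k + 1 tokens:
# the k accepted drafts plus one token from the target model itself.
max_tokens_per_step = kwargs["speculative_config"]["num_speculative_tokens"] + 1
print(max_tokens_per_step)
```

With `num_speculative_tokens=2`, the upper bound on tokens emitted per target-model step is 3, which bounds any accept-length figure reported for this configuration.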
361
357
362
358
<table>
363
359
<thead>
@@ -493,15 +489,11 @@ All experiments were conducted on a single NVIDIA H20 GPU with the configuration
493
489
</tbody>
494
490
</table>
495
491
496
-
#### 1.2 VLM & Audio Models
492
+
#### 1.2 VLM Models
497
493
498
494
##### 1.2.1 Qwen3-VL Series Models
499
495
500
-
vLLM v0.12.0 Benchmark Results
501
-
502
-
We report benchmark results of the Qwen3-VL series models using the Eagle3 speculative decoding algorithm across multiple evaluation suites, including **MT-bench**, **HumanEval**, **GSM8K**, **Alpaca**, **MATH-500**, and multimodal understanding tasks, including **MMMU**, **MMStar**. All experiments were conducted on a single NVIDIA H20 GPU with the configuration:
We report benchmark results for Qwen3-VL series models using Eagle3 speculative decoding on vLLM (v0.12.0) across language and multimodal tasks, using a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).
505
497
506
498
<table><thead>
507
499
<tr>
@@ -643,11 +635,7 @@ We report benchmark results of the Qwen3-VL series models using the Eagle3 specu
643
635
644
636
##### 1.2.2 HunyuanOCR Model
645
637
646
-
vLLM v0.13.0 Benchmark Results
647
-
648
-
We report benchmark results of the HunyuanOCR using the Eagle3 speculative decoding algorithm across **OCR-Bench**. All experiments were conducted on a single NVIDIA H20 GPU with the configuration:
We report benchmark results for HunyuanOCR using Eagle3 speculative decoding on vLLM (v0.13.0) across OCR tasks, using a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).
651
639
652
640
<table><thead>
653
641
<tr>
@@ -678,18 +666,17 @@ We report benchmark results of the HunyuanOCR using the Eagle3 speculative decod
678
666
</tbody>
679
667
</table>
680
668
681
-
#####1.2.3 Qwen2-Audio Model
669
+
#### 1.3 Audio Models
682
670
683
-
vLLM v0.12.0 Benchmark Results
671
+
##### 1.3.1 Qwen2-Audio Model
684
672
685
-
We report benchmark results of the HunyuanOCR using the Eagle3 speculative decoding algorithm across **[librispeech_dev](https://www.openslr.org/12)** dataset. All experiments were conducted on a single NVIDIA H20 GPU with the configuration:
We report benchmark results for Qwen2-Audio using Eagle3 speculative decoding on vLLM (v0.12.0) on the **[LibriSpeech](https://www.openslr.org/12)** dataset, using a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).
687
674
688
675
<table><thead>
689
676
<tr>
690
677
<th>Model</th>
691
678
<th>Method</th>
692
-
<th colspan="2">librispeech_dev</th>
679
+
<th colspan="2">LibriSpeech</th>
693
680
</tr></thead>
694
681
<tbody>
695
682
<tr>
@@ -713,6 +700,40 @@ We report benchmark results of the HunyuanOCR using the Eagle3 speculative decod
713
700
</tbody>
714
701
</table>
715
702
703
+
##### 1.3.2 Fun-CosyVoice3 Model
704
+
705
+
We report benchmark results for Fun-CosyVoice3 using Eagle3 speculative decoding on the **[LibriTTS](https://www.openslr.org/60/)** dataset, using a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).
> Adapted for Transformers backend inference; only the acceptance rate is reported.
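
This section does not define how the acceptance rate is computed. As a hedged illustration only, the sketch below uses one common definition: accepted draft tokens divided by proposed draft tokens, plus the mean number of tokens emitted per verification step. The function name and the sample counts are hypothetical and are not taken from the benchmark.

```python
def acceptance_stats(accepted_per_step, num_speculative_tokens):
    """Summarize draft-token acceptance over a run.

    accepted_per_step: number of draft tokens accepted at each verification step.
    num_speculative_tokens: draft tokens proposed per step (k).
    Assumes every step proposes exactly k draft tokens.
    """
    steps = len(accepted_per_step)
    if steps == 0:
        raise ValueError("need at least one verification step")
    if any(a < 0 or a > num_speculative_tokens for a in accepted_per_step):
        raise ValueError("accepted count must lie in [0, k]")

    proposed = steps * num_speculative_tokens
    accepted = sum(accepted_per_step)
    return {
        # Fraction of proposed draft tokens that the target model accepted.
        "acceptance_rate": accepted / proposed,
        # Each step emits the accepted drafts plus one token from the target model.
        "mean_accept_length": 1 + accepted / steps,
    }


# Hypothetical counts for four steps with k = 4 (not measured data).
stats = acceptance_stats([4, 2, 0, 3], num_speculative_tokens=4)
print(stats)
```

Under this definition the acceptance rate lies in [0, 1] and the mean accept length lies in [1, k + 1], so the two numbers carry the same information when k is fixed.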
736
+
716
737
### 2. Quantization
717
738
718
739
The performance test results for selected models are shown below. For the complete benchmark, refer to the [Benchmark documentation](https://angelslim.readthedocs.io/zh-cn/latest/performance/quantization/benchmarks.html).