@@ -594,65 +594,143 @@ Other models such as GLM-4.6, Qwen2.5, and Seed-OSS have been evaluated on bench
594594
595595#### 2.1 Qwen3 Series Models
596596
597- Benchmark results for Qwen3 series models with `Eagle3` speculative decoding algorithm on datasets including `MT-bench`, `HunmanEval`, `GSM8K`, and `Alpaca`:
597+ **vLLM v0.11.2 Benchmark Results**
598598
599- <table>
600- <thead>
601- <tr>
602- <th> </th><th> </th>
603- <th colspan="2" style="text-align: center; vertical-align: middle;">MT-bench</th>
604- <th colspan="2" style="text-align: center; vertical-align: middle;">HumanEval</th>
605- <th colspan="2" style="text-align: center; vertical-align: middle;">GSM8K</th>
606- <th colspan="2" style="text-align: center; vertical-align: middle;">Alpaca</th>
607- <th colspan="2" style="text-align: center; vertical-align: middle;">Mean</th></tr>
608- <tr><th>Temperature</th><th>Model</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th></tr>
609- </thead>
610- <tbody>
611- <!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=0</strong></td></tr> -->
612- <tr><td rowspan="6"><strong>T=0</strong></td>
613- <td>Qwen3-1.7B</td><td>2.05x</td><td>2.81</td><td>2.07x</td><td>2.93</td><td>2.11x</td><td>2.98</td><td>1.93x</td><td>2.69</td><td>2.04x</td><td>2.85</td></tr>
614- <tr> <td>Qwen3-4B</td><td>2.21x</td><td>3.01</td><td>2.36x</td><td>3.24</td><td>2.42x</td><td>3.13</td><td>2.32x</td><td>2.75</td><td>2.33x</td><td>3.03</td></tr>
615- <tr><td>Qwen3-8B</td><td>2.63x</td><td>3.65</td><td>2.76x</td><td>3.85</td><td>2.82x</td><td>3.90</td><td>2.62x</td><td>3.48</td><td>2.70x</td><td>3.72</td></tr>
616- <tr><td>Qwen3-14B</td><td>2.23x</td><td>3.30</td><td>2.53x</td><td>3.74</td><td>2.56x</td><td>3.79</td><td>2.16x</td><td>3.13</td><td>2.37x</td><td>3.49</td></tr>
617- <tr><td>Qwen3-32B</td><td>2.39x</td><td>2.78</td><td>2.37x</td><td>2.81</td><td>2.47x</td><td>2.92</td><td>2.42x</td><td>2.53</td><td>2.41x</td><td>2.76</td></tr>
618- <tr><td>Qwen3-30B-A3B</td><td>2.84x</td><td>3.63</td><td>2.27x</td><td>3.09</td><td>2.64x</td><td>3.42</td><td>2.83x</td><td>3.56</td><td>2.64x</td><td>3.42</td></tr>
619- <!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
620- <tr><td rowspan="6"><strong>T=1</strong></td>
621- <td>Qwen3-1.7B</td><td>1.74x</td><td>2.53</td><td>1.86x</td><td>2.70</td><td>1.82x</td><td>2.69</td><td>1.72x</td><td>2.46</td><td>1.93x</td><td>2.60</td></tr>
622- <tr><td>Qwen3-4B</td><td>1.93x</td><td>2.60</td><td>2.00x</td><td>2.84</td><td>2.11x</td><td>2.82</td><td>2.34x</td><td>2.50</td><td>1.75x</td><td>2.69</td></tr>
623- <tr><td>Qwen3-8B</td><td>1.98x</td><td>2.75</td><td>2.25x</td><td>3.11</td><td>2.31x</td><td>3.15</td><td>2.10x</td><td>2.76</td><td>2.90x</td><td>2.94</td></tr>
624- <tr><td>Qwen3-14B</td><td>1.71x</td><td>2.61</td><td>1.95x</td><td>2.87</td><td>2.04x</td><td>3.08</td><td>1.68x</td><td>2.55</td><td>2.90x</td><td>2.78</td></tr>
625- <tr><td>Qwen3-32B</td><td>1.62x</td><td>1.91</td><td>1.71x</td><td>2.05</td><td>1.78x</td><td>2.10</td><td>1.80x</td><td>1.95</td><td>1.62x</td><td>2.00</td></tr>
626- <tr><td>Qwen3-30B-A3B</td><td>1.91x</td><td>2.46</td><td>2.00x</td><td>2.64</td><td>1.90x</td><td>2.53</td><td>1.80x</td><td>2.32</td><td>1.90x</td><td>2.48</td></tr>
627- </tbody>
628- </table>
629-
630- #### 2.2 Hunyuan Series Models
631-
632- Benchmark results for Hunyuan series models with `Eagle3` speculative decoding algorithm on datasets including `MT-bench`, `HunmanEval`, `GSM8K`, and `Alpaca`:
599+ We report benchmark results of the Qwen3 series models using the Eagle3 speculative decoding algorithm across multiple evaluation suites, including **MT-bench**, **HumanEval**, **GSM8K**, and **Alpaca**.
600+ All experiments were conducted on a single NVIDIA H20 GPU with the configuration:
601+ **tp=1, ep=1, num_speculative_tokens=2, batch_size=1, output_len=1024**.
633602
634603<table>
635604 <thead>
636605 <tr>
637- <th> </th><th> </th>
638- <th colspan="2" style="text-align: center; vertical-align: middle;">MT-bench</th>
639- <th colspan="2" style="text-align: center; vertical-align: middle;">HumanEval</th>
640- <th colspan="2" style="text-align: center; vertical-align: middle;">GSM8K</th>
641- <th colspan="2" style="text-align: center; vertical-align: middle;">Alpaca</th>
642- <th colspan="2" style="text-align: center; vertical-align: middle;">Mean</th></tr>
643- <tr><th>Temperature</th><th>Model</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th></tr>
606+ <th>Model</th>
607+ <th>Method</th>
608+ <th colspan="2" style="text-align:center;">GSM8K</th>
609+ <th colspan="2" style="text-align:center;">Alpaca</th>
610+ <th colspan="2" style="text-align:center;">HumanEval</th>
611+ <th colspan="2" style="text-align:center;">MT-bench</th>
612+ <th colspan="2" style="text-align:center;">Mean</th>
613+ </tr>
614+ <tr>
615+ <th></th><th></th>
616+ <th>throughput (tokens/s)</th><th>accept length</th>
617+ <th>throughput (tokens/s)</th><th>accept length</th>
618+ <th>throughput (tokens/s)</th><th>accept length</th>
619+ <th>throughput (tokens/s)</th><th>accept length</th>
620+ <th>throughput (tokens/s)</th><th>accept length</th>
621+ </tr>
644622 </thead>
623+
645624 <tbody>
646- <!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=0</strong></td></tr> -->
647- <tr><td rowspan="3"><strong>T=0</strong></td>
648- <td>Hunyuan-1.8B-Instruct</td><td>1.97x</td><td>2.90</td><td>2.58x</td><td>3.73</td><td>2.61x</td><td>3.71</td><td>1.71x</td><td>2.43</td><td>2.22x</td><td>3.19</td></tr>
649- <tr> <td>Hunyuan-4B-Instruct</td><td>1.77x</td><td>2.60</td><td>2.64x</td><td>3.35</td><td>2.14x</td><td>3.17</td><td>1.72x</td><td>2.57</td><td>2.07x</td><td>2.92</td></tr>
650- <tr><td>Hunyuan-7B-Instruct</td><td>2.22x</td><td>3.58</td><td>3.59x</td><td>5.47</td><td>2.96x</td><td>4.68</td><td>1.64x</td><td>2.56</td><td>2.60x</td><td>4.07</td></tr>
651- <!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
652- <tr><td rowspan="3"><strong>T=1</strong></td>
653- <td>Hunyuan-1.8B-Instruct</td><td>1.58x</td><td>2.36</td><td>2.35x</td><td>3.56</td><td>2.23x</td><td>3.38</td><td>1.26x</td><td>1.87</td><td>1.86x</td><td>2.79</td></tr>
654- <tr><td>Hunyuan-4B-Instruct</td><td>1.36x</td><td>2.05</td><td>1.97x</td><td>2.86</td><td>1.72x</td><td>2.68</td><td>1.14x</td><td>1.76</td><td>1.55x</td><td>2.34</td></tr>
655- <tr><td>Hunyuan-7B-Instruct</td><td>1.90x</td><td>3.11</td><td>3.12x</td><td>5.09</td><td>2.74x</td><td>4.34</td><td>1.47x</td><td>2.39</td><td>2.31x</td><td>3.73</td></tr>
625+ <!-- Qwen3-1.7B -->
626+ <tr>
627+ <td rowspan="2">Qwen3-1.7B</td>
628+ <td>Vanilla</td>
629+ <td>376.42</td><td>1</td>
630+ <td>378.86</td><td>1</td>
631+ <td>378.38</td><td>1</td>
632+ <td>390.53</td><td>1</td>
633+ <td>381.05</td><td>1</td>
634+ </tr>
635+ <tr>
636+ <td>Eagle3</td>
637+ <td>616.9</td><td>2.13</td>
638+ <td>653.29</td><td>2.19</td>
639+ <td>680.1</td><td>2.2</td>
640+ <td>621.44</td><td>2.17</td>
641+ <td>642.93</td><td>2.18</td>
642+ </tr>
643+ <!-- Qwen3-4B -->
644+ <tr>
645+ <td rowspan="2">Qwen3-4B</td>
646+ <td>Vanilla</td>
647+ <td>229.05</td><td>1</td>
648+ <td>235.29</td><td>1</td>
649+ <td>234.66</td><td>1</td>
650+ <td>234.04</td><td>1</td>
651+ <td>233.26</td><td>1</td>
652+ </tr>
653+ <tr>
654+ <td>Eagle3</td>
655+ <td>389.35</td><td>2.07</td>
656+ <td>395.97</td><td>2.1</td>
657+ <td>377.84</td><td>2.08</td>
658+ <td>384.6</td><td>2.07</td>
659+ <td>386.94</td><td>2.08</td>
660+ </tr>
661+ <!-- Qwen3-8B -->
662+ <tr>
663+ <td rowspan="2">Qwen3-8B</td>
664+ <td>Vanilla</td>
665+ <td>149.63</td><td>1</td>
666+ <td>149.93</td><td>1</td>
667+ <td>153.85</td><td>1</td>
668+ <td>153.81</td><td>1</td>
669+ <td>151.81</td><td>1</td>
670+ </tr>
671+ <tr>
672+ <td>Eagle3</td>
673+ <td>257.32</td><td>2</td>
674+ <td>266.69</td><td>2.02</td>
675+ <td>244.89</td><td>1.97</td>
676+ <td>258.2</td><td>1.97</td>
677+ <td>257.52</td><td>1.99</td>
678+ </tr>
679+ <!-- Qwen3-14B -->
680+ <tr>
681+ <td rowspan="2">Qwen3-14B</td>
682+ <td>Vanilla</td>
683+ <td>92.97</td><td>1</td>
684+ <td>92.66</td><td>1</td>
685+ <td>92.94</td><td>1</td>
686+ <td>94.46</td><td>1</td>
687+ <td>93.26</td><td>1</td>
688+ </tr>
689+ <tr>
690+ <td>Eagle3</td>
691+ <td>153.72</td><td>1.87</td>
692+ <td>140.46</td><td>1.78</td>
693+ <td>144.68</td><td>1.76</td>
694+ <td>142.45</td><td>1.74</td>
695+ <td>145.33</td><td>1.79</td>
696+ </tr>
697+ <!-- Qwen3-32B -->
698+ <tr>
699+ <td rowspan="2">Qwen3-32B</td>
700+ <td>Vanilla</td>
701+ <td>43.49</td><td>1</td>
702+ <td>43.38</td><td>1</td>
703+ <td>43.19</td><td>1</td>
704+ <td>43.3</td><td>1</td>
705+ <td>43.32</td><td>1</td>
706+ </tr>
707+ <tr>
708+ <td>Eagle3</td>
709+ <td>80.43</td><td>2.01</td>
710+ <td>72.49</td><td>1.9</td>
711+ <td>71.57</td><td>1.86</td>
712+ <td>74.1</td><td>1.86</td>
713+ <td>74.1</td><td>1.91</td>
714+ </tr>
715+ <!-- Qwen3-30B-A3B -->
716+ <tr>
717+ <td rowspan="2">Qwen3-30B-A3B</td>
718+ <td>Vanilla</td>
719+ <td>311.84</td><td>1</td>
720+ <td>320.43</td><td>1</td>
721+ <td>325.77</td><td>1</td>
722+ <td>325.42</td><td>1</td>
723+ <td>320.87</td><td>1</td>
724+ </tr>
725+ <tr>
726+ <td>Eagle3</td>
727+ <td>453.97</td><td>2.1</td>
728+ <td>432.45</td><td>2.04</td>
729+ <td>428.81</td><td>2.02</td>
730+ <td>437.06</td><td>2.01</td>
731+ <td>438.07</td><td>2.04</td>
732+ </tr>
733+
656734 </tbody>
657735</table>
658736
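The table reports absolute throughput rather than a speedup ratio, so the relative gain from Eagle3 has to be derived by dividing the Eagle3 throughput by the Vanilla throughput of the same model. The following is a minimal sketch of that arithmetic, using three rows of the "Mean" column copied from the table above. The names `MEAN_THROUGHPUT`, `speedup`, and `speedup_table` are hypothetical helpers introduced here for illustration and are not part of any benchmark tooling.

```python
# Mean throughput (tokens/s) copied from the "Mean" column of the table above.
# Only a few rows are included to keep the sketch short.
MEAN_THROUGHPUT = {
    "Qwen3-4B": {"Vanilla": 233.26, "Eagle3": 386.94},
    "Qwen3-14B": {"Vanilla": 93.26, "Eagle3": 145.33},
    "Qwen3-30B-A3B": {"Vanilla": 320.87, "Eagle3": 438.07},
}


def speedup(vanilla_tps: float, eagle3_tps: float) -> float:
    """Return the throughput ratio of Eagle3 decoding over vanilla decoding."""
    if vanilla_tps <= 0:
        raise ValueError("vanilla throughput must be positive")
    return eagle3_tps / vanilla_tps


def speedup_table(rows: dict) -> dict:
    """Map each model name to its Eagle3 speedup, rounded to two decimals."""
    return {
        model: round(speedup(tps["Vanilla"], tps["Eagle3"]), 2)
        for model, tps in rows.items()
    }


if __name__ == "__main__":
    for model, ratio in speedup_table(MEAN_THROUGHPUT).items():
        print(f"{model}: {ratio:.2f}x")
```

Under these numbers the ratios come out to roughly 1.66x for Qwen3-4B, 1.56x for Qwen3-14B, and 1.37x for Qwen3-30B-A3B. These figures are computed from the rounded table entries, so they may differ slightly from ratios derived from the raw measurements.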