diff --git a/README.md b/README.md
index 48f0d510..a7e466d1 100644
--- a/README.md
+++ b/README.md
@@ -170,6 +170,7 @@ A more accessible, comprehensive, and efficient toolkit for large model compress
-> Adapted for Transformers backend inference, only displays accept length.
+> Adapted for Transformers backend inference, only displays accept length. vLLM speedup ~1.6×, estimated from baseline LLM speedup.
### 2. Quantization
diff --git a/README_cn.md b/README_cn.md
index e5ab1c07..a47839de 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -171,6 +171,7 @@
Model
Method
- OCR-Bench-Internal
-
+ OCR-Bench-Internal
@@ -652,13 +651,12 @@ Benchmark results for HunyuanOCR using Eagle3 speculative decoding on vLLM (v0.1
accept length
-
Hunyuan-OCR
+ Hunyuan-OCR
Vanilla
71.21
1
-
Eagle3
120.75
2.2
@@ -686,13 +684,12 @@ Benchmark results for Qwen2-Audio using Eagle3 speculative decoding on vLLM (v0.
accept length
-
Qwen2-Audio-7B-Instruct
+ Qwen2_Audio
Vanilla
78.76
1
-
Eagle3
146.66
3.51
@@ -708,7 +705,7 @@ Benchmark results for Fun-CosyVoice3 using Eagle3 speculative decoding across **
Model
Method
- LibriTTS
+ LibriTTS
@@ -718,13 +715,12 @@ Benchmark results for Fun-CosyVoice3 using Eagle3 speculative decoding across **
accept length
-
Fun-CosyVoice3
+ Fun-CosyVoice3
Vanilla
-
1
-
Eagle3
-
1.96
@@ -732,7 +728,7 @@ Benchmark results for Fun-CosyVoice3 using Eagle3 speculative decoding across **
-> Adapted for Transformers backend inference, only displays accept length.
+> Adapted for Transformers backend inference, only displays accept length. vLLM speedup ~1.6×, estimated from baseline LLM speedup.
### 2. Quantization
diff --git a/docs/source/assets/speculative_decoding/eagle3_speedup_and_accepted_length.png b/docs/source/assets/speculative_decoding/eagle3_speedup_and_accepted_length.png
index 3a9d3780..ccaf1bf0 100644
Binary files a/docs/source/assets/speculative_decoding/eagle3_speedup_and_accepted_length.png and b/docs/source/assets/speculative_decoding/eagle3_speedup_and_accepted_length.png differ
Model
Method
- OCR-Bench-Internal
-
+ OCR-Bench-Internal
@@ -656,13 +656,12 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
accept length
-
Hunyuan-OCR
+ Hunyuan-OCR
Vanilla
71.21
1
-
Eagle3
120.75
2.2
@@ -690,13 +689,12 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
accept length
-
Qwen2-Audio-7B-Instruct
+ Qwen2_Audio
Vanilla
78.76
1
-
Eagle3
146.66
3.51
@@ -711,7 +709,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
Model
Method
- LibriTTS
+ LibriTTS
@@ -721,13 +719,12 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
accept length
-
Fun-CosyVoice3
+ Fun-CosyVoice3
Vanilla
-
1
-
Eagle3
-
1.96
@@ -735,7 +732,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta