diff --git a/README.md b/README.md
index 5f583be2..faa6ea72 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@ A more accessible, comprehensive, and efficient toolkit for large model compress
## 📣Latest News
-- [26/01/13] We have released v0.3. We support the training and deployment of [Eagle3 for LLM/VLM/Audio](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html) Multimodal models. And We released **Sherry**, the hardware-efficient 1.25 bit quantization algorithm [Paper Comming soon] | [[Code]](https://github.com/Tencent/AngelSlim/tree/sherry/Sherry)🔥🔥🔥
+- [26/01/13] We have released v0.3. We support the training and deployment of Eagle3 for all-scale LLMs/VLMs/Audio models, as detailed in the [guidance documentation](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html). We also released **Sherry**, the hardware-efficient 1.25-bit quantization algorithm [Paper coming soon] | [[Code]](https://github.com/Tencent/AngelSlim/tree/sherry/Sherry)🔥🔥🔥
- [25/11/05] We have released v0.2. Quantization support for new models, such as `GLM-4.6`, `Qwen3-VL` and `Qwen3-Omni`, open-sources the Eagle3 speculative decoding training framework, and updates the Diffusion model quantization tools.
- [25/09/30] We have released **SpecExit**, the reasoning early-exit algorithm: [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html) | [[vLLM Code]](https://github.com/vllm-project/vllm/pull/27192)
- [25/09/26] We have released **TEQUILA**, the ternary quantization algorithm [[Paper]](https://arxiv.org/abs/2509.23809) | [[Code]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant)
@@ -232,8 +232,6 @@ bash scripts/speculative/generate_data_for_target_model.sh
bash scripts/speculative/train_eagle3_online.sh
```
-For detailed training configurations and vLLM performance benchmarks of Eagle3, please refer to the [Quick Start Guide for Speculative Sampling](https://angelslim.readthedocs.io/zh-cn/latest/getting_started/quickstrat.html#id5).
-
Training and Deployment Guide for Multimodal Model Eagle3—Supporting LLM, VLM, and Audio (ASR & TTS) Models: [LLM](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/eagle.html) | [VLM](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/vlm_eagle.html) | [Audio(ASR)](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/audio_asr_eagle.html) | [Audio(TTS)](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/audio_tts_eagle.html).
#### 2.2 LLM/VLM Model Quantization
@@ -392,7 +390,7 @@ Benchmark results for Qwen3 series models using Eagle3 speculative decoding on v
381.05 | 1 |
- | Eagle3 |
+ Eagle3 |
616.9 | 2.13 |
653.29 | 2.19 |
680.1 | 2.2 |
@@ -410,7 +408,7 @@ Benchmark results for Qwen3 series models using Eagle3 speculative decoding on v
233.26 | 1 |
- | Eagle3 |
+ Eagle3 |
389.35 | 2.07 |
395.97 | 2.1 |
377.84 | 2.08 |
@@ -428,7 +426,7 @@ Benchmark results for Qwen3 series models using Eagle3 speculative decoding on v
151.81 | 1 |
- | Eagle3 |
+ Eagle3 |
257.32 | 2 |
266.69 | 2.02 |
244.89 | 1.97 |
@@ -446,7 +444,7 @@ Benchmark results for Qwen3 series models using Eagle3 speculative decoding on v
93.26 | 1 |
- | Eagle3 |
+ Eagle3 |
153.72 | 1.87 |
140.46 | 1.78 |
144.68 | 1.76 |
@@ -464,7 +462,7 @@ Benchmark results for Qwen3 series models using Eagle3 speculative decoding on v
43.32 | 1 |
- | Eagle3 |
+ Eagle3 |
80.43 | 2.01 |
72.49 | 1.9 |
71.57 | 1.86 |
@@ -482,7 +480,7 @@ Benchmark results for Qwen3 series models using Eagle3 speculative decoding on v
320.87 | 1 |
- | Eagle3 |
+ Eagle3 |
453.97 | 2.1 |
432.45 | 2.04 |
428.81 | 2.02 |
@@ -554,7 +552,7 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o
1 |
- | Eagle3 |
+ Eagle3 |
511.52 |
2.11 |
560.55 |
@@ -593,7 +591,7 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o
1 |
- | Eagle3 |
+ Eagle3 |
415.29 |
2.57 |
372.89 |
@@ -632,7 +630,7 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o
1 |
- | Eagle3 |
+ Eagle3 |
281.93 |
2.82 |
241.42 |
@@ -676,7 +674,7 @@ Benchmark results for HunyuanOCR using Eagle3 speculative decoding on vLLM (v0.1
1 |
- | Eagle3 |
+ Eagle3 |
108.1 |
2.08 |
@@ -709,7 +707,7 @@ Benchmark results for Qwen2-Audio using Eagle3 speculative decoding on vLLM (v0.
1 |
- | Eagle3 |
+ Eagle3 |
146.66 |
3.51 |
@@ -740,7 +738,7 @@ Benchmark results for Fun-CosyVoice3 using Eagle3 speculative decoding across **
1 |
- | Eagle3 |
+ Eagle3 |
- |
1.96 |
diff --git a/README_cn.md b/README_cn.md
index cddc920e..e4febbfd 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -17,10 +17,10 @@
## 📣最新进展
-- [26/01/13]我们发布V0.2版本, 支持了全模态场景的投机采样训练及部署,文档:[Eagle3 for LLM/VLM/Audio](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html)。并且我们发布了 **Sherry** 新的硬件高效的1.25bit三值量化算法 [论文即将发布] | [[代码]](https://github.com/Tencent/AngelSlim/tree/sherry/Sherry)🔥🔥🔥
+- [26/01/13]我们发布V0.3版本, 支持了全模态场景的投机采样训练及部署,文档:[Eagle3 for LLM/VLM/Audio](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html)。并且我们发布了 **Sherry** 新的硬件高效的1.25bit三值量化算法 [论文即将发布] | [[代码]](https://github.com/Tencent/AngelSlim/tree/sherry/Sherry)🔥🔥🔥
- [25/11/05] 我们发布V0.2版本,支持了包括GLM-4.6/Qwen3-VL/Qwen3-Omni等更多模型的量化,开源投机采样Eagle3训练框架,更新Diffusion模型量化工具。
-- [25/09/30] 我们开源了思考早退新算法 **SpecExit** [[论文]](http://arxiv.org/abs/2509.24248) | [[文档]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html) | [[vLLM代码]](https://github.com/vllm-project/vllm/pull/27192)🔥🔥🔥
-- [25/09/30] 我们发布了三值量化新算法 **Tequila** [[论文]](https://arxiv.org/abs/2509.23809) | [[代码]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant)。🔥🔥🔥
+- [25/09/30] 我们开源了思考早退新算法 **SpecExit** [[论文]](http://arxiv.org/abs/2509.24248) | [[文档]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html) | [[vLLM代码]](https://github.com/vllm-project/vllm/pull/27192)
+- [25/09/30] 我们发布了三值量化新算法 **Tequila** [[论文]](https://arxiv.org/abs/2509.23809) | [[代码]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant)
- [25/09/24] 我们支持了Qwen3系列模型的NVFP4的PTQ量化,我们还开源了[Qwen3-32B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-32B_nvfp4)、[Qwen3-235B-A22B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-235B-A22B_nvfp4)权重。
@@ -233,8 +233,6 @@ bash scripts/speculative/generate_data_for_target_model.sh
bash scripts/speculative/train_eagle3_online.sh
```
-详细训练配置,以及`Eagle3`的vLLM性能测试,详情请参考投机采样[快速开始文档](https://angelslim.readthedocs.io/zh-cn/latest/getting_started/quickstrat.html#id5)。
-
多模态模型 Eagle3 训练与部署指南,支持LLM / VLM / Audio (ASR & TTS) 模型:[LLM](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/eagle.html) | [VLM](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/vlm_eagle.html) | [Audio(ASR)](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/audio_asr_eagle.html) | [Audio(TTS)](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/audio_tts_eagle.html).
#### 2.2 LLM/VLM模型量化
完成安装`AngelSlim`后,您可以通过以下脚本快速开始,完成`Qwen3-1.7B`模型的静态`FP8`量化:
@@ -395,7 +393,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
| 381.05 | 1 |
- | Eagle3 |
+ Eagle3 |
616.9 | 2.13 |
653.29 | 2.19 |
680.1 | 2.2 |
@@ -413,7 +411,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
233.26 | 1 |
- | Eagle3 |
+ Eagle3 |
389.35 | 2.07 |
395.97 | 2.1 |
377.84 | 2.08 |
@@ -431,7 +429,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
151.81 | 1 |
- | Eagle3 |
+ Eagle3 |
257.32 | 2 |
266.69 | 2.02 |
244.89 | 1.97 |
@@ -449,7 +447,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
93.26 | 1 |
- | Eagle3 |
+ Eagle3 |
153.72 | 1.87 |
140.46 | 1.78 |
144.68 | 1.76 |
@@ -467,7 +465,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
43.32 | 1 |
- | Eagle3 |
+ Eagle3 |
80.43 | 2.01 |
72.49 | 1.9 |
71.57 | 1.86 |
@@ -485,7 +483,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
320.87 | 1 |
- | Eagle3 |
+ Eagle3 |
453.97 | 2.1 |
432.45 | 2.04 |
428.81 | 2.02 |
@@ -557,7 +555,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
1 |
- | Eagle3 |
+ Eagle3 |
511.52 |
2.11 |
560.55 |
@@ -596,7 +594,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
1 |
- | Eagle3 |
+ Eagle3 |
415.29 |
2.57 |
372.89 |
@@ -635,7 +633,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
1 |
- | Eagle3 |
+ Eagle3 |
281.93 |
2.82 |
241.42 |
@@ -679,7 +677,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
1 |
- | Eagle3 |
+ Eagle3 |
108.1 |
2.08 |
@@ -712,7 +710,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
1 |
- | Eagle3 |
+ Eagle3 |
146.66 |
3.51 |
@@ -742,7 +740,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
1 |
- | Eagle3 |
+ Eagle3 |
- |
1.96 |