diff --git a/README.md b/README.md index 5f583be2..faa6ea72 100644 --- a/README.md +++ b/README.md @@ -17,7 +17,7 @@ A more accessible, comprehensive, and efficient toolkit for large model compress

## 📣Latest News -- [26/01/13] We have released v0.3. We support the training and deployment of [Eagle3 for LLM/VLM/Audio](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html) Multimodal models. And We released **Sherry**, the hardware-efficient 1.25 bit quantization algorithm [Paper Comming soon] | [[Code]](https://github.com/Tencent/AngelSlim/tree/sherry/Sherry)🔥🔥🔥 +- [26/01/13] We have released v0.3. We support the training and deployment of Eagle3 for all-scale LLMs/VLMs/Audio models, as detailed in the [guidance documentation](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html). And We released **Sherry**, the hardware-efficient 1.25 bit quantization algorithm [Paper Comming soon] | [[Code]](https://github.com/Tencent/AngelSlim/tree/sherry/Sherry)🔥🔥🔥 - [25/11/05] We have released v0.2. Quantization support for new models, such as `GLM-4.6`, `Qwen3-VL` and `Qwen3-Omni`, open-sources the Eagle3 speculative decoding training framework, and updates the Diffusion model quantization tools. - [25/09/30] We have released **SpecExit**, the reasoning early-exit algorithm: [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html) | [[vLLM Code]](https://github.com/vllm-project/vllm/pull/27192) - [25/09/26] We have released **TEQUILA**, the ternary quantization algorithm [[Paper]](https://arxiv.org/abs/2509.23809) | [[Code]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant) @@ -232,8 +232,6 @@ bash scripts/speculative/generate_data_for_target_model.sh bash scripts/speculative/train_eagle3_online.sh ``` -For detailed training configurations and vLLM performance benchmarks of Eagle3, please refer to the [Quick Start Guide for Speculative Sampling](https://angelslim.readthedocs.io/zh-cn/latest/getting_started/quickstrat.html#id5). - Training and Deployment Guide for Multimodal Model Eagle3—Supporting LLM, VLM, and Audio (ASR & TTS) Models: [LLM](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/eagle.html) | [VLM](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/vlm_eagle.html) | [Audio(ASR)](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/audio_asr_eagle.html) | [Audio(TTS)](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/audio_tts_eagle.html). #### 2.2 LLM/VLM Model Quantization @@ -392,7 +390,7 @@ Benchmark results for Qwen3 series models using Eagle3 speculative decoding on v 381.051 - Eagle3 + Eagle3 616.92.13 653.292.19 680.12.2 @@ -410,7 +408,7 @@ Benchmark results for Qwen3 series models using Eagle3 speculative decoding on v 233.261 - Eagle3 + Eagle3 389.352.07 395.972.1 377.842.08 @@ -428,7 +426,7 @@ Benchmark results for Qwen3 series models using Eagle3 speculative decoding on v 151.811 - Eagle3 + Eagle3 257.322 266.692.02 244.891.97 @@ -446,7 +444,7 @@ Benchmark results for Qwen3 series models using Eagle3 speculative decoding on v 93.261 - Eagle3 + Eagle3 153.721.87 140.461.78 144.681.76 @@ -464,7 +462,7 @@ Benchmark results for Qwen3 series models using Eagle3 speculative decoding on v 43.321 - Eagle3 + Eagle3 80.432.01 72.491.9 71.571.86 @@ -482,7 +480,7 @@ Benchmark results for Qwen3 series models using Eagle3 speculative decoding on v 320.871 - Eagle3 + Eagle3 453.972.1 432.452.04 428.812.02 @@ -554,7 +552,7 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o 1 - Eagle3 + Eagle3 511.52 2.11 560.55 @@ -593,7 +591,7 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o 1 - Eagle3 + Eagle3 415.29 2.57 372.89 @@ -632,7 +630,7 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o 1 - Eagle3 + Eagle3 281.93 2.82 241.42 @@ -676,7 +674,7 @@ Benchmark results for HunyuanOCR using Eagle3 speculative decoding on vLLM (v0.1 1 - Eagle3 + Eagle3 108.1 2.08 @@ -709,7 +707,7 @@ Benchmark results for Qwen2-Audio using Eagle3 speculative decoding on vLLM (v0. 1 - Eagle3 + Eagle3 146.66 3.51 @@ -740,7 +738,7 @@ Benchmark results for Fun-CosyVoice3 using Eagle3 speculative decoding across ** 1 - Eagle3 + Eagle3 - 1.96 diff --git a/README_cn.md b/README_cn.md index cddc920e..e4febbfd 100644 --- a/README_cn.md +++ b/README_cn.md @@ -17,10 +17,10 @@

## 📣最新进展 -- [26/01/13]我们发布V0.2版本, 支持了全模态场景的投机采样训练及部署,文档:[Eagle3 for LLM/VLM/Audio](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html)。并且我们发布了 **Sherry** 新的硬件高效的1.25bit三值量化算法 [论文即将发布] | [[代码]](https://github.com/Tencent/AngelSlim/tree/sherry/Sherry)🔥🔥🔥 +- [26/01/13]我们发布V0.3版本, 支持了全模态场景的投机采样训练及部署,文档:[Eagle3 for LLM/VLM/Audio](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html)。并且我们发布了 **Sherry** 新的硬件高效的1.25bit三值量化算法 [论文即将发布] | [[代码]](https://github.com/Tencent/AngelSlim/tree/sherry/Sherry)🔥🔥🔥 - [25/11/05] 我们发布V0.2版本,支持了包括GLM-4.6/Qwen3-VL/Qwen3-Omni等更多模型的量化,开源投机采样Eagle3训练框架,更新Diffusion模型量化工具。 -- [25/09/30] 我们开源了思考早退新算法 **SpecExit** [[论文]](http://arxiv.org/abs/2509.24248) | [[文档]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html) | [[vLLM代码]](https://github.com/vllm-project/vllm/pull/27192)🔥🔥🔥 -- [25/09/30] 我们发布了三值量化新算法 **Tequila** [[论文]](https://arxiv.org/abs/2509.23809) | [[代码]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant)。🔥🔥🔥 +- [25/09/30] 我们开源了思考早退新算法 **SpecExit** [[论文]](http://arxiv.org/abs/2509.24248) | [[文档]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html) | [[vLLM代码]](https://github.com/vllm-project/vllm/pull/27192) +- [25/09/30] 我们发布了三值量化新算法 **Tequila** [[论文]](https://arxiv.org/abs/2509.23809) | [[代码]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant) - [25/09/24] 我们支持了Qwen3系列模型的NVFP4的PTQ量化,我们还开源了[Qwen3-32B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-32B_nvfp4)、[Qwen3-235B-A22B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-235B-A22B_nvfp4)权重。
@@ -233,8 +233,6 @@ bash scripts/speculative/generate_data_for_target_model.sh bash scripts/speculative/train_eagle3_online.sh ``` -详细训练配置,以及`Eagle3`的vLLM性能测试,详情请参考投机采样[快速开始文档](https://angelslim.readthedocs.io/zh-cn/latest/getting_started/quickstrat.html#id5)。 - 多模态模型 Eagle3 训练与部署指南,支持LLM / VLM / Audio (ASR & TTS) 模型:[LLM](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/eagle.html) | [VLM](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/vlm_eagle.html) | [Audio(ASR)](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/audio_asr_eagle.html) | [Audio(TTS)](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/audio_tts_eagle.html). #### 2.2 LLM/VLM模型量化 完成安装`AngelSlim`后,您可以通过以下脚本快速开始,完成`Qwen3-1.7B`模型的静态`FP8`量化: @@ -395,7 +393,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta 381.051 - Eagle3 + Eagle3 616.92.13 653.292.19 680.12.2 @@ -413,7 +411,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta 233.261 - Eagle3 + Eagle3 389.352.07 395.972.1 377.842.08 @@ -431,7 +429,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta 151.811 - Eagle3 + Eagle3 257.322 266.692.02 244.891.97 @@ -449,7 +447,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta 93.261 - Eagle3 + Eagle3 153.721.87 140.461.78 144.681.76 @@ -467,7 +465,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta 43.321 - Eagle3 + Eagle3 80.432.01 72.491.9 71.571.86 @@ -485,7 +483,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta 320.871 - Eagle3 + Eagle3 453.972.1 432.452.04 428.812.02 @@ -557,7 +555,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta 1 - Eagle3 + Eagle3 511.52 2.11 560.55 @@ -596,7 +594,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta 1 - Eagle3 + Eagle3 415.29 2.57 372.89 @@ -635,7 +633,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta 1 - Eagle3 + Eagle3 281.93 2.82 241.42 @@ -679,7 +677,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta 1 - Eagle3 + Eagle3 108.1 2.08 @@ -712,7 +710,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta 1 - Eagle3 + Eagle3 146.66 3.51 @@ -742,7 +740,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta 1 - Eagle3 + Eagle3 - 1.96