
Commit fc40208

add SpecExit vLLM PR Link (#91)
1 parent 1a5efa0 commit fc40208

3 files changed

Lines changed: 10 additions & 10 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@
 - [Technical Discussion](#技术交流)

 ## 📣Latest Updates
-- 🌟[25/09/30] We open-sourced the implementation of SpecExit, a reasoning early-exit algorithm: *SpecExit: Accelerating Large Reasoning Model via Speculative Exit* | [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html)
+- 🌟[25/09/30] We open-sourced the implementation of SpecExit, a reasoning early-exit algorithm: *SpecExit: Accelerating Large Reasoning Model via Speculative Exit* | [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html) | [[vLLM Code]](https://github.com/vllm-project/vllm/pull/27192)
 - 🌟[25/09/30] We released the implementation of the ternary quantization method Tequila: *TEQUILA: TRAPPING-FREE TERNARY QUANTIZATION FOR LARGE LANGUAGE MODELS* | [[Paper]](https://arxiv.org/abs/2509.23809) | [[Code]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant)
 - [25/09/24] We added NVFP4 PTQ quantization support for the Qwen3 series models and open-sourced the [Qwen3-32B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-32B_nvfp4) and [Qwen3-235B-A22B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-235B-A22B_nvfp4) weights.
 - [25/09/01] We added FP8 quantization for the open-source translation model [Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B-fp8); Torch inference and a benchmark evaluation pipeline for Eagle3; quantization and Cache support for [FLUX](https://github.com/Tencent/AngelSlim/tree/main/configs/flux); and quantization and compression for the [Seed-OSS](https://github.com/Tencent/AngelSlim/tree/main/configs/seed_oss) model.

README_en.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 - [Technical Discussion](#technical-discussion)

 ## 📣Latest Updates
-- [25/09/30] We now open-source the implementation of the reasoning early-exit algorithm: *SpecExit: Accelerating Large Reasoning Model via Speculative Exit* | [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html)
+- [25/09/30] We now open-source the implementation of the reasoning early-exit algorithm: *SpecExit: Accelerating Large Reasoning Model via Speculative Exit* | [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html) | [[vLLM Code]](https://github.com/vllm-project/vllm/pull/27192)
 - [25/09/26] We now release the ternary quantization method *TEQUILA: TRAPPING-FREE TERNARY QUANTIZATION FOR LARGE LANGUAGE MODELS* | [[Paper]](https://arxiv.org/abs/2509.23809) | [[Code]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant)
 - [25/09/24] We now support NVFP4 PTQ quantization for the Qwen3 series models. We also open-source the [Qwen3-32B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-32B_nvfp4) and [Qwen3-235B-A22B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-235B-A22B_nvfp4) weights.
 - [25/09/01] We now support FP8 quantization of the [Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B-fp8) translation model, Torch inference and benchmark evaluation for Eagle3, quantization and Cache for [FLUX](https://github.com/Tencent/AngelSlim/tree/main/configs/flux), and quantization for the [Seed-OSS](https://github.com/Tencent/AngelSlim/tree/main/configs/seed_oss) model.

docs/source/features/speculative_decoding/spec_exit.md

Lines changed: 8 additions & 8 deletions
@@ -30,14 +30,14 @@ Despite their strong performance on reasoning tasks, large reasoning models (LRM

 To run inference with the trained `SpecExit` model and evaluate its performance on benchmarks like GSM8K, GPQA, etc., use the `spec_benchmark.py` script.

-```shell
-python3 tools/spec_benchmark.py \
---base-model-path ${BASE_MODEL} \
---eagle-model-path ${SpecExit_Model} \
---mode eagle \
---num-gpus-total 8 \
---early-stop-method confidence_progress_remain
-```
+```shell
+python3 tools/spec_benchmark.py \
+--base-model-path ${BASE_MODEL} \
+--eagle-model-path ${SpecExit_Model} \
+--mode eagle \
+--num-gpus-total 8 \
+--early-stop-method confidence_progress_remain
+```

 ## 📈 Results

