add SpecExit vLLM PR Link (#91)

RuBing-Yang · web-flow · commit fc4020818d35 · 2025-10-20T19:44:59.000+08:00
diff --git a/README.md b/README.md
@@ -31,7 +31,7 @@
 - [Technical Discussion](#技术交流)
 
 ## 📣Latest Updates
-- 🌟[25/09/30] We open-sourced an implementation of the reasoning early-exit algorithm SpecExit: *SpecExit: Accelerating Large Reasoning Model via Speculative Exit* | [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html)
+- 🌟[25/09/30] We open-sourced an implementation of the reasoning early-exit algorithm SpecExit: *SpecExit: Accelerating Large Reasoning Model via Speculative Exit* | [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html) | [[vLLM Code]](https://github.com/vllm-project/vllm/pull/27192)
 - 🌟[25/09/30] We released an implementation of the ternary quantization method Tequila: *TEQUILA: TRAPPING-FREE TERNARY QUANTIZATION FOR LARGE LANGUAGE MODELS* | [[Paper]](https://arxiv.org/abs/2509.23809) | [[Code]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant).
 - [25/09/24] We added NVFP4 PTQ quantization support for the Qwen3 series models, and open-sourced the [Qwen3-32B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-32B_nvfp4) and [Qwen3-235B-A22B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-235B-A22B_nvfp4) weights.
 - [25/09/01] We added FP8 quantization support for the open-source translation model [Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B-fp8); Torch inference and benchmark evaluation for Eagle3; quantization and Cache support for [FLUX](https://github.com/Tencent/AngelSlim/tree/main/configs/flux); and quantization and compression support for the [Seed-OSS](https://github.com/Tencent/AngelSlim/tree/main/configs/seed_oss) model.
diff --git a/README_en.md b/README_en.md
@@ -31,7 +31,7 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 - [Technical Discussion](#technical-discussion)
 
 ## 📣Latest Updates
-- [25/09/30] We now open-source the implementation of the reasoning early-exit algorithm: *SpecExit: Accelerating Large Reasoning Model via Speculative Exit* | [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html)
+- [25/09/30] We now open-source the implementation of the reasoning early-exit algorithm: *SpecExit: Accelerating Large Reasoning Model via Speculative Exit* | [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html) | [[vLLM Code]](https://github.com/vllm-project/vllm/pull/27192)
 - [25/09/26] We now release the implementation of the ternary quantization method Tequila: *TEQUILA: TRAPPING-FREE TERNARY QUANTIZATION FOR LARGE LANGUAGE MODELS* | [[Paper]](https://arxiv.org/abs/2509.23809) | [[Code]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant)
 - [25/09/24] We now support NVFP4 PTQ quantization for the Qwen3 series models. We also open-source the [Qwen3-32B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-32B_nvfp4) and [Qwen3-235B-A22B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-235B-A22B_nvfp4) weights.
 - [25/09/01] We now support FP8 quantization of the [Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B-fp8) translation model, Torch inference and benchmark evaluation for Eagle3, quantization and Cache for [FLUX](https://github.com/Tencent/AngelSlim/tree/main/configs/flux), and quantization of [Seed-OSS](https://github.com/Tencent/AngelSlim/tree/main/configs/seed_oss).
diff --git a/docs/source/features/speculative_decoding/spec_exit.md b/docs/source/features/speculative_decoding/spec_exit.md
@@ -30,14 +30,14 @@ Despite their strong performance on reasoning tasks, large reasoning models (LRM
 
 To run inference with the trained `SpecExit` model and evaluate its performance on benchmarks such as GSM8K and GPQA, use the `spec_benchmark.py` script.
 
-    ```shell
-    python3 tools/spec_benchmark.py \
-        --base-model-path ${BASE_MODEL} \
-        --eagle-model-path ${SpecExit_Model} \
-        --mode eagle \
-        --num-gpus-total 8 \
-        --early-stop-method confidence_progress_remain
-    ```
+```shell
+python3 tools/spec_benchmark.py \
+    --base-model-path ${BASE_MODEL} \
+    --eagle-model-path ${SpecExit_Model} \
+    --mode eagle \
+    --num-gpus-total 8 \
+    --early-stop-method confidence_progress_remain
+```
 
 ## 📈 Results