Tencent
diff --git a/README.md b/README.md
Lines changed: 2 additions & 0 deletions
diff --git a/README_cn.md b/README_cn.md
Lines changed: 2 additions & 0 deletions
diff --git a/angelslim/compressor/speculative/train/data/data_utils.py b/angelslim/compressor/speculative/train/data/data_utils.py
Lines changed: 0 additions & 1 deletion
diff --git a/angelslim/compressor/speculative/train/models/draft/__init__.py b/angelslim/compressor/speculative/train/models/draft/__init__.py
Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,7 @@ A more accessible, comprehensive, and efficient toolkit for large model compress
 </p>
 
 ## 📣Latest News
+- [26/06/01] We have released **DFlare**, a block-diffusion speculative decoding framework with layer-wise fusion that achieves up to **5.52× end-to-end speedup**. [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/dflare.html)
 - [26/05/27] We have released **D-Cut**, an adaptive verification depth pruning technique for speculative decoding. [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/dcut.html)
 - [26/05/20] We support Distillation for full-precision HuggingFace models and **quantized QAT-style** models, as detailed in the [distillation documentation](https://angelslim.readthedocs.io/zh-cn/latest/features/distill/index.html). 
 - [26/05/08] We have released STQ1_0 kernel for 1.25-bit model and given a PR to llama.cpp [PR #22836](https://github.com/ggml-org/llama.cpp/pull/22836) ! If you have any questions or suggestions for STQ_0, welcome to comment under the PR !🔥🔥🔥
@@ -92,6 +93,7 @@ A more accessible, comprehensive, and efficient toolkit for large model compress
         <ul style="padding-left: 0; list-style-position: inside;">
           <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html">Eagle3</a></li>
           <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html">SpecExit</a></li>
+          <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/dflare.html">DFlare</a></li>
         </ul>
       </td>
       <td>
 
@@ -22,6 +22,7 @@
 </p>
 
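 ## 📣Latest News
+- [26/06/01] We have released **DFlare**, a block-diffusion speculative decoding framework based on layer-wise fusion, with an end-to-end speedup of up to **5.52×**. [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/dflare.html)
 - [26/05/27] We have released **D-Cut**, an adaptive verification depth pruning technique for speculative decoding. [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/dcut.html)
 - [26/05/20] We now support model distillation for full-precision HuggingFace models and **QAT-quantized** models; see the [documentation](https://angelslim.readthedocs.io/zh-cn/latest/features/distill/index.html) for detailed steps. 🔥🔥🔥
 - [26/05/08] We have released the STQ1_0 kernel for 1.25-bit models and submitted [PR #22836](https://github.com/ggml-org/llama.cpp/pull/22836) to llama.cpp! If you have any questions or suggestions about STQ_0, feel free to comment under the PR! 🔥🔥🔥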
@@ -93,6 +94,7 @@
         <ul style="padding-left: 0; list-style-position: inside;">
           <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html">Eagle3</a></li>
           <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html">SpecExit</a></li>
+          <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/dflare.html">DFlare</a></li>
         </ul>
       </td>
       <td>
 
@@ -91,7 +91,6 @@ def convert_ultrachat_data(row, dataset_column="messages"):
     return {"conversations": converted_messages, "id": row["prompt_id"]}
 
 
-# Copied from https://github.com/sgl-project/SpecForge/blob/main/specforge/data/preprocessing.py # noqa: E501
 def process_token_dict_to_mappings(
     token_dict,
     draft_vocab_size: int,
 
@@ -14,6 +14,7 @@
 
 from .draft_model_factory import DraftModelConfig, create_draft_model
 from .llama_eagle3 import CosyVoice3Eagle3LlamaForCausalLM, Eagle3LlamaForCausalLM
+from .qwen_dflare import QwenDFlareDraftModel
 from .qwen_dflash import QwenDFlashDraftModel
 
 __all__ = [
@@ -22,4 +23,5 @@
     "Eagle3LlamaForCausalLM",
     "CosyVoice3Eagle3LlamaForCausalLM",
     "QwenDFlashDraftModel",
+    "QwenDFlareDraftModel",
 ]
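
The `__init__.py` hunk above adds `QwenDFlareDraftModel` in two places: the import list and `__all__`. As a rough sketch of why both edits belong together, the snippet below builds a hypothetical in-memory module (`draft_stub`, filled with empty placeholder classes) and checks that every name in `__all__` is actually defined. Nothing here is AngelSlim code; `build_stub_package` and `missing_exports` are illustrative helpers assumed for this example only.

```python
import importlib
import sys
import types


def build_stub_package(name, defined, exported):
    """Register a hypothetical in-memory module with empty placeholder classes."""
    pkg = types.ModuleType(name)
    for cls_name in defined:
        # Placeholder class only; the real draft models are not reproduced here.
        setattr(pkg, cls_name, type(cls_name, (), {}))
    pkg.__all__ = list(exported)
    sys.modules[name] = pkg
    return pkg


def missing_exports(module_name):
    """Return names listed in __all__ that the module does not define."""
    module = importlib.import_module(module_name)
    return [n for n in module.__all__ if not hasattr(module, n)]


names = ["QwenDFlashDraftModel", "QwenDFlareDraftModel"]

# Import and __all__ updated together, as in the diff.
build_stub_package("draft_stub", defined=names, exported=names)
in_sync = missing_exports("draft_stub")

# __all__ updated but the import forgotten: the new name is advertised yet absent.
build_stub_package("draft_stub_broken", defined=names[:1], exported=names)
out_of_sync = missing_exports("draft_stub_broken")

print(in_sync, out_of_sync)
```

A check along these lines could run as a unit test against the real package, assuming its dependencies are installed in the test environment.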