Commit 0fbaed7

Dev dflash and dflare (#325)
1 parent 8e3a6b6 commit 0fbaed7

21 files changed

Lines changed: 2369 additions & 62 deletions

README.md

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,7 @@ A more accessible, comprehensive, and efficient toolkit for large model compress
 </p>
 
 ## 📣Latest News
+- [26/06/01] We have released **DFlare**, a block-diffusion speculative decoding framework with layer-wise fusion that achieves up to **5.52× end-to-end speedup**. [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/dflare.html)
 - [26/05/27] We have released **D-Cut**, an adaptive verification depth pruning technique for speculative decoding. [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/dcut.html)
 - [26/05/20] We support Distillation for full-precision HuggingFace models and **quantized QAT-style** models, as detailed in the [distillation documentation](https://angelslim.readthedocs.io/zh-cn/latest/features/distill/index.html).
 - [26/05/08] We have released STQ1_0 kernel for 1.25-bit model and given a PR to llama.cpp [PR #22836](https://github.com/ggml-org/llama.cpp/pull/22836) ! If you have any questions or suggestions for STQ_0, welcome to comment under the PR !🔥🔥🔥
@@ -92,6 +93,7 @@ A more accessible, comprehensive, and efficient toolkit for large model compress
 <ul style="padding-left: 0; list-style-position: inside;">
 <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html">Eagle3</a></li>
 <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html">SpecExit</a></li>
+<li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/dflare.html">DFlare</a></li>
 </ul>
 </td>
 <td>

README_cn.md

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,7 @@
 </p>
 
 ## 📣Latest Updates
+- [26/06/01] We have released **DFlare**, a block-diffusion speculative decoding framework based on layer-wise fusion, with an end-to-end speedup of up to **5.52×**. [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/dflare.html)
 - [26/05/27] We have released **D-Cut**, an adaptive verification-depth pruning technique for speculative decoding. [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/dcut.html)
 - [26/05/20] We now support model distillation for full-precision HuggingFace models and **QAT-quantized** models; see the [documentation](https://angelslim.readthedocs.io/zh-cn/latest/features/distill/index.html) for detailed steps.🔥🔥🔥
 - [26/05/08] We have released the STQ1_0 kernel for 1.25-bit models and submitted [PR #22836](https://github.com/ggml-org/llama.cpp/pull/22836) to llama.cpp! If you have any questions or suggestions about STQ_0, feel free to comment under the PR!🔥🔥🔥
@@ -93,6 +94,7 @@
 <ul style="padding-left: 0; list-style-position: inside;">
 <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html">Eagle3</a></li>
 <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html">SpecExit</a></li>
+<li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/dflare.html">DFlare</a></li>
 </ul>
 </td>
 <td>

angelslim/compressor/speculative/train/data/data_utils.py

Lines changed: 0 additions & 1 deletion
@@ -91,7 +91,6 @@ def convert_ultrachat_data(row, dataset_column="messages"):
     return {"conversations": converted_messages, "id": row["prompt_id"]}
 
 
-# Copied from https://github.com/sgl-project/SpecForge/blob/main/specforge/data/preprocessing.py # noqa: E501
 def process_token_dict_to_mappings(
     token_dict,
     draft_vocab_size: int,

angelslim/compressor/speculative/train/models/draft/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,7 @@
 
 from .draft_model_factory import DraftModelConfig, create_draft_model
 from .llama_eagle3 import CosyVoice3Eagle3LlamaForCausalLM, Eagle3LlamaForCausalLM
+from .qwen_dflare import QwenDFlareDraftModel
 from .qwen_dflash import QwenDFlashDraftModel
 
 __all__ = [
@@ -22,4 +23,5 @@
     "Eagle3LlamaForCausalLM",
     "CosyVoice3Eagle3LlamaForCausalLM",
     "QwenDFlashDraftModel",
+    "QwenDFlareDraftModel",
 ]
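
The `__init__.py` change adds both an import and a matching `__all__` entry; if only one of the two lands, `from ... import *` fails or the name stays unexported. A minimal sketch of a consistency check follows. The helper `missing_exports` and the stand-in module `fake_draft` are hypothetical names used only for illustration and are not part of AngelSlim.

```python
import types


def missing_exports(module):
    """Return names listed in __all__ that the module does not actually define."""
    return [
        name
        for name in getattr(module, "__all__", [])
        if not hasattr(module, name)
    ]


# Hypothetical stand-in for the draft package, so the sketch runs without
# AngelSlim installed. It mimics an __init__.py where the __all__ entry was
# added but the corresponding import was forgotten.
fake_pkg = types.ModuleType("fake_draft")
fake_pkg.QwenDFlashDraftModel = type("QwenDFlashDraftModel", (), {})
fake_pkg.__all__ = ["QwenDFlashDraftModel", "QwenDFlareDraftModel"]

print(missing_exports(fake_pkg))  # → ['QwenDFlareDraftModel']
```

Assuming AngelSlim is installed and importable, the same helper could be pointed at the real package, `angelslim.compressor.speculative.train.models.draft`, where an empty list would indicate that every exported name resolves.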
