init commit to refactor dit quant by GGgary666 · Pull Request #90 · Tencent/AngelSlim

GGgary666 · 2025-10-14T12:20:24Z

This PR aims to refactor the compressor for DiT models.

The initial commit delivers a streamlined, plug-and-play implementation of W8A8 dynamic FP8 quantization, along with a straightforward example.

Reference:
https://github.com/deepseek-ai/DeepSeek-V3
https://github.com/neuralmagic/AutoFP8
https://github.com/sgl-project/sglang

yghstill · 2025-10-14T14:29:10Z

+    print("scale1 shape: ", scale1.shape)
+
+    x = torch.randn(1024, 1024).cpu()
+    import pdb


Please clean up the debugging code.

yghstill · 2025-10-14T14:30:09Z

+    """
+    if x.numel() == 0:
+        min_val, max_val = (
+            torch.tensor(-16.0, dtype=x.dtype),


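Why is max_val set to 16?

This handles the case where the tensor has zero elements, so that empty MoE experts are supported. 16 is just a conventional value that FP8 can represent easily.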

yghstill · 2025-10-14T14:37:53Z

+__all__ = ["quantize_model_to_fp8"]
+
+
+def quantize_model_to_fp8(


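Could this be organized as a PTQ class, with the initialization hyperparameters stored as class member variables and a self.quantize function that runs the actual quantization logic and can dispatch to the fp8 branch?

OK, this will be reflected in a new commit.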

yghstill · 2025-10-14T14:39:10Z

+
+
+# modified from https://github.com/neuralmagic/AutoFP8/blob/main/auto_fp8/quantize.py
+class FP8DynamicLinear(torch.nn.Module):


Move the quantization functions above FP8DynamicLinear into quant_func.py; linear.py should contain only the quantized linear classes.

…and wrap quantization function interfaces with classes.

yghstill · 2025-10-15T13:58:57Z

+
+    def _set_quantize_linear_module(self) -> torch.nn.Module:
+        """
+        返回用于替换nn.Linear的量化模块类型


Please change this to an English comment.
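
The docstring quoted above says the method returns the quantized module type used to replace nn.Linear. A generic way to perform such a replacement is sketched below, under stated assumptions: `PassthroughQuantLinear` is a hypothetical stand-in that does no quantization at all, and `replace_linear` is an illustrative helper, not a function from this PR.

```python
import torch
from torch import nn


class PassthroughQuantLinear(nn.Module):
    """Hypothetical stand-in for a quantized linear class."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.weight = linear.weight  # reuse the original parameters
        self.bias = linear.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # A real implementation would quantize x and the weight here.
        return nn.functional.linear(x, self.weight, self.bias)


def replace_linear(model: nn.Module, quant_cls) -> int:
    """Recursively swap every nn.Linear for quant_cls; return the count."""
    replaced = 0
    # Copy to a list because children are reassigned while iterating.
    for name, child in list(model.named_children()):
        if isinstance(child, nn.Linear):
            setattr(model, name, quant_cls(child))
            replaced += 1
        else:
            replaced += replace_linear(child, quant_cls)
    return replaced


net = nn.Sequential(nn.Linear(4, 4), nn.Sequential(nn.ReLU(), nn.Linear(4, 2)))
inputs = torch.randn(3, 4)
before = net(inputs)
num_replaced = replace_linear(net, PassthroughQuantLinear)
after = net(inputs)
```

Passing the class in as an argument mirrors the idea of a method that returns the replacement type: the traversal stays the same whichever quantized layer is chosen.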

Co-authored-by: garygugong <garygugong@tencent.com>

garygugong added 3 commits October 14, 2025 20:13

init commit to refactor dit quant

53ab095

chore: run pre-commit

302d836

fix some linter errors

fb23730

yghstill reviewed Oct 14, 2025

More refactoring for DiT quant: migrate kernels to a separate folder …

0b9786f

…and wrap quantization function interfaces with classes.

yghstill reviewed Oct 15, 2025

Ensure English comments are used

66ebc47

yghstill approved these changes Oct 15, 2025

yghstill merged commit 1a5efa0 into Tencent:main Oct 15, 2025
5 checks passed

dawnranger pushed a commit to dawnranger/AngelSlim that referenced this pull request Mar 11, 2026

init commit to refactor dit quant (Tencent#90)

fe27f3b

Co-authored-by: garygugong <garygugong@tencent.com>

yghstill merged 5 commits into Tencent:main from GGgary666:diffusion_refactor_gg_1013
