init commit to refactor dit quant #90

Merged
yghstill merged 5 commits into Tencent:main from GGgary666:diffusion_refactor_gg_1013 on Oct 15, 2025

Conversation

@GGgary666
Contributor

This PR aims to refactor the compressor for DiT models.

The initial commit delivers a streamlined, plug-and-play implementation of w8a8 dynamic fp8 quantization along with a straightforward example.
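
For readers unfamiliar with the term, the idea behind w8a8 dynamic fp8 quantization can be sketched in plain Python. This is a minimal illustration and not the PR's code: it assumes per-tensor scaling and the E4M3 format (maximum finite value 448), emulates the FP8 rounding on ordinary floats instead of using torch FP8 dtypes, and the names `round_to_e4m3`, `quantize_dynamic`, and `w8a8_linear` are hypothetical.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value of the float8 E4M3 format


def round_to_e4m3(v: float) -> float:
    """Round a float to the nearest E4M3 value (3 mantissa bits, saturating)."""
    if v == 0.0:
        return 0.0
    mag = min(abs(v), FP8_E4M3_MAX)  # saturate instead of overflowing
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) = e - 1.
    exponent = max(math.frexp(mag)[1] - 1, -6)  # -6 is the smallest normal exponent
    step = 2.0 ** (exponent - 3)  # spacing between neighbouring E4M3 values
    return math.copysign(round(mag / step) * step, v)


def quantize_dynamic(values: list[float]) -> tuple[list[float], float]:
    """Per-tensor dynamic quantization: the scale is derived from the data itself."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = max(amax, 1e-12) / FP8_E4M3_MAX  # guard against an all-zero tensor
    return [round_to_e4m3(v / scale) for v in values], scale


def w8a8_linear(x: list[float], w: list[float]) -> float:
    """Dot product with both activations and weights quantized to emulated FP8."""
    qx, sx = quantize_dynamic(x)  # activation scale, computed at run time
    qw, sw = quantize_dynamic(w)  # weight scale, could be computed once offline
    # Multiply in the quantized domain, then undo both scales.
    return sum(a * b for a, b in zip(qx, qw)) * sx * sw
```

"Dynamic" here means the activation scale is recomputed from each input rather than calibrated ahead of time, which is why no calibration dataset appears in the sketch.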

References:
https://github.com/deepseek-ai/DeepSeek-V3
https://github.com/neuralmagic/AutoFP8
https://github.com/sgl-project/sglang

print("scale1 shape: ", scale1.shape)

x = torch.randn(1024, 1024).cpu()
import pdb
Collaborator

Please clean up the debugging code.

Contributor Author

OK.

    """
    if x.numel() == 0:
        min_val, max_val = (
            torch.tensor(-16.0, dtype=x.dtype),
Collaborator

Why is max_val set to 16?

Contributor Author

This handles the case where the tensor has zero elements, in order to support empty MoE experts. 16 is just an ordinary number that fp8 can represent easily.
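
The empty-tensor fallback described in this reply can be illustrated with a small dependency-free sketch. It is an assumption-laden illustration and not the PR's implementation: the function name `per_tensor_scale` is hypothetical, plain lists stand in for torch tensors, and the way the min/max pair is turned into a scale (largest magnitude divided by the E4M3 maximum of 448) is assumed, since the diff excerpt does not show that step.

```python
FP8_E4M3_MAX = 448.0  # largest finite float8 E4M3 value


def per_tensor_scale(values: list[float]) -> float:
    """Scale that maps the largest magnitude in `values` onto the FP8 range."""
    if len(values) == 0:
        # Nothing to measure (for example an MoE expert with no elements):
        # use a fixed placeholder range instead of calling min()/max(),
        # which would raise on an empty sequence.
        min_val, max_val = -16.0, 16.0
    else:
        min_val, max_val = min(values), max(values)
    amax = max(abs(min_val), abs(max_val))
    return max(amax, 1e-12) / FP8_E4M3_MAX  # guard against an all-zero input
```

With this fallback, `per_tensor_scale([])` returns a finite, well-defined scale, so an empty expert passes through the same code path as a populated one.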

__all__ = ["quantize_model_to_fp8"]


def quantize_model_to_fp8(
Collaborator

Could this be organized as a PTQ class? Put the initialization hyperparameters into class member variables and have self.quantize run the actual quantization logic, which can then enter the fp8 branch.

Contributor Author

OK, this will be reflected in a new commit.
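
One possible shape for the class-based organization requested above is sketched below. This is a guess at the structure, not the code that was merged: `ToyLinear`, `ToyFP8Linear`, `ToyModel`, and the `quant_type` and `skip_layers` parameters are hypothetical stand-ins that keep the example free of torch, and only the method name `_set_quantize_linear_module` is taken from the later diff in this PR.

```python
class ToyLinear:
    """Hypothetical stand-in for nn.Linear."""

    def __init__(self, name: str):
        self.name = name


class ToyFP8Linear(ToyLinear):
    """Hypothetical stand-in for a quantized linear module."""


class ToyModel:
    """Hypothetical model holding its linear layers by name."""

    def __init__(self):
        self.layers = {n: ToyLinear(n) for n in ("attn.qkv", "mlp.fc1", "proj_out")}


class PTQ:
    def __init__(self, model, quant_type: str = "fp8", skip_layers=()):
        # Hyperparameters live on the instance, as the review suggests.
        self.model = model
        self.quant_type = quant_type
        self.skip_layers = tuple(skip_layers)

    def _set_quantize_linear_module(self):
        """Return the module type that replaces each linear layer."""
        if self.quant_type == "fp8":  # the fp8 branch
            return ToyFP8Linear
        raise ValueError(f"unsupported quant_type: {self.quant_type}")

    def quantize(self):
        """Swap every linear layer that is not skipped for its quantized version."""
        quant_cls = self._set_quantize_linear_module()
        for name, layer in list(self.model.layers.items()):
            if any(key in name for key in self.skip_layers):
                continue
            self.model.layers[name] = quant_cls(layer.name)
        return self.model
```

Keeping the branch selection in one small method means a new quantization scheme only needs a new branch there, while `quantize` stays unchanged.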



# modified from https://github.com/neuralmagic/AutoFP8/blob/main/auto_fp8/quantize.py
class FP8DynamicLinear(torch.nn.Module):
Collaborator

Move all of the quantization functions above FP8DynamicLinear into quant_func.py; linear.py should contain only the quantized linear classes.

…and wrap quantization function interfaces with classes.

    def _set_quantize_linear_module(self) -> torch.nn.Module:
        """
        返回用于替换nn.Linear的量化模块类型
Collaborator

Please change the comments to English.

@yghstill yghstill merged commit 1a5efa0 into Tencent:main Oct 15, 2025
5 checks passed
dawnranger pushed a commit to dawnranger/AngelSlim that referenced this pull request Mar 11, 2026
Co-authored-by: garygugong <garygugong@tencent.com>

2 participants