Commit d71ca77

update changelog and fix ptq typo (#60)
1 parent fd79f5b commit d71ca77

6 files changed

Lines changed: 25 additions & 5 deletions

README.md

Lines changed: 1 addition & 0 deletions
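@@ -31,6 +31,7 @@
 - [Technical Discussion](#技术交流)

 ## 📣Latest Updates
+- [25/09/01] We now support FP8 quantization of the open-source translation model [Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B-fp8); Torch inference and the Benchmark evaluation pipeline for Eagle3; quantization and Cache for [FLUX](https://github.com/Tencent/AngelSlim/tree/main/configs/flux); and quantization and compression of [Seed-OSS](https://github.com/Tencent/AngelSlim/tree/main/configs/seed_oss) models.
 - [25/08/06] We now support FP8 and INT4 quantization for `Hunyuan 0.5B/1.8B/4B/7B` and `Qwen2.5VL 3B/7B/32B/72B`, and `FP8-Static` and `W4A8-FP8` quantization for the `DeepSeek-R1/V3` and `Kimi-K2` models. We have also open-sourced Eagle3 weights for the `Hunyuan 1.8B/4B/7B` series.
 - [25/07/04] We now support quantization for `Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen` and other models, including the INT8, FP8, and INT4 algorithms.
 We have also open-sourced Eagle3 weights for the `Qwen3` series.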

README_en.md

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 - [Technical Discussion](#technical-discussion)

 ## 📣Latest Updates
+- [25/09/01] We now support FP8 quantization of the [Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B-fp8) translation model. And enabled Torch inference and Benchmark evaluation for Eagle3. And implemented support for quantization and Cache for [FLUX](https://github.com/Tencent/AngelSlim/tree/main/configs/flux). And support quantization for the [Seed-OSS](https://github.com/Tencent/AngelSlim/tree/main/configs/seed_oss).
 - [25/08/06] We now support quantization for `Hunyuan 0.5B/1.8B/4B/7B` and multimodal model `Qwen2.5VL 3B/7B/32B/72B`, including `FP8/INT4` algorithms, and quantization for `DeepSeek-R1/V3` and `Kimi-K2`, including `FP8-Static` and `W4A8-FP8` algorithms. We also opensource `Hunyuan 1.8B/4B/7B` series Eagle3 model weight.
 - [25/07/04] We now support quantization for `Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen` and other models, including `INT8/FP8/INT4` algorithms. We also opensource `Qwen3` series Eagle3 model weight.


angelslim/compressor/quant/modules/fp8/lepto_fp8.py

Lines changed: 5 additions & 3 deletions
@@ -48,7 +48,7 @@ def __init__(
         self.ptq_hook = ptq_hook
         self.quant_model = model # self.quant_model
         self.modal_type = self.quant_model.modal_type
-        self.layers = self.quant_model.model.model.layers
+        self.layers = self.quant_model.get_quant_module()
         self.quant_bits = self.quant_model.quant_config.quant_bit
         self.seq_length = seq_length
         self.hidden_size = hidden_size
@@ -252,9 +252,11 @@ def convert(self):
         torch.cuda.empty_cache()

         # 2. insert qdq module
-        layers = self.quant_model.get_model()
+        quant_convert_module = self.quant_model.get_quant_convert_module()
         for name, sub_layer in self.ptq_hook.quant_layers_dict.items():
-            parent_layer, sub_name = find_parent_layer_and_sub_name(layers, name)
+            parent_layer, sub_name = find_parent_layer_and_sub_name(
+                quant_convert_module, name
+            )

             qdq_module = self.quant_model.get_qdq_module(sub_layer, name)
             setattr(parent_layer, sub_name, qdq_module)
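
The "insert qdq module" step in the hunk above resolves each quantized layer's name to its parent object and then swaps the layer in place with `setattr`. The sketch below illustrates that pattern under stated assumptions: `find_parent_and_sub_name` is a hypothetical stand-in written for this example, not the repository's `find_parent_layer_and_sub_name`, and it assumes layer names are dotted paths relative to the root passed in. Plain `SimpleNamespace` objects stand in for `torch.nn.Module` instances, and `QDQWrapper` is likewise hypothetical.

```python
from types import SimpleNamespace


def find_parent_and_sub_name(root, qualified_name):
    """Hypothetical helper: walk a dotted path from `root` and return
    (parent_object, final_attribute_name)."""
    *parent_path, sub_name = qualified_name.split(".")
    parent = root
    for part in parent_path:
        parent = getattr(parent, part)
    return parent, sub_name


class QDQWrapper:
    """Hypothetical stand-in for a quantize-dequantize module."""

    def __init__(self, wrapped):
        self.wrapped = wrapped


# Toy module tree; strings play the role of the original linear layers.
root = SimpleNamespace(
    block=SimpleNamespace(
        attn=SimpleNamespace(q_proj="linear_q", k_proj="linear_k")
    )
)

# Only q_proj is selected for quantization in this toy example.
quant_layers = {"block.attn.q_proj": root.block.attn.q_proj}

for name, sub_layer in quant_layers.items():
    parent, sub_name = find_parent_and_sub_name(root, name)
    # Replace the original layer on its parent with the wrapper.
    setattr(parent, sub_name, QDQWrapper(sub_layer))
```

Because the lookup starts at whatever root it is given, the choice of root decides which names can be resolved; that is presumably why the commit routes the root through a dedicated accessor instead of a hard-coded attribute path.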

angelslim/compressor/quant/ptq.py

Lines changed: 4 additions & 2 deletions
@@ -36,7 +36,6 @@ def __init__(self, model, slim_config=None):
         self.quant_model = model
         # init ptq config of model
         self.quant_model.init_ptq(slim_config)
-        self.layers = self.quant_model.get_quant_module()
         self.quant_algo = self.quant_model.quant_config.quant_algo
         self.quant_helpers = self.quant_model.quant_config.quant_helpers
         if "fp8" in self.quant_algo or "int8" in self.quant_algo:
@@ -210,9 +209,12 @@ def _convert(self):

         self.ptq_hook.post_process()

+        quant_convert_module = self.quant_model.get_quant_convert_module()
         # 2. insert qdq module
         for name, sub_layer in self.ptq_hook.quant_layers_dict.items():
-            parent_layer, sub_name = find_parent_layer_and_sub_name(self.layers, name)
+            parent_layer, sub_name = find_parent_layer_and_sub_name(
+                quant_convert_module, name
+            )

             qdq_module = self.quant_model.get_qdq_module(sub_layer, name)
             setattr(parent_layer, sub_name, qdq_module)

angelslim/models/base_model.py

Lines changed: 7 additions & 0 deletions
@@ -116,6 +116,13 @@ def get_quant_module(self):
         """
         return self.model.model.layers

+    def get_quant_convert_module(self):
+        """
+        Returns the module that will be converted to quantized.
+        This is typically the main transformer module of the model.
+        """
+        return self.model
+
     def get_qdq_module(self, sub_layer, name):
         act_scale, weight_scale = None, None
         if name in self.act_scales_dict:

angelslim/models/diffusion/flux.py

Lines changed: 7 additions & 0 deletions
@@ -159,6 +159,13 @@ def get_quant_module(self):
         """
         return self.model.transformer

+    def get_quant_convert_module(self):
+        """
+        Returns the module that will be converted to quantized.
+        This is typically the main transformer module of the model.
+        """
+        return self.model.transformer
+
     def get_save_func(self):
         if self.deploy_backend in ["huggingface"]:
             return PTQDiffusionSave
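
Taken together, the `base_model.py` and `flux.py` hunks add one accessor with a default in the base class and an override in the diffusion model, so the PTQ code no longer needs to know where the convertible module lives. The following is a minimal sketch of that pattern, not the project's classes: `BaseModelSketch`, `FluxSketch`, and `conversion_root` are hypothetical names, and only the four return expressions are taken from the diff.

```python
from types import SimpleNamespace


class BaseModelSketch:
    """Hypothetical wrapper mirroring the base-class accessors in the diff."""

    def __init__(self, model):
        self.model = model

    def get_quant_module(self):
        # Layers to calibrate (as in base_model.py).
        return self.model.model.layers

    def get_quant_convert_module(self):
        # Root used when swapping in QDQ modules (added by this commit).
        return self.model


class FluxSketch(BaseModelSketch):
    """Hypothetical diffusion wrapper: both accessors point at the transformer."""

    def get_quant_module(self):
        return self.model.transformer

    def get_quant_convert_module(self):
        return self.model.transformer


def conversion_root(wrapper):
    # Caller code stays the same for every model family.
    return wrapper.get_quant_convert_module()


# Toy models built from namespaces instead of real networks.
llm = SimpleNamespace(model=SimpleNamespace(layers=["layer0", "layer1"]))
pipe = SimpleNamespace(transformer=SimpleNamespace(name="dit"), vae="vae")

llm_root = conversion_root(BaseModelSketch(llm))
flux_root = conversion_root(FluxSketch(pipe))
```

With this split, the caller in `ptq.py` or `lepto_fp8.py` can ask the wrapper for its conversion root and pass it straight to the parent-lookup helper, whatever the model family.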
