Commit d39146e

meenchen authored and kevalmorabia97 committed
Fix Qwen3 recipe and update autoquant example cmd (#749)
## What does this PR do?

**Type of change:** Bug fix <!-- Use one of the following: Bug fix, new feature, new example, new tests, documentation. -->

**Overview:** ?

## Usage

<!-- You can potentially add a usage example below. -->

```python
# Add a code snippet demonstrating how to use this
```

## Testing

<!-- Mention how have you tested your change if applicable. -->

## Before your PR is "*Ready for review*"

<!-- If you haven't finished some of the above items you can still open `Draft` PR. -->

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes/No <!--- If No, explain why. -->
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No <!--- Only for new features, API changes, critical bug fixes or bw breaking changes. -->

## Additional Information

<!-- E.g. related issue. -->

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
1 parent 41aaec5 commit d39146e

2 files changed

Lines changed: 7 additions & 7 deletions

examples/llm_ptq/README.md

Lines changed: 1 addition & 1 deletion
@@ -226,7 +226,7 @@ export HF_PATH=<the downloaded LLaMA checkpoint from the Hugging Face hub, or si
 # --auto_quantize_bits specifies the constraint for `AutoQuantize`
 # --quant specifies the formats to be searched for `AutoQuantize`
 # NOTE: auto_quantize_bits cannot be lower than the number of bits for the smallest quantization format in --quant
-scripts/huggingface_example.sh --type llama --model $HF_PATH --quant w4a8_awq,fp8 --auto_quantize_bits 4.8 --tp [1|2|4|8] --calib_batch_size 4
+scripts/huggingface_example.sh --model $HF_PATH --quant w4a8_awq,fp8 --auto_quantize_bits 4.8 --calib_batch_size 4
 ```

 The above example perform `AutoQuantize` where the less quantization accuracy sensitive layers are quantized with `w4a8_awq` (specified by `--quant w4a8_awq`) and the more sensitive layers
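
The NOTE in this hunk states a constraint between `--auto_quantize_bits` and the formats listed in `--quant`. As a rough illustration only, the sketch below checks that constraint and estimates what share of weights would need the lower-bit format to reach the target. The nominal bit-widths (4 for `w4a8_awq`, 8 for `fp8`), the weighted-average model of effective bits, and both function names are assumptions made for this example; they are not taken from the ModelOpt implementation.

```python
# Nominal weight bit-widths per format. These values are an illustrative
# assumption, not something read from ModelOpt.
NOMINAL_WEIGHT_BITS = {"w4a8_awq": 4.0, "fp8": 8.0}


def check_auto_quantize_bits(formats, target_bits):
    """Raise if target_bits is below the smallest format's nominal bits."""
    lowest = min(NOMINAL_WEIGHT_BITS[f] for f in formats)
    if target_bits < lowest:
        raise ValueError(
            f"auto_quantize_bits={target_bits} is lower than the smallest "
            f"format in --quant ({lowest} bits)"
        )
    return lowest


def low_precision_fraction(formats, target_bits):
    """Share of weights at the lowest-bit format so that a simple weighted
    average of nominal bits equals target_bits (hypothetical model)."""
    lowest = check_auto_quantize_bits(formats, target_bits)
    highest = max(NOMINAL_WEIGHT_BITS[f] for f in formats)
    if target_bits > highest:
        raise ValueError("target_bits exceeds the largest format's bits")
    if highest == lowest:
        return 1.0
    return (highest - target_bits) / (highest - lowest)


# Formats and target taken from the example command in the hunk above.
fraction = low_precision_fraction(["w4a8_awq", "fp8"], 4.8)
print(f"share of weights at the lowest-bit format: {fraction:.2f}")
```

Under these assumptions, a target of 4.8 bits corresponds to roughly four fifths of the weights using the 4-bit format, which is why a target below 4 cannot be satisfied by this pair of formats.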

examples/llm_ptq/example_utils.py

Lines changed: 6 additions & 6 deletions
@@ -180,12 +180,12 @@ def build_quant_cfg(
 quant_cfg["quant_cfg"]["*image*"] = {"enable": False}
 quant_cfg["quant_cfg"]["*vision*"] = {"enable": False}

-if model_type in ["qwen3moe", "qwen3next"] and qformat == "nvfp4":
-# Disable the attention projection layers to retain accuracy
-quant_cfg["quant_cfg"]["model*.*attn*in_proj*"] = {"enable": False}
-quant_cfg["quant_cfg"]["model*.*attn*q_proj*"] = {"enable": False}
-quant_cfg["quant_cfg"]["model*.*attn*k_proj*"] = {"enable": False}
-quant_cfg["quant_cfg"]["model*.*attn*v_proj*"] = {"enable": False}
+if model_type in ["qwen3moe", "qwen3next"] and qformat == "nvfp4":
+# Disable the attention projection layers to retain accuracy
+quant_cfg["quant_cfg"]["model*.*attn*in_proj*"] = {"enable": False}
+quant_cfg["quant_cfg"]["model*.*attn*q_proj*"] = {"enable": False}
+quant_cfg["quant_cfg"]["model*.*attn*k_proj*"] = {"enable": False}
+quant_cfg["quant_cfg"]["model*.*attn*v_proj*"] = {"enable": False}

 return quant_cfg
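
The wildcard keys in this hunk select which quantizers are disabled for the Qwen3 MoE and Qwen3-Next NVFP4 recipe. The sketch below shows which names such patterns would select, assuming they are applied as shell-style globs against fully qualified module names. That matching rule, the `is_disabled` helper, and the module names in `EXAMPLE_NAMES` are assumptions for illustration, not confirmed ModelOpt behavior or real checkpoint names.

```python
from fnmatch import fnmatchcase

# Patterns copied from the hunk above.
DISABLED_PATTERNS = [
    "model*.*attn*in_proj*",
    "model*.*attn*q_proj*",
    "model*.*attn*k_proj*",
    "model*.*attn*v_proj*",
]


def is_disabled(module_name: str) -> bool:
    """Return True if any disable pattern matches the module name.

    Hypothetical helper: assumes glob-style, case-sensitive matching.
    """
    return any(fnmatchcase(module_name, p) for p in DISABLED_PATTERNS)


# Hypothetical module names, chosen only to exercise the patterns.
EXAMPLE_NAMES = [
    "model.layers.0.self_attn.q_proj.weight_quantizer",
    "model.layers.0.self_attn.o_proj.weight_quantizer",
    "model.layers.0.linear_attn.in_proj_qkvz.input_quantizer",
    "model.layers.0.mlp.experts.3.down_proj.weight_quantizer",
]

for name in EXAMPLE_NAMES:
    state = "disabled" if is_disabled(name) else "quantized"
    print(f"{name}: {state}")
```

Under this assumption the query, key, value, and fused input projections inside attention blocks are left unquantized, while the attention output projection and the MLP/expert layers remain eligible for NVFP4.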
