Commit 1f4a489
Adds AutoQuant support for VLM / Qwen3.5-Qwen3.6 style models (#1381)
### What does this PR do?
Type of change: new feature, bug fix, new tests
### Details
- Enables AutoQuant search over fused MoE expert containers by
snapshotting/restoring their per-expert quantizers.
- Adds Qwen3.5/3.6 linear-attention grouping rules so fused deployment
layers keep compatible quant formats.
- Supports `w4a16_nvfp4` as an AutoQuant search format.
- Preserves disabled AutoQuant layer patterns in generated configs while
allowing selected modules like `lm_head` to override default disables.
- Keeps recipe-mode and AutoQuantize VLM paths on the outer CausalLM so
Qwen3.5/3.6-MoE `lm_head` remains visible.
- Skips `parent_class`-scoped quant config entries during AutoQuant bare
quantizer matching, preventing class-scoped global entries from
last-match overriding every selected module.
- Adds temporary hardcoded Qwen/VLM AutoQuant disabled-layer patterns in
`hf_ptq.py` with a TODO to refactor into the config system.
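The `parent_class` bullet above can be illustrated with a minimal sketch of last-match pattern lookup. The entry layout (`quantizer_name`, `parent_class`, `cfg` keys), the example patterns, and the helper name `match_bare_quantizer` are hypothetical and chosen only for illustration; they are not taken from the ModelOpt source.

```python
from fnmatch import fnmatch

# Hypothetical config entries; the dict shape is an assumption for this sketch.
ENTRIES = [
    {"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": 8}},
    {"quantizer_name": "*lm_head*", "cfg": {"enable": False}},
    # Class-scoped global entry: meant only for modules of one class.
    {"quantizer_name": "*", "parent_class": "nn.LayerNorm", "cfg": {"enable": False}},
]


def match_bare_quantizer(name, entries):
    """Return the cfg of the last matching entry, skipping class-scoped ones."""
    matched = None
    for entry in entries:
        if entry.get("parent_class") is not None:
            # A bare-name lookup has no class context, so a class-scoped
            # entry must not take part in last-match resolution.
            continue
        if fnmatch(name, entry["quantizer_name"]):
            matched = entry["cfg"]  # later matches override earlier ones
    return matched


selected = match_bare_quantizer("model.layers.0.mlp.weight_quantizer", ENTRIES)
```

Without the `continue`, the trailing `"*"` entry would match every name last and override the format chosen for each selected module, which is the failure mode the bullet describes.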
### Usage
```bash
python examples/llm_ptq/hf_ptq.py \
--pyt_ckpt_path <model_path> \
--qformat fp8,w4a16_nvfp4 \
--auto_quantize_bits 5.0 \
--auto_quantize_cost_model active_moe \
--auto_quantize_checkpoint <autoquant_state.pt> \
--export_path <output_dir>
```
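As a rough illustration of what a `--auto_quantize_bits 5.0` target means when searching over `fp8,w4a16_nvfp4`, the sketch below computes a parameter-weighted average bit width. It assumes nominal widths of 8 and 4 bits per weight and ignores scale-factor overhead; the layer names, parameter counts, and the helper `effective_bits` are hypothetical. How AutoQuant actually accounts for cost, including the `active_moe` cost model, is not shown here.

```python
# Nominal bits per weight; assumed values that ignore scale-factor overhead.
NOMINAL_BITS = {"fp8": 8.0, "w4a16_nvfp4": 4.0}


def effective_bits(assignment, num_params):
    """Parameter-weighted average bit width for a per-layer format assignment."""
    total = sum(num_params.values())
    weighted = sum(NOMINAL_BITS[assignment[name]] * count
                   for name, count in num_params.items())
    return weighted / total


# Hypothetical two-layer model: one small layer kept at fp8, one large at 4 bits.
num_params = {"attn": 100, "experts": 300}
assignment = {"attn": "fp8", "experts": "w4a16_nvfp4"}
avg_bits = effective_bits(assignment, num_params)  # (800 + 1200) / 400
```

Under these assumptions the assignment averages 5.0 bits, so it would sit exactly on a 5.0-bit budget.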
### Testing
- `/Users/weimingc/miniconda3/envs/modelopt/bin/python -m pytest
tests/unit/torch/quantization/test_autoquant.py::test_get_auto_quantize_config_keeps_selected_lm_head_enabled
tests/unit/torch/quantization/test_config_validation.py::TestMatchQuantizerCfg::test_parent_class_scoped_entries_are_ignored_for_bare_autoquant_lookup`
- `/Users/weimingc/miniconda3/envs/modelopt/bin/python -m pytest
tests/unit/torch/quantization/test_autoquant.py
tests/unit/torch/quantization/test_config_validation.py -k "not
data_parallel"` (`120 passed, 1 deselected`)
- `/Users/weimingc/miniconda3/envs/modelopt/bin/python -m py_compile
examples/llm_ptq/hf_ptq.py modelopt/torch/quantization/algorithms.py
modelopt/torch/quantization/_auto_quantize_cost.py
tests/unit/torch/quantization/test_autoquant.py
tests/unit/torch/quantization/test_config_validation.py`
- Full local affected-file pytest without `-k "not data_parallel"` only
failed `test_data_parallel_auto_quantize` because this local sandbox
cannot bind a free socket (`PermissionError: Operation not permitted`).
- Ran Qwen3.6 35B AutoQuant e2e with `fp8,w4a16_nvfp4` and exported a
checkpoint.
- Verified exported checkpoint loads in vLLM nightly without local
patches.
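The sandbox failure noted above (`PermissionError` when binding a free socket) can be probed before running the data-parallel test. This is a minimal sketch using only the standard library; the helper name `can_bind_free_socket` is hypothetical and not part of the test suite.

```python
import socket


def can_bind_free_socket(host="127.0.0.1"):
    """Return True if this process may bind an OS-assigned port on `host`."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, 0))  # port 0 asks the OS for any free port
        return True
    except OSError:  # PermissionError is a subclass of OSError
        return False


socket_ok = can_bind_free_socket()
```

When the probe returns `False`, deselecting the data-parallel test with `-k "not data_parallel"`, as in the commands above, avoids the spurious failure.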
### Before your PR is "*Ready for review*"
Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).
Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(...,
weights_only=False)`, `pickle`, etc.).
- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- Did you write any new necessary tests?: ✅
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
N/A
### Additional Information
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
  * Added `w4a16_nvfp4` quantization format and optional cost-exclusion patterns for AutoQuantize.
* **Improvements**
  * Safer multimodal/VLM handling and AutoQuantize now runs on the full outer model when applicable.
  * Better fused-MoE support, more accurate weight accounting, and refined attention-grouping for improved quantization choices.
  * Dynamic layer-disabling support for targeted disables.
* **Tests**
  * New unit tests covering cost-model exclusions, fused-MoE accounting, and config selection.
* **Documentation**
  * Updated cost-constraint example to show exclusion-pattern usage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

1 parent 1555e6d commit 1f4a489
9 files changed
Lines changed: 556 additions & 50 deletions
File tree
- examples/llm_ptq
- modelopt/torch/quantization
  - plugins
- tests
  - examples/llm_ptq
  - unit/torch/quantization