Commit 077e29a
[NVBug 6108145] Fix PTQ calibration and export for fused-experts MoE (Qwen3.5-MoE VLM) (#1340)
### What does this PR do?
Type of change: Bug fix
Fixes a four-bug cascade that caused silent PTQ failure on Qwen3.5-MoE VLMs (Qwen3.6-35B-A3B): calibration appeared to succeed, but the exported model produced token salad at inference. Root cause: HF's `@use_experts_implementation` dispatches the expert forward to `torch._grouped_mm` / `torch.bmm`, bypassing the `F.linear` hook that captures activations. As a result, `gate_up_proj_input_quantizer` / `down_proj_input_quantizer` were never calibrated and no `input_scale` tensors were emitted.
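The bypass can be reproduced in miniature without PyTorch. The sketch below is a toy stand-in, not ModelOpt or Transformers code: `ops.linear` and `ops.bmm` are hypothetical placeholders for `F.linear` and `torch.bmm`, operating on plain floats, and the "hook" is only a call counter. It shows why a hook attached to the per-expert linear call sees nothing when the experts run through a batched kernel.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for F.linear and torch.bmm (scalar math only).
ops = SimpleNamespace(
    linear=lambda x, w: x * w,
    bmm=lambda xs, ws: [x * w for x, w in zip(xs, ws)],
)

hook_calls = {"n": 0}  # mutable counter updated by the hook


def install_linear_hook():
    """Wrap ops.linear so every call is counted, like a calibration hook."""
    original = ops.linear

    def hooked(x, w):
        hook_calls["n"] += 1
        return original(x, w)

    ops.linear = hooked


def experts_forward(inputs, weights, implementation):
    """Run all experts one by one ("eager") or in a single batched call."""
    if implementation == "eager":
        return [ops.linear(x, w) for x, w in zip(inputs, weights)]
    return ops.bmm(inputs, weights)  # never goes through ops.linear


install_linear_hook()
inputs, weights = [1.0, 2.0, 3.0], [0.5, 0.5, 0.5]

batched = experts_forward(inputs, weights, "batched")
calls_after_batched = hook_calls["n"]  # the hook observed no activations

eager = experts_forward(inputs, weights, "eager")
calls_after_eager = hook_calls["n"]  # one hooked call per expert
```

Both paths return the same numbers, which is why the failure is silent: only the call count reveals that the batched path never fed the hook.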
Changes:
- `examples/llm_ptq/hf_ptq.py`: force `config._experts_implementation = "eager"` (recursing into `text_config` / `vision_config` / …) so per-expert `F.linear` calls are visible to the calibration hook.
- `modelopt/torch/quantization/conversion.py`: normalize plural `ModuleList` quantizer names (`weight_quantizers.N` → `weight_quantizer`) before `fnmatch`, so wildcards like `*mlp.experts*weight_quantizer` match fused-expert quantizers.
- `modelopt/torch/export/unified_export_hf.py`: hoist the `_QuantFusedExperts` export branch above the `get_quantization_format()` gate so `_export_fused_experts()` runs even when the top-level format query returns `QUANTIZATION_NONE` (which happens for experts-only recipes).
- `modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv.yaml`: set `layerwise: false` (the nested layer structure of VLMs breaks the layerwise walker).
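The first change, forcing the eager experts implementation on the top-level config and its nested sub-configs, might look roughly like the sketch below. This is an illustration under assumptions, not the code in `hf_ptq.py`: the helper name `force_eager_experts` is hypothetical, `SimpleNamespace` objects stand in for real config classes, the value `"fused"` is a placeholder, and only the two sub-config names spelled out in the description are visited (the description elides the rest).

```python
from types import SimpleNamespace

# Only the sub-config names given in the description; the source elides the
# remaining ones, so this tuple is deliberately incomplete.
SUB_CONFIG_ATTRS = ("text_config", "vision_config")


def force_eager_experts(config, _seen=None):
    """Set _experts_implementation = "eager" on config and nested sub-configs."""
    seen = set() if _seen is None else _seen
    if config is None or id(config) in seen:
        return  # nothing to do, or already visited (guards against cycles)
    seen.add(id(config))
    config._experts_implementation = "eager"
    for name in SUB_CONFIG_ATTRS:
        force_eager_experts(getattr(config, name, None), seen)


# Stand-in for a VLM config with nested text and vision configs.
vlm_config = SimpleNamespace(
    _experts_implementation="fused",  # placeholder non-eager value
    text_config=SimpleNamespace(_experts_implementation="fused"),
    vision_config=SimpleNamespace(),
)
force_eager_experts(vlm_config)
```

Setting the attribute on every visited config, rather than only where it already exists, is a simplification chosen for this sketch.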
<!-- Details about the change. -->
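To see why the name normalization in `conversion.py` matters, consider how `fnmatch` treats an indexed `ModuleList` quantizer name. The sketch below is a minimal illustration, assuming the plural names end in `_quantizers.<index>`; the helper `normalize_quantizer_name`, the regular expression, and the example module path are hypothetical and not taken from the ModelOpt source.

```python
import fnmatch
import re

# Assumed shape of a plural, indexed quantizer name: "..._quantizers.<index>".
_PLURAL_INDEXED = re.compile(r"_quantizers\.\d+$")


def normalize_quantizer_name(name):
    """Map e.g. 'x.weight_quantizers.7' to 'x.weight_quantizer'."""
    return _PLURAL_INDEXED.sub("_quantizer", name)


pattern = "*mlp.experts*weight_quantizer"
# Hypothetical fused-expert quantizer name for expert index 7.
raw = "model.layers.0.mlp.experts.gate_up_proj_weight_quantizers.7"
normalized = normalize_quantizer_name(raw)

matches_raw = fnmatch.fnmatch(raw, pattern)                # False: ends in ".7"
matches_normalized = fnmatch.fnmatch(normalized, pattern)  # True

# Singular names pass through unchanged.
unchanged = normalize_quantizer_name("model.lm_head.weight_quantizer")
```

Without normalization the wildcard silently matches nothing for fused experts, so the recipe's quantizer settings are never applied to them.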
### Usage
```bash
python examples/llm_ptq/hf_ptq.py \
--pyt_ckpt_path Qwen/Qwen3.6-35B-A3B \
--qformat nvfp4 \
--kv_cache_qformat fp8 \
--calib_size 512 \
--export_path Qwen3.6-35B-A3B-NVFP4
```
### Testing
End-to-end PTQ → vLLM deploy → NEL eval on Qwen3.6-35B-A3B (256 experts × 40 layers, 35B params):
Hook-call diagnostic: after the fix, per-expert `F.linear` calls during calibration went from 0 to 6720, and `input_scale` tensors emitted in the exported checkpoint went from 0 to 30720.
The FP8 fused-MoE path still produces gibberish; this is a separate follow-up (vLLM per-expert `weight_scale` handling).
* vLLM full-FP8: the FlashInfer TRTLLM Fp8MoE loader doesn't stack the 256 per-expert scalar `weight_scale` tensors into a `[num_experts]` per-expert vector. It ends up applying one expert's scale across all 256, so every routed expert dequantizes with the wrong amplitude and a coherent token stream collapses into multilingual gibberish.
* SGLang full-FP8: `qwen3_5.py::_make_packed_weight_loader` rejects the load with `AssertionError: Unexpected scalar for tuple shard load: loaded_shard_id=(0,1,2), split_sizes=[1,1,1]`. Its packed loader has no path for "N independent per-tensor source scalars combining into one fused-shard parameter," so the fused QKV (or `in_proj_qkvz`) load is structurally refused and the model never finishes loading.
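The vLLM failure mode above comes down to simple arithmetic, which the toy below illustrates with 4 experts instead of 256 and one quantized value per expert. All numbers are made up for illustration, and which expert's scale ends up being reused is an assumption here (index 0); this is not the loader's actual code.

```python
# Toy setup: 4 experts, one quantized weight value and one scalar scale each.
num_experts = 4
quantized = [64.0, 64.0, 64.0, 64.0]
per_expert_scale = [0.5, 0.25, 2.0, 1.0]

# Intended behaviour: stack the scalars into a [num_experts] vector and
# dequantize each expert with its own entry.
stacked = list(per_expert_scale)
correct = [q * stacked[e] for e, q in enumerate(quantized)]

# Described failure: a single expert's scale is applied to every expert.
# Assumption for this sketch: the surviving scale is that of expert 0.
shared = per_expert_scale[0]
broken = [q * shared for q in quantized]

# Ratio of the broken amplitude to the correct one, per expert.
amplitude_error = [b / c for b, c in zip(broken, correct)]
```

Only the expert whose scale was kept dequantizes correctly; every other expert's weights are off by a constant factor, which is enough to wreck the routed outputs.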
## Summary by CodeRabbit
* **New Features**
  * Better fused-expert export flow, a plugin to force eager expert execution during calibration/export, and a representative quantizer discovery utility.
* **Bug Fixes**
  * Reliable matching/discovery of per-expert indexed quantizers enabling correct calibration and mixed-precision export; fixes for calibration in nested decoder layouts.
* **Documentation**
  * Clarified PTQ config guidance on layerwise calibration.
* **Tests**
  * Added fused-experts calibration, export, and name-normalization tests.
---------
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
1 parent e5ce0ae commit 077e29a
9 files changed
Lines changed: 533 additions & 30 deletions
File tree
- modelopt_recipes/general/ptq
- modelopt/torch
- export
- plugins
- quantization
- plugins
- utils
- tests/unit/torch/quantization/plugins