Commit 1dc890d
Remove _moe_count_expert_calib_tokens flag; tie token counting to moe_calib_experts_ratio (#1062)
Cherry-pick for 0.43.0
## Summary
- **Remove `moe_count_expert_calib_tokens`** config field and the
`_moe_count_expert_calib_tokens` internal flag. Token counting is now
implicitly enabled when `moe_calib_experts_ratio` is set, removing a
redundant knob.
- **Change `--moe_calib_experts_ratio` default to `None`** in
`hf_ptq.py` (was `1.0`). Previously all experts were force-calibrated by
default; now the feature is opt-in and non-MoE models are unaffected
without any flag.
- **Disable `layer_sync_moe_local_experts_amax`** when
`moe_calib_experts_ratio` is set, since each expert is calibrated
independently with sufficient token coverage in that mode.
- **Simplify `_QuantSparseMoe.forward`**: remove redundant truthy checks
on `_moe_calib_experts_ratio` inside the branch that already assumes it
is set.
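The coupling described in the bullets above can be sketched in plain Python. This is an illustration only, not ModelOpt code: `MoeCalibSettings` and `derive_moe_calib_settings` are hypothetical names, and the sketch assumes that amax sync across local experts still runs when the ratio is left unset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoeCalibSettings:
    """Hypothetical container for the behavior derived from the ratio."""

    count_expert_tokens: bool
    sync_local_experts_amax: bool


def derive_moe_calib_settings(
    moe_calib_experts_ratio: Optional[float],
) -> MoeCalibSettings:
    """Derive both behaviors from the single remaining knob.

    Assumption: with the ratio unset (None), no token counting happens and
    amax is still synced across local experts; with the ratio set, tokens
    are counted and the sync is skipped.
    """
    ratio_is_set = moe_calib_experts_ratio is not None
    return MoeCalibSettings(
        count_expert_tokens=ratio_is_set,
        sync_local_experts_amax=not ratio_is_set,
    )
```

Checking `is not None` rather than truthiness keeps the "is the feature on?" decision in one place, which is the kind of redundancy the simplified `forward` is said to remove.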
## Changed files
| File | Change |
|------|--------|
| `modelopt/torch/quantization/config.py` | Remove `moe_count_expert_calib_tokens` field; update `moe_calib_experts_ratio` description to document amax sync behavior |
| `modelopt/torch/quantization/mode.py` | Remove `moe_count_expert_calib_tokens` propagation in `wrapped_calib_func` |
| `modelopt/torch/quantization/plugins/huggingface.py` | Remove `_moe_count_expert_calib_tokens` from `_QuantSparseMoe`; simplify `forward`; skip `layer_sync_moe_local_experts_amax` when ratio is set |
| `examples/llm_ptq/hf_ptq.py` | Default `--moe_calib_experts_ratio` to `None`; guard validation |
| `tests/unit/.../test_sparse_moe.py` | Update tests to use `_moe_calib_experts_ratio` instead of removed flag |
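The `hf_ptq.py` row (default `None`, guarded validation) might look roughly like the following `argparse` sketch. It is not the script's actual code: the `parse_args` helper, the help text, and the accepted range `(0, 1]` are assumptions made for illustration.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--moe_calib_experts_ratio",
        type=float,
        default=None,  # opt-in: non-MoE models need no flag
        help="Ratio of experts to calibrate; unset disables the feature.",
    )
    args = parser.parse_args(argv)

    # Guarded validation: only check the value when the user supplied one.
    # The (0, 1] range is an assumed constraint, not taken from the source.
    ratio = args.moe_calib_experts_ratio
    if ratio is not None and not (0.0 < ratio <= 1.0):
        parser.error("--moe_calib_experts_ratio must be in (0, 1]")
    return args
```

Calling `parse_args([])` leaves the ratio as `None`, so a non-MoE run never reaches the range check.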
## Test plan
- [x] Verify `hf_ptq.py` works without `--moe_calib_experts_ratio`
(non-MoE model, default `None`)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Configuration Changes**
  * `moe_calib_experts_ratio` now defaults to `None` (disabled) instead of `1.0`; validation only occurs when a value is provided.
* **Refactor**
  * Simplified MoE calibration flow and token-counting behavior; removed a deprecated expert-calibration configuration field.
* **Documentation**
  * Changelog and docstrings updated to reflect the new default and calibration behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>

1 parent c76633a commit 1dc890d
File tree: 6 files changed (+35, -46 lines)
- examples/llm_ptq
- modelopt/torch/quantization
  - plugins
- tests/unit/torch/quantization/plugins