Commit ac7c985
authored
[NVBUG: 5804406] Auto detect MOE layers (#900)
## What does this PR do?
**Type of change:** New feature, new tests
**Overview:** Replace hardcoded per-model MoE class registrations
(Mixtral, Qwen2Moe, Qwen3Moe, Qwen3Next, Llama4TextMoe, Qwen3VLMoe,
MiniMaxM2, etc.) with a single generic auto-detection mechanism
(`register_sparse_moe_on_the_fly`) that walks the model tree and
identifies MoE blocks by their structural attributes (`gate` + `experts`
with `top_k`/`num_experts`). This makes MoE quantization
forward-compatible with new HuggingFace MoE architectures without
requiring explicit registration for each model family.
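The structural check described above can be sketched roughly as follows. This is a hypothetical illustration of the detection idea, not the actual `register_sparse_moe_on_the_fly` implementation; the function name `find_sparse_moe_blocks` is invented for this sketch:

```python
import torch.nn as nn


def find_sparse_moe_blocks(model: nn.Module):
    """Walk the module tree and yield submodules that look like sparse MoE blocks.

    A block qualifies if it has a ``gate`` submodule and an ``experts``
    container, plus a ``top_k`` or ``num_experts`` routing attribute.
    Hypothetical sketch of the auto-detection described above.
    """
    for name, module in model.named_modules():
        has_gate = isinstance(getattr(module, "gate", None), nn.Module)
        has_experts = isinstance(getattr(module, "experts", None), nn.Module)
        has_routing = any(hasattr(module, attr) for attr in ("top_k", "num_experts"))
        if has_gate and has_experts and has_routing:
            yield name, module
```

Because the check is purely structural, any new HuggingFace architecture that follows the `gate`/`experts` naming convention is picked up without a per-model registration.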
Additionally, this PR:
- Tracks per-expert token routing counts during calibration via a gate
forward hook, enabling visibility into expert utilization.
- Saves an HTML report of expert token counts during export
(`save_expert_token_count_table`), highlighting under-utilized experts.
- Fixes the `topk` -> `top_k` attribute name for transformers >= 5.0
compatibility.
- Moves the PTQ summary prints in `hf_ptq.py` to a file to reduce console
output.
## Usage
Auto-detection is transparent -- no user-facing API changes are needed.
Any HuggingFace MoE model with the standard `gate`/`experts` pattern is
automatically detected and quantized:
```python
import modelopt.torch.quantization as mtq

# Any HuggingFace MoE model (Mixtral, Qwen3Moe, DeepSeek, etc.)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B")
mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```

During export, an `.moe.html` report with per-expert token counts is saved
automatically.
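The report generation can be sketched as below. This is a minimal hypothetical version of what `save_expert_token_count_table` produces, assuming per-layer count lists and a simple "below a fraction of the mean" rule for flagging under-utilized experts; the signature and threshold are illustrative, not the actual export code:

```python
def save_expert_token_count_table(
    counts: dict[str, list[int]], path: str, low_ratio: float = 0.5
) -> None:
    """Write an HTML table of per-expert token counts, highlighting experts
    that received fewer than ``low_ratio`` of the layer's mean count.

    Hypothetical sketch of the ``.moe.html`` report described above.
    """
    rows = []
    for layer, layer_counts in counts.items():
        mean = sum(layer_counts) / len(layer_counts)
        cells = "".join(
            # Highlight under-utilized experts with a red background.
            f'<td style="background:{"#fbb" if c < low_ratio * mean else "#fff"}">{c}</td>'
            for c in layer_counts
        )
        rows.append(f"<tr><td>{layer}</td>{cells}</tr>")
    with open(path, "w") as f:
        f.write("<table>" + "".join(rows) + "</table>")
```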
## Testing
Unit tests; also tested exporting a Qwen MoE model.
## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open
`Draft` PR. -->
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes/No <!--- If No, explain
why. -->
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Yes/No <!--- Only for new features, API changes, critical bug fixes or
bw breaking changes. -->
## Additional Information
<!-- E.g. related issue. -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added expert token count visualization for Mixture of Experts models,
exported as HTML reports during model export.
* Enhanced sparse MoE quantization with improved calibration-aware
routing and automatic model block detection.
* **Tests**
* Added comprehensive test suite for sparse MoE quantization validation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
File tree (7 files changed, +531 −77 lines):
- examples/llm_ptq
- modelopt/torch/export
- modelopt/torch/quantization/plugins
- tests/unit/torch/quantization/plugins