Commit 33d4d27
Latent MOE & Repeated MTP support for NemotronH; fix KV cache quant export (#830)
## What does this PR do?
**Type of change:** New feature and bug fix
**Overview:**
Support Latent MOE and Repeated MTP for NemotronH models
- Enable latent MOE modules during megatron import/export
- Fix KV cache quantization export: remove old
`qkv_layer.output_quantizer` export & replace with proper
`k/v_bmm_quantizer` logic
- Improvements to EP amax sync
- Support repeated MTP import/export for NemotronH models (only BF16
export for MTP for now)
## Usage
<!-- You can potentially add a usage example below. -->
```python
# Add a code snippet demonstrating how to use this
```
## Testing
<!-- Mention how have you tested your change if applicable. -->
## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open
`Draft` PR. -->
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes/No <!--- If No, explain
why. -->
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Yes/No <!--- Only for new features, API changes, critical bug fixes or
bw breaking changes. -->
## Additional Information
<!-- E.g. related issue. -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Added support for grouped MLP and self-attention scaling operations in
model export workflows
* Enhanced model parallel training capabilities with improved component
mapping
* Expanded quantization configuration handling with dynamic module
exclusion across distributed ranks
* Improved support for additional transformer engine components
* **Refactor**
* Reorganized internal export and import logic for improved
maintainability and specialist model architecture support
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: jenchen13 <jennifchen@nvidia.com>
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Signed-off-by: Jenny Chen <jennifchen@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>1 parent 9b84e13 commit 33d4d27
11 files changed
Lines changed: 849 additions & 521 deletions
File tree
- modelopt/torch
- export
- plugins
- quantization
- plugins
- tests
- _test_utils/torch/megatron
- gpu/torch
- export
- quantization/plugins
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
103 | 103 | | |
104 | 104 | | |
105 | 105 | | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
106 | 118 | | |
107 | 119 | | |
108 | 120 | | |
| |||
127 | 139 | | |
128 | 140 | | |
129 | 141 | | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
130 | 154 | | |
131 | 155 | | |
132 | 156 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
| 33 | + | |
33 | 34 | | |
34 | 35 | | |
35 | 36 | | |
| |||
38 | 39 | | |
39 | 40 | | |
40 | 41 | | |
| 42 | + | |
| 43 | + | |
41 | 44 | | |
42 | 45 | | |
43 | 46 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
| 26 | + | |
26 | 27 | | |
27 | 28 | | |
28 | 29 | | |
| 30 | + | |
29 | 31 | | |
30 | 32 | | |
31 | 33 | | |
| |||
35 | 37 | | |
36 | 38 | | |
37 | 39 | | |
| 40 | + | |
38 | 41 | | |
39 | 42 | | |
40 | 43 | | |
| |||
81 | 84 | | |
82 | 85 | | |
83 | 86 | | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
84 | 102 | | |
85 | 103 | | |
86 | | - | |
87 | 104 | | |
88 | 105 | | |
89 | 106 | | |
| |||
101 | 118 | | |
102 | 119 | | |
103 | 120 | | |
| 121 | + | |
104 | 122 | | |
105 | 123 | | |
106 | 124 | | |
| |||
115 | 133 | | |
116 | 134 | | |
117 | 135 | | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
118 | 144 | | |
0 commit comments