Commit 307fe71: Fix QuantSequentialMLP sharded_state_dict (#742)
## What does this PR do?
**Type of change:** Bug fix
**Overview:**
These fixes are needed for the Megatron-LM `main` branch due to recent changes
in `sharded_state_dict`. Qwen3-30B-A3B PTQ and resume fail: a run with EP=4
cannot load a checkpoint generated with PP=4.
`singleton_local_shards` must be added to the metadata; otherwise, the `amax`
values of all experts are packed together. In addition, the TP `replica_id`
for `linear_fc1` is currently incorrect.
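A minimal sketch of what the metadata fix amounts to. This is illustrative only: `tag_expert_amax` is a hypothetical helper operating on plain dicts, not Megatron-LM's actual `ShardedTensor` API; only the `singleton_local_shards` key name comes from this PR.

```python
# Hypothetical stand-in for the fix: attach `singleton_local_shards` to the
# metadata of each per-expert amax entry in a sharded state dict, so each
# expert's amax is kept as its own local shard instead of being packed
# together with the others at checkpoint save/load time.

def tag_expert_amax(sharded_state_dict: dict) -> dict:
    for key, entry in sharded_state_dict.items():
        if key.endswith("amax"):
            # Without this flag, all experts' amax values are packed
            # together, and resume with a different EP/PP layout fails.
            entry.setdefault("metadata", {})["singleton_local_shards"] = True
    return sharded_state_dict
```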
**Other finding:** This limits us to TP=ETP=1 when EP>1; otherwise,
`sharded_state_dict` access errors occur. There is a potential blind spot in
using the default TP group in `ColumnParallelLinear` and
`RowParallelLinear`: such a layer can be part of an MoE, where tensor
parallelism is controlled by ETP instead. Fixing the parallel_state will
need a separate PR.
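The blind spot can be sketched as follows. The function name and the tuple layout are illustrative assumptions, not Megatron-LM's actual `replica_id` scheme; the point is only that an expert layer must derive its tensor-parallel rank from ETP, not from the default TP group.

```python
# Hypothetical sketch: which process group supplies the tensor-parallel
# rank of a linear layer's replica_id. For a dense layer it is TP; for an
# expert layer inside an MoE it must be ETP, otherwise ranks that actually
# hold different shards are wrongly marked as replicas of each other.

def replica_id_for_linear(tp_rank: int, etp_rank: int,
                          dp_rank: int, is_expert: bool) -> tuple:
    tensor_rank = etp_rank if is_expert else tp_rank
    # Ranks sharing the same (tensor_rank, dp_rank) are treated as holding
    # replicas of the same shard and must agree on the stored tensor.
    return (tensor_rank, dp_rank)
```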
**Results:** When calibrated with EP=1, mmlu = 0.80. That checkpoint can be
resumed with EP=4, TP=1, ETP=1 (TP>1 does not work, as mentioned above).
However, when calibrated with EP=4, mmlu = 0.71, which indicates remaining
issues with `amax` synchronization under EP.
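To illustrate what correct amax synchronization would look like: during calibration each rank tracks a running max of activation magnitudes, and ranks that hold replicas of the same quantizer must reduce with MAX so they all agree. This is a plain-Python stand-in, not the modelopt implementation; the suspected EP issue is that this reduction runs over the wrong group (or not at all) for expert quantizers.

```python
# Hedged illustration: a MAX-reduction over the ranks in one replica
# group. After the sync, every rank in the group holds the same amax,
# so the resulting quantization scales match across ranks.

def sync_amax(local_amax_per_rank: list) -> list:
    synced = max(local_amax_per_rank)
    return [synced] * len(local_amax_per_rank)
```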
## Usage
<!-- You can potentially add a usage example below. -->
```python
# Add a code snippet demonstrating how to use this
```
## Testing
<!-- Mention how have you tested your change if applicable. -->
## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open
the PR as a `Draft`. -->
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes/No <!--- If No, explain
why. -->
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Yes/No <!--- Only for new features, API changes, critical bug fixes or
bw breaking changes. -->
## Additional Information
<!-- E.g. related issue. -->
---------
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>

1 parent: 6f18490
File tree: 2 files changed (+37, −2)
- modelopt/torch/opt/plugins
- modelopt/torch/quantization/plugins