Commit 35d0f52
Fix Sequential MLP amax sync deadlock (#862)
## What does this PR do?
**Type of change:** Bug fix

**Overview:**
After `QuantMoELayer`, we rely on `layer_sync_moe_local_experts_amax` to
first perform a local sync. This is supposed to create
`input_quantizer.amax` for all experts, but the current logic only
updates experts that already have an `amax`. As a result, some experts
are still missing `amax`.
Given the above, `sync_quantizer_amax_across_dp_ep` can deadlock, since
each rank decides whether to enter the collective based on whether
`quantizer._amax is None`. Any expert with a `None` amax skips the
collective and therefore never arrives at it, while the other ranks
block waiting for it, causing a deadlock.
We fix `layer_sync_moe_local_experts_amax` so that even if an expert
does not have an `amax`, we overwrite it with a clone of the global
amax. The post-condition is that all experts have `amax`, which matches
the pre-condition of `sync_quantizer_amax_across_dp_ep`.
**Note:** we found that `_check_moe_calibration_complete` did not raise
an error even though some experts had no amax. We have not looked into
this issue.
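The fix can be sketched as follows. This is a minimal illustration, not the actual ModelOpt code: `sync_local_experts_amax`, the plain `amax` attribute, and the `global_amax` argument are hypothetical stand-ins for the real `layer_sync_moe_local_experts_amax` logic.

```python
import torch


def sync_local_experts_amax(expert_quantizers: list, global_amax: torch.Tensor) -> None:
    """Local amax sync across a layer's experts on one rank (illustrative sketch).

    Previously, only quantizers that already had an amax were updated, leaving
    some experts with amax == None. Those experts would later skip the
    distributed collective in sync_quantizer_amax_across_dp_ep while other
    ranks entered it, hanging the job. The fix: ensure every expert ends up
    with a non-None amax, cloning the global one when it is missing.
    """
    for q in expert_quantizers:
        if q.amax is None:
            # Before the fix, this case was silently skipped, so the expert
            # kept amax == None and later bypassed the collective -> deadlock.
            q.amax = global_amax.clone()
        else:
            # Existing behavior: merge with the global amax elementwise.
            q.amax = torch.maximum(q.amax, global_amax)
    # Post-condition: all experts have amax, matching the pre-condition
    # of sync_quantizer_amax_across_dp_ep.
```

With this post-condition, every rank takes the same branch when deciding whether to call the collective, so all ranks participate and the all-reduce can complete.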
## Summary by CodeRabbit
* **Bug Fixes**
  * Improved synchronization of quantization parameters for Mixture of Experts (MoE) models, with more flexible configuration support.
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>

1 parent b11d49b · commit 35d0f52
**Files changed:** 2 files (+10, −4 lines) under `modelopt/torch/quantization` and `modelopt/torch/quantization/plugins`.