Commit 7b34de6
authored
Unify weight_scale_2 between gate_proj/up_proj (and w1/w3) in the HF export path for MOE models (#1033)
### What does this PR do?
Unify `weight_scale_2` between `gate_proj/up_proj` (and `w1/w3`) in the
HF export path for MOE models. Serving engines fuse these projections
into a single `gate_up_proj` and require a shared scale; this takes the
element-wise max of the two independent scales as a conservative choice
that avoids overflow.
Type of change: ? Bug fix
### Usage
```python
# Add a code snippet demonstrating how to use this
```
### Testing
<!-- Mention how have you tested your change if applicable. -->
### Before your PR is "*Ready for review*"
Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).
Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(...,
weights_only=False)`, `pickle`, etc.).
- Is this change backward compatible?: ✅ / N/A <!--- If ❌, explain why.
-->
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: ✅ / ❌ / N/A
<!--- Mandatory -->
- Did you write any new necessary tests?: ✅ / ❌ / N/A <!--- Mandatory
for new features or examples. -->
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
✅ / ❌ / N/A <!--- Only for new features, API changes, critical bug fixes
or backward incompatible changes. -->
### Additional Information
<!-- E.g. related issue. -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Automatic synchronization of quantization scaling between
Mixture-of-Experts gate and up projections during model export for
non‑fused MoE setups (e.g., Qwen MoE, DeepSeek).
* **Bug Fixes / Improvements**
* Export now emits a brief notification when gate/up scaling values are
adjusted to ensure consistent quantization.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>1 parent 1070d89 commit 7b34de6
2 files changed
+62
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1171 | 1171 | | |
1172 | 1172 | | |
1173 | 1173 | | |
| 1174 | + | |
| 1175 | + | |
| 1176 | + | |
| 1177 | + | |
| 1178 | + | |
| 1179 | + | |
| 1180 | + | |
| 1181 | + | |
| 1182 | + | |
| 1183 | + | |
| 1184 | + | |
| 1185 | + | |
| 1186 | + | |
| 1187 | + | |
| 1188 | + | |
| 1189 | + | |
| 1190 | + | |
| 1191 | + | |
| 1192 | + | |
| 1193 | + | |
| 1194 | + | |
| 1195 | + | |
| 1196 | + | |
| 1197 | + | |
| 1198 | + | |
| 1199 | + | |
| 1200 | + | |
| 1201 | + | |
| 1202 | + | |
| 1203 | + | |
| 1204 | + | |
| 1205 | + | |
| 1206 | + | |
| 1207 | + | |
| 1208 | + | |
| 1209 | + | |
| 1210 | + | |
| 1211 | + | |
| 1212 | + | |
| 1213 | + | |
| 1214 | + | |
| 1215 | + | |
| 1216 | + | |
| 1217 | + | |
| 1218 | + | |
| 1219 | + | |
| 1220 | + | |
| 1221 | + | |
| 1222 | + | |
1174 | 1223 | | |
1175 | 1224 | | |
1176 | 1225 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
73 | 73 | | |
74 | 74 | | |
75 | 75 | | |
| 76 | + | |
76 | 77 | | |
77 | 78 | | |
78 | 79 | | |
| |||
775 | 776 | | |
776 | 777 | | |
777 | 778 | | |
| 779 | + | |
| 780 | + | |
| 781 | + | |
| 782 | + | |
| 783 | + | |
| 784 | + | |
| 785 | + | |
| 786 | + | |
| 787 | + | |
| 788 | + | |
| 789 | + | |
| 790 | + | |
778 | 791 | | |
779 | 792 | | |
780 | 793 | | |
| |||
0 commit comments