
Commit 6b0ea4d

Fix MOE layer sync test (#936)
## What does this PR do?

**Type of change:** Bug Fix

**Overview:** Fix the MOE layer sync test by initializing weights differently across experts in the MOE layer, so that experts do not all end up with similar amax values.

[Link to bug](https://github.com/NVIDIA/Model-Optimizer/actions/runs/22288772586/job/64472124733#step:7:851)

## Testing

https://github.com/NVIDIA/Model-Optimizer/actions/runs/22414958311

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes/No
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No

## Summary by CodeRabbit

## Release Notes

* **Tests**
  * Enhanced quantization testing with improved expert weight initialization patterns for expert-based models.

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
1 parent 4eacb0d commit 6b0ea4d

1 file changed

Lines changed: 6 additions & 0 deletions

File tree

tests/gpu_megatron/torch/quantization/plugins/test_megatron.py

```diff
@@ -765,6 +765,12 @@ def _test_layer_sync_moe_local_experts_amax(ep_size, moe_grouped_gemm, rank, siz
         num_moe_experts=8,
         transformer_impl="modelopt",
     )
+    # Make weight initialization different across experts, otherwise experts will have similar amax values
+    for layer in model.decoder.layers:
+        for i, expert in enumerate(layer.mlp.experts.local_experts):
+            expert.linear_fc1.weight.data.fill_(0.1 + i * 0.05)
+            expert.linear_fc2.weight.data.fill_(0.2 + i * 0.05)
+
     quant_cfg = mtq.FP8_DEFAULT_CFG
     model = mtq.quantize(model, quant_cfg, get_forward(model))
```
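The idea behind the fix can be illustrated without any framework: amax is the maximum absolute value of a weight tensor, so if every expert is initialized identically, every expert reports the same amax and the sync test cannot distinguish them. The sketch below is a hypothetical, dependency-free illustration (plain Python lists standing in for weight tensors; `amax` is a helper defined here, not the ModelOpt API), mirroring the `0.1 + i * 0.05` staggering used in the patch:

```python
def amax(weights):
    """Max absolute value of a flat list of weights (toy stand-in for a tensor amax)."""
    return max(abs(w) for w in weights)

num_experts = 8

# Identical initialization: every expert has the same amax, so a sync test
# that compares amax across experts cannot tell whether sync worked.
same = [[0.1] * 4 for _ in range(num_experts)]
assert len({amax(w) for w in same}) == 1

# Staggered initialization, as in the fix: each expert i is filled with
# 0.1 + i * 0.05, so all experts have distinct amax values.
staggered = [[0.1 + i * 0.05] * 4 for i in range(num_experts)]
assert len({amax(w) for w in staggered}) == num_experts
```

With distinct per-expert amax values before quantization, the test can meaningfully verify that amax synchronization across experts actually changes them.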
