
Fix MOE layer sync test #936

Merged
kevalmorabia97 merged 1 commit into main from jennifchen/fix_moe_test on Feb 26, 2026

Conversation

@jenchen13 (Contributor) commented Feb 25, 2026

What does this PR do?

Type of change: Bug Fix

Overview: Fix the MOE layer sync test by initializing each expert's weights in the MOE layer with a distinct constant so their amax values differ.
Link to bug

Usage

# Add a code snippet demonstrating how to use this
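A minimal sketch of the initialization pattern this PR applies in the test. `MockExpert` and the flat `local_experts` list here are stand-ins for Megatron's MoE expert modules, not the actual test code; the fill pattern mirrors the fix (and uses the `torch.no_grad()` form suggested in review):

```python
import torch
import torch.nn as nn

class MockExpert(nn.Module):
    """Stand-in for a Megatron local expert with two linear layers."""

    def __init__(self, hidden=8):
        super().__init__()
        self.linear_fc1 = nn.Linear(hidden, hidden, bias=False)
        self.linear_fc2 = nn.Linear(hidden, hidden, bias=False)

local_experts = nn.ModuleList(MockExpert() for _ in range(4))

# Give each expert's weights a unique constant so per-expert amax
# (max absolute weight) values diverge before quantization/sync.
with torch.no_grad():
    for i, expert in enumerate(local_experts):
        expert.linear_fc1.weight.fill_(0.1 + i * 0.05)
        expert.linear_fc2.weight.fill_(0.2 + i * 0.05)

amaxes = [e.linear_fc1.weight.abs().max().item() for e in local_experts]
print([round(a, 2) for a in amaxes])  # → [0.1, 0.15, 0.2, 0.25]
```

Because every expert now has a different amax, the test can verify that amax synchronization across experts actually changes the calibrated values.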

Testing

https://github.com/NVIDIA/Model-Optimizer/actions/runs/22414958311

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

Release Notes

  • Tests
    • Enhanced quantization testing with improved expert weight initialization patterns for expert-based models.

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
@jenchen13 jenchen13 requested a review from realAsma February 25, 2026 18:45
coderabbitai Bot commented Feb 25, 2026

📝 Walkthrough

The change adds per-expert weight initialization in a test file to diversify amax values across local experts by assigning unique weight constants to each expert's linear layers prior to quantization.

Changes

Cohort: Per-Expert Weight Initialization
File: tests/gpu_megatron/torch/quantization/plugins/test_megatron.py
Summary: Added initialization logic that assigns a unique constant to each local expert's linear layer weights (linear_fc1 and linear_fc2) to diversify amax values before quantization and synchronization.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title Check (❓ Inconclusive): The title "Fix MOE layer sync test" is vague and does not convey the specific change made (per-expert weight initialization); it describes what was fixed but not how. Resolution: consider a more specific title such as "Initialize per-expert weights to diversify amax in MOE layer sync test".

✅ Passed checks (1 passed)

  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.


coderabbitai Bot left a comment


🧹 Nitpick comments (1)
tests/gpu_megatron/torch/quantization/plugins/test_megatron.py (1)

771-772: Prefer torch.no_grad() over .data for test initialization.

Lines 771–772 use .data.fill_() for weight initialization. The idiomatic PyTorch pattern is to wrap parameter mutations in torch.no_grad() context and call fill_() directly on the parameter.

♻️ Proposed fix
-    for layer in model.decoder.layers:
-        for i, expert in enumerate(layer.mlp.experts.local_experts):
-            expert.linear_fc1.weight.data.fill_(0.1 + i * 0.05)
-            expert.linear_fc2.weight.data.fill_(0.2 + i * 0.05)
+    with torch.no_grad():
+        for layer in model.decoder.layers:
+            for i, expert in enumerate(layer.mlp.experts.local_experts):
+                expert.linear_fc1.weight.fill_(0.1 + i * 0.05)
+                expert.linear_fc2.weight.fill_(0.2 + i * 0.05)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/gpu_megatron/torch/quantization/plugins/test_megatron.py` around lines
771 - 772, The test initializes expert weights using the deprecated .data API;
replace those mutations with a torch.no_grad() context and call fill_() directly
on the parameter tensors (e.g., expert.linear_fc1.weight and
expert.linear_fc2.weight) so the assignments occur without tracking gradients
and avoid .data usage—wrap the two fill_() calls inside torch.no_grad(): with
torch.no_grad(): expert.linear_fc1.weight.fill_(...) and
expert.linear_fc2.weight.fill_(...).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e589ac8 and e3e98fe.

📒 Files selected for processing (1)
  • tests/gpu_megatron/torch/quantization/plugins/test_megatron.py

codecov Bot commented Feb 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.17%. Comparing base (d78797b) to head (e3e98fe).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #936      +/-   ##
==========================================
- Coverage   73.09%   72.17%   -0.92%     
==========================================
  Files         205      207       +2     
  Lines       22301    22656     +355     
==========================================
+ Hits        16300    16352      +52     
- Misses       6001     6304     +303     


@kevalmorabia97 (Collaborator) left a comment

@kevalmorabia97 kevalmorabia97 merged commit 6b0ea4d into main Feb 26, 2026
50 of 52 checks passed
@kevalmorabia97 kevalmorabia97 deleted the jennifchen/fix_moe_test branch February 26, 2026 07:14