Megatron-Bridge import example in launcher for Nemotron Super V3#1516
Megatron-Bridge import example in launcher for Nemotron Super V3#1516jenchen13 wants to merge 1 commit into
Conversation
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
📝 WalkthroughWalkthroughThis PR integrates Megatron-Bridge into the NVIDIA Model-Optimizer launcher infrastructure to enable checkpoint import from HuggingFace models. It adds Megatron-Bridge as a git submodule, provides a shell script to orchestrate the import process with environment setup, updates the launcher's packaging configuration to distribute Megatron-Bridge artifacts, and includes a YAML example for importing NVIDIA-Nemotron-3-Super-120B weights. ChangesMegatron-Bridge Integration for Launcher
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error)
✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tools/launcher/launch.py (1)
74-74: ⚡ Quick winAvoid hardcoding
relative_pathlength.This is easy to desync from
include_patternand can silently break packager mapping when entries change.Proposed refactor
-packager = run.PatternPackager( - include_pattern=[ +_include_patterns = [ "modules/Megatron-LM/megatron/*", "modules/Megatron-LM/examples/*", "modules/Megatron-LM/*.py", "modules/Megatron-Bridge/src/*", "modules/Megatron-Bridge/examples/*", "modules/Megatron-Bridge/pyproject.toml", "modules/Megatron-Bridge/README.md", "modules/Model-Optimizer/modelopt/*", "modules/Model-Optimizer/modelopt_recipes/*", "modules/Model-Optimizer/examples/*", "examples/*", "common/*", - ], - relative_path=[LAUNCHER_DIR] * 12, +] + +packager = run.PatternPackager( + include_pattern=_include_patterns, + relative_path=[LAUNCHER_DIR] * len(_include_patterns), )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/launch.py` at line 74, The hardcoded repeat count in the relative_path assignment (relative_path=[LAUNCHER_DIR] * 12) can get out of sync with include_pattern; change it to derive the repeat length from the include_pattern (e.g. relative_path=[LAUNCHER_DIR] * len(include_pattern) or equivalent) so that relative_path size always matches include_pattern length (reference variables: relative_path, LAUNCHER_DIR, include_pattern).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/launcher/common/megatron_bridge/import/import.sh`:
- Around line 45-49: The current import check only verifies importability and
can pick up an older installed megatron.bridge; change the logic so we force
local resolution by verifying the imported module's file path is inside
BRIDGE_DIR (or else install). Replace the simple import test with a Python check
that imports megatron.bridge, inspects megatron.bridge.__file__ (or uses
importlib to get the module path), resolves it to an absolute path and exits
nonzero if that path does not start with the absolute BRIDGE_DIR; keep the pip
install -e "${BRIDGE_DIR}" step to run when the path check fails. This ensures
the script uses the submodule under BRIDGE_DIR rather than an unrelated
site-packages version.
In
`@tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_bridge_import.yaml`:
- Line 26: Replace the hardcoded OUTPUT_DIR value with a launcher interpolation
variable: remove "/scratchspace/megatron-bridge" and set OUTPUT_DIR to use the
shared launcher global var (e.g., <<global_vars.OUTPUT_DIR>>); update any
accompanying docs/examples to reference the global var if needed and ensure the
global_vars entry is defined in the launcher YAML set so the
megatron_bridge_import.yaml's OUTPUT_DIR uses <<global_vars.OUTPUT_DIR>> instead
of the literal path.
---
Nitpick comments:
In `@tools/launcher/launch.py`:
- Line 74: The hardcoded repeat count in the relative_path assignment
(relative_path=[LAUNCHER_DIR] * 12) can get out of sync with include_pattern;
change it to derive the repeat length from the include_pattern (e.g.
relative_path=[LAUNCHER_DIR] * len(include_pattern) or equivalent) so that
relative_path size always matches include_pattern length (reference variables:
relative_path, LAUNCHER_DIR, include_pattern).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3c1a8d8d-8cbf-4959-8dc3-ee644e8af2e1
📒 Files selected for processing (6)
.gitmodulestools/launcher/common/megatron_bridge/import/import.shtools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_bridge_import.yamltools/launcher/launch.pytools/launcher/modules/Megatron-Bridgetools/launcher/modules/Megatron-LM
| if ! python -c "import megatron.bridge" 2>/dev/null; then | ||
| echo "[INFO] Installing megatron-bridge from ${BRIDGE_DIR}" | ||
| unset PIP_CONSTRAINT | ||
| pip install -e "${BRIDGE_DIR}" | ||
| fi |
There was a problem hiding this comment.
Force local Megatron-Bridge resolution to avoid version drift.
Line 45 only checks importability, so an older site-packages megatron.bridge can be used instead of this submodule, which can break the local conversion script/API contract.
Proposed fix
-if ! python -c "import megatron.bridge" 2>/dev/null; then
- echo "[INFO] Installing megatron-bridge from ${BRIDGE_DIR}"
- unset PIP_CONSTRAINT
- pip install -e "${BRIDGE_DIR}"
-fi
+echo "[INFO] Installing megatron-bridge from ${BRIDGE_DIR}"
+unset PIP_CONSTRAINT
+pip install -e "${BRIDGE_DIR}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if ! python -c "import megatron.bridge" 2>/dev/null; then | |
| echo "[INFO] Installing megatron-bridge from ${BRIDGE_DIR}" | |
| unset PIP_CONSTRAINT | |
| pip install -e "${BRIDGE_DIR}" | |
| fi | |
| echo "[INFO] Installing megatron-bridge from ${BRIDGE_DIR}" | |
| unset PIP_CONSTRAINT | |
| pip install -e "${BRIDGE_DIR}" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/launcher/common/megatron_bridge/import/import.sh` around lines 45 - 49,
The current import check only verifies importability and can pick up an older
installed megatron.bridge; change the logic so we force local resolution by
verifying the imported module's file path is inside BRIDGE_DIR (or else
install). Replace the simple import test with a Python check that imports
megatron.bridge, inspects megatron.bridge.__file__ (or uses importlib to get the
module path), resolves it to an absolute path and exits nonzero if that path
does not start with the absolute BRIDGE_DIR; keep the pip install -e
"${BRIDGE_DIR}" step to run when the path check fails. This ensures the script
uses the submodule under BRIDGE_DIR rather than an unrelated site-packages
version.
| script: common/megatron_bridge/import/import.sh | ||
| environment: | ||
| - HF_MODEL_ID: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | ||
| - OUTPUT_DIR: /scratchspace/megatron-bridge |
There was a problem hiding this comment.
Replace hardcoded OUTPUT_DIR with launcher interpolation.
Hardcoding /scratchspace/megatron-bridge makes the example non-portable across clusters/users; prefer a shared variable via launcher interpolation.
As per coding guidelines, tools/launcher/**/*.yaml should use <<global_vars.X>> interpolation for shared values.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_bridge_import.yaml`
at line 26, Replace the hardcoded OUTPUT_DIR value with a launcher interpolation
variable: remove "/scratchspace/megatron-bridge" and set OUTPUT_DIR to use the
shared launcher global var (e.g., <<global_vars.OUTPUT_DIR>>); update any
accompanying docs/examples to reference the global var if needed and ensure the
global_vars entry is defined in the launcher YAML set so the
megatron_bridge_import.yaml's OUTPUT_DIR uses <<global_vars.OUTPUT_DIR>> instead
of the literal path.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1516 +/- ##
==========================================
+ Coverage 76.95% 76.98% +0.03%
==========================================
Files 474 474
Lines 51503 51503
==========================================
+ Hits 39632 39649 +17
+ Misses 11871 11854 -17
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Why does this script need to be so complicated and why can we just run /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py directly? Can we assume this script is run in nemo:26.02 or later container hence all required dependencies already present?
| environment: | ||
| - HF_MODEL_ID: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | ||
| - OUTPUT_DIR: /scratchspace/megatron-bridge | ||
| - EXTRA_PIP_DEPS: "mamba-ssm causal-conv1d" |
There was a problem hiding this comment.
if we use nemo container, then we wont need these extra deps
What does this PR do?
Type of change: New feature
Add Megatron-Bridge import example in launcher for Nemotron Super V3
Usage
Testing
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: ✅ / ❌ / N/AAdditional Information
Summary by CodeRabbit
Release Notes