| 1 | +--- |
| 2 | +package: megatron-bridge |
| 3 | +github: NVIDIA-NeMo/Megatron-Bridge |
| 4 | +branch_template: v${VERSION} |
| 5 | +upstream_paths: |
| 6 | + - megatron/bridge/__init__.py |
| 7 | + - megatron/bridge/auto_bridge.py |
| 8 | + - megatron/bridge/peft/lora.py |
| 9 | +--- |
| 10 | + |
| 11 | +## Affected Files |
| 12 | + |
| 13 | +### Primary (engine layer — most likely to break) |
| 14 | + |
| 15 | +| File | Imports / Usage | |
| 16 | +| ---------------------------------------------- | -------------------------------------------------------------- | |
| 17 | +| `areal/engine/megatron_engine.py` | `megatron.bridge.AutoBridge`, `megatron.bridge.peft.lora.LoRA` | |
| 18 | +| `areal/engine/megatron_utils/megatron_lora.py` | `megatron.bridge.AutoBridge` (inside function, monkey-patched) | |
| 19 | + |
| 20 | +### Secondary (model / infra layer) |
| 21 | + |
| 22 | +_None._ |
| 23 | + |
| 24 | +### Tertiary (tests, config) |
| 25 | + |
| 26 | +| File | Imports / Usage | |
| 27 | +| -------------------------------- | --------------------------------------------------------------------------------- | |
| 28 | +| `areal/tools/validation_base.py` | `"megatron-bridge"` → `"megatron.bridge"` in `PACKAGE_IMPORT_MAP` (metadata only) | |
| 29 | + |
| 30 | +______________________________________________________________________ |
| 31 | + |
| 32 | +## API Usage Catalog |
| 33 | + |
| 34 | +For each function/class below, verify the call signature against the upstream source at |
| 35 | +the target version. Focus on: **missing new required parameters**, **removed old |
| 36 | +parameters**, **renamed parameters**, **changed return types**, **changed method |
| 37 | +signatures on returned objects**, and **moved/renamed modules**. |
| 38 | + |
| 39 | +### 1. `megatron.bridge.AutoBridge.from_hf_pretrained` |
| 40 | + |
| 41 | +**Source:** `megatron/bridge/auto_bridge.py` |
| 42 | + |
| 43 | +Called in `areal/engine/megatron_engine.py` (line 430): |
| 44 | + |
| 45 | +```python |
| 46 | +self.bridge = MegatronBridgeAutoBridge.from_hf_pretrained( |
| 47 | + self.config.path, |
| 48 | + trust_remote_code=True, |
| 49 | + dtype=self.config.dtype, |
| 50 | +) |
| 51 | +``` |
| 52 | + |
| 53 | +**Check:** Confirm `trust_remote_code` and `dtype` are still accepted keyword arguments. |
| 54 | +Verify the first positional arg is still the model path. Verify the method still returns |
| 55 | +a bridge object that exposes `save_hf_pretrained`, `load_hf_weights`, and (depending on |
| 56 | +version) `save_hf_adapter`. Check for any new required parameters. |
| 57 | + |
| 58 | +______________________________________________________________________ |
| 59 | + |
| 60 | +### 2. `megatron.bridge.AutoBridge.save_hf_pretrained` |
| 61 | + |
| 62 | +**Source:** `megatron/bridge/auto_bridge.py` |
| 63 | + |
| 64 | +Called in `areal/engine/megatron_engine.py` (line 1561): |
| 65 | + |
| 66 | +```python |
| 67 | +bridge.save_hf_pretrained(model, path, source_path=base_model_path) |
| 68 | +``` |
| 69 | + |
| 70 | +**Check:** Confirm `source_path` is still a valid keyword argument. Verify positional |
| 71 | +order of `model` and `path` hasn't changed. Check return type (currently void/`None`). |
| 72 | + |
| 73 | +______________________________________________________________________ |
| 74 | + |
| 75 | +### 3. `megatron.bridge.AutoBridge.load_hf_weights` |
| 76 | + |
| 77 | +**Source:** `megatron/bridge/auto_bridge.py` |
| 78 | + |
| 79 | +Called in `areal/engine/megatron_engine.py` (line 1595): |
| 80 | + |
| 81 | +```python |
| 82 | +bridge.load_hf_weights(model, hf_path=path) |
| 83 | +``` |
| 84 | + |
| 85 | +**Check:** Confirm `hf_path` is still the correct keyword name. Verify `model` is still |
| 86 | +the first positional argument. Check for newly added required arguments. |
| 87 | + |
| 88 | +______________________________________________________________________ |
| 89 | + |
| 90 | +### 4. `megatron.bridge.AutoBridge.save_hf_adapter` |
| 91 | + |
| 92 | +**Source:** `megatron/bridge/auto_bridge.py` |
| 93 | + |
| 94 | +Called in `areal/engine/megatron_engine.py` (lines 1554-1559) via the monkey-patched |
| 95 | +method on the bridge instance: |
| 96 | + |
| 97 | +```python |
| 98 | +self.bridge.save_hf_adapter( |
| 99 | + self.model, |
| 100 | + path=path, |
| 101 | + peft_config=self.bridge_lora, |
| 102 | + base_model_name_or_path=base_model_path or self.config.path, |
| 103 | +) |
| 104 | +``` |
| 105 | + |
| 106 | +**Check:** If the new version adds this method natively, confirm its signature matches |
| 107 | +the monkey-patch in `areal/engine/megatron_utils/megatron_lora.py` (line 189). The |
| 108 | +monkey-patched signature is: |
| 109 | + |
| 110 | +```python |
| 111 | +def save_hf_adapter( |
| 112 | + self, model, path, peft_config, base_model_name_or_path=None, show_progress=True |
| 113 | +) |
| 114 | +``` |
| 115 | + |
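| 116 | +Any mismatch in parameter names or order will break adapter saving. Because the call site passes `path`, `peft_config`, and `base_model_name_or_path` by keyword, a renamed parameter raises a `TypeError`, but only when an adapter is actually saved, not at import time. Note that |
| 117 | +`peft_config` receives a `megatron.bridge.peft.lora.LoRA` instance (not a dict). See |
| 118 | +also the [Version-Guarded Code](#version-guarded-code) section. |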
| 119 | + |
| 120 | +______________________________________________________________________ |
| 121 | + |
| 122 | +### 5. `megatron.bridge.AutoBridge.export_adapter_weights` |
| 123 | + |
| 124 | +**Source:** `megatron/bridge/auto_bridge.py` |
| 125 | + |
| 126 | +Called inside the monkey-patched `save_hf_adapter` in |
| 127 | +`areal/engine/megatron_utils/megatron_lora.py` (lines 237-240): |
| 128 | + |
| 129 | +```python |
| 130 | +for name, tensor in self.export_adapter_weights( |
| 131 | + model, cpu=True, show_progress=show_progress |
| 132 | +): |
| 133 | + adapter_state[f"base_model.model.{name}"] = tensor.clone().float() |
| 134 | +``` |
| 135 | + |
| 136 | +**Check:** Confirm `cpu` and `show_progress` are still accepted. Verify the method |
| 137 | +yields `(name, tensor)` tuples (iterable of pairs). The names are module FQNs without |
| 138 | +the `base_model.model.` prefix — that prefix is added by the caller. This is a native |
| 139 | +bridge method used inside the monkey-patch — if its return type or signature changes, |
| 140 | +the patch breaks. |
| 141 | + |
| 142 | +______________________________________________________________________ |
| 143 | + |
| 144 | +### 6. `megatron.bridge.peft.lora.LoRA` |
| 145 | + |
| 146 | +**Source:** `megatron/bridge/peft/lora.py` |
| 147 | + |
| 148 | +Called in `areal/engine/megatron_engine.py` (line 233): |
| 149 | + |
| 150 | +```python |
| 151 | +bridge_lora = MegatronBridgeLoRA( |
| 152 | + target_modules=target_modules, |
| 153 | + dim=lora_rank, |
| 154 | + alpha=lora_alpha, |
| 155 | + dropout=0.0, |
| 156 | +) |
| 157 | +``` |
| 158 | + |
| 159 | +**Check:** Confirm `dim` is still the rank parameter (not renamed to `r` or `rank`). |
| 160 | +Verify `alpha` and `dropout` are still accepted. Check the `target_modules` accepted |
| 161 | +type (list of strings vs. regex). |
| 162 | + |
| 163 | +______________________________________________________________________ |
| 164 | + |
| 165 | +### 7. `LoRA.__call__` (apply to model) |
| 166 | + |
| 167 | +**Source:** `megatron/bridge/peft/lora.py` |
| 168 | + |
| 169 | +Called in `areal/engine/megatron_engine.py` (lines 239-240): |
| 170 | + |
| 171 | +```python |
| 172 | +self.model = _MegatronModelList(self.bridge_lora(self.model, training=True)) |
| 173 | +self.bridge_lora.set_params_to_save(self.model) |
| 174 | +``` |
| 175 | + |
| 176 | +**Check:** Confirm `LoRA` instances are still callable with `(model, training=...)`. |
| 177 | +Verify the return type — it returns a modified model (or list of model chunks), which is |
| 178 | +then wrapped in `_MegatronModelList`. Confirm `set_params_to_save(model)` still exists |
| 179 | +and marks LoRA parameters for checkpoint saving. Check if `training=True` is still the |
| 180 | +correct keyword to enable grad on LoRA parameters. |
| 181 | + |
| 182 | +______________________________________________________________________ |
| 183 | + |
| 184 | +## Version-Guarded Code |
| 185 | + |
| 186 | +- `areal/engine/megatron_utils/megatron_lora.py:185` — |
| 187 | + `hasattr(AutoBridge, "save_hf_adapter")` guard. The monkey-patch at line 291 is only |
| 188 | + applied when `save_hf_adapter` does not exist on `AutoBridge`. The module-level call |
| 189 | + at line 298 (`_monkey_patch_save_hf_adapter()`) runs at import time, so the guard is |
| 190 | + evaluated once on first import. If upgrading to a version that ships `save_hf_adapter` |
| 191 | + natively, the guard will skip patching, but the native signature must then match what |
| 192 | + AReaL's call site expects (see entry 4 above). Once confirmed compatible, the entire |
| 193 | + function `_monkey_patch_save_hf_adapter()` and its line-298 invocation can be removed. |
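
The compatibility check described in the Version-Guarded Code section can be partly automated. The sketch below is a minimal illustration, not AReaL's or Megatron-Bridge's actual code: `NativeBridge`, `RenamedBridge`, `LegacyBridge`, and `call_site_binds` are hypothetical stand-ins, and the only library call used is the standard-library `inspect.signature(...).bind(...)`. It mirrors the argument shape of the `save_hf_adapter` call site from entry 4 and reports whether that shape would bind to whatever method the class exposes.

```python
import inspect


class NativeBridge:
    """Hypothetical stand-in: a version that ships save_hf_adapter natively."""

    def save_hf_adapter(
        self, model, path, peft_config, base_model_name_or_path=None, show_progress=True
    ):
        return None


class RenamedBridge:
    """Hypothetical stand-in: a native method with a renamed keyword."""

    def save_hf_adapter(self, model, output_dir, peft_config):
        return None


class LegacyBridge:
    """Hypothetical stand-in: no native method, so the patch would apply."""


def call_site_binds(bridge_cls):
    """Check whether the call shape from entry 4 binds to the class's method.

    Returns None when the method is absent (the hasattr guard would still
    apply the monkey-patch), True when the arguments bind, False otherwise.
    """
    if not hasattr(bridge_cls, "save_hf_adapter"):
        return None
    sig = inspect.signature(bridge_cls.save_hf_adapter)
    try:
        # Mirror the call site: self and model positional, the rest by keyword.
        sig.bind(
            object(),  # self
            object(),  # model
            path="ckpt",
            peft_config=object(),
            base_model_name_or_path="base",
        )
    except TypeError:
        return False
    return True


print(call_site_binds(NativeBridge))
print(call_site_binds(RenamedBridge))
print(call_site_binds(LegacyBridge))
```

A check like this only covers parameter names and arity. It says nothing about changed semantics or return types, which still need a manual read of the upstream source at the target version.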