Commit 9b09beb

chore(deps): upgrade megatron-core, megatron-bridge, sglang, vllm
Upgrade focused runtime dependencies:

- megatron-core 0.16.0 → 0.17.0
- megatron-bridge 0.3.0 → 0.4.0
- sglang 0.5.9 → 0.5.10.post1
- vllm 0.17.0 → 0.19.1
- transformers 4.57.1 → 5.3.0 (sglang) / 5.5.4 (vllm)

Key changes:

- Add python_version >= 3.12 markers for megatron-core and megatron-bridge
- Add flash-attn-4 prerelease override (required by sglang 0.5.10.post1)
- Override transformers in vllm variant (megatron-bridge 5.0-5.3 vs vllm excludes 5.0-5.5)
- Update Dockerfile base image to sglang v0.5.10.post1
- Add transformers to ESCAPABLE_PACKAGES for legitimate variant divergence
- Add upgrade-deps skill with per-package API checklists
1 parent f3d7e50 commit 9b09beb

15 files changed

Lines changed: 4516 additions & 1402 deletions

.agents/skills/upgrade-deps/SKILL.md

Lines changed: 492 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
<!-- Template for per-package API checklists. Copy this file and fill in for each package. -->

---
package: <pip-package-name>
github: <org/repo>
branch_template: "v${VERSION}"
upstream_paths:
- path/to/relevant/file.py
- path/to/relevant/module/
---

## Affected Files

### Primary (engine layer, most likely to break)

| File                    | Imports / Usage                   |
| ----------------------- | --------------------------------- |
| `areal/path/to/file.py` | `module.function`, `module.Class` |

### Secondary (model / infra layer)

| File                    | Imports / Usage   |
| ----------------------- | ----------------- |
| `areal/path/to/file.py` | `module.function` |

### Tertiary (tests, config)

| File                 | Imports / Usage   |
| -------------------- | ----------------- |
| `tests/test_file.py` | `module.function` |

______________________________________________________________________

## API Usage Catalog

For each function/class below, verify the call signature against the upstream source at
the target version. Focus on: **missing new required parameters**, **removed old
parameters**, **renamed parameters**, **changed return types**, **changed method
signatures on returned objects**, and **moved/renamed modules**.

### 1. `module.submodule.FunctionOrClass`

**Source:** `upstream-repo/path/to/file.py`

Called in `areal/path/to/file.py`:

```python
# Paste the actual call site code here
FunctionOrClass(param1=..., param2=...)
```

**Check:** \[Describe what to verify: new params? renamed? return type change?\]

### 2. ...

______________________________________________________________________

## Version-Guarded Code

<!-- List any AReaL code that has version-specific behavior for this package -->

- `areal/path/to/file.py:LINE`: description of version guard and threshold
Lines changed: 193 additions & 0 deletions
@@ -0,0 +1,193 @@
---
package: megatron-bridge
github: NVIDIA-NeMo/Megatron-Bridge
branch_template: v${VERSION}
upstream_paths:
- megatron/bridge/__init__.py
- megatron/bridge/auto_bridge.py
- megatron/bridge/peft/lora.py
---
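
The `branch_template` and `upstream_paths` fields together identify which upstream files to inspect for a given target version. The sketch below expands them, assuming `${VERSION}` follows Python's `string.Template` syntax and that a matching upstream tag exists. The raw-content URL layout and the helper name `upstream_urls` are illustrative assumptions, not something the skill itself defines.

```python
from string import Template

# Values copied from the front matter of this checklist.
GITHUB = "NVIDIA-NeMo/Megatron-Bridge"
BRANCH_TEMPLATE = "v${VERSION}"
UPSTREAM_PATHS = [
    "megatron/bridge/__init__.py",
    "megatron/bridge/auto_bridge.py",
    "megatron/bridge/peft/lora.py",
]


def upstream_urls(version):
    """Hypothetical helper: build raw-file URLs for one target version."""
    # Expand the branch template, e.g. "v${VERSION}" with VERSION="0.4.0".
    ref = Template(BRANCH_TEMPLATE).substitute(VERSION=version)
    # Assumed URL layout for raw file content on GitHub.
    base = f"https://raw.githubusercontent.com/{GITHUB}/{ref}"
    return [f"{base}/{path}" for path in UPSTREAM_PATHS]


urls = upstream_urls("0.4.0")
```

Whether the upstream project tags releases exactly as `v0.4.0` should be confirmed before relying on these URLs.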

## Affected Files

### Primary (engine layer, most likely to break)

| File                                           | Imports / Usage                                                |
| ---------------------------------------------- | -------------------------------------------------------------- |
| `areal/engine/megatron_engine.py`              | `megatron.bridge.AutoBridge`, `megatron.bridge.peft.lora.LoRA` |
| `areal/engine/megatron_utils/megatron_lora.py` | `megatron.bridge.AutoBridge` (inside function, monkey-patched) |

### Secondary (model / infra layer)

_None._

### Tertiary (tests, config)

| File                             | Imports / Usage                                                                   |
| -------------------------------- | --------------------------------------------------------------------------------- |
| `areal/tools/validation_base.py` | `"megatron-bridge"` → `"megatron.bridge"` in `PACKAGE_IMPORT_MAP` (metadata only) |

______________________________________________________________________

## API Usage Catalog

For each function/class below, verify the call signature against the upstream source at
the target version. Focus on: **missing new required parameters**, **removed old
parameters**, **renamed parameters**, **changed return types**, **changed method
signatures on returned objects**, and **moved/renamed modules**.
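
Part of this verification can be mechanized by binding a call site's arguments against the target signature without invoking the function. The sketch below uses only the standard library; `upstream_function` and `call_is_compatible` are hypothetical stand-ins and not part of megatron-bridge or AReaL.

```python
import inspect


def upstream_function(param1, param2, *, new_required):
    """Hypothetical stand-in for an upstream API at the target version."""


def call_is_compatible(func, *args, **kwargs):
    """Return True if func would accept these arguments (func is not called)."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True


# The existing call site passes only param1 and param2, so it no longer binds.
old_call_ok = call_is_compatible(upstream_function, param1=1, param2=2)
# Supplying the newly required keyword makes the call bind again.
new_call_ok = call_is_compatible(upstream_function, param1=1, param2=2, new_required=3)
```

A binding check of this kind catches missing, removed, and renamed parameters, but not changed return types or moved modules, which still need a manual read of the upstream source.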

### 1. `megatron.bridge.AutoBridge.from_hf_pretrained`

**Source:** `megatron/bridge/auto_bridge.py`

Called in `areal/engine/megatron_engine.py` (line 430):

```python
self.bridge = MegatronBridgeAutoBridge.from_hf_pretrained(
    self.config.path,
    trust_remote_code=True,
    dtype=self.config.dtype,
)
```

**Check:** Confirm `trust_remote_code` and `dtype` are still accepted keyword arguments.
Verify the first positional arg is still the model path. Verify the method still returns
a bridge object that exposes `save_hf_pretrained`, `load_hf_weights`, and (depending on
version) `save_hf_adapter`. Check for any new required parameters.
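
The "exposes these methods" part of the check can be expressed as a small attribute probe. This is a sketch under stated assumptions: `FakeBridge` is a hypothetical stand-in for the object returned by `from_hf_pretrained`, and `missing_methods` is an illustrative helper name.

```python
REQUIRED_METHODS = ("save_hf_pretrained", "load_hf_weights")
OPTIONAL_METHODS = ("save_hf_adapter",)  # present only in some versions


class FakeBridge:
    """Hypothetical stand-in for the object returned by from_hf_pretrained."""

    def save_hf_pretrained(self, model, path, source_path=None): ...

    def load_hf_weights(self, model, hf_path=None): ...


def missing_methods(bridge, names):
    """Return the names that are absent or not callable on bridge."""
    return [n for n in names if not callable(getattr(bridge, n, None))]


bridge = FakeBridge()
missing_required = missing_methods(bridge, REQUIRED_METHODS)
missing_optional = missing_methods(bridge, OPTIONAL_METHODS)
```

An empty `missing_required` list means the required surface is intact; a non-empty `missing_optional` list only indicates that the fallback path for `save_hf_adapter` is still needed.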

______________________________________________________________________

### 2. `megatron.bridge.AutoBridge.save_hf_pretrained`

**Source:** `megatron/bridge/auto_bridge.py`

Called in `areal/engine/megatron_engine.py` (line 1561):

```python
bridge.save_hf_pretrained(model, path, source_path=base_model_path)
```

**Check:** Confirm `source_path` is still a valid keyword argument. Verify positional
order of `model` and `path` hasn't changed. Check return type (currently void/`None`).

______________________________________________________________________

### 3. `megatron.bridge.AutoBridge.load_hf_weights`

**Source:** `megatron/bridge/auto_bridge.py`

Called in `areal/engine/megatron_engine.py` (line 1595):

```python
bridge.load_hf_weights(model, hf_path=path)
```

**Check:** Confirm `hf_path` is still the correct keyword name. Verify `model` is still
the first positional argument. Check for newly added required arguments.

______________________________________________________________________

### 4. `megatron.bridge.AutoBridge.save_hf_adapter`

**Source:** `megatron/bridge/auto_bridge.py`

Called in `areal/engine/megatron_engine.py` (lines 1554-1559) via the monkey-patched
method on the bridge instance:

```python
self.bridge.save_hf_adapter(
    self.model,
    path=path,
    peft_config=self.bridge_lora,
    base_model_name_or_path=base_model_path or self.config.path,
)
```

**Check:** If the new version adds this method natively, confirm its signature matches
the monkey-patch in `areal/engine/megatron_utils/megatron_lora.py` (line 189). The
monkey-patched signature is:

```python
def save_hf_adapter(
    self, model, path, peft_config, base_model_name_or_path=None, show_progress=True
): ...
```

Any mismatch in parameter names or order can break adapter saving, and a reordered
positional parameter may do so silently. Note that `peft_config` receives a
`megatron.bridge.peft.lora.LoRA` instance (not a dict). See also the
[Version-Guarded Code](#version-guarded-code) section.
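
A possible way to compare the two signatures is to diff their parameter names and order with `inspect.signature`. In this sketch the first function copies the monkey-patched signature quoted above, while `native_save_hf_adapter` is invented purely to demonstrate a mismatch and does not describe any real megatron-bridge release.

```python
import inspect


def patched_save_hf_adapter(
    self, model, path, peft_config, base_model_name_or_path=None, show_progress=True
):
    """Signature copied from the monkey-patch quoted in this entry."""


def native_save_hf_adapter(
    self, model, output_dir, peft_config, base_model_name_or_path=None
):
    """Hypothetical native signature, invented to show a mismatch."""


def signature_diff(expected, actual):
    """Compare parameter names and order; return (missing, extra, reordered)."""
    exp = list(inspect.signature(expected).parameters)
    act = list(inspect.signature(actual).parameters)
    missing = [p for p in exp if p not in act]
    extra = [p for p in act if p not in exp]
    # Parameters present in both must appear in the same relative order.
    shared_exp = [p for p in exp if p in act]
    shared_act = [p for p in act if p in exp]
    return missing, extra, shared_exp != shared_act


missing, extra, reordered = signature_diff(
    patched_save_hf_adapter, native_save_hf_adapter
)
```

Here `missing` lists parameters the call site may rely on that the native method lacks, and `extra` lists parameters that exist only natively; any `extra` entry without a default would be a new required argument.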

______________________________________________________________________

### 5. `megatron.bridge.AutoBridge.export_adapter_weights`

**Source:** `megatron/bridge/auto_bridge.py`

Called inside the monkey-patched `save_hf_adapter` in
`areal/engine/megatron_utils/megatron_lora.py` (lines 237-240):

```python
for name, tensor in self.export_adapter_weights(
    model, cpu=True, show_progress=show_progress
):
    adapter_state[f"base_model.model.{name}"] = tensor.clone().float()
```

**Check:** Confirm `cpu` and `show_progress` are still accepted. Verify the method
yields `(name, tensor)` tuples (iterable of pairs). The names are module FQNs without
the `base_model.model.` prefix; that prefix is added by the caller. This is a native
bridge method used inside the monkey-patch, so if its return type or signature changes,
the patch breaks.
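
The yield contract described in this check can be made explicit so that a change fails loudly instead of producing a malformed state dict. The following is a sketch only: `fake_export_adapter_weights` is a hypothetical stand-in, the weight names are invented, and plain lists take the place of tensors (the real call site applies `tensor.clone().float()`).

```python
PREFIX = "base_model.model."


def fake_export_adapter_weights(model, cpu=True, show_progress=True):
    """Hypothetical stand-in that yields (name, tensor) pairs."""
    # Invented names and values, used only for illustration.
    yield "decoder.layers.0.linear_qkv.lora_A.weight", [0.1, 0.2]
    yield "decoder.layers.0.linear_qkv.lora_B.weight", [0.3, 0.4]


def collect_adapter_state(pairs):
    """Build the prefixed state dict, failing loudly if the yield shape changes."""
    state = {}
    for item in pairs:
        if not (isinstance(item, tuple) and len(item) == 2):
            raise TypeError(f"expected (name, tensor) pair, got {type(item).__name__}")
        name, tensor = item
        if name.startswith(PREFIX):
            # The caller adds the prefix, so upstream names must not carry it.
            raise ValueError(f"name already prefixed: {name}")
        state[PREFIX + name] = tensor
    return state


adapter_state = collect_adapter_state(fake_export_adapter_weights(model=None))
```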

______________________________________________________________________

### 6. `megatron.bridge.peft.lora.LoRA`

**Source:** `megatron/bridge/peft/lora.py`

Called in `areal/engine/megatron_engine.py` (line 233):

```python
bridge_lora = MegatronBridgeLoRA(
    target_modules=target_modules,
    dim=lora_rank,
    alpha=lora_alpha,
    dropout=0.0,
)
```

**Check:** Confirm `dim` is still the rank parameter (not renamed to `r` or `rank`).
Verify `alpha` and `dropout` are still accepted. Check the `target_modules` accepted
type (list of strings vs. regex).
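
A rename of the rank parameter could be detected by asking which candidate keyword names the constructor accepts. This sketch assumes the upstream class exposes its constructor parameters through `inspect.signature`; `FakeLoRA` and its default values are hypothetical stand-ins, not the real class definition.

```python
import inspect
from dataclasses import dataclass, field


@dataclass
class FakeLoRA:
    """Hypothetical stand-in for the upstream LoRA class; defaults are invented."""

    target_modules: list = field(default_factory=list)
    dim: int = 32
    alpha: int = 32
    dropout: float = 0.0


def accepted_names(cls, candidates):
    """Return the candidate keyword names that the constructor of cls accepts."""
    params = inspect.signature(cls).parameters
    return [name for name in candidates if name in params]


rank_names = accepted_names(FakeLoRA, ("dim", "r", "rank"))
other_names = accepted_names(FakeLoRA, ("alpha", "dropout", "target_modules"))
```

If `rank_names` came back as `["rank"]` for the real class, the call site's `dim=lora_rank` would need updating. The accepted type of `target_modules` is not visible in the signature and still requires reading the source.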

______________________________________________________________________

### 7. `LoRA.__call__` (apply to model)

**Source:** `megatron/bridge/peft/lora.py`

Called in `areal/engine/megatron_engine.py` (lines 239-240):

```python
self.model = _MegatronModelList(self.bridge_lora(self.model, training=True))
self.bridge_lora.set_params_to_save(self.model)
```

**Check:** Confirm `LoRA` instances are still callable with `(model, training=...)`.
Verify the return type: it returns a modified model (or list of model chunks), which is
then wrapped in `_MegatronModelList`. Confirm `set_params_to_save(model)` still exists
and marks LoRA parameters for checkpoint saving. Check if `training=True` is still the
correct keyword to enable grad on LoRA parameters.

______________________________________________________________________

## Version-Guarded Code

- `areal/engine/megatron_utils/megatron_lora.py:185`
  `hasattr(AutoBridge, "save_hf_adapter")` guard. The monkey-patch at line 291 is only
  applied when `save_hf_adapter` does not exist on `AutoBridge`. The module-level call
  at line 298 (`_monkey_patch_save_hf_adapter()`) runs at import time, so the guard is
  evaluated once on first import. If upgrading to a version that ships `save_hf_adapter`
  natively, the guard will skip patching, but the native signature must then match what
  AReaL's call site expects (see entry 4 above). Once confirmed compatible, the entire
  function `_monkey_patch_save_hf_adapter()` and its line-298 invocation can be removed.
