
Commit 234a770

[docs] feat: update model support skill with encapsulation guidance (#3313)
Signed-off-by: Chen Cui <chcui@nvidia.com>
1 parent a59fb46 · commit 234a770

1 file changed: skills/adding-model-support/SKILL.md (134 additions & 15 deletions)
@@ -86,8 +86,9 @@ quantized models typically have `std ≈ 13` before dequantization vs `std ≈ 0
```
src/megatron/bridge/models/<model>/
├── __init__.py
-├── <model>_bridge.py        # Config + weight mappings
-└── <model>_provider.py      # (optional) Only if custom provide() or recipe presets needed
+├── <model>_bridge.py        # Config + weight mappings (no provider file needed)
+└── modeling_<model>/        # (optional) Custom nn.Module implementations if needed
+    └── ...
```

**VLM** — Reference: Qwen3.5-VL (`src/megatron/bridge/models/qwen_vl/`)
@@ -96,8 +97,8 @@ src/megatron/bridge/models/<model>/
src/megatron/bridge/models/<model>/
├── __init__.py
├── <model>_bridge.py        # Config + weight mappings
-├── <model>_provider.py     # Megatron config + model construction
-└── modelling_<model>/      # If using Megatron vision encoder
+├── <model>_provider.py     # Only for VLMs that need custom provide()
+└── modeling_<model>/       # If using Megatron vision encoder
    ├── __init__.py
    └── model.py             # Combines vision + language
```
@@ -108,23 +109,31 @@ OR with HF vision encoder (Reference: Gemma3-VL):
src/megatron/bridge/models/<model>/
├── __init__.py
├── <model>_bridge.py
-├── <model>_provider.py
+├── <model>_provider.py      # Only for VLMs that need custom provide()
└── modeling_<model>.py       # HF vision + Megatron language wrapper
```

+**Model-specific modeling code:** If the model requires custom `nn.Module` implementations
+(e.g. a custom RoPE variant, non-standard transformer config, custom thinker/talker
+architecture), place them in a `modeling_<model>/` directory or a single `modeling_<model>.py`
+file inside the model family folder. Use a directory when there are multiple files (model,
+transformer config, custom ops); use a single file when one module suffices. Never put
+model-specific modeling code in shared directories or as loose files in the bridge family
+directory — keep them namespaced under the `modeling_<model>` prefix.
+
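For illustration, a hypothetical multi-file layout following this convention might look like the tree below (the file names are examples added by the editor, not part of the commit):

```
src/megatron/bridge/models/<model>/modeling_<model>/
├── __init__.py
├── model.py                 # custom nn.Module implementations
└── transformer_config.py    # non-standard transformer config
```
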
### Implementation order

**LLM:**
-1. **Bridge** — Register bridge, implement `provider_bridge()` and `mapping_registry()`.
+1. **Bridge only** — Register bridge, implement `provider_bridge()` and `mapping_registry()`.
   The bridge calls `super().provider_bridge()` to get a `GPTModelProvider` from `CONFIG_MAPPING`,
-   then sets model-specific attributes on it. No separate provider file needed for most models.
-2. **Provider** (optional) — Only if the model needs extra dataclass fields for serialization,
-   custom `provide()` logic, or predefined size variants for recipes.
+   then sets model-specific attributes on it. **Do not create a provider file** — the stock
+   provider returned by `super().provider_bridge()` is usually sufficient for LLMs
+   (e.g., `GPTModelProvider`, or another base provider selected via `PROVIDER_CLASS`).

**VLM:**
-1. **Provider** — VLMs always need a custom provider subclass with a custom `provide()` that
-   instantiates the combined vision+language model.
-2. **Bridge** — Register bridge with `provider=MyVLModelProvider`. The bridge manually calls
+1. **Bridge** — Register bridge, implement config and weight mappings.
+2. **Provider** (when needed) — Only VLMs that require a custom `provide()` to instantiate a
+   combined vision+language model need a provider subclass. The bridge manually calls
   `hf_config_to_provider_kwargs(text_config)` and instantiates the custom provider.
3. **Model class** — Combine vision encoder + language decoder.

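Editor's note: to make the bridge-only LLM flow concrete, here is a minimal sketch of the pattern in step 1. The import path, class name, attribute, and exact `provider_bridge()` signature are assumptions based on the layout this skill describes, not code from the commit:

```python
# Hypothetical sketch of the bridge-only LLM pattern; the import path and
# the provider_bridge() signature are assumed, not taken from the repo.
from megatron.bridge.models.conversion.model_bridge import MegatronModelBridge


class MyModelBridge(MegatronModelBridge):
    def provider_bridge(self, hf_config):
        # Stock provider (e.g. GPTModelProvider) built via CONFIG_MAPPING.
        provider = super().provider_bridge(hf_config)
        # Set model-specific attributes directly; no provider subclass needed.
        provider.rotary_base = getattr(hf_config, "rope_theta", 10000.0)
        return provider
```
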
@@ -147,6 +156,114 @@ When reading HF config for VLMs, check whether each field is in:
- `hf_config.text_config` — e.g. `num_hidden_layers`, `hidden_size`, etc.
- `hf_config.vision_config` — e.g. vision encoder dimensions

### Encapsulating model-specific layers

When a new model introduces custom or non-standard layers (novel attention variants, custom
normalization, fused expert layouts, MTP heads, etc.), **keep all model-specific logic inside
the model family directory**. Do not modify shared files in `src/megatron/bridge/models/conversion/`
(e.g. `param_mapping.py`, `model_bridge.py`, `quant_mapping.py`) unless the change is genuinely
reusable across multiple model families.

**Principle:** The bridge and provider files for a model family are your primary extension surface.
Shared conversion infrastructure provides hooks and base classes — subclass them locally rather
than adding conditionals to shared code.

#### Strategy 1: Create a local mapping subclass

If the model has a layer whose weight layout doesn't match any existing mapping class, create a
private mapping class in the bridge file or a `<model>_mappings.py` file in the family directory.

Example — GLM's fused expert down-projection disables grouped-export transpose:

```python
# src/megatron/bridge/models/glm/glm_moe_mappings.py
class GLMExpertDownProjMapping(FusedExpertMapping):
    def __init__(self, megatron_param, hf_param, permute_dims=None):
        super().__init__(megatron_param, hf_param, permute_dims, transpose_on_export=False)
```

Example — Nemotron-H's MTP layers flatten indices during resolve:

```python
# Inside nemotron_h_bridge.py (private to the module)
class _MTPFlatteningMapping(MegatronParamMapping):
    def resolve(self, captures):
        return AutoMapping(self._flatten(captures), ...)
```

Example — MiniMax-M2's non-standard QK norm layout:

```python
# Inside minimax_m2_bridge.py (private to the module)
class _FullDimQKNormMapping(MegatronParamMapping):
    def hf_to_megatron(self, hf_weights):
        # Custom scatter logic for full-dim QK norm
        ...

    def megatron_to_hf(self, megatron_weights):
        # Custom gather logic
        ...
```

#### Strategy 2: Override bridge hooks

`MegatronModelBridge` provides several override hooks — use them instead of modifying the base class:

| Hook | When to use |
|------|-------------|
| `mapping_registry()` | Define all weight name mappings (abstract, always overridden) |
| `provider_bridge()` | Configure the provider with model-specific flags (call `super()` then setattr) |
| `maybe_modify_loaded_hf_weight()` | Dequantize, rename, or reshape HF weights before conversion |
| `maybe_modify_converted_hf_weight()` | Synthesize extra HF keys on export (e.g. `inv_freq`) |
| `megatron_to_hf_config()` | Build HF `config.json` for export |
| `hf_config_to_provider_kwargs()` | Override CONFIG_MAPPING behavior for specific fields |

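Editor's sketch of the hook pattern (not code from this commit): a local override of `maybe_modify_loaded_hf_weight()` might look roughly like the following, where the `(name, weight)` in/out contract and the `.weight_scale_inv` key suffix are assumptions for illustration only:

```python
# Hypothetical hook override; the (name, weight) contract and the
# ".weight_scale_inv" suffix are assumptions, not the library's API.
class MyModelBridge(MegatronModelBridge):
    def maybe_modify_loaded_hf_weight(self, name, weight):
        # Illustrative: adjust an auxiliary quantization tensor before the
        # mapping registry sees it (placeholder logic, not real dequantization).
        if name.endswith(".weight_scale_inv"):
            weight = weight.float()
        return name, weight
```
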
**Accessing HF config in `mapping_registry()`:** The bridge instance has `self.hf_config`
available during conversion — it is set automatically by the dispatch system before
`mapping_registry()` is called. Use it when your mapping registry needs config-dependent
logic (e.g. dynamic MTP layer count, number of experts):

```python
def mapping_registry(self) -> MegatronMappingRegistry:
    hf_config = getattr(self, "hf_config", None)
    num_mtp_layers = getattr(hf_config, "num_nextn_predict_layers", 0) if hf_config else 0
    ...
```

Do **not** override `build_conversion_tasks()` to stash `self._hf_config` — that pattern is
deprecated.

#### Strategy 3: Custom provider subclass (VLMs only)

Most models do **not** need a provider file — the stock provider (e.g., `GPTModelProvider`, or
another base selected via `PROVIDER_CLASS`) is usually sufficient for LLMs. Only create a
provider subclass when a VLM needs custom `provide()` logic to instantiate a combined
vision+language model:

```python
# src/megatron/bridge/models/<model>/<model>_provider.py
class MyVLModelProvider(GPTModelProvider):
    image_token_id: int = 0

    def provide(self, ...):
        # Custom model construction combining vision encoder + language decoder
        ...
```

The bridge then references it via `PROVIDER_CLASS = MyVLModelProvider` or instantiates it directly
in `provider_bridge()`.

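A rough editorial sketch of that wiring (class names and signatures are assumed; only `hf_config_to_provider_kwargs` and `PROVIDER_CLASS` come from this skill):

```python
# Hypothetical VLM bridge wiring a custom provider; names and the
# provider_bridge() signature are illustrative assumptions.
class MyVLBridge(MegatronModelBridge):
    PROVIDER_CLASS = MyVLModelProvider

    def provider_bridge(self, hf_config):
        # Build text-side kwargs from hf_config.text_config, then let the
        # custom provider construct the combined vision+language model.
        kwargs = self.hf_config_to_provider_kwargs(hf_config.text_config)
        return MyVLModelProvider(**kwargs)
```
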
#### When shared file changes ARE justified

Modify `param_mapping.py` or `model_bridge.py` only when the pattern is **reusable by 2+ model
families**. Examples of justified shared changes:

- `FusedExpertMapping` / `FusedGatedExpertMapping` — used by GLM, DeepSeek, OLMoE, etc.
- `RMSNorm2ZeroCenteredRMSNormMapping` — used by Gemma, Nemotron, etc.
- New `CONFIG_MAPPING` entries — when a standard HF config key maps to a standard provider attribute

If you're tempted to add a model-specific `if model_type == "..."` branch in shared code, or to
pattern-match on specific weight names in shared conversion logic, that's a signal to use a
local subclass or hook override instead.

### Update FLOPs calculator for new architectural blocks

If the model introduces a new computational block that differs from standard attention or MLP
@@ -318,7 +435,9 @@ User wants to add a model
│ ├─ Has Megatron vision encoder? ──→ Megatron encoder (Qwen3.5 pattern)
│ └─ No Megatron encoder ──→ HF encoder (Gemma3 pattern)
│
-└─ No vision config ──→ LLM path (Qwen2 / GPT-OSS pattern)
-   ├─ Standard GPT-style? ──→ Bridge only (no provider subclass needed)
-   └─ Custom components? ──→ Bridge + custom provider or modeling module
+└─ No vision config ──→ LLM path (bridge only, no provider file)
+   ├─ Standard GPT-style? ──→ Bridge with stock mappings
+   └─ Custom layers? ──→ Bridge + local mapping subclasses / hook overrides
+      ├─ Custom weight layout? ──→ Local mapping subclass in family dir
+      └─ Custom import/export? ──→ Override bridge hooks (maybe_modify_*)
```
