@@ -86,8 +86,9 @@ quantized models typically have `std ≈ 13` before dequantization vs `std ≈ 0
8686```
8787src/megatron/bridge/models/<model>/
8888├── __init__.py
89- ├── <model>_bridge.py # Config + weight mappings
90- └── <model>_provider.py # (optional) Only if custom provide() or recipe presets needed
89+ ├── <model>_bridge.py # Config + weight mappings (no provider file needed)
90+ └── modeling_<model>/ # (optional) Custom nn.Module implementations if needed
91+ └── ...
9192```
9293
9394**VLM** — Reference: Qwen3.5-VL (`src/megatron/bridge/models/qwen_vl/`)
@@ -96,8 +97,8 @@ src/megatron/bridge/models/<model>/
9697src/megatron/bridge/models/<model>/
9798├── __init__.py
9899├── <model>_bridge.py # Config + weight mappings
99- ├── <model>_provider.py # Megatron config + model construction
100- └── modelling_ <model>/ # If using Megatron vision encoder
100+ ├── <model>_provider.py # Only for VLMs that need custom provide()
101+ └── modeling_<model>/ # If using Megatron vision encoder
101102 ├── __init__.py
102103 └── model.py # Combines vision + language
103104```
@@ -108,23 +109,31 @@ OR with HF vision encoder (Reference: Gemma3-VL):
108109src/megatron/bridge/models/<model>/
109110├── __init__.py
110111├── <model>_bridge.py
111- ├── <model>_provider.py
112+ ├── <model>_provider.py # Only for VLMs that need custom provide()
112113└── modeling_<model>.py # HF vision + Megatron language wrapper
113114```
114115
116+ **Model-specific modeling code:** If the model requires custom `nn.Module` implementations
117+ (e.g. a custom RoPE variant, non-standard transformer config, custom thinker/talker
118+ architecture), place them in a `modeling_<model>/` directory or a single `modeling_<model>.py`
119+ file inside the model family folder. Use a directory when there are multiple files (model,
120+ transformer config, custom ops); use a single file when one module suffices. Never put
121+ model-specific modeling code in shared directories or as loose files in the bridge family
122+ directory — keep them namespaced under the `modeling_<model>` prefix.
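
For instance, a minimal sketch of such a file (hypothetical `acme` family; the class name and the RoPE math are placeholders for illustration, not an existing model):

```python
# Hypothetical sketch — "acme" is a placeholder family; the math is illustrative only.
# src/megatron/bridge/models/acme/modeling_acme.py
import torch
import torch.nn as nn


class AcmePartialRoPE(nn.Module):
    """Model-specific rotary embedding variant, kept inside the acme/ family directory."""

    def __init__(self, rotary_dim: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, rotary_dim, 2).float() / rotary_dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    def forward(self, positions: torch.Tensor) -> torch.Tensor:
        # Build cos/sin tables for the model's non-standard rotary scheme.
        freqs = torch.outer(positions.float(), self.inv_freq)
        return torch.cat([freqs.cos(), freqs.sin()], dim=-1)
```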
123+
115124### Implementation order
116125
117126**LLM:**
118- 1. **Bridge** — Register bridge, implement `provider_bridge()` and `mapping_registry()`.
127+ 1. **Bridge only** — Register bridge, implement `provider_bridge()` and `mapping_registry()`.
119128    The bridge calls `super().provider_bridge()` to get a `GPTModelProvider` from `CONFIG_MAPPING`,
120-    then sets model-specific attributes on it. No separate provider file needed for most models.
121- 2. **Provider** (optional) — Only if the model needs extra dataclass fields for serialization,
122-    custom `provide()` logic, or predefined size variants for recipes.
129+    then sets model-specific attributes on it. **Do not create a provider file** — the stock
130+    provider returned by `super().provider_bridge()` is usually sufficient for LLMs
131+    (e.g., `GPTModelProvider`, or another base provider selected via `PROVIDER_CLASS`).
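
A minimal sketch of this bridge-only pattern, using a hypothetical `Acme` model. Imports and the registration decorator are omitted, as in the other snippets here, and the `AutoMapping` arguments, attribute names, and parameter names are illustrative assumptions rather than the exact API:

```python
# Hypothetical "Acme" LLM bridge — a single bridge file, no provider file.
# (Registration with the dispatch system is elided; copy it from an existing bridge.)
class AcmeBridge(MegatronModelBridge):
    def provider_bridge(self, hf_pretrained):
        # Stock provider (e.g. GPTModelProvider) built from CONFIG_MAPPING by the base class.
        provider = super().provider_bridge(hf_pretrained)
        provider.add_qkv_bias = True  # illustrative model-specific attribute
        return provider

    def mapping_registry(self) -> MegatronMappingRegistry:
        return MegatronMappingRegistry(
            AutoMapping(
                megatron_param="embedding.word_embeddings.weight",
                hf_param="model.embed_tokens.weight",
            ),
            # ... remaining mappings elided
        )
```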
123132
124133**VLM:**
125- 1. **Provider** — VLMs always need a custom provider subclass with a custom `provide()` that
126-    instantiates the combined vision+language model.
127- 2. **Bridge** — Register bridge with `provider=MyVLModelProvider`. The bridge manually calls
134+ 1. **Bridge** — Register bridge, implement config and weight mappings.
135+ 2. **Provider** (when needed) — Only VLMs that require a custom `provide()` to instantiate a
136+    combined vision+language model need a provider subclass. The bridge manually calls
128137    `hf_config_to_provider_kwargs(text_config)` and instantiates the custom provider.
1291383. **Model class** — Combine vision encoder + language decoder.
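
Schematically (hypothetical names; `hf_pretrained.config` and the provider fields are assumptions, so see the Qwen-VL bridge for the real pattern):

```python
# Hypothetical VLM bridge — the provider is built manually instead of by the stock flow.
class AcmeVLBridge(MegatronModelBridge):
    def provider_bridge(self, hf_pretrained):
        hf_config = hf_pretrained.config
        # Run only the text sub-config through the standard CONFIG_MAPPING path,
        # then hand those kwargs to the custom vision+language provider.
        kwargs = self.hf_config_to_provider_kwargs(hf_config.text_config)
        provider = AcmeVLModelProvider(**kwargs)  # custom provider from the provider file
        provider.image_token_id = hf_config.image_token_id  # vision-side field, illustrative
        return provider
```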
130139
@@ -147,6 +156,114 @@ When reading HF config for VLMs, check whether each field is in:
147156- `hf_config.text_config` — e.g. `num_hidden_layers`, `hidden_size`, etc.
148157- `hf_config.vision_config` — e.g. vision encoder dimensions
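
For example (a small sketch; the exact attribute names vary per model):

```python
# Pull fields from the right sub-config rather than the top-level hf_config.
text_cfg = getattr(hf_config, "text_config", hf_config)  # language decoder fields
vision_cfg = getattr(hf_config, "vision_config", None)   # vision encoder fields, if present

num_layers = text_cfg.num_hidden_layers
hidden_size = text_cfg.hidden_size
vision_hidden = vision_cfg.hidden_size if vision_cfg is not None else None
```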
149158
159+ ### Encapsulating model-specific layers
160+
161+ When a new model introduces custom or non-standard layers (novel attention variants, custom
162+ normalization, fused expert layouts, MTP heads, etc.), **keep all model-specific logic inside
163+ the model family directory**. Do not modify shared files in `src/megatron/bridge/models/conversion/`
164+ (e.g. `param_mapping.py`, `model_bridge.py`, `quant_mapping.py`) unless the change is genuinely
165+ reusable across multiple model families.
166+
167+ **Principle:** The bridge and provider files for a model family are your primary extension surface.
168+ Shared conversion infrastructure provides hooks and base classes — subclass them locally rather
169+ than adding conditionals to shared code.
170+
171+ #### Strategy 1: Create a local mapping subclass
172+
173+ If the model has a layer whose weight layout doesn't match any existing mapping class, create a
174+ private mapping class in the bridge file or a `<model>_mappings.py` file in the family directory.
175+
176+ Example — GLM's fused expert down-projection disables grouped-export transpose:
177+
178+ ```python
179+ # src/megatron/bridge/models/glm/glm_moe_mappings.py
180+ class GLMExpertDownProjMapping(FusedExpertMapping):
181+     def __init__(self, megatron_param, hf_param, permute_dims=None):
182+         super().__init__(megatron_param, hf_param, permute_dims, transpose_on_export=False)
183+ ```
184+
185+ Example — Nemotron-H's MTP layers flatten indices during resolve:
186+
187+ ```python
188+ # Inside nemotron_h_bridge.py (private to the module)
189+ class _MTPFlatteningMapping(MegatronParamMapping):
190+     def resolve(self, captures):
191+         return AutoMapping(self._flatten(captures), ...)
192+ ```
193+
194+ Example — MiniMax-M2's non-standard QK norm layout:
195+
196+ ```python
197+ # Inside minimax_m2_bridge.py (private to the module)
198+ class _FullDimQKNormMapping(MegatronParamMapping):
199+     def hf_to_megatron(self, hf_weights):
200+         # Custom scatter logic for full-dim QK norm
201+         ...
202+     def megatron_to_hf(self, megatron_weights):
203+         # Custom gather logic
204+         ...
205+ ```
206+
207+ #### Strategy 2: Override bridge hooks
208+
209+ `MegatronModelBridge` provides several override hooks — use them instead of modifying the base class:
210+
211+ | Hook | When to use |
212+ |------|-------------|
213+ | `mapping_registry()` | Define all weight name mappings (abstract, always overridden) |
214+ | `provider_bridge()` | Configure the provider with model-specific flags (call `super()` then setattr) |
215+ | `maybe_modify_loaded_hf_weight()` | Dequantize, rename, or reshape HF weights before conversion |
216+ | `maybe_modify_converted_hf_weight()` | Synthesize extra HF keys on export (e.g. `inv_freq`) |
217+ | `megatron_to_hf_config()` | Build HF `config.json` for export |
218+ | `hf_config_to_provider_kwargs()` | Override `CONFIG_MAPPING` behavior for specific fields |
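
As one example, a hedged sketch of the import-side hook; the signature and return contract shown here are assumptions (check `MegatronModelBridge` for the real ones), and `dequantize_acme` is a hypothetical helper:

```python
# Sketch only — not the verbatim base-class API.
class AcmeBridge(MegatronModelBridge):
    def maybe_modify_loaded_hf_weight(self, name, weight):
        # Dequantize packed weights and rename them before conversion
        # (assumed contract: the hook returns the possibly-modified name and tensor).
        if name.endswith(".weight_packed"):
            weight = dequantize_acme(weight)  # hypothetical helper
            name = name.replace(".weight_packed", ".weight")
        return name, weight
```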
219+
220+ **Accessing HF config in `mapping_registry()`:** The bridge instance has `self.hf_config`
221+ available during conversion — it is set automatically by the dispatch system before
222+ `mapping_registry()` is called. Use it when your mapping registry needs config-dependent
223+ logic (e.g. dynamic MTP layer count, number of experts):
224+
225+ ```python
226+ def mapping_registry(self) -> MegatronMappingRegistry:
227+     hf_config = getattr(self, "hf_config", None)
228+     num_mtp_layers = getattr(hf_config, "num_nextn_predict_layers", 0) if hf_config else 0
229+     ...
230+ ```
231+
232+ Do **not** override `build_conversion_tasks()` to stash `self._hf_config` — that pattern is
233+ deprecated.
234+
235+ #### Strategy 3: Custom provider subclass (VLMs only)
236+
237+ Most models do **not** need a provider file — the stock provider (e.g., `GPTModelProvider`, or
238+ another base selected via `PROVIDER_CLASS`) is usually sufficient for LLMs. Only create a provider
239+ subclass when a VLM needs custom `provide()` logic to instantiate a combined vision+language model:
240+
241+ ```python
242+ # src/megatron/bridge/models/<model>/<model>_provider.py
243+ class MyVLModelProvider(GPTModelProvider):
244+     image_token_id: int = 0
245+
246+     def provide(self, ...):
247+         # Custom model construction combining vision encoder + language decoder
248+         ...
249+ ```
250+
251+ The bridge then references it via `PROVIDER_CLASS = MyVLModelProvider` or instantiates it directly
252+ in `provider_bridge()`.
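
Schematically (hypothetical bridge name; registration elided):

```python
# Option A: point the stock flow at the custom provider via the class attribute.
class AcmeVLBridge(MegatronModelBridge):
    PROVIDER_CLASS = MyVLModelProvider

# Option B: instantiate it by hand inside provider_bridge(), as in the VLM sketch earlier.
```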
253+
254+ #### When shared file changes ARE justified
255+
256+ Modify `param_mapping.py` or `model_bridge.py` only when the pattern is **reusable by 2+ model
257+ families**. Examples of justified shared changes:
258+
259+ - `FusedExpertMapping` / `FusedGatedExpertMapping` — used by GLM, DeepSeek, OLMoE, etc.
260+ - `RMSNorm2ZeroCenteredRMSNormMapping` — used by Gemma, Nemotron, etc.
261+ - New `CONFIG_MAPPING` entries — when a standard HF config key maps to a standard provider attribute
262+
263+ If you're tempted to add a model-specific `if model_type == "..."` branch in shared code, or
264+ pattern-matching on specific weight names in shared conversion logic, that's a signal to use a
265+ local subclass or hook override instead.
266+
150267### Update FLOPs calculator for new architectural blocks
151268
152269If the model introduces a new computational block that differs from standard attention or MLP
@@ -318,7 +435,9 @@ User wants to add a model
318435│ ├─ Has Megatron vision encoder? ──→ Megatron encoder (Qwen3.5 pattern)
319436│ └─ No Megatron encoder ──→ HF encoder (Gemma3 pattern)
320437│
321- └─ No vision config ──→ LLM path (Qwen2 / GPT-OSS pattern)
322- ├─ Standard GPT-style? ──→ Bridge only (no provider subclass needed)
323- └─ Custom components? ──→ Bridge + custom provider or modeling module
438+ └─ No vision config ──→ LLM path (bridge only, no provider file)
439+ ├─ Standard GPT-style? ──→ Bridge with stock mappings
440+ └─ Custom layers? ──→ Bridge + local mapping subclasses / hook overrides
441+    ├─ Custom weight layout? ──→ Local mapping subclass in family dir
442+    └─ Custom import/export? ──→ Override bridge hooks (maybe_modify_*)
324443```