Commit 82df595

Add anymodel-core to feature/puzzletron (#974)
### What does this PR do?

- Add converter, model_descriptor, puzzformer, and llama model support

## Summary by CodeRabbit

* **New Features**
  * CLI to convert HF models to AnyModel format; unified package exports; Llama support; pruning toolkit (FFN, KV-heads, MoE expert removal) with multiple init strategies and runtime hooks; per-layer patching and no-op primitives; improved checkpoint export flow.
* **Documentation**
  * Comprehensive AnyModel README and SPDX/license headers added.
* **Tests**
  * Expanded parameterized end-to-end tests and new tokenizer/model config resources.
* **Chores**
  * Package initializers consolidated public API; lightweight dummy modules for testing.

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
1 parent 5f77c81 commit 82df595

46 files changed

Lines changed: 4504 additions & 914 deletions

modelopt/torch/nas/plugins/megatron_hooks/base_hooks.py

Lines changed: 379 additions & 1 deletion
Lines changed: 204 additions & 0 deletions

# AnyModel Guide

This guide explains how to add support for new models in the Puzzletron pipeline.

## Convert model

Convert a HuggingFace model to Puzzletron format.

Step 1: Create Model Descriptor

Extend `ModelDescriptor` and implement `layer_name_predicates()` to define regex patterns for grouping weights into subblocks (embeddings, lm_head, block_N_ffn, block_N_attention).

Key points:

- Find weight names on the model's HuggingFace page → click "Files info" to see the safetensors structure with all tensor names (example: [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct?show_file_info=model.safetensors.index.json))

See example: [llama_model_descriptor.py](models/llama/llama_model_descriptor.py)
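
The predicates might look like the following sketch. Note this is illustrative only: the base class, the exact return type of `layer_name_predicates()`, and the regex patterns are assumptions based on this guide, and the weight names follow a Llama-style checkpoint.

```python
import re

# Hypothetical descriptor sketch; not the actual AnyModel API.
class MyModelDescriptor:
    def layer_name_predicates(self):
        """Map subblock names to predicates over safetensors tensor names."""
        block = r"model\.layers\.(\d+)\."
        return {
            "embeddings": lambda name: name.startswith("model.embed_tokens."),
            "lm_head": lambda name: name.startswith("lm_head."),
            "block_N_ffn": lambda name: re.match(block + r"mlp\.", name) is not None,
            "block_N_attention": lambda name: re.match(block + r"self_attn\.", name) is not None,
        }
```

Each predicate decides whether a given tensor name belongs to that subblock, so the converter can route weights into per-subblock checkpoints.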

Step 2: Create Converter

Extend `Converter` and implement `create_block_configs_from_main_config()` to create per-layer BlockConfigs from the HuggingFace config.

Key points:

- Import the correct HuggingFace config class (e.g., `MistralConfig`, `LlamaConfig`, `Qwen2Config`). Find it in the transformers source: `github.com/huggingface/transformers/tree/main/src/transformers/models/<model_type>/configuration_<model_type>.py`

See example: [llama_converter.py](models/llama/llama_converter.py)
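
A minimal converter sketch, assuming the per-layer block config can be represented as a plain dict and that the HF config exposes `num_hidden_layers`; the `Converter` base class and the concrete BlockConfig type are not shown in this guide, so treat the shapes below as assumptions.

```python
# Hypothetical converter sketch; the real API lives in anymodel/converter.
class MyConverter:
    def create_block_configs_from_main_config(self, config):
        """Create one block config per decoder layer from the single HF config."""
        return [
            {
                "attention": {"num_key_value_heads": config.num_key_value_heads},
                "ffn": {"intermediate_size": config.intermediate_size},
            }
            for _ in range(config.num_hidden_layers)
        ]
```

The point is the expansion: a homogeneous HF config becomes one config per layer, which pruning can later make heterogeneous.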

Step 3: Create `models/<model_name>/__init__.py`

Export the descriptor and converter classes:

```python
from models.<model_name>.<model_name>_model_descriptor import MyModelDescriptor
from models.<model_name>.<model_name>_converter import MyConverter
```

Step 4: Register in `models/__init__.py`

Add the import to trigger factory registration:

```python
from models.<model_name> import *
```

## Usage

```python
from modelopt.torch.puzzletron.anymodel import convert_model

convert_model(
    input_dir="path/to/hf_checkpoint",
    output_dir="path/to/puzzletron_checkpoint",
    converter="model_name",
)
```

## Compress model

Run pruning and compression on a Puzzletron model.

Step 1: Implement ModelDescriptor methods for compression

Add to your `ModelDescriptor`:

- `decoder_layer_cls()` - return the decoder layer class(es) to patch for heterogeneous config support
- `block_config_to_layer_overrides()` - map a BlockConfig to a layer override dict (see [details](#implementing-block_config_to_layer_overrides))
- `init_rotary_embedding()` - reinitialize rotary embeddings after model loading (see [details](#implementing-init_rotary_embedding))
- `input_embedding_name()` - return the name of the input embedding layer (see [details](#implementing-path-based-methods))
- `output_embedding_name()` - return the name of the output embedding layer (see [details](#implementing-path-based-methods))
- `layer_block_name()` - return the name pattern for decoder layers (see [details](#implementing-path-based-methods))
- `final_norm_name()` - return the name of the final normalization layer (see [details](#implementing-path-based-methods))
- `attn_no_op_post_init()` - replace attention sublayers with no-op modules
- `mlp_no_op_post_init()` - replace MLP sublayers with no-op modules
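
As an illustration of the no-op post-init hooks, a sketch that swaps a decoder layer's attention sublayer for a pass-through module. The `self_attn` attribute name and the tuple return shape are Llama-style assumptions, not the actual AnyModel primitives.

```python
class NoOpAttention:
    """Pass-through stand-in: returns hidden states unchanged."""

    def __call__(self, hidden_states, *args, **kwargs):
        # Mirror a typical HF attention return shape: (output, attn_weights).
        return (hidden_states, None)


# Hypothetical descriptor method; the attribute path is an assumption.
class MyModelDescriptor:
    def attn_no_op_post_init(self, decoder_layer):
        decoder_layer.self_attn = NoOpAttention()
```

After such a replacement, the layer still participates in the forward pass but its attention computation is skipped entirely.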

Step 2: Create FFN Layer Descriptor

Extend `FFNIntermediateLayerDescriptor` to define model-specific paths for FFN pruning hooks (`down_proj_name`, `ffn_prefix_name`, `linear_weight_names`). Derive the values from your model's weight names in `layer_name_predicates()`.

See example: `LlamaFFNIntermediateLayerDescriptor` in [llama_model_descriptor.py](models/llama/llama_model_descriptor.py)
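
For a Llama-style block, the three documented fields might look like the sketch below; the attribute names come from this guide, but the concrete paths and the use of a plain class (rather than the real `FFNIntermediateLayerDescriptor` base) are assumptions.

```python
# Hypothetical FFN layer descriptor for a Llama-style decoder block.
class MyFFNIntermediateLayerDescriptor:
    # Path of the projection whose input channels are pruned.
    down_proj_name = "mlp.down_proj"
    # Prefix shared by all FFN weights inside a decoder block.
    ffn_prefix_name = "mlp"
    # Linear weights whose shapes change when the intermediate size shrinks.
    linear_weight_names = ("mlp.gate_proj", "mlp.up_proj", "mlp.down_proj")
```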

Step 3: Configure YAML files

Update the main model config YAML:

- Set `descriptor` to match the name used in `@ModelDescriptorFactory.register_decorator("your_model_name")`
- See example: [llama_3_1_8b_instruct.yaml](../../../../tests/gpu/torch/puzzletron/resources/configs/llama_3_1_8b_instruct/llama_3_1_8b_instruct.yaml)

Update the pruning YAML files (`ffn_pruning.yaml`, `expert_pruning.yaml`, etc.):

- Set `pruning_mixin._target_` to the appropriate mixin class
- Set `layer_descriptor._target_` to your layer descriptor class
- Set `hook_class` to the activation hook used for scoring
- Set `target_layer` in `activation_hooks_kwargs` to the layer name for hook attachment
- See examples in [configs/llama_3_1_8b_instruct/pruning/](../../../../tests/gpu/torch/puzzletron/resources/configs/llama_3_1_8b_instruct/pruning/)

## End-to-end example

See [test_puzzletron.py](../../../../tests/gpu/torch/puzzletron/test_puzzletron.py) for a complete example that runs both the convert and compression steps.

---

## Advanced Topics

## Pruning Configuration

### Pruning YAML Structure

Each pruning type has a YAML config with these key fields:

```yaml
pruning_mixin:
  _target_: pruning.<type>_pruning_mixin.<MixinClass>
  layer_descriptor:
    _target_: models.<model>.<descriptor_class>

hook_class: ${get_object:utils.activation_hooks.hooks.<HookClass>}
activation_hooks_kwargs:
  method: <method_name>
  target_layer: "<layer.name>"  # e.g., "mlp.down_proj", "self_attn.o_proj"
```

| Field | Description |
|-------|-------------|
| `pruning_mixin._target_` | Mixin class that orchestrates this pruning type |
| `layer_descriptor._target_` | Model-specific class defining layer paths for hooks |
| `hook_class` | Activation hook class for importance scoring |
| `target_layer` | Layer name (relative to the decoder block) where hooks attach |

### Adding a New Hook Class

1. **Implement the hook** in `modelopt/torch/nas/plugins/megatron_hooks/base_hooks.py`:
   - Extend an existing hook base class (e.g., `RemoveExpertsIndependentHook`)
   - Implement the required methods (e.g., `get_router_logits_and_routed_experts`)

2. **Register the hook** in the appropriate pruning mixin's `supported_hooks()`:

   For FFN pruning (`pruning/ffn_intermediate_pruning_mixin.py`):

   ```python
   def supported_hooks(self) -> List[Type[ActivationsHook]]:
       return [IndependentChannelContributionHook, IterativeChannelContributionHook, YourNewHook]
   ```

   For expert removal (`pruning/expert_removal_pruning_mixin.py`):

   ```python
   def supported_hooks(self) -> List[Type[ActivationsHook]]:
       return [RankedChoiceVotingHook, ..., YourNewHook]
   ```

3. **Reference it in the YAML**:

   ```yaml
   hook_class: ${get_object:utils.activation_hooks.hooks.YourNewHook}
   ```

### Pruning Types Reference

| Type | Mixin | Example Hooks |
|------|-------|---------------|
| FFN intermediate | [`FFNIntermediatePruningMixIn`](../pruning/ffn_intermediate_pruning_mixin.py) | [`IterativeChannelContributionHook`](../../../nas/plugins/megatron_hooks/base_hooks.py), [`IndependentChannelContributionHook`](../../../nas/plugins/megatron_hooks/base_hooks.py) |
| Expert removal | [`ExpertRemovalPruningMixIn`](../pruning/expert_removal_pruning_mixin.py) | [`NemotronHRemoveExpertsIndependentHook`](../../../nas/plugins/megatron_hooks/base_hooks.py), [`Qwen3VLRemoveExpertsIndependentHook`](../../../nas/plugins/megatron_hooks/base_hooks.py) |
| KV heads | [`KVHeadsPruningMixIn`](../pruning/kv_heads_pruning_mixin.py) | [`IndependentKvHeadContributionHook`](../../../nas/plugins/megatron_hooks/base_hooks.py) |

## Implementing `block_config_to_layer_overrides`

This method maps Puzzletron's [`BlockConfig`](../decilm/deci_lm_hf_code/block_config.py) fields to HuggingFace config attribute names. Only override attributes that change during pruning:

| BlockConfig Field | HuggingFace Attribute (check `config.json`) |
|-------------------|---------------------------------------------|
| `attention.num_key_value_heads` | `num_key_value_heads` |
| `ffn.intermediate_size` | `intermediate_size` |
| `ffn.moe.num_local_experts` | `num_experts` or `n_routed_experts` (model-specific) |
| `ffn.moe.expert_intermediate_dim` | `moe_intermediate_size` |

**Tip**: Check the model's `config.json` for the exact attribute names - they vary between models.

See examples: [qwen3_vl](models/qwen3_vl/qwen3_vl_model_descriptor.py), [nemotron_h](models/nemotron_h/nemotron_h_model_descriptor.py)
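
Following the table above, the mapping can be sketched as below; the method signature and the way BlockConfig attributes are accessed are assumptions, and only the two non-MoE rows of the table are shown.

```python
# Hypothetical sketch; only attributes that change during pruning are emitted.
class MyModelDescriptor:
    def block_config_to_layer_overrides(self, block_config):
        overrides = {}
        if getattr(block_config, "attention", None) is not None:
            overrides["num_key_value_heads"] = block_config.attention.num_key_value_heads
        if getattr(block_config, "ffn", None) is not None:
            overrides["intermediate_size"] = block_config.ffn.intermediate_size
        return overrides
```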

---

## Implementing path-based methods

These methods return paths derived from the model's weight names:

- `input_embedding_name()`, `output_embedding_name()`, `layer_block_name()`, `final_norm_name()`

Find them on the model's HuggingFace page → "Files info" → safetensors structure (example: [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct?show_file_info=model.safetensors.index.json)).

See example: [llama_model_descriptor.py](models/llama/llama_model_descriptor.py)
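
For a Llama-style checkpoint, the four methods might return the following; the paths are an illustration derived from a typical Llama safetensors index, not a universal rule.

```python
# Hypothetical path-based methods for a Llama-style model.
class MyModelDescriptor:
    def input_embedding_name(self):
        return "model.embed_tokens"

    def output_embedding_name(self):
        return "lm_head"

    def layer_block_name(self):
        return "model.layers"

    def final_norm_name(self):
        return "model.norm"
```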

---

## Implementing `init_rotary_embedding`

Rotary embeddings are computed modules (not saved weights). After model sharding, they need re-initialization on the correct device/dtype.

Look in `github.com/huggingface/transformers/tree/main/src/transformers/models/<model_type>/modeling_<model_type>.py` for:

- `class.*Rotary` - the rotary embedding class name and constructor arguments
- `self.rotary_emb` - the attribute path

See example: [llama_model_descriptor.py](models/llama/llama_model_descriptor.py)
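
A sketch of what the re-initialization might look like, assuming a Llama-style layout where each attention module holds a `rotary_emb` attribute. The class name, constructor signature, and attribute path all vary across models and transformers versions, so verify them in your model's `modeling_<model_type>.py`; a real implementation would also move the rebuilt module to each layer's device/dtype.

```python
# Hypothetical sketch; `rotary_cls` stands in for the model's rotary
# embedding class, and the attribute path is an assumption.
class MyModelDescriptor:
    def init_rotary_embedding(self, model, rotary_cls):
        for layer in model.model.layers:
            # Rebuild the rotary module; it is computed, not loaded from weights.
            layer.self_attn.rotary_emb = rotary_cls(model.config)
```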
Lines changed: 64 additions & 0 deletions

```python
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# mypy: ignore-errors

"""AnyModel: Architecture-agnostic model compression for HuggingFace models.

This module provides a declarative approach to model compression that works with
any HuggingFace model without requiring custom modeling code. Instead of duplicating
HuggingFace modeling classes, AnyModel uses ModelDescriptors that define:

1. Which decoder layer class(es) to patch for heterogeneous configs
2. How to map BlockConfig to layer-specific overrides
3. Weight name patterns for subblock checkpointing

Example usage:
    >>> from modelopt.torch.puzzletron.anymodel import convert_model
    >>> convert_model(
    ...     input_dir="path/to/hf_checkpoint",
    ...     output_dir="path/to/anymodel_checkpoint",
    ...     converter="llama",
    ... )

Supported models:
- llama: Llama 2, Llama 3, Llama 3.1, Llama 3.2
- (more to come: qwen2, mistral_small, etc.)
"""

# Import models to trigger factory registration
from modelopt.torch.puzzletron.anymodel import models  # noqa: F401
from modelopt.torch.puzzletron.anymodel.converter import Converter, ConverterFactory, convert_model
from modelopt.torch.puzzletron.anymodel.model_descriptor import (
    ModelDescriptor,
    ModelDescriptorFactory,
)
from modelopt.torch.puzzletron.anymodel.puzzformer import (
    MatchingZeros,
    Same,
    deci_x_patcher,
    return_tuple_of_size,
)

__all__ = [
    "Converter",
    "ConverterFactory",
    "ModelDescriptor",
    "ModelDescriptorFactory",
    "deci_x_patcher",
    "MatchingZeros",
    "Same",
    "return_tuple_of_size",
    "convert_model",
]
```
Lines changed: 19 additions & 0 deletions

```python
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Converters for transforming HuggingFace models to AnyModel format."""

from .convert_any_model import *
from .converter import *
from .converter_factory import *
```
Lines changed: 68 additions & 0 deletions

```python
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# mypy: ignore-errors

"""Convert a HuggingFace model to AnyModel format."""

from pathlib import Path

from modelopt.torch.puzzletron.anymodel.converter.converter import Converter
from modelopt.torch.puzzletron.anymodel.converter.converter_factory import ConverterFactory
from modelopt.torch.puzzletron.anymodel.model_descriptor import ModelDescriptorFactory

__all__ = ["convert_model"]


def convert_model(
    input_dir: str,
    output_dir: str,
    converter: Converter | str,
):
    """Convert a HuggingFace model to AnyModel format.

    This function converts a HuggingFace checkpoint to the AnyModel format used
    for compression. The conversion process:

    1. Copies non-weight files (config, tokenizer, etc.)
    2. Creates block_configs for each layer
    3. Reorganizes weights into subblock checkpoints

    Args:
        input_dir: Path to the input HuggingFace checkpoint directory.
        output_dir: Path to the output AnyModel checkpoint directory.
        converter: Either a converter name (e.g., "llama") or a Converter class.

    Example:
        >>> convert_model(
        ...     input_dir="/path/to/Llama-3.1-8B-Instruct",
        ...     output_dir="/path/to/output/ckpts/teacher",
        ...     converter="llama",
        ... )
    """
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Get descriptor and converter from factories (they use the same name)
    descriptor = ModelDescriptorFactory.get(converter)
    converter = ConverterFactory.get(converter)

    converter.convert(descriptor=descriptor, input_dir=input_dir, output_dir=output_dir)


if __name__ == "__main__":
    from fire import Fire

    Fire(convert_model)
```
