AnyModel Guide

This guide explains how to add support for new models in the Puzzletron pipeline.

Convert model

Convert a HuggingFace model to Puzzletron format.

Step 1: Create Model Descriptor

Extend ModelDescriptor and implement layer_name_predicates() to define regex patterns for grouping weights into subblocks (embeddings, lm_head, block_N_ffn, block_N_attention).

Key points:

Find weight names on the model's HuggingFace page → click "Files info" to see the safetensors structure with all tensor names (example: Llama-3.1-8B-Instruct)

See example: llama_model_descriptor.py

Step 2: Create Converter

Extend Converter and implement create_block_configs_from_main_config() to create per-layer BlockConfigs from the HuggingFace config.

Key points:

Import correct HuggingFace config class (e.g., MistralConfig, LlamaConfig, Qwen2Config). Find it in the transformers source: github.com/huggingface/transformers/tree/main/src/transformers/models/<model_type>/configuration_<model_type>.py

See example: llama_converter.py

Step 3: Create models/<model_name>/__init__.py

Export descriptor and converter classes:

from models.<model_name>.<model_name>_model_descriptor import MyModelDescriptor
from models.<model_name>.<model_name>_converter import MyConverter

Step 4: Register in models/__init__.py

Add import to trigger factory registration:

from models.<model_name> import *

Usage

from modelopt.torch.puzzletron.anymodel import convert_model

convert_model(
    input_dir="path/to/hf_checkpoint",
    output_dir="path/to/puzzletron_checkpoint",
    converter="model_name",
)

Compress model

Run pruning and compression on a Puzzletron model.

Step 1: Implement ModelDescriptor methods for compression

Add to your ModelDescriptor:

decoder_layer_cls() - return the decoder layer class(es) to patch for heterogeneous config support
block_config_to_layer_overrides() - map BlockConfig to layer override dict (see details)
init_rotary_embedding() - reinitialize rotary embeddings after model loading (see details)
input_embedding_name() - return the name of the input embedding layer (see details)
output_embedding_name() - return the name of the output embedding layer (see details)
layer_block_name() - return the name pattern for decoder layers (see details)
final_norm_name() - return the name of the final normalization layer (see details)
attn_no_op_post_init() - replace attention sublayers with no-op modules
mlp_no_op_post_init() - replace MLP sublayers with no-op modules

Step 2: Create FFN Layer Descriptor

Extend FFNIntermediateLayerDescriptor to define model-specific paths for FFN pruning hooks (down_proj_name, ffn_prefix_name, linear_weight_names). Derive values from your model's weight names in layer_name_predicates().

See example: llama_model_descriptor.py → LlamaFFNIntermediateLayerDescriptor

Step 3: Configure YAML files

Update the main model config YAML:

Set descriptor to match the name used in @ModelDescriptorFactory.register_decorator("your_model_name")
See example: llama_3_1_8b_instruct.yaml

Update pruning YAML files (ffn_pruning.yaml, expert_pruning.yaml, etc.):

Set pruning_mixin._target_ to the appropriate mixin class
Set layer_descriptor._target_ to your layer descriptor class
Set hook_class to the activation hook for scoring
Set target_layer in activation_hooks_kwargs to the layer name for hook attachment
See examples in configs/llama_3_1_8b_instruct/pruning/

End-to-end example

See test_puzzletron.py for a complete example that runs both convert and compression steps. For container setup and dependencies needed to run this test, see the Puzzletron README environment section.

Advanced Topics

Pruning Configuration

Pruning YAML Structure

Each pruning type has a YAML config with these key fields:

pruning_mixin:
  _target_: pruning.<type>_pruning_mixin.<MixinClass>
  layer_descriptor:
    _target_: models.<model>.<descriptor_class>

hook_class: ${get_object:utils.activation_hooks.hooks.<HookClass>}
activation_hooks_kwargs:
  method: <method_name>
  target_layer: "<layer.name>"  # e.g., "mlp.down_proj", "self_attn.o_proj"

Field	Description
`pruning_mixin._target_`	Mixin class that orchestrates this pruning type
`layer_descriptor._target_`	Model-specific class defining layer paths for hooks
`hook_class`	Activation hook class for importance scoring
`target_layer`	Layer name (relative to decoder block) where hooks attach

Adding a New Hook Class

Implement the hook under modelopt/torch/prune/importance_hooks/ (e.g. base_hooks.py for generic hooks, expert_removal_hooks.py for MoE expert removal):
- Extend an existing hook base class (e.g., RemoveExpertsIndependentHook in expert_removal_hooks.py)
- Implement required methods (e.g., get_router_logits_and_routed_experts)

Register the hook in the appropriate pruning mixin's supported_hooks():

For FFN pruning (pruning/ffn_intermediate_pruning_mixin.py):

def supported_hooks(self) -> List[Type[ActivationsHook]]:
    return [IndependentChannelContributionHook, IterativeChannelContributionHook, YourNewHook]

For expert removal (pruning/expert_removal_pruning_mixin.py):

def supported_hooks(self) -> List[Type[ActivationsHook]]:
    return [RankedChoiceVotingHook, ..., YourNewHook]

Reference in YAML:

hook_class: ${get_object:utils.activation_hooks.hooks.YourNewHook}

Pruning Types Reference

Type	Mixin	Example Hooks
FFN intermediate	`FFNIntermediatePruningMixIn`	`IterativeChannelContributionHook`, `IndependentChannelContributionHook`
Expert removal	`ExpertRemovalPruningMixIn`	`NemotronHRemoveExpertsIndependentHook`, `Qwen3VLRemoveExpertsIndependentHook`
KV heads	`KVHeadsPruningMixIn`	`IndependentKvHeadContributionHook`

Implementing `block_config_to_layer_overrides`

Maps Puzzletron's BlockConfig fields to HuggingFace config attribute names. Only override attributes that change during pruning:

BlockConfig Field	HuggingFace Attribute (check `config.json`)
`attention.num_key_value_heads`	`num_key_value_heads`
`ffn.intermediate_size`	`intermediate_size`
`ffn.moe.num_local_experts`	`num_experts` or `n_routed_experts` (model-specific)
`ffn.moe.expert_intermediate_dim`	`moe_intermediate_size`

Tip: Check the model's config.json for exact attribute names - they vary between models.

See examples: qwen3_vl, nemotron_h

Implementing path-based methods

These methods return paths derived from the model's weight names:

input_embedding_name(), output_embedding_name(), layer_block_name(), final_norm_name()

Find them on the model's HuggingFace page → "Files info" → safetensors structure (example: Llama-3.1-8B-Instruct).

See example: llama_model_descriptor.py

Implementing `init_rotary_embedding`

Rotary embeddings are computed modules (not saved weights). After model sharding, they need re-initialization on the correct device/dtype.

Look in github.com/huggingface/transformers/tree/main/src/transformers/models/<model_type>/modeling_<model_type>.py for:

class.*Rotary — the rotary embedding class name and constructor arguments
self.rotary_emb — the attribute path

See example: llama_model_descriptor.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AnyModel Guide

Convert model

Usage

Compress model

End-to-end example

Advanced Topics

Pruning Configuration

Pruning YAML Structure

Adding a New Hook Class

Pruning Types Reference

Implementing `block_config_to_layer_overrides`

Implementing path-based methods

Implementing `init_rotary_embedding`

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

AnyModel Guide

Convert model

Usage

Compress model

End-to-end example

Advanced Topics

Pruning Configuration

Pruning YAML Structure

Adding a New Hook Class

Pruning Types Reference

Implementing block_config_to_layer_overrides

Implementing path-based methods

Implementing init_rotary_embedding

Implementing `block_config_to_layer_overrides`

Implementing `init_rotary_embedding`