Changes from all commits
35 commits
1d8a30b  Add import system for composable YAML configs (shengliangxu, Apr 13, 2026)
99120f8  reimplement using $import (shengliangxu, Apr 13, 2026)
f3caa85  remove enable: true (shengliangxu, Apr 13, 2026)
f29aed8  remove incorrect indent (shengliangxu, Apr 13, 2026)
eb0842b  remove filter (shengliangxu, Apr 14, 2026)
d692606  simplify list import (shengliangxu, Apr 14, 2026)
e267edc  update docs (shengliangxu, Apr 14, 2026)
bc47154  add import override semantic (shengliangxu, Apr 14, 2026)
dbb524d  more clear docs (shengliangxu, Apr 14, 2026)
9941490  changelog (shengliangxu, Apr 14, 2026)
74235a9  new conflict semantic (shengliangxu, Apr 14, 2026)
8182b74  support import for recipe snippets (shengliangxu, Apr 14, 2026)
fd13e6e  license headers + more doc (shengliangxu, Apr 14, 2026)
dcf10a6  more snippets (shengliangxu, Apr 14, 2026)
dc67001  nvfp4_dynamic is default (shengliangxu, Apr 15, 2026)
5baba0b  quant config (shengliangxu, Apr 15, 2026)
82d5a12  presets (shengliangxu, Apr 15, 2026)
cbf3f29  yml -> yaml (shengliangxu, Apr 15, 2026)
ae9e245  remove circular dependency (shengliangxu, Apr 15, 2026)
65b291d  make config_root so it is logcially independent of recipe (shengliangxu, Apr 15, 2026)
9f69cd0  README (shengliangxu, Apr 15, 2026)
0b79b9f  Change Log (shengliangxu, Apr 15, 2026)
e3c9e50  use full name, do not short (shengliangxu, Apr 15, 2026)
070f215  cleaner code (shengliangxu, Apr 15, 2026)
1127f32  A new test (shengliangxu, Apr 15, 2026)
185ee3b  more loads (shengliangxu, Apr 15, 2026)
5e0cc8a  fix the doc (shengliangxu, Apr 16, 2026)
c7ce455  fix failed tests and more tests (shengliangxu, Apr 16, 2026)
844c088  Merge branch 'main' into shengliangx/composable-recipes (shengliangxu, Apr 16, 2026)
33af932  better wording (shengliangxu, Apr 16, 2026)
2c5beee  Merge branch 'main' into shengliangx/composable-recipes (shengliangxu, Apr 16, 2026)
569c424  Merge branch 'main' into shengliangx/composable-recipes (shengliangxu, Apr 16, 2026)
27656eb  Merge branch 'main' into shengliangx/composable-recipes (shengliangxu, Apr 16, 2026)
a8f5c0f  more tests for better coverage (shengliangxu, Apr 17, 2026)
fb99caa  Merge branch 'main' into shengliangx/composable-recipes (shengliangxu, Apr 17, 2026)
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -68,6 +68,8 @@ repos:
entry: python tools/precommit/check_modelopt_recipes.py
language: system
files: ^modelopt_recipes/
# configs/ contains reusable snippets (not full recipes) — skip recipe validation
exclude: ^modelopt_recipes/configs/

# Instructions to change license file if ever needed:
# https://github.com/Lucas-C/pre-commit-hooks#removing-old-license-and-replacing-it-with-a-new-one
1 change: 1 addition & 0 deletions CHANGELOG.rst
@@ -15,6 +15,7 @@ Changelog
- Enable PTQ workflow for the Step3.5-Flash MoE model with NVFP4 W4A4 + FP8 KV cache quantization. See `modelopt_recipes/models/Step3.5-Flash/nvfp4-mlp-only.yaml <https://github.com/NVIDIA/Model-Optimizer/blob/main/modelopt_recipes/models/Step3.5-Flash/nvfp4-mlp-only.yaml>`_ for more details.
- Add support for vLLM fakequant reload using ModelOpt state for HF models. See `examples/vllm_serve/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/vllm_serve#load-qatptq-model-and-serve-in-vllm-wip>`_ for more details.
- [Early Testing] Add Claude Code PTQ skill (``.claude/skills/ptq/``) for agent-assisted post-training quantization. The skill guides the agent through environment detection, model support checking, format selection, and execution via the launcher or manual SLURM/Docker/bare GPU paths. Includes handling for unlisted models with custom module patching. This feature is in early testing — use with caution.
- Add composable ``$import`` system for recipe YAML configs, enabling reusable config snippets referenced via ``{$import: name}`` markers. All built-in PTQ recipes converted to use imports with shared snippets under ``modelopt_recipes/configs/`` (numeric formats, quant_cfg building blocks, presets). See :ref:`composable-imports`.

**Backward Breaking Changes**

257 changes: 233 additions & 24 deletions docs/source/guides/10_recipes.rst
@@ -54,14 +54,18 @@ A recipe contains two top-level sections: ``metadata`` and a type-specific
configuration section (for example, ``quantize`` for PTQ recipes). These can live
in a single YAML file or be split across files in a directory.

Recipes support two authoring styles: **inline** (all values written directly)
and **import-based** (reusable snippets referenced via ``$import``). Both
styles can be used in a single-file or directory layout.

Single-file format
------------------

The simplest form is a single ``.yaml`` file.

**Inline style** — all config values are written directly:

.. code-block:: yaml

metadata:
recipe_type: ptq
@@ -81,11 +85,42 @@ The simplest form is a single ``.yml`` or ``.yaml`` file. Here is a PTQ example
num_bits: e4m3
axis:
- quantizer_name: '*[kv]_bmm_quantizer'
cfg:
num_bits: e4m3
# ... standard exclusions omitted for brevity

**Import style** — the same recipe using reusable config snippets:

.. code-block:: yaml

imports:
base_disable_all: configs/ptq/units/base_disable_all
default_disabled: configs/ptq/units/default_disabled_quantizers
fp8: configs/numerics/fp8

metadata:
recipe_type: ptq
description: FP8 per-tensor weight and activation (W8A8), FP8 KV cache, max calibration.

quantize:
algorithm: max
quant_cfg:
- $import: base_disable_all
- quantizer_name: '*input_quantizer'
cfg:
$import: fp8
- quantizer_name: '*weight_quantizer'
cfg:
$import: fp8
- quantizer_name: '*[kv]_bmm_quantizer'
cfg:
$import: fp8
- $import: default_disabled

Both styles produce identical results at load time. The import style reduces
duplication when multiple recipes share the same numeric formats or exclusion
lists. See :ref:`composable-imports` below for the full ``$import`` specification.

Directory format
----------------

@@ -96,18 +131,18 @@ example:
.. code-block:: text

my_recipe/
recipe.yaml # metadata section (+ optional imports)
quantize.yaml # quantize section (+ optional imports)

``recipe.yaml``:

.. code-block:: yaml

metadata:
recipe_type: ptq
description: My custom NVFP4 recipe.

``quantize.yaml``:

.. code-block:: yaml

@@ -124,6 +159,160 @@
num_bits: e4m3
axis:

Both inline and import styles work with the directory format. Any YAML file
in the directory can have its own ``imports`` section — ``recipe.yaml``,
``quantize.yaml``, or any other config file.

.. _composable-imports:

Composable imports
------------------

Recipes can import **reusable config snippets** via the ``imports`` section.
This eliminates duplication — numeric format definitions and standard exclusion
lists are authored once and referenced by name across recipes.

The ``imports`` section is a dict mapping short names to config file paths.
References use the explicit ``{$import: name}`` marker so they are never
confused with literal values.

.. note::

``imports`` (no ``$``) is a **top-level structural section** — like
``metadata`` or ``quantize``, it declares the recipe's dependencies.
``$import`` (with ``$``) is an **inline directive** that appears inside
data values and gets resolved at load time.

The ``$import`` marker can appear anywhere in the recipe:

- As a **dict value** — the marker is replaced with the snippet content.
- As a **list element** — the snippet (which must itself be a list) is spliced
into the surrounding list.

As a **dict value**, ``$import`` supports composition with clear override
precedence (lowest to highest):

1. **Imports in list order** — ``$import: [base, override]``: later snippets
override earlier ones on key conflicts.
2. **Inline keys** — extra keys alongside ``$import`` override all imported
values.

This is equivalent to calling ``dict.update()`` in order: imports first (in
list order), then inline keys last.

.. code-block:: yaml

# Single import
cfg:
$import: nvfp4

# Import + override — import nvfp4, then override type inline
cfg:
$import: nvfp4 # imports {num_bits: e2m1, block_sizes: {-1: 16, type: dynamic, ...}}
block_sizes:
-1: 16
type: static # overrides type: dynamic → static calibration

# Multiple imports — later snippet overrides earlier on conflict
cfg:
$import: [base_format, kv_tweaks] # kv_tweaks wins on shared keys

# All three: multi-import + inline override
cfg:
$import: [bits, scale]
axis: 0 # highest precedence

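The precedence rules above can be modeled in a few lines of Python. This is an illustrative sketch only, not the actual Model-Optimizer implementation; ``resolve_dict_import`` and the ``snippets`` dict are hypothetical names, and the snippet contents are made-up shapes.

```python
def resolve_dict_import(node, snippets):
    """Model of the dict-value $import rule: imports merge in list order
    (later wins), then inline keys override everything -- dict.update() order."""
    names = node["$import"]
    if isinstance(names, str):
        names = [names]  # a single name is shorthand for a one-element list
    merged = {}
    for name in names:  # lowest precedence first
        merged.update(snippets[name])
    inline = {k: v for k, v in node.items() if k != "$import"}
    merged.update(inline)  # inline keys have highest precedence
    return merged


# Illustrative snippet contents (shapes only, not the real definitions)
snippets = {
    "nvfp4": {"num_bits": "e2m1", "block_sizes": {-1: 16, "type": "dynamic"}},
}

# Import + override: the inline block_sizes replaces the imported one wholesale
cfg = resolve_dict_import(
    {"$import": "nvfp4", "block_sizes": {-1: 16, "type": "static"}},
    snippets,
)
```

Because the merge behaves like a plain ``dict.update()``, overrides are shallow: changing only ``block_sizes.type`` still means restating the whole ``block_sizes`` mapping, which is why the YAML example above repeats ``-1: 16`` next to ``type: static``.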
As a **list element**, ``$import`` must be the only key — extra keys alongside
a list splice are not supported.

.. code-block:: yaml

imports:
base_disable_all: configs/ptq/units/base_disable_all
default_disabled: configs/ptq/units/default_disabled_quantizers
fp8: configs/numerics/fp8

metadata:
recipe_type: ptq
description: FP8 W8A8, FP8 KV cache.

quantize:
algorithm: max
quant_cfg:
- $import: base_disable_all # spliced from a single-element list snippet
- quantizer_name: '*weight_quantizer'
cfg:
$import: fp8 # cfg value replaced with imported dict
- $import: default_disabled # spliced from a multi-element list snippet

In this example:

- ``$import: base_disable_all`` and ``$import: default_disabled`` are **list elements**
— their snippets (YAML lists) are spliced into ``quant_cfg``.
- ``$import: fp8`` under ``cfg`` is a **dict value** — the snippet (a YAML dict of
quantizer attributes) replaces the ``cfg`` field.

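List splicing can be modeled the same way. Again a hedged sketch with hypothetical names, not the library's code:

```python
def resolve_list_imports(items, snippets):
    """Model of the list-element $import rule: a {"$import": name} element is
    replaced by the elements of the named list snippet; extra keys alongside
    a list splice are rejected."""
    out = []
    for item in items:
        if isinstance(item, dict) and "$import" in item:
            if set(item) != {"$import"}:
                raise ValueError("list splice: $import must be the only key")
            out.extend(snippets[item["$import"]])  # snippet must itself be a list
        else:
            out.append(item)  # ordinary entries pass through unchanged
    return out


# Illustrative snippet: a one-element list, as in base_disable_all
snippets = {
    "base_disable_all": [{"quantizer_name": "*", "enable": False}],
}
quant_cfg = resolve_list_imports(
    [
        {"$import": "base_disable_all"},
        {"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": "e4m3"}},
    ],
    snippets,
)
```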
Import paths are resolved via :func:`~modelopt.recipe.load_config` — the
built-in ``modelopt_recipes/`` library is checked first, then the filesystem.

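That lookup order (built-in library first, then filesystem) might look like the sketch below. This is an assumption about the behavior, not the real :func:`load_config` code; the accepted extensions and error type are illustrative.

```python
from pathlib import Path


def resolve_import_path(name, builtin_root):
    """Illustrative lookup order: try the built-in library root first,
    then treat the name as a plain filesystem path."""
    for ext in ("", ".yaml", ".yml"):
        candidate = Path(builtin_root) / f"{name}{ext}"
        if candidate.is_file():
            return candidate  # built-in snippet wins
    fs = Path(name)
    if fs.is_file():
        return fs  # fall back to the filesystem
    raise FileNotFoundError(f"config not found: {name}")
```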
**Recursive imports:** An imported snippet may itself contain an ``imports``
section. Each file's imports are scoped to that file — the same name can be
used in different files without conflict. Circular imports are detected and
raise ``ValueError``.

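Cycle detection can be modeled with a stack of in-progress files. A minimal sketch, assuming a ``graph`` mapping of snippet name to the names it imports (the actual loader works on file contents, and these names are hypothetical):

```python
def load_with_imports(name, graph, _stack=()):
    """Resolve a snippet's imports depth-first, raising ValueError on a cycle.
    graph maps snippet name -> list of names it imports (content elided)."""
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise ValueError(f"circular import detected: {chain}")
    for dep in graph.get(name, []):
        load_with_imports(dep, graph, _stack + (name,))
    return name  # a real loader would return the resolved content


# fp8_kv imports fp8: fine.  a and b import each other: cycle.
graph = {"fp8_kv": ["fp8"], "fp8": [], "a": ["b"], "b": ["a"]}
```

Because each file's imports are scoped to that file, only the chain of files currently being resolved (the stack) matters for cycle detection, not every file seen so far.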
Multi-document snippets
^^^^^^^^^^^^^^^^^^^^^^^

Dict-valued snippets (e.g., numeric format definitions) can use ``imports``
directly because the ``imports`` key and the snippet content are both part of
the same YAML mapping. List-valued snippets have a problem: YAML only allows
one root node per document, so a file cannot be both a mapping (for
``imports``) and a list (for entries) at the same time.

The solution is **multi-document YAML**: the first document holds the
``imports``, and the second document (after ``---``) holds the list content.
The loader parses both documents, resolves ``$import`` markers in the content,
and returns the resolved list:

.. code-block:: yaml

# configs/ptq/units/fp8_kv.yaml — list snippet that imports a dict snippet
imports:
fp8: configs/numerics/fp8
---
- quantizer_name: '*[kv]_bmm_quantizer'
cfg:
$import: fp8

This enables full composability — list snippets can reference dict snippets,
dict snippets can reference other dict snippets, and recipes can reference
any of them. All import resolution happens at load time with the same
precedence rules.

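The two-document layout can be parsed with standard YAML tooling. A sketch using PyYAML's ``safe_load_all`` (assumed available; the actual loader implementation may differ):

```python
import yaml  # PyYAML

SNIPPET = """\
imports:
  fp8: configs/numerics/fp8
---
- quantizer_name: '*[kv]_bmm_quantizer'
  cfg:
    $import: fp8
"""

# safe_load_all yields one Python object per YAML document.
docs = list(yaml.safe_load_all(SNIPPET))

# Two documents: the first carries `imports`, the second the list content.
# A single-document snippet would yield just the content.
if len(docs) == 2:
    imports, content = docs[0]["imports"], docs[1]
else:
    imports, content = {}, docs[0]
```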
Built-in config snippets
^^^^^^^^^^^^^^^^^^^^^^^^

Reusable snippets are stored under ``modelopt_recipes/configs/``:

.. list-table::
:header-rows: 1
:widths: 45 55

* - Snippet path
- Description
* - ``configs/numerics/fp8``
- FP8 E4M3 quantizer attributes
* - ``configs/numerics/nvfp4``
- NVFP4 E2M1 blockwise, dynamic calibration, FP8 scales (default)
* - ``configs/numerics/nvfp4_static``
- NVFP4 E2M1 blockwise, static calibration, FP8 scales
* - ``configs/ptq/units/base_disable_all``
- Disable all quantizers (deny-all-then-configure pattern)
* - ``configs/ptq/units/default_disabled_quantizers``
- Standard exclusions (LM head, routers, BatchNorm, etc.)
* - ``configs/ptq/units/fp8_kv``
- FP8 E4M3 KV cache quantization (multi-document, imports ``fp8``)


Metadata section
================
@@ -287,7 +476,7 @@
.. code-block:: python

# Load a custom recipe from the filesystem (file or directory)
recipe = load_recipe("/path/to/my_custom_recipe.yaml")
# or: recipe = load_recipe("/path/to/my_recipe_dir/")

Command-line usage
@@ -341,7 +530,7 @@ This means built-in recipes can be referenced without any prefix:

# These are all equivalent:
load_recipe("general/ptq/fp8_default-fp8_kv")
load_recipe("general/ptq/fp8_default-fp8_kv.yaml")


Writing a custom recipe
@@ -355,20 +544,23 @@
3. Update the ``metadata.description`` to describe your changes.
4. Save the file (or directory) and pass its path to ``load_recipe()`` or ``--recipe``.

Example -- creating a custom PTQ recipe using imports:

.. code-block:: yaml

# my_int8_recipe.yaml
imports:
base_disable_all: configs/ptq/units/base_disable_all
default_disabled: configs/ptq/units/default_disabled_quantizers

metadata:
recipe_type: ptq
description: INT8 per-channel weight, per-tensor activation.

quantize:
algorithm: max
quant_cfg:
- $import: base_disable_all
- quantizer_name: '*weight_quantizer'
cfg:
num_bits: 8
@@ -377,10 +569,11 @@
cfg:
num_bits: 8
axis:
- $import: default_disabled

The built-in snippets (``base_disable_all``, ``default_disabled``) handle the
deny-all prefix and standard exclusions. Only the format-specific entries need
to be written inline.


Recipe repository layout
@@ -394,15 +587,31 @@ The ``modelopt_recipes/`` package is organized as follows:
+-- __init__.py
+-- general/ # Model-agnostic recipes
| +-- ptq/
| +-- fp8_default-fp8_kv.yaml
| +-- nvfp4_default-fp8_kv.yaml
| +-- nvfp4_mlp_only-fp8_kv.yaml
| +-- nvfp4_experts_only-fp8_kv.yaml
| +-- nvfp4_omlp_only-fp8_kv.yaml
+-- models/ # Model-specific recipes
| +-- Step3.5-Flash/
| +-- nvfp4-mlp-only.yaml
+-- configs/ # Reusable config snippets (imported via $import)
+-- numerics/ # Numeric format definitions
| +-- fp8.yaml
| +-- nvfp4_static.yaml
| +-- nvfp4.yaml
+-- ptq/
+-- units/ # Reusable quant_cfg building blocks
| +-- base_disable_all.yaml
| +-- default_disabled_quantizers.yaml
| +-- fp8_kv.yaml
| +-- w8a8_fp8_fp8.yaml
| +-- w4a4_nvfp4_nvfp4.yaml
+-- presets/ # Complete configs (backward compat with *_CFG dicts)
+-- model/
| +-- fp8.yaml
+-- kv/
+-- fp8.yaml


Recipe data model