Changes from all commits
35 commits
1d8a30b  Add import system for composable YAML configs (shengliangxu, Apr 13, 2026)
99120f8  reimplement using $import (shengliangxu, Apr 13, 2026)
f3caa85  remove enable: true (shengliangxu, Apr 13, 2026)
f29aed8  remove incorrect indent (shengliangxu, Apr 13, 2026)
eb0842b  remove filter (shengliangxu, Apr 14, 2026)
d692606  simplify list import (shengliangxu, Apr 14, 2026)
e267edc  update docs (shengliangxu, Apr 14, 2026)
bc47154  add import override semantic (shengliangxu, Apr 14, 2026)
dbb524d  more clear docs (shengliangxu, Apr 14, 2026)
9941490  changelog (shengliangxu, Apr 14, 2026)
74235a9  new conflict semantic (shengliangxu, Apr 14, 2026)
8182b74  support import for recipe snippets (shengliangxu, Apr 14, 2026)
fd13e6e  license headers + more doc (shengliangxu, Apr 14, 2026)
dcf10a6  more snippets (shengliangxu, Apr 14, 2026)
dc67001  nvfp4_dynamic is default (shengliangxu, Apr 15, 2026)
5baba0b  quant config (shengliangxu, Apr 15, 2026)
82d5a12  presets (shengliangxu, Apr 15, 2026)
cbf3f29  yml -> yaml (shengliangxu, Apr 15, 2026)
ae9e245  remove circular dependency (shengliangxu, Apr 15, 2026)
65b291d  make config_root so it is logcially independent of recipe (shengliangxu, Apr 15, 2026)
9f69cd0  README (shengliangxu, Apr 15, 2026)
0b79b9f  Change Log (shengliangxu, Apr 15, 2026)
e3c9e50  use full name, do not short (shengliangxu, Apr 15, 2026)
070f215  cleaner code (shengliangxu, Apr 15, 2026)
1127f32  A new test (shengliangxu, Apr 15, 2026)
185ee3b  more loads (shengliangxu, Apr 15, 2026)
5e0cc8a  fix the doc (shengliangxu, Apr 16, 2026)
c7ce455  fix failed tests and more tests (shengliangxu, Apr 16, 2026)
844c088  Merge branch 'main' into shengliangx/composable-recipes (shengliangxu, Apr 16, 2026)
33af932  better wording (shengliangxu, Apr 16, 2026)
2c5beee  Merge branch 'main' into shengliangx/composable-recipes (shengliangxu, Apr 16, 2026)
569c424  Merge branch 'main' into shengliangx/composable-recipes (shengliangxu, Apr 16, 2026)
27656eb  Merge branch 'main' into shengliangx/composable-recipes (shengliangxu, Apr 16, 2026)
a8f5c0f  more tests for better coverage (shengliangxu, Apr 17, 2026)
fb99caa  Merge branch 'main' into shengliangx/composable-recipes (shengliangxu, Apr 17, 2026)
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -68,6 +68,8 @@ repos:
entry: python tools/precommit/check_modelopt_recipes.py
language: system
files: ^modelopt_recipes/
# configs/ contains reusable snippets (not full recipes) — skip recipe validation
exclude: ^modelopt_recipes/configs/

# Instructions to change license file if ever needed:
# https://github.com/Lucas-C/pre-commit-hooks#removing-old-license-and-replacing-it-with-a-new-one
1 change: 1 addition & 0 deletions CHANGELOG.rst
@@ -15,6 +15,7 @@ Changelog
- Enable PTQ workflow for the Step3.5-Flash MoE model with NVFP4 W4A4 + FP8 KV cache quantization. See `modelopt_recipes/models/Step3.5-Flash/nvfp4-mlp-only.yaml <https://github.com/NVIDIA/Model-Optimizer/blob/main/modelopt_recipes/models/Step3.5-Flash/nvfp4-mlp-only.yaml>`_ for more details.
- Add support for vLLM fakequant reload using ModelOpt state for HF models. See `examples/vllm_serve/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/vllm_serve#load-qatptq-model-and-serve-in-vllm-wip>`_ for more details.
- [Early Testing] Add Claude Code PTQ skill (``.claude/skills/ptq/``) for agent-assisted post-training quantization. The skill guides the agent through environment detection, model support checking, format selection, and execution via the launcher or manual SLURM/Docker/bare GPU paths. Includes handling for unlisted models with custom module patching. This feature is in early testing — use with caution.
- Add composable ``$import`` system for recipe YAML configs, enabling reusable config snippets referenced via ``{$import: name}`` markers. All built-in PTQ recipes converted to use imports with shared snippets under ``modelopt_recipes/configs/`` (numeric formats, quant_cfg building blocks, presets). See :ref:`composable-imports`.

**Backward Breaking Changes**

257 changes: 233 additions & 24 deletions docs/source/guides/10_recipes.rst
@@ -54,14 +54,18 @@ A recipe contains two top-level sections: ``metadata`` and a type-specific
configuration section (for example, ``quantize`` for PTQ recipes). These can live
in a single YAML file or be split across files in a directory.

Recipes support two authoring styles: **inline** (all values written directly)
and **import-based** (reusable snippets referenced via ``$import``). Both
styles can be used in a single-file or directory layout.

Single-file format
------------------

The simplest form is a single ``.yaml`` file.

**Inline style** — all config values are written directly:

.. code-block:: yaml

metadata:
recipe_type: ptq
@@ -81,11 +85,42 @@ The simplest form is a single ``.yml`` or ``.yaml`` file. Here is a PTQ example
num_bits: e4m3
axis:
- quantizer_name: '*[kv]_bmm_quantizer'
cfg:
num_bits: e4m3
# ... standard exclusions omitted for brevity

**Import style** — the same recipe using reusable config snippets:

.. code-block:: yaml

imports:
base_disable_all: configs/ptq/units/base_disable_all
default_disabled: configs/ptq/units/default_disabled_quantizers
fp8: configs/numerics/fp8

metadata:
recipe_type: ptq
description: FP8 per-tensor weight and activation (W8A8), FP8 KV cache, max calibration.

quantize:
algorithm: max
quant_cfg:
- $import: base_disable_all
- quantizer_name: '*input_quantizer'
cfg:
$import: fp8
- quantizer_name: '*weight_quantizer'
cfg:
$import: fp8
- quantizer_name: '*[kv]_bmm_quantizer'
cfg:
$import: fp8
- $import: default_disabled

Both styles produce identical results at load time. The import style reduces
duplication when multiple recipes share the same numeric formats or exclusion
lists. See :ref:`composable-imports` below for the full ``$import`` specification.

Directory format
----------------

@@ -96,18 +131,18 @@ example:
.. code-block:: text

my_recipe/
recipe.yaml # metadata section (+ optional imports)
quantize.yaml # quantize section (+ optional imports)

``recipe.yaml``:

.. code-block:: yaml

metadata:
recipe_type: ptq
description: My custom NVFP4 recipe.

``quantize.yaml``:

.. code-block:: yaml

@@ -124,6 +159,160 @@
num_bits: e4m3
axis:

Both inline and import styles work with the directory format. Any YAML file
in the directory can have its own ``imports`` section — ``recipe.yaml``,
``quantize.yaml``, or any other config file.

.. _composable-imports:

Composable imports
------------------

Recipes can import **reusable config snippets** via the ``imports`` section.
This eliminates duplication — numeric format definitions and standard exclusion
lists are authored once and referenced by name across recipes.

The ``imports`` section is a dict mapping short names to config file paths.
References use the explicit ``{$import: name}`` marker so they are never
confused with literal values.

.. note::

``imports`` (no ``$``) is a **top-level structural section** — like
``metadata`` or ``quantize``, it declares the recipe's dependencies.
``$import`` (with ``$``) is an **inline directive** that appears inside
data values and gets resolved at load time.

The ``$import`` marker can appear anywhere in the recipe:

- As a **dict value** — the marker is replaced with the snippet content.
- As a **list element** — the snippet (which must itself be a list) is spliced
into the surrounding list.

As a **dict value**, ``$import`` supports composition with clear override
precedence (lowest to highest):

1. **Imports in list order** — ``$import: [base, override]``: later snippets
override earlier ones on key conflicts.
2. **Inline keys** — extra keys alongside ``$import`` override all imported
values.

This is equivalent to calling ``dict.update()`` in order: imports first (in
list order), then inline keys last.

.. code-block:: yaml

# Single import
cfg:
$import: nvfp4

# Import + override — import nvfp4, then override type inline
cfg:
$import: nvfp4 # imports {num_bits: e2m1, block_sizes: {-1: 16, type: dynamic, ...}}
block_sizes:
-1: 16
type: static # overrides type: dynamic → static calibration

# Multiple imports — later snippet overrides earlier on conflict
cfg:
$import: [base_format, kv_tweaks] # kv_tweaks wins on shared keys

# All three: multi-import + inline override
cfg:
$import: [bits, scale]
axis: 0 # highest precedence

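The precedence rules above can be modeled in a few lines of Python. This is an illustrative sketch only, not the actual Model-Optimizer implementation; ``resolve_dict_import`` and the ``snippets`` dict are hypothetical names, and the snippet contents are made-up shapes.

```python
def resolve_dict_import(node, snippets):
    """Model of the dict-value $import rule: imports merge in list order
    (later wins), then inline keys override everything -- dict.update() order."""
    names = node["$import"]
    if isinstance(names, str):
        names = [names]  # a single name is shorthand for a one-element list
    merged = {}
    for name in names:  # lowest precedence first
        merged.update(snippets[name])
    inline = {k: v for k, v in node.items() if k != "$import"}
    merged.update(inline)  # inline keys have highest precedence
    return merged


# Illustrative snippet contents (shapes only, not the real definitions)
snippets = {
    "nvfp4": {"num_bits": "e2m1", "block_sizes": {-1: 16, "type": "dynamic"}},
}

# Import + override: the inline block_sizes replaces the imported one wholesale
cfg = resolve_dict_import(
    {"$import": "nvfp4", "block_sizes": {-1: 16, "type": "static"}},
    snippets,
)
```

Because the merge behaves like a plain ``dict.update()``, overrides are shallow: changing only ``block_sizes.type`` still means restating the whole ``block_sizes`` mapping, which is why the YAML example above repeats ``-1: 16`` next to ``type: static``.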
As a **list element**, ``$import`` must be the only key — extra keys alongside
a list splice are not supported.

.. code-block:: yaml

imports:
base_disable_all: configs/ptq/units/base_disable_all
default_disabled: configs/ptq/units/default_disabled_quantizers
fp8: configs/numerics/fp8

metadata:
recipe_type: ptq
description: FP8 W8A8, FP8 KV cache.

quantize:
algorithm: max
quant_cfg:
- $import: base_disable_all # spliced from a single-element list snippet
- quantizer_name: '*weight_quantizer'
cfg:
$import: fp8 # cfg value replaced with imported dict
- $import: default_disabled # spliced from a multi-element list snippet

In this example:

- ``$import: base_disable_all`` and ``$import: default_disabled`` are **list elements**
— their snippets (YAML lists) are spliced into ``quant_cfg``.
- ``$import: fp8`` under ``cfg`` is a **dict value** — the snippet (a YAML dict of
quantizer attributes) replaces the ``cfg`` field.

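List splicing can be modeled the same way. Again a hedged sketch with hypothetical names, not the library's code:

```python
def resolve_list_imports(items, snippets):
    """Model of the list-element $import rule: a {"$import": name} element is
    replaced by the elements of the named list snippet; extra keys alongside
    a list splice are rejected."""
    out = []
    for item in items:
        if isinstance(item, dict) and "$import" in item:
            if set(item) != {"$import"}:
                raise ValueError("list splice: $import must be the only key")
            out.extend(snippets[item["$import"]])  # snippet must itself be a list
        else:
            out.append(item)  # ordinary entries pass through unchanged
    return out


# Illustrative snippet: a one-element list, as in base_disable_all
snippets = {
    "base_disable_all": [{"quantizer_name": "*", "enable": False}],
}
quant_cfg = resolve_list_imports(
    [
        {"$import": "base_disable_all"},
        {"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": "e4m3"}},
    ],
    snippets,
)
```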
Import paths are resolved via :func:`~modelopt.recipe.load_config` — the
built-in ``modelopt_recipes/`` library is checked first, then the filesystem.

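That lookup order (built-in library first, then filesystem) might look like the sketch below. This is an assumption about the behavior, not the real :func:`load_config` code; the accepted extensions and error type are illustrative.

```python
from pathlib import Path


def resolve_import_path(name, builtin_root):
    """Illustrative lookup order: try the built-in library root first,
    then treat the name as a plain filesystem path."""
    for ext in ("", ".yaml", ".yml"):
        candidate = Path(builtin_root) / f"{name}{ext}"
        if candidate.is_file():
            return candidate  # built-in snippet wins
    fs = Path(name)
    if fs.is_file():
        return fs  # fall back to the filesystem
    raise FileNotFoundError(f"config not found: {name}")
```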
**Recursive imports:** An imported snippet may itself contain an ``imports``
section. Each file's imports are scoped to that file — the same name can be
used in different files without conflict. Circular imports are detected and
raise ``ValueError``.

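Cycle detection can be modeled with a stack of in-progress files. A minimal sketch, assuming a ``graph`` mapping of snippet name to the names it imports (the actual loader works on file contents, and these names are hypothetical):

```python
def load_with_imports(name, graph, _stack=()):
    """Resolve a snippet's imports depth-first, raising ValueError on a cycle.
    graph maps snippet name -> list of names it imports (content elided)."""
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise ValueError(f"circular import detected: {chain}")
    for dep in graph.get(name, []):
        load_with_imports(dep, graph, _stack + (name,))
    return name  # a real loader would return the resolved content


# fp8_kv imports fp8: fine.  a and b import each other: cycle.
graph = {"fp8_kv": ["fp8"], "fp8": [], "a": ["b"], "b": ["a"]}
```

Because each file's imports are scoped to that file, only the chain of files currently being resolved (the stack) matters for cycle detection, not every file seen so far.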
Multi-document snippets
^^^^^^^^^^^^^^^^^^^^^^^

Dict-valued snippets (e.g., numeric format definitions) can use ``imports``
directly because the ``imports`` key and the snippet content are both part of
the same YAML mapping. List-valued snippets have a problem: YAML only allows
one root node per document, so a file cannot be both a mapping (for
``imports``) and a list (for entries) at the same time.

The solution is **multi-document YAML**: the first document holds the
``imports``, and the second document (after ``---``) holds the list content.
The loader parses both documents, resolves ``$import`` markers in the content,
and returns the resolved list:

.. code-block:: yaml

# configs/ptq/units/fp8_kv.yaml — list snippet that imports a dict snippet
imports:
fp8: configs/numerics/fp8
---
- quantizer_name: '*[kv]_bmm_quantizer'
cfg:
$import: fp8

This enables full composability — list snippets can reference dict snippets,
dict snippets can reference other dict snippets, and recipes can reference
any of them. All import resolution happens at load time with the same
precedence rules.

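The two-document layout can be parsed with standard YAML tooling. A sketch using PyYAML's ``safe_load_all`` (assumed available; the actual loader implementation may differ):

```python
import yaml  # PyYAML

SNIPPET = """\
imports:
  fp8: configs/numerics/fp8
---
- quantizer_name: '*[kv]_bmm_quantizer'
  cfg:
    $import: fp8
"""

# safe_load_all yields one Python object per YAML document.
docs = list(yaml.safe_load_all(SNIPPET))

# Two documents: the first carries `imports`, the second the list content.
# A single-document snippet would yield just the content.
if len(docs) == 2:
    imports, content = docs[0]["imports"], docs[1]
else:
    imports, content = {}, docs[0]
```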
Built-in config snippets
^^^^^^^^^^^^^^^^^^^^^^^^

Reusable snippets are stored under ``modelopt_recipes/configs/``:

.. list-table::
:header-rows: 1
:widths: 45 55

* - Snippet path
- Description
* - ``configs/numerics/fp8``
- FP8 E4M3 quantizer attributes
* - ``configs/numerics/nvfp4``
- NVFP4 E2M1 blockwise, dynamic calibration, FP8 scales (default)
* - ``configs/numerics/nvfp4_static``
- NVFP4 E2M1 blockwise, static calibration, FP8 scales
* - ``configs/ptq/units/base_disable_all``
- Disable all quantizers (deny-all-then-configure pattern)
* - ``configs/ptq/units/default_disabled_quantizers``
- Standard exclusions (LM head, routers, BatchNorm, etc.)
* - ``configs/ptq/units/fp8_kv``
- FP8 E4M3 KV cache quantization (multi-document, imports ``fp8``)


Metadata section
================
@@ -287,7 +476,7 @@
.. code-block:: python

# Load a custom recipe from the filesystem (file or directory)
recipe = load_recipe("/path/to/my_custom_recipe.yaml")
# or: recipe = load_recipe("/path/to/my_recipe_dir/")

Command-line usage
@@ -341,7 +530,7 @@ This means built-in recipes can be referenced without any prefix:

# These are all equivalent:
load_recipe("general/ptq/fp8_default-fp8_kv")
load_recipe("general/ptq/fp8_default-fp8_kv.yaml")


Writing a custom recipe
@@ -355,20 +544,23 @@
3. Update the ``metadata.description`` to describe your changes.
4. Save the file (or directory) and pass its path to ``load_recipe()`` or ``--recipe``.

Example -- creating a custom PTQ recipe using imports:

.. code-block:: yaml

# my_int8_recipe.yaml
imports:
base_disable_all: configs/ptq/units/base_disable_all
default_disabled: configs/ptq/units/default_disabled_quantizers

metadata:
recipe_type: ptq
description: INT8 per-channel weight, per-tensor activation.

quantize:
algorithm: max
quant_cfg:
- $import: base_disable_all
- quantizer_name: '*weight_quantizer'
cfg:
num_bits: 8
@@ -377,10 +569,11 @@
cfg:
num_bits: 8
axis:
- $import: default_disabled

The built-in snippets (``base_disable_all``, ``default_disabled``) handle the
deny-all prefix and standard exclusions. Only the format-specific entries need
to be written inline.


Recipe repository layout
@@ -394,15 +587,31 @@ The ``modelopt_recipes/`` package is organized as follows:
+-- __init__.py
+-- general/ # Model-agnostic recipes
| +-- ptq/
| +-- fp8_default-fp8_kv.yaml
| +-- nvfp4_default-fp8_kv.yaml
| +-- nvfp4_mlp_only-fp8_kv.yaml
| +-- nvfp4_experts_only-fp8_kv.yaml
| +-- nvfp4_omlp_only-fp8_kv.yaml
+-- models/ # Model-specific recipes
| +-- Step3.5-Flash/
| +-- nvfp4-mlp-only.yaml
+-- configs/ # Reusable config snippets (imported via $import)
+-- numerics/ # Numeric format definitions
| +-- fp8.yaml
| +-- nvfp4_static.yaml
| +-- nvfp4.yaml
+-- ptq/
+-- units/ # Reusable quant_cfg building blocks
| +-- base_disable_all.yaml
| +-- default_disabled_quantizers.yaml
| +-- fp8_kv.yaml
| +-- w8a8_fp8_fp8.yaml
| +-- w4a4_nvfp4_nvfp4.yaml
+-- presets/ # Complete configs (backward compat with *_CFG dicts)
+-- model/
| +-- fp8.yaml
+-- kv/
+-- fp8.yaml


Recipe data model