Commit 1cceb95
[OMNIML-3689] PTQ quant_cfg semantic correction. Design in doc _quant_cfg.rst (#1094)
### What does this PR do?

#### Summary

Redesigns the `quant_cfg` configuration format in ModelOpt's PyTorch quantization stack, replacing the previous dict-based format with an **ordered list of typed `QuantizerCfgEntry` dicts**.

##### Motivation

The old `quant_cfg` dict had several pain points:

- **Ambiguous precedence**: no explicit way to reason about which entry wins when multiple keys match a quantizer
- **Mixed key namespaces**: wildcard paths and PyTorch class names lived in the same dict level, requiring ad-hoc dispatch
- **Magic `"default"` key**: an implicit, undocumented catch-all that was easy to misuse
- **Poor composability**: merging two configs required dict updates that silently discarded keys
- **No YAML round-trip fidelity**: the nested structure couldn't be expressed cleanly in YAML

##### New format

`quant_cfg` is now an ordered list of `QuantizerCfgEntry` TypedDicts. Each entry has:

- `quantizer_name` *(required)*: `fnmatch` wildcard matched against quantizer module names
- `cfg` *(optional)*: dict (or list of dicts) of `QuantizerAttributeConfig` fields
- `enable` *(optional)*: toggles the quantizer on/off independently of `cfg`
- `parent_class` *(optional)*: restricts the match to quantizers whose parent module is of the given PyTorch class (e.g. `"nn.BatchNorm2d"`)

Entries are applied in list order; later entries override earlier ones. The canonical pattern is deny-all first (`_base_disable_all`), then selectively re-enable and configure, then apply standard exclusions (`_default_disabled_quantizer_cfg`).

##### Changes

**Core library (`modelopt/torch/quantization/`)**

- **`config.py`**:
  - Added the `QuantizerCfgEntry` TypedDict (line 163) and a `find_quant_cfg_entry_by_path()` helper for exact-match lookup of entries by path.
  - Added `normalize_quant_cfg_list()` (line 1539), which converts legacy formats (flat dict, single-key dicts, `nn.*`-scoped dicts, `"default"` key) to canonical `QuantizerCfgEntry` lists. After normalization, every entry is guaranteed to have explicit `quantizer_name`, `enable`, and `cfg` keys.
  - Converted `_default_disabled_quantizer_cfg` and `_mamba_moe_disabled_quantizer_cfg` from dicts to lists of `QuantizerCfgEntry`.
  - Added `_base_disable_all` (line 205): the canonical deny-all entry (`[{"quantizer_name": "*", "enable": False}]`).
  - Converted all ~30 built-in config constants (`INT8_DEFAULT_CFG`, `FP8_DEFAULT_CFG`, `NVFP4_DEFAULT_CFG`, etc.) to list format using `*_base_disable_all` and `*_default_disabled_quantizer_cfg` unpacking.
  - KV-cache configs (`FP8_KV_CFG`, `NVFP4_KV_CFG`, etc.) are now minimal lists designed to be concatenated with a primary config — they intentionally omit `_base_disable_all` and `"algorithm"`.
  - Added two `QuantizeConfig` Pydantic field validators: a `mode="before"` validator that calls `normalize_quant_cfg_list()`, and a `mode="after"` validator that validates `cfg` dicts against `QuantizerAttributeConfig`.
  - Updated `need_calibration()` to iterate the normalized list instead of the old dict.
  - Changed the `QuantizeQuantCfgType` alias from `dict[str | Callable, ...]` to `list[QuantizerCfgEntry]`.
- **`conversion.py`**:
  - Rewrote `set_quantizer_by_cfg()` (line 217) to iterate the list directly. Each entry's `parent_class` is resolved via `QuantModuleRegistry[parent_class_name]` (the existing `_DMRegistryCls` registry).
  - Added `set_quantizer_attributes_full()` (line 314): full replacement of quantizer attributes from a `QuantizerAttributeConfig`. Unspecified fields revert to defaults, enforcing entry atomicity. Can also upgrade `TensorQuantizer` → `SequentialQuantizer` or downgrade the reverse.
  - Added `set_quantizer_attributes_partial()` (line 384): merges a partial `dict` of attributes into the existing quantizer state. Does NOT change quantizer structure. Used for enable-only entries.
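The ordering semantics above (last match wins, with `enable` and `cfg` resolved independently) can be sketched in plain Python. This is an illustrative stand-in, not the library's actual matcher; the `quant_cfg` entries and the `resolve` helper are hypothetical:

```python
from fnmatch import fnmatch

# Hypothetical entries in the new list format: deny-all base, selective
# enable, then a later exclusion (illustrative, not a built-in config).
quant_cfg = [
    {"quantizer_name": "*", "enable": False},
    {"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": 4}, "enable": True},
    {"quantizer_name": "*lm_head*", "enable": False},
]


def resolve(quantizer_name, entries):
    """Return (cfg, enable) for a quantizer name; later matching entries win."""
    cfg, enable = None, None
    for entry in entries:
        if fnmatch(quantizer_name, entry["quantizer_name"]):
            cfg = entry.get("cfg", cfg)
            enable = entry.get("enable", enable)
    return cfg, enable


# A regular weight quantizer picks up the second entry's cfg and enable=True;
# lm_head keeps that cfg but is re-disabled by the later exclusion entry.
print(resolve("model.layers.0.mlp.weight_quantizer", quant_cfg))
print(resolve("lm_head.weight_quantizer", quant_cfg))
```

Note how the last example demonstrates the `enable`/`cfg` independence described above: the exclusion entry flips only `enable` while the earlier `cfg` survives.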
  - Added the `set_quantizer_by_cfg_context()` context manager (line 447), which temporarily applies a `quant_cfg` list and restores the original quantizer state on exit.
  - Deprecated `set_quantizer_attribute()` (line 525) with a `DeprecationWarning` pointing to the new functions.
- **`tensor_quantizer.py`**:
  - `TensorQuantizer.set_from_attribute_config()`: narrowed the type hint from `dict` to `dict[str, Any]`.
  - Added `_axis_setter` and `_block_sizes_setter` custom setters so that `axis` and `block_sizes` changes properly propagate to the calibrator and maintain mutual exclusivity.
  - `SequentialQuantizer.set_from_attribute_config()`: narrowed the signature to `list[QuantizerAttributeConfig] | list[dict[str, Any]]` (removed the old union with single values).
- **`algorithms.py`**:
  - Updated `_match_quantizer_cfg()` to iterate the list and return a `(matched_cfg, matched_enable)` tuple with last-match-wins semantics.
  - Updated `_cfg_to_dict()`, `estimate_quant_compression()`, and `QuantRecipe` to work with the list-based format.
  - Updated `get_auto_quantize_config()` to emit list-format `quant_cfg`.
- **`model_quant.py`**: `disable_quantizer()` / `enable_quantizer()` now call `set_quantizer_attributes_partial()` directly instead of the deprecated `set_quantizer_attribute()`. Updated docstrings and code examples to show the list format.
- **`utils/core_utils.py`**: `disable_lora_quantizers_in_config()` and `update_quant_cfg_with_kv_cache_quant()` updated to append `QuantizerCfgEntry` dicts to the list.
- **Other**: minor updates to `backends/fp8_per_tensor_gemm.py`, `backends/nvfp4_gemm.py`, `compress.py`, `model_calib.py`, `export/unified_export_hf.py`, and `sparsity/attention_sparsity/conversion.py` to use the list format.
- **`onnx/llm_export_utils/quantization_utils.py`**: updated quantization config construction to use the list format.
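The full/partial split between the two new setters can be illustrated with a toy attribute dict. The real functions operate on `TensorQuantizer` modules; `DEFAULTS`, `set_full`, and `set_partial` here are hypothetical stand-ins for the behavior described above:

```python
# Hypothetical defaults, standing in for QuantizerAttributeConfig's fields.
DEFAULTS = {"num_bits": 8, "axis": None, "enable": True}


def set_full(state, cfg):
    """Full replacement (cf. set_quantizer_attributes_full): unspecified
    fields revert to defaults, so each entry is atomic."""
    return {**DEFAULTS, **cfg}


def set_partial(state, cfg):
    """Partial merge (cf. set_quantizer_attributes_partial): only the given
    fields change; the rest of the existing state is preserved."""
    return {**state, **cfg}


state = {"num_bits": 4, "axis": 0, "enable": True}
print(set_full(state, {"num_bits": 16}))      # axis reverts to the default None
print(set_partial(state, {"enable": False}))  # num_bits and axis are untouched
```

This is why enable-only entries go through the partial path: toggling `enable` must not wipe out a quantizer's previously configured attributes.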
**YAML recipes (`modelopt_recipes/`)**

- Converted all 5 general PTQ recipes to the new list format:
  - `general/ptq/fp8_default-fp8_kv.yml`
  - `general/ptq/nvfp4_default-fp8_kv.yml`
  - `general/ptq/nvfp4_experts_only-fp8_kv.yml`
  - `general/ptq/nvfp4_mlp_only-fp8_kv.yml`
  - `general/ptq/nvfp4_omlp_only-fp8_kv.yml`
- Converted the model-specific recipe: `models/Step3.5-Flash/nvfp4-mlp-only.yaml`

**Documentation (`docs/`)**

- New guide: `docs/source/guides/_quant_cfg.rst` — a comprehensive reference covering the entry format, ordering semantics, entry atomicity, `enable` vs `cfg` independence, `parent_class` filtering, and common patterns (deny-all-then-enable, customizing a built-in config, building from scratch).
- Updated `_pytorch_quantization.rst` code examples to show the list format with `copy.deepcopy` and `.append()`.
- Added `_quant_cfg.rst` to the quantization guide table of contents.

**Examples**

- Updated all quantization examples to use the list format: `deepseek/ptq.py`, `diffusers/quantization/config.py`, `llm_ptq/hf_ptq.py`, `llm_qat/main.py`, `vllm_serve/vllm_ptq_utils.py`, `llm_autodeploy/run_auto_quantize.py`, `llm_eval/quantization_utils.py`, `llm_ptq/example_utils.py`, `windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`, and 2 notebooks.

**Tests**

- New test file: `tests/unit/torch/quantization/test_config_validation.py` — unit tests for `need_calibration()`, `normalize_quant_cfg_list()` (new format, legacy format conversions, error cases), `find_quant_cfg_entry_by_path()`, `_match_quantizer_cfg()`, and the `QuantizeConfig` Pydantic validators.
- Extended `tests/unit/torch/quantization/test_quantize_cpu.py` with tests for `set_quantizer_attributes_full()` (atomicity, `parent_class` filtering, `SequentialQuantizer` creation), list ordering, enable-only entry behavior, and the end-to-end legacy dict format.
- Updated 20+ existing test files across `tests/unit/`, `tests/gpu/`, `tests/gpu_megatron/`, and `tests/_test_utils/` to use the list format.
##### Backward compatibility

`normalize_quant_cfg_list()` is called automatically by the `QuantizeConfig` Pydantic `mode="before"` validator, so existing code passing the old dict-based format (a flat dict like `{"*weight_quantizer": {"num_bits": 8}}`, single-key dict lists, or `nn.*`-scoped dicts with `parent_class` semantics) continues to work without modification. The legacy `"default"` key is converted to `quantizer_name: "*"`. `set_quantizer_attribute()` is preserved as a deprecated wrapper around `set_quantizer_attributes_partial()`.

#### Test coverage

- **Unit tests**: new `test_config_validation.py` with tests for normalization, validation, path lookup, and cfg matching. Extended `test_quantize_cpu.py` with tests for full/partial attribute setting, ordering, atomicity, and legacy backward compatibility.
- **System testing**:

  ```
  python examples/llm_ptq/hf_ptq.py \
      --model Qwen/Qwen3-8B \
      --recipe general/ptq/fp8_default-fp8_kv \
      --export_path=build/fp8_default-fp8_kv42 \
      --calib_size=16 \
      --batch_size=0 \
      --trust_remote_code \
      --export_fmt=hf
  ```

---------

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
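The backward-compatibility conversion for the flat-dict case can be approximated in a few lines. This is a simplified model of what `normalize_quant_cfg_list()` does, not the real implementation (which also handles single-key dict lists, `nn.*`-scoped dicts, and validation); `normalize_flat_dict` is a hypothetical name:

```python
def normalize_flat_dict(quant_cfg):
    """Simplified sketch of legacy flat-dict normalization: the magic
    "default" key becomes a "*" wildcard, and every emitted entry carries
    explicit quantizer_name, enable, and cfg keys."""
    entries = []
    for key, attrs in quant_cfg.items():
        entries.append({
            "quantizer_name": "*" if key == "default" else key,
            "enable": attrs.get("enable", True),
            "cfg": {k: v for k, v in attrs.items() if k != "enable"},
        })
    return entries


legacy = {
    "*weight_quantizer": {"num_bits": 8},
    "default": {"enable": False},
}
print(normalize_flat_dict(legacy))
```

Because the result is an ordered list with explicit keys, downstream code only ever has to reason about one canonical shape.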
1 parent c542c09 · commit 1cceb95 · 62 files changed: 3361 additions & 1320 deletions


CHANGELOG.rst

Lines changed: 4 additions & 0 deletions
```diff
@@ -14,6 +14,10 @@ NVIDIA Model Optimizer Changelog
 - Add support for vLLM fakequant reload using ModelOpt state for HF models. See `examples/vllm_serve/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/vllm_serve#load-qatptq-model-and-serve-in-vllm-wip>`_ for more details.
 - [Early Testing] Add Claude Code PTQ skill (``.claude/skills/ptq/``) for agent-assisted post-training quantization. The skill guides the agent through environment detection, model support checking, format selection, and execution via the launcher or manual SLURM/Docker/bare GPU paths. Includes handling for unlisted models with custom module patching. This feature is in early testing — use with caution.
 
+**Backward Breaking Changes**
+
+- The ``quant_cfg`` field in quantization configs is now an **ordered list** of ``QuantizerCfgEntry`` dicts instead of a flat dictionary. Each entry specifies a ``quantizer_name`` wildcard, an optional ``parent_class`` filter, a ``cfg`` dict of quantizer attributes, and/or an ``enable`` flag. Entries are applied in list order with later entries overriding earlier ones. The old dict-based format is still accepted and automatically converted via ``normalize_quant_cfg_list()``, but now emits a ``DeprecationWarning``; new code should use the list format. All built-in configs (e.g. ``FP8_DEFAULT_CFG``, ``INT4_AWQ_CFG``, ``NVFP4_DEFAULT_CFG``), examples, and YAML recipes have been updated. See the :ref:`quant-cfg` documentation for the new format reference and migration guide.
+
 **Bug Fixes**
 
 - Fix Minitron pruning (``mcore_minitron``) for MoE models. Importance estimation hooks were incorrectly registered for MoE modules and NAS step was hanging before this.
```

docs/source/guides/1_quantization.rst

Lines changed: 1 addition & 0 deletions
```diff
@@ -19,6 +19,7 @@ Below, you can find the documentation for the quantization toolkit in ModelOpt:
    ./_basic_quantization.rst
    ./_choosing_quant_methods.rst
    ./_pytorch_quantization.rst
+   ./_quant_cfg.rst
    ./_customized_model_quantization.rst
    ./_compress_quantized_models.rst
    ./_onnx_quantization.rst
```

docs/source/guides/_pytorch_quantization.rst

Lines changed: 21 additions & 12 deletions
```diff
@@ -237,14 +237,16 @@ For debugging purposes or simple customizations, you can modify an existing conf
 
 .. code-block:: python
 
-    # Create a copy of the default INT8 configuration
-    config = mtq.INT8_DEFAULT_CFG.copy()
+    import copy
 
-    # Disable input quantizers for all layers
-    config["quant_cfg"]["*input_quantizer"]["enable"] = False
+    # Create a deep copy of the default INT8 configuration
+    config = copy.deepcopy(mtq.INT8_DEFAULT_CFG)
+
+    # Disable input quantizers for all layers (appended last, so it takes precedence)
+    config["quant_cfg"].append({"quantizer_name": "*input_quantizer", "enable": False})
 
     # Disable all quantizers for layers matching the pattern "layer1.*"
-    config["quant_cfg"]["*layer1.*"] = {"enable": False}
+    config["quant_cfg"].append({"quantizer_name": "*layer1.*", "enable": False})
 
 Advanced Configuration Creation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -253,18 +255,23 @@ For exploring new quantization recipes, you can compose a completely new configu
 
 .. code-block:: python
 
+    from modelopt.torch.quantization.config import _default_disabled_quantizer_cfg
+
     # Custom configuration for INT4 block-wise weights and INT8 dynamic activations
     MY_CUSTOM_CONFIG = {
-        "quant_cfg": {
+        "quant_cfg": [
+            # Disable all quantizers by default, then enable selectively
+            {"quantizer_name": "*", "enable": False},
+
             # Configure weight quantizers with 4-bit precision and 128-element blocks
-            "*weight_quantizer": {"num_bits": 4, "block_sizes": {-1: 128}, "enable": True},
+            {"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": 4, "block_sizes": {-1: 128}}, "enable": True},
 
             # Configure input quantizers with 8-bit dynamic quantization
-            "*input_quantizer": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}},
+            {"quantizer_name": "*input_quantizer", "cfg": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}}},
 
             # Include default disabled quantizer configurations
-            **_default_disabled_quantizer_cfg,
-        },
+            *_default_disabled_quantizer_cfg,
+        ],
         "algorithm": "max",
     }
 
@@ -394,8 +401,10 @@ You can specify ``custom_calib`` as ``algorithm`` in ``quant_cfg`` to use it. He
 
     # create quantization configuration with "custom_calib" method
     quant_cfg = {
-        'quant_cfg': {'*weight_quantizer': ..},
-        'algorithm': {"method": 'custom_calib'},
+        'quant_cfg': [
+            {"quantizer_name": "*weight_quantizer", "cfg": {...}},
+        ],
+        'algorithm': {"method": 'custom_calib'},
     }
 
```