Commit 82d5a12

presets

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
1 parent 5baba0b

File tree: 16 files changed, +70 −47 lines

docs/source/guides/10_recipes.rst

Lines changed: 10 additions & 10 deletions
@@ -94,8 +94,8 @@ The simplest form is a single ``.yml`` or ``.yaml`` file.
    .. code-block:: yaml
 
       imports:
-        base_disable_all: configs/ptq/base_disable_all
-        default_disabled: configs/ptq/default_disabled_quantizers
+        base_disable_all: configs/ptq/units/base_disable_all
+        default_disabled: configs/ptq/units/default_disabled_quantizers
         fp8: configs/numerics/fp8
 
       metadata:
@@ -227,8 +227,8 @@ a list splice are not supported.
    .. code-block:: yaml
 
       imports:
-        base_disable_all: configs/ptq/base_disable_all
-        default_disabled: configs/ptq/default_disabled_quantizers
+        base_disable_all: configs/ptq/units/base_disable_all
+        default_disabled: configs/ptq/units/default_disabled_quantizers
         fp8: configs/numerics/fp8
 
       metadata:
@@ -275,7 +275,7 @@ and returns the resolved list:
 
    .. code-block:: yaml
 
-      # configs/ptq/fp8_kv.yaml — list snippet that imports a dict snippet
+      # configs/ptq/units/fp8_kv.yaml — list snippet that imports a dict snippet
       imports:
         fp8: configs/numerics/fp8
       ---
@@ -305,11 +305,11 @@ Reusable snippets are stored under ``modelopt_recipes/configs/``:
      - NVFP4 E2M1 blockwise, dynamic calibration, FP8 scales (default)
    * - ``configs/numerics/nvfp4_static``
      - NVFP4 E2M1 blockwise, static calibration, FP8 scales
-   * - ``configs/ptq/base_disable_all``
+   * - ``configs/ptq/units/base_disable_all``
      - Disable all quantizers (deny-all-then-configure pattern)
-   * - ``configs/ptq/default_disabled_quantizers``
+   * - ``configs/ptq/units/default_disabled_quantizers``
      - Standard exclusions (LM head, routers, BatchNorm, etc.)
-   * - ``configs/ptq/fp8_kv``
+   * - ``configs/ptq/units/fp8_kv``
      - FP8 E4M3 KV cache quantization (multi-document, imports ``fp8``)
 
 
@@ -549,8 +549,8 @@ Example -- creating a custom PTQ recipe using imports:
 
       # my_int8_recipe.yml
       imports:
-        base_disable_all: configs/ptq/base_disable_all
-        default_disabled: configs/ptq/default_disabled_quantizers
+        base_disable_all: configs/ptq/units/base_disable_all
+        default_disabled: configs/ptq/units/default_disabled_quantizers
 
       metadata:
         recipe_type: ptq
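The docs above describe an ``imports:`` mapping whose entries are then referenced by ``$import`` nodes. A minimal sketch of how such a resolver could work; ``resolve_imports`` and the snippet shapes are illustrative assumptions, not the actual modelopt_recipes implementation:

```python
from typing import Any


def resolve_imports(node: Any, snippets: dict[str, Any]) -> Any:
    """Recursively replace {"$import": name} nodes with the named snippet."""
    if isinstance(node, dict):
        if set(node) == {"$import"}:
            # A bare $import node is swapped for the imported snippet,
            # which may itself contain further $import nodes.
            return resolve_imports(snippets[node["$import"]], snippets)
        return {k: resolve_imports(v, snippets) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_imports(v, snippets) for v in node]
    return node


# Tiny illustrative snippet and document (shapes assumed for the example).
snippets = {"fp8": {"num_bits": (4, 3)}}
doc = {"quant_cfg": [{"quantizer_name": "*weight_quantizer", "cfg": {"$import": "fp8"}}]}
resolved = resolve_imports(doc, snippets)
```

After resolution, the ``cfg`` node holds the imported ``fp8`` snippet in place of the ``$import`` reference.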

modelopt/torch/quantization/config.py

Lines changed: 2 additions & 9 deletions
@@ -273,7 +273,7 @@ def find_quant_cfg_entry_by_path(
     "algorithm": "max",
 }
 
-FP8_DEFAULT_CFG: dict[str, Any] = load_config("configs/ptq/presets/fp8_default")
+FP8_DEFAULT_CFG: dict[str, Any] = load_config("configs/ptq/presets/model/fp8")
 
 MAMBA_MOE_FP8_AGGRESSIVE_CFG = {
     "quant_cfg": [
@@ -518,14 +518,7 @@ def find_quant_cfg_entry_by_path(
518518
# KV-cache configs are designed to be merged with a primary quantization config (e.g.
519519
# FP8_DEFAULT_CFG) that already contains _base_disable_all. They intentionally omit both
520520
# _base_disable_all and "algorithm" because these are provided by the primary config.
521-
FP8_KV_CFG = {
522-
"quant_cfg": [
523-
{
524-
"quantizer_name": "*[kv]_bmm_quantizer",
525-
"cfg": {"num_bits": (4, 3)},
526-
},
527-
]
528-
}
521+
FP8_KV_CFG: dict[str, Any] = load_config("configs/ptq/presets/kv/fp8")
529522

530523
FP8_AFFINE_KV_CFG = {
531524
"quant_cfg": [
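The comment in this hunk says KV-cache configs are designed to be merged with a primary config that supplies ``algorithm`` and ``_base_disable_all``. A minimal sketch of such a merge, assuming the rule is simply to concatenate the ``quant_cfg`` lists (the actual merge logic in modelopt may differ):

```python
import copy
from typing import Any


def merge_kv_cfg(primary: dict[str, Any], kv: dict[str, Any]) -> dict[str, Any]:
    """Append the KV-cache entries to a copy of the primary config's quant_cfg."""
    merged = copy.deepcopy(primary)
    merged["quant_cfg"] = list(merged.get("quant_cfg", [])) + list(kv.get("quant_cfg", []))
    return merged


# Shapes mirror the diff: the KV config deliberately omits "algorithm".
primary = {
    "algorithm": "max",
    "quant_cfg": [{"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": (4, 3)}}],
}
kv = {"quant_cfg": [{"quantizer_name": "*[kv]_bmm_quantizer", "cfg": {"num_bits": (4, 3)}}]}
merged = merge_kv_cfg(primary, kv)
```

The merged result keeps the primary config's ``algorithm`` and gains the KV-cache quantizer entry, matching the "merged with a primary quantization config" intent stated in the comment.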
Lines changed: 12 additions & 6 deletions
@@ -1,14 +1,20 @@
 # PTQ Preset Configs
 
 This directory holds preset quantization configurations that serve as the
-single source of truth for the hardcoded `*_CFG` dicts in
+YAML source of truth for the hardcoded `*_CFG` dicts in
 `modelopt.torch.quantization.config` (e.g., `FP8_DEFAULT_CFG`).
 
 Each preset is a complete, self-contained config with `algorithm` and
 `quant_cfg` — ready to pass directly to `mtq.quantize()`. Presets compose
-from the reusable snippets in `configs/numerics/` and `configs/ptq/` via
-the `$import` system.
+from the reusable snippets in `configs/numerics/` and `configs/ptq/units/`
+via the `$import` system.
 
-When adding a new preset, use existing snippets where possible and keep
-the YAML as the authoritative definition — the Python config should load
-from here rather than hardcoding the dict.
+**Note:** The main purpose of these presets is to support the existing
+`hf_ptq.py` script's `--qformat` / `--kv_cache_qformat` flags and other
+code paths that reference
+the hardcoded `*_CFG` dicts, maintaining backward compatibility during
+the transition to recipe-based workflows. Users are encouraged to use
+`load_recipe` with full recipe files under `general/` or `models/`
+instead. Some or all of these presets may be deprecated or removed in
+future releases as the recipe-based workflow becomes the standard entry
+point.
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# FP8 E4M3 KV cache quantization preset.
+# Equivalent to the hardcoded FP8_KV_CFG in config.py.
+# This is a partial config (no algorithm, no base_disable_all) — designed
+# to be merged with a primary model quantization config.
+imports:
+  fp8_kv: configs/ptq/units/fp8_kv
+
+quant_cfg:
+  - $import: fp8_kv

modelopt_recipes/configs/ptq/presets/fp8_default.yaml renamed to modelopt_recipes/configs/ptq/presets/model/fp8.yaml

Lines changed: 3 additions & 3 deletions
@@ -16,9 +16,9 @@
 # FP8 per-tensor weight and activation (W8A8), max calibration.
 # Equivalent to the hardcoded FP8_DEFAULT_CFG in config.py.
 imports:
-  base_disable_all: configs/ptq/base_disable_all
-  w8a8: configs/ptq/w8a8_fp8_fp8
-  default_disabled: configs/ptq/default_disabled_quantizers
+  base_disable_all: configs/ptq/units/base_disable_all
+  w8a8: configs/ptq/units/w8a8_fp8_fp8
+  default_disabled: configs/ptq/units/default_disabled_quantizers
 
 algorithm: max
 quant_cfg:
File renamed without changes.

modelopt_recipes/configs/ptq/default_disabled_quantizers.yaml renamed to modelopt_recipes/configs/ptq/units/default_disabled_quantizers.yaml

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
