Commit e4dc020
[OMNIML-4775] Move built-in PTQ quantization configs to YAML (#1423)
### What does this PR do?
Type of change: refactor
This PR moves the built-in PTQ quantization config definitions out of
hard-coded Python dictionaries and into schema-backed YAML config files,
and factors shared blocks into reusable composable snippets.
- Adds reusable numeric config snippets under
`modelopt_recipes/configs/numerics/`.
- Adds YAML presets for the built-in model PTQ configs under
`modelopt_recipes/configs/ptq/presets/model/`.
- Adds YAML presets for KV-cache quantization configs under
`modelopt_recipes/configs/ptq/presets/kv/`.
- Adds YAML presets for the Diffusers-specific PTQ configs under
`modelopt_recipes/configs/ptq/presets/diffusers/` and re-points
`examples/diffusers/quantization/config.py` constants at them via
`load_config`.
- Adds reusable KV quantization units (`kv_fp8_affine`, `kv_nvfp4`,
`kv_nvfp4_affine`, `kv_nvfp4_rotate`, `kv_*_cast` variants) under
`modelopt_recipes/configs/ptq/units/`.
- Adds reusable model-side units following the
`component_numerics[_type]` convention:
  - `attention_qkv_fp8`: FP8 E4M3 on attention q/k/v bmm and softmax
    quantizers; shared by `model/` and `diffusers/` `nvfp4_fp8_mha` presets.
  - `block_sparse_moe_nvfp4`: NVFP4 W4A4 on `*block_sparse_moe*`
    weight/input quantizers; shared by `nvfp4_mlp_only`,
    `nvfp4_experts_only`, `nvfp4_omlp_only`.
  - `experts_nvfp4`: NVFP4 W4A4 on `*.experts.*` weight/input quantizers;
    shared by `nvfp4_mlp_only` and `nvfp4_experts_only`.
- Switches the existing 5 NVFP4 presets (default + awq lite/clip/full +
svdquant) and 4 mamba_moe presets to `$import` the existing
`w4a4_nvfp4_nvfp4` / `w8a8_fp8_fp8` units instead of re-inlining the
same weight+input quantizer pairs.
- Moves the recently-added `W4A16_NVFP4_CFG` to YAML
(`presets/model/w4a16_nvfp4.yaml`) composed from the existing
`units/w4_nvfp4` snippet.
- Updates `modelopt.torch.quantization.config` built-in config constants
to load `QuantizeConfig` objects from YAML with `load_config(...,
schema_type=QuantizeConfig).model_dump(exclude_unset=True)` via a new
`_load_quantize_config_dict` helper; the constants remain plain
`dict[str, Any]` for backwards compatibility with consumers that do
mapping-style mutation (e.g. `entry["cfg"]` assignment).
- Simplifies the cfg-list loader (`_load_quantizer_cfg_dict_list`) down
to a 4-line list/single normalization now that the three call sites all
load schema-typed YAMLs.
- Adds/updates recipe loader coverage for built-in schema-backed config
snippets.
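
The list/single normalization mentioned above can be pictured with a minimal sketch. The function name `normalize_cfg_list`, the `pattern` key, and the sample values are hypothetical and are not taken from `_load_quantizer_cfg_dict_list`; the sketch only assumes that a loaded snippet yields either one mapping or a list of mappings.

```python
from typing import Any


def normalize_cfg_list(loaded: Any) -> list[dict[str, Any]]:
    """Hypothetical helper: return a list of cfg entries for either input shape."""
    # A snippet that defines a single entry is wrapped into a one-element list.
    if isinstance(loaded, dict):
        return [loaded]
    # A snippet that already defines several entries is copied into a new list.
    return list(loaded)


# Illustrative entries only; the key names are assumptions, not the real schema.
single = {"pattern": "*weight_quantizer", "cfg": {"num_bits": 8}}
many = [single, {"pattern": "*input_quantizer", "cfg": {"num_bits": 8}}]

print(normalize_cfg_list(single))
print(len(normalize_cfg_list(many)))
```

Returning a new list in both branches means callers can append to the result without touching the loaded object.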
### Latent-bug fixes surfaced by the refactor
Two small correctness fixes are included alongside the mechanical
refactor; flagging them explicitly:
- **`examples/diffusers/quantization/quantize.py`** — adds an explicit
`base_cfg = copy.deepcopy(base_cfg)` before applying runtime overrides.
The existing `# Build a fresh config dict so we never mutate the global
constants` comment had been aspirational only; in practice
`reset_set_int8_config` accumulated `PercentileCalibrator` entries into
`mtq.INT8_SMOOTHQUANT_CFG`/`INT8_DEFAULT_CONFIG` across repeated calls,
and `set_quant_config_attr` added `trt_high_precision_dtype` keys into
globally-shared cfg dicts. The deepcopy makes the code match the
comment.
- **`choices` set in `modelopt/torch/quantization/config.py`** — adds
`MXFP6_DEFAULT_CFG` and `NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG` to the
documented public set of valid `mtq.*_CFG` names. Both constants exist
on main but were missing from `choices`, so CLIs that gate on
`mtq.config.choices` (e.g., `hf_ptq.py --qformat`) couldn't reach them
even though the configs themselves were fully supported.
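
The first fix above comes down to aliasing versus copying a nested dict. The following is a minimal sketch of that failure mode, not the code in `quantize.py`: `apply_overrides`, `make_shared`, and the sample values are hypothetical, and only the `entry["cfg"]` shape and the `trt_high_precision_dtype` key name come from the description.

```python
import copy
from typing import Any


def apply_overrides(
    base_cfg: dict[str, Any], overrides: dict[str, Any], *, isolate: bool
) -> dict[str, Any]:
    """Apply per-run overrides to every entry's "cfg" mapping."""
    # With isolate=False the caller's dict is modified in place.
    cfg = copy.deepcopy(base_cfg) if isolate else base_cfg
    for entry in cfg["quant_cfg"]:
        entry["cfg"].update(overrides)
    return cfg


def make_shared() -> dict[str, Any]:
    """Stand-in for a module-level built-in config constant (made-up content)."""
    return {"quant_cfg": [{"cfg": {"num_bits": 8}}]}


leaky = make_shared()
apply_overrides(leaky, {"trt_high_precision_dtype": "Half"}, isolate=False)

safe = make_shared()
result = apply_overrides(safe, {"trt_high_precision_dtype": "Half"}, isolate=True)

print("trt_high_precision_dtype" in leaky["quant_cfg"][0]["cfg"])  # True
print("trt_high_precision_dtype" in safe["quant_cfg"][0]["cfg"])   # False
```

A shallow `dict(base_cfg)` would not be enough here, because the nested `cfg` mappings would still be shared with the original.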
### Usage
Existing Python imports continue to work:
```python
import modelopt.torch.quantization as mtq

# `model` and `forward_loop` (the calibration loop) are supplied by the caller.
cfg = mtq.FP8_DEFAULT_CFG
model = mtq.quantize(model, cfg, forward_loop)
```
The built-in constants are plain `dict[str, Any]` (sparse — only
explicitly-set fields are present), but their definitions now come from
YAML snippets and presets composed through the existing `$import`
system.
Reusable YAML snippets can be composed through `$import`, for example:
```yaml
# modelopt-schema: modelopt.torch.quantization.config.QuantizeConfig
imports:
  base_disable_all: configs/ptq/units/base_disable_all
  w4a4_nvfp4_nvfp4: configs/ptq/units/w4a4_nvfp4_nvfp4
  default_disabled_quantizers: configs/ptq/units/default_disabled_quantizers
algorithm: max
quant_cfg:
  - $import: base_disable_all
  - $import: w4a4_nvfp4_nvfp4
  - $import: default_disabled_quantizers
```
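
To make the composition concrete, here is a toy resolver written under stated assumptions. It is not the modelopt loader: `SNIPPETS`, `resolve_imports`, the `name`/`enable` keys, and the snippet contents are all hypothetical placeholders, and it assumes each `$import` item expands in place into the entries of the aliased snippet.

```python
from typing import Any

# Hypothetical in-memory registry standing in for snippet files on disk.
# The entries are made-up placeholders, not the contents of the real units.
SNIPPETS: dict[str, list[dict[str, Any]]] = {
    "configs/ptq/units/base_disable_all": [{"name": "*", "enable": False}],
    "configs/ptq/units/w4a4_nvfp4_nvfp4": [
        {"name": "*weight_quantizer", "cfg": {"format": "placeholder"}},
        {"name": "*input_quantizer", "cfg": {"format": "placeholder"}},
    ],
}


def resolve_imports(doc: dict[str, Any]) -> dict[str, Any]:
    """Expand each {"$import": alias} item of quant_cfg into the aliased entries."""
    aliases = doc.get("imports", {})
    expanded: list[dict[str, Any]] = []
    for item in doc["quant_cfg"]:
        if set(item) == {"$import"}:
            # Look up the alias, then copy the snippet entries into the result.
            path = aliases[item["$import"]]
            expanded.extend(dict(entry) for entry in SNIPPETS[path])
        else:
            expanded.append(item)
    return {"algorithm": doc.get("algorithm"), "quant_cfg": expanded}


preset = {
    "imports": {
        "base_disable_all": "configs/ptq/units/base_disable_all",
        "w4a4_nvfp4_nvfp4": "configs/ptq/units/w4a4_nvfp4_nvfp4",
    },
    "algorithm": "max",
    "quant_cfg": [{"$import": "base_disable_all"}, {"$import": "w4a4_nvfp4_nvfp4"}],
}

resolved = resolve_imports(preset)
print(len(resolved["quant_cfg"]))  # 3
```

Order is preserved, so in this sketch a leading disable-all entry is followed by the more specific entries that come after it.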
### Testing
Local checks run:
- `nox -s "unit-3.10(torch_211, tf_latest)"` — 2329 passed, 12 skipped.
- `nox -s pre_commit_all` — all hooks pass (ruff check / ruff format /
mypy / YAML format / license / bandit / markdownlint).
- YAML parse + `$import` resolution sanity check across all changed
config files.
### Before your PR is "*Ready for review*"
Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).
Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(...,
weights_only=False)`, `pickle`, etc.).
- Is this change backward compatible?: ✅ Existing built-in Python config
constants keep the same public names and dict semantics.
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- Did you write any new necessary tests?: ✅ Adds/updates recipe loader
coverage for schema-backed built-in snippets.
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
N/A
- Did you get Claude approval on this PR?: ❌
### Additional Information
This PR was previously stacked on #1405, which has since merged to
`main`. The branch has been rebased onto `main` and no longer depends on
any other open PR.
## Summary by CodeRabbit
* **New Features**
  * Many new quantization numeric configs and PTQ presets added
    (INT4/INT8/MXFP4/MXFP6/MXFP8/MXINT8/NVFP4), plus Diffusers, KV-cache
    (affine/cast/rotate) and MLP/MoE-targeted presets.
* **Refactor**
  * Presets and shared snippets migrated to schema-backed YAML sources and
    centralized loading; INT8 percentile calibration avoids mutating shared
    base configs.
* **Tests**
  * Tests now discover packaged config snippets at runtime and validate
    import/append behaviors.
* **Documentation**
  * Presets README and numerous header descriptions updated.
* **Chores**
  * Minor typing and script improvements.
---------
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>

1 parent a5bc6f8, commit e4dc020
90 files changed
Lines changed: 2050 additions & 721 deletions
File tree
- examples
- diffusers/quantization
- llm_autodeploy
- modelopt_recipes
- configs
- numerics
- ptq
- presets
- diffusers
- kv
- model
- units
- general
- ptq
- speculative_decoding
- models/Step3.5-Flash
- modelopt/torch
- opt
- quantization
- tests/unit/recipe