Commit 1cceb95
[OMNIML-3689] PTQ quant_cfg semantic correction. Design in doc _quant_cfg.rst (#1094)
### What does this PR do?
#### Summary
Redesigns the `quant_cfg` configuration format in ModelOpt's PyTorch
quantization stack, replacing the previous dict-based format with an
**ordered list of typed `QuantizerCfgEntry` dicts**.
##### Motivation
The old `quant_cfg` dict had several pain points:
- **Ambiguous precedence**: no explicit rule for which key wins when
multiple wildcard keys match the same quantizer
- **Mixed key namespaces**: wildcard paths and PyTorch class names lived
in the same dict level, requiring ad-hoc dispatch
- **Magic `"default"` key**: an implicit, undocumented catch-all that
was easy to misuse
- **Poor composability**: merging two configs required dict updates that
silently discarded keys
- **No YAML round-trip fidelity**: the nested structure couldn't be
expressed cleanly in YAML
##### New format
`quant_cfg` is now an ordered list of `QuantizerCfgEntry` TypedDicts.
Each entry has:
- `quantizer_name` *(required)*: `fnmatch` wildcard matched against
quantizer module names
- `cfg` *(optional)*: dict (or list of dicts) of
`QuantizerAttributeConfig` fields
- `enable` *(optional)*: toggles quantizer on/off independently of `cfg`
- `parent_class` *(optional)*: restricts match to quantizers whose
parent module is of the given PyTorch class (e.g. `"nn.BatchNorm2d"`)
Entries are applied in list order; later entries override earlier ones.
The canonical pattern is deny-all first (`_base_disable_all`), then
selectively re-enable and configure, then apply standard exclusions
(`_default_disabled_quantizer_cfg`).
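As an illustrative sketch of the canonical pattern (the field names `quantizer_name`, `enable`, `cfg`, and `parent_class` are from this PR; the concrete attribute values below are made-up examples, not a shipped config):

```python
# Illustrative only: the entry fields are real, the num_bits/axis values
# are invented for demonstration.
quant_cfg = [
    # 1. Deny-all first (the _base_disable_all pattern).
    {"quantizer_name": "*", "enable": False},
    # 2. Selectively re-enable and configure; later entries override
    #    earlier ones because entries are applied in list order.
    {"quantizer_name": "*weight_quantizer", "enable": True,
     "cfg": {"num_bits": 8, "axis": 0}},
    {"quantizer_name": "*input_quantizer", "enable": True,
     "cfg": {"num_bits": 8, "axis": None}},
    # 3. Standard exclusions last, restricted by parent module class.
    {"quantizer_name": "*", "parent_class": "nn.BatchNorm2d", "enable": False},
]
```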
##### Changes
**Core library (`modelopt/torch/quantization/`)**
- **`config.py`**:
- Added `QuantizerCfgEntry` TypedDict (line 163) and
`find_quant_cfg_entry_by_path()` helper for exact-match lookup of
entries by path.
- Added `normalize_quant_cfg_list()` (line 1539) that converts legacy
formats (flat dict, single-key dicts, `nn.*`-scoped dicts, `"default"`
key) to canonical `QuantizerCfgEntry` lists. After normalization every
entry is guaranteed to have explicit `quantizer_name`, `enable`, and
`cfg` keys.
- Converted `_default_disabled_quantizer_cfg` and
`_mamba_moe_disabled_quantizer_cfg` from dicts to lists of
`QuantizerCfgEntry`.
- Added `_base_disable_all` (line 205): canonical deny-all entry
(`[{"quantizer_name": "*", "enable": False}]`).
- Converted all ~30 built-in config constants (`INT8_DEFAULT_CFG`,
`FP8_DEFAULT_CFG`, `NVFP4_DEFAULT_CFG`, etc.) to the list format, composed
via `*` unpacking of `_base_disable_all` and
`_default_disabled_quantizer_cfg`.
- KV-cache configs (`FP8_KV_CFG`, `NVFP4_KV_CFG`, etc.) are now minimal
lists designed to be concatenated with a primary config — they
intentionally omit `_base_disable_all` and `"algorithm"`.
- Added two `QuantizeConfig` Pydantic field validators: a
`mode="before"` validator that calls `normalize_quant_cfg_list()`, and a
`mode="after"` validator that validates `cfg` dicts against
`QuantizerAttributeConfig`.
- Updated `need_calibration()` to iterate the normalized list instead of
the old dict.
- Changed `QuantizeQuantCfgType` alias from `dict[str | Callable, ...]`
to `list[QuantizerCfgEntry]`.
- **`conversion.py`**:
- Rewrote `set_quantizer_by_cfg()` (line 217) to iterate the list
directly. Each entry's `parent_class` is resolved via
`QuantModuleRegistry[parent_class_name]` (the existing `_DMRegistryCls`
registry).
- Added `set_quantizer_attributes_full()` (line 314): full replacement
of quantizer attributes from a `QuantizerAttributeConfig`. Unspecified
fields revert to defaults, enforcing entry atomicity. It can also upgrade
`TensorQuantizer` → `SequentialQuantizer`, or downgrade in the reverse
direction.
- Added `set_quantizer_attributes_partial()` (line 384): merges a
partial `dict` of attributes into existing quantizer state. Does NOT
change quantizer structure. Used for enable-only entries.
- Added `set_quantizer_by_cfg_context()` context manager (line 447) that
temporarily applies a `quant_cfg` list and restores original quantizer
state on exit.
- Deprecated `set_quantizer_attribute()` (line 525) with a
`DeprecationWarning` pointing to the new functions.
- **`tensor_quantizer.py`**:
- `TensorQuantizer.set_from_attribute_config()`: narrowed type hint from
`dict` to `dict[str, Any]`.
- Added `_axis_setter` and `_block_sizes_setter` custom setters so that
`axis` and `block_sizes` changes properly propagate to the calibrator
and maintain mutual exclusivity.
- `SequentialQuantizer.set_from_attribute_config()`: narrowed signature
to `list[QuantizerAttributeConfig] | list[dict[str, Any]]` (removed the
old union with single values).
- **`algorithms.py`**:
- Updated `_match_quantizer_cfg()` to iterate the list and return a
`(matched_cfg, matched_enable)` tuple with last-match-wins semantics.
- Updated `_cfg_to_dict()`, `estimate_quant_compression()`, and
`QuantRecipe` to work with the list-based format.
- Updated `get_auto_quantize_config()` to emit list-format `quant_cfg`.
- **`model_quant.py`**: `disable_quantizer()` / `enable_quantizer()` now
call `set_quantizer_attributes_partial()` directly instead of the
deprecated `set_quantizer_attribute()`. Updated docstrings and code
examples to show the list format.
- **`utils/core_utils.py`**: `disable_lora_quantizers_in_config()` and
`update_quant_cfg_with_kv_cache_quant()` updated to append
`QuantizerCfgEntry` dicts to the list.
- **Other**: minor updates to `backends/fp8_per_tensor_gemm.py`,
`backends/nvfp4_gemm.py`, `compress.py`, `model_calib.py`,
`export/unified_export_hf.py`, and
`sparsity/attention_sparsity/conversion.py` to use the list format.
- **`onnx/llm_export_utils/quantization_utils.py`**: Updated
quantization config construction to use list format.
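The list-order semantics can be sketched in plain Python. This is a hypothetical re-implementation for illustration only (the helper name `resolve` is invented; the real matching lives in `set_quantizer_by_cfg()` and `_match_quantizer_cfg()` and handles `parent_class`, structure changes, and more):

```python
from fnmatch import fnmatch

def resolve(quantizer_name: str, quant_cfg: list[dict]):
    """Hypothetical helper: compute the effective (enable, cfg) for one
    quantizer. Entries are applied in list order, so a later matching
    entry overrides an earlier one (last-match-wins)."""
    enable, cfg = None, None
    for entry in quant_cfg:
        if fnmatch(quantizer_name, entry["quantizer_name"]):
            if "enable" in entry:
                enable = entry["enable"]
            if "cfg" in entry:
                cfg = entry["cfg"]
    return enable, cfg

# Deny-all first, then re-enable weight quantizers.
cfg = [
    {"quantizer_name": "*", "enable": False},
    {"quantizer_name": "*weight_quantizer", "enable": True,
     "cfg": {"num_bits": 8}},
]
```

For example, `resolve("layer1.weight_quantizer", cfg)` yields the enabled 8-bit entry, while `resolve("layer1.input_quantizer", cfg)` falls through to the deny-all entry.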
**YAML recipes (`modelopt_recipes/`)**
- Converted all 5 general PTQ recipes to the new list format:
- `general/ptq/fp8_default-fp8_kv.yml`
- `general/ptq/nvfp4_default-fp8_kv.yml`
- `general/ptq/nvfp4_experts_only-fp8_kv.yml`
- `general/ptq/nvfp4_mlp_only-fp8_kv.yml`
- `general/ptq/nvfp4_omlp_only-fp8_kv.yml`
- Converted model-specific recipe:
`models/Step3.5-Flash/nvfp4-mlp-only.yaml`
**Documentation (`docs/`)**
- New guide: `docs/source/guides/_quant_cfg.rst` — comprehensive
reference covering entry format, ordering semantics, entry atomicity,
`enable` vs `cfg` independence, `parent_class` filtering, and common
patterns (deny-all-then-enable, customizing a built-in config, building
from scratch).
- Updated `_pytorch_quantization.rst` code examples to show the list
format with `copy.deepcopy` and `.append()`.
- Added `_quant_cfg.rst` to the quantization guide table of contents.
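The `copy.deepcopy` + `.append()` customization pattern from the updated docs could look roughly like this (the stand-in `FP8_DEFAULT_CFG` below is a made-up two-entry list so the snippet is self-contained; the real constant lives in `modelopt/torch/quantization/config.py` with different contents):

```python
import copy

# Stand-in for a built-in constant such as FP8_DEFAULT_CFG; the entry
# values here are invented for illustration.
FP8_DEFAULT_CFG = [
    {"quantizer_name": "*", "enable": False},
    {"quantizer_name": "*weight_quantizer", "enable": True,
     "cfg": {"num_bits": (4, 3)}},
]

# Deep-copy so the shared built-in constant is never mutated, then append
# an override entry; list order means it wins over the entries before it.
my_cfg = copy.deepcopy(FP8_DEFAULT_CFG)
my_cfg.append({"quantizer_name": "*lm_head*", "enable": False})
```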
**Examples**
- Updated all quantization examples to use the list format:
`deepseek/ptq.py`, `diffusers/quantization/config.py`,
`llm_ptq/hf_ptq.py`, `llm_qat/main.py`, `vllm_serve/vllm_ptq_utils.py`,
`llm_autodeploy/run_auto_quantize.py`, `llm_eval/quantization_utils.py`,
`llm_ptq/example_utils.py`,
`windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`,
and 2 notebooks.
**Tests**
- New test file:
`tests/unit/torch/quantization/test_config_validation.py` — unit tests
for `need_calibration()`, `normalize_quant_cfg_list()` (new format,
legacy format conversions, error cases),
`find_quant_cfg_entry_by_path()`, `_match_quantizer_cfg()`, and
`QuantizeConfig` Pydantic validators.
- Extended `tests/unit/torch/quantization/test_quantize_cpu.py` with
tests for `set_quantizer_attributes_full()` (atomicity, parent_class
filtering, SequentialQuantizer creation), list ordering, enable-only
entry behavior, and end-to-end legacy dict format.
- Updated 20+ existing test files across `tests/unit/`, `tests/gpu/`,
`tests/gpu_megatron/`, and `tests/_test_utils/` to use the list format.
##### Backward compatibility
`normalize_quant_cfg_list()` is called automatically by the
`QuantizeConfig` Pydantic `mode="before"` validator, so existing code
passing the old dict-based format (flat dict like `{"*weight_quantizer":
{"num_bits": 8}}`, single-key dict lists, or `nn.*`-scoped dicts with
`parent_class` semantics) continues to work without modification. The
legacy `"default"` key is converted to `quantizer_name: "*"`.
`set_quantizer_attribute()` is preserved as a deprecated wrapper around
`set_quantizer_attributes_partial()`.
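A hedged sketch of the flat-dict conversion (hypothetical simplification; the real `normalize_quant_cfg_list()` also handles single-key dict lists, `nn.*`-scoped dicts, and error cases):

```python
def normalize_legacy_dict(old_cfg: dict) -> list[dict]:
    """Hypothetical sketch: convert a flat legacy quant_cfg dict into an
    ordered list of QuantizerCfgEntry-style dicts, giving every entry
    explicit quantizer_name, enable, and cfg keys."""
    entries = []
    for key, value in old_cfg.items():
        # The legacy "default" catch-all becomes an explicit "*" wildcard.
        name = "*" if key == "default" else key
        entries.append({
            "quantizer_name": name,
            "enable": value.get("enable", True),
            "cfg": {k: v for k, v in value.items() if k != "enable"},
        })
    return entries
```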
#### Test coverage
- **Unit tests**: new `test_config_validation.py` with tests for
normalization, validation, path lookup, and cfg matching. Extended
`test_quantize_cpu.py` with tests for full/partial attribute setting,
ordering, atomicity, and legacy backward compatibility.
- **System testing**:
```
python examples/llm_ptq/hf_ptq.py \
--model Qwen/Qwen3-8B \
--recipe general/ptq/fp8_default-fp8_kv \
--export_path=build/fp8_default-fp8_kv42 \
--calib_size=16 \
--batch_size=0 \
--trust_remote_code \
--export_fmt=hf
```
### Additional Information
---------
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
62 files changed: 3361 additions, 1320 deletions