Add AutoQuantize recipe support to mtq.auto_quantize by juhi10071998 · Pull Request #1523 · NVIDIA/Model-Optimizer

juhi10071998 · 2026-05-21T00:38:11Z

Add AutoQuantize YAML based recipe support to `mtq.auto_quantize`

What does this PR do?

Type of change: New feature.

Extends the recipe system (PR #1423) to support mtq.auto_quantize. Users
can now run autoquant via a single --recipe <name> flag instead of
combining --auto_quantize_bits, --qformat, --auto_quantize_method,
and related flags. The recipe carries the full search spec (candidate
formats, budget, scoring method, KV cache scheme) as typed YAML.

Mirrors the existing PTQ recipe pattern (PR #1423): recipe is authoritative
for the search; CLI flags supply runtime concerns (dataset, calib size,
batch size).

Usage

# Before:
python hf_ptq.py --pyt_ckpt_path Qwen/Qwen3-8B \
    --qformat nvfp4,fp8 --auto_quantize_bits 4.8 \
    --auto_quantize_method gradient --kv_cache_qformat fp8_cast \
    --calib_size 512 --export_path ./out

# After:
python hf_ptq.py --pyt_ckpt_path Qwen/Qwen3-8B \
    --recipe general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast \
    --calib_size 512 --export_path ./out

Example recipe (modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml):

imports:
  nvfp4: configs/ptq/presets/model/nvfp4   # canonical preset (no duplication)
  fp8: configs/ptq/presets/model/fp8

metadata:
  recipe_type: auto_quantize

auto_quantize:
  constraints:
    effective_bits: 4.8
  candidate_formats:
    - $import: nvfp4
    - $import: fp8
  kv_cache:
    qformat: fp8_cast
  method: gradient
  num_score_steps: 128
  disabled_layers:
    - "*lm_head*"

Key design points

#	Decision	Choice
1	`constraints` shape	Mirror upstream `mtq.auto_quantize` nested dict exactly — zero-transformation dispatch via `.model_dump(exclude_none=True)`. Future-compat with PR #1497 (cost models).
2	KV cache placement	Top-level optional `kv_cache.qformat` field, not per-candidate `$import` (avoids duplication when KV is shared across candidates).
3	CLI override policy	Recipe is strict-authoritative for LP search fields (`effective_bits`, candidates, etc.). CLI may fall back only for orthogonal post-step fields — today only `kv_cache.qformat`. `--auto_quantize_bits + --recipe` errors out explicitly.
4	`auto_quantize()` helper layout	Helper is a leaf orchestrator — does not know whether inputs came from CLI or recipe. All resolution happens at the dispatch site in `quantize_main`.
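
Design points 1, 3, and 4 can be pictured with a minimal sketch. It assumes plain dicts standing in for the loaded recipe, and the helper names (`resolve_kv_cache_qformat`, `resolve_constraints`) and the `cost` key are hypothetical; this is not the code in `hf_ptq.py`.

```python
from __future__ import annotations


def resolve_kv_cache_qformat(recipe_kv_cache: dict | None, cli_qformat: str | None) -> str | None:
    """Prefer the recipe's kv_cache.qformat; fall back to the CLI value if the recipe omits it.

    Hypothetical helper illustrating the "CLI may fall back only for orthogonal fields" policy.
    """
    if recipe_kv_cache and recipe_kv_cache.get("qformat") is not None:
        return recipe_kv_cache["qformat"]
    return cli_qformat


def resolve_constraints(recipe_constraints: dict) -> dict:
    """Drop unset (None) entries, similar in spirit to a dump that excludes None values."""
    return {key: value for key, value in recipe_constraints.items() if value is not None}


# The recipe value wins when present.
print(resolve_kv_cache_qformat({"qformat": "fp8_cast"}, "fp8"))
# With no kv_cache section in the recipe, the CLI value is used.
print(resolve_kv_cache_qformat(None, "fp8"))
# "cost" is a made-up optional key used only to show None entries being dropped.
print(resolve_constraints({"effective_bits": 4.8, "cost": None}))
```

Resolving everything before the call keeps the helper free of any knowledge about where its inputs came from, which is the point of decision 4.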

Testing

Unit tests (tests/unit/recipe/test_loader.py, 7 tests):

Built-in recipe loads, type dispatch is correct
Pydantic defaults applied (method=gradient, num_score_steps=128, score_checkpoint=None)
$imported candidates byte-identical to mtq.NVFP4_DEFAULT_CFG / FP8_DEFAULT_CFG (single source of truth)
Schema validation rejects: missing auto_quantize section, <2 candidates, effective_bits outside (0, 16]
kv_cache field is optional
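
The validation rules listed above can be sketched with a stdlib dataclass. This is a stand-in under stated assumptions, not the Pydantic `AutoQuantizeConfig` from the PR: the class name `AutoQuantizeSketch` and the flattened field layout are hypothetical, and only the rules named in the test list are modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AutoQuantizeSketch:
    """Stdlib stand-in for the recipe's auto_quantize section (not the real schema)."""

    effective_bits: float
    candidate_formats: List[str]
    method: str = "gradient"  # default named in the test list
    num_score_steps: int = 128  # default named in the test list
    kv_cache_qformat: Optional[str] = None  # kv_cache is optional

    def __post_init__(self) -> None:
        # Reject budgets outside the half-open interval (0, 16].
        if not (0 < self.effective_bits <= 16):
            raise ValueError(f"effective_bits must be in (0, 16], got {self.effective_bits}")
        # A search needs at least two formats to choose between.
        if len(self.candidate_formats) < 2:
            raise ValueError("at least two candidate_formats are required")


ok = AutoQuantizeSketch(effective_bits=4.8, candidate_formats=["nvfp4", "fp8"])
print(ok.method, ok.num_score_steps, ok.kv_cache_qformat)
```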

Equivalence smoke test on Qwen/Qwen3-8B at --calib_size 512:

                            CLI (--auto_quantize_bits 6.0)   Recipe (effective_bits: 6.0)
quant_algo                  MIXED_PRECISION                  MIXED_PRECISION
kv_cache_quant_algo         FP8                              FP8
Total quantized layers      252                              252
NVFP4 layers                157                              157
FP8 layers                  95                               95
hf_quant_config.json hash   4b0564bf1f613132                 4b0564bf1f613132

hf_quant_config.json is byte-identical between the two paths.
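
A byte-level comparison of this kind could be scripted roughly as follows. The PR does not say which hash produced the 16-character value in the table, so SHA-256 truncated to 16 hex characters is an assumption, and the file names and JSON content are placeholders written to a temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, length: int = 16) -> str:
    """Hash the raw file bytes; the algorithm and truncation length are assumptions."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:length]


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder stand-ins for the two exported hf_quant_config.json files.
    cli_cfg = Path(tmp) / "cli_hf_quant_config.json"
    recipe_cfg = Path(tmp) / "recipe_hf_quant_config.json"
    content = b'{"quant_algo": "MIXED_PRECISION", "kv_cache_quant_algo": "FP8"}'
    cli_cfg.write_bytes(content)
    recipe_cfg.write_bytes(content)

    # Equal digests over raw bytes imply the files are byte-identical
    # (up to hash collisions, which are not a practical concern here).
    same = file_digest(cli_cfg) == file_digest(recipe_cfg)
    print("byte-identical:", same)
```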

Backward compatibility

✅ Yes. The three existing flows are preserved, and the AutoQuantize recipe flow is new:

Flow	Path	Status
CLI PTQ (`--qformat nvfp4`)	unchanged	✓
CLI autoquant (`--auto_quantize_bits 4.8`)	dispatch site resolves args inline; helper is pure orchestrator now (no behavior change)	✓
PTQ recipe (`--recipe general/ptq/...`)	recipe-load gate widened to accept PTQ + AutoQuantize	✓
AutoQuantize recipe (NEW)	new dispatch branch	✓

One new explicit error: combining --auto_quantize_bits with --recipe now fails fast with a clear message (previously the recipe was silently honored).
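
A fail-fast check of this shape can be sketched with `argparse`. The two flag names come from the PR; the function name `parse_args` and the error wording are hypothetical, not the message emitted by `hf_ptq.py`.

```python
import argparse


def parse_args(argv):
    """Parse a reduced flag set and reject --auto_quantize_bits together with --recipe."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--recipe", default=None)
    parser.add_argument("--auto_quantize_bits", type=float, default=None)
    args = parser.parse_args(argv)
    if args.recipe is not None and args.auto_quantize_bits is not None:
        # parser.error prints usage plus the message to stderr and exits with status 2.
        parser.error(
            "--auto_quantize_bits cannot be combined with --recipe; "
            "set the budget in the recipe instead"
        )
    return args


args = parse_args(["--recipe", "general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast"])
print(args.recipe, args.auto_quantize_bits)
```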

Files changed

modelopt/recipe/config.py — Pydantic schema (AutoQuantizeConfig, etc.) + RecipeType.AUTO_QUANTIZE enum + dispatch entry
examples/llm_ptq/hf_ptq.py — dispatch site resolves recipe/CLI knobs and passes them to auto_quantize() as kw-only kwargs; helper signature is pure value-driven
modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml — example recipe
tests/unit/recipe/test_loader.py — 7 unit tests

Checklist

Backward compatible
Signed commits (git commit -s -S)
Pre-commit clean
New unit tests added
CHANGELOG (n/a — new feature, but maintainer to decide)
Claude review (/claude review)

Summary by CodeRabbit

New Features
- Auto-quantization now supports recipe-based configuration, allowing users to define quantization strategies declaratively.
- New example recipe for mixed-precision quantization targeting 4.8 effective bits.
Tests
- Added comprehensive tests for recipe loading and validation.

Signed-off-by: Juhi Mittal <juhim@nvidia.com>

coderabbitai · 2026-05-21T00:38:25Z

📝 Walkthrough

This PR introduces AutoQuantize recipe support, enabling users to define auto-quantization configurations declaratively in recipe files. It adds new Pydantic schema models for recipe validation, refactors the auto_quantize function to accept explicit parameters, integrates recipe loading with fail-fast validation, and provides test coverage and example recipes.

Changes

AutoQuantize Recipe Feature

Layer / File(s)	Summary
AutoQuantize Recipe Schema and Types `modelopt/recipe/config.py`	Introduces `RecipeType.AUTO_QUANTIZE` enum value and defines `AutoQuantizeKVCache`, `AutoQuantizeConstraints`, `AutoQuantizeConfig`, and `ModelOptAutoQuantizeRecipe` Pydantic models with validators enforcing `effective_bits` in `(0, 16]` and requiring at least two `candidate_formats`. Updates `RECIPE_TYPE_TO_CLASS` mapping.
AutoQuantize Example Recipe `modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml`	Adds a concrete NVFP4+FP8 mixed-precision recipe configured for 4.8 effective bits, gradient-based scoring with 128 steps, FP8 cast KV cache mode, and disabled `lm_head` layer pattern.
Auto-Quantize Function Refactoring `examples/llm_ptq/hf_ptq.py`	Refactors `auto_quantize()` signature to use keyword-only parameters (`method`, `score_size`, `checkpoint`, `constraints`, `quantization_formats`, `disabled_layers`, `kv_cache_qformat`). Updates internal `mtq.auto_quantize()` call to pass resolved parameters and conditionally enable KV cache quantization based on resolved `qformat`.
Recipe-Aware Calibration Setup `examples/llm_ptq/hf_ptq.py`	Expands recipe imports, extends `make_calib_dataloader()` with optional `recipe` parameter, and adjusts `include_labels` logic to detect gradient-based auto-quantize via recipe type or CLI configuration.
Recipe-Driven Auto-Quantize Orchestration `examples/llm_ptq/hf_ptq.py`	Adds recipe loading in `quantize_main()` with validation forbidding `--auto_quantize_bits` + `--recipe` combination. Treats recipe-driven auto-quantize equivalent to CLI `--auto_quantize_bits` for batch-size probing. Implements unified parameter-resolution block that maps recipe candidate-format names back to canonical presets and calls refactored `auto_quantize()` with either recipe-resolved or CLI-derived values. Updates `--recipe` help text clarifying KV cache configuration ownership.
AutoQuantize Recipe Tests `tests/unit/recipe/test_loader.py`	Adds `ModelOptAutoQuantizeRecipe` import and comprehensive test coverage for loading built-in and custom recipes, validating parsed fields (effective_bits, candidate formats, kv_cache), asserting defaults, matching against `modelopt.torch.quantization` presets, and testing error cases (missing section, insufficient candidates, out-of-range bits, optional kv_cache).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

sychen52
cjluo-nv
shengliangxu
yeyu-nvidia

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Add AutoQuantize recipe support to mtq.auto_quantize' accurately and clearly summarizes the main change: adding recipe support for auto-quantization, allowing users to configure autoquant via YAML recipes instead of only CLI flags.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	No security anti-patterns found: no unsafe torch.load/numpy.load/yaml.load, no hardcoded trust_remote_code=True, no eval/exec, no # nosec comments, no new non-permissive dependencies.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


copy-pr-bot · 2026-05-21T00:38:49Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

github-actions · 2026-05-21T00:41:57Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1523/
Built to branch `gh-pages` at 2026-05-21 00:41 UTC. Preview will be ready when the GitHub Pages deployment is complete.

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Line 1087: The conditional that chooses the auto-quantize branch incorrectly
treats falsy numeric values as "unset" — change the check so it explicitly tests
presence of the CLI value: in the expression that currently reads "if
isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits",
replace the truthy check with an explicit presence check for
args.auto_quantize_bits (e.g., use "args.auto_quantize_bits is not None") so
that values like 0 or 0.0 are honored; keep the ModelOptAutoQuantizeRecipe
isinstance check as-is.

In `@modelopt/recipe/config.py`:
- Around line 112-117: The qformat field currently accepts any string but should
be validated against the allowed keys; update the ModeloptField declaration for
qformat (and/or add a pydantic validator on the recipe class handling kv_cache)
to reject values not in KV_QUANT_CFG_CHOICES or the literal 'none' (allowing
None), raising a clear schema/validation error at recipe-load time instead of
allowing a later KeyError; ensure you reference the qformat field,
ModeloptField, and KV_QUANT_CFG_CHOICES when implementing the check so invalid
inputs are caught early.

In `@tests/unit/recipe/test_loader.py`:
- Around line 286-293: The test contains a function-local import "import
modelopt.torch.quantization as mtq" inside
test_load_recipe_autoquantize_candidates_match_presets; move that import to
module scope (top of tests/unit/recipe/test_loader.py) so mtq is imported during
collection rather than inside the test, then remove the local import from the
function and leave the assertions using mtq.NVFP4_DEFAULT_CFG and
mtq.FP8_DEFAULT_CFG unchanged.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 48396567-3825-425a-b877-f63b60bb6545

📥 Commits

Reviewing files that changed from the base of the PR and between c9098b6 and 75e16d2.

📒 Files selected for processing (4)

examples/llm_ptq/hf_ptq.py
modelopt/recipe/config.py
modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml
tests/unit/recipe/test_loader.py

coderabbitai · 2026-05-21T00:43:08Z

+    # All auto_quantize() knobs are resolved here before calling the helper.
+    # Helper is a leaf orchestrator — it does not know whether inputs came from
+    # CLI args or a recipe.
+    if isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits:


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use explicit is not None for --auto_quantize_bits gating.

Line 1087 uses a truthy check, so --auto_quantize_bits 0.0 skips auto-quantize and silently takes the mono-quantization path.

Proposed fix

-    if isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits:
+    if isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits is not None:
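
The difference between the two checks is easy to see in isolation. The function names below are hypothetical and the recipe half of the condition is left out; only the handling of the CLI value is shown.

```python
def gate_truthy(auto_quantize_bits):
    """Original style: any falsy value (None, 0, 0.0) skips the branch."""
    return bool(auto_quantize_bits)


def gate_explicit(auto_quantize_bits):
    """Proposed style: only a missing value (None) skips the branch."""
    return auto_quantize_bits is not None


for bits in (None, 0.0, 4.8):
    print(repr(bits), gate_truthy(bits), gate_explicit(bits))
```

With the explicit check, a value of `0.0` still reaches the auto-quantize branch, so any downstream range validation gets the chance to report it instead of the run quietly taking a different path.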

coderabbitai · 2026-05-21T00:43:08Z

+    qformat: str | None = ModeloptField(
+        default=None,
+        title="KV cache quantization format",
+        description="One of the entries in KV_QUANT_CFG_CHOICES, or 'none' to disable. "
+        "If omitted, the runtime --kv_cache_qformat CLI flag is used.",
+    )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate kv_cache.qformat at recipe-load time.

Line 112 accepts any string, but downstream lookup expects a fixed set of keys; invalid values will fail later as a KeyError instead of a schema error.

Proposed fix

 class AutoQuantizeKVCache(ModeloptBaseConfig):
     """KV-cache configuration for an AutoQuantize recipe (optional)."""
 
+    _SUPPORTED_QFORMATS = {
+        "none",
+        "fp8_cast",
+        "fp8",
+        "fp8_affine",
+        "nvfp4_cast",
+        "nvfp4",
+        "nvfp4_affine",
+        "nvfp4_rotate",
+    }
+
     qformat: str | None = ModeloptField(
         default=None,
         title="KV cache quantization format",
         description="One of the entries in KV_QUANT_CFG_CHOICES, or 'none' to disable. "
         "If omitted, the runtime --kv_cache_qformat CLI flag is used.",
     )
+
+    @field_validator("qformat")
+    @classmethod
+    def _validate_qformat(cls, v: str | None) -> str | None:
+        if v is not None and v not in cls._SUPPORTED_QFORMATS:
+            raise ValueError(f"Unsupported kv_cache.qformat: {v}")
+        return v
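
The same load-time check can be sketched without Pydantic. The set of names is copied from the proposed fix above and has not been checked against `KV_QUANT_CFG_CHOICES`; the constant and function names are hypothetical. A module-level constant is used here because Pydantic treats underscore-prefixed class attributes as private attributes, so whether `cls._SUPPORTED_QFORMATS` behaves as a plain set inside the model is worth verifying before adopting the diff as written.

```python
from typing import Optional

# Copied from the review suggestion; treat as illustrative, not authoritative.
SUPPORTED_KV_QFORMATS = frozenset(
    {
        "none",
        "fp8_cast",
        "fp8",
        "fp8_affine",
        "nvfp4_cast",
        "nvfp4",
        "nvfp4_affine",
        "nvfp4_rotate",
    }
)


def validate_kv_qformat(value: Optional[str]) -> Optional[str]:
    """Reject unknown formats up front instead of failing later at lookup time."""
    if value is not None and value not in SUPPORTED_KV_QFORMATS:
        raise ValueError(
            f"Unsupported kv_cache.qformat: {value!r}; "
            f"expected one of {sorted(SUPPORTED_KV_QFORMATS)}"
        )
    return value


print(validate_kv_qformat("fp8_cast"))
print(validate_kv_qformat(None))
```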

coderabbitai · 2026-05-21T00:43:08Z

+def test_load_recipe_autoquantize_candidates_match_presets():
+    """Built-in AutoQuantize recipe's $imported candidates equal mtq.X_DEFAULT_CFG dicts."""
+    import modelopt.torch.quantization as mtq
+
+    recipe = load_recipe("general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast")
+    candidates = recipe.auto_quantize.candidate_formats
+    assert candidates[0].model_dump(exclude_unset=True) == mtq.NVFP4_DEFAULT_CFG
+    assert candidates[1].model_dump(exclude_unset=True) == mtq.FP8_DEFAULT_CFG


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move the new in-test import to module scope.

Line 288 introduces a function-local import without a justification comment. In this test suite, imports should be at file top so failures surface during collection, not mid-test.

Proposed fix

 import pytest
 
+import modelopt.torch.quantization as mtq
 from modelopt.recipe.config import (
     ModelOptAutoQuantizeRecipe,
@@
 def test_load_recipe_autoquantize_candidates_match_presets():
     """Built-in AutoQuantize recipe's $imported candidates equal mtq.X_DEFAULT_CFG dicts."""
-    import modelopt.torch.quantization as mtq
-
     recipe = load_recipe("general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast")

As per coding guidelines: “Imports inside functions or test methods without explicit justification… Imports belong at the top of the file…”.

codecov · 2026-05-21T00:51:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.69%. Comparing base (c9098b6) to head (75e16d2).

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1523       +/-   ##
===========================================
+ Coverage   66.36%   76.69%   +10.33%     
===========================================
  Files         476      476               
  Lines       51811    51838       +27     
===========================================
+ Hits        34384    39759     +5375     
+ Misses      17427    12079     -5348

Flag	Coverage Δ
examples	`41.65% <85.71%> (+0.92%)`	⬆️
gpu	`59.52% <85.71%> (+32.36%)`	⬆️
unit	`52.65% <100.00%> (+0.02%)`	⬆️


wip: autoquant recipe schema + hf_ptq dispatch

75e16d2

Signed-off-by: Juhi Mittal <juhim@nvidia.com>

juhi10071998 requested review from a team as code owners May 21, 2026 00:38

juhi10071998 requested a review from realAsma May 21, 2026 00:38

juhi10071998 marked this pull request as draft May 21, 2026 00:38

coderabbitai Bot reviewed May 21, 2026

juhi10071998 wants to merge 1 commit into main from juhim/autoquant-recipe
