Add AutoQuantize recipe support to mtq.auto_quantize #1523

Draft
juhi10071998 wants to merge 1 commit into main from juhim/autoquant-recipe

@juhi10071998 juhi10071998 commented May 21, 2026

Add AutoQuantize YAML-based recipe support to mtq.auto_quantize

What does this PR do?

Type of change: New feature.

Extends the recipe system (PR #1423) to support mtq.auto_quantize. Users
can now run autoquant via a single --recipe <name> flag instead of
combining --auto_quantize_bits, --qformat, --auto_quantize_method,
etc. The recipe carries the full search spec (candidate formats, budget,
scoring method, KV cache scheme) as typed YAML.

Mirrors the existing PTQ recipe pattern (PR #1423): the recipe is authoritative
for the search; CLI flags supply runtime concerns (dataset, calib size,
batch size).

Usage

# Before:
python hf_ptq.py --pyt_ckpt_path Qwen/Qwen3-8B \
    --qformat nvfp4,fp8 --auto_quantize_bits 4.8 \
    --auto_quantize_method gradient --kv_cache_qformat fp8_cast \
    --calib_size 512 --export_path ./out

# After:
python hf_ptq.py --pyt_ckpt_path Qwen/Qwen3-8B \
    --recipe general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast \
    --calib_size 512 --export_path ./out

Example recipe (modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml):

imports:
  nvfp4: configs/ptq/presets/model/nvfp4   # canonical preset (no duplication)
  fp8: configs/ptq/presets/model/fp8

metadata:
  recipe_type: auto_quantize

auto_quantize:
  constraints:
    effective_bits: 4.8
  candidate_formats:
    - $import: nvfp4
    - $import: fp8
  kv_cache:
    qformat: fp8_cast
  method: gradient
  num_score_steps: 128
  disabled_layers:
    - "*lm_head*"
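
The `$import` entries above can be read as references into the `imports` table. A minimal sketch of that resolution step follows. It assumes a hypothetical `resolve_imports` helper and an in-memory `PRESETS` mapping standing in for the preset files; neither name comes from this PR, and the preset contents are placeholders.

```python
from copy import deepcopy

# Hypothetical stand-ins for the preset files named under `imports:`.
# The real presets are separate files; these dict contents are placeholders.
PRESETS = {
    "configs/ptq/presets/model/nvfp4": {"name": "nvfp4-placeholder"},
    "configs/ptq/presets/model/fp8": {"name": "fp8-placeholder"},
}


def resolve_imports(recipe: dict, presets: dict) -> dict:
    """Return a copy of `recipe` with each {"$import": alias} candidate replaced by its preset."""
    aliases = recipe.get("imports", {})
    resolved = deepcopy(recipe)
    candidates = []
    for entry in recipe["auto_quantize"]["candidate_formats"]:
        if isinstance(entry, dict) and "$import" in entry:
            alias = entry["$import"]
            if alias not in aliases:
                raise KeyError(f"unknown import alias: {alias!r}")
            # Copy so that two recipes never share one mutable preset dict.
            candidates.append(deepcopy(presets[aliases[alias]]))
        else:
            candidates.append(deepcopy(entry))
    resolved["auto_quantize"]["candidate_formats"] = candidates
    return resolved


recipe = {
    "imports": {
        "nvfp4": "configs/ptq/presets/model/nvfp4",
        "fp8": "configs/ptq/presets/model/fp8",
    },
    "auto_quantize": {
        "constraints": {"effective_bits": 4.8},
        "candidate_formats": [{"$import": "nvfp4"}, {"$import": "fp8"}],
    },
}
resolved = resolve_imports(recipe, PRESETS)
```

Resolving by alias keeps each preset defined in one place, which is the property the "no duplication" comment in the recipe points at.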

Key design points

| # | Decision | Choice |
|---|---|---|
| 1 | constraints shape | Mirror upstream mtq.auto_quantize nested dict exactly: zero-transformation dispatch via .model_dump(exclude_none=True). Future-compat with PR #1497 (cost models). |
| 2 | KV cache placement | Top-level optional kv_cache.qformat field, not per-candidate $import (avoids duplication when KV is shared across candidates). |
| 3 | CLI override policy | Recipe is strict-authoritative for LP search fields (effective_bits, candidates, etc.). CLI may fall back only for orthogonal post-step fields; today only kv_cache.qformat. --auto_quantize_bits + --recipe errors out explicitly. |
| 4 | auto_quantize() helper layout | Helper is a leaf orchestrator: it does not know whether inputs came from CLI or recipe. All resolution happens at the dispatch site in quantize_main. |
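
The override policy in the design points can be sketched as a small resolution function. This is an illustration only: `resolve_autoquant_inputs`, the dict-shaped recipe section, and the returned keys are hypothetical and are not the code in hf_ptq.py.

```python
from types import SimpleNamespace


def resolve_autoquant_inputs(args, recipe_aq):
    """Pick auto-quantize settings from a recipe or from CLI args, never both.

    `args` mimics an argparse namespace; `recipe_aq` is a dict-like
    `auto_quantize` section or None. Both shapes are illustrative.
    """
    if recipe_aq is not None:
        if args.auto_quantize_bits is not None:
            # The search budget must come from exactly one source.
            raise ValueError("--auto_quantize_bits cannot be combined with --recipe")
        kv = (recipe_aq.get("kv_cache") or {}).get("qformat")
        return {
            "constraints": recipe_aq["constraints"],
            "method": recipe_aq.get("method", "gradient"),
            # kv_cache.qformat is the one field that may fall back to the CLI.
            "kv_cache_qformat": kv if kv is not None else args.kv_cache_qformat,
        }
    return {
        "constraints": {"effective_bits": args.auto_quantize_bits},
        "method": args.auto_quantize_method,
        "kv_cache_qformat": args.kv_cache_qformat,
    }


cli = SimpleNamespace(
    auto_quantize_bits=None,
    auto_quantize_method="gradient",
    kv_cache_qformat="fp8_cast",
)
recipe_section = {"constraints": {"effective_bits": 4.8}, "method": "gradient"}
resolved = resolve_autoquant_inputs(cli, recipe_section)
```

Here the recipe omits `kv_cache`, so the CLI value is used for that field alone, while the search constraints come from the recipe.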

Testing

Unit tests (tests/unit/recipe/test_loader.py, 7 tests):

  • Built-in recipe loads, type dispatch is correct
  • Pydantic defaults applied (method=gradient, num_score_steps=128, score_checkpoint=None)
  • $imported candidates byte-identical to mtq.NVFP4_DEFAULT_CFG / FP8_DEFAULT_CFG (single source of truth)
  • Schema validation rejects: missing auto_quantize section, <2 candidates, effective_bits outside (0, 16]
  • kv_cache field is optional
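
The validation rules listed above can be illustrated with a stdlib-only sketch. `AutoQuantizeSketch` is a hypothetical dataclass standing in for the PR's Pydantic models, and `non_null_fields` only approximates what `.model_dump(exclude_none=True)` does; the field names follow the recipe, the rest is assumed.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AutoQuantizeSketch:
    """Stdlib stand-in for the recipe's `auto_quantize` section (not the PR's model)."""

    effective_bits: float
    candidate_formats: list
    method: str = "gradient"
    num_score_steps: int = 128
    score_checkpoint: Optional[str] = None
    kv_cache_qformat: Optional[str] = None  # optional, as in the recipe

    def __post_init__(self):
        # Reject budgets outside (0, 16], the range named in the test list.
        if not 0 < self.effective_bits <= 16:
            raise ValueError("effective_bits must be in (0, 16]")
        # A search needs at least two formats to choose between.
        if len(self.candidate_formats) < 2:
            raise ValueError("need at least two candidate_formats")

    def non_null_fields(self) -> dict:
        """Drop None values, similar in spirit to model_dump(exclude_none=True)."""
        return {k: v for k, v in asdict(self).items() if v is not None}


cfg = AutoQuantizeSketch(effective_bits=4.8, candidate_formats=["nvfp4", "fp8"])
```

Constructing the object with a single candidate or with `effective_bits=17.0` raises `ValueError` at load time rather than during the search.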

Equivalence smoke test on Qwen/Qwen3-8B at --calib_size 512:

                            CLI (--auto_quantize_bits 6.0)   Recipe (effective_bits: 6.0)
quant_algo                  MIXED_PRECISION                  MIXED_PRECISION
kv_cache_quant_algo         FP8                              FP8
Total quantized layers      252                              252
NVFP4 layers                157                              157
FP8 layers                  95                               95
hf_quant_config.json hash   4b0564bf1f613132                 4b0564bf1f613132

hf_quant_config.json is byte-identical between the two paths.
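
The table reports a 16-character hash without saying how it was computed. One way to run a comparable byte-level check is sketched below, assuming SHA-256 truncated to 16 hex characters; the `short_file_hash` helper, the file names, and the JSON content are placeholders, not artifacts of this PR.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def short_file_hash(path, length: int = 16) -> str:
    """Hex SHA-256 of the file's raw bytes, truncated to `length` characters."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()[:length]


# Placeholder content; a real export contains the full quantization config.
config = {"quant_algo": "MIXED_PRECISION", "kv_cache_quant_algo": "FP8"}

# Two temporary files stand in for the CLI export and the recipe export.
with tempfile.TemporaryDirectory() as tmp:
    cli_path = Path(tmp) / "cli_hf_quant_config.json"
    recipe_path = Path(tmp) / "recipe_hf_quant_config.json"
    for path in (cli_path, recipe_path):
        path.write_text(json.dumps(config, indent=4))
    cli_hash = short_file_hash(cli_path)
    recipe_hash = short_file_hash(recipe_path)
```

Hashing raw bytes is stricter than comparing parsed JSON: a difference in key order or whitespace would change the hash even if the parsed content were equal.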

Backward compatibility

✅ Yes. The three existing flows are preserved, and one new flow is added:

| Flow | Path | Status |
|---|---|---|
| CLI PTQ (--qformat nvfp4) | unchanged | |
| CLI autoquant (--auto_quantize_bits 4.8) | dispatch site resolves args inline; helper is pure orchestrator now (no behavior change) | |
| PTQ recipe (--recipe general/ptq/...) | recipe-load gate widened to accept PTQ + AutoQuantize | |
| AutoQuantize recipe (NEW) | new dispatch branch | |

One new explicit error: passing --auto_quantize_bits together with --recipe now fails fast with a clear message (previously the recipe was silently honored).

Files changed

  • modelopt/recipe/config.py — Pydantic schema (AutoQuantizeConfig, etc.) + RecipeType.AUTO_QUANTIZE enum + dispatch entry
  • examples/llm_ptq/hf_ptq.py — dispatch site resolves recipe/CLI knobs and passes them to auto_quantize() as kw-only kwargs; helper signature is pure value-driven
  • modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml — example recipe
  • tests/unit/recipe/test_loader.py — 7 unit tests

Checklist

  • Backward compatible
  • Signed commits (git commit -s -S)
  • Pre-commit clean
  • New unit tests added
  • CHANGELOG (n/a — new feature, but maintainer to decide)
  • Claude review (/claude review)

Summary by CodeRabbit

  • New Features

    • Auto-quantization now supports recipe-based configuration, allowing users to define quantization strategies declaratively.
    • New example recipe for mixed-precision quantization targeting 4.8 effective bits.
  • Tests

    • Added comprehensive tests for recipe loading and validation.

Signed-off-by: Juhi Mittal <juhim@nvidia.com>
@juhi10071998 juhi10071998 requested review from a team as code owners May 21, 2026 00:38
@juhi10071998 juhi10071998 requested a review from realAsma May 21, 2026 00:38

coderabbitai Bot commented May 21, 2026

📝 Walkthrough

This PR introduces AutoQuantize recipe support, enabling users to define auto-quantization configurations declaratively in recipe files. It adds new Pydantic schema models for recipe validation, refactors the auto_quantize function to accept explicit parameters, integrates recipe loading with fail-fast validation, and provides test coverage and example recipes.

Changes

AutoQuantize Recipe Feature

| Layer / File(s) | Summary |
|---|---|
| AutoQuantize Recipe Schema and Types (modelopt/recipe/config.py) | Introduces RecipeType.AUTO_QUANTIZE enum value and defines AutoQuantizeKVCache, AutoQuantizeConstraints, AutoQuantizeConfig, and ModelOptAutoQuantizeRecipe Pydantic models with validators enforcing effective_bits range (between 1 and 8) and requiring at least two candidate_formats. Updates RECIPE_TYPE_TO_CLASS mapping. |
| AutoQuantize Example Recipe (modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml) | Adds a concrete NVFP4+FP8 mixed-precision recipe configured for 4.8 effective bits, gradient-based scoring with 128 steps, FP8 cast KV cache mode, and disabled *lm_head* layer pattern. |
| Auto-Quantize Function Refactoring (examples/llm_ptq/hf_ptq.py) | Refactors auto_quantize() signature to use keyword-only parameters (method, score_size, checkpoint, constraints, quantization_formats, disabled_layers, kv_cache_qformat). Updates internal mtq.auto_quantize() call to pass resolved parameters and conditionally enable KV cache quantization based on resolved qformat. |
| Recipe-Aware Calibration Setup (examples/llm_ptq/hf_ptq.py) | Expands recipe imports, extends make_calib_dataloader() with optional recipe parameter, and adjusts include_labels logic to detect gradient-based auto-quantize via recipe type or CLI configuration. |
| Recipe-Driven Auto-Quantize Orchestration (examples/llm_ptq/hf_ptq.py) | Adds recipe loading in quantize_main() with validation forbidding --auto_quantize_bits + --recipe combination. Treats recipe-driven auto-quantize equivalent to CLI --auto_quantize_bits for batch-size probing. Implements unified parameter-resolution block that maps recipe candidate-format names back to canonical presets and calls refactored auto_quantize() with either recipe-resolved or CLI-derived values. Updates --recipe help text clarifying KV cache configuration ownership. |
| AutoQuantize Recipe Tests (tests/unit/recipe/test_loader.py) | Adds ModelOptAutoQuantizeRecipe import and comprehensive test coverage for loading built-in and custom recipes, validating parsed fields (effective_bits, candidate formats, kv_cache), asserting defaults, matching against modelopt.torch.quantization presets, and testing error cases (missing section, insufficient candidates, out-of-range bits, optional kv_cache). |
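
The keyword-only refactoring described in the walkthrough can be pictured with a stub. The parameter names are the ones listed above; the positional `model` argument, the function name `auto_quantize_sketch`, and the returned dict are assumptions made for illustration, and the stub performs no quantization.

```python
def auto_quantize_sketch(
    model,
    *,
    method,
    score_size,
    checkpoint,
    constraints,
    quantization_formats,
    disabled_layers,
    kv_cache_qformat,
):
    """Illustrative stand-in: returns the resolved settings instead of quantizing."""
    return {
        "method": method,
        "score_size": score_size,
        "checkpoint": checkpoint,
        "constraints": constraints,
        "quantization_formats": list(quantization_formats),
        "disabled_layers": list(disabled_layers),
        # KV cache quantization is enabled only for a real format name.
        "quantize_kv_cache": kv_cache_qformat not in (None, "none"),
    }


settings = auto_quantize_sketch(
    object(),  # placeholder for the model
    method="gradient",
    score_size=128,
    checkpoint=None,
    constraints={"effective_bits": 4.8},
    quantization_formats=["nvfp4", "fp8"],
    disabled_layers=["*lm_head*"],
    kv_cache_qformat="fp8_cast",
)
```

Because every setting after `*` must be passed by name, the caller has to resolve each value explicitly, and a call that passes settings positionally raises `TypeError`.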

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • sychen52
  • cjluo-nv
  • shengliangxu
  • yeyu-nvidia
🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Add AutoQuantize recipe support to mtq.auto_quantize' accurately and clearly summarizes the main change: adding recipe support for auto-quantization, allowing users to configure autoquant via YAML recipes instead of only CLI flags. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | No security anti-patterns found: no unsafe torch.load/numpy.load/yaml.load, no hardcoded trust_remote_code=True, no eval/exec, no # nosec comments, no new non-permissive dependencies. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@juhi10071998 juhi10071998 marked this pull request as draft May 21, 2026 00:38

copy-pr-bot Bot commented May 21, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

PR Preview Action v1.8.1

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1523/

Built to branch gh-pages at 2026-05-21 00:41 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@coderabbitai coderabbitai Bot left a comment

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Line 1087: The conditional that chooses the auto-quantize branch incorrectly
treats falsy numeric values as "unset" — change the check so it explicitly tests
presence of the CLI value: in the expression that currently reads "if
isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits",
replace the truthy check with an explicit presence check for
args.auto_quantize_bits (e.g., use "args.auto_quantize_bits is not None") so
that values like 0 or 0.0 are honored; keep the ModelOptAutoQuantizeRecipe
isinstance check as-is.

In `@modelopt/recipe/config.py`:
- Around line 112-117: The qformat field currently accepts any string but should
be validated against the allowed keys; update the ModeloptField declaration for
qformat (and/or add a pydantic validator on the recipe class handling kv_cache)
to reject values not in KV_QUANT_CFG_CHOICES or the literal 'none' (allowing
None), raising a clear schema/validation error at recipe-load time instead of
allowing a later KeyError; ensure you reference the qformat field,
ModeloptField, and KV_QUANT_CFG_CHOICES when implementing the check so invalid
inputs are caught early.

In `@tests/unit/recipe/test_loader.py`:
- Around line 286-293: The test contains a function-local import "import
modelopt.torch.quantization as mtq" inside
test_load_recipe_autoquantize_candidates_match_presets; move that import to
module scope (top of tests/unit/recipe/test_loader.py) so mtq is imported during
collection rather than inside the test, then remove the local import from the
function and leave the assertions using mtq.NVFP4_DEFAULT_CFG and
mtq.FP8_DEFAULT_CFG unchanged.

📥 Commits

Reviewing files that changed from the base of the PR and between c9098b6 and 75e16d2.

📒 Files selected for processing (4)
  • examples/llm_ptq/hf_ptq.py
  • modelopt/recipe/config.py
  • modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml
  • tests/unit/recipe/test_loader.py

# All auto_quantize() knobs are resolved here before calling the helper.
# Helper is a leaf orchestrator — it does not know whether inputs came from
# CLI args or a recipe.
if isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits:

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use explicit is not None for --auto_quantize_bits gating.

Line 1087 uses a truthy check, so --auto_quantize_bits 0.0 skips auto-quantize and silently takes the mono-quantization path.

Proposed fix
-    if isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits:
+    if isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits is not None:
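
The difference between the two checks in the proposed fix shows up only for falsy numeric values. A small sketch, using hypothetical helper names and a `SimpleNamespace` in place of the parsed CLI arguments:

```python
from types import SimpleNamespace


def takes_autoquant_branch_truthy(args) -> bool:
    """Original style: any falsy value, including 0.0, looks like 'unset'."""
    return bool(args.auto_quantize_bits)


def takes_autoquant_branch_explicit(args) -> bool:
    """Proposed style: only a missing value (None) counts as 'unset'."""
    return args.auto_quantize_bits is not None


unset = SimpleNamespace(auto_quantize_bits=None)
zero = SimpleNamespace(auto_quantize_bits=0.0)
typical = SimpleNamespace(auto_quantize_bits=4.8)
```

Both helpers agree for `unset` and `typical`; they disagree for `zero`, where only the explicit check routes the call into the auto-quantize branch so that downstream validation can reject the value.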

Comment thread modelopt/recipe/config.py
Comment on lines +112 to +117
    qformat: str | None = ModeloptField(
        default=None,
        title="KV cache quantization format",
        description="One of the entries in KV_QUANT_CFG_CHOICES, or 'none' to disable. "
        "If omitted, the runtime --kv_cache_qformat CLI flag is used.",
    )

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate kv_cache.qformat at recipe-load time.

Line 112 accepts any string, but downstream lookup expects a fixed set of keys; invalid values will fail later as a KeyError instead of a schema error.

Proposed fix
 class AutoQuantizeKVCache(ModeloptBaseConfig):
     """KV-cache configuration for an AutoQuantize recipe (optional)."""
 
+    _SUPPORTED_QFORMATS = {
+        "none",
+        "fp8_cast",
+        "fp8",
+        "fp8_affine",
+        "nvfp4_cast",
+        "nvfp4",
+        "nvfp4_affine",
+        "nvfp4_rotate",
+    }
+
     qformat: str | None = ModeloptField(
         default=None,
         title="KV cache quantization format",
         description="One of the entries in KV_QUANT_CFG_CHOICES, or 'none' to disable. "
         "If omitted, the runtime --kv_cache_qformat CLI flag is used.",
     )
+
+    @field_validator("qformat")
+    @classmethod
+    def _validate_qformat(cls, v: str | None) -> str | None:
+        if v is not None and v not in cls._SUPPORTED_QFORMATS:
+            raise ValueError(f"Unsupported kv_cache.qformat: {v}")
+        return v
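
The same early check can be sketched without any model class. The allowed names below are copied from the proposed fix; the constant and function names are hypothetical, and the real source of truth would be KV_QUANT_CFG_CHOICES, which this sketch does not import.

```python
# Allowed names copied from the proposed fix; treat the list as an assumption.
SUPPORTED_KV_QFORMATS = frozenset(
    {
        "none",
        "fp8_cast",
        "fp8",
        "fp8_affine",
        "nvfp4_cast",
        "nvfp4",
        "nvfp4_affine",
        "nvfp4_rotate",
    }
)


def validate_kv_qformat(value):
    """Return `value` unchanged, or raise ValueError for an unknown format name."""
    if value is not None and value not in SUPPORTED_KV_QFORMATS:
        allowed = ", ".join(sorted(SUPPORTED_KV_QFORMATS))
        raise ValueError(f"Unsupported kv_cache.qformat: {value!r} (allowed: {allowed})")
    return value


checked = validate_kv_qformat("fp8_cast")
```

Keeping the allowed set in a module-level constant also sidesteps any question of how the model base class treats underscore-prefixed class attributes.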

Comment on lines +286 to +293
def test_load_recipe_autoquantize_candidates_match_presets():
    """Built-in AutoQuantize recipe's $imported candidates equal mtq.X_DEFAULT_CFG dicts."""
    import modelopt.torch.quantization as mtq

    recipe = load_recipe("general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast")
    candidates = recipe.auto_quantize.candidate_formats
    assert candidates[0].model_dump(exclude_unset=True) == mtq.NVFP4_DEFAULT_CFG
    assert candidates[1].model_dump(exclude_unset=True) == mtq.FP8_DEFAULT_CFG

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move the new in-test import to module scope.

Line 288 introduces a function-local import without a justification comment. In this test suite, imports should be at file top so failures surface during collection, not mid-test.

Proposed fix
 import pytest
+import modelopt.torch.quantization as mtq
 
 from modelopt.recipe.config import (
     ModelOptAutoQuantizeRecipe,
@@
 def test_load_recipe_autoquantize_candidates_match_presets():
     """Built-in AutoQuantize recipe's $imported candidates equal mtq.X_DEFAULT_CFG dicts."""
-    import modelopt.torch.quantization as mtq
-
     recipe = load_recipe("general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast")

As per coding guidelines: “Imports inside functions or test methods without explicit justification… Imports belong at the top of the file…”.

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.69%. Comparing base (c9098b6) to head (75e16d2).

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1523       +/-   ##
===========================================
+ Coverage   66.36%   76.69%   +10.33%     
===========================================
  Files         476      476               
  Lines       51811    51838       +27     
===========================================
+ Hits        34384    39759     +5375     
+ Misses      17427    12079     -5348     
Flag Coverage Δ
examples 41.65% <85.71%> (+0.92%) ⬆️
gpu 59.52% <85.71%> (+32.36%) ⬆️
unit 52.65% <100.00%> (+0.02%) ⬆️
