Add ResNet50 support for torch_onnx quantization workflow #1263
Conversation
Add end-to-end support for ResNet50 (a Conv2d-heavy model) in the torch_onnx quantization → ONNX export → TRT engine pipeline. Key fixes for Conv2d-heavy models:

- Disable FP8 Conv2d weight quantizers during ONNX export to avoid the TorchScript exporter's "kernel of unknown shape" error (FP8 DequantizeLinear produces dynamic-shape outputs incompatible with Conv2d's static kernel requirement)
- Disable autocast for FP8/INT8 quantized models during export (prevents dynamic-shape kernels from autocast-induced FP16 casting)
- Fix configure_linear_module_onnx_quantizers to handle all modules with block quantization (not just nn.Linear), fixing NVFP4/MXFP8 export for models with quantized non-Linear modules like MaxPool2d
- Add a calibration step for FP8 override quantizers that aren't calibrated by mtq.quantize() in MXFP8/NVFP4 modes
- Override Conv2d block quantizers to FP8 in auto mode for TRT compatibility
- Add maxpool and global_pool to filter_func (TRT DynamicQuantize requires 2D/3D input, but pooling layers operate on 4D tensors)
- Always load calibration data (MXFP8 Conv2d FP8 overrides need it)

Signed-off-by: ajrasane <arasane@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
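The first fix above follows a common pattern: temporarily switch off selected quantizers for the duration of the export, then restore them. A minimal sketch of that pattern follows; the `ToyQuantizer` class and its `disable()`/`enable()` methods are illustrative stand-ins, not the actual `_disable_fp8_conv_weight_quantizers` implementation.

```python
import contextlib


class ToyQuantizer:
    """Hypothetical stand-in for a TensorQuantizer-like object."""

    def __init__(self, name):
        self.name = name
        self.enabled = True

    def disable(self):
        self.enabled = False

    def enable(self):
        self.enabled = True


@contextlib.contextmanager
def disable_quantizers_temporarily(quantizers, should_disable):
    """Disable selected quantizers for the duration of the block, then restore them."""
    touched = [q for q in quantizers if q.enabled and should_disable(q)]
    for q in touched:
        q.disable()
    try:
        yield touched
    finally:
        # Restore only the quantizers this context manager disabled itself.
        for q in touched:
            q.enable()


quantizers = [ToyQuantizer("conv1.weight_quantizer"), ToyQuantizer("fc.weight_quantizer")]
with disable_quantizers_temporarily(quantizers, lambda q: "conv" in q.name) as off:
    states_inside = [q.enabled for q in quantizers]  # conv quantizer is off here
states_after = [q.enabled for q in quantizers]  # all restored after the block
```

Restoring in `finally` is the important part: the export can raise, and the model should not be left with quantizers disabled.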
📝 Walkthrough
Adds resnet50 to tests/docs; enhances the quantization/export pipeline to calibrate previously uncalibrated quantizers, override Conv2d block quantizers to FP8-like per-tensor settings for calibration, and temporarily disable FP8 Conv weight quantizers during ONNX export; adds a TensorRT build flag/step.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Script as Quantization Script
    participant Model as Model
    participant DL as DataLoader
    participant Quant as Quantizers
    participant Calib as Calibrator
    participant Export as ONNX Export
    Script->>Model: load model
    Script->>Model: quantize_model()/auto_quantize_model(data_loader)
    Model->>Quant: enumerate enabled quantizers
    Quant-->>Model: identify uncalibrated & Conv2d block quantizers
    Model->>Calib: _calibrate_uncalibrated_quantizers(data_loader)
    activate Calib
    Calib->>DL: iterate samples for calibration
    Calib->>Calib: enable/disable calib, load calib_amax
    deactivate Calib
    Model->>Model: _override_conv2d_to_fp8(data_loader) if Conv2d block quantizers found
    Model->>Calib: calibrate overridden Conv2d quantizers (forward loop)
    Script->>Export: get_onnx_bytes_and_metadata(model)
    Export->>Export: enter inference/autocast/quantizer contexts
    Export->>Export: enter _disable_fp8_conv_weight_quantizers() if FP8 present
    Export->>Model: export to ONNX
    Export-->>Script: return ONNX bytes + metadata
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 2
🧹 Nitpick comments (3)
examples/torch_onnx/torch_quant_to_onnx.py (2)
164-164: Consider using the public `amax` property instead of checking the internal attribute. The code uses `hasattr(quantizer, "_amax")` to detect uncalibrated quantizers. Based on the `TensorQuantizer` implementation, the public `amax` property returns `None` when `_amax` is not set. Using `quantizer.amax is None` would be more aligned with the public API.
♻️ Suggested fix

```diff
- if quantizer.is_enabled and not quantizer.block_sizes and not hasattr(quantizer, "_amax"):
+ if quantizer.is_enabled and not quantizer.block_sizes and quantizer.amax is None:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/torch_onnx/torch_quant_to_onnx.py` at line 164, Replace the internal attribute check on quantizer with the public property: instead of checking hasattr(quantizer, "_amax") in the conditional that currently reads if quantizer.is_enabled and not quantizer.block_sizes and not hasattr(quantizer, "_amax"): use quantizer.amax is None to detect uncalibrated TensorQuantizer instances; update the condition to if quantizer.is_enabled and not quantizer.block_sizes and quantizer.amax is None so it uses the public TensorQuantizer API.
239-241: Prefer public property setters over internal attribute assignment. The code directly assigns to `_num_bits` and `_axis`, which are internal attributes. Based on the relevant code snippet from `tensor_quantizer.py`, these have public property setters (`num_bits` and `axis`) that also update the calibrator state. Using the internal attributes may leave the calibrator out of sync.
♻️ Suggested fix

```diff
  # Override to FP8 per-tensor
  quantizer.block_sizes = None
- quantizer._num_bits = (4, 3)
- quantizer._axis = None
+ quantizer.num_bits = (4, 3)
+ quantizer.axis = None
  quantizer.enable_calib()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/torch_onnx/torch_quant_to_onnx.py` around lines 239 - 241, The code assigns internal attributes quantizer._num_bits and quantizer._axis directly which can leave the calibrator out of sync; replace those direct assignments with the public property setters quantizer.num_bits = (4, 3) and quantizer.axis = None (leave quantizer.block_sizes = None as-is) so the setter logic in tensor_quantizer.py runs and updates the calibrator state accordingly.

modelopt/torch/quantization/export_onnx.py (1)
657-674: Consider restoring quantizer state after export (optional). The context manager sets `_onnx_quantizer_type` but doesn't restore the original state in a `finally` block. While this is likely fine since models aren't typically reused after ONNX export, adding state restoration would make the context manager more robust for edge cases.
♻️ Optional: Add state restoration

```diff
 @contextlib.contextmanager
 def configure_linear_module_onnx_quantizers(model):
     """Sets the onnx export attributes for the given model. ... """
+    original_states = []
     for _, module in model.named_modules():
         if hasattr(module, "input_quantizer") and module.input_quantizer.block_sizes:
+            original_states.append((module.input_quantizer, "_onnx_quantizer_type", getattr(module.input_quantizer, "_onnx_quantizer_type", None)))
             module.input_quantizer._onnx_quantizer_type = "dynamic"
         if hasattr(module, "weight_quantizer") and module.weight_quantizer.block_sizes:
+            original_states.append((module.weight_quantizer, "_onnx_quantizer_type", getattr(module.weight_quantizer, "_onnx_quantizer_type", None)))
             module.weight_quantizer._onnx_quantizer_type = "static"
-    yield
+    try:
+        yield
+    finally:
+        for obj, attr, orig_val in original_states:
+            if orig_val is None:
+                if hasattr(obj, attr):
+                    delattr(obj, attr)
+            else:
+                setattr(obj, attr, orig_val)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/export_onnx.py` around lines 657 - 674, The context manager configure_linear_module_onnx_quantizers mutates module.input_quantizer._onnx_quantizer_type and module.weight_quantizer._onnx_quantizer_type but never restores original values; update configure_linear_module_onnx_quantizers to record the original _onnx_quantizer_type for each module found via model.named_modules() (checking hasattr(module, "input_quantizer") / "weight_quantizer" and block_sizes) and then yield, ensuring a finally block iterates the saved entries to reset each input_quantizer._onnx_quantizer_type and weight_quantizer._onnx_quantizer_type back to their original values (including handling missing/None originals).
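The save-and-restore pattern this suggestion describes can be shown generically. A minimal sketch, with illustrative names (this is not the modelopt API): the `_MISSING` sentinel distinguishes "attribute did not exist before" from "attribute was `None`", so restoration can delete attributes it created rather than leaving them behind.

```python
import contextlib

_MISSING = object()  # sentinel: attribute was absent before we set it


@contextlib.contextmanager
def set_attrs_temporarily(assignments):
    """assignments: list of (obj, attr_name, new_value). Restores originals on exit."""
    saved = []
    for obj, attr, value in assignments:
        saved.append((obj, attr, getattr(obj, attr, _MISSING)))
        setattr(obj, attr, value)
    try:
        yield
    finally:
        for obj, attr, original in reversed(saved):
            if original is _MISSING:
                delattr(obj, attr)  # attribute didn't exist before: remove it
            else:
                setattr(obj, attr, original)


class Q:
    pass


q = Q()
with set_attrs_temporarily([(q, "_onnx_quantizer_type", "static")]):
    inside = q._onnx_quantizer_type  # "static" while the context is active
after = hasattr(q, "_onnx_quantizer_type")  # False: the attribute was removed
```

Restoring in reverse order keeps nested or repeated assignments to the same attribute consistent.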
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/torch_onnx/torch_quant_to_onnx.py`:
- Around line 161-165: Reformat the if-block to satisfy ruff: keep the same
logic but use consistent spacing and line breaks (wrap the long conditional in
parentheses across lines) and standardize quotes; specifically, keep the initial
guard "if not hasattr(module, attr_name): continue", assign quantizer =
getattr(module, attr_name), then write the conditional as e.g. "if
(quantizer.is_enabled and not quantizer.block_sizes and not hasattr(quantizer,
'_amax')): quantizer.enable_calib()" (or break into multiple indented lines) so
spacing, parentheses, and quotation are ruff-compliant while preserving behavior
of module, attr_name, quantizer, .is_enabled, .block_sizes, ._amax and
enable_calib().
In `@modelopt/torch/_deploy/utils/torch_onnx.py`:
- Around line 587-593: The conditional assignment to conv_wq_context is
misformatted for the project's linter; reformat the expression using a single
parenthesized conditional expression: set conv_wq_context =
(_disable_fp8_conv_weight_quantizers(model) if is_fp8_quantized(model) else
nullcontext()), referencing the existing symbols
_disable_fp8_conv_weight_quantizers, is_fp8_quantized, nullcontext, and model so
the ternary is on one line inside parentheses and passes ruff formatting.
---
Nitpick comments:
In `@examples/torch_onnx/torch_quant_to_onnx.py`:
- Line 164: Replace the internal attribute check on quantizer with the public
property: instead of checking hasattr(quantizer, "_amax") in the conditional
that currently reads if quantizer.is_enabled and not quantizer.block_sizes and
not hasattr(quantizer, "_amax"): use quantizer.amax is None to detect
uncalibrated TensorQuantizer instances; update the condition to if
quantizer.is_enabled and not quantizer.block_sizes and quantizer.amax is None so
it uses the public TensorQuantizer API.
- Around line 239-241: The code assigns internal attributes quantizer._num_bits
and quantizer._axis directly which can leave the calibrator out of sync; replace
those direct assignments with the public property setters quantizer.num_bits =
(4, 3) and quantizer.axis = None (leave quantizer.block_sizes = None as-is) so
the setter logic in tensor_quantizer.py runs and updates the calibrator state
accordingly.
In `@modelopt/torch/quantization/export_onnx.py`:
- Around line 657-674: The context manager
configure_linear_module_onnx_quantizers mutates
module.input_quantizer._onnx_quantizer_type and
module.weight_quantizer._onnx_quantizer_type but never restores original values;
update configure_linear_module_onnx_quantizers to record the original
_onnx_quantizer_type for each module found via model.named_modules() (checking
hasattr(module, "input_quantizer") / "weight_quantizer" and block_sizes) and
then yield, ensuring a finally block iterates the saved entries to reset each
input_quantizer._onnx_quantizer_type and weight_quantizer._onnx_quantizer_type
back to their original values (including handling missing/None originals).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: cb1433e9-ae94-4122-80a6-bc7f9a108588
📒 Files selected for processing (6)
- examples/torch_onnx/README.md
- examples/torch_onnx/torch_quant_to_onnx.py
- modelopt/torch/_deploy/utils/torch_onnx.py
- modelopt/torch/quantization/export_onnx.py
- tests/_test_utils/torch/vision_models.py
- tests/examples/torch_onnx/test_torch_quant_to_onnx.py
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1263      +/-   ##
==========================================
- Coverage   76.90%   75.40%    -1.50%
==========================================
  Files         350      350
  Lines       40524    40539      +15
==========================================
- Hits        31166    30570     -596
- Misses       9358     9969     +611
```

Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
Reflect all supported quantization modes and Conv2d override behavior. Signed-off-by: ajrasane <arasane@nvidia.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/torch_onnx/torch_quant_to_onnx.py`:
- Around line 38-46: The docstring in torch_quant_to_onnx.py incorrectly groups
INT4_AWQ with fully supported quantization modes; update the top-level
description to either remove INT4_AWQ from the supported list or add a clear
caveat that INT4_AWQ is a known limitation and may not work end-to-end (e.g.,
"INT4_AWQ is experimental/limited — see PR objectives for current limitations"),
ensuring references to the script name and the quantization modes (FP8, INT8,
MXFP8, NVFP4, INT4_AWQ, AUTO) are adjusted so users won't assume INT4_AWQ is
fully supported.
- Around line 227-256: The override of Conv2d block quantizers to FP8 in
_override_conv2d_to_fp8 must not be applied after mtq.auto_quantize() because
mtq.auto_quantize() returns a search_state based on the candidate configs;
either incorporate the Conv2d->FP8/disable-block-quantization rule into the
candidate generation passed into mtq.auto_quantize() (so those Conv2d layers are
treated as FP8 during the search) or, if you must keep the override path,
recompute and validate the effective_bits/budget and update/refresh the returned
search_state after running _override_conv2d_to_fp8 so the final model’s budget
reflects FP8 costs for Conv2d (reference symbols: _override_conv2d_to_fp8,
mtq.auto_quantize, search_state).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 34fcb2fe-f485-45bd-96e4-df7cebf815fa
📒 Files selected for processing (1)
examples/torch_onnx/torch_quant_to_onnx.py
```diff
 Quantize a timm vision model and export to ONNX for TensorRT deployment.

 Supports FP8, INT8, MXFP8, NVFP4, INT4_AWQ, and AUTO (mixed-precision) quantization modes.

 The script will:
-1. Given the model name, create a timm torch model.
-2. Quantize the torch model in MXFP8, NVFP4, INT4_AWQ, or AUTO mode.
-3. Export the quantized torch model to ONNX format.
+1. Load a pretrained timm model (e.g., ViT, Swin, ResNet).
+2. Quantize the model using the specified mode. For models with Conv2d layers,
+   Conv2d quantization is automatically overridden for TensorRT compatibility
+   (FP8 for MXFP8/NVFP4, INT8 for INT4_AWQ).
```
Don’t advertise INT4_AWQ as supported end-to-end here.
The PR objectives still call out INT4_AWQ as a known limitation, but this docstring now groups it with the working modes. Please caveat or remove it here so users do not assume this example is expected to succeed in that mode.
✏️ Suggested wording
```diff
-Supports FP8, INT8, MXFP8, NVFP4, INT4_AWQ, and AUTO (mixed-precision) quantization modes.
+Supports FP8, INT8, MXFP8, NVFP4, and AUTO (mixed-precision) quantization modes.
+`INT4_AWQ` remains a known limitation for this example.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/torch_onnx/torch_quant_to_onnx.py` around lines 38 - 46, The
docstring in torch_quant_to_onnx.py incorrectly groups INT4_AWQ with fully
supported quantization modes; update the top-level description to either remove
INT4_AWQ from the supported list or add a clear caveat that INT4_AWQ is a known
limitation and may not work end-to-end (e.g., "INT4_AWQ is experimental/limited
— see PR objectives for current limitations"), ensuring references to the script
name and the quantization modes (FP8, INT8, MXFP8, NVFP4, INT4_AWQ, AUTO) are
adjusted so users won't assume INT4_AWQ is fully supported.
```python
def _override_conv2d_to_fp8(model, data_loader):
    """Override Conv2d layers with NVFP4/MXFP8 block quantization to FP8.

    TRT DynamicQuantize requires 2D/3D input, but Conv2d operates on 4D tensors.
    This overrides Conv2d block quantizers to FP8 per-tensor and calibrates them.
    """
    overridden = []
    for _, module in model.named_modules():
        if not isinstance(module, torch.nn.Conv2d):
            continue
        for attr_name in ("input_quantizer", "weight_quantizer"):
            if not hasattr(module, attr_name):
                continue
            quantizer = getattr(module, attr_name)
            if quantizer.is_enabled and quantizer.block_sizes:
                # Override to FP8 per-tensor
                quantizer.block_sizes = None
                quantizer._num_bits = (4, 3)
                quantizer._axis = None
                quantizer.enable_calib()
                overridden.append(quantizer)

    if overridden:
        model.eval()
        with torch.no_grad():
            for batch in data_loader:
                model(batch["image"])
        for quantizer in overridden:
            quantizer.disable_calib()
            quantizer.load_calib_amax()
```
Apply the Conv2d TRT restriction before the auto-quant search.
`mtq.auto_quantize()` optimizes under the requested `effective_bits` budget and returns a `search_state` for the formats it actually evaluated. Rewriting any block-quantized Conv2d to FP8 afterwards means the exported model no longer matches that search space: Conv layers selected as NVFP4/MXFP8/INT4 now pay FP8 cost instead, so the final model can overshoot the target budget and the returned `search_state` becomes stale. Please push the Conv2d compatibility rule into the candidate configs before calling `mtq.auto_quantize()`, or recompute/validate the post-override budget before returning.
Also applies to: 307-309
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/torch_onnx/torch_quant_to_onnx.py` around lines 227 - 256, The
override of Conv2d block quantizers to FP8 in _override_conv2d_to_fp8 must not
be applied after mtq.auto_quantize() because mtq.auto_quantize() returns a
search_state based on the candidate configs; either incorporate the
Conv2d->FP8/disable-block-quantization rule into the candidate generation passed
into mtq.auto_quantize() (so those Conv2d layers are treated as FP8 during the
search) or, if you must keep the override path, recompute and validate the
effective_bits/budget and update/refresh the returned search_state after running
_override_conv2d_to_fp8 so the final model’s budget reflects FP8 costs for
Conv2d (reference symbols: _override_conv2d_to_fp8, mtq.auto_quantize,
search_state).
Move TRT engine build logic into the script as a --trt_build flag, removing the duplicate trtexec invocation from the test file. Signed-off-by: ajrasane <arasane@nvidia.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
♻️ Duplicate comments (2)
examples/torch_onnx/torch_quant_to_onnx.py (2)
39-47: ⚠️ Potential issue | 🟡 Minor
Remove `INT4_AWQ` from the supported-modes docstring. This still reads like end-to-end support, but the PR summary calls out `INT4_AWQ` as a known limitation for this example.
✏️ Suggested wording

```diff
-Supports FP8, INT8, MXFP8, NVFP4, INT4_AWQ, and AUTO (mixed-precision) quantization modes.
+Supports FP8, INT8, MXFP8, NVFP4, and AUTO (mixed-precision) quantization modes.
+`INT4_AWQ` remains a known limitation for this example.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/torch_onnx/torch_quant_to_onnx.py` around lines 39 - 47, Update the module-level docstring that lists supported quantization modes by removing "INT4_AWQ" from the supported-modes sentence (the top-level docstring/description in torch_quant_to_onnx.py); ensure any other occurrences in the same docstring (e.g., the second paragraph that enumerates FP8, INT8, MXFP8, NVFP4, INT4_AWQ, and AUTO) are updated so INT4_AWQ is no longer mentioned, leaving the remaining modes and explanatory text intact.
228-257: ⚠️ Potential issue | 🟠 Major
Apply the Conv2d TRT restriction inside the auto-quant search.
`mtq.auto_quantize()` returns a `search_state` for the formats it actually evaluated. Rewriting selected Conv2d layers to FP8 afterwards means the exported model no longer matches that search space or its `effective_bits` budget, so the returned `search_state` is stale. Push this compatibility rule into the candidate configs before `mtq.auto_quantize()`, or recompute/validate the budget after the override.
Also applies to: 308-310
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/torch_onnx/torch_quant_to_onnx.py` around lines 228 - 257, The current post-hoc override in _override_conv2d_to_fp8 mutates Conv2d quantizers after mtq.auto_quantize has already produced a search_state and effective_bits budget, making search_state stale; move the Conv2d -> FP8 compatibility rule into the candidate configuration generation that mtq.auto_quantize consumes (or run a budget/validation pass to recompute search_state/effective_bits after performing the override). Specifically, incorporate the TRT Conv2d restriction into the candidate configs or pre-filter candidates before calling mtq.auto_quantize, or if you keep _override_conv2d_to_fp8, call a function to recompute/validate search_state and effective_bits (the returned object from mtq.auto_quantize) immediately after performing the override so the exported model and reported budget remain consistent.
🧹 Nitpick comments (1)
examples/torch_onnx/torch_quant_to_onnx.py (1)
198-202: Avoid forcing calibration data for quantizers that are disabled later. This now downloads calibration data for every non-`auto` run, then does the extra calibration pass before `mtq.disable_quantizer()`. For models whose Conv2d quantizers are later filtered out, that makes `mxfp8`/`nvfp4` pay the Tiny-ImageNet + calibration cost even though no active override survives to export. Consider applying the filter before `_calibrate_uncalibrated_quantizers()`, or loading calibration data lazily only when an enabled Conv2d override remains.
Also applies to: 462-473
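The lazy-loading suggestion amounts to checking for surviving work before paying the data cost. A minimal sketch under stated assumptions — `load_dataset` and `run_calibration` are hypothetical placeholders, and quantizers are modeled as plain dicts rather than the real `TensorQuantizer` objects:

```python
def calibrate_if_needed(quantizers, load_dataset, run_calibration):
    """Load (potentially expensive) calibration data only when some
    enabled quantizer actually still needs calibrating."""
    pending = [q for q in quantizers if q["enabled"] and q["amax"] is None]
    if not pending:
        return 0  # nothing to calibrate; skip the dataset download entirely
    data = load_dataset()  # expensive step, now gated behind the check
    run_calibration(pending, data)
    return len(pending)


loads = []


def fake_load():
    loads.append(1)  # record that the expensive load actually happened
    return ["batch"]


# All quantizers are already calibrated or disabled: the dataset is never loaded.
done = [{"enabled": True, "amax": 1.0}, {"enabled": False, "amax": None}]
n = calibrate_if_needed(done, fake_load, lambda qs, d: None)
```

Applying `filter_func` before this check would mean a filtered-out Conv2d quantizer counts as disabled and never triggers the download.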
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/torch_onnx/torch_quant_to_onnx.py` around lines 198 - 202, The current flow calls _calibrate_uncalibrated_quantizers(quantized_model, data_loader) before mtq.disable_quantizer(...), causing calibration data to be downloaded even for quantizers that will be disabled; change the order or gate calibration so only enabled overrides are calibrated: either call mtq.disable_quantizer(quantized_model, filter_func) before invoking _calibrate_uncalibrated_quantizers, or add a pre-check that filters quantized_model (using filter_func) to determine if any Conv2d override remains enabled and only then load/process data_loader and call _calibrate_uncalibrated_quantizers; reference functions: _calibrate_uncalibrated_quantizers, mtq.disable_quantizer, filter_func, quantized_model, and data_loader.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@examples/torch_onnx/torch_quant_to_onnx.py`:
- Around line 39-47: Update the module-level docstring that lists supported
quantization modes by removing "INT4_AWQ" from the supported-modes sentence (the
top-level docstring/description in torch_quant_to_onnx.py); ensure any other
occurrences in the same docstring (e.g., the second paragraph that enumerates
FP8, INT8, MXFP8, NVFP4, INT4_AWQ, and AUTO) are updated so INT4_AWQ is no
longer mentioned, leaving the remaining modes and explanatory text intact.
- Around line 228-257: The current post-hoc override in _override_conv2d_to_fp8
mutates Conv2d quantizers after mtq.auto_quantize has already produced a
search_state and effective_bits budget, making search_state stale; move the
Conv2d -> FP8 compatibility rule into the candidate configuration generation
that mtq.auto_quantize consumes (or run a budget/validation pass to recompute
search_state/effective_bits after performing the override). Specifically,
incorporate the TRT Conv2d restriction into the candidate configs or pre-filter
candidates before calling mtq.auto_quantize, or if you keep
_override_conv2d_to_fp8, call a function to recompute/validate search_state and
effective_bits (the returned object from mtq.auto_quantize) immediately after
performing the override so the exported model and reported budget remain
consistent.
---
Nitpick comments:
In `@examples/torch_onnx/torch_quant_to_onnx.py`:
- Around line 198-202: The current flow calls
_calibrate_uncalibrated_quantizers(quantized_model, data_loader) before
mtq.disable_quantizer(...), causing calibration data to be downloaded even for
quantizers that will be disabled; change the order or gate calibration so only
enabled overrides are calibrated: either call
mtq.disable_quantizer(quantized_model, filter_func) before invoking
_calibrate_uncalibrated_quantizers, or add a pre-check that filters
quantized_model (using filter_func) to determine if any Conv2d override remains
enabled and only then load/process data_loader and call
_calibrate_uncalibrated_quantizers; reference functions:
_calibrate_uncalibrated_quantizers, mtq.disable_quantizer, filter_func,
quantized_model, and data_loader.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 54e5c119-bb09-448b-ba29-0ad682692eb3
📒 Files selected for processing (2)
- examples/torch_onnx/torch_quant_to_onnx.py
- tests/examples/torch_onnx/test_torch_quant_to_onnx.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/examples/torch_onnx/test_torch_quant_to_onnx.py
Signed-off-by: ajrasane <arasane@nvidia.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/torch_onnx/torch_quant_to_onnx.py`:
- Around line 248-250: Replace the direct internal assignments to
quantizer._num_bits and quantizer._axis with the public property setters so the
calibrator stays synchronized: use quantizer.num_bits = (4, 3) and
quantizer.axis = None instead of writing to _num_bits and _axis (keeping
quantizer.block_sizes assignment as-is); update references in the same snippet
to call the num_bits and axis properties on the quantizer object to ensure
proper calibrator/internal state updates.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 703357c3-97d1-492b-a949-8b09ff915c3c
📒 Files selected for processing (2)
- examples/torch_onnx/torch_quant_to_onnx.py
- modelopt/torch/_deploy/utils/torch_onnx.py
🚧 Files skipped from review as they are similar to previous changes (1)
- modelopt/torch/_deploy/utils/torch_onnx.py
```python
quantizer.block_sizes = None
quantizer._num_bits = (4, 3)
quantizer._axis = None
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Check if there's a public API for setting quantizer num_bits/axis
rg -n "def.*num_bits|def.*axis" --type=py modelopt/torch/quantization/
```

Repository: NVIDIA/Model-Optimizer
Length of output: 3259

🏁 Script executed:

```bash
#!/bin/bash
# Get the file and examine the problematic lines with context
cat -n examples/torch_onnx/torch_quant_to_onnx.py | sed -n '235,260p'
```

Repository: NVIDIA/Model-Optimizer
Length of output: 1289

🏁 Script executed:

```bash
#!/bin/bash
# Examine the public num_bits and axis property implementations
cat -n modelopt/torch/quantization/nn/modules/tensor_quantizer.py | sed -n '272,285p'
cat -n modelopt/torch/quantization/nn/modules/tensor_quantizer.py | sed -n '372,385p'
```

Repository: NVIDIA/Model-Optimizer
Length of output: 967
Use public property setters for num_bits and axis to maintain calibrator synchronization.
At lines 249-250, directly assigning to _num_bits and _axis bypasses the property setters, which synchronize these values with the calibrator's internal state. This creates an inconsistency where the quantizer and calibrator have mismatched configuration.
Change:

```python
quantizer._num_bits = (4, 3)
quantizer._axis = None
```

To:

```python
quantizer.num_bits = (4, 3)
quantizer.axis = None
```

Line 248 correctly uses the public `block_sizes` property; apply the same pattern here.
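Why the public setter matters can be illustrated with a toy property that keeps a dependent calibrator field in sync: assigning the underscore attribute directly skips the setter, leaving the calibrator stale. `ToyQuantizer` and `ToyCalibrator` here are illustrative stand-ins, not the modelopt classes.

```python
class ToyCalibrator:
    def __init__(self):
        self.num_bits = None  # mirrors the quantizer's setting


class ToyQuantizer:
    def __init__(self):
        self._calibrator = ToyCalibrator()
        self._num_bits = 8

    @property
    def num_bits(self):
        return self._num_bits

    @num_bits.setter
    def num_bits(self, value):
        self._num_bits = value
        # The setter keeps the calibrator's view consistent.
        self._calibrator.num_bits = value


q = ToyQuantizer()
q._num_bits = (4, 3)           # bypasses the setter: calibrator not updated
stale = q._calibrator.num_bits  # still None
q.num_bits = (4, 3)            # goes through the setter: calibrator synced
synced = q._calibrator.num_bits
```

The same mismatch is what the review flags for `_num_bits`/`_axis` versus `num_bits`/`axis`.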
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/torch_onnx/torch_quant_to_onnx.py` around lines 248 - 250, Replace
the direct internal assignments to quantizer._num_bits and quantizer._axis with
the public property setters so the calibrator stays synchronized: use
quantizer.num_bits = (4, 3) and quantizer.axis = None instead of writing to
_num_bits and _axis (keeping quantizer.block_sizes assignment as-is); update
references in the same snippet to call the num_bits and axis properties on the
quantizer object to ensure proper calibrator/internal state updates.
Summary
- Fix `configure_linear_module_onnx_quantizers` to handle all modules with block quantization (not just `nn.Linear`), fixing NVFP4/MXFP8 export for models with quantized non-Linear modules
- Add `--trt_build` flag to `torch_quant_to_onnx.py` and simplify test infrastructure
- `modelopt/torch/_deploy/utils/torch_onnx.py` — Disable FP8 Conv2d weight quantizers and autocast during ONNX export
- `modelopt/torch/quantization/export_onnx.py` — Fix `configure_linear_module_onnx_quantizers` for all module types with block quantization
- `examples/torch_onnx/torch_quant_to_onnx.py` — Add `--trt_build` flag, calibration for FP8 override quantizers, Conv2d→FP8 override for auto mode, filter_func updates
- `examples/torch_onnx/README.md` — Add ResNet50 to supported models table
- `tests/examples/torch_onnx/test_torch_quant_to_onnx.py` — Add ResNet50 test entry, simplify using `--trt_build`
- `tests/_test_utils/torch/vision_models.py` — Add ResNet50 to timm model registry

Quantization modes passing
Test plan
`pytest tests/examples/torch_onnx/test_torch_quant_to_onnx.py -k resnet50` (5/5 passed)

🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
- Added `--trt_build` command-line flag for model optimization.