Auto Quantize improvements and bug fixes for large sparse MoEs
- Add get_auto_quantize_config API to extract quant config from search results
- Save/restore calibration state in auto_quantize checkpoints
- Add NemotronH MoE expert support in auto_quantize grouping/scoring
- Fix SequentialQuantizer scope, use F.kl_div for numerical stability
- Fix mypy errors and clean up tests
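The fourth item switches to `F.kl_div` for numerical stability. The sketch below illustrates the underlying idea in plain Python. It is not the commit's code. It computes KL(p || q) between two softmax distributions entirely in log space using a max-subtracted log-softmax. It then contrasts this with a naive version that exponentiates raw logits first. The function names are hypothetical. The docstring's mapping to `torch.nn.functional.kl_div` uses that function's documented `input` (log-probabilities), `target`, `reduction`, and `log_target` parameters.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Log-softmax with the max subtracted first, so exp() never overflows."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_from_logits(p_logits: List[float], q_logits: List[float]) -> float:
    """KL(p || q) = sum_i p_i * (log p_i - log q_i), using log-probabilities only.

    Same quantity as torch.nn.functional.kl_div(input=log_q, target=log_p,
    reduction="sum", log_target=True).
    """
    log_p = log_softmax(p_logits)
    log_q = log_softmax(q_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def naive_kl_from_logits(p_logits: List[float], q_logits: List[float]) -> float:
    """Unstable version: exponentiates raw logits before normalizing."""
    p_exp = [math.exp(x) for x in p_logits]  # overflows once a logit exceeds about 709
    q_exp = [math.exp(x) for x in q_logits]
    p = [v / sum(p_exp) for v in p_exp]
    q = [v / sum(q_exp) for v in q_exp]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    teacher = [1000.0, 999.0, 990.0]
    student = [1000.5, 999.0, 989.0]
    print(kl_from_logits(teacher, student))  # finite and non-negative
    try:
        naive_kl_from_logits(teacher, student)
    except OverflowError as err:
        print("naive version failed:", err)
```

Both functions agree on small logits. Only the log-space version survives large ones, because softmax is invariant to shifting all logits by a constant.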
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: realAsma <akuriparambi@nvidia.com>
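The third item adds NemotronH MoE expert support to the `auto_quantize` grouping and scoring rules. The commit message does not say what those rules match. One common reason to group MoE experts is that all experts of one layer should receive a single shared format decision, so their sensitivity scores are pooled per group. The sketch below shows that idea with a regex that collapses the expert index. The module path scheme `<prefix>.experts.<index>.<projection>`, the summing rule, and all function names are hypothetical. They are not taken from NemotronH or ModelOpt.

```python
import re
from collections import defaultdict
from typing import Dict

# Hypothetical naming scheme: "<prefix>.experts.<index>.<projection>".
EXPERT_NAME = re.compile(r"^(?P<prefix>.*\.experts)\.\d+\.(?P<proj>[^.]+)$")


def group_key(module_name: str) -> str:
    """Collapse the expert index so all experts of a layer share one key."""
    match = EXPERT_NAME.match(module_name)
    if match is None:
        return module_name  # non-expert modules form singleton groups
    return f"{match['prefix']}.*.{match['proj']}"


def pooled_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Sum per-module sensitivity scores inside each group (hypothetical scoring rule)."""
    pooled: Dict[str, float] = defaultdict(float)
    for name, score in scores.items():
        pooled[group_key(name)] += score
    return dict(pooled)


if __name__ == "__main__":
    scores = {
        "backbone.layers.3.mixer.experts.0.up_proj": 0.25,
        "backbone.layers.3.mixer.experts.1.up_proj": 0.5,
        "backbone.layers.3.mixer.experts.0.down_proj": 1.0,
        "backbone.layers.3.mixer.in_proj": 2.0,
    }
    print(pooled_scores(scores))
    # -> up_proj experts pooled to 0.75, down_proj group 1.0, in_proj left alone at 2.0
```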
CHANGELOG.rst: 3 additions, 0 deletions
@@ -21,6 +21,9 @@ NVIDIA Model Optimizer Changelog
 - ``pass_through_bwd`` in the quantization config now defaults to True. Please set it to False if you want to use STE with zeroed outlier gradients for potentially better QAT accuracy.
 - Add :meth:`compute_quantization_mse <modelopt.torch.quantization.model_quant.compute_quantization_mse>` API to measure per-quantizer mean-squared quantization error, with flexible wildcard and callable filtering.
 - **AutoQDQ**: New tool for automated Q/DQ (Quantize/Dequantize) placement optimization for ONNX models. Uses TensorRT latency measurements to choose insertion schemes that minimize inference time. Discovers regions automatically, groups them by structural pattern, and tests multiple Q/DQ schemes per pattern. Supports INT8 and FP8 quantization, pattern cache for warm-start on similar models, checkpoint/resume, and importing patterns from an existing QDQ baseline. CLI: ``python -m modelopt.onnx.quantization.autotune``. See the AutoQDQ guide in the documentation.
+- Add ``get_auto_quantize_config`` API to extract a flat quantization config from ``auto_quantize`` search results, enabling re-quantization at different effective bit targets without re-running calibration.
+- Improve ``auto_quantize`` checkpoint/resume: calibration state is now saved and restored across runs, avoiding redundant calibration when resuming a search.
+- Add NemotronH MoE expert support in ``auto_quantize`` grouping and scoring rules.
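The ``get_auto_quantize_config`` entry says search results can be turned into a flat config and re-targeted at a different effective bit budget without recalibrating. The sketch below illustrates why that is possible. Once per-group sensitivity scores are cached, choosing formats for a new budget is pure bookkeeping. It uses a simple greedy trade-off as a stand-in: at each step, it downgrades the group with the smallest score increase per bit saved. ModelOpt's own selection procedure and the real ``get_auto_quantize_config`` signature are not shown here. `Candidate`, `select_formats`, the scores, and the nominal bit widths are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    fmt: str      # hypothetical format label
    bits: float   # nominal storage bits per weight under this format
    score: float  # sensitivity (e.g. loss increase) cached from the search


def effective_bits(choice: Dict[str, Candidate], params: Dict[str, int]) -> float:
    """Parameter-weighted average bit width of the current selection."""
    total = sum(params[g] for g in choice)
    return sum(choice[g].bits * params[g] for g in choice) / total


def select_formats(
    results: Dict[str, List[Candidate]], params: Dict[str, int], target_bits: float
) -> Dict[str, str]:
    """Greedily re-target cached search results; no calibration is repeated."""
    ordered = {g: sorted(c, key=lambda cand: -cand.bits) for g, c in results.items()}
    level = {g: 0 for g in ordered}  # index 0 = highest precision
    choice = {g: cands[0] for g, cands in ordered.items()}
    while effective_bits(choice, params) > target_bits:
        best: Optional[str] = None
        best_cost = float("inf")
        for g, cands in ordered.items():
            if level[g] + 1 >= len(cands):
                continue  # already at the lowest precision
            nxt = cands[level[g] + 1]
            saved = (choice[g].bits - nxt.bits) * params[g]
            if saved <= 0:
                continue  # no bits saved; skip equal-width candidates
            cost = (nxt.score - choice[g].score) / saved  # score increase per bit saved
            if cost < best_cost:
                best, best_cost = g, cost
        if best is None:
            break  # target unreachable with these candidates
        level[best] += 1
        choice[best] = ordered[best][level[best]]
    return {g: cand.fmt for g, cand in choice.items()}


if __name__ == "__main__":
    results = {
        "layers.0.mlp": [Candidate("FP8", 8, 0.01), Candidate("NVFP4", 4, 0.05)],
        "layers.0.attn": [Candidate("FP8", 8, 0.02), Candidate("NVFP4", 4, 0.30)],
    }
    params = {"layers.0.mlp": 300, "layers.0.attn": 100}
    for target in (8.0, 6.0, 4.0):
        print(target, select_formats(results, params, target))
    # 8.0 -> both FP8; 6.0 -> mlp NVFP4, attn FP8; 4.0 -> both NVFP4
```

At a 6-bit target, the less sensitive MLP group is downgraded first. That alone brings the weighted average to 5 bits, so attention stays at FP8.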
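The ``compute_quantization_mse`` entry mentions per-quantizer mean-squared error with wildcard and callable filtering. The sketch below shows what that kind of filtering typically looks like. It uses `fnmatch.fnmatchcase` for wildcard patterns and a plain predicate for callables. It then computes the MSE between original and fake-quantized values. The function names, the symmetric round-to-nearest quantizer, and the name-to-values dictionary layout are assumptions for illustration. They are not the ModelOpt API.

```python
import fnmatch
from typing import Callable, Dict, List, Union

# A filter is either a wildcard pattern or a predicate on the quantizer name.
NameFilter = Union[str, Callable[[str], bool]]


def fake_quant(values: List[float], num_bits: int, amax: float) -> List[float]:
    """Symmetric fake quantization (num_bits >= 2): scale, round (ties to even), clamp, rescale."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = amax / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def name_matches(name: str, name_filter: NameFilter) -> bool:
    if callable(name_filter):
        return bool(name_filter(name))
    return fnmatch.fnmatchcase(name, name_filter)


def quantization_mse(
    tensors: Dict[str, List[float]], num_bits: int, name_filter: NameFilter = "*"
) -> Dict[str, float]:
    """Per-quantizer mean-squared error between original and fake-quantized values."""
    errors: Dict[str, float] = {}
    for name, values in tensors.items():
        if not name_matches(name, name_filter):
            continue
        amax = max(abs(v) for v in values) or 1.0  # avoid a zero scale for all-zero tensors
        dequantized = fake_quant(values, num_bits, amax)
        errors[name] = sum((a - b) ** 2 for a, b in zip(values, dequantized)) / len(values)
    return errors


if __name__ == "__main__":
    tensors = {
        "layers.0.q_proj.weight_quantizer": [0.1, 0.37, -0.82, 1.0],
        "layers.0.k_proj.weight_quantizer": [0.5, -0.25, 0.75, -1.0],
        "lm_head.weight_quantizer": [2.0, -3.0, 0.5, 1.5],
    }
    print(quantization_mse(tensors, 4, "*q_proj*"))  # wildcard: q_proj only
    print(quantization_mse(tensors, 4, lambda n: not n.startswith("lm_head")))  # callable
```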