|
1 | | -NVIDIA Model Optimizer Changelog (Linux) |
2 | | -======================================== |
| 1 | +NVIDIA Model Optimizer Changelog |
| 2 | +================================ |
3 | 3 |
|
4 | 4 | 0.43 (2026-03-xx) |
5 | 5 | ^^^^^^^^^^^^^^^^^ |
@@ -72,6 +72,13 @@ NVIDIA Model Optimizer Changelog (Linux) |
72 | 72 |
|
73 | 73 | - Remove ``torchprofile`` as a default dependency of ModelOpt, as it is used only for FLOPs-based FastNAS pruning (computer vision models). It can be installed separately if needed.
74 | 74 |
|
| 75 | +**Windows Support** |
| 76 | + |
| 77 | +- Fix ONNX 1.19 compatibility issues with CuPy during ONNX INT4 AWQ quantization. ONNX 1.19 uses ``ml_dtypes.int4`` instead of ``numpy.int8``, which caused CuPy failures.
| 78 | +- Add support for ONNX mixed-precision weight-only quantization using INT4 and INT8 precisions. Refer to the quantization `example for GenAI LLMs <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/onnx_ptq/genai_llm>`_.
| 79 | +- Add support for quantization of some diffusion models on Windows. Refer to the `example script <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/torch_onnx/diffusers>`_ for details.
| 80 | +- Add `Perplexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks. |
| 81 | + |
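The Perplexity and KL-Divergence benchmarks above compare a quantized model's outputs against a reference model. As a rough, self-contained sketch (not the benchmarks' actual implementation), the two metrics reduce to the following, assuming per-token probabilities are already available:

```python
import math

def softmax(logits):
    # Shift by the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-10):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i); eps guards against log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def perplexity(token_probs):
    # exp of the mean negative log-likelihood over the evaluated tokens.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical logits from a reference model and its quantized counterpart.
ref = softmax([2.0, 1.0, 0.5])
quant = softmax([1.9, 1.1, 0.4])
print(round(kl_divergence(ref, ref), 6))   # identical distributions -> 0.0
```

A lower KL divergence between the reference and quantized token distributions, and a perplexity close to the reference model's, indicate that quantization preserved accuracy.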
75 | 82 | 0.40 (2025-12-12) |
76 | 83 | ^^^^^^^^^^^^^^^^^ |
77 | 84 |
|
@@ -199,6 +206,10 @@ NVIDIA Model Optimizer Changelog (Linux) |
199 | 206 | - Add NeMo 2 Simplified Flow examples for quantization aware training/distillation (QAT/QAD), speculative decoding, pruning & distillation. |
200 | 207 | - Fix a Qwen3 MOE model export issue. |
201 | 208 |
|
| 209 | +**Windows Support** |
| 210 | + |
| 211 | +- Model Optimizer for Windows now supports the `NvTensorRtRtx <https://onnxruntime.ai/docs/execution-providers/TensorRTRTX-ExecutionProvider.html>`_ execution provider.
| 212 | + |
202 | 213 | 0.31 (2025-06-04) |
203 | 214 | ^^^^^^^^^^^^^^^^^ |
204 | 215 |
|
@@ -281,6 +292,11 @@ NVIDIA Model Optimizer Changelog (Linux) |
281 | 292 | - Add support for ``--calibration_shapes`` flag. |
282 | 293 | - Add automatic type and shape tensor propagation for full ORT support with TensorRT EP. |
283 | 294 |
|
| 295 | +**Windows Support** |
| 296 | + |
| 297 | +- New LLMs such as DeepSeek are now supported with ONNX INT4 AWQ quantization on Windows. Refer to the `Windows Support Matrix <https://nvidia.github.io/Model-Optimizer/guides/0_support_matrix.html>`_ for details about supported features and models.
| 298 | +- Model Optimizer for Windows now supports ONNX INT8 and FP8 quantization (W8A8) of SAM2 and Whisper models. See the `example scripts <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/onnx_ptq>`_ to get started with quantizing these models.
| 299 | + |
284 | 300 | **Known Issues** |
285 | 301 |
|
286 | 302 | - Quantization of T5 models is broken. Please use ``nvidia-modelopt==0.25.0`` with ``transformers<4.50`` meanwhile. |
@@ -384,6 +400,18 @@ NVIDIA Model Optimizer Changelog (Linux) |
384 | 400 | quantization for NeMo & MCore models (in addition to HuggingFace models). |
385 | 401 | - Add ``num_layers`` and ``hidden_size`` pruning support for NeMo / Megatron-core models. |
386 | 402 |
|
| 403 | +**Windows Support** |
| 404 | + |
| 405 | +- This is the first official release of Model Optimizer for Windows.
| 406 | +- **ONNX INT4 Quantization:** :meth:`modelopt.onnx.quantization.quantize_int4 <modelopt.onnx.quantization.int4.quantize>` now supports ONNX INT4 quantization for DirectML and TensorRT* deployment. See :ref:`Support_Matrix` for details about supported features and models. |
| 407 | +- **LLM Quantization with Olive:** Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer to the `Olive example <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-Model-Optimizer>`_.
| 408 | +- **DirectML Deployment Guide:** Added a DirectML (DML) deployment guide. Refer to :ref:`Onnxruntime_Deployment` for details.
| 409 | +- **MMLU Benchmark for Accuracy Evaluations:** Introduced `MMLU benchmarking <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/README.md>`_ for accuracy evaluation of ONNX models on DirectML (DML). |
| 410 | +- **Published quantized ONNX models collection:** Published quantized ONNX models at HuggingFace `NVIDIA collections <https://huggingface.co/collections/nvidia/optimized-onnx-models-for-nvidia-rtx-gpus>`_. |
| 411 | + |
| 412 | + |
| 413 | +\* *This version includes experimental features such as TensorRT deployment of ONNX INT4 models, PyTorch quantization, and sparsity; these are currently unverified on Windows.*
| 414 | + |
387 | 415 |
|
388 | 416 | 0.17 (2024-09-11) |
389 | 417 | ^^^^^^^^^^^^^^^^^ |
|