
Commit 860d0b4

Merge Linux and Windows Changelog (#954)
Since we no longer have any compiled packages, all releases cover all platforms, so we don't ship separate Windows releases; the changelogs are therefore merged as well. Going forward, Windows can be tested on the latest Linux version, and any fixes needed can go into the next regular monthly Linux release.

Summary by CodeRabbit:

- Documentation
  - Reorganized the changelog structure to consolidate Windows and Linux release information in a unified view.
  - Expanded Windows Support documentation across recent release versions.
- Chores
  - Updated the Windows example release badge to dynamically reflect the latest PyPI release version.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
1 parent: edde087

6 files changed: 32 additions & 63 deletions


CHANGELOG-Windows.rst

Lines changed: 0 additions & 47 deletions
This file was deleted.

CHANGELOG.rst

Lines changed: 30 additions & 2 deletions
@@ -1,5 +1,5 @@
-NVIDIA Model Optimizer Changelog (Linux)
-========================================
+NVIDIA Model Optimizer Changelog
+================================
 
 0.43 (2026-03-xx)
 ^^^^^^^^^^^^^^^^^
@@ -72,6 +72,13 @@ NVIDIA Model Optimizer Changelog (Linux)
 
 - Remove ``torchprofile`` as a default dependency from ModelOpt as it is used only for FLOPs-based FastNAS pruning (computer vision models). It can be installed separately if needed.
 
+**Windows Support**
+
+- Fix ONNX 1.19 compatibility issues with CuPy during ONNX INT4 AWQ quantization. ONNX 1.19 uses ``ml_dtypes.int4`` instead of ``numpy.int8``, which caused CuPy failures.
+- Add support for ONNX mixed-precision weight-only quantization using INT4 and INT8 precisions. Refer to the quantization `example for GenAI LLMs <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/onnx_ptq/genai_llm>`_.
+- Add support for quantization of some diffusion models on Windows. Refer to the `example script <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/torch_onnx/diffusers>`_ for details.
+- Add `Perplexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.
+
 0.40 (2025-12-12)
 ^^^^^^^^^^^^^^^^^
 
@@ -199,6 +206,10 @@ NVIDIA Model Optimizer Changelog (Linux)
 - Add NeMo 2 Simplified Flow examples for quantization aware training/distillation (QAT/QAD), speculative decoding, pruning & distillation.
 - Fix a Qwen3 MOE model export issue.
 
+**Windows Support**
+
+- Model Optimizer for Windows now supports the `NvTensorRtRtx <https://onnxruntime.ai/docs/execution-providers/TensorRTRTX-ExecutionProvider.html>`_ execution provider.
+
 0.31 (2025-06-04)
 ^^^^^^^^^^^^^^^^^
 
@@ -281,6 +292,11 @@ NVIDIA Model Optimizer Changelog (Linux)
 - Add support for the ``--calibration_shapes`` flag.
 - Add automatic type and shape tensor propagation for full ORT support with TensorRT EP.
 
+**Windows Support**
+
+- New LLM models such as DeepSeek are supported with ONNX INT4 AWQ quantization on Windows. Refer to the `Windows Support Matrix <https://nvidia.github.io/Model-Optimizer/guides/0_support_matrix.html>`_ for details about supported features and models.
+- Model Optimizer for Windows now supports ONNX INT8 and FP8 quantization (W8A8) of SAM2 and Whisper models. Check the `example scripts <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/onnx_ptq>`_ to get started with quantizing these models.
+
 **Known Issues**
 
 - Quantization of T5 models is broken. Please use ``nvidia-modelopt==0.25.0`` with ``transformers<4.50`` meanwhile.
@@ -384,6 +400,18 @@ NVIDIA Model Optimizer Changelog (Linux)
   quantization for NeMo & MCore models (in addition to HuggingFace models).
 - Add ``num_layers`` and ``hidden_size`` pruning support for NeMo / Megatron-core models.
 
+**Windows Support**
+
+- This is the first official release of Model Optimizer for Windows.
+- **ONNX INT4 Quantization:** :meth:`modelopt.onnx.quantization.quantize_int4 <modelopt.onnx.quantization.int4.quantize>` now supports ONNX INT4 quantization for DirectML and TensorRT* deployment. See :ref:`Support_Matrix` for details about supported features and models.
+- **LLM Quantization with Olive:** Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer to the `Olive example <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-Model-Optimizer>`_.
+- **DirectML Deployment Guide:** Added a DML deployment guide. Refer to the :ref:`Onnxruntime_Deployment` deployment guide for details.
+- **MMLU Benchmark for Accuracy Evaluations:** Introduced `MMLU benchmarking <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/README.md>`_ for accuracy evaluation of ONNX models on DirectML (DML).
+- **Published quantized ONNX models collection:** Published quantized ONNX models in the HuggingFace `NVIDIA collections <https://huggingface.co/collections/nvidia/optimized-onnx-models-for-nvidia-rtx-gpus>`_.
+
+
+\* *This version includes experimental features such as TensorRT deployment of ONNX INT4 models, PyTorch quantization and sparsity. These are currently unverified on Windows.*
+
 
 0.17 (2024-09-11)
 ^^^^^^^^^^^^^^^^^
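The first Windows release above centers on ONNX INT4 weight-only quantization. Independent of ModelOpt's actual implementation (which also supports AWQ calibration), the core idea is mapping each weight channel onto the signed 4-bit range [-8, 7] with a per-channel scale; a minimal NumPy sketch:

```python
import numpy as np

def quantize_int4_weight(w):
    """Symmetric per-output-channel 4-bit weight quantization.

    Illustrative sketch only -- not ModelOpt's implementation.
    Signed INT4 covers the integer range [-8, 7].
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    # 4-bit values stored in an int8 container for convenience
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

w = np.array([[0.5, -1.0, 0.25],
              [2.0,  1.0, -2.0]])
q, scale = quantize_int4_weight(w)
dequantized = q * scale  # approximate reconstruction of the original weights
print(q)
```

Weight-only schemes like this keep activations in higher precision (the "A16" in W4A16), which is why they suit memory-bound LLM inference on consumer RTX GPUs.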
Lines changed: 1 addition & 11 deletions
@@ -1,11 +1 @@
-=========
-Changelog
-=========
-
-
-.. toctree::
-   :glob:
-   :maxdepth: 1
-
-   _changelog_for_Linux.rst
-   _changelog_for_Windows.rst
+.. include:: ../../../CHANGELOG.rst
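The ``.. include::`` replacement above pulls the repo-root changelog into the docs build instead of maintaining per-platform stubs. Docutils resolves an include path relative to the including file's directory; assuming the page lives under `docs/source/reference/` (the directory shown for the deleted stubs), the path resolves as follows:

```python
import os.path

# Directory of the including docs page (per the deleted stub files' location).
page_dir = "docs/source/reference"
# Path used in the include directive.
target = "../../../CHANGELOG.rst"

# Three ".." components climb out of reference/, source/, and docs/,
# landing at the repository root.
resolved = os.path.normpath(os.path.join(page_dir, target))
print(resolved)
```

So the docs render the single root `CHANGELOG.rst`, and there is only one file to update per release.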

docs/source/reference/_changelog_for_Linux.rst

Lines changed: 0 additions & 1 deletion
This file was deleted.

docs/source/reference/_changelog_for_Windows.rst

Lines changed: 0 additions & 1 deletion
This file was deleted.

examples/windows/README.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 A Library to Quantize and Compress Deep Learning Models for Optimized Inference on Native Windows RTX GPUs
 
 [![Documentation](https://img.shields.io/badge/Documentation-latest-brightgreen.svg?style=flat)](https://nvidia.github.io/Model-Optimizer/)
-[![version](https://img.shields.io/badge/v0.33.0-orange?label=Release)](https://pypi.org/project/nvidia-modelopt/)
+[![version](https://img.shields.io/pypi/v/nvidia-modelopt?label=Release)](https://pypi.org/project/nvidia-modelopt/)
 [![license](https://img.shields.io/badge/License-Apache%202.0-blue)](../../LICENSE)
 
 [Examples](#examples) |
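The badge change above swaps a hard-coded version for shields.io's PyPI endpoint, which looks up the latest published version of the package at render time, so the badge no longer goes stale between releases. A sketch of the two URL forms:

```python
# Static badge (old): the version string is baked into the URL and had to be
# bumped by hand every release.
static = "https://img.shields.io/badge/v0.33.0-orange?label=Release"

# Dynamic badge (new): shields.io's /pypi/v/<package> endpoint queries PyPI
# for the latest release of nvidia-modelopt whenever the badge is rendered.
dynamic = "https://img.shields.io/pypi/v/nvidia-modelopt?label=Release"

badge_markdown = f"[![version]({dynamic})](https://pypi.org/project/nvidia-modelopt/)"
print(badge_markdown)
```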
