## What does this PR do?
**Type of change:** New feature

**Overview:**
Supports exporting `NVFP4StaticQuantizer` in the unified Hugging Face checkpoint format, as a deployment path for PTQ algorithms such as MSE.
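For intuition, the key difference between a static and a dynamic quantizer is that the static one reuses an `amax` captured at calibration time instead of recomputing it on every call. A minimal sketch of that idea (purely illustrative; the function names and `NVFP4_MAX` constant here are hypothetical, not the modelopt implementation):

```python
import numpy as np

NVFP4_MAX = 6.0  # max representable magnitude of the FP4 (E2M1) format


def dynamic_scale(block: np.ndarray) -> float:
    # Dynamic quantizer: recompute amax from the live tensor on every call.
    return float(np.max(np.abs(block))) / NVFP4_MAX


def static_scale(calibrated_amax: float) -> float:
    # Static quantizer: reuse the amax captured during calibration
    # (e.g. by an MSE search), skipping the per-call reduction entirely.
    return calibrated_amax / NVFP4_MAX


x = np.array([0.5, -3.0, 1.5, 2.0])
assert dynamic_scale(x) == 3.0 / NVFP4_MAX
# With amax=3.0 stored at calibration time, both paths agree on this input:
assert static_scale(3.0) == dynamic_scale(x)
```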
## Usage
<!-- You can potentially add a usage example below. -->
```python
# checkpoint generation
python examples/llm_ptq/hf_ptq.py --pyt_ckpt_path Qwen/Qwen3-8B --qformat nvfp4_mse --export_path test-Qwen3-8B-Instruct-MSE-FP8-sweep-FP4 --kv_cache_qformat none --trust_remote_code
```
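The `nvfp4_mse` format searches for weight scales that minimize quantization error rather than taking the plain amax-derived scale. A toy sketch of such an MSE scale sweep (purely illustrative; `fake_quant`, `mse_scale_search`, and the candidate grid are assumptions for this example, not modelopt's actual search):

```python
import numpy as np

NVFP4_MAX = 6.0
# Representable magnitudes of the FP4 (E2M1) format.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def fake_quant(block: np.ndarray, scale: float) -> np.ndarray:
    # Quantize to the nearest FP4 magnitude, then dequantize back.
    q = block / scale
    idx = np.argmin(np.abs(np.abs(q)[:, None] - FP4_GRID[None, :]), axis=1)
    return np.sign(q) * FP4_GRID[idx] * scale


def mse_scale_search(block: np.ndarray, n_candidates: int = 20):
    # Sweep scale candidates at and below amax/NVFP4_MAX and keep the one
    # that minimizes reconstruction MSE (the idea behind an MSE scale search).
    base = float(np.max(np.abs(block))) / NVFP4_MAX
    best_scale, best_err = base, np.inf
    for ratio in np.linspace(0.5, 1.0, n_candidates):
        s = base * ratio
        err = float(np.mean((fake_quant(block, s) - block) ** 2))
        if err < best_err:
            best_scale, best_err = s, err
    return best_scale, best_err


rng = np.random.default_rng(0)
w = rng.normal(size=16)
scale, err = mse_scale_search(w)
base_err = float(np.mean((fake_quant(w, np.max(np.abs(w)) / NVFP4_MAX) - w) ** 2))
# The searched scale is never worse than plain amax scaling, since the
# sweep includes ratio 1.0 as a candidate.
assert err <= base_err
```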
## Testing
Tested the generated Qwen3-8B checkpoint with `trtllm-serve` and the nv_eval
example in `Model-Optimizer-Internal/examples/nv_eval`.
NV eval results:
```
| Groups |Version|Filter|n-shot| Metric | |Value | |Stderr|
|--------|-------|------|------|-----------|---|-----:|---|-----:|
|mmlu_str| |none | |exact_match|↑ |0.7186|± |0.0036|
```
## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open a
`Draft` PR. -->
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes/No <!--- If No, explain
why. -->
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Yes/No <!--- Only for new features, API changes, critical bug fixes or
bw breaking changes. -->
## Additional Information
<!-- E.g. related issue. -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added support for static NVFP4 quantizers that utilize pre-computed
calibration scales.
* Introduced new NVFP4 W4A4 quantization configuration with optional FP8
scale sweep.
* **Performance Improvements**
* Static quantizers now skip unnecessary dynamic scaling factor
recalculation.
* Unified quantization handling for improved consistency and efficiency.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
## Files changed

`CHANGELOG.rst` (2 additions, 0 deletions):

```diff
@@ -15,6 +15,7 @@ NVIDIA Model Optimizer Changelog (Linux)
 - Add ``--moe_calib_experts_ratio`` flag in ``hf_ptq.py`` to specify the ratio of experts to calibrate during forward pass to improve expert coverage during calibration. Default to all the experts.
 - Add sparse attention optimization for transformer models (``modelopt.torch.sparsity.attention_sparsity``). This reduces computational cost by skipping attention computation. Supports calibration for threshold selection on HuggingFace models. See `examples/llm_sparsity/attention_sparsity/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_sparsity/attention_sparsity>`_ for usage.
 - Add support for rotating the input before quantization for RHT.
+- Add support for advanced weight scale search for NVFP4 quantization and its export path.

 0.42 (2026-02-xx)
 ^^^^^^^^^^^^^^^^^
@@ -36,6 +37,7 @@ NVIDIA Model Optimizer Changelog (Linux)
 - Add LTX-2 and Wan2.2 (T2V) support in the diffusers quantization workflow.
 - Add PTQ support for GLM-4.7, including loading MTP layer weights from a separate ``mtp.safetensors`` file and export as-is.
 - Add support for image-text data calibration in PTQ for Nemotron VL models.
+- Add support for advanced weight scale search for NVFP4 quantization and its export path.
 - Add PTQ support for Nemotron Parse.
 - Add distillation support for LTX-2. See `examples/diffusers/distillation/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/diffusers/distillation>`_ for more details.
```