## What does this PR do?
**Type of change:** New model support
**Overview:** Add ModelOpt PTQ support for
https://huggingface.co/Qwen/Qwen3.5-397B-A17B
## Usage
<!-- You can potentially add a usage example below. -->
```bash
python3 hf_ptq.py --pyt_ckpt_path /home/omniml_data_3/models/Qwen3.5-397B-A17B --qformat nvfp4_mlp_only --export_path /home/omniml_data_3/zhiyuc/checkpoints/Qwen3.5-397B-A17B-NVFP4 --trust_remote_code
```
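For intuition about what the `nvfp4_mlp_only` format does to the weights, here is a rough, self-contained sketch of NVFP4-style fake quantization: values are grouped into blocks, each block is scaled so its maximum maps onto the largest FP4 (E2M1) magnitude, and every element is rounded to the nearest representable E2M1 value. This is an illustration only, not the ModelOpt implementation; the function name and block handling are hypothetical.

```python
# Illustrative sketch (NOT the ModelOpt code): NVFP4-style block-scaled
# fake quantization onto the FP4 E2M1 value grid.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_GRID = sorted(E2M1_GRID + [-v for v in E2M1_GRID if v > 0.0])


def fake_quant_nvfp4(values, block_size=16):
    """Quantize-dequantize a flat list of floats, block by block."""
    out = []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        amax = max(abs(v) for v in block) or 1.0
        scale = amax / 6.0  # map the block max onto the largest E2M1 magnitude
        out.extend(
            min(E2M1_GRID, key=lambda g: abs(v / scale - g)) * scale
            for v in block
        )
    return out


# Example: values already on the (scaled) grid survive the round trip.
print(fake_quant_nvfp4([3.0, 6.0]))
```

In the real flow, the per-block scales are stored alongside the 4-bit values in the exported checkpoint rather than folded back in as done here.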
## Testing
<!-- Mention how have you tested your change if applicable. -->
## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open a
`Draft` PR. -->
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes <!--- If No, explain why.
-->
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Not yet <!--- Only for new features, API changes, critical bug fixes or
bw breaking changes. -->
## Additional Information
<!-- E.g. related issue. -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added Qwen3.5 Mixture-of-Experts model support in quantization
workflows.
* **Bug Fixes**
* Enhanced error diagnostics during model export with detailed module
information.
* Improved dataset tokenizer processing with proper truncation and
length handling.
* Fixed model export stability issue related to framework integration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
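As a concrete illustration of the "truncation and length handling" item above: a calibration-data preprocessor typically clamps token-id sequences to a fixed maximum length and pads short ones, emitting an attention mask that marks real tokens. The helper below is a hypothetical sketch of that behavior, not the code in this PR.

```python
# Hypothetical sketch of truncation + length handling for calibration data
# (not the PR's implementation).
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Clamp a token-id list to max_length; right-pad short sequences.

    Returns (ids, attention_mask) where mask is 1 for real tokens, 0 for pads.
    """
    kept = min(len(token_ids), max_length)
    ids = token_ids[:max_length] + [pad_id] * (max_length - kept)
    mask = [1] * kept + [0] * (max_length - kept)
    return ids, mask


# Example: a short sequence is padded, a long one is truncated.
print(pad_or_truncate([1, 2, 3], 5))
print(pad_or_truncate([1, 2, 3, 4, 5, 6], 4))
```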
---------
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
**CHANGELOG.rst** (1 addition, 0 deletions)
```diff
@@ -16,6 +16,7 @@ NVIDIA Model Optimizer Changelog (Linux)
 - Add sparse attention optimization for transformer models (``modelopt.torch.sparsity.attention_sparsity``). This reduces computational cost by skipping attention computation. Supports calibration for threshold selection on HuggingFace models. See `examples/llm_sparsity/attention_sparsity/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_sparsity/attention_sparsity>`_ for usage.
 - Add support for rotating the input before quantization for RHT.
 - Add support for advanced weight scale search for NVFP4 quantization and its export path.
```