Commit 6af1cd4

update doc
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
1 parent aa60527

File tree

2 files changed (+3, -1 lines)


CHANGELOG.rst

Lines changed: 1 addition & 0 deletions

```diff
@@ -16,6 +16,7 @@ NVIDIA Model Optimizer Changelog (Linux)
 - Add sparse attention optimization for transformer models (``modelopt.torch.sparsity.attention_sparsity``). This reduces computational cost by skipping attention computation. Supports calibration for threshold selection on HuggingFace models. See `examples/llm_sparsity/attention_sparsity/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_sparsity/attention_sparsity>`_ for usage.
 - Add support for rotating the input before quantization for RHT.
 - Add support for advanced weight scale search for NVFP4 quantization and its export path.
+- Enable PTQ workflow for Qwen3.5 MoE models.

 0.42 (2026-02-xx)
 ^^^^^^^^^^^^^^^^^
```

examples/llm_ptq/README.md

Lines changed: 2 additions & 1 deletion

```diff
@@ -106,7 +106,7 @@ Please reference our [framework scripts](#framework-scripts) and our [docs](http
 | Llama-Nemotron Ultra ||||||
 | Gemma 3 | ✅<sup>2</sup> | - || - | - |
 | QWen 2, 2.5 <sup>4</sup> ||||||
-| QWen3 MOE, Next <sup>6</sup> || - | - | - ||
+| QWen3, 3.5 MOE, Next <sup>6</sup> || - | - | - ||
 | QwQ || - | - | - ||
 | DeepSeek V3, R1, V3.1, V3.2<sup>7</sup> | - | - | - | - ||
 | GLM-4.7<sup>8</sup> || - | - | - ||
@@ -402,6 +402,7 @@ print(llm_fp8.generate(["What's the age of the earth? "]))
 | QWen3 | FP4 ||| - |
 | QWen3 MoE | FP8 ||||
 | QWen3 MoE | FP4 || - | - |
+| QWen3.5 MoE | FP4 | - | - ||
 | QWen2.5 | FP8 ||||
 | QWen2.5 | FP4 ||| - |
 | QwQ-32B | FP8 ||||
```
