[3.1/4] Diffusion Quantized ckpt export - WAN 2.2 14B (#855)
Conversation
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
📝 Walkthrough

This pull request extends Hugging Face export support to diffusers models, introduces comprehensive ONNX export and TensorRT engine build documentation, and refactors the quantization pipeline to support per-backbone operations instead of single-backbone workflows.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@examples/diffusers/quantization/ONNX.md`:
- Line 26: Standardize the checkpoint placeholder used in the docs: replace the
inconsistent {MODEL_NAME} and {MODEL} occurrences with a single chosen
placeholder (e.g., {MODEL_NAME}) for the --quantized-torch-ckpt-save-path and
all related examples; update every instance including the command example shown
and the other occurrence around line 118 so that
--quantized-torch-ckpt-save-path ./{MODEL_NAME}.pt and any references
(README/usage examples) consistently use the same placeholder name.
- Around line 45-48: Update the TensorRT version guidance and SVDQuant claim and
standardize placeholders: clarify that the "INT8 requires >= 9.2.0" statement is
specific to LLM inference on select GPUs (A100, A10G, L4, L40, L40S, H100,
GH200) and note NVIDIA's general production recommendation of TensorRT 8.6.1;
keep the FP8 guidance (TensorRT >= 10.2.0) but scope it similarly; replace the
incorrect blanket "SVDQuant deployment is currently not supported" with a
corrected note that SVDQuant is supported via NVIDIA ModelOpt and can be
integrated with TensorRT (with additional complexity and runtime
considerations); and standardize all placeholders to a single token (choose
{MODEL_NAME} and replace all occurrences of {MODEL} accordingly).
In `@examples/diffusers/quantization/pipeline_manager.py`:
- Around line 184-199: The code currently does list(self.config.backbone) which
splits strings into characters; instead normalize self.config.backbone into a
list of backbone names by checking if it's a str and splitting on commas
(str.split(",") with strip on each token) or otherwise converting the iterable
to a list, then assign to names; keep the existing LTX2 branch using
ModelType.LTX2 and _ensure_ltx2_transformer_cached (yielding name and
self._transformer), preserve the RuntimeError when names is empty, and continue
using getattr(self.pipe, name, None) for each normalized name to raise the same
missing-backbone error if a module is absent.
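The fix the review describes can be sketched as follows (a minimal, hypothetical sketch; `normalize_backbone_names` is an illustrative helper name, not the actual ModelOpt API):

```python
def normalize_backbone_names(backbone):
    """Normalize a backbone spec into a list of module names.

    `backbone` may be a comma-separated string ("transformer,transformer_2")
    or any iterable of names. Calling list() directly on a string would
    split it into single characters, which is the bug the review flags.
    """
    if isinstance(backbone, str):
        # Split on commas and strip whitespace from each token.
        names = [tok.strip() for tok in backbone.split(",") if tok.strip()]
    else:
        names = list(backbone)
    if not names:
        raise RuntimeError("No backbone modules configured")
    return names
```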
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main     #855   +/-   ##
=======================================
  Coverage   73.44%   73.44%
=======================================
  Files         197      197
  Lines       20657    20657
=======================================
  Hits        15172    15172
  Misses       5485     5485
```

☔ View full report in Codecov by Sentry.
```bash
--extra-param checkpoint_path=./ltx-2-19b-dev-fp8.safetensors \
--extra-param distilled_lora_path=./ltx-2-19b-distilled-lora-384.safetensors \
--extra-param spatial_upsampler_path=./ltx-2-spatial-upscaler-x2-1.0.safetensors \
```
Are these ckpts pre-generated?
No, users have to download them from elsewhere; I added the LTX2 link so they can follow up. If users want to use LTX2, they can figure out the details themselves; we can't put everything into the ModelOpt example. Let me know if you have a different thought.
I see, let's add a note about where these ckpts can be found.
Sure, added it:
LTX-2 FP4 (torch checkpoint export)
## What does this PR do?
**Type of change:** documentation <!-- Use one of the following: Bug
fix, new feature, new example, new tests, documentation. -->
**Overview:**
1. Added multi‑backbone support for quantization: --backbone now accepts
space- or comma-separated lists and resolves to a list of backbone
modules.
2. Introduced PipelineManager.iter_backbones() to iterate named backbone
modules and updated get_backbone() to return a single module or a
ModuleList for multi‑backbone.
3. Updated ExportManager to save/restore per‑backbone checkpoints when a
directory is provided, with {backbone_name}.pt files, and to create
target directories when missing.
4. Simplified save_checkpoint() calls to rely on the registered
pipeline_manager by default.
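The per-backbone checkpoint layout from point 3 could be sketched as follows (a path-naming illustration only; `backbone_ckpt_paths` is a hypothetical helper name, and the real ExportManager presumably writes quantized state dicts to these targets):

```python
from pathlib import Path

def backbone_ckpt_paths(save_dir, backbone_names):
    """Resolve {save_dir}/{backbone_name}.pt targets, creating the
    target directory when missing, as described above."""
    out = Path(save_dir)
    out.mkdir(parents=True, exist_ok=True)
    return {name: out / f"{name}.pt" for name in backbone_names}
```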
**Usage:**
```bash
python quantize.py --model wan2.2-t2v-14b --format fp4 --batch-size 1 --calib-size 32 \
--n-steps 30 --backbone transformer transformer_2 --model-dtype BFloat16 \
--quantized-torch-ckpt-save-path ./wan22_mo_ckpts \
--hf-ckpt-dir ./wan2.2-t2v-14b
```
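The `iter_backbones()` / `get_backbone()` behavior from the overview might look roughly like this (an illustrative sketch, not the actual `PipelineManager` implementation; the dummy pipeline stands in for a diffusers pipeline exposing `transformer` / `transformer_2` attributes, and the real code returns an `nn.ModuleList` rather than a plain list for multi-backbone):

```python
class PipelineManagerSketch:
    def __init__(self, pipe, backbone_names):
        self.pipe = pipe
        self.backbone_names = backbone_names

    def iter_backbones(self):
        """Yield (name, module) for each configured backbone, failing
        loudly when the pipeline lacks a requested attribute."""
        for name in self.backbone_names:
            module = getattr(self.pipe, name, None)
            if module is None:
                raise RuntimeError(f"Pipeline has no backbone named {name!r}")
            yield name, module

    def get_backbone(self):
        """Return the single module, or the list for multi-backbone."""
        modules = [m for _, m in self.iter_backbones()]
        return modules[0] if len(modules) == 1 else modules
```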
## Plans
- [x] [1/4] Add the basic functionalities to support limited image
models with NVFP4 + FP8, with some refactoring on the previous LLM code
and the diffusers example. PIC: @jingyu-ml
- [x] [2/4] Add support to more video gen models. PIC: @jingyu-ml
- [x] [3/4] Add test cases, refactor on the doc, and all related README.
PIC: @jingyu-ml
- [ ] [4/4] Add the final support to ComfyUI. PIC @jingyu-ml
## Testing
<!-- Mention how have you tested your change if applicable. -->
## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open
`Draft` PR. -->
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: No <!--- If No, explain why.
-->
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: Yes
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Yes <!--- Only for new features, API changes, critical bug fixes or bw
breaking changes. -->
## Additional Information
<!-- E.g. related issue. -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Unified Hugging Face export support for diffusers pipelines and
components
* LTX-2 and Wan2.2 (T2V) support in diffusers quantization workflow
* Comprehensive ONNX export and TensorRT engine build documentation for
diffusion models
* **Documentation**
* Updated to clarify support for both transformers and diffusers models
in unified export API
* Expanded diffusers examples with LoRA fusion guidance and additional
model options (Flux, SD3, SDXL variants)
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>