[3.1/4] Diffusion Quantized ckpt export - WAN 2.2 14B (#855)
Conversation
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
📝 Walkthrough

This pull request extends Hugging Face export support to diffusers models, introduces comprehensive ONNX export and TensorRT engine build documentation, and refactors the quantization pipeline to support per-backbone operations instead of single-backbone workflows.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@examples/diffusers/quantization/ONNX.md`:
- Line 26: Standardize the checkpoint placeholder used in the docs: replace the
inconsistent {MODEL_NAME} and {MODEL} occurrences with a single chosen
placeholder (e.g., {MODEL_NAME}) for the --quantized-torch-ckpt-save-path and
all related examples; update every instance including the command example shown
and the other occurrence around line 118 so that
--quantized-torch-ckpt-save-path ./{MODEL_NAME}.pt and any references
(README/usage examples) consistently use the same placeholder name.
- Around line 45-48: Update the TensorRT version guidance and SVDQuant claim and
standardize placeholders: clarify that the "INT8 requires >= 9.2.0" statement is
specific to LLM inference on select GPUs (A100, A10G, L4, L40, L40S, H100,
GH200) and note NVIDIA's general production recommendation of TensorRT 8.6.1;
keep the FP8 guidance (TensorRT >= 10.2.0) but scope it similarly; replace the
incorrect blanket "SVDQuant deployment is currently not supported" with a
corrected note that SVDQuant is supported via NVIDIA ModelOpt and can be
integrated with TensorRT (with additional complexity and runtime
considerations); and standardize all placeholders to a single token (choose
{MODEL_NAME} and replace all occurrences of {MODEL} accordingly).
In `@examples/diffusers/quantization/pipeline_manager.py`:
- Around line 184-199: The code currently does list(self.config.backbone) which
splits strings into characters; instead normalize self.config.backbone into a
list of backbone names by checking if it's a str and splitting on commas
(str.split(",") with strip on each token) or otherwise converting the iterable
to a list, then assign to names; keep the existing LTX2 branch using
ModelType.LTX2 and _ensure_ltx2_transformer_cached (yielding name and
self._transformer), preserve the RuntimeError when names is empty, and continue
using getattr(self.pipe, name, None) for each normalized name to raise the same
missing-backbone error if a module is absent.
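The fix the review describes can be sketched as follows (a minimal, hypothetical sketch; `normalize_backbone_names` is an illustrative helper name, not the actual ModelOpt API):

```python
def normalize_backbone_names(backbone):
    """Normalize a backbone spec into a list of module names.

    `backbone` may be a comma-separated string ("transformer,transformer_2")
    or any iterable of names. Calling list() directly on a string would
    split it into single characters, which is the bug the review flags.
    """
    if isinstance(backbone, str):
        # Split on commas and strip whitespace from each token.
        names = [tok.strip() for tok in backbone.split(",") if tok.strip()]
    else:
        names = list(backbone)
    if not names:
        raise RuntimeError("No backbone modules configured")
    return names
```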
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main     #855   +/-   ##
=======================================
  Coverage   73.44%   73.44%
=======================================
  Files         197      197
  Lines       20657    20657
=======================================
  Hits        15172    15172
  Misses       5485     5485
```

☔ View full report in Codecov by Sentry.
```bash
--extra-param checkpoint_path=./ltx-2-19b-dev-fp8.safetensors \
--extra-param distilled_lora_path=./ltx-2-19b-distilled-lora-384.safetensors \
--extra-param spatial_upsampler_path=./ltx-2-spatial-upscaler-x2-1.0.safetensors \
```
Are these ckpts pre-generated?
No, users have to download them from elsewhere; I added the LTX2 link so they can follow up. If users want to use LTX2, they can figure out the details themselves; we can't put everything into the ModelOpt example. Let me know if you have a different thought.
I see, let's add a note about where these ckpts can be found.
Sure, added it:
LTX-2 FP4 (torch checkpoint export)
## What does this PR do?
**Type of change:** documentation <!-- Use one of the following: Bug
fix, new feature, new example, new tests, documentation. -->
**Overview:**
1. Added multi‑backbone support for quantization: --backbone now accepts
space- or comma-separated lists and resolves to a list of backbone
modules.
2. Introduced PipelineManager.iter_backbones() to iterate named backbone
modules and updated get_backbone() to return a single module or a
ModuleList for multi‑backbone.
3. Updated ExportManager to save/restore per‑backbone checkpoints when a
directory is provided, with {backbone_name}.pt files, and to create
target directories when missing.
4. Simplified save_checkpoint() calls to rely on the registered
pipeline_manager by default.
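The per-backbone checkpoint layout from point 3 could be sketched as follows (a path-naming illustration only; `backbone_ckpt_paths` is a hypothetical helper name, and the real ExportManager presumably writes quantized state dicts to these targets):

```python
from pathlib import Path

def backbone_ckpt_paths(save_dir, backbone_names):
    """Resolve {save_dir}/{backbone_name}.pt targets, creating the
    target directory when missing, as described above."""
    out = Path(save_dir)
    out.mkdir(parents=True, exist_ok=True)
    return {name: out / f"{name}.pt" for name in backbone_names}
```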
**Usage:**
```bash
python quantize.py --model wan2.2-t2v-14b --format fp4 --batch-size 1 --calib-size 32 \
--n-steps 30 --backbone transformer transformer_2 --model-dtype BFloat16 \
--quantized-torch-ckpt-save-path ./wan22_mo_ckpts \
--hf-ckpt-dir ./wan2.2-t2v-14b
```
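The `iter_backbones()` / `get_backbone()` behavior from the overview might look roughly like this (an illustrative sketch, not the actual `PipelineManager` implementation; the dummy pipeline stands in for a diffusers pipeline exposing `transformer` / `transformer_2` attributes, and the real code returns an `nn.ModuleList` rather than a plain list for multi-backbone):

```python
class PipelineManagerSketch:
    def __init__(self, pipe, backbone_names):
        self.pipe = pipe
        self.backbone_names = backbone_names

    def iter_backbones(self):
        """Yield (name, module) for each configured backbone, failing
        loudly when the pipeline lacks a requested attribute."""
        for name in self.backbone_names:
            module = getattr(self.pipe, name, None)
            if module is None:
                raise RuntimeError(f"Pipeline has no backbone named {name!r}")
            yield name, module

    def get_backbone(self):
        """Return the single module, or the list for multi-backbone."""
        modules = [m for _, m in self.iter_backbones()]
        return modules[0] if len(modules) == 1 else modules
```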
## Plans
- [x] [1/4] Add the basic functionalities to support limited image
models with NVFP4 + FP8, with some refactoring on the previous LLM code
and the diffusers example. PIC: @jingyu-ml
- [x] [2/4] Add support to more video gen models. PIC: @jingyu-ml
- [x] [3/4] Add test cases, refactor on the doc, and all related README.
PIC: @jingyu-ml
- [ ] [4/4] Add the final support to ComfyUI. PIC @jingyu-ml
## Testing
<!-- Mention how have you tested your change if applicable. -->
## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open
`Draft` PR. -->
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: No <!--- If No, explain why.
-->
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: Yes
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Yes <!--- Only for new features, API changes, critical bug fixes or bw
breaking changes. -->
## Additional Information
<!-- E.g. related issue. -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Unified Hugging Face export support for diffusers pipelines and
components
* LTX-2 and Wan2.2 (T2V) support in diffusers quantization workflow
* Comprehensive ONNX export and TensorRT engine build documentation for
diffusion models
* **Documentation**
* Updated to clarify support for both transformers and diffusers models
in unified export API
* Expanded diffusers examples with LoRA fusion guidance and additional
model options (Flux, SD3, SDXL variants)
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>