
sample QAD example script #933

Merged
ynankani merged 6 commits into main from ynankani/qad_example
Mar 6, 2026

Conversation

@ynankani
Contributor

@ynankani ynankani commented Feb 25, 2026

What does this PR do?

sample QAD example script

Type of change: new example
Example script for QAD on a diffusion model like LTX-2

Overview:

  1. Model loading
  2. NvFP4 fake quant PTQ using mtq.quantize
  3. Distillation class wrapping using mtd.convert
  4. Using ltx-2 trainer code for training
  5. Checkpoint saving in bf16; post-process the bf16 model using Comfy-kitchen to produce a real quantized model for ComfyUI inference

Usage

accelerate launch --config_file fsdp_custom.yaml sample_example_qad_diffusers.py train  --config ltx2_qad.yaml 

Testing

  1. Tested improvement in VBench score for PTQ and QAD checkpoints.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: NA
  • Did you write any new necessary tests?: NA
  • Did you add or update any necessary documentation?: NA
  • Did you update Changelog?: NA

Additional Information

Summary by CodeRabbit

  • Documentation

    • Added a comprehensive README for the Windows/Linux QAD example covering prerequisites, installation, project layout, usage, training workflow, and inference checkpoint creation.
  • New Features

    • Added a complete Quantization-Aware Distillation example with a ready training configuration, single-node distributed setup, PTQ calibration, teacher–student distillation, CLI for train/create-inference workflows, and an inference-ready checkpoint generator.
    • Added a requirements file listing Python dependencies and tooling.

@ynankani ynankani requested a review from a team as a code owner February 25, 2026 13:29
@coderabbitai
Contributor

coderabbitai Bot commented Feb 25, 2026

📝 Walkthrough


Adds a new LTX-2 Quantization-Aware Distillation (QAD) example for Windows/Linux diffusers: README, dependencies, FSDP and LTX config YAMLs, and a large sample script implementing training, ModelOpt PTQ calibration, distillation, checkpointing, and inference-checkpoint creation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation & deps**<br>examples/windows/torch_onnx/diffusers/qad_example/README.md, examples/windows/torch_onnx/diffusers/qad_example/requirements.txt | New README describing the QAD pipeline, usage, project layout, and checkpointing; updated requirements including accelerate, nvidia-modelopt, safetensors, pyyaml, torch>=2.0, and LTX-2 packages from Git. |
| **Training configuration**<br>examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml, examples/windows/torch_onnx/diffusers/qad_example/fsdp_custom.yaml | New LTX-2 YAML configuration defining model, optimization, validation, checkpointing, and QAD-specific params, plus an FSDP local single-node multi-process config with sharded state and transformer wrapping settings. |
| **Implementation / example**<br>examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py | Large new example script adding LtxvQADTrainer, DiffusionMSELoss, ModelOpt PTQ calibration integration, teacher-student distillation logic, robust state-dict loading (torch/safetensors with fallbacks), amax extraction, key filtering, device helpers, a quant-config builder, and a create_inference_checkpoint utility. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Trainer as LtxvQADTrainer
    participant Quantizer as ModelOpt<br/>Quantizer
    participant Student as Student<br/>Model
    participant Teacher as Teacher<br/>Model
    participant Checkpoint as Checkpoint<br/>Manager

    User->>Trainer: start training (config, data)
    Trainer->>Student: load student model
    Trainer->>Quantizer: run PTQ calibration (calib dataset)
    Quantizer->>Student: compute/calibrate amaxs
    Quantizer-->>Trainer: return amax metadata

    Trainer->>Teacher: load teacher model (if configured)
    Trainer->>Student: wrap for distillation

    loop per training step
      Trainer->>Student: forward(batch)
      Student->>Teacher: request teacher outputs
      Teacher-->>Student: teacher targets
      Student->>Trainer: compute base + KD loss
      Trainer->>Trainer: backward & optimizer step
    end

    Trainer->>Checkpoint: save checkpoint (state, amaxs, filtered keys)
    Checkpoint-->>Trainer: checkpoint saved (safetensors)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 68.18%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Security Anti-Patterns | ❓ Inconclusive | Cannot access Python source files from the PR to verify security anti-patterns per SECURITY.md guidelines; torch.load() and YAML loading patterns cannot be validated. | Ensure sample_example_qad_diffusers.py is accessible and verify: torch.load() uses weights_only=True, YAML uses safe_load(), and no eval/exec/# nosec patterns exist. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'sample QAD example script' accurately describes the primary change: adding a new QAD example script for quantization-aware distillation with supporting configuration files and documentation. |

Comment thread: examples/windows/torch_onnx/diffusers/qad_example/README.md
@vishalpandya1990
Contributor

To be clear, will this example work for single RTX GPU use-cases (e.g. RTX 5090)? Or, are there any changes / limitations on this part?

@ynankani
Contributor Author

ynankani commented Mar 3, 2026

> To be clear, will this example work for single RTX GPU use-cases (e.g. RTX 5090)? Or, are there any changes / limitations on this part?

I tried running on a single 5090 GPU: it fails to load the text encoder due to an issue with the LTX model loader on Windows, and it runs out of memory on a single 5090 on Linux.

@ynankani ynankani force-pushed the ynankani/qad_example branch from b39e963 to 28d6dc1 Compare March 3, 2026 10:14
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (3)
examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py (2)

383-392: Consider making calibration seed configurable.

The calibration seed is hardcoded to 42 (line 385). While this ensures reproducibility, consider making it configurable through the trainer config for users who want different calibration orderings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`
around lines 383 - 392, The hardcoded calibration seed (torch.manual_seed(42))
should be made configurable: add a calibration seed option in the trainer/config
(e.g., self._config.optimization.calibration_seed), read that value (fall back
to 42 if not set), and call torch.manual_seed(calibration_seed) before creating
the calib_loader DataLoader (which uses self._config.optimization.batch_size);
this preserves reproducibility while allowing users to change calibration
ordering.
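The suggested change can be sketched in a few lines. Note this is an illustrative sketch: the config class and the `calibration_seed` attribute name are hypothetical, following the wording of the comment above, not the example script's actual code.

```python
# Hypothetical sketch of a configurable calibration seed with a fallback.
# `OptimizationConfig` and its attribute names are illustrative only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class OptimizationConfig:
    batch_size: int = 1
    calibration_seed: Optional[int] = None  # None -> fall back to the legacy 42


def resolve_calibration_seed(opt: OptimizationConfig, default: int = 42) -> int:
    """Return the configured calibration seed, or `default` when unset."""
    return opt.calibration_seed if opt.calibration_seed is not None else default


# The trainer would call torch.manual_seed(resolve_calibration_seed(cfg))
# before constructing the calibration DataLoader.
print(resolve_calibration_seed(OptimizationConfig()))                       # 42
print(resolve_calibration_seed(OptimizationConfig(calibration_seed=1234)))  # 1234
```

This keeps the current behavior (seed 42) for configs that do not set the new key, so reproducibility is unchanged by default.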

216-228: Nested dict handling is limited to one level.

move_batch_to_device only handles one level of nested dictionaries. If batch data has deeper nesting, inner tensors won't be moved to device. This may be intentional given the expected data structure, but worth noting.

♻️ Optional: recursive implementation
 def move_batch_to_device(batch: dict, device: torch.device) -> dict:
     """Recursively move batch tensors to device."""
+    def _move(obj):
+        if isinstance(obj, dict):
+            return {k: _move(v) for k, v in obj.items()}
+        elif isinstance(obj, torch.Tensor):
+            return obj.to(device)
+        return obj
+    return _move(batch)
-    result = {}
-    for k, v in batch.items():
-        if isinstance(v, dict):
-            result[k] = {
-                ik: iv.to(device) if isinstance(iv, torch.Tensor) else iv for ik, iv in v.items()
-            }
-        elif isinstance(v, torch.Tensor):
-            result[k] = v.to(device)
-        else:
-            result[k] = v
-    return result
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`
around lines 216 - 228, The function move_batch_to_device only handles one-level
nested dicts so tensors in deeper nested structures won't be moved; update
move_batch_to_device to recursively traverse arbitrary nested containers (dicts,
lists, tuples) and call .to(device) on any torch.Tensor encountered, preserving
container types and non-tensor values, e.g., by replacing the current one-level
dict branch (isinstance(v, dict)) with a recursive helper or making
move_batch_to_device itself recurse for dict/list/tuple cases and return the
same structure with tensors moved to the provided device.
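A fully recursive variant along the lines the prompt describes might look like the sketch below. To keep it self-contained and runnable without torch, a minimal `FakeTensor` stand-in replaces `torch.Tensor`; in the real script the `isinstance` check would target `torch.Tensor`.

```python
# Recursive move-to-device sketch handling dicts, lists, and tuples.
# FakeTensor is a stand-in for torch.Tensor so the example runs standalone.
class FakeTensor:
    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


def move_batch_to_device(obj, device):
    """Recursively move tensors inside nested dicts/lists/tuples to `device`."""
    if isinstance(obj, dict):
        return {k: move_batch_to_device(v, device) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(move_batch_to_device(v, device) for v in obj)
    if isinstance(obj, FakeTensor):  # torch.Tensor in the real code
        return obj.to(device)
    return obj  # non-tensor leaves pass through unchanged


batch = {"latents": FakeTensor(), "cond": {"nested": [FakeTensor(), "keep"]}}
moved = move_batch_to_device(batch, "cuda")
print(moved["cond"]["nested"][0].device)  # cuda
```

Container types and non-tensor values are preserved, which matches the behavior the one-level implementation already guarantees for flat batches.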
examples/windows/torch_onnx/diffusers/qad_example/README.md (1)

99-111: Minor style: consider simplifying phrasing.

The static analysis tool suggests "All of the following" could be simplified to "All the following" for conciseness. This is optional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/windows/torch_onnx/diffusers/qad_example/README.md` around lines 99
- 111, Replace the verbose phrasing "All of the following" in the README
surrounding the QAD/optimization table with the shorter "All the following";
search the README.md near the QAD section (keys like `qad`, `calib_size`,
`kd_loss_weight`, `exclude_blocks`, `skip_inference_ckpt`) and update the
sentence so it reads "All the following" while preserving surrounding
punctuation and formatting.

ℹ️ Review info

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b39e963d65dfed2a22a963db7d1e789abf1b167d and 28d6dc1.

📒 Files selected for processing (5)
  • examples/windows/torch_onnx/diffusers/qad_example/README.md
  • examples/windows/torch_onnx/diffusers/qad_example/fsdp_custom.yaml
  • examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml
  • examples/windows/torch_onnx/diffusers/qad_example/requirements.txt
  • examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • examples/windows/torch_onnx/diffusers/qad_example/requirements.txt
  • examples/windows/torch_onnx/diffusers/qad_example/fsdp_custom.yaml

Comment on lines +2 to +6
model:
  model_path: "/lustre/fsw/portfolios/adlr/projects/adlr_psx_numerics/users/ynankani/ComfyUI/models/checkpoints/ltx-av-step-1933500-split-new-vae.safetensors"
  training_mode: "full"
  load_checkpoint:
  text_encoder_path: "/lustre/fsw/portfolios/adlr/users/dhutchins/models/gemma"
Contributor


⚠️ Potential issue | 🟡 Minor

Replace internal filesystem paths with placeholders.

The config contains hardcoded paths to an internal Lustre filesystem that won't work for external users:

  • model_path: /lustre/fsw/.../ltx-av-step-1933500-split-new-vae.safetensors
  • text_encoder_path: /lustre/fsw/.../gemma

Replace with placeholder paths that users can easily identify and update.

📝 Suggested placeholder paths
 model:
-  model_path: "/lustre/fsw/portfolios/adlr/projects/adlr_psx_numerics/users/ynankani/ComfyUI/models/checkpoints/ltx-av-step-1933500-split-new-vae.safetensors"
+  model_path: "/path/to/ltx2/checkpoint.safetensors"  # TODO: Set your LTX-2 checkpoint path
   training_mode: "full"
   load_checkpoint:
-  text_encoder_path: "/lustre/fsw/portfolios/adlr/users/dhutchins/models/gemma"
+  text_encoder_path: "/path/to/gemma"  # TODO: Set your Gemma text encoder path
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml` around lines
2 - 6, The YAML contains hardcoded internal Lustre paths for model.model_path
and model.text_encoder_path; replace those absolute internal filesystem values
with clear placeholder paths (e.g., "<PATH_TO_MODEL_CHECKPOINT>.safetensors" and
"<PATH_TO_TEXT_ENCODER_DIR>") so external users can update them; update the
values for the keys model.model_path and model.text_encoder_path in
ltx2_qad.yaml and keep training_mode and load_checkpoint unchanged.

Comment on lines +28 to +30
data:
  preprocessed_data_root: "/lustre/fsw/portfolios/adlr/users/scavallari/ltx-qad/qad-dataset"
  num_dataloader_workers: 2
Contributor


⚠️ Potential issue | 🟡 Minor

Replace internal data path with placeholder.

The preprocessed_data_root path is also an internal Lustre path that external users cannot access.

📝 Suggested fix
 data:
-  preprocessed_data_root: "/lustre/fsw/portfolios/adlr/users/scavallari/ltx-qad/qad-dataset"
+  preprocessed_data_root: "/path/to/preprocessed"  # TODO: Set your preprocessed dataset path
   num_dataloader_workers: 2
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
data:
  preprocessed_data_root: "/lustre/fsw/portfolios/adlr/users/scavallari/ltx-qad/qad-dataset"
  num_dataloader_workers: 2

data:
  preprocessed_data_root: "/path/to/preprocessed"  # TODO: Set your preprocessed dataset path
  num_dataloader_workers: 2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml` around lines
28 - 30, Replace the hard-coded internal Lustre path assigned to the YAML key
preprocessed_data_root with a neutral placeholder or environment variable
reference (for example "/path/to/preprocessed_data" or
"${PREPROCESSED_DATA_ROOT}") so external users can supply their own dataset
location; update the value of preprocessed_data_root in ltx2_qad.yaml
accordingly.

Comment on lines +70 to +74
qad:
calib_size: 10
kd_loss_weight: 1.0
exclude_blocks: [0, 1, 46, 47]
skip_inference_ckpt: false
Contributor


⚠️ Potential issue | 🟡 Minor

Documentation mismatch: YAML defaults differ from README.

The YAML default values don't match the README documentation (lines 101-102):

| Parameter | YAML default | README states |
| --- | --- | --- |
| calib_size | 10 | 512 |
| kd_loss_weight | 1.0 | 0.5 |

Either update the YAML to match the README or update the README to reflect the actual defaults.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml` around lines
70 - 74, The YAML default settings in ltx2_qad.yaml (calib_size: 10,
kd_loss_weight: 1.0) conflict with the README documented defaults (calib_size:
512, kd_loss_weight: 0.5); pick one source of truth and make them consistent:
either update the YAML parameters calib_size to 512 and kd_loss_weight to 0.5 to
match the README, or update the README entries for calib_size and kd_loss_weight
to reflect the YAML values (10 and 1.0), then run a quick grep through
docs/usage to ensure no other references remain inconsistent.

Comment on lines +134 to +154
def load_state_dict_any_format(path: str, label: str = "") -> tuple[dict, dict | None]:
    """Load state dict from either torch pickle or safetensors."""
    fmt = detect_format(path)
    logger.info(f"[{label}] Detected format: {fmt} for {path}")

    if fmt == "torch":
        raw = torch.load(path, map_location="cpu", weights_only=False)
        if isinstance(raw, dict) and "state_dict" in raw:
            return raw["state_dict"], None
        return raw, None
    else:
        try:
            from safetensors.torch import load_file, safe_open

            with safe_open(path, framework="pt", device="cpu") as f:
                metadata = f.metadata() or {}
                return load_file(path, device="cpu"), metadata
        except Exception as e:
            logger.warning(f"safe_open failed ({e}), trying manual parse...")
            return _load_safetensors_manual(path)

Contributor


⚠️ Potential issue | 🔴 Critical

CRITICAL: torch.load with weights_only=False requires justification comment.

Per the security coding guidelines in SECURITY.md, using torch.load(..., weights_only=False) without an inline comment justifying why it is safe is a critical security issue. Pickle deserialization can execute arbitrary code, and this function loads user-provided checkpoint paths.

Either:

  1. Add an inline comment explaining why this is safe (e.g., files are internally-generated and not user-supplied), OR
  2. Use weights_only=True and handle the restricted loading, OR
  3. Prefer safetensors format which doesn't have this vulnerability
🔒 Option 1: Add justification comment
     if fmt == "torch":
-        raw = torch.load(path, map_location="cpu", weights_only=False)
+        # Safe: Checkpoint paths are provided by the user who controls their own training pipeline.
+        # Users are responsible for ensuring checkpoint provenance.
+        raw = torch.load(path, map_location="cpu", weights_only=False)
         if isinstance(raw, dict) and "state_dict" in raw:
             return raw["state_dict"], None
         return raw, None
🔒 Option 2: Add warning about untrusted files
 def load_state_dict_any_format(path: str, label: str = "") -> tuple[dict, dict | None]:
-    """Load state dict from either torch pickle or safetensors."""
+    """Load state dict from either torch pickle or safetensors.
+    
+    WARNING: Torch pickle format uses unsafe deserialization. Only load
+    checkpoints from trusted sources. Prefer safetensors format when possible.
+    """
     fmt = detect_format(path)
     logger.info(f"[{label}] Detected format: {fmt} for {path}")
 
     if fmt == "torch":
+        logger.warning(
+            f"Loading torch pickle checkpoint from {path}. "
+            "Only load checkpoints from trusted sources."
+        )
         raw = torch.load(path, map_location="cpu", weights_only=False)

As per coding guidelines: "torch.load(..., weights_only=False) with no inline comment justifying why it is safe" must be flagged as a CRITICAL security issue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`
around lines 134 - 154, The call to torch.load in load_state_dict_any_format
uses weights_only=False without justification; update this by either (A) adding
an inline comment immediately next to the torch.load call explaining why
deserializing pickles here is safe (e.g., sources are internal/trusted and not
user-supplied), or (B) change the call to torch.load(..., weights_only=True) and
adjust subsequent logic to handle the restricted load, or (C) prefer/require
safetensors (detect_format fallback) for untrusted inputs and log/warn or raise
if a torch pickle is provided from an untrusted path; reference the torch.load
invocation and the weights_only parameter in your change.
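The `detect_format` helper referenced in this function is not shown in the diff. For context, a plausible stdlib-only implementation would sniff magic bytes — an assumption about how the helper might work, not the script's actual code: modern `torch.save` outputs are zip archives beginning with `PK`, while safetensors files begin with an 8-byte little-endian header length followed by a JSON header.

```python
# Illustrative format sniffer (assumed behavior, not the script's real helper):
# torch zip checkpoints start with b"PK"; safetensors files start with an
# 8-byte little-endian u64 header length followed by a JSON object ("{").
import json
import os
import struct
import tempfile


def detect_format(path: str) -> str:
    with open(path, "rb") as f:
        head = f.read(9)
    if head[:2] == b"PK":                      # zip-based torch.save output
        return "torch"
    if len(head) == 9 and head[8:9] == b"{":   # safetensors JSON header
        return "safetensors"
    return "torch"                             # legacy pickle fallback


# Build a tiny safetensors-style header to exercise the check.
header = json.dumps({"__metadata__": {}}).encode()
with tempfile.NamedTemporaryFile(delete=False, suffix=".safetensors") as f:
    f.write(struct.pack("<Q", len(header)) + header)
    st_path = f.name
print(detect_format(st_path))  # safetensors
os.unlink(st_path)
```

Preferring safetensors whenever detection succeeds, and only falling back to pickle for legacy files, narrows the surface where `weights_only=False` matters.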

Comment on lines +942 to +961
import yaml

with open(args.config) as f:
    config_dict = yaml.safe_load(f)

# Extract QAD-specific config (not part of LtxTrainerConfig)
qad_config = config_dict.pop("qad", {})

config = LtxTrainerConfig(**config_dict)

# Resolve QAD params: CLI args override YAML values, YAML overrides defaults
calib_size = args.calib_size if args.calib_size != 512 else qad_config.get("calib_size", 512)
kd_loss_weight = (
    args.kd_loss_weight if args.kd_loss_weight != 0.5 else qad_config.get("kd_loss_weight", 0.5)
)
exclude_blocks = (
    args.exclude_blocks
    if args.exclude_blocks != [0, 1, 46, 47]
    else qad_config.get("exclude_blocks", [0, 1, 46, 47])
)
Contributor


⚠️ Potential issue | 🟡 Minor

CLI override logic has edge case: user explicitly passing default value.

The override logic checks if CLI args differ from hardcoded defaults to determine whether to use CLI or YAML values:

calib_size = args.calib_size if args.calib_size != 512 else qad_config.get("calib_size", 512)

If a user explicitly passes --calib-size 512 to override a YAML value of 256, the YAML value (256) would be used instead. Consider using argparse's default tracking or explicit "was this arg provided" checking.

♻️ Suggested fix using sentinel values
     train_parser.add_argument(
         "--calib-size",
         type=int,
-        default=512,
+        default=None,
         help="Number of calibration batches for PTQ",
     )
     train_parser.add_argument(
         "--kd-loss-weight",
         type=float,
-        default=0.5,
+        default=None,
         help="KD loss weight (0=pure hard loss, 1=pure KD loss)",
     )
     train_parser.add_argument(
         "--exclude-blocks",
         type=int,
         nargs="*",
-        default=[0, 1, 46, 47],
+        default=None,
         help="Transformer block indices to exclude from quantization",
     )

Then in main:

-    calib_size = args.calib_size if args.calib_size != 512 else qad_config.get("calib_size", 512)
+    calib_size = args.calib_size if args.calib_size is not None else qad_config.get("calib_size", 512)
-    kd_loss_weight = (
-        args.kd_loss_weight if args.kd_loss_weight != 0.5 else qad_config.get("kd_loss_weight", 0.5)
-    )
+    kd_loss_weight = (
+        args.kd_loss_weight if args.kd_loss_weight is not None else qad_config.get("kd_loss_weight", 0.5)
+    )
-    exclude_blocks = (
-        args.exclude_blocks
-        if args.exclude_blocks != [0, 1, 46, 47]
-        else qad_config.get("exclude_blocks", [0, 1, 46, 47])
-    )
+    exclude_blocks = (
+        args.exclude_blocks
+        if args.exclude_blocks is not None
+        else qad_config.get("exclude_blocks", [0, 1, 46, 47])
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`
around lines 942 - 961, The CLI-vs-YAML override wrongly treats explicit CLI
values equal to hardcoded defaults as "not provided"; change the argparse
defaults to sentinel (e.g., None) and update the resolution code to check for
None instead of comparing to hardcoded defaults: replace the current assignments
for calib_size, kd_loss_weight, and exclude_blocks to pick args.<name> if
args.<name> is not None else qad_config.get("<name>", <hardcoded_default>) so
explicit CLI values (including previous defaults) correctly override YAML;
reference the symbols args, calib_size, kd_loss_weight, and exclude_blocks when
making the edits.
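The sentinel-default pattern recommended above can be demonstrated end to end with a small, self-contained argparse sketch (the `resolve` helper and its flags are illustrative, mirroring only the `--calib-size` case):

```python
# With default=None, argparse distinguishes "flag omitted" from "flag
# explicitly set to the old default value", so CLI always beats YAML.
import argparse


def resolve(argv, yaml_qad):
    parser = argparse.ArgumentParser()
    parser.add_argument("--calib-size", type=int, default=None)
    args = parser.parse_args(argv)
    # CLI wins when provided; otherwise YAML; otherwise the hardcoded default.
    if args.calib_size is not None:
        return args.calib_size
    return yaml_qad.get("calib_size", 512)


print(resolve([], {"calib_size": 256}))                       # 256 (YAML value)
print(resolve(["--calib-size", "512"], {"calib_size": 256}))  # 512 (explicit CLI wins)
print(resolve([], {}))                                        # 512 (hardcoded default)
```

The second call is exactly the edge case flagged in the review: an explicit `--calib-size 512` now correctly overrides the YAML value instead of being mistaken for an omitted flag.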

@codecov

codecov Bot commented Mar 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.13%. Comparing base (42482b1) to head (bdf2f3e).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #933   +/-   ##
=======================================
  Coverage   72.13%   72.13%           
=======================================
  Files         209      209           
  Lines       23631    23631           
=======================================
  Hits        17046    17046           
  Misses       6585     6585           

☔ View full report in Codecov by Sentry.

Contributor

@vishalpandya1990 vishalpandya1990 left a comment


Please add a note that the sample script is meant to illustrate the pipeline, and mention that it was verified on Linux RTX.

ynankani added 6 commits March 5, 2026 19:25
Signed-off-by: ynankani <ynankani@nvidia.com>
Signed-off-by: ynankani <ynankani@nvidia.com>
Signed-off-by: ynankani <ynankani@nvidia.com>
Signed-off-by: ynankani <ynankani@nvidia.com>
Signed-off-by: ynankani <ynankani@nvidia.com>
Signed-off-by: ynankani <ynankani@nvidia.com>
@ynankani ynankani force-pushed the ynankani/qad_example branch from 28d6dc1 to bdf2f3e Compare March 6, 2026 03:26
@ynankani ynankani enabled auto-merge (squash) March 6, 2026 03:26
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py (2)

139-143: ⚠️ Potential issue | 🔴 Critical

Critical: unsafe pickle deserialization in checkpoint loading path.

torch.load(..., weights_only=False) is used on file paths that come from CLI inputs, which keeps pickle object deserialization enabled.

🔒 Suggested fix
-        raw = torch.load(path, map_location="cpu", weights_only=False)
+        raw = torch.load(path, map_location="cpu", weights_only=True)

As per coding guidelines: "Do not use torch.load(..., weights_only=False) unless a documented exception is provided."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`
around lines 139 - 143, The code currently calls torch.load(path,
map_location="cpu", weights_only=False) which allows unsafe pickle
deserialization; change the call to torch.load(path, map_location="cpu",
weights_only=True) to avoid untrusted pickle deserialization when loading
CLI-provided checkpoints (keep the existing handling of the returned object and
"state_dict" key), and add a short fallback/error path if weights_only=True
fails for an older checkpoint format so we fail safe rather than re-enabling
pickle.
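A minimal sketch of the loader that prompt describes (the helper name `load_checkpoint` and the exact error message are illustrative assumptions, not the PR's actual code):

```python
import torch

def load_checkpoint(path: str):
    # Safe path first: weights_only=True restricts unpickling to
    # tensors and plain containers, refusing arbitrary objects.
    try:
        raw = torch.load(path, map_location="cpu", weights_only=True)
    except Exception as exc:
        # Fail safe on legacy formats: raise instead of silently
        # retrying with weights_only=False, which re-enables pickle.
        raise RuntimeError(
            f"Checkpoint {path!r} could not be loaded safely: {exc}"
        ) from exc
    # Some checkpoints nest the weights under a 'state_dict' key.
    if isinstance(raw, dict) and "state_dict" in raw:
        return raw["state_dict"]
    return raw
```

A caller that needs the legacy format would have to opt in explicitly and document why, rather than the loader re-enabling pickle on its own.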

845-861: ⚠️ Potential issue | 🟡 Minor

CLI override precedence still drops explicit default-valued args.

If a user explicitly passes --calib-size 512 (or equivalent default), YAML still wins due to value-comparison logic.

♻️ Suggested fix

```diff
     train_parser.add_argument(
         "--calib-size",
         type=int,
-        default=512,
+        default=None,
         help="Number of calibration batches for PTQ",
     )
     train_parser.add_argument(
         "--kd-loss-weight",
         type=float,
-        default=0.5,
+        default=None,
         help="KD loss weight (0=pure hard loss, 1=pure KD loss)",
     )
     train_parser.add_argument(
         "--exclude-blocks",
         type=int,
         nargs="*",
-        default=[0, 1, 46, 47],
+        default=None,
         help="Transformer block indices to exclude from quantization",
     )
@@
-    calib_size = args.calib_size if args.calib_size != 512 else qad_config.get("calib_size", 512)
+    calib_size = args.calib_size if args.calib_size is not None else qad_config.get("calib_size", 512)
     kd_loss_weight = (
-        args.kd_loss_weight if args.kd_loss_weight != 0.5 else qad_config.get("kd_loss_weight", 0.5)
+        args.kd_loss_weight if args.kd_loss_weight is not None else qad_config.get("kd_loss_weight", 0.5)
     )
     exclude_blocks = (
-        args.exclude_blocks
-        if args.exclude_blocks != [0, 1, 46, 47]
-        else qad_config.get("exclude_blocks", [0, 1, 46, 47])
+        args.exclude_blocks if args.exclude_blocks is not None else qad_config.get("exclude_blocks", [0, 1, 46, 47])
     )
```

Also applies to: 953-961

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`
around lines 845 - 861, The CLI currently treats arguments equal to their
default as "not provided" so YAML values win; change the parser so
explicitly-passed defaults are preserved by constructing the ArgumentParser with
argument_default=argparse.SUPPRESS (or otherwise suppressing defaults) and then
checking for presence in vars(args) when merging CLI→YAML; update the parser
that defines train_parser (and any other parser instances that add --calib-size,
--kd-loss-weight, --exclude-blocks and the similar args around lines 953-961) so
add_argument calls remain the same but the parser is created with
argument_default=argparse.SUPPRESS and the merge logic uses "if 'calib_size' in
vars(args)" (and analogous keys for kd_loss_weight and exclude_blocks) to
respect explicit CLI values even when they equal the default.
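A self-contained sketch of that `argparse.SUPPRESS`-based merge (the function names and the YAML-config dict are illustrative, not the script's actual structure):

```python
import argparse

def build_train_parser() -> argparse.ArgumentParser:
    # argument_default=SUPPRESS: flags the user did not pass are simply
    # absent from the namespace instead of carrying their defaults.
    parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
    parser.add_argument("--calib-size", type=int,
                        help="Number of calibration batches for PTQ")
    parser.add_argument("--kd-loss-weight", type=float,
                        help="KD loss weight (0=hard loss, 1=KD loss)")
    return parser

def merge_config(args: argparse.Namespace, qad_config: dict):
    ns = vars(args)
    # Presence check, not value comparison: an explicit --calib-size 512
    # wins over YAML even though 512 is also the hard-coded default.
    calib_size = ns.get("calib_size", qad_config.get("calib_size", 512))
    kd_loss_weight = ns.get("kd_loss_weight",
                            qad_config.get("kd_loss_weight", 0.5))
    return calib_size, kd_loss_weight
```

With this pattern the precedence is CLI → YAML → hard-coded default, and it holds even when the CLI value happens to equal the default.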
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/windows/torch_onnx/diffusers/qad_example/README.md`:
- Around line 66-70: The README has a filename mismatch: the prose references
preprocess_dataset.py while the example command runs scripts/process_dataset.py;
make them consistent by updating either the descriptive text or the command so
both refer to the same script name (choose one: preprocess_dataset.py or
scripts/process_dataset.py) and ensure the example invocation (the bash block)
uses that exact filename along with the shown arguments (e.g.,
--resolution-buckets 384x256x97), so copy-paste users won’t fail.
- Line 3: Update the note sentence that reads "This is a sample script...
verified to run on a Linux RTX 5090 system, but runs into OOM" to explicitly
state support status: indicate that the example can run for small-scale
inference or demo on Linux RTX 5090 but is not supported for full training due
to OOM (i.e., full training is unsupported on a single RTX 5090), and add a
separate short clause calling out the Windows text-encoder load failure as a
known limitation; ensure the revised README message clearly differentiates
"works for demo/inference" vs "not suitable for full training" and references
the Windows text-encoder issue.

In
`@examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`:
- Around line 386-392: The calibration loop can crash from an unhandled
StopIteration because DataLoader with drop_last=True may yield zero batches;
update the logic around calib_loader, calib_steps and data_iter to (1) compute
calib_steps from the loader itself (e.g., use len(calib_loader) or
floor(len(dataset)/batch_size when drop_last=True) instead of len(dataset) so
steps match actual batches) and (2) robustly handle StopIteration by checking if
len(calib_loader) == 0 and skipping calibration, or by wrapping both attempts to
call next(data_iter) (the code using data_iter and next(data_iter)) in a
try-except that recreates the iterator and if still raises StopIteration, exits
the calibration loop gracefully rather than assuming a second next will succeed.

---

Duplicate comments:
In
`@examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`:
- Around line 139-143: The code currently calls torch.load(path,
map_location="cpu", weights_only=False) which allows unsafe pickle
deserialization; change the call to torch.load(path, map_location="cpu",
weights_only=True) to avoid untrusted pickle deserialization when loading
CLI-provided checkpoints (keep the existing handling of the returned object and
"state_dict" key), and add a short fallback/error path if weights_only=True
fails for an older checkpoint format so we fail safe rather than re-enabling
pickle.
- Around line 845-861: The CLI currently treats arguments equal to their default
as "not provided" so YAML values win; change the parser so explicitly-passed
defaults are preserved by constructing the ArgumentParser with
argument_default=argparse.SUPPRESS (or otherwise suppressing defaults) and then
checking for presence in vars(args) when merging CLI→YAML; update the parser
that defines train_parser (and any other parser instances that add --calib-size,
--kd-loss-weight, --exclude-blocks and the similar args around lines 953-961) so
add_argument calls remain the same but the parser is created with
argument_default=argparse.SUPPRESS and the merge logic uses "if 'calib_size' in
vars(args)" (and analogous keys for kd_loss_weight and exclude_blocks) to
respect explicit CLI values even when they equal the default.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 11b1053f-f84a-482c-853d-306709d229d2

📥 Commits

Reviewing files that changed from the base of the PR and between 28d6dc1 and bdf2f3e.

📒 Files selected for processing (5)
  • examples/windows/torch_onnx/diffusers/qad_example/README.md
  • examples/windows/torch_onnx/diffusers/qad_example/fsdp_custom.yaml
  • examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml
  • examples/windows/torch_onnx/diffusers/qad_example/requirements.txt
  • examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml
  • examples/windows/torch_onnx/diffusers/qad_example/requirements.txt
  • examples/windows/torch_onnx/diffusers/qad_example/fsdp_custom.yaml

@@ -0,0 +1,162 @@
# LTX-2 QAD Example (Quantization-Aware Distillation)

**Note:** This is a **sample script for illustrating the QAD pipeline**. It has been verified to run on a **Linux RTX 5090** system, but runs into **OOM (Out of Memory)** on that configuration.

⚠️ Potential issue | 🟡 Minor

Clarify the single-RTX support statement.

“Verified to run on Linux RTX 5090” conflicts with “runs into OOM” in the same sentence. Please state explicitly whether this setup is unsupported for full training (and note the Windows text-encoder load failure as a known limitation).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/windows/torch_onnx/diffusers/qad_example/README.md` at line 3,
Update the note sentence that reads "This is a sample script... verified to run
on a Linux RTX 5090 system, but runs into OOM" to explicitly state support
status: indicate that the example can run for small-scale inference or demo on
Linux RTX 5090 but is not supported for full training due to OOM (i.e., full
training is unsupported on a single RTX 5090), and add a separate short clause
calling out the Windows text-encoder load failure as a known limitation; ensure
the revised README message clearly differentiates "works for demo/inference" vs
"not suitable for full training" and references the Windows text-encoder issue.

Comment on lines +66 to +70
Run the LTX preprocessing script to extract latents and text embeddings from your videos. Use `preprocess_dataset.py` with the following arguments (matching the LTX training pipeline):

```bash
python scripts/process_dataset.py /path/to/dataset.json \
--resolution-buckets 384x256x97 \
```

⚠️ Potential issue | 🟡 Minor

Dataset preprocessing command name is inconsistent.

The text says preprocess_dataset.py, but the command uses scripts/process_dataset.py. Please align naming to avoid copy-paste failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/windows/torch_onnx/diffusers/qad_example/README.md` around lines 66
- 70, The README has a filename mismatch: the prose references
preprocess_dataset.py while the example command runs scripts/process_dataset.py;
make them consistent by updating either the descriptive text or the command so
both refer to the same script name (choose one: preprocess_dataset.py or
scripts/process_dataset.py) and ensure the example invocation (the bash block)
uses that exact filename along with the shown arguments (e.g.,
--resolution-buckets 384x256x97), so copy-paste users won’t fail.

Comment on lines +386 to +392
```python
calib_loader = DataLoader(
    dataset,
    batch_size=self._config.optimization.batch_size,
    shuffle=False,
    num_workers=0,
    drop_last=True,
)
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
wc -l examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py
```

Repository: NVIDIA/Model-Optimizer

Length of output: 152

🏁 Script executed:

```shell
sed -n '380,425p' examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py
```

Repository: NVIDIA/Model-Optimizer

Length of output: 1786


Uncaught StopIteration exception crashes calibration loop when dataset size is smaller than batch size.

The calibration loop exception handling at lines 406–408 has a critical flaw: when drop_last=True and len(dataset) < batch_size, the DataLoader produces zero batches. The first next(data_iter) raises StopIteration, the except block catches it and recreates the iterator, but the second next(data_iter) on line 408 is not wrapped in a try-except and will raise StopIteration again, crashing the calibration without a handler.

Additionally, line 402 uses len(dataset) to compute calib_steps, but with drop_last=True, the actual number of batches may differ from dataset size.

Suggested fix

```diff
         calib_loader = DataLoader(
             dataset,
             batch_size=self._config.optimization.batch_size,
             shuffle=False,
             num_workers=0,
-            drop_last=True,
+            drop_last=False,
         )
+        if len(calib_loader) == 0:
+            logger.error("Calibration loader is empty. Check dataset size and batch_size.")
+            return
-
-        calib_steps = min(self._calib_size, len(dataset))
+        calib_steps = min(self._calib_size, len(calib_loader))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`
around lines 386 - 392, The calibration loop can crash from an unhandled
StopIteration because DataLoader with drop_last=True may yield zero batches;
update the logic around calib_loader, calib_steps and data_iter to (1) compute
calib_steps from the loader itself (e.g., use len(calib_loader) or
floor(len(dataset)/batch_size when drop_last=True) instead of len(dataset) so
steps match actual batches) and (2) robustly handle StopIteration by checking if
len(calib_loader) == 0 and skipping calibration, or by wrapping both attempts to
call next(data_iter) (the code using data_iter and next(data_iter)) in a
try-except that recreates the iterator and if still raises StopIteration, exits
the calibration loop gracefully rather than assuming a second next will succeed.
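The control flow that prompt asks for can be sketched loader-agnostically — any sized iterable supporting `len()` and `iter()` (a `DataLoader` included) works, and the helper names here are illustrative:

```python
def start_calibration(loader, calib_size: int):
    # Guard the empty-loader case: drop_last=True with
    # len(dataset) < batch_size yields zero batches.
    if len(loader) == 0:
        return None, 0
    # Cap steps by the actual batch count, not the dataset length.
    return iter(loader), min(calib_size, len(loader))

def next_batch(loader, data_iter):
    # Cycle through the loader; if even a fresh iterator is empty,
    # return None so the caller exits gracefully instead of crashing
    # on an unhandled StopIteration.
    try:
        return next(data_iter), data_iter
    except StopIteration:
        data_iter = iter(loader)
        try:
            return next(data_iter), data_iter
        except StopIteration:
            return None, data_iter
```

The calibration loop then skips calibration when `start_calibration` returns `None` and breaks when `next_batch` does, so neither the empty-loader nor the exhausted-iterator case can raise.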

@ynankani ynankani merged commit 296a865 into main Mar 6, 2026
38 checks passed
@ynankani ynankani deleted the ynankani/qad_example branch March 6, 2026 04:15