[None][test] Enable single-GPU LPIPS CI protection for VisualGen models by chang-l · Pull Request #15564 · NVIDIA/TensorRT-LLM

chang-l · 2026-06-24T00:26:44Z

Description

Enable Stage-1 end-to-end LPIPS golden accuracy protection for all single-GPU VisualGen models.

This PR adds QwenImage and Cosmos3-Nano coverage, refreshes all eight single-GPU goldens from one pinned build, explicitly disables pipeline torch compile for every LPIPS generator, and removes the remaining single-GPU LPIPS waivers.

It also unwaives test_wan_t2v_example, which passed locally on B200.

Single-GPU coverage

FLUX.1-dev (T2I)
FLUX.2-dev (T2I)
LTX-2 (T2V)
Wan2.1 T2V
Wan2.2 T2V
Qwen-Image (T2I)
Cosmos3-Nano (T2I)
Cosmos3-Nano (T2V)

The three newly registered B200 tests are:

test_qwenimage_lpips_against_golden
test_cosmos3_nano_t2i_lpips_against_golden
test_cosmos3_nano_t2v_lpips_against_golden

Multi-GPU waiver scope

The six pre-existing multi-GPU LPIPS waivers remain unchanged.

The two newly registered eight-rank variants were handled according to the measured compile-off results:

attn2d_2x2_ulysses2: LPIPS 0.232348 — below the unchanged 0.25 threshold, so it is enabled
cfg2_ulysses2_attn2d_2x1: LPIPS 0.255208 — above 0.25, so it remains waived under NVBug 6272644

No LPIPS threshold was changed.
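The enable-or-waive decision for the two new variants comes down to a comparison against the fixed threshold. A minimal sketch using the two measured values reported above; the helper name `waiver_decision` and the dictionary layout are illustrative assumptions, not taken from the test code:

```python
LPIPS_THRESHOLD = 0.25  # unchanged threshold quoted in this PR

# Measured compile-off LPIPS for the two new eight-rank variants (from this PR).
measured = {
    "attn2d_2x2_ulysses2": 0.232348,
    "cfg2_ulysses2_attn2d_2x1": 0.255208,
}


def waiver_decision(lpips: float, threshold: float = LPIPS_THRESHOLD) -> str:
    """Hypothetical helper: a variant is enabled only when it scores below the threshold."""
    return "enabled" if lpips < threshold else "waived"


decisions = {name: waiver_decision(score) for name, score in measured.items()}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```

Keeping the threshold as a single constant makes it visible in review that a variant's status changed because of its measurement, not because the bar moved.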

Determinism and goldens

Every LPIPS pipeline uses:

TorchCompileConfig(enable=False)
torch.use_deterministic_algorithms(True)
the existing LPIPS thresholds and AlexNet backbone

All eight golden JSON files record the model parameters, compile/deterministic modes, package versions, TRT-LLM commit, and digest-qualified container image.

Golden environment:

Container: urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm-staging/release@sha256:3308a2dc0192a8329ea02eca7b5c44f290f5e894cd8c5921099308d84c3e5691
TensorRT-LLM: 1.3.0rc20
TRT-LLM commit: 85665f5fd331d0154a78172954846d843085e83f
diffusers: 0.38.0

That TRT-LLM build contains the Cosmos3 accuracy fix from #15545.
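As a rough illustration of the provenance each golden JSON is described as recording, the sketch below assembles such a record from the values listed above and checks that the container reference is pinned by digest. The key names and the helper `is_digest_pinned` are hypothetical; the actual golden JSON schema is not shown in this PR description.

```python
import json

# Values copied from the "Golden environment" list; the key names are assumptions.
provenance = {
    "torch_compile_enabled": False,
    "deterministic_algorithms": True,
    "container_image": (
        "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm-staging/release"
        "@sha256:3308a2dc0192a8329ea02eca7b5c44f290f5e894cd8c5921099308d84c3e5691"
    ),
    "tensorrt_llm_version": "1.3.0rc20",
    "trtllm_commit": "85665f5fd331d0154a78172954846d843085e83f",
    "diffusers_version": "0.38.0",
}


def is_digest_pinned(image: str) -> bool:
    """Return True if the image reference ends in a 64-hex-character sha256 digest."""
    _, separator, digest = image.partition("@sha256:")
    return (
        bool(separator)
        and len(digest) == 64
        and all(c in "0123456789abcdef" for c in digest)
    )


serialized = json.dumps(provenance, indent=2, sort_keys=True)
print(serialized)
```

A digest-qualified reference identifies one immutable image, whereas a tag such as `:latest` can be repointed, which is why the digest form is the useful one to store next to a golden.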

Verification

Rebased onto current upstream/main
All eight single-GPU LPIPS tests passed locally on B200 with a measured LPIPS of 0.000000
test_wan_t2v_example passed locally on B200
attn2d_2x2_ulysses2 passed at LPIPS 0.232348
cfg2_ulysses2_attn2d_2x1 measured LPIPS 0.255208 and remains waived
A fresh deterministic, compile-off Wan2.2 regeneration was byte-identical to the ZIP golden (SHA-256 470c6fa7a013a8e691e5fa8cd040ff98eacbf60c13d5c7a88ec8f5f56c656d95)
Pre-commit suite passed
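The byte-identical check in the list above can be reproduced by comparing SHA-256 digests of the two files. A minimal sketch using only the standard library; the function names and the temporary demo files are illustrative assumptions, and the real comparison would point at the regenerated media and the golden from the ZIP:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large media files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_byte_identical(regenerated, golden):
    """Two files with equal SHA-256 digests are treated as byte-identical."""
    return sha256_of_file(regenerated) == sha256_of_file(golden)


# Demo with throwaway files standing in for the golden and regenerated outputs.
with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "golden.bin"
    regenerated = Path(tmp) / "regenerated.bin"
    drifted = Path(tmp) / "drifted.bin"
    golden.write_bytes(b"\x00\x01\x02")
    regenerated.write_bytes(b"\x00\x01\x02")
    drifted.write_bytes(b"\x00\x01\x03")
    same = is_byte_identical(regenerated, golden)
    different = is_byte_identical(drifted, golden)

print(same, different)
```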

coderabbitai · 2026-06-24T00:32:34Z

📝 Walkthrough

Walkthrough

Adds LPIPS golden test coverage for two new VisualGen models: QwenImage (text-to-image) and Cosmos3-Nano (text-to-video and text-to-image). Three golden JSON fixtures, new generation helpers, an Inductor compile-worker quiesce fix applied across all LPIPS paths, three new integration tests, B200 test-list entries, and removal of six stale LPIPS waivers are included.

Changes

VisualGen LPIPS Golden Tests for QwenImage and Cosmos3-Nano

| Layer / File(s) | Summary |
|---|---|
| Golden JSON fixtures and LFS tracking: `.gitattributes`, `tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/qwenimage_lpips_golden.json`, `tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/cosmos3_nano_t2i_lpips_golden.json`, `tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/cosmos3_nano_t2v_lpips_golden_video.json` | Three new golden JSON fixtures define expected LPIPS configurations for QwenImage T2I, Cosmos3-Nano T2I, and Cosmos3-Nano T2V; a new LFS rule tracks `.zip` artifacts for the `visual_gen_lpips` directory. |
| Test constants and Inductor imports: `tests/integration/defs/examples/visual_gen/test_visual_gen.py` | Imports `torch._inductor.config` and `shutdown_compile_workers`; declares LPIPS configuration constants for QwenImage and Cosmos3-Nano (prompts, dimensions, inference steps, thresholds). |
| Inductor quiesce mitigation and CUDA cleanup: `tests/integration/defs/examples/visual_gen/test_visual_gen.py` | Adds `_disable_inductor_compile_worker_quiesce()` to disable the async compile pool quiesce flag and extends `_cleanup_cuda()` to call `shutdown_compile_workers()`; inserts the quiesce call into Flux, LTX2, and Wan LPIPS generation paths. |
| QwenImage and Cosmos3-Nano generation helpers: `tests/integration/defs/examples/visual_gen/test_visual_gen.py` | Implements `_generate_qwenimage_lpips_image`, `_run_cosmos3_lpips_pipeline` (sets `DISABLE_GUARDRAILS` env var), `_generate_cosmos3_lpips_video`, and `_generate_cosmos3_lpips_image` using existing LPIPS eval utilities. |
| Integration tests, test list, and waiver updates: `tests/integration/defs/examples/visual_gen/test_visual_gen.py`, `tests/integration/test_lists/test-db/l0_b200.yml`, `tests/integration/test_lists/waives.txt` | Three CUDA-gated tests assert LPIPS mean scores below model-specific thresholds; new entries with timeouts added to `l0_b200.yml`; six stale flux/ltx2/wan LPIPS waivers removed from `waives.txt`. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#15535: Modifies the same tests/integration/test_lists/waives.txt region for VisualGen LPIPS golden tests, specifically WAN22 T2V LPIPS waivers in test_visual_gen_multi_gpu.py, using the same waiver mechanism.

Suggested reviewers

crazydemo
jieli-matrix
LarryXFly
StanleySun639
xinhe-nv

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title is specific and matches the main change: adding single-GPU LPIPS CI protection for VisualGen models. |
| Description check | ✅ Passed | The description is detailed and covers scope, behavior, and verification; it only partially mirrors the template by omitting explicit Test Coverage and checklist sections. |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/integration/test_lists/test-db/l0_b200.yml (1)
363-368: 🚀 Performance & Scalability | 🔵 Trivial

Schedule timeout calibration follow-up for these new B200 entries.

Lines 364-368 already mark these TIMEOUTs as placeholders. Coverage is sufficient for the newly added LPIPS happy paths in tests/integration/defs/examples/visual_gen/test_visual_gen.py, but this list needs a follow-up run to lock measured B200 wall-times and confirm whether test_cosmos3_nano_t2v_lpips_against_golden stays pre-merge or moves post-merge.

As per path instructions, tests/**: “Act as a QA engineer reviewing test changes and coverage... suggest concrete list file names and whether coverage is sufficient, insufficient, or needs follow-up outside the PR.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/test_lists/test-db/l0_b200.yml` around lines 363 - 368, The
three LPIPS test entries for test_qwenimage_lpips_against_golden,
test_cosmos3_nano_t2i_lpips_against_golden, and
test_cosmos3_nano_t2v_lpips_against_golden in the B200 configuration have
placeholder TIMEOUT values that need to be calibrated. Schedule and execute a
benchmark run on B200 hardware to measure the actual wall-times for each of
these three tests, then update the TIMEOUT values in lines 364-368 with the
measured times. Additionally, based on the measured time for
test_cosmos3_nano_t2v_lpips_against_golden and the gating budget constraints,
determine whether it should remain in the pre-merge test list or be moved to
post-merge/nightly stage, and update the configuration accordingly.
Source: Path instructions

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/defs/examples/visual_gen/test_visual_gen.py`:
- Around line 699-700: The test sets the environment variable
TRTLLM_DISABLE_COSMOS3_GUARDRAILS at the point where os.environ is modified but
never restores it after the test completes. Store the original value of this
environment variable before modifying it, then ensure the original value is
restored in a finally block or cleanup mechanism after the test runs to prevent
the mutated environment state from leaking into other tests. Apply this fix to
all locations where TRTLLM_DISABLE_COSMOS3_GUARDRAILS is set (including the
additional locations at lines 736-738).
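One way to apply the restore-in-`finally` pattern described in the inline comment is a small context manager. This is a sketch under assumptions: the helper name `temporary_env` and the value `"1"` are hypothetical, and only the variable name comes from the review comment.

```python
import os
from contextlib import contextmanager

ENV_VAR = "TRTLLM_DISABLE_COSMOS3_GUARDRAILS"  # name quoted in the review comment


@contextmanager
def temporary_env(name, value):
    """Set an environment variable for the duration of a block, then restore it."""
    original = os.environ.get(name)  # None means the variable was unset before
    os.environ[name] = value
    try:
        yield
    finally:
        if original is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = original


os.environ.pop(ENV_VAR, None)  # start the demo from a known, unset state
with temporary_env(ENV_VAR, "1"):  # "1" is an assumed value for illustration
    inside = os.environ.get(ENV_VAR)
after = os.environ.get(ENV_VAR)

print(inside, after)
```

In a pytest suite, the built-in `monkeypatch.setenv` fixture gives the same set-and-restore behaviour without a custom helper.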

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f9ff66da-48ec-41ab-8dad-7bce7fe44817

📥 Commits

Reviewing files that changed from the base of the PR and between 07270b7 and 6ae1b2e.

⛔ Files ignored due to path filters (1)

tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/visual_gen_lpips_golden_media.zip is excluded by !**/*.zip

📒 Files selected for processing (7)

.gitattributes
tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/cosmos3_nano_t2i_lpips_golden.json
tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/cosmos3_nano_t2v_lpips_golden_video.json
tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/qwenimage_lpips_golden.json
tests/integration/defs/examples/visual_gen/test_visual_gen.py
tests/integration/test_lists/test-db/l0_b200.yml
tests/integration/test_lists/waives.txt

💤 Files with no reviewable changes (1)

tests/integration/test_lists/waives.txt

chang-l · 2026-06-29T01:54:32Z

/bot run

tensorrt-cicd · 2026-06-29T01:59:54Z

PR_Github #56255 [ run ] triggered by Bot. Commit: 32010aa Link to invocation

yibinl-nvidia

LGTM

tensorrt-cicd · 2026-06-29T03:04:29Z

PR_Github #56255 [ run ] completed with state FAILURE. Commit: 32010aa
/LLM/main/L0_MergeRequest_PR pipeline #45115 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chang-l · 2026-06-29T07:03:40Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-29T07:09:43Z

PR_Github #56328 [ run ] triggered by Bot. Commit: a51d939 Link to invocation

tensorrt-cicd · 2026-06-29T08:13:44Z

PR_Github #56328 [ run ] completed with state SUCCESS. Commit: a51d939
/LLM/main/L0_MergeRequest_PR pipeline #45176 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

…ation

Add default-setting single-GPU LPIPS golden tests for QwenImage and Cosmos3-Nano, completing single-GPU VisualGen LPIPS CI coverage across all supported models.

Refresh all eight golden media entries with the pinned staging main image at TRT-LLM commit 85665f5, which contains the Cosmos3 accuracy fix from NVIDIA#15545. Record the TRT-LLM commit, package versions, container digest, compile-off mode, and deterministic-algorithm mode in every golden JSON while retaining the existing LPIPS thresholds.

Explicitly disable pipeline torch compile for every LPIPS generator.

Unwaive all eight single-GPU cases, test_wan_t2v_example, and the passing attn2d_2x2_ulysses2 multi-GPU case. Retain the six pre-existing multi-GPU waivers and waive only the new cfg2_ulysses2_attn2d_2x1 case that measured 0.255208 above the unchanged 0.25 threshold.

Validated on B200: all eight single-GPU LPIPS cases passed at 0.000000, test_wan_t2v_example passed, and attn2d_2x2_ulysses2 passed at 0.232348.

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>

chang-l · 2026-06-29T16:16:16Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-29T16:22:01Z

PR_Github #56406 [ run ] triggered by Bot. Commit: 9b4c317 Link to invocation

tensorrt-cicd · 2026-06-29T17:42:25Z

PR_Github #56406 [ run ] completed with state SUCCESS. Commit: 9b4c317
/LLM/main/L0_MergeRequest_PR pipeline #45250 completed with status: 'SUCCESS'

CI Report

Link to invocation

github-actions · 2026-06-29T17:43:02Z

✅ LFS objects already in storage (1 file) — no sync needed.

These LFS-tracked files are already present in this repository's LFS storage:

tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/visual_gen_lpips_golden_media.zip

github-actions Bot assigned chang-l Jun 24, 2026

coderabbitai Bot reviewed Jun 24, 2026

Comment thread tests/integration/defs/examples/visual_gen/test_visual_gen.py Outdated

chang-l force-pushed the dev-visgen-lpips-stage1 branch 3 times, most recently from bbfd5a0 to 32010aa Compare June 29, 2026 00:33

yibinl-nvidia reviewed Jun 29, 2026

Comment thread tests/integration/test_lists/waives.txt Outdated

yibinl-nvidia reviewed Jun 29, 2026

Comment thread tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/ltx2_lpips_golden_video.json

yibinl-nvidia approved these changes Jun 29, 2026

chang-l force-pushed the dev-visgen-lpips-stage1 branch from 32010aa to 380d227 Compare June 29, 2026 06:33

chang-l changed the title ~~[None][test] Add Stage-1 LPIPS golden accuracy tests for QwenImage and Cosmos3-Nano~~ [None][test] Enable single-GPU LPIPS CI protection for VisualGen models Jun 29, 2026

chang-l force-pushed the dev-visgen-lpips-stage1 branch from 380d227 to a51d939 Compare June 29, 2026 06:42

chang-l enabled auto-merge (squash) June 29, 2026 06:50

chang-l force-pushed the dev-visgen-lpips-stage1 branch from a51d939 to 9b4c317 Compare June 29, 2026 16:16

chang-l merged commit 863bc4e into NVIDIA:main Jun 29, 2026
7 checks passed

chang-l mentioned this pull request Jun 29, 2026

[https://nvbugs/6272644][fix] Stabilize and unwaive multi-GPU VisualGen LPIPS tests #15730

Open

3 participants
