[None][test] Enable single-GPU LPIPS CI protection for VisualGen models #15564

Merged
chang-l merged 1 commit into NVIDIA:main from chang-l:dev-visgen-lpips-stage1 on Jun 29, 2026

@chang-l

@chang-l chang-l commented Jun 24, 2026

Collaborator

Description

Enable Stage-1 end-to-end LPIPS golden accuracy protection for all single-GPU VisualGen models.

This PR adds QwenImage and Cosmos3-Nano coverage, refreshes all eight single-GPU goldens from one pinned build, explicitly disables pipeline torch compile for every LPIPS generator, and removes the remaining single-GPU LPIPS waivers.

It also unwaives test_wan_t2v_example, which passed locally on B200.

Single-GPU coverage

  • FLUX.1-dev (T2I)
  • FLUX.2-dev (T2I)
  • LTX-2 (T2V)
  • Wan2.1 T2V
  • Wan2.2 T2V
  • Qwen-Image (T2I)
  • Cosmos3-Nano (T2I)
  • Cosmos3-Nano (T2V)

The three newly registered B200 tests are:

  • test_qwenimage_lpips_against_golden
  • test_cosmos3_nano_t2i_lpips_against_golden
  • test_cosmos3_nano_t2v_lpips_against_golden

Multi-GPU waiver scope

The six pre-existing multi-GPU LPIPS waivers remain unchanged.

The two newly registered eight-rank variants were handled according to the measured compile-off results:

  • attn2d_2x2_ulysses2: LPIPS 0.232348 — below the unchanged 0.25 threshold, so it is enabled
  • cfg2_ulysses2_attn2d_2x1: LPIPS 0.255208 — above 0.25, so it remains waived under NVBug 6272644

No LPIPS threshold was changed.
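
The enable-or-waive decision for the two eight-rank variants can be sketched as a plain threshold comparison. This is an illustration only: `should_enable` and the dictionary layout are hypothetical names, and the strict "below" comparison is an assumption based on the wording above, not the actual test code.

```python
# Sketch only: names are hypothetical and not taken from the PR.
LPIPS_THRESHOLD = 0.25  # the unchanged threshold quoted above


def should_enable(measured_lpips: float, threshold: float = LPIPS_THRESHOLD) -> bool:
    """Return True when the measured score is strictly below the threshold.

    Assumption: "below the threshold" means a strict comparison.
    """
    return measured_lpips < threshold


# Compile-off measurements reported in this description.
measured = {
    "attn2d_2x2_ulysses2": 0.232348,
    "cfg2_ulysses2_attn2d_2x1": 0.255208,
}

decisions = {name: should_enable(score) for name, score in measured.items()}

for name, enabled in decisions.items():
    print(f"{name}: {'enable' if enabled else 'keep waived'}")
```

Keeping the threshold as a single constant makes it easy to see that the decision follows from the measurement alone, with no per-variant tuning.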

Determinism and goldens

Every LPIPS pipeline uses:

  • TorchCompileConfig(enable=False)
  • torch.use_deterministic_algorithms(True)
  • the existing LPIPS thresholds and AlexNet backbone

All eight golden JSON files record the model parameters, compile/deterministic modes, package versions, TRT-LLM commit, and digest-qualified container image.

Golden environment:

  • Container: urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm-staging/release@sha256:3308a2dc0192a8329ea02eca7b5c44f290f5e894cd8c5921099308d84c3e5691
  • TensorRT-LLM: 1.3.0rc20
  • TRT-LLM commit: 85665f5fd331d0154a78172954846d843085e83f
  • diffusers: 0.38.0

That TRT-LLM build contains the Cosmos3 accuracy fix from #15545.
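
For readers who want a picture of the provenance record, here is a minimal sketch of what such metadata could look like and how a digest-qualified image reference can be checked. The key names and the `is_digest_pinned` helper are hypothetical; only the values come from the list above, and the real golden JSON schema may differ.

```python
import json

# Sketch only: key names are hypothetical; values are the ones listed above.
golden_environment = {
    "torch_compile_enabled": False,
    "deterministic_algorithms": True,
    "container_image": (
        "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm-staging/release"
        "@sha256:3308a2dc0192a8329ea02eca7b5c44f290f5e894cd8c5921099308d84c3e5691"
    ),
    "tensorrt_llm_version": "1.3.0rc20",
    "trtllm_commit": "85665f5fd331d0154a78172954846d843085e83f",
    "package_versions": {"diffusers": "0.38.0"},
}


def is_digest_pinned(image: str) -> bool:
    """True when the image is referenced by a sha256 digest, not a mutable tag."""
    _, separator, digest = image.partition("@sha256:")
    is_hex = all(ch in "0123456789abcdef" for ch in digest)
    return bool(separator) and len(digest) == 64 and is_hex


# Sorted keys keep the serialized form stable across regenerations.
serialized = json.dumps(golden_environment, indent=2, sort_keys=True)
```

Pinning by digest rather than by tag is what lets a later regeneration be attributed to exactly one container build.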

Verification

  • Rebased onto current upstream/main
  • All eight single-GPU LPIPS tests passed locally on B200 with LPIPS 0.000000
  • test_wan_t2v_example passed locally on B200
  • attn2d_2x2_ulysses2 passed at LPIPS 0.232348
  • cfg2_ulysses2_attn2d_2x1 measured LPIPS 0.255208 and remains waived
  • A fresh deterministic, compile-off Wan2.2 regeneration was byte-identical to the ZIP golden (SHA-256 470c6fa7a013a8e691e5fa8cd040ff98eacbf60c13d5c7a88ec8f5f56c656d95)
  • Pre-commit suite passed
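
The byte-identity check in the list above can be reproduced with a streaming SHA-256 comparison. This is a sketch under assumptions: the helper names are hypothetical, and which file is hashed (the regenerated media or an archive entry) is not stated here, so the path is left to the caller.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_golden(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA-256 against a recorded hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.lower()
```

A call would look like `matches_golden(Path("regenerated_output"), "470c6fa7...")`, where the path is a placeholder and the digest is the full value quoted in the list.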

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Contributor

Walkthrough

Adds LPIPS golden test coverage for two new VisualGen models: QwenImage (text-to-image) and Cosmos3-Nano (text-to-video and text-to-image). Three golden JSON fixtures, new generation helpers, an Inductor compile-worker quiesce fix applied across all LPIPS paths, three new integration tests, B200 test-list entries, and removal of six stale LPIPS waivers are included.

Changes

VisualGen LPIPS Golden Tests for QwenImage and Cosmos3-Nano

Layer: Golden JSON fixtures and LFS tracking
File(s): .gitattributes, tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/qwenimage_lpips_golden.json, tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/cosmos3_nano_t2i_lpips_golden.json, tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/cosmos3_nano_t2v_lpips_golden_video.json
Summary: Three new golden JSON fixtures define expected LPIPS configurations for QwenImage T2I, Cosmos3-Nano T2I, and Cosmos3-Nano T2V; a new LFS rule tracks .zip artifacts for the visual_gen_lpips directory.

Layer: Test constants and Inductor imports
File(s): tests/integration/defs/examples/visual_gen/test_visual_gen.py
Summary: Imports torch._inductor.config and shutdown_compile_workers; declares LPIPS configuration constants for QwenImage and Cosmos3-Nano (prompts, dimensions, inference steps, thresholds).

Layer: Inductor quiesce mitigation and CUDA cleanup
File(s): tests/integration/defs/examples/visual_gen/test_visual_gen.py
Summary: Adds _disable_inductor_compile_worker_quiesce() to disable the async compile pool quiesce flag and extends _cleanup_cuda() to call shutdown_compile_workers(); inserts the quiesce call into Flux, LTX2, and Wan LPIPS generation paths.

Layer: QwenImage and Cosmos3-Nano generation helpers
File(s): tests/integration/defs/examples/visual_gen/test_visual_gen.py
Summary: Implements _generate_qwenimage_lpips_image, _run_cosmos3_lpips_pipeline (sets DISABLE_GUARDRAILS env var), _generate_cosmos3_lpips_video, and _generate_cosmos3_lpips_image using existing LPIPS eval utilities.

Layer: Integration tests, test list, and waiver updates
File(s): tests/integration/defs/examples/visual_gen/test_visual_gen.py, tests/integration/test_lists/test-db/l0_b200.yml, tests/integration/test_lists/waives.txt
Summary: Three CUDA-gated tests assert LPIPS mean scores below model-specific thresholds; new entries with timeouts added to l0_b200.yml; six stale flux/ltx2/wan LPIPS waivers removed from waives.txt.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • NVIDIA/TensorRT-LLM#15535: Modifies the same tests/integration/test_lists/waives.txt region for VisualGen LPIPS golden tests, specifically WAN22 T2V LPIPS waivers in test_visual_gen_multi_gpu.py, using the same waiver mechanism.

Suggested reviewers

  • crazydemo
  • jieli-matrix
  • LarryXFly
  • StanleySun639
  • xinhe-nv

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Title check: ✅ Passed. The title is specific and matches the main change: adding single-GPU LPIPS CI protection for VisualGen models.
  • Description check: ✅ Passed. The description is detailed and covers scope, behavior, and verification; it only partially mirrors the template by omitting explicit Test Coverage and checklist sections.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/integration/test_lists/test-db/l0_b200.yml (1)

363-368: 🚀 Performance & Scalability | 🔵 Trivial

Schedule timeout calibration follow-up for these new B200 entries.

Lines 364-368 already mark these TIMEOUTs as placeholders. Coverage is sufficient for the newly added LPIPS happy paths in tests/integration/defs/examples/visual_gen/test_visual_gen.py, but this list needs a follow-up run to lock measured B200 wall-times and confirm whether test_cosmos3_nano_t2v_lpips_against_golden stays pre-merge or moves post-merge.

As per path instructions, tests/**: “Act as a QA engineer reviewing test changes and coverage... suggest concrete list file names and whether coverage is sufficient, insufficient, or needs follow-up outside the PR.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/test_lists/test-db/l0_b200.yml` around lines 363 - 368, The
three LPIPS test entries for test_qwenimage_lpips_against_golden,
test_cosmos3_nano_t2i_lpips_against_golden, and
test_cosmos3_nano_t2v_lpips_against_golden in the B200 configuration have
placeholder TIMEOUT values that need to be calibrated. Schedule and execute a
benchmark run on B200 hardware to measure the actual wall-times for each of
these three tests, then update the TIMEOUT values in lines 364-368 with the
measured times. Additionally, based on the measured time for
test_cosmos3_nano_t2v_lpips_against_golden and the gating budget constraints,
determine whether it should remain in the pre-merge test list or be moved to
post-merge/nightly stage, and update the configuration accordingly.
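
One way to turn measured wall-times into a timeout value is sketched below. Everything here is an assumption for illustration: the helper name, the 1.5x headroom factor, and the choice of whole minutes as the unit are not taken from the PR or from l0_b200.yml.

```python
import math
from typing import Sequence


def calibrated_timeout_minutes(wall_times_s: Sequence[float], headroom: float = 1.5) -> int:
    """Derive a timeout from the slowest observed run plus a safety margin.

    Assumptions: wall-times are in seconds, the timeout is expressed in
    whole minutes, and the headroom factor is an arbitrary illustrative value.
    """
    if not wall_times_s:
        raise ValueError("at least one measured wall-time is required")
    if headroom < 1.0:
        raise ValueError("headroom must not shrink the measured time")
    return math.ceil(max(wall_times_s) * headroom / 60)
```

Using the slowest run rather than the mean keeps the timeout from tripping on ordinary run-to-run variance; the same number can then be compared against the pre-merge budget when deciding where a test belongs.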

Source: Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/defs/examples/visual_gen/test_visual_gen.py`:
- Around line 699-700: The test sets the environment variable
TRTLLM_DISABLE_COSMOS3_GUARDRAILS at the point where os.environ is modified but
never restores it after the test completes. Store the original value of this
environment variable before modifying it, then ensure the original value is
restored in a finally block or cleanup mechanism after the test runs to prevent
the mutated environment state from leaking into other tests. Apply this fix to
all locations where TRTLLM_DISABLE_COSMOS3_GUARDRAILS is set (including the
additional locations at lines 736-738).
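
The save-and-restore pattern requested for TRTLLM_DISABLE_COSMOS3_GUARDRAILS can be sketched with a small context manager. The helper name `temporary_env` and the value "1" are assumptions for illustration; the fix that actually landed in the PR is not shown on this page.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name: str, value: str):
    """Set an environment variable for the duration of a block, then restore it."""
    original = os.environ.get(name)  # None means the variable was unset before
    os.environ[name] = value
    try:
        yield
    finally:
        # Restore the prior state even if the body raised.
        if original is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = original


if __name__ == "__main__":
    # Hypothetical usage: the value "1" is an assumption, not taken from the PR.
    with temporary_env("TRTLLM_DISABLE_COSMOS3_GUARDRAILS", "1"):
        print(os.environ["TRTLLM_DISABLE_COSMOS3_GUARDRAILS"])
```

In a pytest test, the built-in `monkeypatch.setenv` fixture method gives the same restore-on-teardown behavior without a custom helper.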


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f9ff66da-48ec-41ab-8dad-7bce7fe44817

📥 Commits

Reviewing files that changed from the base of the PR and between 07270b7 and 6ae1b2e.

⛔ Files ignored due to path filters (1)
  • tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/visual_gen_lpips_golden_media.zip is excluded by !**/*.zip
📒 Files selected for processing (7)
  • .gitattributes
  • tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/cosmos3_nano_t2i_lpips_golden.json
  • tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/cosmos3_nano_t2v_lpips_golden_video.json
  • tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/qwenimage_lpips_golden.json
  • tests/integration/defs/examples/visual_gen/test_visual_gen.py
  • tests/integration/test_lists/test-db/l0_b200.yml
  • tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/waives.txt

Comment thread tests/integration/defs/examples/visual_gen/test_visual_gen.py Outdated
@chang-l chang-l force-pushed the dev-visgen-lpips-stage1 branch 3 times, most recently from bbfd5a0 to 32010aa, on June 29, 2026 00:33
@chang-l

chang-l commented Jun 29, 2026

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #56255 [ run ] triggered by Bot. Commit: 32010aa Link to invocation

Comment thread tests/integration/test_lists/waives.txt Outdated

@yibinl-nvidia yibinl-nvidia left a comment

Collaborator

LGTM

@tensorrt-cicd

Collaborator

PR_Github #56255 [ run ] completed with state FAILURE. Commit: 32010aa
/LLM/main/L0_MergeRequest_PR pipeline #45115 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chang-l chang-l force-pushed the dev-visgen-lpips-stage1 branch from 32010aa to 380d227 on June 29, 2026 06:33
@chang-l chang-l changed the title from "[None][test] Add Stage-1 LPIPS golden accuracy tests for QwenImage and Cosmos3-Nano" to "[None][test] Enable single-GPU LPIPS CI protection for VisualGen models" on Jun 29, 2026
@chang-l chang-l force-pushed the dev-visgen-lpips-stage1 branch from 380d227 to a51d939 on June 29, 2026 06:42
@chang-l chang-l enabled auto-merge (squash) June 29, 2026 06:50
@chang-l

chang-l commented Jun 29, 2026

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #56328 [ run ] triggered by Bot. Commit: a51d939 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #56328 [ run ] completed with state SUCCESS. Commit: a51d939
/LLM/main/L0_MergeRequest_PR pipeline #45176 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

…ation

Add default-setting single-GPU LPIPS golden tests for QwenImage and Cosmos3-Nano, completing single-GPU VisualGen LPIPS CI coverage across all supported models.

Refresh all eight golden media entries with the pinned staging main image at TRT-LLM commit 85665f5, which contains the Cosmos3 accuracy fix from NVIDIA#15545. Record the TRT-LLM commit, package versions, container digest, compile-off mode, and deterministic-algorithm mode in every golden JSON while retaining the existing LPIPS thresholds.

Explicitly disable pipeline torch compile for every LPIPS generator. Unwaive all eight single-GPU cases, test_wan_t2v_example, and the passing attn2d_2x2_ulysses2 multi-GPU case. Retain the six pre-existing multi-GPU waivers and waive only the new cfg2_ulysses2_attn2d_2x1 case that measured 0.255208 above the unchanged 0.25 threshold.

Validated on B200: all eight single-GPU LPIPS cases passed at 0.000000, test_wan_t2v_example passed, and attn2d_2x2_ulysses2 passed at 0.232348.

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
@chang-l chang-l force-pushed the dev-visgen-lpips-stage1 branch from a51d939 to 9b4c317 on June 29, 2026 16:16
@chang-l

chang-l commented Jun 29, 2026

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #56406 [ run ] triggered by Bot. Commit: 9b4c317 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #56406 [ run ] completed with state SUCCESS. Commit: 9b4c317
/LLM/main/L0_MergeRequest_PR pipeline #45250 completed with status: 'SUCCESS'

CI Report

Link to invocation

@chang-l chang-l merged commit 863bc4e into NVIDIA:main Jun 29, 2026
7 checks passed
@github-actions

LFS objects already in storage (1 file) — no sync needed.

These LFS-tracked files are already present in this repository's LFS storage:

  • tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/visual_gen_lpips_golden_media.zip


3 participants