[https://nvbugs/6401921][fix] Stabilize single-GPU Wan2.2/LTX2/Wan2.1 LPIPS test by chang-l · Pull Request #15854 · NVIDIA/TensorRT-LLM

chang-l · 2026-07-01T22:41:55Z

Summary by CodeRabbit

Bug Fixes
- Updated a video generation test flow to run in a more predictable execution mode, improving reliability of LPIPS-based video generation checks.
- Refreshed golden video metadata with newer runtime and environment details so comparisons better match the current setup.
Tests
- Re-enabled a previously waived integration test, allowing it to run as part of the regular test suite.

Description

Fix NVBug 6401921, where
test_wan22_t2v_lpips_against_golden regressed to LPIPS 0.251536 after the
PyTorch CI container moved from 26.02 to 26.04.

TorchCompileConfig(enable=False) skips VisualGen's configured transformer
compilation, but it does not suppress nested or unconditional
@torch.compile call sites. The resulting execution trajectory changed across
the PyTorch upgrade. A controlled A/B showed that forcing both containers fully
eager makes their final pre-VAE latents bit-identical and keeps cross-container
LPIPS below the existing 0.05 threshold.

This change:

- wraps the single-GPU Wan2.2 LPIPS fixture in torch.compiler.set_stance("force_eager");
- refreshes only the Wan2.2 golden video in the LFS archive using the current 26.04 CI image and records the exact runtime provenance;
- removes the NVBug 6401921 waiver.

The scope is intentionally limited to the failing Wan2.2 single-GPU golden.
Other VisualGen goldens retain their existing execution policy. This is the
single-GPU counterpart to #15730 and does not depend on it.

Test Coverage

Exact B200 reproduction with PyTorch 26.04 image
sha256:dad31c0b5290d836033c96d8b91f6524bdc7cc5b4d1000b4abcc57c6868ffdc0
and Jenkins post-merge build 2814 (tensorrt-llm==1.3.0rc21, commit
539ee226c4df7ab15802911083fe501e9d64c66e).
The regenerated force-eager video reproduced byte-for-byte across two fresh
runs: SHA-256
52828186f44b82a9f686f177d635b9f3cb0050f41c8d3ae55dade01d30a00b28.
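One way to check that two runs reproduced byte-for-byte is to compare SHA-256 digests of the output files. The sketch below is illustrative only; the helper names and the placeholder file contents are assumptions, not part of the PR.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reproduced_byte_for_byte(paths):
    """True when every file in `paths` has the same digest."""
    return len({sha256_of_file(p) for p in paths}) == 1


# Placeholder files stand in for the videos produced by two fresh runs.
with tempfile.TemporaryDirectory() as tmp:
    run_a = Path(tmp) / "run_a.mp4"
    run_b = Path(tmp) / "run_b.mp4"
    run_a.write_bytes(b"placeholder video bytes")
    run_b.write_bytes(b"placeholder video bytes")
    identical = reproduced_byte_for_byte([run_a, run_b])

    # Any byte-level difference changes the digest.
    run_b.write_bytes(b"different bytes")
    differs = not reproduced_byte_for_byte([run_a, run_b])
```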

Targeted integration test:

examples/visual_gen/test_visual_gen.py::test_wan22_t2v_lpips_against_golden PASSED
[E2E wan22_t2v LPIPS] score: 0.000000
1 passed in 35.43s

zip -T passed, the archive still contains eight unique members, and only
wan22_t2v_lpips_golden_video.mp4 differs from the previous archive.
All pre-commit hooks passed on the four changed files.
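The archive check above can be approximated in Python with the standard `zipfile` module: `testzip()` plays a role similar to `zip -T`, and comparing per-member CRCs shows which members changed. This is a sketch under assumptions; the member names and payloads are placeholders, and a CRC match is a quick heuristic rather than a cryptographic guarantee.

```python
import io
import zipfile


def build_archive(members):
    """Create an in-memory zip from a {name: bytes} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    buffer.seek(0)
    return buffer


def member_crcs(archive_file):
    """Validate the archive and return {member name: CRC-32}."""
    with zipfile.ZipFile(archive_file) as archive:
        # testzip() returns the first corrupt member, or None if all members are intact.
        bad = archive.testzip()
        if bad is not None:
            raise ValueError(f"corrupt member: {bad}")
        return {info.filename: info.CRC for info in archive.infolist()}


def changed_members(old_archive, new_archive):
    """Names that were added, removed, or whose CRC differs between archives."""
    old = member_crcs(old_archive)
    new = member_crcs(new_archive)
    return sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))


previous = build_archive({"golden_a.mp4": b"old frames", "golden_b.mp4": b"unchanged"})
refreshed = build_archive({"golden_a.mp4": b"new frames", "golden_b.mp4": b"unchanged"})
diff = changed_members(previous, refreshed)
```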

PR Checklist

Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>

coderabbitai · 2026-07-01T23:06:24Z

📝 Walkthrough

Walkthrough

This change updates the golden video test metadata (adding torch_version, updating TensorRT-LLM version/commit and container image), wraps the Wan 2.2 LPIPS video generation call in a forced eager compiler stance, and removes the corresponding test waiver from waives.txt.

Changes

Wan22 T2V LPIPS golden test fix

Layer / File(s)	Summary
Force eager stance and update golden metadata `tests/integration/defs/examples/visual_gen/test_visual_gen.py`, `tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/wan22_t2v_lpips_golden_video.json`	Wraps `_generate_wan_lpips_video` call in `torch.compiler.set_stance("force_eager")` with a comment explaining nested `@torch.compile` isn't suppressed by `TorchCompileConfig(enable=False)`; updates golden JSON with new `torch_version`, `tensorrt_llm_version`, `tensorrt_llm_commit`, and `container_image` values.
Remove test waiver `tests/integration/test_lists/waives.txt`	Removes the waiver entry for `test_wan22_t2v_lpips_against_golden`, re-enabling the test.

Estimated code review effort: 2 (Simple) | ~10 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#15825: Adds the same waiver entry to waives.txt that this PR removes, directly conflicting in the waives list.

Suggested reviewers: yingguo-trt

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title captures the LPIPS test stabilization and NVBugs fix, though it is broader than the Wan2.2-only scope.
Description check	✅ Passed	The description includes the issue, solution, test coverage, and checklist, matching the template well.


chang-l · 2026-07-02T03:41:14Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

tensorrt-cicd · 2026-07-02T03:46:47Z

PR_Github #57100 [ run ] triggered by Bot. Commit: 247f5e1 Link to invocation

tensorrt-cicd · 2026-07-02T05:27:36Z

PR_Github #57100 [ run ] completed with state SUCCESS. Commit: 247f5e1
/LLM/main/L0_MergeRequest_PR pipeline #45889 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

- Please check the failed tests and fix your PR
- If you cannot view the failures, ask the CI triggerer to share details
- Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chang-l · 2026-07-02T05:52:03Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-07-02T05:57:52Z

PR_Github #57140 [ run ] triggered by Bot. Commit: 247f5e1 Link to invocation

tensorrt-cicd · 2026-07-02T09:01:57Z

PR_Github #57140 [ run ] completed with state SUCCESS. Commit: 247f5e1
/LLM/main/L0_MergeRequest_PR pipeline #45922 completed with status: 'FAILURE'

chang-l · 2026-07-02T09:22:03Z

/bot run --disable-fail-fast

chang-l · 2026-07-02T16:36:25Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

tensorrt-cicd · 2026-07-02T16:42:09Z

PR_Github #57221 [ run ] triggered by Bot. Commit: bc604a8 Link to invocation

tensorrt-cicd · 2026-07-02T21:29:34Z

PR_Github #57221 [ run ] completed with state SUCCESS. Commit: bc604a8
/LLM/main/L0_MergeRequest_PR pipeline #45992 completed with status: 'FAILURE'

chang-l · 2026-07-02T21:53:40Z

/bot run --stage-list "DGX_B200-PyTorch-Post-Merge-1"

…eration and preserve failing candidates

Wan2.1 and LTX-2 LPIPS goldens were generated on the 26.02 container; the CI container moved to 26.04 and both tests now fail deterministically in B200 post-merge (wan21 0.0956, ltx2 0.1513 vs the 0.05 threshold). Cross-machine eager variance on the same container measures ~0.04 LPIPS for the 1-step Wan2.1 config, so goldens regenerated on a dev machine leave no reliable margin.

- Run Wan2.1 and LTX-2 LPIPS generation under torch.compiler.set_stance("force_eager"), matching the Wan2.2 fix; the LTX-2 wrap covers the golden fixture and both sides of the cuda-graph-vs-eager comparison.
- On an LPIPS threshold failure, copy the generated candidate into pytest's --output-dir (archived per-stage by CI) so the golden can be refreshed with CI-generated media instead of dev-machine approximations.

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
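The candidate-preservation step in this commit could look roughly like the sketch below. It is not the PR's code: the function name, the `lpips_candidates` subdirectory, and the way the output directory is passed in are all assumptions made for illustration. Only the 0.05 threshold comes from the discussion above.

```python
import shutil
import tempfile
from pathlib import Path

LPIPS_THRESHOLD = 0.05  # threshold quoted in the PR; a score below it passes


def preserve_failing_candidate(candidate, score, output_dir, threshold=LPIPS_THRESHOLD):
    """Copy a failing candidate into output_dir and return the new path.

    Returns None when the score passes or no output directory is configured.
    """
    if output_dir is None or score < threshold:
        return None
    # Hypothetical layout: keep candidates in their own subdirectory.
    dest_dir = Path(output_dir) / "lpips_candidates"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(candidate).name
    shutil.copy2(candidate, dest)
    return dest


# Demo with a placeholder file standing in for a generated video.
with tempfile.TemporaryDirectory() as tmp:
    candidate = Path(tmp) / "candidate.mp4"
    candidate.write_bytes(b"placeholder frames")
    archive_dir = Path(tmp) / "output"

    kept = preserve_failing_candidate(candidate, score=0.0956, output_dir=archive_dir)
    kept_ok = kept is not None and kept.read_bytes() == b"placeholder frames"
    skipped = preserve_failing_candidate(candidate, score=0.0, output_dir=archive_dir)
```

Copying only on failure keeps passing runs free of extra artifacts while still leaving CI-generated media available when a golden needs to be refreshed.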

chang-l · 2026-07-03T00:26:35Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

tensorrt-cicd · 2026-07-03T00:32:11Z

PR_Github #57279 [ run ] triggered by Bot. Commit: 87aeeff Link to invocation

…the 26.04 stack

The previous goldens were generated on the 26.02 container and fail deterministically on 26.04 CI (wan21 0.0956, ltx2 0.1513 vs 0.05). Regenerated on B200 with the CI devel image (pytorch-26.04, tag -15694), the CI-built 1.3.0rc21 wheel, torch 2.12.0a0+0291f960b6.nv26.04, and force_eager generation. Only the wan21/ltx2 zip members changed; both tests score LPIPS 0.000000 against these goldens on the generating host.

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>

chang-l · 2026-07-03T03:04:06Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

chang-l · 2026-07-03T04:33:22Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

chang-l · 2026-07-03T04:34:40Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

tensorrt-cicd · 2026-07-03T04:38:51Z

PR_Github #57351 [ run ] triggered by Bot. Commit: 311e756 Link to invocation

tensorrt-cicd · 2026-07-03T04:40:20Z

PR_Github #57352 [ run ] triggered by Bot. Commit: 311e756 Link to invocation

tensorrt-cicd · 2026-07-04T02:49:47Z

PR_Github #57502 [ run ] completed with state SUCCESS. Commit: fba9638
/LLM/main/L0_MergeRequest_PR pipeline #46235 completed with status: 'FAILURE'

…he LPIPS golden pipeline

Root cause of the residual wan22 LPIPS failure (0.059334 vs <0.05): the generated frames were identical to the golden, but on the failing CI stage the candidate was encoded as MPEG-4 Part 2 via the test helper's cv2 fallback (OpenCV-bundled Lavf62) while the golden is H.264/x264 (ffmpeg 6.1). LPIPS then measures codec artifacts (PSNR 33-37 dB uniform noise), not model output. Frame-level decode comparison of the CI candidate (recovered via the base64 stdout channel) against the golden confirmed mean |diff| < 4/255 with no structural differences.

Two fixes:

- media/encoding.py: only cache a successful ffmpeg probe. The stage's early negative probe (before the test fixture apt-installs ffmpeg) was cached for the process lifetime and silently downgraded every later encode to the fallback encoder.
- test_visual_gen.py: refuse to fall back to cv2/mp4v for LPIPS media and fail loudly instead, since a codec switch invalidates the comparison.

No golden changes needed: with the encoder fixed, wan22 is expected to reproduce bit-exactly like wan21/ltx2 (LPIPS 0.000000).

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
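The two fixes in this commit can be illustrated with a small sketch. It is not the code in media/encoding.py or test_visual_gen.py; the `FfmpegProbe` class, its method names, and the injected `which` callable are hypothetical, chosen so the behavior can be shown without ffmpeg installed.

```python
import shutil


class FfmpegProbe:
    """Looks up ffmpeg and caches the result only when the lookup succeeds."""

    def __init__(self, which=shutil.which):
        self._which = which
        self._path = None
        self.probe_count = 0

    def find(self):
        # A miss leaves the cache empty, so a later call probes again
        # (for example after a fixture has installed ffmpeg).
        if self._path is None:
            self.probe_count += 1
            self._path = self._which("ffmpeg")
        return self._path

    def require(self):
        # For LPIPS media, a missing encoder is an error, not a cue to switch codecs.
        path = self.find()
        if path is None:
            raise RuntimeError("ffmpeg not found; refusing to use a fallback encoder")
        return path


# Simulate a stage where ffmpeg is absent at first and installed later.
answers = iter([None, "/usr/bin/ffmpeg"])
probe = FfmpegProbe(which=lambda name: next(answers))
first = probe.find()    # early probe misses, nothing is cached
second = probe.find()   # probes again and now succeeds
third = probe.find()    # served from the cache, no new probe

# A probe that never finds ffmpeg makes require() fail loudly.
missing = FfmpegProbe(which=lambda name: None)
try:
    missing.require()
    refused = False
except RuntimeError:
    refused = True
```

Caching only the positive result trades a few repeated lookups for correctness when the environment changes during the process lifetime.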

chang-l · 2026-07-04T03:03:35Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

tensorrt-cicd · 2026-07-04T03:09:41Z

PR_Github #57522 [ run ] triggered by Bot. Commit: cefaf80 Link to invocation

chang-l · 2026-07-04T03:32:42Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

tensorrt-cicd · 2026-07-04T03:38:00Z

PR_Github #57526 [ run ] triggered by Bot. Commit: cefaf80 Link to invocation

tensorrt-cicd · 2026-07-04T03:41:08Z

PR_Github #57522 [ run ] completed with state ABORTED. Commit: cefaf80

Link to invocation

chang-l · 2026-07-04T03:52:03Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-07-04T03:58:16Z

PR_Github #57527 [ run ] triggered by Bot. Commit: cefaf80 Link to invocation

tensorrt-cicd · 2026-07-04T04:01:43Z

PR_Github #57526 [ run ] completed with state ABORTED. Commit: cefaf80

Link to invocation

chang-l · 2026-07-04T04:08:00Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

tensorrt-cicd · 2026-07-04T04:14:20Z

PR_Github #57534 [ run ] triggered by Bot. Commit: cefaf80 Link to invocation

tensorrt-cicd · 2026-07-04T04:17:47Z

PR_Github #57527 [ run ] completed with state ABORTED. Commit: cefaf80

Link to invocation

tensorrt-cicd · 2026-07-04T08:29:52Z

PR_Github #57534 [ run ] completed with state SUCCESS. Commit: cefaf80
/LLM/main/L0_MergeRequest_PR pipeline #46265 completed with status: 'FAILURE'

…a-graph LPIPS test

test_ltx2_cuda_graph_lpips_matches_eager never requested _visual_gen_deps (which installs ffmpeg), so when it ran first in a stage it previously passed only via the silent cv2/mp4v fallback on both sides of the comparison. With that fallback now a hard failure, declare the fixture so ffmpeg is installed before encoding.

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>

chang-l · 2026-07-04T08:40:19Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

tensorrt-cicd · 2026-07-04T08:46:20Z

PR_Github #57552 [ run ] triggered by Bot. Commit: b2fe972 Link to invocation

tensorrt-cicd · 2026-07-04T14:15:41Z

PR_Github #57552 [ run ] completed with state SUCCESS. Commit: b2fe972
/LLM/main/L0_MergeRequest_PR pipeline #46280 completed with status: 'FAILURE'

chang-l · 2026-07-04T14:22:21Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

tensorrt-cicd · 2026-07-04T14:27:51Z

PR_Github #57559 [ run ] triggered by Bot. Commit: b2fe972 Link to invocation

tensorrt-cicd · 2026-07-04T15:30:42Z

PR_Github #57559 [ run ] completed with state SUCCESS. Commit: b2fe972
/LLM/main/L0_MergeRequest_PR pipeline #46287 completed with status: 'FAILURE'

…en by NVIDIA#14827

Same root cause as the already-waived T2V variant: NVIDIA#14827 changed the effective Cosmos3 generation parameters (_resolve_t2i_default rewrites the test's pinned steps/guidance/resolution because they equal the video defaults), so the T2I output no longer matches its golden (LPIPS 0.608, deterministic across three runs: pipelines 46107/46265/46287).

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>

chang-l · 2026-07-04T18:49:14Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

chang-l · 2026-07-04T18:52:25Z

/bot run

tensorrt-cicd · 2026-07-04T18:55:11Z

PR_Github #57577 [ run ] triggered by Bot. Commit: d3a3584 Link to invocation

tensorrt-cicd · 2026-07-04T18:58:22Z

PR_Github #57578 [ run ] triggered by Bot. Commit: d3a3584 Link to invocation

tensorrt-cicd · 2026-07-04T18:58:28Z

PR_Github #57577 [ run ] completed with state ABORTED. Commit: d3a3584

Link to invocation

tensorrt-cicd · 2026-07-04T19:37:40Z

PR_Github #57578 [ run ] completed with state FAILURE. Commit: d3a3584
/LLM/main/L0_MergeRequest_PR pipeline #46305 completed with status: 'FAILURE'

github-actions Bot assigned chang-l Jul 1, 2026

[NVBUG 6401921][fix] stabilize single-GPU Wan LPIPS test

247f5e1

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>

chang-l force-pushed the codex/nvbug-6401921-force-eager-lpips branch from 461dc9c to 247f5e1 Compare July 1, 2026 23:02

chang-l marked this pull request as ready for review July 1, 2026 23:03

chang-l requested a review from yibinl-nvidia July 1, 2026 23:03

yibinl-nvidia approved these changes Jul 1, 2026

Comment thread tests/integration/defs/examples/visual_gen/test_visual_gen.py

chang-l added 2 commits July 2, 2026 09:25

Merge branch 'main' into codex/nvbug-6401921-force-eager-lpips

518c3bb

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>

[https://nvbugs/6401921][fix] Remove wan22 LPIPS waive lines duplicat…

bc604a8

…ed during conflict resolution Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>

Merge branch 'main' into codex/nvbug-6401921-force-eager-lpips

311e756

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>

chang-l enabled auto-merge (squash) July 4, 2026 18:57

chang-l changed the title ~~[https://nvbugs/6401921][fix] Stabilize single-GPU Wan2.2 LPIPS test~~ [https://nvbugs/6401921][fix] Stabilize single-GPU Wan2.2/LTX2/Wan2.1 LPIPS test Jul 4, 2026
