[https://nvbugs/6401921][fix] Stabilize single-GPU Wan2.2/LTX2/Wan2.1 LPIPS test #15854
chang-l wants to merge 10 commits into
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Branch updated from 461dc9c to 247f5e1.
📝 Walkthrough
This change updates the golden video test metadata (adding torch_version, updating the TensorRT-LLM version/commit and container image), wraps the Wan 2.2 LPIPS video generation call in a forced eager compiler stance, and removes the corresponding test waiver from waives.txt.
Changes: Wan22 T2V LPIPS golden test fix
Estimated code review effort: 2 (Simple), ~10 minutes
🚥 Pre-merge checks: ✅ 5 passed
/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

PR_Github #57100 [ run ] triggered by Bot. Commit:

PR_Github #57100 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #57140 [ run ] triggered by Bot. Commit:

PR_Github #57140 [ run ] completed with state

/bot run --disable-fail-fast
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
…ed during conflict resolution

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

PR_Github #57221 [ run ] triggered by Bot. Commit:

PR_Github #57221 [ run ] completed with state

/bot run --stage-list "DGX_B200-PyTorch-Post-Merge-1"
…eration and preserve failing candidates

Wan2.1 and LTX-2 LPIPS goldens were generated on the 26.02 container; the CI container moved to 26.04, and both tests now fail deterministically in B200 post-merge (wan21 0.0956, ltx2 0.1513 vs. the 0.05 threshold). Cross-machine eager variance on the same container measures ~0.04 LPIPS for the 1-step Wan2.1 config, so goldens regenerated on a dev machine would leave only ~0.01 of headroom, which is not a reliable margin.

- Run Wan2.1 and LTX-2 LPIPS generation under torch.compiler.set_stance("force_eager"), matching the Wan2.2 fix; the LTX-2 wrap covers the golden fixture and both sides of the CUDA-graph-vs-eager comparison.
- On an LPIPS threshold failure, copy the generated candidate into pytest's --output-dir (archived per-stage by CI) so the golden can be refreshed with CI-generated media instead of dev-machine approximations.

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
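The "preserve failing candidates" step can be sketched roughly as follows. This is a minimal illustration, not the test suite's code: the helper name assert_lpips_or_archive and its signature are hypothetical, and output_dir stands in for whatever directory pytest's --output-dir resolves to in CI.

```python
import shutil
from pathlib import Path


def assert_lpips_or_archive(score: float, threshold: float,
                            candidate: Path, output_dir: Path) -> None:
    """Hypothetical helper: pass if score < threshold; otherwise copy the
    generated candidate into output_dir (which CI archives) and fail."""
    if score < threshold:
        return
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    saved = output_dir / Path(candidate).name
    # Keep the exact CI-generated bytes so the golden can be refreshed from them.
    shutil.copy2(candidate, saved)
    raise AssertionError(
        f"LPIPS {score:.6f} >= threshold {threshold}; "
        f"candidate saved to {saved} for golden refresh")
```

Copying before raising means the artifact survives even though the test fails, so a maintainer can pull the candidate from the stage's archived outputs rather than regenerating it on a different machine.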

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

PR_Github #57279 [ run ] triggered by Bot. Commit:
…the 26.04 stack

The previous goldens were generated on the 26.02 container and fail deterministically on 26.04 CI (wan21 0.0956, ltx2 0.1513 vs. 0.05). They were regenerated on B200 with the CI devel image (pytorch-26.04, tag -15694), the CI-built 1.3.0rc21 wheel, torch 2.12.0a0+0291f960b6.nv26.04, and force_eager generation. Only the wan21/ltx2 zip members changed; both tests score LPIPS 0.000000 against these goldens on the generating host.

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

PR_Github #57351 [ run ] triggered by Bot. Commit:

PR_Github #57352 [ run ] triggered by Bot. Commit:

PR_Github #57502 [ run ] completed with state
…he LPIPS golden pipeline

Root cause of the residual wan22 LPIPS failure (0.059334 vs. < 0.05): the generated frames were identical to the golden, but on the failing CI stage the candidate was encoded as MPEG-4 Part 2 via the test helper's cv2 fallback (OpenCV-bundled Lavf62), while the golden is H.264/x264 (ffmpeg 6.1). LPIPS was therefore measuring codec artifacts (PSNR 33-37 dB, uniform noise), not model output. A frame-level decode comparison of the CI candidate (recovered via the base64 stdout channel) against the golden confirmed mean |diff| < 4/255 with no structural differences.

Two fixes:
- media/encoding.py: only cache a successful ffmpeg probe. The stage's early negative probe (run before the test fixture apt-installs ffmpeg) was cached for the process lifetime and silently downgraded every later encode to the fallback encoder.
- test_visual_gen.py: refuse to fall back to cv2/mp4v for LPIPS media and fail loudly instead, since a codec switch invalidates the comparison.

No golden changes are needed: with the encoder fixed, wan22 is expected to reproduce bit-exactly like wan21/ltx2 (LPIPS 0.000000).

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
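Both fixes can be sketched together in plain Python. This is an illustrative sketch only: the names _probe_ffmpeg, ffmpeg_available and pick_encoder, the returned encoder labels, and the injectable probe argument are hypothetical and do not reproduce media/encoding.py or test_visual_gen.py. The key points are that the cache only ever records a positive result, and that LPIPS media raises instead of silently switching codecs.

```python
import shutil
import subprocess

# Hypothetical module-level cache. It only ever flips to True, so an early
# negative probe (ffmpeg not installed yet) is never remembered.
_FFMPEG_CONFIRMED = False


def _probe_ffmpeg() -> bool:
    """Return True if an ffmpeg binary is on PATH and runs."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return False
    try:
        subprocess.run([exe, "-version"], capture_output=True, check=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return False
    return True


def ffmpeg_available(probe=_probe_ffmpeg) -> bool:
    global _FFMPEG_CONFIRMED
    if _FFMPEG_CONFIRMED:
        return True
    ok = probe()
    if ok:
        _FFMPEG_CONFIRMED = True  # cache positive results only
    return ok


def pick_encoder(for_lpips: bool, probe=_probe_ffmpeg) -> str:
    """Hypothetical encoder selection: H.264 via ffmpeg, else cv2/mp4v.
    LPIPS media must never silently switch codecs."""
    if ffmpeg_available(probe):
        return "libx264"
    if for_lpips:
        raise RuntimeError(
            "ffmpeg unavailable; refusing cv2/mp4v fallback for LPIPS media "
            "because a codec switch invalidates the comparison")
    return "mp4v"
```

With this shape, a probe that fails early in a stage is simply retried on the next encode, so once a fixture installs ffmpeg the H.264 path is picked up without restarting the process.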

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

PR_Github #57522 [ run ] triggered by Bot. Commit:

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

PR_Github #57526 [ run ] triggered by Bot. Commit:

PR_Github #57522 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #57527 [ run ] triggered by Bot. Commit:

PR_Github #57526 [ run ] completed with state

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

PR_Github #57534 [ run ] triggered by Bot. Commit:

PR_Github #57527 [ run ] completed with state

PR_Github #57534 [ run ] completed with state

…a-graph LPIPS test

test_ltx2_cuda_graph_lpips_matches_eager never requested _visual_gen_deps (which installs ffmpeg), so when it ran first in a stage it previously passed only via the silent cv2/mp4v fallback on both sides of the comparison. With that fallback now a hard failure, declare the fixture so ffmpeg is installed before encoding.

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

PR_Github #57552 [ run ] triggered by Bot. Commit:

PR_Github #57552 [ run ] completed with state

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

PR_Github #57559 [ run ] triggered by Bot. Commit:

PR_Github #57559 [ run ] completed with state

…en by NVIDIA#14827

Same root cause as the already-waived T2V variant: NVIDIA#14827 changed the effective Cosmos3 generation parameters (_resolve_t2i_default rewrites the test's pinned steps/guidance/resolution because they equal the video defaults), so the T2I output no longer matches its golden (LPIPS 0.608, deterministic across three runs: pipelines 46107/46265/46287).

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
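The failure mode described in this commit, where a pinned value is rewritten because it happens to equal a default, can be illustrated with a small sketch. Everything here is hypothetical: resolve_t2i_params is a stand-in, not the real _resolve_t2i_default, and the default values are invented because the actual Cosmos3 defaults are not given in this PR.

```python
# Hypothetical defaults; the real Cosmos3 values are not stated in this PR.
VIDEO_DEFAULTS = {"steps": 35, "guidance": 7.0, "height": 704, "width": 1280}
T2I_DEFAULTS = {"steps": 50, "guidance": 4.0, "height": 1024, "width": 1024}


def resolve_t2i_params(requested: dict) -> dict:
    """Equality-based resolution: a value identical to the video default is
    assumed to mean 'not set by the caller' and is swapped for the T2I default."""
    resolved = dict(requested)
    for key, video_value in VIDEO_DEFAULTS.items():
        if resolved.get(key) == video_value:
            resolved[key] = T2I_DEFAULTS[key]
    return resolved
```

A test that pins exactly the video defaults therefore runs with entirely different T2I settings, while any value that differs from the video default survives. An equality check cannot distinguish "left unset" from "explicitly pinned to the same value"; a None or sentinel default is the usual way to tell them apart.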

/bot run --disable-fail-fast --extra-stage "DGX_B200-PyTorch-Post-Merge-1, DGX_B200-PyTorch-Post-Merge-2"

/bot run

PR_Github #57577 [ run ] triggered by Bot. Commit:

PR_Github #57578 [ run ] triggered by Bot. Commit:

PR_Github #57577 [ run ] completed with state

PR_Github #57578 [ run ] completed with state

Description
Fix NVBug 6401921, where test_wan22_t2v_lpips_against_golden regressed to LPIPS 0.251536 after the PyTorch CI container moved from 26.02 to 26.04.
TorchCompileConfig(enable=False) skips VisualGen's configured transformer compilation, but it does not suppress nested or unconditional @torch.compile call sites. The resulting execution trajectory changed across the PyTorch upgrade. A controlled A/B test showed that forcing both containers to run fully eager makes their final pre-VAE latents bit-identical and keeps cross-container LPIPS below the existing 0.05 threshold.
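This change:
- runs the Wan2.2 LPIPS video generation under torch.compiler.set_stance("force_eager");
- regenerates the Wan2.2 golden on the 26.04 CI image and records the exact runtime provenance;
- removes the corresponding test waiver from waives.txt.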
The scope is intentionally limited to the failing Wan2.2 single-GPU golden.
Other VisualGen goldens retain their existing execution policy. This is the
single-GPU counterpart to #15730 and does not depend on it.
Test Coverage
- Exact B200 reproduction with PyTorch 26.04 image sha256:dad31c0b5290d836033c96d8b91f6524bdc7cc5b4d1000b4abcc57c6868ffdc0 and Jenkins post-merge build 2814 (tensorrt-llm==1.3.0rc21, commit 539ee226c4df7ab15802911083fe501e9d64c66e).
- The regenerated force-eager video reproduced byte-for-byte across two fresh runs: SHA-256 52828186f44b82a9f686f177d635b9f3cb0050f41c8d3ae55dade01d30a00b28.
- Targeted integration test:
- zip -T passed, the archive still contains eight unique members, and only wan22_t2v_lpips_golden_video.mp4 differs from the previous archive.
- All pre-commit hooks passed on the four changed files.
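The archive and reproducibility checks listed above can be approximated in Python. This is a sketch under stated assumptions: file_sha256, member_digests and changed_members are hypothetical helpers, and ZipFile.testzip() (a CRC check of every member) is used as a stand-in for `zip -T`, not an exact equivalent.

```python
import hashlib
import zipfile


def file_sha256(path) -> str:
    """SHA-256 of a file, e.g. to confirm byte-for-byte reproduction."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def member_digests(zip_path) -> dict:
    """Map each archive member to its SHA-256, after an integrity check."""
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()  # first member with a bad CRC, or None
        if bad is not None:
            raise ValueError(f"corrupt member: {bad}")
        names = zf.namelist()
        if len(names) != len(set(names)):
            raise ValueError("archive has duplicate member names")
        return {n: hashlib.sha256(zf.read(n)).hexdigest() for n in names}


def changed_members(old_zip, new_zip) -> list:
    """Sorted names of members that were added, removed, or modified."""
    old, new = member_digests(old_zip), member_digests(new_zip)
    return sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))
```

For this PR, the expectation would be that changed_members on the old and new golden archives returns only wan22_t2v_lpips_golden_video.mp4 and that member_digests reports eight entries.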
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.