[None][test] Enable single-GPU LPIPS CI protection for VisualGen models#15564
Conversation
📝 WalkthroughWalkthroughAdds LPIPS golden test coverage for two new VisualGen models: QwenImage (text-to-image) and Cosmos3-Nano (text-to-video and text-to-image). Three golden JSON fixtures, new generation helpers, an Inductor compile-worker quiesce fix applied across all LPIPS paths, three new integration tests, B200 test-list entries, and removal of six stale LPIPS waivers are included. ChangesVisualGen LPIPS Golden Tests for QwenImage and Cosmos3-Nano
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/integration/test_lists/test-db/l0_b200.yml (1)
363-368: 🚀 Performance & Scalability | 🔵 TrivialSchedule timeout calibration follow-up for these new B200 entries.
Lines 364-368 already mark these TIMEOUTs as placeholders. Coverage is sufficient for the newly added LPIPS happy paths in
tests/integration/defs/examples/visual_gen/test_visual_gen.py, but this list needs a follow-up run to lock measured B200 wall-times and confirm whethertest_cosmos3_nano_t2v_lpips_against_goldenstays pre-merge or moves post-merge.As per path instructions,
tests/**: “Act as a QA engineer reviewing test changes and coverage... suggest concrete list file names and whether coverage is sufficient, insufficient, or needs follow-up outside the PR.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/test_lists/test-db/l0_b200.yml` around lines 363 - 368, The three LPIPS test entries for test_qwenimage_lpips_against_golden, test_cosmos3_nano_t2i_lpips_against_golden, and test_cosmos3_nano_t2v_lpips_against_golden in the B200 configuration have placeholder TIMEOUT values that need to be calibrated. Schedule and execute a benchmark run on B200 hardware to measure the actual wall-times for each of these three tests, then update the TIMEOUT values in lines 364-368 with the measured times. Additionally, based on the measured time for test_cosmos3_nano_t2v_lpips_against_golden and the gating budget constraints, determine whether it should remain in the pre-merge test list or be moved to post-merge/nightly stage, and update the configuration accordingly.Source: Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/integration/defs/examples/visual_gen/test_visual_gen.py`:
- Around line 699-700: The test sets the environment variable
TRTLLM_DISABLE_COSMOS3_GUARDRAILS at the point where os.environ is modified but
never restores it after the test completes. Store the original value of this
environment variable before modifying it, then ensure the original value is
restored in a finally block or cleanup mechanism after the test runs to prevent
the mutated environment state from leaking into other tests. Apply this fix to
all locations where TRTLLM_DISABLE_COSMOS3_GUARDRAILS is set (including the
additional locations at lines 736-738).
---
Nitpick comments:
In `@tests/integration/test_lists/test-db/l0_b200.yml`:
- Around line 363-368: The three LPIPS test entries for
test_qwenimage_lpips_against_golden, test_cosmos3_nano_t2i_lpips_against_golden,
and test_cosmos3_nano_t2v_lpips_against_golden in the B200 configuration have
placeholder TIMEOUT values that need to be calibrated. Schedule and execute a
benchmark run on B200 hardware to measure the actual wall-times for each of
these three tests, then update the TIMEOUT values in lines 364-368 with the
measured times. Additionally, based on the measured time for
test_cosmos3_nano_t2v_lpips_against_golden and the gating budget constraints,
determine whether it should remain in the pre-merge test list or be moved to
post-merge/nightly stage, and update the configuration accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f9ff66da-48ec-41ab-8dad-7bce7fe44817
⛔ Files ignored due to path filters (1)
tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/visual_gen_lpips_golden_media.zipis excluded by!**/*.zip
📒 Files selected for processing (7)
.gitattributestests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/cosmos3_nano_t2i_lpips_golden.jsontests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/cosmos3_nano_t2v_lpips_golden_video.jsontests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/qwenimage_lpips_golden.jsontests/integration/defs/examples/visual_gen/test_visual_gen.pytests/integration/test_lists/test-db/l0_b200.ymltests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
- tests/integration/test_lists/waives.txt
bbfd5a0 to
32010aa
Compare
|
/bot run |
|
PR_Github #56255 [ run ] triggered by Bot. Commit: |
|
PR_Github #56255 [ run ] completed with state
|
32010aa to
380d227
Compare
380d227 to
a51d939
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #56328 [ run ] triggered by Bot. Commit: |
|
PR_Github #56328 [ run ] completed with state
|
…ation Add default-setting single-GPU LPIPS golden tests for QwenImage and Cosmos3-Nano, completing single-GPU VisualGen LPIPS CI coverage across all supported models. Refresh all eight golden media entries with the pinned staging main image at TRT-LLM commit 85665f5, which contains the Cosmos3 accuracy fix from NVIDIA#15545. Record the TRT-LLM commit, package versions, container digest, compile-off mode, and deterministic-algorithm mode in every golden JSON while retaining the existing LPIPS thresholds. Explicitly disable pipeline torch compile for every LPIPS generator. Unwaive all eight single-GPU cases, test_wan_t2v_example, and the passing attn2d_2x2_ulysses2 multi-GPU case. Retain the six pre-existing multi-GPU waivers and waive only the new cfg2_ulysses2_attn2d_2x1 case that measured 0.255208 above the unchanged 0.25 threshold. Validated on B200: all eight single-GPU LPIPS cases passed at 0.000000, test_wan_t2v_example passed, and attn2d_2x2_ulysses2 passed at 0.232348. Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
a51d939 to
9b4c317
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #56406 [ run ] triggered by Bot. Commit: |
|
PR_Github #56406 [ run ] completed with state |
|
✅ LFS objects already in storage (1 file) — no sync needed. These LFS-tracked files are already present in this repository's LFS storage:
|
Description
Enable Stage-1 end-to-end LPIPS golden accuracy protection for all single-GPU VisualGen models.
This PR adds QwenImage and Cosmos3-Nano coverage, refreshes all eight single-GPU goldens from one pinned build, explicitly disables pipeline torch compile for every LPIPS generator, and removes the remaining single-GPU LPIPS waivers.
It also unwaives
test_wan_t2v_example, which passed locally on B200.Single-GPU coverage
The three newly registered B200 tests are:
test_qwenimage_lpips_against_goldentest_cosmos3_nano_t2i_lpips_against_goldentest_cosmos3_nano_t2v_lpips_against_goldenMulti-GPU waiver scope
The six pre-existing multi-GPU LPIPS waivers remain unchanged.
The two newly registered eight-rank variants were handled according to the measured compile-off results:
attn2d_2x2_ulysses2: LPIPS0.232348— below the unchanged0.25threshold, so it is enabledcfg2_ulysses2_attn2d_2x1: LPIPS0.255208— above0.25, so it remains waived under NVBug 6272644No LPIPS threshold was changed.
Determinism and goldens
Every LPIPS pipeline uses:
TorchCompileConfig(enable=False)torch.use_deterministic_algorithms(True)All eight golden JSON files record the model parameters, compile/deterministic modes, package versions, TRT-LLM commit, and digest-qualified container image.
Golden environment:
urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm-staging/release@sha256:3308a2dc0192a8329ea02eca7b5c44f290f5e894cd8c5921099308d84c3e56911.3.0rc2085665f5fd331d0154a78172954846d843085e83f0.38.0That TRT-LLM build contains the Cosmos3 accuracy fix from #15545.
Verification
upstream/main0.000000test_wan_t2v_examplepassed locally on B200attn2d_2x2_ulysses2passed at LPIPS0.232348cfg2_ulysses2_attn2d_2x1measured LPIPS0.255208and remains waived470c6fa7a013a8e691e5fa8cd040ff98eacbf60c13d5c7a88ec8f5f56c656d95)