NVIDIA · zhenhuaw-me · Jun 17, 2026 · Jun 11, 2026 · Jun 12, 2026 · Jun 13, 2026
@@ -162,33 +162,31 @@ still force VG stages.
 Block selection — entry-pattern based only:
 VisualGen has no `condition.terms.backend` of its own; VG entries
 live in `backend: pytorch` and `backend: tensorrt` blocks. A block
-"belongs to VG" iff any of its `tests:` entries matches one of the
-three stable VG path families:
-
-- `unittest/_torch/visual_gen/...` (28 entries)
-- `examples/test_visual_gen.py...` (1 entry)
-- `visual_gen/test_visual_gen_benchmark.py` (1 entry)
+"belongs to VG" iff any of its `tests:` entries lives under a dedicated
+`visual_gen/` test path or is the VisualGen perf-sanity entry under
+`perf/`.
 
 For each matched block, `block_filters` keeps only the VG entries.
 Non-VG siblings in the same block stay governed by other rules.
 
-Outward-facing fallback: unlike AutoDeploy, VG is imported eagerly
-(top-level `from tensorrt_llm._torch.visual_gen.config import ...`
-in `commands/serve.py`, `commands/utils.py`,
-`serve/openai_server.py`). The 5 files that define / re-export the
-public API symbols (`VisualGenArgs`, `ParallelConfig`, `VisualGen`,
-`VisualGenParams`) are listed in `_VG_OUTWARD_FILES`; touching any
-of them claims the changed files but emits `scope=None` so Selector
-falls back to baseline. This protects trtllm-serve / trtllm-bench
-startup paths from VG signature drift slipping through pre-merge.
+Outward-facing fallback: unlike AutoDeploy, VG public symbols are
+imported eagerly by non-VG startup paths such as `commands/serve.py`,
+`commands/utils.py`, and `serve/openai_server.py`. The public API
+package prefix (`tensorrt_llm/visual_gen/`) is listed in
+`_VG_OUTWARD_PREFIXES`; touching any non-doc file under it claims the
+changed files but emits `scope=None` so Selector falls back to baseline.
+This protects trtllm-serve / trtllm-bench startup paths from VG
+signature drift slipping through pre-merge.
 
 Outcomes:
 
 - No VG source files in the diff → rule returns `None`.
-- VG source touched, all internal → `scope=visualgenonly`; sanity
-  off (VG changes don't affect wheel sanity); perfsanity on iff a
-  matched block lives in `l0_perf` or `*perf_sanity*`.
-- VG source touched, any outward-facing file → `scope=None`
+- VG source touched, all internal (`examples/visual_gen/**` or
+  `tensorrt_llm/_torch/visual_gen/**`) → `scope=visualgenonly`;
+  sanity off (VG changes don't affect wheel sanity); perfsanity on iff
+  a matched block lives in `l0_perf` or `*perf_sanity*`.
+- VG source touched, any outward-facing path under
+  `tensorrt_llm/visual_gen/` → `scope=None`
   (fallback).
 - VG source touched but no VG block found anywhere (defensive) →
   `scope=None` (fallback).
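
The outcomes listed above can be read as a small decision function. The following is an illustrative sketch only, not the rule's actual code: `classify_vg_scope`, its parameters, the `has_vg_block` flag, and the two constant names are hypothetical. Only the path prefixes and the `visualgenonly` scope string come from the text above. The doc-file exclusion under the outward prefix is omitted for brevity.

```python
from typing import List, Optional, Tuple

# Prefixes copied from the outcome list; the constant names are hypothetical.
INTERNAL_PREFIXES: Tuple[str, ...] = (
    "examples/visual_gen/",
    "tensorrt_llm/_torch/visual_gen/",
)
OUTWARD_PREFIXES: Tuple[str, ...] = ("tensorrt_llm/visual_gen/",)


def classify_vg_scope(
    changed_files: List[str], has_vg_block: bool = True
) -> Tuple[bool, Optional[str]]:
    """Return (rule_applies, scope) for a list of changed file paths."""
    all_prefixes = INTERNAL_PREFIXES + OUTWARD_PREFIXES
    vg_files = [f for f in changed_files if f.startswith(all_prefixes)]
    if not vg_files:
        # No VG source in the diff: the rule does not apply at all.
        return (False, None)
    if any(f.startswith(OUTWARD_PREFIXES) for f in vg_files):
        # Public API package touched: fall back to baseline coverage.
        return (True, None)
    if not has_vg_block:
        # Defensive case: VG source changed but no VG block exists.
        return (True, None)
    return (True, "visualgenonly")
```

Under these assumptions, a diff touching only `tensorrt_llm/_torch/visual_gen/config.py` narrows to `visualgenonly`, while adding `tensorrt_llm/visual_gen/args.py` to the same diff yields `scope=None`.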

@@ -20,21 +20,18 @@
 Block selection — entry-pattern based only:
 VisualGen does NOT have its own `condition.terms.backend`; VG test
 entries live in `backend: pytorch` and `backend: tensorrt` blocks.
-A block "belongs to VG" iff any of its `tests:` entries matches one
-of the three stable VG entry path families:
-  - `unittest/_torch/visual_gen/...`         (28 entries)
-  - `examples/test_visual_gen.py...`         (1 entry)
-  - `visual_gen/test_visual_gen_benchmark.py` (1 entry)
+A block "belongs to VG" iff any of its `tests:` entries lives under a
+dedicated `visual_gen/` test path or is the VisualGen perf-sanity entry.
 
 Outward-facing fallback:
 Unlike AutoDeploy, VG is imported eagerly (module-level) by non-VG
 code: `commands/serve.py`, `commands/utils.py`, and
 `serve/openai_server.py` import `VisualGenArgs` / `ParallelConfig` /
 `VisualGen` / `VisualGenParams` at top level. A signature change to
 those symbols can break trtllm-serve startup, which would affect
-non-VG tests. The 5 files that define / re-export those symbols are
-listed in `_VG_OUTWARD_FILES`; touching any of them forces fallback
-even if the rest of the diff is VG-internal.
+non-VG tests. The public API package prefix is listed in
+`_VG_OUTWARD_PREFIXES`; touching any non-doc file under it forces
+fallback even if the rest of the diff is VG-internal.
 """
 
 from __future__ import annotations
@@ -55,27 +52,17 @@
     "tensorrt_llm/visual_gen/",
 )
 
-# Files inside _VG_SRC_PREFIXES that are imported eagerly by non-VG
-# code (top-level `from ... import VisualGenArgs / ParallelConfig /
-# VisualGen / VisualGenParams`). Touching any of these can break
-# trtllm-serve / trtllm-bench startup paths, so the rule defers to
-# baseline rather than narrowing.
-_VG_OUTWARD_FILES: frozenset[str] = frozenset(
-    {
-        "tensorrt_llm/_torch/visual_gen/config.py",
-        "tensorrt_llm/visual_gen/__init__.py",
-        "tensorrt_llm/visual_gen/args.py",
-        "tensorrt_llm/visual_gen/params.py",
-        "tensorrt_llm/visual_gen/visual_gen.py",
-    }
-)
+# Public VisualGen API package imported eagerly by non-VG code. Touching
+# any non-doc file under this prefix can break trtllm-serve / trtllm-bench
+# startup paths, so the rule defers to baseline rather than narrowing.
+_VG_OUTWARD_PREFIXES: tuple[str, ...] = ("tensorrt_llm/visual_gen/",)
 
-# Substrings that mark a test entry as VG. Cover all three path
-# families that appear in test-db YAMLs (audited 2026-05).
+# Substrings that mark a test entry as VG. VG tests are expected to live
+# under dedicated visual_gen test directories, except the perf-sanity
+# frontend, which stays with the shared perf tests.
 _VG_ENTRY_PATTERNS: tuple[str, ...] = (
-    "unittest/_torch/visual_gen/",
-    "examples/test_visual_gen.py",
-    "visual_gen/test_visual_gen_benchmark.py",
+    "visual_gen/",
+    "perf/test_visual_gen_perf_sanity.py",
 )
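
To illustrate how the two substrings classify test-db entries, the sketch below applies a plain substring check to a few entries copied from the YAML lists in this change. `is_vg_entry` is a hypothetical helper. The rule's actual matching code is not shown in this hunk, so substring matching is an assumption based on the comment above.

```python
from typing import Tuple

_VG_ENTRY_PATTERNS: Tuple[str, ...] = (
    "visual_gen/",
    "perf/test_visual_gen_perf_sanity.py",
)


def is_vg_entry(entry: str) -> bool:
    """Hypothetical helper: True if any VG pattern occurs in the entry."""
    return any(pattern in entry for pattern in _VG_ENTRY_PATTERNS)


entries = [
    "unittest/_torch/visual_gen/test_tensor_payload.py",
    "unittest/visual_gen/test_media_encoding.py",
    "examples/visual_gen/test_visual_gen.py::test_flux1_example",
    "visual_gen/test_visual_gen_benchmark.py::test_offline_benchmark",
    "perf/test_visual_gen_perf_sanity.py",
    # Pre-move path: "test_visual_gen.py" contains no "visual_gen/" segment.
    "examples/test_visual_gen.py::test_flux1_example",
    "unittest/llmapi/test_llm_utils.py",
]

matches = {entry: is_vg_entry(entry) for entry in entries}
for entry, matched in matches.items():
    print(f"{'VG    ' if matched else 'non-VG'}  {entry}")
```

Note that the pre-move path `examples/test_visual_gen.py` would not match the generic `visual_gen/` substring, which is consistent with the tests being relocated under `examples/visual_gen/` in the same change.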
 
 
@@ -122,19 +109,19 @@ def apply(self, pr: PRInputs) -> Optional[RuleResult]:
         if not claimed:
             return None
 
-        # Outward-facing VG files break the "self-contained subsystem"
+        # Outward-facing VG paths break the "self-contained subsystem"
         # assumption — they are imported eagerly by trtllm-serve /
         # trtllm-bench. Claim the files (so they don't go unhandled and
         # silently fallback) but emit scope=None so Selector falls back
         # to baseline coverage instead of narrowing to VG-only stages.
-        outward = claimed & _VG_OUTWARD_FILES
+        outward = {f for f in claimed if f.startswith(_VG_OUTWARD_PREFIXES)}
         if outward:
             return RuleResult(
                 handled_files=claimed,
                 affected_stages=set(),
                 scope=None,
                 reason=(
-                    f"visualgen: {len(outward)} outward-facing VG file(s) "
+                    f"visualgen: {len(outward)} outward-facing VG path(s) "
                     f"touched ({sorted(outward)[0]}{'...' if len(outward) > 1 else ''}); "
                     "fallback to baseline"
                 ),
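
`str.startswith` accepts a tuple of prefixes, which is what lets the new check take `_VG_OUTWARD_PREFIXES` directly. The sketch below contrasts the old exact-match set intersection with the new prefix check on a hypothetical `claimed` set. The file `tensorrt_llm/visual_gen/new_module.py` is invented for illustration, and the reason string is assembled the same way as in the hunk above.

```python
# Old exact-match list, as removed by this change.
_VG_OUTWARD_FILES = frozenset(
    {
        "tensorrt_llm/_torch/visual_gen/config.py",
        "tensorrt_llm/visual_gen/__init__.py",
        "tensorrt_llm/visual_gen/args.py",
        "tensorrt_llm/visual_gen/params.py",
        "tensorrt_llm/visual_gen/visual_gen.py",
    }
)
# New prefix tuple, as added by this change.
_VG_OUTWARD_PREFIXES = ("tensorrt_llm/visual_gen/",)

# Hypothetical set of claimed files for one PR.
claimed = {
    "tensorrt_llm/_torch/visual_gen/config.py",
    "tensorrt_llm/visual_gen/args.py",
    "tensorrt_llm/visual_gen/new_module.py",  # invented file name
}

old_outward = claimed & _VG_OUTWARD_FILES
new_outward = {f for f in claimed if f.startswith(_VG_OUTWARD_PREFIXES)}

reason = (
    f"visualgen: {len(new_outward)} outward-facing VG path(s) "
    f"touched ({sorted(new_outward)[0]}{'...' if len(new_outward) > 1 else ''}); "
    "fallback to baseline"
)
print(sorted(old_outward))
print(sorted(new_outward))
print(reason)
```

With the prefix check, `tensorrt_llm/_torch/visual_gen/config.py` no longer counts as outward-facing, while any file added under `tensorrt_llm/visual_gen/` does so without a list update.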

@@ -197,8 +197,8 @@ def visual_gen_command(
     """Benchmark VisualGen (image/video generation) models offline."""
     import yaml
 
-    from tensorrt_llm._torch.visual_gen.config import VisualGenArgs
     from tensorrt_llm.visual_gen import VisualGen, VisualGenParams
+    from tensorrt_llm.visual_gen.args import VisualGenArgs
 
     if prompt is None and prompt_file is None:
         raise click.UsageError("Either --prompt or --prompt_file must be specified.")

@@ -7,8 +7,8 @@
 import click
 from click.core import ParameterSource
 
-from tensorrt_llm._torch.visual_gen.config import ParallelConfig
 from tensorrt_llm.llmapi.utils import download_hf_partial
+from tensorrt_llm.visual_gen.args import ParallelConfig
 
 logger = logging.getLogger(__name__)
 

diff --git a/.../visual_gen_lpips/flux1_lpips_golden.json b/.../visual_gen_lpips/flux1_lpips_golden.json
diff --git a/.../visual_gen_lpips/flux2_lpips_golden.json b/.../visual_gen_lpips/flux2_lpips_golden.json
diff --git a/...al_gen_lpips/ltx2_lpips_golden_video.json b/...al_gen_lpips/ltx2_lpips_golden_video.json
diff --git a/...n_lpips/visual_gen_lpips_golden_media.zip b/...n_lpips/visual_gen_lpips_golden_media.zip
diff --git a/...n_lpips/wan21_t2v_lpips_golden_video.json b/...n_lpips/wan21_t2v_lpips_golden_video.json
diff --git a/...n_lpips/wan22_t2v_lpips_golden_video.json b/...n_lpips/wan22_t2v_lpips_golden_video.json
diff --git a/...egration/defs/examples/test_visual_gen.py b/...fs/examples/visual_gen/test_visual_gen.py
diff --git a/...efs/examples/test_visual_gen_multi_gpu.py b/...s/visual_gen/test_visual_gen_multi_gpu.py
@@ -21,7 +21,7 @@
 import torch
 import torch.distributed as dist
 import torch.multiprocessing as mp
-from defs.examples.test_visual_gen import (
+from defs.examples.visual_gen.test_visual_gen import (
     WAN22_LPIPS_FRAME_RATE,
     WAN22_LPIPS_GUIDANCE_SCALE,
     WAN22_LPIPS_HEIGHT,
@@ -40,8 +40,8 @@
 )
 
 try:
-    from tensorrt_llm._torch.visual_gen.config import ParallelConfig
     from tensorrt_llm._utils import get_free_port
+    from tensorrt_llm.visual_gen.args import ParallelConfig
 
     MODULES_AVAILABLE = True
 except ImportError:

diff --git a/tests/integration/test_lists/test-db/l0_a10.yml b/tests/integration/test_lists/test-db/l0_a10.yml
@@ -114,7 +114,7 @@ l0_a10:
   # visual_gen
   - unittest/_torch/visual_gen/test_visual_gen_params.py
   - unittest/visual_gen/test_output.py
-  - unittest/media/test_encoding.py
+  - unittest/visual_gen/test_media_encoding.py
   - unittest/_torch/visual_gen/test_tensor_payload.py
   # llmapi
   - unittest/llmapi/test_llm_utils.py

diff --git a/tests/integration/test_lists/test-db/l0_b200.yml b/tests/integration/test_lists/test-db/l0_b200.yml
@@ -251,13 +251,13 @@ l0_b200:
   - unittest/_torch/visual_gen/test_wan_transformer.py
   - unittest/_torch/visual_gen/test_cosmos3_transformer.py
   - unittest/_torch/visual_gen/test_cosmos3_pipeline.py
-  - examples/test_visual_gen.py::test_wan_t2v_example
-  - examples/test_visual_gen.py::test_flux1_example
-  - examples/test_visual_gen.py::test_flux2_example
-  - examples/test_visual_gen.py::test_ltx2_example
-  - examples/test_visual_gen.py::test_wan_i2v_example
-  - examples/test_visual_gen.py::test_cosmos3_example
-  # - examples/test_visual_gen.py
+  - examples/visual_gen/test_visual_gen.py::test_wan_t2v_example
+  - examples/visual_gen/test_visual_gen.py::test_flux1_example
+  - examples/visual_gen/test_visual_gen.py::test_flux2_example
+  - examples/visual_gen/test_visual_gen.py::test_ltx2_example
+  - examples/visual_gen/test_visual_gen.py::test_wan_i2v_example
+  - examples/visual_gen/test_visual_gen.py::test_cosmos3_example
+  # - examples/visual_gen/test_visual_gen.py
   # ------------- Host perf module regression tests (6 representative scenarios) ---------------
   - perf/host_perf/test_module_scheduler.py::test_scheduler_production[production_gen_only_bs8]
   - perf/host_perf/test_module_scheduler.py::test_scheduler_production[production_mixed_32gen_4ctx]
@@ -353,13 +353,13 @@ l0_b200:
   - accuracy/test_llm_api_pytorch.py::TestGPTOSS::test_w4_1gpu[v2_kv_cache-True-True-trtllm-auto]
   - accuracy/test_llm_api_pytorch_multimodal.py::TestNanoV3Omni::test_auto_dtype[bf16]
   # ------------- VisualGen single-GPU tests ---------------
-  - examples/test_visual_gen.py::test_visual_gen_quickstart
-  - examples/test_visual_gen.py::test_visual_gen_api_walkthrough
-  - examples/test_visual_gen.py::test_flux1_lpips_against_golden
-  - examples/test_visual_gen.py::test_flux2_lpips_against_golden
-  - examples/test_visual_gen.py::test_ltx2_lpips_against_golden
-  - examples/test_visual_gen.py::test_wan21_t2v_lpips_against_golden
-  - examples/test_visual_gen.py::test_wan22_t2v_lpips_against_golden
+  - examples/visual_gen/test_visual_gen.py::test_visual_gen_quickstart
+  - examples/visual_gen/test_visual_gen.py::test_visual_gen_api_walkthrough
+  - examples/visual_gen/test_visual_gen.py::test_flux1_lpips_against_golden
+  - examples/visual_gen/test_visual_gen.py::test_flux2_lpips_against_golden
+  - examples/visual_gen/test_visual_gen.py::test_ltx2_lpips_against_golden
+  - examples/visual_gen/test_visual_gen.py::test_wan21_t2v_lpips_against_golden
+  - examples/visual_gen/test_visual_gen.py::test_wan22_t2v_lpips_against_golden
   - visual_gen/test_visual_gen_benchmark.py::test_offline_benchmark
   - visual_gen/test_visual_gen_benchmark.py::test_online_benchmark[openai-videos]
 # ------------- AutoDeploy Backend Stages ---------------

diff --git a/tests/integration/test_lists/test-db/l0_dgx_b200.yml b/tests/integration/test_lists/test-db/l0_dgx_b200.yml
@@ -224,8 +224,8 @@ l0_dgx_b200:
   - accuracy/test_llm_api_pytorch.py::TestMistralLarge3_675B::test_fp8[latency_moe_deepgemm] TIMEOUT (60)
   - accuracy/test_llm_api_pytorch.py::TestNemotronV3Super::test_nvfp4_parallelism[TP8_PP1] TIMEOUT (60)
   - test_e2e.py::test_deepseek_r1_mtp_bench TIMEOUT(60) # Cover https://nvbugs/5670108
-  - examples/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_multi_gpu[cfg2_ulysses2_attn2d_2x1]
-  - examples/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_multi_gpu[attn2d_2x2_ulysses2]
+  - examples/visual_gen/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_multi_gpu[cfg2_ulysses2_attn2d_2x1]
+  - examples/visual_gen/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_multi_gpu[attn2d_2x2_ulysses2]
 - condition:
     ranges:
       system_gpu_count:
@@ -309,18 +309,18 @@ l0_dgx_b200:
   - accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_nvfp4_4gpus[moe_backend=CUTEDSL-mtp_nextn=2-ep4-fp8kv=False-attention_dp=True-cuda_graph=False-overlap_scheduler=False-low_precision_combine=True-torch_compile=False]
   - accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_nvfp4_4gpus[moe_backend=CUTEDSL-mtp_nextn=2-ep4-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-low_precision_combine=True-torch_compile=False]
   - accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_fp4_tp2pp2[torch_compile=False-enable_gemm_allreduce_fusion=False]
-  - examples/test_visual_gen.py::test_wan_t2v_example
-  - examples/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_multi_gpu[ulysses4]
-  - examples/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_multi_gpu[cfg2_ulysses2]
-  - examples/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_multi_gpu[attn2d_2x2]
-  - examples/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_tp[tp2]
-  - examples/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_tp[cfg2_tp2]
-  - examples/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_tp[tp2_ulysses2]
-  - examples/test_visual_gen.py::test_vbench_dimension_score_wan
-  - examples/test_visual_gen.py::test_vbench_dimension_score_wan22_a14b_fp8
-  - examples/test_visual_gen.py::test_vbench_dimension_score_wan22_a14b_nvfp4
-  - examples/test_visual_gen.py::test_vbench_dimension_score_ltx2_bf16
-  - examples/test_visual_gen.py::test_vbench_dimension_score_ltx2_fp8
+  - examples/visual_gen/test_visual_gen.py::test_wan_t2v_example
+  - examples/visual_gen/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_multi_gpu[ulysses4]
+  - examples/visual_gen/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_multi_gpu[cfg2_ulysses2]
+  - examples/visual_gen/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_multi_gpu[attn2d_2x2]
+  - examples/visual_gen/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_tp[tp2]
+  - examples/visual_gen/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_tp[cfg2_tp2]
+  - examples/visual_gen/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_tp[tp2_ulysses2]
+  - examples/visual_gen/test_visual_gen.py::test_vbench_dimension_score_wan
+  - examples/visual_gen/test_visual_gen.py::test_vbench_dimension_score_wan22_a14b_fp8
+  - examples/visual_gen/test_visual_gen.py::test_vbench_dimension_score_wan22_a14b_nvfp4
+  - examples/visual_gen/test_visual_gen.py::test_vbench_dimension_score_ltx2_bf16
+  - examples/visual_gen/test_visual_gen.py::test_vbench_dimension_score_ltx2_fp8
   - accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B_Instruct_2507::test_skip_softmax_attention_4gpus[target_sparsity_0.5-fp8kv=False]
   - accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B_Instruct_2507::test_skip_softmax_attention_4gpus[target_sparsity_0.5-fp8kv=True]
   - accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B_Instruct_2507::test_skip_softmax_attention_4gpus[target_sparsity_0.9-fp8kv=False]

diff --git a/tests/integration/test_lists/test-db/l0_gh200.yml b/tests/integration/test_lists/test-db/l0_gh200.yml
@@ -23,8 +23,8 @@ l0_gh200:
   - unittest/bindings
   - unittest/llmapi/test_llm_quant.py
   - llmapi/test_llm_examples.py::test_llmapi_quickstart_atexit
-  - examples/test_visual_gen.py::test_visual_gen_quickstart
-  - examples/test_visual_gen.py::test_visual_gen_api_walkthrough
+  - examples/visual_gen/test_visual_gen.py::test_visual_gen_quickstart
+  - examples/visual_gen/test_visual_gen.py::test_visual_gen_api_walkthrough
   - unittest/test_model_runner_cpp.py
   - accuracy/test_cli_flow.py::TestGptNext::test_auto_dtype
   - examples/test_medusa.py::test_llm_medusa_with_qaunt_base_model_1gpu[fp8-use_py_session-medusa-vicuna-7b-v1.3-4-heads-float16-bs1] TIMEOUT (90)

diff --git a/tests/integration/test_lists/test-db/l0_h100.yml b/tests/integration/test_lists/test-db/l0_h100.yml
@@ -317,8 +317,8 @@ l0_h100:
   - unittest/test_model_runner_cpp.py
   - unittest/llmapi/test_llm_quant.py # 5.5 mins on H100
   - llmapi/test_llm_examples.py::test_llmapi_quickstart_atexit
-  - examples/test_visual_gen.py::test_visual_gen_quickstart
-  - examples/test_visual_gen.py::test_visual_gen_api_walkthrough
+  - examples/visual_gen/test_visual_gen.py::test_visual_gen_quickstart
+  - examples/visual_gen/test_visual_gen.py::test_visual_gen_api_walkthrough
   - unittest/trt/attention/test_gpt_attention_IFB.py
   - accuracy/test_cli_flow.py::TestLlama3_1_8BInstruct::test_fp8_prequantized
   - examples/test_multimodal.py::test_llm_multimodal_general[Llama-3.2-11B-Vision-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False-nb:1]

diff --git a/tests/integration/test_lists/test-db/l0_l40s.yml b/tests/integration/test_lists/test-db/l0_l40s.yml
@@ -61,8 +61,8 @@ l0_l40s:
   - examples/test_llama.py::test_llm_llama_v3_dora_1gpu[commonsense-llama-v3-8b-dora-r32-llama-v3-8b-hf-base_fp16]
   - examples/test_nemotron_nas.py::test_nemotron_nas_summary_1gpu[DeciLM-7B]
   - llmapi/test_llm_examples.py::test_llmapi_quickstart
-  - examples/test_visual_gen.py::test_visual_gen_quickstart
-  - examples/test_visual_gen.py::test_visual_gen_api_walkthrough
+  - examples/visual_gen/test_visual_gen.py::test_visual_gen_quickstart
+  - examples/visual_gen/test_visual_gen.py::test_visual_gen_api_walkthrough
   - llmapi/test_llm_examples.py::test_llmapi_example_inference
   - llmapi/test_llm_examples.py::test_llmapi_example_inference_async
   - llmapi/test_llm_examples.py::test_llmapi_example_inference_async_streaming