 version: 0.0.1
 l0_dgx_b200:
+- condition:
+    ranges:
+      system_gpu_count:
+        gte: 2
+        lte: 2
+    wildcards:
+      gpu:
+      - '*b200*'
+      linux_distribution_name: ubuntu*
+      cpu: x86_64
+    terms:
+      stage: pre_merge
+      backend: pytorch
+      orchestrator: mpi
+  tests:
+  - unittest/_torch/misc/test_autotuner.py::test_autotuner_distributed_strategy
+  - accuracy/test_llm_api_pytorch.py::TestQwen3_5_35B_A3B::test_bf16[tp2-CUTLASS]
+  - accuracy/test_llm_api_pytorch.py::TestQwen3_5_35B_A3B::test_bf16[tp2-TRTLLM]
+  # ------------- KV Cache V2 Scheduler IT (multi-GPU) ---------------
+  - kv_cache/test_kv_cache_v2_scheduler.py::TestKVCacheV2DSv3Lite::test_mtp_draft_tokens
+  - kv_cache/test_kv_cache_v2_scheduler.py::TestKVCacheV2DSv3Lite::test_mtp_chunked_draft_tokens
+  - kv_cache/test_kv_cache_v2_scheduler.py::TestKVCacheV2DSv3Lite::test_mtp_eviction
+  # ------------- VisualGen multi-GPU tests ---------------
+  - unittest/_torch/visual_gen/test_flux_pipeline.py::TestFluxParallelism::test_ulysses_2gpu_correctness
+  - unittest/_torch/visual_gen/test_flux_pipeline.py::TestFluxCombinedOptimizations::test_all_optimizations_combined
 - condition:
     ranges:
       system_gpu_count:
@@ -15,7 +40,6 @@ l0_dgx_b200:
       backend: pytorch
       orchestrator: mpi
   tests:
-  - unittest/_torch/misc/test_autotuner.py::test_autotuner_distributed_strategy
   - accuracy/test_llm_api_pytorch.py::TestNemotronV3Super::test_auto_dtype_4gpus[4-4-False-True-True]
   - accuracy/test_llm_api_pytorch.py::TestNemotronV3Super::test_auto_dtype_4gpus[4-4-True-True-True]
   - accuracy/test_llm_api_pytorch.py::TestNemotronV3Super::test_nvfp4_4gpu_mtp_ar TIMEOUT (60)
@@ -30,8 +54,6 @@ l0_dgx_b200:
   - accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B::test_nvfp4[dep4_latency_moe_trtllm-torch_compile=False]
   - accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B::test_nvfp4[dep4_latency_moe_cutlass-torch_compile=False]
   - accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B::test_nvfp4[dep4_latency_moe_cutlass-torch_compile=True]
-  - accuracy/test_llm_api_pytorch.py::TestQwen3_5_35B_A3B::test_bf16[tp2-CUTLASS]
-  - accuracy/test_llm_api_pytorch.py::TestQwen3_5_35B_A3B::test_bf16[tp2-TRTLLM]
   - disaggregated/test_disaggregated.py::test_disaggregated_deepseek_v3_lite_fp8_ucx[DeepSeek-V3-Lite-fp8]
   - disaggregated/test_disaggregated.py::test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8]
   - disaggregated/test_disaggregated.py::test_disaggregated_gpt_oss_120b_harmony[gpt_oss/gpt-oss-120b]
@@ -42,10 +64,6 @@ l0_dgx_b200:
   - accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus_python_scheduler[ep4-mtp_nextn=2]
   - accuracy/test_llm_api_pytorch.py::TestMiniMaxM2::test_4gpus[attention_dp=False-cuda_graph=True-overlap_scheduler=True-tp_size=4-ep_size=4] TIMEOUT (60)
   - accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus[pp4-mtp_nextn=0-attention_dp=False-cuda_graph=False-overlap_scheduler=False-torch_compile=False] TIMEOUT (60)
-  # ------------- KV Cache V2 Scheduler IT (multi-GPU) ---------------
-  - kv_cache/test_kv_cache_v2_scheduler.py::TestKVCacheV2DSv3Lite::test_mtp_draft_tokens
-  - kv_cache/test_kv_cache_v2_scheduler.py::TestKVCacheV2DSv3Lite::test_mtp_chunked_draft_tokens
-  - kv_cache/test_kv_cache_v2_scheduler.py::TestKVCacheV2DSv3Lite::test_mtp_eviction
   # ------------- NVBug 6025177: trtllm-serve cross-request KV contamination (OpenAI) ---------------
   - test_e2e.py::test_openai_kv_cache_contamination TIMEOUT (120)
 - condition:
@@ -81,7 +99,6 @@ l0_dgx_b200:
   - unittest/_torch/modules/moe/test_moe_module.py::test_configurable_moe_multi_gpu -k "DEEPGEMM and not MEGAMOE_DEEPGEMM"
   # --- MEGAMOE_DEEPGEMM (W4A8_MXFP4_MXFP8 only) ---
   - unittest/_torch/modules/moe/test_moe_module.py::test_configurable_moe_multi_gpu -k "MEGAMOE_DEEPGEMM"
-  - unittest/_torch/modules/moe/test_moe_module.py::test_configurable_moe_single_gpu -k "MEGAMOE_DEEPGEMM"
   # ------------- MoE: test_multi_gpu_eplb ---------------
   - unittest/_torch/modules/moe/test_moe_module.py::test_configurable_moe_multi_gpu_eplb
 - condition:
@@ -165,8 +182,6 @@ l0_dgx_b200:
   - accuracy/test_disaggregated_serving.py::TestQwen3NextInstruct::test_auto_dtype[use_py_transceiver=False] TIMEOUT (60)
   # ------------- VisualGen multi-GPU tests ---------------
   - unittest/_torch/visual_gen/multi_gpu
-  - unittest/_torch/visual_gen/test_flux_pipeline.py::TestFluxParallelism::test_ulysses_2gpu_correctness
-  - unittest/_torch/visual_gen/test_flux_pipeline.py::TestFluxCombinedOptimizations::test_all_optimizations_combined
 - condition:
     ranges:
       system_gpu_count:
@@ -192,7 +207,6 @@ l0_dgx_b200:
   - accuracy/test_llm_api_pytorch.py::TestDeepSeekV32::test_fp8_blockscale[baseline_fp8kv] TIMEOUT (60)
   - accuracy/test_llm_api_pytorch.py::TestDeepSeekV32::test_fp8_blockscale[latency] TIMEOUT (60)
   - accuracy/test_llm_api_pytorch.py::TestDeepSeekV32::test_fp8_blockscale[disable_skip_indexer] TIMEOUT (60)
-  - accuracy/test_llm_api_pytorch.py::TestDeepSeekV32::test_nvfp4_attn_multi_gpus TIMEOUT (60)
   - accuracy/test_llm_api_pytorch.py::TestDeepSeekV32::test_nvfp4_multi_gpus[baseline_fp8kv] TIMEOUT (60)
   - accuracy/test_llm_api_pytorch.py::TestDeepSeekV32::test_nvfp4_multi_gpus[latency] TIMEOUT (60)
   - accuracy/test_llm_api_pytorch.py::TestDeepSeekV32::test_nvfp4_multi_gpus[disable_skip_indexer] TIMEOUT (60)
@@ -305,30 +319,22 @@ l0_dgx_b200:
   - accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_nvfp4_4gpus[moe_backend=CUTEDSL-mtp_nextn=2-ep4-fp8kv=False-attention_dp=True-cuda_graph=False-overlap_scheduler=False-low_precision_combine=True-torch_compile=False]
   - accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_nvfp4_4gpus[moe_backend=CUTEDSL-mtp_nextn=2-ep4-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-low_precision_combine=True-torch_compile=False]
   - accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_fp4_tp2pp2[torch_compile=False-enable_gemm_allreduce_fusion=False]
-  - examples/test_visual_gen.py::test_visual_gen_quickstart
-  - examples/test_visual_gen.py::test_visual_gen_api_walkthrough
   - examples/test_visual_gen.py::test_wan_t2v_example
-  - examples/test_visual_gen.py::test_flux1_lpips_against_golden
-  - examples/test_visual_gen.py::test_flux2_lpips_against_golden
-  - examples/test_visual_gen.py::test_ltx2_lpips_against_golden
-  - examples/test_visual_gen.py::test_wan21_t2v_lpips_against_golden
-  - examples/test_visual_gen.py::test_wan22_t2v_lpips_against_golden
   - examples/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_multi_gpu[ulysses4]
   - examples/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_multi_gpu[cfg2_ulysses2]
   - examples/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_multi_gpu[ulysses2_ring2]
   - examples/test_visual_gen_multi_gpu.py::test_wan22_t2v_lpips_against_golden_multi_gpu[attn2d_2x2]
   - examples/test_visual_gen.py::test_vbench_dimension_score_wan
   - examples/test_visual_gen.py::test_vbench_dimension_score_wan22_a14b_fp8
   - examples/test_visual_gen.py::test_vbench_dimension_score_wan22_a14b_nvfp4
-  - visual_gen/test_visual_gen_benchmark.py::test_offline_benchmark
-  - visual_gen/test_visual_gen_benchmark.py::test_online_benchmark[openai-videos]
   - examples/test_visual_gen.py::test_vbench_dimension_score_ltx2_bf16
   - examples/test_visual_gen.py::test_vbench_dimension_score_ltx2_fp8
   - accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B_Instruct_2507::test_skip_softmax_attention_4gpus[target_sparsity_0.5-fp8kv=False]
   - accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B_Instruct_2507::test_skip_softmax_attention_4gpus[target_sparsity_0.5-fp8kv=True]
   - accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B_Instruct_2507::test_skip_softmax_attention_4gpus[target_sparsity_0.9-fp8kv=False]
   - accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B_Instruct_2507::test_skip_softmax_attention_4gpus[target_sparsity_0.9-fp8kv=True]
   - disaggregated/test_disaggregated.py::test_disaggregated_mamba_conc_greater_than_mbs[NVIDIA-Nemotron-3-Super-120B-A12B-FP8]
+  - accuracy/test_llm_api_pytorch.py::TestDeepSeekV32::test_nvfp4_attn_multi_gpus TIMEOUT (60)
 # ------------- AutoDeploy Backend Stages ---------------
 - condition:
     ranges: