Commit 4e8f404

JadoTu authored and GitLab CI Bot committed
[https://nvbugs/6281014][fix] fix the repeated cute.compile and simplify the test (NVIDIA#15331)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
Signed-off-by: Jian Tu <107457950+JadoTu@users.noreply.github.com>
Signed-off-by: GitLab CI Bot <gitlab-ci@nvidia.com>
1 parent 6d981f0 commit 4e8f404

6 files changed

Lines changed: 3 additions & 26 deletions

tests/integration/defs/accuracy/references/mmlu.yaml

Lines changed: 0 additions & 3 deletions
@@ -285,9 +285,6 @@ Qwen3/Qwen3-Next-80B-A3B-Thinking:
   - accuracy: 86
 Qwen3/Qwen3-Next-80B-A3B-Instruct:
   - accuracy: 86.03
-  - quant_algo: NVFP4
-    kv_cache_quant_algo: FP8
-    accuracy: 85.08
 Qwen/Qwen3.5-397B-A17B:
   - quant_algo: NVFP4
     kv_cache_quant_algo: FP8

tests/integration/defs/accuracy/test_llm_api_pytorch.py

Lines changed: 3 additions & 9 deletions
@@ -5795,15 +5795,11 @@ def test_bf16_4gpu(self, tp_size, pp_size, ep_size, cuda_graph,
         "tp_size,pp_size,ep_size,cuda_graph,overlap_scheduler,attention_dp,enable_block_reuse",
         [
             (1, 1, 1, True, True, False, True),
-            (1, 1, 1, True, True, False, False),
-            (4, 1, 1, True, True, False, False),
             (4, 1, 4, True, True, True, False),
-            (4, 1, 4, True, True, False, False),
-            (4, 1, 4, False, False, False, False),
         ],
         ids=[
-            "tp1_block_reuse", "tp1", "tp4ep1", "tp4ep4_adp_on",
-            "tp4ep4_adp_off", "no_cuda_graph_overlap"
+            "tp1_block_reuse",
+            "tp4ep4_adp_on",
         ])
     def test_nvfp4(self, moe_backend, tp_size, pp_size, ep_size, cuda_graph,
                    overlap_scheduler, attention_dp, enable_block_reuse, mocker):
@@ -5820,7 +5816,7 @@ def test_nvfp4(self, moe_backend, tp_size, pp_size, ep_size, cuda_graph,
         kv_cache_config.mamba_state_cache_interval = 256
         pytorch_config = dict(disable_overlap_scheduler=not overlap_scheduler,
                               cuda_graph_config=CudaGraphConfig(
-                                  max_batch_size=512, enable_padding=False)
+                                  max_batch_size=512, enable_padding=True)
                               if cuda_graph else None)
         moe_config = MoeConfig(backend=moe_backend)

@@ -5833,8 +5829,6 @@ def test_nvfp4(self, moe_backend, tp_size, pp_size, ep_size, cuda_graph,
                  enable_attention_dp=attention_dp,
                  **pytorch_config,
                  moe_config=moe_config) as llm:
-            task = MMLU(self.MODEL_NAME)
-            task.evaluate(llm)
             mocker.patch.object(GSM8K, "MAX_OUTPUT_LEN",
                                 self.GSM8K_MAX_OUTPUT_LEN)
             task = GSM8K(self.MODEL_NAME)
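
Note on test IDs: the surviving entries in the test lists (for example `test_nvfp4[tp4ep4_adp_on-cutlass]`) combine an ID from the `ids=[...]` list changed in this file with a `moe_backend` ID. A minimal sketch of that naming, assuming the `moe_backend` parametrization sits in a decorator above the one shown and uses the IDs `cutlass` and `trtllm` (inferred from the test lists, not visible in this diff); `node_ids` is a hypothetical helper, not part of the repository:

```python
from itertools import product

# IDs of the inner parametrize after this commit (taken from the hunk above).
config_ids = ["tp1_block_reuse", "tp4ep4_adp_on"]
# Assumed IDs of the outer moe_backend parametrize, inferred from the test lists.
backend_ids = ["cutlass", "trtllm"]


def node_ids(test_name, inner_ids, outer_ids):
    """Build IDs the way pytest joins stacked parametrize decorators:
    the decorator closest to the function supplies the first segment."""
    return [
        f"{test_name}[{inner}-{outer}]"
        for inner, outer in product(inner_ids, outer_ids)
    ]


ids = node_ids("test_nvfp4", config_ids, backend_ids)
for node_id in ids:
    print(node_id)
```

Whether every combination is actually scheduled depends on the test-list files below; the sketch only shows how the names are formed.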

tests/integration/test_lists/qa/llm_function_core.txt

Lines changed: 0 additions & 5 deletions
@@ -789,11 +789,6 @@ accuracy/test_llm_api_pytorch.py::TestQwen3_8B::test_w4a8_mxfp4[fp8-latency]
 accuracy/test_llm_api_pytorch.py::TestQwen3_8B::test_w4a8_mxfp4[mxfp8-latency]
 accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_bf16_4gpu[tp4ep4_cudagraph_overlap_adp_off]
 accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_bf16_4gpu[tp4ep4_cudagraph_overlap_adp_on]
-accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[no_cuda_graph_overlap-cutlass]
-accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp1-cutlass]
-accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp4ep1-cutlass]
-accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp4ep4_adp_off-cutlass]
-accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp4ep4_adp_off-trtllm]
 accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp4ep4_adp_on-cutlass]
 accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp4ep4_adp_on-trtllm]
 accuracy/test_llm_api_pytorch.py::TestQwen3NextThinking::test_auto_dtype[tp4ep4]

tests/integration/test_lists/test-db/l0_gb200_multi_gpus.yml

Lines changed: 0 additions & 5 deletions
@@ -76,13 +76,8 @@ l0_gb200_multi_gpus:
   - accuracy/test_llm_api_pytorch.py::TestQwen3NextThinking::test_auto_dtype[tp4ep4]
   - accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_bf16_4gpu[tp4ep4_cudagraph_overlap_adp_off]
   - accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_bf16_4gpu[tp4ep4_cudagraph_overlap_adp_on]
-  - accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp1-cutlass]
-  - accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp4ep1-cutlass]
   - accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp4ep4_adp_on-cutlass]
-  - accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp4ep4_adp_off-cutlass]
-  - accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[no_cuda_graph_overlap-cutlass]
   - accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp4ep4_adp_on-trtllm]
-  - accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp4ep4_adp_off-trtllm]
   - accuracy/test_llm_api_pytorch_multimodal.py::TestMistralLarge3_675B::test_nvfp4_4gpus[latency_moe_trtllm] TIMEOUT (90)
   - accuracy/test_dwdp_disaggregated_serving.py::TestDwdpDeepSeekV3Lite::test_dwdp_accuracy
   - accuracy/test_dwdp_disaggregated_serving.py::TestDwdpDeepSeekV3Lite::test_dwdp_accuracy_contention_opt

tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml

Lines changed: 0 additions & 1 deletion
@@ -52,7 +52,6 @@ l0_rtx_pro_6000:
       backend: pytorch
   tests:
   - test_e2e.py::test_ptp_quickstart_advanced[GPT-OSS-120B-gpt_oss/gpt-oss-120b]
-  - accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp1-cutlass]
   - accuracy/test_llm_api_pytorch.py::TestGPTOSS::test_w4_1gpu[v1_kv_cache-True-True-cutlass-auto]
   - accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B::test_nvfp4[latency_moe_cutlass-torch_compile=False] # 8mins
   - accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B::test_nvfp4[latency_moe_cutlass-torch_compile=True] # 8 mins

tests/integration/test_lists/waives.txt

Lines changed: 0 additions & 3 deletions
@@ -73,9 +73,6 @@ accuracy/test_llm_api_pytorch.py::TestMistralLarge3_675B::test_nvfp4_4gpus[laten
 accuracy/test_llm_api_pytorch.py::TestPhi4MiniInstruct::test_auto_dtype SKIP (https://nvbugs/6076767)
 accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_bf16_4gpu[tp4ep4_cudagraph_overlap_adp_off] SKIP (https://nvbugs/6255417)
 accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_bf16_4gpu[tp4ep4_cudagraph_overlap_adp_on] SKIP (https://nvbugs/6094068)
-accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[no_cuda_graph_overlap-cutlass] SKIP (https://nvbugs/6281014)
-accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp1-cutlass] SKIP (https://nvbugs/6116088)
-accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp4ep1-cutlass] SKIP (https://nvbugs/6281014)
 accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_nvfp4[tp4ep4_adp_on-trtllm] SKIP (https://nvbugs/6094068)
 accuracy/test_llm_api_pytorch.py::TestQwen3_235B_A22B::test_fp8[latency] SKIP (https://nvbugs/6177390)
 accuracy/test_llm_api_pytorch.py::TestQwen3_235B_A22B::test_fp8[throughput_latency] SKIP (https://nvbugs/6177390)
