Commit db4f329

Use official TRT-LLM image (1.3.0rc15.post1) for DSv4 B200 TRT
1 parent bc2cc68 commit db4f329

2 files changed

Lines changed: 12 additions & 5 deletions

.github/configs/nvidia-master.yaml

Lines changed: 5 additions & 5 deletions
@@ -1801,7 +1801,7 @@ dsv4-fp4-b200-vllm-agentic:
     - { tp: 8, ep: 8, dp-attn: true, offloading: cpu, conc-list: [64, 128, 256] }

 dsv4-fp4-b200-trt:
-  image: ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-9aa3715
+  image: nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1
   model: deepseek-ai/DeepSeek-V4-Pro
   model-prefix: dsv4
   runner: b200-dsv4
@@ -1814,15 +1814,15 @@ dsv4-fp4-b200-trt:
     osl: 1024
     search-space:
     - { tp: 8, conc-start: 1, conc-end: 32 }
-    - { tp: 8, ep: 8, dp-attn: true, conc-start: 32, conc-end: 2048 }
+    - { tp: 8, ep: 8, dp-attn: true, conc-start: 32, conc-end: 128 }
   - isl: 8192
     osl: 1024
     search-space:
     - { tp: 8, conc-start: 1, conc-end: 32 }
-    - { tp: 8, ep: 8, dp-attn: true, conc-start: 32, conc-end: 1024 }
+    - { tp: 8, ep: 8, dp-attn: true, conc-start: 32, conc-end: 256 }

 dsv4-fp4-b200-trt-mtp:
-  image: ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-9aa3715
+  image: nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1
   model: deepseek-ai/DeepSeek-V4-Pro
   model-prefix: dsv4
   runner: b200-dsv4
@@ -1835,7 +1835,7 @@ dsv4-fp4-b200-trt-mtp:
     osl: 1024
     search-space:
     - { tp: 8, conc-start: 1, conc-end: 32, spec-decoding: mtp }
-    - { tp: 8, ep: 8, dp-attn: true, conc-start: 32, conc-end: 512, spec-decoding: mtp }
+    - { tp: 8, ep: 8, dp-attn: true, conc-start: 32, conc-end: 128, spec-decoding: mtp }
   - isl: 8192
     osl: 1024
     search-space:
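
Note on the narrowed `conc-end` values: the diff does not show how `conc-start`/`conc-end` are expanded into individual benchmark runs. As a rough sketch only, assuming the range is swept geometrically by doubling (an assumption, not confirmed by this commit), the hypothetical helper `expand_concurrency` below shows how many sweep points each change would remove from the `dp-attn` entries.

```python
# Sketch only: assumes conc-start..conc-end is swept by doubling.
# `expand_concurrency` is a hypothetical helper, not a function from the repo.

def expand_concurrency(conc_start: int, conc_end: int, factor: int = 2) -> list[int]:
    """Return conc_start, conc_start*factor, ... up to and including conc_end."""
    if conc_start < 1 or conc_end < conc_start or factor < 2:
        raise ValueError("expected 1 <= conc_start <= conc_end and factor >= 2")
    points = []
    conc = conc_start
    while conc <= conc_end:
        points.append(conc)
        conc *= factor
    return points


# (old conc-end, new conc-end) for the three dp-attn entries changed above;
# conc-start is 32 in every case.
CHANGED_ENDS = [(2048, 128), (1024, 256), (512, 128)]

for old_end, new_end in CHANGED_ENDS:
    old_points = expand_concurrency(32, old_end)
    new_points = expand_concurrency(32, new_end)
    dropped = [c for c in old_points if c not in new_points]
    print(f"conc-end {old_end} -> {new_end}: keeps {new_points}, drops {dropped}")
```

If the real sweep uses a different step (for example a fixed increment), the point counts differ, but the effect is the same: the highest-concurrency runs are no longer scheduled for these configs.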

perf-changelog.yaml

Lines changed: 7 additions & 0 deletions
@@ -3458,3 +3458,10 @@
     - "Add MiniMax-M2.5 FP8 GB200 disaggregated multinode vLLM benchmarks via Dynamo"
     - "Add 1k1k/8k1k FP8 recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1648
+
+- config-keys:
+    - dsv4-fp4-b200-trt
+    - dsv4-fp4-b200-trt-mtp
+  description:
+    - "Use official TRT-LLM release image (nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc15.post1) for B200 DeepSeek-V4-Pro TRT configs, replacing the custom ghcr.io feat/deepseek_v4 build (9aa3715)."
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PENDING
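
Note on the image spelling: the config uses `nvcr.io#nvidia/...` while the changelog entry writes `nvcr.io/nvidia/...`. This looks like the enroot-style image reference, where `#` separates the registry from the image path; that reading is an assumption, since the commit itself does not say so. Under that assumption, a minimal sketch of the mapping between the two spellings (the helper name `enroot_to_docker_ref` is hypothetical):

```python
# Sketch only: assumes `registry#image:tag` is an enroot-style reference.
# `enroot_to_docker_ref` is a hypothetical helper, not part of the repo.

def enroot_to_docker_ref(image: str) -> str:
    """Rewrite 'registry#path:tag' as 'registry/path:tag'; pass through otherwise."""
    registry, sep, rest = image.partition("#")
    if not sep:
        # No '#' present: treat the value as an ordinary Docker-style reference.
        return image
    return f"{registry}/{rest}"


print(enroot_to_docker_ref("nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1"))
# nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc15.post1
```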
