Commit 3e4d6dd

wzhao18, github-actions[bot], cquil11, and claude authored
Updated DSv4 vllm B300 MTP (#1271)
* add DP to b300 mtp
* Update changelog
* Update Docker image version for dsv4-fp4-b300-vllm-mtp
* Update Docker image version to v0.20.2
* Modify search-space parameters in nvidia-master.yaml
* Merge duplicate DP_ATTENTION conditions in benchmark script

Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cameron Quilici <cjquilici@gmail.com>
1 parent 5fe6d56 commit 3e4d6dd

3 files changed

Lines changed: 21 additions & 4 deletions

.github/configs/nvidia-master.yaml

Lines changed: 5 additions & 3 deletions
@@ -2825,7 +2825,7 @@ dsv4-fp4-b300-trt-mtp:
     - { tp: 8, ep: 8, dp-attn: true, conc-start: 256, conc-end: 1024, spec-decoding: mtp }
 
 dsv4-fp4-b300-vllm-mtp:
-  image: vllm/vllm-openai:v0.20.0-cu130
+  image: vllm/vllm-openai:v0.20.2
   model: deepseek-ai/DeepSeek-V4-Pro
   model-prefix: dsv4
   runner: b300
@@ -2838,13 +2838,15 @@ dsv4-fp4-b300-vllm-mtp:
     osl: 1024
     search-space:
     - { tp: 4, conc-start: 1, conc-end: 256, spec-decoding: mtp }
-    - { tp: 8, conc-start: 1, conc-end: 64, spec-decoding: mtp }
+    - { tp: 8, conc-start: 1, conc-end: 8, spec-decoding: mtp }
+    - { tp: 4, ep: 4, dp-attn: true, conc-start: 256, conc-end: 1024, spec-decoding: mtp }
   - isl: 8192
     osl: 1024
     search-space:
     - { tp: 4, conc-start: 1, conc-end: 64, spec-decoding: mtp }
-    - { tp: 8, conc-start: 1, conc-end: 64, spec-decoding: mtp }
+    - { tp: 8, conc-start: 1, conc-end: 8, spec-decoding: mtp }
     - { tp: 4, ep: 4, conc-start: 64, conc-end: 256, spec-decoding: mtp }
+    - { tp: 4, ep: 4, dp-attn: true, conc-start: 256, conc-end: 512, spec-decoding: mtp }
 
 qwen3.5-fp8-h200-sglang:
   image: lmsysorg/sglang:v0.5.9-cu129-amd64
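To see what the narrowed `tp: 8` range and the new `dp-attn` entries mean in terms of benchmark points, the sketch below lists the concurrency values between `conc-start` and `conc-end`. It is only an illustration: `conc_points` is a hypothetical helper, and the doubling step is an assumption, since the diff does not show how the harness walks the range.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list concurrency points from conc-start to conc-end.
# ASSUMPTION: the sweep doubles at each step; the diff does not show the real stepping.
conc_points() {
  local c=$1 end=$2 out=()
  while [ "$c" -le "$end" ]; do
    out+=("$c")
    c=$(( c * 2 ))
  done
  echo "${out[*]}"
}

echo "tp8 before:       $(conc_points 1 64)"
echo "tp8 after:        $(conc_points 1 8)"
echo "dp-attn isl=1024: $(conc_points 256 1024)"
echo "dp-attn isl=8192: $(conc_points 256 512)"
```

Under that assumption, the `tp: 8` entries now cover low concurrency only, while the added `tp: 4, ep: 4, dp-attn: true` entries take over from 256 upward.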

benchmarks/single_node/dsv4_fp4_b300_vllm_mtp.sh

Lines changed: 9 additions & 1 deletion
@@ -36,7 +36,14 @@ if [ "${EP_SIZE:-1}" -gt 1 ]; then
     EP_ARGS=(--enable-expert-parallel)
 fi
 
-MAX_NUM_BATCHED_TOKENS=$(( ISL * 2 ))
+MOE_ARGS=()
+if [ "${DP_ATTENTION}" = "true" ]; then
+    MOE_ARGS=(--moe-backend deep_gemm_mega_moe)
+    MAX_NUM_BATCHED_TOKENS=2048
+else
+    MAX_NUM_BATCHED_TOKENS=$(( ISL * 2 ))
+fi
+
 BENCHMARK_MAX_MODEL_LEN=$MAX_MODEL_LEN
 
 if [ "${EVAL_ONLY}" = "true" ]; then
@@ -61,6 +68,7 @@ vllm serve "$MODEL" --host 0.0.0.0 --port "$PORT" \
     --block-size 256 \
     --no-enable-prefix-caching \
     "${EP_ARGS[@]}" \
+    "${MOE_ARGS[@]}" \
     --compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE","custom_ops":["all"]}' \
     --attention_config.use_fp4_indexer_cache True \
     --tokenizer-mode deepseek_v4 \
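The conditional added in this file can be exercised in isolation. The sketch below is not the benchmark script itself: it wraps the same branch in a hypothetical function, `pick_batching`, and assumes the caller sets `ISL` and `DP_ATTENTION`. It also reads `${DP_ATTENTION:-}` so an unset variable is treated as "not true", which is a small departure from the line in the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the conditional added in this commit.
# Sets MOE_ARGS and MAX_NUM_BATCHED_TOKENS from DP_ATTENTION and ISL.
pick_batching() {
  MOE_ARGS=()
  if [ "${DP_ATTENTION:-}" = "true" ]; then
    # DP-attention path: select the MoE backend and fix the batched-token budget.
    MOE_ARGS=(--moe-backend deep_gemm_mega_moe)
    MAX_NUM_BATCHED_TOKENS=2048
  else
    # Default path: keep the previous value of twice the input sequence length.
    MAX_NUM_BATCHED_TOKENS=$(( ISL * 2 ))
  fi
}

ISL=8192 DP_ATTENTION=true
pick_batching
echo "dp-attn: ${MAX_NUM_BATCHED_TOKENS} ${MOE_ARGS[*]}"

ISL=8192 DP_ATTENTION=false
pick_batching
echo "default: ${MAX_NUM_BATCHED_TOKENS} extra_args=${#MOE_ARGS[@]}"
```

On the default path `MOE_ARGS` stays empty, and the quoted expansion `"${MOE_ARGS[@]}"` then contributes zero words, so the `vllm serve` command line is unchanged. One caveat: under `set -u`, bash releases before 4.4 treat that expansion of an empty array as an unbound variable.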

perf-changelog.yaml

Lines changed: 7 additions & 0 deletions
@@ -2486,3 +2486,10 @@
   description:
     - "Update SGLang image from v0.5.9-cu130 to v0.5.11-cu130"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1322
+
+- config-keys:
+    - dsv4-fp4-b300-vllm-mtp
+  description:
+    - "Update image tag to vllm/vllm-openai:v0.20.2"
+    - "Add DEP configs for B300 vLLM MTP"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1271
