
Commit 2e2f876

fix(profile): switch Flash vLLM MTP to DEP8
1 parent: f9d6523

1 file changed: 4 additions, 3 deletions


.github/configs/nvidia-master.yaml

@@ -2073,8 +2073,9 @@ dsv4-flash-fp4-b300-vllm:
 search-space:
 - { tp: 4, ep: 1, conc-start: 64, conc-end: 64 }

-# Targeted Flash vLLM MTP profile at the same single-point profile location.
-# The shared vLLM MTP launcher selects 3 speculative tokens for this model.
+# Targeted Flash vLLM MTP DEP8 profile at the same single-point profile
+# location. The shared launcher maps dp-attn=true to DP without TP, and selects
+# 3 speculative tokens for this model.
 dsv4-flash-fp4-b300-vllm-mtp:
 image: vllm/vllm-openai:v0.21.0
 model: deepseek-ai/DeepSeek-V4-Flash
@@ -2088,7 +2089,7 @@ dsv4-flash-fp4-b300-vllm-mtp:
 - isl: 1024
 osl: 1024
 search-space:
-- { tp: 4, ep: 1, conc-start: 64, conc-end: 64, spec-decoding: mtp }
+- { tp: 8, ep: 8, dp-attn: true, conc-start: 64, conc-end: 64, spec-decoding: mtp }

 # Targeted Flash MTP profile: DEP4 at the same 1k1k conc=64 point as the
 # non-MTP Flash profile above. The shared SGLang MTP launcher selects the
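The new comment says the shared launcher maps `dp-attn=true` to DP without TP and selects 3 speculative tokens for this model. The launcher itself is not part of this commit, so what follows is only a hypothetical Python sketch of such a mapping from a search-space entry to vLLM server arguments. `build_vllm_args` is an illustrative name, not the launcher's real function. The flag spellings (`--tensor-parallel-size`, `--data-parallel-size`, `--enable-expert-parallel`, `--speculative-config`) are assumed to be how the launcher expresses these settings, and the `"mtp"` method name should be checked against the vLLM version in the pinned image.

```python
import json

# Hypothetical sketch: how a launcher *might* turn one search-space entry into
# vLLM server arguments. The real shared launcher is not shown in this commit;
# only the dict keys mirror the YAML, everything else is an assumption.
def build_vllm_args(entry: dict, num_spec_tokens: int = 3) -> list[str]:
    tp = entry["tp"]
    ep = entry.get("ep", 1)
    args: list[str] = []
    if entry.get("dp-attn", False):
        # "DP without TP": one data-parallel rank per GPU, each with TP size 1,
        # so the entry still occupies `tp` GPUs in total.
        args += ["--tensor-parallel-size", "1", "--data-parallel-size", str(tp)]
    else:
        # Plain tensor parallelism across `tp` GPUs.
        args += ["--tensor-parallel-size", str(tp)]
    if ep > 1:
        # Assumed: any ep > 1 turns on expert parallelism for the MoE layers.
        args.append("--enable-expert-parallel")
    if entry.get("spec-decoding") == "mtp":
        # Assumed method name; 3 tokens per the comment in the YAML above.
        spec = {"method": "mtp", "num_speculative_tokens": num_spec_tokens}
        args += ["--speculative-config", json.dumps(spec)]
    return args


if __name__ == "__main__":
    old = {"tp": 4, "ep": 1, "conc-start": 64, "conc-end": 64, "spec-decoding": "mtp"}
    new = {"tp": 8, "ep": 8, "dp-attn": True, "conc-start": 64,
           "conc-end": 64, "spec-decoding": "mtp"}
    print(build_vllm_args(old))
    print(build_vllm_args(new))
```

Under these assumptions, the old entry yields a 4-way TP server, while the new DEP8 entry yields 8 data-parallel attention ranks of TP size 1 with experts sharded across them, keeping the same conc=64 load point.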
