Commit a7bbbf5

Fix deprecated SGLang flags: replace --enable-ep-moe with --ep-size 8 and --enable-flashinfer-trtllm-moe with --moe-runner-backend flashinfer_trtllm
Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
1 parent eb8c247 commit a7bbbf5

2 files changed

Lines changed: 3 additions & 3 deletions

benchmarks/dsr1_fp4_b200_docker.sh

Lines changed: 2 additions & 2 deletions
@@ -21,6 +21,6 @@ PYTHONNOUSERSITE=1 python3 -m sglang.launch_server --model-path $MODEL --host 0.
  --tensor-parallel-size=$TP --data-parallel-size=1 \
  --cuda-graph-max-bs 256 --max-running-requests 256 --mem-fraction-static 0.85 --kv-cache-dtype fp8_e4m3 \
  --chunked-prefill-size 16384 \
- --enable-ep-moe --quantization modelopt_fp4 --enable-flashinfer-allreduce-fusion --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL \
- --enable-symm-mem --disable-radix-cache --attention-backend trtllm_mla --enable-flashinfer-trtllm-moe --stream-interval 10
+ --ep-size 8 --quantization modelopt_fp4 --enable-flashinfer-allreduce-fusion --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL \
+ --enable-symm-mem --disable-radix-cache --attention-backend trtllm_mla --moe-runner-backend flashinfer_trtllm --stream-interval 10

benchmarks/dsr1_fp8_b200_docker.sh

Lines changed: 1 addition & 1 deletion
@@ -34,4 +34,4 @@ PYTHONNOUSERSITE=1 python3 -m sglang.launch_server --model-path=$MODEL --host=0.
  --cuda-graph-max-bs 128 --max-running-requests 128 \
  --mem-fraction-static 0.82 --kv-cache-dtype fp8_e4m3 --chunked-prefill-size 32768 --max-prefill-tokens 32768 \
  --enable-flashinfer-allreduce-fusion --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL --disable-radix-cache \
- --attention-backend trtllm_mla --stream-interval 30 --enable-flashinfer-trtllm-moe --quantization fp8
+ --attention-backend trtllm_mla --stream-interval 30 --moe-runner-backend flashinfer_trtllm --quantization fp8
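
The two substitutions in this commit could be applied to other launch scripts with a small filter. This is a minimal sketch, assuming the scripts spell the deprecated flags exactly as they appear here. The function name `migrate_sglang_flags` and the `EP_SIZE` variable are hypothetical; the default of 8 only mirrors the value this commit hard-codes and may need to match the actual parallelism of the deployment.

```shell
# Hypothetical helper (not part of this commit): rewrites the two deprecated
# SGLang flags named in the commit message.
# Reads script text on stdin and writes the migrated text to stdout.
migrate_sglang_flags() {
  # EP_SIZE is an assumed override; 8 is the value used in this commit.
  sed \
    -e "s/--enable-ep-moe/--ep-size ${EP_SIZE:-8}/g" \
    -e 's/--enable-flashinfer-trtllm-moe/--moe-runner-backend flashinfer_trtllm/g'
}
```

For example, `migrate_sglang_flags < benchmarks/dsr1_fp4_b200_docker.sh` prints the migrated script without modifying the file, so the result can be reviewed before it is written back.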
