
Commit d22bf9c

cquil11, Copilot, and Copilot authored
chore: refactor Docker runner launch to be like SLURM (#227)
* initial poc
* remove -d flag when launching docker container
* syntax error
* compatibility fixes
* add correct endpoint prefix
* remove reference env var
* run vllm serve in background
* unescape sequences
* stop vllm to stdout after it stops
* stop vllm to stdout after it stops pt 2
* get rid of docker stop as no longer in detached
* clone bench serving to tmp dir
* clone bench serving to tmp dir pt 2
* add explanatory comment
* cleaning up
* cleaning up
* adding mi355x refactor
* adding h200 initial refactor
* different way to see server logs
* cleanup
* now fail if server fails
* starting on b200
* doing b200
* reverting erroneous change
* fixing b200
* fixing b200 pt 2
* updating mi300
* updating mi300 pt 2
* updating mi300 pt 3 -- remove detached mode
* cleaning up mi355x
* fixing mi300x and updating 325x
* reverting max conc to 512 on gptoss fp4 b200 docker
* fixing mi300x and updating 325x
* cleaning up
* add wait for h200 slurm dsr1
* max num seqs back to 512 for gptoss fp4 b200 docker
* fix port issue for dsr1 mi300x docker
* fix mi355x docker NUM_PROMPTS
* adding prop of failure for server logs
* add utils function for benchmark
* add utils function for benchmark
* function-ize the waiting for server to start
* don't show arg parsing set -x
* don't show arg parsing set +x oops
* don't show arg parsing set +x oops
* capture server pid
* nebius don't scancel
* changes to comments in benchmark_lib.sh
* Update benchmarks/dsr1_fp4_mi355x_docker.sh
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update .github/workflows/benchmark-tmpl.yml
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* adding back whitespace
* adding back whitespace
* adding back whitespace
* remove tg launch script
* Update benchmarks/gptoss_fp4_h100_docker.sh
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update benchmarks/dsr1_fp8_mi325x_docker.sh
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update benchmarks/dsr1_fp8_mi355x_docker.sh
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update benchmarks/gptoss_fp4_b200_trt_slurm.sh
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Audit and correct required environment variables documentation in all benchmark scripts (#252)
  * Initial plan
  * Update required env vars documentation in all benchmark scripts
    Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>
  * Fix required env vars - remove NF, PREFILL_SIZE, and correct PORT/PORT_OFFSET
    Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>
  * Remove internally-calculated vars from required env vars (EXTRA_CONFIG_FILE, MAX_NUM_TOKENS, MOE_BACKEND)
    Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>
  ---------
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>
* removing oci node rebase with main

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
1 parent 1b1bc29 commit d22bf9c

35 files changed

Lines changed: 923 additions & 694 deletions

benchmarks/benchmark_lib.sh

Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
+#!/usr/bin/env bash
+
+# Shared benchmarking utilities for InferenceMAX
+
+# Wait for server to be ready by polling the health endpoint
+# All parameters except --sleep-interval are required
+# Parameters:
+#   --port: Server port
+#   --server-log: Path to server log file
+#   --server-pid: Server process ID (required)
+#   --sleep-interval: Sleep interval between health checks (optional, default: 5)
+wait_for_server_ready() {
+    set +x
+    local port=""
+    local server_log=""
+    local server_pid=""
+    local sleep_interval=5
+
+    # Parse arguments
+    while [[ $# -gt 0 ]]; do
+        case $1 in
+            --port)
+                port="$2"
+                shift 2
+                ;;
+            --server-log)
+                server_log="$2"
+                shift 2
+                ;;
+            --server-pid)
+                server_pid="$2"
+                shift 2
+                ;;
+            --sleep-interval)
+                sleep_interval="$2"
+                shift 2
+                ;;
+            *)
+                echo "Unknown parameter: $1"
+                return 1
+                ;;
+        esac
+    done
+
+    # Validate required parameters
+    if [[ -z "$port" ]]; then
+        echo "Error: --port is required"
+        return 1
+    fi
+    if [[ -z "$server_log" ]]; then
+        echo "Error: --server-log is required"
+        return 1
+    fi
+    if [[ -z "$server_pid" ]]; then
+        echo "Error: --server-pid is required"
+        return 1
+    fi
+
+    # Show logs until server is ready
+    tail -f "$server_log" &
+    local TAIL_PID=$!
+    until curl --output /dev/null --silent --fail "http://0.0.0.0:$port/health"; do
+        if ! kill -0 "$server_pid" 2>/dev/null; then
+            echo "Server died before becoming healthy. Exiting."
+            kill "$TAIL_PID"
+            exit 1
+        fi
+        sleep "$sleep_interval"
+    done
+    kill "$TAIL_PID"
+}
+
+# Run benchmark serving with standardized parameters
+# All parameters are required
+# Parameters:
+#   --model: Model name
+#   --port: Server port
+#   --backend: Backend type - e.g., 'vllm' or 'openai'
+#   --input-len: Random input sequence length
+#   --output-len: Random output sequence length
+#   --random-range-ratio: Random range ratio
+#   --num-prompts: Number of prompts
+#   --max-concurrency: Max concurrency
+#   --result-filename: Result filename without extension
+#   --result-dir: Result directory
+run_benchmark_serving() {
+    set +x
+    local model=""
+    local port=""
+    local backend=""
+    local input_len=""
+    local output_len=""
+    local random_range_ratio=""
+    local num_prompts=""
+    local max_concurrency=""
+    local result_filename=""
+    local result_dir=""
+
+    # Parse arguments
+    while [[ $# -gt 0 ]]; do
+        case $1 in
+            --model)
+                model="$2"
+                shift 2
+                ;;
+            --port)
+                port="$2"
+                shift 2
+                ;;
+            --backend)
+                backend="$2"
+                shift 2
+                ;;
+            --input-len)
+                input_len="$2"
+                shift 2
+                ;;
+            --output-len)
+                output_len="$2"
+                shift 2
+                ;;
+            --random-range-ratio)
+                random_range_ratio="$2"
+                shift 2
+                ;;
+            --num-prompts)
+                num_prompts="$2"
+                shift 2
+                ;;
+            --max-concurrency)
+                max_concurrency="$2"
+                shift 2
+                ;;
+            --result-filename)
+                result_filename="$2"
+                shift 2
+                ;;
+            --result-dir)
+                result_dir="$2"
+                shift 2
+                ;;
+            *)
+                echo "Unknown parameter: $1"
+                return 1
+                ;;
+        esac
+    done
+
+    # Validate all required parameters
+    if [[ -z "$model" ]]; then
+        echo "Error: --model is required"
+        return 1
+    fi
+    if [[ -z "$port" ]]; then
+        echo "Error: --port is required"
+        return 1
+    fi
+    if [[ -z "$backend" ]]; then
+        echo "Error: --backend is required"
+        return 1
+    fi
+    if [[ -z "$input_len" ]]; then
+        echo "Error: --input-len is required"
+        return 1
+    fi
+    if [[ -z "$output_len" ]]; then
+        echo "Error: --output-len is required"
+        return 1
+    fi
+    if [[ -z "$random_range_ratio" ]]; then
+        echo "Error: --random-range-ratio is required"
+        return 1
+    fi
+    if [[ -z "$num_prompts" ]]; then
+        echo "Error: --num-prompts is required"
+        return 1
+    fi
+    if [[ -z "$max_concurrency" ]]; then
+        echo "Error: --max-concurrency is required"
+        return 1
+    fi
+    if [[ -z "$result_filename" ]]; then
+        echo "Error: --result-filename is required"
+        return 1
+    fi
+    if [[ -z "$result_dir" ]]; then
+        echo "Error: --result-dir is required"
+        return 1
+    fi
+
+    # Clone benchmark serving repo
+    local BENCH_SERVING_DIR=$(mktemp -d /tmp/bmk-XXXXXX)
+    git clone https://github.com/kimbochen/bench_serving.git "$BENCH_SERVING_DIR"
+
+    # Run benchmark
+    set -x
+    python3 "$BENCH_SERVING_DIR/benchmark_serving.py" \
+        --model "$model" \
+        --backend "$backend" \
+        --base-url "http://0.0.0.0:$port" \
+        --dataset-name random \
+        --random-input-len "$input_len" \
+        --random-output-len "$output_len" \
+        --random-range-ratio "$random_range_ratio" \
+        --num-prompts "$num_prompts" \
+        --max-concurrency "$max_concurrency" \
+        --request-rate inf \
+        --ignore-eos \
+        --save-result \
+        --percentile-metrics 'ttft,tpot,itl,e2el' \
+        --result-dir "$result_dir" \
+        --result-filename "$result_filename.json"
+    set +x
+}
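The readiness loop in `wait_for_server_ready` combines two checks: an HTTP probe of `/health` and a `kill -0` liveness test on the server PID, so a server that crashes during startup fails the job instead of hanging it. The sketch below isolates that pattern under stated assumptions: `wait_until_ready` is a hypothetical name, the readiness probe is passed in as a command (so the loop can be exercised without a live server), and it uses `return 1` rather than the library's `exit 1` so that a sourcing script decides how to handle the failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of benchmark_lib.sh: poll a readiness
# command until it succeeds, giving up as soon as the watched process is gone.
# Usage: wait_until_ready PID INTERVAL CMD [ARGS...]
wait_until_ready() {
    local pid="$1" interval="$2"
    shift 2
    # "$@" is the readiness probe; in benchmark_lib.sh it would be
    # curl --output /dev/null --silent --fail "http://0.0.0.0:$PORT/health"
    until "$@"; do
        # kill -0 delivers no signal; it only checks that the PID still exists.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "Process $pid exited before becoming ready." >&2
            return 1
        fi
        sleep "$interval"
    done
}
```

For example, `wait_until_ready "$SERVER_PID" 5 curl --output /dev/null --silent --fail "http://0.0.0.0:$PORT/health"` reproduces the polling in `wait_for_server_ready`, minus the log tailing. Returning instead of exiting leaves room for the caller to clean up (for instance, print the end of `$SERVER_LOG`) before failing.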

benchmarks/dsr1_fp4_b200_docker.sh

Lines changed: 37 additions & 1 deletion
@@ -1,11 +1,25 @@
 #!/usr/bin/env bash
 
+# === Required Env Vars ===
+# MODEL
+# PORT
+# TP
+# CONC
+# ISL
+# OSL
+# RANDOM_RANGE_RATIO
+# RESULT_FILENAME
+# EP_SIZE
+# NUM_PROMPTS
+
 nvidia-smi
 
 # To improve CI stability, we patch this helper function to prevent a race condition that
 # happens 1% of the time. ref: https://github.com/flashinfer-ai/flashinfer/pull/1779
 sed -i '102,108d' /usr/local/lib/python3.12/dist-packages/flashinfer/jit/cubin_loader.py
 
+SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
+
 # Default: recv every ~10 requests; if CONC ≥ 16, relax to ~30 requests between scheduler recv polls.
 if [[ $CONC -ge 16 ]]; then
     SCHEDULER_RECV_INTERVAL=30
@@ -22,5 +36,27 @@ PYTHONNOUSERSITE=1 python3 -m sglang.launch_server --model-path $MODEL --host 0.
     --cuda-graph-max-bs 256 --max-running-requests 256 --mem-fraction-static 0.85 --kv-cache-dtype fp8_e4m3 \
     --chunked-prefill-size 16384 \
     --ep-size $EP_SIZE --quantization modelopt_fp4 --enable-flashinfer-allreduce-fusion --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL \
-    --enable-symm-mem --disable-radix-cache --attention-backend trtllm_mla --moe-runner-backend flashinfer_trtllm --stream-interval 10
+    --enable-symm-mem --disable-radix-cache --attention-backend trtllm_mla --moe-runner-backend flashinfer_trtllm --stream-interval 10 > $SERVER_LOG 2>&1 &
+
+SERVER_PID=$!
+
+# Source benchmark utilities
+source "$(dirname "$0")/benchmark_lib.sh"
+
+# Wait for server to be ready
+wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID"
+
+pip install -q datasets pandas
+
+run_benchmark_serving \
+    --model "$MODEL" \
+    --port "$PORT" \
+    --backend vllm \
+    --input-len "$ISL" \
+    --output-len "$OSL" \
+    --random-range-ratio "$RANDOM_RANGE_RATIO" \
+    --num-prompts "$NUM_PROMPTS" \
+    --max-concurrency "$CONC" \
+    --result-filename "$RESULT_FILENAME" \
+    --result-dir /workspace/
 
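Each Docker script now starts the server in the background with its output redirected to a temporary log and records `$!`; that PID is what lets `wait_for_server_ready` notice a crashed server while it streams the log. A minimal sketch of the launch pattern, with a stand-in `bash -c` command in place of `sglang.launch_server` (the stand-in and its messages are illustrative only):

```shell
#!/usr/bin/env bash
# Temporary log file, created the same way as in the launch scripts.
SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)

# Stand-in for the real server launch (hypothetical): writes one line to
# stdout and one to stderr, then keeps running for a second.
# "> file" comes before "2>&1", so stderr is sent to the log file as well.
bash -c 'echo "stdout: starting"; echo "stderr: warning" >&2; sleep 1' \
    > "$SERVER_LOG" 2>&1 &

# $! holds the PID of the most recent background job, so capture it
# before anything else (such as tail -f) is started in the background.
SERVER_PID=$!
echo "server pid: $SERVER_PID, log: $SERVER_LOG"
```

Writing the redirections in the other order (`2>&1 > "$SERVER_LOG"`) would leave stderr on the terminal, so startup errors would be missing from the log.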

benchmarks/dsr1_fp4_b200_trt_slurm.sh

Lines changed: 22 additions & 27 deletions
@@ -1,16 +1,13 @@
 #!/usr/bin/env bash
 
-# === Required Env Vars ===
-# HF_TOKEN
-# HF_HUB_CACHE
-# IMAGE
+# === Required Env Vars ===
 # MODEL
+# TP
+# CONC
 # ISL
 # OSL
 # MAX_MODEL_LEN
 # RANDOM_RANGE_RATIO
-# TP
-# CONC
 # RESULT_FILENAME
 # PORT_OFFSET
 # DP_ATTENTION
@@ -100,24 +97,22 @@ mpirun -n 1 --oversubscribe --allow-run-as-root \
     --extra_llm_api_options=$EXTRA_CONFIG_FILE \
     > $SERVER_LOG 2>&1 &
 
-
-set +x
-while IFS= read -r line; do
-    printf '%s\n' "$line"
-    if [[ "$line" == *"Application startup complete"* ]]; then
-        break
-    fi
-done < <(tail -F -n0 "$SERVER_LOG")
-
-git clone https://github.com/kimbochen/bench_serving.git
-set -x
-python3 bench_serving/benchmark_serving.py \
-    --model $MODEL --backend openai \
-    --base-url http://0.0.0.0:$PORT \
-    --dataset-name random \
-    --random-input-len $ISL --random-output-len $OSL --random-range-ratio $RANDOM_RANGE_RATIO \
-    --num-prompts $(( $CONC * 10 )) --max-concurrency $CONC \
-    --request-rate inf --ignore-eos \
-    --save-result --percentile-metrics 'ttft,tpot,itl,e2el' \
-    --result-dir /workspace/ \
-    --result-filename $RESULT_FILENAME.json
+SERVER_PID=$!
+
+# Source benchmark utilities
+source "$(dirname "$0")/benchmark_lib.sh"
+
+# Wait for server to be ready
+wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID"
+
+run_benchmark_serving \
+    --model "$MODEL" \
+    --port "$PORT" \
+    --backend openai \
+    --input-len "$ISL" \
+    --output-len "$OSL" \
+    --random-range-ratio "$RANDOM_RANGE_RATIO" \
+    --num-prompts $(( $CONC * 10 )) \
+    --max-concurrency "$CONC" \
+    --result-filename "$RESULT_FILENAME" \
+    --result-dir /workspace/
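Unlike the Docker scripts, which take `NUM_PROMPTS` from the environment, this SLURM script still derives the prompt count as `$(( $CONC * 10 ))`, that is, ten prompts per concurrent request. The sketch below shows that arithmetic behind a hypothetical `num_prompts_for` helper that also honors an explicit `NUM_PROMPTS`; combining the two is an illustration, not something this commit does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: use NUM_PROMPTS when it is set and non-empty,
# otherwise fall back to CONC * 10 as hard-coded in this SLURM script.
num_prompts_for() {
    local conc="$1"
    # ${VAR:-word} expands to word when VAR is unset or empty.
    echo "${NUM_PROMPTS:-$(( conc * 10 ))}"
}

for conc in 4 64 256; do
    echo "CONC=$conc -> --num-prompts $(num_prompts_for "$conc")"
done
```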
Lines changed: 29 additions & 7 deletions
@@ -1,14 +1,15 @@
 #!/usr/bin/env bash
 
-# ========= Required Env Vars =========
-# HF_TOKEN
-# HF_HUB_CACHE
+# === Required Env Vars ===
 # MODEL
-# MAX_MODEL_LEN
-# RANDOM_RANGE_RATIO
+# PORT
 # TP
 # CONC
-# PORT
+# ISL
+# OSL
+# RANDOM_RANGE_RATIO
+# RESULT_FILENAME
+# NUM_PROMPTS
 export SGLANG_USE_AITER=1
 
 PREFILL_SIZE=196608
@@ -18,6 +19,8 @@ if [[ "$ISL" == "8192" && "$OSL" == "1024" ]]; then
     fi
 fi
 
+SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
+
 set -x
 python3 -m sglang.launch_server --model-path=$MODEL --trust-remote-code \
     --host=0.0.0.0 --port=$PORT \
@@ -27,5 +30,24 @@ python3 -m sglang.launch_server --model-path=$MODEL --trust-remote-code \
     --disable-radix-cache \
     --num-continuous-decode-steps=4 \
     --max-prefill-tokens=$PREFILL_SIZE \
-    --cuda-graph-max-bs=128
+    --cuda-graph-max-bs=128 > $SERVER_LOG 2>&1 &
+
+SERVER_PID=$!
+
+# Source benchmark utilities
+source "$(dirname "$0")/benchmark_lib.sh"
+
+# Wait for server to be ready
+wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID"
 
+run_benchmark_serving \
+    --model "$MODEL" \
+    --port "$PORT" \
+    --backend vllm \
+    --input-len "$ISL" \
+    --output-len "$OSL" \
+    --random-range-ratio "$RANDOM_RANGE_RATIO" \
+    --num-prompts "$NUM_PROMPTS" \
+    --max-concurrency "$CONC" \
+    --result-filename "$RESULT_FILENAME" \
+    --result-dir /workspace/
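Every script in this commit opens with a `=== Required Env Vars ===` comment block, and #252 audited those lists, but none of the diffs shown here checks them at runtime. A sketch of such a check, assuming a hypothetical `require_env` helper built on bash indirect expansion (`${!var}`):

```shell
#!/usr/bin/env bash
# Hypothetical guard: report every variable from a script's
# "=== Required Env Vars ===" block that is unset or empty.
require_env() {
    local var
    local missing=()
    for var in "$@"; do
        # ${!var} is indirect expansion: the value of the variable whose
        # name is stored in $var. ":-" keeps it safe under "set -u".
        if [[ -z "${!var:-}" ]]; then
            missing+=("$var")
        fi
    done
    if (( ${#missing[@]} > 0 )); then
        echo "Missing required env vars: ${missing[*]}" >&2
        return 1
    fi
}

# With the list documented at the top of this script, a call would read:
# require_env MODEL PORT TP CONC ISL OSL RANDOM_RANGE_RATIO RESULT_FILENAME NUM_PROMPTS
```

Collecting all missing names before failing, rather than stopping at the first one, mirrors the per-parameter error messages in `run_benchmark_serving` while reporting everything in a single CI run.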
