Commit 0fe9dcc

rkarhila-amd, austenstone, jgangani, Jatin Gangani, and cquil11 authored
Change dsr1 fp8 image to lmsysorg/sglang 0.5.5.post3 and fp4 image to 0.5.5.post2 for AMD MI355 (#247)
* Change AMD MI355 docker image to lmsysorg/sglang:v0.5.5.post2-rocm700-mi35x for dsr1-fp8
* Adjust preview for dark mode and light mode (#250)
* Adjust preview for dark mode and light mode picture element for better display based on color scheme.
* Rounded
* Update image alt text in README.md
* adding ISL/OSL to collect results table summary (#249)
  Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
* chore: refactor Docker runner launch to be like SLURM (#227)
* initial poc
* remove -d flag when launching docker container
* syntax error
* compatibility fixes
* add correct endpoint prefix
* remove reference env var
* run vllm serve in background
* unescape sequences
* stop vllm to stdout after it stops
* stop vllm to stdout after it stops pt 2
* get rid of docker stop as no longer in detatched
* clone bench serving to tmp dir
* clone bench serving to tmp dir pt 2
* add explanatory comment
* cleaning up
* cleaning up
* adding mi355x refactor
* adding h200 initial refactor
* different way to see server logs
* cleanup
* now fail if server fails
* starting on b200
* doign b200
* reverting erroneous change
* fixing b200
* fixing b200 pt 2
* updating mi300
* updating mi300 pt 2
* updating mi300 pt 3 -- remove detached mode
* cleaning up mi355x
* fixing mi300x and updating 325x
* reverting max conc to 512 on gptoss fp4 b200 docker
* fixing mi300x and updating 325x
* cleanng up
* add wait for h200 slurm dsr1
* max num seqs back to 512 for gptoss fpr b200 docker
* fix port issue for dsr1 mi300x docker
* fix mi355x docker NUM_PROMPTS
* adding prop of failure for server logs
* add utils function for benchmark
* add utils function for benchmark
* function-ize the waiting for server to start
* dont show arg parsing set -x
* dont show arg parsing set +x oops
* dont show arg parsing set +x oops
* capture server pid
* nebdius dont scancel
* changes to comments in benchmark lib . sh
* Update benchmarks/dsr1_fp4_mi355x_docker.sh
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update .github/workflows/benchmark-tmpl.yml
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* adding back whitespace
* adding back whitespace
* adding back whitespace
* remove tg launch script
* Update benchmarks/gptoss_fp4_h100_docker.sh
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update benchmarks/dsr1_fp8_mi325x_docker.sh
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update benchmarks/dsr1_fp8_mi355x_docker.sh
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update benchmarks/gptoss_fp4_b200_trt_slurm.sh
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Audit and correct required environment variables documentation in all benchmark scripts (#252)
* Initial plan
* Update required env vars documentation in all benchmark scripts
  Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>
* Fix required env vars - remove NF, PREFILL_SIZE, and correct PORT/PORT_OFFSET
  Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>
* Remove internally-calculated vars from required env vars (EXTRA_CONFIG_FILE, MAX_NUM_TOKENS, MOE_BACKEND)
  Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>
  ---------
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>
* removing oci node rebase with main
  ---------
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
  Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* Bump actions/checkout from 5.0.0 to 6.0.0 in the github-actions group (#253)
  Bumps the github-actions group with 1 update: [actions/checkout](https://github.com/actions/checkout).
  Updates `actions/checkout` from 5.0.0 to 6.0.0
  - [Release notes](https://github.com/actions/checkout/releases)
  - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
  - [Commits](actions/checkout@08c6903...1af3b93)
  ---
  updated-dependencies:
  - dependency-name: actions/checkout
    dependency-version: 6.0.0
    dependency-type: direct:production
    update-type: version-update:semver-major
    dependency-group: github-actions
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add b200 DGXC node to b200 runners list (#245)
* Bump actions/setup-python in the github-actions group (#259)
  Bumps the github-actions group with 1 update: [actions/setup-python](https://github.com/actions/setup-python).
  Updates `actions/setup-python` from 6.0.0 to 6.1.0
  - [Release notes](https://github.com/actions/setup-python/releases)
  - [Commits](actions/setup-python@e797f83...83679a8)
  ---
  updated-dependencies:
  - dependency-name: actions/setup-python
    dependency-version: 6.1.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
    dependency-group: github-actions
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* feat: refresh GB200 SGLang DSR1 submission (#257)
* Bumps DSR1 SGLang code
* update how we get the resulting log files
  ---------
  Co-authored-by: Elnifio <elnifio0519@gmail.com>
  Co-authored-by: Cameron Quilici <cjquilici@gmail.com>
* Update GPTOSS B200 TRTLLM (#266)
* Update GPTOSS B200 AGG
* set dp attention env vars
* Add DP attn comment
  ---------
  Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
* Fixed community container for MI35x dsr1 for fp8 for real
* Update dsr1_fp4_mi355x_docker.sh with env flags
* Update dsr1_fp4_mi355x_slurm.sh
* from post2 to post3 for fp8
* tidy formatting

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Austen Stone <austenstone@github.com>
Co-authored-by: Jatin Gangani <5560074+jgangani@users.noreply.github.com>
Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
Co-authored-by: Cameron Quilici <cjquilici@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ankur Singh <ankusingh@nvidia.com>
Co-authored-by: yunzhoul-nv <yunzhoul@nvidia.com>
Co-authored-by: Elnifio <elnifio0519@gmail.com>
Co-authored-by: ppalanga <ppalanga@amd.com>
1 parent 7c33829 commit 0fe9dcc

5 files changed

Lines changed: 14 additions & 3 deletions

File tree

.github/configs/amd-master.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 dsr1-fp4-mi355x-sglang:
-image: rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi35x-20250915
+image: lmsysorg/sglang:v0.5.5.post2-rocm700-mi35x
 model: amd/DeepSeek-R1-0528-MXFP4-Preview
 model-prefix: dsr1
 runner: mi355x
@@ -63,7 +63,7 @@ dsr1-fp8-mi325x-sglang:
 - { tp: 8, conc-start: 4, conc-end: 64 }

 dsr1-fp8-mi355x-sglang:
-image: rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi35x-20250915
+image: lmsysorg/sglang:v0.5.5.post3-rocm700-mi35x
 model: deepseek-ai/DeepSeek-R1-0528
 model-prefix: dsr1
 runner: mi355x
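
To check which image each benchmark entry ends up with after a change like this, something along the following lines could be used. This is only a sketch: `list_images` is a hypothetical helper that is not part of the commit, and the two-space indentation in the sample is an assumption, since the page above does not preserve the file's real indentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print "<entry-name> <image>"
# for every top-level entry of a config laid out like amd-master.yaml.
# Assumes an unindented "name:" line followed by an indented "image:" line.
list_images() {
  awk '
    /^[^ #][^:]*:[ \t]*$/ { name = $1; sub(/:$/, "", name) }
    /^[ \t]+image:/       { print name, $2 }
  ' "$1"
}

# Sample fragment built from the two entries changed in this commit.
sample=$(mktemp)
cat > "$sample" <<'EOF'
dsr1-fp4-mi355x-sglang:
  image: lmsysorg/sglang:v0.5.5.post2-rocm700-mi35x
  model: amd/DeepSeek-R1-0528-MXFP4-Preview
dsr1-fp8-mi355x-sglang:
  image: lmsysorg/sglang:v0.5.5.post3-rocm700-mi35x
  model: deepseek-ai/DeepSeek-R1-0528
EOF

images=$(list_images "$sample")
echo "$images"
rm -f "$sample"
```

Running it against the sample prints one line per entry, which makes it easy to see that the fp4 entry uses the post2 tag and the fp8 entry uses the post3 tag.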

benchmarks/dsr1_fp4_mi355x_docker.sh

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@
 # RESULT_FILENAME
 # NUM_PROMPTS
 export SGLANG_USE_AITER=1
+export ROCM_QUICK_REDUCE_QUANTIZATION=INT4

 PREFILL_SIZE=196608
 if [[ "$ISL" == "8192" && "$OSL" == "1024" ]]; then
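
The header comments in this script list the environment variables it expects, and the commit adds one more exported flag next to `SGLANG_USE_AITER`. A guard like the one below could fail early when a variable is missing. It is an illustration only: `require_env` is a hypothetical function, not something the benchmark scripts are known to contain.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): return non-zero if any of the
# named environment variables is unset or empty, naming each one on stderr.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion with an empty default.
    if [[ -z "${!name:-}" ]]; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The two flags exported by the script after this change.
export SGLANG_USE_AITER=1
export ROCM_QUICK_REDUCE_QUANTIZATION=INT4

if require_env SGLANG_USE_AITER ROCM_QUICK_REDUCE_QUANTIZATION; then
  env_status="ok"
else
  env_status="missing"
fi
echo "$env_status"
```

The same call could take the names from the script's header comments (for example `RESULT_FILENAME` and `NUM_PROMPTS`) so that a misconfigured job stops before the server is launched.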

benchmarks/dsr1_fp4_mi355x_slurm.sh

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@
 # RANDOM_RANGE_RATIO
 # RESULT_FILENAME
 export SGLANG_USE_AITER=1
+export ROCM_QUICK_REDUCE_QUANTIZATION=INT4
 SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)

 PREFILL_SIZE=196608

benchmarks/dsr1_fp8_mi355x_docker.sh

Lines changed: 5 additions & 0 deletions
@@ -14,10 +14,14 @@
 # https://rocm.docs.amd.com/en/docs-7.0-docker/benchmark-docker/inference-sglang-deepseek-r1-fp8.html

 export SGLANG_USE_AITER=1
+export RCCL_MSCCL_ENABLE=0
+export ROCM_QUICK_REDUCE_QUANTIZATION=INT4
+

 SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)

 python3 -m sglang.launch_server \
+--attention-backend aiter \
 --model-path $MODEL \
 --host=0.0.0.0 \
 --port $PORT \
@@ -27,6 +31,7 @@ python3 -m sglang.launch_server \
 --mem-fraction-static 0.8 --disable-radix-cache \
 --num-continuous-decode-steps 4 \
 --max-prefill-tokens 196608 \
+--enable-torch-compile \
 --cuda-graph-max-bs 128 > $SERVER_LOG 2>&1 &

 SERVER_PID=$!
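
The script starts the server in the background, redirects its output to `$SERVER_LOG`, and records the process ID in `SERVER_PID`. The commit message mentions waiting for the server to start and failing if it dies. One way that pattern can look is sketched below, with a stand-in process instead of `sglang.launch_server`. The function name `wait_for_marker` and the `ready` marker are assumptions made for illustration, not the repository's actual helper or SGLang's real log output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a log file for a marker string, giving up early
# if the background process has already exited or the timeout is reached.
wait_for_marker() {
  local pid=$1 log=$2 marker=$3 timeout=${4:-30} waited=0
  while (( waited < timeout )); do
    if grep -q -- "$marker" "$log" 2>/dev/null; then
      return 0
    fi
    # kill -0 sends no signal; it only tests whether the process exists.
    if ! kill -0 "$pid" 2>/dev/null; then
      echo "server exited before becoming ready" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  echo "timed out waiting for server" >&2
  return 1
}

SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)

# Stand-in for the real server: logs "ready" after one second, then idles.
( sleep 1; echo "ready"; exec sleep 30 ) > "$SERVER_LOG" 2>&1 &
SERVER_PID=$!

if wait_for_marker "$SERVER_PID" "$SERVER_LOG" "ready" 10; then
  wait_status="ready"
else
  wait_status="failed"
fi
echo "$wait_status"

# Clean up the stand-in process and its log.
kill "$SERVER_PID" 2>/dev/null || true
wait "$SERVER_PID" 2>/dev/null || true
rm -f "$SERVER_LOG"
```

Checking the PID on every poll is what lets a benchmark job fail quickly when the server crashes during startup, rather than waiting out the full timeout.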

benchmarks/dsr1_fp8_mi355x_slurm.sh

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,11 +12,14 @@
1212

1313
export HF_MODULES_CACHE="/tmp/hf_modules_cache/"
1414
export SGLANG_USE_AITER=1
15+
export RCCL_MSCCL_ENABLE=0
16+
export ROCM_QUICK_REDUCE_QUANTIZATION=INT4
1517

1618
SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
1719

1820
set -x
1921
python3 -m sglang.launch_server \
22+
--attention-backend aiter \
2023
--model-path $MODEL \
2124
--host=0.0.0.0 \
2225
--port $PORT \
@@ -27,7 +30,8 @@ python3 -m sglang.launch_server \
2730
--disable-radix-cache \
2831
--num-continuous-decode-steps 4 \
2932
--max-prefill-tokens 196608 \
30-
--cuda-graph-max-bs 128 > $SERVER_LOG 2>&1 &
33+
--cuda-graph-max-bs 128 \
34+
--enable-torch-compile > $SERVER_LOG 2>&1 &
3135

3236
SERVER_PID=$!
3337
