Change dsr1 fp8 image to lmsysorg/sglang 0.5.5.post3 and fp4 image to 0.5.5.post2 for AMD MI355 (#247)

rkarhila-amd · austenstone · jgangani · web-flow · commit 0fe9dccffb1e · 2025-12-04T18:02:41.000-06:00
* Change AMD MI355 docker image to lmsysorg/sglang:v0.5.5.post2-rocm700-mi35x for dsr1-fp8
* Adjust preview for dark mode and light mode (#250)
* Adjust preview for dark mode and light mode picture element for better display based on color scheme.
* Rounded
* Update image alt text in README.md
* adding ISL/OSL to collect results table summary (#249)
Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
* chore: refactor Docker runner launch to be like SLURM (#227)
* initial poc
* remove -d flag when launching docker container
* syntax error
* compatibility fixes
* add correct endpoint prefix
* remove reference env var
* run vllm serve in background
* unescape sequences
* stop vllm to stdout after it stops
* stop vllm to stdout after it stops pt 2
* get rid of docker stop as no longer in detached
* clone bench serving to tmp dir
* clone bench serving to tmp dir pt 2
* add explanatory comment
* cleaning up
* cleaning up
* adding mi355x refactor
* adding h200 initial refactor
* different way to see server logs
* cleanup
* now fail if server fails
* starting on b200
* doing b200
* reverting erroneous change
* fixing b200
* fixing b200 pt 2
* updating mi300
* updating mi300 pt 2
* updating mi300 pt 3 -- remove detached mode
* cleaning up mi355x
* fixing mi300x and updating 325x
* reverting max conc to 512 on gptoss fp4 b200 docker
* fixing mi300x and updating 325x
* cleaning up
* add wait for h200 slurm dsr1
* max num seqs back to 512 for gptoss fpr b200 docker
* fix port issue for dsr1 mi300x docker
* fix mi355x docker NUM_PROMPTS
* adding prop of failure for server logs
* add utils function for benchmark
* add utils function for benchmark
* function-ize the waiting for server to start
* dont show arg parsing set -x
* dont show arg parsing set +x oops
* dont show arg parsing set +x oops
* capture server pid
* nebdius dont scancel
* changes to comments in benchmark lib.sh
* Update benchmarks/dsr1_fp4_mi355x_docker.sh
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update .github/workflows/benchmark-tmpl.yml
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* adding back whitespace
* adding back whitespace
* adding back whitespace
* remove tg launch script
* Update benchmarks/gptoss_fp4_h100_docker.sh
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update benchmarks/dsr1_fp8_mi325x_docker.sh
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update benchmarks/dsr1_fp8_mi355x_docker.sh
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update benchmarks/gptoss_fp4_b200_trt_slurm.sh
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Audit and correct required environment variables documentation in all benchmark scripts (#252)
* Initial plan
* Update required env vars documentation in all benchmark scripts
Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>
* Fix required env vars - remove NF, PREFILL_SIZE, and correct PORT/PORT_OFFSET
Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>
* Remove internally-calculated vars from required env vars (EXTRA_CONFIG_FILE, MAX_NUM_TOKENS, MOE_BACKEND)
Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>
* removing oci node rebase with main
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* Bump actions/checkout from 5.0.0 to 6.0.0 in the github-actions group (#253)
Bumps the github-actions group with 1 update: [actions/checkout](https://github.com/actions/checkout).
Updates `actions/checkout` from 5.0.0 to 6.0.0
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@08c6903...1af3b93)
---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 6.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add b200 DGXC node to b200 runners list (#245)
* Bump actions/setup-python in the github-actions group (#259)
Bumps the github-actions group with 1 update: [actions/setup-python](https://github.com/actions/setup-python).
Updates `actions/setup-python` from 6.0.0 to 6.1.0
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@e797f83...83679a8)
---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: 6.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-actions
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* feat: refresh GB200 SGLang DSR1 submission (#257)
* Bumps DSR1 SGLang code
* update how we get the resulting log files
---------
Co-authored-by: Elnifio <elnifio0519@gmail.com>
Co-authored-by: Cameron Quilici <cjquilici@gmail.com>
* Update GPTOSS B200 TRTLLM (#266)
* Update GPTOSS B200 AGG
* set dp attention env vars
* Add DP attn comment
---------
Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
* Fixed community container for MI35x dsr1 for fp8 for real
* Update dsr1_fp4_mi355x_docker.sh with env flags
* Update dsr1_fp4_mi355x_slurm.sh
* from post2 to post3 for fp8
* tidy formatting
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Austen Stone <austenstone@github.com>
Co-authored-by: Jatin Gangani <5560074+jgangani@users.noreply.github.com>
Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
Co-authored-by: Cameron Quilici <cjquilici@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ankur Singh <ankusingh@nvidia.com>
Co-authored-by: yunzhoul-nv <yunzhoul@nvidia.com>
Co-authored-by: Elnifio <elnifio0519@gmail.com>
Co-authored-by: ppalanga <ppalanga@amd.com>
diff --git a/.github/configs/amd-master.yaml b/.github/configs/amd-master.yaml
@@ -1,5 +1,5 @@
 dsr1-fp4-mi355x-sglang:
-  image: rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi35x-20250915
+  image: lmsysorg/sglang:v0.5.5.post2-rocm700-mi35x
   model: amd/DeepSeek-R1-0528-MXFP4-Preview
   model-prefix: dsr1
   runner: mi355x
@@ -63,7 +63,7 @@ dsr1-fp8-mi325x-sglang:
     - { tp: 8, conc-start: 4, conc-end: 64 }
 
 dsr1-fp8-mi355x-sglang:
-  image: rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi35x-20250915
+  image: lmsysorg/sglang:v0.5.5.post3-rocm700-mi35x
   model: deepseek-ai/DeepSeek-R1-0528
   model-prefix: dsr1
   runner: mi355x
diff --git a/benchmarks/dsr1_fp4_mi355x_docker.sh b/benchmarks/dsr1_fp4_mi355x_docker.sh
@@ -11,6 +11,7 @@
 # RESULT_FILENAME
 # NUM_PROMPTS
 export SGLANG_USE_AITER=1
+export ROCM_QUICK_REDUCE_QUANTIZATION=INT4
 
 PREFILL_SIZE=196608
 if [[ "$ISL" == "8192" && "$OSL" == "1024" ]]; then
diff --git a/benchmarks/dsr1_fp4_mi355x_slurm.sh b/benchmarks/dsr1_fp4_mi355x_slurm.sh
@@ -10,6 +10,7 @@
 # RANDOM_RANGE_RATIO
 # RESULT_FILENAME
 export SGLANG_USE_AITER=1
+export ROCM_QUICK_REDUCE_QUANTIZATION=INT4
 SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
 
 PREFILL_SIZE=196608
diff --git a/benchmarks/dsr1_fp8_mi355x_docker.sh b/benchmarks/dsr1_fp8_mi355x_docker.sh
@@ -14,10 +14,14 @@
 # https://rocm.docs.amd.com/en/docs-7.0-docker/benchmark-docker/inference-sglang-deepseek-r1-fp8.html
 
 export SGLANG_USE_AITER=1
+export RCCL_MSCCL_ENABLE=0
+export ROCM_QUICK_REDUCE_QUANTIZATION=INT4
+
 
 SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
 
 python3 -m sglang.launch_server \
+    --attention-backend aiter \
     --model-path $MODEL \
     --host=0.0.0.0 \
     --port $PORT \
@@ -27,6 +31,7 @@ python3 -m sglang.launch_server \
     --mem-fraction-static 0.8 --disable-radix-cache \
     --num-continuous-decode-steps 4 \
     --max-prefill-tokens 196608 \
+    --enable-torch-compile \
     --cuda-graph-max-bs 128 > $SERVER_LOG 2>&1 &
 
 SERVER_PID=$!
diff --git a/benchmarks/dsr1_fp8_mi355x_slurm.sh b/benchmarks/dsr1_fp8_mi355x_slurm.sh
@@ -12,11 +12,14 @@
 
 export HF_MODULES_CACHE="/tmp/hf_modules_cache/"
 export SGLANG_USE_AITER=1
+export RCCL_MSCCL_ENABLE=0
+export ROCM_QUICK_REDUCE_QUANTIZATION=INT4
 
 SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
 
 set -x
 python3 -m sglang.launch_server \
+    --attention-backend aiter \
     --model-path $MODEL \
     --host=0.0.0.0 \
     --port $PORT \
@@ -27,7 +30,8 @@ python3 -m sglang.launch_server \
     --disable-radix-cache \
     --num-continuous-decode-steps 4 \
     --max-prefill-tokens 196608 \
-    --cuda-graph-max-bs 128 > $SERVER_LOG 2>&1 &
+    --cuda-graph-max-bs 128 \
+    --enable-torch-compile > $SERVER_LOG 2>&1 &
 
 SERVER_PID=$!
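
The scripts above start `sglang.launch_server` in the background, redirect its output to `$SERVER_LOG`, and capture `$SERVER_PID`. The commit message mentions "function-ize the waiting for server to start" and "now fail if server fails", but that helper is not part of this diff. The following is only a minimal sketch of how such a wait could look. The function name `wait_for_ready`, its arguments, and the readiness pattern are hypothetical and are not taken from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): block until a background
# server writes a readiness marker to its log, or fail if it exits first.
# Usage: wait_for_ready <pid> <log file> <ready pattern> [timeout seconds]
wait_for_ready() {
    local pid="$1" log="$2" pattern="$3" timeout="${4:-600}"
    local waited=0
    while (( waited < timeout )); do
        # Succeed as soon as the marker appears in the log.
        if grep -q -- "$pattern" "$log" 2>/dev/null; then
            return 0
        fi
        # Fail fast if the server process is no longer running.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "server (pid $pid) exited before becoming ready" >&2
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    echo "timed out after ${timeout}s waiting for server" >&2
    return 1
}
```

A script could call it right after `SERVER_PID=$!`, for example `wait_for_ready "$SERVER_PID" "$SERVER_LOG" "<marker>" 1800 || exit 1`, where `<marker>` is whatever line the server is known to print once it accepts requests. That line is deliberately left as a parameter here because this diff does not show it.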