Commit 96f1598

seungrokj, claude, and functionstackx authored
[AMD] Add DeepSeek-R1-0528 FP8 MI355X ATOM MTP3 benchmark (#1628)
* feat(atom/dsr1-fp8-mi355x-mtp): update ATOM image to nightly_202605301523
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: update perf-changelog pr-link to #1628
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Update dsr1-fp8-mi355x-atom-mtp: stable image, wider concurrency, DP-attn support
  - Switch image from nightly_202605301523 to stable atom0.1.3
  - Expand concurrency search space from 4-256 to 4-1024
  - Refactor benchmark script to use PARALLEL_ARGS/SPEC_ARGS pattern
  - Remove ISL/OSL-based max-model-len calculation
  - Update perf-changelog image reference
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* dsr1_fp8_mi355x_atom_mtp.sh: remove trailing whitespace
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* perf-changelog: fix unclosed quote in dsr1-fp8-mi355x-atom-mtp entry
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [AMD] fix dsr1-fp8-mi355x-atom-mtp: reduce concurrency limits and disable prefix caching
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
1 parent 0fc4f86 commit 96f1598

3 files changed

Lines changed: 25 additions & 18 deletions

.github/configs/amd-master.yaml

Lines changed: 2 additions & 2 deletions
@@ -1197,7 +1197,7 @@ dsr1-fp8-mi355x-atom:
     - { tp: 8, conc-start: 4, conc-end: 128 }
 
 dsr1-fp8-mi355x-atom-mtp:
-  image: rocm/atom:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom20260511
+  image: rocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.3
   model: deepseek-ai/DeepSeek-R1-0528
   model-prefix: dsr1
   runner: mi355x
@@ -1209,7 +1209,7 @@ dsr1-fp8-mi355x-atom-mtp:
   - isl: 1024
     osl: 1024
     search-space:
-    - { tp: 8, conc-start: 4, conc-end: 256, spec-decoding: mtp }
+    - { tp: 8, conc-start: 4, conc-end: 512, spec-decoding: mtp }
   - isl: 8192
     osl: 1024
     search-space:

benchmarks/single_node/fixed_seq_len/dsr1_fp8_mi355x_atom_mtp.sh

Lines changed: 15 additions & 16 deletions
@@ -23,23 +23,22 @@ SERVER_LOG=/workspace/server.log
 
 export OMP_NUM_THREADS=1
 
-# Calculate max-model-len based on ISL and OSL
-if [ "$ISL" = "1024" ] && [ "$OSL" = "1024" ]; then
-    CALCULATED_MAX_MODEL_LEN=""
-else
-    CALCULATED_MAX_MODEL_LEN=" --max-model-len 10240 "
-fi
-
+CALCULATED_MAX_MODEL_LEN=""
 if [ "${EVAL_ONLY}" = "true" ]; then
     setup_eval_context
     CALCULATED_MAX_MODEL_LEN=" --max-model-len $EVAL_MAX_MODEL_LEN "
 fi
 
-if [ "$EP_SIZE" -gt 1 ]; then
-    EP=" --enable-expert-parallel"
-else
-    EP=" "
-fi
+PARALLEL_ARGS=(-tp "$TP") #TP
+if [ "$DP_ATTENTION" = "true" ]; then
+    if [ "$EP_SIZE" -gt 1 ]; then #DP+EP
+        PARALLEL_ARGS=(-tp "$TP" --enable-expert-parallel --enable-dp-attention )
+    else #DP+TP
+        PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
+    fi
+fi
+
+SPEC_ARGS=(--method mtp --num-speculative-tokens 3 )
 
 # Start GPU monitoring (power, temperature, clocks every second)
 start_gpu_monitor
@@ -49,10 +48,10 @@ set -x
 python3 -m atom.entrypoints.openai_server \
     --model $MODEL \
     --server-port $PORT \
-    -tp $TP \
-    --kv_cache_dtype fp8 $CALCULATED_MAX_MODEL_LEN $EP \
-    --method mtp \
-    --num-speculative-tokens 3 \
+    "${PARALLEL_ARGS[@]}" \
+    "${SPEC_ARGS[@]}" \
+    --kv_cache_dtype fp8 $CALCULATED_MAX_MODEL_LEN \
+    --no-enable_prefix_caching \
     > $SERVER_LOG 2>&1 &
 
 SERVER_PID=$!
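
The PARALLEL_ARGS/SPEC_ARGS refactor in the hunk above can be tried in isolation with a sketch like the one below. It is an illustration only, not part of the benchmark script: `build_server_args` and its three positional parameters are hypothetical names introduced here, and only the flags themselves are taken from the diff. It does not start the ATOM server; it just prints the argument list that would be passed.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the branch structure of the PARALLEL_ARGS/SPEC_ARGS
# code in the diff. build_server_args is a hypothetical helper.
set -euo pipefail

# Usage: build_server_args <tp> <dp_attention: true|false> <ep_size>
build_server_args() {
    local tp="$1" dp_attention="$2" ep_size="$3"

    # Tensor parallelism is always passed.
    local -a parallel_args=(-tp "$tp")
    if [ "$dp_attention" = "true" ]; then
        if [ "$ep_size" -gt 1 ]; then
            # DP attention + expert parallelism
            parallel_args=(-tp "$tp" --enable-expert-parallel --enable-dp-attention)
        else
            # DP attention + tensor parallelism only
            parallel_args=(-tp "$tp" --enable-dp-attention)
        fi
    fi

    # Speculative decoding flags are fixed, as in the diff.
    local -a spec_args=(--method mtp --num-speculative-tokens 3)

    # Join both arrays with single spaces and print them on one line.
    printf '%s\n' "${parallel_args[*]} ${spec_args[*]}"
}

build_server_args 8 true 8
# prints: -tp 8 --enable-expert-parallel --enable-dp-attention --method mtp --num-speculative-tokens 3
```

Two points follow from the diff as written. Expanding the arrays as `"${PARALLEL_ARGS[@]}"` passes each element as its own word, so the script no longer relies on word splitting of an unquoted `$EP` string. Also, `--enable-expert-parallel` is now only passed when `DP_ATTENTION` is `true`; the removed code passed it whenever `EP_SIZE` was greater than 1.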

perf-changelog.yaml

Lines changed: 8 additions & 0 deletions
@@ -3360,6 +3360,14 @@
     - "Update vLLM ROCm image from nightly-4f940896a32c9e2a0eba7f50d521bf5f6b4de458 to v0.22.0"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1624
 
+- config-keys:
+    - dsr1-fp8-mi355x-atom-mtp
+  description:
+    - "Update ATOM image to rocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.3"
+    - "isl=1024/osl=1024: +47% to +116% improvement across conc 4-256 vs prior InferenceX numbers"
+    - "isl=8192/osl=1024: +47% to +131% improvement across conc 4-256"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1628
+
 - config-keys:
     - kimik2.5-fp4-mi355x-vllm
   description:
