Commit c23a59f

chore: add new mi325x SLURM nodes (#224)
* add new mi325x scripts and configs
* adding ability to filter runner node on runner-model-sweep in god file
* use hf instead of huggingface-cli; add debug info
* no cpus per task
* 256 cpus per task
* revert erroneous change
* get rid of debugging
* whitespace
1 parent 1fae268 commit c23a59f

6 files changed

Lines changed: 8 additions & 5 deletions

.github/configs/runners.yaml

Lines changed: 3 additions & 0 deletions
@@ -34,6 +34,9 @@ mi300x:
 - 'mi300x-oci_0'
 mi325x:
 - 'mi325x-amd_0'
+- 'mi325x-amd_1'
+- 'mi325x-amd_2'
+- 'mi325x-amd_3'
 - 'mi325x-tw_0'
 - 'mi325x-tw_1'
 - 'mi325x-tw_2'

benchmarks/dsr1_fp8_h200_slurm.sh

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
 echo "JOB \$SLURM_JOB_ID running on \$SLURMD_NODENAME"
 
 pip3 install --user sentencepiece
-huggingface-cli download $MODEL
+hf download $MODEL
 PORT=$(( 8888 + $PORT_OFFSET ))
 SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
 

benchmarks/dsr1_fp8_mi300x_slurm.sh

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 
 echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
 
-huggingface-cli download $MODEL
+hf download $MODEL
 
 SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
 PORT=8888

benchmarks/dsr1_fp8_mi325x_slurm.sh

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
 
 SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
 PORT=8888
-huggingface-cli download $MODEL
+hf download $MODEL
 
 # Reference
 # https://rocm.docs.amd.com/en/docs-7.0-rc1/preview/benchmark-docker/inference-sglang-deepseek-r1-fp8.html#run-the-inference-benchmark

benchmarks/gptoss_fp4_mi300x_slurm.sh

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 
 echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
 
-huggingface-cli download $MODEL
+hf download $MODEL
 
 SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
 PORT=8888

benchmarks/gptoss_fp4_mi325x_slurm.sh

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
 
 echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
 
-huggingface-cli download $MODEL
+hf download $MODEL
 
 SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
 PORT=8888
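
Note on the `hf download` change: the `hf` entry point only exists in recent `huggingface_hub` releases, so a node with an older install may still have only `huggingface-cli`. The following is a minimal sketch of a guard for that case. It is not part of this commit, and the `pick_hf_cli` helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in this commit): print the name of whichever
# Hugging Face CLI is available, preferring the newer `hf` entry point.
pick_hf_cli() {
    if command -v hf >/dev/null 2>&1; then
        echo "hf"
    elif command -v huggingface-cli >/dev/null 2>&1; then
        echo "huggingface-cli"
    else
        echo "no Hugging Face CLI found" >&2
        return 1
    fi
}

# Usage inside a benchmark script, with MODEL set by the caller:
#   HF_CLI=$(pick_hf_cli) || exit 1
#   "$HF_CLI" download "$MODEL"
```

Both CLIs take the repository ID as the positional argument to `download`, so the call site would stay the same whichever name is selected. Quoting `"$MODEL"` in the usage lines is a defensive choice and differs from the unquoted form in the scripts above.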
