Commit acb3ad0

Bihan Rana authored and committed
Add rccl test
1 parent 2e3da2c commit acb3ad0

6 files changed

Lines changed: 201 additions & 0 deletions

docs/examples.md

Lines changed: 14 additions & 0 deletions
@@ -160,6 +160,20 @@ hide:
 </a>
 </div>

+## Cluster
+<div class="tx-landing__highlights_grid">
+<a href="/examples/cluster/rccl-tests"
+class="feature-cell sky">
+<h3>
+RCCL tests
+</h3>
+
+<p>
+Run multi-node RCCL tests with MPI
+</p>
+</a>
+</div>
+
 ## Misc

 <div class="tx-landing__highlights_grid">

docs/examples/cluster/rccl-tests/index.md

Whitespace-only changes.
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+type: task
+name: rccl-tests
+nodes: 2
+
+image: rocm/dev-ubuntu-22.04:6.4-complete
+env:
+  - NCCL_DEBUG=INFO
+  - MPI_HOME=/usr/lib/x86_64-linux-gnu/openmpi
+commands:
+  # Setup MPI
+  - apt-get install -y git libopenmpi-dev openmpi-bin
+  # Build RCCL Tests
+  - git clone https://github.com/ROCm/rccl-tests.git
+  - cd rccl-tests
+  - make MPI=1 MPI_HOME=$MPI_HOME
+  - |
+    FIFO=/tmp/dstack_job
+    if [ ${DSTACK_NODE_RANK} -eq 0 ]; then
+      sleep 10
+      echo "$DSTACK_NODES_IPS" | tr ' ' '\n' > hostfile
+      MPIRUN='mpirun --allow-run-as-root --hostfile hostfile'
+      # Wait for other nodes
+      while true; do
+        if ${MPIRUN} -n ${DSTACK_NODES_NUM} -N 1 true >/dev/null 2>&1; then
+          break
+        fi
+        echo 'Waiting for nodes...'
+        sleep 5
+      done
+      # Run NCCL Tests
+      ${MPIRUN} \
+        -n ${DSTACK_GPUS_NUM} -N ${DSTACK_GPUS_PER_NODE} \
+        --mca btl_tcp_if_include ens41np0 \
+        -x LD_PRELOAD=/workflow/libibverbs/libbnxt_re-rdmav34.so \
+        -x NCCL_IB_HCA=mlx5_0/1,bnxt_re0,bnxt_re1,bnxt_re2,bnxt_re3,bnxt_re4,bnxt_re5,bnxt_re6,bnxt_re7 \
+        -x NCCL_IB_GID_INDEX=3 \
+        -x NCCL_IB_DISABLE=0 \
+        ./build/all_reduce_perf -b 8M -e 8G -f 2 -g 1 -w 5 --iters 20 -c 0;
+      # Notify nodes the job is done
+      ${MPIRUN} -n ${DSTACK_NODES_NUM} -N 1 sh -c "echo done > ${FIFO}"
+    else
+      mkfifo ${FIFO}
+      # Wait for a message from the first node
+      cat ${FIFO}
+    fi
+resources:
+  gpu: mi300x:8
+
+# Mount Broadcom driver compatible libibverbs binary
+volumes:
+  - /usr/local/lib/libbnxt_re-rdmav34.so:/workflow/libibverbs/libbnxt_re-rdmav34.so
Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
+# RCCL tests
+
+This example shows how to run distributed [RCCL tests :material-arrow-top-right-thin:{ .external }](https://github.com/ROCm/rccl-tests){:target="_blank"} with MPI using `dstack`.
+
+## Running as a task
+
+Here's an example of a task that runs the AllReduce test on 2 nodes, each with 8 `MI300X` GPUs (16 processes in total).
+
+<div editor-title="examples/cluster/rccl-tests/.dstack.yml">
+
+```yaml
+type: task
+name: rccl-tests
+nodes: 2
+
+image: rocm/dev-ubuntu-22.04:6.4-complete
+env:
+  - NCCL_DEBUG=INFO
+  - MPI_HOME=/usr/lib/x86_64-linux-gnu/openmpi
+commands:
+  # Setup MPI
+  - apt-get install -y git libopenmpi-dev openmpi-bin
+  # Build RCCL Tests
+  - git clone https://github.com/ROCm/rccl-tests.git
+  - cd rccl-tests
+  - make MPI=1 MPI_HOME=$MPI_HOME
+  - |
+    FIFO=/tmp/dstack_job
+    if [ ${DSTACK_NODE_RANK} -eq 0 ]; then
+      sleep 10
+      echo "$DSTACK_NODES_IPS" | tr ' ' '\n' > hostfile
+      MPIRUN='mpirun --allow-run-as-root --hostfile hostfile'
+      # Wait for other nodes
+      while true; do
+        if ${MPIRUN} -n ${DSTACK_NODES_NUM} -N 1 true >/dev/null 2>&1; then
+          break
+        fi
+        echo 'Waiting for nodes...'
+        sleep 5
+      done
+      # Run NCCL Tests
+      ${MPIRUN} \
+        -n ${DSTACK_GPUS_NUM} -N ${DSTACK_GPUS_PER_NODE} \
+        --mca btl_tcp_if_include ens41np0 \
+        -x LD_PRELOAD=/workflow/libibverbs/libbnxt_re-rdmav34.so \
+        -x NCCL_IB_HCA=mlx5_0/1,bnxt_re0,bnxt_re1,bnxt_re2,bnxt_re3,bnxt_re4,bnxt_re5,bnxt_re6,bnxt_re7 \
+        -x NCCL_IB_GID_INDEX=3 \
+        -x NCCL_IB_DISABLE=0 \
+        ./build/all_reduce_perf -b 8M -e 8G -f 2 -g 1 -w 5 --iters 20 -c 0;
+      # Notify nodes the job is done
+      ${MPIRUN} -n ${DSTACK_NODES_NUM} -N 1 sh -c "echo done > ${FIFO}"
+    else
+      mkfifo ${FIFO}
+      # Wait for a message from the first node
+      cat ${FIFO}
+    fi
+resources:
+  gpu: mi300x:8
+
+# Mount Broadcom driver compatible libibverbs binary
+volumes:
+  - /usr/local/lib/libbnxt_re-rdmav34.so:/workflow/libibverbs/libbnxt_re-rdmav34.so
+```
+
+</div>
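
As a rough check of the numbers in this task, the sketch below derives the rank count that `-n ${DSTACK_GPUS_NUM} -N ${DSTACK_GPUS_PER_NODE}` resolves to on this fleet and enumerates the message sizes swept by `-b 8M -e 8G -f 2`. `NODES`, `GPUS_PER_NODE`, and `sweep_sizes` are illustrative names that are not part of `dstack` or `rccl-tests`, and the sketch assumes the `M` and `G` suffixes are powers of 1024.

```shell
#!/bin/sh
# Hypothetical stand-ins for the values dstack provides at run time
# (2 nodes with 8 GPUs each, as in the task above).
NODES=2
GPUS_PER_NODE=8

# The task passes -g 1, so each MPI rank drives one GPU.
TOTAL_RANKS=$((NODES * GPUS_PER_NODE))
echo "ranks: ${TOTAL_RANKS}"

# Print every message size from $1 up to $2 (inclusive), multiplying by $3.
sweep_sizes() {
  size=$1
  max=$2
  factor=$3
  while [ "$size" -le "$max" ]; do
    echo "$size"
    size=$((size * factor))
  done
}

# Assumption: 8M and 8G mean 8 * 1024^2 and 8 * 1024^3 bytes.
MIN=$((8 * 1024 * 1024))
MAX=$((8 * 1024 * 1024 * 1024))
sweep_sizes "$MIN" "$MAX" 2
```

With 2 nodes of 8 GPUs this gives 16 ranks, and doubling from 8M up to 8G yields 11 message sizes per run.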
+
+The script orchestrates distributed execution across multiple nodes using MPI. The master node (identified by
+`DSTACK_NODE_RANK=0`) generates a `hostfile` listing all node IPs and polls until all worker nodes are
+accessible via MPI. Once confirmed, it runs the `./build/all_reduce_perf` benchmark across all available GPUs.
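
The hostfile and polling steps can be sketched in isolation as follows. This is a minimal illustration, not the task's actual code: `write_hostfile` and `wait_until` are hypothetical helpers, the IP addresses are placeholders, and `true` stands in for the `mpirun ... true` probe. Unlike the task, which loops until the probe succeeds, the sketch gives up after a fixed number of attempts.

```shell
#!/bin/sh
# Write one IP per line, accepting a space- or newline-separated list.
# $1: list of IPs, $2: output path.
write_hostfile() {
  printf '%s\n' "$1" | tr ' ' '\n' | sed '/^$/d' > "$2"
}

# Run a command until it succeeds or the attempts run out.
# $1: max attempts, $2: delay in seconds, rest: the command to probe with.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Placeholder addresses; in the task they come from DSTACK_NODES_IPS.
NODE_IPS="10.0.0.1 10.0.0.2"
HOSTFILE=$(mktemp)
write_hostfile "$NODE_IPS" "$HOSTFILE"
cat "$HOSTFILE"

# In the task the probe would be the mpirun command; `true` stands in here.
if wait_until 3 0 true; then
  echo "all nodes reachable"
fi
```

Bounding the attempts is a design choice of this sketch: it makes a misconfigured node surface as a failure instead of an indefinite wait.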
+
+Worker nodes use a FIFO pipe to block execution until they receive a termination signal from the master
+node. This ensures worker nodes remain active during the test and only exit once the master node completes the
+benchmark.
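
The FIFO handshake can be tried on a single machine. In this sketch a background `cat` plays the worker and the foreground shell plays the master; the temporary directory and variable names are illustrative, whereas the task itself uses `/tmp/dstack_job` and delivers the message through `mpirun`.

```shell
#!/bin/sh
# Create the FIFO in a private temporary directory (illustrative path).
WORKDIR=$(mktemp -d)
FIFO="${WORKDIR}/job"
OUT="${WORKDIR}/out"
mkfifo "$FIFO"

# "Worker": cat blocks on the FIFO until a writer sends data and closes it.
cat "$FIFO" > "$OUT" &
WORKER_PID=$!

# "Master": the benchmark would run here; sleep stands in for it.
sleep 1
echo done > "$FIFO"

# The worker exits only after the message arrives.
wait "$WORKER_PID"
RESULT=$(cat "$OUT")
echo "worker received: ${RESULT}"
rm -rf "$WORKDIR"
```

Opening a FIFO blocks until both a reader and a writer are present, so the handshake works regardless of which side starts first.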
+
+> The `rocm/dev-ubuntu-22.04:6.4-complete` image used in the example comes with ROCm 6.4 and RCCL.
+> Broadcom’s kernel driver `bnxt_re` should match the corresponding RoCE userspace library `libbnxt_re`.
+> To ensure this, we mount the host’s `libbnxt_re-rdmav34.so` into the container and preload it using `LD_PRELOAD`.
+> This guarantees that the container uses the exact library version bundled with the host driver. Without this, an ABI mismatch can occur.
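
Because the preload depends on the volume mount being in place, it can help to confirm the library is readable before launching the benchmark. The check below is a hypothetical addition rather than part of the task; `require_lib` is an illustrative helper name.

```shell
#!/bin/sh
# Succeed only if the given library path exists and is readable.
require_lib() {
  if [ -r "$1" ]; then
    echo "using $1"
  else
    echo "missing library: $1" >&2
    return 1
  fi
}

# Path taken from the task's volume mount.
LIB=/workflow/libibverbs/libbnxt_re-rdmav34.so
require_lib "$LIB" || echo "skipping preload: the volume mount is not in place"
```

Failing early here gives a clearer message than a loader error that surfaces only once the MPI ranks start.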
+
+### Creating a fleet
+Define an SSH fleet configuration by listing the IP addresses of each node in the cluster, along with the SSH user and SSH key configured for each host.
+
+```yaml
+type: fleet
+# The name is optional, if not specified, generated randomly
+name: mi300x-fleet
+
+# SSH credentials for the on-prem servers
+ssh_config:
+  user: root
+  identity_file: ~/.ssh/id_rsa
+  hosts:
+    - 144.202.58.28
+    - 137.220.58.52
+```
+
+### Apply a configuration
+
+To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply/) command.
+
+<div class="termy">
+
+```shell
+$ dstack apply -f examples/cluster/rccl-tests/.dstack.yml
+
+ #  BACKEND       RESOURCES                                     INSTANCE TYPE  PRICE
+ 1  ssh (remote)  cpu=256 mem=2268GB disk=752GB MI300X:192GB:8  instance       $0     idle
+ 2  ssh (remote)  cpu=256 mem=2268GB disk=752GB MI300X:192GB:8  instance       $0     idle
+
+Submit the run rccl-tests? [y/n]: y
+```
+
+</div>
+
+## Source code
+
+The source code of this example can be found in
+[`examples/cluster/rccl-tests` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/cluster/rccl-tests).
+
+## What's next?
+
+1. Check [dev environments](https://dstack.ai/docs/dev-environments), [tasks](https://dstack.ai/docs/tasks),
+   [services](https://dstack.ai/docs/services), and [fleets](https://dstack.ai/docs/concepts/fleets).
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+type: fleet
+# The name is optional, if not specified, generated randomly
+name: mi300x-fleet
+
+# SSH credentials for the on-prem servers
+ssh_config:
+  user: root
+  identity_file: ~/.ssh/id_rsa
+  hosts:
+    - 144.202.58.28
+    - 137.220.58.52

mkdocs.yml

Lines changed: 2 additions & 0 deletions
@@ -259,6 +259,8 @@ nav:
   - TPU: examples/accelerators/tpu/index.md
   - Intel Gaudi: examples/accelerators/intel/index.md
   - Tenstorrent: examples/accelerators/tenstorrent/index.md
+- Cluster:
+  - RCCL Tests: examples/cluster/rccl-tests/index.md
 - Misc:
   - Docker Compose: examples/misc/docker-compose/index.md
   - NCCL Tests: examples/misc/nccl-tests/index.md
