
Commit 575776b

[Examples] Update nccl-tests (#2415)
Co-authored-by: peterschmidt85 <andrey.cheptsov@gmail.com>
1 parent bbae827 commit 575776b

2 files changed: +18 -12 lines changed

2 files changed

+18
-12
lines changed

examples/misc/nccl-tests/.dstack.yml

Lines changed: 2 additions & 3 deletions
```diff
@@ -1,12 +1,11 @@
 type: task
 name: nccl-tests
 
-image: un1def/aws-efa-test
 nodes: 2
 
+image: dstackai/efa
 env:
 - NCCL_DEBUG=INFO
-
 commands:
 - |
   # We use FIFO for inter-node communication
@@ -25,7 +24,7 @@ commands:
     done
     # Run NCCL Tests
     ${MPIRUN} \
-      -n $((DSTACK_NODES_NUM * DSTACK_GPUS_PER_NODE)) -N ${DSTACK_GPUS_PER_NODE} \
+      -n ${DSTACK_GPUS_NUM} -N ${DSTACK_GPUS_PER_NODE} \
       --mca btl_tcp_if_exclude lo,docker0 \
       --bind-to none \
       ./all_reduce_perf -b 8 -e 8G -f 2 -g 1
```
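The `-n` change above swaps a hand-computed process count for the `DSTACK_GPUS_NUM` variable, which holds the cluster-wide GPU total directly. A minimal shell sketch of the equivalence, with hypothetical values mirroring this task's 2-node, 4-GPU setup:

```shell
# Hypothetical values mirroring a 2-node, 4-GPU-per-node run.
DSTACK_NODES_NUM=2
DSTACK_GPUS_PER_NODE=4
DSTACK_GPUS_NUM=8  # dstack exports the total GPU count directly

# Old -n value, computed by hand from two variables:
old_n=$((DSTACK_NODES_NUM * DSTACK_GPUS_PER_NODE))

# New -n value, read straight from the environment:
new_n=${DSTACK_GPUS_NUM}

echo "old=$old_n new=$new_n"
```

Both expressions yield the same process count here; reading the total from one variable just removes the arithmetic from the script.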

examples/misc/nccl-tests/README.md

Lines changed: 16 additions & 9 deletions
````diff
@@ -2,25 +2,21 @@
 
 This example shows how to run distributed [NCCL Tests :material-arrow-top-right-thin:{ .external }](https://github.com/NVIDIA/nccl-tests){:target="_blank"} with MPI using `dstack`.
 
-??? info "AWS EFA"
-    The used image is optimized for AWS [EFA :material-arrow-top-right-thin:{ .external }](https://aws.amazon.com/hpc/efa/){:target="_blank"} but works with regular TCP/IP network adapters as well.
+## Running as a task
 
-## Configuration
-
-This configuration runs AllReduce test on 2 nodes with 4 GPUs each (8 processes total), but you can adjust both `nodes` and `resources.gpu` without modifying the script.
+Here's an example of a task that runs AllReduce test on 2 nodes, each with 4 GPUs (8 processes in total).
 
 <div editor-title="examples/misc/nccl-tests/.dstack.yml">
 
 ```yaml
 type: task
 name: nccl-tests
 
-image: un1def/aws-efa-test
 nodes: 2
 
+image: dstackai/efa
 env:
 - NCCL_DEBUG=INFO
-
 commands:
 - |
   # We use FIFO for inter-node communication
@@ -39,7 +35,7 @@ commands:
     done
     # Run NCCL Tests
     ${MPIRUN} \
-      -n $((DSTACK_NODES_NUM * DSTACK_GPUS_PER_NODE)) -N ${DSTACK_GPUS_PER_NODE} \
+      -n ${DSTACK_GPUS_NUM} -N ${DSTACK_GPUS_PER_NODE} \
       --mca btl_tcp_if_exclude lo,docker0 \
       --bind-to none \
       ./all_reduce_perf -b 8 -e 8G -f 2 -g 1
@@ -59,7 +55,18 @@ resources:
 
 </div>
 
-### Running a configuration
+The script orchestrates distributed execution across multiple nodes using MPI. The master node (identified by
+`DSTACK_NODE_RANK=0`) generates a hostfile listing all node IPs and continuously checks until all worker nodes are
+accessible via MPI. Once confirmed, it executes the `all_reduce_perf` benchmark across all available GPUs.
+
+Worker nodes use a FIFO pipe to block execution until they receive a termination signal from the master
+node. This ensures worker nodes remain active during the test and only exit once the master node completes the
+benchmark.
+
+> The `dstackai/efa` image is optimized for [AWS EFA :material-arrow-top-right-thin:{ .external }](https://aws.amazon.com/hpc/efa/){:target="_blank"}
+> but also works with regular TCP/IP network adapters as well as InfiniBand.
+
+### Apply a configuration
 
 To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply/) command.
````
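The README's new prose describes the master node generating a hostfile from the node IPs. A minimal sketch of that step, under the assumption that the IPs arrive one per line in an environment variable (the name `DSTACK_NODES_IPS`, the sample IPs, and the format are illustrative, not confirmed by this diff):

```shell
# Sketch of master-side hostfile generation. The variable name and sample
# IPs below are illustrative assumptions; the real script is elided here.
DSTACK_NODES_IPS="10.0.0.1
10.0.0.2"
DSTACK_GPUS_PER_NODE=4

hostfile=$(mktemp)
# One "ip slots=N" line per node, the format Open MPI hostfiles expect.
while read -r ip; do
  echo "$ip slots=${DSTACK_GPUS_PER_NODE}" >> "$hostfile"
done <<EOF
$DSTACK_NODES_IPS
EOF

cat "$hostfile"
```

The resulting file can then be passed to `mpirun` via `--hostfile`, so the launcher knows how many ranks each node can host.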

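The worker-side FIFO blocking that the README describes can be sketched on a single machine: a reader blocks on the pipe until a writer appears. The FIFO path and the `done` token are illustrative, since the actual script body is elided in this diff:

```shell
# Single-machine sketch of the FIFO hand-off: a "worker" blocks reading the
# FIFO until the "master" writes to it. Names and the token are illustrative.
fifo=$(mktemp -u)   # unique pathname; mkfifo creates the pipe itself
mkfifo "$fifo"
result=$(mktemp)

# Worker: read blocks until the master opens the FIFO for writing.
( read -r msg < "$fifo"; echo "$msg" > "$result" ) &
worker_pid=$!

# Master: finish the benchmark (elided), then release the worker.
echo done > "$fifo"
wait "$worker_pid"
rm -f "$fifo"
```

This is why the workers "remain active during the test": their shell is parked on a blocking read rather than polling or sleeping.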