Commit bad82cf

author
Richard Top
committed
Added Cray tests
1 parent 001fb1c commit bad82cf

1 file changed

Lines changed: 78 additions & 46 deletions

docs/blog/posts/2025/09/eessi-cray-slingshot11.md

@@ -72,60 +72,19 @@ location to be automatically picked up by the software shipped with EESSI. This
7272
/cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib
7373
```
7474

75-
**Validating the `libmpi.so.40` in `host_injections` from OpenMPI/5.0.7 on ARM nodes built with:**
75+
**OpenMPI/5.0.7 on ARM nodes built with:**
7676
```
7777
./configure --prefix=/cluster/installations/eessi/default/aarch64/software/OpenMPI/5.0.7-GCC-12.3.0 --with-cuda=${EBROOTCUDA} --with-cuda-libdir=${EBROOTCUDA}/lib64 --with-slurm --enable-mpi-ext=cuda --with-libfabric=${EBROOTLIBFABRIC} --with-ucx=${EBROOTUCX} --enable-mpirun-prefix-by-default --enable-shared --with-hwloc=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/hwloc/2.9.1-GCCcore-12.3.0 --with-libevent=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libevent/2.1.12-GCCcore-12.3.0 --with-pmix=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/PMIx/4.2.4-GCCcore-12.3.0 --with-ucc=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/UCC/1.2.0-GCCcore-12.3.0 --with-prrte=internal
7878
```
79-
```
80-
ldd /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib/libmpi.so.40
81-
82-
linux-vdso.so.1 (0x0000fffcfd1d0000)
83-
libucc.so.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/UCC/1.2.0-GCCcore-12.3.0/lib64/libucc.so.1 (0x0000fffcfce50000)
84-
libucs.so.0 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/UCX/1.14.1-GCCcore-12.3.0/lib64/libucs.so.0 (0x0000fffcfcde0000)
85-
libnuma.so.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/numactl/2.0.16-GCCcore-12.3.0/lib64/libnuma.so.1 (0x0000fffcfcdb0000)
86-
libucm.so.0 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/UCX/1.14.1-GCCcore-12.3.0/lib64/libucm.so.0 (0x0000fffcfcd70000)
87-
libopen-pal.so.80 => /cluster/installations/eessi/default/aarch64/software/OpenMPI/5.0.7-GCC-12.3.0/lib/libopen-pal.so.80 (0x0000fffcfcc40000)
88-
libfabric.so.1 => /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib/libfabric.so.1 (0x0000fffcfca50000)
89-
librdmacm.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/librdmacm.so.1 (0x0000fffcfca10000)
90-
libefa.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libefa.so.1 (0x0000fffcfc9e0000)
91-
libibverbs.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libibverbs.so.1 (0x0000fffcfc9a0000)
92-
libcxi.so.1 => /cluster/installations/eessi/default/aarch64/software/shs-libcxi/1.7.0-GCCcore-12.3.0/lib64/libcxi.so.1 (0x0000fffcfc960000)
93-
libcurl.so.4 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libcurl.so.4 (0x0000fffcfc8a0000)
94-
libjson-c.so.5 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/json-c/0.16-GCCcore-12.3.0/lib64/libjson-c.so.5 (0x0000fffcfc870000)
95-
libatomic.so.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib64/libatomic.so.1 (0x0000fffcfc840000)
96-
libcudart.so.12 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/software/CUDA/12.1.1/lib64/libcudart.so.12 (0x0000fffcfc780000)
97-
libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x0000fffcf97d0000)
98-
libnvidia-ml.so.1 => /usr/lib64/libnvidia-ml.so.1 (0x0000fffcf8980000)
99-
libnl-route-3.so.200 => /cluster/installations/eessi/default/aarch64/software/libnl/3.11.0-GCCcore-12.3.0/lib64/libnl-route-3.so.200 (0x0000fffcf88d0000)
100-
libnl-3.so.200 => /cluster/installations/eessi/default/aarch64/software/libnl/3.11.0-GCCcore-12.3.0/lib64/libnl-3.so.200 (0x0000fffcf8890000)
101-
libpmix.so.2 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/PMIx/4.2.4-GCCcore-12.3.0/lib64/libpmix.so.2 (0x0000fffcf8690000)
102-
libevent_core-2.1.so.7 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libevent/2.1.12-GCCcore-12.3.0/lib64/libevent_core-2.1.so.7 (0x0000fffcf8630000)
103-
libevent_pthreads-2.1.so.7 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libevent/2.1.12-GCCcore-12.3.0/lib64/libevent_pthreads-2.1.so.7 (0x0000fffcf8600000)
104-
libhwloc.so.15 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/hwloc/2.9.1-GCCcore-12.3.0/lib64/libhwloc.so.15 (0x0000fffcf8580000)
105-
libpciaccess.so.0 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libpciaccess/0.17-GCCcore-12.3.0/lib64/libpciaccess.so.0 (0x0000fffcf8550000)
106-
libxml2.so.2 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libxml2/2.11.4-GCCcore-12.3.0/lib64/libxml2.so.2 (0x0000fffcf83e0000)
107-
libz.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libz.so.1 (0x0000fffcf83a0000)
108-
liblzma.so.5 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/liblzma.so.5 (0x0000fffcf8330000)
109-
libm.so.6 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/libm.so.6 (0x0000fffcf8280000)
110-
libc.so.6 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/libc.so.6 (0x0000fffcf80e0000)
111-
/lib/ld-linux-aarch64.so.1 (0x0000fffcfd1e0000)
112-
libcares.so.2 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libcares.so.2 (0x0000fffcf80a0000)
113-
libnghttp2.so.14 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libnghttp2.so.14 (0x0000fffcf8050000)
114-
libssl.so.1.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/OpenSSL/1.1/lib64/libssl.so.1.1 (0x0000fffcf7fb0000)
115-
libcrypto.so.1.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/OpenSSL/1.1/lib64/libcrypto.so.1.1 (0x0000fffcf7d10000)
116-
libdl.so.2 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/libdl.so.2 (0x0000fffcf7ce0000)
117-
libpthread.so.0 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/libpthread.so.0 (0x0000fffcf7cb0000)
118-
librt.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/librt.so.1 (0x0000fffcf7c80000)
119-
```
120-
12179
### Testing
12280

12381
We plan to provide more comprehensive test results in the future. In this blog post we want to report that the approach works in principle, and that the EESSI stack can pick up and use the custom OpenMPI build and extract
12482
performance from the host interconnect **without the need to rebuild any software packages**.
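
One way to check that the custom build is really the one being picked up is to inspect which paths the dynamic linker resolves for `libmpi.so.40`. The following is a minimal sketch, not part of the original post: the helper names `parse_ldd`, `libs_under`, and `run_ldd` are hypothetical, and it assumes the standard `ldd` output format of `name => path (address)`.

```python
import re
import subprocess

# Matches resolved entries such as:
#   libfabric.so.1 => /some/path/libfabric.so.1 (0x0000fffcfca50000)
# Lines without a resolved path (vdso, the loader itself, "not found") are skipped.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$")


def parse_ldd(output):
    """Return a dict mapping each library name to the path ldd resolved it to."""
    resolved = {}
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            resolved[match.group(1)] = match.group(2)
    return resolved


def libs_under(resolved, prefix):
    """Return only the libraries whose resolved path starts with `prefix`."""
    return {name: path for name, path in resolved.items() if path.startswith(prefix)}


def run_ldd(library_path):
    """Run `ldd` on a shared library and parse its output (hypothetical helper)."""
    result = subprocess.run(
        ["ldd", library_path], capture_output=True, text=True, check=True
    )
    return parse_ldd(result.stdout)
```

For example, `libs_under(run_ldd(path), "/cvmfs/software.eessi.io/host_injections/")` would list the libraries that come from the `host_injections` override directory rather than from the EESSI stack itself.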
12583

126-
**1- Test using OSU-Micro-Benchmarks on 2-nodes (x86_64 AMD-CPUs)**:
84+
**1- Test using OSU-Micro-Benchmarks from EESSI on 2-nodes (x86_64 AMD-CPUs)**:
12785
```
12886
Environment set up to use EESSI (2023.06), have fun!
87+
12988
hostname:
13089
x1001c6s2b0n1
13190
x1001c6s3b0n0
@@ -207,7 +166,7 @@ Currently Loaded Modules:
207166
2097152 90.79
208167
```
209168

210-
**2- Test using OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0 on 2-nodes (Grace/Hopper GPUs)**:
169+
**2- Test using OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0 from EESSI on 2-nodes/2-GPUs (Grace/Hopper GPUs)**:
211170
```
212171
Environment set up to use EESSI (2023.06), have fun!
213172
@@ -297,6 +256,79 @@ Currently Loaded Modules:
297256
2097152 93.98
298257
4194304 180.14
299258
```
300-
## Conclusion
301259

260+
**3- Test using OSU-Micro-Benchmarks/7.5 with PrgEnv-cray on 2-nodes/2-GPUs (Grace/Hopper GPUs)**:
261+
```
262+
263+
hostname:
264+
x1000c4s4b1n0
265+
x1000c5s3b0n0
266+
267+
CPU info:
268+
Vendor ID: ARM
269+
270+
Currently Loaded Modules:
271+
1) craype-arm-grace 8) craype/2.7.34
272+
2) libfabric/1.22.0 9) cray-dsmml/0.3.1
273+
3) craype-network-ofi 10) cray-mpich/8.1.32
274+
4) perftools-base/25.03.0 11) cray-libsci/25.03.0
275+
5) xpmem/2.11.3-1.3_gdbda01a1eb3d 12) PrgEnv-cray/8.6.0
276+
6) cce/19.0.0 13) cudatoolkit/24.11_12.6
277+
278+
# OSU MPI-CUDA Bi-Directional Bandwidth Test v7.5
279+
# Datatype: MPI_CHAR.
280+
# Size Bandwidth (MB/s)
281+
1 1.06
282+
2 2.17
283+
4 4.40
284+
8 8.80
285+
16 17.64
286+
32 35.17
287+
64 70.55
288+
128 140.91
289+
256 281.22
290+
512 559.04
291+
1024 1114.45
292+
2048 2081.25
293+
4096 4068.64
294+
8192 1852.11
295+
16384 18564.47
296+
32768 22647.40
297+
65536 33108.03
298+
131072 39553.95
299+
262144 43140.01
300+
524288 44853.40
301+
1048576 45761.69
302+
2097152 46228.10
303+
4194304 46470.29
304+
305+
# OSU MPI-CUDA Latency Test v7.5
306+
# Datatype: MPI_CHAR.
307+
# Size Avg Latency(us)
308+
1 2.76
309+
2 2.72
310+
4 2.90
311+
8 2.86
312+
16 2.85
313+
32 2.73
314+
64 2.60
315+
128 3.41
316+
256 4.17
317+
512 4.19
318+
1024 4.29
319+
2048 4.44
320+
4096 4.66
321+
8192 7.59
322+
16384 8.17
323+
32768 8.44
324+
65536 9.92
325+
131072 12.59
326+
262144 18.07
327+
524288 29.00
328+
1048576 50.64
329+
2097152 94.06
330+
4194304 180.44
331+
```
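
To compare runs such as the ones above, the two-column OSU tables can be read into numeric pairs. The sketch below is an illustration only and assumes the plain `size value` layout shown in these listings; the function names `parse_osu_table`, `peak`, and `drops` are hypothetical and not part of OSU-Micro-Benchmarks or EESSI.

```python
def parse_osu_table(text):
    """Extract (message_size, value) rows from OSU benchmark output.

    Only lines made of exactly one integer followed by one number are kept,
    so headers, hostnames, and module listings are ignored.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            rows.append((int(parts[0]), float(parts[1])))
        except ValueError:
            continue
    return rows


def peak(rows):
    """Return the (size, value) row with the largest value."""
    return max(rows, key=lambda row: row[1])


def drops(rows):
    """Return rows whose value is lower than the value at the previous size.

    For a bandwidth table this flags message sizes worth re-measuring;
    it does not say whether the reported number is wrong.
    """
    return [cur for prev, cur in zip(rows, rows[1:]) if cur[1] < prev[1]]
```

Applied to a bandwidth table, `peak` gives the best sustained figure and `drops` lists any message size where bandwidth falls below that of the preceding, smaller size.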
332+
333+
## Conclusion
302334
The approach demonstrates EESSI's flexibility in accommodating specialized hardware requirements while preserving the benefits of a standardized software stack! There is plenty more testing to do, but the signs at this stage are very good!
