Commit 2eb7040

Author: Richard Top
Commit message: EESSI Meets Slingshot-11-part2
1 parent 3589f58 commit 2eb7040

1 file changed

Lines changed: 194 additions & 0 deletions

@@ -0,0 +1,194 @@
---
author: [Richard]
date: 2026-05-08
slug: EESSI-on-Cray-Slingshot-part2
---

# MPI at Warp Speed: EESSI Meets Slingshot-11<sub><sup>(part2)</sup></sub>

Building on our initial HPE/Cray Slingshot-11 results, we further refined MPI tuning and validated the setup using EESSI/2025.06. The outcome is a significant performance improvement, bringing EESSI MPI behavior much closer to vendor-tuned Cray MPI environments.

In our previous blog post, [MPI at Warp Speed: EESSI Meets Slingshot-11](https://www.eessi.io/docs/blog/2025/11/14/EESSI-on-Cray-Slingshot/), we demonstrated that EESSI could successfully leverage the HPE Cray Slingshot-11 interconnect via the [host_injections](https://www.eessi.io/docs/site_specific_config/host_injections/) mechanism. Even as a proof-of-concept, the results were promising, especially for GPU-aware MPI communication on NVIDIA Grace Hopper systems.

We have continued to tune and refine MPI communication while using the EESSI/2025.06 software stack. Through updates to several core components and improvements to library configuration, we significantly reduced latency overheads and improved bandwidth utilization across Slingshot-11.

In this follow-up post, we present the results obtained with OSU-Micro-Benchmarks/7.5 and show how close EESSI can now get to native, vendor-optimized MPI performance on Slingshot-11 systems: in the runs below, device-to-device bidirectional bandwidth at 4 MiB messages reaches about 45.1 GB/s with EESSI versus about 46.5 GB/s with PrgEnv-cray.

### System Architecture

Our target system is [Olivia](https://documentation.sigma2.no/hpc_machines/olivia.html#olivia), which is based on HPE Cray EX platforms for compute and accelerator nodes and HPE Cray ClusterStor for global storage, all connected via the HPE Slingshot high-speed interconnect.
It consists of two main partitions:

- **Partition 1**: x86_64 AMD CPUs without accelerators
- **Partition 2**: NVIDIA Grace CPUs with Hopper accelerators

### Testing

The following tests were conducted on Olivia's accel partition (Grace nodes with Hopper GPUs), using a two-node, two-GPU configuration with one MPI task per node.

We evaluated two OSU Micro-Benchmarks builds:

1. OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 from EESSI

2. OSU-Micro-Benchmarks/7.5 compiled with PrgEnv-cray

The following commands were used to run the benchmarks. The `D D` arguments place both the send and the receive buffers in GPU device memory, so the measurements exercise GPU-aware MPI:

`mpirun -np 2 osu_bibw D D`

`mpirun -np 2 osu_latency D D`

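Both benchmarks print `#`-prefixed header lines followed by rows of message size and measured value, as the logs below show. For post-processing such logs, a minimal Python sketch of a parser could look like the following. The function name `parse_osu_log` and the shortened sample text are illustrative assumptions; they are not part of the OSU suite or of EESSI.

```python
def parse_osu_log(text):
    """Split an OSU log into tables keyed by the '# OSU ...' title line.

    Returns {title: {message_size_bytes: value}}. Lines that are not
    exactly "<integer> <number>" (module lists, hostnames, headers) are
    ignored. This helper is a hypothetical sketch, not an OSU or EESSI API.
    """
    tables = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("# OSU"):
            # A new benchmark section starts; use its title as the key.
            title = stripped.lstrip("# ").strip()
            current = tables.setdefault(title, {})
            continue
        fields = stripped.split()
        if current is None or len(fields) != 2:
            continue
        try:
            current[int(fields[0])] = float(fields[1])
        except ValueError:
            # Two-word lines that are not numeric data rows.
            continue
    return tables


# Shortened sample using rows copied from the EESSI log in this post.
sample = """\
hostname:
gpu-1-111

CPU info:
Vendor ID: ARM

# OSU MPI-CUDA Bi-Directional Bandwidth Test v7.5
# Datatype: MPI_CHAR.
# Size Bandwidth (MB/s)
1 1.24
4194304 45065.66

# OSU MPI-CUDA Latency Test v7.5
# Datatype: MPI_CHAR.
# Size Avg Latency(us)
1 2.79
4194304 180.92
"""

tables = parse_osu_log(sample)
for title, rows in tables.items():
    print(title, rows)
```

Keying the result by the `# OSU ...` title line keeps the bandwidth and latency tables apart when both appear in one log.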
<details>
<summary>See details</summary>

<b>Test using OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 from EESSI</b>:
```
Environment set up to use EESSI (2025.06), have fun!

hostname:
gpu-1-111
gpu-1-102

CPU info:
Vendor ID: ARM

Currently Loaded Modules:
1) EESSI/2025.06 12) PMIx/5.0.2-GCCcore-13.3.0
2) GCCcore/13.3.0 13) PRRTE/3.0.5-GCCcore-13.3.0
3) GCC/13.3.0 14) UCC/1.3.0-GCCcore-13.3.0
4) numactl/2.0.18-GCCcore-13.3.0 15) OpenMPI/5.0.3-GCC-13.3.0
5) libxml2/2.12.7-GCCcore-13.3.0 16) gompi/2024a
6) libpciaccess/0.18.1-GCCcore-13.3.0 17) GDRCopy/2.4.1-GCCcore-13.3.0
7) hwloc/2.10.0-GCCcore-13.3.0 18) UCX-CUDA/1.16.0-GCCcore-13.3.0-CUDA-12.6.0 (g)
8) OpenSSL/3 19) NCCL/2.22.3-GCCcore-13.3.0-CUDA-12.6.0 (g)
9) libevent/2.1.12-GCCcore-13.3.0 20) UCC-CUDA/1.3.0-GCCcore-13.3.0-CUDA-12.6.0 (g)
10) UCX/1.16.0-GCCcore-13.3.0 21) OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 (g)
Where:
g: built for GPU

# OSU MPI-CUDA Bi-Directional Bandwidth Test v7.5
# Datatype: MPI_CHAR.
# Size Bandwidth (MB/s)
1 1.24
2 2.53
4 5.09
8 10.21
16 20.56
32 41.06
64 82.56
128 164.61
256 328.11
512 652.27
1024 1295.71
2048 2568.34
4096 3161.87
8192 10383.73
16384 19679.28
32768 26194.74
65536 34068.25
131072 38747.45
262144 38515.90
524288 37048.28
1048576 44631.12
2097152 44871.95
4194304 45065.66

# OSU MPI-CUDA Latency Test v7.5
# Datatype: MPI_CHAR.
# Size Avg Latency(us)
1 2.79
2 2.82
4 2.91
8 2.76
16 2.82
32 2.89
64 2.80
128 3.71
256 4.14
512 4.21
1024 4.31
2048 4.44
4096 4.85
8192 8.40
16384 9.31
32768 15.94
65536 12.02
131072 13.51
262144 18.55
524288 29.56
1048576 51.48
2097152 94.93
4194304 180.92
```
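As a rough reading of the log above: `osu_latency` reports one-way latency, so dividing the message size by that latency gives an effective one-way throughput. The sketch below does this arithmetic for the 4 MiB row and relates it, together with the 4 MiB bidirectional bandwidth, to a nominal line rate. The 200 Gb/s figure is the nominal per-port rate of a Slingshot-11 NIC; that each MPI task drives exactly one such port on Olivia is an assumption here, not something the logs state.

```python
# Assumption: each MPI task uses a single Slingshot-11 NIC port with a
# nominal line rate of 200 Gb/s, i.e. 25,000 MB/s per direction
# (MB = 10**6 bytes, the unit OSU uses for bandwidth).
NOMINAL_GBIT_PER_S = 200.0
NOMINAL_MB_PER_S = NOMINAL_GBIT_PER_S * 1000.0 / 8.0

# Values copied from the EESSI log above (4 MiB messages).
SIZE_BYTES = 4194304
LATENCY_US = 180.92
BIBW_MB_PER_S = 45065.66


def one_way_throughput_mb_per_s(size_bytes, latency_us):
    """Bytes per microsecond is numerically equal to MB/s (MB = 10**6 bytes)."""
    return size_bytes / latency_us


one_way = one_way_throughput_mb_per_s(SIZE_BYTES, LATENCY_US)
one_way_fraction = one_way / NOMINAL_MB_PER_S
# Bidirectional traffic can use the nominal rate in both directions at once.
bidir_fraction = BIBW_MB_PER_S / (2.0 * NOMINAL_MB_PER_S)

print(f"one-way:       {one_way:.0f} MB/s ({one_way_fraction:.0%} of nominal)")
print(f"bidirectional: {BIBW_MB_PER_S:.0f} MB/s ({bidir_fraction:.0%} of nominal)")
```

Under the single-port assumption, both figures land at roughly 90% or slightly more of the nominal rate, which suggests the large-message path is close to link-limited in these runs.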

<b>Test using OSU-Micro-Benchmarks/7.5 with PrgEnv-cray</b>:
```

hostname:
gpu-1-111
gpu-1-102

CPU info:
Vendor ID: ARM

Currently Loaded Modules:
1) craype-arm-grace 8) craype/2.7.34
2) libfabric/1.22.0 9) cray-dsmml/0.3.1
3) craype-network-ofi 10) cray-mpich/8.1.32
4) perftools-base/25.03.0 11) cray-libsci/25.03.0
5) xpmem/2.11.3-1.3_gdbda01a1eb3d 12) PrgEnv-cray/8.6.0
6) cce/19.0.0 13) cudatoolkit/24.11_12.6

# OSU MPI-CUDA Bi-Directional Bandwidth Test v7.5
# Datatype: MPI_CHAR.
# Size Bandwidth (MB/s)
1 1.06
2 2.17
4 4.40
8 8.80
16 17.64
32 35.17
64 70.55
128 140.91
256 281.22
512 559.04
1024 1114.45
2048 2081.25
4096 4068.64
8192 1852.11
16384 18564.47
32768 22647.40
65536 33108.03
131072 39553.95
262144 43140.01
524288 44853.40
1048576 45761.69
2097152 46228.10
4194304 46470.29

# OSU MPI-CUDA Latency Test v7.5
# Datatype: MPI_CHAR.
# Size Avg Latency(us)
1 2.76
2 2.72
4 2.90
8 2.86
16 2.85
32 2.73
64 2.60
128 3.41
256 4.17
512 4.19
1024 4.29
2048 4.44
4096 4.66
8192 7.59
16384 8.17
32768 8.44
65536 9.92
131072 12.59
262144 18.07
524288 29.00
1048576 50.64
2097152 94.06
4194304 180.44
```
</details>
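
To put the two logs side by side, the following sketch computes the EESSI to PrgEnv-cray ratio for a handful of rows copied from the results above. The choice of rows and the helper name `eessi_to_cray_ratio` are illustrative assumptions; the same calculation applies to every message size in the logs.

```python
# Rows copied from the two logs above: message size in bytes ->
# (EESSI value, PrgEnv-cray value).
BIBW_MB_PER_S = {
    1: (1.24, 1.06),
    524288: (37048.28, 44853.40),
    1048576: (44631.12, 45761.69),
    4194304: (45065.66, 46470.29),
}
LATENCY_ROWS_US = {
    1: (2.79, 2.76),
    32768: (15.94, 8.44),
    4194304: (180.92, 180.44),
}


def eessi_to_cray_ratio(rows):
    """Return {size: EESSI value / PrgEnv-cray value}, rounded to 3 decimals."""
    return {size: round(eessi / cray, 3) for size, (eessi, cray) in rows.items()}


# For bandwidth a ratio above 1 favours EESSI; for latency a ratio below 1 does.
bandwidth_ratio = eessi_to_cray_ratio(BIBW_MB_PER_S)
latency_ratio = eessi_to_cray_ratio(LATENCY_ROWS_US)

for size, ratio in bandwidth_ratio.items():
    print(f"bandwidth {size:>8} B: EESSI/Cray = {ratio:.3f}")
for size, ratio in latency_ratio.items():
    print(f"latency   {size:>8} B: EESSI/Cray = {ratio:.3f}")
```

For these rows, EESSI is within about 3% of PrgEnv-cray on bandwidth at 1 MiB and 4 MiB and within about 1% on latency at 1 B and 4 MiB, while the 512 KiB bandwidth row and the 32 KiB latency row show larger gaps.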
