Commit a37a0c1

improve formatting for EESSI On Slingshot blog post + fix author
1 parent eb2a6a3 commit a37a0c1

2 files changed

Lines changed: 39 additions & 19 deletions

docs/blog/posts/2025/09/eessi-cray-slingshot11.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-author: [Richard]
+authors: [TopRichard]
 date: 2025-11-14
 slug: EESSI-on-Cray-Slingshot
 ---

docs/blog/posts/2026/05/eessi-cray-slingshot11-part2.md

Lines changed: 38 additions & 18 deletions
@@ -1,47 +1,64 @@
 ---
-author: [Richard]
+authors: [TopRichard]
 date: 2026-05-11
 slug: EESSI-on-Cray-Slingshot-part2
 ---
 
-# MPI at Warp Speed: EESSI Meets Slingshot-11<sub><sup>(part2)</sup></sub>
+# MPI at Warp Speed: EESSI Meets Slingshot-11<sub><sup>(bis)</sup></sub>
 
-Building on our initial HPE/Cray Slingshot‑11 results, we further refined MPI tuning and validated the setup using EESSI/2025.06. The outcome is a significant performance improvement, bringing EESSI MPI behavior much closer to vendor tuned Cray MPI environments.
-In our previous blog post, [MPI at Warp Speed: EESSI Meets Slingshot‑11](https://www.eessi.io/docs/blog/2025/11/14/EESSI-on-Cray-Slingshot/), we demonstrated that EESSI could successfully leverage the HPE Cray Slingshot‑11 interconnect via the [host_injections](https://www.eessi.io/docs/site_specific_config/host_injections/) mechanism. Even as a proof‑of‑concept, the results were promising especially for GPU aware MPI communication on NVIDIA Grace Hopper systems.
-We have continued to tune and refine MPI communication while using EESSI/2025.06 software stack. Through updates to several core components and improvements to library configuration, we significantly reduced latency overheads and improved bandwidth utilization across Slingshot‑11.
-In this follow up blog post, we present the results using OSU-Micro-Benchmarks/7.5 and show how close EESSI can now get to native, vendor‑optimized MPI performance on Slingshot‑11 systems.
+Building on our initial HPE/Cray Slingshot‑11 results, we further refined MPI tuning and validated the setup using EESSI 2025.06.
+
+The outcome is a significant performance improvement, bringing MPI support in EESSI much closer to vendor tuned Cray MPI environments.
+
+<!-- more -->
+
+In our previous blog post, [MPI at Warp Speed: EESSI Meets Slingshot‑11](../../2025/09/eessi-cray-slingshot11.md),
+we demonstrated that EESSI could successfully leverage the HPE Cray Slingshot‑11 interconnect via the
+[host_injections](../../../../site_specific_config/host_injections.md) mechanism.
+
+Even as a proof‑of‑concept, the results were promising, especially for GPU aware MPI communication on NVIDIA Grace Hopper systems.
+
+We have continued to tune and refine MPI communication while using EESSI 2025.06 software stack. Through updates to several core components
+and improvements to library configuration, we significantly reduced latency overheads and improved bandwidth utilization across Slingshot‑11.
+
+In this follow-up blog post we present the results using OSU-Micro-Benchmarks 7.5, and show how close EESSI can now get to native,
+vendor-optimized MPI performance on Slingshot‑11 systems.
 
 ### System Architecture
 
-Our target system is [Olivia](https://documentation.sigma2.no/hpc_machines/olivia.html#olivia) which is based on HPE Cray EX platforms for compute and accelerator nodes, and HPE Cray ClusterStor for global storage, all
-connected via HPE Slingshot high-speed interconnect.
-It consists of two main distinct partitions:
+Our target system is [Olivia](https://documentation.sigma2.no/hpc_machines/olivia.html#olivia),
+which is based on HPE Cray EX platforms for compute and accelerator nodes, and HPE Cray ClusterStor for global storage,
+all connected via HPE Slingshot high-speed interconnect. It consists of two main distinct partitions:
 
 - **Partition 1**: x86_64 AMD CPUs without accelerators
 - **Partition 2**: NVIDIA Grace CPUs with Hopper accelerators
 
 ### Testing
 
-The following tests were conducted on Olivia accel partition (Grace nodes with Hopper GPUs), using two-node, two-GPU configuration with one MPI task per node.
+The following tests were conducted on the `accel` partition of Olivia (Grace nodes with Hopper GPUs),
+using a 2-node 2-GPU configuration with one MPI task per node.
 
 We evaluated two OSU Micro-Benchmark builds:
 
-1- OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 from EESSI
-
-2- OSU-Micro-Benchmarks/7.5 compiled with PrgEnv-cray.
+- `OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0` from EESSI;
+- `OSU-Micro-Benchmarks/7.5` compiled with `PrgEnv-cray`.
 
 The following commands were used to run the benchmarks:
 
-`srun -N 2 --ntasks-per-node=1 osu_bibw -i 10 D D`
+```{ .bash .copy }
+srun -N 2 --ntasks-per-node=1 osu_bibw -i 10 D D
+```
 
-`srun -N 2 --ntasks-per-node=1 osu_latency -i 10 D D`
+```{ .bash .copy }
+srun -N 2 --ntasks-per-node=1 osu_latency -i 10 D D
+```
 
 ![OSU CUDA Bi-bandwidth](OSU‑7.5-CUDA-bibw.png) ![OSU CUDA Latency](OSU‑7.5-CUDA-Latency.png)
 
 <details>
 <summary>See details</summary>
 
-<b>Test using OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 from EESSI</b>:
+Test using `OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0` from EESSI:
 ```
 Environment set up to use EESSI (2025.06), have fun!
 
@@ -123,7 +140,7 @@ Currently Loaded Modules:
 4194304 179.65
 ```
 
-<b>Test using OSU-Micro-Benchmarks/7.5 with PrgEnv-cray</b>:
+Test using `OSU-Micro-Benchmarks/7.5` with `PrgEnv-cray`:
 ```
 
 hostname:
@@ -199,4 +216,7 @@ Currently Loaded Modules:
 </details>
 
 ## Conclusion
-There is a notable improvement in performance. While additional testing is still required, the current results are highly satisfactory.
+
+There is a notable improvement in performance compared to the [previous blog post](../../2025/09/eessi-cray-slingshot11.md).
+
+While additional testing is still required, the current results are highly satisfactory.
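
One way to compare the two builds from raw benchmark logs is to keep only the row for the largest message size. The following is a minimal sketch, assuming the OSU Micro-Benchmarks output layout of `#`-prefixed header lines followed by whitespace-separated `<size> <value>` rows; the function name `osu_max_size_row` is hypothetical and not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print the "<size> <value>" row
# for the largest message size found in OSU Micro-Benchmarks output.
# Assumption: header/comment lines start with '#', data rows are
# "<message size in bytes> <value>" separated by whitespace.
# Reads the files given as arguments, or stdin if none are given.
osu_max_size_row() {
  awk '
    # Keep only rows whose first field is an integer message size,
    # which skips headers, hostnames and module listings in the same log.
    $1 ~ /^[0-9]+$/ && NF >= 2 {
      # Remember the row with the numerically largest message size.
      if (!seen || $1 + 0 > max + 0) { max = $1; val = $2; seen = 1 }
    }
    # Print nothing if no data rows were found.
    END { if (seen) print max, val }
  ' "$@"
}
```

For example, `osu_max_size_row bibw_eessi.log` and `osu_max_size_row bibw_cray.log`, run on hypothetical logs captured from the `osu_bibw` `srun` command in the diff, each print a single row that can be compared directly. Selecting by the largest integer size, rather than simply taking the last line, is meant to keep trailing `srun` or shell output in the log from being mistaken for a result.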