|
1 | 1 | --- |
2 | | -author: [Richard] |
| 2 | +authors: [TopRichard] |
3 | 3 | date: 2026-05-11 |
4 | 4 | slug: EESSI-on-Cray-Slingshot-part2 |
5 | 5 | --- |
6 | 6 |
|
7 | | -# MPI at Warp Speed: EESSI Meets Slingshot-11<sub><sup>(part2)</sup></sub> |
| 7 | +# MPI at Warp Speed: EESSI Meets Slingshot-11<sub><sup>(bis)</sup></sub> |
8 | 8 |
|
9 | | -Building on our initial HPE/Cray Slingshot‑11 results, we further refined MPI tuning and validated the setup using EESSI/2025.06. The outcome is a significant performance improvement, bringing EESSI MPI behavior much closer to vendor tuned Cray MPI environments. |
10 | | -In our previous blog post, [MPI at Warp Speed: EESSI Meets Slingshot‑11](https://www.eessi.io/docs/blog/2025/11/14/EESSI-on-Cray-Slingshot/), we demonstrated that EESSI could successfully leverage the HPE Cray Slingshot‑11 interconnect via the [host_injections](https://www.eessi.io/docs/site_specific_config/host_injections/) mechanism. Even as a proof‑of‑concept, the results were promising especially for GPU aware MPI communication on NVIDIA Grace Hopper systems. |
11 | | -We have continued to tune and refine MPI communication while using EESSI/2025.06 software stack. Through updates to several core components and improvements to library configuration, we significantly reduced latency overheads and improved bandwidth utilization across Slingshot‑11. |
12 | | -In this follow up blog post, we present the results using OSU-Micro-Benchmarks/7.5 and show how close EESSI can now get to native, vendor‑optimized MPI performance on Slingshot‑11 systems. |
| 9 | +Building on our initial HPE/Cray Slingshot‑11 results, we further refined MPI tuning and validated the setup using EESSI 2025.06. |
| 10 | + |
| 11 | +The outcome is a significant performance improvement, bringing MPI support in EESSI much closer to vendor-tuned Cray MPI environments. |
| 12 | + |
| 13 | +<!-- more --> |
| 14 | + |
| 15 | +In our previous blog post, [MPI at Warp Speed: EESSI Meets Slingshot‑11](../../2025/09/eessi-cray-slingshot11.md), |
| 16 | +we demonstrated that EESSI could successfully leverage the HPE Cray Slingshot‑11 interconnect via the |
| 17 | +[host_injections](../../../../site_specific_config/host_injections.md) mechanism. |
| 18 | + |
| 19 | +Even as a proof‑of‑concept, the results were promising, especially for GPU-aware MPI communication on NVIDIA Grace Hopper systems. |
| 20 | + |
| 21 | +We have continued to tune and refine MPI communication while using the EESSI 2025.06 software stack. Through updates to several core components |
| 22 | +and improvements to library configuration, we significantly reduced latency overheads and improved bandwidth utilization across Slingshot‑11. |
| 23 | + |
| 24 | +In this follow-up blog post we present the results using OSU-Micro-Benchmarks 7.5, and show how close EESSI can now get to native, |
| 25 | +vendor-optimized MPI performance on Slingshot‑11 systems. |
13 | 26 |
|
14 | 27 | ### System Architecture |
15 | 28 |
|
16 | | -Our target system is [Olivia](https://documentation.sigma2.no/hpc_machines/olivia.html#olivia) which is based on HPE Cray EX platforms for compute and accelerator nodes, and HPE Cray ClusterStor for global storage, all |
17 | | -connected via HPE Slingshot high-speed interconnect. |
18 | | -It consists of two main distinct partitions: |
| 29 | +Our target system is [Olivia](https://documentation.sigma2.no/hpc_machines/olivia.html#olivia), |
| 30 | +which is based on HPE Cray EX platforms for compute and accelerator nodes, and HPE Cray ClusterStor for global storage, |
| 31 | +all connected via the HPE Slingshot high-speed interconnect. It consists of two distinct partitions: |
19 | 32 |
|
20 | 33 | - **Partition 1**: x86_64 AMD CPUs without accelerators |
21 | 34 | - **Partition 2**: NVIDIA Grace CPUs with Hopper accelerators |
22 | 35 |
|
23 | 36 | ### Testing |
24 | 37 |
|
25 | | -The following tests were conducted on Olivia accel partition (Grace nodes with Hopper GPUs), using two-node, two-GPU configuration with one MPI task per node. |
| 38 | +The following tests were conducted on the `accel` partition of Olivia (Grace nodes with Hopper GPUs), |
| 39 | +using a 2-node 2-GPU configuration with one MPI task per node. |
26 | 40 |
|
27 | 41 | We evaluated two OSU Micro-Benchmark builds: |
28 | 42 |
|
29 | | -1- OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 from EESSI |
30 | | - |
31 | | -2- OSU-Micro-Benchmarks/7.5 compiled with PrgEnv-cray. |
| 43 | +- `OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0` from EESSI; |
| 44 | +- `OSU-Micro-Benchmarks/7.5` compiled with `PrgEnv-cray`. |
32 | 45 |
|
33 | 46 | The following commands were used to run the benchmarks: |
34 | 47 |
|
35 | | -`srun -N 2 --ntasks-per-node=1 osu_bibw -i 10 D D` |
| 48 | +```{ .bash .copy } |
| 49 | +srun -N 2 --ntasks-per-node=1 osu_bibw -i 10 D D |
| 50 | +``` |
36 | 51 |
|
37 | | -`srun -N 2 --ntasks-per-node=1 osu_latency -i 10 D D` |
| 52 | +```{ .bash .copy } |
| 53 | +srun -N 2 --ntasks-per-node=1 osu_latency -i 10 D D |
| 54 | +``` |
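To compare the EESSI and `PrgEnv-cray` runs at a single message size, the benchmark output can be filtered on the command line. The helper below is only a minimal sketch and not part of the original setup: the function name `osu_value` is hypothetical, and it assumes the two-column OSU result table (message size in bytes, then the measured value) seen in the detailed output further down.

```bash
# Hypothetical helper: print the measured value for one message size.
# Assumes OSU output rows of the form "<size in bytes> <value>".
osu_value() {
    # $1: message size in bytes, matched against the first column.
    # Header and banner lines never match a numeric size, so they are skipped.
    awk -v size="$1" 'NF >= 2 && $1 == size { print $2; exit }'
}
```

For example, `srun -N 2 --ntasks-per-node=1 osu_latency -i 10 D D | osu_value 4194304` would print only the value reported for 4 MiB messages, which makes it easier to put the two builds side by side.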
38 | 55 |
|
39 | 56 |   |
40 | 57 |
|
41 | 58 | <details> |
42 | 59 | <summary>See details</summary> |
43 | 60 |
|
44 | | -<b>Test using OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 from EESSI</b>: |
| 61 | +Test using `OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0` from EESSI: |
45 | 62 | ``` |
46 | 63 | Environment set up to use EESSI (2025.06), have fun! |
47 | 64 |
|
@@ -123,7 +140,7 @@ Currently Loaded Modules: |
123 | 140 | 4194304 179.65 |
124 | 141 | ``` |
125 | 142 |
|
126 | | -<b>Test using OSU-Micro-Benchmarks/7.5 with PrgEnv-cray</b>: |
| 143 | +Test using `OSU-Micro-Benchmarks/7.5` with `PrgEnv-cray`: |
127 | 144 | ``` |
128 | 145 |
|
129 | 146 | hostname: |
@@ -199,4 +216,7 @@ Currently Loaded Modules: |
199 | 216 | </details> |
200 | 217 |
|
201 | 218 | ## Conclusion |
202 | | -There is a notable improvement in performance. While additional testing is still required, the current results are highly satisfactory. |
| 219 | + |
| 220 | +There is a notable improvement in performance compared to the [previous blog post](../../2025/09/eessi-cray-slingshot11.md). |
| 221 | + |
| 222 | +While additional testing is still required, the current results are highly satisfactory. |