
{2025.06}[2025b] GROMACS 2025.4 with CUDA-12.9.1 #1482

Draft
bedroge wants to merge 2 commits into EESSI:main from bedroge:gromacs_2025.3_cuda

Conversation

Collaborator

bedroge commented Apr 22, 2026

Requires:

9 out of 87 required modules missing:

* Catch2/2.13.10-GCCcore-14.3.0 (Catch2-2.13.10-GCCcore-14.3.0.eb)
* gfbf/2025b (gfbf-2025b.eb)
* hypothesis/6.136.6-GCCcore-14.3.0 (hypothesis-6.136.6-GCCcore-14.3.0.eb)
* spin/0.14-GCCcore-14.3.0 (spin-0.14-GCCcore-14.3.0.eb)
* pybind11/3.0.0-GCC-14.3.0 (pybind11-3.0.0-GCC-14.3.0.eb)
* SciPy-bundle/2025.07-gfbf-2025b (SciPy-bundle-2025.07-gfbf-2025b.eb)
* networkx/3.5-gfbf-2025b (networkx-3.5-gfbf-2025b.eb)
* mpi4py/4.1.0-gompi-2025b (mpi4py-4.1.0-gompi-2025b.eb)
* GROMACS/2025.3-foss-2025b-CUDA-12.9.1 (GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb)

bedroge added the accel:nvidia and 2025.06-software.eessi.io (2025.06 version of software.eessi.io) labels Apr 22, 2026
Collaborator Author

bedroge commented Apr 23, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-rug for:arch=x86_64/amd/zen5,accel=nvidia/cc120


eessi-bot-rug Bot commented Apr 23, 2026

New job on instance eessi-bot-rug for repository eessi.io-2025.06-software
Building on: amd-zen5 and accelerator nvidia/cc120
Building for: x86_64/amd/zen5 and accelerator nvidia/cc120
Job dir: /scratch/hb-eessibot/SHARED/jobs/2026.04/pr_1482/28614014

date job status comment
Apr 23 08:55:57 UTC 2026 submitted job id 28614014 awaits release by job manager
Apr 23 08:56:57 UTC 2026 released job awaits launch by Slurm scheduler
Apr 23 09:19:02 UTC 2026 running job 28614014 is running
Apr 23 09:49:33 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-28614014.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen5-accel-nvidia-cc120-17769376410.tar.zst
size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120
no other files in tarball
Apr 23 09:49:33 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] ( 1/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node %device_type=gpu /b88eedf0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 2/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node %device_type=gpu /8c8bf48b @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 3/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node %device_type=gpu /6d7a17a9 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 4/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node %device_type=gpu /e5a16ba0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 5/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node %device_type=gpu /634d019c @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 6/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node %device_type=gpu /e9b09ad8 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 7/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node /b1ea69c1 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 8/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node /a317b8da @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 9/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node /a102bba0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (10/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node /7bd54429 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (11/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node /84994f87 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (12/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node /d58e51e9 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ PASSED ] Ran 0/12 test case(s) from 12 check(s) (0 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-28614014.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
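The Details list in the bot report above reads like a set of pattern checks against the Slurm job output. A minimal sketch of how such checks could be evaluated follows. The `CHECKS` table, the function names, and the exact regular expressions are assumptions made for illustration and are not taken from the EESSI bot's code.

```python
import re

# Hypothetical check table modelled on the "Details" list in the bot report.
# Each entry: (label, regex, True if finding a match counts as a pass).
CHECKS = [
    ("FATAL:", r"FATAL:", False),
    ("ERROR:", r"ERROR:", False),
    ("FAILED:", r"FAILED:", False),
    ("required modules missing:", r"required modules missing:", False),
    ("No missing installations", r"No missing installations", True),
    (".tar.* created!", r"\.tar\.\S* created!", True),
]


def evaluate(job_output: str) -> list[tuple[str, bool]]:
    """Return (label, passed) for every check in CHECKS."""
    results = []
    for label, pattern, match_is_good in CHECKS:
        found = re.search(pattern, job_output) is not None
        # A check passes when "found" agrees with what we want to see.
        results.append((label, found == match_is_good))
    return results


def overall(job_output: str) -> str:
    """Collapse the individual checks into a single verdict."""
    return "SUCCESS" if all(ok for _, ok in evaluate(job_output)) else "FAILURE"
```

Under these assumptions, a job output that contains both `ERROR:` and `required modules missing:` would be reported as a failure even though a tarball was created, which matches the shape of the first build report.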

Collaborator Author

bedroge commented Apr 23, 2026

The build succeeded, but it fails in the CUDA sanity check:

== 2026-04-23 11:47:13,762 easyblock.py:3849 INFO CUDA sanity check detailed report:
12 files missing one or more CUDA compute capabilities:
  lib/libgromacs.so.10.0.0
  lib/libgromacs.so.10
  lib/libgromacs.so
  lib/libgromacs_mpi.so.10.0.0
  lib/libgromacs_mpi.so.10
  lib/libgromacs_mpi.so
  lib64/libgromacs.so.10.0.0
  lib64/libgromacs.so.10
  lib64/libgromacs.so
  lib64/libgromacs_mpi.so.10.0.0
  lib64/libgromacs_mpi.so.10
  lib64/libgromacs_mpi.so
12 files with device code for more CUDA Compute Capabilities than requested:
  lib/libgromacs.so.10.0.0
  lib/libgromacs.so.10
  lib/libgromacs.so
  lib/libgromacs_mpi.so.10.0.0
  lib/libgromacs_mpi.so.10
  lib/libgromacs_mpi.so
  lib64/libgromacs.so.10.0.0
  lib64/libgromacs.so.10
  lib64/libgromacs.so
  lib64/libgromacs_mpi.so.10.0.0
  lib64/libgromacs_mpi.so.10
  lib64/libgromacs_mpi.so
12 files missing PTX code for the highest configured CUDA Compute Capability:
  lib/libgromacs.so.10.0.0
  lib/libgromacs.so.10
  lib/libgromacs.so
  lib/libgromacs_mpi.so.10.0.0
  lib/libgromacs_mpi.so.10
  lib/libgromacs_mpi.so
  lib64/libgromacs.so.10.0.0
  lib64/libgromacs.so.10
  lib64/libgromacs.so
  lib64/libgromacs_mpi.so.10.0.0
  lib64/libgromacs_mpi.so.10
  lib64/libgromacs_mpi.so

I guess it may be related to the 120f that we're using, as the binaries do seem to have support for sm_120:

Fatbin elf code:
================
arch = sm_120
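To make the suspicion about 120f concrete, here is a rough sketch that collects the architectures from a fatbin listing in the format shown above and compares them with a requested set. The function names are hypothetical, and the exact-string comparison is an assumption for illustration; the thread does not show how the EasyBuild sanity check compares them.

```python
import re


def parse_fatbin_archs(listing: str) -> dict[str, set[str]]:
    """Collect sm_* architectures per section kind ("elf" or "ptx")."""
    archs: dict[str, set[str]] = {"elf": set(), "ptx": set()}
    kind = None
    for line in listing.splitlines():
        header = re.match(r"\s*Fatbin (elf|ptx) code:", line)
        if header:
            kind = header.group(1)
            continue
        arch = re.match(r"\s*arch = (sm_\w+)", line)
        if arch and kind:
            archs[kind].add(arch.group(1))
    return archs


def compare(requested: set[str], listing: str) -> dict[str, set[str]]:
    """Naive exact-string comparison of requested vs. found device code."""
    found = parse_fatbin_archs(listing)["elf"]
    return {"missing": requested - found, "additional": found - requested}


listing = "Fatbin elf code:\n================\narch = sm_120\n"
print(compare({"sm_120f"}, listing))
print(compare({"sm_120"}, listing))
```

With a requested target spelled `sm_120f` and a binary reporting plain `sm_120`, this naive comparison lists the file as both missing a capability and carrying an additional one, which is the same pattern as the report above.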

bedroge changed the title from "{2025.06}[2025b] GROMACS 2025.3 with CUDA-12.9.1" to "{2025.06}[2025b] GROMACS 2025.4 with CUDA-12.9.1" Apr 24, 2026
Collaborator Author

bedroge commented Apr 24, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-rug for:arch=x86_64/amd/zen5,accel=nvidia/cc120


eessi-bot-rug Bot commented Apr 24, 2026

New job on instance eessi-bot-rug for repository eessi.io-2025.06-software
Building on: amd-zen5 and accelerator nvidia/cc120
Building for: x86_64/amd/zen5 and accelerator nvidia/cc120
Job dir: /scratch/hb-eessibot/SHARED/jobs/2026.04/pr_1482/28630885

date job status comment
Apr 24 07:43:34 UTC 2026 submitted job id 28630885 awaits release by job manager
Apr 24 07:44:50 UTC 2026 released job awaits launch by Slurm scheduler
Apr 24 07:46:53 UTC 2026 running job 28630885 is running
Apr 24 08:17:23 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-28630885.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen5-accel-nvidia-cc120-17770184860.tar.zst
size: 32 MiB (33916242 bytes)
entries: 760
modules under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_081438UTC
other under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120
no other files in tarball
Apr 24 08:17:23 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] ( 1/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node %device_type=gpu /b88eedf0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 2/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node %device_type=gpu /8c8bf48b @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 3/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node %device_type=gpu /6d7a17a9 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 4/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node %device_type=gpu /e5a16ba0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 5/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node %device_type=gpu /634d019c @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 6/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node %device_type=gpu /e9b09ad8 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 7/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node /b1ea69c1 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 8/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node /a317b8da @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 9/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node /a102bba0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (10/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node /7bd54429 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (11/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node /84994f87 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (12/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node /d58e51e9 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ PASSED ] Ran 0/12 test case(s) from 12 check(s) (0 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-28630885.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

bedroge commented Apr 24, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/intel/cascadelake,accel=nvidia/cc70
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/amd/zen3,accel=nvidia/cc80
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc for:arch=aarch64/nvidia/grace,accel=nvidia/cc90
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-rug for:arch=x86_64/intel/skylake_avx512,accel=nvidia/cc70


eessi-bot-surf Bot commented Apr 24, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: intel-icelake and accelerator nvidia/cc80
Building for: x86_64/intel/icelake and accelerator nvidia/cc80
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.04/pr_1482/22223686

date job status comment
Apr 24 11:43:06 UTC 2026 submitted job id 22223686 will be eligible to start in about 20 seconds
Apr 24 11:43:20 UTC 2026 received job awaits launch by Slurm scheduler
Apr 24 11:45:00 UTC 2026 running job 22223686 is running
Apr 24 13:01:50 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-22223686.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-icelake-accel-nvidia-cc80-17770356230.tar.zst
size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80
no other files in tarball
Apr 24 13:01:50 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] ( 1/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node %device_type=gpu /15d6e239 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 2/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node %device_type=gpu /5471f15a @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 3/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /526cd259 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 4/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node %device_type=gpu /1dc400ef @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 5/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node %device_type=gpu /9715dde6 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 6/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /416eaee1 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 7/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node /ed938ed4 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 8/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node /8d24cea9 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 9/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /73a202f1 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (10/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node /946648aa @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (11/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node /9eb3f1e9 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (12/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /7f04eb2b @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ PASSED ] Ran 0/12 test case(s) from 12 check(s) (0 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-22223686.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case


eessi-bot-surf Bot commented Apr 24, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: amd-zen4 and accelerator nvidia/cc90
Building for: x86_64/amd/zen4 and accelerator nvidia/cc90
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.04/pr_1482/22223689

date job status comment
Apr 24 11:43:12 UTC 2026 submitted job id 22223689 will be eligible to start in about 20 seconds
Apr 24 11:43:24 UTC 2026 received job awaits launch by Slurm scheduler
Apr 24 11:43:47 UTC 2026 running job 22223689 is running
Apr 25 11:44:10 UTC 2026 finished
🤷 UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job22223689.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Apr 25 11:44:10 UTC 2026 test result
🤷 UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job22223689.test does not exist in job directory, or parsing it failed.
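The status table for this job shows it running from Apr 24 11:43:47 UTC until Apr 25 11:44:10 UTC. A quick sketch of the elapsed time, using only the timestamps printed above; the `elapsed` helper is hypothetical.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the bot's status table.
FMT = "%b %d %H:%M:%S UTC %Y"


def elapsed(start: str, end: str) -> timedelta:
    """Time between two status-table timestamps."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


running = "Apr 24 11:43:47 UTC 2026"
finished = "Apr 25 11:44:10 UTC 2026"
print(elapsed(running, finished))
```

That is 23 seconds over 24 hours. A duration this close to a round day could be consistent with the job hitting a walltime limit, but the thread does not confirm the cause.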


gpu-bot-ugent Bot commented Apr 24, 2026

New job on instance eessi-bot-vsc-ugent for repository eessi.io-2025.06-software
Building on: intel-cascadelake and accelerator nvidia/cc70
Building for: x86_64/intel/cascadelake and accelerator nvidia/cc70
Job dir: /scratch/gent/vo/002/gvo00211/SHARED/jobs/2026.04/pr_1482/40819786

date job status comment
Apr 24 11:43:12 UTC 2026 submitted job id 40819786 awaits release by job manager
Apr 24 11:44:50 UTC 2026 released job awaits launch by Slurm scheduler
Apr 24 11:46:54 UTC 2026 running job 40819786 is running
Apr 24 13:09:46 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-40819786.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-cascadelake-accel-nvidia-cc70-17770361340.tar.zst
size: 31 MiB (32681382 bytes)
entries: 760
modules under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_130833UTC
other under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70
no other files in tarball
Apr 24 13:09:46 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-40819786.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case


eessi-bot-jsc Bot commented Apr 24, 2026

New job on instance eessi-bot-jsc for repository eessi.io-2025.06-software
Building on: nvidia-grace and accelerator nvidia/cc90
Building for: aarch64/nvidia/grace and accelerator nvidia/cc90
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2026.04/pr_1482/14684643

date job status comment
Apr 24 11:43:15 UTC 2026 submitted job id 14684643 awaits release by job manager
Apr 24 11:44:07 UTC 2026 released job awaits launch by Slurm scheduler
Apr 24 11:45:10 UTC 2026 running job 14684643 is running
Apr 24 12:39:58 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-14684643.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc90-17770337320.tar.gz
size: 32 MiB (34349669 bytes)
entries: 760
modules under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_122736UTC
other under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90
no other files in tarball
Apr 24 12:39:58 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 18/30 test case(s) from 30 check(s) (4 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-14684643.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case
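The ReFrame summary line in this report can be broken into its counts with a short regular expression. This is a sketch based only on the line format shown in this thread; `SUMMARY` and `parse_summary` are hypothetical names.

```python
import re

# Matches the counts in a ReFrame summary line as it appears in the bot report.
SUMMARY = re.compile(
    r"Ran (\d+)/(\d+) test case\(s\) from (\d+) check\(s\) "
    r"\((\d+) failure\(s\), (\d+) skipped, (\d+) aborted\)"
)


def parse_summary(line: str) -> dict[str, int]:
    """Extract the numeric fields from a ReFrame summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a ReFrame summary line: {line!r}")
    keys = ("ran", "total", "checks", "failures", "skipped", "aborted")
    return dict(zip(keys, map(int, match.groups())))


line = "[ FAILED ] Ran 18/30 test case(s) from 30 check(s) (4 failure(s), 12 skipped, 0 aborted)"
counts = parse_summary(line)
print(counts)
```

For the Grace job, 18 test cases ran and 12 were skipped, which accounts for all 30; 4 of the 18 that ran failed.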


eessi-bot-rug Bot commented Apr 24, 2026

New job on instance eessi-bot-rug for repository eessi.io-2025.06-software
Building on: intel-skylake_avx512 and accelerator nvidia/cc70
Building for: x86_64/intel/skylake_avx512 and accelerator nvidia/cc70
Job dir: /scratch/hb-eessibot/SHARED/jobs/2026.04/pr_1482/28636437

date job status comment
Apr 24 11:43:16 UTC 2026 submitted job id 28636437 awaits release by job manager
Apr 24 11:44:00 UTC 2026 released job awaits launch by Slurm scheduler
Apr 24 12:08:04 UTC 2026 running job 28636437 is running
Apr 24 13:09:06 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-28636437.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-skylake_avx512-accel-nvidia-cc70-17770359480.tar.zst
size: 31 MiB (32689771 bytes)
entries: 760
modules under 2025.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc70/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc70/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc70/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_130523UTC
other under 2025.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc70
no other files in tarball
Apr 24 13:09:06 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] ( 1/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_2_node %device_type=gpu /495ccd0c @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 2/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_2_node %device_type=gpu /61fda20d @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 3/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_2_node %device_type=gpu /e3d4ae3b @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 4/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_2_node %device_type=gpu /ce7fe725 @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 5/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_2_node %device_type=gpu /5c339fc9 @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 6/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_2_node %device_type=gpu /b4bd1071 @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 7/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_2_node /c3881e1d @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 8/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_2_node /5f02f86c @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 9/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_2_node /530b49da @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (10/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_2_node /f49f730d @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (11/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_2_node /c412ac42 @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (12/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_2_node /18861056 @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ PASSED ] Ran 0/12 test case(s) from 12 check(s) (0 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-28636437.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case


gpu-bot-ugent Bot commented Apr 24, 2026

New job on instance eessi-bot-vsc-ugent for repository eessi.io-2025.06-software
Building on: amd-zen3 and accelerator nvidia/cc80
Building for: x86_64/amd/zen3 and accelerator nvidia/cc80
Job dir: /scratch/gent/vo/002/gvo00211/SHARED/jobs/2026.04/pr_1482/15689215

date job status comment
Apr 24 11:43:18 UTC 2026 submitted job id 15689215 awaits release by job manager
Apr 24 11:44:46 UTC 2026 released job awaits launch by Slurm scheduler
Apr 24 11:48:58 UTC 2026 running job 15689215 is running
Apr 24 12:55:31 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-15689215.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen3-accel-nvidia-cc80-17770352040.tar.zst
size: 34 MiB (36599032 bytes)
entries: 760
modules under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_125258UTC
other under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
no other files in tarball
Apr 24 12:55:31 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-15689215.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

bedroge commented Apr 24, 2026

@casparvl The icelake cc80 build with the Surf bot failed because of:

[1777032154.196060] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)
[1777032154.196107] [gcn12:54944:0]           mpool.c:269  UCX  ERROR Failed to allocate memory pool (name=rc_recv_desc) chunk: Input/output error
[1777032154.196266] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)

Have you encountered this before?
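For a sense of scale, the numbers in the UCX error can be pulled out and compared directly. This is a rough sketch; `memlock_shortfall` is a hypothetical helper, and the log line is copied from the excerpt above with the timestamp and host prefix dropped.

```python
import re

LOG_LINE = (
    "ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, "
    "access=0xf) failed: Cannot allocate memory : Please set max locked memory "
    "(ulimit -l) to 'unlimited' (current: 8192 kbytes)"
)


def memlock_shortfall(log_line: str) -> tuple[int, int]:
    """Return (requested_bytes, limit_bytes) parsed from a UCX ibv_reg_mr error."""
    requested = int(re.search(r"length=(\d+)", log_line).group(1))
    limit_kib = int(re.search(r"current: (\d+) kbytes", log_line).group(1))
    return requested, limit_kib * 1024


requested, limit = memlock_shortfall(LOG_LINE)
print(f"requested {requested // 2**20} MiB, limit {limit // 2**20} MiB")
```

The registration asks for 36 MiB of locked memory while the limit is 8 MiB, so the request cannot fit under the current `ulimit -l`.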

Contributor

boegel commented Apr 24, 2026

@casparvl The icelake cc80 build with the Surf bot failed because of:

[1777032154.196060] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)
[1777032154.196107] [gcn12:54944:0]           mpool.c:269  UCX  ERROR Failed to allocate memory pool (name=rc_recv_desc) chunk: Input/output error
[1777032154.196266] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)

Have you encountered this before?

ulimit -l being set to 8MB causing trouble doesn't seem too crazy to me...

Maybe we just need to add ulimit -l unlimited to the bot build (job) script?
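Before changing the job script, it may help to confirm what limit a process actually sees inside the job. A minimal sketch using Python's standard `resource` module (Unix only); the helper names are hypothetical, and the output is formatted in kbytes to mirror `ulimit -l`.

```python
import resource


def describe_memlock(soft: int) -> str:
    """Format a RLIMIT_MEMLOCK soft limit (in bytes) the way `ulimit -l` reports it."""
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{soft // 1024} kbytes"


def current_memlock() -> str:
    """Read the soft locked-memory limit of the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return describe_memlock(soft)


if __name__ == "__main__":
    print("max locked memory:", current_memlock())
```

Running this inside the build job would show whether the 8192 kbytes limit from the UCX message applies there, or whether it only appears in some later step.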

