{2025.06}[2025b] GROMACS 2025.4 with CUDA-12.9.1 #1482

Draft
bedroge wants to merge 3 commits into EESSI:main from bedroge:gromacs_2025.3_cuda

bedroge (Collaborator) commented Apr 22, 2026

Requires:

9 out of 87 required modules missing:

* Catch2/2.13.10-GCCcore-14.3.0 (Catch2-2.13.10-GCCcore-14.3.0.eb)
* gfbf/2025b (gfbf-2025b.eb)
* hypothesis/6.136.6-GCCcore-14.3.0 (hypothesis-6.136.6-GCCcore-14.3.0.eb)
* spin/0.14-GCCcore-14.3.0 (spin-0.14-GCCcore-14.3.0.eb)
* pybind11/3.0.0-GCC-14.3.0 (pybind11-3.0.0-GCC-14.3.0.eb)
* SciPy-bundle/2025.07-gfbf-2025b (SciPy-bundle-2025.07-gfbf-2025b.eb)
* networkx/3.5-gfbf-2025b (networkx-3.5-gfbf-2025b.eb)
* mpi4py/4.1.0-gompi-2025b (mpi4py-4.1.0-gompi-2025b.eb)
* GROMACS/2025.3-foss-2025b-CUDA-12.9.1 (GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb)

bedroge added the accel:nvidia and 2025.06-software.eessi.io (2025.06 version of software.eessi.io) labels on Apr 22, 2026

bedroge (Collaborator, Author) commented Apr 23, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-rug for:arch=x86_64/amd/zen5,accel=nvidia/cc120

eessi-bot-rug (Bot) commented Apr 23, 2026

New job on instance eessi-bot-rug for repository eessi.io-2025.06-software
Building on: amd-zen5 and accelerator nvidia/cc120
Building for: x86_64/amd/zen5 and accelerator nvidia/cc120
Job dir: /scratch/hb-eessibot/SHARED/jobs/2026.04/pr_1482/28614014

date | job status | comment
Apr 23 08:55:57 UTC 2026 | submitted | job id 28614014 awaits release by job manager
Apr 23 08:56:57 UTC 2026 | released | job awaits launch by Slurm scheduler
Apr 23 09:19:02 UTC 2026 | running | job 28614014 is running
Apr 23 09:49:33 UTC 2026 | finished | 😢 FAILURE
Details
✅ job output file slurm-28614014.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen5-accel-nvidia-cc120-17769376410.tar.zst size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120
no other files in tarball
Apr 23 09:49:33 UTC 2026 | test result | 😁 SUCCESS
ReFrame Summary
[ SKIP ] ( 1/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node %device_type=gpu /b88eedf0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 2/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node %device_type=gpu /8c8bf48b @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 3/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node %device_type=gpu /6d7a17a9 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 4/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node %device_type=gpu /e5a16ba0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 5/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node %device_type=gpu /634d019c @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 6/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node %device_type=gpu /e9b09ad8 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 7/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node /b1ea69c1 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 8/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node /a317b8da @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 9/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node /a102bba0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (10/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node /7bd54429 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (11/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node /84994f87 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (12/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node /d58e51e9 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ PASSED ] Ran 0/12 test case(s) from 12 check(s) (0 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-28614014.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

bedroge (Collaborator, Author) commented Apr 23, 2026

The build succeeds, but the CUDA sanity check fails:

== 2026-04-23 11:47:13,762 easyblock.py:3849 INFO CUDA sanity check detailed report:
12 files missing one or more CUDA compute capabilities:
  lib/libgromacs.so.10.0.0
  lib/libgromacs.so.10
  lib/libgromacs.so
  lib/libgromacs_mpi.so.10.0.0
  lib/libgromacs_mpi.so.10
  lib/libgromacs_mpi.so
  lib64/libgromacs.so.10.0.0
  lib64/libgromacs.so.10
  lib64/libgromacs.so
  lib64/libgromacs_mpi.so.10.0.0
  lib64/libgromacs_mpi.so.10
  lib64/libgromacs_mpi.so
12 files with device code for more CUDA Compute Capabilities than requested:
  lib/libgromacs.so.10.0.0
  lib/libgromacs.so.10
  lib/libgromacs.so
  lib/libgromacs_mpi.so.10.0.0
  lib/libgromacs_mpi.so.10
  lib/libgromacs_mpi.so
  lib64/libgromacs.so.10.0.0
  lib64/libgromacs.so.10
  lib64/libgromacs.so
  lib64/libgromacs_mpi.so.10.0.0
  lib64/libgromacs_mpi.so.10
  lib64/libgromacs_mpi.so
12 files missing PTX code for the highest configured CUDA Compute Capability:
  lib/libgromacs.so.10.0.0
  lib/libgromacs.so.10
  lib/libgromacs.so
  lib/libgromacs_mpi.so.10.0.0
  lib/libgromacs_mpi.so.10
  lib/libgromacs_mpi.so
  lib64/libgromacs.so.10.0.0
  lib64/libgromacs.so.10
  lib64/libgromacs.so
  lib64/libgromacs_mpi.so.10.0.0
  lib64/libgromacs_mpi.so.10
  lib64/libgromacs_mpi.so

I suspect this is related to the 120f target we're using, since the binaries do contain device code for sm_120:

Fatbin elf code:
================
arch = sm_120
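One hypothesis that would explain all three messages at once: if the requested compute capability is the family-specific 12.0f, while cuobjdump reports the embedded code as plain sm_120, a check that compares the two as strings would report 12.0f as missing, 12.0 as more than requested, and no PTX for the highest requested capability. The sketch below only illustrates that mismatch. The function names are hypothetical, and I haven't verified that EasyBuild's CUDA sanity check actually works this way.

```python
# Hypothetical illustration, NOT EasyBuild's implementation. Assumption: the
# requested CC list contains the family-specific target "12.0f", while
# cuobjdump reports the embedded ELF code as plain "sm_120".

def cc_from_arch(arch: str) -> str:
    """Turn 'sm_120' / 'compute_120f' into '12.0' / '12.0f'."""
    digits = arch.removeprefix("sm_").removeprefix("compute_")
    suffix = ""
    if digits[-1].isalpha():  # 'a' (arch-specific) or 'f' (family-specific)
        digits, suffix = digits[:-1], digits[-1]
    return f"{digits[:-1]}.{digits[-1]}{suffix}"

def compare_ccs(requested, elf_archs, ptx_archs):
    """Plain set comparison of requested vs. embedded compute capabilities."""
    found = {cc_from_arch(a) for a in elf_archs}
    ptx = {cc_from_arch(a) for a in ptx_archs}
    missing = sorted(set(requested) - found)
    extra = sorted(found - set(requested))
    # Order capabilities numerically, ignoring an 'a'/'f' suffix
    highest = max(requested, key=lambda cc: float(cc.rstrip("af")))
    return missing, extra, highest not in ptx

# Even if PTX for compute_120 were embedded (an assumption), the suffix alone
# trips all three checks:
print(compare_ccs(["12.0f"], ["sm_120"], ["compute_120"]))
# (['12.0f'], ['12.0'], True)
```

With a requested list of ["12.0"] instead, the same comparison reports nothing missing, nothing extra and PTX present.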

bedroge changed the title from {2025.06}[2025b] GROMACS 2025.3 with CUDA-12.9.1 to {2025.06}[2025b] GROMACS 2025.4 with CUDA-12.9.1 on Apr 24, 2026

bedroge (Collaborator, Author) commented Apr 24, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-rug for:arch=x86_64/amd/zen5,accel=nvidia/cc120

eessi-bot-rug (Bot) commented Apr 24, 2026

New job on instance eessi-bot-rug for repository eessi.io-2025.06-software
Building on: amd-zen5 and accelerator nvidia/cc120
Building for: x86_64/amd/zen5 and accelerator nvidia/cc120
Job dir: /scratch/hb-eessibot/SHARED/jobs/2026.04/pr_1482/28630885

date | job status | comment
Apr 24 07:43:34 UTC 2026 | submitted | job id 28630885 awaits release by job manager
Apr 24 07:44:50 UTC 2026 | released | job awaits launch by Slurm scheduler
Apr 24 07:46:53 UTC 2026 | running | job 28630885 is running
Apr 24 08:17:23 UTC 2026 | finished | 😁 SUCCESS
Details
✅ job output file slurm-28630885.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen5-accel-nvidia-cc120-17770184860.tar.zst size: 32 MiB (33916242 bytes)
entries: 760
modules under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_081438UTC
other under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120
no other files in tarball
Apr 24 08:17:23 UTC 2026 | test result | 😁 SUCCESS
ReFrame Summary
[ SKIP ] ( 1/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node %device_type=gpu /b88eedf0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 2/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node %device_type=gpu /8c8bf48b @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 3/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node %device_type=gpu /6d7a17a9 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 4/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node %device_type=gpu /e5a16ba0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 5/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node %device_type=gpu /634d019c @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 6/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node %device_type=gpu /e9b09ad8 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 7/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node /b1ea69c1 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 8/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node /a317b8da @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 9/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node /a102bba0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (10/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node /7bd54429 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (11/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node /84994f87 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (12/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node /d58e51e9 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ PASSED ] Ran 0/12 test case(s) from 12 check(s) (0 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-28630885.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

bedroge (Collaborator, Author) commented Apr 24, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/intel/cascadelake,accel=nvidia/cc70
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/amd/zen3,accel=nvidia/cc80
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc for:arch=aarch64/nvidia/grace,accel=nvidia/cc90
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-rug for:arch=x86_64/intel/skylake_avx512,accel=nvidia/cc70
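All six build commands follow the same shape: a repo: field, an instance: field, and a for: field whose value is a comma-separated list of key=value pairs. A minimal parser, inferred only from the commands in this thread (the EESSI bot's real parser may accept a richer syntax, and the function name is hypothetical), could look like this:

```python
def parse_build_command(line: str) -> dict:
    """Split a 'bot: build ...' comment into its fields (hypothetical grammar)."""
    prefix = "bot: build "
    if not line.startswith(prefix):
        raise ValueError(f"not a bot build command: {line!r}")
    fields = {}
    for token in line[len(prefix):].split():
        key, _, value = token.partition(":")
        if key == "for":
            # 'for:' carries comma-separated key=value pairs (arch=..., accel=...)
            fields["for"] = dict(pair.split("=", 1) for pair in value.split(","))
        else:
            fields[key] = value
    return fields

cmd = ("bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc "
       "for:arch=aarch64/nvidia/grace,accel=nvidia/cc90")
print(parse_build_command(cmd))
# {'repo': 'eessi.io-2025.06-software', 'instance': 'eessi-bot-jsc', 'for': {'arch': 'aarch64/nvidia/grace', 'accel': 'nvidia/cc90'}}
```

Each command maps to one build job on one instance, which is why six bot replies follow.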

eessi-bot-surf (Bot) commented Apr 24, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: intel-icelake and accelerator nvidia/cc80
Building for: x86_64/intel/icelake and accelerator nvidia/cc80
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.04/pr_1482/22223686

date | job status | comment
Apr 24 11:43:06 UTC 2026 | submitted | job id 22223686 will be eligible to start in about 20 seconds
Apr 24 11:43:20 UTC 2026 | received | job awaits launch by Slurm scheduler
Apr 24 11:45:00 UTC 2026 | running | job 22223686 is running
Apr 24 13:01:50 UTC 2026 | finished | 😢 FAILURE
Details
✅ job output file slurm-22223686.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-icelake-accel-nvidia-cc80-17770356230.tar.zst size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80
no other files in tarball
Apr 24 13:01:50 UTC 2026 | test result | 😁 SUCCESS
ReFrame Summary
[ SKIP ] ( 1/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node %device_type=gpu /15d6e239 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 2/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node %device_type=gpu /5471f15a @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 3/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /526cd259 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 4/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node %device_type=gpu /1dc400ef @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 5/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node %device_type=gpu /9715dde6 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 6/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /416eaee1 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 7/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node /ed938ed4 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 8/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node /8d24cea9 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 9/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /73a202f1 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (10/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node /946648aa @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (11/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node /9eb3f1e9 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (12/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /7f04eb2b @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ PASSED ] Ran 0/12 test case(s) from 12 check(s) (0 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-22223686.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot-surf (Bot) commented Apr 24, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: amd-zen4 and accelerator nvidia/cc90
Building for: x86_64/amd/zen4 and accelerator nvidia/cc90
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.04/pr_1482/22223689

date | job status | comment
Apr 24 11:43:12 UTC 2026 | submitted | job id 22223689 will be eligible to start in about 20 seconds
Apr 24 11:43:24 UTC 2026 | received | job awaits launch by Slurm scheduler
Apr 24 11:43:47 UTC 2026 | running | job 22223689 is running
Apr 25 11:44:10 UTC 2026 | finished | 🤷 UNKNOWN
  • Job results file _bot_job22223689.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Apr 25 11:44:10 UTC 2026 | test result | 🤷 UNKNOWN
  • Job test file _bot_job22223689.test does not exist in job directory, or parsing it failed.

gpu-bot-ugent (Bot) commented Apr 24, 2026

New job on instance eessi-bot-vsc-ugent for repository eessi.io-2025.06-software
Building on: intel-cascadelake and accelerator nvidia/cc70
Building for: x86_64/intel/cascadelake and accelerator nvidia/cc70
Job dir: /scratch/gent/vo/002/gvo00211/SHARED/jobs/2026.04/pr_1482/40819786

date | job status | comment
Apr 24 11:43:12 UTC 2026 | submitted | job id 40819786 awaits release by job manager
Apr 24 11:44:50 UTC 2026 | released | job awaits launch by Slurm scheduler
Apr 24 11:46:54 UTC 2026 | running | job 40819786 is running
Apr 24 13:09:46 UTC 2026 | finished | 😁 SUCCESS
Details
✅ job output file slurm-40819786.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-cascadelake-accel-nvidia-cc70-17770361340.tar.zst size: 31 MiB (32681382 bytes)
entries: 760
modules under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_130833UTC
other under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70
no other files in tarball
Apr 24 13:09:46 UTC 2026 | test result | 😢 FAILURE
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-40819786.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot-jsc (Bot) commented Apr 24, 2026

New job on instance eessi-bot-jsc for repository eessi.io-2025.06-software
Building on: nvidia-grace and accelerator nvidia/cc90
Building for: aarch64/nvidia/grace and accelerator nvidia/cc90
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2026.04/pr_1482/14684643

date | job status | comment
Apr 24 11:43:15 UTC 2026 | submitted | job id 14684643 awaits release by job manager
Apr 24 11:44:07 UTC 2026 | released | job awaits launch by Slurm scheduler
Apr 24 11:45:10 UTC 2026 | running | job 14684643 is running
Apr 24 12:39:58 UTC 2026 | finished | 😁 SUCCESS
Details
✅ job output file slurm-14684643.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc90-17770337320.tar.gz size: 32 MiB (34349669 bytes)
entries: 760
modules under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_122736UTC
other under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90
no other files in tarball
Apr 24 12:39:58 UTC 2026 | test result | 😢 FAILURE
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 18/30 test case(s) from 30 check(s) (4 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-14684643.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case
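To break down the JSC summary line (Ran 18/30 test case(s), 4 failure(s), 12 skipped), its fields can be pulled out with a regular expression. This is a minimal sketch: the pattern is inferred from the two summary formats that appear in this thread and may not match other ReFrame versions. It shows 14 passing, 4 failing and 12 skipped test cases, which adds up to the 30 checks.

```python
import re

# Pattern inferred from the summary lines in this thread; other ReFrame
# versions may format the summary differently.
SUMMARY_RE = re.compile(
    r"\[\s*(?P<status>PASSED|FAILED)\s*\] Ran (?P<ran>\d+)/(?P<total>\d+) test case\(s\) "
    r"from (?P<checks>\d+) check\(s\) \((?P<failures>\d+) failure\(s\), "
    r"(?P<skipped>\d+) skipped, (?P<aborted>\d+) aborted\)"
)

def parse_summary(line: str):
    """Return the summary fields as a dict, or None if the line does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    fields = {k: int(v) for k, v in match.groupdict().items() if k != "status"}
    fields["status"] = match["status"]
    fields["passed"] = fields["ran"] - fields["failures"]
    return fields

jsc = parse_summary("[ FAILED ] Ran 18/30 test case(s) from 30 check(s) "
                    "(4 failure(s), 12 skipped, 0 aborted)")
print(jsc["passed"], jsc["failures"], jsc["skipped"])  # 14 4 12
```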

eessi-bot-rug (Bot) commented Apr 24, 2026

New job on instance eessi-bot-rug for repository eessi.io-2025.06-software
Building on: intel-skylake_avx512 and accelerator nvidia/cc70
Building for: x86_64/intel/skylake_avx512 and accelerator nvidia/cc70
Job dir: /scratch/hb-eessibot/SHARED/jobs/2026.04/pr_1482/28636437

date | job status | comment
Apr 24 11:43:16 UTC 2026 | submitted | job id 28636437 awaits release by job manager
Apr 24 11:44:00 UTC 2026 | released | job awaits launch by Slurm scheduler
Apr 24 12:08:04 UTC 2026 | running | job 28636437 is running
Apr 24 13:09:06 UTC 2026 | finished | 😁 SUCCESS
Details
✅ job output file slurm-28636437.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-skylake_avx512-accel-nvidia-cc70-17770359480.tar.zst size: 31 MiB (32689771 bytes)
entries: 760
modules under 2025.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc70/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc70/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc70/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_130523UTC
other under 2025.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc70
no other files in tarball
Apr 24 13:09:06 UTC 2026 | test result | 😁 SUCCESS
ReFrame Summary
[ SKIP ] ( 1/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_2_node %device_type=gpu /495ccd0c @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 2/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_2_node %device_type=gpu /61fda20d @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 3/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_2_node %device_type=gpu /e3d4ae3b @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 4/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_2_node %device_type=gpu /ce7fe725 @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 5/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_2_node %device_type=gpu /5c339fc9 @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 6/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_2_node %device_type=gpu /b4bd1071 @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 7/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_2_node /c3881e1d @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 8/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_2_node /5f02f86c @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 9/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_2_node /530b49da @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (10/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_2_node /f49f730d @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (11/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_2_node /c412ac42 @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (12/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_2_node /18861056 @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ PASSED ] Ran 0/12 test case(s) from 12 check(s) (0 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-28636437.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

gpu-bot-ugent (Bot) commented Apr 24, 2026

New job on instance eessi-bot-vsc-ugent for repository eessi.io-2025.06-software
Building on: amd-zen3 and accelerator nvidia/cc80
Building for: x86_64/amd/zen3 and accelerator nvidia/cc80
Job dir: /scratch/gent/vo/002/gvo00211/SHARED/jobs/2026.04/pr_1482/15689215

date | job status | comment
Apr 24 11:43:18 UTC 2026 | submitted | job id 15689215 awaits release by job manager
Apr 24 11:44:46 UTC 2026 | released | job awaits launch by Slurm scheduler
Apr 24 11:48:58 UTC 2026 | running | job 15689215 is running
Apr 24 12:55:31 UTC 2026 | finished | 😁 SUCCESS
Details
✅ job output file slurm-15689215.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen3-accel-nvidia-cc80-17770352040.tar.zst size: 34 MiB (36599032 bytes)
entries: 760
modules under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_125258UTC
other under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
no other files in tarball
Apr 24 12:55:31 UTC 2026 | test result | 😢 FAILURE
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-15689215.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

bedroge (Collaborator, Author) commented Apr 24, 2026

@casparvl The icelake cc80 build with the Surf bot failed because of:

[1777032154.196060] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)
[1777032154.196107] [gcn12:54944:0]           mpool.c:269  UCX  ERROR Failed to allocate memory pool (name=rc_recv_desc) chunk: Input/output error
[1777032154.196266] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)

Have you encountered this before?

boegel (Contributor) commented Apr 24, 2026

> @casparvl The icelake cc80 build with the Surf bot failed because of:
>
> [1777032154.196060] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)
> [1777032154.196107] [gcn12:54944:0]           mpool.c:269  UCX  ERROR Failed to allocate memory pool (name=rc_recv_desc) chunk: Input/output error
> [1777032154.196266] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)
>
> Have you encountered this before?

ulimit -l being set to 8MB causing trouble doesn't seem too crazy to me...

Maybe we just need to add ulimit -l unlimited to bot build (job) script?
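For reference, the failing registration is 37748736 bytes (36 MiB), well above the 8192 KiB (8 MiB) limit UCX reports. Below is a minimal Python sketch for inspecting and raising that limit from inside a job, using only the standard `resource` module; the helper names are hypothetical. One caveat: an unprivileged process can only raise its soft limit up to the hard limit, so `ulimit -l unlimited` in the bot's job script would only help if the hard limit on those nodes is already unlimited.

```python
import resource
from typing import Tuple


def describe(limit: int) -> str:
    """Format an RLIMIT_MEMLOCK value (bytes) the way `ulimit -l` reports it."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} kbytes"


def try_raise_memlock() -> Tuple[int, int]:
    """Raise the soft memlock limit to the hard limit if they differ.

    Unprivileged processes may only raise the soft limit up to the hard
    limit, so this cannot exceed what the node/Slurm configuration allows.
    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_MEMLOCK, (hard, hard))
        except (ValueError, OSError):
            pass  # not permitted here; keep the current limits
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def fits_in_memlock(nbytes: int) -> bool:
    """Whether a single registration of nbytes fits under the soft limit.

    Locked memory is accounted per process across all registrations, so
    passing this check is necessary but not sufficient.
    """
    soft, _ = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY or nbytes <= soft


if __name__ == "__main__":
    soft, hard = try_raise_memlock()
    print(f"max locked memory: soft={describe(soft)}, hard={describe(hard)}")
    # Size of the failing ibv_reg_mr() call in the UCX log (36 MiB)
    print("37748736-byte registration fits:", fits_in_memlock(37748736))
```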

@bedroge

bedroge commented May 1, 2026

Collaborator Author

> > @casparvl The icelake cc80 build with the Surf bot failed because of:
> >
> > [1777032154.196060] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)
> > [1777032154.196107] [gcn12:54944:0]           mpool.c:269  UCX  ERROR Failed to allocate memory pool (name=rc_recv_desc) chunk: Input/output error
> > [1777032154.196266] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)
> >
> > Have you encountered this before?
>
> ulimit -l being set to 8MB causing trouble doesn't seem too crazy to me...
>
> Maybe we just need to add ulimit -l unlimited to bot build (job) script?

I tried various things with an interactive job on Snellius, but ulimit -l always printed unlimited. Looking at the logs and the job details again, I found that the job itself is also reported as OUT_OF_MEMORY, so maybe that's the real issue here. sacct shows that the job requested 120GB though, which should be more than enough for GROMACS? Also, MaxRSS is 15916602K, i.e. ~16GB, so I don't understand this... From what I can see in the logs, it doesn't seem to use /dev/shm either.

edit: the zen4 job also ran out of memory according to Slurm, but somehow kept running and then timed out after a day.

@casparvl do you have any idea what's going on?
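To compare the sacct figures directly, here is a small sketch that converts them to bytes. It assumes sacct's suffixes are binary multiples (K = 1024 bytes), which is worth double-checking, and it does not handle the per-node/per-CPU `n`/`c` suffixes some Slurm versions append to ReqMem; `parse_sacct_mem()` and `usage_fraction()` are hypothetical helpers. Read that way, MaxRSS is about 15.2 GiB, roughly 13% of a 120G request. Keep in mind that MaxRSS is the peak resident set of the largest task; as far as I know it does not include pages held by files in a RAM-backed filesystem, which can still count against the job's memory cgroup.

```python
import re

# Assumption: sacct memory suffixes are binary multiples (K = 1024 bytes).
_FACTORS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_sacct_mem(value: str) -> int:
    """Convert a sacct memory field such as '15916602K' or '120G' to bytes."""
    m = re.fullmatch(r"\s*([0-9]+(?:\.[0-9]+)?)([KMGT]?)\s*", value)
    if m is None:
        raise ValueError(f"unrecognised memory value: {value!r}")
    number, suffix = m.groups()
    return int(float(number) * _FACTORS[suffix])


def usage_fraction(max_rss: str, req_mem: str) -> float:
    """Fraction of the requested memory reached by the peak RSS."""
    return parse_sacct_mem(max_rss) / parse_sacct_mem(req_mem)


if __name__ == "__main__":
    rss = parse_sacct_mem("15916602K")
    print(f"MaxRSS: {rss / 1024**3:.1f} GiB")  # 15.2 GiB
    print(f"fraction of 120G used: {usage_fraction('15916602K', '120G'):.0%}")  # 13%
```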

@casparvl

casparvl commented May 4, 2026

Collaborator

The only thing I can think of: these nodes don't have local disks, so /tmp is essentially also in memory. Could it be that we are writing a lot there? I mean, it'd have to be a whole lot.
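One way to test the RAM-backed /tmp theory is to check what /tmp is mounted on, and how full it is, while the build runs. The sketch below reads /proc/mounts (Linux only) and uses `shutil.disk_usage`; `mount_for()` and `report_tmp()` are hypothetical helpers, and escaped characters in mount paths are not decoded.

```python
import os
import shutil
from typing import Iterable, Optional, Tuple


def mount_for(path: str, mount_lines: Optional[Iterable[str]] = None) -> Tuple[str, str]:
    """Return (mount point, filesystem type) of the mount containing path.

    mount_lines defaults to the contents of /proc/mounts. Escaped characters
    in mount points (e.g. '\\040' for a space) are not decoded.
    """
    if mount_lines is None:
        with open("/proc/mounts") as f:
            mount_lines = f.read().splitlines()
    path = os.path.realpath(path)
    best_mnt, best_type = "", "unknown"
    for line in mount_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        mnt, fstype = fields[1], fields[2]
        prefix = mnt.rstrip("/") + "/"
        # Longest matching mount point wins; later entries shadow earlier ones.
        if (path == mnt or path.startswith(prefix)) and len(mnt) >= len(best_mnt):
            best_mnt, best_type = mnt, fstype
    return best_mnt or "/", best_type


def report_tmp(path: str = "/tmp") -> None:
    """Print the filesystem type of path and how much of that filesystem is used."""
    mnt, fstype = mount_for(path)
    used_gib = shutil.disk_usage(path).used / 1024**3
    ram = " (RAM-backed)" if fstype in ("tmpfs", "ramfs") else ""
    print(f"{path}: mounted on {mnt}, type {fstype}{ram}, {used_gib:.1f} GiB used")


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    report_tmp()
```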

@casparvl

casparvl commented May 4, 2026

Collaborator

> edit: the zen4 job also ran out of memory according to Slurm, but somehow kept running and then timed out after a day.

I've seen this happen before. If you have, say, 3 processes running, OOM killer might kill one, leave 2 stray processes that just wait for the other one to do something. And that then runs indefinitely. SLURM doesn't end the job, since you still have running processes.
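The failure mode described above can be illustrated with a toy supervisor: if one of several cooperating processes dies abnormally (for example, SIGKILL from the OOM killer), the survivors are terminated instead of being left to wait forever. This is only a sketch with hypothetical names, not how Slurm, mpirun, or the bot actually behave; the usage example uses Python one-liners as stand-ins for real workers.

```python
import signal
import subprocess
import sys
import time
from typing import List


def run_all_or_none(cmds: List[List[str]], poll_interval: float = 0.1) -> List[int]:
    """Start all commands together. If any exits with a non-zero status
    (e.g. killed by SIGKILL from the OOM killer), send SIGTERM to the ones
    still running instead of letting them wait forever.
    Returns the exit codes; a negative value -N means "killed by signal N"."""
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    try:
        while True:
            codes = [p.poll() for p in procs]
            if any(c is not None and c != 0 for c in codes):
                for p in procs:
                    if p.poll() is None:
                        p.send_signal(signal.SIGTERM)
                break
            if all(c == 0 for c in codes):
                break
            time.sleep(poll_interval)
    finally:
        # Reap everything; escalate to SIGKILL if SIGTERM is ignored.
        for p in procs:
            try:
                p.wait(timeout=10)
            except subprocess.TimeoutExpired:
                p.kill()
                p.wait()
    return [p.returncode for p in procs]


if __name__ == "__main__":
    victim = [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    waiter = [sys.executable, "-c", "import time; time.sleep(3600)"]
    print(run_all_or_none([victim, waiter, waiter]))
```

With the stand-in commands, the first exit code should be -9 (killed by SIGKILL) and the other two -15 (terminated by the supervisor), rather than the job hanging until it times out.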

@bedroge

bedroge commented May 8, 2026

Collaborator Author

I've done an interactive build on an A100 node on Snellius with my personal account and on top of EESSI (without a container), that worked fine:

== Build succeeded for 1 out of 1 (total: 30 mins 20 secs)
== Summary:
   * [SUCCESS] GROMACS/2025.4-foss-2025b-CUDA-12.9.1

No memory issues, and the max memory usage was like ~4GB. I'll do another one with the container.

@bedroge

bedroge commented May 8, 2026

Collaborator Author

Let me just try this again as well:

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80

@eessi-bot-surf

eessi-bot-surf Bot commented May 8, 2026


New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: intel-icelake and accelerator nvidia/cc80
Building for: x86_64/intel/icelake and accelerator nvidia/cc80
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.05/pr_1482/22588119

date job status comment
May 08 14:27:42 UTC 2026 submitted job id 22588119 will be eligible to start in about 20 seconds

@casparvl

Collaborator

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80

@casparvl

Collaborator

Hmmm, something is wrong. We changed some things in the bot config in our config management system, but for some reason it concludes it shouldn't submit a job based on the above commands. Will dig into why...

@ocaisa

ocaisa commented May 13, 2026

Member

@bedroge I don't know if you are looking at adding the hwloc support to this, but just to mention you would also need a dependency on https://github.com/easybuilders/easybuild-easyconfigs/tree/develop/easybuild/easyconfigs/h/hwloc-CUDA in that case

@bedroge

bedroge commented Jun 5, 2026

Collaborator Author

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc on:arch=aarch64/nvidia/grace for:arch=aarch64/nvidia/grace,accel=nvidia/cc70

@eessi-bot-jsc

eessi-bot-jsc Bot commented Jun 5, 2026


New job on instance eessi-bot-jsc for repository eessi.io-2025.06-software
Building on: nvidia-grace
Building for: aarch64/nvidia/grace and accelerator nvidia/cc70
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2026.06/pr_1482/14864138

date job status comment
Jun 05 19:02:07 UTC 2026 submitted job id 14864138 awaits release by job manager
Jun 05 19:03:05 UTC 2026 released job awaits launch by Slurm scheduler
Jun 05 19:04:12 UTC 2026 running job 14864138 is running
Jun 05 19:43:26 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-14864138.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc70-17806881730.tar.gz size: 0 MiB (45 bytes)
entries: 0
modules under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/modules/all
no module files in tarball
software under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/software
no software packages in tarball
reprod directories under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70
no other files in tarball
Jun 05 19:43:26 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 17/29 test case(s) from 29 check(s) (4 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-14864138.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case

@bedroge

bedroge commented Jun 5, 2026

Collaborator Author

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc on:arch=aarch64/nvidia/grace for:arch=aarch64/nvidia/grace,accel=nvidia/cc70

@eessi-bot-jsc

eessi-bot-jsc Bot commented Jun 5, 2026


New job on instance eessi-bot-jsc for repository eessi.io-2025.06-software
Building on: nvidia-grace
Building for: aarch64/nvidia/grace and accelerator nvidia/cc70
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2026.06/pr_1482/14864254

date job status comment
Jun 05 20:49:54 UTC 2026 submitted job id 14864254 awaits release by job manager
Jun 05 20:51:01 UTC 2026 released job awaits launch by Slurm scheduler
Jun 05 20:52:15 UTC 2026 running job 14864254 is running
Jun 05 21:31:11 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-14864254.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc70-17806946890.tar.gz size: 0 MiB (45 bytes)
entries: 0
modules under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/modules/all
no module files in tarball
software under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/software
no software packages in tarball
reprod directories under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70
no other files in tarball
Jun 05 21:31:11 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 17/29 test case(s) from 29 check(s) (4 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-14864254.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case

@bedroge

bedroge commented Jun 5, 2026

Collaborator Author

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc on:arch=aarch64/nvidia/grace for:arch=aarch64/nvidia/grace,accel=nvidia/cc70

@eessi-bot-jsc

eessi-bot-jsc Bot commented Jun 5, 2026


New job on instance eessi-bot-jsc for repository eessi.io-2025.06-software
Building on: nvidia-grace
Building for: aarch64/nvidia/grace and accelerator nvidia/cc70
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2026.06/pr_1482/14864256

date job status comment
Jun 05 20:51:54 UTC 2026 submitted job id 14864256 awaits release by job manager
Jun 05 20:52:08 UTC 2026 released job awaits launch by Slurm scheduler
Jun 05 20:53:22 UTC 2026 running job 14864256 is running
Jun 05 21:32:14 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-14864256.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc70-17806947530.tar.gz size: 0 MiB (45 bytes)
entries: 0
modules under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/modules/all
no module files in tarball
software under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/software
no software packages in tarball
reprod directories under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70
no other files in tarball
Jun 05 21:32:14 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 17/29 test case(s) from 29 check(s) (4 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-14864256.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case

@bedroge

bedroge commented Jun 8, 2026

Collaborator Author

The build for Grace + compute capability 7.0 keeps failing with:

[       OK ] PropagatorsWithConstraints/PeriodicActionsTest.PeriodicActionsAgreeWithReference/12 (480 ms)
[----------] 13 tests from PropagatorsWithConstraints/PeriodicActionsTest (6287 ms total)

[----------] Global test environment tear-down
[==========] 13 tests from 1 test suite ran. (6406 ms total)
[  PASSED  ] 12 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] PropagatorsWithConstraints/PeriodicActionsTest.PeriodicActionsAgreeWithReference/11, where GetParam() = ({ ("comm-mode", "linear"), ("integrator", "md-vv"), ("maxGromppWarningsTolerated", "0"), ("nstcomm", "5"), ("nstpcouple", "3"), ("nsttcouple", "2"), ("pcoupl", "no"), ("simulationName", "tip3p5"), ("tcoupl", "v-rescale") }, 0x40b7c4)

 1 FAILED TEST
Opened /tmp/eessibot/easybuild/build/GROMACS/2025.4/foss-2025b-CUDA-12.9.1/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithConstraints_PeriodicActionsTest_PeriodicActionsAgreeWithReference_11_reference.edr as single precision energy file
Opened /tmp/eessibot/easybuild/build/GROMACS/2025.4/foss-2025b-CUDA-12.9.1/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithConstraints_PeriodicActionsTest_PeriodicActionsAgreeWithReference_11.edr as single precision energy file
Reading energy frame      6 time    0.006
/tmp/eessibot/easybuild/build/GROMACS/2025.4/foss-2025b-CUDA-12.9.1/gromacs-2025.4/src/programs/mdrun/tests/energycomparison.cpp:124: Failure
  Value of: energyValueInTest
    Actual: -26.620468139648438
  Expected: energyValueInReference
  Which is: -26.583398818969727
Difference: 0.0370693 (19435 single-prec. ULPs, rel. 0.00139)
 Tolerance: abs. 0.0214577, 18000 ULPs
Google Test trace:
/tmp/eessibot/easybuild/build/GROMACS/2025.4/foss-2025b-CUDA-12.9.1/gromacs-2025.4/src/programs/mdrun/tests/energycomparison.cpp:118: Comparing Pressure between frames
/tmp/eessibot/easybuild/build/GROMACS/2025.4/foss-2025b-CUDA-12.9.1/gromacs-2025.4/src/programs/mdrun/tests/energycomparison.cpp:113: Comparing energy reference frame Time 0.006000 Step 6 and test frame Time 0.006000 Step 6
/tmp/eessibot/easybuild/build/GROMACS/2025.4/foss-2025b-CUDA-12.9.1/gromacs-2025.4/src/programs/mdrun/tests/periodicactions.cpp:171: Found frame from reference file named Time 0.006000 Step 6
/tmp/eessibot/easybuild/build/GROMACS/2025.4/foss-2025b-CUDA-12.9.1/gromacs-2025.4/src/programs/mdrun/tests/periodicactions.cpp:161: Found frame from test file named Time 0.006000 Step 6
/tmp/eessibot/easybuild/build/GROMACS/2025.4/foss-2025b-CUDA-12.9.1/gromacs-2025.4/src/programs/mdrun/tests/periodicactions.cpp:214: Comparing energy frames from reference '/tmp/eessibot/easybuild/build/GROMACS/2025.4/foss-2025b-CUDA-12.9.1/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithConstraints_PeriodicActionsTest_PeriodicActionsAgreeWithReference_11_reference.edr' and test '/tmp/eessibot/easybuild/build/GROMACS/2025.4/foss-2025b-CUDA-12.9.1/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithConstraints_PeriodicActionsTest_PeriodicActionsAgreeWithReference_11.edr'
/tmp/eessibot/easybuild/build/GROMACS/2025.4/foss-2025b-CUDA-12.9.1/gromacs-2025.4/src/programs/mdrun/tests/periodicactions.cpp:197: Comparing to observe energies every step works
/tmp/eessibot/easybuild/build/GROMACS/2025.4/foss-2025b-CUDA-12.9.1/gromacs-2025.4/src/programs/mdrun/tests/periodicactions.cpp:188: Comparing two simulations of 'tip3p5' with integrator 'md-vv'
Last energy frame read 16 time    0.016
NOTE 1 [file /tmp/eessibot/easybuild/build/GROMACS/2025.4/foss-2025b-CUDA-12.9.1/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithConstraints_PeriodicActionsTest_PeriodicActionsAgreeWithReference_11_input.mdp]:
  With Verlet lists the optimal nstlist is >= 10, with GPUs >= 20. Note
  that with the Verlet scheme, nstlist has no effect on the accuracy of
  your simulation.

Something similar initially happened for a few other targets, including Grace + CC 8.0 and Grace + CC 9.0, but a subsequent attempt worked fine. The CC 7.0 build keeps failing, though; I've tried it many times...

@al42and sorry to ping you again, but would you happen to know what could be wrong here?
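The numbers in the failing assertion can be reproduced with a float32 ULP calculation. This sketch assumes a value passes if it is within either the absolute or the ULP bound (GROMACS's own tolerance logic may apply further criteria); `ulp_distance()` and `within_tolerance()` are hypothetical helpers. It shows the pressure value misses the ULP bound by about 8% (19435 vs 18000), a small deviation rather than a grossly wrong result.

```python
import struct


def f32_key(x: float) -> int:
    """Map x (rounded to float32) to an integer so adjacent float32 values differ by 1."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # float32 uses sign-magnitude; negate negative values so the key is monotonic
    return -(bits & 0x7FFFFFFF) if bits & 0x80000000 else bits


def ulp_distance(a: float, b: float) -> int:
    """Number of float32 steps between a and b."""
    return abs(f32_key(a) - f32_key(b))


def within_tolerance(actual: float, expected: float, abs_tol: float, ulp_tol: int) -> bool:
    """Assumed rule: accept if within the absolute OR the ULP tolerance."""
    return abs(actual - expected) <= abs_tol or ulp_distance(actual, expected) <= ulp_tol


if __name__ == "__main__":
    actual, expected = -26.620468139648438, -26.583398818969727  # Pressure, step 6
    print(f"difference: {abs(actual - expected):.7f}")  # 0.0370693
    print(f"ULPs:       {ulp_distance(actual, expected)}")  # 19435
    print("passes:", within_tolerance(actual, expected, 0.0214577, 18000))  # False
```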

@bedroge

bedroge commented Jun 9, 2026

Collaborator Author

I've tried this interactively, and it also failed in the same way. I then tried make check instead of make check -j 16 and that seems to work consistently. With -j 8 it also worked, and then suddenly -j16 also worked twice in a row, but the third and fourth time it failed again, and also with -j8. So I'm not sure what's going on. 🤷‍♂️
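To put a number on "sometimes fails" for each `-j` value, the same command can be rerun a fixed number of times and the failures counted. A minimal sketch; `failure_rate()` is a hypothetical helper, and which command to pass (for example a ctest invocation with `-j` and a `-R` test-name filter) is left to the user.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def failure_rate(cmd: List[str], runs: int = 10, cwd: Optional[str] = None) -> Tuple[int, int]:
    """Run cmd `runs` times and return (number of failed runs, total runs).

    A run counts as failed if the command exits with a non-zero status.
    Output is discarded to keep repeated runs quiet.
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(cmd, cwd=cwd,
                                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures += 1
    return failures, runs


if __name__ == "__main__":
    cmd = sys.argv[1:]
    if cmd:
        failed, total = failure_rate(cmd)
        print(f"{failed}/{total} runs failed: {' '.join(cmd)}")
    else:
        print(f"usage: {sys.argv[0]} COMMAND [ARGS...]")
```

Running it once per `-j` value from the build directory gives comparable failure counts, which makes it easier to tell whether parallelism really changes the failure rate or whether the test is simply flaky.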

@bedroge

bedroge commented Jun 9, 2026

Collaborator Author

Let's try again

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc on:arch=aarch64/nvidia/grace for:arch=aarch64/nvidia/grace,accel=nvidia/cc70

@eessi-bot-jsc

eessi-bot-jsc Bot commented Jun 9, 2026


New job on instance eessi-bot-jsc for repository eessi.io-2025.06-software
Building on: nvidia-grace
Building for: aarch64/nvidia/grace and accelerator nvidia/cc70
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2026.06/pr_1482/14878455

date job status comment
Jun 09 08:43:55 UTC 2026 submitted job id 14878455 awaits release by job manager
Jun 09 08:44:43 UTC 2026 released job awaits launch by Slurm scheduler
Jun 09 08:52:05 UTC 2026 running job 14878455 is running
Jun 09 09:40:48 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-14878455.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc70-17809972360.tar.gz size: 0 MiB (45 bytes)
entries: 0
modules under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/modules/all
no module files in tarball
software under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/software
no software packages in tarball
reprod directories under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70
no other files in tarball
Jun 09 09:40:48 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 17/29 test case(s) from 29 check(s) (4 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-14878455.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case

@bedroge

bedroge commented Jun 9, 2026

Collaborator Author

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc on:arch=aarch64/nvidia/grace for:arch=aarch64/nvidia/grace,accel=nvidia/cc70

@eessi-bot-jsc

eessi-bot-jsc Bot commented Jun 9, 2026


New job on instance eessi-bot-jsc for repository eessi.io-2025.06-software
Building on: nvidia-grace
Building for: aarch64/nvidia/grace and accelerator nvidia/cc70
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2026.06/pr_1482/14879470

date job status comment
Jun 09 09:29:35 UTC 2026 submitted job id 14879470 awaits release by job manager
Jun 09 09:30:27 UTC 2026 released job awaits launch by Slurm scheduler
Jun 09 09:40:44 UTC 2026 running job 14879470 is running
Jun 09 10:40:32 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-14879470.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc70-17810012350.tar.gz size: 0 MiB (45 bytes)
entries: 0
modules under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/modules/all
no module files in tarball
software under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/software
no software packages in tarball
reprod directories under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70
no other files in tarball
Jun 09 10:40:32 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 17/29 test case(s) from 29 check(s) (4 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-14879470.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case

@boegel

boegel commented Jun 9, 2026

Contributor

> I've tried this interactively, and it also failed in the same way. I then tried make check instead of make check -j 16 and that seems to work consistently. With -j 8 it also worked, and then suddenly -j16 also worked twice in a row, but the third and fourth time it failed again, and also with -j8. So I'm not sure what's going on. 🤷‍♂️

Could be related to available memory?

How long does a serial make check take?

@bedroge

bedroge commented Jun 9, 2026

Collaborator Author

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc on:arch=aarch64/nvidia/grace for:arch=aarch64/nvidia/grace,accel=nvidia/cc70

@eessi-bot-jsc

eessi-bot-jsc Bot commented Jun 9, 2026


New job on instance eessi-bot-jsc for repository eessi.io-2025.06-software
Building on: nvidia-grace
Building for: aarch64/nvidia/grace and accelerator nvidia/cc70
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2026.06/pr_1482/14880091

date job status comment
Jun 09 11:01:07 UTC 2026 submitted job id 14880091 awaits release by job manager
Jun 09 11:01:19 UTC 2026 released job awaits launch by Slurm scheduler
Jun 09 11:02:31 UTC 2026 running job 14880091 is running
Jun 09 11:41:36 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-14880091.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc70-17810049310.tar.gz size: 0 MiB (45 bytes)
entries: 0
modules under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/modules/all
no module files in tarball
software under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/software
no software packages in tarball
reprod directories under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70
no other files in tarball
Jun 09 11:41:36 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 17/29 test case(s) from 29 check(s) (4 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-14880091.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case

@bedroge

bedroge commented Jun 9, 2026

Collaborator Author

> Could be related to available memory?

I can't imagine that would be it, since this does not use a lot of memory, and I don't see any out-of-memory signs.

> How long does a serial make check take?

Not completely sure anymore, but I think it took a few minutes or so (less than 10?).

@boegel

boegel commented Jun 9, 2026

Contributor

> > Could be related to available memory?
>
> I can't imagine that would be it, since this does not use a lot of memory, and I don't see any out-of-memory signs.
>
> > How long does a serial make check take?
>
> Not completely sure anymore, but I think it took a few minutes or so (less than 10?).

Then maybe we should just run the tests serially? Doesn't seem worth the pain to run them in parallel?

@bedroge

bedroge commented Jun 9, 2026

Collaborator Author

> Then maybe we should just run the tests serially? Doesn't seem worth the pain to run them in parallel?

Trying that now with a slightly modified easyblock (in an interactive job), but it's taking longer than expected/before. I guess that's because in the previous attempt I ran it with -j 16 first, and then did the serial make check in an EB interactive shell after it had failed. That basically allowed it to skip the compilation steps that are part of the test step.

@bedroge

bedroge commented Jun 9, 2026

Collaborator Author

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc on:arch=aarch64/nvidia/grace for:arch=aarch64/nvidia/grace,accel=nvidia/cc70
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc on:arch=aarch64/nvidia/grace for:arch=aarch64/nvidia/grace,accel=nvidia/cc70

@eessi-bot-jsc

eessi-bot-jsc Bot commented Jun 9, 2026


New job on instance eessi-bot-jsc for repository eessi.io-2025.06-software
Building on: nvidia-grace
Building for: aarch64/nvidia/grace and accelerator nvidia/cc70
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2026.06/pr_1482/14899901

date job status comment
Jun 09 18:26:14 UTC 2026 submitted job id 14899901 awaits release by job manager
Jun 09 18:27:23 UTC 2026 released job awaits launch by Slurm scheduler
Jun 09 18:28:30 UTC 2026 running job 14899901 is running
Jun 09 19:09:59 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-14899901.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc70-17810317810.tar.gz size: 0 MiB (45 bytes)
entries: 0
modules under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/modules/all
no module files in tarball
software under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/software
no software packages in tarball
reprod directories under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70
no other files in tarball
Jun 09 19:09:59 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 17/29 test case(s) from 29 check(s) (4 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-14899901.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-jsc

eessi-bot-jsc Bot commented Jun 9, 2026


New job on instance eessi-bot-jsc for repository eessi.io-2025.06-software
Building on: nvidia-grace
Building for: aarch64/nvidia/grace and accelerator nvidia/cc70
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2026.06/pr_1482/14899902

date job status comment
Jun 09 18:26:20 UTC 2026 submitted job id 14899902 awaits release by job manager
Jun 09 18:27:15 UTC 2026 released job awaits launch by Slurm scheduler
Jun 09 18:28:37 UTC 2026 running job 14899902 is running
Jun 09 19:08:55 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-14899902.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc70-17810316870.tar.gz size: 0 MiB (45 bytes)
entries: 0
modules under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/modules/all
no module files in tarball
software under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/software
no software packages in tarball
reprod directories under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70
no other files in tarball
Jun 09 19:08:55 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 17/29 test case(s) from 29 check(s) (4 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-14899902.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case

@bedroge

bedroge commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc on:arch=aarch64/nvidia/grace for:arch=aarch64/nvidia/grace,accel=nvidia/cc70
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc on:arch=aarch64/nvidia/grace for:arch=aarch64/nvidia/grace,accel=nvidia/cc70

@eessi-bot-jsc

eessi-bot-jsc Bot commented Jun 9, 2026


New job on instance eessi-bot-jsc for repository eessi.io-2025.06-software
Building on: nvidia-grace
Building for: aarch64/nvidia/grace and accelerator nvidia/cc70
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2026.06/pr_1482/14901785

date job status comment
Jun 09 20:44:29 UTC 2026 submitted job id 14901785 awaits release by job manager
Jun 09 20:45:29 UTC 2026 released job awaits launch by Slurm scheduler
Jun 09 20:46:36 UTC 2026 running job 14901785 is running
Jun 09 21:25:14 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-14901785.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc70-17810399560.tar.gz size: 0 MiB (45 bytes)
entries: 0
modules under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/modules/all
no module files in tarball
software under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/software
no software packages in tarball
reprod directories under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70
no other files in tarball
Jun 09 21:25:14 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 17/29 test case(s) from 29 check(s) (4 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-14901785.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-jsc

eessi-bot-jsc Bot commented Jun 9, 2026


New job on instance eessi-bot-jsc for repository eessi.io-2025.06-software
Building on: nvidia-grace
Building for: aarch64/nvidia/grace and accelerator nvidia/cc70
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2026.06/pr_1482/14901786

date job status comment
Jun 09 20:44:35 UTC 2026 submitted job id 14901786 awaits release by job manager
Jun 09 20:45:21 UTC 2026 released job awaits launch by Slurm scheduler
Jun 09 20:46:44 UTC 2026 running job 14901786 is running
Jun 09 21:26:17 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-14901786.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc70-17810400130.tar.gz size: 0 MiB (45 bytes)
entries: 0
modules under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/modules/all
no module files in tarball
software under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/software
no software packages in tarball
reprod directories under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70
no other files in tarball
Jun 09 21:26:17 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 17/29 test case(s) from 29 check(s) (4 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-14901786.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case

@bedroge

bedroge commented Jun 10, 2026

Collaborator Author

I ran it interactively with a patched easyblock that runs make check serially, but the same test failed again with the same error. So apparently that doesn't (always) solve it either. I'm not sure what to do now, or why I did get a few successful attempts yesterday.


Labels

2025.06-software.eessi.io 2025.06 version of software.eessi.io accel:nvidia


4 participants