Issues setting up periodic test suite runs on BSC HCA cluster #34

@casparvl

While not 100% related to this repository, there wasn't any other place I could reasonably put this.

First attempt

Create a virtualenv on the login node and run ReFrame from there (which is what CI/run_reframe.sh from the EESSI test suite does). That failed with:

cvanleeuwe@hca-server:~$ python3 -m venv test
The virtual environment was not created successfully because ensurepip is not
available.  On Debian/Ubuntu systems, you need to install the python3-venv
package using the following command.

    apt install python3.12-venv

You may need to use sudo with that command.  After installing the python3-venv
package, recreate your virtual environment.

Failing command: /home/cvanleeuwe/test/bin/python3

cvanleeuwe@hca-server:~$ virtualenv test
-bash: /usr/local/bin/virtualenv: cannot execute: required file not found

Action: @julianmorillo has asked a sysadmin to install python3.12-venv on the login host.
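
The failure above comes from the missing ensurepip module, so it can be checked up front before trying to create a virtualenv. A minimal sketch, assuming python3 is on PATH; has_ensurepip is a hypothetical helper name, not part of ReFrame or the EESSI test suite:

```shell
#!/usr/bin/env bash
# Sketch: report whether a Python interpreter can bootstrap a virtual environment.
# has_ensurepip is a hypothetical helper name used for illustration only.
has_ensurepip() {
    local py="${1:-python3}"
    # `python3 -m venv` needs the ensurepip module to install pip into the venv
    "$py" -c 'import ensurepip' >/dev/null 2>&1
}

if has_ensurepip python3; then
    echo "ensurepip available: python3 -m venv should work"
else
    echo "ensurepip missing: the python3-venv package is needed first"
fi
```

Running this on the login node before and after the sysadmin installs python3.12-venv should show the message flip from the second to the first.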

Second attempt

Get an interactive session on a batch node, and use ReFrame from the EESSI software env:

$ cat load_eessi_testsuite_env.sh
source eessi_test_venv/bin/activate

SCRIPT_DIR=$(dirname $(realpath $BASH_SOURCE))

export PYTHONPATH=$PYTHONPATH:$SCRIPT_DIR/test-suite
export PYTHONPATH=$PYTHONPATH:$SCRIPT_DIR/test-suite

export RFM_CONFIG_FILES=$SCRIPT_DIR/test-suite/config/bsc_hca.py
export RFM_CHECK_SEARCH_PATH=$SCRIPT_DIR/test-suite/eessi/testsuite/tests
export RFM_CHECK_SEARCH_RECURSIVE=1
export RFM_PREFIX=$SCRIPT_DIR/reframe_runs

# Make sure we also see GPU-enabled tests
export EESSI_ACCELERATOR_TARGET_OVERRIDE=accel/nvidia/cc90

# Make sure we can load CUDA-modules on the login node, where the driver symlinks in host_injections are dangling
export EESSI_OVERRIDE_GPU_CHECK=True

export EESSI_VERSION_OVERRIDE=2025.06-001
source /cvmfs/software.eessi.io/versions/2025.06/init/lmod/bash

module load ReFrame/4.7.4

Then reframe --list initiates CPU autodetection, which fails with:

[2026-02-12T10:27:43] debug: reframe: --- /home/cvanleeuwe/EESSI/rfm.6p3ogngw/rfm-detect-job.out ---
Requirement already satisfied: pip in ./venv.reframe/lib/python3.9/site-packages (20.3.4)
Collecting pip
  Downloading pip-26.0.1-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 20.3.4
    Uninstalling pip-20.3.4:
      Successfully uninstalled pip-20.3.4
Successfully installed pip-26.0.1
Collecting reframe-hpc==4.7.4
  Using cached ReFrame_HPC-4.7.4-py3-none-any.whl.metadata (7.2 kB)
Collecting archspec>=0.2.4 (from reframe-hpc==4.7.4)
  Using cached archspec-0.2.5-py3-none-any.whl.metadata (4.4 kB)
Collecting argcomplete (from reframe-hpc==4.7.4)
  Using cached argcomplete-3.6.3-py3-none-any.whl.metadata (16 kB)
Collecting filelock (from reframe-hpc==4.7.4)
  Using cached filelock-3.19.1-py3-none-any.whl.metadata (2.1 kB)
Collecting jsonschema (from reframe-hpc==4.7.4)
  Using cached jsonschema-4.25.1-py3-none-any.whl.metadata (7.6 kB)
Collecting PyYAML (from reframe-hpc==4.7.4)
  Using cached pyyaml-6.0.3.tar.gz (130 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting requests (from reframe-hpc==4.7.4)
  Using cached requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting semver (from reframe-hpc==4.7.4)
  Using cached semver-3.0.4-py3-none-any.whl.metadata (6.8 kB)
Collecting tabulate (from reframe-hpc==4.7.4)
  Using cached tabulate-0.9.0-py3-none-any.whl.metadata (34 kB)
Collecting lxml==5.3.0 (from reframe-hpc==4.7.4)
  Using cached lxml-5.3.0.tar.gz (3.7 MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
-reframe: command `pip install reframe-hpc==4.7.4' failed (exit code: 1)

--- /home/cvanleeuwe/EESSI/rfm.6p3ogngw/rfm-detect-job.out ---
[2026-02-12T10:27:43] debug: reframe: --- /home/cvanleeuwe/EESSI/rfm.6p3ogngw/rfm-detect-job.err ---
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [3 lines of output]
      Building lxml version 5.3.0.
      Building without Cython.
      Error: Please make sure the libxml2 and libxslt development packages are installed.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed to build 'lxml' when getting requirements to build wheel

(visible in the ReFrame log file)

Action: @julianmorillo has asked a sysadmin to install libxml2 and libxslt on the batch hosts.
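
To see whether a given shell environment exposes the libxml2 and libxslt development files, one option is to look for the xml2-config and xslt-config helper scripts that normally ship with them. This is a rough sketch: that the lxml source build relies on these scripts is an assumption, not something shown in the log above, and missing_tools is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: list which of the given tools are NOT found on PATH.
# missing_tools is a hypothetical helper name used for illustration only.
missing_tools() {
    local tool missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Print the missing tools, space-separated, without the leading space
    echo "${missing# }"
}

missing=$(missing_tools xml2-config xslt-config)
if [ -n "$missing" ]; then
    echo "not found on PATH: $missing"
else
    echo "libxml2: $(xml2-config --version), libxslt: $(xslt-config --version)"
fi
```

Running the same check inside a batch job, rather than only in an interactive session, would show what the autodetection job itself can see.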

Third attempt

Installed libxslt with EESSI-extend, then loaded both libxslt and libxml2 in my .bashrc, since I have no other way of changing the preamble of the autodetection job:

export EESSI_VERSION_OVERRIDE=2025.06-001
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load EESSI-extend/2025.06

module load libxslt/1.1.43-GCCcore-14.3.0
module load libxml2/2.14.3-GCCcore-14.3.0

Then I ran reframe --list again, and still got the same error as above.

Fourth attempt

I log in to each of the batch nodes separately, manually load the ReFrame installation from EESSI, and run:

reframe --detect-host-topology=topo.json

Then I move it to the folder where ReFrame expects it, e.g.:

mv topo.json ~/.reframe/topology/HCA-banana/processor.json

After doing that for each partition, I can now list the tests without triggering autodetection jobs, since ReFrame finds that the topology files are all there already:

[List of matched checks]
- EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=2_nodes %device_type=cpu /bfbdc774
- EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965
- EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=8_cores %device_type=cpu /13b1cbf8
- EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=4_cores %device_type=cpu /9855363d
- EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=2_cores %device_type=cpu /0535475e
- EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=2_nodes %device_type=cpu /1fffd703
- EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b
- EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=8_cores %device_type=cpu /bb77df99
- EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=4_cores %device_type=cpu /c6ad4395
- EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=2_cores %device_type=cpu /3d834161
- EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=2_nodes /ba99bdd1
- EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a
- EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=2_cores /bbd790fd
- EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=2_nodes /54558a1b
- EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331
- EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=2_cores /50b87870
Found 16 check(s)
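
The per-partition step of the fourth attempt can be wrapped in a small helper so it is easy to repeat on every batch node. This is only a sketch: install_topology and TOPO_ROOT are hypothetical names, and the <system>-<partition>/processor.json layout is taken from the mv example above:

```shell
#!/usr/bin/env bash
# Sketch: move a detected topology file to where ReFrame looks for it.
# install_topology and TOPO_ROOT are hypothetical names used for illustration;
# the directory layout follows the HCA-banana example above.
TOPO_ROOT="${TOPO_ROOT:-$HOME/.reframe/topology}"

install_topology() {
    local src="$1" sys_part="$2"      # e.g. topo.json HCA-banana
    local dest="$TOPO_ROOT/$sys_part"
    # Create the <system>-<partition> directory if needed, then move the file
    mkdir -p "$dest" && mv "$src" "$dest/processor.json"
}

# Usage on a batch node, after `reframe --detect-host-topology=topo.json`:
#   install_topology topo.json HCA-banana
```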
