Commit 09761cd

Merge pull request #622 from casparvl/test_suite_gpu
Add instructions on how to enable running GPU tests if your submit host only contains CPU
2 parents de1b975 + 0b6ed4d commit 09761cd

1 file changed

Lines changed: 32 additions & 0 deletions

docs/test-suite/installation-configuration.md

@@ -11,6 +11,7 @@ The EESSI test suite requires
* Python >= 3.7
* [ReFrame](https://reframe-hpc.readthedocs.io) v4.3.3 (or newer)
* [ReFrame test library (`hpctestlib`)](https://reframe-hpc.readthedocs.io/en/stable/hpctestlib.html)
* (optionally) [EasyBuild](https://easybuild.io/)

??? note "(If your system Python version is lower than the minimum required version, click here for some tips)"

@@ -25,6 +26,8 @@ The EESSI test suite requires
* You can install a ReFrame module with EasyBuild and a [ReFrame easyconfig](https://github.com/easybuilders/easybuild-easyconfigs/tree/develop/easybuild/easyconfigs/r/ReFrame) containing a more recent Python version.
* Set RFM_PURGE_ENVIRONMENT=1 if you use Python from a module. The ReFrame easyconfigs automatically do that for you.

??? note EasyBuild is needed for certain tests (e.g. BLAS) that need to load multiple modules together. EasyBuild's functionality is used in these cases to find matching pairs of modules. If EasyBuild is not available, a warning will be printed and the tests requiring this functionality will be skipped.
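
As a quick sanity check before running the tests that need EasyBuild, you can see whether EasyBuild's `eb` command is on your `PATH`. This is only a minimal sketch: the helper name `have_command` is hypothetical, and how the test suite itself detects EasyBuild is not described here, so a positive result is an indication rather than a guarantee.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the test suite): succeeds if the
# given command can be found on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# 'eb' is the EasyBuild command-line tool.
if have_command eb; then
    echo "EasyBuild found: $(command -v eb)"
else
    echo "EasyBuild (eb) not found on PATH"
fi
```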

#### Installing ReFrame

General instructions for installing ReFrame are available in the [ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/started.html). To check if ReFrame is available, run the `reframe` command:
@@ -167,3 +170,32 @@ also end up in the location specified by `$RFM_PREFIX`.
since our [common logging configuration](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/common_config.py)
only picks up on the `$RFM_PREFIX` environment variable to determine the location for the ReFrame log file.

#### Enabling GPU tests when submitting from a CPU-only node

By default, the test suite checks the currently available modules to determine which tests to run. This means that if you run the `reframe` command on a CPU-only node but want it to submit GPU tests to your GPU batch nodes, you need to make sure the test suite can 'see' (and load) the GPU modules.

By default, EESSI makes modules available based on the host architecture. So, if the host on which the `reframe` command runs does not have NVIDIA GPUs, the CUDA-based modules will not be available, and the test suite will not run tests for them:

```bash
module load EESSI/2023.06
reframe -n LAMMPS_lj.*gpu -t 1_node --list
...
[List of matched checks]
Found 0 check(s)
```

To make sure the EESSI test suite can see (and load) the GPU modules on the CPU host, we have to set two additional variables before loading the EESSI module: `EESSI_ACCELERATOR_TARGET_OVERRIDE` and `EESSI_OVERRIDE_GPU_CHECK`. For example, if you want to run tests on a node with a `zen4` CPU and an `H100` GPU (compute capability 9.0, hence `cc90`):

```bash
export EESSI_OVERRIDE_GPU_CHECK=True
export EESSI_ACCELERATOR_TARGET_OVERRIDE=accel/nvidia/cc90
module load EESSI/2023.06
reframe -n LAMMPS_lj.*gpu -t 1_node --list
...
[List of matched checks]
- EESSI_LAMMPS_lj %device_type=gpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos-CUDA-12.1.1 %scale=1_node /1f2ca7c1
Found 1 check(s)
```

!!! note
    Since EESSI in principle exposes exactly the same modules for all supported NVIDIA compute capabilities, it typically should not matter which _exact_ compute capability you provide here.
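
If you switch between GPU targets regularly, the two exports can be wrapped in a small shell function. The sketch below is illustrative only: the function names `accel_target_from_cc` and `enable_gpu_overrides` are hypothetical and not part of EESSI or the test suite, and it assumes the `accel/nvidia/cc<major><minor>` naming seen in the example above holds for the compute capability you pass in.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the accelerator target string from an
# NVIDIA compute capability such as "9.0" (assumed naming: accel/nvidia/cc90).
accel_target_from_cc() {
    local cc="$1"
    # Accept only the form <major>.<minor>, e.g. 8.0 or 9.0
    if [[ ! "$cc" =~ ^[0-9]+\.[0-9]$ ]]; then
        echo "expected a compute capability like 9.0, got: '$cc'" >&2
        return 1
    fi
    # Drop the dot: 9.0 -> 90
    echo "accel/nvidia/cc${cc/./}"
}

# Hypothetical helper: export both override variables. Call this in the
# same shell, before running 'module load EESSI/2023.06'.
enable_gpu_overrides() {
    local target
    target="$(accel_target_from_cc "$1")" || return 1
    export EESSI_OVERRIDE_GPU_CHECK=True
    export EESSI_ACCELERATOR_TARGET_OVERRIDE="$target"
}

enable_gpu_overrides 9.0
echo "$EESSI_ACCELERATOR_TARGET_OVERRIDE"   # prints accel/nvidia/cc90
```

After calling `enable_gpu_overrides`, load the EESSI module and run `reframe` as in the example above. Whether a given target is actually provided by your EESSI version is something to verify on your own system.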
