Commit faa9bbb

fix: add FI_PROVIDER_PATH to crnch-gpu modules; document SLURM GRES and OFI provider requirements
1 parent 8863022 commit faa9bbb

2 files changed

Lines changed: 23 additions & 1 deletion

docs/documentation/intel-gpu-max.md

Lines changed: 21 additions & 0 deletions
@@ -288,6 +288,27 @@ mpirun -n <ranks> -hosts <node1>,<node2> ./simulation
 Nodes must have passwordless SSH from the launch node and no `pam_slurm_adopt`
 blocking. Suppress the SSH login banner on remote nodes with `touch ~/.hushlogin`.
 
+**OFI provider path**: Intel MPI 2021.x ships its own libfabric providers in
+`$I_MPI_ROOT/libfabric/lib/prov/`. The system libfabric may not include the tcp
+or shm providers. Always set:
+
+```bash
+export FI_PROVIDER_PATH=$I_MPI_ROOT/libfabric/lib/prov
+```
+
+Without this, `PMPI_Init` aborts with `OFI fi_getinfo() failed: No data available`.
+This is handled automatically by `source ./mfc.sh load -c crnch -m gpu`.
+
+**SLURM GPU access**: on SLURM-managed Intel GPU nodes, processes outside a SLURM
+allocation cannot open `/dev/dri/renderD128`. Always request the GPU resource:
+
+```bash
+#SBATCH --gres=gpu:max_1100:1 # Intel GPU Max 1100
+```
+
+Without `--gres`, `omp_get_num_devices()` returns 0 and the process aborts with
+integer divide-by-zero in `s_initialize_mpi_domain` (rank % num_devices with 0 devices).
+
 ### `libumf.so.1` not found at runtime
 The 2026.0 Level Zero and OpenCL UR adapters link against `libumf.so.1`.
 If not in `LD_LIBRARY_PATH`, all adapters fail silently and sycl-ls reports
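
Note on the **SLURM GPU access** paragraph in this hunk: the failure it describes only surfaces later, as a divide-by-zero inside the solver. A batch script could fail earlier, with a clearer message, by trying to open the render node before launching. The following is a minimal sketch, not part of this commit; `require_render_node` is a hypothetical helper name, and the default path assumes the GPU is exposed as `/dev/dri/renderD128`, as the documentation states.

```bash
#!/usr/bin/env bash
# Hypothetical pre-launch guard (not part of MFC or SLURM).
# Attempts a real read-only open of the render node, because a plain
# permission test may pass even when the open itself is denied.
require_render_node() {
    # Assumption: the device path from the docs; override via the first argument.
    local node="${1:-/dev/dri/renderD128}"

    # `true <file` fails (without terminating the script) if the open is refused.
    if { true <"$node"; } 2>/dev/null; then
        echo "render node accessible: $node"
        return 0
    fi

    echo "cannot open $node; was the job submitted with --gres=gpu:...?" >&2
    return 1
}
```

In a job script this would be called as `require_render_node || exit 1` ahead of the `mpirun` line, so a missing `--gres` request stops the job before `s_initialize_mpi_domain` is reached.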

toolchain/modules

Lines changed: 2 additions & 1 deletion
@@ -122,7 +122,8 @@ crnch-gpu FC=/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/mpi/2021.14/b
 crnch-gpu PATH=/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/compiler/2025.0/bin:/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/mpi/2021.14/bin:${PATH}
 crnch-gpu MKLROOT=/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/mkl/2025.0
 crnch-gpu I_MPI_ROOT=/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/mpi/2021.14
-crnch-gpu LD_LIBRARY_PATH=/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/mkl/2025.0/lib:/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/compiler/2025.0/lib:/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/2025.0/lib:/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/mpi/2021.14/lib:${LD_LIBRARY_PATH}
+crnch-gpu LD_LIBRARY_PATH=/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/mkl/2025.0/lib:/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/compiler/2025.0/lib:/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/2025.0/lib:/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/mpi/2021.14/lib:/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/mpi/2021.14/libfabric/lib:${LD_LIBRARY_PATH}
 crnch-gpu LIBRARY_PATH=/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/mkl/2025.0/lib:/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/compiler/2025.0/lib:/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/2025.0/lib:/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/mpi/2021.14/lib:${LIBRARY_PATH}
 crnch-gpu I_MPI_FABRICS=shm:ofi
 crnch-gpu FI_PROVIDER=tcp
+crnch-gpu FI_PROVIDER_PATH=/net/projects/tools/x86_64/rhel-8/intel-oneapi/2025.1/mpi/2021.14/libfabric/lib/prov
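
Because the new `FI_PROVIDER_PATH` entry is a hard-coded absolute path, a stale or mistyped value would reproduce the `fi_getinfo()` failure this commit addresses. One way to catch that after loading the modules is a quick sanity check along the lines below. This is a sketch under assumptions rather than anything in the commit: `check_fi_provider_path` is a hypothetical helper, it treats `FI_PROVIDER_PATH` as a single directory, and it assumes the providers are shared objects (`*.so`) sitting directly in that directory.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check (not part of mfc.sh or Intel MPI).
check_fi_provider_path() {
    local dir="${FI_PROVIDER_PATH:-}"

    if [ -z "$dir" ]; then
        echo "FI_PROVIDER_PATH is not set" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "FI_PROVIDER_PATH is not a directory: $dir" >&2
        return 1
    fi
    # Assumption: provider plugins are *.so files directly inside $dir.
    if ! ls "$dir"/*.so >/dev/null 2>&1; then
        echo "no provider libraries (*.so) found in $dir" >&2
        return 1
    fi

    echo "FI_PROVIDER_PATH ok: $dir"
}
```

Running `check_fi_provider_path` right after `source ./mfc.sh load -c crnch -m gpu` would confirm that the directory named in the module entry exists on the node before any MPI job is launched.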
