feat: Intel GPU Max (Ponte Vecchio) OpenMP target offload support
Add end-to-end support for building and running MFC on Intel Data Center
GPU Max (Ponte Vecchio) using ifx 2025.0+ with OpenMP target offload to
SPIR-V/SPIR64. Verified on GT CRNCH RoboGator (dash4) with Intel GPU
Max 1100. All 161 1D regression tests pass.
## Compiler and build system
- Recognize IntelLLVM compiler ID throughout CMakeLists.txt (was Intel)
- Add -fiopenmp -fopenmp-targets=spir64 compile/link flags for GPU builds
- Add -fp-model=precise to prevent ifx FP reassociation in SPIR-V kernels
- Add -fpp to global compile flags for Intel preprocessor compatibility
- Link MKL parallel, libmkl_sycl_dft, libsycl, libOpenCL for oneMKL FFT
- Strip SPIR-V from mkl_dfti_omp_offload.o via clang-offload-bundler to
fix zeModuleDynamicLink Level Zero failures
- Add --intel-aot flag: AOT compilation via ocloc to native PVC ISA, which
eliminates the ~30 min Level Zero JIT delay (test runs: 30 min -> 14 sec)
- Add IntelLLVM to no-FFTW-from-source list in dependencies/CMakeLists.txt
- Fix LAPACK PIE link error with ifx on Ubuntu 22.04
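A minimal sketch of how the flag sets above could be selected for JIT versus AOT builds. The function name `intel_offload_flags` is hypothetical, and the AOT branch (`-fopenmp-targets=spir64_gen` with `-Xopenmp-target-backend "-device pvc"`) is an assumption based on Intel's documented ahead-of-time offload options, not a copy of what this commit emits:

```python
def intel_offload_flags(aot=False, device="pvc"):
    """Return ifx compile/link flags for OpenMP target offload (illustrative only)."""
    # Flags named in this commit: preprocessing, OpenMP offload, precise FP model.
    flags = ["-fpp", "-fiopenmp", "-fp-model=precise"]
    if aot:
        # Assumption: AOT targets the spir64_gen backend and passes the device
        # name through to ocloc; the commit's actual AOT flags may differ.
        flags += ["-fopenmp-targets=spir64_gen",
                  "-Xopenmp-target-backend", f"-device {device}"]
    else:
        # JIT path: generic SPIR-V, compiled by Level Zero at first launch.
        flags.append("-fopenmp-targets=spir64")
    return flags


if __name__ == "__main__":
    print(" ".join(intel_offload_flags()))
    print(" ".join(intel_offload_flags(aot=True)))
```

Keeping the JIT and AOT targets in one place makes it harder to mix a `spir64` compile step with a `spir64_gen` link step by accident.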
## GPU kernel fixes
- omp_macros.fpp: add Intel-specific OMP_PARALLEL_LOOP, END_OMP_PARALLEL_LOOP,
OMP_ROUTINE, OMP_MKL_DISPATCH branches for SPIR-V codegen
- parallel_macros.fpp: add GPU_MKL_DISPATCH() macro for oneMKL dispatch
- shared_parallel_macros.fpp: add USING_INTEL Fypp variable; extend all
#:if not MFC_CASE_OPTIMIZATION and USING_AMD guards, and the bare
#:if USING_AMD guards for dimension(sys_size) in m_cbc/m_compute_cbc,
to include USING_INTEL
- m_fftw.fpp: oneMKL DFTI + !$omp dispatch GPU FFT path for Intel
- m_compute_levelset.fpp: split single if-else dispatch to fix multi-callee
phi-node issue and inliner ICE; add -fno-inline workaround
- m_riemann_solvers.fpp, m_variables_conversion.fpp, m_bubbles_EE.fpp,
m_weno.fpp, m_sim_helpers.fpp, m_pressure_relaxation.fpp, m_boundary_common.fpp,
m_chemistry.fpp, m_phase_change.fpp, m_bubbles_EL.fpp, m_viscous.fpp,
m_ibm.fpp, m_hyperelastic.fpp, m_acoustic_src.fpp, m_surface_tension.fpp,
m_data_output.fpp, m_qbmm.fpp, m_compute_cbc.fpp, m_cbc.fpp, m_ib_patches.fpp:
explicit array sizes in GPU_ROUTINE arguments (no assumed-shape in SPIR-V)
and extend VLA guards to USING_INTEL for non-case-optimized GPU builds
- m_helper.fpp: Intel-specific workarounds for SPIR-V codegen
## Toolchain
- Add GT CRNCH RoboGator (crnch) module entry with Intel oneAPI 2025.1
- run.py: Intel GPU detection, set LIBOMPTARGET_LEVEL_ZERO_COMMAND_BATCH=256
and SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY=0 for ~16% speedup
- run/input.py: post-process pyrometheus m_thermochem.f90 for --gpu mp
(replace C-macro GPU_ROUTINE with literal !$omp declare target)
- build.py, state.py: --intel-aot flag and ocloc device selection
- test.py: --binary mpirun support to bypass SLURM srun slot limits on CRNCH
- bootstrap/modules.sh: crnch module bootstrap
- templates/include/helpers.mako: Intel MPI I_MPI_FABRICS=shm hint
- modules: crnch entry (Intel oneAPI 2025.1, mpiifx, GPU Max 1100)
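The run.py and run/input.py items above can be sketched roughly as follows. Both helper names are hypothetical. The exact form of the generated macro line (assumed here to be `GPU_ROUTINE(...)` alone on a line) and the literal directive text `!$omp declare target` are assumptions; the two environment variable names and values are the ones listed in this commit:

```python
import os
import re

# Assumption: the generated source carries the macro as its own line,
# e.g. "    GPU_ROUTINE(get_density)". The real generated form may differ.
_GPU_ROUTINE = re.compile(r"^([ \t]*)GPU_ROUTINE\([^)\n]*\)[ \t]*$", re.MULTILINE)


def intel_gpu_env(base=None):
    """Return a copy of the environment with the Intel GPU tuning variables set."""
    env = dict(os.environ if base is None else base)
    # setdefault keeps any value the user exported before launching.
    env.setdefault("LIBOMPTARGET_LEVEL_ZERO_COMMAND_BATCH", "256")
    env.setdefault("SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY", "0")
    return env


def replace_gpu_routine(source):
    """Swap each GPU_ROUTINE(...) macro line for a literal OpenMP directive."""
    # \1 preserves the original indentation of the macro line.
    return _GPU_ROUTINE.sub(r"\1!$omp declare target", source)


if __name__ == "__main__":
    sample = "subroutine f(x)\n    GPU_ROUTINE(f)\n    real :: x\nend subroutine f\n"
    print(replace_gpu_routine(sample))
    print(intel_gpu_env({})["LIBOMPTARGET_LEVEL_ZERO_COMMAND_BATCH"])
```

Using `setdefault` rather than plain assignment is a design choice of this sketch, so a value exported by the user still takes precedence; the commit only states that the variables are set.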
## Documentation
- docs/documentation/intel-gpu-max.md: full build, run, troubleshoot guide