{2023.06}[NVHPC/25.1-CUDA-12.6] add hook for nvhpc #1043

Closed
adammccartney wants to merge 3 commits into EESSI:2023.06-software.eessi.io from adammccartney:hook_nvhpc_pre_configure

@adammccartney

This adds a pre_configure_hook for NVHPC. It performs some search-and-replace operations on the "localrc" file that NVHPC uses to detect information about the system. In particular, it points the sysroot flag at the EESSI EPREFIX variable and appends two variable definitions that say where to look for system libraries.

The content of the hook is extracted from:
https://github.com/ComputeCanada/easybuild-computecanada-config/blob/main/2023/cc_hooks.py#L544-L547
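
To make the effect on localrc concrete, here is a minimal Python sketch of the same edits expressed as plain text processing. The function name `patch_localrc` and the example LDSO line are assumptions for illustration only; the hook in this PR applies the changes with sed and echo rather than in Python.

```python
import re


def patch_localrc(text: str, eprefix: str) -> str:
    """Return localrc content with sysroot and library dirs pointed at eprefix.

    Hypothetical helper that mirrors the sed/echo commands used by the hook.
    """
    # Append "--sysroot=<eprefix>" to every "set LDSO=...;" line.
    # A function is used as the replacement so that eprefix is inserted
    # literally and never interpreted as a regex replacement template.
    patched = re.sub(
        r"(set LDSO=.*);",
        lambda m: f"{m.group(1)} --sysroot={eprefix};",
        text,
    )
    if patched and not patched.endswith("\n"):
        patched += "\n"
    # Point the default library and startup-object directories at eprefix.
    patched += f"set DEFLIBDIR={eprefix}/usr/lib64;\n"
    patched += f"set DEFSTDOBJDIR={eprefix}/usr/lib64;\n"
    return patched


if __name__ == "__main__":
    example = "set LDSO=/lib64/ld-linux-x86-64.so.2;\n"  # assumed example line
    print(patch_localrc(example, "/path/to/eprefix"), end="")
```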

@eessi-bot-deucalion

Instance eessi-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

@eessi-bot

eessi-bot Bot commented Apr 24, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@gpu-bot-ugent

gpu-bot-ugent Bot commented Apr 24, 2025

Instance eessi-bot-vsc-ugent is configured to build for:

  • architectures: x86_64/amd/zen3
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

@eessi-bot-toprichard

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

@eessi-bot-surf

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat

@eessi-bot

eessi-bot Bot commented Apr 24, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/intel/cascadelake, x86_64/intel/icelake, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@hvelab
Contributor

hvelab commented Apr 29, 2025

Hi @adammccartney,

Since this PR has no generated build with corresponding tests, could you kindly share the steps you used to test this, so we can test and reproduce it too?

Thank you!

Comment thread eb_hooks.py
    pre-configure hook for nvhpc
    - search and replace operations in the ec dict
    """
    if self.name == "NVHPC":
Member

@ocaisa ocaisa Apr 29, 2025

This is good for now, but there is quite a bit of discussion currently about changing the naming in the EasyBuild context. We are currently naming the compilers only NVHPC but we should perhaps be defining a toolchain hierarchy for NVHPC since they also contain MPI and some math libraries. This may lead to the compilers being called something like nvidia_compilers or NVHPC becoming a fatter toolchain.

Member

@ocaisa ocaisa Apr 29, 2025

You're also going to need some versioning clauses here for things you have actually tested.

This is definitely a corner case at present, as EESSI itself will not ship this toolchain (currently), and I wonder if we should not allow an environment variable to force the use of the hook, something like

Suggested change
-    if self.name == "NVHPC":
+    if self.name == "NVHPC":
+        force_nvhpc_hook = 'EESSI_FORCE_NVHPC_HOOK'
+        if self.version in [...] or os.getenv(force_nvhpc_hook, False):
+            ...
+        else:
+            print_msg(f"Not using existing hook for {self.name}/{self.version}, if you wish to force this please set the envvar {force_nvhpc_hook}")
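
The gating logic in this suggestion can be sketched as a small standalone function. This is only an illustration: `should_apply_nvhpc_hook` and `TESTED_NVHPC_VERSIONS` are hypothetical names, and the list of tested versions is an assumption (it contains only the version built in this PR, since the suggestion leaves the list elided).

```python
import os

# Name of the override variable, taken from the review suggestion.
FORCE_NVHPC_HOOK_ENVVAR = "EESSI_FORCE_NVHPC_HOOK"

# Assumption: only the version built in this PR counts as tested.
TESTED_NVHPC_VERSIONS = ("25.1",)


def should_apply_nvhpc_hook(name, version, environ=None):
    """Decide whether the NVHPC hook should run for this name and version."""
    environ = os.environ if environ is None else environ
    if name != "NVHPC":
        return False
    # Any non-empty value forces the hook, like os.getenv(var, False) would.
    forced = bool(environ.get(FORCE_NVHPC_HOOK_ENVVAR))
    return version in TESTED_NVHPC_VERSIONS or forced
```

Note that, as in the suggestion, any non-empty value of the variable (even "0") counts as forcing the hook; a stricter check would have to parse the value explicitly.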

@adammccartney
Author

> Hi @adammccartney,
>
> Since this PR has no generated build with corresponding tests, could you kindly share the steps you used to test this, so we can test and reproduce it too?
>
> Thank you!

Sure, I'd be happy to. Would you mind giving a few pointers? Let me know what would be useful to see.
I haven't had time to look into ReFrame yet, so there are no tests (yet) apart from the sanity checks in EasyBuild. Last week's build was done in the eessi-container from the "software layer" repo, slightly adapted to suit our own build environment. The build command looks like this:

#!/bin/bash

# Resolve the project root: four directory levels up from this script's path.
project_root="$(realpath "$(dirname "$(dirname "$(dirname "$(dirname "${BASH_SOURCE[0]}")")")")")"

# Build NVHPC with the site-specific EasyBuild config and hooks.
eb "${project_root}/easyconfigs/2025/NVHPC-25.1-CUDA-12.6.0.eb" \
    -r --cuda-compute-capabilities=9.0 \
    --configfiles="${project_root}/easybuild-asc-config/2025/config.cfg" \
    --hooks="${project_root}/easybuild-asc-config/2025/eb_hooks.py"

As you can see, we are referencing an explicit config for EasyBuild, and I think the easyconfig for NVHPC is slightly adapted to include the "accept-eula" variable or whatever. I'll backport this today to a "vanilla" EESSI-extend environment that can be used to install stuff on host-injections. I guess it would be useful to have a command that can be run in the standard container started by eessi_container.sh?

Replaces EBROOTGENTOO with EPREFIX/usr
Comment thread eb_hooks.py
Comment on lines +754 to +758
new_opts = f'''installdir=%(installdir)s/Linux_x86_64/%(version)s
EPREFIX={eprefix}
sed -i "s@\(set LDSO=.*\);@\\1 --sysroot=$EPREFIX;@" $installdir/compilers/bin/localrc
echo "set DEFLIBDIR=$EPREFIX/usr/lib64;" >> $installdir/compilers/bin/localrc
echo "set DEFSTDOBJDIR=$EPREFIX/usr/lib64;" >> $installdir/compilers/bin/localrc'''
Member

@ocaisa ocaisa Apr 29, 2025

Thinking about this a bit more, the logic here could be added directly to the relevant section of the NVHPC easyblock (and used conditionally based on whether the EB build option --sysroot is set).
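
As a rough sketch of that idea, the localrc commands could be generated only when a sysroot is configured. The helper name `localrc_sysroot_commands` and its parameters are hypothetical, and this is not the NVHPC easyblock's actual code; in an easyblock the sysroot value would presumably be read from EasyBuild's build options rather than passed in as an argument.

```python
from typing import List, Optional


def localrc_sysroot_commands(localrc_path: str, sysroot: Optional[str]) -> List[str]:
    """Build the shell commands that point localrc at the sysroot, if one is set.

    Hypothetical helper: returns an empty list when no sysroot is configured,
    so that localrc is left exactly as NVHPC generated it.
    """
    if not sysroot:
        return []
    libdir = f"{sysroot.rstrip('/')}/usr/lib64"
    return [
        # Same substitution as the hook: append --sysroot to the LDSO setting.
        f'sed -i "s@\\(set LDSO=.*\\);@\\1 --sysroot={sysroot};@" {localrc_path}',
        # Same appended definitions as the hook.
        f'echo "set DEFLIBDIR={libdir};" >> {localrc_path}',
        f'echo "set DEFSTDOBJDIR={libdir};" >> {localrc_path}',
    ]
```

The sketch does no shell quoting, so it assumes paths without spaces or shell metacharacters.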

Member

@ocaisa ocaisa Apr 29, 2025

@adammccartney
Author

adammccartney commented Apr 29, 2025

So, interestingly, the sanity check now fails if I try to build this directly on a compute node (the architecture is x86_64/amd/zen4, by the way). The initial build was done in the EESSI container, as I mentioned, set up to use the EESSI_PROJECT_INSTALL variable pointing at a writable /cvmfs/software.asc.ac.at directory.
The build now fails when I try to use EESSI_SITE_INSTALL. Maybe something is leaking in via the ld cache on the host, as was previously observed. It makes me wonder how usable the compiler is if we load it from the custom CVMFS repo...

@adammccartney
Author

bot: build inst:eessi-bot-mc-azure arch:x86_64/amd/zen4 repo:eessi.io-2023.06-software accelerator:nvidia/cc90

@eessi-bot

eessi-bot Bot commented Apr 29, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • account adammccartney has NO permission to send commands to the bot

@eessi-bot

eessi-bot Bot commented Apr 29, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • account adammccartney has NO permission to send commands to the bot

@eessi-bot-surf

Updates by the bot instance eessi-bot-surf (click for details)
  • account adammccartney has NO permission to send commands to the bot

@gpu-bot-ugent

gpu-bot-ugent Bot commented Apr 29, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • account adammccartney has NO permission to send commands to the bot

So I finally got this working (building to a custom cvmfs repo, then loading on a compute node and reproducing
the sanity checks).

There were a number of issues that needed to be worked out.

1. One potential issue arises if the linker happens to find a linker script called "libc.so" first.
The script is located in the compat layer and, if it gets picked up, looks like it may possibly(?)
redirect the linker to the host's /lib64/libc.so.6.

> cat $EPREFIX/usr/lib64/libc.so
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib64/libc.so.6 /usr/lib64/libc_nonshared.a  AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) )

2. Another issue is that nvc++ scans a number of directories looking for localrc files. If there
are any old localrc files lying around that point to the wrong place, this will cause problems.
The case below shows a situation where the localrc was pointing to a (removed) host_injections path.

>nvc++ -dryrun -std=c++20 minimal.cpp minimal
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/.nvc++rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/nativerc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/fnativerc
Skipping rcfiles/internalrc (not found)
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/ccrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/ccirc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/cpprc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/cppcurc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/paralgorc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/x86rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/x8664rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/lin86rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/lincommonrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/lin8664rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmcomprc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmx86rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmx8664rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/omprc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/iparc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/acc1rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/cudaselectrc
Skipping rcfiles/persnvflangrc (not found)
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/acclin8664rc
Skipping rcfiles/acctoolsrc (not found)
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/targetrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/deprecatedrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/c++llvmrc
Skipping rcfiles/llvmxrc (not found)
Skipping rcfiles/tunexrc (not found)
Skipping rcfiles/clangxrc (not found)
Skipping rcfiles/gccxrc (not found)
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/persnvirc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/localrc
Skipping localrc.n3001-003 (not found)
Reading rcfile /home/fs60000/admccartney/.config/NVIDIA/nvhpc/25.1/localrc.n3001-003
Skipping siterc (not found)
Skipping siterc.n3001-003 (not found)
Skipping $GCCLOCALRC (not found)
Skipping .mynvrc (not found)
Skipping .mynvc++rc (not found)
Skipping .mynvcpprc (not found)
Skipping .mynvx86rc (not found)
Skipping $MYLOCALRC (not found)
Skipping cudarc (not found)
Action(realpath(/opt/acceptance-tests/eessi/2023.06/software/linux/x86_64/amd/zen4/software/GCCcore/13.3.0/bin/../lib/gcc/x86_64-pc-linux-gnu/13.3.0//../../../..))
Error in path /opt/acceptance-tests/eessi/2023.06/software/linux/x86_64/amd/zen4/software/GCCcore/13.3.0/bin/../lib/gcc/x86_64-pc-linux-gnu/13.3.0//../../../..

It should __not__ be finding the localrc files in my home directory

{EESSI 2023.06} admccartney@n3001-003 ~/tests/nvhpc
>rm -rf ~/.config/NVIDIA/nvhpc/25.1/

> nvc++ -std=c++20 minimal.cpp -o minimal
>./minimal
Hello world
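
A quick way to spot such leftovers before building or testing could look like the sketch below. The function name `find_user_localrc` is hypothetical, and the per-user directory layout is inferred only from the -dryrun output above, not taken from NVHPC documentation.

```python
from pathlib import Path
from typing import List


def find_user_localrc(home: Path, nvhpc_version: str) -> List[Path]:
    """List per-user localrc files for one NVHPC version.

    Assumption: they live under ~/.config/NVIDIA/nvhpc/<version>/, as seen
    in the -dryrun output in this thread.
    """
    config_dir = home / ".config" / "NVIDIA" / "nvhpc" / nvhpc_version
    if not config_dir.is_dir():
        return []
    # Matches "localrc" as well as host-specific names like "localrc.<hostname>".
    return sorted(p for p in config_dir.glob("localrc*") if p.is_file())


if __name__ == "__main__":
    for path in find_user_localrc(Path.home(), "25.1"):
        print(f"possibly stale: {path}")
```

The sketch only reports candidates; whether a given file is actually stale still needs a look at the paths it sets.
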
@adammccartney
Author

Okay, so I got this working with some careful attention to what the linker was up to.
See the commit message of a2fe8be for a bit more info.
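
For anyone checking what a linker script like the libc.so shown earlier in this thread refers to, the sketch below extracts the absolute paths it names and shows what they would look like if resolved inside a sysroot. As far as I understand the GNU ld documentation, a "/"-led filename in a script that itself lives inside the sysroot is looked up under the sysroot prefix; whether that applies here depends on how the linker is invoked. Both function names are hypothetical.

```python
import re
from typing import List


def linker_script_inputs(script_text: str) -> List[str]:
    """Return the absolute paths named in a GNU ld script, in order of appearance."""
    # Drop /* ... */ comments first so their slashes are not mistaken for paths.
    without_comments = re.sub(r"/\*.*?\*/", " ", script_text, flags=re.DOTALL)
    # An absolute path starts with "/" right after whitespace, "(" or the start
    # of the text, and runs until the next whitespace or parenthesis.
    return re.findall(r"(?<![^\s(])/[^\s()]+", without_comments)


def resolve_in_sysroot(path: str, sysroot: str) -> str:
    """Prefix an absolute path with the sysroot directory."""
    return sysroot.rstrip("/") + path


if __name__ == "__main__":
    script = "GROUP ( /lib64/libc.so.6 /usr/lib64/libc_nonshared.a )"
    for entry in linker_script_inputs(script):
        print(entry, "->", resolve_in_sysroot(entry, "/path/to/eprefix"))
```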

@laraPPr
Collaborator

laraPPr commented Jul 1, 2025

@adammccartney Thank you for your contribution. We have recently split up the software-layer repository. The changes made in this PR should target the new repository, https://github.com/EESSI/software-layer-scripts, which is why we will close this PR.

We are currently also reworking how we handle NVHPC upstream in EasyBuild, which is why we recommend holding off on implementing this until that work is finished.

@laraPPr laraPPr closed this Jul 1, 2025

4 participants