{2023.06}[NVHPC/25.1-CUDA-12.6] add hook for nvhpc#1043
adammccartney wants to merge 3 commits into EESSI:2023.06-software.eessi.io
This adds a pre_configure_hook for NVHPC. It performs some search-and-replace operations on the "localrc" file that NVHPC uses to detect information about the system. In particular, it points the sysroot flag at the EESSI EPREFIX variable and appends two variable definitions that tell the compiler where to look for system libraries. The content of the hook is extracted from: https://github.com/ComputeCanada/easybuild-computecanada-config/blob/main/2023/cc_hooks.py#L544-L547
Hi @adammccartney, since this PR has no generated build with corresponding tests, could you kindly share the steps you used to test this, so that we can also test and reproduce it? Thank you!
    pre-configure hook for nvhpc
    - search and replace operations in the ec dict
    """
    if self.name == "NVHPC":
This is good for now, but there is currently quite a bit of discussion about changing the naming in the EasyBuild context. At present we name the compilers simply NVHPC, but we should perhaps define a toolchain hierarchy for NVHPC, since it also contains MPI and some math libraries. This may lead to the compilers being called something like nvidia_compilers, or to NVHPC becoming a fatter toolchain.
You're also going to need some versioning clauses here for things you have actually tested.
This is definitely a corner case at present, as EESSI itself will not ship this toolchain (currently), and I wonder if we should allow an environment variable to force the use of the hook, something like
    if self.name == "NVHPC":
        force_nvhpc_hook = 'EESSI_FORCE_NVHPC_HOOK'
        if self.version in [...] or os.getenv(force_nvhpc_hook, False):
            ...
        else:
            print_msg(f"Not using existing hook for {self.name}/{self.version}, if you wish to force this please set the envvar {force_nvhpc_hook}")
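The gating suggested above can be tried outside EasyBuild with a small standalone function. This is only a sketch: the function name, the `environ` parameter and the contents of `TESTED_NVHPC_VERSIONS` are assumptions made for illustration (the thread does not list which versions were tested), and only the `EESSI_FORCE_NVHPC_HOOK` name comes from the suggestion itself.

```python
import os

# Assumption: placeholder list. The thread does not say which versions were tested.
TESTED_NVHPC_VERSIONS = ("25.1",)
FORCE_ENVVAR = "EESSI_FORCE_NVHPC_HOOK"  # name taken from the review suggestion


def should_apply_nvhpc_hook(name, version, environ=None):
    """Decide whether the NVHPC hook should run for this software name and version."""
    environ = os.environ if environ is None else environ
    if name != "NVHPC":
        return False
    # Any non-empty value forces the hook, matching the truthiness test in the suggestion.
    forced = bool(environ.get(FORCE_ENVVAR))
    return version in TESTED_NVHPC_VERSIONS or forced
```

Note that, as in the suggestion, any non-empty value of the variable (including "0") counts as set; a stricter check would have to compare against explicit values.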
Sure, would be happy to. Would you mind giving a few points of guidance? Let me know what would be useful to see. As you can see, we are referencing an explicit config for EasyBuild, and I think the easyconfig for NVHPC is slightly adapted to include the "accept-eula" variable or whatever. I'll backport this to a "vanilla" eessi-extend environment today that can be used to install stuff on host-injections. I guess it would be useful to have a command that can be run in the standard container started by
Replaces EBROOTGENTOO with EPREFIX/usr
    new_opts = f'''installdir=%(installdir)s/Linux_x86_64/%(version)s
    EPREFIX={eprefix}
    sed -i "s@\(set LDSO=.*\);@\\1 --sysroot=$EPREFIX;@" $installdir/compilers/bin/localrc
    echo "set DEFLIBDIR=$EPREFIX/usr/lib64;" >> $installdir/compilers/bin/localrc
    echo "set DEFSTDOBJDIR=$EPREFIX/usr/lib64;" >> $installdir/compilers/bin/localrc'''
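To make the effect of those three shell commands easier to inspect, here is a rough Python equivalent that works on the localrc content as a string. It is an illustration only, not the code in this PR: `patch_localrc` is a hypothetical helper name, and the hook itself passes the `sed`/`echo` commands to the installer as shown above.

```python
import re


def patch_localrc(text, eprefix):
    """Return localrc content with the sysroot and library-directory edits applied."""
    # Same substitution as the sed expression: append --sysroot to the LDSO setting.
    patched = re.sub(
        r"(set LDSO=.*);",
        lambda m: f"{m.group(1)} --sysroot={eprefix};",
        text,
    )
    # Make sure the appended settings start on their own line.
    if patched and not patched.endswith("\n"):
        patched += "\n"
    # Same as the two echo ... >> localrc commands.
    patched += f"set DEFLIBDIR={eprefix}/usr/lib64;\n"
    patched += f"set DEFSTDOBJDIR={eprefix}/usr/lib64;\n"
    return patched
```

Running it on a copy of a real localrc and diffing the result is a cheap way to check the pattern still matches before a full rebuild.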
Thinking about this a bit more, the logic here could be added directly to the relevant section of the NVHPC easyblock (and used conditionally based on whether the EB build option --sysroot is set).
So, interestingly the sanity check now fails if I try to build this directly on a compute node (
bot: build inst:eessi-bot-mc-azure arch:x86_64/amd/zen4 repo:eessi.io-2023.06-software accelerator:nvidia/cc90 |
So I finally got this working (building to a custom CVMFS repo, then loading on a compute node and reproducing
the sanity checks).
There were a number of issues that needed to be worked out.
1. A potential issue arises if the linker happens to find a linker script called "libc.so" first.
The script is located in the compat layer and looks like it may possibly(?) redirect the linker to
the host /lib64/libc.so.6 if it gets picked up.
> cat $EPREFIX/usr/lib64/libc.so
/* GNU ld script
Use the shared library, but some functions are only in
the static library, so try that secondarily. */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib64/libc.so.6 /usr/lib64/libc_nonshared.a AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) )
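A quick way to check whether a given libc.so is a linker script like the one above, and which paths it names, is sketched below. The helper names are made up for this example, and the parsing is deliberately naive (it only strips C-style comments and collects tokens starting with "/"), so treat it as a diagnostic aid rather than a real ld script parser.

```python
import re


def ld_script_paths(data):
    """Return the absolute paths named in a GNU ld script given as bytes.

    Returns an empty list for real ELF shared objects.
    """
    if data.startswith(b"\x7fELF"):
        return []
    text = data.decode("utf-8", errors="replace")
    # Drop /* ... */ comments, then collect every token that starts with '/'.
    body = re.sub(r"/\*.*?\*/", "", text, flags=re.S)
    return re.findall(r"/[^\s()]+", body)


def outside_prefix(paths, prefix):
    """Return the paths that do not live under the given prefix."""
    root = prefix.rstrip("/") + "/"
    return [p for p in paths if not p.startswith(root)]
```

For example, `outside_prefix(ld_script_paths(open(path, "rb").read()), eprefix)`, with `path` pointing at the libc.so in question and `eprefix` holding the value of $EPREFIX, would list the entries that refer to locations outside the compat layer.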
2. Another issue is that nvc++ scans a number of directories looking for localrc files. If there
are any old localrc files lying around that point to the wrong place, this will cause problems.
The case below shows a situation where the localrc was pointing to a (removed) host_injections path:
>nvc++ -dryrun -std=c++20 minimal.cpp minimal
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/.nvc++rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/nativerc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/fnativerc
Skipping rcfiles/internalrc (not found)
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/ccrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/ccirc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/cpprc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/cppcurc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/paralgorc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/x86rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/x8664rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/lin86rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/lincommonrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/lin8664rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmcomprc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmx86rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmx8664rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/omprc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/iparc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/acc1rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/cudaselectrc
Skipping rcfiles/persnvflangrc (not found)
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/acclin8664rc
Skipping rcfiles/acctoolsrc (not found)
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/targetrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/deprecatedrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/c++llvmrc
Skipping rcfiles/llvmxrc (not found)
Skipping rcfiles/tunexrc (not found)
Skipping rcfiles/clangxrc (not found)
Skipping rcfiles/gccxrc (not found)
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/persnvirc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/localrc
Skipping localrc.n3001-003 (not found)
Reading rcfile /home/fs60000/admccartney/.config/NVIDIA/nvhpc/25.1/localrc.n3001-003
Skipping siterc (not found)
Skipping siterc.n3001-003 (not found)
Skipping $GCCLOCALRC (not found)
Skipping .mynvrc (not found)
Skipping .mynvc++rc (not found)
Skipping .mynvcpprc (not found)
Skipping .mynvx86rc (not found)
Skipping $MYLOCALRC (not found)
Skipping cudarc (not found)
Action(realpath(/opt/acceptance-tests/eessi/2023.06/software/linux/x86_64/amd/zen4/software/GCCcore/13.3.0/bin/../lib/gcc/x86_64-pc-linux-gnu/13.3.0//../../../..))
Error in path /opt/acceptance-tests/eessi/2023.06/software/linux/x86_64/amd/zen4/software/GCCcore/13.3.0/bin/../lib/gcc/x86_64-pc-linux-gnu/13.3.0//../../../..
It should __not__ be finding the localrc files in my home directory
{EESSI 2023.06} admccartney@n3001-003 ~/tests/nvhpc
>rm -rf ~/.config/NVIDIA/nvhpc/25.1/
> nvc++ -std=c++20 minimal.cpp -o minimal
>./minimal
Hello world
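Spotting a stray rcfile in a long `-dryrun` listing like the one above is easy to miss by eye. A small filter along the following lines can flag every "Reading rcfile" entry that lies outside the NVHPC installation. The function name is hypothetical, and the sketch assumes only the "Reading rcfile <path>" line format visible in the output above.

```python
def stray_rcfiles(dryrun_output, install_prefix):
    """Return rcfile paths read by the compiler that are outside install_prefix."""
    marker = "Reading rcfile "
    root = install_prefix.rstrip("/") + "/"
    stray = []
    for line in dryrun_output.splitlines():
        line = line.strip()
        if not line.startswith(marker):
            continue  # skip "Skipping ..." lines and everything else
        path = line[len(marker):]
        if not path.startswith(root):
            stray.append(path)
    return stray
```

Applied to the listing above with the NVHPC installation directory as `install_prefix`, this would report only the localrc.n3001-003 file under ~/.config/NVIDIA/nvhpc/25.1/.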
Okay, so I got this working with some careful attention to what the linker was up to.
@adammccartney Thank you for your contribution. We have recently split up the software-layer repository. The changes made in this PR should target the new repository, https://github.com/EESSI/software-layer-scripts, which is why we will close this PR. We are also currently reworking how we handle NVHPC upstream in EasyBuild, so we recommend holding off on implementing this until we have finished that work in EasyBuild.