Updated link_nvidia_host_libraries.sh for better edge case handling #922
Merged: ocaisa merged 25 commits into `EESSI:2023.06-software.eessi.io` from `Darkless012:2023.06-software.eessi.io` on Mar 18, 2025.
25 commits
- `5025a66` Updated link_nvidia_host_libraries.sh for better edge case handling a… (Darkless012)
- `abe7150` Fixed copy/paste error, updated logging, fixed most shellcheck issues. (Darkless012)
- `26ee21c` Added Github Action Tests for link_nvidia_host_libraries.sh (Darkless012)
- `70813b4` updated GHAction test for link_nvidia_host_libraries.sh - unnecessary… (Darkless012)
- `18add4d` updated GHAction test for link_nvidia_host_libraries.sh - changed EES… (Darkless012)
- `a67c88e` updated GHAction test for link_nvidia_host_libraries.sh - using defau… (Darkless012)
- `a0178df` updated GHAction test for link_nvidia_host_libraries.sh - get host_in… (Darkless012)
- `ed2a0da` updated GHAction test for link_nvidia_host_libraries.sh - updating ho… (Darkless012)
- `3d0ce44` updated GHAction test for link_nvidia_host_libraries.sh - updating ho… (Darkless012)
- `cb70c60` updated GHAction test for link_nvidia_host_libraries.sh - added check… (Darkless012)
- `a43d6a1` updated GHAction test for link_nvidia_host_libraries.sh - fix proper … (Darkless012)
- `69134a9` updated GHAction test for link_nvidia_host_libraries.sh - fix proper … (Darkless012)
- `d9bf093` updated GHAction test for link_nvidia_host_libraries.sh - fix proper … (Darkless012)
- `0140f92` updated GHAction test for link_nvidia_host_libraries.sh - fix proper … (Darkless012)
- `3e539d1` updated GHAction test for link_nvidia_host_libraries.sh - fix mocking… (Darkless012)
- `51b5f8b` updated GHAction test for link_nvidia_host_libraries.sh - fix permiss… (Darkless012)
- `8ea284c` updated GHAction test for link_nvidia_host_libraries.sh - updated moc… (Darkless012)
- `ebcfaa5` updated GHAction test for link_nvidia_host_libraries.sh - final minor… (Darkless012)
- `981aa74` updated GHAction test for link_nvidia_host_libraries.sh - final minor… (Darkless012)
- `871ee6a` Update workflow to trigger only on specific file change. (Darkless012)
- `03fac99` Update .github/workflows/tests_link_nvidia_host_libraries.yml (Darkless012)
- `b2de71d` Update .github/workflows/tests_link_nvidia_host_libraries.yml (Darkless012)
- `7a938b5` Update scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh (Darkless012)
- `621a949` Apply suggestions from code review (ocaisa)
- `27d940c` Update .github/workflows/tests_link_nvidia_host_libraries.yml (ocaisa)
`.github/workflows/tests_link_nvidia_host_libraries.yml` (new file, `@@ -0,0 +1,196 @@`)

```yaml
# documentation: https://help.github.com/en/articles/workflow-syntax-for-github-actions
name: Test NVIDIA Host Libraries Linking
on:
  push:
    branches:
      - '*-software.eessi.io' # Matches any branch ending with '-software.eessi.io'
  pull_request:
    paths:
      - 'scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh' # PR changes only relevant for this specific file
      - '.github/workflows/tests_link_nvidia_host_libraries.yml' # Also test when changing the tests themselves
permissions:
  contents: read # to fetch code (actions/checkout)
jobs:
  build:
    runs-on: ubuntu-24.04
    steps:
      - name: checkout
        uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938 # v4.2.0

      - name: Initialize EESSI
        uses: eessi/github-action-eessi@v3

      - name: Setup mock NVIDIA libraries
        run: |
          # Run the script to create mock libraries
          chmod +x ./tests/nvidia-libs/mock-nvidia-libs.sh
          echo ">>> Running ./tests/nvidia-libs/mock-nvidia-libs.sh"
          ./tests/nvidia-libs/mock-nvidia-libs.sh

          # Create symlink to override host's ldconfig, since the script tries to use /sbin/ldconfig first.
          echo "Symlinking ldconfig to /sbin/ldconfig"
          sudo ln -sf /tmp/ldconfig/ldconfig /sbin/ldconfig

          # Verify the symlink was created correctly
          ls -la /sbin/ldconfig

      - name: Setup mock nvidia-smi
        run: |
          # Create directory for mock nvidia-smi
          mkdir -p /tmp/nvidia-bin

          # Copy the mock script
          chmod +x ./tests/nvidia-libs/mock-nvidia-smi.sh
          echo ">>> Copying ./tests/nvidia-libs/mock-nvidia-smi.sh"
          cp ./tests/nvidia-libs/mock-nvidia-smi.sh /tmp/nvidia-bin/nvidia-smi

          # Add to PATH for all subsequent steps
          echo "Updating PATH"
          echo "PATH=/tmp/nvidia-bin:$PATH" >> "$GITHUB_ENV"

      - name: Test LD_PRELOAD mode
        run: |
          echo ">>> Testing LD_PRELOAD mode"

          # Run the script with LD_PRELOAD option (shouldn't create symlinks).
          # The failure handler sits outside $(...) so its messages reach the log
          # instead of being captured into $output.
          output=$(./scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh --show-ld-preload) || { echo "Script returned non-zero: $?"; echo "$output"; exit 1; }

          echo "$output"

          echo ">>> Running checks"

          # Check for expected outputs
          echo "$output" | grep "export EESSI_GPU_COMPAT_LD_PRELOAD=" || { echo "EESSI_GPU_COMPAT_LD_PRELOAD not found in output"; exit 1; }
          echo "$output" | grep "export EESSI_GPU_LD_PRELOAD=" || { echo "EESSI_GPU_LD_PRELOAD not found in output"; exit 1; }
          echo "$output" | grep "export EESSI_OVERRIDE_GPU_CHECK=" || { echo "EESSI_OVERRIDE_GPU_CHECK not found in output"; exit 1; }

          # Verify that no symlinks were created
          if [ -e "/opt/eessi/nvidia/x86_64/host/driver_version.txt" ]; then
            echo "Error: symlinks were created in LD_PRELOAD mode"
            exit 1
          fi

          echo "LD_PRELOAD mode test passed."

      - name: Test normal run (first time)
        run: |
          echo ">>> Testing normal run - first time"

          # Run with verbose mode
          output=$(./scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh --verbose) || { echo "Script returned non-zero: $?"; echo "$output"; exit 1; }

          echo "$output"

          echo ">>> Running checks"

          # Check if NVIDIA GPU was detected - Driver version and CUDA version are hardcoded in `tests/nvidia-libs/mock-nvidia-smi.sh`
          echo "$output" | grep "Found NVIDIA GPU driver version 535.129.03" || { echo "Failed to detect NVIDIA driver version"; exit 1; }
          echo "$output" | grep "Found host CUDA version 8.0" || { echo "Failed to detect CUDA version"; exit 1; }

          # Check if libraries were found
          echo "$output" | grep "Matched.*CUDA Libraries" || { echo "Failed to match CUDA libraries"; exit 1; }

          # Verify symlinks were created
          if [ ! -d "/opt/eessi/nvidia/x86_64/host" ]; then
            echo "Error: host directory wasn't created"
            exit 1
          fi

          # Check if version files were created
          if [ ! -f "/opt/eessi/nvidia/x86_64/host/driver_version.txt" ]; then
            echo "Error: driver_version.txt wasn't created"
            exit 1
          fi

          # Check driver version content
          grep "535.129.03" "/opt/eessi/nvidia/x86_64/host/driver_version.txt" || { echo "Incorrect driver version"; exit 1; }

          # Check if latest symlink was created
          if [ ! -L "/opt/eessi/nvidia/x86_64/latest" ]; then
            echo "Error: 'latest' symlink wasn't created"
            exit 1
          fi

          # Check if latest points to host
          readlink "/opt/eessi/nvidia/x86_64/latest" | grep "host" || { echo "latest doesn't point to host"; exit 1; }

          # Check if symlinks to libraries were created and point to correct files
          echo ">>> Checking library symlinks"

          # List dir with libraries
          echo "Showing content of /tmp/nvidia_libs"
          ls -l /tmp/nvidia_libs
          echo "Showing content of /opt/eessi/nvidia/x86_64/host"
          ls -l /opt/eessi/nvidia/x86_64/host

          # List expected library names - list of libraries is hardcoded in `tests/nvidia-libs/mock-nvidia-libs.sh`
          libraries=(
            "libcuda.so"
            "libcuda.so.1"
            "libnvidia-ml.so"
            "libnvidia-ml.so.1"
            "libnvidia-ptxjitcompiler.so"
            "libnvidia-ptxjitcompiler.so.1"
            "libcudadebugger.so"
            "libcudadebugger.so.1"
          )

          # Check each expected library symlink
          for lib in "${libraries[@]}"; do
            lib_path="/opt/eessi/nvidia/x86_64/host/$lib"

            # Check if the symlink exists
            if [ ! -L "$lib_path" ]; then
              echo "Error: Symlink for $lib was not created"
              exit 1
            fi

            # Check if symlink target exists
            target=$(readlink "$lib_path")
            if [ ! -e "$target" ]; then
              echo "Error: Symlink $lib_path points to non-existent file: $target"
              exit 1
            fi

            # Verify it points to our mock library in /tmp/nvidia_libs
            if [[ "$target" != "/tmp/nvidia_libs/$lib"* && "$target" != *"/tmp/nvidia_libs/"* ]]; then
              echo "Error: Symlink $lib_path points to $target, which is not in our mock directory"
              exit 1
            fi

            echo ">>> Verified symlink: $lib -> $target"
          done

          echo "First normal run test passed"

      - name: Test normal run (second time)
        run: |
          echo ">>> Testing normal run - second time - should be idempotent"
          # Remove all write permissions on /opt/eessi so any attempts to write files fail
          chmod -R a-w /opt/eessi

          # Record file metadata before the second run (name, size, mtime, owner, group, mount point, inode; access time deliberately excluded)
          stat_before=$(stat --format="%n %s %y %U %G %m %i" "/opt/eessi/nvidia/x86_64/host/driver_version.txt")

          # Run script again
          output=$(./scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh) || { echo "Script returned non-zero: $?"; echo "$output"; exit 1; }

          echo "$output"

          echo ">>> Running checks"

          # Record the same metadata after the second run
          stat_after=$(stat --format="%n %s %y %U %G %m %i" "/opt/eessi/nvidia/x86_64/host/driver_version.txt")

          # Compare metadata - it should be identical (files shouldn't be modified)
          if [[ "$stat_before" != "$stat_after" ]]; then
            echo "Error: files were modified on second run when they shouldn't have been"
            echo "Before: $stat_before"
            echo "After: $stat_after"
            exit 1
          fi

          # Check for message indicating that libraries are already linked
          echo "$output" | grep "have already been linked" || { echo "Missing 'already linked' message"; exit 1; }

          echo "Second normal run test passed"
```
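The first-run step checks three things for each library: the path is a symlink, its target exists, and the target lives under the mock directory. A minimal sketch of that check as a reusable Bash function, exercised against throwaway directories, follows. The function name `check_symlinks` and the temporary paths are hypothetical, and it assumes GNU `readlink -f` as found on the `ubuntu-24.04` runner. Unlike the loop in the workflow, it tests `-e` on the link itself, which follows the link and so also handles relative link targets.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: verify that each named entry in link_dir is a symlink
# whose target exists and resolves to a file inside expected_dir.
check_symlinks() {
  local link_dir=$1 expected_dir=$2
  shift 2
  local lib path target real_expected
  real_expected=$(readlink -f "$expected_dir")
  for lib in "$@"; do
    path="$link_dir/$lib"
    if [[ ! -L "$path" ]]; then
      echo "Error: symlink for $lib was not created"
      return 1
    fi
    # -e follows the link, so relative link targets are resolved correctly.
    if [[ ! -e "$path" ]]; then
      echo "Error: $path points to a non-existent file: $(readlink "$path")"
      return 1
    fi
    # Fully resolve the chain of links before comparing directories.
    target=$(readlink -f "$path")
    if [[ "$target" != "$real_expected"/* ]]; then
      echo "Error: $path resolves to $target, outside $real_expected"
      return 1
    fi
    echo "Verified symlink: $lib -> $target"
  done
}

# Throwaway demo: one mock library plus an absolute and a relative link to it.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/libs" "$work/host"
touch "$work/libs/libcuda.so.1"
ln -s "$work/libs/libcuda.so.1" "$work/host/libcuda.so.1"
ln -s libcuda.so.1 "$work/host/libcuda.so"

check_symlinks "$work/host" "$work/libs" libcuda.so libcuda.so.1
```

Resolving with `readlink -f` before comparing means a link chain such as `libcuda.so -> libcuda.so.1 -> /tmp/.../libcuda.so.1` is judged by where it finally lands, not by its first hop.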
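The second-run step detects unwanted writes by comparing a `stat` fingerprint taken before and after rerunning the script. The following self-contained sketch reduces that idea to a temporary file that stands in for `driver_version.txt`. The helper names `fingerprint` and `assert_unchanged` are hypothetical, and the format string assumes GNU coreutils `stat`, which the `ubuntu-24.04` runner provides.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Fingerprint a file by name, size, mtime, owner, group, mount point and inode
# (GNU stat format). Access time is left out on purpose: merely reading the
# file may update it.
fingerprint() {
  stat --format="%n %s %y %U %G %m %i" "$1"
}

# Hypothetical helper: run a command and report whether it changed the file's
# fingerprint.
assert_unchanged() {
  local file=$1
  shift
  local before after
  before=$(fingerprint "$file")
  "$@" > /dev/null
  after=$(fingerprint "$file")
  if [[ "$before" != "$after" ]]; then
    echo "Error: $file was modified"
    echo "Before: $before"
    echo "After:  $after"
    return 1
  fi
  echo "OK: $file unchanged"
}

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
file="$tmpdir/driver_version.txt"
echo "535.129.03" > "$file"

# Reading the file leaves the fingerprint intact...
assert_unchanged "$file" cat "$file"

# ...while appending changes its size, so the check catches it.
if assert_unchanged "$file" sh -c 'echo "extra line" >> "$1"' _ "$file"; then
  echo "unexpected: modification not detected"
  exit 1
fi
```

The demo modifies the file by appending rather than rewriting it with identical content, because Linux file timestamps have coarse granularity. A rewrite that lands within the same clock tick could leave `%y` unchanged, whereas a size change always shows up.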
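The three `output=$(...)` steps depend on a detail of Bash command substitution. Anything echoed inside `$(...)` is captured into the variable rather than printed, and `exit` there only leaves the subshell. A failure handler placed inside the parentheses still fails the step, because GitHub Actions runs `run:` steps with `bash -e` by default and the assignment inherits the substitution's status. Its diagnostic lines, however, end up in `$output` instead of in the log. The sketch below shows the pattern that keeps them visible, with a hypothetical `fake_script` standing in for `link_nvidia_host_libraries.sh`.

```shell
#!/usr/bin/env bash
set -e  # mirror the default `bash -e` shell used for GitHub Actions run steps

# Hypothetical stand-in for link_nvidia_host_libraries.sh: prints a line, then fails.
fake_script() {
  echo "partial output"
  return 3
}

run_and_capture() {
  # Declare first: `local output=$(...)` would report the status of `local`
  # itself (0) and hide the command's failure.
  local output
  # The handler sits outside $(...), so its messages go to the log instead of
  # being captured into $output. $? here is the substitution's exit status.
  output=$(fake_script) || {
    echo "Script returned non-zero: $?"
    echo "$output"
    return 1
  }
  echo "$output"
}

run_and_capture || echo "the CI step would fail here"
```

Running this prints the non-zero status (3), then the captured `partial output`, then the final message. Because the failing command sits on the left of `||`, `set -e` does not abort before the handler runs.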