Is this a duplicate?
Type of Bug
Silent Failure
Component
cuda.bindings
Describe the bug
@leofang wrote:
I noticed that on my system, which has a system-wide CTK 12.3 plus CTK 12.9 installed from conda, the pathfinder from either cuda.bindings 12.9.0 or cuda.pathfinder 1.0.0 picks up nvJitLink 12.3 (the system one) instead of 12.9 (the conda one). This does not match the behavior we documented.
I suspect that the logic in _load_nvidia_dynamic_library_no_cache might be wrong:
# Find the library path
found = _find_nvidia_dynamic_library(libname)
if found.abs_path is None:
    loaded = load_with_system_search(libname, found.lib_searched_for)
because in _find_nvidia_dynamic_library we always do this on Linux:
self.lib_searched_for = f"lib{libname}.so"
meaning we search not with the full soname (libnvJitLink.so.12) but with the unversioned symlink name (libnvJitLink.so), which conda does not provide if only the libnvjitlink package is installed and not the libnvjitlink-dev package.
Therefore, load_with_system_search behaves incorrectly, because we pass it the unversioned name rather than the soname.
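To illustrate the difference between the two names, here is a minimal sketch of a lookup that tries the unversioned name first and then falls back to versioned file names. The helper find_shared_lib and the temporary directory are hypothetical and only mimic a conda lib directory without the -dev symlink; this is not the pathfinder's actual code or a proposed patch.

```python
import glob
import os
import tempfile


def find_shared_lib(libname, lib_dir):
    """Hypothetical helper: locate lib<libname>.so, or a versioned variant, in lib_dir."""
    # The unversioned symlink is what the current search string looks for.
    unversioned = os.path.join(lib_dir, f"lib{libname}.so")
    if os.path.isfile(unversioned):
        return unversioned
    # Fall back to versioned names such as libnvJitLink.so.12.
    # Sorting puts the shortest matching name (the soname) first.
    versioned = sorted(glob.glob(os.path.join(lib_dir, f"lib{libname}.so.*")))
    return versioned[0] if versioned else None


# Mimic a conda env that has libnvjitlink but not libnvjitlink-dev.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "libnvJitLink.so.12"), "w").close()

print(os.path.isfile(os.path.join(demo_dir, "libnvJitLink.so")))  # unversioned name is absent
print(find_shared_lib("nvJitLink", demo_dir))                     # versioned file is found
```

In this setup, a search restricted to libnvJitLink.so finds nothing in the conda-like directory, while the fallback still locates the versioned file.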
How to Reproduce
I think a simple reproducer would be:
- Launch a vanilla Ubuntu container.
- Install miniforge, then create a new conda env with only cuda-pathfinder (from pip) and libnvjitlink (from conda-forge) installed.
- Run the pathfinder:
from cuda import pathfinder
pathfinder.load_nvidia_dynamic_lib("nvJitLink")
Expected behavior
The conda .so should be found.
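One way to check which copy was actually loaded on Linux, independent of any pathfinder API, is to inspect the process's memory map. The sketch below is an assumption-laden helper (mapped_paths is a hypothetical name) that reads /proc/self/maps and returns the file paths containing a given substring; it would be called after the load_nvidia_dynamic_lib call in the reproducer.

```python
def mapped_paths(substring, maps_text=None):
    """Hypothetical helper: list mapped file paths that contain `substring` (Linux only)."""
    if maps_text is None:
        # Each line: address perms offset dev inode [pathname]
        with open("/proc/self/maps") as f:
            maps_text = f.read()
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # Anonymous mappings have no pathname field and are skipped.
        if len(fields) == 6 and substring in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


# Usage after the reproducer has loaded the library:
# print(mapped_paths("nvJitLink"))
```

If the resulting path points outside the conda env prefix, the system copy was picked up.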
Operating System
No response
nvidia-smi output
No response