Commit 86adb00

rparolin and claude committed
[FEA]: Add per-library path override env var to cuda.pathfinder
Adds CUDA_PATHFINDER_<LIBNAME_UPPER>_PATH_OVERRIDE as a developer escape hatch for pointing cuda.pathfinder at a custom build of a specific library (e.g. a development branch of nvshmem) without having to remove the wheel or copy .so files into site-packages.

The value can be either an absolute file path (used as-is) or a directory (searched with the same platform logic as conda / CUDA_PATH). The override has the highest priority and applies uniformly to CTK, third-party, and driver libraries.

If the env var is set but the library cannot be resolved from it, the load fails immediately rather than silently falling through to other search steps. This keeps the override behavior explicit and easy to debug.

Closes #1054.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
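As a rough illustration of how a developer might use the override described in the commit message, the sketch below derives the variable name and sets it before loading. The helper `load_with_override` is hypothetical and not part of cuda.pathfinder; it assumes `load_nvidia_dynamic_lib` is importable from `cuda.pathfinder`.

```python
import os


def path_override_env_var(libname: str) -> str:
    # Naming rule stated in the commit message: upper-case the library name
    # and wrap it in the CUDA_PATHFINDER_ / _PATH_OVERRIDE affixes.
    return f"CUDA_PATHFINDER_{libname.upper()}_PATH_OVERRIDE"


def load_with_override(libname: str, path: str):
    """Hypothetical helper: point the loader at `path`, then load `libname`."""
    # Imported lazily so this sketch can be imported without cuda.pathfinder
    # being installed.
    from cuda.pathfinder import load_nvidia_dynamic_lib

    # `path` may be a library file or a directory containing it.
    os.environ[path_override_env_var(libname)] = path
    return load_nvidia_dynamic_lib(libname)
```

The `_no_cache` suffix on the internal loader in this commit suggests that load results are cached, so the variable most likely needs to be set before the first load of that library in the process.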
1 parent 8a83a4f commit 86adb00

3 files changed

Lines changed: 148 additions & 11 deletions

cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py

Lines changed: 31 additions & 10 deletions
@@ -24,6 +24,7 @@
     SearchContext,
     derive_ctk_root,
     find_via_ctk_root,
+    find_via_path_override,
     run_find_steps,
 )
 from cuda.pathfinder._dynamic_libs.subprocess_protocol import (
@@ -60,8 +61,13 @@ def _load_driver_lib_no_cache(desc: LibDescriptor) -> LoadedDL:
     Driver libs (libcuda, libnvidia-ml) are part of the display driver, not
     the CUDA Toolkit. They are expected to be discoverable via the platform's
     native loader mechanisms, so the full CTK search cascade (site-packages,
-    conda, CUDA_PATH, canary) is unnecessary.
+    conda, CUDA_PATH, canary) is unnecessary. The per-library override env
+    var is still honored so developers can point at a custom build.
     """
+    override_ctx = SearchContext(desc)
+    override_find = find_via_path_override(override_ctx)
+    if override_find is not None:
+        return LOADER.load_with_abs_path(desc, override_find.abs_path, override_find.found_via)
     loaded = LOADER.check_if_already_loaded_from_elsewhere(desc, False)
     if loaded is not None:
         return loaded
@@ -221,23 +227,37 @@ def load_nvidia_dynamic_lib(libname: str) -> LoadedDL:
         RuntimeError: If Python is not 64-bit.

     Search order:
-        0. **Already loaded in the current process**
+        0. **Per-library path override (developer escape hatch)**
+
+           - If ``CUDA_PATHFINDER_<LIBNAME_UPPER>_PATH_OVERRIDE`` is set, it
+             takes precedence over every other source. The value may be either
+             an absolute path to the library file or a directory containing it
+             (searched with the same logic as other anchor-based steps). For
+             ``libname="nvshmem_host"`` the variable is
+             ``CUDA_PATHFINDER_NVSHMEM_HOST_PATH_OVERRIDE``.
+
+             If the override is set but the library cannot be resolved from
+             it, the load fails immediately rather than silently falling
+             through. This makes the override behavior explicit and easy to
+             debug.
+
+        1. **Already loaded in the current process**

            - If a matching library is already loaded by some other component,
              return its absolute path and handle and skip the rest of the search.

-        1. **NVIDIA Python wheels**
+        2. **NVIDIA Python wheels**

            - Scan installed distributions (``site-packages``) to find libraries
              shipped in NVIDIA wheels.

-        2. **Conda environment**
+        3. **Conda environment**

            - Conda installations are discovered via ``CONDA_PREFIX``, which is
              defined automatically in activated conda environments (see
              https://docs.conda.io/projects/conda-build/en/stable/user-guide/environment-variables.html).

-        3. **OS default mechanisms**
+        4. **OS default mechanisms**

            - Fall back to the native loader:

@@ -253,30 +273,31 @@ def load_nvidia_dynamic_lib(libname: str) -> LoadedDL:
               As a result, the native DLL search used here does **not** include
               the system ``PATH``.

-        4. **Environment variables**
+        5. **Environment variables**

            - If set, use ``CUDA_PATH`` or ``CUDA_HOME`` (in that order).
              On Windows, this is the typical way system-installed CTK DLLs are
              located. Note that the NVIDIA CTK installer automatically
              adds ``CUDA_PATH`` to the system-wide environment.

-        5. **CTK root canary probe (discoverable libs only)**
+        6. **CTK root canary probe (discoverable libs only)**

            - For selected libraries whose shared object doesn't reside on the
              standard linker path (currently ``nvvm``), attempt to derive CTK
              root by system-loading a well-known CTK canary library in a
              subprocess and then searching relative to that root. On Windows,
              the canary uses the same native ``LoadLibraryExW`` semantics as
-             step 3, so there is also no ``PATH``-based discovery.
+             step 4, so there is also no ``PATH``-based discovery.

     **Driver libraries** (``"cuda"``, ``"nvml"``):

     These are part of the NVIDIA display driver (not the CUDA Toolkit) and
     are expected to be reachable via the native OS loader path. For these
     libraries the search is simplified to:

-        0. Already loaded in the current process
-        1. OS default mechanisms (``dlopen`` / ``LoadLibraryExW``)
+        0. Per-library path override (``CUDA_PATHFINDER_<LIBNAME>_PATH_OVERRIDE``)
+        1. Already loaded in the current process
+        2. OS default mechanisms (``dlopen`` / ``LoadLibraryExW``)

     The CTK-specific steps (site-packages, conda, ``CUDA_PATH``, canary
     probe) are skipped entirely.
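The search order documented above amounts to a "first match wins" cascade in which the override step may abort the search instead of deferring. A minimal sketch of that pattern follows. `run_steps`, `make_override_step`, `wheel_step`, and `LibNotFound` are hypothetical stand-ins written for illustration; they are not the library's `run_find_steps` or `DynamicLibNotFoundError`, whose implementations are not shown in this diff.

```python
from typing import Callable, Dict, Optional, Sequence, Set

# A find step returns a path, or None to let the next step try.
FindStep = Callable[[str], Optional[str]]


class LibNotFound(RuntimeError):
    """Hypothetical stand-in for DynamicLibNotFoundError."""


def make_override_step(env: Dict[str, str], known_paths: Set[str]) -> FindStep:
    """Build an override step over an explicit env mapping (illustration only)."""

    def step(libname: str) -> Optional[str]:
        value = env.get(f"CUDA_PATHFINDER_{libname.upper()}_PATH_OVERRIDE")
        if not value:
            return None  # unset or empty: defer to later steps
        if value not in known_paths:
            # Set but unresolvable: fail loudly instead of falling through.
            raise LibNotFound(f"override {value!r} could not be resolved")
        return value

    return step


def run_steps(libname: str, steps: Sequence[FindStep]) -> Optional[str]:
    """Return the first non-None result; exceptions propagate to the caller."""
    for step in steps:
        found = step(libname)
        if found is not None:
            return found
    return None


def wheel_step(libname: str) -> Optional[str]:
    # Toy lower-priority step that always "finds" the library.
    return f"/site-packages/lib{libname}.so"
```

Because `run_steps` does not catch exceptions, placing the override step first gives it both the highest priority and the ability to stop the search, which matches the behavior the docstring describes.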

cuda_pathfinder/cuda/pathfinder/_dynamic_libs/search_steps.py

Lines changed: 51 additions & 1 deletion
@@ -158,6 +158,54 @@ def find_via_ctk_root(ctx: SearchContext, ctk_root: str) -> FindResult | None:
 # ---------------------------------------------------------------------------


+_PATH_OVERRIDE_ENV_PREFIX = "CUDA_PATHFINDER_"
+_PATH_OVERRIDE_ENV_SUFFIX = "_PATH_OVERRIDE"
+
+
+def path_override_env_var(libname: str) -> str:
+    """Return the per-library override environment variable name."""
+    return f"{_PATH_OVERRIDE_ENV_PREFIX}{libname.upper()}{_PATH_OVERRIDE_ENV_SUFFIX}"
+
+
+def find_via_path_override(ctx: SearchContext) -> FindResult | None:
+    """Resolve a library via the per-library override environment variable.
+
+    The variable name is ``CUDA_PATHFINDER_<LIBNAME_UPPER>_PATH_OVERRIDE``.
+
+    Value semantics:
+    - Unset or empty: this step is a no-op and returns ``None``.
+    - Path to an existing regular file: used as the resolved library file.
+    - Path to an existing directory: searched for the library file using the
+      same platform logic as other anchor-based steps.
+    - Anything else (path does not exist, directory has no matching library):
+      raises :class:`DynamicLibNotFoundError`. An explicit override that fails
+      to resolve must not silently fall through to other search steps.
+    """
+    env_var = path_override_env_var(ctx.libname)
+    override = os.environ.get(env_var)
+    if not override:
+        return None
+
+    found_via = f"override({env_var})"
+
+    if os.path.isfile(override):
+        return FindResult(os.path.normpath(override), found_via)
+
+    if os.path.isdir(override):
+        abs_path = _find_using_lib_dir(ctx, override)
+        if abs_path is not None:
+            return FindResult(abs_path, found_via)
+        err = ", ".join(ctx.error_messages) or f"no matching file under {override!r}"
+        att = "\n".join(ctx.attachments)
+        raise DynamicLibNotFoundError(
+            f'{env_var}={override!r} is set but {ctx.lib_searched_for!r} was not found there: {err}\n{att}'
+        )
+
+    raise DynamicLibNotFoundError(
+        f'{env_var}={override!r} is set but the path does not exist as a file or directory.'
+    )
+
+
 def find_in_site_packages(ctx: SearchContext) -> FindResult | None:
     """Search pip wheel install locations."""
     rel_dirs = ctx.platform.site_packages_rel_dirs(ctx.desc)
@@ -208,7 +256,9 @@ def find_in_cuda_path(ctx: SearchContext) -> FindResult | None:
 # ---------------------------------------------------------------------------

 #: Find steps that run before the already-loaded check and system search.
-EARLY_FIND_STEPS: tuple[FindStep, ...] = (find_in_site_packages, find_in_conda)
+#: The path-override step has the highest priority and fails loudly if the
+#: override env var is set but the library cannot be resolved from it.
+EARLY_FIND_STEPS: tuple[FindStep, ...] = (find_via_path_override, find_in_site_packages, find_in_conda)

 #: Find steps that run after system search fails.
 LATE_FIND_STEPS: tuple[FindStep, ...] = (find_in_cuda_path,)
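The four value cases listed in the `find_via_path_override` docstring can be tried out without cuda.pathfinder using the standard-library sketch below. It is a simplified illustration, not the library's code: `resolve_override` and `LibNotFound` are hypothetical names, and the directory case only checks for an exact file name, whereas the real step delegates to `_find_using_lib_dir` and platform-specific logic not shown in this diff.

```python
import os
from typing import Mapping, Optional


class LibNotFound(RuntimeError):
    """Hypothetical stand-in for DynamicLibNotFoundError."""


def resolve_override(
    libname: str,
    filename: str,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Resolve `filename` for `libname` from the per-library override variable."""
    environ = os.environ if environ is None else environ
    env_var = f"CUDA_PATHFINDER_{libname.upper()}_PATH_OVERRIDE"
    value = environ.get(env_var)

    # Case 1: unset or empty means "no override"; let other steps run.
    if not value:
        return None

    # Case 2: an existing regular file is used directly.
    if os.path.isfile(value):
        return os.path.normpath(value)

    # Case 3: an existing directory is searched (simplified: exact name only).
    if os.path.isdir(value):
        candidate = os.path.join(value, filename)
        if os.path.isfile(candidate):
            return candidate
        raise LibNotFound(f"{env_var}={value!r} is set but {filename!r} was not found there")

    # Case 4: anything else is an error rather than a silent fall-through.
    raise LibNotFound(f"{env_var}={value!r} is set but the path does not exist as a file or directory.")
```

Passing `environ` explicitly keeps the sketch easy to exercise without touching the process environment; the code in the diff reads `os.environ` directly.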

cuda_pathfinder/tests/test_search_steps.py

Lines changed: 66 additions & 0 deletions
@@ -21,6 +21,8 @@
     find_in_conda,
     find_in_cuda_path,
     find_in_site_packages,
+    find_via_path_override,
+    path_override_env_var,
     run_find_steps,
 )

@@ -440,3 +442,67 @@ def test_nvvm_cuda_home_linux(self, mocker, tmp_path):
         assert result is not None
         assert result.abs_path == str(so_file)
         assert result.found_via == "CUDA_PATH"
+
+
+# ---------------------------------------------------------------------------
+# find_via_path_override
+# ---------------------------------------------------------------------------
+
+
+class TestPathOverrideEnvVar:
+    def test_uppercases_libname(self):
+        assert path_override_env_var("cudart") == "CUDA_PATHFINDER_CUDART_PATH_OVERRIDE"
+
+    def test_preserves_underscore(self):
+        assert path_override_env_var("nvshmem_host") == "CUDA_PATHFINDER_NVSHMEM_HOST_PATH_OVERRIDE"
+
+    def test_uppercases_mixed_case(self):
+        assert path_override_env_var("cublasLt") == "CUDA_PATHFINDER_CUBLASLT_PATH_OVERRIDE"
+
+
+class TestFindViaPathOverride:
+    def test_unset_returns_none(self, monkeypatch):
+        monkeypatch.delenv("CUDA_PATHFINDER_CUDART_PATH_OVERRIDE", raising=False)
+        assert find_via_path_override(_ctx()) is None
+
+    def test_empty_returns_none(self, monkeypatch):
+        monkeypatch.setenv("CUDA_PATHFINDER_CUDART_PATH_OVERRIDE", "")
+        assert find_via_path_override(_ctx()) is None
+
+    def test_file_path_used_as_is(self, monkeypatch, tmp_path):
+        so_file = tmp_path / "libcudart.so.99"
+        so_file.touch()
+        monkeypatch.setenv("CUDA_PATHFINDER_CUDART_PATH_OVERRIDE", str(so_file))
+        result = find_via_path_override(_ctx())
+        assert result is not None
+        assert result.abs_path == os.path.normpath(str(so_file))
+        assert result.found_via == "override(CUDA_PATHFINDER_CUDART_PATH_OVERRIDE)"
+
+    def test_directory_searched_linux(self, monkeypatch, tmp_path):
+        so_file = tmp_path / "libcudart.so"
+        so_file.touch()
+        monkeypatch.setenv("CUDA_PATHFINDER_CUDART_PATH_OVERRIDE", str(tmp_path))
+        result = find_via_path_override(_ctx(platform=LinuxSearchPlatform()))
+        assert result is not None
+        assert result.abs_path == str(so_file)
+        assert result.found_via == "override(CUDA_PATHFINDER_CUDART_PATH_OVERRIDE)"
+
+    def test_nonexistent_path_raises(self, monkeypatch, tmp_path):
+        bogus = tmp_path / "does-not-exist"
+        monkeypatch.setenv("CUDA_PATHFINDER_CUDART_PATH_OVERRIDE", str(bogus))
+        with pytest.raises(DynamicLibNotFoundError, match="does not exist"):
+            find_via_path_override(_ctx())
+
+    def test_directory_without_lib_raises(self, monkeypatch, tmp_path):
+        monkeypatch.setenv("CUDA_PATHFINDER_CUDART_PATH_OVERRIDE", str(tmp_path))
+        with pytest.raises(DynamicLibNotFoundError, match="was not found there"):
+            find_via_path_override(_ctx(platform=LinuxSearchPlatform()))
+
+    def test_per_lib_isolation(self, monkeypatch, tmp_path):
+        # Override for nvshmem_host must not affect cudart lookups.
+        monkeypatch.setenv("CUDA_PATHFINDER_NVSHMEM_HOST_PATH_OVERRIDE", str(tmp_path / "nope"))
+        monkeypatch.delenv("CUDA_PATHFINDER_CUDART_PATH_OVERRIDE", raising=False)
+        assert find_via_path_override(_ctx()) is None
+
+    def test_runs_first_in_early_steps(self):
+        assert EARLY_FIND_STEPS[0] is find_via_path_override
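The file-path test above compares against `os.path.normpath`, and the tests only exercise absolute `tmp_path` values. The sketch below shows what `normpath` does to a relative path, using only the standard library. Reading the diff as written, this suggests a relative override value would stay relative in `FindResult.abs_path`; that is an inference from the code shown here, not something the tests confirm.

```python
import os

# A relative path with a redundant "build/.." component.
relative = os.path.join("build", "..", "lib", "libcudart.so")

# normpath collapses the redundant component but keeps the path relative.
normalized = os.path.normpath(relative)

# abspath additionally anchors the path at the current working directory.
absolute = os.path.abspath(relative)

print(normalized)
print(os.path.isabs(normalized), os.path.isabs(absolute))
```

If absolute results are wanted for relative inputs, `os.path.abspath` would be one option, since it also normalizes the path.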
