Commit 6121573

cpcloud, cursoragent, and rwgk authored
refactor(pathfinder): descriptor-driven library discovery and loading (NVIDIA#1685)
* refactor(pathfinder): introduce LibDescriptor and registry

Add a per-library descriptor dataclass that consolidates all metadata (sonames, DLLs, site-packages paths, dependencies, loader flags) into a single frozen object. The registry is built at import time from the existing data tables -- zero behavioral change. 291 parametrized tests verify the registry is a faithful representation of the source dicts.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(pathfinder): extract composable search steps

Introduce SearchContext and FindStep to replace the monolithic finder class. Each search mechanism (site-packages, conda, CUDA_HOME) becomes a standalone step with a uniform (SearchContext) -> FindResult | None signature. Keep already-loaded handling and dependency loading as orchestration concerns. Delete the old find_nvidia_dynamic_lib module after migrating its logic.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(pathfinder): make anchor-point dirs descriptor-driven

Add per-platform anchor-relative directory lists to LibDescriptor and use them for CUDA_HOME/conda anchor resolution. This removes special-case branching (e.g. nvvm) from the anchor-point search.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(pathfinder): thread LibDescriptor through loader layer

Update the platform-specific loader code to consume LibDescriptor directly instead of consulting supported_nvidia_libs tables at runtime. This makes the loading path data-driven (desc.linux_sonames/windows_dlls, desc.requires_* flags, desc.dependencies).

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(pathfinder): add and simplify PlatformLoader seam

Introduce the platform loader boundary for dlopen calls and fold the immediate wrapper cleanup into the same change so loader dispatch stays straightforward while preserving behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(pathfinder): add SearchPlatform seam for search steps

Introduce search_platform.py, exporting a single PLATFORM instance that implements the per-OS filesystem search behavior. search_steps routes all platform differences through SearchContext.platform, removing OS branching from the search step implementations.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(pathfinder): inline SearchPlatform lib-dir lookup helpers

Inline single-use lib-dir lookup helpers into the platform implementations to reduce helper surface area while keeping shared rel-dir scanning helpers.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(pathfinder): remove unused LibDescriptor properties

Drop the platform-dispatch convenience properties that became unused after introducing PlatformLoader/SearchPlatform.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(pathfinder): add authored descriptor catalog and parity tests

Add a canonical descriptor catalog module that contains one DescriptorSpec per supported dynamic library. Add exhaustive parity tests asserting the catalog matches the current LIB_DESCRIPTORS registry field-for-field before runtime wiring is flipped.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(pathfinder): build LIB_DESCRIPTORS from authored catalog

Switch lib_descriptor.py from "assemble-from-supported tables" to "registry-from-authored catalog". Keep backward-compatible names (`LibDescriptor`, `Strategy`, and `LIB_DESCRIPTORS`) while making descriptor_catalog the canonical source.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(pathfinder): derive legacy tables from descriptor catalog

Replace the hand-authored supported_nvidia_libs tables with compatibility constants derived from DESCRIPTOR_CATALOG while preserving historical export names and behaviors. This makes descriptor data the single authored source and keeps supported_nvidia_libs as a derived-views shim for existing imports.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(pathfinder): tighten refactor follow-ups across search and tests

Consolidate post-refactor fixes for driver-lib test alignment, platform search-path edge cases, and typing/import cleanup so behavior and diagnostics remain stable.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(pathfinder): rewrite tautological catalog tests as structural invariants

The previous tests compared catalog entries against LIB_DESCRIPTORS, which is built directly from the same catalog -- always passing by construction. Replace with parametrized checks that verify real properties: name uniqueness, valid identifiers, strategy values, dependency graph integrity, soname/dll format, and driver lib constraints.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(pathfinder): trim descriptor catalog defaults and import layout

Simplify catalog entries by relying on DescriptorSpec defaults and fold companion import-order cleanup into the same readability-focused change.
EOF && git cherry-pick -n 5d5547f a804f26c4 && git commit --trailer "Co-authored-by: Cursor <cursoragent@cursor.com>" -F - <<'EOF'
refactor(pathfinder): split add-nv-library core flow from optional UI

Keep add-nv-library lightweight by default with prompt/CLI-first behavior while moving Textual chrome behind explicit UI tasks and lockfile feature splits.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(pathfinder): restore descriptor-driven CTK canary fallback

Reinstate CTK-root canary discovery in the refactored loader path and define canary eligibility/anchors on per-library descriptors so fallback policy lives with the rest of library metadata.

Co-authored-by: Cursor <cursoragent@cursor.com>

* style(pathfinder): fix pre-commit formatting and mypy return type

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(toolshed): add shared catalog writer for descriptor_catalog.py

Provides helpers to import, update, and rewrite the descriptor catalog file in place. Each toolshed extraction script can merge its findings (sonames, DLLs, site-packages paths) into the authored catalog without touching fields it doesn't own.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(toolshed): update extraction scripts to write descriptor_catalog.py in place

Replace print-to-stdout workflow (manual copy-paste) with direct in-place catalog updates via the shared _catalog_writer module.

- build_pathfinder_sonames.py: scans directories for .so files and extracts SONAMEs via readelf (replaces find_sonames.sh pipeline)
- build_pathfinder_dlls.py: parses 7z listings, updates windows_dlls
- make_site_packages_libdirs.py: parses collected paths, updates site_packages_linux or site_packages_windows

All three scripts now derive the set of known library names from the catalog itself, eliminating the duplicated LIBNAMES_IN_SCOPE constant.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(toolshed): add unified update_catalog.py entry point

Platform-aware dispatcher: runs sonames extraction on Linux, DLL listing parsing on Windows. Single command to update descriptor_catalog.py from CTK installations.

Made-with: Cursor

* fix(pathfinder): use platform descriptor fields for supported libnames

Prevent import-time failures after the descriptor-catalog refactor by deriving availability from linux/windows descriptor fields instead of removed helpers. Remove an unused catalog-writer variable so pre-commit remains green.

Made-with: Cursor

* refactor(pathfinder): rename descriptor strategy to packaged_with

Use a more descriptive field name for library classification so descriptor semantics are explicit in runtime and toolshed code paths. This aligns terminology with review feedback while preserving behavior and compatibility.

Made-with: Cursor

* refactor(pathfinder): drop unused Strategy alias

Remove the `Strategy = PackagedWith` alias from the descriptor catalog and related re-exports to avoid confusing, unused terminology.

Made-with: Cursor

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Ralf W. Grosse-Kunstleve <rwgkio@gmail.com>
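The commit message describes a frozen per-library descriptor and search steps that share a uniform `(SearchContext) -> FindResult | None` signature. The sketch below illustrates that shape only; it is not the code from this commit. The field names `linux_sonames`, `windows_dlls`, and `dependencies` come from the message, while `fake_fs`, `found_via`, `_find_in`, `SEARCH_STEPS`, and `run_find_steps` are hypothetical names introduced here, and the in-memory "filesystem" stands in for real directory scanning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class LibDescriptor:
    """Simplified stand-in for the descriptor; the real one has more fields."""

    name: str
    linux_sonames: Tuple[str, ...] = ()
    windows_dlls: Tuple[str, ...] = ()
    dependencies: Tuple[str, ...] = ()


@dataclass(frozen=True)
class SearchContext:
    desc: LibDescriptor
    # Hypothetical: maps a search location to the file names "found" there.
    fake_fs: Dict[str, Tuple[str, ...]]


@dataclass(frozen=True)
class FindResult:
    abs_path: str
    found_via: str


# Every step takes a context and returns a result or None.
FindStep = Callable[[SearchContext], Optional[FindResult]]


def _find_in(location: str) -> FindStep:
    """Build a step that looks for the descriptor's sonames in one location."""

    def step(ctx: SearchContext) -> Optional[FindResult]:
        for soname in ctx.desc.linux_sonames:
            if soname in ctx.fake_fs.get(location, ()):
                return FindResult(f"/{location}/{soname}", location)
        return None

    return step


# The cascade order mirrors the one named in the message.
SEARCH_STEPS: Tuple[FindStep, ...] = (
    _find_in("site-packages"),
    _find_in("conda"),
    _find_in("CUDA_HOME"),
)


def run_find_steps(
    ctx: SearchContext, steps: Tuple[FindStep, ...] = SEARCH_STEPS
) -> Optional[FindResult]:
    """Return the first hit in cascade order, or None if every step misses."""
    for step in steps:
        result = step(ctx)
        if result is not None:
            return result
    return None


desc = LibDescriptor(name="fakelib", linux_sonames=("libfakelib.so.1",))
hit = run_find_steps(SearchContext(desc, {"conda": ("libfakelib.so.1",)}))
miss = run_find_steps(SearchContext(desc, {}))
```

Because each step is an ordinary callable, a test can run a single step or a reordered tuple without touching the orchestration loop.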
1 parent bcba37c commit 6121573

23 files changed

Lines changed: 2147 additions & 1084 deletions

AGENTS.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+# cuda_pathfinder agent instructions
+
+You are working on `cuda_pathfinder`, a Python sub-package of the
+[cuda-python](https://github.com/NVIDIA/cuda-python) monorepo. It finds and
+loads NVIDIA dynamic libraries (CTK, third-party, and driver) across Linux and
+Windows.
+
+## Workspace
+
+The workspace root is `cuda_pathfinder/` inside the monorepo. Use the
+`working_directory` parameter on the Shell tool when you need the monorepo root
+(one level up).
+
+## Conventions
+
+- **Python**: all source is pure Python (no Cython in this sub-package).
+- **Testing**: `pytest` with `pytest-mock` (`mocker` fixture). Use
+  `spawned_process_runner` for real-loading tests that need process isolation
+  (dynamic linker state leaks across tests otherwise). Use the
+  `info_summary_append` fixture to emit `INFO` lines visible in CI/QA logs.
+- **STRICTNESS env var**: `CUDA_PATHFINDER_TEST_LOAD_NVIDIA_DYNAMIC_LIB_STRICTNESS`
+  controls whether missing libs are tolerated (`see_what_works`, default) or
+  fatal (`all_must_work`).
+- **Formatting/linting**: rely on pre-commit (runs automatically on commit). Do
+  not run formatters manually.
+- **Imports**: use `from cuda.pathfinder._dynamic_libs...` for internal imports
+  in tests; public API is `from cuda.pathfinder import load_nvidia_dynamic_lib`.
+
+## Testing guidelines
+
+- **Real tests over mocks**: mocks are fine for hard-to-reach branches (e.g.
+  32-bit Python), but every loading path must also have a real-loading test that
+  runs in a spawned child process. Track results with `INFO` lines so CI logs
+  show what actually loaded.
+- **No real lib names in negative tests**: when parametrizing unsupported /
+  invalid libnames, use obviously fake names (`"bogus"`, `"not_a_real_lib"`)
+  to avoid confusion when searching the codebase.
+- **`functools.cache` awareness**: `load_nvidia_dynamic_lib` is cached. Tests
+  that patch internals it depends on must call
+  `load_nvidia_dynamic_lib.cache_clear()` first, or use a child process for
+  isolation.
+
+## Key modules
+
+- `cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py` -- main entry
+  point and dispatch logic (CTK vs driver).
+- `cuda/pathfinder/_dynamic_libs/supported_nvidia_libs.py` -- canonical
+  registry of sonames, DLLs, site-packages paths, and dependencies.
+- `cuda/pathfinder/_dynamic_libs/find_nvidia_dynamic_lib.py` -- CTK search
+  cascade (site-packages, conda, CUDA_HOME).
+- `tests/child_load_nvidia_dynamic_lib_helper.py` -- lightweight helper
+  imported by spawned child processes (avoids re-importing the full test
+  module).
+
+### Fix all code review findings from lib-descriptor-refactor review
+
+**Request:** Fix all 8 findings from the external code review.
+
+**Actions (in worktree `cuda_pathfinder_refactor`):**
+1. `search_steps.py`: Restored `os.path.normpath(dirname)` in
+   `_find_lib_dir_using_anchor` (regression from pre-refactor fix). Added
+   `NoReturn` annotation to `raise_not_found`.
+2. `search_platform.py`: Guarded `os.listdir(lib_dir)` in
+   `WindowsSearchPlatform.find_in_lib_dir` with `os.path.isdir` check to
+   prevent crash on missing directory.
+3. `test_descriptor_catalog.py`: Rewrote tautological tests as structural
+   invariant tests (uniqueness, valid names, strategy values, dep graph,
+   soname/dll format, driver lib constraints). 237 new parametrized cases.
+4. `platform_loader.py`: Eliminated `WindowsLoader`/`LinuxLoader` boilerplate
+   classes -- assign the platform module directly as `LOADER`. Removed stale
+   `type: ignore`.
+5. `descriptor_catalog.py`: Trimmed default-valued fields from all entries,
+   added `# ---` section comments (CTK / third-party / driver).
+6. `load_nvidia_dynamic_lib.py`: Fixed import layout -- `TYPE_CHECKING` block
+   now properly separated after unconditional imports.
+
+All 742 tests pass, all pre-commit hooks green.
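The `functools.cache` guideline in the AGENTS.md diff above is easy to miss in practice, so here is a minimal sketch of the failure it guards against. This is not pathfinder code: `load_lib` and `_SEARCH_RESULT` are hypothetical stand-ins for `load_nvidia_dynamic_lib` and the internals a test might patch. Only `functools.cache` and its `cache_clear()` method are real standard-library API (Python 3.9+).

```python
import functools

# Hypothetical stand-in for internal state that a test might patch.
_SEARCH_RESULT = {"path": "/first/libbogus.so"}


@functools.cache
def load_lib(libname: str) -> str:
    """Hypothetical cached loader: the result is memoized per libname."""
    return _SEARCH_RESULT["path"]


first = load_lib("bogus")

# Simulate a test patching an internal the loader depends on.
_SEARCH_RESULT["path"] = "/second/libbogus.so"

# Without cache_clear(), the cached value from the first call is returned
# and the patch is silently ignored.
stale = load_lib("bogus")

# Clearing the cache forces the loader to run again and see the patch.
load_lib.cache_clear()
fresh = load_lib("bogus")
```

In a pytest suite the same effect could be achieved with a fixture that calls `cache_clear()` before each test; a spawned child process avoids the problem entirely because it starts with an empty cache.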

cuda_pathfinder/cuda/pathfinder/_dynamic_libs/canary_probe_subprocess.py

Lines changed: 6 additions & 7 deletions
@@ -4,18 +4,17 @@
 
 import json
 
+from cuda.pathfinder._dynamic_libs.lib_descriptor import LIB_DESCRIPTORS
 from cuda.pathfinder._dynamic_libs.load_dl_common import DynamicLibNotFoundError, LoadedDL
-from cuda.pathfinder._utils.platform_aware import IS_WINDOWS
-
-if IS_WINDOWS:
-    from cuda.pathfinder._dynamic_libs.load_dl_windows import load_with_system_search
-else:
-    from cuda.pathfinder._dynamic_libs.load_dl_linux import load_with_system_search
+from cuda.pathfinder._dynamic_libs.platform_loader import LOADER
 
 
 def _probe_canary_abs_path(libname: str) -> str | None:
+    desc = LIB_DESCRIPTORS.get(libname)
+    if desc is None:
+        raise ValueError(f"Unsupported canary library name: {libname!r}")
     try:
-        loaded: LoadedDL | None = load_with_system_search(libname)
+        loaded: LoadedDL | None = LOADER.load_with_system_search(desc)
     except DynamicLibNotFoundError:
         return None
     if loaded is None: