Skip to content

[FEA]: Add support for pathfinder.find_bitcode_lib("nccl_device")#1975

Merged
rwgk merged 4 commits intoNVIDIA:mainfrom
xiakun-lu:nccl-bitcode
Apr 25, 2026
Merged

[FEA]: Add support for pathfinder.find_bitcode_lib("nccl_device")#1975
rwgk merged 4 commits intoNVIDIA:mainfrom
xiakun-lu:nccl-bitcode

Conversation

@xiakun-lu
Copy link
Copy Markdown

@xiakun-lu xiakun-lu commented Apr 24, 2026

Closes #1831

Description

Adds support for cuda.pathfinder.find_bitcode_lib("nccl")

The new bitcode descriptor resolves NCCL device bitcode as libnccl_device.bc under nvidia/nccl/lib, matching the NCCL wheel package layout. The API name is nccl, consistent with the existing pathfinder naming used for NCCL dynamic library discovery.

The bitcode search-order test now covers all entries in SUPPORTED_BITCODE_LIBS, including nccl, while accounting for descriptors that define multiple site-packages candidate directories.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: Xiakun Lu <xiakunl@nvidia.com>
@github-actions github-actions Bot added the cuda.pathfinder Everything related to the cuda.pathfinder module label Apr 24, 2026
@github-actions

This comment has been minimized.

@leofang leofang requested a review from rwgk April 24, 2026 23:59
@rparolin rparolin self-requested a review April 25, 2026 00:37
@rparolin
Copy link
Copy Markdown
Collaborator

lgtm, but I defer to @rwgk for the final approval.

@leofang
Copy link
Copy Markdown
Member

leofang commented Apr 25, 2026

Q: is this for #1831?

Copy link
Copy Markdown
Contributor

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! My main concern is nccl vs nccl_device, it'd be inconsistent as-is.

The other suggestion is minor, but also very easy.

As a sanity check I looked at the CI test logs. It looks good, e.g.:

INFO test_locate_bitcode_lib[nccl]: lib_path='/opt/hostedtoolcache/Python/3.10.20/x64/lib/python3.10/site-packages/nvidia/nccl/lib/libnccl_device.bc'

"site_packages_dirs": ("nvidia/nvshmem/lib",),
"available_on_windows": False,
},
"nccl": {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please make this nccl_device to match the existing nvshmem_device convention, and move the new entry before nvshmem_device to keep this list in lexicographic order?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see nccl, could you please make it nccl_device for internal consistency?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In list DESCRIPTOR_CATALOG of dynamic libs, the name of nvshmem is "nvshmem_host" and the name of nccl is "nccl", which reflect the .so filenames libnvshmem_host.so.3 and libnccl.so.2. nccl as a key in _SUPPORTED_BITCODE_LIBS_INFO doesn't match nvshmem, but it's consistent with nccl's name in dynamic lib list DESCRIPTOR_CATALOG. I'd like to keep the key simple for nccl, so I don't have to think about it should end with _device or _host when I call these pathfinder APIs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I don't have to think

It's gonna be surprising/confusing for anyone else, especially because the filename is libnccl_device.bc. Unless there is more than a personal preference (?), nccl_device is clearly the more consistent name here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matching filename libnccl_device.bc is a good point.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done



def _make_bitcode_lib_file(dir_path: Path) -> str:
def _make_bitcode_lib_file(dir_path: Path, libname: str = "device") -> str:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's safer and more readable to avoid the defaults here and below. I tried out the change below. Could you please apply?

@@ -36,14 +36,14 @@ def clear_find_bitcode_lib_cache():
     get_cuda_path_or_home.cache_clear()


-def _make_bitcode_lib_file(dir_path: Path, libname: str = "device") -> str:
+def _make_bitcode_lib_file(dir_path: Path, libname: str) -> str:
     dir_path.mkdir(parents=True, exist_ok=True)
     file_path = dir_path / _bitcode_lib_filename(libname)
     file_path.touch()
     return str(file_path)


-def _bitcode_lib_dir_under(anchor_dir: Path, libname: str = "device") -> Path:
+def _bitcode_lib_dir_under(anchor_dir: Path, libname: str) -> Path:
     return anchor_dir / _bitcode_lib_info(libname)["rel_path"]


@@ -138,7 +138,7 @@ def test_locate_bitcode_lib_search_order(monkeypatch, tmp_path, libname):
 @pytest.mark.usefixtures("clear_find_bitcode_lib_cache")
 def test_find_bitcode_lib_not_found_error_includes_cuda_home_directory_listing(monkeypatch, tmp_path):
     cuda_home = tmp_path / "cuda-home"
-    lib_dir = _bitcode_lib_dir_under(cuda_home)
+    lib_dir = _bitcode_lib_dir_under(cuda_home, "device")
     lib_dir.mkdir(parents=True, exist_ok=True)
     extra_file = lib_dir / "README.txt"
     extra_file.write_text("placeholder", encoding="utf-8")

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied.

@rwgk rwgk added the P0 High priority - Must do! label Apr 25, 2026
@rwgk rwgk added this to the cuda.pathfinder next milestone Apr 25, 2026
@rwgk rwgk added the feature New feature or request label Apr 25, 2026
…ode_lib_dir_under

Signed-off-by: Xiakun Lu <xiakunl@nvidia.com>
Signed-off-by: Xiakun Lu <xiakunl@nvidia.com>
Signed-off-by: Xiakun Lu <xiakunl@nvidia.com>
Copy link
Copy Markdown
Contributor

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Will it be helpful if this is released soon? (It's easy, <30 minutes effort.)

@rwgk rwgk enabled auto-merge (squash) April 25, 2026 05:56
@xiakun-lu xiakun-lu changed the title [FEA]: Add support for pathfinder.find_bitcode_lib("nccl") [FEA]: Add support for pathfinder.find_bitcode_lib("nccl_device") Apr 25, 2026
@xiakun-lu
Copy link
Copy Markdown
Author

Thanks!

Will it be helpful if this is released soon? (It's easy, <30 minutes effort.)

definitely helpful! would appreciate it.

@rwgk rwgk merged commit 8a83a4f into NVIDIA:main Apr 25, 2026
181 of 183 checks passed
@github-actions

This comment has been minimized.

1 similar comment
@github-actions
Copy link
Copy Markdown

Doc Preview CI
Preview removed because the pull request was closed or merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.pathfinder Everything related to the cuda.pathfinder module feature New feature or request P0 High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA]: Add support for pathfinder.find_bitcode_lib("nccl_device")

4 participants