cuda.core.system: Add basic Nvlink and Utilization support #1918
mdboom wants to merge 3 commits into NVIDIA:main
rwgk left a comment
Generated with the help of Cursor GPT-5.4 Extra High Fast
Manually verified.
Medium: Invalid NVLink indices are accepted and fail late
Device.nvlink() currently accepts negative or out-of-range link indices and
returns NvlinkInfo without validating them first. That differs from existing
indexed accessors such as Device.fan(), which validate eagerly. In practice,
device.nvlink(-1) constructs successfully and only fails later when a
property such as .version is accessed, which turns a basic argument error
into a delayed runtime failure.
Relevant paths:
cuda_core/cuda/core/system/_device.pyx:585
cuda_core/cuda/core/system/_device.pyx:683
cuda_core/cuda/core/system/_nvlink.pxi
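A minimal sketch of the eager check being asked for, in plain Python rather than the PR's Cython. `DeviceSketch`, `NvlinkInfoSketch`, the `link_count` argument, and the choice of `ValueError` are all hypothetical stand-ins for illustration; the real fix should raise whatever the existing indexed accessors such as Device.fan() raise.

```python
class NvlinkInfoSketch:
    """Hypothetical stand-in for NvlinkInfo; not the real cuda.core.system class."""

    def __init__(self, link: int):
        self.link = link


class DeviceSketch:
    """Hypothetical stand-in for Device with a fixed number of NVLink links."""

    def __init__(self, link_count: int):
        # Assumption: the link count is known up front. The real code would
        # have to obtain the valid range from the driver.
        self._link_count = link_count

    def nvlink(self, link: int) -> NvlinkInfoSketch:
        # Validate eagerly so a bad index fails at the call site,
        # not later when a property such as .version is read.
        if not 0 <= link < self._link_count:
            raise ValueError(
                f"NVLink index {link} is out of range [0, {self._link_count})"
            )
        return NvlinkInfoSketch(link)
```

With this shape, `DeviceSketch(4).nvlink(-1)` raises immediately instead of handing back an object that fails on first property access.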
Low: NvlinkInfo.version documents a non-existent return type
The public enum exported by cuda.core.system is NvlinkVersion, and the API
index plus tests use that spelling, but NvlinkInfo.version is annotated and
documented as NvLinkVersion. That leaks a wrong type name into the generated
help/doc output and points users at a symbol that does not exist.
Relevant paths:
cuda_core/cuda/core/system/_nvlink.pxi:21
cuda_core/docs/source/api.rst:225
cuda_core/tests/system/test_system_device.py:747
Low: NvlinkInfo.state has no direct test coverage
The new test_nvlink() checks construction of NvlinkInfo and accesses
.version, but it never reads .state. As a result, the wrapper path behind
NvlinkInfo.state has no direct coverage even on systems where the test does
not skip.
Relevant paths:
cuda_core/cuda/core/system/_nvlink.pxi:35
cuda_core/tests/system/test_system_device.py:734
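One way to close the gap, sketched under assumptions: have the test read every public property so each wrapper path runs at least once. The helper name `read_nvlink_properties` and the `SimpleNamespace` stand-in are hypothetical; an actual test would get the object from device.nvlink(...) and check the returned values against the real enum types.

```python
from types import SimpleNamespace


def read_nvlink_properties(info):
    """Read both properties named in this review so neither path is skipped."""
    return {"version": info.version, "state": info.state}


# Hypothetical stand-in object; placeholder values only, not real NVML output.
fake_info = SimpleNamespace(version="placeholder-version", state="placeholder-state")

values = read_nvlink_properties(fake_info)
assert set(values) == {"version", "state"}
```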
Thanks for having your agent fight with my agent, @rwgk. ;)
These APIs are needed by rapidsai/jupyterlab-nvdashboard and rapidsai/rapids-cli.