Skip to content

Commit f440fa9

Browse files
authored
Merge branch 'NVIDIA:main' into main
2 parents 8bdbff0 + d393729 commit f440fa9

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

60 files changed

+2858
-1456
lines changed

.pre-commit-config.yaml

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ ci:
1515
# pre-commit autoupdate --freeze
1616
repos:
1717
- repo: https://github.com/astral-sh/ruff-pre-commit
18-
rev: 5ba58aca0bd5bc7c0e1c0fc45af2e88d6a2bde83 # frozen: v0.14.10
18+
rev: c60c980e561ed3e73101667fe8365c609d19a438 # frozen: v0.15.9
1919
hooks:
2020
- id: ruff-check
2121
args: [--fix, --show-fixes]
@@ -76,22 +76,22 @@ repos:
7676
- id: rst-inline-touching-normal
7777

7878
- repo: https://github.com/pre-commit/mirrors-mypy
79-
rev: a66e98df7b4aeeb3724184b332785976d062b92e # frozen: v1.19.1
79+
rev: 8e5c80792e2ec0c87804d8ef915bf35e2caea6da # frozen: v1.20.0
8080
hooks:
8181
- id: mypy
8282
name: mypy-pathfinder
8383
files: ^cuda_pathfinder/cuda/.*\.py$ # Exclude tests directory
8484
args: [--config-file=cuda_pathfinder/pyproject.toml]
8585

8686
- repo: https://github.com/rhysd/actionlint
87-
rev: "0933c147c9d6587653d45fdcb4c497c57a65f9af" # frozen: v1.7.10
87+
rev: "914e7df21a07ef503a81201c76d2b11c789d3fca" # frozen: v1.7.12
8888
hooks:
8989
- id: actionlint
9090
args: ["-shellcheck="]
9191
exclude: ^\.github/workflows/coverage.yml$
9292

9393
- repo: https://github.com/MarcoGorelli/cython-lint
94-
rev: "d9ff7ce99ef4f2ae8fba93079ca9d76c4651d4ac" # frozen: v0.18.0
94+
rev: "7c6152f6c8f9087684ff2e09a9227941e233bafb" # frozen: v0.19.0
9595
hooks:
9696
- id: cython-lint
9797
args: [--no-pycodestyle]

context7.json

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,4 @@
11
{
2-
"description": "Python access to NVIDIA's CUDA platform, including cuda.bindings (low-level CUDA C API bindings), cuda.core (Pythonic CUDA runtime), and cuda.pathfinder (component discovery)",
32
"url": "https://context7.com/nvidia/cuda-python",
43
"public_key": "pk_gupaHhsdvsuT1j3BZpb7i"
54
}

cuda_core/cuda/core/__init__.py

Lines changed: 10 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
1+
# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
22
#
33
# SPDX-License-Identifier: Apache-2.0
44

@@ -31,12 +31,6 @@
3131
from cuda.core import system, utils
3232
from cuda.core._device import Device
3333
from cuda.core._event import Event, EventOptions
34-
from cuda.core._graph import (
35-
Graph,
36-
GraphBuilder,
37-
GraphCompleteOptions,
38-
GraphDebugPrintOptions,
39-
)
4034
from cuda.core._graphics import GraphicsResource
4135
from cuda.core._launch_config import LaunchConfig
4236
from cuda.core._launcher import launch
@@ -69,3 +63,12 @@
6963
StreamOptions,
7064
)
7165
from cuda.core._tensor_map import TensorMapDescriptor, TensorMapDescriptorOptions
66+
from cuda.core.graph import (
67+
Condition,
68+
Graph,
69+
GraphAllocOptions,
70+
GraphBuilder,
71+
GraphCompleteOptions,
72+
GraphDebugPrintOptions,
73+
GraphDef,
74+
)
Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
1+
# Handle and Object Registries
2+
3+
When Python-managed objects round-trip through the CUDA driver (e.g.,
4+
querying a graph's nodes and getting back raw `CUgraphNode` pointers),
5+
we need to recover the original Python object rather than creating a
6+
duplicate.
7+
8+
This document describes the approach used to achieve this. The pattern
9+
is driven mainly by needs arising in the context of CUDA graphs, but
10+
it is general and can be extended to other object types as needs arise.
11+
12+
This solves the same problem as pybind11's `registered_instances` map
13+
and is sometimes called the Identity Map pattern. Two registries work
14+
together to map a raw driver handle all the way back to the original
15+
Python object. Both use weak references so they
16+
do not prevent cleanup. Entries are removed either explicitly (via
17+
`destroy()` or a Box destructor) or implicitly when the weak reference
18+
expires.
19+
20+
## Level 1: Driver Handle -> Resource Handle (C++)
21+
22+
`HandleRegistry` in `resource_handles.cpp` maps a raw CUDA handle
23+
(e.g., `CUevent`, `CUkernel`, `CUgraphNode`) to the `weak_ptr` that
24+
owns it. When a `_ref` constructor receives a raw handle, it
25+
checks the registry first. If found, it returns the existing
26+
`shared_ptr`, preserving the Box and its metadata (e.g., `EventBox`
27+
carries timing/IPC flags, `KernelBox` carries the library dependency).
28+
29+
Without this level, a round-tripped handle would produce a new Box
30+
with default metadata, losing information that was set at creation.
31+
32+
Instances: `event_registry`, `kernel_registry`, `graph_node_registry`.
33+
34+
## Level 2: Resource Handle -> Python Object (Cython)
35+
36+
`_node_registry` in `_graph_node.pyx` is a `WeakValueDictionary`
37+
mapping a resource address (`shared_ptr::get()`) to a Python
38+
`GraphNode` object. When `GraphNode._create` receives a handle from
39+
Level 1, it checks this registry. If found, it returns the existing
40+
Python object.
41+
42+
Without this level, each driver round-trip would produce a distinct
43+
Python object for the same logical node, resulting in surprising
44+
behavior:
45+
46+
```python
47+
a = g.empty()
48+
a.succ = {b}
49+
b2, = a.succ # queries driver, gets back CUgraphNode for b
50+
assert b2 is b # fails without Level 2 registry
51+
```

cuda_core/cuda/core/_cpp/resource_handles.cpp

Lines changed: 28 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -174,13 +174,8 @@ class HandleRegistry {
174174
}
175175

176176
void unregister_handle(const Key& key) noexcept {
177-
try {
178-
std::lock_guard<std::mutex> lock(mutex_);
179-
auto it = map_.find(key);
180-
if (it != map_.end() && it->second.expired()) {
181-
map_.erase(it);
182-
}
183-
} catch (...) {}
177+
std::lock_guard<std::mutex> lock(mutex_);
178+
map_.erase(key);
184179
}
185180

186181
Handle lookup(const Key& key) {
@@ -393,6 +388,7 @@ ContextHandle get_event_context(const EventHandle& h) noexcept {
393388
return h ? get_box(h)->h_context : ContextHandle{};
394389
}
395390

391+
// See REGISTRY_DESIGN.md (Level 1: Driver Handle -> Resource Handle)
396392
static HandleRegistry<CUevent, EventHandle> event_registry;
397393

398394
EventHandle create_event_handle(const ContextHandle& h_ctx, unsigned int flags,
@@ -899,6 +895,7 @@ static const KernelBox* get_box(const KernelHandle& h) {
899895
);
900896
}
901897

898+
// See REGISTRY_DESIGN.md (Level 1: Driver Handle -> Resource Handle)
902899
static HandleRegistry<CUkernel, KernelHandle> kernel_registry;
903900

904901
KernelHandle create_kernel_handle(const LibraryHandle& h_library, const char* name) {
@@ -957,7 +954,7 @@ GraphHandle create_graph_handle_ref(CUgraph graph, const GraphHandle& h_parent)
957954

958955
namespace {
959956
struct GraphNodeBox {
960-
CUgraphNode resource;
957+
mutable CUgraphNode resource;
961958
GraphHandle h_graph;
962959
};
963960
} // namespace
@@ -969,15 +966,37 @@ static const GraphNodeBox* get_box(const GraphNodeHandle& h) {
969966
);
970967
}
971968

969+
// See REGISTRY_DESIGN.md (Level 1: Driver Handle -> Resource Handle)
970+
static HandleRegistry<CUgraphNode, GraphNodeHandle> graph_node_registry;
971+
972972
GraphNodeHandle create_graph_node_handle(CUgraphNode node, const GraphHandle& h_graph) {
973+
if (node) {
974+
if (auto h = graph_node_registry.lookup(node)) {
975+
return h;
976+
}
977+
}
973978
auto box = std::make_shared<const GraphNodeBox>(GraphNodeBox{node, h_graph});
974-
return GraphNodeHandle(box, &box->resource);
979+
GraphNodeHandle h(box, &box->resource);
980+
if (node) {
981+
graph_node_registry.register_handle(node, h);
982+
}
983+
return h;
975984
}
976985

977986
GraphHandle graph_node_get_graph(const GraphNodeHandle& h) noexcept {
978987
return h ? get_box(h)->h_graph : GraphHandle{};
979988
}
980989

990+
void invalidate_graph_node(const GraphNodeHandle& h) noexcept {
991+
if (h) {
992+
CUgraphNode node = get_box(h)->resource;
993+
if (node) {
994+
graph_node_registry.unregister_handle(node);
995+
}
996+
get_box(h)->resource = nullptr;
997+
}
998+
}
999+
9811000
// ============================================================================
9821001
// Graphics Resource Handles
9831002
// ============================================================================

cuda_core/cuda/core/_cpp/resource_handles.hpp

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -415,6 +415,9 @@ GraphNodeHandle create_graph_node_handle(CUgraphNode node, const GraphHandle& h_
415415
// Extract the owning graph handle from a node handle.
416416
GraphHandle graph_node_get_graph(const GraphNodeHandle& h) noexcept;
417417

418+
// Zero the CUgraphNode resource inside the handle, marking it invalid.
419+
void invalidate_graph_node(const GraphNodeHandle& h) noexcept;
420+
418421
// ============================================================================
419422
// Graphics resource handle functions
420423
// ============================================================================

cuda_core/cuda/core/_device.pyx

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,10 @@
1-
# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
1+
# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
22
#
33
# SPDX-License-Identifier: Apache-2.0
44

55
from __future__ import annotations
66

77
cimport cpython
8-
from libc.stdint cimport uintptr_t
98

109
from cuda.bindings cimport cydriver
1110
from cuda.core._utils.cuda_utils cimport HANDLE_RETURN
@@ -24,7 +23,6 @@ from cuda.core._resource_handles cimport (
2423
as_cu,
2524
)
2625

27-
from cuda.core._graph import GraphBuilder
2826
from cuda.core._stream import IsStreamT, Stream, StreamOptions
2927
from cuda.core._utils.clear_error_support import assert_type
3028
from cuda.core._utils.cuda_utils import (
@@ -1363,15 +1361,17 @@ class Device:
13631361
self._check_context_initialized()
13641362
handle_return(runtime.cudaDeviceSynchronize())
13651363

1366-
def create_graph_builder(self) -> GraphBuilder:
1367-
"""Create a new :obj:`~_graph.GraphBuilder` object.
1364+
def create_graph_builder(self) -> "GraphBuilder":
1365+
"""Create a new :obj:`~graph.GraphBuilder` object.
13681366

13691367
Returns
13701368
-------
1371-
:obj:`~_graph.GraphBuilder`
1369+
:obj:`~graph.GraphBuilder`
13721370
Newly created graph builder object.
13731371

13741372
"""
1373+
from cuda.core.graph._graph_builder import GraphBuilder
1374+
13751375
self._check_context_initialized()
13761376
return GraphBuilder._init(stream=self.create_stream(), is_stream_owner=True)
13771377

cuda_core/cuda/core/_graph/__init__.py

Lines changed: 0 additions & 19 deletions
This file was deleted.

cuda_core/cuda/core/_graph/_graph_def/__init__.py

Lines changed: 0 additions & 51 deletions
This file was deleted.

cuda_core/cuda/core/_launcher.pyx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,7 @@ def launch(stream: Stream | GraphBuilder | IsStreamT, config: LaunchConfig, kern
2626
2727
Parameters
2828
----------
29-
stream : :obj:`~_stream.Stream` | :obj:`~_graph.GraphBuilder`
29+
stream : :obj:`~_stream.Stream` | :obj:`~graph.GraphBuilder`
3030
The stream establishing the stream ordering semantic of a
3131
launch.
3232
config : :obj:`LaunchConfig`

0 commit comments

Comments
 (0)