Commit c2f79a1

Graph node follow-ups: repr, containment, empty(), registry docs (#1859)
* Reorganize graph test files for clarity

  Rename test files to reflect what they actually test:
  - test_basic -> test_graph_builder (stream capture tests)
  - test_conditional -> test_graph_builder_conditional
  - test_advanced -> test_graph_update (moved child_graph and stream_lifetime tests into test_graph_builder)
  - test_capture_alloc -> test_graph_memory_resource
  - test_explicit* -> test_graphdef*

  Made-with: Cursor

* Enhance Graph.update() and add whole-graph update tests

  - Extend Graph.update() to accept both GraphBuilder and GraphDef sources
  - Surface CUgraphExecUpdateResultInfo details on update failure instead of a generic CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE message
  - Release the GIL during cuGraphExecUpdate via nogil block
  - Add parametrized happy-path test covering both GraphBuilder and GraphDef
  - Add error-case tests: unfinished builder, topology mismatch, wrong type

  Made-with: Cursor

* Add AdjacencySet proxy for pred/succ and GraphNode.remove()

  Replace cached tuple-based pred/succ with mutable AdjacencySet backed by direct CUDA driver calls. Add GraphNode.remove() wrapping cuGraphDestroyNode.

  Made-with: Cursor

* Add edge mutation support and MutableSet interface for GraphNode adjacencies

  Enable adding/removing edges between graph nodes via AdjacencySet (a MutableSet proxy on GraphNode.pred/succ), node removal via discard(), and property setters for bulk edge replacement. Includes comprehensive mutation and interface tests.

  Closes part of #1330 (step 2: edge mutation on GraphDef).

  Made-with: Cursor

* Use requires_module mark for numpy version checks in mutation tests

  Replace inline skipif version check with requires_module(np, "2.1") from the shared test helpers, consistent with other test files.

  Made-with: Cursor

* Fix empty-graph return type: return set() instead of () for nodes/edges

  Made-with: Cursor

* Rename AdjacencySet to AdjacencySetProxy, add bulk ops and safety guards

  Rename class and file to AdjacencySetProxy to clarify write-through semantics. Add bulk-efficient clear(), __isub__(), __ior__() overrides and remove_edges() on the Cython core. Guard GraphNode.discard() against double-destroy via membership check. Filter duplicates in update(). Add error-path tests for wrong types, cross-graph edges, and self-edges.

  Made-with: Cursor

* Add destroy() method with handle invalidation, remove GRAPH_NODE_SENTINEL

  Replace discard() with destroy() which calls cuGraphDestroyNode and then zeroes the CUgraphNode resource in the handle box via invalidate_graph_node_handle. This prevents stale memory access on destroyed nodes. Properties (type, pred, succ, handle) degrade gracefully to None/empty for destroyed nodes.

  Remove the GRAPH_NODE_SENTINEL (0x1) approach in favor of using NULL for both sentinels and destroyed nodes, which is simpler and avoids the risk of passing 0x1 to driver APIs that treat it as a valid pointer.

  Made-with: Cursor

* Add GraphNode identity cache for stable Python object round-trips

  Nodes retrieved via GraphDef.nodes(), edges(), or pred/succ traversal now return the same Python object that was originally created, enabling identity checks with `is`. A C++ HandleRegistry deduplicates CUgraphNode handles, and a Cython WeakValueDictionary caches the Python wrapper objects.

  Made-with: Cursor

* Purge node cache on destroy to prevent stale identity lookups

  Made-with: Cursor

* Skip NULL nodes in graph_node_registry to fix sentinel identity collision

  Sentinel (entry) nodes use NULL as their CUgraphNode, so caching them under a NULL key caused all sentinels across different graphs to share the same handle. This made nodes built from the wrong graph's entry point, causing CUDA_ERROR_INVALID_VALUE for conditional nodes and hash collisions in equality tests.

  Made-with: Cursor

* Unregister destroyed nodes from C++ graph_node_registry

  When a node is destroyed, the driver may reuse its CUgraphNode pointer for a new node. Without unregistering the old entry, the registry returns a stale handle pointing to the wrong node type and graph.

  Made-with: Cursor

* Add dedicated test for node identity preservation through round-trips

  Made-with: Cursor

* Add handle= to all GraphNode subclass __repr__ for debugging

  Every subclass repr now starts with handle=0x... (the CUgraphNode pointer) followed by type-specific identity/parameter data. Dynamic queries (pred counts, subnode counts) are removed in favor of deterministic, cheap fields. This makes set comparison failures in test output readable when debugging graph mutation tests.

  Made-with: Cursor

* Rename _node_cache/_cached to _node_registry/_registered

  Aligns Python-side terminology with the C++ graph_node_registry.

  Made-with: Cursor

* Fix unregister_handle and rename invalidate_graph_node_handle

  unregister_handle: remove the expired() guard that prevented erasure when the shared_ptr was still alive. This caused stale registry entries after destroy(), leading to CUDA_ERROR_INVALID_VALUE when the driver reused CUgraphNode pointer values.

  Rename invalidate_graph_node_handle -> invalidate_graph_node for consistency with the rest of the graph node API.

  Made-with: Cursor

* Add cheap containment test and early type check for AdjacencySetProxy

  Add _AdjacencySetCore.contains() that checks membership by comparing raw CUgraphNode handles at the C level, avoiding Python object construction. Uses a 16-element stack buffer for a single driver call in the common case.

  Move the type check in update() inline next to the extend loop so invalid input is rejected immediately.

  Made-with: Cursor

* Add GraphDef.empty(), stack-buffer query optimization, and registry test

  - Add GraphDef.empty() for creating entry-point empty nodes; replace all no-arg join() calls on GraphDef with empty() in tests.
  - Optimize _AdjacencySetCore.query() to use a 16-element stack buffer, matching the contains() optimization.
  - Add test_registry_cleanup exercising destroy(), graph deletion, and weak-reference cleanup of the node registry.

  Made-with: Cursor

* Document the two-level handle and object registry design

  Add REGISTRY_DESIGN.md explaining how the C++ HandleRegistry (Level 1) and Cython _node_registry (Level 2) work together to preserve Python object identity through driver round-trips. Add cross-references at each registry instantiation site.

  Made-with: Cursor

* Fix import formatting in test_registry_cleanup

  Made-with: Cursor

* Optimize GraphDef.nodes() and edges() to try a single driver call

  Pre-allocate vectors to 128 entries and pass them on the first call. Only fall back to a second call if the graph exceeds 128 nodes/edges.

  Made-with: Cursor
1 parent 5064470 commit c2f79a1

10 files changed: 235 additions, 84 deletions

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+# Handle and Object Registries
+
+When Python-managed objects round-trip through the CUDA driver (e.g.,
+querying a graph's nodes and getting back raw `CUgraphNode` pointers),
+we need to recover the original Python object rather than creating a
+duplicate.
+
+This document describes the approach used to achieve this. The pattern
+is driven mainly by needs arising in the context of CUDA graphs, but
+it is general and can be extended to other object types as needs arise.
+
+This solves the same problem as pybind11's `registered_instances` map
+and is sometimes called the Identity Map pattern. Two registries work
+together to map a raw driver handle all the way back to the original
+Python object. Both use weak references so they
+do not prevent cleanup. Entries are removed either explicitly (via
+`destroy()` or a Box destructor) or implicitly when the weak reference
+expires.
+
+## Level 1: Driver Handle -> Resource Handle (C++)
+
+`HandleRegistry` in `resource_handles.cpp` maps a raw CUDA handle
+(e.g., `CUevent`, `CUkernel`, `CUgraphNode`) to the `weak_ptr` that
+owns it. When a `_ref` constructor receives a raw handle, it
+checks the registry first. If found, it returns the existing
+`shared_ptr`, preserving the Box and its metadata (e.g., `EventBox`
+carries timing/IPC flags, `KernelBox` carries the library dependency).
+
+Without this level, a round-tripped handle would produce a new Box
+with default metadata, losing information that was set at creation.
+
+Instances: `event_registry`, `kernel_registry`, `graph_node_registry`.
+
+## Level 2: Resource Handle -> Python Object (Cython)
+
+`_node_registry` in `_graph_node.pyx` is a `WeakValueDictionary`
+mapping a resource address (`shared_ptr::get()`) to a Python
+`GraphNode` object. When `GraphNode._create` receives a handle from
+Level 1, it checks this registry. If found, it returns the existing
+Python object.
+
+Without this level, each driver round-trip would produce a distinct
+Python object for the same logical node, resulting in surprising
+behavior:
+
+```python
+a = g.empty()
+a.succ = {b}
+b2, = a.succ  # queries driver, gets back CUgraphNode for b
+assert b2 is b  # fails without Level 2 registry
+```
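
The two-level lookup described in the design note can be sketched in pure Python. This is only an illustration under stated assumptions: `Box`, `Node`, `box_for_handle`, `node_for_box`, and `node_from_raw` are hypothetical names, the Level 1 registry in this commit is C++ and holds `weak_ptr`s rather than a Python dictionary, and the integer `0x1000` stands in for a raw `CUgraphNode` pointer.

```python
import weakref


class Box:
    """Hypothetical stand-in for a C++ resource box that carries metadata."""

    def __init__(self, raw_handle, metadata=None):
        self.raw_handle = raw_handle
        self.metadata = metadata


class Node:
    """Hypothetical stand-in for a Python wrapper such as GraphNode."""

    def __init__(self, box):
        self.box = box  # strong reference keeps the box alive


# Level 1: raw driver handle -> box, held weakly.
_handle_registry = weakref.WeakValueDictionary()
# Level 2: box identity -> Python wrapper, held weakly.
_node_registry = weakref.WeakValueDictionary()


def box_for_handle(raw_handle, metadata=None):
    """Return the registered box for raw_handle, or create and register one."""
    box = _handle_registry.get(raw_handle)
    if box is None:
        box = Box(raw_handle, metadata)
        _handle_registry[raw_handle] = box
    return box


def node_for_box(box):
    """Return the registered wrapper for box, or create and register one."""
    # id(box) plays the role of the resource address in this sketch.
    node = _node_registry.get(id(box))
    if node is None:
        node = Node(box)
        _node_registry[id(box)] = node
    return node


def node_from_raw(raw_handle):
    """Simulate a driver round-trip: only the raw handle comes back."""
    return node_for_box(box_for_handle(raw_handle))


# Create a node once, then recover it from the raw handle alone.
b = node_for_box(box_for_handle(0x1000, metadata="set at creation"))
b2 = node_from_raw(0x1000)
```

With both registries in place, `b2 is b` holds and the metadata attached at creation survives the round-trip. Dropping the Level 1 lookup would yield a fresh `Box` with default metadata, and dropping the Level 2 lookup would yield a second `Node` for the same box.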

cuda_core/cuda/core/_cpp/resource_handles.cpp

Lines changed: 3 additions & 0 deletions
@@ -388,6 +388,7 @@ ContextHandle get_event_context(const EventHandle& h) noexcept {
     return h ? get_box(h)->h_context : ContextHandle{};
 }
 
+// See REGISTRY_DESIGN.md (Level 1: Driver Handle -> Resource Handle)
 static HandleRegistry<CUevent, EventHandle> event_registry;
 
 EventHandle create_event_handle(const ContextHandle& h_ctx, unsigned int flags,
@@ -894,6 +895,7 @@ static const KernelBox* get_box(const KernelHandle& h) {
     );
 }
 
+// See REGISTRY_DESIGN.md (Level 1: Driver Handle -> Resource Handle)
 static HandleRegistry<CUkernel, KernelHandle> kernel_registry;
 
 KernelHandle create_kernel_handle(const LibraryHandle& h_library, const char* name) {
@@ -964,6 +966,7 @@ static const GraphNodeBox* get_box(const GraphNodeHandle& h) {
     );
 }
 
+// See REGISTRY_DESIGN.md (Level 1: Driver Handle -> Resource Handle)
 static HandleRegistry<CUgraphNode, GraphNodeHandle> graph_node_registry;
 
 GraphNodeHandle create_graph_node_handle(CUgraphNode node, const GraphHandle& h_graph) {

cuda_core/cuda/core/_graph/_graph_def/_adjacency_set_proxy.pyx

Lines changed: 42 additions & 10 deletions
@@ -39,7 +39,7 @@ class AdjacencySetProxy(MutableSet):
     def __contains__(self, x):
         if not isinstance(x, GraphNode):
             return False
-        return x in (<_AdjacencySetCore>self._core).query()
+        return (<_AdjacencySetCore>self._core).contains(<GraphNode>x)
 
     def __iter__(self):
         return iter((<_AdjacencySetCore>self._core).query())
@@ -87,13 +87,13 @@ class AdjacencySetProxy(MutableSet):
             if isinstance(other, GraphNode):
                 nodes.append(other)
             else:
-                nodes.extend(other)
+                for n in other:
+                    if not isinstance(n, GraphNode):
+                        raise TypeError(
+                            f"expected GraphNode, got {type(n).__name__}")
+                    nodes.append(n)
         if not nodes:
             return
-        for n in nodes:
-            if not isinstance(n, GraphNode):
-                raise TypeError(
-                    f"expected GraphNode, got {type(n).__name__}")
         new = [n for n in nodes if n not in self]
         if new:
             (<_AdjacencySetCore>self._core).add_edges(new)
@@ -143,11 +143,14 @@ cdef class _AdjacencySetCore:
         cdef cydriver.CUgraphNode c_node = as_cu(self._h_node)
         if c_node == NULL:
             return []
-        cdef size_t count = 0
+        cdef cydriver.CUgraphNode buf[16]
+        cdef size_t count = 16
+        cdef size_t i
         with nogil:
-            HANDLE_RETURN(self._query_fn(c_node, NULL, &count))
-        if count == 0:
-            return []
+            HANDLE_RETURN(self._query_fn(c_node, buf, &count))
+        if count <= 16:
+            return [GraphNode._create(self._h_graph, buf[i])
+                    for i in range(count)]
         cdef vector[cydriver.CUgraphNode] nodes_vec
         nodes_vec.resize(count)
         with nogil:
@@ -156,6 +159,35 @@ cdef class _AdjacencySetCore:
         return [GraphNode._create(self._h_graph, nodes_vec[i])
                 for i in range(count)]
 
+    cdef bint contains(self, GraphNode other):
+        cdef cydriver.CUgraphNode c_node = as_cu(self._h_node)
+        cdef cydriver.CUgraphNode target = as_cu(other._h_node)
+        if c_node == NULL or target == NULL:
+            return False
+        cdef cydriver.CUgraphNode buf[16]
+        cdef size_t count = 16
+        cdef size_t i
+        with nogil:
+            HANDLE_RETURN(self._query_fn(c_node, buf, &count))
+
+        # Fast path for small sets.
+        if count <= 16:
+            for i in range(count):
+                if buf[i] == target:
+                    return True
+            return False
+
+        # Fallback for large sets.
+        cdef vector[cydriver.CUgraphNode] nodes_vec
+        nodes_vec.resize(count)
+        with nogil:
+            HANDLE_RETURN(self._query_fn(c_node, nodes_vec.data(), &count))
+        assert count == nodes_vec.size()
+        for i in range(count):
+            if nodes_vec[i] == target:
+                return True
+        return False
+
     cdef Py_ssize_t count(self):
         cdef cydriver.CUgraphNode c_node = as_cu(self._h_node)
         if c_node == NULL:
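
The two behavioral changes in this file, handle-level containment and type checking while the inputs are still being collected, can be sketched in plain Python. The classes below are hypothetical: `Node` wraps an integer handle, and a Python list of handles stands in for the edge state that the real proxy reads from the driver.

```python
from collections.abc import MutableSet


class Node:
    """Hypothetical node identified by an integer handle."""

    def __init__(self, handle):
        self.handle = handle


class AdjacencyProxy(MutableSet):
    """Write-through set view over a list of raw handles (illustrative only)."""

    def __init__(self):
        self._handles = []  # stands in for driver-owned edge state
        self._lookup = {}   # handle -> Node, used only for iteration

    def __contains__(self, x):
        if not isinstance(x, Node):
            return False
        # Compare raw handles; no wrapper objects are built for the check.
        return x.handle in self._handles

    def __iter__(self):
        return iter([self._lookup[h] for h in self._handles])

    def __len__(self):
        return len(self._handles)

    def add(self, node):
        self.update(node)

    def discard(self, node):
        if node in self:
            self._handles.remove(node.handle)

    def update(self, *others):
        nodes = []
        for other in others:
            if isinstance(other, Node):
                nodes.append(other)
            else:
                for n in other:
                    # Reject bad input while collecting, before any mutation.
                    if not isinstance(n, Node):
                        raise TypeError(
                            f"expected Node, got {type(n).__name__}")
                    nodes.append(n)
        for n in nodes:
            if n not in self:  # skip duplicates and existing members
                self._handles.append(n.handle)
                self._lookup[n.handle] = n


a, b = Node(1), Node(2)
succ = AdjacencyProxy()
succ.update([a, b], a)

try:
    succ.update([Node(3), "oops"])
    rejected = False
except TypeError:
    rejected = True
```

Because validation happens in the collection loop, the failed `update()` call leaves the set unchanged: `Node(3)` is never added even though it precedes the invalid element.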

cuda_core/cuda/core/_graph/_graph_def/_graph_def.pyx

Lines changed: 36 additions & 19 deletions
@@ -159,6 +159,16 @@ cdef class GraphDef:
         """
         return self._entry.launch(config, kernel, *args)
 
+    def empty(self) -> "EmptyNode":
+        """Add an entry-point empty node (no dependencies).
+
+        Returns
+        -------
+        EmptyNode
+            A new EmptyNode with no dependencies.
+        """
+        return self._entry.join()
+
     def join(self, *nodes) -> "EmptyNode":
         """Create an empty node that depends on all given nodes.
 
@@ -322,18 +332,20 @@ cdef class GraphDef:
         set of GraphNode
             All nodes in the graph.
         """
-        cdef size_t num_nodes = 0
+        cdef vector[cydriver.CUgraphNode] nodes_vec
+        nodes_vec.resize(128)
+        cdef size_t num_nodes = 128
 
         with nogil:
-            HANDLE_RETURN(cydriver.cuGraphGetNodes(as_cu(self._h_graph), NULL, &num_nodes))
+            HANDLE_RETURN(cydriver.cuGraphGetNodes(as_cu(self._h_graph), nodes_vec.data(), &num_nodes))
 
         if num_nodes == 0:
             return set()
 
-        cdef vector[cydriver.CUgraphNode] nodes_vec
-        nodes_vec.resize(num_nodes)
-        with nogil:
-            HANDLE_RETURN(cydriver.cuGraphGetNodes(as_cu(self._h_graph), nodes_vec.data(), &num_nodes))
+        if num_nodes > 128:
+            nodes_vec.resize(num_nodes)
+            with nogil:
+                HANDLE_RETURN(cydriver.cuGraphGetNodes(as_cu(self._h_graph), nodes_vec.data(), &num_nodes))
 
         return set(GraphNode._create(self._h_graph, nodes_vec[i]) for i in range(num_nodes))
 
@@ -346,21 +358,12 @@ cdef class GraphDef:
             Each element is a (from_node, to_node) pair representing
             a dependency edge in the graph.
         """
-        cdef size_t num_edges = 0
-
-        with nogil:
-            IF CUDA_CORE_BUILD_MAJOR >= 13:
-                HANDLE_RETURN(cydriver.cuGraphGetEdges(as_cu(self._h_graph), NULL, NULL, NULL, &num_edges))
-            ELSE:
-                HANDLE_RETURN(cydriver.cuGraphGetEdges(as_cu(self._h_graph), NULL, NULL, &num_edges))
-
-        if num_edges == 0:
-            return set()
-
         cdef vector[cydriver.CUgraphNode] from_nodes
         cdef vector[cydriver.CUgraphNode] to_nodes
-        from_nodes.resize(num_edges)
-        to_nodes.resize(num_edges)
+        from_nodes.resize(128)
+        to_nodes.resize(128)
+        cdef size_t num_edges = 128
+
         with nogil:
             IF CUDA_CORE_BUILD_MAJOR >= 13:
                 HANDLE_RETURN(cydriver.cuGraphGetEdges(
@@ -369,6 +372,20 @@ cdef class GraphDef:
                 HANDLE_RETURN(cydriver.cuGraphGetEdges(
                     as_cu(self._h_graph), from_nodes.data(), to_nodes.data(), &num_edges))
 
+        if num_edges == 0:
+            return set()
+
+        if num_edges > 128:
+            from_nodes.resize(num_edges)
+            to_nodes.resize(num_edges)
+            with nogil:
+                IF CUDA_CORE_BUILD_MAJOR >= 13:
+                    HANDLE_RETURN(cydriver.cuGraphGetEdges(
+                        as_cu(self._h_graph), from_nodes.data(), to_nodes.data(), NULL, &num_edges))
+                ELSE:
+                    HANDLE_RETURN(cydriver.cuGraphGetEdges(
+                        as_cu(self._h_graph), from_nodes.data(), to_nodes.data(), &num_edges))
+
         return set(
             (GraphNode._create(self._h_graph, from_nodes[i]),
              GraphNode._create(self._h_graph, to_nodes[i]))
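
The control flow of the `nodes()` and `edges()` change (try a preallocated buffer first, make a second call only on overflow) can be modeled without the driver. In this sketch `query_all` and `make_fake_query` are hypothetical helpers, and the fake query function is assumed to report the total number of available items even when the buffer is too small, which is the behavior the fallback branch in the diff depends on.

```python
def query_all(query_fn, initial_capacity=128):
    """Fetch every item, using a single call when they fit in the buffer.

    query_fn(buf) is a hypothetical callable that fills buf (a list of
    fixed length) with as many items as fit and returns the total number
    of items available.
    """
    buf = [None] * initial_capacity
    total = query_fn(buf)
    if total > initial_capacity:
        # Overflow: resize to the reported total and query once more.
        buf = [None] * total
        total = query_fn(buf)
    return buf[:total]


def make_fake_query(items):
    """Build a fake query function and a log of the buffer sizes it saw."""
    calls = []

    def query_fn(buf):
        calls.append(len(buf))
        n = min(len(buf), len(items))
        buf[:n] = items[:n]
        return len(items)

    return query_fn, calls


small_fn, small_calls = make_fake_query(list(range(5)))
small = query_all(small_fn)

large_fn, large_calls = make_fake_query(list(range(300)))
large = query_all(large_fn)
```

The small case completes in one call with the 128-entry buffer, while the 300-item case makes a second call with a buffer sized to the reported total. The trade-off is a fixed up-front allocation in exchange for halving the number of driver calls for graphs that fit.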

cuda_core/cuda/core/_graph/_graph_def/_graph_node.pyx

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ from cuda.core import Device
 from cuda.core._graph._graph_def._adjacency_set_proxy import AdjacencySetProxy
 from cuda.core._utils.cuda_utils import driver, handle_return
 
+# See _cpp/REGISTRY_DESIGN.md (Level 2: Resource Handle -> Python Object)
 _node_registry = weakref.WeakValueDictionary()
 
 
