
Commit c6aea12

Publish the graph API as cuda.core.graph (#1858)
* Reorganize graph test files for clarity

Rename test files to reflect what they actually test:
- test_basic -> test_graph_builder (stream capture tests)
- test_conditional -> test_graph_builder_conditional
- test_advanced -> test_graph_update (moved child_graph and stream_lifetime tests into test_graph_builder)
- test_capture_alloc -> test_graph_memory_resource
- test_explicit* -> test_graphdef*

* Enhance Graph.update() and add whole-graph update tests

- Extend Graph.update() to accept both GraphBuilder and GraphDef sources
- Surface CUgraphExecUpdateResultInfo details on update failure instead of a generic CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE message
- Release the GIL during cuGraphExecUpdate via nogil block
- Add parametrized happy-path test covering both GraphBuilder and GraphDef
- Add error-case tests: unfinished builder, topology mismatch, wrong type

* Add AdjacencySet proxy for pred/succ and GraphNode.remove()

Replace cached tuple-based pred/succ with mutable AdjacencySet backed by direct CUDA driver calls. Add GraphNode.remove() wrapping cuGraphDestroyNode.

* Add edge mutation support and MutableSet interface for GraphNode adjacencies

Enable adding/removing edges between graph nodes via AdjacencySet (a MutableSet proxy on GraphNode.pred/succ), node removal via discard(), and property setters for bulk edge replacement. Includes comprehensive mutation and interface tests.

Closes part of #1330 (step 2: edge mutation on GraphDef).

* Use requires_module mark for numpy version checks in mutation tests

Replace inline skipif version check with requires_module(np, "2.1") from the shared test helpers, consistent with other test files.

* Fix empty-graph return type: return set() instead of () for nodes/edges

* Rename AdjacencySet to AdjacencySetProxy, add bulk ops and safety guards

Rename class and file to AdjacencySetProxy to clarify write-through semantics. Add bulk-efficient clear(), __isub__(), __ior__() overrides and remove_edges() on the Cython core. Guard GraphNode.discard() against double-destroy via membership check. Filter duplicates in update(). Add error-path tests for wrong types, cross-graph edges, and self-edges.

* Add destroy() method with handle invalidation, remove GRAPH_NODE_SENTINEL

Replace discard() with destroy() which calls cuGraphDestroyNode and then zeroes the CUgraphNode resource in the handle box via invalidate_graph_node_handle. This prevents stale memory access on destroyed nodes. Properties (type, pred, succ, handle) degrade gracefully to None/empty for destroyed nodes.

Remove the GRAPH_NODE_SENTINEL (0x1) approach in favor of using NULL for both sentinels and destroyed nodes, which is simpler and avoids the risk of passing 0x1 to driver APIs that treat it as a valid pointer.

* Add GraphNode identity cache for stable Python object round-trips

Nodes retrieved via GraphDef.nodes(), edges(), or pred/succ traversal now return the same Python object that was originally created, enabling identity checks with `is`. A C++ HandleRegistry deduplicates CUgraphNode handles, and a Cython WeakValueDictionary caches the Python wrapper objects.

* Purge node cache on destroy to prevent stale identity lookups

* Make graph API public: rename _graph to cuda.core.graph

Move the graph package from cuda.core._graph to cuda.core.graph and flatten the _graph_def subdirectory. GraphDef, Condition, GraphAllocOptions, GraphNode, and all node subclasses are now importable from cuda.core.graph. GraphDef, Condition, and GraphAllocOptions are also re-exported from cuda.core directly.

Break the circular import (_stream → graph → _graph_node → _kernel_arg_handler → _buffer → _device → _stream) by making _device.pyx and _stream.pyx use local imports for GraphBuilder.

* Add 0.7.x release notes for GraphDef and cuda.core.graph module

* Skip NULL nodes in graph_node_registry to fix sentinel identity collision

Sentinel (entry) nodes use NULL as their CUgraphNode, so caching them under a NULL key caused all sentinels across different graphs to share the same handle. As a result, nodes were built from the wrong graph's entry point, causing CUDA_ERROR_INVALID_VALUE for conditional nodes and hash collisions in equality tests.

* Unregister destroyed nodes from C++ graph_node_registry

When a node is destroyed, the driver may reuse its CUgraphNode pointer for a new node. Without unregistering the old entry, the registry returns a stale handle pointing to the wrong node type and graph.

* Add dedicated test for node identity preservation through round-trips

* Add API docs for cuda.core.graph and fix stale docstring references

The _graph -> graph rename left behind broken Sphinx cross-references and the new graph types were missing from the API reference.

* Add handle= to all GraphNode subclass __repr__ for debugging

Every subclass repr now starts with handle=0x... (the CUgraphNode pointer) followed by type-specific identity/parameter data. Dynamic queries (pred counts, subnode counts) are removed in favor of deterministic, cheap fields. This makes set comparison failures in test output readable when debugging graph mutation tests.

* Rename _node_cache/_cached to _node_registry/_registered

Aligns Python-side terminology with the C++ graph_node_registry.

* Fix unregister_handle and rename invalidate_graph_node_handle

unregister_handle: remove the expired() guard that prevented erasure when the shared_ptr was still alive. This caused stale registry entries after destroy(), leading to CUDA_ERROR_INVALID_VALUE when the driver reused CUgraphNode pointer values.

Rename invalidate_graph_node_handle -> invalidate_graph_node for consistency with the rest of the graph node API.

* Add cheap containment test and early type check for AdjacencySetProxy

Add _AdjacencySetCore.contains() that checks membership by comparing raw CUgraphNode handles at the C level, avoiding Python object construction. Uses a 16-element stack buffer for a single driver call in the common case.

Move the type check in update() inline next to the extend loop so invalid input is rejected immediately.

* Add GraphDef.empty(), stack-buffer query optimization, and registry test

- Add GraphDef.empty() for creating entry-point empty nodes; replace all no-arg join() calls on GraphDef with empty() in tests.
- Optimize _AdjacencySetCore.query() to use a 16-element stack buffer, matching the contains() optimization.
- Add test_registry_cleanup exercising destroy(), graph deletion, and weak-reference cleanup of the node registry.

* Document the two-level handle and object registry design

Add REGISTRY_DESIGN.md explaining how the C++ HandleRegistry (Level 1) and Cython _node_registry (Level 2) work together to preserve Python object identity through driver round-trips. Add cross-references at each registry instantiation site.

* Fix import formatting in test_registry_cleanup

* Optimize GraphDef.nodes() and edges() to try a single driver call

Pre-allocate vectors to 128 entries and pass them on the first call. Only fall back to a second call if the graph exceeds 128 nodes/edges.

* Update docstrings, api.rst structure, and release notes

- Rename "Memory" section to "Memory management" in api.rst
- Add introductory paragraph under "Node types" subsection
- Update node docstrings to concise noun phrases
- Update graph API class docstrings (GraphBuilder, Graph, etc.)
- Revise release notes highlights and new features sections
- Fix test_registry_cleanup import path (cuda.core._graph -> cuda.core.graph)

Made-with: Cursor
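The identity-registry entries above (cache wrappers by handle, skip NULL sentinels, unregister on destroy) can be illustrated with a minimal pure-Python sketch. This is not the cuda.core implementation: `ToyNode` and `ToyNodeRegistry` are hypothetical names, integers stand in for CUgraphNode pointers, and only the Python-side `WeakValueDictionary` level is modeled.

```python
import weakref


class ToyNode:
    """Hypothetical stand-in for a Python wrapper around a driver node handle."""

    def __init__(self, handle: int):
        self.handle = handle


class ToyNodeRegistry:
    """Maps non-NULL handles to live wrappers without keeping them alive."""

    def __init__(self):
        self._registered = weakref.WeakValueDictionary()

    def wrap(self, handle: int) -> ToyNode:
        # NULL (0) stands for sentinel/entry nodes. They are never cached,
        # otherwise sentinels of different graphs would share one key.
        if handle == 0:
            return ToyNode(handle)
        node = self._registered.get(handle)
        if node is None:
            node = ToyNode(handle)
            self._registered[handle] = node
        return node

    def destroy(self, node: ToyNode) -> None:
        # Unregister first: the driver may hand out the same pointer again.
        self._registered.pop(node.handle, None)
        node.handle = 0  # a destroyed wrapper degrades to a NULL handle


registry = ToyNodeRegistry()
a = registry.wrap(0x1000)
same_handle_is_same_object = registry.wrap(0x1000) is a
sentinels_are_distinct = registry.wrap(0) is not registry.wrap(0)
registry.destroy(a)
reused_handle_is_new_object = registry.wrap(0x1000) is not a
```

Because the dictionary holds only weak references, an entry disappears once the last strong reference to its wrapper is dropped, which is the behavior the weak-reference cleanup test in the message is described as exercising.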
1 parent c2f79a1 commit c6aea12

31 files changed: +238 -184 lines changed

cuda_core/cuda/core/__init__.py

Lines changed: 10 additions & 7 deletions

@@ -1,4 +1,4 @@
-# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # SPDX-License-Identifier: Apache-2.0
 
@@ -31,12 +31,6 @@
 from cuda.core import system, utils
 from cuda.core._device import Device
 from cuda.core._event import Event, EventOptions
-from cuda.core._graph import (
-    Graph,
-    GraphBuilder,
-    GraphCompleteOptions,
-    GraphDebugPrintOptions,
-)
 from cuda.core._graphics import GraphicsResource
 from cuda.core._launch_config import LaunchConfig
 from cuda.core._launcher import launch
@@ -69,3 +63,12 @@
     StreamOptions,
 )
 from cuda.core._tensor_map import TensorMapDescriptor, TensorMapDescriptorOptions
+from cuda.core.graph import (
+    Condition,
+    Graph,
+    GraphAllocOptions,
+    GraphBuilder,
+    GraphCompleteOptions,
+    GraphDebugPrintOptions,
+    GraphDef,
+)
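As a quick check of the import paths this hunk introduces, the sketch below compares the names listed in the new `from cuda.core.graph import (...)` block against the same attributes on `cuda.core`. It assumes only what the diff shows; `load_graph_api` is a hypothetical helper, and the script degrades to a message when a `cuda.core` build containing this commit is not installed.

```python
import importlib


def load_graph_api():
    """Return (cuda.core, cuda.core.graph), or None if they cannot be imported."""
    try:
        core = importlib.import_module("cuda.core")
        graph = importlib.import_module("cuda.core.graph")
    except ImportError:
        return None
    return core, graph


modules = load_graph_api()
if modules is None:
    status = "cuda.core.graph is not available in this environment"
else:
    core, graph = modules
    # Names taken from the added import block in cuda/core/__init__.py.
    reexported = [
        "Condition", "Graph", "GraphAllocOptions", "GraphBuilder",
        "GraphCompleteOptions", "GraphDebugPrintOptions", "GraphDef",
    ]
    same = all(getattr(core, name) is getattr(graph, name) for name in reexported)
    status = f"re-exports identical: {same}"
print(status)
```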

cuda_core/cuda/core/_device.pyx

Lines changed: 6 additions & 5 deletions

@@ -1,4 +1,4 @@
-# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # SPDX-License-Identifier: Apache-2.0
 
@@ -24,7 +24,6 @@ from cuda.core._resource_handles cimport (
     as_cu,
 )
 
-from cuda.core._graph import GraphBuilder
 from cuda.core._stream import IsStreamT, Stream, StreamOptions
 from cuda.core._utils.clear_error_support import assert_type
 from cuda.core._utils.cuda_utils import (
@@ -1363,15 +1362,17 @@ class Device:
         self._check_context_initialized()
         handle_return(runtime.cudaDeviceSynchronize())
 
-    def create_graph_builder(self) -> GraphBuilder:
-        """Create a new :obj:`~_graph.GraphBuilder` object.
+    def create_graph_builder(self) -> "GraphBuilder":
+        """Create a new :obj:`~graph.GraphBuilder` object.
 
         Returns
         -------
-        :obj:`~_graph.GraphBuilder`
+        :obj:`~graph.GraphBuilder`
             Newly created graph builder object.
 
         """
+        from cuda.core.graph._graph_builder import GraphBuilder
+
         self._check_context_initialized()
         return GraphBuilder._init(stream=self.create_stream(), is_stream_owner=True)
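The function-local import added above is the usual way to break an import cycle: the dependency is resolved on first call rather than while the module is loading. The following sketch reproduces the pattern with two throwaway modules written to a temporary directory. The module names `toy_stream` and `toy_graph` are hypothetical and only mimic the `_stream` and `graph` dependency; nothing here touches cuda.core.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical module: defers its import of toy_graph to call time.
STREAM_SRC = textwrap.dedent('''
    class Stream:
        def create_graph_builder(self):
            # Deferred import: resolved on first call, not at module import.
            from toy_graph import GraphBuilder
            return GraphBuilder(self)
''')

# Hypothetical module: imports toy_stream at module level.
GRAPH_SRC = textwrap.dedent('''
    from toy_stream import Stream

    class GraphBuilder:
        def __init__(self, stream):
            if not isinstance(stream, Stream):
                raise TypeError("stream must be a Stream")
            self.stream = stream
''')


def run_demo():
    """Import toy_stream, then build a GraphBuilder through the deferred import."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "toy_stream.py").write_text(STREAM_SRC)
        Path(tmp, "toy_graph.py").write_text(GRAPH_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            toy_stream = importlib.import_module("toy_stream")
            # toy_graph has not been imported yet at this point.
            loaded_early = "toy_graph" in sys.modules
            builder = toy_stream.Stream().create_graph_builder()
            return type(builder).__name__, loaded_early
        finally:
            sys.path.remove(tmp)
            for name in ("toy_stream", "toy_graph"):
                sys.modules.pop(name, None)


result = run_demo()
print(result)
```

The trade-off is a small lookup cost on each call and a return annotation that has to be a string (`"GraphBuilder"`), since the name is no longer bound at module scope.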

cuda_core/cuda/core/_graph/__init__.py

Lines changed: 0 additions & 19 deletions
This file was deleted.

cuda_core/cuda/core/_graph/_graph_def/__init__.py

Lines changed: 0 additions & 51 deletions
This file was deleted.

cuda_core/cuda/core/_launcher.pyx

Lines changed: 1 addition & 1 deletion

@@ -26,7 +26,7 @@ def launch(stream: Stream | GraphBuilder | IsStreamT, config: LaunchConfig, kern
 
     Parameters
     ----------
-    stream : :obj:`~_stream.Stream` | :obj:`~_graph.GraphBuilder`
+    stream : :obj:`~_stream.Stream` | :obj:`~graph.GraphBuilder`
         The stream establishing the stream ordering semantic of a
         launch.
     config : :obj:`LaunchConfig`

cuda_core/cuda/core/_memory/_buffer.pyx

Lines changed: 6 additions & 6 deletions

@@ -182,7 +182,7 @@ cdef class Buffer:
 
         Parameters
         ----------
-        stream : :obj:`~_stream.Stream` | :obj:`~_graph.GraphBuilder`, optional
+        stream : :obj:`~_stream.Stream` | :obj:`~graph.GraphBuilder`, optional
             The stream object to use for asynchronous deallocation. If None,
             the deallocation stream stored in the handle is used.
         """
@@ -206,7 +206,7 @@ cdef class Buffer:
         ----------
         dst : :obj:`~_memory.Buffer`
             Source buffer to copy data from
-        stream : :obj:`~_stream.Stream` | :obj:`~_graph.GraphBuilder`
+        stream : :obj:`~_stream.Stream` | :obj:`~graph.GraphBuilder`
             Keyword argument specifying the stream for the
             asynchronous copy
 
@@ -237,7 +237,7 @@ cdef class Buffer:
         ----------
         src : :obj:`~_memory.Buffer`
             Source buffer to copy data from
-        stream : :obj:`~_stream.Stream` | :obj:`~_graph.GraphBuilder`
+        stream : :obj:`~_stream.Stream` | :obj:`~graph.GraphBuilder`
             Keyword argument specifying the stream for the
             asynchronous copy
 
@@ -262,7 +262,7 @@ cdef class Buffer:
         value : int | :obj:`collections.abc.Buffer`
             - int: Must be in range [0, 256). Converted to 1 byte.
             - :obj:`collections.abc.Buffer`: Must be 1, 2, or 4 bytes.
-        stream : :obj:`~_stream.Stream` | :obj:`~_graph.GraphBuilder`
+        stream : :obj:`~_stream.Stream` | :obj:`~graph.GraphBuilder`
             Stream for the asynchronous fill operation.
 
         Raises
@@ -496,7 +496,7 @@ cdef class MemoryResource:
         ----------
         size : int
             The size of the buffer to allocate, in bytes.
-        stream : :obj:`~_stream.Stream` | :obj:`~_graph.GraphBuilder`, optional
+        stream : :obj:`~_stream.Stream` | :obj:`~graph.GraphBuilder`, optional
             The stream on which to perform the allocation asynchronously.
             If None, it is up to each memory resource implementation to decide
             and document the behavior.
@@ -518,7 +518,7 @@ cdef class MemoryResource:
             The pointer or handle to the buffer to deallocate.
         size : int
             The size of the buffer to deallocate, in bytes.
-        stream : :obj:`~_stream.Stream` | :obj:`~_graph.GraphBuilder`, optional
+        stream : :obj:`~_stream.Stream` | :obj:`~graph.GraphBuilder`, optional
             The stream on which to perform the deallocation asynchronously.
             If None, it is up to each memory resource implementation to decide
             and document the behavior.

cuda_core/cuda/core/_memory/_memory_pool.pyx

Lines changed: 2 additions & 2 deletions

@@ -129,7 +129,7 @@ cdef class _MemPool(MemoryResource):
         ----------
         size : int
             The size of the buffer to allocate, in bytes.
-        stream : :obj:`~_stream.Stream` | :obj:`~_graph.GraphBuilder`, optional
+        stream : :obj:`~_stream.Stream` | :obj:`~graph.GraphBuilder`, optional
             The stream on which to perform the allocation asynchronously.
             If None, an internal stream is used.
 
@@ -153,7 +153,7 @@ cdef class _MemPool(MemoryResource):
             The pointer or handle to the buffer to deallocate.
         size : int
             The size of the buffer to deallocate, in bytes.
-        stream : :obj:`~_stream.Stream` | :obj:`~_graph.GraphBuilder`, optional
+        stream : :obj:`~_stream.Stream` | :obj:`~graph.GraphBuilder`, optional
             The stream on which to perform the deallocation asynchronously.
             If the buffer is deallocated without an explicit stream, the allocation stream
             is used.

cuda_core/cuda/core/_stream.pyx

Lines changed: 7 additions & 4 deletions

@@ -38,7 +38,6 @@ from cuda.core._resource_handles cimport (
     as_py,
 )
 
-from cuda.core._graph import GraphBuilder
 
 
 @dataclass
@@ -354,17 +353,19 @@ cdef class Stream:
 
         return Stream._init(obj=_stream_holder())
 
-    def create_graph_builder(self) -> GraphBuilder:
-        """Create a new :obj:`~_graph.GraphBuilder` object.
+    def create_graph_builder(self) -> "GraphBuilder":
+        """Create a new :obj:`~graph.GraphBuilder` object.
 
         The new graph builder will be associated with this stream.
 
         Returns
         -------
-        :obj:`~_graph.GraphBuilder`
+        :obj:`~graph.GraphBuilder`
             Newly created graph builder object.
 
         """
+        from cuda.core.graph._graph_builder import GraphBuilder
+
         return GraphBuilder._init(stream=self, is_stream_owner=False)
 
 
@@ -466,6 +467,8 @@ cdef cydriver.CUstream _handle_from_stream_protocol(obj) except*:
 # Helper for API functions that accept either Stream or GraphBuilder. Performs
 # needed checks and returns the relevant stream.
 cdef Stream Stream_accept(arg, bint allow_stream_protocol=False):
+    from cuda.core.graph._graph_builder import GraphBuilder
+
     if isinstance(arg, Stream):
         return <Stream>(arg)
     elif isinstance(arg, GraphBuilder):

cuda_core/cuda/core/experimental/__init__.py

Lines changed: 7 additions & 7 deletions

@@ -1,4 +1,4 @@
-# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # SPDX-License-Identifier: Apache-2.0
 
@@ -46,12 +46,6 @@ def _warn_deprecated():
 
 from cuda.core._device import Device
 from cuda.core._event import Event, EventOptions
-from cuda.core._graph import (
-    Graph,
-    GraphBuilder,
-    GraphCompleteOptions,
-    GraphDebugPrintOptions,
-)
 from cuda.core._launch_config import LaunchConfig
 from cuda.core._launcher import launch
 from cuda.core._layout import _StridedLayout
@@ -73,3 +67,9 @@ def _warn_deprecated():
 from cuda.core._module import Kernel, ObjectCode
 from cuda.core._program import Program, ProgramOptions
 from cuda.core._stream import Stream, StreamOptions
+from cuda.core.graph import (
+    Graph,
+    GraphBuilder,
+    GraphCompleteOptions,
+    GraphDebugPrintOptions,
+)

cuda_core/cuda/core/_graph/_graph_def/__init__.pxd renamed to cuda_core/cuda/core/graph/__init__.pxd

Lines changed: 3 additions & 3 deletions

@@ -2,9 +2,9 @@
 #
 # SPDX-License-Identifier: Apache-2.0
 
-from cuda.core._graph._graph_def._graph_def cimport Condition, GraphDef
-from cuda.core._graph._graph_def._graph_node cimport GraphNode
-from cuda.core._graph._graph_def._subclasses cimport (
+from cuda.core.graph._graph_def cimport Condition, GraphDef
+from cuda.core.graph._graph_node cimport GraphNode
+from cuda.core.graph._subclasses cimport (
     AllocNode,
     ChildGraphNode,
     ConditionalNode,
