Commit 54127d9

docs: audit example coverage and add pixi docs builds (#1680)
Make every current cuda.core and cuda.bindings example discoverable from the published docs, and add a reproducible pixi-based docs workflow so documentation changes can be built and debugged locally.

Made-with: Cursor
1 parent c2f79a1 commit 54127d9

12 files changed

Lines changed: 8146 additions & 243 deletions

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+.. SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+.. SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE
+
+Examples
+========
+
+This page links to the ``cuda.bindings`` examples shipped in the
+`cuda-python repository <https://github.com/NVIDIA/cuda-python/tree/main/cuda_bindings/examples>`_.
+Use it as a quick index when you want a runnable sample for a specific API area
+or CUDA feature.
+
+Introduction
+------------
+
+- `clock_nvrtc_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/clock_nvrtc_test.py>`_
+  uses NVRTC-compiled CUDA code and the device clock to time a reduction
+  kernel.
+- `simpleCubemapTexture_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleCubemapTexture_test.py>`_
+  demonstrates cubemap texture sampling and transformation.
+- `simpleP2P_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleP2P_test.py>`_
+  shows peer-to-peer memory access and transfers between multiple GPUs.
+- `simpleZeroCopy_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleZeroCopy_test.py>`_
+  uses zero-copy mapped host memory for vector addition.
+- `systemWideAtomics_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/systemWideAtomics_test.py>`_
+  demonstrates system-wide atomic operations on managed memory.
+- `vectorAddDrv_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/vectorAddDrv_test.py>`_
+  uses the CUDA Driver API and unified virtual addressing for vector addition.
+- `vectorAddMMAP_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/vectorAddMMAP_test.py>`_
+  uses virtual memory management APIs such as ``cuMemCreate`` and
+  ``cuMemMap`` for vector addition.
+
+Concepts and techniques
+-----------------------
+
+- `streamOrderedAllocation_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/2_Concepts_and_Techniques/streamOrderedAllocation_test.py>`_
+  demonstrates ``cudaMallocAsync`` and ``cudaFreeAsync`` together with
+  memory-pool release thresholds.
+
+CUDA features
+-------------
+
+- `globalToShmemAsyncCopy_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/3_CUDA_Features/globalToShmemAsyncCopy_test.py>`_
+  compares asynchronous global-to-shared-memory copy strategies in matrix
+  multiplication kernels.
+- `simpleCudaGraphs_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/3_CUDA_Features/simpleCudaGraphs_test.py>`_
+  shows both manual CUDA graph construction and stream-capture-based replay.
+
+Libraries and tools
+-------------------
+
+- `conjugateGradientMultiBlockCG_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/4_CUDA_Libraries/conjugateGradientMultiBlockCG_test.py>`_
+  implements a conjugate-gradient solver with cooperative groups and
+  multi-block synchronization.
+- `nvidia_smi.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/4_CUDA_Libraries/nvidia_smi.py>`_
+  uses NVML to implement a Python subset of ``nvidia-smi``.
+
+Advanced and interoperability
+-----------------------------
+
+- `isoFDModelling_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/extra/isoFDModelling_test.py>`_
+  runs isotropic finite-difference wave propagation across multiple GPUs with
+  peer-to-peer halo exchange.
+- `jit_program_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/extra/jit_program_test.py>`_
+  JIT-compiles a SAXPY kernel with NVRTC and launches it through the Driver
+  API.
+- `numba_emm_plugin.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/extra/numba_emm_plugin.py>`_
+  shows how to back Numba's EMM interface with the NVIDIA CUDA Python Driver
+  API.
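
The commit message describes auditing example coverage so that every example is reachable from the published docs. A check of that kind could be sketched roughly as below. This is an illustrative helper only, not code from this commit: the function names ``linked_example_files`` and ``missing_examples`` are hypothetical, and it assumes that every ``.py`` file under the examples directory is meant to be linked, which may not hold if the directory also contains shared helper modules.

```python
import re
from pathlib import Path

# Matches reStructuredText external links of the form `text <url>`_
LINK_RE = re.compile(r"`[^`<]+<(?P<url>[^>]+)>`_")


def linked_example_files(rst_text, marker="/examples/"):
    """Return example paths (relative to the examples dir) linked from the page."""
    linked = set()
    for match in LINK_RE.finditer(rst_text):
        url = match.group("url")
        # Keep only the part of the URL that follows ".../examples/".
        _, sep, tail = url.partition(marker)
        if sep and tail.endswith(".py"):
            linked.add(tail)
    return linked


def missing_examples(rst_text, examples_dir):
    """List example scripts on disk that the page does not link to."""
    root = Path(examples_dir)
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*.py")}
    return sorted(on_disk - linked_example_files(rst_text))
```

Calling ``missing_examples(page_text, "cuda_bindings/examples")`` from a repository checkout would then return the scripts that still need an entry on the page; an empty list means the page covers everything found on disk under the stated assumption.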

cuda_bindings/docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@
    release
    install
    overview
+   examples
    motivation
    environment_variables
    api

cuda_bindings/docs/source/overview.rst

Lines changed: 6 additions & 2 deletions
@@ -31,7 +31,8 @@ API <http://docs.nvidia.com/cuda/cuda-driver-api/index.html>`_, manually create
 CUDA context and all required resources on the GPU, then launch the compiled
 CUDA C++ code and retrieve the results from the GPU. Now that you have an
 overview, jump into a commonly used example for parallel programming:
-`SAXPY <https://developer.nvidia.com/blog/six-ways-saxpy/>`_.
+`SAXPY <https://developer.nvidia.com/blog/six-ways-saxpy/>`_. For more
+end-to-end samples, see the :doc:`examples` page.
 
 The first thing to do is import the `Driver
 API <https://docs.nvidia.com/cuda/cuda-driver-api/index.html>`_ and
@@ -520,7 +521,10 @@ CUDA objects
 
 Certain CUDA kernels use native CUDA types as their parameters such as ``cudaTextureObject_t``. These types require special handling since they're neither a primitive ctype nor a custom user type. Since ``cuda.bindings`` exposes each of them as Python classes, they each implement ``getPtr()`` and ``__int__()``. These two callables used to support the NumPy and ctypes approach. The difference between each call is further described under `Tips and Tricks <https://nvidia.github.io/cuda-python/cuda-bindings/latest/tips_and_tricks.html#>`_.
 
-For this example, lets use the ``transformKernel`` from `examples/0_Introduction/simpleCubemapTexture_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleCubemapTexture_test.py>`_:
+For this example, lets use the ``transformKernel`` from
+`examples/0_Introduction/simpleCubemapTexture_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleCubemapTexture_test.py>`_.
+The :doc:`examples` page links to more samples covering textures, graphs,
+memory mapping, and multi-GPU workflows.
 
 .. code-block:: python
