diff --git a/cuda_bindings/docs/source/examples.rst b/cuda_bindings/docs/source/examples.rst
diff --git a/cuda_bindings/docs/source/index.rst b/cuda_bindings/docs/source/index.rst
diff --git a/cuda_bindings/docs/source/overview.rst b/cuda_bindings/docs/source/overview.rst
@@ -0,0 +1,68 @@
+.. SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+.. SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE
+
+Examples
+========
+
+This page links to the ``cuda.bindings`` examples shipped in the
+`cuda-python repository <https://github.com/NVIDIA/cuda-python/tree/main/cuda_bindings/examples>`_.
+Use it as a quick index when you want a runnable sample for a specific API area
+or CUDA feature.
+
+Introduction
+------------
+
+- `clock_nvrtc_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/clock_nvrtc_test.py>`_
+  uses NVRTC-compiled CUDA code and the device clock to time a reduction
+  kernel.
+- `simpleCubemapTexture_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleCubemapTexture_test.py>`_
+  demonstrates cubemap texture sampling and transformation.
+- `simpleP2P_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleP2P_test.py>`_
+  shows peer-to-peer memory access and transfers between multiple GPUs.
+- `simpleZeroCopy_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleZeroCopy_test.py>`_
+  uses zero-copy mapped host memory for vector addition.
+- `systemWideAtomics_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/systemWideAtomics_test.py>`_
+  demonstrates system-wide atomic operations on managed memory.
+- `vectorAddDrv_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/vectorAddDrv_test.py>`_
+  uses the CUDA Driver API and unified virtual addressing for vector addition.
+- `vectorAddMMAP_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/vectorAddMMAP_test.py>`_
+  uses virtual memory management APIs such as ``cuMemCreate`` and
+  ``cuMemMap`` for vector addition.
+
+Concepts and techniques
+-----------------------
+
+- `streamOrderedAllocation_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/2_Concepts_and_Techniques/streamOrderedAllocation_test.py>`_
+  demonstrates ``cudaMallocAsync`` and ``cudaFreeAsync`` together with
+  memory-pool release thresholds.
+
+CUDA features
+-------------
+
+- `globalToShmemAsyncCopy_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/3_CUDA_Features/globalToShmemAsyncCopy_test.py>`_
+  compares asynchronous global-to-shared-memory copy strategies in matrix
+  multiplication kernels.
+- `simpleCudaGraphs_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/3_CUDA_Features/simpleCudaGraphs_test.py>`_
+  shows both manual CUDA graph construction and stream-capture-based replay.
+
+Libraries and tools
+-------------------
+
+- `conjugateGradientMultiBlockCG_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/4_CUDA_Libraries/conjugateGradientMultiBlockCG_test.py>`_
+  implements a conjugate-gradient solver with cooperative groups and
+  multi-block synchronization.
+- `nvidia_smi.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/4_CUDA_Libraries/nvidia_smi.py>`_
+  uses NVML to implement a Python subset of ``nvidia-smi``.
+
+Advanced and interoperability
+-----------------------------
+
+- `isoFDModelling_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/extra/isoFDModelling_test.py>`_
+  runs isotropic finite-difference wave propagation across multiple GPUs with
+  peer-to-peer halo exchange.
+- `jit_program_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/extra/jit_program_test.py>`_
+  JIT-compiles a SAXPY kernel with NVRTC and launches it through the Driver
+  API.
+- `numba_emm_plugin.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/extra/numba_emm_plugin.py>`_
+  shows how to back Numba's EMM interface with the NVIDIA CUDA Python Driver
+  API.
@@ -11,6 +11,7 @@
    release
    install
    overview
+   examples
    motivation
    environment_variables
    api
 
@@ -31,7 +31,8 @@ API <http://docs.nvidia.com/cuda/cuda-driver-api/index.html>`_, manually create
 CUDA context and all required resources on the GPU, then launch the compiled
 CUDA C++ code and retrieve the results from the GPU. Now that you have an
 overview, jump into a commonly used example for parallel programming:
-`SAXPY <https://developer.nvidia.com/blog/six-ways-saxpy/>`_.
+`SAXPY <https://developer.nvidia.com/blog/six-ways-saxpy/>`_. For more
+end-to-end samples, see the :doc:`examples` page.
 
 The first thing to do is import the `Driver
 API <https://docs.nvidia.com/cuda/cuda-driver-api/index.html>`_ and
@@ -520,7 +521,10 @@ CUDA objects
 
 Certain CUDA kernels use native CUDA types, such as ``cudaTextureObject_t``, as their parameters. These types require special handling since they're neither a primitive ctype nor a custom user type. Since ``cuda.bindings`` exposes each of them as a Python class, each one implements ``getPtr()`` and ``__int__()``. These two callables are used to support the NumPy and ctypes approaches. The difference between the two calls is further described under `Tips and Tricks <https://nvidia.github.io/cuda-python/cuda-bindings/latest/tips_and_tricks.html#>`_.
 
-For this example, lets use the ``transformKernel`` from `examples/0_Introduction/simpleCubemapTexture_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleCubemapTexture_test.py>`_:
+For this example, let's use the ``transformKernel`` from
+`examples/0_Introduction/simpleCubemapTexture_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleCubemapTexture_test.py>`_.
+The :doc:`examples` page links to more samples covering textures, graphs,
+memory mapping, and multi-GPU workflows.
 
 .. code-block:: python