| 1 | +.. SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
| 2 | +.. SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE |
| 3 | + |
| 4 | +Examples |
| 5 | +======== |
| 6 | + |
| 7 | +This page links to the ``cuda.bindings`` examples shipped in the |
| 8 | +`cuda-python repository <https://github.com/NVIDIA/cuda-python/tree/|cuda_bindings_github_ref|/cuda_bindings/examples>`_. |
| 9 | +Use it as a quick index when you want a runnable sample for a specific API area |
| 10 | +or CUDA feature. |
| 11 | + |
| 12 | +Introduction |
| 13 | +------------ |
| 14 | + |
| 15 | +- `clock_nvrtc.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/clock_nvrtc.py>`_ |
| 16 | + uses NVRTC-compiled CUDA code and the device clock to time a reduction |
| 17 | + kernel. |
| 18 | +- `simple_cubemap_texture.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_cubemap_texture.py>`_ |
| 19 | + demonstrates cubemap texture sampling and transformation. |
| 20 | +- `simple_p2p.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_p2p.py>`_ |
| 21 | + shows peer-to-peer memory access and transfers between multiple GPUs. |
| 22 | +- `simple_zero_copy.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_zero_copy.py>`_ |
| 23 | + uses zero-copy mapped host memory for vector addition. |
| 24 | +- `system_wide_atomics.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/system_wide_atomics.py>`_ |
| 25 | + demonstrates system-wide atomic operations on managed memory. |
| 26 | +- `vector_add_drv.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/vector_add_drv.py>`_ |
| 27 | + uses the CUDA Driver API and unified virtual addressing for vector addition. |
| 28 | +- `vector_add_mmap.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/vector_add_mmap.py>`_ |
| 29 | + uses virtual memory management APIs such as ``cuMemCreate`` and |
| 30 | + ``cuMemMap`` for vector addition. |
| 31 | + |
| 32 | +Concepts and techniques |
| 33 | +----------------------- |
| 34 | + |
| 35 | +- `stream_ordered_allocation.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/2_Concepts_and_Techniques/stream_ordered_allocation.py>`_ |
| 36 | + demonstrates ``cudaMallocAsync`` and ``cudaFreeAsync`` together with |
| 37 | + memory-pool release thresholds. |
| 38 | + |
| 39 | +CUDA features |
| 40 | +------------- |
| 41 | + |
| 42 | +- `global_to_shmem_async_copy.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/3_CUDA_Features/global_to_shmem_async_copy.py>`_ |
| 43 | + compares asynchronous global-to-shared-memory copy strategies in matrix |
| 44 | + multiplication kernels. |
| 45 | +- `simple_cuda_graphs.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/3_CUDA_Features/simple_cuda_graphs.py>`_ |
| 46 | + shows both manual CUDA graph construction and stream-capture-based replay. |
| 47 | + |
| 48 | +Libraries and tools |
| 49 | +------------------- |
| 50 | + |
| 51 | +- `conjugate_gradient_multi_block_cg.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/4_CUDA_Libraries/conjugate_gradient_multi_block_cg.py>`_ |
| 52 | + implements a conjugate-gradient solver with cooperative groups and |
| 53 | + multi-block synchronization. |
| 54 | +- `nvidia_smi.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/4_CUDA_Libraries/nvidia_smi.py>`_ |
| 55 | + uses NVML to reimplement a subset of ``nvidia-smi`` in Python. |
| 56 | + |
| 57 | +Advanced and interoperability |
| 58 | +----------------------------- |
| 59 | + |
| 60 | +- `iso_fd_modelling.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/iso_fd_modelling.py>`_ |
| 61 | + runs isotropic finite-difference wave propagation across multiple GPUs with |
| 62 | + peer-to-peer halo exchange. |
| 63 | +- `jit_program.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/jit_program.py>`_ |
| 64 | + JIT-compiles a SAXPY kernel with NVRTC and launches it through the Driver |
| 65 | + API. |
| 66 | +- `numba_emm_plugin.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/numba_emm_plugin.py>`_ |
| 67 | + shows how to back Numba's EMM interface with the NVIDIA CUDA Python Driver |
| 68 | + API. |