Commit 39c021c

author
cuda-python-bot
committed
Deploy doc preview for PR 1837 (d821db1)
1 parent 1dfb3a9 commit 39c021c

1,139 files changed

Lines changed: 111608 additions & 80341 deletions

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
11
# Sphinx build info version 1
22
# This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
3-
config: 370c1b5b6cca50801ae9a598001e10f3
3+
config: f1919d8af14b4e4c194465556f885e50
44
tags: 645f666f9bcd5a90fca523b33c5a78b7

docs/pr-preview/pr-1837/cuda-bindings/latest/_sources/contribute.rst.txt

Lines changed: 14 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -4,12 +4,17 @@
44
Contributing
55
============
66

7-
Thank you for your interest in contributing to ``cuda-bindings``! Based on the type of contribution, it will fall into two categories:
8-
9-
1. You want to report a bug, feature request, or documentation issue
10-
- File an `issue <https://github.com/NVIDIA/cuda-python/issues/new/choose>`_ describing what you encountered or what you want to see changed.
11-
- The NVIDIA team will evaluate the issues and triage them, scheduling
12-
them for a release. If you believe the issue needs priority attention
13-
comment on the issue to notify the team.
14-
2. You want to implement a feature, improvement, or bug fix:
15-
- At this time we do not accept code contributions.
7+
Thank you for your interest in contributing to ``cuda-bindings``! Based on the
8+
type of contribution, it will fall into two categories:
9+
10+
1. You want to report a bug, feature request, or documentation issue.
11+
12+
File an `issue <https://github.com/NVIDIA/cuda-python/issues/new/choose>`_
13+
describing what you encountered or what you want to see changed. The NVIDIA
14+
team will evaluate the issue, triage it, and schedule it for a release. If
15+
you believe the issue needs priority attention, comment on the issue to
16+
notify the team.
17+
18+
2. You want to implement a feature, improvement, or bug fix.
19+
20+
At this time we do not accept code contributions.
Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,68 @@
1+
.. SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2+
.. SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE
3+
4+
Examples
5+
========
6+
7+
This page links to the ``cuda.bindings`` examples shipped in the
8+
`cuda-python repository <https://github.com/NVIDIA/cuda-python/tree/|cuda_bindings_github_ref|/cuda_bindings/examples>`_.
9+
Use it as a quick index when you want a runnable sample for a specific API area
10+
or CUDA feature.
11+
12+
Introduction
13+
------------
14+
15+
- `clock_nvrtc.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/clock_nvrtc.py>`_
16+
uses NVRTC-compiled CUDA code and the device clock to time a reduction
17+
kernel.
18+
- `simple_cubemap_texture.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_cubemap_texture.py>`_
19+
demonstrates cubemap texture sampling and transformation.
20+
- `simple_p2p.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_p2p.py>`_
21+
shows peer-to-peer memory access and transfers between multiple GPUs.
22+
- `simple_zero_copy.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_zero_copy.py>`_
23+
uses zero-copy mapped host memory for vector addition.
24+
- `system_wide_atomics.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/system_wide_atomics.py>`_
25+
demonstrates system-wide atomic operations on managed memory.
26+
- `vector_add_drv.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/vector_add_drv.py>`_
27+
uses the CUDA Driver API and unified virtual addressing for vector addition.
28+
- `vector_add_mmap.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/vector_add_mmap.py>`_
29+
uses virtual memory management APIs such as ``cuMemCreate`` and
30+
``cuMemMap`` for vector addition.
31+
32+
Concepts and techniques
33+
-----------------------
34+
35+
- `stream_ordered_allocation.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/2_Concepts_and_Techniques/stream_ordered_allocation.py>`_
36+
demonstrates ``cudaMallocAsync`` and ``cudaFreeAsync`` together with
37+
memory-pool release thresholds.
38+
39+
CUDA features
40+
-------------
41+
42+
- `global_to_shmem_async_copy.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/3_CUDA_Features/global_to_shmem_async_copy.py>`_
43+
compares asynchronous global-to-shared-memory copy strategies in matrix
44+
multiplication kernels.
45+
- `simple_cuda_graphs.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/3_CUDA_Features/simple_cuda_graphs.py>`_
46+
shows both manual CUDA graph construction and stream-capture-based replay.
47+
48+
Libraries and tools
49+
-------------------
50+
51+
- `conjugate_gradient_multi_block_cg.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/4_CUDA_Libraries/conjugate_gradient_multi_block_cg.py>`_
52+
implements a conjugate-gradient solver with cooperative groups and
53+
multi-block synchronization.
54+
- `nvidia_smi.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/4_CUDA_Libraries/nvidia_smi.py>`_
55+
uses NVML to implement a Python subset of ``nvidia-smi``.
56+
57+
Advanced and interoperability
58+
-----------------------------
59+
60+
- `iso_fd_modelling.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/iso_fd_modelling.py>`_
61+
runs isotropic finite-difference wave propagation across multiple GPUs with
62+
peer-to-peer halo exchange.
63+
- `jit_program.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/jit_program.py>`_
64+
JIT-compiles a SAXPY kernel with NVRTC and launches it through the Driver
65+
API.
66+
- `numba_emm_plugin.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/numba_emm_plugin.py>`_
67+
shows how to back Numba's EMM interface with the NVIDIA CUDA Python Driver
68+
API.

docs/pr-preview/pr-1837/cuda-bindings/latest/_sources/index.rst.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@
1111
release
1212
install
1313
overview
14+
examples
1415
motivation
1516
environment_variables
1617
api

docs/pr-preview/pr-1837/cuda-bindings/latest/_sources/install.rst.txt

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -78,7 +78,7 @@ Installing from Source
7878
----------------------
7979

8080
Requirements
81-
^^^^^^^^^^^^
81+
~~~~~~~~~~~~
8282

8383
* CUDA Toolkit headers[^1]
8484
* CUDA Runtime static library[^2]
@@ -100,7 +100,7 @@ See `Environment Variables <environment_variables.rst>`_ for a description of ot
100100
Only ``cydriver``, ``cyruntime`` and ``cynvrtc`` are impacted by the header requirement.
101101

102102
Editable Install
103-
^^^^^^^^^^^^^^^^
103+
~~~~~~~~~~~~~~~~
104104

105105
You can use:
106106

docs/pr-preview/pr-1837/cuda-bindings/latest/_sources/module/generated/cuda.bindings.nvml.CoolerInfo_v1.rst.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,7 @@
2626

2727
.. autosummary::
2828

29-
~CoolerInfo_v1.ind_ex
29+
~CoolerInfo_v1.index
3030
~CoolerInfo_v1.ptr
3131
~CoolerInfo_v1.signal_type
3232
~CoolerInfo_v1.target

docs/pr-preview/pr-1837/cuda-bindings/latest/_sources/module/generated/cuda.bindings.nvml.PlatformInfo_v1.rst.txt

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -27,10 +27,10 @@
2727
.. autosummary::
2828

2929
~PlatformInfo_v1.chassis_physical_slot_number
30-
~PlatformInfo_v1.compute_slot_ind_ex
30+
~PlatformInfo_v1.compute_slot_index
3131
~PlatformInfo_v1.ib_guid
3232
~PlatformInfo_v1.module_id
33-
~PlatformInfo_v1.node_ind_ex
33+
~PlatformInfo_v1.node_index
3434
~PlatformInfo_v1.peer_type
3535
~PlatformInfo_v1.ptr
3636
~PlatformInfo_v1.rack_guid

docs/pr-preview/pr-1837/cuda-bindings/latest/_sources/module/generated/cuda.bindings.nvml.PlatformInfo_v2.rst.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -33,7 +33,7 @@
3333
~PlatformInfo_v2.peer_type
3434
~PlatformInfo_v2.ptr
3535
~PlatformInfo_v2.slot_number
36-
~PlatformInfo_v2.tray_ind_ex
36+
~PlatformInfo_v2.tray_index
3737
~PlatformInfo_v2.version
3838

3939

docs/pr-preview/pr-1837/cuda-bindings/latest/_sources/overview.rst.txt

Lines changed: 9 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -25,13 +25,14 @@ code into
2525
`PTX <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html>`_ and
2626
then extract the function to be called at a later point in the application. You
2727
construct your device code in the form of a string and compile it with
28-
`NVRTC <http://docs.nvidia.com/cuda/nvrtc/index.html>`_, a runtime compilation
28+
`NVRTC <https://docs.nvidia.com/cuda/nvrtc/index.html>`_, a runtime compilation
2929
library for CUDA C++. Using the NVIDIA `Driver
30-
API <http://docs.nvidia.com/cuda/cuda-driver-api/index.html>`_, manually create a
30+
API <https://docs.nvidia.com/cuda/cuda-driver-api/index.html>`_, manually create a
3131
CUDA context and all required resources on the GPU, then launch the compiled
3232
CUDA C++ code and retrieve the results from the GPU. Now that you have an
3333
overview, jump into a commonly used example for parallel programming:
34-
`SAXPY <https://developer.nvidia.com/blog/six-ways-saxpy/>`_.
34+
`SAXPY <https://developer.nvidia.com/blog/six-ways-saxpy/>`_. For more
35+
end-to-end samples, see the :doc:`examples` page.
3536

3637
The first thing to do is import the `Driver
3738
API <https://docs.nvidia.com/cuda/cuda-driver-api/index.html>`_ and
@@ -427,7 +428,7 @@ Putting it all together:
427428
)
428429
429430
The final step is to construct a ``kernelParams`` argument that fulfills all of the launch API conditions. This is made easy because each array object comes
430-
with a `ctypes <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.ctypes.html#numpy.ndarray.ctypes>`_ data attribute that returns the underlying ``void*`` pointer value.
431+
with NumPy's `ctypes data attribute <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.ctypes.html#numpy.ndarray.ctypes>`_ that returns the underlying ``void*`` pointer value.
431432

432433
By having the final array object contain all pointers, we fulfill the contiguous array requirement:
433434

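A note on the ``kernelParams`` hunk above: the launch API expects one contiguous array of ``void*`` entries, each holding the host address of one argument. The sketch below illustrates that layout using only the standard-library ``ctypes`` module instead of NumPy, so it is an analogue of the documented approach rather than the code from ``overview.rst``. The argument names and the ``0x1000`` device-pointer value are hypothetical placeholders, and no CUDA call is made.

```python
import ctypes

# Hypothetical scalar kernel arguments, held in host memory.
a = ctypes.c_float(2.0)
n = ctypes.c_size_t(4)

# Hypothetical stand-in for a device pointer value (not a real allocation).
d_x = ctypes.c_void_p(0x1000)

args = [a, d_x, n]

# Build one contiguous array of void* entries. Each entry stores the host
# address of the corresponding argument, which is the layout a
# kernelParams-style parameter expects.
kernel_params = (ctypes.c_void_p * len(args))(
    *[ctypes.addressof(arg) for arg in args]
)
```

The arguments must stay alive for as long as ``kernel_params`` is in use, because the array stores raw addresses and does not keep the underlying objects referenced.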
@@ -520,7 +521,10 @@ CUDA objects
520521

521522
Certain CUDA kernels use native CUDA types as their parameters, such as ``cudaTextureObject_t``. These types require special handling since they're neither a primitive ctype nor a custom user type. Since ``cuda.bindings`` exposes each of them as Python classes, they each implement ``getPtr()`` and ``__int__()``. These two callables are used to support the NumPy and ctypes approaches. The difference between each call is further described under `Tips and Tricks <https://nvidia.github.io/cuda-python/cuda-bindings/latest/tips_and_tricks.html#>`_.
522523

523-
For this example, let's use the ``transformKernel`` from `examples/0_Introduction/simpleCubemapTexture_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleCubemapTexture_test.py>`_:
524+
For this example, let's use the ``transformKernel`` from
525+
`simple_cubemap_texture.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_cubemap_texture.py>`_.
526+
The :doc:`examples` page links to more samples covering textures, graphs,
527+
memory mapping, and multi-GPU workflows.
524528

525529
.. code-block:: python
526530

docs/pr-preview/pr-1837/cuda-bindings/latest/_sources/release/11.8.6-notes.rst.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22
.. SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE
33
44
``cuda-bindings`` 11.8.6 Release notes
5-
====================================
5+
==========================================
66

77
Released on January 24, 2025.
88
