Commit c09bb82

Merge branch 'main' into pr-author-org-check
2 parents 9dcf7b1 + c93623b commit c09bb82

51 files changed

Lines changed: 13217 additions & 574 deletions

.github/workflows/pr-metadata-check.yml

Lines changed: 12 additions & 3 deletions
@@ -25,10 +25,10 @@ jobs:
     steps:
       - name: Check for assignee, labels, and milestone
         env:
-          ASSIGNEES: ${{ toJson(github.event.pull_request.assignees) }}
-          LABELS: ${{ toJson(github.event.pull_request.labels) }}
-          MILESTONE: ${{ github.event.pull_request.milestone && github.event.pull_request.milestone.title || '' }}
           PR_URL: ${{ github.event.pull_request.html_url }}
+          PR_NUMBER: ${{ github.event.pull_request.number }}
+          GH_REPO: ${{ github.repository }}
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           IS_BOT: ${{ github.actor == 'dependabot[bot]' || github.actor == 'pre-commit-ci[bot]' || github.actor == 'copy-pr-bot[bot]' }}
           IS_DRAFT: ${{ github.event.pull_request.draft }}
         run: |
@@ -37,6 +37,15 @@ jobs:
             exit 0
           fi
 
+          # Fetch live PR data to avoid stale event payload (race condition
+          # when labels/milestone are added shortly after PR creation).
+          PR_JSON=$(gh pr view "${PR_NUMBER}" --repo "${GH_REPO}" \
+            --json assignees,labels,milestone \
+            --jq '{assignees: .assignees, labels: .labels, milestone: (.milestone.title // empty)}')
+          ASSIGNEES=$(echo "$PR_JSON" | jq '.assignees')
+          LABELS=$(echo "$PR_JSON" | jq '.labels')
+          MILESTONE=$(echo "$PR_JSON" | jq -r '.milestone')
+
           ERRORS=""
 
           ASSIGNEE_COUNT=$(echo "$ASSIGNEES" | jq 'length')
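
Review note: the hunk above replaces the event-payload fields with a live `gh pr view` query and then normalizes the result. A minimal Python sketch of that normalization follows. It assumes the JSON shape returned for `--json assignees,labels,milestone` (with `milestone` being `null` when unset), and the helper names `summarize_pr` and `missing_metadata` are hypothetical, not part of the workflow. It may also be worth checking how the jq filter behaves for a PR without a milestone: in jq, an object constructor whose field evaluates to `empty` produces no object at all, so a fallback such as `// ""` is the usual way to keep the other fields.

```python
import json


def summarize_pr(raw: str) -> dict:
    """Normalize the JSON printed by `gh pr view --json assignees,labels,milestone`."""
    data = json.loads(raw)
    # Assumption: `milestone` is null (None) when the PR has no milestone.
    milestone = data.get("milestone") or {}
    return {
        "assignees": data.get("assignees") or [],
        "labels": data.get("labels") or [],
        "milestone": milestone.get("title", ""),
    }


def missing_metadata(pr: dict) -> list:
    """Return the names of the metadata fields that are still empty."""
    return [key for key in ("assignees", "labels", "milestone") if not pr[key]]


# Hypothetical payload: one assignee, no labels, no milestone.
sample = '{"assignees": [{"login": "octocat"}], "labels": [], "milestone": null}'
pr = summarize_pr(sample)
print(missing_metadata(pr))  # prints ['labels', 'milestone']
```

Mapping a missing milestone to an empty string keeps the later emptiness checks uniform across all three fields.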

.github/workflows/test-wheel-linux.yml

Lines changed: 2 additions & 2 deletions
@@ -70,12 +70,12 @@ jobs:
           echo "OLD_BRANCH=${OLD_BRANCH}" >> "$GITHUB_OUTPUT"
 
   test:
-    name: py${{ matrix.PY_VER }}, ${{ matrix.CUDA_VER }}, ${{ (matrix.LOCAL_CTK == '1' && 'local') || 'wheels' }}, ${{ matrix.GPU }}${{ matrix.GPU_COUNT != '1' && format('(x{0})', matrix.GPU_COUNT) || '' }}
+    name: py${{ matrix.PY_VER }}, ${{ matrix.CUDA_VER }}, ${{ (matrix.LOCAL_CTK == '1' && 'local') || 'wheels' }}, ${{ matrix.GPU }}${{ matrix.GPU_COUNT != '1' && format('(x{0})', matrix.GPU_COUNT) || '' }}${{ matrix.FLAVOR && format(', {0}', matrix.FLAVOR) || '' }}
     needs: compute-matrix
     strategy:
       fail-fast: false
       matrix: ${{ fromJSON(needs.compute-matrix.outputs.MATRIX) }}
-    runs-on: "linux-${{ matrix.ARCH }}-gpu-${{ matrix.GPU }}-${{ matrix.DRIVER }}-${{ matrix.GPU_COUNT }}"
+    runs-on: "${{ matrix.FLAVOR || 'linux' }}-${{ matrix.ARCH }}-gpu-${{ matrix.GPU }}-${{ matrix.DRIVER }}-${{ matrix.GPU_COUNT }}"
     # The build stage could fail but we want the CI to keep moving.
     if: ${{ github.repository_owner == 'nvidia' && !cancelled() }}
     # Our self-hosted runners require a container
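
Review note: the two changed expressions make `FLAVOR` an optional matrix key that falls back to `linux` in the runner label and is appended to the job name only when present. The sketch below mirrors that logic in Python to show the resulting strings. The function names `runner_label` and `job_name` are hypothetical, and the entries are copied from the matrix rows in this commit.

```python
def runner_label(entry: dict) -> str:
    """Mirror the `runs-on` expression: FLAVOR defaults to 'linux'."""
    flavor = entry.get("FLAVOR") or "linux"
    return (
        f"{flavor}-{entry['ARCH']}-gpu-{entry['GPU']}"
        f"-{entry['DRIVER']}-{entry['GPU_COUNT']}"
    )


def job_name(entry: dict) -> str:
    """Mirror the `name` expression, including the optional FLAVOR suffix."""
    source = "local" if entry["LOCAL_CTK"] == "1" else "wheels"
    count = f"(x{entry['GPU_COUNT']})" if entry["GPU_COUNT"] != "1" else ""
    flavor = f", {entry['FLAVOR']}" if entry.get("FLAVOR") else ""
    return f"py{entry['PY_VER']}, {entry['CUDA_VER']}, {source}, {entry['GPU']}{count}{flavor}"


wsl_entry = {
    "ARCH": "amd64", "PY_VER": "3.11", "CUDA_VER": "12.9.1", "LOCAL_CTK": "0",
    "GPU": "t4", "GPU_COUNT": "1", "DRIVER": "latest", "FLAVOR": "wsl",
}
plain_entry = {
    "ARCH": "amd64", "PY_VER": "3.14t", "CUDA_VER": "13.2.0", "LOCAL_CTK": "1",
    "GPU": "h100", "GPU_COUNT": "2", "DRIVER": "latest",
}

print(runner_label(wsl_entry))    # prints wsl-amd64-gpu-t4-latest-1
print(job_name(wsl_entry))        # prints py3.11, 12.9.1, wheels, t4, wsl
print(runner_label(plain_entry))  # prints linux-amd64-gpu-h100-latest-2
print(job_name(plain_entry))      # prints py3.14t, 13.2.0, local, h100(x2)
```

Because the key is optional, existing matrix rows keep their previous runner labels and job names unchanged.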

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -120,6 +120,7 @@ instance/
 # Sphinx documentation
 docs_src/_build/
 */docs/source/generated/
+*/docs/source/module/generated/
 
 # PyBuilder
 .pybuilder/

ci/test-matrix.yml

Lines changed: 2 additions & 0 deletions
@@ -60,6 +60,8 @@ linux:
     - { ARCH: 'amd64', PY_VER: '3.13', CUDA_VER: '13.2.0', LOCAL_CTK: '1', GPU: 'h100', GPU_COUNT: '1', DRIVER: 'latest' }
     - { ARCH: 'amd64', PY_VER: '3.14', CUDA_VER: '13.2.0', LOCAL_CTK: '1', GPU: 't4', GPU_COUNT: '2', DRIVER: 'latest' }
     - { ARCH: 'amd64', PY_VER: '3.14t', CUDA_VER: '13.2.0', LOCAL_CTK: '1', GPU: 'h100', GPU_COUNT: '2', DRIVER: 'latest' }
+    - { ARCH: 'amd64', PY_VER: '3.11', CUDA_VER: '12.9.1', LOCAL_CTK: '0', GPU: 't4', GPU_COUNT: '1', DRIVER: 'latest', FLAVOR: 'wsl' }
+    - { ARCH: 'amd64', PY_VER: '3.12', CUDA_VER: '13.2.0', LOCAL_CTK: '0', GPU: 'rtx4090', GPU_COUNT: '1', DRIVER: 'latest', FLAVOR: 'wsl' }
   nightly: []
 
 windows:

cuda_bindings/docs/build_docs.sh

Lines changed: 4 additions & 0 deletions
@@ -25,6 +25,10 @@ if [[ -z "${SPHINX_CUDA_BINDINGS_VER}" ]]; then
         | awk -F'+' '{print $1}')
 fi
 
+if [[ "${LATEST_ONLY}" == "1" && -z "${BUILD_PREVIEW:-}" && -z "${BUILD_LATEST:-}" ]]; then
+    export BUILD_LATEST=1
+fi
+
 # build the docs (in parallel)
 SPHINXOPTS="-j 4 -d build/.doctrees" make html
 
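
Review note: this default interacts with the `_github_examples_ref` helper added to `conf.py` in the same commit, which links examples to `main` for preview and latest builds and to a `v{release}` tag otherwise. The following Python sketch traces the combined behavior. The function names `resolve_build_flags` and `github_examples_ref` and the release string `13.2.0` are illustrative assumptions, and the environment is passed as a plain dict rather than read from `os.environ`.

```python
def resolve_build_flags(env: dict) -> dict:
    """Mirror the shell default: LATEST_ONLY=1 implies BUILD_LATEST=1
    unless BUILD_PREVIEW or BUILD_LATEST is already set to a non-empty value."""
    env = dict(env)  # work on a copy, leave the caller's mapping untouched
    if (
        env.get("LATEST_ONLY") == "1"
        and not env.get("BUILD_PREVIEW")
        and not env.get("BUILD_LATEST")
    ):
        env["BUILD_LATEST"] = "1"
    return env


def github_examples_ref(env: dict, release: str) -> str:
    """Mirror the conf.py helper: 'main' for preview/latest builds, else a version tag."""
    if int(env.get("BUILD_PREVIEW", 0)) or int(env.get("BUILD_LATEST", 0)):
        return "main"
    return f"v{release}"


latest_only = resolve_build_flags({"LATEST_ONLY": "1"})
print(github_examples_ref(latest_only, "13.2.0"))  # prints main
print(github_examples_ref({}, "13.2.0"))           # prints v13.2.0
```

One caveat of the `int(...)` conversion, in the sketch as in the helper: a variable that is set but empty would raise `ValueError`, so the flags are expected to be unset or numeric.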

cuda_bindings/docs/source/conf.py

Lines changed: 53 additions & 1 deletion
@@ -9,6 +9,7 @@
 
 # -- Path setup --------------------------------------------------------------
 
+import inspect
 import os
 import sys
 from pathlib import Path
@@ -26,6 +27,15 @@
 release = os.environ["SPHINX_CUDA_BINDINGS_VER"]
 
 
+def _github_examples_ref():
+    if int(os.environ.get("BUILD_PREVIEW", 0)) or int(os.environ.get("BUILD_LATEST", 0)):
+        return "main"
+    return f"v{release}"
+
+
+GITHUB_EXAMPLES_REF = _github_examples_ref()
+
+
 # -- General configuration ---------------------------------------------------
 
 # Add any Sphinx extension module names here, as strings. They can be
@@ -94,11 +104,15 @@
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ["_static"]
+html_static_path = []  # ["_static"] does not exist in our environment
 
 # skip cmdline prompts
 copybutton_exclude = ".linenos, .gp"
 
+rst_epilog = f"""
+.. |cuda_bindings_github_ref| replace:: {GITHUB_EXAMPLES_REF}
+"""
+
 intersphinx_mapping = {
     "python": ("https://docs.python.org/3/", None),
     "numpy": ("https://numpy.org/doc/stable/", None),
@@ -107,7 +121,45 @@
     "cufile": ("https://docs.nvidia.com/gpudirect-storage/api-reference-guide/", None),
 }
 
+
+def _sanitize_generated_docstring(lines):
+    doc_lines = inspect.cleandoc("\n".join(lines)).splitlines()
+    if not doc_lines:
+        return
+
+    if "(" in doc_lines[0] and ")" in doc_lines[0]:
+        doc_lines = doc_lines[1:]
+        while doc_lines and not doc_lines[0].strip():
+            doc_lines.pop(0)
+
+    if not doc_lines:
+        lines[:] = []
+        return
+
+    lines[:] = [".. code-block:: text", ""]
+    lines.extend(f" {line}" if line else " " for line in doc_lines)
+
+
+def autodoc_process_docstring(app, what, name, obj, options, lines):
+    if name.startswith("cuda.bindings."):
+        _sanitize_generated_docstring(lines)
+
+
+def rewrite_source(app, docname, source):
+    text = source[0]
+
+    if docname.startswith("release/"):
+        text = text.replace(".. module:: cuda.bindings\n\n", "", 1)
+
+    source[0] = text
+
+
 suppress_warnings = [
     # for warnings about multiple possible targets, see NVIDIA/cuda-python#152
     "ref.python",
 ]
+
+
+def setup(app):
+    app.connect("autodoc-process-docstring", autodoc_process_docstring)
+    app.connect("source-read", rewrite_source)
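
Review note: `_sanitize_generated_docstring` edits the autodoc `lines` list in place. It drops a leading signature-like line, strips the blank lines after it, and wraps the remainder in a `code-block:: text` directive so the generated text is not parsed as reST. The sketch below reproduces those steps on a made-up docstring to show the effect. The sample docstring is hypothetical, and the three-space indent is an assumption, since the original whitespace inside the f-string is not recoverable from this page.

```python
import inspect


def sanitize_generated_docstring(lines):
    """Same steps as the conf.py helper, applied to `lines` in place."""
    doc_lines = inspect.cleandoc("\n".join(lines)).splitlines()
    if not doc_lines:
        return

    # Drop a leading signature-like line, then any blank lines that follow it.
    if "(" in doc_lines[0] and ")" in doc_lines[0]:
        doc_lines = doc_lines[1:]
        while doc_lines and not doc_lines[0].strip():
            doc_lines.pop(0)

    if not doc_lines:
        lines[:] = []
        return

    # Wrap what remains in a literal block (three-space indent is assumed).
    lines[:] = [".. code-block:: text", ""]
    lines.extend(f"   {line}" if line else "   " for line in doc_lines)


# Hypothetical generated docstring: signature line, blank line, then body.
docstring = [
    "cuInit(unsigned int Flags)",
    "",
    "Initialize the CUDA driver API.",
    "",
    "Flags must be 0.",
]
sanitize_generated_docstring(docstring)
print("\n".join(docstring))

# A docstring that consists only of a signature line is emptied entirely.
signature_only = ["cuInit(unsigned int Flags)"]
sanitize_generated_docstring(signature_only)
```

Slice assignment (`lines[:] = ...`) matters here: Sphinx reads the result from the same list object that it passed to the `autodoc-process-docstring` handler, so rebinding the name would have no effect.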

cuda_bindings/docs/source/contribute.rst

Lines changed: 14 additions & 9 deletions
@@ -4,12 +4,17 @@
 Contributing
 ============
 
-Thank you for your interest in contributing to ``cuda-bindings``! Based on the type of contribution, it will fall into two categories:
-
-1. You want to report a bug, feature request, or documentation issue
-   - File an `issue <https://github.com/NVIDIA/cuda-python/issues/new/choose>`_ describing what you encountered or what you want to see changed.
-   - The NVIDIA team will evaluate the issues and triage them, scheduling
-     them for a release. If you believe the issue needs priority attention
-     comment on the issue to notify the team.
-2. You want to implement a feature, improvement, or bug fix:
-   - At this time we do not accept code contributions.
+Thank you for your interest in contributing to ``cuda-bindings``! Based on the
+type of contribution, it will fall into two categories:
+
+1. You want to report a bug, feature request, or documentation issue.
+
+   File an `issue <https://github.com/NVIDIA/cuda-python/issues/new/choose>`_
+   describing what you encountered or what you want to see changed. The NVIDIA
+   team will evaluate the issue, triage it, and schedule it for a release. If
+   you believe the issue needs priority attention, comment on the issue to
+   notify the team.
+
+2. You want to implement a feature, improvement, or bug fix.
+
+   At this time we do not accept code contributions.
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+.. SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+.. SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE
+
+Examples
+========
+
+This page links to the ``cuda.bindings`` examples shipped in the
+`cuda-python repository <https://github.com/NVIDIA/cuda-python/tree/|cuda_bindings_github_ref|/cuda_bindings/examples>`_.
+Use it as a quick index when you want a runnable sample for a specific API area
+or CUDA feature.
+
+Introduction
+------------
+
+- `clock_nvrtc.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/clock_nvrtc.py>`_
+  uses NVRTC-compiled CUDA code and the device clock to time a reduction
+  kernel.
+- `simple_cubemap_texture.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_cubemap_texture.py>`_
+  demonstrates cubemap texture sampling and transformation.
+- `simple_p2p.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_p2p.py>`_
+  shows peer-to-peer memory access and transfers between multiple GPUs.
+- `simple_zero_copy.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_zero_copy.py>`_
+  uses zero-copy mapped host memory for vector addition.
+- `system_wide_atomics.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/system_wide_atomics.py>`_
+  demonstrates system-wide atomic operations on managed memory.
+- `vector_add_drv.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/vector_add_drv.py>`_
+  uses the CUDA Driver API and unified virtual addressing for vector addition.
+- `vector_add_mmap.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/vector_add_mmap.py>`_
+  uses virtual memory management APIs such as ``cuMemCreate`` and
+  ``cuMemMap`` for vector addition.
+
+Concepts and techniques
+-----------------------
+
+- `stream_ordered_allocation.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/2_Concepts_and_Techniques/stream_ordered_allocation.py>`_
+  demonstrates ``cudaMallocAsync`` and ``cudaFreeAsync`` together with
+  memory-pool release thresholds.
+
+CUDA features
+-------------
+
+- `global_to_shmem_async_copy.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/3_CUDA_Features/global_to_shmem_async_copy.py>`_
+  compares asynchronous global-to-shared-memory copy strategies in matrix
+  multiplication kernels.
+- `simple_cuda_graphs.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/3_CUDA_Features/simple_cuda_graphs.py>`_
+  shows both manual CUDA graph construction and stream-capture-based replay.
+
+Libraries and tools
+-------------------
+
+- `conjugate_gradient_multi_block_cg.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/4_CUDA_Libraries/conjugate_gradient_multi_block_cg.py>`_
+  implements a conjugate-gradient solver with cooperative groups and
+  multi-block synchronization.
+- `nvidia_smi.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/4_CUDA_Libraries/nvidia_smi.py>`_
+  uses NVML to implement a Python subset of ``nvidia-smi``.
+
+Advanced and interoperability
+-----------------------------
+
+- `iso_fd_modelling.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/iso_fd_modelling.py>`_
+  runs isotropic finite-difference wave propagation across multiple GPUs with
+  peer-to-peer halo exchange.
+- `jit_program.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/jit_program.py>`_
+  JIT-compiles a SAXPY kernel with NVRTC and launches it through the Driver
+  API.
+- `numba_emm_plugin.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/numba_emm_plugin.py>`_
+  shows how to back Numba's EMM interface with the NVIDIA CUDA Python Driver
+  API.

cuda_bindings/docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@
    release
    install
    overview
+   examples
    motivation
    environment_variables
    api

cuda_bindings/docs/source/install.rst

Lines changed: 2 additions & 2 deletions
@@ -78,7 +78,7 @@ Installing from Source
 ----------------------
 
 Requirements
-^^^^^^^^^^^^
+~~~~~~~~~~~~
 
 * CUDA Toolkit headers[^1]
 * CUDA Runtime static library[^2]
@@ -100,7 +100,7 @@ See `Environment Variables <environment_variables.rst>`_ for a description of ot
 Only ``cydriver``, ``cyruntime`` and ``cynvrtc`` are impacted by the header requirement.
 
 Editable Install
-^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~
 
 You can use:
 
