Commit efb407d

Fix cuda.core and cuda.bindings example links in docs (#2156)
1 parent e350c5a commit efb407d

7 files changed

Lines changed: 57 additions & 41 deletions

cuda_bindings/docs/source/conf.py

Lines changed: 11 additions & 3 deletions
@@ -43,6 +43,7 @@ def _github_examples_ref():
 extensions = [
     "sphinx.ext.autodoc",
     "sphinx.ext.autosummary",
+    "sphinx.ext.extlinks",
     "sphinx.ext.napoleon",
     "sphinx.ext.intersphinx",
     "myst_nb",
@@ -108,9 +109,16 @@ def _github_examples_ref():
 # skip cmdline prompts
 copybutton_exclude = ".linenos, .gp"

-rst_epilog = f"""
-.. |cuda_bindings_github_ref| replace:: {GITHUB_EXAMPLES_REF}
-"""
+extlinks = {
+    "cuda-bindings-example": (
+        f"https://github.com/NVIDIA/cuda-python/blob/{GITHUB_EXAMPLES_REF}/cuda_bindings/examples/%s",
+        "%s",
+    ),
+    "cuda-bindings-examples": (
+        f"https://github.com/NVIDIA/cuda-python/tree/{GITHUB_EXAMPLES_REF}/cuda_bindings/examples%s",
+        "%s",
+    ),
+}

 intersphinx_mapping = {
     "python": ("https://docs.python.org/3/", None),

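The change above replaces a reST substitution embedded in hyperlink URLs with `sphinx.ext.extlinks` roles. The sketch below imitates, in simplified form, how such a role is resolved: the role target is substituted for `%s` in the base URL, and an explicit title (if given) is used as the link text. It does not import Sphinx; `expand_extlink` is a hypothetical helper written only for illustration, and the value `"main"` for `GITHUB_EXAMPLES_REF` is an assumption standing in for whatever `_github_examples_ref()` returns.

```python
# Assumption: "main" stands in for the ref computed by _github_examples_ref() in conf.py.
GITHUB_EXAMPLES_REF = "main"

# Same shape as the extlinks mapping added in conf.py: role name -> (base URL, caption).
extlinks = {
    "cuda-bindings-example": (
        f"https://github.com/NVIDIA/cuda-python/blob/{GITHUB_EXAMPLES_REF}/cuda_bindings/examples/%s",
        "%s",
    ),
    "cuda-bindings-examples": (
        f"https://github.com/NVIDIA/cuda-python/tree/{GITHUB_EXAMPLES_REF}/cuda_bindings/examples%s",
        "%s",
    ),
}


def expand_extlink(role, target, title=None):
    """Hypothetical helper: return (link text, URL) for a role usage.

    Mirrors the documented extlinks behaviour in simplified form:
    the target fills the %s in the base URL, and the caption template
    is used only when no explicit title is given.
    """
    base_url, caption = extlinks[role]
    url = base_url % target
    text = title if title is not None else caption % target
    return text, url


# :cuda-bindings-example:`clock_nvrtc.py <0_Introduction/clock_nvrtc.py>`
print(expand_extlink("cuda-bindings-example", "0_Introduction/clock_nvrtc.py", "clock_nvrtc.py"))

# :cuda-bindings-examples:`cuda-python repository </>`
print(expand_extlink("cuda-bindings-examples", "/", "cuda-python repository"))
```

Because the ref is interpolated once in `conf.py` with an f-string, the `.rst` files only carry the path relative to the examples directory, and no substitution has to be expanded inside a URL.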
cuda_bindings/docs/source/examples.rst

Lines changed: 15 additions & 15 deletions
@@ -5,61 +5,61 @@ Examples
 ========

 This page links to the ``cuda.bindings`` examples shipped in the
-`cuda-python repository <https://github.com/NVIDIA/cuda-python/tree/|cuda_bindings_github_ref|/cuda_bindings/examples>`_.
+:cuda-bindings-examples:`cuda-python repository </>`.
 Use it as a quick index when you want a runnable sample for a specific API area
 or CUDA feature.

 Introduction
 ------------

-- `clock_nvrtc.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/clock_nvrtc.py>`_
+- :cuda-bindings-example:`clock_nvrtc.py <0_Introduction/clock_nvrtc.py>`
   uses NVRTC-compiled CUDA code and the device clock to time a reduction
   kernel.
-- `simple_cubemap_texture.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_cubemap_texture.py>`_
+- :cuda-bindings-example:`simple_cubemap_texture.py <0_Introduction/simple_cubemap_texture.py>`
   demonstrates cubemap texture sampling and transformation.
-- `simple_p2p.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_p2p.py>`_
+- :cuda-bindings-example:`simple_p2p.py <0_Introduction/simple_p2p.py>`
   shows peer-to-peer memory access and transfers between multiple GPUs.
-- `simple_zero_copy.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_zero_copy.py>`_
+- :cuda-bindings-example:`simple_zero_copy.py <0_Introduction/simple_zero_copy.py>`
   uses zero-copy mapped host memory for vector addition.
-- `system_wide_atomics.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/system_wide_atomics.py>`_
+- :cuda-bindings-example:`system_wide_atomics.py <0_Introduction/system_wide_atomics.py>`
   demonstrates system-wide atomic operations on managed memory.
-- `vector_add_drv.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/vector_add_drv.py>`_
+- :cuda-bindings-example:`vector_add_drv.py <0_Introduction/vector_add_drv.py>`
   uses the CUDA Driver API and unified virtual addressing for vector addition.
-- `vector_add_mmap.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/vector_add_mmap.py>`_
+- :cuda-bindings-example:`vector_add_mmap.py <0_Introduction/vector_add_mmap.py>`
   uses virtual memory management APIs such as ``cuMemCreate`` and
   ``cuMemMap`` for vector addition.

 Concepts and techniques
 -----------------------

-- `stream_ordered_allocation.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/2_Concepts_and_Techniques/stream_ordered_allocation.py>`_
+- :cuda-bindings-example:`stream_ordered_allocation.py <2_Concepts_and_Techniques/stream_ordered_allocation.py>`
   demonstrates ``cudaMallocAsync`` and ``cudaFreeAsync`` together with
   memory-pool release thresholds.

 CUDA features
 -------------

-- `global_to_shmem_async_copy.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/3_CUDA_Features/global_to_shmem_async_copy.py>`_
+- :cuda-bindings-example:`global_to_shmem_async_copy.py <3_CUDA_Features/global_to_shmem_async_copy.py>`
   compares asynchronous global-to-shared-memory copy strategies in matrix
   multiplication kernels.
-- `simple_cuda_graphs.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/3_CUDA_Features/simple_cuda_graphs.py>`_
+- :cuda-bindings-example:`simple_cuda_graphs.py <3_CUDA_Features/simple_cuda_graphs.py>`
   shows both manual CUDA graph construction and stream-capture-based replay.

 Libraries and tools
 -------------------

-- `conjugate_gradient_multi_block_cg.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/4_CUDA_Libraries/conjugate_gradient_multi_block_cg.py>`_
+- :cuda-bindings-example:`conjugate_gradient_multi_block_cg.py <4_CUDA_Libraries/conjugate_gradient_multi_block_cg.py>`
   implements a conjugate-gradient solver with cooperative groups and
   multi-block synchronization.
-- `nvidia_smi.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/4_CUDA_Libraries/nvidia_smi.py>`_
+- :cuda-bindings-example:`nvidia_smi.py <4_CUDA_Libraries/nvidia_smi.py>`
   uses NVML to implement a Python subset of ``nvidia-smi``.

 Advanced and interoperability
 -----------------------------

-- `iso_fd_modelling.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/iso_fd_modelling.py>`_
+- :cuda-bindings-example:`iso_fd_modelling.py <extra/iso_fd_modelling.py>`
   runs isotropic finite-difference wave propagation across multiple GPUs with
   peer-to-peer halo exchange.
-- `jit_program.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/jit_program.py>`_
+- :cuda-bindings-example:`jit_program.py <extra/jit_program.py>`
   JIT-compiles a SAXPY kernel with NVRTC and launches it through the Driver
   API.

cuda_bindings/docs/source/overview.rst

Lines changed: 1 addition & 1 deletion
@@ -522,7 +522,7 @@ CUDA objects
 Certain CUDA kernels use native CUDA types as their parameters such as ``cudaTextureObject_t``. These types require special handling since they're neither a primitive ctype nor a custom user type. Since ``cuda.bindings`` exposes each of them as Python classes, they each implement ``getPtr()`` and ``__int__()``. These two callables used to support the NumPy and ctypes approach. The difference between each call is further described under `Tips and Tricks <https://nvidia.github.io/cuda-python/cuda-bindings/latest/tips_and_tricks.html#>`_.

 For this example, lets use the ``transformKernel`` from
-`simple_cubemap_texture.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_cubemap_texture.py>`_.
+:cuda-bindings-example:`simple_cubemap_texture.py <0_Introduction/simple_cubemap_texture.py>`.
 The :doc:`examples` page links to more samples covering textures, graphs,
 memory mapping, and multi-GPU workflows.

cuda_core/docs/source/conf.py

Lines changed: 11 additions & 3 deletions
@@ -46,6 +46,7 @@ def _github_examples_ref():
     "sphinx.ext.autosummary",
     "sphinx.ext.napoleon",
     "sphinx.ext.intersphinx",
+    "sphinx.ext.extlinks",
     "myst_nb",
     "sphinx_copybutton",
     "sphinx_toolbox.more_autodoc.autoprotocol",
@@ -107,9 +108,16 @@ def _github_examples_ref():
 # skip cmdline prompts
 copybutton_exclude = ".linenos, .gp"

-rst_epilog = f"""
-.. |cuda_core_github_ref| replace:: {GITHUB_EXAMPLES_REF}
-"""
+extlinks = {
+    "cuda-core-example": (
+        f"https://github.com/NVIDIA/cuda-python/blob/{GITHUB_EXAMPLES_REF}/cuda_core/examples/%s",
+        "%s",
+    ),
+    "cuda-core-examples": (
+        f"https://github.com/NVIDIA/cuda-python/tree/{GITHUB_EXAMPLES_REF}/cuda_core/examples%s",
+        "%s",
+    ),
+}

 intersphinx_mapping = {
     "python": ("https://docs.python.org/3/", None),

cuda_core/docs/source/examples.rst

Lines changed: 14 additions & 14 deletions
@@ -5,55 +5,55 @@ Examples
 ========

 This page links to the ``cuda.core`` examples shipped in the
-`cuda-python repository <https://github.com/NVIDIA/cuda-python/tree/|cuda_core_github_ref|/cuda_core/examples>`_.
+:cuda-core-examples:`cuda-python repository </>`.
 Use it as a quick index when you want a runnable starting point for a specific
 workflow.

 Compilation and kernel launch
 -----------------------------

-- `vector_add.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/vector_add.py>`_
+- :cuda-core-example:`vector_add.py`
   compiles and launches a simple vector-add kernel with CuPy arrays.
-- `saxpy.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/saxpy.py>`_
+- :cuda-core-example:`saxpy.py`
   JIT-compiles a templated SAXPY kernel and launches both float and double
   instantiations.
-- `pytorch_example.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/pytorch_example.py>`_
+- :cuda-core-example:`pytorch_example.py`
   launches a CUDA kernel with PyTorch tensors and a wrapped PyTorch stream.

 Multi-device and advanced launch configuration
 ----------------------------------------------

-- `simple_multi_gpu_example.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/simple_multi_gpu_example.py>`_
+- :cuda-core-example:`simple_multi_gpu_example.py`
   compiles and launches kernels across multiple GPUs.
-- `thread_block_cluster.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/thread_block_cluster.py>`_
+- :cuda-core-example:`thread_block_cluster.py`
   demonstrates thread block cluster launch configuration on Hopper-class GPUs.
-- `tma_tensor_map.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/tma_tensor_map.py>`_
+- :cuda-core-example:`tma_tensor_map.py`
   demonstrates Tensor Memory Accelerator descriptors and TMA-based bulk copies.

 Linking and graphs
 ------------------

-- `jit_lto_fractal.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/jit_lto_fractal.py>`_
+- :cuda-core-example:`jit_lto_fractal.py`
   uses JIT link-time optimization to link user-provided device code into a
   fractal workflow at runtime.
-- `cuda_graphs.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/cuda_graphs.py>`_
+- :cuda-core-example:`cuda_graphs.py`
   captures and replays a multi-kernel CUDA graph to reduce launch overhead.

 Interoperability and memory access
 ----------------------------------

-- `memory_ops.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/memory_ops.py>`_
+- :cuda-core-example:`memory_ops.py`
   covers memory resources, pinned memory, device transfers, and DLPack interop.
-- `strided_memory_view_cpu.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/strided_memory_view_cpu.py>`_
+- :cuda-core-example:`strided_memory_view_cpu.py`
   uses ``StridedMemoryView`` with JIT-compiled CPU code via ``cffi``.
-- `strided_memory_view_gpu.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/strided_memory_view_gpu.py>`_
+- :cuda-core-example:`strided_memory_view_gpu.py`
   uses ``StridedMemoryView`` with JIT-compiled GPU code and foreign GPU buffers.
-- `gl_interop_plasma.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/gl_interop_plasma.py>`_
+- :cuda-core-example:`gl_interop_plasma.py`
   renders a CUDA-generated plasma effect through OpenGL interop without CPU
   copies.

 System inspection
 -----------------

-- `show_device_properties.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/show_device_properties.py>`_
+- :cuda-core-example:`show_device_properties.py`
   prints a detailed report of the CUDA devices available on the system.

cuda_core/docs/source/getting-started.rst

Lines changed: 2 additions & 2 deletions
@@ -32,7 +32,7 @@ Example: Compiling and Launching a CUDA kernel
 ----------------------------------------------

 To get a taste for ``cuda.core``, let's walk through a simple example that compiles and launches a vector addition kernel.
-You can find the complete example in `vector_add.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/vector_add.py>`_
+You can find the complete example in :cuda-core-example:`vector_add.py`
 and browse the :doc:`examples page <examples>` for the rest of the shipped
 workflows.

@@ -80,7 +80,7 @@ Note the use of the ``name_expressions`` parameter to the :meth:`Program.compile
 Next, we retrieve the compiled kernel from the CUBIN and prepare the arguments and kernel configuration.
 We're using `CuPy <https://cupy.dev/>`_ arrays as inputs for this example, but
 you can use PyTorch tensors too (see
-`pytorch_example.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/pytorch_example.py>`_
+:cuda-core-example:`pytorch_example.py`
 and the :doc:`examples page <examples>`).

 .. code-block:: python

cuda_core/docs/source/interoperability.rst

Lines changed: 3 additions & 3 deletions
@@ -70,11 +70,11 @@ a few iterations to ensure correctness.
 for extracting the metadata (such as pointer address, shape, strides, and
 dtype) from any Python objects supporting either CAI or DLPack and returning a
 :class:`~utils.StridedMemoryView` object. See the
-`strided_memory_view_constructors.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/strided_memory_view_constructors.py>`_
+:cuda-core-example:`strided_memory_view_constructors.py`
 example for the explicit constructors, or
-`strided_memory_view_cpu.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/strided_memory_view_cpu.py>`_
+:cuda-core-example:`strided_memory_view_cpu.py`
 and
-`strided_memory_view_gpu.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_core_github_ref|/cuda_core/examples/strided_memory_view_gpu.py>`_
+:cuda-core-example:`strided_memory_view_gpu.py`
 for decorator-based workflows. This provides a *concrete implementation* to
 both protocols that is **array-library-agnostic**, so that all Python projects
 can just rely on this without either re-implementing (the consumer-side of)
