This page links to the cuda.core examples shipped in the cuda-python repository.
Use it as a quick index when you want a runnable starting point for a specific workflow.
- vector_add.py compiles and launches a simple vector-add kernel with CuPy arrays.
- saxpy.py JIT-compiles a templated SAXPY kernel and launches both float and double instantiations.
- pytorch_example.py launches a CUDA kernel with PyTorch tensors and a wrapped PyTorch stream.
- simple_multi_gpu_example.py compiles and launches kernels across multiple GPUs.
- thread_block_cluster.py demonstrates thread block cluster launch configuration on Hopper-class GPUs.
- tma_tensor_map.py demonstrates Tensor Memory Accelerator descriptors and TMA-based bulk copies.
- jit_lto_fractal.py uses JIT link-time optimization to link user-provided device code into a fractal workflow at runtime.
- cuda_graphs.py captures and replays a multi-kernel CUDA graph to reduce launch overhead.
- memory_ops.py covers memory resources, pinned memory, device transfers, and DLPack interop.
- strided_memory_view_cpu.py uses StridedMemoryView with JIT-compiled CPU code via cffi.
- strided_memory_view_gpu.py uses StridedMemoryView with JIT-compiled GPU code and foreign GPU buffers.
- gl_interop_plasma.py renders a CUDA-generated plasma effect through OpenGL interop without CPU copies.
- show_device_properties.py prints a detailed report of the CUDA devices available on the system.