Commit 4838baf

feat: render the Mandelbrot set on CPU and GPU
Replace the add/subtract demo with a more instructional example: the Mandelbrot set rendered two ways so users can read both implementations side by side and compare their performance.

- src/mandelbrot_cpu.cpp: plain nested loop over every pixel
- src/mandelbrot.cu: the same logic as a CUDA kernel, one thread per pixel
- src/main.cpp: shared wrapper returning a (height, width) int32 NumPy array, releasing the GIL during compute
- both functions take identical arguments and return identical arrays
- add numpy as a runtime dependency; test GPU output matches the CPU
- ignore vim swap files

Assisted-by: ClaudeCode:claude-opus-4.8
1 parent f9e67a7 commit 4838baf
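
The commit message describes the CPU version as a plain nested loop over every pixel, but `src/mandelbrot_cpu.cpp` is not reproduced on this page. The following is only a minimal pure-Python sketch of that idea, not the project's code. The name `mandelbrot_reference`, the pixel-centre sampling, and the use of nested lists in place of a NumPy array are assumptions made for illustration; the defaults and the argument checks mirror the ones visible in `src/main.cpp`.

```python
def mandelbrot_reference(width=800, height=600, max_iterations=100,
                         xmin=-2.0, xmax=1.0, ymin=-1.25, ymax=1.25):
    """Hypothetical reference: returns `height` rows of `width` escape counts."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if max_iterations <= 0:
        raise ValueError("max_iterations must be positive")

    # Assumption: each pixel samples the centre of its cell in the window.
    dx = (xmax - xmin) / width
    dy = (ymax - ymin) / height

    image = []
    for row in range(height):        # outer loop: one image row at a time
        ci = ymin + (row + 0.5) * dy
        counts = []
        for col in range(width):     # inner loop: one pixel at a time
            cr = xmin + (col + 0.5) * dx
            zr = zi = 0.0
            n = 0
            # Iterate z <- z*z + c until |z| > 2 or the budget runs out.
            while n < max_iterations and zr * zr + zi * zi <= 4.0:
                zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
                n += 1
            counts.append(n)         # max_iterations if the point never escaped
        image.append(counts)
    return image
```

Indexing is `image[row][col]`, matching the `(height, width)` layout the commit describes. A pure-Python loop like this is far slower than either compiled implementation, so it is only useful for checking small images.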

13 files changed

Lines changed: 310 additions & 111 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -143,3 +143,7 @@ cython_debug/
 
 _skbuild/
 .pyodide-xbuildenv/
+
+# Vim swap files
+*.swp
+*.swo

CMakeLists.txt

Lines changed: 2 additions & 2 deletions
@@ -14,8 +14,8 @@ project(
 find_package(pybind11 CONFIG REQUIRED)
 
 # Add a library using FindPython's tooling (pybind11 also provides a helper like
-# this), combining the pybind11 bindings with the CUDA kernels.
-python_add_library(_core MODULE src/main.cpp src/add.cu WITH_SOABI)
+# this), combining the pybind11 bindings with the CPU and CUDA implementations.
+python_add_library(_core MODULE src/main.cpp src/mandelbrot_cpu.cpp src/mandelbrot.cu WITH_SOABI)
 target_link_libraries(_core PRIVATE pybind11::headers)
 
 # This is passing in the version as a define just as an example

README.md

Lines changed: 57 additions & 15 deletions
@@ -11,13 +11,19 @@
 An example project built with [pybind11][], [CUDA][], and
 [scikit-build-core][]. Python 3.9+.
 
-The extension exposes tiny `add` and `subtract` functions that run on the GPU
-with CUDA kernels (`src/add.cu`). Building **requires** the CUDA Toolkit
-(`nvcc`): the CMake project declares CUDA as a required language, so
-configuration fails without it. The CUDA runtime is linked statically, so the
-resulting wheels do not depend on `libcudart` and stay importable on machines
-without a GPU — running `add`/`subtract` there raises, but `cuda_available()`
-lets you check first.
+The extension renders the [Mandelbrot set][mandelbrot] two ways — once on the
+CPU and once on the GPU — so you can read both side by side and compare their
+performance. The two implementations are written the same way on purpose:
+
+* `src/mandelbrot_cpu.cpp` — a plain nested loop over every pixel
+* `src/mandelbrot.cu` — the same logic as a CUDA kernel, one thread per pixel
+
+Both return a `(height, width)` int32 NumPy array of escape counts. Building
+**requires** the CUDA Toolkit (`nvcc`): the CMake project declares CUDA as a
+required language, so configuration fails without it. The CUDA runtime is linked
+statically, so the resulting wheels do not depend on `libcudart` and stay
+importable on machines without a GPU — calling `mandelbrot_gpu` there raises, but
+`cuda_available()` lets you check first.
 
 
 [gitter-badge]: https://badges.gitter.im/pybind/Lobby.svg
@@ -39,9 +45,42 @@ The CUDA Toolkit (`nvcc`) must be installed and discoverable by CMake.
 ```python
 import cuda_example
 
+# (height, width) int32 array of escape counts
+image = cuda_example.mandelbrot_cpu(width=800, height=600, max_iterations=100)
+
 if cuda_example.cuda_available():
-    cuda_example.add(1, 2) # 3, computed on the GPU
-    cuda_example.subtract(1, 2) # -1, computed on the GPU
+    image = cuda_example.mandelbrot_gpu(width=800, height=600, max_iterations=100)
+```
+
+You can view the result with any plotting library, e.g.:
+
+```python
+import matplotlib.pyplot as plt
+
+plt.imshow(image, extent=(-2, 1, -1.25, 1.25), cmap="twilight_shifted")
+plt.show()
+```
+
+## Comparing CPU and GPU
+
+Because both functions take the same arguments and return identical arrays, you
+can run them back to back and time them (on a machine with a GPU):
+
+```python
+import time
+import cuda_example
+
+size = {"width": 2000, "height": 1500, "max_iterations": 200}
+
+start = time.perf_counter()
+cpu = cuda_example.mandelbrot_cpu(**size)
+print(f"CPU: {time.perf_counter() - start:.3f}s")
+
+start = time.perf_counter()
+gpu = cuda_example.mandelbrot_gpu(**size)
+print(f"GPU: {time.perf_counter() - start:.3f}s")
+
+assert (cpu == gpu).all() # identical results, very different runtimes
 ```
 
 ## Building CUDA wheels
@@ -86,16 +125,17 @@ docker run --rm \
 PY=/opt/python/cp312-cp312/bin/python
 cp -r /io /tmp/src && cd /tmp/src
 $PY -m pip install --upgrade pip build pytest
-$PY -m build --wheel --outdir /wheelhouse . # compiles src/add.cu with nvcc
+$PY -m build --wheel --outdir /wheelhouse . # compiles src/mandelbrot.cu with nvcc
 $PY -m pip install /wheelhouse/*.whl
 $PY -m pytest # GPU tests skip (no device)
 '
 ```
 
 The compiled wheel is written to `./wheelhouse/` on the host, so you can inspect
 or install it afterwards. Because the container has no GPU, `cuda_available()`
-returns `False` and the `add`/`subtract` tests are skipped. The same flow runs
-in CI in the `cuda` job of `.github/workflows/pip.yml`.
+returns `False` and the `mandelbrot_gpu` test is skipped (the `mandelbrot_cpu`
+tests still run). The same flow runs in CI in the `cuda` job of
+`.github/workflows/pip.yml`.
 
 ## Files
 
@@ -104,9 +144,10 @@ necessary. The necessary files are:
 
 * `pyproject.toml`: The Python project file
 * `CMakeLists.txt`: The CMake configuration file, which requires the CUDA language
-* `src/main.cpp`: The pybind11 bindings
-* `src/add.cu`: The CUDA kernels (`add`/`subtract`) and runtime device query
-* `src/add.h`: The shared declarations
+* `src/main.cpp`: The pybind11 bindings (turns the results into NumPy arrays)
+* `src/mandelbrot_cpu.cpp`: The CPU implementation
+* `src/mandelbrot.cu`: The CUDA kernel and runtime device query
+* `src/mandelbrot.h`: The shared declarations
 * `src/cuda_example/__init__.py`: The Python portion of the module. The root of the module needs to be `<package_name>`, `src/<package_name>`, or `python/<package_name>` to be auto-discovered.
 
 These files are also expected and highly recommended:
@@ -148,6 +189,7 @@ terms and conditions of this license.
 [cibuildwheel]: https://cibuildwheel.readthedocs.io
 [cibw-cuda]: https://github.com/pypa/cibuildwheel/pull/2896
 [cuda]: https://developer.nvidia.com/cuda-toolkit
+[mandelbrot]: https://en.wikipedia.org/wiki/Mandelbrot_set
 [scientific-python development guide]: https://learn.scientific-python.org/development
 [dependabot]: https://docs.github.com/en/code-security/dependabot
 [github actions]: https://docs.github.com/en/actions

pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -25,10 +25,11 @@ classifiers = [
   "Programming Language :: Python :: 3.14",
   "Private :: Do Not Upload",
 ]
+dependencies = ["numpy"]
 
 
 [dependency-groups]
-test = ["pytest"]
+test = ["pytest", "numpy"]
 dev = [{ include-group = "test" }]
 
 

src/add.cu

Lines changed: 0 additions & 51 deletions
This file was deleted.

src/add.h

Lines changed: 0 additions & 12 deletions
This file was deleted.

src/cuda_example/__init__.py

Lines changed: 4 additions & 4 deletions
@@ -3,15 +3,15 @@
 from ._core import (
     __doc__,
     __version__,
-    add,
     cuda_available,
-    subtract,
+    mandelbrot_cpu,
+    mandelbrot_gpu,
 )
 
 __all__ = [
     "__doc__",
     "__version__",
-    "add",
     "cuda_available",
-    "subtract",
+    "mandelbrot_cpu",
+    "mandelbrot_gpu",
 ]

src/cuda_example/__init__.pyi

Lines changed: 31 additions & 8 deletions
@@ -1,27 +1,50 @@
 """
-Pybind11 + CUDA example plugin
-------------------------------
+Pybind11 + CUDA Mandelbrot example
+----------------------------------
 
 .. currentmodule:: cuda_example
 
 .. autosummary::
     :toctree: _generate
 
-    add
-    subtract
+    mandelbrot_cpu
+    mandelbrot_gpu
     cuda_available
 """
 
 from __future__ import annotations
 
-def add(i: int, j: int) -> int:
+from numpy import int32
+from numpy.typing import NDArray
+
+def mandelbrot_cpu(
+    width: int = ...,
+    height: int = ...,
+    max_iterations: int = ...,
+    xmin: float = ...,
+    xmax: float = ...,
+    ymin: float = ...,
+    ymax: float = ...,
+) -> NDArray[int32]:
     """
-    Add two numbers on the GPU with a CUDA kernel.
+    Render the Mandelbrot set on the CPU.
+
+    Returns a ``(height, width)`` int32 array of escape counts.
     """
 
-def subtract(i: int, j: int) -> int:
+def mandelbrot_gpu(
+    width: int = ...,
+    height: int = ...,
+    max_iterations: int = ...,
+    xmin: float = ...,
+    xmax: float = ...,
+    ymin: float = ...,
+    ymax: float = ...,
+) -> NDArray[int32]:
     """
-    Subtract two numbers on the GPU with a CUDA kernel.
+    Render the Mandelbrot set on the GPU with CUDA.
+
+    Returns a ``(height, width)`` int32 array of escape counts.
     """
 
 def cuda_available() -> bool:

src/main.cpp

Lines changed: 62 additions & 11 deletions
@@ -1,34 +1,85 @@
+#include <pybind11/numpy.h>
 #include <pybind11/pybind11.h>
 
-#include "add.h"
+#include <cstdint>
+#include <stdexcept>
+
+#include "mandelbrot.h"
 
 #define STRINGIFY(x) #x
 #define MACRO_STRINGIFY(x) STRINGIFY(x)
 
 namespace py = pybind11;
 
+namespace {
+
+using Image = py::array_t<std::int32_t, py::array::c_style>;
+
+// Shared wrapper: validate the arguments, allocate the output image, and run one
+// of the compute functions with the GIL released. Returns a (height, width)
+// NumPy array of escape counts.
+Image render(void (*compute)(const MandelbrotParams &, std::int32_t *), int width, int height,
+             int max_iterations, double xmin, double xmax, double ymin, double ymax) {
+    if (width <= 0 || height <= 0) {
+        throw std::invalid_argument("width and height must be positive");
+    }
+    if (max_iterations <= 0) {
+        throw std::invalid_argument("max_iterations must be positive");
+    }
+
+    const MandelbrotParams params{width, height, max_iterations, xmin, xmax, ymin, ymax};
+    Image image({height, width});
+    std::int32_t *data = image.mutable_data();
+
+    {
+        py::gil_scoped_release release;
+        compute(params, data);
+    }
+    return image;
+}
+
+} // namespace
+
 PYBIND11_MODULE(_core, m, py::mod_gil_not_used(), py::multiple_interpreters::per_interpreter_gil()) {
     m.doc() = R"pbdoc(
-        Pybind11 + CUDA example plugin
-        ------------------------------
+        Pybind11 + CUDA Mandelbrot example
+        ----------------------------------
 
         .. currentmodule:: cuda_example
 
         .. autosummary::
            :toctree: _generate
 
-           add
-           subtract
+           mandelbrot_cpu
+           mandelbrot_gpu
            cuda_available
     )pbdoc";
 
-    m.def("add", &add, R"pbdoc(
-        Add two numbers on the GPU with a CUDA kernel.
-    )pbdoc");
+    const char *doc = R"pbdoc(
+        Render the Mandelbrot set.
 
-    m.def("subtract", &subtract, R"pbdoc(
-        Subtract two numbers on the GPU with a CUDA kernel.
-    )pbdoc");
+        Returns a ``(height, width)`` int32 NumPy array; each value is the number
+        of iterations before the point escaped (``max_iterations`` if it never
+        did).
+    )pbdoc";
+
+    m.def("mandelbrot_cpu",
+          [](int width, int height, int max_iterations, double xmin, double xmax, double ymin,
+             double ymax) {
+              return render(&mandelbrot_cpu, width, height, max_iterations, xmin, xmax, ymin, ymax);
+          },
+          py::arg("width") = 800, py::arg("height") = 600, py::arg("max_iterations") = 100,
+          py::arg("xmin") = -2.0, py::arg("xmax") = 1.0, py::arg("ymin") = -1.25,
+          py::arg("ymax") = 1.25, doc);
+
+    m.def("mandelbrot_gpu",
+          [](int width, int height, int max_iterations, double xmin, double xmax, double ymin,
+             double ymax) {
+              return render(&mandelbrot_gpu, width, height, max_iterations, xmin, xmax, ymin, ymax);
+          },
+          py::arg("width") = 800, py::arg("height") = 600, py::arg("max_iterations") = 100,
+          py::arg("xmin") = -2.0, py::arg("xmax") = 1.0, py::arg("ymin") = -1.25,
+          py::arg("ymax") = 1.25, doc);
 
     m.def("cuda_available", &cuda_available, R"pbdoc(
         Return True if a CUDA-capable device is available at runtime.
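
The kernel source (`src/mandelbrot.cu`) does not appear in the part of the diff shown here, so the "one thread per pixel" mapping is not visible. As a rough illustration only, the sketch below simulates a 2D CUDA launch in plain Python and checks that every element of a C-contiguous `(height, width)` buffer is visited exactly once. The helper names `launch_grid` and `flat_index` and the 16x16 block size are assumptions, not taken from the commit; the row-major layout follows from the `py::array::c_style` flag in `src/main.cpp`.

```python
def launch_grid(width, height, block=(16, 16)):
    """Simulate a 2D launch: yield (row, col) for each in-bounds thread."""
    block_x, block_y = block
    # Ceiling division so partial tiles at the right/bottom edges are covered.
    grid_x = (width + block_x - 1) // block_x
    grid_y = (height + block_y - 1) // block_y
    for block_idx_y in range(grid_y):
        for block_idx_x in range(grid_x):
            for thread_idx_y in range(block_y):
                for thread_idx_x in range(block_x):
                    # Same arithmetic a kernel would do with blockIdx/threadIdx.
                    col = block_idx_x * block_x + thread_idx_x
                    row = block_idx_y * block_y + thread_idx_y
                    # Guard: threads hanging over the image edge do nothing.
                    if col < width and row < height:
                        yield row, col


def flat_index(row, col, width):
    """Row-major offset into a C-contiguous (height, width) buffer."""
    return row * width + col


if __name__ == "__main__":
    width, height = 37, 21  # deliberately not multiples of the block size
    offsets = sorted(flat_index(r, c, width) for r, c in launch_grid(width, height))
    assert offsets == list(range(width * height))  # each pixel written once
```

The bounds check matters whenever the image size is not a multiple of the block size: without it, the overhanging threads would write outside the buffer.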
