Commit 6e3fbf5

CharryWu authored and LegNeato committed
docs(guide): improve pipeline diagram, add Windows setup, expand cuDNN coverage table
- Add ASCII pipeline diagram to cuda/pipeline.md showing the full Rust CUDA compilation flow (kernel .rs -> NVVM IR -> PTX -> SASS) with crate roles annotated; fixes missing image referenced in #219
- Add Windows setup section to guide/getting_started.md covering prerequisites (MSVC build tools, CUDA Toolkit, cuDNN), PATH configuration for CUDA 12.x and 13.x, and a common-errors table
- Expand the cuDNN row in features.md from a single 'In-progress' note to a full sub-table listing all 17 legacy API modules with their implementation status (implemented / not yet wrapped / WIP)

Closes #219

Made-with: Cursor
1 parent 85ab9c2 commit 6e3fbf5

File tree

3 files changed: +129 −2 lines

guide/src/cuda/pipeline.md

Lines changed: 40 additions & 0 deletions
@@ -31,3 +31,43 @@ any language. For an assembly format, PTX is fairly user-friendly.
 
 PTX can be run on NVIDIA GPUs using the driver API or runtime API. Those APIs will convert the PTX
 into a final format called SASS which is register allocated and executed on the GPU.
+
+## The Rust CUDA pipeline
+
+The Rust CUDA project replaces NVCC with a custom rustc backend. The pipeline looks like this:
+
+```
++------------------------------------------------------------------------+
+|                           Rust CUDA Pipeline                           |
+|                                                                        |
+|   Host code (.rs)                    GPU kernel code (.rs)             |
+|        |                                       |                       |
+|        |                              rustc_codegen_nvvm               |
+|        |                            (custom rustc backend)             |
+|        |                                       |                       |
+|        |                                 NVVM IR (.bc)                 |
+|        |                                       |                       |
+|        |                                    libNVVM                    |
+|        |                                       |                       |
+|        |                                  PTX (.ptx) <-- embedded via  |
+|        |                                       |         include_str!()|
+|        v                                       v                       |
+|   Host binary ------------ cust ---------> Driver API                  |
+|      (Rust)               (CUDA)               |                       |
+|                                                |                       |
+|                                           JIT compile                  |
+|                                                |                       |
+|                                     SASS (GPU machine code)            |
+|                                                |                       |
+|                                          GPU execution                 |
++------------------------------------------------------------------------+
+```
+
+- **`rustc_codegen_nvvm`** is a custom rustc backend that compiles GPU kernel crates to NVVM IR
+  (LLVM bitcode) instead of the usual host target.
+- **`cuda_std`** provides the GPU-side standard library (thread indexing, shared memory,
+  intrinsics, etc.) used inside kernel crates.
+- **`cuda_builder`** is a build-script helper that drives `rustc_codegen_nvvm` from a host
+  crate's `build.rs`, producing a `.ptx` file that is embedded in the host binary.
+- **`cust`** is the host-side safe wrapper around the CUDA Driver API, used to load modules,
+  allocate GPU memory, launch kernels, and synchronize results.
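The crate roles listed in the diff above are typically wired together from the host crate's `build.rs`. A minimal sketch, assuming a kernel crate at the hypothetical path `../kernels` and the `CudaBuilder` type from `cuda_builder`; exact method names can vary between releases, so treat this as illustrative rather than definitive:

```rust
// build.rs of the host crate -- a sketch, not a drop-in file.
// Requires `cuda_builder` as a [build-dependencies] entry and a
// working CUDA toolchain; it will not build without them.
use cuda_builder::CudaBuilder;

fn main() {
    // Drive rustc_codegen_nvvm over the kernel crate to emit PTX,
    // then copy it where the host crate can embed it, e.g. via
    // include_str!("../resources/kernels.ptx").
    CudaBuilder::new("../kernels")
        .copy_to("../resources/kernels.ptx")
        .build()
        .unwrap();
}
```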

guide/src/features.md

Lines changed: 23 additions & 2 deletions
@@ -1,4 +1,4 @@
-# Supported features
+# Supported features
 
 This page is used for tracking Cargo/Rust and CUDA features that are currently supported
 or planned to be supported in the future. As well as tracking some information about how they could
@@ -50,7 +50,28 @@ around to adding it yet.
 | cuFFT | ❌ |
 | cuSOLVER | ❌ |
 | cuRAND | ❌ | cuRAND only works with the runtime API, we have our own general purpose GPU rand library called `gpu_rand` |
-| cuDNN | ❌ | In-progress |
+| cuDNN | 🟨 | Partially implemented -- see sub-table below |
+
+### cuDNN API coverage
+
+| Module | Status | Notes |
+| --- | --- | --- |
+| Activation (ReLU, sigmoid, tanh, etc.) | ✔️ | Forward and backward |
+| Attention / Multi-Head Attention | ✔️ | Forward, backward data and weights |
+| Convolution | ✔️ | Forward, bias+activation fused, backward data/filter, workspace query, grouped conv |
+| Dropout | ✔️ | Forward and backward, state management |
+| Normalization (Layer / Instance / Group) | ❌ | Not yet wrapped |
+| Batch Normalization | ❌ | Not yet wrapped |
+| Pooling (max, average) | ✔️ | Forward and backward, N-dimensional |
+| Reduction (sum, max, norm, etc.) | ✔️ | With workspace and indices support |
+| RNN (LSTM, GRU, vanilla) | ✔️ | v8 API: forward, backward data, backward weights |
+| Softmax | ✔️ | Forward and backward, accurate and fast modes |
+| Tensor ops (add, scale, set, element-wise) | ✔️ | cudnnOpTensor, cudnnAddTensor, etc. |
+| CTC Loss | ❌ | Not yet wrapped |
+| Spatial Transformer | ❌ | Not yet wrapped |
+| Backend / Graph API | 🟨 | Implemented internally but not yet public; marked WIP |
+| f16 / bf16 data types | ❌ | Not supported at the crate level |
+| cuDNN 9 error codes | 🟨 | Partial -- falls back to todo!() for unknown status codes |
 | cuSPARSE | ❌ |
 | AmgX | ❌ |
 | cuTENSOR | ❌ |

guide/src/guide/getting_started.md

Lines changed: 66 additions & 0 deletions
@@ -345,6 +345,72 @@ A sample `.devcontainer.json` file is also included, configured for Ubuntu 24.04
 
 [`deviceQuery`]: https://github.com/NVIDIA/cuda-samples/tree/ba04faaf7328dbcc87bfc9acaf17f951ee5ddcf3/Samples/deviceQuery
 
+
+## Windows setup
+
+This section covers Windows-specific steps for getting a native (non-Docker) build working.
+
+### Prerequisites
+
+1. **Visual Studio Build Tools** with the "Desktop development with C++" workload.
+   Install from [Visual Studio Downloads](https://visualstudio.microsoft.com/downloads/).
+
+2. **CUDA Toolkit 12.x or 13.x** from [NVIDIA's website](https://developer.nvidia.com/cuda-downloads).
+   The installer will add CUDA to your `PATH` automatically.
+
+3. **Rust nightly toolchain** -- `rustup` will install the pinned version automatically when you
+   run any `cargo` command inside the repo (from `rust-toolchain.toml`).
+
+### PATH configuration
+
+After installing the CUDA Toolkit, verify the following directories are on your `PATH`:
+
+```powershell
+# CUDA 13.x
+$env:PATH += ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin"
+$env:PATH += ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin\x64"
+$env:PATH += ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\nvvm\bin\x64"
+
+# CUDA 12.x -- replace v12.9 with your installed version
+$env:PATH += ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\bin"
+$env:PATH += ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\nvvm\bin"
+```
+
+To make these permanent, add them via **System Properties > Environment Variables**.
+
+### cuDNN (optional)
+
+If you plan to use the `cudnn` crate, install cuDNN from
+[NVIDIA cuDNN](https://developer.nvidia.com/cudnn).
+
+Place the cuDNN `bin`, `include`, and `lib` directories inside your CUDA Toolkit installation
+directory, or set the `CUDNN_PATH` environment variable to the cuDNN root:
+
+```powershell
+$env:CUDNN_PATH = "C:\path\to\cudnn"
+```
+
+The `cudnn-sys` build script searches these locations automatically (including versioned
+subdirectory layouts like `v9.x`).
+
+### Building and running
+
+Once your `PATH` is configured, build and run exactly as on Linux:
+
+```powershell
+cargo build
+cargo run -p vecadd
+```
+
+### Common errors
+
+| Error | Fix |
+| --- | --- |
+| `error: couldn't load codegen backend` | Add the `nvvm\bin\x64` directory to `PATH` (see above) |
+| `cannot open shared object file: libnvvm` | Same fix -- the NVVM DLL must be on `PATH` |
+| `LINK : fatal error LNK1181: cannot open input file 'advapi32.lib'` | Ensure MSVC Build Tools are installed |
+| `cudnn.lib not found` | Set `CUDNN_PATH` or copy cuDNN files into the CUDA Toolkit directory |
+
 ## More examples
 
 The [`examples`] directory has more complex examples. They all follow the same basic structure as
