Commit 6e3fbf5

CharryWu authored and LegNeato committed
docs(guide): improve pipeline diagram, add Windows setup, expand cuDNN coverage table
- Add ASCII pipeline diagram to cuda/pipeline.md showing the full Rust CUDA compilation flow (kernel .rs -> NVVM IR -> PTX -> SASS) with crate roles annotated; fixes missing image referenced in #219
- Add Windows setup section to guide/getting_started.md covering prerequisites (MSVC build tools, CUDA Toolkit, cuDNN), PATH configuration for CUDA 12.x and 13.x, and a common-errors table
- Expand the cuDNN row in features.md from a single 'In-progress' note to a full sub-table listing all 17 legacy API modules with their implementation status (implemented / not yet wrapped / WIP)

Closes #219

Made-with: Cursor
1 parent 85ab9c2 commit 6e3fbf5

File tree

3 files changed: +129 −2 lines

guide/src/cuda/pipeline.md

Lines changed: 40 additions & 0 deletions
@@ -31,3 +31,43 @@ any language. For an assembly format, PTX is fairly user-friendly.
 
 PTX can be run on NVIDIA GPUs using the driver API or runtime API. Those APIs will convert the PTX
 into a final format called SASS which is register allocated and executed on the GPU.
+
+## The Rust CUDA pipeline
+
+The Rust CUDA project replaces NVCC with a custom rustc backend. The pipeline looks like this:
+
+```
++------------------------------------------------------------------------+
+|                           Rust CUDA Pipeline                           |
+|                                                                        |
+|   Host code (.rs)                    GPU kernel code (.rs)             |
+|        |                                       |                       |
+|        |                              rustc_codegen_nvvm               |
+|        |                            (custom rustc backend)             |
+|        |                                       |                       |
+|        |                                 NVVM IR (.bc)                 |
+|        |                                       |                       |
+|        |                                    libNVVM                    |
+|        |                                       |                       |
+|        |                                  PTX (.ptx) <-- embedded via  |
+|        |                                       |         include_str!()|
+|        v                                       v                       |
+|   Host binary ------------ cust ---------> Driver API                  |
+|      (Rust)               (CUDA)               |                       |
+|                                                |                       |
+|                                           JIT compile                  |
+|                                                |                       |
+|                                     SASS (GPU machine code)            |
+|                                                |                       |
+|                                          GPU execution                 |
++------------------------------------------------------------------------+
+```
+
+- **`rustc_codegen_nvvm`** is a custom rustc backend that compiles GPU kernel crates to NVVM IR
+  (LLVM bitcode) instead of the usual host target.
+- **`cuda_std`** provides the GPU-side standard library (thread indexing, shared memory,
+  intrinsics, etc.) used inside kernel crates.
+- **`cuda_builder`** is a build-script helper that drives `rustc_codegen_nvvm` from a host
+  crate's `build.rs`, producing a `.ptx` file that is embedded in the host binary.
+- **`cust`** is the host-side safe wrapper around the CUDA Driver API, used to load modules,
+  allocate GPU memory, launch kernels, and synchronize results.
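The crate roles listed in the diff above are typically wired together from the host crate's `build.rs`. A minimal sketch, assuming a kernel crate at the hypothetical path `../kernels` and the `CudaBuilder` type from `cuda_builder`; exact method names can vary between releases, so treat this as illustrative rather than definitive:

```rust
// build.rs of the host crate -- a sketch, not a drop-in file.
// Requires `cuda_builder` as a [build-dependencies] entry and a
// working CUDA toolchain; it will not build without them.
use cuda_builder::CudaBuilder;

fn main() {
    // Drive rustc_codegen_nvvm over the kernel crate to emit PTX,
    // then copy it where the host crate can embed it, e.g. via
    // include_str!("../resources/kernels.ptx").
    CudaBuilder::new("../kernels")
        .copy_to("../resources/kernels.ptx")
        .build()
        .unwrap();
}
```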

guide/src/features.md

Lines changed: 23 additions & 2 deletions
@@ -1,4 +1,4 @@
-# Supported features
+# Supported features
 
 This page is used for tracking Cargo/Rust and CUDA features that are currently supported
 or planned to be supported in the future. As well as tracking some information about how they could
@@ -50,7 +50,28 @@ around to adding it yet.
 | cuFFT | ❌ |
 | cuSOLVER | ❌ |
 | cuRAND | ❌ | cuRAND only works with the runtime API, we have our own general purpose GPU rand library called `gpu_rand` |
-| cuDNN | ❌ | In-progress |
+| cuDNN | 🟨 | Partially implemented -- see sub-table below |
+
+### cuDNN API coverage
+
+| Module | Status | Notes |
+| --- | --- | --- |
+| Activation (ReLU, sigmoid, tanh, etc.) | ✔️ | Forward and backward |
+| Attention / Multi-Head Attention | ✔️ | Forward, backward data and weights |
+| Convolution | ✔️ | Forward, bias+activation fused, backward data/filter, workspace query, grouped conv |
+| Dropout | ✔️ | Forward and backward, state management |
+| Normalization (Layer / Instance / Group) | ❌ | Not yet wrapped |
+| Batch Normalization | ❌ | Not yet wrapped |
+| Pooling (max, average) | ✔️ | Forward and backward, N-dimensional |
+| Reduction (sum, max, norm, etc.) | ✔️ | With workspace and indices support |
+| RNN (LSTM, GRU, vanilla) | ✔️ | v8 API: forward, backward data, backward weights |
+| Softmax | ✔️ | Forward and backward, accurate and fast modes |
+| Tensor ops (add, scale, set, element-wise) | ✔️ | cudnnOpTensor, cudnnAddTensor, etc. |
+| CTC Loss | ❌ | Not yet wrapped |
+| Spatial Transformer | ❌ | Not yet wrapped |
+| Backend / Graph API | 🟨 | Implemented internally but not yet public; marked WIP |
+| f16 / bf16 data types | ❌ | Not supported at the crate level |
+| cuDNN 9 error codes | 🟨 | Partial -- falls back to todo!() for unknown status codes |
 | cuSPARSE | ❌ |
 | AmgX | ❌ |
 | cuTENSOR | ❌ |

guide/src/guide/getting_started.md

Lines changed: 66 additions & 0 deletions
@@ -345,6 +345,72 @@ A sample `.devcontainer.json` file is also included, configured for Ubuntu 24.04
 
 [`deviceQuery`]: https://github.com/NVIDIA/cuda-samples/tree/ba04faaf7328dbcc87bfc9acaf17f951ee5ddcf3/Samples/deviceQuery
 
+
+## Windows setup
+
+This section covers Windows-specific steps for getting a native (non-Docker) build working.
+
+### Prerequisites
+
+1. **Visual Studio Build Tools** with the "Desktop development with C++" workload.
+   Install from [Visual Studio Downloads](https://visualstudio.microsoft.com/downloads/).
+
+2. **CUDA Toolkit 12.x or 13.x** from [NVIDIA's website](https://developer.nvidia.com/cuda-downloads).
+   The installer will add CUDA to your `PATH` automatically.
+
+3. **Rust nightly toolchain** -- `rustup` will install the pinned version automatically when you
+   run any `cargo` command inside the repo (from `rust-toolchain.toml`).
+
+### PATH configuration
+
+After installing the CUDA Toolkit, verify the following directories are on your `PATH`:
+
+```powershell
+# CUDA 13.x
+$env:PATH += ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin"
+$env:PATH += ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin\x64"
+$env:PATH += ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\nvvm\bin\x64"
+
+# CUDA 12.x -- replace v12.9 with your installed version
+$env:PATH += ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\bin"
+$env:PATH += ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\nvvm\bin"
+```
+
+To make these permanent, add them via **System Properties > Environment Variables**.
+
+### cuDNN (optional)
+
+If you plan to use the `cudnn` crate, install cuDNN from
+[NVIDIA cuDNN](https://developer.nvidia.com/cudnn).
+
+Place the cuDNN `bin`, `include`, and `lib` directories inside your CUDA Toolkit installation
+directory, or set the `CUDNN_PATH` environment variable to the cuDNN root:
+
+```powershell
+$env:CUDNN_PATH = "C:\path\to\cudnn"
+```
+
+The `cudnn-sys` build script searches these locations automatically (including versioned
+subdirectory layouts like `v9.x`).
+
+### Building and running
+
+Once your `PATH` is configured, build and run exactly as on Linux:
+
+```powershell
+cargo build
+cargo run -p vecadd
+```
+
+### Common errors
+
+| Error | Fix |
+| --- | --- |
+| `error: couldn't load codegen backend` | Add the `nvvm\bin\x64` directory to `PATH` (see above) |
+| `cannot open shared object file: libnvvm` | Same fix -- the NVVM DLL must be on `PATH` |
+| `LINK : fatal error LNK1181: cannot open input file 'advapi32.lib'` | Ensure MSVC Build Tools are installed |
+| `cudnn.lib not found` | Set `CUDNN_PATH` or copy cuDNN files into the CUDA Toolkit directory |
+
 ## More examples
 
 The [`examples`] directory has more complex examples. They all follow the same basic structure as
