Merged
33 changes: 33 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,33 @@
---
name: Bug Report
about: Report a bug or unexpected behavior
title: ''
labels: bug
assignees: ''
---

**Describe the bug**
A clear and concise description of what the bug is.

**To reproduce**
Steps to reproduce the behavior:

1. ...
2. ...

**Expected behavior**
A clear and concise description of what you expected to happen.

**Environment**

- OS: [e.g. Windows 11, Ubuntu 22.04]
- GPU: [e.g. RTX 3060]
- CUDA Toolkit version: [e.g. 13.2]
- cuDNN version (if applicable): [e.g. 9.x]
- Rust toolchain: [output of `rustc --version`]

**Error output**
If applicable, paste the full error message or log output.

**Additional context**
Add any other context about the problem here.
19 changes: 19 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,19 @@
---
name: Feature Request
about: Suggest an enhancement or new feature
title: ''
labels: enhancement
assignees: ''
---

**Is your feature request related to a problem?**
A clear and concise description of the problem. E.g. "I'm always frustrated when..."

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context, links to CUDA documentation, or references here.
20 changes: 20 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,20 @@
## Summary

Brief description of what this PR does and why.

Closes #ISSUE_NUMBER

## Changes

- ...
- ...

## Testing

- [ ] `cargo build` passes
- [ ] `cargo clippy --workspace` passes
- [ ] Tested on: [OS, GPU, CUDA version]

## Notes

Any additional context for reviewers.
190 changes: 190 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,190 @@
# Contributing to Rust CUDA

Welcome! We're glad you're interested in contributing to the Rust CUDA project. We welcome
contributions from people of all backgrounds who are interested in making great software with us.

## Getting Help

For questions, clarifications, and general help:

1. Search existing [GitHub issues](https://github.com/Rust-GPU/rust-cuda/issues)
2. If you can't find the answer, open a new issue or start a discussion

## Prerequisites

### Required

- **CUDA Toolkit** (12.x or 13.x recommended). Install from
[NVIDIA's website](https://developer.nvidia.com/cuda-downloads).
- **Rust nightly toolchain** -- the project pins a specific nightly via
[`rust-toolchain.toml`](rust-toolchain.toml). Running any `cargo` command in the repo
will automatically install the correct version if you have `rustup`.
- **LLVM tools** -- installed automatically by `rustup` as part of the pinned toolchain
components.
- A **CUDA-capable GPU** with compute capability >= 3.0.

### Optional

- **cuDNN** -- required only if you're building the `cudnn` / `cudnn-sys` crates. Install
from [NVIDIA cuDNN](https://developer.nvidia.com/cudnn).
- **mdBook** -- required to build the guide locally. Install with
`cargo install mdbook`.

### Windows-Specific Notes

- Ensure the CUDA Toolkit `bin` directory is on your `PATH` (e.g.
`C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin`).
- The MSVC build tools are required. Install via
[Visual Studio Build Tools](https://visualstudio.microsoft.com/downloads/) with the
"Desktop development with C++" workload.
- If using cuDNN, place the cuDNN files in your CUDA Toolkit directory or set
`CUDNN_PATH` to point to the cuDNN installation.
- Some crates require `advapi32` for linking (handled automatically by build scripts).

### Linux-Specific Notes

- Ensure `nvcc` is on your `PATH` and `LD_LIBRARY_PATH` includes the CUDA lib directory.
- The project provides container images for CI; see
`.github/workflows/ci_linux.yml` for reference.
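
As a sketch, the environment setup described above might look like the following in a shell profile. The `CUDA_HOME` prefix is an assumption for a default Linux install; adjust it to your toolkit version and location:

```sh
# Assumed install prefix; adjust to match your CUDA version/location.
export CUDA_HOME=/usr/local/cuda
# Make nvcc discoverable and let the loader find the CUDA libraries.
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"
```

You can verify the setup with `nvcc --version` once the toolkit is installed.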

## Building

Build the entire workspace:

```sh
cargo build
```

Build a specific crate:

```sh
cargo build -p cust
cargo build -p cudnn
```

Run clippy:

```sh
cargo clippy --workspace
```

Run tests (requires a CUDA-capable GPU):

```sh
cargo test --workspace
```

### Building the Guide

The user-facing documentation is an [mdBook](https://rust-lang.github.io/mdBook/) located
in the `guide/` directory.

```sh
# Install mdBook (one-time)
cargo install mdbook

# Build and serve locally
mdbook serve guide --open
```

## Running Examples

Examples live in the `examples/` and `samples/` directories:

```sh
# Vector addition
cargo run -p vecadd

# Matrix multiplication (GEMM)
cargo run -p gemm
```

See [`examples/README.md`](examples/README.md) for the full list.

## Issues

### Feature Requests

If you have ideas for improvements, suggest features by opening a GitHub issue. Include
details about the feature and describe any use cases it would enable.

### Bug Reports

When reporting a bug, make sure your issue describes:

- Steps to reproduce the behavior
- Your platform (OS, GPU, CUDA version, Rust toolchain version)
- Any error messages or logs

### Wontfix

Issues may be closed as `wontfix` if they are misaligned with the project vision or out of
scope. We will comment on the issue with detailed reasoning.

## Contribution Workflow

### Finding Work

Start by looking at open issues tagged as
[`help wanted`](https://github.com/Rust-GPU/rust-cuda/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22)
or
[`good first issue`](https://github.com/Rust-GPU/rust-cuda/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).

Comment on the issue to let others know you're working on it.

### Pull Request Process

1. **Fork** the repository.
2. **Create a new feature branch** from `main`.
3. **Make your changes.** Ensure there are no build errors by running `cargo build` and
`cargo clippy --workspace` locally.
4. **Open a pull request** with a clear title and description of what you did.
5. A maintainer will review your pull request and may ask you to make changes.

### Commit Messages

This project follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/)
specification. Each commit message should have the format:

```
<type>(<scope>): <description>
```

**Types:** `feat`, `fix`, `docs`, `chore`, `ci`, `test`, `refactor`, `perf`, `style`

**Scopes** (common examples): `cust`, `cudnn`, `cudnn-sys`, `cust_raw`, `cuda_std`,
`nvvm`, `vecadd`, `guide`, `windows`

**Examples:**

```
feat(cudnn): add batch normalization forward/backward
fix(cust_raw): correct Windows CUDA path discovery
docs(guide): add Windows getting-started section
ci(windows): include vecadd in workspace build
```

## Project Structure

| Directory | Description |
| --- | --- |
| `crates/cust` | High-level safe wrapper around the CUDA Driver API |
| `crates/cust_core` | Core `DeviceCopy` trait shared between host and device |
| `crates/cust_raw` | Low-level `bindgen` bindings to CUDA SDK |
| `crates/cudnn` | Type-safe cuDNN wrapper |
| `crates/cudnn-sys` | Low-level `bindgen` bindings to cuDNN |
| `crates/cuda_std` | GPU-side standard library |
| `crates/cuda_std_macros` | Proc macros (`#[kernel]`, `#[gpu_only]`, etc.) |
| `crates/cuda_builder` | Build-time helper for compiling GPU kernels |
| `crates/rustc_codegen_nvvm` | Custom rustc backend targeting NVVM/PTX |
| `crates/nvvm` | Wrapper around NVIDIA's libNVVM |
| `crates/blastoff` | cuBLAS bindings |
| `examples/` | Example programs |
| `samples/` | Ports of NVIDIA CUDA samples |
| `guide/` | mdBook source for the Rust CUDA Guide |

## Licensing

This project is dual-licensed under Apache-2.0 or MIT, at your option. Unless you
explicitly state otherwise, any contribution intentionally submitted for inclusion in the
work shall be dual-licensed as above, without any additional terms or conditions.
3 changes: 3 additions & 0 deletions crates/cuda_builder/src/lib.rs
@@ -809,6 +809,9 @@ fn invoke_rustc(builder: &CudaBuilder) -> Result<PathBuf, CudaBuilderError> {

let cargo_encoded_rustflags = join_checking_for_separators(rustflags, "\x1f");

// HACK(fee1-dead): didn't seem like there was a better way to disable f16/f128; the `target_config` did not work for some reason.
cargo.env("CARGO_FEATURE_NO_F16_F128", "1");

let build = cargo
.stderr(Stdio::inherit())
.current_dir(&builder.path_to_crate)
3 changes: 1 addition & 2 deletions crates/cudnn/src/backend/graph.rs
@@ -1,4 +1,4 @@
use crate::{
CudnnContext, CudnnError,
backend::{Descriptor, Operation},
};
@@ -39,7 +39,6 @@ impl GraphBuilder {
let descriptors = operations
.iter()
.map(|op| match op {
Operation::ConvBwdData { raw, .. } => raw.inner(),
Operation::ConvBwdFilter { raw, .. } => raw.inner(),
Operation::ConvFwd { raw, .. } => raw.inner(),
40 changes: 33 additions & 7 deletions crates/rustc_codegen_nvvm/src/builder.rs
@@ -526,8 +526,20 @@ impl<'ll, 'tcx, 'a> BuilderMethods<'a, 'tcx> for Builder<'a, 'll, 'tcx> {
order: AtomicOrdering,
_size: Size,
) -> &'ll Value {
// Since for any A, A | 0 = A, and performing atomics on constant memory is UB in Rust, we
// can abuse bitwise-or to perform atomic reads.
//
// njn: is `ty` the type of the loaded value, or the type of the
// pointer to the loaded-from address? i.e. `T` or `*const T`? I'm
// assuming `T`
let ret_ptr = unsafe { llvm::LLVMRustGetTypeKind(ty) == llvm::TypeKind::Pointer };
self.atomic_rmw(
AtomicRmwBinOp::AtomicOr,
ptr,
self.const_int(ty, 0),
order,
ret_ptr,
)
}
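
The identity the comment above relies on can be checked at the Rust level. This is a standalone illustration, not part of the codegen itself; the helper name `atomic_load_via_or` is mine. It shows that a fetch-or with zero returns the current value while leaving the atomic unchanged, which is why the backend can lower an atomic load to an `AtomicOr` RMW:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Emulate an atomic load via fetch_or(0): since A | 0 == A, the RMW
// returns the current value and writes it back unchanged.
fn atomic_load_via_or(a: &AtomicU32) -> u32 {
    a.fetch_or(0, Ordering::SeqCst)
}

fn main() {
    let a = AtomicU32::new(42);
    assert_eq!(atomic_load_via_or(&a), 42);
    // The stored value is untouched by the "load".
    assert_eq!(a.load(Ordering::SeqCst), 42);
}
```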

fn load_operand(&mut self, place: PlaceRef<'tcx, &'ll Value>) -> OperandRef<'tcx, &'ll Value> {
@@ -760,7 +772,9 @@ impl<'ll, 'tcx, 'a> BuilderMethods<'a, 'tcx> for Builder<'a, 'll, 'tcx> {
_size: Size,
) {
// We can exchange *ptr with val, and then discard the result.
let ret_ptr =
unsafe { llvm::LLVMRustGetTypeKind(llvm::LLVMTypeOf(val)) == llvm::TypeKind::Pointer };
self.atomic_rmw(AtomicRmwBinOp::AtomicXchg, ptr, val, order, ret_ptr);
}
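
The exchange-and-discard trick can likewise be illustrated in plain Rust (again a standalone sketch; `atomic_store_via_swap` is a name I chose for illustration): swapping the new value in and ignoring the old value has the same observable effect as an atomic store.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Emulate an atomic store via swap: exchange the new value in and
// discard the previous value returned by the RMW.
fn atomic_store_via_swap(a: &AtomicU32, val: u32) {
    let _old = a.swap(val, Ordering::SeqCst);
}

fn main() {
    let a = AtomicU32::new(1);
    atomic_store_via_swap(&a, 7);
    assert_eq!(a.load(Ordering::SeqCst), 7);
}
```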

fn gep(&mut self, ty: &'ll Type, ptr: &'ll Value, indices: &[&'ll Value]) -> &'ll Value {
@@ -1217,17 +1231,19 @@ impl<'ll, 'tcx, 'a> BuilderMethods<'a, 'tcx> for Builder<'a, 'll, 'tcx> {
let success = self.extract_value(res, 1);
(val, success)
}

fn atomic_rmw(
&mut self,
op: AtomicRmwBinOp,
dst: &'ll Value,
src: &'ll Value,
order: AtomicOrdering,
ret_ptr: bool,
) -> &'ll Value {
if matches!(op, AtomicRmwBinOp::AtomicNand) {
self.fatal("Atomic NAND not supported yet!")
}
let mut res = self.atomic_op(
dst,
|builder, dst| {
// We are in a supported address space - just use ordinary atomics
@@ -1243,8 +1259,8 @@ impl<'ll, 'tcx, 'a> BuilderMethods<'a, 'tcx> for Builder<'a, 'll, 'tcx> {
}
},
|builder, dst| {
// Local space is only accessible to the current thread. So, there are no
// synchronization issues, and we can emulate it using a simple load/compare/store.
let load: &'ll Value =
unsafe { llvm::LLVMBuildLoad(builder.llbuilder, dst, UNNAMED) };
let next_val = match op {
@@ -1278,7 +1294,17 @@ impl<'ll, 'tcx, 'a> BuilderMethods<'a, 'tcx> for Builder<'a, 'll, 'tcx> {
unsafe { llvm::LLVMBuildStore(builder.llbuilder, next_val, dst) };
load
},
);

// njn:
// - copied from rustc_codegen_llvm
// - but Fractal said: Here, if ret_ptr is true, we should cast dst to *usize, src to
// usize, and then cast the return value back to a *T(by checking the original type of
// src).
if ret_ptr && self.val_ty(res) != self.type_ptr() {
res = self.inttoptr(res, self.type_ptr());
}
res
}

fn atomic_fence(
4 changes: 0 additions & 4 deletions crates/rustc_codegen_nvvm/src/context.rs
@@ -816,10 +816,6 @@ impl<'tcx> FnAbiOfHelpers<'tcx> for CodegenCx<'_, 'tcx> {
}

impl<'tcx> CoverageInfoBuilderMethods<'tcx> for CodegenCx<'_, 'tcx> {
fn init_coverage(&mut self, _instance: Instance<'tcx>) {
todo!()
}

fn add_coverage(
&mut self,
_instance: Instance<'tcx>,
3 changes: 2 additions & 1 deletion crates/rustc_codegen_nvvm/src/intrinsic.rs
@@ -612,7 +612,8 @@ impl<'ll, 'tcx> IntrinsicCallBuilderMethods<'tcx> for Builder<'_, 'll, 'tcx> {
// This piece of code was adapted from `rustc_codegen_cranelift`.
let intrinsic = self.tcx.intrinsic(instance.def_id()).unwrap();
if intrinsic.must_be_overridden {
span_bug!(
span,
"intrinsic {} must be overridden by codegen_nvvm, but isn't",
intrinsic.name,
);