moe-gpu-dsp

Zero-copy GPU signal processing framework for Rust. Batch cuFFT, CUDA kernel dispatch, and full STFT/IFFT pipelines that stay entirely on GPU memory.

Built on cudarc 0.19.

What it does

Upload a signal once. Run all processing on GPU. Download the result once. No CPU round-trips between stages.

Signal -> GPU upload -> Hann window -> Batch FFT R2C
  -> Magnitude -> Processing kernels -> Soft mask
  -> Batch FFT C2R -> Overlap-add -> GPU download -> Output

Kernels included

Kernel	Purpose
`window_frames`	Hann windowing with hop-based frame extraction
`magnitude`	Complex to float magnitude with transpose
`median_filter`	2D median filter (horizontal or vertical)
`soft_mask`	a^2 / (a^2 + b^2 + eps) applied to complex data
`overlap_add`	ISTFT reconstruction via atomicAdd

All kernels are compiled once at init via NVRTC (typically < 0.1s).

Usage

[dependencies]
moe-gpu-dsp = { version = "0.1", features = ["cuda"] }

use moe_gpu_dsp::{GpuDsp, DspConfig};

// Init GPU + compile kernels
let dsp = GpuDsp::new(DspConfig::default()).unwrap();

// Upload + window
let (_sig, windowed, n_frames) = dsp.window_frames(&audio_signal);

// Batch FFT (all frames at once)
let complex = dsp.batch_fft_r2c(&windowed, n_frames);

// Magnitude (transposed for frequency-axis processing)
let mag = dsp.magnitude(&complex, n_frames);
let complex_floats = GpuDsp::complex_as_floats(&complex, n_frames * dsp.n_bins());

// Median filter (horizontal = time axis, vertical = frequency axis)
let h_filtered = dsp.median_filter(&mag, dsp.n_bins(), n_frames, 17, true);
let v_filtered = dsp.median_filter(&mag, dsp.n_bins(), n_frames, 17, false);

// Soft mask + IFFT + overlap-add
let mut masked = dsp.soft_mask(&h_filtered, &v_filtered, &complex_floats,
                                dsp.n_bins(), n_frames);
let output = dsp.batch_ifft_c2r_ola(&mut masked, n_frames, audio_signal.len());

Setup

1. Install CUDA toolkit

Linux / WSL:

# Ubuntu/Debian
sudo apt-get install -y cuda-toolkit-13-1

# Verify
nvcc --version

Windows: Download from NVIDIA CUDA Toolkit and install.

2. Set environment variables

The build needs to find your CUDA installation:

export PATH=/usr/local/cuda/bin:$PATH
export CUDA_PATH=/usr/local/cuda

If your CUDA toolkit version differs from your driver version (common on WSL), pin it:

# Check driver version: nvidia-smi
# Set toolkit version to match driver (e.g. driver 13.1 = 13010)
export CUDARC_CUDA_VERSION=13010

3. Add to your project

[dependencies]
moe-gpu-dsp = { version = "0.1", features = ["cuda"] }

4. Build

cargo build --features cuda

5. Run

At runtime, the CUDA libraries must be on LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
cargo run --features cuda

Configure GPU architecture

Default target is sm_86 (RTX 3070/3080/3090). Change it in DspConfig:

let config = DspConfig {
    gpu_arch: "sm_89",  // RTX 4090
    ..Default::default()
};

Common architectures:

GPU	Architecture
RTX 2070/2080	`sm_75`
RTX 3070/3080/3090	`sm_86`
RTX 4070/4080/4090	`sm_89`
A100	`sm_80`
H100	`sm_90`

WSL-specific notes

CUDA toolkit goes in WSL. Do NOT install Linux GPU drivers in WSL (the Windows driver bridges automatically).
If you get CUDA_ERROR_UNSUPPORTED_PTX_VERSION, your toolkit version is newer than your driver. Install an older toolkit or update the Windows NVIDIA driver.
cuFFT libraries are in /usr/local/cuda/lib64/. Make sure LD_LIBRARY_PATH includes this at runtime.

Graceful fallback

Without the cuda feature, GpuDsp::new() returns None. You can fall back to CPU:

let dsp = GpuDsp::new(DspConfig::default());
if dsp.is_none() {
    // CPU fallback
}

Performance

On RTX 3070, processing a 3-minute audio signal (7.4M samples, 14,562 STFT frames):

NVRTC kernel compilation: 0.03s (cached after first run)
Full pipeline (window + FFT + magnitude + 2x median filter + soft mask + IFFT + overlap-add): 99ms

License

MIT OR Apache-2.0

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.github/workflows		.github/workflows
src		src
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Cargo.toml		Cargo.toml
LICENSE-APACHE		LICENSE-APACHE
LICENSE-MIT		LICENSE-MIT
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

moe-gpu-dsp

What it does

Kernels included

Usage

Setup

1. Install CUDA toolkit

2. Set environment variables

3. Add to your project

4. Build

5. Run

Configure GPU architecture

WSL-specific notes

Graceful fallback

Performance

License

About

Licenses found

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

moe-gpu-dsp

What it does

Kernels included

Usage

Setup

1. Install CUDA toolkit

2. Set environment variables

3. Add to your project

4. Build

5. Run

Configure GPU architecture

WSL-specific notes

Graceful fallback

Performance

License

About

Topics

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages