Add CUDA extension with cuDSS multi-GPU solver; document known bug
- Add LinearAlgebraMPICUDAExt.jl with cu()/cpu() conversions and
CuDSSFactorizationMPI for distributed sparse direct solves via NCCL
- Add codecov.yml to exclude GPU extensions from coverage (no CI GPUs)
- Document cuDSS MGMN bug (status=5 on narrow-bandwidth matrices)
with workaround in docs/src/guide.md
- Update CLAUDE.md with CUDA extension documentation
- Various source improvements for GPU support
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

CLAUDE.md (49 additions, 8 deletions)

@@ -29,15 +29,26 @@ mpiexec -n 4 julia --project=. test/test_factorization.jl
 julia --project=. -e 'using Pkg; Pkg.precompile()'
 ```
 
+## MPI Configuration
+
+By default, MPI.jl uses MPItrampoline_jll. On some Linux clusters, this causes MUMPS to hang during the solve phase. If you experience hangs with multi-rank MUMPS tests, switch to MPICH_jll:
+
+```julia
+using MPIPreferences
+MPIPreferences.use_jll_binary("MPICH_jll")
+```
+
+This creates/updates `LocalPreferences.toml` (which is gitignored). Restart Julia after changing MPI preferences.
+
 ## GPU Support
 
-GPU acceleration is supported via Metal.jl (macOS) as a package extension.
+GPU acceleration is supported via Metal.jl (macOS) or CUDA.jl (Linux/Windows) as package extensions.
 
 ### Type Parameters
 
-- `VectorMPI{T,AV}` where `AV` is `Vector{T}` (CPU) or `MtlVector{T}` (GPU)
-- `MatrixMPI{T,AM}` where `AM` is `Matrix{T}` (CPU) or `MtlMatrix{T}` (GPU)
-- `SparseMatrixMPI{T,Ti,AV}` where `AV` is `Vector{T}` (CPU) or `MtlVector{T}` (GPU) for the `nzval` array
+- `VectorMPI{T,AV}` where `AV` is `Vector{T}` (CPU), `MtlVector{T}` (Metal), or `CuVector{T}` (CUDA)
+- `MatrixMPI{T,AM}` where `AM` is `Matrix{T}` (CPU), `MtlMatrix{T}` (Metal), or `CuMatrix{T}` (CUDA)
+- `SparseMatrixMPI{T,Ti,AV}` where `AV` is `Vector{T}` (CPU), `MtlVector{T}`, or `CuVector{T}` for the `nzval` array
 - Type aliases: `VectorMPI_CPU{T}`, `MatrixMPI_CPU{T}`, `SparseMatrixMPI_CPU{T,Ti}` for CPU-backed types
 
 ### Creating Zero Arrays
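A short sketch of these type parameters in use, assuming a `zeros` constructor for `VectorMPI_CPU` analogous to the ones shown in the next hunk and using the `cu()`/`cpu()` conversions provided by the extensions; the exact signatures are assumptions, not confirmed by this diff:

```julia
using MPI, CUDA, LinearAlgebraMPI   # load CUDA before LinearAlgebraMPI so the extension activates
MPI.Init()

v = zeros(VectorMPI_CPU{Float64}, 100)   # VectorMPI{Float64, Vector{Float64}} (assumed constructor)
v_gpu = cu(v)                            # presumably VectorMPI{Float64, CuVector{Float64}}
v_cpu = cpu(v_gpu)                       # back to a CPU-backed vector
```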
@@ -55,15 +66,20 @@ A = zeros(MatrixMPI_CPU{Float64}, 50, 30)
 S = zeros(SparseMatrixMPI{Float64,Int,Vector{Float64}}, 100, 100)
 S = zeros(SparseMatrixMPI_CPU{Float64,Int}, 100, 100)
 
-# GPU zero arrays (requires Metal.jl loaded)
+# GPU zero arrays (requires Metal.jl or CUDA.jl loaded)
...
-MPI communication always uses CPU buffers (no Metal-aware MPI exists). GPU data is staged through CPU:
+MPI communication always uses CPU buffers (no GPU-aware MPI). GPU data is staged through CPU:
 
 1. GPU vector data copied to CPU staging buffer
 2. MPI communication on CPU buffers
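A hedged guess at the GPU-backed zero constructors, by analogy with the CPU calls above and the type parameters listed in the previous hunk; the exact GPU `zeros` signature is an assumption:

```julia
using MPI, CUDA, LinearAlgebra, LinearAlgebraMPI
MPI.Init()

# Hypothetical GPU-backed zero arrays, mirroring the CPU constructors above:
x = zeros(VectorMPI{Float64, CuVector{Float64}}, 100)
A = zeros(MatrixMPI{Float64, CuMatrix{Float64}}, 50, 30)

# Norms and reductions are listed as supported vector operations; any MPI traffic
# they trigger is staged through CPU buffers as described in steps 1-2 above.
nx = norm(x)
```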
@@ -84,7 +100,32 @@ Sparse matrices remain on CPU (Julia's `SparseMatrixCSC` doesn't support GPU arr
 ### Extension Files
 
 - `ext/LinearAlgebraMPIMetalExt.jl` - Metal extension with `mtl()` and `cpu()` functions
-- Loaded automatically when `using Metal` before `using LinearAlgebraMPI`
+- `ext/LinearAlgebraMPICUDAExt.jl` - CUDA extension with `cu()` and `cpu()` functions, plus cuDSS multi-GPU solver
+- Loaded automatically when `using Metal` or `using CUDA` before `using LinearAlgebraMPI`
+
+### CUDA-Specific: cuDSS Multi-GPU Solver
+
+The CUDA extension includes `CuDSSFactorizationMPI` for distributed sparse direct solves using NVIDIA's cuDSS library with NCCL inter-GPU communication:
...
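A minimal sketch of how this solver might be driven, assuming a `CuDSSFactorizationMPI(A)` constructor and the `solve(F, b)` interface used elsewhere in the package; neither signature is confirmed by this excerpt:

```julia
using MPI, CUDA, LinearAlgebraMPI
MPI.Init()

# Assume A is a distributed sparse matrix (SparseMatrixMPI) and b a matching VectorMPI,
# with one CUDA device per MPI rank; both names are placeholders.
F = CuDSSFactorizationMPI(A)   # hypothetical constructor: cuDSS factorization, ranks linked via NCCL
x = solve(F, b)                # solve A * x = b with the distributed factorization
```

The commit also documents a known cuDSS MGMN bug (status=5 on narrow-bandwidth matrices); the workaround is described in docs/src/guide.md.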
 - **Vector operations**: norms, reductions, arithmetic with automatic partition alignment
 - Support for both `Float64` and `ComplexF64` element types
-- **GPU acceleration** via Metal.jl (macOS) with automatic CPU staging for MPI
+- **GPU acceleration** via Metal.jl (macOS) or CUDA.jl (Linux/Windows) with automatic CPU staging for MPI
+- **Multi-GPU sparse direct solver** via cuDSS with NCCL communication (CUDA only)
 
 ## Installation
 
@@ -66,11 +67,11 @@ F = ldlt(A_sym_dist) # LDLT factorization
 x_sol = solve(F, y)  # Solve A_sym * x_sol = y
 ```
 
-## GPU Support (Metal)
+## GPU Support
 
-LinearAlgebraMPI supports GPU acceleration on macOS via Metal.jl. GPU support is optional - Metal.jl is loaded as a weak dependency.
+LinearAlgebraMPI supports GPU acceleration via Metal.jl (macOS) or CUDA.jl (Linux/Windows). GPU support is optional - extensions are loaded as weak dependencies.
 
-### Converting between CPU and GPU
+### Metal (macOS)
 
 ```julia
 using Metal  # Load Metal BEFORE MPI for GPU detection