Update support for sparse linear algebra by luraess · Pull Request #937 · JuliaGPU/AMDGPU.jl

luraess · 2026-06-25T14:30:55Z

Sparse LinearAlgebra interfaces

Follows up on #859 and supersedes #861.

Implements a first round of LinearAlgebra interfaces (+/-, UniformScaling, Diagonal, triu/tril, kron) for GPU sparse matrices, hooking into the GPUArrays generic machinery where possible and falling back to format-aware implementations where it is not.

The implementation was robot-helped by Claude under my steering.

What is added

src/sparse/interfaces.jl

* _sptranspose / _spadjoint for CSR, CSC, COO: required by GPUArrays to materialise transpose / adjoint wrappers.
* +/- for all combinations of plain/transposed/adjoint CSR and CSC via geam; COO routes through CSR and converts back. Cross-format pairs (CSR+CSC, CSR+BSR and their reverses) normalise both operands to CSR before calling geam.
* +/-/* with UniformScaling for all three formats and their transposed/adjointed wrappers. The identity is materialised as a same-format sparse matrix (_sparse_identity); a TODO notes a potentially more efficient broadcast-singleton approach.
* +/-/* with Diagonal for all three formats and wrappers. Addition converts the Diagonal to the same sparse format and delegates to the existing geam path. Multiplication scales the nonzero values directly via the COO index arrays (gathering d[colInd] for A*D and d[rowInd] for D*A); for CSR and CSC this involves a round-trip through COO.
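To illustrate the Diagonal multiplication described above, here is a minimal CPU-side sketch using the SparseArrays standard library as a stand-in for the GPU types. The variable names (rowInd, colInd, nzVal) mirror the COO fields mentioned in the description; the rest is an assumption made for illustration and is not the code in this PR.

```julia
using LinearAlgebra, SparseArrays

# CPU stand-ins for the COO arrays (rowInd, colInd, nzVal) of a sparse matrix.
A = sparse([1, 2, 3, 3], [1, 3, 2, 3], [1.0, 2.0, 3.0, 4.0], 3, 3)
rowInd, colInd, nzVal = findnz(A)

d = [10.0, 20.0, 30.0]
D = Diagonal(d)

# A * D scales column j by d[j]: gather d at the column index of each nonzero.
AD = sparse(rowInd, colInd, nzVal .* d[colInd], 3, 3)

# D * A scales row i by d[i]: gather d at the row index of each nonzero.
DA = sparse(rowInd, colInd, d[rowInd] .* nzVal, 3, 3)

@assert AD == A * D
@assert DA == D * A
```

The sparsity pattern is untouched in both cases; only the value array changes, which is why the operation reduces to a gather plus an elementwise product once the explicit row and column indices are available.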

src/sparse/linalg.jl

* triu / tril for COO by masking; CSR/CSC fall through to the GPUArrays generic path, which dispatches to coo_type.
* kron for COO×COO, COO×Diagonal, Diagonal×COO via GPU-side repeat/broadcast on the index and value arrays.
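The masking and repeat/broadcast ideas can be sketched on the CPU with SparseArrays. The helper names triu_coo and kron_coo are hypothetical and exist only for this illustration; the PR's GPU implementation may be organised differently.

```julia
using LinearAlgebra, SparseArrays

# Upper triangle by masking: keep entries on or above the k-th diagonal.
function triu_coo(A::SparseMatrixCSC, k::Integer = 0)
    i, j, v = findnz(A)
    keep = (j .- i) .>= k
    return sparse(i[keep], j[keep], v[keep], size(A)...)
end

# Kronecker product built purely from the COO index and value arrays.
function kron_coo(A::SparseMatrixCSC, B::SparseMatrixCSC)
    ia, ja, va = findnz(A)
    ib, jb, vb = findnz(B)
    p, q = size(B)
    nA, nB = length(va), length(vb)
    # Pair every nonzero of A with every nonzero of B:
    # A's entries are repeated element-wise (inner), B's are tiled (outer).
    I = (repeat(ia, inner = nB) .- 1) .* p .+ repeat(ib, outer = nA)
    J = (repeat(ja, inner = nB) .- 1) .* q .+ repeat(jb, outer = nA)
    V = repeat(va, inner = nB) .* repeat(vb, outer = nA)
    return sparse(I, J, V, size(A, 1) * p, size(A, 2) * q)
end

A = sparse([1, 2, 2, 3], [1, 1, 3, 2], [1.0, 2.0, 3.0, 4.0], 3, 3)
B = sparse([1, 2], [2, 1], [5.0, 6.0], 2, 2)

@assert triu_coo(A) == triu(A)
@assert kron_coo(A, B) == kron(A, B)
```

Each nonzero pair maps to a unique output position, so no duplicate indices need to be combined, and every step is a repeat or a broadcast over flat arrays.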

src/sparse/conversions.jl

* ROCSparseMatrixCOO(::Diagonal) constructor chain.
* Typed forwarding constructors ROCSparseMatrixCSR{Tv,Ti}(coo) / ROCSparseMatrixCSC{Tv,Ti}(coo) needed by GPUArrays generics.

src/sparse/array.jl

* ROCSparseMatrix(transpose/adjoint(other_sparse)) constructors for GPU-to-GPU round-trips.

Potential follow-ups

* Diagonal multiplication on CSR/CSC goes through two format conversions (CSR→COO→CSR). A dedicated rocsparse kernel or a direct rowPtr-walk would be more efficient but would require more infrastructure.
* COO +/- also routes through CSR; the same trade-off applies.
* Cross-format +/- (e.g. CSR + CSC) always returns CSR. The output type could arguably follow the left operand, but the current approach keeps things simple and correct.
* kron with Diagonal uses collect on the CPU for the diagonal index arrays before uploading; a fully GPU-side construction would avoid the round-trip.
* _sparse_identity allocates the full sparse identity, which is wasteful for large matrices (see TODO). A broadcast-singleton approach (as in SparseArrays.PromoteToSparse) would be cleaner.
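For context on the last point, the materialised-identity approach can be sketched on the CPU as follows. Here spdiagm stands in for _sparse_identity, which is an assumption for illustration; the PR's helper builds a GPU matrix in the operand's own format.

```julia
using LinearAlgebra, SparseArrays

A = sparse([1, 2, 3], [2, 2, 1], [1.0, 2.0, 3.0], 3, 3)
λ = 0.5
n = size(A, 1)

# Hypothetical stand-in for a materialised identity: n stored diagonal entries.
Id = spdiagm(0 => ones(n))

# A + λI expressed as an ordinary sparse + sparse addition.
S = A + λ * Id

@assert S == A + λ * I
@assert nnz(Id) == n
```

The temporary identity stores n values plus its index arrays for every such operation, which is the allocation the follow-up note proposes to avoid.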

github-actions

AMDGPU.jl Benchmarks

Benchmark suite	Current: `0517d58`	Previous: `7c9aab0`	Ratio
`amdgpu/synchronization/context/device`	`610` ns	`600` ns	`1.02`
`amdgpu/synchronization/stream/blocking`	`250` ns	`250` ns	`1`
`amdgpu/synchronization/stream/nonblocking`	`330` ns	`330` ns	`1`
`array/accumulate/Float32/1d`	`87831` ns	`85972` ns	`1.02`
`array/accumulate/Float32/dims=1`	`391265` ns	`412075` ns	`0.95`
`array/accumulate/Float32/dims=1L`	`136152` ns	`137091` ns	`0.99`
`array/accumulate/Float32/dims=2`	`129292` ns	`130332` ns	`0.99`
`array/accumulate/Float32/dims=2L`	`2810460` ns	`2810115` ns	`1.00`
`array/accumulate/Int64/1d`	`98861` ns	`102751` ns	`0.96`
`array/accumulate/Int64/dims=1`	`281824` ns	`442706` ns	`0.64`
`array/accumulate/Int64/dims=1L`	`168312` ns	`167432` ns	`1.01`
`array/accumulate/Int64/dims=2`	`127811` ns	`127031` ns	`1.01`
`array/accumulate/Int64/dims=2L`	`2988642` ns	`2984467` ns	`1.00`
`array/broadcast`	`93701` ns	`70231` ns	`1.33`
`array/construct`	`1780` ns	`1700` ns	`1.05`
`array/copy`	`37160` ns	`40561` ns	`0.92`
`array/copyto!/cpu_to_gpu`	`183463` ns	`121541` ns	`1.51`
`array/copyto!/gpu_to_cpu`	`182693` ns	`114461` ns	`1.60`
`array/copyto!/gpu_to_gpu`	`82171` ns	`66551` ns	`1.23`
`array/iteration/findall/bool`	`181293` ns	`181832` ns	`1.00`
`array/iteration/findall/int`	`190223` ns	`192932` ns	`0.99`
`array/iteration/findfirst/bool`	`117851` ns	`122251` ns	`0.96`
`array/iteration/findfirst/int`	`116232` ns	`116342` ns	`1.00`
`array/iteration/findmin/1d`	`170642` ns	`170152` ns	`1.00`
`array/iteration/findmin/2d`	`156303` ns	`153822` ns	`1.02`
`array/iteration/logical`	`357785` ns	`350744` ns	`1.02`
`array/iteration/scalar`	`295964` ns	`296083` ns	`1.00`
`array/permutedims/2d`	`75072` ns	`74481` ns	`1.01`
`array/permutedims/3d`	`75231` ns	`74251` ns	`1.01`
`array/permutedims/4d`	`76821` ns	`76951` ns	`1.00`
`array/random/rand/Float32`	`52021` ns	`52171` ns	`1.00`
`array/random/rand/Int64`	`58311` ns	`58731` ns	`0.99`
`array/random/rand!/Float32`	`88511` ns	`85101` ns	`1.04`
`array/random/rand!/Int64`	`115051` ns	`69261` ns	`1.66`
`array/random/randn/Float32`	`86562` ns	`98642` ns	`0.88`
`array/random/randn!/Float32`	`167133` ns	`101231` ns	`1.65`
`array/reductions/mapreduce/Float32/1d`	`134282` ns	`134242` ns	`1.00`
`array/reductions/mapreduce/Float32/dims=1`	`95312` ns	`95431` ns	`1.00`
`array/reductions/mapreduce/Float32/dims=1L`	`774121` ns	`774349` ns	`1.00`
`array/reductions/mapreduce/Float32/dims=2`	`97772` ns	`97531` ns	`1.00`
`array/reductions/mapreduce/Float32/dims=2L`	`298834` ns	`297464` ns	`1.00`
`array/reductions/mapreduce/Int64/1d`	`134642` ns	`134951` ns	`1.00`
`array/reductions/mapreduce/Int64/dims=1`	`95321` ns	`95301` ns	`1.00`
`array/reductions/mapreduce/Int64/dims=1L`	`781201` ns	`781800` ns	`1.00`
`array/reductions/mapreduce/Int64/dims=2`	`96681` ns	`96801` ns	`1.00`
`array/reductions/mapreduce/Int64/dims=2L`	`301074` ns	`299524` ns	`1.01`
`array/reductions/reduce/Float32/1d`	`134222` ns	`133912` ns	`1.00`
`array/reductions/reduce/Float32/dims=1`	`95582` ns	`95711` ns	`1.00`
`array/reductions/reduce/Float32/dims=1L`	`774211` ns	`775219` ns	`1.00`
`array/reductions/reduce/Float32/dims=2`	`97542` ns	`97621` ns	`1.00`
`array/reductions/reduce/Float32/dims=2L`	`298544` ns	`297424` ns	`1.00`
`array/reductions/reduce/Int64/1d`	`130452` ns	`134602` ns	`0.97`
`array/reductions/reduce/Int64/dims=1`	`95422` ns	`95311` ns	`1.00`
`array/reductions/reduce/Int64/dims=1L`	`778521` ns	`780269` ns	`1.00`
`array/reductions/reduce/Int64/dims=2`	`96882` ns	`97121` ns	`1.00`
`array/reductions/reduce/Int64/dims=2L`	`299684` ns	`299264` ns	`1.00`
`array/reverse/1d`	`44921` ns	`44550` ns	`1.01`
`array/reverse/1dL`	`76271` ns	`76661` ns	`0.99`
`array/reverse/1dL_inplace`	`169863` ns	`173202` ns	`0.98`
`array/reverse/1d_inplace`	`74581` ns	`84571` ns	`0.88`
`array/reverse/2d`	`52901` ns	`52831` ns	`1.00`
`array/reverse/2dL`	`102261` ns	`102811` ns	`0.99`
`array/reverse/2dL_inplace`	`125692` ns	`178873` ns	`0.70`
`array/reverse/2d_inplace`	`107632` ns	`96051` ns	`1.12`
`array/sorting/1d`	`341325` ns	`379995` ns	`0.90`
`integration/byval/reference`	`39271` ns	`39540` ns	`0.99`
`integration/byval/slices=1`	`40471` ns	`40350` ns	`1.00`
`integration/byval/slices=2`	`160112` ns	`159152` ns	`1.01`
`integration/byval/slices=3`	`238773` ns	`238933` ns	`1.00`
`integration/volumerhs`	`5042401` ns	`5031334` ns	`1.00`
`kernel/indexing`	`75801` ns	`65521` ns	`1.16`
`kernel/indexing_checked`	`66391` ns	`72491` ns	`0.92`
`kernel/launch`	`1290` ns	`1280` ns	`1.01`
`kernel/rand`	`126832` ns	`124252` ns	`1.02`
`latency/import`	`1503115254` ns	`1491816057` ns	`1.01`
`latency/precompile`	`11926533311` ns	`11773992921` ns	`1.01`
`latency/ttfp`	`11012011599` ns	`10954774141` ns	`1.01`

This comment was automatically generated by workflow using github-action-benchmark.

Add sparse linalg interface

0517d58

luraess mentioned this pull request Jun 25, 2026

Update support for sparse adjoint / transpose #861

Closed

github-actions Bot reviewed Jun 25, 2026

luraess marked this pull request as ready for review June 25, 2026 19:33

luraess wants to merge 1 commit into main from lr/sp-int
