AMDGPU.jl Benchmarks
| Benchmark suite | Current: 0517d58 | Previous: 7c9aab0 | Ratio |
|---|---|---|---|
| amdgpu/synchronization/context/device | 610 ns | 600 ns | 1.02 |
| amdgpu/synchronization/stream/blocking | 250 ns | 250 ns | 1 |
| amdgpu/synchronization/stream/nonblocking | 330 ns | 330 ns | 1 |
| array/accumulate/Float32/1d | 87831 ns | 85972 ns | 1.02 |
| array/accumulate/Float32/dims=1 | 391265 ns | 412075 ns | 0.95 |
| array/accumulate/Float32/dims=1L | 136152 ns | 137091 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 129292 ns | 130332 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 2810460 ns | 2810115 ns | 1.00 |
| array/accumulate/Int64/1d | 98861 ns | 102751 ns | 0.96 |
| array/accumulate/Int64/dims=1 | 281824 ns | 442706 ns | 0.64 |
| array/accumulate/Int64/dims=1L | 168312 ns | 167432 ns | 1.01 |
| array/accumulate/Int64/dims=2 | 127811 ns | 127031 ns | 1.01 |
| array/accumulate/Int64/dims=2L | 2988642 ns | 2984467 ns | 1.00 |
| array/broadcast | 93701 ns | 70231 ns | 1.33 |
| array/construct | 1780 ns | 1700 ns | 1.05 |
| array/copy | 37160 ns | 40561 ns | 0.92 |
| array/copyto!/cpu_to_gpu | 183463 ns | 121541 ns | 1.51 |
| array/copyto!/gpu_to_cpu | 182693 ns | 114461 ns | 1.60 |
| array/copyto!/gpu_to_gpu | 82171 ns | 66551 ns | 1.23 |
| array/iteration/findall/bool | 181293 ns | 181832 ns | 1.00 |
| array/iteration/findall/int | 190223 ns | 192932 ns | 0.99 |
| array/iteration/findfirst/bool | 117851 ns | 122251 ns | 0.96 |
| array/iteration/findfirst/int | 116232 ns | 116342 ns | 1.00 |
| array/iteration/findmin/1d | 170642 ns | 170152 ns | 1.00 |
| array/iteration/findmin/2d | 156303 ns | 153822 ns | 1.02 |
| array/iteration/logical | 357785 ns | 350744 ns | 1.02 |
| array/iteration/scalar | 295964 ns | 296083 ns | 1.00 |
| array/permutedims/2d | 75072 ns | 74481 ns | 1.01 |
| array/permutedims/3d | 75231 ns | 74251 ns | 1.01 |
| array/permutedims/4d | 76821 ns | 76951 ns | 1.00 |
| array/random/rand/Float32 | 52021 ns | 52171 ns | 1.00 |
| array/random/rand/Int64 | 58311 ns | 58731 ns | 0.99 |
| array/random/rand!/Float32 | 88511 ns | 85101 ns | 1.04 |
| array/random/rand!/Int64 | 115051 ns | 69261 ns | 1.66 |
| array/random/randn/Float32 | 86562 ns | 98642 ns | 0.88 |
| array/random/randn!/Float32 | 167133 ns | 101231 ns | 1.65 |
| array/reductions/mapreduce/Float32/1d | 134282 ns | 134242 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 95312 ns | 95431 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 774121 ns | 774349 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 97772 ns | 97531 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 298834 ns | 297464 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 134642 ns | 134951 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 95321 ns | 95301 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 781201 ns | 781800 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 96681 ns | 96801 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 301074 ns | 299524 ns | 1.01 |
| array/reductions/reduce/Float32/1d | 134222 ns | 133912 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1 | 95582 ns | 95711 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 774211 ns | 775219 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 97542 ns | 97621 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 298544 ns | 297424 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 130452 ns | 134602 ns | 0.97 |
| array/reductions/reduce/Int64/dims=1 | 95422 ns | 95311 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 778521 ns | 780269 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 96882 ns | 97121 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 299684 ns | 299264 ns | 1.00 |
| array/reverse/1d | 44921 ns | 44550 ns | 1.01 |
| array/reverse/1dL | 76271 ns | 76661 ns | 0.99 |
| array/reverse/1dL_inplace | 169863 ns | 173202 ns | 0.98 |
| array/reverse/1d_inplace | 74581 ns | 84571 ns | 0.88 |
| array/reverse/2d | 52901 ns | 52831 ns | 1.00 |
| array/reverse/2dL | 102261 ns | 102811 ns | 0.99 |
| array/reverse/2dL_inplace | 125692 ns | 178873 ns | 0.70 |
| array/reverse/2d_inplace | 107632 ns | 96051 ns | 1.12 |
| array/sorting/1d | 341325 ns | 379995 ns | 0.90 |
| integration/byval/reference | 39271 ns | 39540 ns | 0.99 |
| integration/byval/slices=1 | 40471 ns | 40350 ns | 1.00 |
| integration/byval/slices=2 | 160112 ns | 159152 ns | 1.01 |
| integration/byval/slices=3 | 238773 ns | 238933 ns | 1.00 |
| integration/volumerhs | 5042401 ns | 5031334 ns | 1.00 |
| kernel/indexing | 75801 ns | 65521 ns | 1.16 |
| kernel/indexing_checked | 66391 ns | 72491 ns | 0.92 |
| kernel/launch | 1290 ns | 1280 ns | 1.01 |
| kernel/rand | 126832 ns | 124252 ns | 1.02 |
| latency/import | 1503115254 ns | 1491816057 ns | 1.01 |
| latency/precompile | 11926533311 ns | 11773992921 ns | 1.01 |
| latency/ttfp | 11012011599 ns | 10954774141 ns | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
Sparse LinearAlgebra interfaces
Follows up on #859 and supersedes #861.
Implements a first round of LinearAlgebra interfaces (`+`/`-`, `UniformScaling`, `Diagonal`, `triu`/`tril`, `kron`) for GPU sparse matrices, hooking into the GPUArrays generic machinery where possible and falling back to format-aware implementations where it is not.

The implementation was robot-helped by Claude under my steering.
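As a rough illustration of the operations listed above, the sketch below runs them on CPU `SparseArrays` matrices. It assumes the GPU methods are intended to agree with these CPU results; the matrices `A`, `B` and `D` are made up for the example, and no AMDGPU types are used so that it runs without a GPU.

```julia
using LinearAlgebra, SparseArrays

# CPU stand-ins (assumption: the GPU sparse types are expected to match these results).
A = sparse([1, 2, 3, 3], [1, 2, 1, 3], Float32[1, 2, 3, 4], 3, 3)
B = sparse([1, 2], [2, 3], Float32[5, 6], 3, 3)
D = Diagonal(Float32[10, 20, 30])

S1 = A + B              # sparse + sparse
S2 = A - transpose(B)   # one operand wrapped in a lazy transpose
S3 = A + 2I             # UniformScaling
S4 = D * A              # Diagonal on the left scales rows
S5 = A * D              # Diagonal on the right scales columns
U  = triu(A)            # upper triangle
L  = tril(A)            # lower triangle
K  = kron(A, B)         # Kronecker product, size 9×9 here
```

Comparing each result against the dense computation (for example `Matrix(S1) == Matrix(A) + Matrix(B)`) is one simple way to check a GPU implementation against the CPU behaviour.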
What is added
`src/sparse/interfaces.jl`

- `_sptranspose` / `_spadjoint` for CSR, CSC, COO: required by GPUArrays to materialise `transpose`/`adjoint` wrappers.
- `+`/`-` for all combinations of plain/transposed/adjoint CSR and CSC via `geam`; COO routes through CSR and converts back. Cross-format pairs (CSR+CSC, CSR+BSR and their reverses) normalise both operands to CSR before calling `geam`.
- `+`/`-`/`*` with `UniformScaling` for all three formats and their transposed/adjointed wrappers. The identity is materialised as a same-format sparse matrix (`_sparse_identity`); a TODO notes a potentially more efficient broadcast-singleton approach.
- `+`/`-`/`*` with `Diagonal` for all three formats and wrappers. Addition converts the `Diagonal` to the same sparse format and delegates to the existing `geam` path. Multiplication scales the nonzero values directly via the COO index arrays (`d[colInd]` / `d[rowInd]`); for CSR and CSC this involves a round-trip through COO.

`src/sparse/linalg.jl`

- `triu`/`tril` for COO by masking; CSR/CSC fall through the GPUArrays generic path, which dispatches to `coo_type`.
- `kron` for COO×COO, COO×`Diagonal`, `Diagonal`×COO via GPU-side `repeat`/broadcast on the index and value arrays.

`src/sparse/conversions.jl`

- `ROCSparseMatrixCOO(::Diagonal)` constructor chain.
- `ROCSparseMatrixCSR{Tv,Ti}(coo)` / `ROCSparseMatrixCSC{Tv,Ti}(coo)` needed by GPUArrays generics.

`src/sparse/array.jl`

- `ROCSparseMatrix(transpose/adjoint(other_sparse))` constructors for GPU-to-GPU round-trips.

Potential follow-ups
- `* Diagonal` on CSR/CSC goes through two format conversions (CSR→COO→CSR). A dedicated rocsparse kernel or a direct rowPtr-walk would be more efficient but would require more infrastructure.
- `+`/`-` also routes through CSR; the same trade-off applies.
- `+`/`-` (e.g. CSR + CSC) always returns CSR. The output type could arguably follow the left operand, but this approach keeps things simple and correct.
- `kron` with `Diagonal` uses `collect` on the CPU for the diagonal index arrays before uploading; a fully GPU-side construction would avoid the round-trip.
- `_sparse_identity` allocates the full sparse identity, which is wasteful for large matrices (see TODO). A broadcast-singleton approach (as in `SparseArrays.PromoteToSparse`) would be cleaner.
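
The `Diagonal` multiplication and `kron` items describe working directly on COO index and value arrays. The following is a minimal CPU sketch of that idea, not the code from this PR: `scale_rows`, `scale_cols` and `kron_coo` are hypothetical helper names, and plain `Vector`s stand in for the GPU arrays.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helpers; plain Vectors stand in for the GPU index/value arrays.

# D * A scales row i by d[i]: gather d at each stored entry's row index.
scale_rows(d, rowInd, nzVal) = d[rowInd] .* nzVal
# A * D scales column j by d[j]: gather d at each stored entry's column index.
scale_cols(d, colInd, nzVal) = nzVal .* d[colInd]

# kron(A, B) on COO triplets: entry (i, j, a) of A paired with entry (k, l, b) of B
# lands at ((i-1)*mB + k, (j-1)*nB + l) with value a*b, where B is mB×nB.
function kron_coo(rowA, colA, valA, rowB, colB, valB, mB, nB)
    nnzA, nnzB = length(valA), length(valB)
    # `inner` repeats each A entry nnzB times; `outer` tiles the B entries nnzA times.
    rows = repeat((rowA .- 1) .* mB, inner = nnzB) .+ repeat(rowB, outer = nnzA)
    cols = repeat((colA .- 1) .* nB, inner = nnzB) .+ repeat(colB, outer = nnzA)
    vals = repeat(valA, inner = nnzB) .* repeat(valB, outer = nnzA)
    return rows, cols, vals
end

# Small made-up inputs, checked against the SparseArrays reference.
A = sparse([1, 2, 2], [1, 1, 2], [1.0, 2.0, 3.0], 2, 2)
B = sparse([1, 2], [2, 1], [4.0, 5.0], 2, 2)
d = [10.0, 100.0]

rowA, colA, valA = findnz(A)
rowB, colB, valB = findnz(B)

DA = sparse(rowA, colA, scale_rows(d, rowA, valA), 2, 2)
AD = sparse(rowA, colA, scale_cols(d, colA, valA), 2, 2)
r, c, v = kron_coo(rowA, colA, valA, rowB, colB, valB, size(B)...)
K = sparse(r, c, v, 4, 4)
```

Because both operations reduce to gathers, `repeat` and elementwise broadcasts, they map naturally onto GPU arrays. This also shows why CSR and CSC inputs need explicit per-entry row or column indices first, which is the COO round-trip noted in the follow-ups.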