Added Base.similar methods for CuSparseMatrixCOO and BSR #3114
Merged
kshyatt
reviewed
Apr 21, 2026
Member
Also, can some tests be added?
Contributor
CUDA.jl Benchmarks
| Benchmark suite | Current: 07cb674 | Previous: fdb0f83 | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 99956 ns | 98753 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 76223 ns | 74838 ns | 1.02 |
| array/accumulate/Float32/dims=1L | 1594955 ns | 1595600 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 141352 ns | 139337 ns | 1.01 |
| array/accumulate/Float32/dims=2L | 653944 ns | 653321 ns | 1.00 |
| array/accumulate/Int64/1d | 119204 ns | 118217 ns | 1.01 |
| array/accumulate/Int64/dims=1 | 80206 ns | 79158 ns | 1.01 |
| array/accumulate/Int64/dims=1L | 1710205 ns | 1709186 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 154209 ns | 153413 ns | 1.01 |
| array/accumulate/Int64/dims=2L | 960254 ns | 959320 ns | 1.00 |
| array/broadcast | 18913 ns | 18499 ns | 1.02 |
| array/construct | 1199 ns | 1232 ns | 0.97 |
| array/copy | 16928 ns | 16662 ns | 1.02 |
| array/copyto!/cpu_to_gpu | 216158 ns | 210966 ns | 1.02 |
| array/copyto!/gpu_to_cpu | 280211 ns | 278113 ns | 1.01 |
| array/copyto!/gpu_to_gpu | 10403 ns | 10251.333333333334 ns | 1.01 |
| array/iteration/findall/bool | 134493 ns | 132507 ns | 1.01 |
| array/iteration/findall/int | 148906 ns | 147140 ns | 1.01 |
| array/iteration/findfirst/bool | 70933 ns | 70124 ns | 1.01 |
| array/iteration/findfirst/int | 72031 ns | 71302 ns | 1.01 |
| array/iteration/findmin/1d | 69673 ns | 67538 ns | 1.03 |
| array/iteration/findmin/2d | 101491 ns | 101327 ns | 1.00 |
| array/iteration/logical | 195763 ns | 192237 ns | 1.02 |
| array/iteration/scalar | 65862 ns | 64116 ns | 1.03 |
| array/permutedims/2d | 50412 ns | 49526 ns | 1.02 |
| array/permutedims/3d | 51182 ns | 50756 ns | 1.01 |
| array/permutedims/4d | 51456 ns | 50974 ns | 1.01 |
| array/random/rand/Float32 | 12310 ns | 11658 ns | 1.06 |
| array/random/rand/Int64 | 24741 ns | 22560 ns | 1.10 |
| array/random/rand!/Float32 | 8276.333333333334 ns | 7849.333333333333 ns | 1.05 |
| array/random/rand!/Int64 | 19746 ns | 17870 ns | 1.10 |
| array/random/randn/Float32 | 35552 ns | 35106 ns | 1.01 |
| array/random/randn!/Float32 | 23934 ns | 23912 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 34001 ns | 33612 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1 | 39016 ns | 38308 ns | 1.02 |
| array/reductions/mapreduce/Float32/dims=1L | 50570 ns | 50124 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 56113 ns | 55421 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 67503 ns | 67539 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 39984 ns | 39433 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1 | 41545 ns | 41004 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 86823 ns | 86427 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 58286 ns | 57708 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=2L | 83536 ns | 82911 ns | 1.01 |
| array/reductions/reduce/Float32/1d | 34274 ns | 33312 ns | 1.03 |
| array/reductions/reduce/Float32/dims=1 | 38845 ns | 38395 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1L | 50683 ns | 50133 ns | 1.01 |
| array/reductions/reduce/Float32/dims=2 | 56245 ns | 55712 ns | 1.01 |
| array/reductions/reduce/Float32/dims=2L | 69362 ns | 68679 ns | 1.01 |
| array/reductions/reduce/Int64/1d | 40309 ns | 39077 ns | 1.03 |
| array/reductions/reduce/Int64/dims=1 | 41975 ns | 40942 ns | 1.03 |
| array/reductions/reduce/Int64/dims=1L | 86880 ns | 86287 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2 | 58107 ns | 57687 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2L | 83473 ns | 82800 ns | 1.01 |
| array/reverse/1d | 17322 ns | 16733 ns | 1.04 |
| array/reverse/1dL | 68140 ns | 67771 ns | 1.01 |
| array/reverse/1dL_inplace | 65505 ns | 65314 ns | 1.00 |
| array/reverse/1d_inplace | 8840.666666666666 ns | 8293.666666666666 ns | 1.07 |
| array/reverse/2d | 20879 ns | 20003 ns | 1.04 |
| array/reverse/2dL | 72366 ns | 71847 ns | 1.01 |
| array/reverse/2dL_inplace | 65251 ns | 65036 ns | 1.00 |
| array/reverse/2d_inplace | 9928 ns | 9632 ns | 1.03 |
| array/sorting/1d | 2660346 ns | 2656101 ns | 1.00 |
| array/sorting/2d | 1041945 ns | 1038978 ns | 1.00 |
| array/sorting/by | 3195808 ns | 3193388 ns | 1.00 |
| cuda/synchronization/context/auto | 1149.2 ns | 1144.2 ns | 1.00 |
| cuda/synchronization/context/blocking | 915.4848484848485 ns | 945.2307692307693 ns | 0.97 |
| cuda/synchronization/context/nonblocking | 6304.2 ns | 5938.8 ns | 1.06 |
| cuda/synchronization/stream/auto | 1010.3 ns | 1002.3 ns | 1.01 |
| cuda/synchronization/stream/blocking | 806.2022471910112 ns | 835.5142857142857 ns | 0.96 |
| cuda/synchronization/stream/nonblocking | 6130.5 ns | 5969.4 ns | 1.03 |
| integration/byval/reference | 143376 ns | 143209 ns | 1.00 |
| integration/byval/slices=1 | 145681 ns | 145483 ns | 1.00 |
| integration/byval/slices=2 | 284497 ns | 283679 ns | 1.00 |
| integration/byval/slices=3 | 422437 ns | 422104 ns | 1.00 |
| integration/cudadevrt | 101862 ns | 101601 ns | 1.00 |
| integration/volumerhs | 9089170 ns | 9087846 ns | 1.00 |
| kernel/indexing | 13015 ns | 12529 ns | 1.04 |
| kernel/indexing_checked | 13886 ns | 13442 ns | 1.03 |
| kernel/launch | 2198.222222222222 ns | 2078.777777777778 ns | 1.06 |
| kernel/occupancy | 730.6590909090909 ns | 762.8947368421053 ns | 0.96 |
| kernel/rand | 14273 ns | 14692 ns | 0.97 |
| latency/import | 3880740734 ns | 3855139894 ns | 1.01 |
| latency/precompile | 4640823930 ns | 4639622735 ns | 1.00 |
| latency/ttfp | 4546678518 ns | 4510104327 ns | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
kshyatt
reviewed
Apr 21, 2026
rainerrodrigues
commented
Apr 22, 2026
rainerrodrigues
left a comment
Contributor
Author
@kshyatt Hi, can you check whether these tests are suitable and extensive enough?
f08a059 to b48050e
Member
Same as #3119, you seem to have many unrelated changes in here that cause CI failures.
maleadt
reviewed
May 18, 2026
The dims-taking similar family (similar(M, [Tv, [Ti,]] m, n) and tuple forms) covered CSC, CSR and COO, but not BSR, which fell through to the dense CuArray fallback. Add BSR variants that preserve blockDim and dir.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
54f12f6 to 07cb674
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3114      +/-   ##
==========================================
+ Coverage   16.32%   16.86%   +0.53%
==========================================
  Files         124      124
  Lines        9875     9881       +6
==========================================
+ Hits         1612     1666      +54
+ Misses       8263     8215      -48
☔ View full report in Codecov by Harness.
This PR adds the missing Base.similar methods for CuSparseMatrixCOO and CuSparseMatrixBSR, allowing them to fall back gracefully without converting to dense CPU arrays.
Fixes #3061
Fixes #3055
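
For readers unfamiliar with the contract that `similar` is expected to honor for sparse containers, here is a minimal CPU-only sketch using the standard-library SparseArrays package. It is an analogy, not the code from this PR: it assumes the GPU methods are meant to behave comparably (same sparse container type back, optionally with a new element type or new dimensions), and the names `A`, `B`, `C`, `D` are purely illustrative.

```julia
using SparseArrays

# CPU sparse matrix standing in for a GPU sparse matrix (analogy only;
# this does not exercise CuSparseMatrixCOO or CuSparseMatrixBSR).
A = sparse([1, 2, 4], [1, 3, 4], [1.0, 2.0, 3.0], 4, 4)

# No extra arguments: same container type, element type, and size.
B = similar(A)

# New element type: still a sparse matrix, now holding Float32 values.
C = similar(A, Float32)

# Dims-taking (tuple) form: new size, and the result stays sparse
# rather than falling through to a dense array.
D = similar(A, Float32, (6, 2))

println(typeof(B), " ", size(B))
println(typeof(C), " ", size(C))
println(typeof(D), " ", size(D))
```

The property the PR discussion turns on is the last case: a dims-taking `similar` call that lacks a format-specific method is handled by a generic fallback, which is how, per the commit message above, BSR inputs ended up as dense CuArray results.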