RFC: automatic therock artifact selection #919
Draft
simeonschaub wants to merge 24 commits into
a78f1f9 to 963eb4c
Member
Have you seen that multiarch tarballs are available? Though I think they are gigantic.
Member
Author
Yes, I saw them, but like you said, I don't think it's great to ship such huge tarballs to users.
c0ceec1 to 094e810
Member
Author
This is now using the clang shipped by ROCm, as also proposed by @vchuravy in #931 (comment). Locally I am getting miscompilations in the triangular matmul tests, which I will try to reduce before opening an upstream issue.
Contributor
AMDGPU.jl Benchmarks
| Benchmark suite | Current: 9ea5a7c | Previous: 7c9aab0 | Ratio |
|---|---|---|---|
| amdgpu/synchronization/context/device | 540 ns | 600 ns | 0.90 |
| amdgpu/synchronization/stream/blocking | 230 ns | 250 ns | 0.92 |
| amdgpu/synchronization/stream/nonblocking | 300 ns | 330 ns | 0.91 |
| array/accumulate/Float32/1d | 364975 ns | 85972 ns | 4.25 |
| array/accumulate/Float32/dims=1 | 489717 ns | 412075 ns | 1.19 |
| array/accumulate/Float32/dims=1L | 563168 ns | 137091 ns | 4.11 |
| array/accumulate/Float32/dims=2 | 459447 ns | 130332 ns | 3.53 |
| array/accumulate/Float32/dims=2L | 15277269 ns | 2810115 ns | 5.44 |
| array/accumulate/Int64/1d | 367166 ns | 102751 ns | 3.57 |
| array/accumulate/Int64/dims=1 | 486067 ns | 442706 ns | 1.10 |
| array/accumulate/Int64/dims=1L | 462337 ns | 167432 ns | 2.76 |
| array/accumulate/Int64/dims=2 | 480176 ns | 127031 ns | 3.78 |
| array/accumulate/Int64/dims=2L | 15885978 ns | 2984467 ns | 5.32 |
| array/broadcast | 207103 ns | 70231 ns | 2.95 |
| array/construct | 1630 ns | 1700 ns | 0.96 |
| array/copy | 36680 ns | 40561 ns | 0.90 |
| array/copyto!/cpu_to_gpu | 153202 ns | 121541 ns | 1.26 |
| array/copyto!/gpu_to_cpu | 109752 ns | 114461 ns | 0.96 |
| array/copyto!/gpu_to_gpu | 59051 ns | 66551 ns | 0.89 |
| array/iteration/findall/bool | 486877 ns | 181832 ns | 2.68 |
| array/iteration/findall/int | 513967 ns | 192932 ns | 2.66 |
| array/iteration/findfirst/bool | 472767 ns | 122251 ns | 3.87 |
| array/iteration/findfirst/int | 479817 ns | 116342 ns | 4.12 |
| array/iteration/findmin/1d | 793041 ns | 170152 ns | 4.66 |
| array/iteration/findmin/2d | 819362 ns | 153822 ns | 5.33 |
| array/iteration/logical | 515258 ns | 350744 ns | 1.47 |
| array/iteration/scalar | 275884 ns | 296083 ns | 0.93 |
| array/permutedims/2d | 115622 ns | 74481 ns | 1.55 |
| array/permutedims/3d | 198453 ns | 74251 ns | 2.67 |
| array/permutedims/4d | 296474 ns | 76951 ns | 3.85 |
| array/random/rand/Float32 | 47471 ns | 52171 ns | 0.91 |
| array/random/rand/Int64 | 90451 ns | 58731 ns | 1.54 |
| array/random/rand!/Float32 | 66481 ns | 85101 ns | 0.78 |
| array/random/rand!/Int64 | 189223 ns | 69261 ns | 2.73 |
| array/random/randn/Float32 | 136992 ns | 98642 ns | 1.39 |
| array/random/randn!/Float32 | 100062 ns | 101231 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 472847 ns | 134242 ns | 3.52 |
| array/reductions/mapreduce/Float32/dims=1 | 434207 ns | 95431 ns | 4.55 |
| array/reductions/mapreduce/Float32/dims=1L | 32851041 ns | 774349 ns | 42.42 |
| array/reductions/mapreduce/Float32/dims=2 | 438946 ns | 97531 ns | 4.50 |
| array/reductions/mapreduce/Float32/dims=2L | 1760865 ns | 297464 ns | 5.92 |
| array/reductions/mapreduce/Int64/1d | 490807 ns | 134951 ns | 3.64 |
| array/reductions/mapreduce/Int64/dims=1 | 550688 ns | 95301 ns | 5.78 |
| array/reductions/mapreduce/Int64/dims=1L | 36316261 ns | 781800 ns | 46.45 |
| array/reductions/mapreduce/Int64/dims=2 | 562338 ns | 96801 ns | 5.81 |
| array/reductions/mapreduce/Int64/dims=2L | 1759095 ns | 299524 ns | 5.87 |
| array/reductions/reduce/Float32/1d | 506997 ns | 133912 ns | 3.79 |
| array/reductions/reduce/Float32/dims=1 | 426166 ns | 95711 ns | 4.45 |
| array/reductions/reduce/Float32/dims=1L | 35624312 ns | 775219 ns | 45.95 |
| array/reductions/reduce/Float32/dims=2 | 548737 ns | 97621 ns | 5.62 |
| array/reductions/reduce/Float32/dims=2L | 1796905 ns | 297424 ns | 6.04 |
| array/reductions/reduce/Int64/1d | 495207 ns | 134602 ns | 3.68 |
| array/reductions/reduce/Int64/dims=1 | 540128 ns | 95311 ns | 5.67 |
| array/reductions/reduce/Int64/dims=1L | 36295651 ns | 780269 ns | 46.52 |
| array/reductions/reduce/Int64/dims=2 | 560428 ns | 97121 ns | 5.77 |
| array/reductions/reduce/Int64/dims=2L | 1772895 ns | 299264 ns | 5.92 |
| array/reverse/1d | 207643 ns | 44550 ns | 4.66 |
| array/reverse/1dL | 528128 ns | 76661 ns | 6.89 |
| array/reverse/1dL_inplace | 647099 ns | 173202 ns | 3.74 |
| array/reverse/1d_inplace | 269914 ns | 84571 ns | 3.19 |
| array/reverse/2d | 314174 ns | 52831 ns | 5.95 |
| array/reverse/2dL | 589058 ns | 102811 ns | 5.73 |
| array/reverse/2dL_inplace | 654099 ns | 178873 ns | 3.66 |
| array/reverse/2d_inplace | 339214 ns | 96051 ns | 3.53 |
| array/sorting/1d | 13185389 ns | 379995 ns | 34.70 |
| integration/byval/reference | 120982 ns | 39540 ns | 3.06 |
| integration/byval/slices=1 | 120211 ns | 40350 ns | 2.98 |
| integration/byval/slices=2 | 258893 ns | 159152 ns | 1.63 |
| integration/byval/slices=3 | 1266579 ns | 238933 ns | 5.30 |
| integration/volumerhs | 5292186 ns | 5031334 ns | 1.05 |
| kernel/indexing | 78391 ns | 65521 ns | 1.20 |
| kernel/indexing_checked | 214043 ns | 72491 ns | 2.95 |
| kernel/launch | 1360 ns | 1280 ns | 1.06 |
| kernel/rand | 256404 ns | 124252 ns | 2.06 |
| latency/import | 1626534717 ns | 1491816057 ns | 1.09 |
| latency/precompile | 11900817336 ns | 11773992921 ns | 1.01 |
| latency/ttfp | 10743222244 ns | 10954774141 ns | 0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
This is a proof of concept for automatically downloading ROCm libraries via the artifact system. I'm not sure this is the best approach; maybe we should wrap these tarballs as JLLs through Yggdrasil instead?
It's currently partially vibe-coded and Linux-only, since I couldn't find a reliable way of querying the gfx_target_version on Windows.
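
For reference, here is a rough sketch of how the gfx_target_version could be read on Linux. It is written in Python for brevity and is not the code in this PR. It assumes the KFD driver exposes one `properties` file per topology node under `/sys/class/kfd/kfd/topology/nodes/`, and that the value is encoded as `major * 10000 + minor * 100 + stepping`. The function names are made up for illustration.

```python
from pathlib import Path

# Assumed location of the KFD topology in sysfs (Linux only).
KFD_NODES = Path("/sys/class/kfd/kfd/topology/nodes")


def parse_gfx_target_version(properties_text):
    """Return the integer gfx_target_version from a properties file, or None."""
    for line in properties_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "gfx_target_version":
            return int(parts[1])
    return None


def gfx_name(version):
    """Decode a version such as 90010 into a target name such as 'gfx90a'.

    Assumes the encoding major * 10000 + minor * 100 + stepping, with minor
    and stepping rendered as hexadecimal digits in the name.
    """
    major = version // 10000
    minor = (version // 100) % 100
    stepping = version % 100
    return f"gfx{major}{minor:x}{stepping:x}"


def detect_gfx_targets(nodes_dir=KFD_NODES):
    """List the gfx targets of all GPU nodes; empty if KFD is not available."""
    targets = []
    if not nodes_dir.is_dir():
        return targets
    for node in sorted(nodes_dir.iterdir()):
        try:
            version = parse_gfx_target_version((node / "properties").read_text())
        except OSError:
            continue  # unreadable node, skip it
        if version:  # CPU-only nodes are expected to report 0
            targets.append(gfx_name(version))
    return targets


if __name__ == "__main__":
    print(detect_gfx_targets())
```

The detected name would then be the key for picking the matching per-architecture tarball. Nothing comparable to this sysfs interface is assumed to exist on Windows, which is the gap mentioned above.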