RFC: automatic therock artifact selection #919

Draft
simeonschaub wants to merge 24 commits into JuliaGPU:main from simeonschaub:sds/artifacts

@simeonschaub

Member

This is a proof of concept for automatically downloading ROCm libraries via the artifact system. I'm not sure this is the best approach; maybe we should wrap these tarballs as JLLs through Yggdrasil instead?

It's currently partially vibe-coded and Linux-only, since I couldn't find a reliable way of querying the `gfx_target_version` on Windows.
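
For readers unfamiliar with `gfx_target_version`, here is a minimal sketch of how it can be queried on Linux. It is written in Python for illustration only and is not the code in this PR. It assumes the amdgpu kernel driver exposes the KFD topology under `/sys/class/kfd/kfd/topology/nodes/*/properties`, and that the value is encoded as `major*10000 + minor*100 + stepping`. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed location of the KFD topology on Linux (requires the amdgpu driver).
KFD_NODES = Path("/sys/class/kfd/kfd/topology/nodes")


def parse_gfx_target_version(properties_text):
    """Return the integer gfx_target_version from a `properties` file, or None."""
    match = re.search(
        r"^gfx_target_version\s+(\d+)\s*$", properties_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def gfx_name(version):
    """Decode major*10000 + minor*100 + stepping into a name such as 'gfx90a'."""
    major = version // 10000
    minor = (version // 100) % 100
    stepping = version % 100
    # Minor and stepping are written in hexadecimal in the gfx name.
    return f"gfx{major}{minor:x}{stepping:x}"


def detect_gfx_targets(nodes_dir=KFD_NODES):
    """List gfx names for all GPU nodes; nodes reporting version 0 are skipped."""
    targets = []
    if not nodes_dir.is_dir():
        return targets  # no KFD topology, e.g. driver not loaded or not Linux
    for props in sorted(nodes_dir.glob("*/properties")):
        version = parse_gfx_target_version(props.read_text())
        if version:
            targets.append(gfx_name(version))
    return targets
```

Under these assumptions, a value of `90010` decodes to `gfx90a` and `100300` to `gfx1030`. No comparable sysfs interface exists on Windows, which is the gap described above.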

@gbaraldi

Member

Have you seen that multiarch tarballs are available? Though I think they are gigantic.

@simeonschaub

Member Author

Yes, I saw them, but like you said, I don't think it's great to ship such huge tarballs to users.

@simeonschaub force-pushed the sds/artifacts branch 2 times, most recently from c0ceec1 to 094e810 (June 25, 2026 13:02)

@simeonschaub

Member Author

This now uses the clang shipped with ROCm, as also proposed by @vchuravy in #931 (comment). Locally, I am getting miscompilations in the triangular matmul tests, which I will try to reduce before opening an upstream issue.

@github-actions github-actions Bot left a comment

Contributor

AMDGPU.jl Benchmarks

| Benchmark suite | Current: 9ea5a7c | Previous: 7c9aab0 | Ratio |
|---|---|---|---|
| amdgpu/synchronization/context/device | 540 ns | 600 ns | 0.90 |
| amdgpu/synchronization/stream/blocking | 230 ns | 250 ns | 0.92 |
| amdgpu/synchronization/stream/nonblocking | 300 ns | 330 ns | 0.91 |
| array/accumulate/Float32/1d | 364975 ns | 85972 ns | 4.25 |
| array/accumulate/Float32/dims=1 | 489717 ns | 412075 ns | 1.19 |
| array/accumulate/Float32/dims=1L | 563168 ns | 137091 ns | 4.11 |
| array/accumulate/Float32/dims=2 | 459447 ns | 130332 ns | 3.53 |
| array/accumulate/Float32/dims=2L | 15277269 ns | 2810115 ns | 5.44 |
| array/accumulate/Int64/1d | 367166 ns | 102751 ns | 3.57 |
| array/accumulate/Int64/dims=1 | 486067 ns | 442706 ns | 1.10 |
| array/accumulate/Int64/dims=1L | 462337 ns | 167432 ns | 2.76 |
| array/accumulate/Int64/dims=2 | 480176 ns | 127031 ns | 3.78 |
| array/accumulate/Int64/dims=2L | 15885978 ns | 2984467 ns | 5.32 |
| array/broadcast | 207103 ns | 70231 ns | 2.95 |
| array/construct | 1630 ns | 1700 ns | 0.96 |
| array/copy | 36680 ns | 40561 ns | 0.90 |
| array/copyto!/cpu_to_gpu | 153202 ns | 121541 ns | 1.26 |
| array/copyto!/gpu_to_cpu | 109752 ns | 114461 ns | 0.96 |
| array/copyto!/gpu_to_gpu | 59051 ns | 66551 ns | 0.89 |
| array/iteration/findall/bool | 486877 ns | 181832 ns | 2.68 |
| array/iteration/findall/int | 513967 ns | 192932 ns | 2.66 |
| array/iteration/findfirst/bool | 472767 ns | 122251 ns | 3.87 |
| array/iteration/findfirst/int | 479817 ns | 116342 ns | 4.12 |
| array/iteration/findmin/1d | 793041 ns | 170152 ns | 4.66 |
| array/iteration/findmin/2d | 819362 ns | 153822 ns | 5.33 |
| array/iteration/logical | 515258 ns | 350744 ns | 1.47 |
| array/iteration/scalar | 275884 ns | 296083 ns | 0.93 |
| array/permutedims/2d | 115622 ns | 74481 ns | 1.55 |
| array/permutedims/3d | 198453 ns | 74251 ns | 2.67 |
| array/permutedims/4d | 296474 ns | 76951 ns | 3.85 |
| array/random/rand/Float32 | 47471 ns | 52171 ns | 0.91 |
| array/random/rand/Int64 | 90451 ns | 58731 ns | 1.54 |
| array/random/rand!/Float32 | 66481 ns | 85101 ns | 0.78 |
| array/random/rand!/Int64 | 189223 ns | 69261 ns | 2.73 |
| array/random/randn/Float32 | 136992 ns | 98642 ns | 1.39 |
| array/random/randn!/Float32 | 100062 ns | 101231 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 472847 ns | 134242 ns | 3.52 |
| array/reductions/mapreduce/Float32/dims=1 | 434207 ns | 95431 ns | 4.55 |
| array/reductions/mapreduce/Float32/dims=1L | 32851041 ns | 774349 ns | 42.42 |
| array/reductions/mapreduce/Float32/dims=2 | 438946 ns | 97531 ns | 4.50 |
| array/reductions/mapreduce/Float32/dims=2L | 1760865 ns | 297464 ns | 5.92 |
| array/reductions/mapreduce/Int64/1d | 490807 ns | 134951 ns | 3.64 |
| array/reductions/mapreduce/Int64/dims=1 | 550688 ns | 95301 ns | 5.78 |
| array/reductions/mapreduce/Int64/dims=1L | 36316261 ns | 781800 ns | 46.45 |
| array/reductions/mapreduce/Int64/dims=2 | 562338 ns | 96801 ns | 5.81 |
| array/reductions/mapreduce/Int64/dims=2L | 1759095 ns | 299524 ns | 5.87 |
| array/reductions/reduce/Float32/1d | 506997 ns | 133912 ns | 3.79 |
| array/reductions/reduce/Float32/dims=1 | 426166 ns | 95711 ns | 4.45 |
| array/reductions/reduce/Float32/dims=1L | 35624312 ns | 775219 ns | 45.95 |
| array/reductions/reduce/Float32/dims=2 | 548737 ns | 97621 ns | 5.62 |
| array/reductions/reduce/Float32/dims=2L | 1796905 ns | 297424 ns | 6.04 |
| array/reductions/reduce/Int64/1d | 495207 ns | 134602 ns | 3.68 |
| array/reductions/reduce/Int64/dims=1 | 540128 ns | 95311 ns | 5.67 |
| array/reductions/reduce/Int64/dims=1L | 36295651 ns | 780269 ns | 46.52 |
| array/reductions/reduce/Int64/dims=2 | 560428 ns | 97121 ns | 5.77 |
| array/reductions/reduce/Int64/dims=2L | 1772895 ns | 299264 ns | 5.92 |
| array/reverse/1d | 207643 ns | 44550 ns | 4.66 |
| array/reverse/1dL | 528128 ns | 76661 ns | 6.89 |
| array/reverse/1dL_inplace | 647099 ns | 173202 ns | 3.74 |
| array/reverse/1d_inplace | 269914 ns | 84571 ns | 3.19 |
| array/reverse/2d | 314174 ns | 52831 ns | 5.95 |
| array/reverse/2dL | 589058 ns | 102811 ns | 5.73 |
| array/reverse/2dL_inplace | 654099 ns | 178873 ns | 3.66 |
| array/reverse/2d_inplace | 339214 ns | 96051 ns | 3.53 |
| array/sorting/1d | 13185389 ns | 379995 ns | 34.70 |
| integration/byval/reference | 120982 ns | 39540 ns | 3.06 |
| integration/byval/slices=1 | 120211 ns | 40350 ns | 2.98 |
| integration/byval/slices=2 | 258893 ns | 159152 ns | 1.63 |
| integration/byval/slices=3 | 1266579 ns | 238933 ns | 5.30 |
| integration/volumerhs | 5292186 ns | 5031334 ns | 1.05 |
| kernel/indexing | 78391 ns | 65521 ns | 1.20 |
| kernel/indexing_checked | 214043 ns | 72491 ns | 2.95 |
| kernel/launch | 1360 ns | 1280 ns | 1.06 |
| kernel/rand | 256404 ns | 124252 ns | 2.06 |
| latency/import | 1626534717 ns | 1491816057 ns | 1.09 |
| latency/precompile | 11900817336 ns | 11773992921 ns | 1.01 |
| latency/ttfp | 10743222244 ns | 10954774141 ns | 0.98 |

This comment was automatically generated by workflow using github-action-benchmark.
