
Fix rocFFT error with multiple tasks #935

Draft
luraess wants to merge 1 commit into main from lr/rocfft



luraess (Member) commented Jun 19, 2026

Fix rocfft_status_failure when executing an FFT on a spawned Julia task (Julia 1.12+)

In Julia 1.12, Threads.@spawn seems to inherit task-local storage (possibly via Base.copy), so the spawned task sees the same HIPStream as its parent. Because update_stream! was guarded by plan.stream != new_stream, it skipped the rocfft_execution_info_set_stream call. However, rocFFT requires this call to be made on the same OS thread as rocfft_execute. With multiple Julia threads, the spawned task can land on a different OS thread than the one that originally set the stream, causing rocfft_status_failure.

The fix is to call rocfft_execution_info_set_stream unconditionally before every execution, dropping the plan.stream != new_stream guard. The bug is not observable in a single-threaded REPL (presumably because all tasks share one OS thread), but it shows up intermittently in CI.
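To make the guard change concrete, here is a minimal Julia sketch. It does not call rocFFT or AMDGPU.jl: MockPlan, set_stream!, update_stream_guarded! and update_stream_always! are hypothetical names, with set_stream! standing in for the rocfft_execution_info_set_stream call and a Symbol standing in for a HIPStream. It only illustrates that the guarded version never re-registers a stream inherited from the parent task, while the unconditional version registers it on every execution, from whichever thread the task happens to run on.

```julia
# Hypothetical stand-ins, not AMDGPU.jl's real types: `set_stream!` plays the
# role of rocfft_execution_info_set_stream and a Symbol plays the role of a
# HIPStream.
mutable struct MockPlan
    stream::Symbol            # stream last registered on the plan
    set_calls::Vector{Int}    # thread id at each (mock) set-stream call
end

MockPlan(stream::Symbol) = MockPlan(stream, Int[])

function set_stream!(plan::MockPlan, stream::Symbol)
    push!(plan.set_calls, Threads.threadid())  # record which thread made the call
    plan.stream = stream
    return plan
end

# Before: skip the call when the stream is unchanged, so a spawned task that
# sees the parent's stream never re-registers it on its own thread.
function update_stream_guarded!(plan::MockPlan, new_stream::Symbol)
    if plan.stream != new_stream
        set_stream!(plan, new_stream)
    end
    return plan
end

# After: register the stream before every execution, unconditionally.
function update_stream_always!(plan::MockPlan, new_stream::Symbol)
    return set_stream!(plan, new_stream)
end

old_plan = MockPlan(:parent_stream)
new_plan = MockPlan(:parent_stream)

# Simulate two executions, each from a spawned task that sees the parent's
# stream; each task is fetched before the next one is spawned.
for _ in 1:2
    fetch(Threads.@spawn update_stream_guarded!(old_plan, :parent_stream))
    fetch(Threads.@spawn update_stream_always!(new_plan, :parent_stream))
end

println(length(old_plan.set_calls))  # 0: the guard skipped every call
println(length(new_plan.set_calls))  # 2: one call per execution
```

When run with several threads (for example julia -t 4), the thread ids recorded in new_plan.set_calls may differ between executions, which is the situation the original guard did not handle.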

luraess changed the title from "Set stream before every execution" to "Fix roFFT error with multiple tasks" Jun 19, 2026
luraess changed the title from "Fix roFFT error with multiple tasks" to "Fix rocFFT error with multiple tasks" Jun 19, 2026

github-actions (bot) left a comment


AMDGPU.jl Benchmarks

| Benchmark suite | Current: bfc256a | Previous: 756602c | Ratio (Current / Previous) |
|---|---|---|---|
| amdgpu/synchronization/context/device | 610 ns | 600 ns | 1.02 |
| amdgpu/synchronization/stream/blocking | 250 ns | 240 ns | 1.04 |
| amdgpu/synchronization/stream/nonblocking | 340 ns | 340 ns | 1 |
| array/accumulate/Float32/1d | 86361 ns | 86251 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 397256 ns | 393845 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 135712 ns | 131681 ns | 1.03 |
| array/accumulate/Float32/dims=2 | 133462 ns | 103022 ns | 1.30 |
| array/accumulate/Float32/dims=2L | 2805579 ns | 2827930 ns | 0.99 |
| array/accumulate/Int64/1d | 96171 ns | 96412 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 407356 ns | 285244 ns | 1.43 |
| array/accumulate/Int64/dims=1L | 167053 ns | 160812 ns | 1.04 |
| array/accumulate/Int64/dims=2 | 126791 ns | 120772 ns | 1.05 |
| array/accumulate/Int64/dims=2L | 2987371 ns | 3014433 ns | 0.99 |
| array/broadcast | 93662 ns | 128932 ns | 0.73 |
| array/construct | 1590 ns | 1680 ns | 0.95 |
| array/copy | 37621 ns | 39371 ns | 0.96 |
| array/copyto!/cpu_to_gpu | 184263 ns | 114832 ns | 1.60 |
| array/copyto!/gpu_to_cpu | 183652 ns | 152432 ns | 1.20 |
| array/copyto!/gpu_to_gpu | 126892 ns | 88321 ns | 1.44 |
| array/iteration/findall/bool | 179892 ns | 181912 ns | 0.99 |
| array/iteration/findall/int | 187303 ns | 190933 ns | 0.98 |
| array/iteration/findfirst/bool | 123721 ns | 114451 ns | 1.08 |
| array/iteration/findfirst/int | 118372 ns | 116331 ns | 1.02 |
| array/iteration/findmin/1d | 166743 ns | 166203 ns | 1.00 |
| array/iteration/findmin/2d | 155752 ns | 156173 ns | 1.00 |
| array/iteration/logical | 348885 ns | 346025 ns | 1.01 |
| array/iteration/scalar | 288354 ns | 289864 ns | 0.99 |
| array/permutedims/2d | 74901 ns | 64761 ns | 1.16 |
| array/permutedims/3d | 74231 ns | 73791 ns | 1.01 |
| array/permutedims/4d | 76831 ns | 76481 ns | 1.00 |
| array/random/rand/Float32 | 50981 ns | 51540 ns | 0.99 |
| array/random/rand/Int64 | 57291 ns | 56210 ns | 1.02 |
| array/random/rand!/Float32 | 92261 ns | 142162 ns | 0.65 |
| array/random/rand!/Int64 | 116052 ns | 141832 ns | 0.82 |
| array/random/randn/Float32 | 100182 ns | 86921 ns | 1.15 |
| array/random/randn!/Float32 | 116242 ns | 152202 ns | 0.76 |
| array/reductions/mapreduce/Float32/1d | 132841 ns | 132902 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 95592 ns | 95052 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1L | 772881 ns | 777081 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2 | 96871 ns | 96731 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 298774 ns | 299584 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 134062 ns | 133322 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1 | 95462 ns | 78081 ns | 1.22 |
| array/reductions/mapreduce/Int64/dims=1L | 783781 ns | 783471 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 94881 ns | 96252 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 300434 ns | 308254 ns | 0.97 |
| array/reductions/reduce/Float32/1d | 132662 ns | 132802 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1 | 95001 ns | 94832 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 773371 ns | 774621 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 97222 ns | 96802 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 297484 ns | 307245 ns | 0.97 |
| array/reductions/reduce/Int64/1d | 133152 ns | 129672 ns | 1.03 |
| array/reductions/reduce/Int64/dims=1 | 94852 ns | 78151 ns | 1.21 |
| array/reductions/reduce/Int64/dims=1L | 780821 ns | 781931 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 96111 ns | 96192 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 303334 ns | 298414 ns | 1.02 |
| array/reverse/1d | 43921 ns | 44380 ns | 0.99 |
| array/reverse/1dL | 75401 ns | 74131 ns | 1.02 |
| array/reverse/1dL_inplace | 112692 ns | 108282 ns | 1.04 |
| array/reverse/1d_inplace | 77891 ns | 86471 ns | 0.90 |
| array/reverse/2d | 52211 ns | 50661 ns | 1.03 |
| array/reverse/2dL | 101491 ns | 100341 ns | 1.01 |
| array/reverse/2dL_inplace | 130102 ns | 117622 ns | 1.11 |
| array/reverse/2d_inplace | 79911 ns | 95391 ns | 0.84 |
| array/sorting/1d | 341775 ns | 341945 ns | 1.00 |
| integration/byval/reference | 39130 ns | 38830 ns | 1.01 |
| integration/byval/slices=1 | 40161 ns | 40880 ns | 0.98 |
| integration/byval/slices=2 | 140522 ns | 158462 ns | 0.89 |
| integration/byval/slices=3 | 237983 ns | 238013 ns | 1.00 |
| integration/volumerhs | 5050459 ns | 4942659 ns | 1.02 |
| kernel/indexing | 129511 ns | 43630 ns | 2.97 |
| kernel/indexing_checked | 124942 ns | 128022 ns | 0.98 |
| kernel/launch | 1310 ns | 1290 ns | 1.02 |
| kernel/rand | 123941 ns | 106671 ns | 1.16 |
| latency/import | 1493016620 ns | 1501349912 ns | 0.99 |
| latency/precompile | 11942427457 ns | 12041117438 ns | 0.99 |
| latency/ttfp | 10901786827 ns | 10491950084 ns | 1.04 |

This comment was automatically generated by workflow using github-action-benchmark.


luraess (Member, Author) commented Jun 24, 2026

cscs-ci run

