Replace CUDA.synchronize() with CuEvent-based synchronization by victorcamaraa · Pull Request #700 · JuliaParallel/Dagger.jl

victorcamaraa · 2026-04-17T21:10:21Z

Replaces all CUDA.synchronize() device-wide barriers in CUDAExt.jl with CuEvent-based synchronization.

The old approach issued a full cuDeviceSynchronize on every data movement and gpu_synchronize() call, stalling the CPU until the entire device was idle. The new approach records a CuEvent on the relevant stream and issues a GPU-side wait, so only the streams that actually share data are synchronized and the CPU is never blocked unnecessarily.

The one exception is DtoH transfers, where the CPU genuinely needs to wait for data to arrive in host memory. Those use CUDA.synchronize(ev), which is a CPU wait scoped to a single event rather than the full device.

Changes

_sync_with_context: captures caller's stream before context switch, replaces CUDA.synchronize() with CuEvent record + GPU-side CUDA.wait
gpu_synchronize: captures user's ambient stream before context switch, same event pattern
HtoD move: drops CUDA.synchronize() entirely — stream ordering is sufficient
DtoH move: replaces full barrier with CUDA.synchronize(ev) scoped to the active stream
DtoD same-device move: drops barrier — task graph ordering guarantees upstream completion
DtoD cross-device move: replaces source-side barrier with a CuEvent that the destination stream GPU-waits on
gpu_synchronize(::Val{:CUDA}): refactored to call gpu_synchronize(proc) per device for consistency
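
A minimal sketch of the event pattern described in the list above, not the code from CUDAExt.jl. It assumes CUDA.jl's CuEvent, CUDA.record, CUDA.wait, CUDA.stream! and CUDA.synchronize; the helper name order_streams! and the producer/consumer streams are hypothetical and exist only for illustration.

```julia
using CUDA

# Hypothetical helper (not part of Dagger.jl): make `consumer` wait, on the GPU,
# for all work currently queued on `producer`, without blocking the CPU.
function order_streams!(producer::CuStream, consumer::CuStream)
    ev = CUDA.CuEvent()
    CUDA.record(ev, producer)  # marks the current tail of `producer`
    CUDA.wait(ev, consumer)    # GPU-side wait; the host call returns immediately
    return ev
end

if CUDA.functional()
    producer = CUDA.CuStream()
    consumer = CUDA.CuStream()

    a = CUDA.zeros(Float32, 1024)
    CUDA.synchronize()  # make sure the initial fill is done before switching streams

    # Queue an update on the producer stream.
    CUDA.stream!(producer) do
        a .+= 1f0
    end

    # Order the consumer stream after the producer stream on the device.
    order_streams!(producer, consumer)

    # Work queued on the consumer stream runs after the producer's update.
    b = CUDA.stream!(consumer) do
        a .* 2f0
    end

    # DtoH: the host needs the data, so a CPU-side wait on that stream is needed.
    CUDA.synchronize(consumer)
    result = Array(b)
end
```

The only host-side wait here is the one before the copy to host memory; the ordering between the two streams is enforced on the device by the event.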

Notes:

These implementations were tested on a single GTX 1060 6GB, so I was not able to test the DtoD paths.
Debugging and partial implementation were assisted by Claude Sonnet 4.7 and Gemini

jpsamaroo

Great work! I have some tweaks for you to make, but this otherwise is looking good!

jpsamaroo · 2026-04-21T18:16:56Z

+        CUDA.record(ev, stream())
+        CUDA.synchronize(ev) # CPU waits ONLY for this stream to finish
+
+        return adapt(Array, x)


I don't think this is necessary - CUDA.synchronize() is a stream-local sync (it's the same as CUDA.synchronize(stream())).

jpsamaroo · 2026-04-21T18:18:01Z

 function Dagger.move(from_proc::CPUProc, to_proc::CuArrayDeviceProc, x)
    with_context(to_proc) do
        arr = adapt(CuArray, x)
-        CUDA.synchronize()


These are necessary to ensure that x isn't modified before it's read into arr, I believe.

jpsamaroo · 2026-04-21T18:18:32Z

+
        _x = Array{T,N}(undef, size(x))
        copyto!(_x, x)
-        CUDA.synchronize()


Same with this function, this shouldn't be necessary.

jpsamaroo · 2026-04-21T21:02:55Z

                Array(unwrap(x))
            end
-        end
+        end       


Suggested change (remove the trailing whitespace):

        end

jpsamaroo · 2026-04-21T21:03:45Z

+                ev = CUDA.CuEvent()
+                CUDA.record(ev, stream())
+                CUDA.synchronize(ev) 


This could actually just be done as CUDA.synchronize(); we don't need an event here.

jpsamaroo · 2026-04-21T21:04:01Z

    if from_proc == to_proc
-        with_context(CUDA.synchronize, from_proc)
        return x
+


Suggested change: remove the added blank line.

jpsamaroo · 2026-04-21T21:15:14Z

        host_copy = with_context(from_proc) do
+            ev = CUDA.CuEvent()
+            CUDA.record(ev, stream())
+            CUDA.synchronize(ev) 


This one can probably also just be CUDA.synchronize().

jpsamaroo approved these changes Apr 21, 2026

Replace CUDA.synchronize() with CuEvent-based synchronization

7b6da27

victorcamaraa force-pushed the CuEvent-sync branch from f45a3d2 to 7b6da27 · April 22, 2026 21:00

victorcamaraa wants to merge 1 commit into JuliaParallel:master from victorcamaraa:CuEvent-sync
