Commit 25f5912

Skip hipStreamSynchronize on iGPU when stream is idle
On integrated GPUs with fine-grained coherent unified memory, hipStreamSynchronize is unnecessary when the stream has no pending work. Use the non-blocking hipStreamQuery to check first, and sync only when the stream is actually busy. Results: hipStreamSynchronize calls reduced from 5683 to 54 (-99%).
1 parent ef8190c commit 25f5912

1 file changed

Lines changed: 6 additions & 1 deletion

mlx/backend/rocm/allocator.cpp
@@ -550,7 +550,12 @@ void* Buffer::raw_ptr() {
   auto& cbuf = *static_cast<rocm::RocmBuffer*>(ptr_);
 
   if (cbuf.device == -1) {
-    (void)hipStreamSynchronize(nullptr);
+    // Unified memory on iGPU: fine-grained coherent memory means CPU sees
+    // GPU writes without explicit sync. Only sync if the stream has pending
+    // work (hipStreamQuery returns hipErrorNotReady when busy).
+    if (hipStreamQuery(nullptr) != hipSuccess) {
+      (void)hipStreamSynchronize(nullptr);
+    }
   } else {
     (void)hipDeviceSynchronize();
     rocm::allocator().move_to_unified_memory(cbuf);
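
The query-then-sync pattern in this change can be illustrated in isolation. The sketch below is not the MLX code and does not call HIP: `Status`, `FakeStream`, and `sync_if_busy` are hypothetical stand-ins, used only to show that the blocking call is skipped when the stream reports idle and taken when it reports pending work.

```cpp
#include <cassert>

// Hypothetical stand-in for the HIP status codes (hipSuccess / hipErrorNotReady).
enum class Status { Success, NotReady };

// Hypothetical fake stream: tracks pending work and counts blocking syncs.
struct FakeStream {
  int pending = 0;     // queued operations that have not finished yet
  int sync_calls = 0;  // number of times the blocking sync was invoked

  // Non-blocking check, playing the role of hipStreamQuery.
  Status query() const {
    return pending == 0 ? Status::Success : Status::NotReady;
  }

  // Blocking wait, playing the role of hipStreamSynchronize; drains all work.
  Status synchronize() {
    ++sync_calls;
    pending = 0;
    return Status::Success;
  }
};

// Query first and block only when the stream is not idle. Any result other
// than Success falls through to the blocking sync, mirroring the
// `!= hipSuccess` comparison in the diff.
inline void sync_if_busy(FakeStream& stream) {
  if (stream.query() != Status::Success) {
    (void)stream.synchronize();
  }
}
```

Because the condition is "anything other than success", an error returned by the query also takes the blocking path, so the change never skips a sync that the old code would have relied on to surface a failure.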
