Commit 6d1a540

data: Yield the CPU in the NBX exchange loop under oversubscription
nbx_exchange polled with a tight 'while True: Iprobe' busy-wait, spinning at 100% CPU while waiting for peer messages and the termination barrier. When ranks are oversubscribed (more ranks than cores), a spinning rank starves the very peer it is waiting on, so MPI makes no progress and the exchange deadlocks. This is exactly the MPI-notebook CI configuration: a 4-engine ipyparallel cluster on a 2-core runner (OpenMPI --oversubscribe), where it manifested as a multi-minute cell timeout.

Yield the CPU (time.sleep(0)) whenever a poll pass finds nothing ready, so co-scheduled ranks can run. After a successful receive, loop again immediately so that any further ready messages are drained before yielding.

Verified: correctness unchanged (44 routed/gather mode-4 tests pass), and 16 ranks on 8 cores complete in 0.35s instead of hanging.
1 parent 82081ba commit 6d1a540

1 file changed

Lines changed: 10 additions & 1 deletion

devito/data/distributed/transport.py

@@ -8,6 +8,8 @@
 graph communicator without affecting the layers above.
 """
 
+import time
+
 import numpy as np
 
 from devito.mpi import MPI
@@ -83,12 +85,19 @@ def nbx_exchange(comm, sendbufs, dtype, tag=0):
             buf = np.empty(count, dtype=dtype)
             comm.Recv([buf, mpitype], source=src, tag=tag)
             recvd[src] = buf
-        elif barrier is None:
+            # Drain any further ready messages before yielding
+            continue
+        if barrier is None:
             if MPI.Request.Testall(sends):
                 # All my sends were matched -> announce I am done sending
                 barrier = comm.Ibarrier()
         elif barrier.Test():
             # Everyone is done sending and nothing is in flight
             break
+        # Nothing was ready this pass: yield the CPU so co-scheduled ranks can
+        # make progress. Without this the probe loop busy-waits at 100%, which
+        # deadlocks the exchange when ranks are oversubscribed (more ranks than
+        # cores, e.g. the 4-engine ipyparallel cluster on a 2-core CI runner).
+        time.sleep(0)
 
     return recvd
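
The drain-then-yield pattern in this change can be tried without MPI. The following is a minimal standard-library sketch, not Devito code: `poll_with_yield`, `run_demo`, the `queue.Queue` inbox, and the `threading.Event` completion flag are all hypothetical stand-ins for `Iprobe`/`Recv` and the `Ibarrier` test. It only illustrates the control flow: keep receiving while something is ready, and call `time.sleep(0)` only on a pass that found nothing.

```python
import queue
import threading
import time


def poll_with_yield(inbox, done):
    """Collect items from `inbox` until `done` is set and the inbox is empty.

    `inbox` (a queue.Queue) stands in for incoming messages and `done`
    (a threading.Event) stands in for the termination barrier. Both are
    illustrative assumptions, not part of the original code.
    """
    received = []
    while True:
        # Sample the completion flag before polling, so an item enqueued
        # just before the flag was set cannot be missed.
        finished = done.is_set()
        try:
            item = inbox.get_nowait()  # non-blocking poll
        except queue.Empty:
            if finished:
                break
            # Nothing was ready this pass: yield instead of spinning.
            time.sleep(0)
        else:
            # Something was ready: record it and poll again right away,
            # draining further ready items before yielding.
            received.append(item)
    return received


def run_demo(n_items=5):
    """Run one producer thread against the polling loop and return the items."""
    inbox = queue.Queue()
    done = threading.Event()

    def producer():
        for i in range(n_items):
            inbox.put(i)
        done.set()  # announce that no more items will arrive

    worker = threading.Thread(target=producer)
    worker.start()
    received = poll_with_yield(inbox, done)
    worker.join()
    return received
```

Calling `run_demo()` returns the produced items in order. The yield sits only on the empty-poll path, so it adds no delay while messages are still arriving.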
