This issue looks like #1323 from the outside, but it is triggered by a slightly different setup. Its root cause is different and requires a different fix, so I created a new issue.
When running the following test file

```python
def test_1():
    import time
    time.sleep(5)
    assert True


def test_2():
    assert True
```

with

```shell
pytest -n1 --dist=loadgroup -o faulthandler_timeout=1 -o faulthandler_exit_on_timeout=true testing/test_timeout.py
```

test execution hangs with

```
replacing crashed worker gw0
2 workers [2 items]
```
The issue is that only one item is assigned to the new worker in `pytest-xdist/src/xdist/scheduler/loadscope.py`, lines 352 to 355 (at commit 8fed345):

```python
if self.collection is not None:
    for node in self.nodes:
        self._reschedule(node)
    return
```

This item is removed from the queue in `pytest-xdist/src/xdist/remote.py`, line 204 (at 8fed345):

```python
self.nextitem_index = self.torun.get()
```

Then, in `pytest-xdist/src/xdist/remote.py`, line 214 (at 8fed345), the worker waits for a second item:

```python
self.nextitem_index = self.torun.get()
```

Since no second item is ever sent to the replacement worker, it blocks there and the run hangs.

In contrast, at first start (when `self.collection` is still `None`), two items are added to the queue via `pytest-xdist/src/xdist/scheduler/loadscope.py`, lines 396 to 402 (at 8fed345):

```python
# Assign initial workload
for node in self.nodes:
    self._assign_work_unit(node)

# Ensure nodes start with at least two work units if possible (#277)
for node in self.nodes:
    self._reschedule(node)
```
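The interaction between the two sides can be shown with a small toy model in plain Python. This is not pytest-xdist code. It assumes a worker that, like the two `torun.get()` calls quoted above, needs a second queue entry (another item or a shutdown marker) before it runs the current item. It also assumes a simplified `reschedule()` that hands out one pending item, or a shutdown marker when nothing is pending. `SHUTDOWN`, `reschedule`, `worker_loop` and `simulate` are hypothetical names. A timeout replaces the real blocking `get()` so that the stall can be observed instead of hanging.

```python
import queue

# Hypothetical stand-in for the worker's shutdown marker.
SHUTDOWN = object()


def reschedule(torun, pending):
    """Toy scheduler step: send one pending item, or SHUTDOWN if none is left."""
    if pending:
        torun.put(pending.pop(0))
    else:
        torun.put(SHUTDOWN)


def worker_loop(torun, executed, wait=0.5):
    """Toy worker loop: an item only runs once the *next* queue entry is known.

    Returns True if the loop reached SHUTDOWN, False if it stalled.
    """
    try:
        nextitem = torun.get(timeout=wait)  # first get(): take the first item
        while nextitem is not SHUTDOWN:
            current = nextitem
            # second get(): wait for the following item (or SHUTDOWN)
            nextitem = torun.get(timeout=wait)
            executed.append(current)  # "run" current only after that
        return True
    except queue.Empty:
        # The quoted torun.get() calls pass no timeout, so the real worker
        # would block here forever instead of returning.
        return False


def simulate(rounds, pending):
    """Assign work to a fresh worker with `rounds` reschedule() calls, then run it."""
    torun = queue.Queue()
    pending = list(pending)  # don't mutate the caller's list
    for _ in range(rounds):
        reschedule(torun, pending)
    executed = []
    finished = worker_loop(torun, executed)
    return finished, executed


# One reschedule call (like the replacement-worker path): the worker stalls.
print(simulate(rounds=1, pending=["test_2"]))  # (False, [])
# Two reschedule calls (like the first-start path): the item runs, then SHUTDOWN.
print(simulate(rounds=2, pending=["test_2"]))  # (True, ['test_2'])
```

In the scenario above, presumably only `test_2` is still pending after `gw0` crashes. With a single reschedule call, the toy worker never gets past its second `get()`. With two calls, the second one supplies the entry the worker is waiting for. This only illustrates the mechanism under the stated assumptions; it is not a proposed patch.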