pytest --dist=loadgroup hangs if a crashed worker is restarted #1323

Description

@radoering

When running the following test file

def test_1():
    assert True

def test_2():
    import time
    time.sleep(5)
    assert True

with this command (which uses features from pytest 9)

pytest -n1 --dist=loadgroup -o faulthandler_timeout=1 -o faulthandler_exit_on_timeout=true testing/test_timeout.py

test execution hangs with

replacing crashed worker gw0
collecting: 1/2 workers
collecting: 1/2 workers
2 workers [2 items]

When using another test distribution algorithm, test execution does not hang.

The issue is that

# Made uncompleted work unit available again
self.workqueue.update(workload)

adds the entire workload back to the queue, including work units that have already completed.

Later in

nodeids_indexes = [
    worker_collection.index(nodeid)
    for nodeid, completed in work_unit.items()
    if not completed
]

completed work items are filtered out, so nodeids_indexes is empty for a work unit whose items have all completed.
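The two steps can be reproduced in isolation. The following is a minimal sketch with simplified stand-in data, not the actual pytest-xdist scheduler code. It assumes each test forms its own work unit and that a workload maps a scope to a `{nodeid: completed}` dict. The helper names `pending_indexes` and `requeue_pending_only` are hypothetical, and the latter is only one possible way to avoid re-queuing finished units.

```python
from collections import OrderedDict

# Stand-in data (assumption): node IDs as collected by the replacement worker.
worker_collection = ["test_timeout.py::test_1", "test_timeout.py::test_2"]

# Workload of the crashed worker: scope -> {nodeid: completed}.
workload = OrderedDict(
    [
        ("test_timeout.py::test_1", {"test_timeout.py::test_1": True}),   # finished
        ("test_timeout.py::test_2", {"test_timeout.py::test_2": False}),  # not finished
    ]
)

# Step 1: re-queue the whole workload, finished units included.
workqueue = OrderedDict()
workqueue.update(workload)


def pending_indexes(work_unit, collection):
    """Hypothetical helper mirroring the list comprehension quoted above."""
    return [
        collection.index(nodeid)
        for nodeid, completed in work_unit.items()
        if not completed
    ]


# Step 2: compute the indexes that would be sent for each queued work unit.
indexes_per_unit = {
    scope: pending_indexes(unit, worker_collection)
    for scope, unit in workqueue.items()
}


def requeue_pending_only(workload):
    """Hypothetical alternative: keep only units with at least one pending item."""
    return OrderedDict(
        (scope, unit) for scope, unit in workload.items() if not all(unit.values())
    )


filtered = requeue_pending_only(workload)
```

In this sketch the finished unit for test_1 yields an empty index list, while `requeue_pending_only` keeps only the unit for test_2.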

Finally, the process hangs in

self.nextitem_index = self.torun.get()
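To illustrate why an empty list of indexes leads to a hang, here is a small sketch that assumes `torun` behaves like a standard blocking queue. The queue type actually used by pytest-xdist may differ, and `next_index` is a hypothetical helper. A `get()` without a timeout waits until an item arrives, so if nothing is ever put on the queue the call never returns.

```python
import queue


def next_index(torun, timeout=None):
    """Return the next item index, or None if nothing arrives within `timeout`.

    With timeout=None this blocks indefinitely on an empty queue.
    """
    try:
        return torun.get(timeout=timeout)
    except queue.Empty:
        return None


torun = queue.Queue()

# Nothing has been scheduled. A short timeout is used here so the example
# terminates; without it the call would block forever.
result = next_index(torun, timeout=0.1)
```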
