[2026-04-30T22:47:19+0000] [MainThread] [C] [toil.worker] Worker crashed with traceback:
Traceback (most recent call last):
File "/builds/databiosphere/toil/venv/lib/python3.14/site-packages/docker/api/client.py", line 275, in _raise_for_status
response.raise_for_status()
~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/builds/databiosphere/toil/venv/lib/python3.14/site-packages/requests/models.py", line 1028, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.50/services/create
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/builds/databiosphere/toil/src/toil/worker.py", line 591, in workerScript
job._runner(
~~~~~~~~~~~^
jobGraph=None,
^^^^^^^^^^^^^^
...<2 lines>...
defer=defer,
^^^^^^^^^^^^
)
^
File "/builds/databiosphere/toil/src/toil/job.py", line 3376, in _runner
returnValues = self._run(jobGraph=None, fileStore=fileStore)
File "/builds/databiosphere/toil/src/toil/job.py", line 3254, in _run
return self.run(fileStore)
~~~~~~~~^^^^^^^^^^^
File "/builds/databiosphere/toil/src/toil/wdl/wdltoil.py", line 333, in decorated
return decoratee(*args, **kwargs)
File "/builds/databiosphere/toil/src/toil/wdl/wdltoil.py", line 4540, in run
task_container.run(miniwdl_logger, command_string)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/builds/databiosphere/toil/venv/lib/python3.14/site-packages/WDL/runtime/task_container.py", line 323, in run
exit_code = self._run(logger, terminating, command)
File "/builds/databiosphere/toil/venv/lib/python3.14/site-packages/WDL/runtime/backend/docker_swarm.py", line 233, in _run
svc = client.services.create(image_tag, **kwargs)
File "/builds/databiosphere/toil/venv/lib/python3.14/site-packages/docker/models/services.py", line 235, in create
service_id = self.client.api.create_service(**create_kwargs)
File "/builds/databiosphere/toil/venv/lib/python3.14/site-packages/docker/utils/decorators.py", line 32, in wrapper
return f(self, *args, **kwargs)
File "/builds/databiosphere/toil/venv/lib/python3.14/site-packages/docker/api/service.py", line 187, in create_service
return self._result(
~~~~~~~~~~~~^
self._post_json(url, data=data, headers=headers), True
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/builds/databiosphere/toil/venv/lib/python3.14/site-packages/docker/api/client.py", line 281, in _result
self._raise_for_status(response)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
File "/builds/databiosphere/toil/venv/lib/python3.14/site-packages/docker/api/client.py", line 277, in _raise_for_status
raise create_api_error_from_http_exception(e) from e
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
File "/builds/databiosphere/toil/venv/lib/python3.14/site-packages/docker/errors.py", line 39, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation) from e
docker.errors.APIError: 500 Server Error for http+docker://localhost/v1.50/services/create: Internal Server Error ("rpc error: code = DeadlineExceeded desc = context deadline exceeded")
CI tests can fail intermittently, for reasons unrelated to the change under test, with errors like the traceback above.
See: https://ucsc-ci.com/databiosphere/toil/-/jobs/108201/raw
We should add retry logic, either in MiniWDL (preferred) or at the point where we call into MiniWDL, that retries with exponential backoff when Docker fails to create containers for transient reasons like this.
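A minimal sketch of what that retry could look like, not a proposed patch. The names `retry_with_backoff` and `is_transient` are hypothetical and exist in neither Toil nor MiniWDL, and matching on the `DeadlineExceeded` text from the error above is an assumption about how to recognize a retryable failure. The sketch deliberately avoids importing `docker` so it runs on its own.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_transient(exc: Exception) -> bool:
    """Guess whether an error is worth retrying.

    Assumption: transient Docker failures can be recognized by the
    "DeadlineExceeded" text seen in the traceback in this issue.
    """
    message = str(exc)
    return "DeadlineExceeded" in message or "context deadline exceeded" in message


def retry_with_backoff(
    operation: Callable[[], T],
    should_retry: Callable[[Exception], bool] = is_transient,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying retryable failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on errors that look permanent.
            if attempt == max_attempts or not should_retry(exc):
                raise
            # The delay ceiling doubles each attempt (1s, 2s, 4s, ...), capped
            # at max_delay. A random wait below the ceiling ("full jitter")
            # keeps concurrent workers from retrying in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Going by the traceback, the call that fails is `client.services.create(image_tag, **kwargs)` in MiniWDL's `docker_swarm.py`, so that is the narrowest place to wrap. Wrapping `task_container.run(...)` on the Toil side would also work, but a retry there could re-run more than just service creation, so the predicate would need to be confident that the task command never started.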
┆Issue is synchronized with this Jira Story
┆Issue Number: TOIL-1832