Clearer error when a job hits a full disk (ENOSPC) (#1899) by arimu1 · Pull Request #2293 · common-workflow-language/cwltool

arimu1 · 2026-06-19T03:20:00Z

Problem

When the user's temporary drive (or output directory) fills up while a job is running, the underlying OSError carries errno 28 (ENOSPC, "No space left on device"). In JobBase._execute this fell through to the catch-all branch:

else:
    _logger.exception("Exception while running job: %s", str(e), ...)
processStatus = "permanentFail"

…which dumps a stack trace and gives the user no hint that the actual problem is a full disk (the symptom reported on the CWL forum thread linked from the issue: a tool silently produces empty outputs).

Fix

Add a dedicated ENOSPC branch to the existing OSError handler in cwltool/job.py, right next to the ENOENT ("command not found") case that's already special-cased there. It logs a concise, actionable message instead of a traceback:

[job foo] No space left on device. The temporary directory (/tmp/…) and/or
output directory (/…/out) may be full; free up space, or point --tmpdir-prefix
and --outdir at a location with more capacity.

The full traceback is still available with --debug (the branch passes exc_info=runtimeContext.debug, matching the sibling cases). I also replaced the magic number in the existing e.errno == 2 check with the named constant errno.ENOENT for readability, now that errno is imported.

Tests

tests/test_examples.py::test_disk_full_error_message injects an OSError(errno.ENOSPC, …) at the job's output-handling stage (monkeypatching cwltool.job.bytes2str_in_dicts) during a normal echo run, and asserts:

the message contains "No space left on device" and the --tmpdir-prefix guidance,
no raw Traceback (most recent call last) is printed,
exit code is 1 (permanentFail).
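The fault-injection pattern behind the test can be sketched in isolation like this. It is not the test in tests/test_examples.py: fake_job, postprocess, run_job, and check_disk_full_message are hypothetical stand-ins (the real test patches cwltool.job.bytes2str_in_dicts), and unittest.mock is used here in place of pytest's monkeypatch so the snippet runs without pytest.

```python
import errno
import io
import logging
import types
from unittest import mock

# Hypothetical stand-in for the module under test and its output-handling hook.
fake_job = types.SimpleNamespace(postprocess=lambda outputs: outputs)
log = logging.getLogger("fault_injection_sketch")


def run_job() -> int:
    """Return 0 on success, 1 on failure, logging a short ENOSPC message."""
    try:
        fake_job.postprocess({"out": "hello"})
    except OSError as e:
        if e.errno == errno.ENOSPC:
            log.error("No space left on device; see --tmpdir-prefix and --outdir")
        else:
            log.exception("Exception while running job: %s", e)
        return 1
    return 0


def check_disk_full_message() -> str:
    """Inject ENOSPC at the hook, check the exit code, return the captured log."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    log.addHandler(handler)
    try:
        boom = OSError(errno.ENOSPC, "No space left on device")
        # The patch is undone when the with-block exits.
        with mock.patch.object(fake_job, "postprocess", side_effect=boom):
            exit_code = run_job()
    finally:
        log.removeHandler(handler)
    assert exit_code == 1
    return stream.getvalue()
```

Injecting the error at a late hook rather than actually filling a disk keeps the test fast and portable; the cost is that it exercises only the handler, not the conditions that produce ENOSPC in practice.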

I verified this test fails against the unpatched code (it hits the old generic-exception/traceback path), proving it guards the behavior. black, flake8, and isort are clean on the changed files.

This addresses the "provide a better error message" ask in #1899. Actually probing free space ahead of time is a larger, platform-specific change and isn't attempted here.

This change was produced with the assistance of Claude Code (model: Claude Opus). The diff, root-cause analysis, and test were reviewed by a human before submission.

…anguage#1899) When the temporary or output directory fills up mid-job, the OSError (errno 28, ENOSPC) fell through to the generic 'Exception while running job' handler and dumped a traceback, giving the user no idea the real problem was a full disk. Add a dedicated ENOSPC branch that logs an actionable message naming the tmpdir/outdir and pointing at --tmpdir-prefix / --outdir. Also switch the existing errno == 2 check to the named errno.ENOENT for clarity. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

mr-c · 2026-06-19T09:52:04Z

@arimu1 thanks for using cwltool and contributing your fixes. Let's get the existing PRs resolved before opening more.

Perhaps you can fix the codecov upload issue. Maybe the version needs bumping? If that doesn't work, I would accept temporarily disabling CodeCov, but then reviews will take longer as I will have to personally check the code coverage locally.

arimu1 wants to merge 1 commit into common-workflow-language:main from arimu1:fix/1899-disk-full-error
