Clearer error when a job hits a full disk (ENOSPC) (#1899) #2293
Open
arimu1 wants to merge 1 commit into
…anguage#1899)

When the temporary or output directory fills up mid-job, the OSError (errno 28, ENOSPC) fell through to the generic 'Exception while running job' handler and dumped a traceback, giving the user no idea that the real problem was a full disk. Add a dedicated ENOSPC branch that logs an actionable message naming the tmpdir/outdir and pointing at --tmpdir-prefix / --outdir. Also switch the existing errno == 2 check to the named errno.ENOENT for clarity.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
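The commit message describes the fix as one more per-errno branch inside an existing `OSError` handler. The dispatch pattern can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not cwltool's code: `report_job_oserror`, the `job_sketch` logger and the exact message wording are hypothetical, and the real branch lives in `JobBase._execute` in `cwltool/job.py`.

```python
import errno
import logging

_logger = logging.getLogger("job_sketch")  # hypothetical logger, not cwltool's


def report_job_oserror(exc: OSError, tmpdir: str, outdir: str, debug: bool = False) -> str:
    """Log a user-facing message for an OSError raised while a job runs.

    Hypothetical helper. Call it from inside an ``except OSError`` block so
    that ``exc_info`` can pick up the active exception. Returns the name of
    the branch that handled the error, so callers and tests can tell them apart.
    """
    if exc.errno == errno.ENOENT:  # named constant instead of the magic number 2
        _logger.error("'%s' not found: %s", exc.filename, exc, exc_info=debug)
        return "not-found"
    if exc.errno == errno.ENOSPC:  # errno 28, "No space left on device"
        _logger.error(
            "No space left on device while running the job "
            "(tmpdir: %s, outdir: %s). Free up space, or point "
            "--tmpdir-prefix / --outdir at a larger filesystem.",
            tmpdir,
            outdir,
            exc_info=debug,  # full traceback only when debugging
        )
        return "disk-full"
    # Every other OSError keeps the generic path, traceback included.
    _logger.error("Exception while running job", exc_info=True)
    return "generic"


try:
    raise OSError(errno.ENOSPC, "No space left on device")
except OSError as e:
    branch = report_job_oserror(e, tmpdir="/tmp/job", outdir="/data/out")
```

Keeping the ENOSPC case next to the ENOENT case means every "known" OS failure gets a short, actionable message, while anything unexpected still falls through to the traceback that maintainers need.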
Member
@arimu1 thanks for using cwltool and contributing your fixes. Let's get the existing PRs resolved before opening more. Perhaps you can fix the Codecov upload issue; maybe the version needs bumping? If that doesn't work, I would accept temporarily disabling Codecov, but then reviews will take longer, as I will have to check the code coverage locally myself.
Problem
Fixes #1899.
When the user's temporary drive (or output directory) fills up while a job is running, the underlying `OSError` carries `errno` 28 (`ENOSPC`, "No space left on device"). In `JobBase._execute` this fell through to the catch-all branch: … which dumps a stack trace and gives the user no hint that the actual problem is a full disk (the symptom reported on the CWL forum thread linked from the issue: a tool silently produces empty outputs).
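Python maps some errno values to dedicated `OSError` subclasses (`ENOENT` becomes `FileNotFoundError`), but there is no subclass for `ENOSPC`, so a full disk can only be recognized by inspecting `exc.errno`. A small sketch of that check; `is_disk_full` is a hypothetical helper for illustration, not part of cwltool.

```python
import errno
import os


def is_disk_full(exc: BaseException) -> bool:
    """Return True if exc is an OSError reporting ENOSPC ("No space left on device")."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOSPC


# ENOENT gets its own subclass, so an isinstance check works for "not found"...
missing = OSError(errno.ENOENT, os.strerror(errno.ENOENT))
print(type(missing).__name__)  # FileNotFoundError

# ...but ENOSPC stays a plain OSError; only the errno value identifies it.
full = OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
print(type(full).__name__, errno.errorcode[full.errno], is_disk_full(full))  # OSError ENOSPC True
```

Comparing against `errno.ENOSPC` rather than the literal `28` also keeps the check portable and self-documenting, which is the same reasoning behind replacing `e.errno == 2` with `errno.ENOENT`.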
Fix
Add a dedicated `ENOSPC` branch to the existing `OSError` handler in `cwltool/job.py`, right next to the `ENOENT` ("command not found") case that is already special-cased there. Instead of a traceback, it logs a concise, actionable message that names the tmpdir/outdir and points at `--tmpdir-prefix` / `--outdir`. The full traceback is still available with `--debug` (the branch passes `exc_info=runtimeContext.debug`, matching the sibling cases). I also replaced the existing magic-number check `e.errno == 2` with the named constant `errno.ENOENT` for readability, now that `errno` is imported.

Tests
`tests/test_examples.py::test_disk_full_error_message` injects an `OSError(errno.ENOSPC, …)` at the job's output-handling stage (monkeypatching `cwltool.job.bytes2str_in_dicts`) during a normal `echo` run, and asserts:

- … `--tmpdir-prefix` guidance,
- no `Traceback (most recent call last)` is printed,
- … `permanentFail`).

I verified that this test fails against the unpatched code (it hits the old generic-exception/traceback path), which shows that it guards the new behavior.
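The test's approach (force the failure at the output stage by monkeypatching, then inspect both the log and the final status) can be reproduced with only the standard library. In this sketch `ToyJob`, `collect_outputs` and `check_disk_full_message` are hypothetical stand-ins for cwltool's job runner, `bytes2str_in_dicts` and the real pytest test, which runs an actual `echo` workflow; `unittest.mock.patch.object` plays the role of pytest's `monkeypatch.setattr` only to keep the sketch dependency-free.

```python
import errno
import io
import logging
from unittest import mock

_logger = logging.getLogger("toy_job")  # hypothetical logger for this sketch


class ToyJob:
    """Hypothetical stand-in for a job runner; NOT cwltool's JobBase."""

    def collect_outputs(self) -> dict:
        # Stand-in for the output-handling step that the real test patches.
        return {"out": "hello"}

    def run(self, debug: bool = False) -> str:
        try:
            self.collect_outputs()
        except OSError as e:
            if e.errno == errno.ENOSPC:
                _logger.error(
                    "No space left on device; free up space or pass "
                    "--tmpdir-prefix / --outdir.",
                    exc_info=debug,  # no traceback unless debugging
                )
            else:
                _logger.error("Exception while running job", exc_info=True)
            return "permanentFail"
        return "success"


def check_disk_full_message() -> str:
    """Inject ENOSPC at the output stage and return everything that was logged."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)  # capture log output in memory
    _logger.addHandler(handler)
    try:
        # Monkeypatch the output step so it raises a full-disk OSError.
        with mock.patch.object(
            ToyJob,
            "collect_outputs",
            side_effect=OSError(errno.ENOSPC, "No space left on device"),
        ):
            status = ToyJob().run()
    finally:
        _logger.removeHandler(handler)
    log = stream.getvalue()
    assert "--tmpdir-prefix" in log
    assert "Traceback (most recent call last)" not in log
    assert status == "permanentFail"  # the job must still be reported as failed
    return log
```

Checking the final status as well as the log matters: a nicer message is only an improvement if the job is still reported as failed rather than silently producing empty outputs.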
`black`, `flake8`, and `isort` are clean on the changed files.

This addresses the "provide a better error message" ask in #1899. Checking free space proactively, before the disk actually fills, is a larger, platform-specific change and is not attempted here.
This change was produced with the assistance of Claude Code (model: Claude Opus). The diff, root-cause analysis, and test were reviewed by a human before submission.