Add error handling for gzipped file decompression and external command failures in fread()#7097
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
```diff
@@           Coverage Diff           @@
##           master    #7097   +/-   ##
=======================================
  Coverage   98.69%   98.69%
=======================================
  Files          79       79
  Lines       14680    14687    +7
=======================================
+ Hits        14489    14496    +7
  Misses        191      191
```
I still see the tests failing, PTAL & ping if you're stuck.
Thanks! I tried corrupting the file in the second test further, in case the gzip file wasn't corrupted enough, but the test is still failing. Could you please review the second test?
Oh, I managed to do it:
```r
tmp <- tempfile()
file.create(tmp)
conn <- file(tmp, 'wb')
# this is data.table:::known_signatures$gzip
writeBin(as.raw(c(31, 139)), conn)
close(conn)
fread(tmp)
# Error in readBin(inn, what = raw(0L), size = 1L, n = BFR.SIZE) :
#   error reading from the connection
# In addition: Warning message:
# In readBin(inn, what = raw(0L), size = 1L, n = BFR.SIZE) :
#   invalid or incomplete compressed data
```
Thank you for the working example. I've updated the test to use your approach.
Slightly worried the warnings/errors we observe could be more platform-dependent than we've detected here. Let's see if anything comes out of the more thorough GLCI suite.
fixes #5415
This PR enhances `fread()`'s error handling to provide actionable error messages when system commands fail or file decompression encounters issues (such as a full disk). Previously, these failures could result in warnings or silent truncation.

The current implementation of `fread()` has two silent failure modes that can lead to data corruption and hard-to-debug issues:

- Gzipped file decompression failures: when `R.utils::decompressFile()` fails due to insufficient disk space or other issues, the function silently continues with a truncated file, leading to partial data reads and warnings that are easily missed in non-interactive sessions.
- External command failures: when using the `cmd` parameter, failed external commands (non-zero exit codes) are not detected, potentially leading to processing of empty or incomplete files.

Solution

- `R.utils::decompressFile()` was called without error handling, allowing silent failures. I wrapped the call in `tryCatch()` to catch decompression failures and added an error message that explains the likely cause (disk full) and mentions disk space and the `tmpdir` argument.
- System command exit codes were not checked, so command failures were missed. I now capture the return status from `system()` and raise an error when the exit code is non-zero.
- In both modifications, I moved `on.exit(unlink(tmpFile), add=TRUE)` before command execution so that the temporary file is removed even when the command fails.

@tdhock @jangorecki @Anirban166 @joshhwuu can you please review.
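
For readers who want to see the two checks in isolation, here is a minimal sketch in base R. It is an illustration under stated assumptions, not the code in this PR: `read_cmd_output()` and `decompress_checked()` are hypothetical helper names, the `decompress` argument stands in for a call such as `R.utils::decompressFile()`, the error wording is made up, and the shell redirection assumes a Unix-like system.

```r
# Hypothetical helpers for illustration only; not data.table's implementation.

# Run a shell command, redirect its output to a temporary file, and stop with
# an error if the command reports a non-zero exit status.
read_cmd_output <- function(cmd) {
  tmp_file <- tempfile()
  # Register cleanup before running anything, so the temporary file is removed
  # even if the command or the status check below fails.
  on.exit(unlink(tmp_file), add = TRUE)
  # With intern = FALSE (the default), system() returns the exit status.
  status <- system(paste0("(", cmd, ") > ", shQuote(tmp_file)))
  if (status != 0L) {
    stop("External command failed with exit code ", status, ": ", cmd,
         call. = FALSE)
  }
  readLines(tmp_file)
}

# Call a decompression function and turn any error it raises into a message
# that points at the likely cause. `decompress` is assumed to take a source
# path and a destination path.
decompress_checked <- function(decompress, src, dest) {
  tryCatch(
    decompress(src, dest),
    error = function(e) {
      stop("Failed to decompress '", src, "': ", conditionMessage(e), ". ",
           "The temporary directory may be out of disk space; free up space ",
           "or choose another location via tmpdir.", call. = FALSE)
    }
  )
  dest
}
```

Calling `read_cmd_output("false")` on a Unix-like system should raise an error rather than return an empty result, and passing a `decompress` function that itself calls `stop()` should surface the wrapped message. Registering `on.exit()` first is the design point: cleanup then no longer depends on the command succeeding.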