Skip to content

Add error handling for gzipped file decompression and external command failures in fread()#7097

Merged
MichaelChirico merged 12 commits into
Rdatatable:masterfrom
Mukulyadav2004:issue_5415
Jun 28, 2025
Merged

Add error handling for gzipped file decompression and external command failures in fread()#7097
MichaelChirico merged 12 commits into
Rdatatable:masterfrom
Mukulyadav2004:issue_5415

Conversation

@Mukulyadav2004
Copy link
Copy Markdown
Contributor

fixes #5415

This PR enhances fread()'s error handling to provide actionable error messages when system commands fail or file decompression encounters issues (like disk full scenarios). Previously, these failures could result in warnings or silent truncation.

The current implementation of fread() has two silent failure modes that can lead to data corruption and hard-to-debug issues:

  • Gzipped file decompression failures: When R.utils::decompressFile() fails due to insufficient disk space or other issues, the function silently continues with a truncated file, leading to partial data reads and warnings that are easily missed in non-interactive sessions.

  • External command failures: When using the cmd parameter, failed external commands (non-zero exit codes) are not detected, potentially leading to processing of empty or incomplete files.

Solution-

  • As R.utils::decompressFile() is called without error handling, allowing silent failures so I wrapped R.utils::decompressFile() in tryCatch() to catch decompression failures and added comprehensive error message that explains the likely cause (disk full) and mention disc space and tmpdir argument.

  • And System command exit codes are not checked thus missing command failures, so to fix - Captured the return status from system() command and added exit code validation with error message.

  • Also in both modifications - moved on.exit(unlink(tmpFile), add=TRUE) before command execution to prevent storage leaks.

@tdhock @jangorecki @Anirban166 @joshhwuu can you please review.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.69%. Comparing base (5bb6450) to head (4e0e2ba).
Report is 5 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7097   +/-   ##
=======================================
  Coverage   98.69%   98.69%           
=======================================
  Files          79       79           
  Lines       14680    14687    +7     
=======================================
+ Hits        14489    14496    +7     
  Misses        191      191           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread R/fread.R Outdated
Comment thread R/fread.R
Comment thread R/fread.R Outdated
Comment thread R/fread.R Outdated
Comment thread R/fread.R
Comment thread R/fread.R Outdated
@MichaelChirico
Copy link
Copy Markdown
Member

I still see the tests failing, PTAL & ping if you're stuck

@Mukulyadav2004
Copy link
Copy Markdown
Contributor Author

Mukulyadav2004 commented Jun 27, 2025

Thanks! I tried making the file in 2nd test more corrupted cuz might be the corrupted gzip file not corrupted enough. But its still failing, can you please review the second test.

@MichaelChirico
Copy link
Copy Markdown
Member

Oh, I managed to do it:

tmp <- tempfile()
file.create(tmp)
conn <- file(tmp, 'wb')
# this is data.table:::known_signatures$gzip
writeBin(as.raw(c(31, 139)), conn)
close(conn)

fread(tmp)
# Error in readBin(inn, what = raw(0L), size = 1L, n = BFR.SIZE) : 
#   error reading from the connection
# In addition: Warning message:
# In readBin(inn, what = raw(0L), size = 1L, n = BFR.SIZE) :
#   invalid or incomplete compressed data

@Mukulyadav2004
Copy link
Copy Markdown
Contributor Author

Thank you for the working example. I've updated the test to use your approach

Comment thread inst/tests/tests.Rraw Outdated
Copy link
Copy Markdown
Member

@MichaelChirico MichaelChirico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Comment thread inst/tests/tests.Rraw Outdated
@MichaelChirico
Copy link
Copy Markdown
Member

Slightly worried the warnings/errors we observe could be more platform-dependent than we've detected here. Let's see if anything comes out of the more thorough GLCI suite.

@MichaelChirico MichaelChirico merged commit 94f4081 into Rdatatable:master Jun 28, 2025
9 of 10 checks passed
@Mukulyadav2004 Mukulyadav2004 deleted the issue_5415 branch June 28, 2025 21:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fread should raise an error instead of a warning when reading a gzipped file that does not fit in temporary storage

3 participants