create: implement retries for individual fs files (backport of #7351)#9812
Draft
ThomasWaldmann wants to merge 4 commits into
Draft
create: implement retries for individual fs files (backport of #7351)#9812ThomasWaldmann wants to merge 4 commits into
ThomasWaldmann wants to merge 4 commits into
Conversation
…ckup#7351) Backport of borgbackup#7351 to 1.4-maint. Retry reading a backup source file when it raises a BackupError / BackupOSError (e.g. transient I/O error, or the file changed while we read it): - retry the same file after some sleep time (1ms, growing exponentially up to 10s, 10 tries) - BackupOSError: if retries do not help, skip the file, log it with "E" status - BackupError (changed while reading): last try backs it up, logged with "C" status Works for borg create's builtin fs recursion, --paths-from-command and --paths-from-stdin. 1.4-specific: do NOT retry a file once checkpoint part files (.borg_part_N) have been written for it. Re-reading it from the start would create duplicate / inconsistent part files (concatenating all part files would then no longer yield the complete file). ChunksProcessor.last_part_number tracks this; such a file is kept as-is ("C") or skipped ("E") instead of being retried. Permission errors (EPERM/EACCES) are not retried (they won't get better), so we don't waste ~15s of backoff on every unreadable file. Adds a ChunkerFailing test chunker (chunker-params=fail,block_size,map) to drive the error handling, and tests: - test_create_erroneous_file: file is read on a later retry - test_create_no_permission_file: permission error is not retried, file skipped - test_create_erroneous_file_with_part_files: file with part files is not retried Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 1.4-maint #9812 +/- ##
=============================================
+ Coverage 82.04% 82.30% +0.26%
=============================================
Files 38 38
Lines 11384 11411 +27
Branches 1794 1799 +5
=============================================
+ Hits 9340 9392 +52
+ Misses 1460 1433 -27
- Partials 584 586 +2 ☔ View full report in Codecov by Harness. |
Co-authored-by: Junie <junie@jetbrains.com>
Co-authored-by: Junie <junie@jetbrains.com>
The changed-file retry path already rolls back the chunks it added before re-reading. Do the same when a read error (BackupOSError) interrupts reading a file: the chunks added since the last checkpoint (item.chunks[from_chunk:]) are not referenced by any committed part item, so without rolling them back they leak as an inflated chunk refcount (or, if content also shifted, an orphan chunk). The rollback lives in process_file_chunks, where from_chunk is known: chunks before from_chunk are referenced by part files we already wrote and must be kept; only the uncommitted tail is decref'd. Adds test_create_erroneous_file_read_retry_rolls_back_chunks: the re-read chunks dedup to the same ids, so the leak shows up as a too-high refcount (not an orphan id) - the test checks refcounts directly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
f5aba18 to
7133066
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Backport of #7351 to
1.4-maint.Retry reading a backup source file when it raises a
BackupError/BackupOSError:BackupOSError: if retries don't help, skip the file, log it withEstatusBackupError(changed while reading): last try backs it up, logged withCstatusWorks for
borg create's builtin fs recursion,--paths-from-commandand--paths-from-stdin.1.4-specific care: checkpoint part files
1.4 has checkpointing that may write
.borg_part_Nfiles. A naive retry would re-read the file from the start and write a second set of part files, so concatenating all part files would no longer reproduce the complete file. Therefore a file is not retried once part files have been written for it (ChunksProcessor.last_part_numbertracks this) — it is kept as-is (C) or skipped (E) instead.Permission errors
EPERM/EACCESare not retried (they won't get better) — avoids ~15s of pointless backoff on every unreadable file. This is the post-#7351 follow-up behavior; included here to avoid a regression vs. current 1.4.Tests
Adds a
ChunkerFailingtest chunker (--chunker-params=fail,block_size,map) to drive the error handling, and:test_create_erroneous_file(from create: implement retries for individual fs files #7351) — file is read on a later retrytest_create_no_permission_file— permission error is not retried, file skippedtest_create_erroneous_file_with_part_files— new, 1.4-specific — a file with part files is not retried (no master equivalent, since master 2.0 has no part files)🤖 Generated with Claude Code