Add --retry-failed: re-download only the failed entries from a prior report #23
Open
pchvykov wants to merge 2 commits into
… + O(n^2) metadata write

Two issues compounded to make large downloads get progressively slower the
deeper/larger the tree:

1. Unbounded concurrency. process_item_parallel spawned a fresh
   ThreadPoolExecutor per directory and parent threads blocked on their
   children, so live concurrency multiplied with tree depth (observed ~24
   concurrent transfers with max_workers=4). That saturated the shared HTTP
   pool (connection churn -> CLOSE_WAIT pileup) and tripped iCloud server-side
   throttling.

   Now there is a single global ThreadPoolExecutor for the whole traversal.
   Directory workers submit their children onto the same pool via _submit and
   return immediately instead of blocking (completion is counted via future
   done-callbacks, so there is no pool-starvation deadlock). The HTTPAdapter is
   sized to max_workers with pool_block=True to stop connection churn.

2. O(n^2) version-metadata write. download_drive_item rewrote the entire
   .ifetch_versions.json under a global lock after every file, serializing all
   download threads. The in-memory map is now updated per-file and flushed once
   at the end of download().

Verified: full test suite (114 tests) passes; smoke test over a 484-file /
5-level tree drains with peak concurrency == max_workers.

Co-Authored-By: Claude <noreply@anthropic.com>
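For reviewers who want to see the single-pool idea in isolation, here is a minimal sketch of a tree walk in which every task goes onto one shared ThreadPoolExecutor and completion is tracked with future done-callbacks rather than by parents blocking on children. It is not the code from this PR: the class name BoundedTraversal, the nested-dict tree, and the peak_active counter are all assumptions made for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedTraversal:
    """Walk a nested dict on one shared pool (illustrative names only)."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._pending = 0               # tasks submitted but not yet finished
        self._done = threading.Event()  # set when the pending count drains to 0
        self._active = 0                # tasks running right now
        self.peak_active = 0            # highest value _active ever reached
        self.visited = []

    def _submit(self, fn, *args):
        # Count the task before it can possibly finish, then hand it to the pool.
        with self._lock:
            self._pending += 1
        future = self._pool.submit(fn, *args)
        future.add_done_callback(self._on_done)

    def _on_done(self, _future):
        # Runs once per task, whether it returned normally or raised.
        with self._lock:
            self._pending -= 1
            if self._pending == 0:
                self._done.set()

    def _visit(self, name, node):
        with self._lock:
            self._active += 1
            self.peak_active = max(self.peak_active, self._active)
        try:
            if isinstance(node, dict):
                # "Directory": enqueue children and return without waiting.
                for child_name, child in node.items():
                    self._submit(self._visit, f"{name}/{child_name}", child)
            else:
                # "File": stand-in for the actual download.
                with self._lock:
                    self.visited.append(name)
        finally:
            with self._lock:
                self._active -= 1

    def run(self, root):
        self._submit(self._visit, "", root)
        self._done.wait()
        self._pool.shutdown(wait=True)
        return sorted(self.visited)


if __name__ == "__main__":
    walker = BoundedTraversal(max_workers=2)
    print(walker.run({"a": {"b": 1, "c": {"d": 2}}, "e": 3}))
    print("peak concurrency:", walker.peak_active)
```

Because a directory task increments the pending count for each child before its own done-callback decrements it, the count can only reach zero once the whole tree has been processed, and no worker thread ever sits idle waiting on a child.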
…rior report
After a large download, a handful of files often fail (transient network /
throttling). Re-running the whole download re-walks the entire tree and
re-verifies every already-good file, which is slow on large folders.
--retry-failed reads a prior download_report.json, maps each "failed" local
path back to its iCloud remote path (icloud_path is the remote root that
local_path mirrors), and downloads only those, reusing the shared bounded
pool. Paths are resolved on both sides so a symlinked local root
(e.g. /tmp -> /private/tmp) still matches. A retry_<report>.json is written for
a follow-up pass.
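The path mapping described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the function name failed_remote_paths and the report layout (a top-level "files" list whose entries carry "path" and "status" keys) are hypothetical, since the actual schema of download_report.json is not shown here.

```python
import json
from pathlib import Path, PurePosixPath


def failed_remote_paths(report_path, icloud_root, local_root):
    """Return (remote_paths, unresolved) for the failed entries in a report."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    # Resolve the local root so a symlinked root matches resolved entry paths.
    local_root = Path(local_root).resolve()
    remote_root = PurePosixPath(icloud_root)

    remote_paths, unresolved = [], []
    for entry in report.get("files", []):       # "files" key is an assumption
        if entry.get("status") != "failed":
            continue                            # skip anything that did not fail
        local = Path(entry["path"]).resolve()   # "path" key is an assumption
        try:
            relative = local.relative_to(local_root)
        except ValueError:
            # The path does not live under local_root, so it cannot be mapped.
            unresolved.append(entry["path"])
            continue
        remote_paths.append(str(remote_root.joinpath(*relative.parts)))
    return remote_paths, unresolved
```

Entries that cannot be placed under the local root are returned separately, so a caller could record them as still-failed in the follow-up report.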
Usage:
ifetch <icloud_path> <local_path> --retry-failed
ifetch <icloud_path> <local_path> --retry-failed /path/to/report.json
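One way a flag with an optional value like this could be declared is with argparse's nargs="?" and const. The sketch below is an assumption about the shape of the CLI, not a copy of it: build_parser is a hypothetical helper, the project may not use argparse at all, and where the default report is looked up when no path is given is also assumed.

```python
import argparse

# Report name taken from the commit text; its default location is an assumption.
DEFAULT_REPORT = "download_report.json"


def build_parser():
    parser = argparse.ArgumentParser(prog="ifetch")
    parser.add_argument("icloud_path")
    parser.add_argument("local_path")
    parser.add_argument(
        "--retry-failed",
        nargs="?",             # the report path after the flag is optional
        const=DEFAULT_REPORT,  # value used when the flag is given without a path
        default=None,          # value used when the flag is absent
        metavar="REPORT",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.retry_failed)
```

With this declaration the flag should come after the two positional arguments, as in the usage lines above; placed before them, argparse would consume the first positional as the report path.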
_submit is refactored into a reusable _submit_task so the retry path reuses the
same traversal-completion counting as the main download.
Verified: full test suite (114 tests) passes; --retry-failed retries only
failed entries, skips completed ones, and records unresolvable paths as
still-failed.
Co-Authored-By: Claude <noreply@anthropic.com>
Motivation
After a large download, a handful of files often fail for transient reasons (network blips, throttling). Re-running the whole download re-walks the entire tree and re-verifies every already-good file, which is slow on large folders.
--retry-failed targets only the files that failed.
What it does
Reads a prior download_report.json, picks out the entries with status == "failed", maps each failed local path back to its iCloud remote path (icloud_path is the remote root that local_path mirrors), and downloads just those, skipping the full-tree walk entirely. A fresh retry_<report>.json is written so a second pass can target anything still failing.
Notes
Paths are .resolve()d before matching, so a symlinked/normalised local root (e.g. /tmp -> /private/tmp) still lines up with the absolute paths recorded in the report.
_submit is refactored into a small reusable _submit_task(fn, *args) so the retry path reuses the exact same traversal-completion counting as the main download.
Testing
Only failed entries are retried, already-completed files are skipped, and unresolvable paths are recorded as still-failed.
Dependency
Co-Authored-By: Claude <noreply@anthropic.com>