
Add --retry-failed: re-download only the failed entries from a prior report #23

Open
pchvykov wants to merge 2 commits into roshanlam:main from pchvykov:feat-retry-failed

pchvykov commented Jun 7, 2026

Motivation

After a large download, a handful of files often fail for transient reasons (network blips, throttling). Re-running the whole download re-walks the entire tree and re-verifies every already-good file, which is slow on large folders. --retry-failed targets only the files that failed.

What it does

--retry-failed reads a prior download_report.json, selects the entries with status == "failed", maps each failed local path back to its iCloud remote path (icloud_path is the remote root that local_path mirrors), and downloads only those, skipping the full-tree walk entirely. It then writes a fresh retry_<report>.json so that a second pass can target anything still failing.
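A minimal sketch of the selection and naming steps, assuming the report is a JSON list of objects carrying "local_path" and "status" keys (the real download_report.json schema is not shown in this PR). load_failed_entries and retry_report_path are hypothetical helper names.

```python
import json
from pathlib import Path


def load_failed_entries(report_path: Path) -> list:
    """Return only the report entries whose status is "failed".

    ASSUMPTION: the report is a JSON list of objects with "local_path" and
    "status" keys; ifetch's actual schema may differ.
    """
    with report_path.open(encoding="utf-8") as fh:
        entries = json.load(fh)
    return [e for e in entries if e.get("status") == "failed"]


def retry_report_path(report_path: Path) -> Path:
    """Name the follow-up report retry_<report>.json, next to the original."""
    return report_path.with_name(f"retry_{report_path.name}")
```

With the default report name, retry_report_path yields retry_download_report.json in the same folder, so a second --retry-failed pass can be pointed at it explicitly.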

# default report path (<local_path>/download_report.json)
ifetch <icloud_path> <local_path> --retry-failed

# explicit report path
ifetch <icloud_path> <local_path> --retry-failed /path/to/download_report.json
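One way to support both invocations above (flag with or without a value) is argparse's nargs="?" combined with a const sentinel. This is a hypothetical sketch, not ifetch's actual parser; build_parser, resolve_report_path and the empty-string sentinel are illustrative names.

```python
import argparse
from pathlib import Path
from typing import Optional

# Hypothetical sentinel meaning "flag given without a value".
_USE_DEFAULT_REPORT = ""


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ifetch")
    parser.add_argument("icloud_path")
    parser.add_argument("local_path")
    parser.add_argument(
        "--retry-failed",
        nargs="?",                  # the report path is optional
        const=_USE_DEFAULT_REPORT,  # value used when the flag has no argument
        default=None,               # flag absent: normal full download
        metavar="REPORT",
    )
    return parser


def resolve_report_path(args: argparse.Namespace) -> Optional[Path]:
    """Return the report to retry from, or None for a normal download."""
    if args.retry_failed is None:
        return None
    if args.retry_failed == _USE_DEFAULT_REPORT:
        # Default location: <local_path>/download_report.json
        return Path(args.local_path) / "download_report.json"
    return Path(args.retry_failed)
```

Because the value is optional, the flag should follow the two positionals as in the usage lines; placed first, argparse would take <icloud_path> as the report path.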

Notes

  • Both the local root and the paths stored in the report are .resolve()d before matching, so a symlinked or normalised local root (e.g. /tmp -> /private/tmp on macOS) still lines up with the absolute paths recorded in the report.
  • Paths that fall outside the local root, or remote paths that no longer resolve, are logged and recorded as still-failed rather than crashing the run.
  • _submit is refactored into a small reusable _submit_task(fn, *args) so the retry path reuses the exact same traversal-completion counting as the main download.
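The resolve-then-match step from the first two notes could look like the sketch below. to_remote_path and plan_retry are hypothetical helpers, and treating iCloud paths as "/"-separated strings is an assumption. Path.resolve() is non-strict by default, so it works even when the failed file was never written locally. The remote-lookup failure case ("remote paths that no longer resolve") is not shown.

```python
from pathlib import Path, PurePosixPath
from typing import List, Optional, Tuple


def to_remote_path(local_file: str, local_root: str, icloud_root: str) -> Optional[str]:
    """Map a recorded local path back to the iCloud path it mirrors.

    Both sides are resolve()d so a symlinked root still matches. Returns
    None when the file lies outside the local root.
    """
    root = Path(local_root).resolve()
    target = Path(local_file).resolve()
    try:
        rel = target.relative_to(root)
    except ValueError:  # not under the local root
        return None
    # ASSUMPTION: remote paths are "/"-separated regardless of the local OS.
    return str(PurePosixPath(icloud_root) / rel.as_posix())


def plan_retry(failed_paths: List[str], local_root: str,
               icloud_root: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split failed paths into (local, remote) pairs to retry and still-failed."""
    to_retry, still_failed = [], []
    for local in failed_paths:
        remote = to_remote_path(local, local_root, icloud_root)
        if remote is None:
            still_failed.append(local)  # recorded, not raised
        else:
            to_retry.append((local, remote))
    return to_retry, still_failed
```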

Testing

  • Full test suite passes (114 passed).
  • Verified end to end: only failed entries are retried, already-completed files are skipped, and unresolvable paths are recorded as still-failed.

Dependency

⚠️ This is stacked on #22: it reuses the shared bounded pool introduced there (_submit_task / _executor / completion counting). Until #22 is merged, the diff here includes that commit too. Please merge #22 first; I'll rebase this onto main afterward for a clean diff.

Co-Authored-By: Claude <noreply@anthropic.com>

pchvykov and others added 2 commits June 7, 2026 14:20
… + O(n^2) metadata write

Two compounding issues made large downloads progressively slower as the
tree grew deeper and larger:

1. Unbounded concurrency. process_item_parallel spawned a fresh
   ThreadPoolExecutor per directory and parent threads blocked on their
   children, so live concurrency multiplied with tree depth (observed ~24
   concurrent transfers with max_workers=4). That saturated the shared HTTP
   pool (connection churn -> CLOSE_WAIT pileup) and tripped iCloud server-side
   throttling.

   Now there is a single global ThreadPoolExecutor for the whole traversal.
   Directory workers submit their children onto the same pool via _submit and
   return immediately instead of blocking (completion is counted via future
   done-callbacks, so there is no pool-starvation deadlock). The HTTPAdapter is
   sized to max_workers with pool_block=True to stop connection churn.

2. O(n^2) version-metadata write. download_drive_item rewrote the entire
   .ifetch_versions.json under a global lock after every file, serializing all
   download threads. The in-memory map is now updated per-file and flushed once
   at the end of download().
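A stdlib-only sketch of fix 1: one bounded ThreadPoolExecutor, a _submit_task(fn, *args) that counts a task before submitting it, and done-callbacks that decrement the count and signal completion at zero. The Traversal class and the walk_tree demo are hypothetical, not the PR's code; the requests HTTPAdapter sizing (pool_maxsize / pool_block) is left out.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class Traversal:
    """One shared, bounded pool for a whole tree walk (hypothetical class).

    Directory tasks enqueue their children and return instead of waiting on
    them, so no worker blocks on another and the pool cannot starve.
    """

    def __init__(self, max_workers: int) -> None:
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._pending = 0
        self._done = threading.Event()
        self.errors: list = []

    def _submit_task(self, fn, *args) -> None:
        # Count the task *before* submitting it: a parent bumps the counter
        # for each child while it is still running, so the count cannot hit
        # zero until every queued task has finished.
        with self._lock:
            self._pending += 1
        future = self._executor.submit(fn, *args)
        future.add_done_callback(self._on_done)

    def _on_done(self, future: Future) -> None:
        exc = future.exception()  # None on success; failures are collected
        with self._lock:
            if exc is not None:
                self.errors.append(exc)
            self._pending -= 1
            if self._pending == 0:
                self._done.set()

    def run(self, fn, *args) -> None:
        self._submit_task(fn, *args)
        self._done.wait()
        self._executor.shutdown(wait=True)


def walk_tree(tree: dict, max_workers: int = 4):
    """Demo: dict values are folders, anything else is a file to "download".

    Returns the downloaded names and the peak number of concurrent downloads.
    """
    walker = Traversal(max_workers)
    stats_lock = threading.Lock()
    downloaded = []
    active = peak = 0

    def download(name: str) -> None:
        nonlocal active, peak
        with stats_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for the real transfer
        with stats_lock:
            active -= 1
            downloaded.append(name)

    def visit(folder: dict) -> None:
        # Enqueue children and return immediately; never wait on them.
        for name, child in folder.items():
            if isinstance(child, dict):
                walker._submit_task(visit, child)
            else:
                walker._submit_task(download, name)

    walker.run(visit, tree)
    return downloaded, peak
```

Since directory visits and file transfers share the same max_workers threads, concurrency stays bounded no matter how deep the tree is.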

Verified: full test suite (114 tests) passes; smoke test over a 484-file /
5-level tree drains with peak concurrency == max_workers.

Co-Authored-By: Claude <noreply@anthropic.com>
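Fix 2 in the commit above is quadratic because rewriting the whole map after each of n files writes roughly 1 + 2 + ... + n = n(n+1)/2 entries in total: 117,370 entry writes for the 484-file smoke-test tree, against 484 for one final flush. A sketch of the per-file update plus single flush, assuming the map is a flat JSON object from remote path to a version string (VersionMap and that schema are hypothetical). The temp-file plus os.replace() write is an extra safety choice of this sketch, not something the PR describes.

```python
import json
import os
import tempfile
import threading
from pathlib import Path


class VersionMap:
    """In-memory stand-in for .ifetch_versions.json (hypothetical class)."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock = threading.Lock()
        self._data: dict = {}
        if path.exists():
            self._data = json.loads(path.read_text(encoding="utf-8"))

    def record(self, remote_path: str, version: str) -> None:
        # O(1) under a short lock: download threads no longer serialize
        # on rewriting the whole file after every download.
        with self._lock:
            self._data[remote_path] = version

    def flush(self) -> None:
        # Called once at the end of the run.
        with self._lock:
            snapshot = dict(self._data)
        # Write a temp file, then atomically replace the target, so a crash
        # mid-write cannot leave a truncated JSON file behind.
        fd, tmp = tempfile.mkstemp(dir=self._path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(snapshot, fh, indent=2, sort_keys=True)
        os.replace(tmp, self._path)
```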
…rior report

After a large download, a handful of files often fail (transient network /
throttling). Re-running the whole download re-walks the entire tree and
re-verifies every already-good file, which is slow on large folders.

--retry-failed reads a prior download_report.json, maps each "failed" local
path back to its iCloud remote path (icloud_path is the remote root that
local_path mirrors), and downloads only those, reusing the shared bounded
pool. Paths are resolved on both sides so a symlinked local root
(e.g. /tmp -> /private/tmp) still matches. A retry_<report>.json is written for
a follow-up pass.

Usage:
    ifetch <icloud_path> <local_path> --retry-failed
    ifetch <icloud_path> <local_path> --retry-failed /path/to/report.json

_submit is refactored into a reusable _submit_task so the retry path reuses the
same traversal-completion counting as the main download.

Verified: full test suite (114 tests) passes; --retry-failed retries only
failed entries, skips completed ones, and records unresolvable paths as
still-failed.

Co-Authored-By: Claude <noreply@anthropic.com>