
refactor: harden FileCleanupStrategy with retry and parallel deletes #649

Open

shangxinli wants to merge 12 commits into apache:main from shangxinli:pr-b-cleanup-hardening

Conversation

@shangxinli
Contributor

refactor: harden FileCleanupStrategy with retry and parallel deletes. Brings the C++ cleanup path closer to Java's behavior:

  • DeleteFile retries up to 3 times on FileIO-backed errors with linear backoff, stopping immediately on kNotFound (mirrors Java's stopRetryOn(NotFoundException) + retry(3)). Custom delete callbacks remain single-shot since their retry policy is opaque to us.
  • DeleteFiles now parallelizes per-file deletes through a small std::async-based RunInParallel wrapper, capped at 8 workers to avoid swamping FileIO. Replaces the existing serial loop and resolves the TODO(shangxinli) marker on bulk deletion (a true bulk FileIO API remains TODO).
  • DeleteWith() doc note clarifies that the supplied callback may be invoked concurrently and must be thread-safe.

No behavioral test changes are needed -- the existing cleanup tests exercise the parallel path automatically when they delete more than one file, and continue to pass.
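For reference, a minimal sketch of the retry shape described above, using placeholder types (DeleteResult and delete_once stand in for the real Status/FileIO plumbing, whose exact signatures may differ):

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical minimal result type standing in for the project's Status/Result.
enum class DeleteResult { kOk, kNotFound, kTransientError };

// Sketch: up to 3 attempts with linear backoff, stopping immediately on
// "not found" since retrying a missing file cannot help.
DeleteResult DeleteWithRetry(
    const std::function<DeleteResult(const std::string&)>& delete_once,
    const std::string& path) {
  constexpr int kMaxAttempts = 3;
  DeleteResult result = DeleteResult::kTransientError;
  for (int attempt = 1; attempt <= kMaxAttempts; ++attempt) {
    result = delete_once(path);
    if (result == DeleteResult::kOk || result == DeleteResult::kNotFound) {
      return result;
    }
    if (attempt < kMaxAttempts) {
      // Linear backoff: 100 ms, then 200 ms.
      std::this_thread::sleep_for(std::chrono::milliseconds(100 * attempt));
    }
  }
  return result;  // last error after exhausting retries
}
```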

shangxinli and others added 11 commits April 22, 2026 16:43
Implement the file cleanup logic that was missing from the expire
snapshots feature (the original PR noted "TODO: File recycling will
be added in a followup PR").

Port the "reachable file cleanup" strategy from Java's
ReachableFileCleanup, following the same phased approach:

Phase 1: Collect manifest paths from expired and retained snapshots
Phase 2: Prune manifests still referenced by retained snapshots
Phase 3: Find data files only in manifests being deleted, subtract
         files still reachable from retained manifests (kAll only)
Phase 4: Delete orphaned manifest files
Phase 5: Delete manifest lists from expired snapshots
Phase 6: Delete expired statistics and partition statistics files

Key design decisions matching Java parity:
- Best-effort deletion: suppress errors on individual file deletions
  to avoid blocking metadata updates (Java suppressFailureWhenFinished)
- Branch/tag awareness: retained snapshot set includes all snapshots
  reachable from any ref (branch or tag), preventing false-positive
  deletions of files still referenced by non-main branches
- Data file safety: only delete data files from manifests that are
  themselves being deleted, then subtract any files still reachable
  from retained manifests (two-pass approach from ReachableFileCleanup)
- Respect CleanupLevel: kNone skips all, kMetadataOnly skips data
  files, kAll cleans everything
- FileIO abstraction: uses FileIO::DeleteFile for filesystem
  compatibility (S3, HDFS, local), with custom DeleteWith() override
- Statistics cleanup via snapshot ID membership in retained set

TODOs for follow-up:
- Multi-threaded file deletion (Java uses Tasks.foreach with executor)
- IncrementalFileCleanup strategy for linear ancestry optimization
  (Java uses this when no branches/cherry-picks involved)
- Fix O(M*S) I/O: Pre-cache ManifestFile objects in manifest_cache_ during
  Phase 1 (ReadManifestsForSnapshot), eliminating repeated manifest list
  reads in FindDataFilesToDelete.

- Fix storage leak: Use LiveEntries() instead of Entries() to match Java's
  ManifestFiles.readPaths behavior (only ADDED/EXISTING entries).

- Fix data loss risk: When reading a retained manifest fails, abort data
  file deletion entirely instead of silently continuing. Java retries and
  throws on failure here.

- Fix statistics file deletion: Use path-based set difference instead of
  snapshot_id-only check, preventing erroneous deletion of statistics files
  shared across snapshots.

- Remove goto anti-pattern: Extract ManifestFile lookup into
  MakeManifestReader() helper and use manifest_cache_ for direct lookup.

- Improve API: FindDataFilesToDelete now returns
  Result<unordered_set<string>> instead of using a mutable out-parameter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mirror Java's file cleanup class hierarchy for expire snapshots:
- Add abstract FileCleanupStrategy with shared DeleteFile() and
  ExpiredStatisticsFilePaths() utilities (path-based set difference)
- Add ReachableFileCleanup concrete class owning manifest_cache_,
  ReadManifestsForSnapshot(), and FindDataFilesToDelete()
- Move MakeManifestReader() to a free function in anonymous namespace
  using ICEBERG_ASSIGN_OR_RAISE
- Remove cleanup-specific private methods and manifest_cache_ from
  ExpireSnapshots class; Finalize() now delegates to the strategy
- Clear apply_result_ after consumption in Finalize()
- Rename DeleteFilePath to DeleteFile; use std::ignore for FileIO return
- Remove manifest_list.h and manifest_reader.h from the header
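A rough skeleton of the hierarchy this commit describes; anything beyond the members and methods named above is a guess, not the actual API:

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_set>

class FileIO;  // placeholder forward declaration

// Abstract base owning the shared deletion utilities (sketch only).
class FileCleanupStrategy {
 public:
  virtual ~FileCleanupStrategy() = default;

  // Entry point invoked from ExpireSnapshots::Finalize(); signature illustrative.
  virtual void CleanFiles(/* expired and retained snapshot sets */) = 0;

 protected:
  // Shared helpers named in the commit message.
  void DeleteFile(const std::string& path);
  std::unordered_set<std::string> ExpiredStatisticsFilePaths(
      const std::unordered_set<std::string>& expired_paths,
      const std::unordered_set<std::string>& retained_paths);

  std::shared_ptr<FileIO> file_io_;
  std::function<void(const std::string&)> delete_func_;
};

// Concrete "reachable file cleanup" strategy, mirroring Java's ReachableFileCleanup.
// Owns manifest_cache_, ReadManifestsForSnapshot(), and FindDataFilesToDelete().
class ReachableFileCleanup : public FileCleanupStrategy {
 public:
  void CleanFiles(/* ... */) override;
};
```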
… stats file deletion

P0: ReadManifestsForSnapshot now returns bool. If any retained snapshot's
manifest list cannot be read, phases 2-4 (manifest and data file deletion)
are skipped entirely. An incomplete retained set makes it unsafe to compute
manifests_to_delete, as manifests still referenced by unreadable snapshots
would be wrongly included. This matches Java's throwFailureWhenFinished
behavior in ReachableFileCleanup. Manifest list deletion (phase 5) is
unaffected since it is keyed on expired snapshots only.
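A self-contained sketch of that guard, with a hypothetical reader callback standing in for the FileIO + Avro manifest list read:

```cpp
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Sketch: collecting the retained manifest set must fully succeed before
// manifests_to_delete can be computed safely.
bool CollectRetainedManifests(
    const std::vector<std::string>& retained_manifest_lists,
    const std::function<bool(const std::string&, std::unordered_set<std::string>&)>&
        read_manifest_list,
    std::unordered_set<std::string>& retained_manifests) {
  for (const auto& list_path : retained_manifest_lists) {
    if (!read_manifest_list(list_path, retained_manifests)) {
      // Incomplete retained set: the caller skips phases 2-4 entirely, since
      // manifests referenced by the unreadable snapshot would otherwise look
      // orphaned. Phase 5 (manifest list deletion) is unaffected.
      return false;
    }
  }
  return true;
}
```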

P1: Remove physical statistics and partition-statistics file deletion (the
former phase 6). RemoveStatistics/RemovePartitionStatistics are still not
called in RemoveSnapshots (the TODO in table_metadata.cc), so the committed
metadata still references those files after they would be deleted on disk.
Deletion is deferred until the metadata-level removal is wired in, at which
point the two operations can be kept in sync.
Brings the C++ cleanup path closer to Java's behavior:

- DeleteFile retries up to 3 times on FileIO-backed errors with linear
  backoff, stopping immediately on kNotFound (mirrors Java's
  stopRetryOn(NotFoundException) + retry(3)). Custom delete callbacks
  remain single-shot since their retry policy is opaque to us.
- DeleteFiles now parallelizes per-file deletes through a small
  std::async-based RunInParallel wrapper, capped at 8 workers to avoid
  swamping FileIO. Replaces the existing serial loop and resolves the
  TODO(shangxinli) marker on bulk deletion (a true bulk FileIO API
  remains TODO).
- DeleteWith() doc note clarifies that the supplied callback may be
  invoked concurrently and must be thread-safe.

No behavioral test changes are needed -- the existing cleanup tests
exercise the parallel path automatically when they delete more than one
file, and continue to pass.
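A minimal sketch of what an std::async-based RunInParallel helper like the one described here can look like (an approximation consistent with the snippet quoted in the review thread below, and later replaced by a thread pool):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <future>
#include <span>
#include <string>
#include <vector>

// Sketch: split `items` into up to `max_workers` contiguous slices and run
// `work` on each slice in its own std::async task, then wait for all slices.
void RunInParallel(std::span<const std::string> items,
                   const std::function<void(std::span<const std::string>)>& work,
                   std::size_t max_workers = 8) {
  if (items.empty()) return;
  const std::size_t workers = std::min(max_workers, items.size());
  const std::size_t per = (items.size() + workers - 1) / workers;  // ceiling division

  std::vector<std::future<void>> futures;
  for (std::size_t begin = 0; begin < items.size(); begin += per) {
    const std::size_t end = std::min(begin + per, items.size());
    auto slice = items.subspan(begin, end - begin);
    futures.emplace_back(std::async(std::launch::async, [slice, &work]() { work(slice); }));
  }
  for (auto& f : futures) f.get();  // get() rethrows any exception from a slice
}
```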
The merge of main into pr-b-cleanup-hardening was auto-resolved by keeping both the new and old versions side-by-side in three places:

  * transaction.cc: duplicate Result<const TableMetadata*>
    finalize_result definition (compile error: redefinition).
  * expire_snapshots.cc: duplicate #include <string>.
  * expire_snapshots.cc: an old DeleteFile/DeleteFiles pair (the
    pre-hardening serial loop with the original "TODO add retry"
    comment) was kept inside the new DeleteFiles body, leaving an
    unmatched brace and two redefined methods.

Drop the duplicates so the file compiles and matches the intended
post-merge state. clang-format remains clean and all ExpireSnapshots
tests pass.
The cleanup tests collected deleted paths via deleted_files.push_back
inside the DeleteWith callback. With the new parallel DeleteFiles,
that callback can now run concurrently from multiple worker threads,
racing on a non-thread-safe std::vector.

The race surfaced as a flaky failure in CI's ASAN/UBSAN job for
ExpireSnapshotsCleanupTest.MetadataOnlySkipsDataDeletion: the test
expected three deleted entries, but one was lost to the race.
Local serial-run timing happened to mask it.

Wrap each test's collector with a per-test std::mutex so the
push_back is serialized. Functionally equivalent for sequential
deletes; correct under parallel deletes. 20-iteration loop is now
green locally.
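A sketch of the pattern, with a hypothetical collector type (the actual test fixtures and DeleteWith signature may differ):

```cpp
#include <mutex>
#include <string>
#include <vector>

// Sketch: the DeleteWith callback may now run concurrently on several worker
// threads, so the test-side collector serializes push_back with a mutex.
struct DeletedPathCollector {
  std::mutex mutex;
  std::vector<std::string> paths;

  void Add(const std::string& path) {
    std::lock_guard<std::mutex> lock(mutex);
    paths.push_back(path);
  }
};

// Usage sketch inside a test:
//   DeletedPathCollector deleted_files;
//   expire_snapshots->DeleteWith(
//       [&deleted_files](const std::string& path) { deleted_files.Add(path); });
```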
Comment thread on src/iceberg/update/expire_snapshots.cc (outdated):
if (begin >= items.size()) break;
std::size_t end = std::min(begin + per, items.size());
auto slice = items.subspan(begin, end - begin);
futures.emplace_back(std::async(std::launch::async, [slice, &work]() {
Member

I don't think it is a good idea to use std::async here. Instead, we may want to introduce a thread pool to fully control the resource.

shangxinli added a commit to shangxinli/iceberg-cpp that referenced this pull request May 13, 2026
…el deletes

Addresses review feedback on apache#649: std::async spins up a fresh std::thread per
DeleteFiles call (and per worker), which thrashes thread creation when
CleanFiles invokes DeleteFiles 3-4 times in a row. Replace with a per-strategy
ThreadPool that owns its workers for the lifetime of the strategy.

- New util/thread_pool_internal.h / .cc: minimal worker pool with eager thread
  start, mutex+cv task queue, Submit returning a future<void>, and a RunAndWait
  fan-out helper for span-of-items workloads. Drained on destruction.
- FileCleanupStrategy now holds a ThreadPool sized once at construction
  (min(8, hardware_concurrency)). DeleteFiles short-circuits empty/single-item
  batches and otherwise delegates to pool_.RunAndWait. The pool member is
  declared last so workers are joined before file_io_ and delete_func_ are
  destroyed.
- Drops the RunInParallel free template, the per-call WorkerCount, and the
  <future>/<span> includes in expire_snapshots.cc.
- Adds util_test::ThreadPoolTest covering ctor validation, single submit,
  fan-out, empty no-op, observed concurrency, exception isolation, and dtor
  drain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… deletes

Addresses review feedback on this PR (wgtmac): std::async spins up a fresh
std::thread per DeleteFiles call (and per worker), which thrashes thread
creation when CleanFiles invokes DeleteFiles 3-4 times in a row. Replace
with a per-strategy ThreadPool that owns its workers for the lifetime of
the strategy.

- New util/thread_pool_internal.h / .cc: minimal worker pool with eager
  thread start, a mutex+cv task queue, Submit returning a future<void>,
  and a RunAndWait fan-out helper for span-of-items workloads. Drained
  on destruction. The class carries ICEBERG_EXPORT so the test binary
  can link against libiceberg.so under the project's hidden default
  visibility.
- FileCleanupStrategy now holds a ThreadPool sized once at construction
  (min(8, hardware_concurrency)). DeleteFiles short-circuits empty and
  single-item batches and otherwise delegates to pool_.RunAndWait. The
  pool member is declared last so workers are joined before file_io_
  and delete_func_ are destroyed.
- Drops the RunInParallel free template, the per-call WorkerCount, and
  the <future>/<span> includes in expire_snapshots.cc.
- Adds util_test::ThreadPoolTest covering ctor validation, single
  submit, fan-out, empty no-op, observed concurrency, exception
  isolation, and dtor drain.
- Registers the new sources in both CMake and Meson.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
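A condensed sketch of the pool shape this describes (illustrative only; the real thread_pool_internal.h differs in details such as RunAndWait, ICEBERG_EXPORT, and error handling):

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Sketch of a minimal worker pool: eager thread start, mutex+cv task queue,
// Submit returning std::future<void>, and queued work drained on destruction.
class ThreadPool {
 public:
  explicit ThreadPool(std::size_t num_threads) {
    for (std::size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { RunWorker(); });
    }
  }

  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shutdown_ = true;
    }
    cv_.notify_all();
    for (auto& worker : workers_) worker.join();  // queued tasks finish first
  }

  std::future<void> Submit(std::function<void()> task) {
    auto packaged = std::make_shared<std::packaged_task<void()>>(std::move(task));
    std::future<void> future = packaged->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push([packaged] { (*packaged)(); });
    }
    cv_.notify_one();
    return future;
  }

  // The real class also carries a RunAndWait fan-out helper for span-of-items
  // workloads, built on Submit plus future::get.

 private:
  void RunWorker() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return shutdown_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // shutdown requested and queue drained
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // exceptions are captured into the packaged_task's future
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  std::vector<std::thread> workers_;
  bool shutdown_ = false;
};
```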
shangxinli force-pushed the pr-b-cleanup-hardening branch from 6cf99b7 to ff3a254 on May 13, 2026 16:31
