Skip to content

feat(io): add bulk delete API to FileIO.#659

Open
slfan1989 wants to merge 3 commits into
apache:mainfrom
slfan1989:add-fileio-bulk-delete-api
Open

feat(io): add bulk delete API to FileIO.#659
slfan1989 wants to merge 3 commits into
apache:mainfrom
slfan1989:add-fileio-bulk-delete-api

Conversation

@slfan1989
Copy link
Copy Markdown
Contributor

@slfan1989 slfan1989 commented May 18, 2026

Summary

Add a new FileIO::DeleteFiles(...) API as a bulk deletion entry point.

The default implementation deletes files sequentially by calling the existing
DeleteFile(...) method and returns the first deletion error encountered.

This PR only adds the API and backward-compatible fallback behavior. It does not
yet update ExpireSnapshots to use DeleteFiles(...), and it does not introduce
parallel deletion.

Fixed: #658

Motivation

ExpireSnapshots and other cleanup flows may need to delete many files. A
bulk deletion API gives FileIO implementations a common extension point for
future optimized deletion strategies, such as storage-native batch deletion or
parallel fallback deletion.

@slfan1989
Copy link
Copy Markdown
Contributor Author

@wgtmac Could you please help review this PR? This is my first code contribution to the iceberg-cpp module, and I would really appreciate your guidance and help. Thank you very much!

Comment thread src/iceberg/file_io.h
///
/// \param file_location The location of the file to delete.
/// \return void if the delete succeeded, an error code if the delete failed.
virtual Status DeleteFile(const std::string& file_location) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this line?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this only to avoid the unused parameter warning in the default implementation, since file_location is not used when returning NotImplemented.

I thought it was a small cleanup while touching this area, but I agree it is not required for this PR. I can revert it if you prefer.

Comment thread src/iceberg/file_io.h
///
/// \param file_locations The locations of the files to delete.
/// \return void if all deletes succeeded, an error code if any delete failed.
virtual Status DeleteFiles(std::span<const std::string> file_locations) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if you could make some changes that utilize this new API. I saw you mentioned ExpireSnapshots.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense. I originally planned to split this into two PRs: first add the FileIO::DeleteFiles API, then update ExpireSnapshots to use it.

I agree that this PR would be more useful if it also adopted the new API. I’ll take a look at updating ExpireSnapshots to use DeleteFiles for grouped file cleanup.

#include "iceberg/test/matchers.h"

namespace iceberg {
namespace {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this or its counterpart. If you want to wrap RecordingFileIO in an anonymous namespace, you should shrink the scope instead.

Suggested change
namespace {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I’ll shrink the anonymous namespace scope to cover only the helper RecordingFileIO and keep the tests outside of the iceberg namespace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support bulk and parallel file deletion for ExpireSnapshots

2 participants