xanderbailey commented Apr 16, 2026
Comment on lines +36 to +43
```rust
/// Filter applied to each [`ManifestFile`] before fetching it.
/// Returns `true` to include the manifest, `false` to skip it.
pub(crate) type ManifestFileFilter = Arc<dyn Fn(&ManifestFile) -> bool + Send + Sync>;

/// Filter applied to each manifest entry after loading a manifest.
/// Returns `true` to include the entry, `false` to skip it.
pub(crate) type ManifestEntryFilter = Arc<dyn Fn(&ManifestEntryRef) -> bool + Send + Sync>;
```
Contributor
Author
I think this is a nice way to inject these filters, and the same pattern should extend to future scan types.
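As a minimal sketch of the injection pattern, assuming simplified stand-in types (the `ManifestFile` struct and the `manifests_to_fetch` method below are hypothetical, not the crate's real definitions): the plan context holds an optional filter, and planning treats a missing filter as "include everything", so regular scans are unaffected.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a manifest list entry; the real `ManifestFile`
/// has many more fields.
#[derive(Debug, Clone)]
pub struct ManifestFile {
    pub manifest_path: String,
    pub added_snapshot_id: i64,
}

/// Same shape as the alias in the diff: a shareable, thread-safe predicate.
pub type ManifestFileFilter = Arc<dyn Fn(&ManifestFile) -> bool + Send + Sync>;

/// Hypothetical, trimmed-down plan context holding an optional filter.
pub struct PlanContext {
    pub manifest_file_filter: Option<ManifestFileFilter>,
}

impl PlanContext {
    /// Returns the manifests to fetch. With no filter installed every
    /// manifest is kept, which is what a regular scan needs.
    pub fn manifests_to_fetch<'a>(&self, manifests: &'a [ManifestFile]) -> Vec<&'a ManifestFile> {
        manifests
            .iter()
            .filter(|&m| match &self.manifest_file_filter {
                // Delegate the decision to the injected closure.
                Some(filter) => filter(m),
                None => true,
            })
            .collect()
    }
}
```

Because the filter is an `Arc`, the same closure can be cloned cheaply into every concurrent manifest-fetching task.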
xanderbailey commented Apr 16, 2026
```rust
    pub manifest_entry_filter: Option<ManifestEntryFilter>,
}

impl std::fmt::Debug for PlanContext {
```
Contributor
Author
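`ManifestFileFilter` and `ManifestEntryFilter` can't derive `Debug` (`dyn Fn` trait objects don't implement it), so `PlanContext` needs a manual `Debug` impl.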
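For illustration, a hand-written `Debug` impl can print the ordinary fields and report only whether a filter is present. This is a sketch with a hypothetical, trimmed-down `PlanContext` and a stand-in `EntryFilter` alias over `i64`; it is not the PR's actual impl.

```rust
use std::fmt;
use std::sync::Arc;

// Stand-in for the filter aliases: `dyn Fn` has no `Debug` impl, so
// `#[derive(Debug)]` on a struct holding this type would not compile.
pub type EntryFilter = Arc<dyn Fn(&i64) -> bool + Send + Sync>;

// Hypothetical context with one debuggable field and one closure field.
pub struct PlanContext {
    pub snapshot_id: i64,
    pub manifest_entry_filter: Option<EntryFilter>,
}

impl fmt::Debug for PlanContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("PlanContext")
            .field("snapshot_id", &self.snapshot_id)
            // Show only whether a filter is installed, not the closure itself.
            .field(
                "manifest_entry_filter",
                &self.manifest_entry_filter.as_ref().map(|_| "<filter>"),
            )
            .finish()
    }
}
```

With a filter set, `format!("{:?}", ctx)` yields `PlanContext { snapshot_id: 7, manifest_entry_filter: Some("<filter>") }`.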
CTTY reviewed Apr 16, 2026
```rust
pub(crate) struct AppendSnapshotSet {
    /// Snapshot IDs in the range
    snapshot_ids: HashSet<i64>,
}
```
Collaborator
Contributor
Author
`ancestors_between` is used here. We need to add validation and operation-type checking on top of it; we're not re-implementing the traversal. Does that make sense?
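A rough sketch of that layering, using hypothetical stand-in `Snapshot` and `Operation` types and a hypothetical `append_snapshots_between` function in place of the crate's `ancestors_between`: the parent-pointer walk is plain traversal, while the connectivity check and the APPEND filter sit on top of it.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the snapshot types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    Append,
    Overwrite,
    Delete,
    Replace,
}

#[derive(Debug, Clone)]
pub struct Snapshot {
    pub id: i64,
    pub parent_id: Option<i64>,
    pub operation: Operation,
}

/// Walks parent links from `to` back toward `from` (exclusive), fails if
/// `from` is not an ancestor of `to`, and keeps only APPEND snapshot IDs.
pub fn append_snapshots_between(
    snapshots: &HashMap<i64, Snapshot>,
    from: i64,
    to: i64,
) -> Result<HashSet<i64>, String> {
    let mut ids = HashSet::new();
    let mut current = Some(to);
    while let Some(id) = current {
        if id == from {
            // Reached the start of the range: the chain is connected.
            return Ok(ids);
        }
        let snapshot = snapshots
            .get(&id)
            .ok_or_else(|| format!("snapshot {id} not found"))?;
        // Operation-type check layered on top of the traversal.
        if snapshot.operation == Operation::Append {
            ids.insert(id);
        }
        current = snapshot.parent_id;
    }
    // Ran off the root without meeting `from`: the range is not connected.
    Err(format!("snapshot {from} is not an ancestor of {to}"))
}
```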
xanderbailey commented Apr 26, 2026
```rust
///
/// Use [`Table::incremental_append_scan`] or
/// [`Table::incremental_append_scan_inclusive`] to create an instance.
pub struct IncrementalAppendScanBuilder<'a> {
```
Contributor
Author
There is duplication between the scan builders here, which I've left for now; once we have a third variant, we should be able to pick out the common patterns more clearly.
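If a third variant does appear, one possible way to share the common setters is a small trait with provided methods over an embedded options struct. This is a hypothetical sketch (`SharedScanOptions`, `WithSharedOptions`, and both builder names are illustrative), a different technique from the PR's own `ScanConfig` + `build_table_scan()` approach.

```rust
/// Hypothetical options shared by every scan builder; real builders
/// carry many more (predicates, batch size, row group filtering, and so on).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SharedScanOptions {
    pub column_names: Option<Vec<String>>,
    pub concurrency_limit: Option<usize>,
}

/// Implemented by any builder that embeds `SharedScanOptions`; the common
/// setters are written once as provided (default) methods.
pub trait WithSharedOptions: Sized {
    fn options_mut(&mut self) -> &mut SharedScanOptions;

    fn select(mut self, columns: &[&str]) -> Self {
        self.options_mut().column_names = Some(columns.iter().map(|c| c.to_string()).collect());
        self
    }

    fn with_concurrency_limit(mut self, limit: usize) -> Self {
        self.options_mut().concurrency_limit = Some(limit);
        self
    }
}

// Two hypothetical builders that differ only in their snapshot-range fields.
#[derive(Debug, Default)]
pub struct SnapshotScanBuilder {
    pub options: SharedScanOptions,
    pub snapshot_id: Option<i64>,
}

#[derive(Debug, Default)]
pub struct IncrementalScanBuilder {
    pub options: SharedScanOptions,
    pub from_snapshot_id: i64,
    pub to_snapshot_id: i64,
}

impl WithSharedOptions for SnapshotScanBuilder {
    fn options_mut(&mut self) -> &mut SharedScanOptions {
        &mut self.options
    }
}

impl WithSharedOptions for IncrementalScanBuilder {
    fn options_mut(&mut self) -> &mut SharedScanOptions {
        &mut self.options
    }
}
```

Each new builder then only needs the one-line `options_mut` accessor, at the cost of callers importing the trait to see the setters.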
Which issue does this PR close?
What changes are included in this PR?
Adds incremental append scan support to iceberg-rust, allowing users to read only the data files added between two snapshots. This is the Rust equivalent of Java's `BaseIncrementalAppendScan`.

Core scan module (`crates/iceberg/src/scan/`)

- New `incremental.rs` module containing:
  - `AppendSnapshotSet`: walks the snapshot ancestry chain between `from_snapshot_id` and `to_snapshot_id`, validates connectivity, and collects only the IDs of APPEND snapshots (skipping overwrite, delete, and compaction snapshots, matching Java's `BaseIncrementalAppendScan` behavior)
  - `IncrementalAppendScanBuilder`: a builder with the same configuration options as `TableScanBuilder` (column selection, predicates, concurrency limits, row group filtering, etc.)
  - Exclusive and inclusive `from_snapshot` semantics
- Shared `ScanConfig` + `build_table_scan()` to eliminate duplication between `TableScanBuilder` and `IncrementalAppendScanBuilder`
- `ManifestFileFilter` and `ManifestEntryFilter` callback types added to `PlanContext`, used by incremental scans to:
  - skip manifests whose `added_snapshot_id` is outside the scan range
  - keep only entries with `status == Added` and a `snapshot_id` within the append set

Table API (`crates/iceberg/src/table.rs`)

- `Table::incremental_append_scan(from, to)`: exclusive, matches Java's `newIncrementalAppendScan()`
- `Table::incremental_append_scan_inclusive(from, to)`: inclusive variant
- Corresponding methods on `StaticTable`

DataFusion integration (`crates/integrations/datafusion/`)

- New `ScanRange` enum replacing the previous `Option<i64>` snapshot ID, supporting `Latest`, `PointInTime`, and `Incremental` variants
- `IcebergStaticTableProvider` gains three new constructors:
  - `try_new_incremental(table, from, to)`: exclusive
  - `try_new_incremental_inclusive(table, from, to)`: inclusive
  - `try_new_appends_after(table, from)`: exclusive, scans up to the current snapshot
- `get_batch_stream` updated to dispatch to the appropriate scan builder based on `ScanRange`

Are these changes tested?
Yes
Java comparison notes:

- `AppendSnapshotSet::build` mirrors Java's `BaseIncrementalAppendScan.snapshotsBetween()`: only APPEND snapshots are collected, and non-APPEND snapshots are silently skipped
- `from_snapshot` semantics match Java's default `IncrementalAppendScan` behavior
- Manifest filtering (`added_snapshot_id`) and entry filtering (`status == ADDED`, `snapshot_id` in range) match `BaseIncrementalAppendScan.doPlanFiles()` in Java