fix(scan): use current_schema for default-snapshot column validation
A default table scan (no explicit snapshot_id) currently validates
caller-supplied column names against the schema referenced by the
snapshot's schema_id, not the
table's current schema. After an UpdateSchemaAction commit changes the
current schema (rename/add/delete column), pre-existing snapshots still
point at the old schema_id, so the validation loop in
TableScanBuilder::build rejects names that are valid against the
post-evolution schema with:
    DataInvalid => Column <new_name> not found in table. Schema: <old>
The downstream Parquet projection
(arrow/reader/projection.rs::get_arrow_projection_mask_with_field_ids)
already maps field IDs to on-disk column names via PARQUET:field_id
metadata, so resolving names against the current schema is safe
end-to-end: field IDs are stable across schema versions, and the
file's original column names live in the Parquet metadata until the
file is rewritten.
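To illustrate why name resolution against the current schema is safe, here is a minimal sketch of field-ID-based lookup. It is not the code in projection.rs: the function resolve_on_disk_name and the two maps are hypothetical stand-ins, and it assumes the rename tmp -> value kept field ID 2.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of the iceberg crate): map a requested
/// column name to the name stored in the data file by going through the
/// field ID, which stays the same when a column is renamed.
pub fn resolve_on_disk_name<'a>(
    current_schema: &HashMap<&str, i32>,    // current name -> field ID
    file_columns: &'a HashMap<i32, String>, // field ID -> name written in the file
    requested: &str,
) -> Option<&'a str> {
    // Step 1: the requested name must exist in the current schema.
    let field_id = current_schema.get(requested)?;
    // Step 2: the file is addressed by field ID, not by name.
    file_columns.get(field_id).map(String::as_str)
}
```

With a current schema of {id: 1, value: 2} and a file written as {1: "id", 2: "tmp"}, requesting value resolves to the on-disk column tmp, while requesting tmp finds no match in the current schema.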
Fix: branch on whether the caller asked for a specific snapshot.
Explicit snapshot_id (time-travel) keeps the snapshot-time vocabulary;
default scan uses the table's current schema.
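The branch described above can be sketched roughly as follows. This is a simplified model, not the actual TableScanBuilder::build code: Schema, Snapshot, TableMetadata, schema_for_validation, validate_columns, and fixture are all hypothetical stand-ins, and errors are plain strings rather than the crate's error type.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for table metadata (assumptions,
// not the iceberg crate's real types).
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub schema_id: i32,
    pub columns: Vec<String>,
}

pub struct Snapshot {
    pub snapshot_id: i64,
    pub schema_id: i32,
}

pub struct TableMetadata {
    pub current_schema_id: i32,
    pub schemas: HashMap<i32, Schema>,
    pub snapshots: Vec<Snapshot>,
}

/// Choose the schema used for column-name validation.
/// Explicit snapshot_id (time travel): the snapshot's own schema.
/// No snapshot_id (default scan): the table's current schema.
pub fn schema_for_validation(
    meta: &TableMetadata,
    snapshot_id: Option<i64>,
) -> Result<&Schema, String> {
    let schema_id = match snapshot_id {
        Some(id) => meta
            .snapshots
            .iter()
            .find(|s| s.snapshot_id == id)
            .map(|s| s.schema_id)
            .ok_or_else(|| format!("Snapshot {id} not found"))?,
        None => meta.current_schema_id,
    };
    meta.schemas
        .get(&schema_id)
        .ok_or_else(|| format!("Schema {schema_id} not found"))
}

/// Reject any requested name that is absent from the chosen schema.
pub fn validate_columns(schema: &Schema, names: &[&str]) -> Result<(), String> {
    for name in names {
        if !schema.columns.iter().any(|c| c == name) {
            return Err(format!(
                "Column {name} not found in table. Schema: {:?}",
                schema.columns
            ));
        }
    }
    Ok(())
}

/// Fixture shaped like the one in the tests: current schema 1 (id, value),
/// a sole snapshot (id 1) pointing at schema 0 (id, tmp).
pub fn fixture() -> TableMetadata {
    let schemas = HashMap::from([
        (0, Schema { schema_id: 0, columns: vec!["id".into(), "tmp".into()] }),
        (1, Schema { schema_id: 1, columns: vec!["id".into(), "value".into()] }),
    ]);
    TableMetadata {
        current_schema_id: 1,
        schemas,
        snapshots: vec![Snapshot { snapshot_id: 1, schema_id: 0 }],
    }
}
```

In this model, schema_for_validation(&meta, None) yields schema 1, so (id, value) validates and (id, tmp) is rejected; schema_for_validation(&meta, Some(1)) yields schema 0, which reverses both outcomes.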
Tests: three regression tests on a fixture with current-schema-id=1
(id, value) and a sole snapshot at schema-id=0 (id, tmp):
* test_default_scan_uses_current_schema_after_evolution — select(['id','value'])
succeeds in the default scan
* test_default_scan_rejects_old_name_after_rename — select(['id','tmp'])
fails with DataInvalid in the default scan
* test_snapshot_id_scan_uses_snapshot_schema — snapshot_id(1).select(['id','tmp'])
succeeds (time-travel), and snapshot_id(1).select(['id','value']) fails
All 1299 iceberg lib tests pass (37 in scan::tests = 34 existing + 3
new). Clippy + rustfmt clean.