Skip to content

Core: Fix SnapshotUtil.schemaFor to look up schema by id#16892

Draft
thswlsqls wants to merge 1 commit into
apache:mainfrom
thswlsqls:fix/snapshotutil-schemafor-metadata-by-id
Draft

Core: Fix SnapshotUtil.schemaFor to look up schema by id#16892
thswlsqls wants to merge 1 commit into
apache:mainfrom
thswlsqls:fix/snapshotutil-schemafor-metadata-by-id

Conversation

@thswlsqls

Copy link
Copy Markdown
Contributor

Summary

  • SnapshotUtil.schemaFor(TableMetadata, ref) read the snapshot schema with metadata.schemas().get(snapshot.schemaId()).
  • metadata.schemas() returns a List<Schema>, so the schema id was used as a list index.
  • When schema ids no longer match list positions, this returns the wrong schema, or throws IndexOutOfBoundsException when the id is past the list size.
  • Schema ids and list positions diverge after expireSnapshots().cleanExpiredMetadata(true) removes an older unreferenced schema.
  • The fix looks the schema up by id via metadata.schemasById(), mirroring the sibling schemaFor(Table, long), including its checkState guard for a missing id.
  • This is a latent fix. The only in-tree production caller of this overload, MergingSnapshotProducer.apply, passes a branch ref, and branch refs return the table schema earlier in the method. The buggy line is reachable only for tag refs, so this removes an internal inconsistency rather than a live failure.

Testing done

  • Added TestSnapshotUtil#schemaForTagWithMetadata (wrong-schema case) and schemaForTagWithMetadataSchemaIdBeyondListSize (IndexOutOfBounds case).
  • ./gradlew :iceberg-core:test --tests "org.apache.iceberg.util.TestSnapshotUtil" — 14 tests passed.
  • ./gradlew :iceberg-core:spotlessCheck :iceberg-core:checkstyleMain :iceberg-core:checkstyleTest :iceberg-core:revapi — passed (revapi backward compatible).
  • Regression demo: with the fix reverted, both new tests fail (wrong schema, and ArrayIndexOutOfBoundsException); restored, both pass.

schemaFor(TableMetadata, ref) read the snapshot schema with
metadata.schemas().get(snapshot.schemaId()). schemas() returns a
List<Schema>, so the schema id was used as a list index. When schema
ids no longer match list positions (for example after expireSnapshots
with cleanExpiredMetadata removes an older schema), this returns the
wrong schema, or throws IndexOutOfBoundsException when the id is past
the list size.

Look the schema up by id via metadata.schemasById(), mirroring the
sibling schemaFor(Table, long) in the same class, and guard a missing
id with the same checkState message.

Generated-by: Claude Code
@github-actions github-actions Bot added the core label Jun 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant