Parent: #589.
PlanFiles builds dvIndex by iterating DV manifest entries and grouping by ReferencedDataFile(). Two gaps relative to Java's DeleteFileIndex:
The first is a missing sequence-number guard. The spec says a DV applies to a data file only when the data file's data_sequence_number is less than or equal to the DV's data_sequence_number. Java's DeleteFileIndex.findDV enforces this with a ValidationException. The Go dvIndex build skips the check, so a stale DV from a prior epoch paired with a newer data file is applied without any error. This over-deletion path only triggers on malformed manifests, but the guard is cheap.
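A minimal sketch of what the guard could look like, assuming simplified stand-in types. dvEntry, dataEntry, and checkDVSequence are hypothetical names for illustration, not the iceberg-go API, and the error text is not Java's message:

```go
package main

import "fmt"

// dvEntry and dataEntry are hypothetical stand-ins for manifest entries.
// They are NOT the real iceberg-go types; only the fields needed for the
// sequence-number comparison are modeled.
type dvEntry struct {
	Path               string
	ReferencedDataFile string
	DataSeq            int64
}

type dataEntry struct {
	Path    string
	DataSeq int64
}

// checkDVSequence returns an error when the data file is newer than the DV.
// Per the spec rule quoted above, a DV applies only when the data file's
// data sequence number is <= the DV's data sequence number.
func checkDVSequence(dv dvEntry, data dataEntry) error {
	if data.DataSeq > dv.DataSeq {
		return fmt.Errorf(
			"DV %s (data sequence number %d) cannot apply to data file %s (data sequence number %d)",
			dv.Path, dv.DataSeq, data.Path, data.DataSeq)
	}
	return nil
}
```

In the real planner the comparison would run at the point where a DV is matched to a data file, since that is where both sequence numbers are known.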
The second is that multiple DVs per data file are silently unioned. Java's Builder.add errors with ValidationException("Can't index multiple DVs for %s"). The Go dvIndex is map[string][]iceberg.DataFile, so any number of entries is accepted per path. The scanner's readAllDeletionVectors defensively rejects this at read time as of #996, but the canonical home for the check is planning, so that callers inspecting FileScanTask.DeletionVectorFiles directly see only validated state.
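One possible shape for the planning-time check, sketched under the assumption that the index holds a single DV per path instead of a slice. dvFile and buildDVIndex are hypothetical names, not existing iceberg-go identifiers:

```go
package main

import "fmt"

// dvFile is a hypothetical stand-in for a DV manifest entry. It is NOT the
// real iceberg.DataFile interface; it only carries what the check needs.
type dvFile struct {
	Path               string
	ReferencedDataFile string
}

// buildDVIndex maps each referenced data file path to exactly one DV and
// fails on the first duplicate, mirroring the behavior described for Java's
// Builder.add.
func buildDVIndex(dvs []dvFile) (map[string]dvFile, error) {
	index := make(map[string]dvFile, len(dvs))
	for _, dv := range dvs {
		if prev, ok := index[dv.ReferencedDataFile]; ok {
			return nil, fmt.Errorf("can't index multiple DVs for %s: %s and %s",
				dv.ReferencedDataFile, prev.Path, dv.Path)
		}
		index[dv.ReferencedDataFile] = dv
	}
	return index, nil
}
```

Storing one value per key makes the invariant structural. Whether the real dvIndex should change its value type or keep the slice and validate its length is a design choice this sketch does not settle.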
Both fixes fit naturally in one PR, since both checks live in the same dvIndex construction loop. The loop was last edited by #996, so this work text-conflicts with that PR's pos-delete suppression block; rebase if both are in flight together.