feat(table): match Java's DV planning validations in PlanFiles #1050

@laskoviymishka

Description

Parent: #589.

PlanFiles builds dvIndex by iterating DV manifest entries and grouping them by ReferencedDataFile(). This leaves two gaps relative to Java's DeleteFileIndex:

The first is a missing sequence-number guard. The spec says a DV applies to a data file only when the data file's data_sequence_number is less than or equal to the DV's data_sequence_number. Java's DeleteFileIndex.findDV enforces this with a ValidationException. The Go dvIndex build skips the check, so a stale DV from a prior epoch paired with a newer data file is silently applied and over-deletes rows. This path only triggers on malformed manifests, but the guard is cheap.
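A minimal sketch of what such a guard could look like, assuming simplified stand-in types. `dvEntry`, `ErrStaleDV`, and `checkDVSequence` are hypothetical names for illustration only and are not part of iceberg-go; the real check would operate on manifest entries inside the dvIndex loop.

```go
package main

import (
	"errors"
	"fmt"
)

// dvEntry is a hypothetical, stripped-down stand-in for a DV manifest entry.
type dvEntry struct {
	Path               string // location of the DV file
	ReferencedDataFile string // data file the DV applies to
	DataSeqNum         int64  // the DV's data_sequence_number
}

// ErrStaleDV is a hypothetical sentinel so callers can test with errors.Is.
var ErrStaleDV = errors.New("DV data sequence number is older than the data file's")

// checkDVSequence returns an error when the data file's sequence number is
// greater than the DV's, i.e. when the DV must not apply to that data file.
func checkDVSequence(dataFilePath string, dataSeq int64, dv dvEntry) error {
	if dv.DataSeqNum < dataSeq {
		return fmt.Errorf("%w: DV %s (seq %d) cannot apply to %s (seq %d)",
			ErrStaleDV, dv.Path, dv.DataSeqNum, dataFilePath, dataSeq)
	}
	return nil
}

func main() {
	dv := dvEntry{Path: "dv-1.puffin", ReferencedDataFile: "data-1.parquet", DataSeqNum: 3}

	// Equal sequence numbers are allowed by the rule above.
	fmt.Println(checkDVSequence("data-1.parquet", 3, dv))

	// A data file newer than the DV is rejected.
	fmt.Println(checkDVSequence("data-1.parquet", 5, dv))
}
```

Returning a wrapped sentinel rather than a bare string is just one option; it lets tests assert on the failure kind without matching message text.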

The second is that multiple DVs per data file are silently unioned. Java's Builder.add errors with ValidationException("Can't index multiple DVs for %s"). The Go dvIndex is map[string][]iceberg.DataFile, so any number of entries is accepted per path. The scanner's readAllDeletionVectors defensively rejects this at read time as of #996, but the canonical home for the check is planning, so that callers inspecting FileScanTask.DeletionVectorFiles directly see only validated state.
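One way to reject duplicates at index-build time, sketched under simplifying assumptions: the index maps each data file path to a single DV instead of a slice, and `dvEntry` and `buildDVIndex` are hypothetical names rather than the actual iceberg-go types or functions. The error text loosely follows the Java message quoted above.

```go
package main

import "fmt"

// dvEntry is a hypothetical, stripped-down stand-in for a DV manifest entry.
type dvEntry struct {
	Path               string // location of the DV file
	ReferencedDataFile string // data file the DV applies to
}

// buildDVIndex groups DVs by the data file they reference and fails on the
// second DV seen for the same data file instead of accumulating it.
func buildDVIndex(entries []dvEntry) (map[string]dvEntry, error) {
	index := make(map[string]dvEntry, len(entries))
	for _, e := range entries {
		if prev, ok := index[e.ReferencedDataFile]; ok {
			return nil, fmt.Errorf("can't index multiple DVs for %s: %s and %s",
				e.ReferencedDataFile, prev.Path, e.Path)
		}
		index[e.ReferencedDataFile] = e
	}
	return index, nil
}

func main() {
	entries := []dvEntry{
		{Path: "dv-a.puffin", ReferencedDataFile: "data-1.parquet"},
		{Path: "dv-b.puffin", ReferencedDataFile: "data-1.parquet"}, // duplicate target
	}
	_, err := buildDVIndex(entries)
	fmt.Println(err)
}
```

Keeping the existing slice-valued map and erroring when a slice would grow past one element would achieve the same validation with a smaller diff; which shape is preferable depends on how FileScanTask.DeletionVectorFiles is consumed.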

Both checks fit naturally in one PR because they live in the same dvIndex construction loop. That loop was last edited by #996, so this work conflicts textually with that PR's pos-delete suppression block; rebase if both are in flight together.
