Commit 72bbff2
fix(reader): auto-include equality delete key columns in projection
When a user scans with `.select(["col_a", "col_b"])` and the table has
merge-on-read equality delete files keyed on a column NOT in the select
list (e.g. `id`), the HashSet-based `apply_eq_delete_filter` fails with:
    Equality delete key column 'id' (field_id=1) not found in batch
The fix augments the Parquet projection mask and RecordBatchTransformer
with any equality delete key field IDs that are missing from the user's
projection. After applying equality deletes, the extra columns are
stripped from the output batches so the user sees only their requested
columns.
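
The widen, filter, strip flow described above can be sketched with plain standard-library types. This is only an illustration under stated assumptions, not the code in this commit: `Batch`, `widen_projection`, `apply_eq_deletes`, and `strip_to_user_projection` are hypothetical names, columns hold `i64` values for simplicity, and the real reader works on Arrow record batches and a Parquet projection mask instead.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a decoded batch: column field IDs plus row-major values.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    field_ids: Vec<i32>,
    rows: Vec<Vec<i64>>,
}

/// Returns the user's projection followed by any equality delete key
/// field IDs that the user did not select.
fn widen_projection(user: &[i32], eq_delete_keys: &[i32]) -> Vec<i32> {
    let mut widened = user.to_vec();
    for &id in eq_delete_keys {
        if !widened.contains(&id) {
            widened.push(id);
        }
    }
    widened
}

/// Drops every row whose key-column values appear in `deleted`.
/// Fails if a key column is absent from the batch, which is the
/// situation the widened projection is meant to prevent.
fn apply_eq_deletes(
    batch: &Batch,
    key_ids: &[i32],
    deleted: &HashSet<Vec<i64>>,
) -> Result<Batch, String> {
    let mut key_idx = Vec::with_capacity(key_ids.len());
    for &id in key_ids {
        let pos = batch
            .field_ids
            .iter()
            .position(|&f| f == id)
            .ok_or_else(|| format!("equality delete key field_id={} not found in batch", id))?;
        key_idx.push(pos);
    }
    let rows = batch
        .rows
        .iter()
        .filter(|row| {
            let key: Vec<i64> = key_idx.iter().map(|&i| row[i]).collect();
            !deleted.contains(&key)
        })
        .cloned()
        .collect();
    Ok(Batch { field_ids: batch.field_ids.clone(), rows })
}

/// Keeps only the columns the user asked for, in the user's order.
fn strip_to_user_projection(batch: &Batch, user: &[i32]) -> Batch {
    let idx: Vec<usize> = user
        .iter()
        .filter_map(|id| batch.field_ids.iter().position(|f| f == id))
        .collect();
    Batch {
        field_ids: idx.iter().map(|&i| batch.field_ids[i]).collect(),
        rows: batch
            .rows
            .iter()
            .map(|r| idx.iter().map(|&i| r[i]).collect())
            .collect(),
    }
}
```

In this sketch, a scan that selects field IDs 2 and 3 against deletes keyed on field ID 1 reads the widened list `[2, 3, 1]`, filters rows on the key column, and then strips the batch back to `[2, 3]`. Calling `apply_eq_deletes` on a batch that was read with only the user's projection returns the "not found in batch" error instead.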
This matches the behavior of Spark, Flink, and Trino, which
transparently widen the internal projection for delete evaluation.
Parent commit: 29cf9f6
1 file changed
Lines changed: 452 additions & 12 deletions