Commit d8d48c0
Fix row_selection bug in exact reverse scan + add comprehensive tests

Addresses Copilot review comment on PR #47: when `row_selection` is present
(e.g. from page pruning via `pushdown_filters`), the parquet stream emits only
the selected rows. Seeding `rg_row_counts` from `RowGroupMetaData::num_rows()`
therefore caused `ReversedRowGroupStream` to mis-detect row-group boundaries and
silently mix batches from multiple row groups, producing incorrectly ordered output.
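Here is a minimal row-level model in Rust showing why stale counts break boundary detection. It assumes boundaries are found by counting emitted rows against one expected count per row group. `group_by_counts` and the sample data are hypothetical illustrations, not the actual `ReversedRowGroupStream` code, which operates on record batches.

```rust
/// Hypothetical helper: split a flat stream of emitted rows into row groups
/// using one expected row count per group (a stand-in for boundary detection).
fn group_by_counts(stream: &[i32], counts: &[usize]) -> Vec<Vec<i32>> {
    let mut groups = Vec::with_capacity(counts.len());
    let mut pos = 0;
    for &n in counts {
        // Clamp so a count larger than what remains does not panic.
        let end = (pos + n).min(stream.len());
        groups.push(stream[pos..end].to_vec());
        pos = end;
    }
    groups
}

fn main() {
    // Three row groups of 4 rows each: [1..=4], [5..=8], [9..=12].
    // A row selection keeps [1,2], [5,6,7] and [9,10,11], so the reader
    // emits only these 8 rows, in order.
    let emitted = [1, 2, 5, 6, 7, 9, 10, 11];

    // Boundaries seeded from metadata counts (4 each) mix rows across groups.
    let stale = group_by_counts(&emitted, &[4, 4, 4]);
    assert_eq!(stale, vec![vec![1, 2, 5, 6], vec![7, 9, 10, 11], vec![]]);

    // Boundaries seeded from the selected-row counts recover the true groups.
    let fixed = group_by_counts(&emitted, &[2, 3, 3]);
    assert_eq!(fixed, vec![vec![1, 2], vec![5, 6, 7], vec![9, 10, 11]]);
}
```

Reversing the stale groups would interleave rows from different row groups, which is the ordering bug described above.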

Fix: a new `compute_selected_rows_per_rg` helper walks the `RowSelection` in
lockstep with the row groups and computes the actual number of output rows per RG.
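The lockstep walk can be sketched as follows. This assumes the selection covers the scanned row groups back to back, in scan order. `Selector` is a local stand-in shaped like parquet's `RowSelector` (`row_count`, `skip`), and `selected_rows_per_rg` is a hypothetical name. The real helper's signature and error type may differ.

```rust
/// Local stand-in shaped like parquet's `RowSelector`: a run of
/// `row_count` rows that are either skipped or selected.
#[derive(Clone, Copy, Debug)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Hypothetical sketch: for each row group (in scan order), count how many
/// rows the selection actually emits. A selector may span row groups.
fn selected_rows_per_rg(rg_num_rows: &[usize], selection: &[Selector]) -> Result<Vec<usize>, String> {
    let mut out = Vec::with_capacity(rg_num_rows.len());
    let mut iter = selection.iter().copied();
    // Leftover part of a selector that crossed the previous boundary.
    let mut current: Option<Selector> = None;

    for &rg_rows in rg_num_rows {
        let mut remaining = rg_rows;
        let mut selected = 0;
        while remaining > 0 {
            let mut sel = match current.take().or_else(|| iter.next()) {
                Some(s) => s,
                None => return Err("row selection is shorter than the row groups".to_string()),
            };
            // Consume as much of this selector as fits in the current RG.
            let n = sel.row_count.min(remaining);
            if !sel.skip {
                selected += n;
            }
            sel.row_count -= n;
            remaining -= n;
            if sel.row_count > 0 {
                current = Some(sel); // carry the rest into the next RG
            }
        }
        out.push(selected);
    }
    Ok(out)
}

fn main() {
    let sel = |row_count, skip| Selector { row_count, skip };
    // No skip: every row of every RG is emitted.
    assert_eq!(selected_rows_per_rg(&[4, 4, 4], &[sel(12, false)]), Ok(vec![4, 4, 4]));
    // A skip of 3 spans the RG1/RG2 boundary.
    let spanning = [sel(2, false), sel(3, true), sel(3, false), sel(1, true), sel(3, false)];
    assert_eq!(selected_rows_per_rg(&[4, 4, 4], &spanning), Ok(vec![2, 3, 3]));
    // All skipped.
    assert_eq!(selected_rows_per_rg(&[4, 4, 4], &[sel(12, true)]), Ok(vec![0, 0, 0]));
    // Selection too short for the row groups.
    assert!(selected_rows_per_rg(&[4, 4, 4], &[sel(5, false)]).is_err());
}
```

The four checks in `main` mirror the four unit-test scenarios named in the list that follows.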
Tests added:
- 4 unit tests for `compute_selected_rows_per_rg` (no skip, spanning skips,
all-skipped, and a selection too short for the row groups, which errors)
- test_exact_reverse_scan_multi_rg_produces_global_desc: verifies Inexact
yields [7,8,9,4,5,6,1,2,3] while Exact yields [9..1] (globally DESC)
- test_exact_reverse_scan_applies_limit_after_reversal: verifies that limit=4
over [1..9] yields [9,8,7,6] (the limit is applied after reversal, not to the
first N rows of the forward scan)
- test_exact_reverse_scan_with_row_selection_across_rgs: regression test
for the row_selection bug — 3 RGs with per-RG selections yield the
expected [10,9,8,7,6,5,4,3]
- test_exact_reverse_scan_with_row_selection_and_limit: combined case
(row_selection plus limit)

1 parent fd2650a
1 file changed
Lines changed: 421 additions & 7 deletions
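The Exact vs Inexact expectations and the limit test from the test list can be modeled at row level as below. This assumes three row groups of three rows each ([1,2,3], [4,5,6], [7,8,9]) for both tests. `inexact_reverse` and `exact_reverse` are hypothetical helpers. The real stream reverses record batches within each row group rather than individual rows.

```rust
/// Inexact reverse: row groups in reverse order, rows inside each group
/// kept in their original (forward) order.
fn inexact_reverse(row_groups: &[Vec<i32>]) -> Vec<i32> {
    row_groups.iter().rev().flat_map(|rg| rg.iter().copied()).collect()
}

/// Exact reverse: reverse both row-group order and rows within each group,
/// then apply the optional limit to the already-reversed stream.
fn exact_reverse(row_groups: &[Vec<i32>], limit: Option<usize>) -> Vec<i32> {
    let rows = row_groups.iter().rev().flat_map(|rg| rg.iter().rev().copied());
    match limit {
        Some(n) => rows.take(n).collect(),
        None => rows.collect(),
    }
}

fn main() {
    let rgs = vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8, 9]];
    assert_eq!(inexact_reverse(&rgs), vec![7, 8, 9, 4, 5, 6, 1, 2, 3]);
    assert_eq!(exact_reverse(&rgs, None), vec![9, 8, 7, 6, 5, 4, 3, 2, 1]);
    assert_eq!(exact_reverse(&rgs, Some(4)), vec![9, 8, 7, 6]);

    // For contrast: limiting the forward scan before reversing keeps
    // rows 1..=4 instead of the four largest values.
    let forward: Vec<i32> = rgs.iter().flatten().copied().collect();
    let wrong: Vec<i32> = forward.iter().take(4).rev().copied().collect();
    assert_eq!(wrong, vec![4, 3, 2, 1]);
}
```

Applying the limit only after the full reversal is what makes the result the top N of a globally descending order.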