
Commit d8d48c0

Fix row_selection bug in exact reverse scan + add comprehensive tests
Addresses the Copilot review comment on PR #47. When `row_selection` is present (e.g. from page pruning via `pushdown_filters`), the parquet stream emits only the selected rows. Seeding `rg_row_counts` from `RowGroupMetaData::num_rows()` therefore caused `ReversedRowGroupStream` to mis-detect row-group boundaries and silently mix batches from multiple row groups, producing the wrong output order.

Fix: a new `compute_selected_rows_per_rg` helper walks the `RowSelection` in lock-step with the row groups and computes the actual number of output rows per row group.

Tests added:
- 4 unit tests for `compute_selected_rows_per_rg` (no skip, spanning skips, all rows skipped, short selection as an error case)
- `test_exact_reverse_scan_multi_rg_produces_global_desc`: verifies that Inexact yields [7,8,9,4,5,6,1,2,3] while Exact yields [9..1] (globally DESC)
- `test_exact_reverse_scan_applies_limit_after_reversal`: verifies that limit=4 over [1..9] yields [9,8,7,6], i.e. the limit is applied after reversal instead of taking the first N rows before reversing
- `test_exact_reverse_scan_with_row_selection_across_rgs`: regression test for the `row_selection` bug; 3 RGs with per-RG selections yield the expected [10,9,8,7,6,5,4,3]
- `test_exact_reverse_scan_with_row_selection_and_limit`: combined row-selection and limit case
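The behaviors described above can be illustrated with a minimal, self-contained Rust sketch under simplifying assumptions; it is not the commit's code and does not use the parquet crate. `Selector` is a hypothetical stand-in that only mirrors the shape of the parquet crate's `RowSelector` (a run length plus a skip flag), row-group sizes are passed as plain `usize`s instead of `RowGroupMetaData`, and `split_by_counts`, `inexact_reverse` and `exact_reverse` are hypothetical helpers. The body of `compute_selected_rows_per_rg` shows one possible way to walk a selection in lock-step with row groups.

```rust
// Hypothetical sketch. `Selector` only mirrors the shape of the parquet
// crate's `RowSelector` (row_count + skip); it is not that type.
#[derive(Clone, Copy, Debug)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Walk `selection` in lock-step with the row groups (`rg_num_rows` holds each
/// RG's total row count) and return how many rows each RG actually emits.
/// A selector may straddle an RG boundary, so its unconsumed tail is carried.
fn compute_selected_rows_per_rg(
    selection: &[Selector],
    rg_num_rows: &[usize],
) -> Result<Vec<usize>, String> {
    let mut selectors = selection.iter().copied();
    let mut carry: Option<Selector> = None;
    let mut per_rg = Vec::with_capacity(rg_num_rows.len());
    for (rg, &num_rows) in rg_num_rows.iter().enumerate() {
        let (mut remaining, mut selected) = (num_rows, 0);
        while remaining > 0 {
            // Resume the carried-over selector first, otherwise pull the next one.
            let sel = carry
                .take()
                .or_else(|| selectors.next())
                .ok_or_else(|| format!("selection too short: ran out in row group {rg}"))?;
            let take = sel.row_count.min(remaining);
            if !sel.skip {
                selected += take;
            }
            remaining -= take;
            if sel.row_count > take {
                carry = Some(Selector { row_count: sel.row_count - take, ..sel });
            }
        }
        per_rg.push(selected);
    }
    Ok(per_rg)
}

/// Hypothetical helper: split the flat stream of emitted rows into per-RG groups.
fn split_by_counts(rows: &[i32], counts: &[usize]) -> Vec<Vec<i32>> {
    let mut start = 0;
    counts
        .iter()
        .map(|&c| {
            let end = (start + c).min(rows.len());
            let group = rows[start..end].to_vec();
            start = end;
            group
        })
        .collect()
}

/// Inexact: reverse row-group order only; rows inside each RG stay ascending.
fn inexact_reverse(rgs: &[Vec<i32>]) -> Vec<i32> {
    rgs.iter().rev().flatten().copied().collect()
}

/// Exact: also reverse rows within each RG, then apply `limit` after reversal.
fn exact_reverse(rgs: &[Vec<i32>], limit: Option<usize>) -> Vec<i32> {
    let rows = rgs.iter().rev().flat_map(|rg| rg.iter().rev().copied());
    rows.take(limit.unwrap_or(usize::MAX)).collect()
}

fn main() {
    // Forward data: three row groups of three rows each.
    let rgs = vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8, 9]];
    println!("{:?}", inexact_reverse(&rgs)); // [7, 8, 9, 4, 5, 6, 1, 2, 3]
    println!("{:?}", exact_reverse(&rgs, None)); // [9, 8, 7, 6, 5, 4, 3, 2, 1]
    println!("{:?}", exact_reverse(&rgs, Some(4))); // [9, 8, 7, 6]

    // Three RGs of 4 rows (values 1..=12); the selectors span RG boundaries.
    let sel = [
        Selector { row_count: 2, skip: true },
        Selector { row_count: 3, skip: false },
        Selector { row_count: 2, skip: true },
        Selector { row_count: 3, skip: false },
        Selector { row_count: 2, skip: true },
    ];
    let counts = compute_selected_rows_per_rg(&sel, &[4, 4, 4]).unwrap();
    println!("{:?}", counts); // [2, 2, 2]
    let emitted = [3, 4, 5, 8, 9, 10]; // the rows this selection keeps
    println!("{:?}", split_by_counts(&emitted, &counts)); // [[3, 4], [5, 8], [9, 10]]
    // Seeding with the full row counts instead mixes RG 0 and RG 1 in one group.
    println!("{:?}", split_by_counts(&emitted, &[4, 4, 4])); // [[3, 4, 5, 8], [9, 10], []]
}
```

Carrying the unconsumed tail of a selector across row groups is what lets a single run of selected or skipped rows straddle a boundary; the last line of `main` shows how using total row counts instead of selected counts lumps rows from two row groups together.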
1 parent fd2650a commit d8d48c0

1 file changed

Lines changed: 421 additions & 7 deletions
