
Commit 381faa1

Refactor: per-RG independent reverse scan (modeled after Atlas ReverseParquetSource)
Replace the rg_row_counts-based boundary detection in ReversedRowGroupStream with independent per-row-group reads. Each row group gets its own ParquetRecordBatchStreamBuilder with the RowFilter applied independently, and its batches are then reversed within that row group.

This fixes a correctness bug: when a RowFilter is applied, a row group can yield fewer rows than its rg_row_counts entry predicts, so boundaries derived from those counts no longer line up with the actual row groups.

Memory: O(largest row group), the same as Atlas's ReverseParquetSource.

Added an SLT test covering exact reverse + pushdown_filters + predicate.
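The failure mode and the fix can be illustrated with a simplified, self-contained Rust model. This is not the DataFusion or parquet crate code. Assumptions: row groups are plain Vec<i64>, a "batch" is a Vec<i64> standing in for an Arrow RecordBatch, and read_row_group, reverse_scan and naive_reverse are hypothetical helpers. naive_reverse assumes the old stream read the row groups in reverse order as one filtered stream and cut it into row groups by counting rows against the unfiltered per-row-group counts. reverse_scan reads each row group on its own, as the commit message describes.

```rust
// Simplified model only: NOT the DataFusion / parquet crate implementation.
// A "batch" is a plain Vec<i64>, a hypothetical stand-in for an Arrow RecordBatch.
type Batch = Vec<i64>;

// Hypothetical stand-in for decoding one row group with a row filter applied:
// keep rows matching `pred`, then split them into batches of `batch_size` rows.
fn read_row_group(rows: &[i64], pred: &dyn Fn(i64) -> bool, batch_size: usize) -> Vec<Batch> {
    let kept: Vec<i64> = rows.iter().copied().filter(|&v| pred(v)).collect();
    kept.chunks(batch_size).map(|c| c.to_vec()).collect()
}

// Per-row-group reverse scan (hypothetical helper): visit row groups last to
// first, read each one independently with the filter, then emit its batches in
// reverse order with the rows of every batch reversed. Only one row group's
// filtered batches are buffered at a time.
fn reverse_scan(row_groups: &[Vec<i64>], pred: &dyn Fn(i64) -> bool, batch_size: usize) -> Vec<Batch> {
    let mut out = Vec::new();
    for rg in row_groups.iter().rev() {
        let batches = read_row_group(rg, pred, batch_size);
        for mut batch in batches.into_iter().rev() {
            batch.reverse();
            out.push(batch);
        }
    }
    out
}

// Assumed model of the old approach (hypothetical helper): one filtered stream
// over the row groups in reverse order, cut into "row groups" by counting rows
// against the unfiltered per-row-group counts, each cut then reversed.
fn naive_reverse(row_groups: &[Vec<i64>], pred: &dyn Fn(i64) -> bool) -> Vec<i64> {
    let rg_row_counts: Vec<usize> = row_groups.iter().rev().map(|rg| rg.len()).collect();
    let mut stream = Vec::new();
    for rg in row_groups.iter().rev() {
        for &v in rg {
            if pred(v) {
                stream.push(v);
            }
        }
    }
    let mut out = Vec::new();
    let mut pos = 0;
    for count in rg_row_counts {
        // The filter removed rows, so `count` can overshoot the real boundary.
        let end = (pos + count).min(stream.len());
        let mut chunk = stream[pos..end].to_vec();
        chunk.reverse();
        out.extend(chunk);
        pos = end;
    }
    out
}

fn main() {
    let row_groups = vec![vec![1, 2, 3, 4], vec![5, 6, 7, 8]];
    let is_even = |v: i64| v % 2 == 0;
    // The exact reverse of the filtered rows [2, 4, 6, 8] is [8, 6, 4, 2].
    println!("per-RG reverse: {:?}", reverse_scan(&row_groups, &is_even, 3).concat());
    println!("naive reverse:  {:?}", naive_reverse(&row_groups, &is_even));
}
```

Running it prints [8, 6, 4, 2] for the per-row-group scan and [4, 2, 8, 6] for the naive model: the filter leaves only two rows in each row group, so the first count of 4 swallows rows from both row groups. Without a filter the two functions agree, which is why the bug only shows up with pushdown filters.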
1 parent 67e72af commit 381faa1

1 file changed

Lines changed: 285 additions & 167 deletions
