Commit 600500d

parquet: seek ahead skipped push decoder row groups
Teach the row-group frontier to seek ahead over queued row groups that can be proven unreachable before instantiating the row-group builder.

Skip queued row groups when their selection slice is empty, when offset/limit leaves no rows to read, or when the remaining limit is already exhausted. Keep predicate-bearing row groups conservative and stop at the first row group that may still need data.

Add a push decoder regression covering `try_next_reader` with offset/limit so the frontier path is exercised directly.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
1 parent 7b564c3 commit 600500d
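
The skip rules in the commit message can be pictured with a small standalone model. This is only a sketch under stated assumptions, not the code from this commit: `QueuedRowGroup`, `Budget`, `seek_ahead`, and `offset_limit_example` are hypothetical names that do not exist in the parquet crate, and the choice to stop at every predicate-bearing row group is a deliberately strict reading of "conservative" that the real frontier may relax. The example also assumes a file with two 200-row row groups, which the commit itself does not state.

```rust
/// Hypothetical stand-in for a queued row group; not a type from the parquet crate.
#[derive(Debug, Clone, Copy)]
pub struct QueuedRowGroup {
    /// Rows kept by this row group's slice of the row selection
    /// (the full row count when no selection is set).
    pub selected_rows: usize,
    /// Whether a row filter must be evaluated on this row group.
    pub has_predicate: bool,
}

/// Offset and limit still to be applied, carried across row groups.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Budget {
    pub offset: usize,
    pub limit: Option<usize>,
}

/// Walk the queue and return the index of the first row group that may
/// still need data, or `None` if every queued row group can be skipped.
pub fn seek_ahead(queue: &[QueuedRowGroup], budget: &mut Budget) -> Option<usize> {
    // Remaining limit already exhausted: no later row group can emit rows.
    if budget.limit == Some(0) {
        return None;
    }
    for (idx, rg) in queue.iter().enumerate() {
        // Assumption of this sketch: with a predicate we cannot know how many
        // rows survive filtering, so the offset cannot be charged up front.
        if rg.has_predicate {
            return Some(idx);
        }
        // Empty selection slice: nothing to decode in this row group.
        if rg.selected_rows == 0 {
            continue;
        }
        // The offset swallows the whole row group: charge it and move on.
        if rg.selected_rows <= budget.offset {
            budget.offset -= rg.selected_rows;
            continue;
        }
        // This row group may yield rows, so stop seeking here.
        return Some(idx);
    }
    None
}

/// Offset 225 and limit 20 over two assumed 200-row row groups.
pub fn offset_limit_example() -> (Option<usize>, Budget) {
    let queue = [
        QueuedRowGroup { selected_rows: 200, has_predicate: false },
        QueuedRowGroup { selected_rows: 200, has_predicate: false },
    ];
    let mut budget = Budget { offset: 225, limit: Some(20) };
    let next = seek_ahead(&queue, &mut budget);
    (next, budget)
}
```

Under those assumptions the first row group is skipped entirely and the scan stops at index 1 with 25 rows of offset left, which is consistent with the regression test needing only one data request before it receives rows 225 through 244.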

1 file changed

Lines changed: 22 additions & 0 deletions

parquet/src/arrow/push_decoder/mod.rs

@@ -1121,6 +1121,28 @@ mod test {
         expect_finished(decoder.try_decode());
     }
 
+    #[test]
+    fn test_decoder_try_next_reader_offset_limit() {
+        let mut decoder = ParquetPushDecoderBuilder::try_new_decoder(test_file_parquet_metadata())
+            .unwrap()
+            .with_offset(225)
+            .with_limit(20)
+            .build()
+            .unwrap();
+
+        let ranges = expect_needs_data(decoder.try_next_reader());
+        push_ranges_to_decoder(&mut decoder, ranges);
+
+        let reader = expect_data(decoder.try_next_reader());
+        let batches = reader
+            .map(|batch| batch.expect("expected decoded batch"))
+            .collect::<Vec<_>>();
+        let output = concat_batches(&TEST_BATCH.schema(), &batches).unwrap();
+        assert_eq!(output, TEST_BATCH.slice(225, 20));
+
+        expect_finished(decoder.try_next_reader());
+    }
+
     #[test]
     fn test_decoder_row_group_selection() {
         // take only the second row group
