
Commit 35a581c

perf: increase SortPreservingMergeExec prefetch buffer from 1 to 16
When SPM reads directly from I/O-bound sources (e.g., DataSourceExec without SortExec buffering), the merge loop stalls waiting for Parquet I/O on each poll. Increasing the prefetch buffer lets background tasks read ahead while the merge processes previous batches.

Local benchmark (release, 16 partitions, sort_pushdown_sorted):
- Q1 full scan: Main 110ms → PR 82ms (1.3x faster)
- Q3 SELECT *: Main 239ms → PR 228ms (1.05x faster)
- Q2/Q4 LIMIT: 3-7ms (unchanged, already fast)
1 parent 911f0dd commit 35a581c

1 file changed

Lines changed: 1 addition & 1 deletion

datafusion/physical-plan/src/sorts/sort_preserving_merge.rs

@@ -361,7 +361,7 @@ impl ExecutionPlan for SortPreservingMergeExec {
                     .map(|partition| {
                         let stream =
                             self.input.execute(partition, Arc::clone(&context))?;
-                        Ok(spawn_buffered(stream, 1))
+                        Ok(spawn_buffered(stream, 16))
                     })
                     .collect::<Result<_>>()?;