
Commit a79cbdf

zhuqi-lucas and claude committed
docs: add non-overlapping exception to partition ordering diagram
The existing doc comment explains that multi-file partitions break output ordering. Add a note about the exception: when sort pushdown verifies files are non-overlapping via statistics, output_ordering is preserved and SortExec can be eliminated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent be5e40e commit a79cbdf


1 file changed, +16 -0 lines changed

datafusion/datasource/src/file_scan_config.rs

Lines changed: 16 additions & 0 deletions
@@ -1819,6 +1819,22 @@ fn validate_orderings(
 ///
 /// DataSourceExec
 /// ```
+///
+/// **Exception**: When files within a partition are **non-overlapping** (verified
+/// via min/max statistics) and each file is internally sorted, the combined
+/// output is still correctly sorted. Sort pushdown
+/// ([`FileScanConfig::try_pushdown_sort`]) detects this case and preserves
+/// `output_ordering`, allowing `SortExec` to be eliminated entirely.
+///
+/// ```text
+/// Partition 1 (files sorted by stats, non-overlapping):
+/// ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
+/// │ 1.parquet        │ │ 2.parquet        │ │ 3.parquet        │
+/// │ A: [1..100]      │ │ A: [101..200]    │ │ A: [201..300]    │
+/// │ Sort: A, B, C    │ │ Sort: A, B, C    │ │ Sort: A, B, C    │
+/// └──────────────────┘ └──────────────────┘ └──────────────────┘
+/// max(1) <= min(2) ✓ max(2) <= min(3) ✓ → output_ordering preserved
+/// ```
 fn get_projected_output_ordering(
     base_config: &FileScanConfig,
     projected_schema: &SchemaRef,
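The non-overlap check that the new doc comment describes can be sketched in Rust as follows. This is a minimal illustration, not DataFusion's implementation: `FileStats` and `order_non_overlapping` are hypothetical names, every sort column is assumed ascending, and the statistics are assumed to be exact column-level min/max values for the sort key columns. The sketch compares the min and max tuples lexicographically rather than only column `A`. Column-wise min/max tuples bound every row of a file from below and above, so `max(prev) <= min(next)` on the tuples keeps the full `A, B, C` ordering sound even when two files share a boundary value of `A`.

```rust
/// Hypothetical per-file statistics over the sort key columns (A, B, C).
/// `min[k]` / `max[k]` are the column-level min/max of the k-th sort column.
#[derive(Debug, Clone)]
struct FileStats {
    name: &'static str,
    min: Vec<i64>,
    max: Vec<i64>,
}

/// Orders files by their lower bound and returns them if no two consecutive
/// files overlap; returns `None` (i.e. keep the SortExec) otherwise.
fn order_non_overlapping(mut files: Vec<FileStats>) -> Option<Vec<FileStats>> {
    // `Vec<i64>` implements `Ord` lexicographically. Ties on `min` are broken
    // by `max`, so zero-width files sort ahead of wider ones.
    files.sort_by(|a, b| a.min.cmp(&b.min).then_with(|| a.max.cmp(&b.max)));
    // Every row of file i is <= its max tuple, and every row of file i+1 is
    // >= its min tuple, so this condition keeps the concatenation sorted.
    let ok = files.windows(2).all(|w| w[0].max <= w[1].min);
    ok.then_some(files)
}

fn main() {
    // Mirrors the diagram: A ranges [1..100], [101..200], [201..300],
    // listed out of order to show the files being ordered by statistics.
    let files = vec![
        FileStats { name: "3.parquet", min: vec![201, 0, 0], max: vec![300, 9, 9] },
        FileStats { name: "1.parquet", min: vec![1, 0, 0], max: vec![100, 9, 9] },
        FileStats { name: "2.parquet", min: vec![101, 0, 0], max: vec![200, 9, 9] },
    ];
    match order_non_overlapping(files) {
        Some(ordered) => {
            let names: Vec<_> = ordered.iter().map(|f| f.name).collect();
            println!("ordering preserved: {names:?}");
        }
        None => println!("files overlap: keep SortExec"),
    }
}
```

If one file ends at `(100, 5, 0)` and the next begins at `(100, 1, 0)`, the tuple comparison rejects the pair. A check on column `A` alone would accept it and produce output that is sorted on `A` but not on `A, B`.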