Commit 94c58d0
fix(sort-pushdown): restore SortExec elimination after stats-based file reorder (apache#22493)
## Which issue does this PR close?
- Closes apache#22494
Regression introduced by apache#21956 — single-partition multi-file scans with
`WITH ORDER` (or inferred parquet `sorting_columns` metadata) plus files
listed out of order on disk no longer have their `SortExec` eliminated,
even when the stats-based file reorder produces a non-overlapping
layout.
This was the central case PR apache#21182 was designed to fix.
## Rationale for this change
In the Phase 2 sort pushdown design (apache#21182), when a file source
returned
`Unsupported` because `validated_output_ordering()` stripped its
declared
ordering, the outer `FileScanConfig::try_pushdown_sort` invoked a
fallback
(`try_sort_file_groups_by_statistics`) that reordered files by min/max
stats, re-validated the ordering against the new file groups, and could
upgrade the result back to `Exact` — dropping the outer `SortExec`.
PR apache#21956 added the `column_in_file_schema` signal so that plain-column
sort requests would return `Inexact` (with `sort_order_for_reorder` set)
instead of `Unsupported`. That enabled runtime per-RG reorder, but it
also pulled the typical wrong-file-order case out of the
`Unsupported`-fallback re-validation path. The Inexact branch of
`rebuild_with_source` always strips `output_ordering`, so even when the
post-sort file groups became non-overlapping and the declared ordering
would re-validate, the outer wrapper returned `Inexact` and PushdownSort
kept the `SortExec`.
The SLT comment on test 6.1 (`# … → SortExec eliminated`) still
describes
the pre-apache#21956 behaviour, but the recorded expectation was updated to
match the post-apache#21956 plan where SortExec stayed — a silent regression.
## What changes are included in this PR?
### `rebuild_with_source`
When `is_exact=false` but the stats-based file sort produced
`all_non_overlapping=true`, re-validate the declared `output_ordering`
against the new file groups. If `ordering_satisfy` passes, preserve
`output_ordering` instead of stripping it.
### Outer `FileScanConfig::try_pushdown_sort`
On the `Inexact` branch, inspect `config.output_ordering` after rebuild
and return `Exact` if it survived. This restores the
`Unsupported → upgrade Exact` semantics, but on the `Inexact` branch the
plain-column path now follows.
### Safety
- When the source has no declared ordering (no `WITH ORDER` and no
parquet `sorting_columns` metadata), `self.output_ordering` is empty,
the re-validate produces no orderings, `ordering_satisfy` returns
`false`, and `SortExec` stays. min/max stats alone never trigger an
upgrade.
- When `all_non_overlapping=false` (overlapping files post-sort),
`output_ordering` is stripped exactly as before.
### Tests
`sort_pushdown.slt`:
- Tests **4.1** (parquet metadata), **6.1** (`WITH ORDER ASC`),
**8.1** (`WITH ORDER DESC` with reverse), **G.1**, **G.2** (multi-
partition SPM + BufferExec) — expectations updated to match the
restored `SortExec`-eliminated plans (matches the PR apache#21182 era plans
and the SLT comments that were already in the file).
- New **Test 5b** — files written without `ORDER BY` (no
`sorting_columns` metadata) + external table without `WITH ORDER`:
asserts that even when min/max stats happen to be non-overlapping,
`SortExec` is kept.
## Are these changes tested?
Yes — `sort_pushdown.slt` covers the affected paths comprehensively.
Full `cargo test -p datafusion-sqllogictest --test sqllogictests` and
`cargo test -p datafusion-datasource --lib` / `-p
datafusion-datasource-parquet --lib` / `-p
datafusion-physical-optimizer`
pass. `cargo clippy -D warnings` clean.
## Are there any user-facing changes?
Yes — for the in-scope scenarios above, `EXPLAIN` plans now show the
`SortExec` removed and `output_ordering` set on `DataSourceExec`. Query
results are unaffected.
---------
Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>1 parent 9f4b78a commit 94c58d0
3 files changed
Lines changed: 343 additions & 49 deletions
File tree
- datafusion
- datasource/src/file_scan_config
- sqllogictest/test_files
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
937 | 937 | | |
938 | 938 | | |
939 | 939 | | |
940 | | - | |
941 | | - | |
| 940 | + | |
| 941 | + | |
| 942 | + | |
| 943 | + | |
| 944 | + | |
| 945 | + | |
| 946 | + | |
942 | 947 | | |
943 | 948 | | |
944 | | - | |
| 949 | + | |
945 | 950 | | |
946 | 951 | | |
947 | | - | |
| 952 | + | |
948 | 953 | | |
949 | 954 | | |
950 | 955 | | |
| |||
973 | 978 | | |
974 | 979 | | |
975 | 980 | | |
976 | | - | |
977 | | - | |
| 981 | + | |
| 982 | + | |
| 983 | + | |
| 984 | + | |
| 985 | + | |
| 986 | + | |
| 987 | + | |
| 988 | + | |
| 989 | + | |
| 990 | + | |
| 991 | + | |
| 992 | + | |
| 993 | + | |
| 994 | + | |
| 995 | + | |
| 996 | + | |
| 997 | + | |
| 998 | + | |
| 999 | + | |
| 1000 | + | |
| 1001 | + | |
| 1002 | + | |
| 1003 | + | |
| 1004 | + | |
| 1005 | + | |
| 1006 | + | |
| 1007 | + | |
| 1008 | + | |
| 1009 | + | |
| 1010 | + | |
| 1011 | + | |
| 1012 | + | |
| 1013 | + | |
| 1014 | + | |
| 1015 | + | |
| 1016 | + | |
978 | 1017 | | |
979 | 1018 | | |
980 | 1019 | | |
| |||
Lines changed: 69 additions & 24 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
138 | 138 | | |
139 | 139 | | |
140 | 140 | | |
141 | | - | |
142 | | - | |
143 | | - | |
144 | | - | |
145 | | - | |
146 | | - | |
147 | | - | |
148 | | - | |
149 | | - | |
150 | | - | |
151 | | - | |
152 | | - | |
153 | | - | |
154 | | - | |
155 | | - | |
156 | | - | |
157 | | - | |
158 | | - | |
159 | | - | |
160 | | - | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
161 | 181 | | |
162 | | - | |
163 | | - | |
164 | | - | |
165 | | - | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
166 | 211 | | |
167 | 212 | | |
168 | 213 | | |
| |||
0 commit comments