Commit f174c94
fix(parquet-datasource): never drop conjuncts the RowFilter cannot place
`build_row_filter` (and its `RowFilterGenerator` wrapper) silently dropped
conjuncts that `FilterCandidateBuilder::build` rejected (`Ok(None)` was
`.flatten()`-ed away) and swallowed whole-build errors. By the time
`build_row_filter` runs, `ParquetSource::try_pushdown_filters` has already
accepted the filter and the parent `FilterExec` has been removed, so those
dropped conjuncts were never applied anywhere — wrong results.
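The dropping behaviour described above can be illustrated with a minimal, std-only sketch. It is not the DataFusion code: `Conjunct`, `try_place`, `placed_only`, and `placed_and_rejected` are hypothetical names, and a conjunct is modelled as its SQL text rather than an `Arc<dyn PhysicalExpr>`. It only shows why flattening an `Option` per conjunct loses the rejected ones, and what returning them to the caller looks like.

```rust
// Simplified model: a conjunct is just its SQL text here. In the commit the
// real type is Arc<dyn PhysicalExpr>.
type Conjunct = &'static str;

// Hypothetical stand-in for FilterCandidateBuilder::build. Ok(None) means
// "this conjunct cannot be evaluated inside the RowFilter".
fn try_place(c: Conjunct) -> Result<Option<Conjunct>, String> {
    if c.starts_with("s ") {
        Ok(None) // assumption: pretend whole-struct references are not placeable
    } else {
        Ok(Some(c))
    }
}

// Old shape: flatten() discards every None, so rejected conjuncts vanish
// and the caller cannot tell that the predicate was relaxed.
fn placed_only(conjuncts: &[Conjunct]) -> Result<Vec<Conjunct>, String> {
    let built: Vec<Option<Conjunct>> = conjuncts
        .iter()
        .copied()
        .map(try_place)
        .collect::<Result<_, _>>()?;
    Ok(built.into_iter().flatten().collect())
}

// Shape after the fix (simplified): rejected conjuncts are handed back so
// the caller can still apply them somewhere.
fn placed_and_rejected(
    conjuncts: &[Conjunct],
) -> Result<(Vec<Conjunct>, Vec<Conjunct>), String> {
    let mut placed = Vec::new();
    let mut rejected = Vec::new();
    for &c in conjuncts {
        match try_place(c)? {
            Some(p) => placed.push(p),
            None => rejected.push(c),
        }
    }
    Ok((placed, rejected))
}

fn main() {
    let conjuncts = ["a > 1", "s IS NOT NULL"];
    println!("old: {:?}", placed_only(&conjuncts));
    println!("new: {:?}", placed_and_rejected(&conjuncts));
}
```

In the old shape the caller sees only the placed conjuncts; in the new shape every input conjunct ends up in exactly one of the two lists.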
Most reproducible trigger: the per-file expr adapter rewrites a predicate
that was pushable at *table schema* time into something the
`PushdownChecker` rejects at *physical file schema* time (schema
evolution / coercion / whole-struct references introduced by the rewrite).
Surface the rejected conjuncts instead of dropping them:
- `build_row_filter` now returns
`Result<(Option<RowFilter>, Vec<Arc<dyn PhysicalExpr>>)>`. The second
element is the conjuncts it could not place. Bench / in-file test call
sites updated.
- `RowFilterGenerator` exposes `rejected_conjuncts()`. On a whole-file
build error it routes every conjunct through that list, so an error no
longer relaxes the predicate.
- `DecoderProjection::build` grows a `post_scan_conjuncts` parameter and
a `post_scan_filter: Option<PostScanFilter>` field. When
`post_scan_conjuncts` is non-empty, `build` widens the decoder mask to
cover the user projection ∪ post-scan filter columns, rebases the
conjuncts onto the stream schema, and returns a `PostScanFilter` that
the stream applies to every decoded batch with SQL `WHERE` semantics
(mirroring `FilterExec`'s `batch_filter`).
- `PushDecoderStreamState` carries the optional `PostScanFilter` and
applies it in the `DecodeResult::Data` arm, skipping empty batches.
- The decoder-local LIMIT is unsafe with a post-scan filter (the decoder
would short-circuit before the filter rejects enough rows), so the
opener routes the limit to `remaining_limit` whenever a post-scan
filter is present.
- New `post_scan_rows_pruned` / `post_scan_rows_matched` counters and
`post_scan_filter_eval_time` `Time` on `ParquetFileMetrics`, mirroring
the existing `pushdown_rows_*` / `row_pushdown_eval_time` so
`EXPLAIN ANALYZE` keeps surfacing filter cost.
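As a rough illustration of two points in the list above (SQL `WHERE` semantics for the post-scan filter, and why a limit applied before that filter is unsafe), here is a std-only sketch. It does not use Arrow or DataFusion types: `Row`, `PostScanCounts`, `post_scan_filter`, `gt_10`, `limit_then_filter`, and `filter_then_limit` are all hypothetical, and the counter fields are only named after the metrics in this commit.

```rust
// One decoded "batch" is modelled as a slice of nullable i32 values.
// None stands for SQL NULL. This is not the Arrow RecordBatch API.
type Row = Option<i32>;

// Hypothetical counters, named after the metrics in the commit message.
#[derive(Debug, Default, PartialEq)]
struct PostScanCounts {
    rows_matched: usize,
    rows_pruned: usize,
}

// SQL WHERE semantics: keep a row only if the predicate is definitely true.
// A NULL (None) predicate result drops the row, just like false.
fn post_scan_filter(
    batch: &[Row],
    pred: impl Fn(Row) -> Option<bool>,
    counts: &mut PostScanCounts,
) -> Vec<Row> {
    let kept: Vec<Row> = batch
        .iter()
        .copied()
        .filter(|r| pred(*r) == Some(true))
        .collect();
    counts.rows_matched += kept.len();
    counts.rows_pruned += batch.len() - kept.len();
    kept
}

// Example predicate `x > 10`; comparing NULL yields NULL.
fn gt_10(r: Row) -> Option<bool> {
    r.map(|x| x > 10)
}

// Unsafe order: the limit cuts the input before the filter has seen
// enough rows, so fewer than `limit` rows may come out.
fn limit_then_filter(rows: &[Row], limit: usize, c: &mut PostScanCounts) -> Vec<Row> {
    let end = limit.min(rows.len());
    post_scan_filter(&rows[..end], gt_10, c)
}

// Safe order: filter first, then apply the limit to the surviving rows.
fn filter_then_limit(rows: &[Row], limit: usize, c: &mut PostScanCounts) -> Vec<Row> {
    let mut kept = post_scan_filter(rows, gt_10, c);
    kept.truncate(limit);
    kept
}

fn main() {
    let rows = [Some(5), Some(20), None, Some(30), Some(7), Some(40)];
    let mut early = PostScanCounts::default();
    let mut late = PostScanCounts::default();
    println!("{:?}", limit_then_filter(&rows, 2, &mut early));
    println!("{:?} {:?}", filter_then_limit(&rows, 2, &mut late), late);
}
```

With a limit of 2, the limit-first order yields a single row even though three rows satisfy the predicate. The sketch filters the whole input for simplicity; a streaming implementation would stop once the limit is met.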
Two regression tests:
- `build_row_filter_surfaces_rejected_struct_conjunct` (`row_filter.rs`)
asserts the new API contract directly — the rejected struct conjunct
is returned, not dropped.
- `rejected_struct_conjunct_runs_post_scan_not_dropped` (`opener/mod.rs`)
is end-to-end: with `pushdown_filters=true` and an `s IS NOT NULL`
predicate over a struct column where one row is NULL, `main` returns 3
rows (conjunct silently dropped, predicate relaxed); after this fix it
correctly returns 2.
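The row counts in the end-to-end test can be reproduced with a toy model. This is not the test from `opener/mod.rs`: `StructValue`, `s_is_not_null`, and `scan` are hypothetical, a struct value is modelled as a two-field tuple, and the `apply_rejected` flag simply switches between the old behaviour (conjunct dropped) and the fixed one (conjunct evaluated after decoding).

```rust
// Assumption: a struct-typed column `s` is modelled as Option<(i32, i32)>,
// where None stands for a NULL struct value.
type StructValue = (i32, i32);

// The conjunct `s IS NOT NULL`. IS NOT NULL never evaluates to NULL itself,
// so a plain bool is enough here.
fn s_is_not_null(row: &Option<StructValue>) -> bool {
    row.is_some()
}

// Toy scan: returns the number of rows emitted. When `apply_rejected` is
// false, the conjunct the RowFilter could not place is silently skipped.
fn scan(rows: &[Option<StructValue>], apply_rejected: bool) -> usize {
    rows.iter()
        .filter(|r| !apply_rejected || s_is_not_null(r))
        .count()
}

fn main() {
    // Three rows, one of which has a NULL struct.
    let rows = [Some((1, 10)), None, Some((3, 30))];
    println!("conjunct dropped: {} rows", scan(&rows, false));
    println!("conjunct applied: {} rows", scan(&rows, true));
}
```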
The `pushdown_filters = false` path is intentionally unchanged in this
commit: `try_pushdown_filters` still leaves the `FilterExec` above the
scan in that case. Always accepting filters and removing the `FilterExec`
unconditionally is a separate behaviour change, left to a follow-up commit.
`push_down_filter_parquet.slt` updated for the new `post_scan_rows_*`
metric lines on `EXPLAIN ANALYZE` output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent e1c1a45 commit f174c94
8 files changed
Lines changed: 504 additions & 117 deletions
File tree
- datafusion
  - datasource-parquet
    - benches
    - src
      - opener
  - sqllogictest/test_files