feat(parquet-datasource): always accept pushable filters, run them post-scan

`ParquetSource::try_pushdown_filters` always returns the per-filter
`Yes` / `No` discriminant from `can_expr_be_pushed_down_with_schemas`,
regardless of the `pushdown_filters` config. The parent `FilterExec` is
always removed for pushable filters, and the scan owns the predicate.
The opener now also routes the whole predicate to the post-scan filter
when `pushdown_filters = false`; the rejected-conjunct path for
`pushdown_filters = true` already exists. The resulting routing is:
- `pushdown_filters = true` → row-filterable conjuncts via the parquet
`RowFilter`; any rejected conjuncts via the post-scan filter (the
correctness fix from the previous commit).
- `pushdown_filters = false` → the whole predicate runs as a post-scan
filter on decoded batches (behaviorally identical to a `FilterExec`).
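
As a rough illustration of the routing above (not the actual opener
code), the decision can be sketched with stand-in types. `Conjunct`,
`Routed`, and `route` are hypothetical names invented for this example
and do not exist in DataFusion:

```rust
/// Hypothetical stand-in for one conjunct of the scan predicate.
#[derive(Debug, Clone, PartialEq)]
struct Conjunct {
    /// Display form of the expression, e.g. "a = 1".
    expr: &'static str,
    /// Whether this conjunct could be evaluated inside a parquet `RowFilter`.
    row_filterable: bool,
}

/// Hypothetical result of routing: where each conjunct is evaluated.
#[derive(Debug, Default, PartialEq)]
struct Routed {
    row_filter: Vec<Conjunct>,
    post_scan: Vec<Conjunct>,
}

/// Every conjunct is evaluated exactly once, in one of the two places.
/// With `pushdown_filters = false` nothing goes to the row filter, so the
/// whole predicate ends up in the post-scan filter.
fn route(conjuncts: Vec<Conjunct>, pushdown_filters: bool) -> Routed {
    let mut routed = Routed::default();
    for c in conjuncts {
        if pushdown_filters && c.row_filterable {
            routed.row_filter.push(c);
        } else {
            routed.post_scan.push(c);
        }
    }
    routed
}
```

The point of the sketch is that no conjunct is dropped in either mode,
which is why the parent `FilterExec` can be removed unconditionally.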
The `pushdown_filters` config keeps its meaning ("build a parquet
`RowFilter`"); its doc comments are updated to match.

Plan / test consequences (query results are unchanged; only plan shape
and metrics change):
- The `FilterExec` no longer appears above a `DataSourceExec` for
pushable parquet filters. The predicate appears as `predicate=…` on
the `DataSourceExec`. Parquet `.slt` files are regenerated to reflect
this (clickbench, push_down_filter_parquet, projection_pushdown,
parquet*, etc.). Spurious whitespace churn from `--complete` was
reverted.
- Opener / integration tests that asserted "row group not pruned ⇒ all
rows returned" (e.g. `a = 1` over `[1, 2, 3]` returning 3 rows) now
assert the matching-row count, since the scan applies the predicate
row by row via the post-scan filter.
- `FilterExec: id@0 = 1` assertions in DataFrame / view tests become
`predicate=id@0 = 1` on the `DataSourceExec`.
- Insta inline snapshots in `parquet.rs` and `explain_analyze.rs` are
re-accepted (`output_rows=8` → `output_rows=5` plus
`post_scan_rows_pruned=3`; multi-line plans collapse where the
`FilterExec`/`RepartitionExec` chain is gone).
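
The changed row counts in the opener tests and snapshots follow from the
post-scan filter dropping non-matching rows from each decoded batch. A
minimal sketch of that accounting, using a plain slice in place of an
Arrow batch; `post_scan_filter` is a hypothetical helper for this
example, not a function in the codebase:

```rust
/// Apply `predicate` to every value of a decoded "batch" and return the
/// surviving rows together with the number of rows that were dropped
/// (the analogue of the `post_scan_rows_pruned` metric).
fn post_scan_filter(batch: &[i64], predicate: impl Fn(i64) -> bool) -> (Vec<i64>, usize) {
    let kept: Vec<i64> = batch.iter().copied().filter(|v| predicate(*v)).collect();
    let pruned = batch.len() - kept.len();
    (kept, pruned)
}
```

For `a = 1` over `[1, 2, 3]`, this keeps one row and counts two as
pruned, where the old tests expected all three rows back from the scan.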
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>