Adds a runtime filter-placement layer on top of the row-group-morsel
split introduced by the parent commit. Each Parquet predicate is
assigned a `FilterId` and flows through a state machine
(`SelectivityTracker`) that moves it between three placements:
- `RowFilter` — evaluated inside the arrow-rs decoder as an
`ArrowPredicate`, enabling late-materialization savings when the
filter columns are a small fraction of the projection.
- `PostScan` — evaluated against the decoded wide batch before the
projector strips it; used when late materialization has little to
save or when the filter is CPU-heavy.
- `Dropped` — optional filters (hash-join dynamic filters wrapped in
`OptionalFilterPhysicalExpr`) are skipped mid-stream when the
confidence-interval upper bound on their bytes-saved-per-second falls
below a minimum.
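
The drop rule for optional filters can be sketched as below. This is a minimal illustration, not the code in this commit: `Placement`, `FilterState`, and `maybe_drop` are hypothetical names, and the real `SelectivityTracker` may structure its state differently.

```rust
/// Hypothetical stand-in for the three placements listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Placement {
    RowFilter,
    PostScan,
    Dropped,
}

/// Hypothetical per-filter state; field names are illustrative only.
struct FilterState {
    placement: Placement,
    /// True for filters that can be skipped without changing query results
    /// (for example, a dynamic filter wrapped as optional).
    optional: bool,
}

/// Move an optional filter to `Dropped` when the upper end of the confidence
/// interval on its bytes-saved-per-second is below the configured minimum.
/// Mandatory filters always keep their current placement.
fn maybe_drop(
    state: &mut FilterState,
    upper_bound_bytes_per_sec: f64,
    min_bytes_per_sec: f64,
) -> Placement {
    if state.optional && upper_bound_bytes_per_sec < min_bytes_per_sec {
        state.placement = Placement::Dropped;
    }
    state.placement
}
```

Comparing the upper bound rather than the mean makes the rule conservative: a filter is dropped only when even an optimistic estimate of its benefit is too low.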
Initial placement uses a cheap byte-ratio heuristic
(`filter_compressed_bytes / projection_compressed_bytes`); subsequent
placements refine using Welford online stats reported from both the
row-filter path (`DatafusionArrowPredicate::evaluate`) and the
post-scan path (`apply_post_scan_filters_with_stats`). Placement is
re-evaluated per morsel, so stats from the prior morsel's scan feed
into the next morsel's decision.
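
For readers unfamiliar with Welford's method, the sketch below shows the textbook online mean/variance update and a one-sided upper bound of the form `mean + z * stderr`. The `Welford` struct and its method names are hypothetical, and treating each morsel's observation as one sample is an assumption made for illustration; the tracker in this commit may weight or bucket samples differently.

```rust
/// Textbook Welford accumulator for a running mean and variance.
/// Hypothetical type; not the struct used by the tracker.
#[derive(Debug, Default, Clone, Copy)]
struct Welford {
    n: u64,
    mean: f64,
    /// Sum of squared deviations from the running mean.
    m2: f64,
}

impl Welford {
    /// Fold one observation (e.g. bytes saved per second for one morsel).
    fn update(&mut self, x: f64) {
        self.n += 1;
        let delta = x - self.mean;
        self.mean += delta / self.n as f64;
        // Uses the updated mean, which keeps the recurrence numerically stable.
        self.m2 += delta * (x - self.mean);
    }

    /// Unbiased sample variance; undefined with fewer than two samples.
    fn sample_variance(&self) -> Option<f64> {
        if self.n < 2 {
            None
        } else {
            Some(self.m2 / (self.n - 1) as f64)
        }
    }

    /// One-sided upper confidence bound on the mean: mean + z * stderr.
    fn upper_bound(&self, z: f64) -> Option<f64> {
        let var = self.sample_variance()?;
        Some(self.mean + z * (var / self.n as f64).sqrt())
    }
}
```

Because the accumulator is three numbers and each update is O(1), it is cheap to report from both the row-filter and post-scan paths on every batch.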
Config knobs on `TableParquetOptions.execution.parquet`:
- `filter_pushdown_min_bytes_per_sec` (default 100 MB/s)
- `filter_collecting_byte_ratio_threshold` (default 0.20)
- `filter_confidence_z` (default 2.0 ≈ 97.7% one-sided CI)
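
As a rough illustration of how the byte-ratio threshold could drive the initial placement, consider the sketch below. `AdaptiveFilterOptions`, `InitialPlacement`, and `initial_placement` are hypothetical names; only the three option names and their defaults come from this commit. Interpreting 100 MB/s as 100,000,000 bytes per second, using `<=` for the comparison, and falling back to post-scan on an empty projection are all assumptions.

```rust
/// Hypothetical grouping of the three knobs listed above.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AdaptiveFilterOptions {
    filter_pushdown_min_bytes_per_sec: f64,
    filter_collecting_byte_ratio_threshold: f64,
    filter_confidence_z: f64,
}

impl Default for AdaptiveFilterOptions {
    fn default() -> Self {
        Self {
            // Assumption: "100 MB/s" taken as decimal megabytes.
            filter_pushdown_min_bytes_per_sec: 100.0 * 1_000_000.0,
            filter_collecting_byte_ratio_threshold: 0.20,
            filter_confidence_z: 2.0,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InitialPlacement {
    RowFilter,
    PostScan,
}

/// Choose a starting placement from compressed column sizes alone.
/// A small ratio means the filter columns are cheap to decode relative to
/// the projection, so late materialization has more to save.
fn initial_placement(
    filter_compressed_bytes: u64,
    projection_compressed_bytes: u64,
    opts: &AdaptiveFilterOptions,
) -> InitialPlacement {
    if projection_compressed_bytes == 0 {
        // Assumption: nothing to save on an empty projection.
        return InitialPlacement::PostScan;
    }
    let ratio = filter_compressed_bytes as f64 / projection_compressed_bytes as f64;
    if ratio <= opts.filter_collecting_byte_ratio_threshold {
        InitialPlacement::RowFilter
    } else {
        InitialPlacement::PostScan
    }
}
```

With the defaults, a filter reading 10 bytes against a 100-byte projection starts as a row filter, while one reading 50 bytes starts post-scan.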
The `reorder_filters` option is removed; the adaptive tracker
subsumes its role.
Notable trade-offs documented in PR discussion:
- The adaptive layer adds ~10% aggregate ClickBench overhead vs the
pure morsel-split base (PR #10). Most of it lives in
`ParquetLazyMorsel::build_stream_now` under parallel load;
single-threaded runs show no regression. A candidate fix is to split
adaptive state out of `LazyMorselShared` so non-adaptive queries get
the same `Arc` allocation shape as PR #10.
- The `OptionalFilterPhysicalExpr` wrapper changes plan display
output (`DynamicFilter [...]` → `Optional(DynamicFilter [...])`);
several sqllogictest expected outputs and snapshot tests were
updated accordingly.
- A selectivity-tracker microbench was added under
`benches/selectivity_tracker.rs` so future iterations on the
tracker can be measured independently of full ClickBench.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>