Use Arrow's has_true() / has_false() (apache#21806)
## Which issue does this PR close?
Closes apache#21784
## Rationale for this change
Apache Arrow added `BooleanArray::has_true()` and `has_false()` so
callers can answer “any true/false?” without a full bit count. Unlike
patterns such as `true_count() == 0` or `true_count() > 0`, which must
popcount the entire bitmap, these checks can stop as soon as a match is found.
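To illustrate why an existence check can be cheaper than a count, here is a minimal sketch over a hypothetical `BoolBits` type (not arrow-rs's `BooleanArray`). It packs value and validity bits into `u64` words, loosely modeled on Arrow's bitmaps. It assumes nulls are ignored by `has_true` / `has_false`, just as they are excluded from `true_count`.

```rust
/// Hypothetical, simplified boolean column: value bits plus an optional
/// validity bitmap (bit set = non-null), packed 64 per `u64` word, LSB first.
/// Not the arrow-rs type; it only contrasts counting with existence checks.
pub struct BoolBits {
    values: Vec<u64>,
    validity: Option<Vec<u64>>,
    len: usize,
}

impl BoolBits {
    pub fn from_options(items: &[Option<bool>]) -> Self {
        let words = (items.len() + 63) / 64;
        let mut values = vec![0u64; words];
        let mut validity = vec![0u64; words];
        let mut any_null = false;
        for (i, item) in items.iter().enumerate() {
            match item {
                Some(v) => {
                    validity[i / 64] |= 1u64 << (i % 64);
                    if *v {
                        values[i / 64] |= 1u64 << (i % 64);
                    }
                }
                None => any_null = true,
            }
        }
        let validity = if any_null { Some(validity) } else { None };
        BoolBits { values, validity, len: items.len() }
    }

    /// Bits of word `w` that are in range and non-null.
    fn live_mask(&self, w: usize) -> u64 {
        let in_range = if w + 1 == self.values.len() && self.len % 64 != 0 {
            (1u64 << (self.len % 64)) - 1
        } else {
            u64::MAX
        };
        match &self.validity {
            Some(v) => v[w] & in_range,
            None => in_range,
        }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Full count: always popcounts every word.
    pub fn true_count(&self) -> usize {
        (0..self.values.len())
            .map(|w| (self.values[w] & self.live_mask(w)).count_ones() as usize)
            .sum()
    }

    /// Existence check: `any` returns at the first word holding a valid `true`.
    pub fn has_true(&self) -> bool {
        (0..self.values.len()).any(|w| (self.values[w] & self.live_mask(w)) != 0)
    }

    /// Existence check: returns at the first word holding a valid `false`.
    pub fn has_false(&self) -> bool {
        (0..self.values.len()).any(|w| (!self.values[w] & self.live_mask(w)) != 0)
    }

    pub fn null_count(&self) -> usize {
        match &self.validity {
            None => 0,
            Some(_) => {
                let valid: usize = (0..self.values.len())
                    .map(|w| self.live_mask(w).count_ones() as usize)
                    .sum();
                self.len - valid
            }
        }
    }
}
```

The early exit only pays off when a match exists near the front; when there is no `true` at all, `has_true` still scans every word, so the worst case matches `true_count`.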
This PR applies those APIs across DataFusion where the logic is purely
existential (or equivalent via null-safe “all true” / “no true” checks),
matching the audit suggested in the issue.
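The rewrites rely on a few equivalences between count-based and existence-based predicates. The sketch below checks them on plain `&[Option<bool>]` slices. The helpers are hypothetical stand-ins, not the arrow-rs API, and assume `None` is null and nulls are excluded from both counts. Note that “all true” needs `null_count()` as well, because a null slot is neither true nor false.

```rust
// Hypothetical slice-based stand-ins; `None` is a null slot.
fn true_count(a: &[Option<bool>]) -> usize {
    a.iter().filter(|v| **v == Some(true)).count()
}
fn false_count(a: &[Option<bool>]) -> usize {
    a.iter().filter(|v| **v == Some(false)).count()
}
fn null_count(a: &[Option<bool>]) -> usize {
    a.iter().filter(|v| v.is_none()).count()
}
// Short-circuiting existence checks: `any` stops at the first match.
fn has_true(a: &[Option<bool>]) -> bool {
    a.iter().any(|v| *v == Some(true))
}
fn has_false(a: &[Option<bool>]) -> bool {
    a.iter().any(|v| *v == Some(false))
}

/// Returns true if every count-based predicate agrees with its rewrite.
fn rewrites_agree(a: &[Option<bool>]) -> bool {
    let len = a.len();
    // "no true": `true_count() == 0` -> `!has_true()`
    let no_true = (true_count(a) == 0) == !has_true(a);
    // "any true": `true_count() > 0` -> `has_true()`
    let any_true = (true_count(a) > 0) == has_true(a);
    // "all true": `true_count() == len` -> `null_count() == 0 && !has_false()`
    let all_true = (true_count(a) == len) == (null_count(a) == 0 && !has_false(a));
    // "no false": `false_count() == 0` -> `!has_false()`
    let no_false = (false_count(a) == 0) == !has_false(a);
    no_true && any_true && all_true && no_false
}
```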
## What changes are included in this PR?
- Replace hot-path checks that only needed existence or emptiness with
  `has_true()` / `has_false()` (and `null_count()` where needed),
  including:
  - Nested/array helpers (`array_has`, list replace), Spark
    `array_contains` null-semantics fast path
  - Physical expressions: `evaluate_selection`, binary AND/OR
    short-circuit, CASE/IN list loops
  - `scatter` fast paths
  - Top-K filter handling, sort-merge join filter, nested-loop join bitmap
    checks
  - Parquet column stats (`metadata.rs`, `has_any_exact_match`)
- Keep `true_count()` / `false_count()` where an actual count is
  required (row counts, metrics, selectivity, `to_array(n)`, etc.)
- Import `arrow::array::Array` where `null_count()` is used on
  `BooleanArray` in trait-heavy paths
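As an illustration of the binary AND/OR short-circuit and of why some call sites keep a count, here is a hedged sketch. `ShortCircuit`, `Op`, `check_short_circuit`, and `selectivity` are hypothetical names, not DataFusion's actual code, and the slice helpers stand in for the Arrow methods under the assumption that nulls are ignored by the existence checks. Under SQL three-valued logic, a left side that is entirely non-null `false` makes `AND` all `false`, and an entirely non-null `true` left side makes `OR` all `true`. Any null on the left blocks the shortcut, because `NULL AND true` and `NULL OR false` are both null.

```rust
/// Hypothetical result of inspecting the left operand before evaluating the right.
#[derive(Debug, PartialEq)]
enum ShortCircuit {
    /// Every left row is a non-null `false`: `left AND right` is all `false`.
    AllFalse,
    /// Every left row is a non-null `true`: `left OR right` is all `true`.
    AllTrue,
    /// The right operand must be evaluated.
    EvaluateRight,
}

#[derive(Clone, Copy, Debug)]
enum Op {
    And,
    Or,
}

// Slice-based stand-ins for the Arrow methods; `None` is a null slot.
fn has_true(a: &[Option<bool>]) -> bool {
    a.iter().any(|v| *v == Some(true))
}
fn has_false(a: &[Option<bool>]) -> bool {
    a.iter().any(|v| *v == Some(false))
}
fn null_count(a: &[Option<bool>]) -> usize {
    a.iter().filter(|v| v.is_none()).count()
}

/// "All false" / "all true" need only existence checks plus the null count.
fn check_short_circuit(op: Op, left: &[Option<bool>]) -> ShortCircuit {
    let no_nulls = null_count(left) == 0;
    match op {
        Op::And if no_nulls && !has_true(left) => ShortCircuit::AllFalse,
        Op::Or if no_nulls && !has_false(left) => ShortCircuit::AllTrue,
        _ => ShortCircuit::EvaluateRight,
    }
}

/// A selectivity metric needs the actual number of selected rows,
/// so a call site like this keeps a full count.
fn selectivity(filter: &[Option<bool>]) -> f64 {
    if filter.is_empty() {
        return 0.0;
    }
    let selected = filter.iter().filter(|v| **v == Some(true)).count();
    selected as f64 / filter.len() as f64
}
```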
## Are these changes tested?
Existing tests cover this behavior; the edits are semantics-preserving
refactors (same conditions, cheaper primitives). No new tests were
added.
## Are there any user-facing changes?
No. Behavior should be unchanged; this is an internal
performance/clarity improvement.
---------
Co-authored-by: Raushan Prabhakar <ros@Raushans-MacBook-Air.local>
Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>