Commit e71bd56
fix: Improve consistency of per-column stats on FilterExec output (#22718)
## Which issue does this PR close?
- Closes #22716
## Rationale for this change
#21081 capped the NDV at the row count when computing statistics for
several operators. This PR extends that work and ensures that per-column
statistics for filter operators are consistent with the estimated output
row count. In particular:
* Null count is also capped at the row count
* Byte size is scaled down by the estimated selectivity
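
The two adjustments above can be sketched as follows. This is a simplified illustration, not the code from this PR: `ColumnStats`, `estimate_output_rows`, and `adjust_for_filter` are hypothetical names, the statistics are treated as plain integers rather than DataFusion's precision-tracking types, and rounding up with `ceil` is an assumption.

```rust
/// Hypothetical, simplified per-column statistics (not DataFusion's real types).
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    null_count: u64,
    distinct_count: u64,
    byte_size: u64,
}

/// Estimate the filter's output row count from the input row count and an
/// assumed selectivity, clamped to [0, 1]. Rounding up is an assumption.
fn estimate_output_rows(input_rows: u64, selectivity: f64) -> u64 {
    let s = selectivity.clamp(0.0, 1.0);
    (input_rows as f64 * s).ceil() as u64
}

/// Make per-column statistics consistent with the estimated output row count.
fn adjust_for_filter(input: &ColumnStats, input_rows: u64, selectivity: f64) -> ColumnStats {
    let s = selectivity.clamp(0.0, 1.0);
    let output_rows = estimate_output_rows(input_rows, s);
    ColumnStats {
        // A column cannot contain more nulls than there are rows.
        null_count: input.null_count.min(output_rows),
        // Same cap for the number of distinct values (NDV).
        distinct_count: input.distinct_count.min(output_rows),
        // Fewer surviving rows means proportionally fewer bytes.
        byte_size: (input.byte_size as f64 * s).ceil() as u64,
    }
}
```

For instance, with 1000 input rows and an assumed selectivity of 0.25, a column with 400 nulls, 900 distinct values, and 8000 bytes would be reported with 250 nulls, 250 distinct values, and 2000 bytes, so no per-column figure exceeds the 250 estimated output rows.
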
We also extend the analysis to consider null-rejecting predicates. For
example, the clause `a = 10` as a top-level conjunct can only evaluate to
true when `a` is non-null, so the null count of `a` among the surviving
rows is exactly 0.
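
A minimal sketch of how such a check might look is given below. It assumes a toy predicate type; `Pred`, `null_rejected_columns`, and `collect_rejected` are hypothetical names and do not correspond to DataFusion's expression API. The walk only descends through `AND`, because only top-level conjuncts must all be true for a row to survive, and it conservatively treats `OR` and `IS NULL` as rejecting nothing.

```rust
use std::collections::BTreeSet;

/// Hypothetical, minimal predicate tree (not DataFusion's expression types).
enum Pred {
    /// `column = literal`
    EqLit(String, i64),
    /// `column IS NULL`
    IsNull(String),
    And(Box<Pred>, Box<Pred>),
    Or(Box<Pred>, Box<Pred>),
}

/// Columns whose null count must be 0 in rows that pass the predicate.
fn null_rejected_columns(pred: &Pred) -> BTreeSet<String> {
    let mut out = BTreeSet::new();
    collect_rejected(pred, &mut out);
    out
}

fn collect_rejected(pred: &Pred, out: &mut BTreeSet<String>) {
    match pred {
        // Both sides of an AND are top-level conjuncts, so recurse into each.
        Pred::And(left, right) => {
            collect_rejected(left, out);
            collect_rejected(right, out);
        }
        // `col = literal` is NULL (not true) when `col` is NULL, so any
        // surviving row has a non-null `col`.
        Pred::EqLit(col, _) => {
            out.insert(col.clone());
        }
        // Conservative: a row can pass an OR through either branch, and
        // IS NULL accepts nulls, so neither rejects anything here.
        Pred::IsNull(_) | Pred::Or(_, _) => {}
    }
}
```

With this sketch, `a = 10 AND (b = 1 OR c IS NULL)` yields only `a`: a row with a null `b` can still pass through the `c IS NULL` branch, so `b` is not reported.
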
## What changes are included in this PR?
* Ensure per-column statistics (null count, byte size) are consistent
with filtered row count
* Detect null-rejecting predicates so that the affected columns get an
exact null count of 0
* Update SLT expected plans
* Add unit tests for new behavior
* Various refactoring and comment improvements
## Are these changes tested?
Yes; new tests added.
## Are there any user-facing changes?
No.

1 parent e2db766 commit e71bd56 (PR #22718)
2 files changed
Lines changed: 423 additions & 81 deletions