
Commit e71bd56

fix: Improve consistency of per-column stats on FilterExec output (#22718)
## Which issue does this PR close?

- Closes #22716

## Rationale for this change

#21081 capped the NDV (number of distinct values) at the row count when computing statistics for several operators. This PR extends that work so that the per-column statistics reported for filter operators are consistent with the estimated output row count. In particular:

* Null count is also capped at the row count
* Byte size is scaled down by the estimated selectivity

We also extend the analysis to consider null-rejecting predicates. For example, the clause `a = 10` as a top-level conjunct implies that the null count of column `a` in the surviving rows is exactly 0.

## What changes are included in this PR?

* Ensure per-column statistics (null count, byte size) are consistent with the filtered row count
* Check for null-rejecting predicates to estimate a more accurate null count of 0
* Update SLT expected plans
* Add unit tests for the new behavior
* Various refactoring and comment improvements

## Are these changes tested?

Yes; new tests added.

## Are there any user-facing changes?

No.
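The adjustments described above can be illustrated with a minimal sketch. It is not the actual `FilterExec` code: `ColumnStats`, `Predicate`, `rejects_nulls`, and `filtered_column_stats` are hypothetical, simplified stand-ins, and selectivity is assumed to be `output_rows / input_rows`. It shows the three rules: cap NDV and null count at the output row count, scale byte size by selectivity, and report a null count of 0 when a top-level conjunct rejects NULLs in the column.

```rust
/// Hypothetical, simplified per-column statistics (`None` = unknown).
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    null_count: Option<usize>,
    distinct_count: Option<usize>,
    byte_size: Option<usize>,
}

/// Hypothetical predicate tree, just rich enough to reason about NULLs.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Predicate {
    /// `col = literal`: evaluates to NULL when `col` is NULL.
    Eq(String, i64),
    /// `col IS NULL`: evaluates to TRUE when `col` is NULL.
    IsNull(String),
    And(Box<Predicate>, Box<Predicate>),
    Or(Box<Predicate>, Box<Predicate>),
}

/// Returns true if every row where `col` is NULL is guaranteed to be
/// filtered out (the predicate evaluates to NULL or FALSE for it).
fn rejects_nulls(p: &Predicate, col: &str) -> bool {
    match p {
        Predicate::Eq(c, _) => c == col,
        Predicate::IsNull(_) => false,
        // A conjunction rejects NULLs if any conjunct does.
        Predicate::And(l, r) => rejects_nulls(l, col) || rejects_nulls(r, col),
        // A disjunction rejects NULLs only if both sides do.
        Predicate::Or(l, r) => rejects_nulls(l, col) && rejects_nulls(r, col),
    }
}

/// Derives output statistics for column `col` after applying `predicate`.
fn filtered_column_stats(
    input: &ColumnStats,
    input_rows: usize,
    output_rows: usize,
    predicate: &Predicate,
    col: &str,
) -> ColumnStats {
    // Estimated fraction of rows that survive, clamped to [0, 1].
    let selectivity = if input_rows == 0 {
        0.0
    } else {
        (output_rows as f64 / input_rows as f64).min(1.0)
    };
    let null_count = if rejects_nulls(predicate, col) {
        Some(0) // no NULLs in `col` can survive the filter
    } else {
        input.null_count.map(|n| n.min(output_rows))
    };
    ColumnStats {
        null_count,
        distinct_count: input.distinct_count.map(|d| d.min(output_rows)),
        byte_size: input
            .byte_size
            .map(|b| (b as f64 * selectivity).round() as usize),
    }
}
```

For a 1000-row input filtered down to an estimated 100 rows by `b = 1`, column `a` with 500 nulls, 800 distinct values, and 8000 bytes would be reported as 100 nulls, 100 distinct values, and 800 bytes. With `a = 10` instead, the null count for `a` becomes 0. Treating `OR` as null-rejecting only when both branches are keeps the estimate conservative.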
1 parent e2db766 commit e71bd56

2 files changed

Lines changed: 423 additions & 81 deletions
