Commit 53f12f6
feat: Extract NDV (distinct_count) statistics from Parquet metadata
This change adds support for reading Number of Distinct Values (NDV)
statistics from Parquet file metadata when available.
Previously, `distinct_count` in `ColumnStatistics` was always set to
`Precision::Absent`. Now it is populated from Parquet row-group
column statistics when present:
- Single row group with NDV: `Precision::Exact(ndv)`
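- Multiple row groups with NDV: `Precision::Inexact(max)`, where `max` is
  the largest per-row-group NDV and acts as a lower bound (NDV cannot be
  merged exactly because the same value may appear in several row groups;
  the true count lies between the max and the sum, and max is the more
  conservative choice for join cardinality estimation)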
- No NDV available: `Precision::Absent`
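
The three cases above can be sketched as a small merge function. This is a minimal illustration, not the code in this commit: the `Precision` enum here is a reduced stand-in for DataFusion's type, `merge_row_group_ndv` is a hypothetical name, and the handling of files where only some row groups carry NDV (treated as `Inexact`) is an assumption the commit message does not spell out.

```rust
/// Reduced stand-in for DataFusion's `Precision` type (assumption: only the
/// three variants named in the commit message are modelled here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Hypothetical helper: combine the distinct counts of one column across
/// row groups. Each slice element is the NDV recorded for one row group,
/// or `None` when that row group has no NDV statistic.
pub fn merge_row_group_ndv(ndvs: &[Option<u64>]) -> Precision {
    // Largest NDV among the row groups that report one, if any.
    let max_ndv = ndvs.iter().flatten().copied().max();

    match (max_ndv, ndvs.len()) {
        // No row group reports an NDV (or there are no row groups).
        (None, _) => Precision::Absent,
        // Exactly one row group, and it reports an NDV: the value is exact.
        (Some(ndv), 1) => Precision::Exact(ndv as usize),
        // Several row groups: values may repeat across groups, so the
        // maximum is only a lower bound on the file-level distinct count.
        (Some(ndv), _) => Precision::Inexact(ndv as usize),
    }
}
```

For example, per-row-group NDVs of 3, 9 and 5 yield `Precision::Inexact(9)`: the file holds at least 9 and at most 17 distinct values, and the sketch reports the lower end.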
This provides a foundation for improved join cardinality estimation
and other statistics-based optimizations.
Relates to #15265
1 parent 1f37a33, commit 53f12f6
2 files changed
Lines changed: 390 additions & 3 deletions