Commit 53f12f6

feat: Extract NDV (distinct_count) statistics from Parquet metadata
This change adds support for reading Number of Distinct Values (NDV) statistics from Parquet file metadata when available.

Previously, `distinct_count` in `ColumnStatistics` was always set to `Precision::Absent`. Now it is populated from Parquet row group column statistics when present:

- Single row group with NDV: `Precision::Exact(ndv)`
- Multiple row groups with NDV: `Precision::Inexact(max)` as a lower bound (we can't accurately merge NDV since duplicates may exist across row groups; max is more conservative than sum for join cardinality estimation)
- No NDV available: `Precision::Absent`

This provides a foundation for improved join cardinality estimation and other statistics-based optimizations.

Relates to #15265
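The merge rules above can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: the `Precision` enum is a local stand-in for DataFusion's type so the snippet compiles on its own, and `merge_ndv` is a hypothetical helper name. It also assumes that a row group without NDV is simply skipped and that `Exact` is only returned when the file has exactly one row group; the commit message does not say how partially missing NDV is handled.

```rust
/// Local stand-in for DataFusion's `Precision` type (redefined here so the
/// sketch has no external dependencies).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

/// Hypothetical helper: combine the per-row-group NDV values of one column.
/// Each slice entry is the NDV reported by one row group, or `None` if that
/// row group carries no NDV statistic.
pub fn merge_ndv(row_group_ndvs: &[Option<u64>]) -> Precision<u64> {
    // Largest NDV reported by any row group; `None` if no row group has one.
    let max_ndv = row_group_ndvs.iter().flatten().copied().max();

    match (row_group_ndvs.len(), max_ndv) {
        // No NDV anywhere (including a file with no row groups).
        (_, None) => Precision::Absent,
        // Exactly one row group: its NDV is the NDV of the whole file.
        (1, Some(ndv)) => Precision::Exact(ndv),
        // Several row groups: values may repeat across groups, so the max
        // is only a lower bound on the file-level NDV.
        (_, Some(ndv)) => Precision::Inexact(ndv),
    }
}
```

Under these assumptions, `merge_ndv(&[Some(10)])` yields `Precision::Exact(10)`, while `merge_ndv(&[Some(10), Some(25), Some(7)])` yields `Precision::Inexact(25)`. Taking the max rather than the sum keeps the result a valid lower bound: the true file-level NDV is at least the largest per-group value and at most the sum of all of them.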
1 parent 1f37a33 commit 53f12f6

2 files changed

Lines changed: 390 additions & 3 deletions
