Support row group pruning for struct field predicates

Related to the work on struct array handling:
- #20854 
- #20822 
- #20829 

When filtering on struct fields (e.g. `WHERE s['value'] > 5`), DataFusion currently cannot prune row groups using Parquet column statistics, even though the underlying leaf columns have valid min/max statistics stored in the Parquet metadata.

The issue is in the pruning predicate system. When it encounters a `GetField` expression such as `GetField(Column("s"), "value")`, the column extraction logic only sees the parent struct column `Column("s")` and does not resolve through to the nested field.

Fixing this would mean teaching the pruning system to resolve `GetField` expressions down to their leaf columns and then look up the corresponding Parquet column statistics. Note that the statistics themselves are already present in the Parquet metadata; they are just never consulted for nested field access.
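
To make the idea concrete, here is a minimal, standalone sketch of the two steps: resolving a `GetField` chain to a leaf column path, then using that path to look up min/max statistics. It does not use DataFusion's real types. `Expr`, `MinMax`, `RowGroupStats`, `leaf_path`, and `row_groups_to_read` are all hypothetical names for illustration. It also assumes leaf statistics are keyed by a dotted path such as `s.value`, and it handles only a `>` comparison against an integer literal.

```rust
use std::collections::HashMap;

// Hypothetical, simplified expression tree; NOT DataFusion's actual `Expr`.
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    GetField(Box<Expr>, String),
}

/// Resolve a chain of `GetField` accesses to a dotted leaf path,
/// e.g. GetField(Column("s"), "value") -> "s.value".
fn leaf_path(expr: &Expr) -> String {
    match expr {
        Expr::Column(name) => name.clone(),
        Expr::GetField(parent, field) => format!("{}.{}", leaf_path(parent), field),
    }
}

/// Min/max statistics for one leaf column within one row group.
#[allow(dead_code)] // `min` is unused for a `>` predicate but kept for completeness
#[derive(Debug, Clone, Copy)]
struct MinMax {
    min: i64,
    max: i64,
}

/// Hypothetical per-row-group statistics, keyed by leaf column path.
type RowGroupStats = HashMap<String, MinMax>;

/// Return the indices of row groups that may contain rows where `expr > literal`.
/// A row group is skipped only when its statistics prove no row can match.
fn row_groups_to_read(expr: &Expr, literal: i64, groups: &[RowGroupStats]) -> Vec<usize> {
    let path = leaf_path(expr);
    groups
        .iter()
        .enumerate()
        .filter(|(_, stats)| match stats.get(&path) {
            // If even the maximum is not above the literal, nothing matches.
            Some(mm) => mm.max > literal,
            // No statistics for this leaf: stay conservative and read it.
            None => true,
        })
        .map(|(i, _)| i)
        .collect()
}
```

The important property is that missing statistics never cause pruning. A row group is dropped only when its leaf-level maximum rules out every row, which keeps the optimization safe even when some files lack statistics for the nested column.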

On tables with many row groups, this could significantly reduce the amount of data read for queries with struct field predicates.

Support row group pruning for struct field predicates #20871

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Support row group pruning for struct field predicates #20871

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions