Commit cac01cd
fix(datafusion_iceberg): route full arrow_schema to PruneDataFiles
The second-stage data-file pruner (PruneDataFiles) was constructed with
`partition_schema` — a subset schema holding only the Hive-style partition
columns. Its `min_values`/`max_values` implementation looks up each column
referenced by the pruning predicate via `arrow_schema.field_with_name(..)`
to fetch the datatype, so any filter on a column absent from
`partition_schema` silently returned `None` and pruned nothing.
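The failure mode above can be sketched with a toy model. This is not the datafusion_iceberg code: `ToySchema`, `FileStats`, `bounds`, and `can_skip` are hypothetical names, and strings stand in for Arrow data types. The sketch only shows why a schema lookup that comes before the statistics lookup turns a missing field into "nothing pruned".

```rust
use std::collections::HashMap;

/// Toy stand-in for an Arrow schema: column name -> data type name.
/// (Hypothetical; the real code consults an Arrow schema.)
struct ToySchema {
    fields: HashMap<String, String>,
}

impl ToySchema {
    fn new(cols: &[(&str, &str)]) -> Self {
        let fields = cols
            .iter()
            .map(|&(name, ty)| (name.to_string(), ty.to_string()))
            .collect();
        ToySchema { fields }
    }

    /// Plays the role of `field_with_name`: None if the column is unknown.
    fn data_type(&self, name: &str) -> Option<&str> {
        self.fields.get(name).map(|s| s.as_str())
    }
}

/// Per-file (min, max) statistics keyed by column name.
type FileStats = HashMap<String, (String, String)>;

/// Returns the bounds for `column`, or None if the schema has no such field.
/// The schema check runs first, so a subset schema hides statistics that
/// are actually present in `stats`.
fn bounds<'a>(
    schema: &ToySchema,
    stats: &'a FileStats,
    column: &str,
) -> Option<&'a (String, String)> {
    schema.data_type(column)?;
    stats.get(column)
}

/// Conservative decision for `column = value`: skip the file only when the
/// bounds are known and exclude `value`. Unknown bounds keep the file.
fn can_skip(schema: &ToySchema, stats: &FileStats, column: &str, value: &str) -> bool {
    match bounds(schema, stats, column) {
        Some((min, max)) => value < min.as_str() || value > max.as_str(),
        None => false,
    }
}
```

With an empty subset schema, `can_skip` returns `false` for every file, even one whose statistics rule the value out. With a schema that contains the column, the same file is skipped.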
Identity-self-named partition columns (where `pf.name() == pf.source_name()`)
are intentionally dropped from `file_partition_fields` so the parquet
reader doesn't duplicate them between the path encoding and the file body,
which also drops them from `table_partition_cols` and therefore from
`partition_schema`. The result: a filter like `event_name = 'ad_start'`
against a table partitioned by `identity(event_name)` reached the second-
stage pruner but found no schema hit, so every partition file of the
target was scanned in full (`files_ranges_pruned_statistics=0`). This
only surfaced now because Embucket/embucket#126 unblocked the filter
reaching TableScan in the first place.
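A minimal sketch of how the identity-self-named columns fall out of `partition_schema`, under stated assumptions: `PartitionField` here is a hypothetical two-field struct rather than the real Iceberg type, `partition_schema_columns` is an invented helper, and the source column names `collector_tstamp` and `id` are guesses made only for illustration.

```rust
/// Toy model of a partition field (hypothetical struct, not the real type).
#[derive(Debug, Clone, PartialEq)]
struct PartitionField {
    name: String,        // partition column name, e.g. "collector_tstamp_day"
    source_name: String, // source column the partition value derives from
}

impl PartitionField {
    fn new(name: &str, source_name: &str) -> Self {
        PartitionField {
            name: name.to_string(),
            source_name: source_name.to_string(),
        }
    }

    fn name(&self) -> &str {
        &self.name
    }

    fn source_name(&self) -> &str {
        &self.source_name
    }
}

/// Keeps only fields whose partition name differs from the source name.
/// Identity-self-named fields are dropped so the reader does not see the
/// same column both in the path encoding and in the file body.
fn file_partition_fields(spec: &[PartitionField]) -> Vec<PartitionField> {
    spec.iter()
        .filter(|pf| pf.name() != pf.source_name())
        .cloned()
        .collect()
}

/// Column names that end up in the subset schema (invented helper).
fn partition_schema_columns(spec: &[PartitionField]) -> Vec<String> {
    file_partition_fields(spec)
        .into_iter()
        .map(|pf| pf.name)
        .collect()
}
```

For a spec with `event_name`, `collector_tstamp_day`, and `id_bucket`, only the two transformed columns survive, so a predicate on `event_name` has no field to resolve against in the subset schema.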
Fix: pass the full `arrow_schema` to `PruneDataFiles::new`. It has every
column the predicate might reference — identity-self-named partition
columns, non-partition columns with per-file statistics, etc. Correctness
is preserved because the first-stage `PruneManifests` path still prunes
transformed partition columns (`collector_tstamp_day`, `id_bucket`, ...)
via manifest-list partition bounds, and synthetic partition-transform
columns simply return `None` from `PruneDataFiles` (no per-file stats
exist for them), which is the same behavior they had before.
Adds a regression test: `test_identity_self_named_partition_filter_prunes_files`
creates an `identity(kind)`-partitioned table, inserts one row per
partition value to materialize 3 distinct parquet files, then scans with
`kind = 'a'` and asserts the resulting plan lists exactly 1 parquet file
instead of 3.
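The shape of that regression check can be illustrated without DataFusion. The sketch below is a simplified model, not the actual test: `DataFile`, `one_file_per_kind`, and `files_after_pruning` are hypothetical, and the `kind=<value>/data.parquet` path layout is made up. It assumes that under identity partitioning each file holds a single `kind` value, so its min and max coincide.

```rust
/// Toy data file: a path plus the (min, max) of the `kind` column.
struct DataFile {
    path: String,
    kind_min: String,
    kind_max: String,
}

/// Builds one file per partition value. Every row in a file shares the
/// same `kind`, so min == max. The path layout is illustrative only.
fn one_file_per_kind(kinds: &[&str]) -> Vec<DataFile> {
    kinds
        .iter()
        .map(|&k| DataFile {
            path: format!("kind={}/data.parquet", k),
            kind_min: k.to_string(),
            kind_max: k.to_string(),
        })
        .collect()
}

/// Returns the paths of files whose bounds may contain `value`,
/// i.e. the files that survive pruning for `kind = value`.
fn files_after_pruning<'a>(files: &'a [DataFile], value: &str) -> Vec<&'a str> {
    files
        .iter()
        .filter(|f| f.kind_min.as_str() <= value && value <= f.kind_max.as_str())
        .map(|f| f.path.as_str())
        .collect()
}
```

Calling `files_after_pruning(&one_file_per_kind(&["a", "b", "c"]), "a")` leaves a single path, which mirrors the "1 file instead of 3" assertion described above.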
Refs: Embucket/embucket#127
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 8d242d5 commit cac01cd
1 file changed
Lines changed: 122 additions & 2 deletions
Diff hunks (line contents were not preserved in this capture):
- Original lines 597-606 become lines 597-615: 2 deletions (original lines 600 and 603), 11 additions (new lines 600-609 and 612).
- Original lines 3003-3008 become lines 3012-3128: 111 additions (new lines 3015-3125).