physical_optimizer: preserve_file_partitions when num file groups < target_partitions
Previously, `datafusion.optimizer.preserve_file_partitions` did not preserve the file partitions when the number of file groups was less than `target_partitions`. This is unexpected behavior: a user who asks to preserve file partitions does so to avoid repartitioning.
This change fixes that by updating one line in the `enforce_distribution` optimizer rule. It also adds a regression test.
Before
```
ProjectionExec
AggregateExec: mode=FinalPartitioned, gby=[f_dkey, timestamp]
RepartitionExec: partitioning=Hash([f_dkey, timestamp], 4), input_partitions=3
AggregateExec: mode=Partial, gby=[f_dkey, timestamp]
DataSourceExec: file_groups=3, projection=[timestamp, value, f_dkey]
```
After
```
ProjectionExec
AggregateExec: mode=SinglePartitioned, gby=[f_dkey, timestamp]
DataSourceExec: file_groups=3, projection=[timestamp, value, f_dkey]
```
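The difference between the two plans above can be modelled with a small toy function. This is only a sketch of the behavior described here, not the code in `enforce_distribution`: the `Decision` enum, the `plan_aggregate_input` function, and the `fixed` flag are all hypothetical names, and the model assumes the file groups are already partitioned on the group-by keys.

```rust
/// Hypothetical, simplified outcome for the input of a grouped aggregate.
/// These names do not exist in DataFusion; they only illustrate the change.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    /// Read the file groups as-is (no RepartitionExec in the plan).
    KeepFilePartitions,
    /// Insert a hash repartition to `partitions` output partitions.
    HashRepartition { partitions: usize },
}

/// Toy model of the behavior before (`fixed = false`) and after
/// (`fixed = true`) this change. Assumes the file groups are already
/// partitioned on the group-by keys.
pub fn plan_aggregate_input(
    file_groups: usize,
    target_partitions: usize,
    preserve_file_partitions: bool,
    fixed: bool,
) -> Decision {
    if !preserve_file_partitions {
        // Without the option, the toy model always repartitions.
        return Decision::HashRepartition { partitions: target_partitions };
    }
    if !fixed && file_groups < target_partitions {
        // Old behavior: the option was ignored when there were fewer
        // file groups than target partitions.
        return Decision::HashRepartition { partitions: target_partitions };
    }
    // New behavior: the option is honored regardless of the group count.
    Decision::KeepFilePartitions
}
```

With 3 file groups and 4 target partitions, as in the plans above, the old path yields a hash repartition to 4 partitions and the fixed path keeps the 3 file partitions.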
01)ProjectionExec: expr=[f_dkey@0 as f_dkey, timestamp@1 as timestamp, count(Int64(1))@2 as count(*), avg(fact_table.value)@3 as avg(fact_table.value)]
02)--AggregateExec: mode=SinglePartitioned, gby=[f_dkey@2 as f_dkey, timestamp@0 as timestamp], aggr=[count(Int64(1)), avg(fact_table.value)]