
physical_optimizer: preserve_file_partitions when num file groups < target_partitions #117

Merged
jayshrivastava merged 1 commit into DataDog:branch-53 from jayshrivastava:js/preserve-file-partitions
Apr 30, 2026


@jayshrivastava

Cherry-pick of apache#21533

…arget_partitions (apache#21533)

## Rationale for this change

`datafusion.optimizer.preserve_file_partitions` did not actually
preserve the file partitions when the number of file groups was less than
`target_partitions`. This is unexpected behavior: a user who asks to
preserve file partitions does so to avoid repartitioning.

Before
```
  ProjectionExec
    AggregateExec: mode=FinalPartitioned, gby=[f_dkey, timestamp]
      RepartitionExec: partitioning=Hash([f_dkey, timestamp], 4), input_partitions=3
        AggregateExec: mode=Partial, gby=[f_dkey, timestamp]
          DataSourceExec: file_groups=3, projection=[timestamp, value, f_dkey]
```
After
```
  ProjectionExec
    AggregateExec: mode=SinglePartitioned, gby=[f_dkey, timestamp]
      DataSourceExec: file_groups=3, projection=[timestamp, value, f_dkey]
```

## What changes are included in this PR?

This PR fixes that by changing one line in the `enforce_distribution`
optimizer rule.
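
As a rough illustration of the behavior change, here is a minimal toy model in plain Rust. It is not the actual `enforce_distribution` code and does not reproduce the changed line; `Plan`, `plan_aggregate`, and the `fixed` flag are hypothetical names introduced only for this sketch. It assumes the old rule repartitioned whenever the file group count was below `target_partitions`, regardless of the preserve setting.

```rust
/// Toy model of the plan shapes shown above; not DataFusion's real types.
#[derive(Debug, PartialEq)]
enum Plan {
    /// Partial aggregate -> hash RepartitionExec -> final aggregate.
    Repartitioned { output_partitions: usize },
    /// One aggregate running directly on the file groups.
    PreservedFilePartitions { output_partitions: usize },
}

/// Hypothetical helper: picks a plan shape for a group-by whose keys
/// include the file partition column.
///
/// `fixed = false` models the assumed old behavior, `fixed = true` the new one.
fn plan_aggregate(
    file_groups: usize,
    target_partitions: usize,
    preserve: bool,
    fixed: bool,
) -> Plan {
    // Assumption: the old rule saw "fewer groups than target_partitions"
    // as a reason to repartition, even when `preserve` was requested.
    let wants_more_parallelism = file_groups < target_partitions;

    if preserve && (fixed || !wants_more_parallelism) {
        Plan::PreservedFilePartitions { output_partitions: file_groups }
    } else {
        Plan::Repartitioned { output_partitions: target_partitions }
    }
}
```

With 3 file groups and `target_partitions = 4`, the model returns `Repartitioned` for `fixed = false` and `PreservedFilePartitions` for `fixed = true`, mirroring the Before and After plans.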

## Are these changes tested?

Yes. In the `preserve_file_partitions` SLT.

## Are there any user-facing changes?

No.
@jayshrivastava jayshrivastava marked this pull request as ready for review April 30, 2026 16:37
@jayshrivastava
Author

TYFR!

@jayshrivastava jayshrivastava merged commit a35e59e into DataDog:branch-53 Apr 30, 2026
33 checks passed

2 participants