refactor: extract sort pushdown logic from FileScanConfig into separate module #21457

Merged
zhuqi-lucas merged 4 commits into apache:main from zhuqi-lucas:feat/refactor-sort-pushdown-module
Apr 10, 2026

Conversation

@zhuqi-lucas
Contributor

Which issue does this PR close?

Closes #21433

Rationale for this change

As noted by @alamb in #21182 (comment), file_scan_config.rs has grown large after the sort pushdown optimization. This PR extracts the sort pushdown helpers into their own module to improve readability and maintainability.

What changes are included in this PR?

Move sort pushdown logic from file_scan_config.rs (3591 → 3066 lines) into a new sort_pushdown.rs module (576 lines):

  • rebuild_with_source, try_sort_file_groups_by_statistics
  • sort_files_within_groups_by_statistics, any_file_has_nulls_in_sort_columns
  • validate_orderings, is_ordering_valid_for_file_groups
  • get_projected_output_ordering, ordered_column_indices_from_projection
  • SortedFileGroups struct

try_pushdown_sort stays in the DataSource impl — it calls into the new module.
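
A minimal sketch of the extraction pattern described above, assuming heavily simplified stand-ins: `ScanConfig`, `FileStats`, and `sort_files_by_min` are hypothetical names for illustration and do not reflect the real DataFusion types or signatures. It only shows the shape of the refactor, where the entry point stays on the type and the helper lives in a separate module.

```rust
// Hypothetical, simplified stand-ins; not the real DataFusion API.
mod sort_pushdown {
    /// Stand-in for a file plus the minimum value of its sort column.
    #[derive(Debug, Clone, PartialEq)]
    pub struct FileStats {
        pub path: String,
        pub min: i64,
    }

    /// Helper that now lives in the extracted module:
    /// order files by the minimum value of the sort column.
    pub fn sort_files_by_min(files: &mut [FileStats]) {
        files.sort_by_key(|f| f.min);
    }
}

/// Stand-in for the config type that keeps the public entry point.
pub struct ScanConfig {
    pub files: Vec<sort_pushdown::FileStats>,
}

impl ScanConfig {
    /// The entry point stays on the type and delegates to the module.
    pub fn try_pushdown_sort(mut self) -> Self {
        sort_pushdown::sort_files_by_min(&mut self.files);
        self
    }
}
```

Callers keep using `ScanConfig::try_pushdown_sort` unchanged, which is why a refactor of this shape can be behavior-preserving.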

Are these changes tested?

Yes. This is a pure refactor, and all existing tests pass (120 passed).

Are there any user-facing changes?

No.

…te module

Move statistics-based file sorting, non-overlapping validation, and NULL
handling logic into `datasource/src/sort_pushdown.rs` to reduce the size
of `file_scan_config.rs` (3591 → 3066 lines).

Moved to sort_pushdown module:
- rebuild_with_source, try_sort_file_groups_by_statistics
- sort_files_within_groups_by_statistics, any_file_has_nulls_in_sort_columns
- validate_orderings, is_ordering_valid_for_file_groups
- get_projected_output_ordering, ordered_column_indices_from_projection

Pure refactor — no behavior changes.

Closes apache#21433
Copilot AI review requested due to automatic review settings April 8, 2026 08:52
@github-actions github-actions bot added the datasource (Changes to the datasource crate) label Apr 8, 2026
Contributor

Copilot AI left a comment

Pull request overview

Refactors DataFusion’s file-based sort pushdown implementation by extracting statistics-based file sorting and ordering validation helpers out of FileScanConfig into a dedicated sort_pushdown module, reducing file_scan_config.rs size and improving maintainability.

Changes:

  • Introduces datafusion/datasource/src/sort_pushdown.rs containing sort pushdown helpers (file-group sorting, ordering validation, NULL/statistics checks).
  • Wires the new module into the crate (mod.rs) and updates FileScanConfig to call into crate::sort_pushdown::*.
  • Removes the extracted helper implementations from file_scan_config.rs while keeping try_pushdown_sort in the DataSource impl.
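
For readers unfamiliar with the kind of logic being moved, here is a rough sketch of statistics-based file sorting with a non-overlap check and a NULL check. The actual helpers in `sort_pushdown.rs` are not shown in this conversation, so everything here is an assumption: `FileRange` and `sort_if_non_overlapping` are hypothetical names, the sort column is a single `i64`, and ranges that merely touch are treated as non-overlapping.

```rust
/// Hypothetical per-file statistics for one sort column (not DataFusion's types).
#[derive(Debug, Clone, PartialEq)]
pub struct FileRange {
    pub name: &'static str,
    pub min: i64,
    pub max: i64,
    pub null_count: usize,
}

/// Returns the files ordered by `min` when the statistics prove a global
/// ascending order, and `None` when they cannot prove it.
pub fn sort_if_non_overlapping(mut files: Vec<FileRange>) -> Option<Vec<FileRange>> {
    // NULLs are not covered by the min/max range, so stay conservative and give up.
    if files.iter().any(|f| f.null_count > 0) {
        return None;
    }
    files.sort_by_key(|f| f.min);
    // After sorting by min, each file must end no later than the next one starts.
    let non_overlapping = files.windows(2).all(|w| w[0].max <= w[1].min);
    non_overlapping.then_some(files)
}
```

Returning `None` rather than a best-effort order keeps the sketch conservative: a caller would fall back to an explicit sort whenever the statistics are inconclusive.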

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| datafusion/datasource/src/sort_pushdown.rs | New module containing extracted sort pushdown helper logic and documentation. |
| datafusion/datasource/src/mod.rs | Registers the new internal sort_pushdown module. |
| datafusion/datasource/src/file_scan_config.rs | Updates call sites to use crate::sort_pushdown and removes inlined helper code. |

@zhuqi-lucas zhuqi-lucas requested review from adriangb and alamb April 8, 2026 09:20
Contributor

@adriangb adriangb left a comment

Thanks!

Contributor

@alamb alamb left a comment

Thanks @zhuqi-lucas

@zhuqi-lucas
Contributor Author

Thanks @adriangb @alamb for review!

@zhuqi-lucas zhuqi-lucas added this pull request to the merge queue Apr 10, 2026
Merged via the queue into apache:main with commit 4389f14 Apr 10, 2026
31 checks passed
@zhuqi-lucas zhuqi-lucas deleted the feat/refactor-sort-pushdown-module branch April 10, 2026 06:58

Labels

datasource Changes to the datasource crate

Development

Successfully merging this pull request may close these issues.

Refactor: extract sort pushdown logic from FileScanConfig into separate module

4 participants