Skip to content

feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys#21362

Open
xiedeyantu wants to merge 9 commits intoapache:mainfrom
xiedeyantu:sortkey
Open

feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys#21362
xiedeyantu wants to merge 9 commits intoapache:mainfrom
xiedeyantu:sortkey

Conversation

@xiedeyantu
Copy link
Copy Markdown
Member

Which issue does this PR close?

Rationale for this change

This PR adds functional-dependency-based simplification for ORDER BY clauses. When an earlier sort key already functionally determines a later key, the later key is redundant and can be removed without changing query semantics. This reduces unnecessary sorting work and avoids carrying extra sort keys through planning and execution.

What changes are included in this PR?

This PR extends the existing functional dependency utilities with a helper for pruning redundant sort keys, and wires that helper into eliminate_duplicated_expr so Sort nodes can be simplified during optimization. It also adds regression coverage for both the positive case, where a trailing sort key is removed, and the negative case, where sort order prevents pruning.

Are these changes tested?

Yes. I added unit tests covering:

  • removal of a functionally redundant trailing ORDER BY key
  • preservation of ordering when the dependent column appears before its determinant

I also ran cargo test -p datafusion-optimizer eliminate_duplicated_expr -- --nocapture successfully, and cargo fmt --all passes.

Are there any user-facing changes?

Yes, but only in query planning behavior. Some queries with redundant ORDER BY keys may produce simpler plans and run more efficiently. There are no public API changes.

@github-actions github-actions bot added optimizer Optimizer rules common Related to common crate labels Apr 4, 2026
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Apr 4, 2026
Copy link
Copy Markdown
Contributor

@neilconway neilconway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall! A few minor suggestions.

@xiedeyantu
Copy link
Copy Markdown
Member Author

Looks good overall! A few minor suggestions.

@neilconway Thank you for your comments and suggestions; I think the scenario you mentioned—changing order by deptno, total_sal, abs(deptno) to order by deptno, abs(deptno)—is an excellent point. We should definitely support this; our previous restrictions were indeed a bit too strict. We could even go a step further in the future by leveraging injective functions for additional optimizations (though it might be better to implement this in a separate PR). Regarding the rest of your comments, I have attempted to address each one by submitting a corresponding commit. Could you please take another look and review them? Thank you once again for your thorough reviews of every single one of my PRs!

Copy link
Copy Markdown
Contributor

@neilconway neilconway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating on this!

Can you "resolve" comment threads for review comments you believe have been addressed, please?

pub fn get_required_sort_exprs_indices(
schema: &DFSchema,
sort_expr_names: &[String],
) -> Option<Vec<usize>> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we don't need to return Option anymore.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Transformed::no
};

assert!(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that panicking if this fails is worth it ... what about debug_assert! instead?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I change here to internal_err, since order by clause can not be empty. Perhaps internal_err is a better choice.

let mut required_sort_expr_indices = Vec::new();

for (sort_expr_idx, sort_expr_name) in sort_expr_names.iter().enumerate() {
let Some(field_idx) = field_names
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a quick comment to justify/explain this?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have Added some comments.

Comment on lines +614 to +620
let removable = dependencies.deps.iter().any(|dependency| {
dependency.target_indices.contains(&field_idx)
&& dependency
.source_indices
.iter()
.all(|source_idx| known_field_indices.contains(source_idx))
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a comment explaining this would be useful as well.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done!

@xiedeyantu
Copy link
Copy Markdown
Member Author

Thanks for iterating on this!

Can you "resolve" comment threads for review comments you believe have been addressed, please?

@neilconway My previous understanding was that the issue would only be marked as "resolved" after I had fixed it and the reviewer had confirmed that everything was in order; that is why I didn't click the button. Going forward, I will follow your suggestion. Thank you!

@neilconway
Copy link
Copy Markdown
Contributor

My previous understanding was that the issue would only be marked as "resolved" after I had fixed it and the reviewer had confirmed that everything was in order; that is why I didn't click the button.

I think it's simpler if the PR submitter just proactively "resolves" comments they believe have been resolved; if the reviewer disagrees, they can always reopen the comment thread.

Copy link
Copy Markdown
Contributor

@neilconway neilconway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Nice work.

@alamb PR looks reasonable to me.

@xiedeyantu
Copy link
Copy Markdown
Member Author

My previous understanding was that the issue would only be marked as "resolved" after I had fixed it and the reviewer had confirmed that everything was in order; that is why I didn't click the button.

I think it's simpler if the PR submitter just proactively "resolves" comments they believe have been resolved; if the reviewer disagrees, they can always reopen the comment thread.

I completely agree; this will make it much easier for the reviewer to conduct the next round of reviews. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize ORDER BY by Pruning Functionally Redundant Sort Keys

2 participants