feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys#21362
feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys#21362xiedeyantu wants to merge 9 commits intoapache:mainfrom
Conversation
neilconway
left a comment
There was a problem hiding this comment.
Looks good overall! A few minor suggestions.
@neilconway Thank you for your comments and suggestions; I think the scenario you mentioned—changing |
neilconway
left a comment
There was a problem hiding this comment.
Thanks for iterating on this!
Can you "resolve" comment threads for review comments you believe have been addressed, please?
| pub fn get_required_sort_exprs_indices( | ||
| schema: &DFSchema, | ||
| sort_expr_names: &[String], | ||
| ) -> Option<Vec<usize>> { |
There was a problem hiding this comment.
Seems like we don't need to return Option anymore.
| Transformed::no | ||
| }; | ||
|
|
||
| assert!( |
There was a problem hiding this comment.
Not sure that panicking if this fails is worth it ... what about debug_assert! instead?
There was a problem hiding this comment.
I change here to internal_err, since order by clause can not be empty. Perhaps internal_err is a better choice.
| let mut required_sort_expr_indices = Vec::new(); | ||
|
|
||
| for (sort_expr_idx, sort_expr_name) in sort_expr_names.iter().enumerate() { | ||
| let Some(field_idx) = field_names |
There was a problem hiding this comment.
Can we add a quick comment to justify/explain this?
There was a problem hiding this comment.
I have Added some comments.
| let removable = dependencies.deps.iter().any(|dependency| { | ||
| dependency.target_indices.contains(&field_idx) | ||
| && dependency | ||
| .source_indices | ||
| .iter() | ||
| .all(|source_idx| known_field_indices.contains(source_idx)) | ||
| }); |
There was a problem hiding this comment.
I think a comment explaining this would be useful as well.
@neilconway My previous understanding was that the issue would only be marked as "resolved" after I had fixed it and the reviewer had confirmed that everything was in order; that is why I didn't click the button. Going forward, I will follow your suggestion. Thank you! |
I think it's simpler if the PR submitter just proactively "resolves" comments they believe have been resolved; if the reviewer disagrees, they can always reopen the comment thread. |
neilconway
left a comment
There was a problem hiding this comment.
Looks good to me! Nice work.
@alamb PR looks reasonable to me.
I completely agree; this will make it much easier for the reviewer to conduct the next round of reviews. Thank you! |
Which issue does this PR close?
Rationale for this change
This PR adds functional-dependency-based simplification for
ORDER BYclauses. When an earlier sort key already functionally determines a later key, the later key is redundant and can be removed without changing query semantics. This reduces unnecessary sorting work and avoids carrying extra sort keys through planning and execution.What changes are included in this PR?
This PR extends the existing functional dependency utilities with a helper for pruning redundant sort keys, and wires that helper into
eliminate_duplicated_exprsoSortnodes can be simplified during optimization. It also adds regression coverage for both the positive case, where a trailing sort key is removed, and the negative case, where sort order prevents pruning.Are these changes tested?
Yes. I added unit tests covering:
ORDER BYkeyI also ran
cargo test -p datafusion-optimizer eliminate_duplicated_expr -- --nocapturesuccessfully, andcargo fmt --allpasses.Are there any user-facing changes?
Yes, but only in query planning behavior. Some queries with redundant
ORDER BYkeys may produce simpler plans and run more efficiently. There are no public API changes.