Skip to content

Cherry-pick: Propagate orderings through struct-producing projections (#39)#42

Merged
zhuqi-lucas merged 1 commit intomassive-com:branch-52from
zhuqi-lucas:cherry-pick-51-to-52
Mar 30, 2026
Merged

Cherry-pick: Propagate orderings through struct-producing projections (#39)#42
zhuqi-lucas merged 1 commit intomassive-com:branch-52from
zhuqi-lucas:cherry-pick-51-to-52

Conversation

@zhuqi-lucas
Copy link
Copy Markdown
Collaborator

Summary

Cherry-pick of #39 from branch-51 to branch-52.

  • Propagates orderings through struct-producing projections (e.g. named_struct)
  • Adds StructFieldMapping trait to ScalarUDFImpl so UDFs can declare struct field→input mappings
  • Uses these mappings in ProjectionExpr::project_orderings to preserve sort order through struct columns

Original PR by @rkrishn7.

Conflicts resolved

  • named_struct.rs: import merge (ScalarValue added)
  • projection.rs: import merge (DataType, internal_err, ScalarFunctionExpr, RecordBatch/RecordBatchOptions additions)

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Cherry-pick of #39 to preserve sort order through struct-producing projections (e.g. named_struct) by teaching projection/equivalence logic how struct fields relate back to input expressions, enabling sort elimination for ORDER BY s['field'].

Changes:

  • Add StructFieldMapping to the UDF API (ScalarUDFImpl) so struct-producing UDFs can describe field→input-arg mappings.
  • Extend ProjectionMapping::try_new to decompose struct-producing projections into field-level mapping entries (e.g. col(a) → get_field(col(s), "a")).
  • Add coverage via sqllogictest and equivalence-properties tests for ordering propagation through named_struct.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
datafusion/sqllogictest/test_files/order.slt Adds EXPLAIN assertions for sort elimination / retention when ordering by struct fields
datafusion/physical-expr/src/projection.rs Adds struct-field decomposition into ProjectionMapping and adjusts test helper to skip non-Column targets
datafusion/physical-expr/src/equivalence/properties/dependency.rs Adds a unit test validating ordering propagation through named_struct projections
datafusion/functions/src/core/named_struct.rs Implements struct_field_mapping for named_struct using get_field as the accessor
datafusion/expr/src/udf.rs Introduces StructFieldMapping and plumbs it through ScalarUDF / ScalarUDFImpl (incl. alias wrapper)
datafusion/expr/src/lib.rs Re-exports StructFieldMapping from the expr crate

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

use datafusion_common::tree_node::{Transformed, TransformedResult, TreeNode};
use datafusion_common::{
Result, ScalarValue, assert_or_internal_err, internal_datafusion_err, plan_err,
Result, ScalarValue, assert_or_internal_err, internal_datafusion_err, internal_err, plan_err,
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal_err is imported but not used anywhere in this module, which will fail builds that treat warnings as errors (or clippy -D warnings). Please remove the unused import or use it where intended.

Suggested change
Result, ScalarValue, assert_or_internal_err, internal_datafusion_err, internal_err, plan_err,
Result, ScalarValue, assert_or_internal_err, internal_datafusion_err, plan_err,

Copilot uses AI. Check for mistakes.
func_expr.fun().struct_field_mapping(&literal_args)
{
if let DataType::Struct(struct_fields) = func_expr.return_type() {
for (accessor_args, source_arg_idx) in &field_mapping.fields {
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func_expr.args()[*source_arg_idx] can panic if a UDF returns a StructFieldMapping with an out-of-bounds argument index. Since struct_field_mapping is part of the public UDF API, this should be handled defensively (e.g., validate indices and skip/return an internal error) rather than panicking during planning/optimization.

Suggested change
for (accessor_args, source_arg_idx) in &field_mapping.fields {
for (accessor_args, source_arg_idx) in &field_mapping.fields {
// `source_arg_idx` comes from UDF `struct_field_mapping` and may
// be out of bounds for `func_expr.args()`. Guard against this
// to avoid panicking during planning/optimization.
if *source_arg_idx >= func_expr.args().len() {
continue;
}

Copilot uses AI. Check for mistakes.
@zhuqi-lucas zhuqi-lucas force-pushed the cherry-pick-51-to-52 branch 2 times, most recently from cec17f4 to ebce147 Compare March 30, 2026 03:15
@zhuqi-lucas zhuqi-lucas force-pushed the cherry-pick-51-to-52 branch from ebce147 to b1031d2 Compare March 30, 2026 03:34
@zhuqi-lucas zhuqi-lucas merged commit befeec7 into massive-com:branch-52 Mar 30, 2026
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants