Cherry-pick: Propagate orderings through struct-producing projections (#39)#42
Conversation
There was a problem hiding this comment.
Pull request overview
Cherry-pick of #39 to preserve sort order through struct-producing projections (e.g. named_struct) by teaching projection/equivalence logic how struct fields relate back to input expressions, enabling sort elimination for ORDER BY s['field'].
Changes:
- Add
StructFieldMappingto the UDF API (ScalarUDFImpl) so struct-producing UDFs can describe field→input-arg mappings. - Extend
ProjectionMapping::try_newto decompose struct-producing projections into field-level mapping entries (e.g.col(a) → get_field(col(s), "a")). - Add coverage via sqllogictest and equivalence-properties tests for ordering propagation through
named_struct.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| datafusion/sqllogictest/test_files/order.slt | Adds EXPLAIN assertions for sort elimination / retention when ordering by struct fields |
| datafusion/physical-expr/src/projection.rs | Adds struct-field decomposition into ProjectionMapping and adjusts test helper to skip non-Column targets |
| datafusion/physical-expr/src/equivalence/properties/dependency.rs | Adds a unit test validating ordering propagation through named_struct projections |
| datafusion/functions/src/core/named_struct.rs | Implements struct_field_mapping for named_struct using get_field as the accessor |
| datafusion/expr/src/udf.rs | Introduces StructFieldMapping and plumbs it through ScalarUDF / ScalarUDFImpl (incl. alias wrapper) |
| datafusion/expr/src/lib.rs | Re-exports StructFieldMapping from the expr crate |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| use datafusion_common::tree_node::{Transformed, TransformedResult, TreeNode}; | ||
| use datafusion_common::{ | ||
| Result, ScalarValue, assert_or_internal_err, internal_datafusion_err, plan_err, | ||
| Result, ScalarValue, assert_or_internal_err, internal_datafusion_err, internal_err, plan_err, |
There was a problem hiding this comment.
internal_err is imported but not used anywhere in this module, which will fail builds that treat warnings as errors (or clippy -D warnings). Please remove the unused import or use it where intended.
| Result, ScalarValue, assert_or_internal_err, internal_datafusion_err, internal_err, plan_err, | |
| Result, ScalarValue, assert_or_internal_err, internal_datafusion_err, plan_err, |
| func_expr.fun().struct_field_mapping(&literal_args) | ||
| { | ||
| if let DataType::Struct(struct_fields) = func_expr.return_type() { | ||
| for (accessor_args, source_arg_idx) in &field_mapping.fields { |
There was a problem hiding this comment.
func_expr.args()[*source_arg_idx] can panic if a UDF returns a StructFieldMapping with an out-of-bounds argument index. Since struct_field_mapping is part of the public UDF API, this should be handled defensively (e.g., validate indices and skip/return an internal error) rather than panicking during planning/optimization.
| for (accessor_args, source_arg_idx) in &field_mapping.fields { | |
| for (accessor_args, source_arg_idx) in &field_mapping.fields { | |
| // `source_arg_idx` comes from UDF `struct_field_mapping` and may | |
| // be out of bounds for `func_expr.args()`. Guard against this | |
| // to avoid panicking during planning/optimization. | |
| if *source_arg_idx >= func_expr.args().len() { | |
| continue; | |
| } |
cec17f4 to
ebce147
Compare
ebce147 to
b1031d2
Compare
Summary
Cherry-pick of #39 from
branch-51tobranch-52.named_struct)StructFieldMappingtrait toScalarUDFImplso UDFs can declare struct field→input mappingsProjectionExpr::project_orderingsto preserve sort order through struct columnsOriginal PR by @rkrishn7.
Conflicts resolved
named_struct.rs: import merge (ScalarValueadded)projection.rs: import merge (DataType,internal_err,ScalarFunctionExpr,RecordBatch/RecordBatchOptionsadditions)