fix: Set Substrait output types for expressions#20597
Conversation
|
@gabotechs Could you possibly take a look at this or suggest another reviewer? Thanks! |
|
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
0fdbc92 to
6d0fbdc
Compare
|
@alamb Any chance you could help me find a reviewer for this one? (I'm not sure how I should go about this as a first-time DataFusion contributor). Many thanks! |
|
I @wlhjason -- thanks for the ping. Maybe @westonpace or @gabotechs have time to review this one (or know someone who does) |
|
Requesting backup now |
westonpace
left a comment
There was a problem hiding this comment.
This looks correct to me. The output type is not optional according to Substrait so the previous behavior was not compliant with the spec. This change brings things into compliance.
|
Here's some feedback from Claude: I don't think I'm two worried about P1. The plans are still roughly equivalent. Might be worth just calling out in the PR description. I agree with P2, in particular, we should have a test for the join case. I think P3 is inevitable and somewhat accepted at the moment. Agree that the things in P4 are minor, take them or leave them. |
876cf5f to
d560e09
Compare
|
Many thanks for the review, @westonpace! I think I've addressed all the key points now. |
Agreed. I don't have merge privilege but maybe @alamb can pull the trigger? |
d560e09 to
53f2bb3
Compare
|
Rebased on |
gabotechs
left a comment
There was a problem hiding this comment.
Thanks @wlhjason for the PR and @westonpace for the review! this LGTM, I'll merge it soon.
Which issue does this PR close?
BinaryExprincludesoutput_type#15831.Rationale for this change
The Substrait producer did not set the ScalarFunction
output_typewhen converting binary expressions, which broke consumers relying on theoutput_type.What changes are included in this PR?
from_joinandfrom_betweento eliminate direct calls tomake_binary_op_scalar_funcoutput_typewhen converting several types of DataFusion expressions:Expr::BinaryExpr)Expr::Not)Expr::ScalarFunction)There are a few more places where the
output_typehas not been set, such asfrom_likeandfrom_in_list, as mentioned in #15831. I've left these out of scope here as fixing them would require more substantial code changes.Are these changes tested?
Yes, via a new unit test.
Are there any user-facing changes?
No, beyond the Substrait output fix.