feat: support Spark compatible int to timestamp cast #20555
martin-g merged 7 commits into apache:main from
Conversation
339a475 to 05c433b
05c433b to 9e71267
datafusion/sqllogictest/test_files/spark/conversion/cast_int_to_timestamp.slt
Thank you. I will address the review comments shortly.
@martin-g, please take a look whenever you get a chance.
I added further tests, timezone support, and null handling, along with signature changes per review.
Investigating test failures.
datafusion/sqllogictest/test_files/spark/conversion/cast_int_to_timestamp.slt
0226e27 to 665d593
@paleolimbot, @alamb FYI I am planning to use this PR as an initial ramp-up to support Spark compatible cast functionality. For now, we could stick to calling `spark_cast` as a first step. The next step would be to make changes to the planner to leverage the Spark compatible cast instead of the regular cast operations.
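For context on the semantics involved: Spark's cast of an integer to timestamp interprets the value as seconds since the Unix epoch, with NULL passing through unchanged. A minimal Python sketch of that behavior (the function name is illustrative only, not from the DataFusion codebase):

```python
from datetime import datetime, timezone

def spark_int_to_timestamp(value):
    """Spark-style cast: an int is interpreted as *seconds* since the
    Unix epoch; None (SQL NULL) propagates unchanged."""
    if value is None:
        return None
    return datetime.fromtimestamp(value, tz=timezone.utc)

print(spark_int_to_timestamp(0))      # 1970-01-01 00:00:00+00:00
print(spark_int_to_timestamp(86400))  # 1970-01-02 00:00:00+00:00
```

This differs from a plain Arrow-style cast, which reinterprets the integer in the timestamp's own unit rather than as seconds.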
paleolimbot left a comment
I have no background on the Spark-specific functionality but took a look for other cast-related things I've run into 🙂. A few optional suggestions to consider!
Thank you. I replied with my comments in the PR and would love to know your thoughts.
Tagging other committers for review: @Jefffrey
It seems like this will work. You could in theory handle an arbitrary Arrow destination (the Spark int handling applies to the input, not the output), but maybe you don't need to. The cast wrapper I'm thinking of for SedonaDB would get inserted by an optimizer rule from a logical Cast, where it would have to handle an arbitrary Arrow output type, but you are probably inserting this differently if it is for Comet.
@paleolimbot, agreed. This is just an initial framework to start supporting Spark compatible casts in DataFusion. In a utopian sense, we would have a single cast(expr, datatype) which would wire in the right cast op based on the semantic profile selected by the user (thank you for brainstorming this with me the other day and coming up with the idea of semantic profiles, @mbutrovich). Something like this:
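As a purely illustrative sketch (names and behavior are hypothetical, not taken from DataFusion or this PR), a single entry point dispatching on a user-selected semantic profile might look like:

```python
from datetime import datetime, timezone

def cast_int_to_timestamp(value, profile="spark"):
    """Hypothetical profile-dispatched cast to a UTC timestamp.

    'spark' treats the int as seconds since the epoch; 'arrow' assumes
    the value is already in the target unit (microseconds here)."""
    if value is None:
        return None
    if profile == "spark":
        micros = value * 1_000_000   # seconds -> microseconds
    elif profile == "arrow":
        micros = value               # already in the target unit
    else:
        raise ValueError(f"unknown semantic profile: {profile}")
    return datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
```

Under this shape, the profile would be selected once (e.g. per session) and every cast expression routes through the same dispatch rather than branching on dialect downstream.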
Are we ready to merge this PR?
paleolimbot left a comment
I can't speak to the Spark details but I think this is a productive place to collect Spark-specific cast details until such time that cast details are more easily configurable.
Thank you for the approval, @paleolimbot. I think this PR looks good, but I am tagging @andygrove @comphead in case they have a comment here.
Is this one ready to merge?
Thank you, @coderfender & @paleolimbot!
Looks like this might have caused a test failure: https://github.com/apache/datafusion/actions/runs/23564037842/job/68611261733
Thank you. Let me look into the failure and create a PR to fix it.
It seems like we might have missed rebasing the branch on main before merging the change.
Raised PR to fix CI: #20555
Which issue does this PR close?
Ref: #20555. Fixes CI failures caused by PR #20555.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?