
feat: support spark compatible int to timestamp cast #20555

Merged
martin-g merged 7 commits into apache:main from coderfender:df_int_timestamp_cast
Mar 25, 2026
Conversation

@coderfender
Contributor

@coderfender coderfender commented Feb 25, 2026

Which issue does this PR close?

Ref: #20555

Rationale for this change

What changes are included in this PR?

  1. Created a new cast scalar UDF to support int to timestamp casts (Spark compatible). The goal is to leverage this as an entry point for all casts that are Spark compatible but incompatible with DataFusion's defaults.

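The core of such a Spark-compatible cast is interpreting the integer as seconds since the Unix epoch and scaling it to the timestamp's storage unit with overflow checking. The following is a rough standalone sketch of that idea; the function name and `Option`-based null handling are illustrative, not the PR's actual code.

```rust
// Illustrative sketch only: Spark interprets an integer cast to TIMESTAMP as
// seconds since the Unix epoch, while Arrow timestamps are commonly stored
// as microseconds. The scaling must be overflow-checked and NULL must
// propagate through the cast.
fn int_to_timestamp_micros(seconds: Option<i64>) -> Option<i64> {
    // NULL in, NULL out; an overflowing value also yields None here, though
    // a real implementation might instead raise an error depending on mode.
    seconds.and_then(|s| s.checked_mul(1_000_000))
}

fn main() {
    assert_eq!(int_to_timestamp_micros(Some(1)), Some(1_000_000));
    assert_eq!(int_to_timestamp_micros(None), None);
    // i64::MAX seconds overflows when scaled to microseconds.
    assert_eq!(int_to_timestamp_micros(Some(i64::MAX)), None);
    println!("ok");
}
```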
Are these changes tested?

  1. Yes (through a series of unit tests covering edge cases such as overflow, null input, etc.)

Are there any user-facing changes?

  1. Yes. As a first step, we are adding the ability to use Spark-compatible cast operations in DataFusion through spark_cast. The next step would be to change the planner to use the Spark-compatible cast instead of the regular cast operations.

@github-actions github-actions bot added the spark label Feb 25, 2026
@coderfender coderfender force-pushed the df_int_timestamp_cast branch 3 times, most recently from 339a475 to 05c433b on February 25, 2026 17:26
@coderfender coderfender marked this pull request as draft February 25, 2026 17:28
@coderfender coderfender force-pushed the df_int_timestamp_cast branch from 05c433b to 9e71267 on February 25, 2026 19:50
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 25, 2026
@coderfender coderfender marked this pull request as ready for review February 25, 2026 19:54
@coderfender
Contributor Author

Thank you. I will address the review comments shortly.

@xanderbailey xanderbailey mentioned this pull request Feb 28, 2026
@coderfender
Contributor Author

@martin-g, please take a look whenever you get a chance.

@coderfender
Contributor Author

I added further tests, timezone support, and null handling, along with signature changes per review.

@coderfender
Contributor Author

Investigating test failures.

@coderfender coderfender requested a review from martin-g March 9, 2026 22:23
@coderfender coderfender force-pushed the df_int_timestamp_cast branch from 0226e27 to 665d593 on March 10, 2026 16:22
@coderfender
Contributor Author

@paleolimbot, @alamb FYI: I am planning to use this PR as an initial ramp-up to support Spark-compatible cast functionality. For now, we could stick to calling spark_cast and keep adding more functionality while we design/implement the wiring to pick user-specified semantics and inject the Spark vs. DataFusion cast in the planning phase.

Member

@paleolimbot paleolimbot left a comment


I have no background on the Spark-specific functionality but took a look for other cast-related things I've run into 🙂. A few optional suggestions to consider!

@coderfender
Contributor Author

Thank you. I replied with my comments on the PR and would love to know your thoughts.

@coderfender
Contributor Author

Tagging other committers for review: @Jefffrey

@paleolimbot
Member

> Thank you. I replied with my comments on the PR and would love to know your thoughts.

It seems like this will work. You could in theory handle an arbitrary Arrow destination (the Spark int handling applies to the input, not the output) but maybe you don't need to. The cast wrapper I'm thinking of for SedonaDB would get inserted by an optimizer rule from a logical Cast, where it would have to handle an arbitrary Arrow output type but you are probably inserting this differently if it is for Comet.
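Handling an arbitrary Arrow destination, as suggested above, mostly means choosing the scale factor after the Spark-specific step of interpreting the input as epoch seconds. The enum and function below are a hypothetical sketch that mirrors Arrow's time units without using the arrow crate; they are not DataFusion's actual types.

```rust
// Hypothetical sketch: the Spark-specific rule (integer input = epoch
// seconds) is independent of the destination unit; only the scale factor
// changes per output type.
#[derive(Clone, Copy)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

fn epoch_seconds_to_unit(seconds: i64, unit: TimeUnit) -> Option<i64> {
    let factor: i64 = match unit {
        TimeUnit::Second => 1,
        TimeUnit::Millisecond => 1_000,
        TimeUnit::Microsecond => 1_000_000,
        TimeUnit::Nanosecond => 1_000_000_000,
    };
    // checked_mul returns None on i64 overflow instead of wrapping.
    seconds.checked_mul(factor)
}

fn main() {
    assert_eq!(epoch_seconds_to_unit(1, TimeUnit::Nanosecond), Some(1_000_000_000));
    assert_eq!(epoch_seconds_to_unit(1, TimeUnit::Second), Some(1));
    assert_eq!(epoch_seconds_to_unit(i64::MAX, TimeUnit::Millisecond), None);
    println!("ok");
}
```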

@coderfender
Contributor Author

coderfender commented Mar 12, 2026

@paleolimbot, agreed. This is just an initial framework to start supporting Spark-compatible casts in DataFusion. In a utopian sense, we would have a single cast(expr, datatype) which would wire in the right cast op based on the semantic profile selected by the user (thank you for brainstorming this with me the other day and coming up with the idea of semantic profiles, @mbutrovich). Something like this:

```
SET df.semantics.profile=spark
select cast(1 as timestamp);
Result: 1

SET df.semantics.profile=default
select cast(1 as timestamp);
Result: 0.0000001
```

We pretty much have all the cast support (barring some incompatible behavior bound by the JVM vs. Rust implementations) that I plan to port upstream to DataFusion. Once there are enough cast ops (and other Spark-compatible expressions) supported, I can start work on the semantics-support functionality to wire in the right functions, and I think that would be a good feature for users who want to run Spark workloads directly against DataFusion. For now, though, I believe this is a good first step in that direction.
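The profile idea could, in principle, be as small as a session-level setting that a planner rule consults when lowering a logical cast. The names below (`SemanticsProfile`, `choose_cast_fn`) are purely hypothetical and are not DataFusion APIs; this is only a sketch of the dispatch shape.

```rust
// Purely hypothetical sketch of the "semantics profile" idea: a session-level
// setting selects which cast implementation the planner injects.
#[derive(Clone, Copy, PartialEq, Debug)]
enum SemanticsProfile {
    Default, // DataFusion's built-in cast kernels
    Spark,   // Spark-compatible functions such as spark_cast
}

// A planner rule could consult the active profile when it sees a logical
// Cast expression and substitute the matching physical function.
fn choose_cast_fn(profile: SemanticsProfile) -> &'static str {
    match profile {
        SemanticsProfile::Spark => "spark_cast",
        SemanticsProfile::Default => "cast",
    }
}

fn main() {
    assert_eq!(choose_cast_fn(SemanticsProfile::Spark), "spark_cast");
    assert_eq!(choose_cast_fn(SemanticsProfile::Default), "cast");
    println!("ok");
}
```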

@alamb
Contributor

alamb commented Mar 20, 2026

Are we ready to merge this PR?

Member

@paleolimbot paleolimbot left a comment


I can't speak to the Spark details but I think this is a productive place to collect Spark-specific cast details until such time that cast details are more easily configurable.

@coderfender
Contributor Author

Thank you for the approval, @paleolimbot. I think this PR is ready to merge, but tagging @andygrove and @comphead in case they have any comments.

@alamb
Contributor

alamb commented Mar 25, 2026

Is this one ready to merge?

@martin-g martin-g added this pull request to the merge queue Mar 25, 2026
Merged via the queue into apache:main with commit 139b0b4 Mar 25, 2026
31 checks passed
@martin-g
Member

Thank you, @coderfender & @paleolimbot !

@neilconway
Contributor

Looks like this might have caused a test failure: https://github.com/apache/datafusion/actions/runs/23564037842/job/68611261733

@coderfender
Contributor Author

Thank you. Let me look into the failure and create a PR to fix it.

@coderfender
Contributor Author

coderfender commented Mar 26, 2026

It seems like we might have missed rebasing the branch on main before merging the change.

@coderfender
Contributor Author

Raised a PR to fix CI: #20555

github-merge-queue bot pushed a commit that referenced this pull request Mar 26, 2026
## Which issue does this PR close?

- Closes #.
Fixes CI failures caused by PR:
#20555

## Rationale for this change

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?

Labels

spark, sqllogictest (SQL Logic Tests (.slt))

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support spark compatible cast from int -> timestamp

5 participants