
feat: support spark compatible int to timestamp cast #20555

Merged
martin-g merged 7 commits into apache:main from coderfender:df_int_timestamp_cast
Mar 25, 2026
Conversation

@coderfender
Contributor

@coderfender coderfender commented Feb 25, 2026

Which issue does this PR close?

Ref: #20555

Rationale for this change

What changes are included in this PR?

  1. Created a new cast scalar UDF to support int to timestamp casts (Spark compatible). The goal is to leverage this as an entry point for all casts that are Spark compatible but incompatible with DataFusion's defaults.

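The core of such a Spark-compatible cast is interpreting the integer as seconds since the Unix epoch and scaling it to the timestamp's storage unit with overflow checking. The following is a rough standalone sketch of that idea; the function name and `Option`-based null handling are illustrative, not the PR's actual code.

```rust
// Illustrative sketch only: Spark interprets an integer cast to TIMESTAMP as
// seconds since the Unix epoch, while Arrow timestamps are commonly stored
// as microseconds. The scaling must be overflow-checked and NULL must
// propagate through the cast.
fn int_to_timestamp_micros(seconds: Option<i64>) -> Option<i64> {
    // NULL in, NULL out; an overflowing value also yields None here, though
    // a real implementation might instead raise an error depending on mode.
    seconds.and_then(|s| s.checked_mul(1_000_000))
}

fn main() {
    assert_eq!(int_to_timestamp_micros(Some(1)), Some(1_000_000));
    assert_eq!(int_to_timestamp_micros(None), None);
    // i64::MAX seconds overflows when scaled to microseconds.
    assert_eq!(int_to_timestamp_micros(Some(i64::MAX)), None);
    println!("ok");
}
```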
Are these changes tested?

  1. Yes (through a series of unit tests covering edge cases such as overflow, null input, etc.)

Are there any user-facing changes?

  1. Yes. As a first step, we are adding the ability to use Spark-compatible cast operations in DataFusion through spark_cast. The next step would be to change the planner to use the Spark-compatible cast instead of the regular cast operations.

@github-actions github-actions bot added the spark label Feb 25, 2026
@coderfender coderfender force-pushed the df_int_timestamp_cast branch 3 times, most recently from 339a475 to 05c433b on February 25, 2026 17:26
@coderfender coderfender marked this pull request as draft February 25, 2026 17:28
@coderfender coderfender force-pushed the df_int_timestamp_cast branch from 05c433b to 9e71267 on February 25, 2026 19:50
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 25, 2026
@coderfender coderfender marked this pull request as ready for review February 25, 2026 19:54
@coderfender
Contributor Author

Thank you. I will address the review comments shortly.

@xanderbailey xanderbailey mentioned this pull request Feb 28, 2026
@coderfender
Contributor Author

@martin-g, please take a look whenever you get a chance.

@coderfender
Contributor Author

I added further tests, timezone support, and null handling, along with signature changes per review.

@coderfender
Contributor Author

Investigating test failures.

@coderfender coderfender requested a review from martin-g March 9, 2026 22:23
@coderfender coderfender force-pushed the df_int_timestamp_cast branch from 0226e27 to 665d593 on March 10, 2026 16:22
@coderfender
Contributor Author

@paleolimbot, @alamb FYI: I am planning to use this PR as an initial ramp-up to support Spark-compatible cast functionality. For now, we could stick to calling spark_cast and keep adding more functionality while we design/implement the wiring to pick user-specified semantics and inject the Spark vs. DataFusion cast in the planning phase.

Member

@paleolimbot paleolimbot left a comment


I have no background on the Spark-specific functionality but took a look for other cast-related things I've run into 🙂. A few optional suggestions to consider!

@coderfender
Contributor Author

Thank you. I replied with my comments on the PR and would love to know your thoughts.

@coderfender
Contributor Author

Tagging other committers for review: @Jefffrey

@paleolimbot
Member

> Thank you. I replied with my comments on the PR and would love to know your thoughts.

It seems like this will work. You could in theory handle an arbitrary Arrow destination (the Spark int handling applies to the input, not the output) but maybe you don't need to. The cast wrapper I'm thinking of for SedonaDB would get inserted by an optimizer rule from a logical Cast, where it would have to handle an arbitrary Arrow output type but you are probably inserting this differently if it is for Comet.
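Handling an arbitrary Arrow destination, as suggested above, mostly means choosing the scale factor after the Spark-specific step of interpreting the input as epoch seconds. The enum and function below are a hypothetical sketch that mirrors Arrow's time units without using the arrow crate; they are not DataFusion's actual types.

```rust
// Hypothetical sketch: the Spark-specific rule (integer input = epoch
// seconds) is independent of the destination unit; only the scale factor
// changes per output type.
#[derive(Clone, Copy)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

fn epoch_seconds_to_unit(seconds: i64, unit: TimeUnit) -> Option<i64> {
    let factor: i64 = match unit {
        TimeUnit::Second => 1,
        TimeUnit::Millisecond => 1_000,
        TimeUnit::Microsecond => 1_000_000,
        TimeUnit::Nanosecond => 1_000_000_000,
    };
    // checked_mul returns None on i64 overflow instead of wrapping.
    seconds.checked_mul(factor)
}

fn main() {
    assert_eq!(epoch_seconds_to_unit(1, TimeUnit::Nanosecond), Some(1_000_000_000));
    assert_eq!(epoch_seconds_to_unit(1, TimeUnit::Second), Some(1));
    assert_eq!(epoch_seconds_to_unit(i64::MAX, TimeUnit::Millisecond), None);
    println!("ok");
}
```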

@coderfender
Contributor Author

coderfender commented Mar 12, 2026

@paleolimbot, agreed. This is just an initial framework to start supporting Spark-compatible casts in DataFusion. In a utopian sense, we would have a single cast(expr, datatype) which would wire in the right cast op based on the semantic profile selected by the user (thank you for brainstorming this with me the other day and coming up with the idea of semantic profiles, @mbutrovich). Something like this:

```
SET df.semantics.profile=spark
select cast(1 as timestamp);
Result: 1

SET df.semantics.profile=default
select cast(1 as timestamp);
Result: 0.0000001
```

We pretty much have all the cast support (barring some incompatible behavior bound by the JVM vs. Rust implementations) that I plan to port upstream to DataFusion. Once there are enough cast ops (and other Spark-compatible expressions) supported, I can start work on the semantics-support functionality to wire in the right functions, and I think that would be a good feature for users who want to run Spark workloads directly against DataFusion. For now, though, I believe this is a good first step in that direction.
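The profile idea could, in principle, be as small as a session-level setting that a planner rule consults when lowering a logical cast. The names below (`SemanticsProfile`, `choose_cast_fn`) are purely hypothetical and are not DataFusion APIs; this is only a sketch of the dispatch shape.

```rust
// Purely hypothetical sketch of the "semantics profile" idea: a session-level
// setting selects which cast implementation the planner injects.
#[derive(Clone, Copy, PartialEq, Debug)]
enum SemanticsProfile {
    Default, // DataFusion's built-in cast kernels
    Spark,   // Spark-compatible functions such as spark_cast
}

// A planner rule could consult the active profile when it sees a logical
// Cast expression and substitute the matching physical function.
fn choose_cast_fn(profile: SemanticsProfile) -> &'static str {
    match profile {
        SemanticsProfile::Spark => "spark_cast",
        SemanticsProfile::Default => "cast",
    }
}

fn main() {
    assert_eq!(choose_cast_fn(SemanticsProfile::Spark), "spark_cast");
    assert_eq!(choose_cast_fn(SemanticsProfile::Default), "cast");
    println!("ok");
}
```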

@alamb
Contributor

alamb commented Mar 20, 2026

Are we ready to merge this PR?

Member

@paleolimbot paleolimbot left a comment


I can't speak to the Spark details but I think this is a productive place to collect Spark-specific cast details until such time that cast details are more easily configurable.

@coderfender
Contributor Author

Thank you for the approval, @paleolimbot. I think this PR is ready to merge, but tagging @andygrove and @comphead in case they have any comments.

@alamb
Contributor

alamb commented Mar 25, 2026

Is this one ready to merge?

@martin-g martin-g added this pull request to the merge queue Mar 25, 2026
Merged via the queue into apache:main with commit 139b0b4 Mar 25, 2026
31 checks passed
@martin-g
Member

Thank you, @coderfender & @paleolimbot !

@neilconway
Contributor

Looks like this might have caused a test failure: https://github.com/apache/datafusion/actions/runs/23564037842/job/68611261733

@coderfender
Contributor Author

Thank you. Let me look into the failure and create a PR to fix it.

@coderfender
Contributor Author

coderfender commented Mar 26, 2026

It seems like we might have missed rebasing the branch on main before merging the change.

@coderfender
Contributor Author

Raised a PR to fix CI: #20555

github-merge-queue bot pushed a commit that referenced this pull request Mar 26, 2026
## Which issue does this PR close?

- Closes #.
Fixes CI failures caused by PR:
#20555

## Rationale for this change

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?

Labels

spark, sqllogictest (SQL Logic Tests (.slt))

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support spark compatible cast from int -> timestamp

5 participants