[datafusion-spark] Add Spark-compatible ceil function#20593
[datafusion-spark] Add Spark-compatible ceil function#20593shivbhatia10 wants to merge 28 commits intoapache:mainfrom
Conversation
comphead
left a comment
There was a problem hiding this comment.
Thanks @shivbhatia10 it would be nice having Spark .slt tests covering edge cases and datatypes as well as commenting in file body what is the difference between Spark and current DF implementation
|
Hey @comphead, I've added some |
|
Thanks @shivbhatia10 would you mind providing an example where DF ceil and Spark ceil return diff result? I'm still struggling to understand the difference between them |
|
Taking a look @shivbhatia10 |
| } | ||
| DataType::Decimal128(p, s) => Ok(DataType::Decimal128(*p, *s)), | ||
| _ => Ok(DataType::Int64), | ||
| } |
There was a problem hiding this comment.
Also recommend using return_field_from_args instead of return_type for better type control during planning phase
There was a problem hiding this comment.
In this UDF I don't think it's strictly necessary, I believe we can infer the output type from arg_types, unless it's good practice in general to prefer return_field_from_args?
There was a problem hiding this comment.
@shivbhatia10 , I strongly advice for us to leverage return_field_from_args given that almost all new expressions are leveraging that to be more efficient and reject unsupported inputs / outputs during planning phase itself
|
@shivbhatia10 , thank you for the PR . I have left some comments on the PR to make implementation a little more safe and concretize edge case handling. Please take a look whenever you get a chance and I can review it again |
|
Hi @coderfender, thank you very much for the comprehensive review here, I've tried to address your comments, please let me know if there's anything else I should add. |
| Ok(DataType::Decimal128(new_p, 0)) | ||
| } else { | ||
| Ok(DataType::Decimal128(*p, *s)) | ||
| } |
There was a problem hiding this comment.
this makes it clear :)
| @@ -81,7 +81,8 @@ impl ScalarUDFImpl for SparkCeil { | |||
| } | |||
| } | |||
| dt if dt.is_integer() => Ok(dt.clone()), | |||
There was a problem hiding this comment.
Shouldnt all integer inputs produce a long / int64 output ?
There was a problem hiding this comment.
Yes that's my bad, I checked the Spark source code and I think they always cast any non-decimal type to int64, I've set it up that way now: 5ad7ef6
| "Returns the smallest integer not less than expr.", | ||
| arg1 | ||
| )); | ||
| export_functions!((ceil, "Returns the ceiling of expr.", arg1)); |
There was a problem hiding this comment.
nice way to keep brief and concise
| ScalarValue::Decimal128(v, p, s) if *s > 0 => { | ||
| let div = 10_i128.pow(*s as u32); | ||
| let div = 10_i128.pow_wrapping(*s as u32); | ||
| let new_p = ((*p as i64) - (*s as i64) + 1).clamp(1, 38) as u8; |
There was a problem hiding this comment.
There is a lot of duplication in the code between scalar and array inputs. Can we create simple functions and inline them to not repeat ourselves ?
There was a problem hiding this comment.
Have tried to extract out a few methods between both paths: 9b23318
coderfender
left a comment
There was a problem hiding this comment.
Almost there . Left some minor comments to improve quality and a potential datatype mismatch
| query I | ||
| SELECT ceil(5::int); | ||
| ---- | ||
| 5 |
There was a problem hiding this comment.
@shivbhatia10 , we might need to test the output data type along with the value in the SLT files ?
|
Hey @coderfender sorry for the delay here, I've tried to respond to your comments, let me know if there's anything else that would be good to change |
Which issue does this PR close?
Part of #15914 and apache/datafusion-comet#1704
I also just noticed #15916 butI believe work has been stale on this one.
Rationale for this change
Helping to continue adding Spark compatible expressions to datafusion-spark.
What changes are included in this PR?
Add new
ceilfunction.Are these changes tested?
Yes, unit tests.
Are there any user-facing changes?
No.