[datafusion-spark] Add Spark-compatible floor function #20594
shivbhatia10 wants to merge 9 commits into apache:main from
Conversation
kosiew
left a comment
Thanks for working on this.
```rust
impl SparkFloor {
    pub fn new() -> Self {
        Self {
            signature: Signature::numeric(1, Volatility::Immutable),
```
How does this implement Spark's floor(expr, scale)? The code itself notes `TODO: 2-argument floor(value, scale) is not yet implemented`, and `Signature::numeric(1)` is a wider surface than the data types `spark_floor` actually supports.
@shivbhatia10 you could probably use logical_integer / decimal / float signatures and coerce inputs so that unwanted types are rejected during the planning phase.
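A minimal sketch of the idea above, using a toy stand-in for the type enum (this is not the DataFusion API; the names here are illustrative only): unsupported types are rejected during planning instead of being accepted by a broad numeric signature.

```rust
// Toy stand-in for the planner-side coercion check; the DataType enum and
// function name are simplified illustrations, not DataFusion's actual API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
    Decimal128(u8, i8),
    Utf8,
}

fn coerce_floor_input(dt: &DataType) -> Result<DataType, String> {
    match dt {
        // Only the types floor actually supports are accepted.
        DataType::Int64 | DataType::Float64 | DataType::Decimal128(_, _) => Ok(dt.clone()),
        // Everything else fails at planning time, before execution starts.
        other => Err(format!("floor does not support {other:?}")),
    }
}
```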
```rust
fn spark_floor(args: &[ColumnarValue], return_type: &DataType) -> Result<ColumnarValue> {
    let input = match take_function_args("floor", args)? {
        [ColumnarValue::Scalar(value)] => value.to_array()?,
```
Looks like every scalar invocation will incur a scalar → array → result-array → scalar round trip. Consider following the built-in floor/ceil pattern for scalar inputs so scalar calls stay scalar and avoid the extra array round trip.
@shivbhatia10, please take a look at my floor function for reference (I implemented macros that avoid converting scalars to arrays and back): #20860
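A self-contained sketch of the scalar fast path the comments above describe. The enum below is a toy stand-in for DataFusion's `ColumnarValue` (only f64 is modeled), not the real type:

```rust
// Toy stand-in for ColumnarValue; only f64 payloads are modeled here.
#[derive(Debug, Clone, PartialEq)]
enum ColumnarValue {
    Scalar(f64),
    Array(Vec<f64>),
}

// Scalar inputs stay scalar: no scalar -> array -> result-array -> scalar trip.
fn floor_value(v: &ColumnarValue) -> ColumnarValue {
    match v {
        ColumnarValue::Scalar(x) => ColumnarValue::Scalar(x.floor()),
        ColumnarValue::Array(a) => {
            ColumnarValue::Array(a.iter().map(|x| x.floor()).collect())
        }
    }
}
```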
```
## PySpark 3.5.5 Result: {'floor(3.1411, -3)': Decimal('0'), 'typeof(floor(3.1411, -3))': 'decimal(4,0)', 'typeof(3.1411)': 'decimal(5,4)', 'typeof(-3)': 'int'}
## TODO: 2-argument floor(value, scale) is not yet implemented
#query
#SELECT floor(3.1411::decimal(5,4), -3::int);
```
Also add explicit type assertions to the SLT coverage (for example with typeof(...)) since the main semantic difference from DataFusion is the result type, not just the numeric value.
@shivbhatia10, please feel free to take my branch for the tests and type assertions, which I believe cover all cases and provide comprehensive coverage. @kosiew
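For instance, the SLT coverage could assert the result type directly, not just the value. This is a sketch only; the expected type below is derived from the `return_type` logic shown in this diff (precision 5, scale 4 → precision 2, scale 0) and should be verified against Spark before merging:

```
# Sketch of a type-asserting SLT case (expected type derived from the diff's
# return_type computation, to be double-checked against Spark's behavior).
query T
SELECT arrow_typeof(floor(3.1411::decimal(5,4)));
----
Decimal128(2, 0)
```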
```rust
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
    match &arg_types[0] {
        DataType::Decimal128(p, s) if *s > 0 => {
            let new_p = ((*p as i64) - (*s as i64) + 1).clamp(1, 38) as u8;
```
I don't think we need the i64 casts here; i8 or even u8 would do.
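The arithmetic does fit in a narrower type: `Decimal128` precision is at most 38, and this match arm only fires for positive scale, so `p - s + 1` stays well inside even i8's range. A sketch of the same computation with i16 (the function name is illustrative):

```rust
// Same precision math as the diff, without the i64 round trip.
// With p <= 38 and 0 < s <= 38, (p - s + 1) is in [-36, 38], so i16
// (or even i8) can never overflow here.
fn floor_result_precision(p: u8, s: i8) -> u8 {
    ((p as i16) - (s as i16) + 1).clamp(1, 38) as u8
}
```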
```rust
        .as_primitive::<Float64Type>()
        .unary::<_, Int64Type>(|x| x.floor() as i64),
) as _,
dt if dt.is_integer() => cast(&input, &DataType::Int64)?,
```
You could simply cast to i64 directly to avoid the unnecessary stack frames from the `cast` function.
```rust
#[test]
fn test_floor_int64() {
    let input = Int64Array::from(vec![Some(1), Some(-1), None]);
```
Should we test with all the supported integer types here?
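A sketch of how each integer width could be exercised without Arrow arrays (floor is the identity on integers, so the only behavior under test is the widening to i64; the helper name is illustrative):

```rust
// For integer inputs floor is the identity; the only work is widening to i64.
// Into<i64> covers i8, i16, i32, i64 and the unsigned types up to u32.
fn floor_int<T: Into<i64>>(x: T) -> i64 {
    x.into()
}
```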
```rust
#[test]
fn test_floor_float64() {
    let input = Float64Array::from(vec![
```
Could we also test NaN / Infinity inputs for the float types?
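Worth pinning down when adding those tests: Rust's float-to-int `as` cast saturates (since Rust 1.45), so the `x.floor() as i64` used in the diff maps NaN to 0 and ±Infinity to `i64::MIN`/`i64::MAX`. Whether that matches Spark's semantics for these inputs should be checked against Spark itself. A quick demonstration:

```rust
// Mirrors the cast used in the diff: x.floor() as i64.
// Rust's `as` cast to an integer saturates out-of-range values; NaN becomes 0.
fn floor_to_i64(x: f64) -> i64 {
    x.floor() as i64
}
```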
Which issue does this PR close?
Part of #15914 and apache/datafusion-comet#1704
Rationale for this change
Helping to continue adding Spark compatible expressions to datafusion-spark.
What changes are included in this PR?
Add new `floor` function.
Are these changes tested?
Yes, unit tests.
Are there any user-facing changes?
No.