
[datafusion-spark] Add Spark-compatible floor function #20594

Open

shivbhatia10 wants to merge 9 commits into apache:main from shivbhatia10:sb/datafusion-math-floor

Conversation

shivbhatia10 (Contributor) commented Feb 27, 2026

Which issue does this PR close?

Part of #15914 and apache/datafusion-comet#1704

Rationale for this change

Continues the effort of adding Spark-compatible expressions to datafusion-spark.

What changes are included in this PR?

Add new floor function.

Are these changes tested?

Yes, unit tests.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the spark label Feb 27, 2026
@shivbhatia10 shivbhatia10 changed the title "Add floor" → "[datafusion-spark] Add Spark-compatible floor function" Feb 27, 2026
@shivbhatia10 shivbhatia10 marked this pull request as ready for review February 27, 2026 13:41
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 28, 2026
kosiew (Contributor) left a comment:

@shivbhatia10
Thanks for working on this.

```rust
impl SparkFloor {
    pub fn new() -> Self {
        Self {
            signature: Signature::numeric(1, Volatility::Immutable),
```
How does this implement Spark's floor(expr, scale)?

> TODO: 2-argument floor(value, scale) is not yet implemented

Signature::numeric(1) is a wider surface than spark_floor's supported data_types.

@shivbhatia10 you could probably use logical_integer / decimal / float and coerce inputs, so that unwanted input types are rejected during the planning phase


```rust
fn spark_floor(args: &[ColumnarValue], return_type: &DataType) -> Result<ColumnarValue> {
    let input = match take_function_args("floor", args)? {
        [ColumnarValue::Scalar(value)] => value.to_array()?,
```

Looks like every scalar invocation will incur this round trip: scalar → array → result array → scalar.

Consider following the built-in floor/ceil pattern for scalar inputs so scalar calls can stay scalar and avoid the extra array round-trip.

@shivbhatia10, please take a look at my floor function for reference (I implemented macros to avoid converting scalars to arrays and back): #20860

```
## PySpark 3.5.5 Result: {'floor(3.1411, -3)': Decimal('0'), 'typeof(floor(3.1411, -3))': 'decimal(4,0)', 'typeof(3.1411)': 'decimal(5,4)', 'typeof(-3)': 'int'}
## TODO: 2-argument floor(value, scale) is not yet implemented
#query
#SELECT floor(3.1411::decimal(5,4), -3::int);
```
Also add explicit type assertions to the SLT coverage (for example with typeof(...)), since the main semantic difference from DataFusion's floor is the result type, not just the numeric value.

coderfender (Contributor) commented Mar 12, 2026

@shivbhatia10, please feel free to take the tests and type assertions from my branch, which I believe cover all cases and provide comprehensive coverage. @kosiew
My floor and ceil PRs, if you would like to work on them:

```rust
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
    match &arg_types[0] {
        DataType::Decimal128(p, s) if *s > 0 => {
            let new_p = ((*p as i64) - (*s as i64) + 1).clamp(1, 38) as u8;
```
I don't think we need i64 casting here; i8 or even u8 could do.


```rust
            .as_primitive::<Float64Type>()
            .unary::<_, Int64Type>(|x| x.floor() as i64),
    ) as _,
    dt if dt.is_integer() => cast(&input, &DataType::Int64)?,
```
Could simply cast element-wise to i64 to avoid the unnecessary stack frames from the cast function.


```rust
#[test]
fn test_floor_int64() {
    let input = Int64Array::from(vec![Some(1), Some(-1), None]);
```
Should we test all the supported integer types here?


```rust
#[test]
fn test_floor_float64() {
    let input = Float64Array::from(vec![
```
Could we also test NaN / infinity inputs for the float types?


Labels

spark, sqllogictest SQL Logic Tests (.slt)

3 participants