
[datafusion-spark] Add Spark-compatible ceil function#20593

Open
shivbhatia10 wants to merge 28 commits intoapache:mainfrom
shivbhatia10:sb/datafusion-math-ceil

Conversation

@shivbhatia10 (Contributor) commented Feb 27, 2026

Which issue does this PR close?

Part of #15914 and apache/datafusion-comet#1704

I also just noticed #15916, but I believe work on that one has gone stale.

Rationale for this change

Continues the effort of adding Spark-compatible expressions to datafusion-spark.

What changes are included in this PR?

Adds a new Spark-compatible ceil function.

Are these changes tested?

Yes, unit tests.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the spark label Feb 27, 2026
@shivbhatia10 shivbhatia10 changed the title Add ceil [datafusion-spark] Add Spark-compatible ceil function Feb 27, 2026
@shivbhatia10 shivbhatia10 marked this pull request as ready for review February 27, 2026 13:35
@comphead (Contributor) left a comment:
Thanks @shivbhatia10, it would be nice to have Spark .slt tests covering edge cases and datatypes, as well as a comment in the file body explaining the difference between the Spark and current DataFusion implementations.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 28, 2026
@shivbhatia10 (Contributor, Author) commented:

Hey @comphead, I've added some .slt tests and uncommented a few existing ones that seem to have been left as placeholders for this implementation. I also expanded the comment explaining the differences between this and the DataFusion ceil function.

@comphead (Contributor) commented Mar 2, 2026:

Thanks @shivbhatia10, would you mind providing an example where DF ceil and Spark ceil return different results? I'm still struggling to understand the difference between them.
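For readers following the thread, the core behavioral difference can be sketched in standalone Rust (a hypothetical illustration, not the PR's code, assuming Spark's documented semantics of returning a LONG for non-decimal inputs while DataFusion's built-in ceil preserves Float64):

```rust
// Hypothetical sketch, not the PR's actual implementation.
// DataFusion's built-in ceil keeps the input's floating-point type:
fn datafusion_ceil(v: f64) -> f64 {
    v.ceil() // stays Float64
}

// Spark's ceil casts non-decimal results to LONG (Int64):
fn spark_ceil(v: f64) -> i64 {
    v.ceil() as i64 // becomes BIGINT
}

fn main() {
    assert_eq!(datafusion_ceil(2.5), 3.0); // Float64 result
    assert_eq!(spark_ceil(2.5), 3); // Int64 result
    assert_eq!(spark_ceil(-2.5), -2); // ceil rounds toward +infinity
}
```

So even when the numeric value agrees, the output datatype differs, which matters for downstream schema compatibility in Spark plans.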

@coderfender (Contributor) commented:

Taking a look @shivbhatia10

}
DataType::Decimal128(p, s) => Ok(DataType::Decimal128(*p, *s)),
_ => Ok(DataType::Int64),
}
Contributor:
Also, I recommend using return_field_from_args instead of return_type for better type control during the planning phase.

Contributor Author:
In this UDF I don't think it's strictly necessary, since I believe we can infer the output type from arg_types; or is it good practice in general to prefer return_field_from_args?

Contributor:
@shivbhatia10, I strongly advise that we leverage return_field_from_args, given that almost all new expressions use it to be more efficient and to reject unsupported inputs/outputs during the planning phase itself.

@coderfender (Contributor) commented:

@shivbhatia10, thank you for the PR. I have left some comments to make the implementation a little safer and to tighten up edge-case handling. Please take a look whenever you get a chance and I can review it again.

@shivbhatia10 (Contributor, Author) commented:

Hi @coderfender, thank you very much for the comprehensive review. I've tried to address your comments; please let me know if there's anything else I should add.

Ok(DataType::Decimal128(new_p, 0))
} else {
Ok(DataType::Decimal128(*p, *s))
}
Contributor:
this makes it clear :)
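The precision/scale rule in the excerpt above can be sketched as a standalone function (a hypothetical illustration, assuming Spark's rule that ceil of Decimal(p, s) with s > 0 yields Decimal(min(p - s + 1, 38), 0) and leaves zero-scale decimals unchanged):

```rust
// Hypothetical sketch of the return-type rule, not the PR's code.
// Dropping the fractional digits leaves p - s integer digits; one
// extra digit is reserved for a possible carry (e.g. 99.99 -> 100),
// and the result is clamped to Decimal128's maximum precision of 38.
fn ceil_decimal_type(p: u8, s: i8) -> (u8, i8) {
    if s > 0 {
        let new_p = ((p as i64) - (s as i64) + 1).clamp(1, 38) as u8;
        (new_p, 0)
    } else {
        (p, s) // already an integer-valued decimal; type unchanged
    }
}

fn main() {
    assert_eq!(ceil_decimal_type(10, 2), (9, 0));
    assert_eq!(ceil_decimal_type(38, 10), (29, 0));
    assert_eq!(ceil_decimal_type(5, 0), (5, 0));
}
```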

@@ -81,7 +81,8 @@ impl ScalarUDFImpl for SparkCeil {
}
}
dt if dt.is_integer() => Ok(dt.clone()),
Contributor:
Shouldn't all integer inputs produce a long/Int64 output?

Contributor Author:
Yes, that's my bad. I checked the Spark source code, and I think they always cast any non-decimal type to int64; I've set it up that way now: 5ad7ef6

"Returns the smallest integer not less than expr.",
arg1
));
export_functions!((ceil, "Returns the ceiling of expr.", arg1));
Contributor:
nice way to keep brief and concise

ScalarValue::Decimal128(v, p, s) if *s > 0 => {
let div = 10_i128.pow(*s as u32);
let div = 10_i128.pow_wrapping(*s as u32);
let new_p = ((*p as i64) - (*s as i64) + 1).clamp(1, 38) as u8;
Contributor:
There is a lot of duplication in the code between the scalar and array inputs. Can we create simple functions and inline them so we don't repeat ourselves?

Contributor Author:
Have tried to extract out a few methods between both paths: 9b23318
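A helper shared by the scalar and array paths might look roughly like the following (a standalone sketch, not the actual commit): a Decimal128(p, s) value is stored as an i128 scaled by 10^s, so ceil reduces to a ceiling division by 10^s.

```rust
// Hypothetical shared helper, not the PR's actual code.
// Real code should use checked/wrapping arithmetic (the PR uses
// pow_wrapping) to guard against overflow near i128's limits.
fn ceil_scaled_i128(v: i128, s: u32) -> i128 {
    let div = 10_i128.pow(s);
    if v >= 0 {
        (v + div - 1) / div // round toward +infinity
    } else {
        v / div // truncation toward zero already rounds negatives up
    }
}

fn main() {
    assert_eq!(ceil_scaled_i128(12_345, 2), 124); // 123.45 -> 124
    assert_eq!(ceil_scaled_i128(-12_345, 2), -123); // -123.45 -> -123
    assert_eq!(ceil_scaled_i128(12_300, 2), 123); // 123.00 -> 123
}
```

Both the ScalarValue match arm and the array kernel can then call this one function instead of duplicating the division logic.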

@coderfender (Contributor) left a comment:
Almost there. I've left some minor comments to improve quality and flag a potential datatype mismatch.

query I
SELECT ceil(5::int);
----
5
Contributor:
@shivbhatia10, we might need to test the output data type along with the value in the SLT files?
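One way to assert the output datatype in an .slt file is DataFusion's arrow_typeof function (a sketch; the exact expected type depends on the final implementation):

```
query T
SELECT arrow_typeof(ceil(5::int));
----
Int64
```

This catches regressions where the value is right but the type drifts from Spark's semantics.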

@shivbhatia10 (Contributor, Author) commented:
Hey @coderfender, sorry for the delay here. I've tried to respond to your comments; let me know if there's anything else that would be good to change.


Labels

spark sqllogictest SQL Logic Tests (.slt)


3 participants