[datafusion-spark] Add Spark-compatible isnan function #20595
shivbhatia10 wants to merge 9 commits into apache:main
Conversation
kosiew left a comment

@shivbhatia10
Thanks for working on this.
```rust
fn spark_isnan(args: &[ColumnarValue]) -> Result<ColumnarValue> {
    let [value] = take_function_args("isnan", args)?;

    match value {
```
The scalar and array paths each repeat the same "float32 vs float64, then map is_nan, then patch nulls" structure.
A small helper here would make the implementation easier to scan and keep future changes aligned across both types.
```rust
signature: Signature::one_of(
    vec![
        TypeSignature::Exact(vec![DataType::Float32]),
        TypeSignature::Exact(vec![DataType::Float64]),
```
Does this mean SELECT isnan(NULL) will be rejected during planning because the literal has Null type and there is no coercion path?
That is a semantic gap from Spark, where isnan(NULL) returns false.
```
# Scalar input: float64
query BBBBB
SELECT isnan(1.0::DOUBLE), isnan('NaN'::DOUBLE), isnan('inf'::DOUBLE), isnan(0.0::DOUBLE), isnan(-1.0::DOUBLE);
```
The happy-path coverage is solid.
Could you also add one Spark-specific planner/error test, such as `SELECT isnan(1)`, to document that this Spark variant intentionally rejects non-floating numerics?
That would make the behavioral difference from the built-in DataFusion isnan obvious to future readers and guard against accidental widening of the signature later.
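Such a negative test could look roughly like the following sqllogictest fragment (the exact error-message regex is an assumption and would need to match the planner's actual output):

```
# Spark's isnan only accepts floating-point inputs; integers are rejected at planning time
query error
SELECT isnan(1);
```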
Which issue does this PR close?
Part of #15914 and apache/datafusion-comet#1704
Rationale for this change
Continues the effort to add Spark-compatible expressions to datafusion-spark.
What changes are included in this PR?
Add new isnan function.
Are these changes tested?
Yes, unit tests.
Are there any user-facing changes?
No.