Skip to content

Commit f7d22da

Browse files
committed
fix: mark expressions with known correctness issues as incompatible
Review all open correctness issues and mark affected expressions as Incompatible so they fall back to Spark by default. Update the compatibility guide with detailed documentation of each incompatibility and links to tracking issues. Expressions marked Incompatible: - ArrayContains (#3346), GetArrayItem (#3330, #3332), ArrayRemove (#3173) - Hour, Minute, Second for TimestampNTZ inputs (#3180) - TruncTimestamp for non-UTC timezones (#2649) - Ceil, Floor for Decimal inputs (#1729) - Tan (#1897), Corr (#2646), StructsToJson (#3016)
1 parent d9ed85f commit f7d22da

8 files changed

Lines changed: 181 additions & 15 deletions

File tree

docs/source/user-guide/latest/compatibility.md

Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -58,6 +58,51 @@ Expressions that are not 100% Spark-compatible will fall back to Spark by defaul
5858
`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See
5959
the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting.
6060

61+
### Array Expressions
62+
63+
- **ArrayContains**: Returns null instead of false for empty arrays with literal values.
64+
[#3346](https://github.com/apache/datafusion-comet/issues/3346)
65+
- **ArrayRemove**: Returns null when the element to remove is null, instead of removing null elements from the array.
66+
[#3173](https://github.com/apache/datafusion-comet/issues/3173)
67+
- **GetArrayItem**: Known correctness issues with index handling, including off-by-one errors and incorrect results
68+
with dynamic (non-literal) index values.
69+
[#3330](https://github.com/apache/datafusion-comet/issues/3330),
70+
[#3332](https://github.com/apache/datafusion-comet/issues/3332)
71+
- **ArraysOverlap**: Inconsistent behavior when arrays contain NULL values.
72+
[#3645](https://github.com/apache/datafusion-comet/issues/3645),
73+
[#2036](https://github.com/apache/datafusion-comet/issues/2036)
74+
- **ArrayUnion**: Sorts input arrays before performing the union, while Spark preserves the order of the first array
75+
and appends unique elements from the second.
76+
[#3644](https://github.com/apache/datafusion-comet/issues/3644)
77+
78+
### Date/Time Expressions
79+
80+
- **Hour, Minute, Second**: Incorrectly apply timezone conversion to TimestampNTZ inputs. TimestampNTZ stores local
81+
time without timezone, so no conversion should be applied. These expressions work correctly with Timestamp inputs.
82+
[#3180](https://github.com/apache/datafusion-comet/issues/3180)
83+
- **TruncTimestamp (date_trunc)**: Produces incorrect results when used with non-UTC timezones. Compatible when
84+
timezone is UTC.
85+
[#2649](https://github.com/apache/datafusion-comet/issues/2649)
86+
87+
### Math Expressions
88+
89+
- **Ceil, Floor**: Incorrect results for Decimal type inputs.
90+
[#1729](https://github.com/apache/datafusion-comet/issues/1729)
91+
- **Tan**: `tan(-0.0)` produces `0.0` instead of `-0.0`.
92+
[#1897](https://github.com/apache/datafusion-comet/issues/1897)
93+
94+
### Aggregate Expressions
95+
96+
- **Corr**: Returns null instead of NaN in some edge cases.
97+
[#2646](https://github.com/apache/datafusion-comet/issues/2646)
98+
- **First, Last**: These functions are not deterministic. When `ignoreNulls` is set, results may not match Spark.
99+
[#1630](https://github.com/apache/datafusion-comet/issues/1630)
100+
101+
### Struct Expressions
102+
103+
- **StructsToJson (to_json)**: Does not support `+Infinity` and `-Infinity` for numeric types (float, double).
104+
[#3016](https://github.com/apache/datafusion-comet/issues/3016)
105+
61106
## Regular Expressions
62107

63108
Comet uses the Rust regexp crate for evaluating regular expressions, and this has different behavior from Java's

docs/source/user-guide/latest/expressions.md

Lines changed: 12 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -101,12 +101,12 @@ Expressions that are not Spark-compatible will fall back to Spark by default and
101101
| DatePart | `date_part(field, source)` | Yes | Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` |
102102
| Extract | `extract(field FROM source)` | Yes | Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` |
103103
| FromUnixTime | `from_unixtime` | No | Does not support format, supports only -8334601211038 <= sec <= 8210266876799 |
104-
| Hour | `hour` | Yes | |
104+
| Hour | `hour` | No | Incorrectly applies timezone conversion to TimestampNTZ inputs ([#3180](https://github.com/apache/datafusion-comet/issues/3180)) |
105105
| LastDay | `last_day` | Yes | |
106-
| Minute | `minute` | Yes | |
107-
| Second | `second` | Yes | |
106+
| Minute | `minute` | No | Incorrectly applies timezone conversion to TimestampNTZ inputs ([#3180](https://github.com/apache/datafusion-comet/issues/3180)) |
107+
| Second | `second` | No | Incorrectly applies timezone conversion to TimestampNTZ inputs ([#3180](https://github.com/apache/datafusion-comet/issues/3180)) |
108108
| TruncDate | `trunc` | Yes | |
109-
| TruncTimestamp | `date_trunc` | Yes | |
109+
| TruncTimestamp | `date_trunc` | No | Incorrect results in non-UTC timezones ([#2649](https://github.com/apache/datafusion-comet/issues/2649)) |
110110
| UnixDate | `unix_date` | Yes | |
111111
| UnixTimestamp | `unix_timestamp` | Yes | |
112112
| Year | `year` | Yes | |
@@ -129,14 +129,14 @@ Expressions that are not Spark-compatible will fall back to Spark by default and
129129
| Atan | `atan` | Yes | |
130130
| Atan2 | `atan2` | Yes | |
131131
| BRound | `bround` | Yes | |
132-
| Ceil | `ceil` | Yes | |
132+
| Ceil | `ceil` | No | Incorrect results for Decimal type inputs ([#1729](https://github.com/apache/datafusion-comet/issues/1729)) |
133133
| Cos | `cos` | Yes | |
134134
| Cosh | `cosh` | Yes | |
135135
| Cot | `cot` | Yes | |
136136
| Divide | `/` | Yes | |
137137
| Exp | `exp` | Yes | |
138138
| Expm1 | `expm1` | Yes | |
139-
| Floor | `floor` | Yes | |
139+
| Floor | `floor` | No | Incorrect results for Decimal type inputs ([#1729](https://github.com/apache/datafusion-comet/issues/1729)) |
140140
| Hex | `hex` | Yes | |
141141
| IntegralDivide | `div` | Yes | |
142142
| IsNaN | `isnan` | Yes | |
@@ -154,7 +154,7 @@ Expressions that are not Spark-compatible will fall back to Spark by default and
154154
| Sinh | `sinh` | Yes | |
155155
| Sqrt | `sqrt` | Yes | |
156156
| Subtract | `-` | Yes | |
157-
| Tan | `tan` | Yes | |
157+
| Tan | `tan` | No | tan(-0.0) produces incorrect result ([#1897](https://github.com/apache/datafusion-comet/issues/1897)) |
158158
| Tanh | `tanh` | Yes | |
159159
| TryAdd | `try_add` | Yes | Only integer inputs are supported |
160160
| TryDivide | `try_div` | Yes | Only integer inputs are supported |
@@ -196,7 +196,7 @@ Expressions that are not Spark-compatible will fall back to Spark by default and
196196
| BitXorAgg | | Yes | |
197197
| BoolAnd | `bool_and` | Yes | |
198198
| BoolOr | `bool_or` | Yes | |
199-
| Corr | | Yes | |
199+
| Corr | | No | Returns null instead of NaN in some edge cases ([#2646](https://github.com/apache/datafusion-comet/issues/2646)) |
200200
| Count | | Yes | |
201201
| CovPopulation | | Yes | |
202202
| CovSample | | Yes | |
@@ -233,7 +233,7 @@ Comet supports using the following aggregate functions within window contexts wi
233233
| -------------- | ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
234234
| ArrayAppend | No | |
235235
| ArrayCompact | No | |
236-
| ArrayContains | Yes | |
236+
| ArrayContains | No | Returns null instead of false for empty arrays with literal values ([#3346](https://github.com/apache/datafusion-comet/issues/3346)) |
237237
| ArrayDistinct | No | Behaves differently than spark. Comet first sorts then removes duplicates while Spark preserves the original order. |
238238
| ArrayExcept | No | |
239239
| ArrayFilter | Yes | Only supports case where function is `IsNotNull` |
@@ -242,14 +242,14 @@ Comet supports using the following aggregate functions within window contexts wi
242242
| ArrayJoin | No | |
243243
| ArrayMax | Yes | |
244244
| ArrayMin | Yes | |
245-
| ArrayRemove | Yes | |
245+
| ArrayRemove | No | Returns null when element is null instead of removing null elements ([#3173](https://github.com/apache/datafusion-comet/issues/3173)) |
246246
| ArrayRepeat | No | |
247247
| ArrayUnion | No | Behaves differently than spark. Comet sorts the input arrays before performing the union, while Spark preserves the order of the first array and appends unique elements from the second. |
248248
| ArraysOverlap | No | |
249249
| CreateArray | Yes | |
250250
| ElementAt | Yes | Input must be an array. Map inputs are not supported. |
251251
| Flatten | Yes | |
252-
| GetArrayItem | Yes | |
252+
| GetArrayItem | No | Known correctness issues with index handling ([#3330](https://github.com/apache/datafusion-comet/issues/3330), [#3332](https://github.com/apache/datafusion-comet/issues/3332)) |
253253

254254
## Map Expressions
255255

@@ -269,7 +269,7 @@ Comet supports using the following aggregate functions within window contexts wi
269269
| GetArrayStructFields | Yes | |
270270
| GetStructField | Yes | |
271271
| JsonToStructs | No | Partial support. Requires explicit schema. |
272-
| StructsToJson | Yes | |
272+
| StructsToJson | No | Does not support Infinity/-Infinity for numeric types ([#3016](https://github.com/apache/datafusion-comet/issues/3016)) |
273273

274274
## Conversion Expressions
275275

spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -116,7 +116,7 @@ object QueryPlanSerde extends Logging with CometExprShim {
116116
classOf[Sinh] -> CometScalarFunction("sinh"),
117117
classOf[Sqrt] -> CometScalarFunction("sqrt"),
118118
classOf[Subtract] -> CometSubtract,
119-
classOf[Tan] -> CometScalarFunction("tan"),
119+
classOf[Tan] -> CometTan,
120120
classOf[Tanh] -> CometScalarFunction("tanh"),
121121
classOf[Cot] -> CometScalarFunction("cot"),
122122
classOf[UnaryMinus] -> CometUnaryMinus,

spark/src/main/scala/org/apache/comet/serde/aggregates.scala

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -584,6 +584,13 @@ object CometStddevPop extends CometAggregateExpressionSerde[StddevPop] with Come
584584
}
585585

586586
object CometCorr extends CometAggregateExpressionSerde[Corr] {
587+
588+
override def getSupportLevel(expr: Corr): SupportLevel =
589+
Incompatible(
590+
Some(
591+
"Returns null instead of NaN in some edge cases" +
592+
" (https://github.com/apache/datafusion-comet/issues/2646)"))
593+
587594
override def convert(
588595
aggExpr: AggregateExpression,
589596
corr: Corr,

spark/src/main/scala/org/apache/comet/serde/arrays.scala

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,12 @@ object CometArrayRemove
3535
with CometExprShim
3636
with ArraysBase {
3737

38+
override def getSupportLevel(expr: ArrayRemove): SupportLevel =
39+
Incompatible(
40+
Some(
41+
"Returns null when element is null instead of removing null elements" +
42+
" (https://github.com/apache/datafusion-comet/issues/3173)"))
43+
3844
override def convert(
3945
expr: ArrayRemove,
4046
inputs: Seq[Attribute],
@@ -131,6 +137,13 @@ object CometArrayAppend extends CometExpressionSerde[ArrayAppend] {
131137
}
132138

133139
object CometArrayContains extends CometExpressionSerde[ArrayContains] {
140+
141+
override def getSupportLevel(expr: ArrayContains): SupportLevel =
142+
Incompatible(
143+
Some(
144+
"Returns null instead of false for empty arrays with literal values" +
145+
" (https://github.com/apache/datafusion-comet/issues/3346)"))
146+
134147
override def convert(
135148
expr: ArrayContains,
136149
inputs: Seq[Attribute],
@@ -472,6 +485,14 @@ object CometCreateArray extends CometExpressionSerde[CreateArray] {
472485
}
473486

474487
object CometGetArrayItem extends CometExpressionSerde[GetArrayItem] {
488+
489+
override def getSupportLevel(expr: GetArrayItem): SupportLevel =
490+
Incompatible(
491+
Some(
492+
"Known correctness issues with index handling" +
493+
" (https://github.com/apache/datafusion-comet/issues/3330," +
494+
" https://github.com/apache/datafusion-comet/issues/3332)"))
495+
475496
override def convert(
476497
expr: GetArrayItem,
477498
inputs: Seq[Attribute],

0 commit comments

Comments
 (0)