Commit c1b9f70

fix: implement Spark-compatible null handling for arrays_overlap
Replace DataFusion's array_has_any (which treats NULL == NULL) with a custom implementation that follows Spark's three-valued logic:

- true when arrays share a common non-null element
- null when no common non-null elements but either array has nulls
- false when no common elements and no nulls

Closes #3645
1 parent d9ed85f commit c1b9f70
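
The three rules in the commit message can be modeled in a few lines. The sketch below is an illustration only, not Comet's native implementation: the function name, the use of Python lists for arrays, and `None` standing in for SQL `NULL` are all assumptions. It covers only the three stated rules; cases the message does not mention (empty arrays, a `NULL` array argument) are out of scope.

```python
from typing import Optional, Sequence


def arrays_overlap(left: Sequence, right: Sequence) -> Optional[bool]:
    """Model the three rules from the commit message (None stands in for NULL).

    Hypothetical helper for illustration; elements must be hashable.
    """
    left_values = {x for x in left if x is not None}

    # Rule 1: a shared non-null element means true, regardless of nulls.
    if any(y is not None and y in left_values for y in right):
        return True

    # Rule 2: no shared non-null element, but a null on either side
    # makes the answer unknown.
    if any(x is None for x in left) or any(y is None for y in right):
        return None

    # Rule 3: no shared element and no nulls anywhere.
    return False


print(arrays_overlap([1, 2], [2, 3]))        # True
print(arrays_overlap([1, None], [None, 2]))  # None
print(arrays_overlap([1, 2], [3, 4]))        # False
```

The second call mirrors the `arrays_overlap(array(1, null), array(null, 2))` example from the documentation change: the two nulls are not treated as equal, so the result is unknown rather than true.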

6 files changed

Lines changed: 432 additions & 3 deletions

docs/source/user-guide/latest/compatibility.md

Lines changed: 12 additions & 0 deletions
@@ -58,6 +58,18 @@ Expressions that are not 100% Spark-compatible will fall back to Spark by defaul
 `spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See
 the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting.
 
+## Array Functions
+
+### ArraysOverlap
+
+Comet's `arrays_overlap` implementation follows Spark's null handling semantics: when no common non-null elements
+exist but either array contains null elements, the result is `null` rather than `false`. This matches Spark's
+three-valued logic where `arrays_overlap(array(1, null), array(null, 2))` returns `null`.
+
+Comet currently uses `ScalarValue`-based comparison for complex element types (structs, nested arrays), which may
+have subtle differences from Spark's equality semantics for these types. Primitive and string element types use
+native comparisons that match Spark.
+
 ## Regular Expressions
 
 Comet uses the Rust regexp crate for evaluating regular expressions, and this has different behavior from Java's

docs/source/user-guide/latest/expressions.md

Lines changed: 1 addition & 1 deletion
@@ -245,7 +245,7 @@ Comet supports using the following aggregate functions within window contexts wi
 | ArrayRemove | Yes | |
 | ArrayRepeat | No | |
 | ArrayUnion | No | Behaves differently than spark. Comet sorts the input arrays before performing the union, while Spark preserves the order of the first array and appends unique elements from the second. |
-| ArraysOverlap | No | |
+| ArraysOverlap | No | See [ArraysOverlap](compatibility.md#arraysoverlap) in the compatibility guide. |
 | CreateArray | Yes | |
 | ElementAt | Yes | Input must be an array. Map inputs are not supported. |
 | Flatten | Yes | |
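
The compatibility guide context in the first hunk gives the opt-in setting as `spark.comet.expression.EXPRNAME.allowIncompatible=true`. Assuming the `No` in the table row means `ArraysOverlap` is not marked fully Spark-compatible and therefore falls back to Spark by default, the key for this expression could be built as sketched below. The helper name is hypothetical, and the resulting dictionary is only a plain mapping that a caller would pass on to their own Spark session configuration.

```python
def allow_incompatible_conf(expr_name: str) -> dict:
    """Build the opt-in setting for one Spark expression class name.

    Hypothetical helper; the key pattern is taken from the compatibility guide text.
    """
    return {f"spark.comet.expression.{expr_name}.allowIncompatible": "true"}


conf = allow_incompatible_conf("ArraysOverlap")
for key, value in conf.items():
    print(f"{key}={value}")
```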
