The arrays_overlap function (returns whether two arrays share any common elements) is a common primitive across engines and would be a good addition to functions_list.yaml or functions_set.yaml (I am not particular about which).
Cross-Engine Function Names
Cross-Engine Semantic Comparison
The following table was generated by running equivalent queries across all five engines via Docker.
| Test Case |
PostgreSQL |
DuckDB |
Spark |
Trino |
DataFusion |
[1,2,3] && [3,4,5] |
true |
true |
true |
true |
true |
[1,2,3] && [4,5,6] |
false |
false |
false |
false |
false |
[1,2,3] && [] |
false |
false |
false |
false |
false |
[] && [] |
false |
false |
false |
false |
false |
[1,NULL,3] && [3,4] |
true |
true |
true |
true |
true |
⚠️ [1,NULL,3] && [4,5] |
false |
false |
NULL |
NULL |
false |
⚠️ [1,NULL] && [NULL,4] |
false |
false |
NULL |
NULL |
true |
NULL && [1,2] |
NULL |
NULL |
NULL |
NULL |
NULL |
[1,1,2] && [1,3] |
true |
true |
true |
true |
true |
['a','b','c'] && ['c','d'] |
true |
true |
true |
true |
true |
All five engines agree on the straightforward cases. The divergence is entirely about how NULL elements are handled when there is no definitive non-null overlap (cases 6 & 7).
Proposal
Lets add an arrays_overlap function to functions_{list/set}.yaml. We can handle the nullability stuff with a function option.
The
arrays_overlapfunction (returns whether two arrays share any common elements) is a common primitive across engines and would be a good addition tofunctions_list.yamlorfunctions_set.yaml(I am not particular about which).Cross-Engine Function Names
&&operatorlist_has_anyarrays_overlaparrays_overlaparray_has_anyCross-Engine Semantic Comparison
The following table was generated by running equivalent queries across all five engines via Docker.
[1,2,3] && [3,4,5][1,2,3] && [4,5,6][1,2,3] && [][] && [][1,NULL,3] && [3,4][1,NULL,3] && [4,5][1,NULL] && [NULL,4]NULL && [1,2][1,1,2] && [1,3]['a','b','c'] && ['c','d']All five engines agree on the straightforward cases. The divergence is entirely about how NULL elements are handled when there is no definitive non-null overlap (cases 6 & 7).
Proposal
Lets add an
arrays_overlapfunction tofunctions_{list/set}.yaml. We can handle the nullability stuff with a function option.