37 changes: 37 additions & 0 deletions extensions/functions_list.yaml
@@ -130,3 +130,40 @@ scalar_functions:
value: func<any1 -> boolean?>
nullability: DECLARED_OUTPUT
return: boolean?

- name: "has_overlap"
description: >-
Determines whether two lists share any common elements.

Returns `true` if the two lists share at least one non-null element. If they
share none, returns `false` when neither list contains a null element;
otherwise the result is controlled by the `null_handling` option.

If either input list is `NULL`, returns `NULL`.

If either input list is empty, returns `false`.
impls:
- args:
- name: left
value: list<any1>
- name: right
value: list<any1>
options:
null_handling:
Contributor

My 2 cents: make this an enum, for clarity and interoperability. One might argue that this option should follow the system default, but:

a. the alternatives are rather well understood,
b. it may not be consistent (i.e., a system may not implement null handling consistently),
c. I believe it works better with the dialect, and
d. null handling is essential to the semantics of the overlap (unfortunately, nulls are not an exception in real data).

Member

Agree with this. Null handling is essential here, and if we allow producers to leave this unset, or consumers to ignore it, we're going to wind up with some very interesting execution differences across systems.

description: >-
Controls how null elements affect the result when no non-null
elements overlap.

- THREE_VALUED: Returns NULL when either list contains a null
element and no non-null overlap is found, following SQL
three-valued logic (comparing any value with NULL is unknown).

- IGNORE_NULLS: Skips null elements entirely so the result is
always true or false.

- NULL_EQUALS_NULL: Treats null elements as equal to each
other, so if both lists contain a null, that counts as an
overlap.
values: [THREE_VALUED, IGNORE_NULLS, NULL_EQUALS_NULL]
nullability: DECLARED_OUTPUT
return: boolean?
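To make the three `null_handling` modes concrete, here is a minimal Python sketch of the semantics described above, written to agree with the cases in `tests/cases/list/has_overlap.test`. It is not part of this PR or of any Substrait library: the function name, the string-valued `null_handling` parameter, and the use of `None` to model NULL are assumptions for illustration. It also assumes the NULL-list check takes precedence over the empty-list check, which the description does not pin down.

```python
from typing import Hashable, Optional, Sequence

MODES = ("THREE_VALUED", "IGNORE_NULLS", "NULL_EQUALS_NULL")


def has_overlap(
    left: Optional[Sequence[Optional[Hashable]]],
    right: Optional[Sequence[Optional[Hashable]]],
    null_handling: str = "THREE_VALUED",
) -> Optional[bool]:
    """Hypothetical reference sketch; None stands in for SQL NULL."""
    if null_handling not in MODES:
        raise ValueError(f"unknown null_handling: {null_handling}")
    # Assumption: a NULL list propagates to a NULL result before any other rule.
    if left is None or right is None:
        return None
    # An empty list can never overlap with anything.
    if not left or not right:
        return False
    # A shared non-null element is an overlap under every mode.
    left_values = {x for x in left if x is not None}
    if any(y is not None and y in left_values for y in right):
        return True
    left_has_null = any(x is None for x in left)
    right_has_null = any(y is None for y in right)
    if null_handling == "THREE_VALUED":
        # Comparing any element with NULL is unknown, so the answer is unknown.
        return None if (left_has_null or right_has_null) else False
    if null_handling == "NULL_EQUALS_NULL":
        # NULL matches NULL, so nulls on both sides count as an overlap.
        return left_has_null and right_has_null
    # IGNORE_NULLS: nulls are skipped, so no non-null overlap means False.
    return False
```

Under this sketch, `has_overlap([1, None, 3], [4, 5])` returns `None` with `THREE_VALUED` but `False` with the other two modes, matching the `null_elements_no_non_null_overlap` cases.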
1 change: 1 addition & 0 deletions extensions/functions_set.yaml
@@ -26,3 +26,4 @@ scalar_functions:
values: [ NAN_IS_NAN, NAN_IS_NOT_NAN ]
nullability: DECLARED_OUTPUT
return: i64?

58 changes: 58 additions & 0 deletions tests/cases/list/has_overlap.test
@@ -0,0 +1,58 @@
### SUBSTRAIT_SCALAR_TEST: v1.0
### SUBSTRAIT_INCLUDE: '/extensions/functions_list.yaml'

# basic_overlap: Non-null overlap exists
has_overlap([1, 2, 3]::list<i32>, [3, 4, 5]::list<i32>) [null_handling:THREE_VALUED] = true::bool
has_overlap([1, 2, 3]::list<i32>, [3, 4, 5]::list<i32>) [null_handling:IGNORE_NULLS] = true::bool
has_overlap([1, 2, 3]::list<i32>, [3, 4, 5]::list<i32>) [null_handling:NULL_EQUALS_NULL] = true::bool

# basic_no_overlap: No overlap exists
has_overlap([1, 2, 3]::list<i32>, [4, 5, 6]::list<i32>) [null_handling:THREE_VALUED] = false::bool
has_overlap([1, 2, 3]::list<i32>, [4, 5, 6]::list<i32>) [null_handling:IGNORE_NULLS] = false::bool
has_overlap([1, 2, 3]::list<i32>, [4, 5, 6]::list<i32>) [null_handling:NULL_EQUALS_NULL] = false::bool

# duplicates: Duplicate elements do not change the result
has_overlap([1, 1, 2]::list<i32>, [1, 3]::list<i32>) [null_handling:THREE_VALUED] = true::bool
has_overlap([1, 1, 2]::list<i32>, [1, 3]::list<i32>) [null_handling:IGNORE_NULLS] = true::bool
has_overlap([1, 1, 2]::list<i32>, [1, 3]::list<i32>) [null_handling:NULL_EQUALS_NULL] = true::bool

# empty_right: One list is empty
has_overlap([1, 2, 3]::list<i32>, []::list<i32>) [null_handling:THREE_VALUED] = false::bool
has_overlap([1, 2, 3]::list<i32>, []::list<i32>) [null_handling:IGNORE_NULLS] = false::bool
has_overlap([1, 2, 3]::list<i32>, []::list<i32>) [null_handling:NULL_EQUALS_NULL] = false::bool

# both_empty: Both lists are empty
has_overlap([]::list<i32>, []::list<i32>) [null_handling:THREE_VALUED] = false::bool
has_overlap([]::list<i32>, []::list<i32>) [null_handling:IGNORE_NULLS] = false::bool
has_overlap([]::list<i32>, []::list<i32>) [null_handling:NULL_EQUALS_NULL] = false::bool

# null_left_list: Null left list returns null
has_overlap(null::list<i32>, [1, 2, 3]::list<i32>) [null_handling:THREE_VALUED] = null::bool
has_overlap(null::list<i32>, [1, 2, 3]::list<i32>) [null_handling:IGNORE_NULLS] = null::bool
has_overlap(null::list<i32>, [1, 2, 3]::list<i32>) [null_handling:NULL_EQUALS_NULL] = null::bool

# null_right_list: Null right list returns null
has_overlap([1, 2, 3]::list<i32>, null::list<i32>) [null_handling:THREE_VALUED] = null::bool
has_overlap([1, 2, 3]::list<i32>, null::list<i32>) [null_handling:IGNORE_NULLS] = null::bool
has_overlap([1, 2, 3]::list<i32>, null::list<i32>) [null_handling:NULL_EQUALS_NULL] = null::bool

# both_null_lists: Both lists null returns null
has_overlap(null::list<i32>, null::list<i32>) [null_handling:THREE_VALUED] = null::bool
has_overlap(null::list<i32>, null::list<i32>) [null_handling:IGNORE_NULLS] = null::bool
has_overlap(null::list<i32>, null::list<i32>) [null_handling:NULL_EQUALS_NULL] = null::bool

# null_elements_with_non_null_overlap: Null elements present with a non-null overlap
has_overlap([1, null, 3]::list<i32?>, [3, 4]::list<i32?>) [null_handling:THREE_VALUED] = true::bool
has_overlap([1, null, 3]::list<i32?>, [3, 4]::list<i32?>) [null_handling:IGNORE_NULLS] = true::bool
has_overlap([1, null, 3]::list<i32?>, [3, 4]::list<i32?>) [null_handling:NULL_EQUALS_NULL] = true::bool

# null_elements_no_non_null_overlap: Null elements in one list, no non-null overlap
has_overlap([1, null, 3]::list<i32?>, [4, 5]::list<i32?>) [null_handling:THREE_VALUED] = null::bool
has_overlap([1, null, 3]::list<i32?>, [4, 5]::list<i32?>) [null_handling:IGNORE_NULLS] = false::bool
has_overlap([1, null, 3]::list<i32?>, [4, 5]::list<i32?>) [null_handling:NULL_EQUALS_NULL] = false::bool

# null_elements_in_both: Null elements in both lists, no non-null overlap
has_overlap([1, null]::list<i32?>, [null, 4]::list<i32?>) [null_handling:THREE_VALUED] = null::bool
has_overlap([1, null]::list<i32?>, [null, 4]::list<i32?>) [null_handling:IGNORE_NULLS] = false::bool
has_overlap([1, null]::list<i32?>, [null, 4]::list<i32?>) [null_handling:NULL_EQUALS_NULL] = true::bool
