
fix: Fix Spark slice function Null type to GenericListArray casting issue #20469

Open
erenavsarogullari wants to merge 4 commits into apache:main from erenavsarogullari:df_spark_slice_func_fix

Conversation

@erenavsarogullari
Member

Which issue does this PR close?

Rationale for this change

Currently, Spark's slice function accepts Null arrays and returns NULL for such queries. The DataFusion-Spark slice function also needs to return NULL when a Null array is passed.
Spark Behavior (tested with latest Spark master):

> SELECT slice(NULL, 1, 2);
+-----------------+
|slice(NULL, 1, 2)|
+-----------------+
|             null|
+-----------------+

DF Behaviour:
Current:

query error
SELECT slice(NULL, 1, 2);
----
DataFusion error: Internal error: could not cast array of type Null to arrow_array::array::list_array::GenericListArray<i32>.
This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues

New:

query ?
SELECT slice(NULL, 1, 2);
----
NULL
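A hedged sketch of the guard that produces this behaviour, assuming a check on the input type before the list downcast. `DataType`, `ColumnarValue`, and `spark_slice` below are simplified stand-ins for the arrow/DataFusion types and the actual function, not the real implementation:

```rust
// Hedged sketch (not the actual DataFusion code): one way a slice-style
// kernel might guard against a Null-typed input before attempting the
// GenericListArray downcast that currently fails with an internal error.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    List(Box<DataType>),
}

#[derive(Debug, PartialEq)]
enum ColumnarValue {
    NullScalar,
    List(Vec<Option<i64>>),
}

fn spark_slice(
    input_type: &DataType,
    value: ColumnarValue,
    start: i64,
    length: i64,
) -> Result<ColumnarValue, String> {
    // Guard: a Null-typed input cannot be downcast to a list array,
    // so return NULL directly instead of erroring out.
    if *input_type == DataType::Null {
        return Ok(ColumnarValue::NullScalar);
    }
    match value {
        ColumnarValue::List(items) => {
            // Spark's slice is 1-indexed
            let start = (start - 1).max(0) as usize;
            let sliced = items.into_iter().skip(start).take(length as usize).collect();
            Ok(ColumnarValue::List(sliced))
        }
        _ => Err("could not cast array to list".to_string()),
    }
}

fn main() {
    // slice(NULL, 1, 2) -> NULL instead of an internal cast error
    let out = spark_slice(&DataType::Null, ColumnarValue::NullScalar, 1, 2).unwrap();
    assert_eq!(out, ColumnarValue::NullScalar);

    // slice([10, 20, 30], 1, 2) -> [10, 20]
    let input = ColumnarValue::List(vec![Some(10), Some(20), Some(30)]);
    let ty = DataType::List(Box::new(DataType::Null));
    assert_eq!(
        spark_slice(&ty, input, 1, 2).unwrap(),
        ColumnarValue::List(vec![Some(10), Some(20)])
    );
}
```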

What changes are included in this PR?

Explained under the first section.

Are these changes tested?

Added new unit test cases to both slice.rs and slice.slt.

Are there any user-facing changes?

Yes. Currently, the slice function returns an error message for Null array inputs; the expected behavior is to return NULL, so end users will get the expected result instead of an error message.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) spark labels Feb 21, 2026
@Jefffrey
Contributor

Does Spark return null (void) or an array of null (void)?

I tested on PySpark 4.1.1

>>> spark.sql("select slice(NULL, 1, 2) as a").printSchema()
root
 |-- a: array (nullable = true)
 |    |-- element: void (containsNull = true)

Seems to suggest the latter?

@erenavsarogullari
Member Author

Spark returns the array as nullable = true / false, and there are two representations depending on the input array, so it returns either NULL or [NULL]:
Case 1: the input array is null (array (nullable = true)):

+-----------------+
|slice(NULL, 1, 2)|
+-----------------+
|             NULL|
+-----------------+

root
 |-- slice(NULL, 1, 2): array (nullable = true)
 |    |-- element: void (containsNull = true)

Case 2: the input array has a null element (array (nullable = false)):

+------------------------+
|slice(array(NULL), 1, 2)|
+------------------------+
|                  [NULL]|
+------------------------+

root
 |-- slice(array(NULL), 1, 2): array (nullable = false)
 |    |-- element: void (containsNull = true)

@Jefffrey
Contributor

This seems to suggest return type should be list of nulls instead of just null

@erenavsarogullari erenavsarogullari force-pushed the df_spark_slice_func_fix branch 2 times, most recently from dcaa80c to 2d066d7 Compare February 28, 2026 18:58
@erenavsarogullari
Member Author

This seems to suggest return type should be list of nulls instead of just null

Yes, the latest fix aims to address this by returning a NullArray instead of a scalar Null value.

@erenavsarogullari erenavsarogullari force-pushed the df_spark_slice_func_fix branch 2 times, most recently from f8cb205 to f5832c4 Compare February 28, 2026 19:41
@Jefffrey
Contributor

Jefffrey commented Mar 2, 2026

This seems to suggest return type should be list of nulls instead of just null

Yes, latest fix aims to address this by returning NullArray instead of Scalar Null value.

Sorry, I don't see how this relates? I mean, it seems to suggest that a DataType::Null input has a DataType::List(Null) output, according to my PySpark test. But maybe the test was incorrect, since perhaps Spark introduces some casts? Could we confirm the behaviour if we want to align with Spark here?

@erenavsarogullari erenavsarogullari force-pushed the df_spark_slice_func_fix branch from f5832c4 to 2d6254e Compare March 24, 2026 02:45
@erenavsarogullari erenavsarogullari force-pushed the df_spark_slice_func_fix branch from 0fc58c6 to 901642c Compare April 5, 2026 22:20
@erenavsarogullari
Member Author

erenavsarogullari commented Apr 5, 2026

@Jefffrey Sorry for the delay and thanks again for the review. I have just submitted the latest commit which returns DataType::List(Null) for DataType::Null input and added test cases verifying the output type.
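The return-type mapping described above can be sketched as follows. This is a hedged, simplified stand-in, assuming only that a Null input maps to List(Null), matching Spark's `array<void>` schema; `DataType` and `slice_return_type` below are illustrative, not the real arrow/DataFusion definitions:

```rust
// Hedged sketch (not the actual DataFusion code): how return-type
// resolution might map a Null input to List(Null), mirroring the
// Spark schema `array<void>` shown earlier in this thread.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int32,
    List(Box<DataType>),
}

fn slice_return_type(input: &DataType) -> Result<DataType, String> {
    match input {
        // slice(NULL, ...) is typed array<void> in Spark, i.e. List(Null)
        DataType::Null => Ok(DataType::List(Box::new(DataType::Null))),
        // slicing a list keeps the element type unchanged
        DataType::List(inner) => Ok(DataType::List(inner.clone())),
        other => Err(format!("slice expects a list input, got {:?}", other)),
    }
}

fn main() {
    assert_eq!(
        slice_return_type(&DataType::Null).unwrap(),
        DataType::List(Box::new(DataType::Null))
    );
    assert_eq!(
        slice_return_type(&DataType::List(Box::new(DataType::Int32))).unwrap(),
        DataType::List(Box::new(DataType::Int32))
    );
    assert!(slice_return_type(&DataType::Int32).is_err());
}
```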



Development

Successfully merging this pull request may close these issues.

Fix Spark slice function Null type to GenericListArray casting issue

3 participants