
fix: decline CreateArray with struct-nullability-divergent children #4533

Open

schenksj wants to merge 1 commit into apache:main from schenksj:fix/create-array-mismatched-child


@schenksj

Which issue does this PR close?

Closes #4528.

Rationale for this change

DataFusion's make_array asserts strict element-type equality in MutableArrayData::with_capacities and panics on a mismatch. Spark's CreateArray coerces element types with sameType, which ignores nullability, so children that share a surface type but differ only in a nested struct field's nullability get no unifying cast. For example, array(struct(a not null), struct(a nullable)) reaches native execution with two different struct types and panics:

native panic: assertion `left == right` failed: Arrays with inconsistent types passed to MutableArrayData
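To make the mismatch concrete, here is a minimal Rust sketch. The `DataType` and `Field` types are a hypothetical, simplified stand-in for Spark's type system (not Spark's or arrow-rs's real API), and `equals_ignore_nullability` only approximates the nullability-ignoring comparison behind Spark's `sameType`. It shows that two struct types differing only in a field's nullability look identical to coercion, so no cast is inserted, yet fail the strict equality that MutableArrayData asserts.

```rust
// Hypothetical, simplified model of Spark data types (NOT Spark's or arrow-rs's API).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int,
    Str,
    Array { element: Box<DataType>, contains_null: bool },
    Struct(Vec<Field>),
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Rough stand-in for the nullability-insensitive comparison behind Spark's
/// `sameType`: same shape and field names, every nullability flag ignored.
fn equals_ignore_nullability(a: &DataType, b: &DataType) -> bool {
    match (a, b) {
        (DataType::Array { element: ea, .. }, DataType::Array { element: eb, .. }) => {
            equals_ignore_nullability(ea, eb)
        }
        (DataType::Struct(fa), DataType::Struct(fb)) => {
            fa.len() == fb.len()
                && fa.iter().zip(fb).all(|(x, y)| {
                    x.name == y.name && equals_ignore_nullability(&x.data_type, &y.data_type)
                })
        }
        _ => a == b,
    }
}

/// Hypothetical helper: struct(id int not null, ct string), with `ct`'s
/// nullability chosen by the caller, loosely mirroring the PR's test shape.
fn struct_with_ct(ct_nullable: bool) -> DataType {
    DataType::Struct(vec![
        Field { name: "id".to_string(), data_type: DataType::Int, nullable: false },
        Field { name: "ct".to_string(), data_type: DataType::Str, nullable: ct_nullable },
    ])
}

fn main() {
    let left = struct_with_ct(false);
    let right = struct_with_ct(true);
    // Coercion sees one common type, so no unifying cast is inserted...
    assert!(equals_ignore_nullability(&left, &right));
    // ...but the strict type equality that MutableArrayData asserts does not hold.
    assert_ne!(left, right);
    // prints "ignoring nullability: true, strict: false"
    println!(
        "ignoring nullability: {}, strict: {}",
        equals_ignore_nullability(&left, &right),
        left == right
    );
}
```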

This is a standalone fix that applies to any such plan. It surfaced while working on the Delta Lake contrib integration: Delta's CDC write path builds array(struct(...), struct(...)) plans with one struct per change type, which leaves the _change_type field's nullability divergent across arms. Prioritizing it therefore also helps that effort.

What changes are included in this PR?

CometCreateArray now declines (falls back to Spark) when its children's types differ in a way make_array cannot handle. DataFusion tolerates container nullability differences (ArrayType.containsNull and MapType.valueContainsNull are coerced) but not differences in a struct field's nullability. The check therefore normalizes container nullability before comparing while keeping struct field nullability significant, so it declines only the cases that actually panic. This avoids over-declining legitimate arrays of arrays or maps that differ only in containsNull.
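The decline check described above can be sketched as follows, again against a hypothetical simplified type model rather than the actual CometCreateArray code; `normalize_container_nullability` and `should_decline` are illustrative names. The sketch forces every containsNull / valueContainsNull flag to `true` (the PR does not say which canonical value the real check uses; any fixed value works for the comparison), copies struct field nullability unchanged, and declines when any child still differs from the first.

```rust
// Hypothetical, simplified model of Spark data types (NOT Spark's or arrow-rs's API).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int,
    Str,
    Array { element: Box<DataType>, contains_null: bool },
    Map { key: Box<DataType>, value: Box<DataType>, value_contains_null: bool },
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

fn field(name: &str, data_type: DataType, nullable: bool) -> Field {
    Field { name: name.to_string(), data_type, nullable }
}

fn arr(element: DataType, contains_null: bool) -> DataType {
    DataType::Array { element: Box::new(element), contains_null }
}

fn map(key: DataType, value: DataType, value_contains_null: bool) -> DataType {
    DataType::Map { key: Box::new(key), value: Box::new(value), value_contains_null }
}

/// Hypothetical helper: force container nullability to a fixed value,
/// recursing into children, while leaving struct field nullability untouched.
fn normalize_container_nullability(dt: &DataType) -> DataType {
    match dt {
        DataType::Array { element, .. } => arr(normalize_container_nullability(element), true),
        DataType::Map { key, value, .. } => map(
            normalize_container_nullability(key),
            normalize_container_nullability(value),
            true,
        ),
        DataType::Struct(fields) => DataType::Struct(
            fields
                .iter()
                // Struct field nullability stays significant: copy it as-is.
                .map(|f| field(&f.name, normalize_container_nullability(&f.data_type), f.nullable))
                .collect(),
        ),
        other => other.clone(),
    }
}

/// Hypothetical helper: true when the children still differ after
/// normalization, i.e. the case that would panic in native make_array.
fn should_decline(children: &[DataType]) -> bool {
    match children.split_first() {
        None => false,
        Some((first, rest)) => {
            let first = normalize_container_nullability(first);
            rest.iter().any(|c| normalize_container_nullability(c) != first)
        }
    }
}

fn main() {
    let ct_not_null =
        DataType::Struct(vec![field("id", DataType::Int, false), field("ct", DataType::Str, false)]);
    let ct_nullable =
        DataType::Struct(vec![field("id", DataType::Int, false), field("ct", DataType::Str, true)]);
    // array(struct(id, ct not null), struct(id, ct nullable)): decline.
    assert!(should_decline(&[ct_not_null.clone(), ct_nullable.clone()]));
    // In this sketch the same divergence nested inside arrays is also declined.
    assert!(should_decline(&[arr(ct_not_null, false), arr(ct_nullable, true)]));
    // Children differing only in containsNull / valueContainsNull stay native.
    assert!(!should_decline(&[arr(arr(DataType::Int, false), false), arr(arr(DataType::Int, true), true)]));
    assert!(!should_decline(&[map(DataType::Str, DataType::Int, false), map(DataType::Str, DataType::Int, true)]));
    println!("decline checks behave as expected");
}
```

Normalizing only container nullability, rather than ignoring all nullability, is what keeps the check narrow: a fully nullability-insensitive comparison would treat the struct case as compatible and still panic natively, while plain strict equality would also decline the arrays-of-arrays case that DataFusion already handles.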

This tracks upstream apache/datafusion#22366; the caller-side decline can be removed once that fix lands.

How are these changes tested?

A new test in CometArrayExpressionSuite builds array(struct(id, ct not null), struct(id, ct nullable)) and asserts correct results. The test fails on main with the native MutableArrayData panic and passes with this change. The full CometArrayExpressionSuite (40/40) passes, including the test "arrays_overlap - nested array null handling", which exercises arrays that differ only in containsNull and must still run natively.

DataFusion's make_array asserts strict element-type equality in
MutableArrayData and panics on a mismatch. Spark's CreateArray coerces element
types with `sameType`, which ignores nullability, so children that share a
surface type but differ only in a nested struct field's nullability get no
unifying cast (e.g. array(struct(a not null), struct(a nullable))). Native
execution then panics: "Arrays with inconsistent types passed to
MutableArrayData".

DataFusion tolerates container nullability differences (ArrayType.containsNull /
MapType.valueContainsNull are coerced), so decline only the cases that actually
panic: children that still differ after normalizing container nullability while
keeping struct field nullability significant. Those fall back to Spark's
evaluator.

Closes apache#4528

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

CreateArray with nullability-divergent children panics in native make_array
