fix: decline CreateArray with struct-nullability-divergent children #4533
Open
schenksj wants to merge 1 commit into
DataFusion's `make_array` asserts strict element-type equality in `MutableArrayData` and panics on a mismatch. Spark's `CreateArray` coerces element types with `sameType`, which ignores nullability. Children that share a surface type but differ only in a nested struct field's nullability therefore get no unifying cast (e.g. `array(struct(a not null), struct(a nullable))`). Native execution then panics: "Arrays with inconsistent types passed to MutableArrayData".

DataFusion tolerates container nullability differences (`ArrayType.containsNull` / `MapType.valueContainsNull` are coerced). So decline only the cases that actually panic: children that still differ after normalizing container nullability while keeping struct field nullability significant. Those fall back to Spark's evaluator.

Closes apache#4528

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Which issue does this PR close?
Closes #4528.
Rationale for this change
DataFusion's `make_array` asserts strict element-type equality in `MutableArrayData::with_capacities` and panics on a mismatch. Spark's `CreateArray` coerces element types with `sameType`, which ignores nullability. Children that share a surface type but differ only in a nested struct field's nullability therefore get no unifying cast. For example, `array(struct(a not null), struct(a nullable))` reaches native execution with two different struct types and panics with `Arrays with inconsistent types passed to MutableArrayData`.

This is a standalone fix. It surfaced while working on the Delta Lake contrib integration. Delta's CDC write path builds `array(struct(...), struct(...))` plans with one struct per change type, which leaves a `_change_type` field's nullability divergent across arms. Prioritizing the fix helps that effort, but it applies to any such plan.

What changes are included in this PR?
`CometCreateArray` now declines (falls back to Spark) when its children's types differ in a way `make_array` cannot handle. DataFusion tolerates container nullability differences (`ArrayType.containsNull` / `MapType.valueContainsNull` are coerced) but not a struct field's nullability. The check therefore normalizes container nullability before comparing and keeps struct field nullability significant, declining only the cases that actually panic. This avoids over-declining legitimate arrays of arrays or maps that differ only in `containsNull`.

This tracks upstream apache/datafusion#22366; the caller-side decline can be removed once that fix lands.
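The decline rule can be sketched as a comparison over a small type model. This is a minimal Python illustration, not Comet's Scala implementation. The `Atomic`, `Field`, `Struct`, `Array` and `Map` classes are hypothetical stand-ins for Spark's `DataType` hierarchy, and so are the helpers `strip_all_nullability`, `normalize_container_nullability` and `should_decline`. `strip_all_nullability` roughly approximates what `sameType` compares and ignores its field-name case handling. `should_decline` normalizes only container nullability, so a struct field's `nullable` flag still distinguishes two children.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical stand-ins for Spark's DataType hierarchy (not the real Spark API).
# Frozen dataclasses are hashable, so normalized types can be collected in a set.


@dataclass(frozen=True)
class Atomic:
    name: str  # e.g. "int" or "string"


@dataclass(frozen=True)
class Field:
    name: str
    data_type: "DataType"
    nullable: bool  # struct field nullability: must stay significant


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]


@dataclass(frozen=True)
class Array:
    element_type: "DataType"
    contains_null: bool  # container nullability: coerced by DataFusion


@dataclass(frozen=True)
class Map:
    key_type: "DataType"
    value_type: "DataType"
    value_contains_null: bool  # container nullability: coerced by DataFusion


DataType = Union[Atomic, Struct, Array, Map]


def strip_all_nullability(t: DataType) -> DataType:
    """Roughly what Spark's sameType compares: every nullability flag ignored."""
    if isinstance(t, Struct):
        return Struct(tuple(
            Field(f.name, strip_all_nullability(f.data_type), True) for f in t.fields))
    if isinstance(t, Array):
        return Array(strip_all_nullability(t.element_type), True)
    if isinstance(t, Map):
        return Map(strip_all_nullability(t.key_type),
                   strip_all_nullability(t.value_type), True)
    return t


def normalize_container_nullability(t: DataType) -> DataType:
    """Set containsNull/valueContainsNull to True at every nesting level,
    but leave each struct field's nullable flag unchanged."""
    if isinstance(t, Struct):
        return Struct(tuple(
            Field(f.name, normalize_container_nullability(f.data_type), f.nullable)
            for f in t.fields))
    if isinstance(t, Array):
        return Array(normalize_container_nullability(t.element_type), True)
    if isinstance(t, Map):
        return Map(normalize_container_nullability(t.key_type),
                   normalize_container_nullability(t.value_type), True)
    return t


def should_decline(children: list) -> bool:
    """Decline (fall back to Spark) if children still differ after normalization."""
    return len({normalize_container_nullability(c) for c in children}) > 1


# array(struct(a not null), struct(a nullable)): equal under sameType, yet declined.
a_not_null = Struct((Field("a", Atomic("int"), False),))
a_nullable = Struct((Field("a", Atomic("int"), True),))
print(strip_all_nullability(a_not_null) == strip_all_nullability(a_nullable))  # True
print(should_decline([a_not_null, a_nullable]))  # True

# Arrays differing only in containsNull still run natively.
print(should_decline([Array(Atomic("int"), False), Array(Atomic("int"), True)]))  # False
```

The sketch forces container flags to a single value instead of stripping all nullability. That choice keeps the `containsNull`-only case on the native path while still catching the struct-field mismatch that triggers the panic.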
How are these changes tested?
New test in `CometArrayExpressionSuite` builds `array(struct(id, ct not null), struct(id, ct nullable))` and asserts correct results. The test fails on `main` with the native `MutableArrayData` panic and passes with this change. The full `CometArrayExpressionSuite` (40/40) passes, including `arrays_overlap - nested array null handling`. That test exercises arrays differing only in `containsNull`, which must still run natively.