Commit cf05617

[SPARK-57736][SQL] Fix NPE in CreateNamedStruct.dataType when a struct field name is null
### What changes were proposed in this pull request?

`CreateNamedStruct.dataType` builds each field with `StructField(name.toString, ...)`:

```scala
override lazy val dataType: StructType = {
  val fields = names.zip(valExprs).map { case (name, expr) =>
    ...
    StructField(name.toString, expr.dataType, expr.nullable, metadata) // NPE if name == null
  }
  StructType(fields)
}
```

When a field name is `null`, `name.toString` throws a `NullPointerException`. This is reached eagerly while building a `RowEncoder` serializer (`SerializerBuildHelper.createSerializerForObject` -> `CreateNamedStruct(...).dataType`), so it crashes before any analysis runs.

This PR makes the field name null-safe and preserves the null name:

```scala
StructField(if (name == null) null else name.toString, expr.dataType, expr.nullable, metadata)
```

### Why are the changes needed?

A null field name is invalid input -- `CreateNamedStruct.checkInputDataTypes` already rejects it (`names.contains(null)` -> `UNEXPECTED_NULL`) -- but `dataType` dereferences `name.toString` before type checking, and the encoder calls `dataType` directly. Keeping it null-safe converts the hard `NullPointerException` into correct behavior, consistent with SPARK-57725 which made `AttributeSeq` tolerate null-named attributes.

Minimal reproduction:

```scala
import org.apache.spark.sql.catalyst.expressions.{CreateNamedStruct, Literal}
import org.apache.spark.sql.types.{IntegerType, StringType}

CreateNamedStruct(Seq(Literal.create(null, StringType), Literal(1))).dataType // NPE before this fix
```

Note: this fixes the specific `CreateNamedStruct.dataType` NPE. The full `createDataFrame(schemaWithNullFieldName)` scenario hits additional, independent null-name sites further along (e.g. a `StructField.name.equalsIgnoreCase` schema comparison during resolution), which are separate pre-existing issues and out of scope here.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added a regression test in `ComplexTypeSuite` asserting `dataType` no longer throws and preserves the null field name.

```
build/sbt 'catalyst/testOnly *ComplexTypeSuite'
```

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor

Closes #56845 from MaxGekk/SPARK-57736-createnamedstruct-npe.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 0525313)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
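
The null-safe conversion described in the commit message can be tried without a Spark dependency. The following is a minimal sketch in plain Scala: `NullSafeNameSketch`, `unsafeName`, `safeName`, and `safeNameViaOption` are hypothetical names introduced for illustration and are not part of Spark. `safeName` mirrors the `if (name == null) null else name.toString` expression from the patch, while `safeNameViaOption` is an alternative idiom that the patch does not use and is shown only for comparison.

```scala
// Hypothetical helper object for illustration; not part of Spark.
object NullSafeNameSketch {
  // Mirrors the pre-fix behavior: calling toString on a null reference throws an NPE.
  def unsafeName(name: Any): String = name.toString

  // Mirrors the expression used in the patch: a null name stays null.
  def safeName(name: Any): String = if (name == null) null else name.toString

  // Alternative idiom (not what the patch uses): Option(null) is None, and orNull maps None to null.
  def safeNameViaOption(name: Any): String = Option(name).map(_.toString).orNull
}
```

Both null-safe variants return `null` for a null input and the string form otherwise. The explicit `if` avoids allocating an `Option`, which may matter little here but keeps the change as small as possible.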
1 parent ab166a3 commit cf05617

2 files changed

Lines changed: 28 additions & 1 deletion


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala

Lines changed: 5 additions & 1 deletion
@@ -470,7 +470,11 @@ case class CreateNamedStruct(children: Seq[Expression]) extends Expression with
           case gsf: GetStructField => gsf.metadata
           case _ => Metadata.empty
         }
-        StructField(name.toString, expr.dataType, expr.nullable, metadata)
+        // A null field name is invalid input (checkInputDataTypes flags it as UNEXPECTED_NULL),
+        // but dataType is evaluated eagerly by the encoder before type checking; keep it null-safe
+        // and preserve the null name rather than throwing a NullPointerException (SPARK-57736).
+        StructField(if (name == null) null else name.toString, expr.dataType, expr.nullable,
+          metadata)
     }
     StructType(fields)
   }
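
The comment in this hunk relies on a division of responsibility: the type accessor tolerates a null name, and a separate validation step is the single place that rejects it. A minimal sketch of that pattern in plain Scala follows. `NamedStructSketch`, `CheckResult`, `CheckOk`, and `CheckFailed` are hypothetical stand-ins for illustration only and do not reproduce Spark's `CreateNamedStruct` or `TypeCheckResult` APIs.

```scala
// Hypothetical model of "tolerant accessor, strict validator"; not Spark's classes.
sealed trait CheckResult
case object CheckOk extends CheckResult
final case class CheckFailed(errorSubClass: String) extends CheckResult

final case class NamedStructSketch(names: Seq[Any]) {
  // Tolerant accessor: safe to evaluate before validation, preserves null names.
  lazy val fieldNames: Seq[String] =
    names.map(n => if (n == null) null else n.toString)

  // Strict validator: reports a null name as an error instead of crashing.
  def check(): CheckResult =
    if (names.contains(null)) CheckFailed("UNEXPECTED_NULL") else CheckOk
}
```

With this split, a caller that reads `fieldNames` early gets a value back, and the invalid input still surfaces as a structured failure from `check()` rather than as a `NullPointerException`.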

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ComplexTypeSuite.scala

Lines changed: 23 additions & 0 deletions
@@ -492,6 +492,29 @@ class ComplexTypeSuite extends SparkFunSuite with ExpressionEvalHelper {
     )
   }

+  test("SPARK-57736: CreateNamedStruct.dataType is null-safe when a field name is null") {
+    // Accessing `dataType` must not throw an NPE even though a null field name is invalid input.
+    val struct = CreateNamedStruct(Seq(Literal.create(null, StringType), Literal(1)))
+    val dt = struct.dataType
+    assert(dt.length == 1)
+    assert(dt.head.name == null)
+    // The null field name is still reported as invalid by input type checking.
+    assert(struct.checkInputDataTypes().isFailure)
+    val result = struct.checkInputDataTypes().asInstanceOf[DataTypeMismatch]
+    assert(result.errorSubClass == "UNEXPECTED_NULL")
+
+    // A null field name mixed with valid named fields is null-safe and still flagged.
+    val mixed = CreateNamedStruct(Seq(
+      Literal("a"), Literal(1),
+      Literal.create(null, StringType), Literal(2)))
+    val mixedDt = mixed.dataType
+    assert(mixedDt.length == 2)
+    assert(mixedDt.head.name == "a")
+    assert(mixedDt(1).name == null)
+    assert(mixed.checkInputDataTypes().asInstanceOf[DataTypeMismatch].errorSubClass ==
+      "UNEXPECTED_NULL")
+  }
+
   test("test dsl for complex type") {
     def quickResolve(u: UnresolvedExtractValue): Expression = {
       ExtractValue(u.child, u.extraction, _ == _)
