Commit cf05617
[SPARK-57736][SQL] Fix NPE in CreateNamedStruct.dataType when a struct field name is null
### What changes were proposed in this pull request?
`CreateNamedStruct.dataType` builds each field with `StructField(name.toString, ...)`:
```scala
override lazy val dataType: StructType = {
  val fields = names.zip(valExprs).map {
    case (name, expr) =>
      ...
      StructField(name.toString, expr.dataType, expr.nullable, metadata) // NPE if name == null
  }
  StructType(fields)
}
```
When a field name is `null`, `name.toString` throws a `NullPointerException`. This is reached eagerly while building a `RowEncoder` serializer (`SerializerBuildHelper.createSerializerForObject` -> `CreateNamedStruct(...).dataType`), so it crashes before any analysis runs. This PR makes the field name null-safe and preserves the null name:
```scala
StructField(if (name == null) null else name.toString, expr.dataType, expr.nullable, metadata)
```
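The null-safe conversion can be tried in isolation, without Spark on the classpath. The sketch below is illustrative only: `NullSafeNameSketch` and `Field` are hypothetical names, with `Field` a simplified stand-in for `StructField`; they are not part of this PR.

```scala
object NullSafeNameSketch {
  // Hypothetical stand-in for Spark's StructField; only the name matters here.
  final case class Field(name: String)

  // Mirrors the pre-fix pattern: throws NullPointerException when name is null.
  def unsafeField(name: Any): Field = Field(name.toString)

  // Mirrors the fixed pattern: a null name is passed through unchanged.
  def safeField(name: Any): Field =
    Field(if (name == null) null else name.toString)
}
```

Calling `NullSafeNameSketch.safeField(null)` returns a `Field` whose `name` is `null`, whereas `NullSafeNameSketch.unsafeField(null)` throws.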
### Why are the changes needed?
A null field name is invalid input, and `CreateNamedStruct.checkInputDataTypes` already rejects it (`names.contains(null)` -> `UNEXPECTED_NULL`). However, `dataType` calls `name.toString` before type checking runs, and the encoder calls `dataType` directly. Making `dataType` null-safe replaces the hard `NullPointerException` with a `StructType` that preserves the null name, consistent with SPARK-57725, which made `AttributeSeq` tolerate null-named attributes.
Minimal reproduction:
```scala
import org.apache.spark.sql.catalyst.expressions.{CreateNamedStruct, Literal}
import org.apache.spark.sql.types.{IntegerType, StringType}
CreateNamedStruct(Seq(Literal.create(null, StringType), Literal(1))).dataType // NPE before this fix
```
Note: this fixes the specific `CreateNamedStruct.dataType` NPE. The full `createDataFrame(schemaWithNullFieldName)` scenario hits additional, independent null-name sites further along (e.g. a `StructField.name.equalsIgnoreCase` schema comparison during resolution), which are separate pre-existing issues and out of scope here.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added a regression test in `ComplexTypeSuite` asserting `dataType` no longer throws and preserves the null field name.
```
build/sbt 'catalyst/testOnly *ComplexTypeSuite'
```
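A check along the lines described above might look like the following. This is a sketch, not the test added in this PR: `NullFieldNameCheck` and `nullNamedStructType` are hypothetical names, and the code assumes a Spark build that includes this fix is on the classpath.

```scala
import org.apache.spark.sql.catalyst.expressions.{CreateNamedStruct, Literal}
import org.apache.spark.sql.types.{StringType, StructType}

// Hypothetical helper, not part of Spark or of this PR.
object NullFieldNameCheck {
  // Builds a struct whose single field has a null name and returns its type.
  // On builds without the fix, the dataType call throws NullPointerException.
  def nullNamedStructType(): StructType =
    CreateNamedStruct(Seq(Literal.create(null, StringType), Literal(1))).dataType
}
```

With the fix applied, the returned `StructType` is expected to hold one field whose `name` is `null`.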
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor
Closes #56845 from MaxGekk/SPARK-57736-createnamedstruct-npe.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 0525313)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
1 parent ab166a3 commit cf05617
2 files changed
Lines changed: 28 additions & 1 deletion
File tree
- sql/catalyst/src
- main/scala/org/apache/spark/sql/catalyst/expressions
- test/scala/org/apache/spark/sql/catalyst/expressions
Lines changed: 5 additions & 1 deletion
(Diff content not captured. One hunk: original line 473 removed, new lines 473-477 added.)
Lines changed: 23 additions & 0 deletions
(Diff content not captured. One hunk: new lines 495-517 added after original line 494.)