
fix: propagate parent struct null mask in GetStructField #4523

Open

schenksj wants to merge 2 commits into apache:main from schenksj:fix/get-struct-field-null-mask

@schenksj

Which issue does this PR close?

Closes #4432.

Rationale for this change

A field of a NULL struct must be NULL (Spark semantics). Arrow stores a StructArray's child arrays with their own validity, independent of the parent struct's null buffer, so the raw child value at a row where the struct itself is null can be non-null (e.g. Parquet files in which a logically null struct column still carries a populated child buffer). GetStructField::evaluate returned the child column verbatim, so isnotnull(struct.field) wrongly evaluated to TRUE for rows where the struct is null.
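To make the layout concrete, here is a std-only Rust model of the problem. It is not the arrow-rs API: the `StructColumn` type and both functions are hypothetical, and validity is modelled with `Vec<bool>` and `Option`. The struct and its child keep separate validity, and returning the child verbatim (the pre-fix behaviour) makes `isnotnull` report TRUE for a row whose struct is NULL:

```rust
/// Hypothetical std-only model of an Arrow-style struct column with one child.
/// NOT the arrow-rs API: it only mirrors the layout described above, where
/// the struct has its own validity and the child keeps an independent one.
pub struct StructColumn {
    /// Parent validity: `false` means the struct itself is NULL at that row.
    pub struct_valid: Vec<bool>,
    /// Child values with their own validity (`None` = child slot is null).
    pub child: Vec<Option<i32>>,
}

/// Pre-fix behaviour: return the child column verbatim,
/// ignoring the parent struct's validity entirely.
pub fn extract_field_verbatim(col: &StructColumn) -> Vec<Option<i32>> {
    col.child.clone()
}

/// `isnotnull(struct.field)` evaluated over an already-extracted field.
pub fn is_not_null(field: &[Option<i32>]) -> Vec<bool> {
    field.iter().map(|v| v.is_some()).collect()
}
```

For example, with `struct_valid = [true, false, true]` and `child = [Some(10), Some(20), None]`, `is_not_null(&extract_field_verbatim(&col))` gives `[true, true, false]`: row 1 is reported non-null even though its struct is NULL.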

What changes are included in this PR?

GetStructField now unions the parent struct's null mask into the extracted child, so the result is null where the struct is null OR the child is null. This is done by a project_field helper that both the array and the scalar-struct evaluation paths use.
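A minimal sketch of the masking rule, assuming validity is modelled with `Vec<bool>` and `Option` rather than arrow-rs null buffers. `project_field_array` and `project_field_scalar` are hypothetical stand-ins for the PR's `project_field` helper, whose actual signature is not shown here. In arrow-rs terms, the array case amounts to combining the two null buffers (for instance with `NullBuffer::union`), which ANDs the validity bits:

```rust
/// Hypothetical std-only sketch of the array path of a `project_field`-style
/// helper. A result row is null when the struct is null OR the child is null,
/// i.e. the parent and child validity masks are ANDed together.
pub fn project_field_array(struct_valid: &[bool], child: &[Option<i32>]) -> Vec<Option<i32>> {
    assert_eq!(
        struct_valid.len(),
        child.len(),
        "struct and child must have the same length"
    );
    struct_valid
        .iter()
        .zip(child)
        // Keep the child's own value (which may itself be None) only where
        // the parent struct is valid; otherwise force the row to null.
        .map(|(&valid, value)| if valid { *value } else { None })
        .collect()
}

/// Hypothetical sketch of the scalar-struct path: `None` models a NULL struct
/// scalar, so any field projected from it is NULL; otherwise the field's own
/// (possibly null) value is returned.
pub fn project_field_scalar(struct_fields: Option<&[Option<i32>]>, idx: usize) -> Option<i32> {
    struct_fields.and_then(|fields| fields[idx])
}
```

`project_field_array(&[true, false, true], &[Some(10), Some(20), None])` yields `[Some(10), None, None]`: the populated child value under the NULL struct is masked out, and a null child stays null even when its struct is valid.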

How are these changes tested?

Added a standalone unit test field_of_null_struct_is_null that builds a StructArray whose child buffer is non-null at every row while the struct is null at some rows. The test fails without the fix (the field comes back non-null for the null-struct rows) and passes with it.
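The structure of that test can be mirrored in the same std-only model. The helpers below are hypothetical and are not the actual `field_of_null_struct_is_null` code. The sketch builds a child that is non-null at every row, nulls out the struct at chosen rows, and checks that the verbatim child violates the "field of a NULL struct is NULL" property while the masked field satisfies it:

```rust
/// Hypothetical fixture: the child is non-null at every row, while the
/// struct is null at the rows listed in `null_struct_rows`.
pub fn fixture(len: usize, null_struct_rows: &[usize]) -> (Vec<bool>, Vec<Option<i32>>) {
    let struct_valid: Vec<bool> = (0..len).map(|i| !null_struct_rows.contains(&i)).collect();
    let child: Vec<Option<i32>> = (0..len).map(|i| Some(i as i32)).collect();
    (struct_valid, child)
}

/// Extraction that honours the parent mask (the fixed behaviour).
pub fn extract_with_parent_mask(struct_valid: &[bool], child: &[Option<i32>]) -> Vec<Option<i32>> {
    struct_valid
        .iter()
        .zip(child)
        .map(|(&ok, v)| if ok { *v } else { None })
        .collect()
}

/// The property under test: a field of a NULL struct is NULL, and every
/// other row passes the child value through unchanged.
pub fn satisfies_spark_semantics(
    struct_valid: &[bool],
    child: &[Option<i32>],
    field: &[Option<i32>],
) -> bool {
    struct_valid
        .iter()
        .zip(child)
        .zip(field)
        .all(|((&ok, c), f)| if ok { f == c } else { f.is_none() })
}
```

With `fixture(4, &[1, 3])`, passing the child through verbatim fails the property (rows 1 and 3 stay non-null), while the masked field `[Some(0), None, Some(2), None]` passes. This mirrors the "fails without the fix, passes with it" check.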

schenksj and others added 2 commits May 29, 2026 20:07
A field of a NULL struct must be NULL (Spark semantics). Arrow stores a
StructArray's child arrays with their own validity, INDEPENDENT of the parent
struct's null buffer, so the raw child value at a row where the struct itself is
null can be non-null (e.g. parquet files where a logically-null struct column
still carries a populated child buffer). GetStructField.evaluate returned the
child column verbatim, so isnotnull(struct.field) wrongly evaluated TRUE for a
null struct.

Fix: union the parent struct's null mask into the extracted child (null where the
struct is null OR the child is null). Adds a standalone unit test that fails
without the fix and passes with it.

Closes apache#4432

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Successfully merging this pull request may close these issues.

GetStructField returns non-null for fields of a NULL struct (missing null-mask propagation)