Skip to content

fix(arrow/array): reject nulls in non-nullable fields when building records#858

Open
zeroshade wants to merge 1 commit into
apache:mainfrom
zeroshade:gh-372-nonnullable-validation
Open

fix(arrow/array): reject nulls in non-nullable fields when building records#858
zeroshade wants to merge 1 commit into
apache:mainfrom
zeroshade:gh-372-nonnullable-validation

Conversation

@zeroshade

Copy link
Copy Markdown
Member

Rationale for this change

array.RecordBuilder.NewRecordBatch (and the deprecated NewRecord) silently
produced records whose field was declared Nullable: false but whose column
actually contained nulls. The Rust (RecordBatch::try_new) and C++
implementations reject this; arrow-go accepted it, yielding spec-invalid
records. Fixes #372.

What changes are included in this PR?

  • RecordBuilder: NewRecordBatch now rejects a record whose top-level
    field is non-nullable but whose column contains nulls (it panics, consistent
    with its existing whole-builder length-mismatch guard). A new
    RecordBuilder.NewRecordBatchChecked() (arrow.RecordBatch, error) returns the
    violation as an error instead of panicking.
  • The check is scoped to top-level record columns only. A non-nullable child
    of a nullable struct is unaffected (its nulls are legitimate where the parent
    slot is null), so existing nested data keeps working. array.NewRecordBatch
    and the lower-level constructors are unchanged.
  • Readers: the CSV, JSON, and Avro readers build through
    NewRecordBatchChecked and surface a typed read error instead of building an
    invalid record or panicking.

Are these changes tested?

Yes. A new TestRecordBuilderNonNullableNulls covers rejection (error + panic),
nullable fields with nulls allowed, non-nullable without nulls allowed, a
nullable struct with a non-nullable child allowed, and the empty-builder case.
Pre-existing fixtures that declared non-nullable fields while appending nulls
were corrected to mark those fields nullable. The full arrow/... and
parquet/... suites pass.

Are there any user-facing changes?

Yes:

  • New RecordBuilder.NewRecordBatchChecked() (arrow.RecordBatch, error).
  • RecordBuilder.NewRecordBatch() / NewRecord() now reject (panic) a top-level
    non-nullable column that contains nulls; previously such records were built
    silently. This is a behavior change for code that relied on the prior lax
    behavior — the fix is to declare the field Nullable: true.
  • The CSV, JSON, and Avro readers now return an error when input would place a
    null into a non-nullable top-level column.

Potential follow-ups (intentionally out of scope here): recursive validation
of nested non-nullable fields, and logical-null detection for non-nullable
run-end-encoded columns.

Closes #372

…ecords

RecordBuilder.NewRecordBatch (and the deprecated NewRecord) now reject a
record whose top-level field is declared non-nullable but whose column
contains nulls, matching the Rust and C++ implementations. The new
RecordBuilder.NewRecordBatchChecked returns the violation as an error
instead of panicking.

The check is scoped to top-level record columns only: a non-nullable child
of a nullable struct is unaffected, so existing nested data keeps working.
The CSV, JSON, and Avro readers route through NewRecordBatchChecked and
surface a read error rather than building an invalid record.

Closes apache#372
@zeroshade zeroshade requested a review from lidavidm June 19, 2026 21:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RecordBuilder should confirm non-nullable fields do not contain null rows when building Record

1 participant