fix(arrow/array): reject nulls in non-nullable fields when building records#858
Open
zeroshade wants to merge 1 commit into
Open
fix(arrow/array): reject nulls in non-nullable fields when building records#858zeroshade wants to merge 1 commit into
zeroshade wants to merge 1 commit into
Conversation
…ecords RecordBuilder.NewRecordBatch (and the deprecated NewRecord) now reject a record whose top-level field is declared non-nullable but whose column contains nulls, matching the Rust and C++ implementations. The new RecordBuilder.NewRecordBatchChecked returns the violation as an error instead of panicking. The check is scoped to top-level record columns only: a non-nullable child of a nullable struct is unaffected, so existing nested data keeps working. The CSV, JSON, and Avro readers route through NewRecordBatchChecked and surface a read error rather than building an invalid record. Closes apache#372
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rationale for this change
array.RecordBuilder.NewRecordBatch(and the deprecatedNewRecord) silentlyproduced records whose field was declared
Nullable: falsebut whose columnactually contained nulls. The Rust (
RecordBatch::try_new) and C++implementations reject this; arrow-go accepted it, yielding spec-invalid
records. Fixes #372.
What changes are included in this PR?
RecordBuilder:NewRecordBatchnow rejects a record whose top-levelfield is non-nullable but whose column contains nulls (it panics, consistent
with its existing whole-builder length-mismatch guard). A new
RecordBuilder.NewRecordBatchChecked() (arrow.RecordBatch, error)returns theviolation as an error instead of panicking.
of a nullable struct is unaffected (its nulls are legitimate where the parent
slot is null), so existing nested data keeps working.
array.NewRecordBatchand the lower-level constructors are unchanged.
NewRecordBatchCheckedand surface a typed read error instead of building aninvalid record or panicking.
Are these changes tested?
Yes. A new
TestRecordBuilderNonNullableNullscovers rejection (error + panic),nullable fields with nulls allowed, non-nullable without nulls allowed, a
nullable struct with a non-nullable child allowed, and the empty-builder case.
Pre-existing fixtures that declared non-nullable fields while appending nulls
were corrected to mark those fields nullable. The full
arrow/...andparquet/...suites pass.Are there any user-facing changes?
Yes:
RecordBuilder.NewRecordBatchChecked() (arrow.RecordBatch, error).RecordBuilder.NewRecordBatch()/NewRecord()now reject (panic) a top-levelnon-nullable column that contains nulls; previously such records were built
silently. This is a behavior change for code that relied on the prior lax
behavior — the fix is to declare the field
Nullable: true.null into a non-nullable top-level column.
Potential follow-ups (intentionally out of scope here): recursive validation
of nested non-nullable fields, and logical-null detection for non-nullable
run-end-encoded columns.
Closes #372