Arrow: Fix NPE reading a constant/null column in vectorized reader #16871

Open
thswlsqls wants to merge 1 commit into apache:main from thswlsqls:fix/arrow-vectorized-null-vector-npe

@thswlsqls thswlsqls commented Jun 19, 2026

Contributor

Closes #10275

Summary

  • When a column is added with ALTER TABLE ... ADD COLUMN after data files were written, the arrow vectorized reader produces a constant VectorHolder whose vector is null.
  • GenericArrowVectorAccessorFactory.getPlainVectorAccessor called vector.getClass() on that null vector, throwing a confusing NullPointerException.
  • Guard the null vector so the message becomes Unsupported vector: null, matching the null-safe pattern already used by the default branch in the same class (line 178).
  • Revives the stale fix from #10284 ("#10275 - fix NullPointerException"), which was closed by the stale bot rather than rejected by design, and adds the reproduction test the reviewer requested, with no Spark dependency.
  • Fully reading added constant/null columns in the vectorized path is out of scope here and is tracked as follow-up on #10275 ("NullPointerException when using VectorizedArrowReader to read a null column").
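
The null-guard pattern described in the summary can be sketched in isolation. This is a minimal, self-contained illustration and not the Iceberg source: `NullVectorGuardSketch`, `FakeIntVector`, and `accessorFor` are hypothetical names, and the real `getPlainVectorAccessor` dispatches over actual Arrow vector classes.

```java
/** Hypothetical stand-in, not the Iceberg source: shows only the null-guard pattern. */
public class NullVectorGuardSketch {

  /** Placeholder for a supported Arrow vector type (assumption for illustration). */
  static final class FakeIntVector {}

  /** Returns an accessor label for a supported vector; rejects null with a clear message. */
  static String accessorFor(Object vector) {
    // Guard first: calling vector.getClass() on a null vector is what raised the NPE.
    if (vector == null) {
      // String concatenation with null yields "Unsupported vector: null".
      throw new UnsupportedOperationException("Unsupported vector: " + vector);
    }
    if (vector instanceof FakeIntVector) {
      return "int-accessor";
    }
    // Unknown non-null types can safely report their class.
    throw new UnsupportedOperationException("Unsupported vector: " + vector.getClass());
  }

  public static void main(String[] args) {
    System.out.println(accessorFor(new FakeIntVector()));
    try {
      accessorFor(null);
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The guard does not make the read succeed; it only turns an opaque NullPointerException into an exception whose message names the unsupported case.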

Testing done

  • Added TestArrowReader#testReadAddedColumnFailsWithClearMessage: writes one row, adds a column via updateSchema().addColumn, scans with VectorizedTableScanIterable, and asserts UnsupportedOperationException with message Unsupported vector: null.
  • Verified the test fails on main (NPE on vector.getClass()) and passes with the fix.
  • ./gradlew :iceberg-arrow:check — passed (30 tests, JDK 21).

AI Disclosure

  • Model: Claude Opus 4.8
  • Platform/Tool: Claude Code

When a column is added after data files were written, the arrow vectorized
reader produces a constant holder with a null vector. getPlainVectorAccessor
then called vector.getClass() and threw a confusing NullPointerException.
Guard the null vector so the message is "Unsupported vector: null", matching
the null-safe pattern already used by the default branch in the same class.

Revives the stale fix from apache#10284 (closed by the stale bot, not by design) and
adds the reproduction test the reviewer requested.

Generated-by: Claude Code
@github-actions github-actions Bot added the arrow label Jun 19, 2026

@singhpk234 singhpk234 left a comment

Contributor

So is this just throwing UnsupportedOperationException instead of an NPE? We should update the PR description accordingly, because the fix does not make the reads succeed; it only surfaces a different exception. Is the workaround then to disable vectorized reads?

@thswlsqls

Contributor Author

You're right — this doesn't make the read succeed; it replaces the confusing NPE (getPlainVectorAccessor calling getClass() on a null vector) with a clear UnsupportedOperationException: Unsupported vector: null. Fully reading an added constant/null column in the vectorized path is still out of scope here and tracked on #10275; the workaround today is to disable vectorized reads. I'll update the description so it reads as "clearer failure, not a successful read."

Development

Successfully merging this pull request may close these issues.

NullPointerException when using VectorizedArrowReader to read a null column
