
chore: skip zero-row batches in ArrowStreamReaderCursor #185

Merged: j10t merged 1 commit into main from moritz/cursor-skip-zero-row-batches on May 11, 2026

@mkaufmann mkaufmann commented May 11, 2026

The ArrowStreamReaderCursor wraps an ArrowStreamReader instance and exposes it as a Cursor for the JDBC ResultSet. The cursor's next() was treating "loadNextBatch returned true" as "I have a row." That is not generally true for Arrow IPC streams: a zero-row batch is valid Arrow IPC. The ResultSet would then have reported that the result is finished even though later batches might still have contained rows, which is a clear correctness bug.

While the current lower protocol layers don't produce such zero-row Arrow batches (at most they may produce a schema message without an Arrow batch message), we want to handle them gracefully to avoid an overly strict dependence on the behavior of a different layer.

The fix is a new loadNextNonEmptyBatch that calls loadNextBatch in a loop and only returns once it lands on a batch with at least one row.
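
As a rough illustration of the loop described above, here is a minimal sketch that uses only the JDK. `BatchSource`, `currentRowCount`, and `sourceOf` are hypothetical stand-ins invented for this example (the real cursor works against an ArrowStreamReader), so this is not the code from the PR, only the shape of the control flow.

```java
import java.util.Iterator;
import java.util.List;

final class NonEmptyBatchSketch {

    /** Hypothetical stand-in for the two reader operations the loop relies on. */
    interface BatchSource {
        /** Advances to the next batch; returns false once the stream is exhausted. */
        boolean loadNextBatch();

        /** Row count of the batch that is currently loaded. */
        int currentRowCount();
    }

    /**
     * Keeps loading batches until one contains at least one row.
     * Returns false only when the stream itself has ended.
     */
    static boolean loadNextNonEmptyBatch(BatchSource source) {
        while (source.loadNextBatch()) {
            if (source.currentRowCount() > 0) {
                return true;
            }
            // Zero-row batch: valid, but there is nothing to expose, so keep going.
        }
        return false;
    }

    /** In-memory source that replays a fixed list of batch sizes (illustration only). */
    static BatchSource sourceOf(List<Integer> rowCounts) {
        Iterator<Integer> it = rowCounts.iterator();
        return new BatchSource() {
            private int current;

            @Override
            public boolean loadNextBatch() {
                if (!it.hasNext()) {
                    return false;
                }
                current = it.next();
                return true;
            }

            @Override
            public int currentRowCount() {
                return current;
            }
        };
    }

    public static void main(String[] args) {
        BatchSource source = sourceOf(List.of(0, 0, 3));
        System.out.println(loadNextNonEmptyBatch(source)); // true: both empty batches were skipped
        System.out.println(source.currentRowCount());      // 3
        System.out.println(loadNextNonEmptyBatch(source)); // false: stream exhausted
    }
}
```

The helper returns false only when the underlying stream ends, so a caller can treat false as "no more rows" regardless of how many empty batches were interleaved.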

codecov Bot commented May 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.37%. Comparing base (092ac4d) to head (65ac88b).

Additional details and impacted files
@@            Coverage Diff            @@
##               main     #185   +/-   ##
=========================================
  Coverage     82.36%   82.37%           
- Complexity     1866     1867    +1     
=========================================
  Files           125      125           
  Lines          5008     5009    +1     
  Branches        536      537    +1     
=========================================
+ Hits           4125     4126    +1     
  Misses          641      641           
  Partials        242      242           
| Components | Coverage Δ |
|---|---|
| JDBC Core | 83.15% <100.00%> (+<0.01%) ⬆️ |
| JDBC Main | 40.69% <ø> (ø) |
| JDBC HTTP | 90.30% <ø> (ø) |
| JDBC Utilities | 65.25% <ø> (ø) |
| Spark Datasource | ∅ <ø> (∅) |

| Files with missing lines | Coverage Δ |
|---|---|
| ...e/datacloud/jdbc/core/ArrowStreamReaderCursor.java | 90.62% <100.00%> (+0.30%) ⬆️ |

@mkaufmann mkaufmann force-pushed the moritz/cursor-skip-zero-row-batches branch 2 times, most recently from 21439c0 to 4787487 Compare May 11, 2026 18:18
@mkaufmann mkaufmann changed the title fix: skip zero-row batches in ArrowStreamReaderCursor chore: skip zero-row batches in ArrowStreamReaderCursor May 11, 2026
The cursor's next() was treating "loadNextBatch returned true" as
"I have a row." That's wrong: a zero-row batch is valid Arrow IPC.
Hyper sends one as the first chunk on some queries, and the async
chunked-query path uses empty batches as keep-alives. So the cursor
reports a phantom row that isn't there.

The fix is a new loadNextNonEmptyBatch that calls loadNextBatch in a
loop and only returns once it lands on a batch with at least one row.
JDBC's ResultSet.next() has no concept of a batch, so swallowing the
empty ones has to happen in the cursor. Pushing it out to callers
would mean every JDBC user has to know about Arrow's batch
boundaries.

Tests:
- skipsZeroRowBatchAndYieldsSubsequentNonEmptyRows: real Arrow IPC
  stream of {0-row, 1-row}, next() reports the one real row, then
  false.
- zeroRowOnlyBatchYieldsNoRows: real Arrow IPC stream of {0-row},
  next() returns false.
- firstNextReturnsTrueWhenInitialBatchHasRows /
  firstNextReturnsFalseWhenStreamHasNoBatches replace the old
  parameterised forwardsLoadNextBatch test, which would loop forever
  under the new control flow.
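
The scenarios listed above can be approximated without Arrow at all. The sketch below is an assumption-laden model, not the project's test code: `Cursor` and `countRows` are hypothetical names, and a stream is represented only by the list of its batch sizes. It shows how a `next()` that skips empty batches behaves for {0-row, 1-row}, {0-row}, a non-empty first batch, and a stream with no batches.

```java
import java.util.Iterator;
import java.util.List;

final class CursorScenarioSketch {

    /** Hypothetical cursor over a stream that is described only by its batch sizes. */
    static final class Cursor {
        private final Iterator<Integer> batchSizes;
        private int rowsInBatch = 0; // rows in the currently loaded batch
        private int rowIndex = -1;   // position inside the current batch

        Cursor(List<Integer> batchSizes) {
            this.batchSizes = batchSizes.iterator();
        }

        /** Mirrors the contract of ResultSet.next(): true if positioned on a row. */
        boolean next() {
            if (rowIndex + 1 < rowsInBatch) {
                rowIndex++;
                return true;
            }
            // Current batch is used up: skip ahead to the next batch that has rows.
            while (batchSizes.hasNext()) {
                rowsInBatch = batchSizes.next();
                if (rowsInBatch > 0) {
                    rowIndex = 0;
                    return true;
                }
            }
            // Stream exhausted: reset so that repeated calls keep returning false.
            rowsInBatch = 0;
            rowIndex = -1;
            return false;
        }
    }

    /** Drains a cursor and reports how many rows it exposed. */
    static int countRows(List<Integer> batchSizes) {
        Cursor cursor = new Cursor(batchSizes);
        int rows = 0;
        while (cursor.next()) {
            rows++;
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(countRows(List.of(0, 1))); // 1: the empty batch is skipped
        System.out.println(countRows(List.of(0)));    // 0: only an empty batch
        System.out.println(countRows(List.of(2)));    // 2: initial batch has rows
        System.out.println(countRows(List.of()));     // 0: no batches at all
    }
}
```

Note that every list here is finite. A source that reports an unbounded sequence of empty batches would keep such a loop spinning, which is consistent with the remark above that the old parameterised test would loop forever under the new control flow.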
@mkaufmann mkaufmann force-pushed the moritz/cursor-skip-zero-row-batches branch from 4787487 to 65ac88b Compare May 11, 2026 19:03
@j10t j10t merged commit 1bda7d4 into main May 11, 2026
13 checks passed
@j10t j10t deleted the moritz/cursor-skip-zero-row-batches branch May 11, 2026 20:50

3 participants