chore: skip zero-row batches in ArrowStreamReaderCursor#185
Merged
Conversation
3 tasks
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #185 +/- ##
=========================================
Coverage 82.36% 82.37%
- Complexity 1866 1867 +1
=========================================
Files 125 125
Lines 5008 5009 +1
Branches 536 537 +1
=========================================
+ Hits 4125 4126 +1
Misses 641 641
Partials 242 242
🚀 New features to boost your workflow:
|
21439c0 to
4787487
Compare
The cursor's next() was treating "loadNextBatch returned true" as
"I have a row." That's wrong: a zero-row batch is valid Arrow IPC.
Hyper sends one as the first chunk on some queries, and the async
chunked-query path uses empty batches as keep-alives. So the cursor
reports a phantom row that isn't there.
The fix is a new loadNextNonEmptyBatch that calls loadNextBatch in a
loop and only returns once it lands on a batch with at least one row.
JDBC's ResultSet.next() has no concept of a batch, so swallowing the
empty ones has to happen in the cursor. Pushing it out to callers
would mean every JDBC user has to know about Arrow's batch
boundaries.
Tests:
- skipsZeroRowBatchAndYieldsSubsequentNonEmptyRows: real Arrow IPC
stream of {0-row, 1-row}, next() reports the one real row, then
false.
- zeroRowOnlyBatchYieldsNoRows: real Arrow IPC stream of {0-row},
next() returns false.
- firstNextReturnsTrueWhenInitialBatchHasRows /
firstNextReturnsFalseWhenStreamHasNoBatches replace the old
parameterised forwardsLoadNextBatch test, which would loop forever
under the new control flow.
4787487 to
65ac88b
Compare
KaviarasuSakthivadivel
approved these changes
May 11, 2026
j10t
approved these changes
May 11, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The
ArrowStreamReaderCursorwraps aArrowStreamReaderinstance and exposes it as a Cursor for the JDBC ResultSet. The cursor'snext()was treating "loadNextBatchreturned true" as "I have a row.". That's not generally true for Arrow IPC streams: a zero-row batch is valid Arrow IPC. The ResultSet would then have reported that the result is finished even though further batches might have resulted in rows - a clear correctness bug.While the current lower protocol layers don't produce such zero row ArrowBatches (they only potentially produce a schema message without arrow batch message), we want to gracefully handle that to avoid overly strict dependance on behavior of a different layer.
The fix is a new
loadNextNonEmptyBatchthat callsloadNextBatchin a loop and only returns once it lands on a batch with at least one row.