Add bounded SEA API support for CloudFetch (UseBoundedSeaApi)#1468
Open
gopalldb wants to merge 7 commits into
Open
Add bounded SEA API support for CloudFetch (UseBoundedSeaApi)#1468gopalldb wants to merge 7 commits into
gopalldb wants to merge 7 commits into
Conversation
e8e6bde to
e9dcd7c
Compare
d6b9994 to
1fa1b2b
Compare
6 tasks
Part 1 of bounded SEA API compliance for CloudFetch:
1. New connection property UseBoundedSeaApi (default 0/off). When enabled:
- Sends row_offset query parameter on GetResultData requests
- Forces StreamingChunkProvider (which uses next_chunk_index, not
total_chunk_count) even when streaming is explicitly disabled
2. StreamingChunkProvider already uses next_chunk_index for continuation
and end-of-stream detection — no changes needed to its core logic.
3. Legacy RemoteChunkProvider (uses total_chunk_count) is bypassed when
bounded SEA is enabled.
row_offset is derived from the previous link's row_offset + row_count
and sent as a query parameter on /sql/statements/{id}/result/chunks/{idx}.
This is required for future >100GB results and cluster-side fetch.
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
During coalesced link refresh, the server may return links for chunks not yet in the provider's map (newly-discovered chunks beyond highestKnownChunkIndex). Previously these were silently skipped. Now: create new chunks from refresh response links, update highestKnownChunkIndex, and set endOfStreamReached from the response's hasMore flag. Follows the per-chunk state-machine reconciliation from the bounded SEA API spec. Co-authored-by: Isaac Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Fixes 3 gaps found by comparing with the legacy ChunkDownloadTask: 1. Outer catch(Throwable) + exception chaining in finally: uncaught exceptions were lost — the finally block created a generic exception without the original cause. Now captures uncaughtException and chains it, matching ChunkDownloadTask's pattern. 2. Thread context propagation: download threads had no connection context or statementId for telemetry/logging. Now captures caller's context via DatabricksThreadContextHolder and clears in finally. 3. Download timing: added task-level timing log (totalMs, retries) matching ChunkDownloadTask's diagnostics. Also includes the RuntimeException catch (parity with PR databricks#1302). Co-authored-by: Isaac Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
P0-1: Remove redundant chunk.setStatus(DOWNLOAD_FAILED) in inner catch — defer entirely to finally block. Fixes StreamingChunkDownloadTaskTest. P0-2: Add NEXT_CHANGELOG.md entry under ### Added for UseBoundedSeaApi. P1-1: Call triggerDownloads() after reconciliation creates new chunks from refresh response — prevents newly-discovered chunks sitting PENDING. P1-2/P1-3: Un-gated changes (new chunk creation, EOS from refresh, triggerDownloads) are intentional parity fixes for all EnableStreamingChunkProvider=1 users. EnableStreamingChunkProvider defaults to off, so default users are unaffected. P1-4: Revert RuntimeException from inner catch — DatabricksError is caught by outer catch(Throwable) and fails immediately (no retry), matching ChunkDownloadTask behavior exactly. NPE/ISE won't be retried. P2-1: Always send row_offset (even 0 for chunk 0) when bounded SEA enabled — explicit is safer than relying on server default. P2-3: Update nextLinkFetchIndex after reconciliation to avoid prefetch thread re-fetching chunks already discovered via refresh. P2-5: Add "Requires server support" to connection property help text. Co-authored-by: Isaac Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
…on updates P1-1: Use putIfAbsent in createChunkFromLink to prevent double row-count and chunk replacement when called concurrently from prefetch and download threads. Without this, a race could leave a CompletableFuture that is never completed, causing consumer hangs. P1-2: Bundle nextLinkFetchIndex + nextRowOffsetToFetch into an immutable FetchPosition holder updated via volatile reference. This ensures the prefetch thread always reads a consistent (index, rowOffset) pair, which matters for bounded SEA API where row_offset is used by the server. P2-1: Bump reconciliation failure log from DEBUG to WARN for production visibility. Signed-off-by: Gopal Lal <gopal.lal@databricks.com> Co-authored-by: Isaac Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
5801928 to
7e0b2c1
Compare
…h path
Per the bounded SEA API contract, drivers must not depend on
manifest.{chunks, total_chunk_count, total_row_count}. Pass null for
totalChunkCount when converting initial ExternalLinks for
StreamingChunkProvider — the provider derives end-of-stream from
next_chunk_index on ExternalLink instead.
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
…ast/isAfterLast When boundedSeaApiEnabled=true, isLast() and isAfterLast() now use hasNext() instead of resultSetMetaData.getTotalRows(). The bounded SEA API contract does not guarantee manifest.total_row_count is populated; the chunk providers derive end-of-stream from next_chunk_index instead. Gated behind isBoundedSeaApiEnabled() + ArrowStreamResult instanceof check so existing Thrift and non-bounded SEA behavior is unchanged. Signed-off-by: Gopal Lal <gopal.lal@databricks.com> Co-authored-by: Isaac Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
UseBoundedSeaApiconnection property (default0) to gate bounded SEA API complianceStreamingChunkProvider(usesnext_chunk_indexchaining instead of deprecatedtotal_chunk_count), sendrow_offseton GetResultData, add batched link refresh reconciliationStreamingChunkDownloadTaskto parity withChunkDownloadTask(outercatch(Throwable), thread context propagation, exception chaining)Key design decisions
UseBoundedSeaApiis off by default — no behavioral change without opt-inStreamingChunkProvideris already bounded-SEA-compliant (usesnext_chunk_index), only needed to addrow_offsetand force its selectionTest plan
UseBoundedSeaApi=0(default) — no behavioral change, existing CloudFetch and inline paths unchangedUseBoundedSeaApi=1with CloudFetch enabled — usesStreamingChunkProviderwithrow_offsetNO_CHANGELOG=true
This pull request was AI-assisted by Isaac.