Introduce abstract base classes for Arrow result handling by jayantsing-db · Pull Request #881 · databricks/databricks-jdbc

jayantsing-db · 2025-07-06T18:46:29Z

Re-created from #850

Description

This PR is the first of two and splits the changes originally proposed in #634.

Key changes:

- Create AbstractRemoteChunkProvider and AbstractArrowResultChunk as unified base implementations for chunk management
- Add state machine (ArrowResultChunkStateMachine) to handle chunk lifecycle transitions
- Improve thread synchronization between main thread and IO/download threads
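
As a rough illustration of the third point, the handoff between a download thread and the main thread can be pictured as a status guarded by a lock, with the main thread waiting on a condition for a bounded time. This is only a sketch under assumptions: `ChunkHandoffSketch`, `markReady`, `markFailed`, and `awaitReady` are hypothetical names and are not taken from the PR.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; these are not the driver's actual classes.
class ChunkHandoffSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition statusChanged = lock.newCondition();
  private boolean ready = false;     // guarded by lock
  private Throwable failure = null;  // guarded by lock

  // Called by the download thread once the chunk's data is available.
  void markReady() {
    lock.lock();
    try {
      ready = true;
      statusChanged.signalAll();
    } finally {
      lock.unlock();
    }
  }

  // Called by the download thread when the download fails.
  void markFailed(Throwable cause) {
    lock.lock();
    try {
      failure = cause;
      statusChanged.signalAll();
    } finally {
      lock.unlock();
    }
  }

  // Called by the main thread. Returns true when the chunk is ready,
  // false when the timeout elapses, and throws if the download failed.
  boolean awaitReady(long timeout, TimeUnit unit) throws InterruptedException {
    long nanosLeft = unit.toNanos(timeout);
    lock.lock();
    try {
      // Loop to tolerate spurious wakeups.
      while (!ready && failure == null) {
        if (nanosLeft <= 0L) {
          return false;
        }
        nanosLeft = statusChanged.awaitNanos(nanosLeft);
      }
      if (failure != null) {
        throw new IllegalStateException("Chunk download failed", failure);
      }
      return true;
    } finally {
      lock.unlock();
    }
  }
}
```

The main thread never polls here; it blocks until the download thread signals a change or the timeout runs out.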

The new async implementation (2nd PR) will provide an alternative path for handling large Arrow datasets with improved scalability, while maintaining the existing synchronous approach for backwards compatibility.

Below is a class diagram that summarizes the class structure:

![class-diagram](https://github.com/user-attachments/assets/e9503d47-5895-439a-9d39-f4963da3e5df)

Testing

- Fake service tests
- Unit tests
- Multi-DBR tests
- Local testing in a highly concurrent environment

Additional Notes to the Reviewer

Benchmarking document: https://docs.google.com/document/d/1MvKeSnrQVKFGdkuaSPXFCrbb4TDaUqFwHrIW2rxy3zI/edit?usp=sharing

gopalldb · 2025-07-09T10:52:51Z

+      stateMachine.transition(targetStatus);
+    } catch (DatabricksParsingException e) {
+      LOGGER.warn(
+          "Failed to transition to state [%s] from state [%s] for chunk [%d] and statement [%s]. Stack trace: %s",


Does this mean that an invalid transition is a no-op and the user will not see any error? Won't this cause subsequent failures or data inconsistency?

Yes, the user won't see any error. I could fail it straight away, but I am afraid the current code (which this PR doesn't change) might already have invalid transitions (tech debt). So, to avoid affecting existing workloads, I am only logging this. I have already fixed as many invalid transitions as possible using the existing tests, but a few may remain because of possible coverage gaps. Let me know your thoughts.

Maybe I will fail invalid transitions behind a safe-flag or a connection flag in the next PR.

@vikrantpuppala has the same concern about us consuming the error instead of throwing it. Is it okay if I throw the error in a separate PR and a separate release, behind a private flag? That would make it easier to manage or revert.
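
To make the trade-off in this thread concrete, here is a minimal sketch of "log by default, fail behind a flag". Everything in it is an assumption for illustration: `GuardedChunkStatusSketch`, the `failOnInvalidTransition` flag, and the reduced set of statuses are hypothetical, and `IllegalStateException` stands in for whatever exception the driver would actually throw.

```java
import java.util.logging.Logger;

// Hypothetical sketch of "log by default, fail behind a flag"; not the driver's code.
class GuardedChunkStatusSketch {
  enum Status { PENDING, URL_FETCHED, CHUNK_RELEASED }

  private static final Logger LOGGER =
      Logger.getLogger(GuardedChunkStatusSketch.class.getName());

  // Hypothetical flag; the thread only mentions "a safe-flag or a connection flag".
  private final boolean failOnInvalidTransition;
  private Status status = Status.PENDING;

  GuardedChunkStatusSketch(boolean failOnInvalidTransition) {
    this.failOnInvalidTransition = failOnInvalidTransition;
  }

  Status status() {
    return status;
  }

  // Returns true if the transition was applied. An invalid transition is either
  // rejected with an exception (flag on) or logged and ignored (flag off).
  boolean setStatus(Status target) {
    if (isValid(status, target)) {
      status = target;
      return true;
    }
    String message =
        String.format("Invalid transition to state [%s] from state [%s]", target, status);
    if (failOnInvalidTransition) {
      throw new IllegalStateException(message);
    }
    LOGGER.warning(message);
    return false;
  }

  // Only the PENDING transitions quoted in the review are modeled here;
  // this sketch treats every other transition as invalid.
  private static boolean isValid(Status from, Status to) {
    return from == Status.PENDING
        && (to == Status.URL_FETCHED || to == Status.CHUNK_RELEASED);
  }
}
```

With the flag off, the state is left unchanged and the caller sees no error, which is the behavior questioned above. With the flag on, the same call surfaces immediately as an exception.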

gopalldb · 2025-07-09T10:53:31Z

+      JdbcLoggerFactory.getLogger(AbstractArrowResultChunk.class);
+
+  protected static final Integer SECONDS_BUFFER_FOR_EXPIRY = 60;
+  protected static final long CHUNK_READY_TIMEOUT_SECONDS = 30;


can we make this configurable?
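
One way to read this request, sketched under assumptions: keep 30 seconds as the default and let a property override it. The property key `ChunkReadyTimeoutSeconds` and the class `ChunkTimeoutConfigSketch` are hypothetical; the PR's actual configuration name is not shown in this thread.

```java
import java.util.Properties;

// Hypothetical sketch; the property key below is an assumption, not the driver's real name.
class ChunkTimeoutConfigSketch {
  static final long DEFAULT_CHUNK_READY_TIMEOUT_SECONDS = 30;
  static final String TIMEOUT_PROPERTY = "ChunkReadyTimeoutSeconds"; // hypothetical key

  // Returns the configured timeout, falling back to the default when the
  // property is missing, blank, non-numeric, or not positive.
  static long chunkReadyTimeoutSeconds(Properties props) {
    String raw = props.getProperty(TIMEOUT_PROPERTY);
    if (raw == null || raw.trim().isEmpty()) {
      return DEFAULT_CHUNK_READY_TIMEOUT_SECONDS;
    }
    try {
      long value = Long.parseLong(raw.trim());
      return value > 0 ? value : DEFAULT_CHUNK_READY_TIMEOUT_SECONDS;
    } catch (NumberFormatException e) {
      return DEFAULT_CHUNK_READY_TIMEOUT_SECONDS;
    }
  }
}
```

Falling back to the default on bad input keeps existing behavior for users who never set the property.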

gopalldb · 2025-07-09T11:01:57Z

+
+  // Initialize valid state transitions
+  static {
+    VALID_TRANSITIONS.put(PENDING, Set.of(URL_FETCHED, CHUNK_RELEASED));


Can't a chunk go from PENDING to a failed or cancelled state?

Not in the Thrift client. With the Thrift client, chunks start with status URL_FETCHED. In SEA, however, PENDING to DOWNLOAD_FAILED is possible, so I will add it. This also indicates a lack of test coverage for the SEA client.
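
For illustration, the transition table discussed here could be expressed on the enum itself, with the PENDING to DOWNLOAD_FAILED transition included. This is a sketch under assumptions: `ChunkStatusSketch`, `validTargets`, and `canTransitionTo` are hypothetical names, only the statuses named in this thread appear, and transitions out of states other than PENDING are left empty because the thread does not quote them.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; only the statuses named in this review thread are listed.
enum ChunkStatusSketch {
  PENDING,
  URL_FETCHED,
  DOWNLOAD_FAILED,
  CHUNK_RELEASED;

  // Statuses that may directly follow this one.
  Set<ChunkStatusSketch> validTargets() {
    switch (this) {
      case PENDING:
        // URL_FETCHED and CHUNK_RELEASED come from the quoted diff;
        // DOWNLOAD_FAILED is the SEA case raised in the review.
        return EnumSet.of(URL_FETCHED, DOWNLOAD_FAILED, CHUNK_RELEASED);
      default:
        // Not quoted in the thread, so this sketch leaves them empty.
        return EnumSet.noneOf(ChunkStatusSketch.class);
    }
  }

  boolean canTransitionTo(ChunkStatusSketch target) {
    return validTargets().contains(target);
  }
}
```

Keeping the table next to the enum constants means a newly added status or transition is declared in one place.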

gopalldb

Make sure that this is tested through all tests

vikrantpuppala reviewed Jul 7, 2025

Comment thread src/main/java/com/databricks/jdbc/api/impl/arrow/ArrowResultChunkStateMachine.java Outdated

jayantsing-db added 2 commits July 7, 2025 16:21

Address review comments

a70fc3c

move transitions to enum

1fbe684

jayantsing-db requested review from gopalldb and samikshya-db July 7, 2025 11:17

gopalldb reviewed Jul 9, 2025

gopalldb approved these changes Jul 9, 2025

jayantsing-db added 3 commits July 11, 2025 13:24

Make ready timeout config

dbf09fa

Merge remote-tracking branch 'databricks/main' into re-http-client

e6bfed8

Add a transition

7c0c95d

jayantsing-db merged commit 5798a0d into databricks:main Jul 11, 2025
10 of 12 checks passed

jayantsing-db deleted the jayantsing-db/re-http-client branch July 11, 2025 08:37

jayantsing-db merged 6 commits into databricks:main from jayantsing-db:jayantsing-db/re-http-client

3 participants
