Introduce abstract base classes for Arrow result handling#881
Conversation
…d in databricks#634. Key changes: - Create AbstractRemoteChunkProvider and AbstractArrowResultChunk as unified base implementations for chunk management - Add state machine (ArrowResultChunkStateMachine) to handle chunk lifecycle transitions - Improve thread synchronization between main thread and IO/download threads The new async implementation (2nd PR) will provide an alternative path for handling large Arrow datasets with improved scalability, while maintaining the existing synchronous approach for backwards compatibility. Below is a class diagram that provides a clearer understanding and concise summary of the class structure:  - Fake service tests - Unit tests - Multi-DBR tests - Local testing in a highly concurrent environment Benchmarking document: https://docs.google.com/document/d/1MvKeSnrQVKFGdkuaSPXFCrbb4TDaUqFwHrIW2rxy3zI/edit?usp=sharing
| stateMachine.transition(targetStatus); | ||
| } catch (DatabricksParsingException e) { | ||
| LOGGER.warn( | ||
| "Failed to transition to state [%s] from state [%s] for chunk [%d] and statement [%s]. Stack trace: %s", |
There was a problem hiding this comment.
does it mean that in case of invalid transition, it will be no-op, and user will not see any error? Won't this cause any subsequent failure or data inconsistency?
There was a problem hiding this comment.
yes user won't see any error. i can fail it straight away but i am afraid the current code (which this PR doesn't change) might already have invalid transitions (tech debt). So to not affect any existing workloads, I am just logging this. I have already fixed as many invalid transitions as possible using existing tests but there are chances few remain (because of possible coverage gap). let me know your thoughts.
There was a problem hiding this comment.
maybe i will fail invalid transitions behind a safe-flag or a connection flag in the next PR.
There was a problem hiding this comment.
@vikrantpuppala has the same concern of us not throwing error and consuming it. is it okay if throw the error in a separate PR and separate release behind a private flag? That way easier to manager/revert?
| JdbcLoggerFactory.getLogger(AbstractArrowResultChunk.class); | ||
|
|
||
| protected static final Integer SECONDS_BUFFER_FOR_EXPIRY = 60; | ||
| protected static final long CHUNK_READY_TIMEOUT_SECONDS = 30; |
There was a problem hiding this comment.
can we make this configurable?
|
|
||
| // Initialize valid state transitions | ||
| static { | ||
| VALID_TRANSITIONS.put(PENDING, Set.of(URL_FETCHED, CHUNK_RELEASED)); |
There was a problem hiding this comment.
can't go from pending to failure or cancelled?
There was a problem hiding this comment.
Not in thrift client. When using thrift client, chunks start with status URL_FETCHED. But in SEA, PENDING to DOWNLOAD_FAILED is possible. I will add it. This also indicates a lack of test coverage for SEA client.
gopalldb
left a comment
There was a problem hiding this comment.
Make sure that this is tested through all tests
Re-created from #850
Description
This PR is the first of two and splits the changes originally proposed in #634.
Key changes:
The new async implementation (2nd PR) will provide an alternative path for handling large Arrow datasets with improved scalability, while maintaining the existing synchronous approach for backwards compatibility.
Below is a class diagram that provides a clearer understanding and concise summary of the class structure:
Testing
Additional Notes to the Reviewer
Benchmarking document: https://docs.google.com/document/d/1MvKeSnrQVKFGdkuaSPXFCrbb4TDaUqFwHrIW2rxy3zI/edit?usp=sharing