fix: improve driver-side timeout logging with pool/channel diagnostics (4.x)#883
Merged
dkropachev merged 1 commit on May 13, 2026
Pull request overview
This PR improves the debuggability of driver-side request timeouts by enriching DriverTimeoutException with per-node pool/channel diagnostics captured at the moment the timeout fires, and by having request handlers provide those diagnostics when constructing the exception.
Changes:
- Add DriverTimeoutException.NodeDiagnostics, UNAVAILABLE sentinel, and a new constructor that generates an enriched timeout message and exposes diagnostics programmatically.
- Update CQL, Graph, and Continuous Paging request handlers to collect per-node diagnostics on timeout and pass them into the new exception constructor.
- Relax a couple of Simulacron integration tests to accept the enriched timeout message format.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| integration-tests/src/test/java/com/datastax/oss/driver/core/cql/SimpleStatementSimulacronIT.java | Adjust assertion to tolerate enriched timeout messages. |
| integration-tests/src/test/java/com/datastax/oss/driver/core/cql/BoundStatementSimulacronIT.java | Adjust assertions to tolerate enriched timeout messages. |
| core/src/main/java/com/datastax/oss/driver/internal/core/cql/CqlRequestHandler.java | Capture node/channel/pool diagnostics at timeout time and pass to DriverTimeoutException. |
| core/src/main/java/com/datastax/oss/driver/internal/core/cql/CqlPrepareHandler.java | Capture node/channel/pool diagnostics for prepare timeouts and pass to DriverTimeoutException. |
| core/src/main/java/com/datastax/oss/driver/api/core/DriverTimeoutException.java | Introduce NodeDiagnostics API, new constructor, diagnostics accessor, and message building. |
| core/src/main/java/com/datastax/dse/driver/internal/core/graph/GraphRequestHandler.java | Capture diagnostics for Graph request timeouts and pass to DriverTimeoutException. |
| core/src/main/java/com/datastax/dse/driver/internal/core/cql/continuous/ContinuousRequestHandlerBase.java | Capture diagnostics for global/page continuous paging timeouts and pass to DriverTimeoutException. |
dkropachev
reviewed
May 11, 2026
…tics (DRIVER-540) Add NodeDiagnostics public inner class to DriverTimeoutException with fields for in-flight counts and pool capacity, and generate a diagnostic suffix in the exception message at timeout time. Refactor all four request-handler timeout paths (CqlRequestHandler, CqlPrepareHandler, GraphRequestHandler, ContinuousRequestHandlerBase) to build List<NodeDiagnostics> instead of a raw message string. Update IT assertions that matched exact message strings to use hasMessageStartingWith() to accommodate the new suffix.
dkropachev
approved these changes
May 13, 2026
Fixes: DRIVER-540
4.x port of #867 (which targeted scylla-3.x).
Supersedes #874 (closed; AI had committed directly onto scylla-4.x instead of a feature branch).
Problem
When a DriverTimeoutException fires, the existing log output gives no information about why it happened. There are two distinct failure modes that look identical: pool contention, where requests queue inside the driver before reaching the server, and a slow server, where requests were sent but not answered within the timeout (e.g. a reader_concurrency_semaphore stall). In the second case the driver gave up, but the server may still be processing the requests, orphaning stream IDs.
Without knowing which nodes had in-flight requests and what the pool state looked like at timeout time, diagnosing which scenario occurred requires guesswork.
Changes
DriverTimeoutException (public API)
A new NodeDiagnostics public inner class is added with fields captured at timeout time:
- channelInFlight
- poolInFlight
- poolAvailableIds
- poolOrphanedIds
Pool fields are set to UNAVAILABLE (-1) when the pool was removed between dispatch and timeout.
A new constructor DriverTimeoutException(String baseMessage, List<NodeDiagnostics>) generates the full message internally (per reviewer request). The existing single-arg constructor is unchanged. getNodeDiagnostics() allows callers to inspect fields programmatically, consistent with the 3.x OperationTimedOutException approach.
Message format
Diagnosing failure modes:
- poolAvailableIds near zero → pool contention; requests queuing inside the driver before reaching the server
- poolAvailableIds normal + high channelInFlight → server is slow; requests sent but not answered within the timeout
- poolOrphanedIds → previous timeouts consumed stream IDs that the driver is still waiting to reclaim
Handler files
Each handler builds a List<NodeDiagnostics> at the moment the timer fires and passes it to the new constructor:
- CqlRequestHandler.java: iterates inFlightCallbacks (multi-node list)
- GraphRequestHandler.java: same pattern
- CqlPrepareHandler.java: single-element list from initialCallback
- ContinuousRequestHandlerBase.java: scheduleGlobalTimeout (multi-node list) + onPageTimeout (single-element list)
The per-handler buildTimeoutMessage() / buildGlobalTimeoutMessage() helpers are removed; message generation now lives in DriverTimeoutException.buildMessage().
Architecture differences from 3.x
In 3.x, the timeout is per-node (SpeculativeExecution.onTimeout()). In 4.x, a single timer covers the entire request, but multiple nodes can be in-flight simultaneously (speculative executions + retries).
| 3.x | 4.x |
|---|---|
| RequestHandler.SpeculativeExecution.onTimeout() | CqlRequestHandler.scheduleTimeout() |
| HostConnectionPool.pendingBorrowCount | ChannelPool.getAvailableIds() (inverse indicator) |
| HostConnectionPool.totalInFlight | ChannelPool.getInFlight() |
| Connection.inFlight | DriverChannel.getInFlight() |
Notes
inFlightCallbacks is a CopyOnWriteArrayList, session.getPools() returns a ConcurrentHashMap, and pool/channel stats use atomic operations.