fix(clickhouse): retry reads on transient connection errors#4159
Draft
claude[bot] wants to merge 2 commits into
Draft
fix(clickhouse): retry reads on transient connection errors#4159claude[bot] wants to merge 2 commits into
claude[bot] wants to merge 2 commits into
Conversation
ClickHouse reads (query, queryWithStats, queryFast) ran the underlying client.query once and, on any failure, returned a QueryError with no retry. Transient connection errors — most commonly a dead keep-alive socket surfacing as ECONNRESET on the first use of a pooled connection, but also EPIPE/ETIMEDOUT/"socket hang up" — are safe to retry and should not fail the request. - Add isRetryableConnectionError to classify transient connection errors by socket code or message substring, unwrapping a nested cause. Server-side ClickHouseErrors (e.g. SQL errors) are never classified as retryable. - Wrap the read client.query calls in a bounded retry (default 3 attempts, full-jitter exponential backoff). Attempt count and delays are configurable via ClickhouseConfig.connectionRetry so callers and tests can tune or disable the backoff. - Fix the keep-alive config mapping: @clickhouse/client expects snake_case idle_socket_ttl, so the camelCase idleSocketTtl was being silently dropped and the idle-socket TTL never honored. - Add unit tests covering retry-then-success, bounded give-up, non-retryable server errors, and error classification. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B5mP4ULoVXdrAWmiMQh5bG
|
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B5mP4ULoVXdrAWmiMQh5bG
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Requested by Dan Sutton · Slack thread
✅ Checklist
Testing
Added
client.retry.test.ts— 13 unit tests (no Docker required) covering:ClickHouseError) failing fast without retryisRetryableConnectionErrorclassifier (socketcode/ message matching,causeunwrapping)Changelog
Before: The ClickHouse read helpers (
query,queryWithStats,queryFast) executed the underlyingclient.queryexactly once. If the call failed with a transient connection error — most often a dead keep-alive socket surfacing asECONNRESETon first use of a pooled connection, but alsoEPIPE/ETIMEDOUT/socket hang up— the read failed immediately, even though a retry on a fresh socket would have succeeded. Separately, thekeepAlive.idleSocketTtlconfig was silently ignored because@clickhouse/clientexpects the snake_case keyidle_socket_ttl.After: Reads transparently retry on transient connection errors with a small bounded backoff (default 3 attempts, full-jitter exponential). Server-side errors (e.g. SQL errors /
ClickHouseError) are never retried and still fail fast. The idle-socket TTL is now honored.How: Added
isRetryableConnectionError(classifies by socketcode/ message, unwrapscause, excludesClickHouseError) and aqueryWithConnectionRetrywrapper around the three readclient.querycalls ininternal-packages/clickhouse/src/client/client.ts. Retry attempts/backoff are configurable via a newClickhouseConfig.connectionRetryoption. Fixed the keep-alive mapping toidle_socket_ttl(confirmed against@clickhouse/client@1.12.1types).Screenshots
N/A
💯