Commit 48418ca
daemon: add configurable max-age to recycle RPC connections

Long-lived daemon RPC connections stay pinned to a single backend for
their whole lifetime. When electrs connects through a load balancer such
as a Kubernetes ClusterSetIP (`*.clusterset.local`), a connection
established before a node rotation keeps routing to the original backend
via the existing TCP/conntrack flow, even after healthier/closer
backends become available. The connection is only re-established on
error, so a still-working-but-stale endpoint is never rebalanced.

Add a `--daemon-rpc-conn-max-age` option (seconds). When a connection
exceeds the configured age it is proactively recycled before the next
request, re-establishing the TCP connection so the load balancer can
re-select a backend. Defaults to 0 = unlimited (never recycle), so
behavior is unchanged unless explicitly enabled. The age check is also
applied to the per-thread connections used for parallel RPC requests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

daemon: make max-age connection recycling best-effort

Proactive recycling previously called the infinite-retry reconnect path
while holding the daemon connection mutex, before sending the request.
A transient "new connections fail" event at the load balancer could
therefore block all requests on that connection instead of continuing to
use the existing, still-healthy socket -- turning an LB hiccup into an
electrs outage when --daemon-rpc-conn-max-age is enabled.

Split tcp_connect() into a single-attempt tcp_connect_once() (primary
then fallback, no retry/backoff) and keep the looping tcp_connect() for
startup and post-failure reconnects, where there is no usable socket to
fall back to. Max-age recycling now uses try_reconnect_once(): on
success the connection is swapped, on failure we log and keep the
existing connection, retrying recycling on a later request. Real
send/recv failures still go through the existing infinite reconnect.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
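
The split described above (one single-attempt connect, plus a best-effort swap that keeps the old socket on failure) could be sketched roughly as follows. This is an illustration under assumptions, not the code from this commit: `connect_once` and `try_recycle` are hypothetical stand-ins for `tcp_connect_once()` and `try_reconnect_once()`, their signatures are invented for the example, and the `connect` closure stands in for the real TCP connect so the logic can be exercised without a network.

```rust
/// Single attempt: try the primary address, then the optional fallback.
/// No retry loop and no backoff. On success, returns the address that
/// actually connected together with the connection; on failure, returns
/// one descriptive error for the caller to log.
fn connect_once<C>(
    primary: &str,
    fallback: Option<&str>,
    connect: impl Fn(&str) -> Result<C, String>,
) -> Result<(String, C), String> {
    let primary_err = match connect(primary) {
        Ok(conn) => return Ok((primary.to_string(), conn)),
        Err(e) => e,
    };
    match fallback {
        Some(addr) => match connect(addr) {
            Ok(conn) => Ok((addr.to_string(), conn)),
            Err(fallback_err) => Err(format!(
                "primary {primary} failed: {primary_err}; fallback {addr} failed: {fallback_err}"
            )),
        },
        None => Err(format!("primary {primary} failed: {primary_err}")),
    }
}

/// Best-effort recycle: on success the fresh connection replaces `current`;
/// on failure `current` is left untouched and the error is handed back so
/// the caller can log it and try again on a later request.
fn try_recycle<C>(
    current: &mut (String, C),
    attempt: impl FnOnce() -> Result<(String, C), String>,
) -> Result<(), String> {
    let fresh = attempt()?;
    *current = fresh;
    Ok(())
}
```

The point of the shape is that a failed attempt never invalidates the existing connection: the caller only loses the time spent on one connect attempt, and real send/recv failures can still go through a separate retrying path.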

daemon: address Copilot review nits

- config: parse --daemon-rpc-conn-max-age via value_t_or_exit! for
consistent clap error handling instead of a manual parse + panic!.
- daemon: store the actually-connected address (primary or fallback) on
Connection and log it when recycling, so diagnostics aren't misleading
when connected to the fallback daemon.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

daemon: rate-limit failed recycles, add metric + tests

Address review feedback on proactive max-age recycling:
- Blocker: a failed recycle attempt kept the existing connection (good)
but did not update any timestamp, so is_expired() stayed true and every
subsequent request re-attempted the recycle first -- each failed
attempt blocking up to DAEMON_CONNECTION_TIMEOUT under the connection
mutex. During a sustained "new connections fail" event this turned
every fast RPC into a request paying a full connect timeout. Now a
failed attempt records last_recycle_attempt and a cooldown
(DAEMON_CONN_RECYCLE_COOLDOWN, default 30s) gates retries, so the old
socket keeps serving requests at full speed between attempts.
- Extract the recycle decision into a pure `recycle_due()` helper and
cover it with unit tests (max-age boundary, None, and cooldown).
- Add a daemon_rpc_conn_recycled{result="ok|failed"} counter so recycle
behavior is observable in prod.
- tcp_connect_once no longer warns per-attempt; it returns one
descriptive error that callers log, avoiding double log lines on the
recycle path. The startup/error loop logs that error and the backoff.
- Document in --daemon-rpc-conn-max-age help that the reconnect is inline
on the request path, so the value should be generous (minutes).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

1 parent d7c2d33 commit 48418ca
5 files changed
Lines changed: 246 additions & 30 deletions
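
For readers without the diff, the recycle decision described in the commit message (a max age where 0 means unlimited, plus a cooldown after a failed attempt) might look something like the sketch below. Only the name `recycle_due()` comes from the message; the signature, the `max_age_from_secs` helper, and the exact boundary behavior (strictly older than the limit, cooldown inclusive) are assumptions made for illustration.

```rust
use std::time::Duration;

/// Hypothetical helper: map the CLI value in seconds to an optional limit.
/// 0 means "unlimited", i.e. never recycle.
fn max_age_from_secs(secs: u64) -> Option<Duration> {
    if secs == 0 {
        None
    } else {
        Some(Duration::from_secs(secs))
    }
}

/// Pure decision: should this connection be proactively recycled now?
///
/// * `max_age`: `None` disables recycling entirely.
/// * `age`: time since the connection was established.
/// * `since_last_attempt`: time since the last failed recycle attempt, if any.
/// * `cooldown`: minimum wait between attempts after a failure.
fn recycle_due(
    max_age: Option<Duration>,
    age: Duration,
    since_last_attempt: Option<Duration>,
    cooldown: Duration,
) -> bool {
    let limit = match max_age {
        Some(limit) => limit,
        None => return false, // recycling disabled
    };
    if age <= limit {
        // Not older than the limit yet. Treating "exactly at the limit"
        // as not due is an assumption of this sketch.
        return false;
    }
    match since_last_attempt {
        // A recent failure gates the retry until the cooldown has passed.
        Some(elapsed) => elapsed >= cooldown,
        // No failed attempt on record: recycle right away.
        None => true,
    }
}
```

Taking durations rather than reading the clock inside the function keeps the helper deterministic, which is what makes the max-age boundary, `None`, and cooldown cases straightforward to unit test.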