Address review feedback on proactive max-age recycling:

- Blocker: a failed recycle attempt kept the existing connection (good)
  but did not update any timestamp, so is_expired() stayed true and every
  subsequent request re-attempted the recycle first -- each failed
  attempt blocking up to DAEMON_CONNECTION_TIMEOUT under the connection
  mutex. During a sustained "new connections fail" event this turned
  every fast RPC into a request paying a full connect timeout. Now a
  failed attempt records last_recycle_attempt and a cooldown
  (DAEMON_CONN_RECYCLE_COOLDOWN, default 30s) gates retries, so the old
  socket keeps serving requests at full speed between attempts.
- Extract the recycle decision into a pure `recycle_due()` helper and
  cover it with unit tests (max-age boundary, None, and cooldown).
- Add a daemon_rpc_conn_recycled{result="ok|failed"} counter so recycle
  behavior is observable in prod.
- tcp_connect_once no longer warns per-attempt; it returns one
  descriptive error that callers log, avoiding double log lines on the
  recycle path. The startup/error loop logs that error + backoff.
- Document in --daemon-rpc-conn-max-age help that the reconnect is inline
  on the request path, so the value should be generous (minutes).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
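
The recycle decision described in the commit message can be sketched as a pure function over timestamps. This is only an illustration, not the code from this commit: the signature of `recycle_due()`, the parameter names, the `RECYCLE_COOLDOWN` constant, and the choice to treat a connection as expired once its age is greater than or equal to the max age are all assumptions. Only the helper's name, the `last_recycle_attempt` field, and the 30s default cooldown come from the message.

```rust
use std::time::{Duration, Instant};

/// Assumed stand-in for DAEMON_CONN_RECYCLE_COOLDOWN (default 30s per the commit message).
const RECYCLE_COOLDOWN: Duration = Duration::from_secs(30);

/// Hypothetical shape of the pure decision helper: should a recycle be attempted now?
/// `max_age == None` means "never recycle".
fn recycle_due(
    now: Instant,
    connected_at: Instant,
    last_recycle_attempt: Option<Instant>,
    max_age: Option<Duration>,
    cooldown: Duration,
) -> bool {
    // No max age configured: never recycle.
    let max_age = match max_age {
        Some(age) => age,
        None => return false,
    };
    // Connection is still young enough (assumption: expiry is inclusive at max_age).
    if now.saturating_duration_since(connected_at) < max_age {
        return false;
    }
    // Expired: retry only if no attempt failed recently, so the old socket
    // keeps serving requests between attempts.
    match last_recycle_attempt {
        Some(at) => now.saturating_duration_since(at) >= cooldown,
        None => true,
    }
}
```

Because the function takes `now` as an argument instead of reading the clock, the three cases named in the message (max-age boundary, `None`, and cooldown) can be tested without sleeping. A caller would record `last_recycle_attempt` when a reconnect fails and pass it back in on the next request.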
src/config.rs: 1 addition, 1 deletion
@@ -182,7 +182,7 @@ impl Config {
             .arg(
                 Arg::with_name("daemon_rpc_conn_max_age")
                     .long("daemon-rpc-conn-max-age")
-                    .help("Max age (in seconds) of a daemon RPC TCP connection before it is proactively recycled. Recycling re-establishes the connection, letting a load balancer (e.g. a Kubernetes ClusterSetIP) re-select a backend after node rotations. 0 = unlimited / never recycle (default)")
+                    .help("Max age (in seconds) of a daemon RPC TCP connection before it is proactively recycled. Recycling re-establishes the connection, letting a load balancer (e.g. a Kubernetes ClusterSetIP) re-select a backend after node rotations. The reconnect happens inline on the next request, so prefer a generous value (minutes, not seconds) to avoid periodic latency spikes. 0 = unlimited / never recycle (default)")
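
The "0 = unlimited / never recycle" convention in the help text maps naturally onto an optional duration. The following is a sketch under that assumption; `max_age_from_secs` is a hypothetical helper name and is not taken from this diff.

```rust
use std::time::Duration;

/// Hypothetical helper: convert the raw flag value (in seconds) into an
/// optional max age, where 0 means "never recycle".
fn max_age_from_secs(secs: u64) -> Option<Duration> {
    if secs == 0 {
        None
    } else {
        Some(Duration::from_secs(secs))
    }
}
```

With this shape, a value such as 600 (ten minutes) yields `Some(Duration::from_secs(600))`, in line with the help text's advice to prefer minutes over seconds.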