apollo_infra: add tcp keepalive to remote server accepted sockets#13456
Conversation
|
Artifacts upload workflows: |
8336741 to
a5e9d19
Compare
fae3094 to
6745e24
Compare
matanl-starkware
left a comment
There was a problem hiding this comment.
@matanl-starkware reviewed 19 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on Itay-Tsabary-Starkware).
crates/apollo_infra/src/component_server/remote_component_server.rs line 88 at r1 (raw file):
/// TCP keepalive: milliseconds an accepted socket may be idle before the OS sends probes. /// `None` leaves TCP keepalive unset (OS defaults apply). pub tcp_keepalive_idle_time_ms: Option<u64>,
Why do we need both tcp_keepalive and hyper_keepalive?
Code quote:
pub tcp_keepalive_idle_time_ms: Option<u64>,a5e9d19 to
0c8d5cc
Compare
6745e24 to
b67ffbc
Compare
b67ffbc to
98fdf82
Compare
0c8d5cc to
ec24890
Compare
98fdf82 to
e6749d5
Compare
Itay-Tsabary-Starkware
left a comment
There was a problem hiding this comment.
@Itay-Tsabary-Starkware made 1 comment.
Reviewable status: 0 of 23 files reviewed, 1 unresolved discussion (waiting on matanl-starkware).
crates/apollo_infra/src/component_server/remote_component_server.rs line 88 at r1 (raw file):
Previously, matanl-starkware (Matan Lior) wrote…
Why do we need both tcp_keepalive and hyper_keepalive?
removed, per our discussion
317605e to
bf6ee90
Compare
e6749d5 to
9347d9d
Compare
9347d9d to
0842fde
Compare
matanl-starkware
left a comment
There was a problem hiding this comment.
@matanl-starkware reviewed 7 files and all commit messages, made 1 comment, and resolved 1 discussion.
Reviewable status: 7 of 23 files reviewed, 1 unresolved discussion (waiting on Itay-Tsabary-Starkware).
crates/apollo_infra/src/component_client/remote_component_client.rs line 121 at r4 (raw file):
if tcp_keepalive >= http_keepalive { Ok(()) } else {
This is guaranteed by the factor check (>1.0)
Code quote:
if tcp_keepalive >= http_keepalive {
Ok(())
} else {3a89d58 to
527a8f7
Compare
56bc1b5 to
efbc706
Compare
527a8f7 to
d114823
Compare
efbc706 to
7ac0021
Compare
d114823 to
8daee35
Compare
matanl-starkware
left a comment
There was a problem hiding this comment.
@matanl-starkware reviewed 5 files and all commit messages, made 2 comments, and resolved 1 discussion.
Reviewable status: 7 of 25 files reviewed, all discussions resolved (waiting on Itay-Tsabary-Starkware).
crates/apollo_infra/src/component_client/remote_component_client_test.rs line 22 at r6 (raw file):
// keepalive_timeout_ms = 1100 ms: tcp_raw = 1650 ms, tcp_whole_secs = 1 s = 1000 ms, which is // less than http_keepalive = 1100 ms.
Just make sure this test's duration is under 300 ms.
Code quote:
11007ac0021 to
175a958
Compare
8daee35 to
2874529
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 2874529. Configure here.
2874529 to
c87c629
Compare
175a958 to
c1a5069
Compare
Mirror the client-side SO_KEEPALIVE behaviour on the server: set TCP keepalive probes on each accepted socket via a configurable idle time (tcp_keepalive_idle_time_ms). This lets the OS detect dead clients that disappear without sending FIN/RST. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
c87c629 to
7e7db02
Compare
PR SummaryMedium Risk Overview Reuses the client’s Reviewed by Cursor Bugbot for commit 7e7db02. Bugbot is set up for automated code reviews on this repo. Configure here. |
Itay-Tsabary-Starkware
left a comment
There was a problem hiding this comment.
@Itay-Tsabary-Starkware reviewed 26 files and all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on Itay-Tsabary-Starkware).


Mirror the client-side SO_KEEPALIVE behaviour on the server: set TCP
keepalive probes on each accepted socket via a configurable idle time
(tcp_keepalive_idle_time_ms). This lets the OS detect dead clients that
disappear without sending FIN/RST.
Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com
Note
Medium Risk
Touches remote server connection setup and keepalive behavior; misconfiguration could lead to unexpected connection drops or resource retention, though changes are scoped and covered by tests.
Overview
Adds TCP-level keepalive configuration on
RemoteComponentServeraccepted sockets usingsocket2(idle time derived fromkeepalive_timeout_ms * TCP_KEEPALIVE_FACTOR, plus probe interval and retry count) to help the OS detect dead peers.Reuses the existing
validate_keepalive_timeout_msvalidation forRemoteServerConfig, adjusts a keepalive-related client/server test naming, and expands server eviction tests with a GOAWAY-frame detector to assert HTTP/2 PING eviction still works alongside TCP keepalive.Reviewed by Cursor Bugbot for commit c87c629. Bugbot is set up for automated code reviews on this repo. Configure here.