Feat/traffic keepalive#2591
Conversation
# Conflicts: # packages/client-proxy/internal/proxy/paused_sandbox_resumer_grpc.go
Add a shared route IP resolver with the local-cluster fallback needed by CI, and make API/client-proxy callers treat empty resolved routes as unavailable instead of successful resume responses. This keeps BYOC/remote empty node IPs from being treated as routable while preserving the local 127.0.0.1 path.
# Conflicts: # packages/api/internal/api/api.gen.go # packages/client-proxy/internal/proxy/proxy.go
# Conflicts: # packages/api/internal/handlers/proxy_grpc.go # packages/client-proxy/internal/proxy/paused_sandbox_resumer_grpc.go # packages/client-proxy/internal/proxy/proxy.go # packages/client-proxy/internal/proxy/proxy_test.go # packages/client-proxy/main.go
PR SummaryMedium Risk Overview Potential issues: keepalive depends on routing-catalog Reviewed by Cursor Bugbot for commit 088532b. Bugbot is set up for automated code reviews on this repo. Configure here. |
❌ 11 Tests Failed:
View the top 2 failed test(s) by shortest run time
View the full list of 13 ❄️ flaky test(s)
To view more test analytics, go to the Test Analytics Dashboard |
There was a problem hiding this comment.
Code Review
The metadataFirstValue function returns a boolean indicating if the value was found. Ignoring this boolean for providedToken and providedEnvdToken means that a missing header is treated the same as an empty header. It is more robust and explicit to check if the token was actually provided before attempting to match it, especially for security-sensitive authentication tokens.
| } | ||
|
|
||
| timeout = uint64(*lifecycle.Keepalive.Traffic.Timeout) | ||
| } |
There was a problem hiding this comment.
Keepalive timeout stored without upper-bound validation
Low Severity
buildKeepaliveConfig validates that the traffic keepalive timeout is greater than the throttle interval but never validates an upper bound. A caller could supply an arbitrarily large timeout (up to int32 max, ~68 years), which would be stored and later used verbatim in KeepAliveFor. While getMaxAllowedTTL caps the actual extension at MaxInstanceLength, the uncapped stored value is misleading and means the TrafficKeepaliveThrottleInterval error message ("must be greater than N seconds") could fire for a zero or negative timeout that should have been caught by the earlier < 0 check — specifically, no check prevents Timeout = 0 being accepted when 0 > TrafficKeepaliveThrottleInterval is false, making a zero timeout silently pass validation but produce a no-op keepalive.
Reviewed by Cursor Bugbot for commit b24dd2d. Configure here.
This reverts commit 4868193.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 088532b. Configure here.
| } | ||
|
|
||
| if updatedEndTime.IsZero() || !sbx.EndTime.Equal(updatedEndTime) { | ||
| return &sbx, nil |
There was a problem hiding this comment.
KeepAliveFor skips update when store returns stale data
Medium Severity
The condition !sbx.EndTime.Equal(updatedEndTime) is intended to detect when the store returned a sandbox state that doesn't reflect the update. However, if the store implementation applies the update but returns a slightly different time (e.g., due to serialization rounding in Redis), or if a concurrent update modifies EndTime between the closure setting updatedEndTime and the returned sbx, this condition would incorrectly skip the UpdateSandbox and addSandboxToRoutingTable calls. The sandbox timeout would be updated in the local store but NOT propagated to the orchestrator node, causing the sandbox to expire prematurely on the node side while the API thinks it's still alive.
Reviewed by Cursor Bugbot for commit 088532b. Configure here.
| if sandbox.Lifecycle.Keepalive.Traffic != nil && sandbox.Lifecycle.Keepalive.Traffic.Enabled { | ||
| keepalive.Traffic = &sandboxroutingcatalog.TrafficKeepalive{Enabled: true} | ||
| } | ||
| } |
There was a problem hiding this comment.
Keepalive config stored even when traffic disabled
Low Severity
When Lifecycle.Keepalive is non-nil but Traffic is nil or not enabled, a Keepalive{} struct with nil Traffic is still allocated and stored in the routing catalog. This creates unnecessary routing catalog entries with empty keepalive metadata for sandboxes that explicitly set enabled: false. The outer nil check on Keepalive alone is insufficient — it also needs to verify there's actually something meaningful to store. The same pattern is duplicated in metadata.go.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 088532b. Configure here.


No description provided.