You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[serve] Coalesce HAProxy controller broadcasts into a single apply
Under autoscaling churn the Serve controller fires target-group /
fallback-target broadcasts tens of ms apart; without coalescing, each
one runs its own runtime-API command burst on the HAProxy admin
socket. The CLI mux serializes everything (runtime API, stats, the
`-x` socket transfer used by reload) so command-level timeouts and
`-x` failures cluster together once admin-socket pressure exceeds
HAProxy's processing rate.
Coalesce broadcasts in `HAProxyManager` into a single sleeping flush
task. Updates arriving during the sleep are absorbed implicitly via
in-place writes to `self._target_groups` / fallback fields; updates
during the apply re-arm a `_coalesce_pending` flag so the trailing
broadcast in a flurry isn't dropped.
Window is configurable via `RAY_SERVE_HAPROXY_BROADCAST_COALESCE_S`
(default 0.5s, set 0 to disable). Trade-off: each broadcast incurs up
to one window of additional latency before reaching HAProxy, which is
small relative to typical replica-start time and well below the rate
at which admin-socket saturation begins.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0 commit comments