Skip to content

[4/N] [HAProxy stability] - coalesce controller broadcasts into a single reload#63623

Open
harshit-anyscale wants to merge 4 commits into
ray-project:masterfrom
harshit-anyscale:serve-haproxy-coalesce-broadcasts
Open

[4/N] [HAProxy stability] - coalesce controller broadcasts into a single reload#63623
harshit-anyscale wants to merge 4 commits into
ray-project:masterfrom
harshit-anyscale:serve-haproxy-coalesce-broadcasts

Conversation

@harshit-anyscale
Copy link
Copy Markdown
Contributor

@harshit-anyscale harshit-anyscale commented May 25, 2026

Summary

Carved out of #63308 so the coalescing change can land independently of the stderr-redirect (#63621) and redispatch (#63622) changes in that PR.

Under autoscaling churn the controller's target_groups broadcast fires frequently — its replica set changes on essentially every reconcile that adds or removes a replica — and each broadcast triggers its own config regeneration and graceful reload via -sf. Both target_groups and fallback_targets are emitted from the same control-loop step, so when both change in one ~100 ms tick they reach the proxy microseconds apart and (pre-coalescing) cause two back-to-back reloads.

Both broadcasts are already change-gated on the controller (broadcast_*_if_changed), so this PR collapses genuine consecutive / same-tick changes into one reload — it is not suppressing redundant no-op broadcasts. (fallback_targets in particular is near-static in steady state; it only changes when the fallback proxy's health flips or its actor is replaced.)

This PR collapses adjacent broadcasts into a single reload:

  • HAProxyManager keeps two pieces of state: _update_pending: bool and _coalesce_task: Optional[asyncio.Task].
  • Each incoming broadcast sets _update_pending = True and arms (or re-arms) a single sleeping coalesce task.
  • The task sleeps for RAY_SERVE_HAPROXY_BROADCAST_COALESCE_S (default 100 ms; set to 0 to disable) and then runs one _update_haproxy_backends() call against whatever the latest state is.
  • Broadcasts arriving inside the window are absorbed — they flip the dirty flag, but the task is already armed, so no second reload is scheduled.

Failure handling. If the apply fails, the task re-arms itself and retries on the next tick. After 3 consecutive failures it stops re-arming and waits for the next broadcast to trigger fresh state — this prevents busy-spinning against a persistent error (e.g. HAProxy crashed and is stuck restarting) while still recovering automatically once a new broadcast lands.

Knob. RAY_SERVE_HAPROXY_BROADCAST_COALESCE_S (env var, float seconds). 100 ms by default — small enough that the worst-case extra latency from a single broadcast is bounded, large enough to catch the common back-to-back pattern observed during scale events. Set to 0 to opt out entirely.

@harshit-anyscale harshit-anyscale requested a review from a team as a code owner May 25, 2026 11:49
@harshit-anyscale harshit-anyscale self-assigned this May 25, 2026
@harshit-anyscale harshit-anyscale added serve Ray Serve Related Issue go add ONLY when ready to merge, run all tests labels May 25, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a coalescing mechanism for HAProxy reloads to prevent excessive churn during rapid autoscaling events. It adds a configurable window, RAY_SERVE_HAPROXY_BROADCAST_COALESCE_S, and implements a background task to batch update requests. Feedback indicates that the current retry logic is ineffective because the backend update method is synchronous and does not await the actual reload, meaning exceptions won't be caught as intended. Additionally, improvements were suggested regarding the use of standardized environment variable parsing and the removal of redundant exception information in log messages.

Comment thread python/ray/serve/_private/haproxy.py
Comment thread python/ray/serve/_private/constants.py Outdated
Comment thread python/ray/serve/_private/haproxy.py Outdated
@harshit-anyscale harshit-anyscale changed the title [serve] HAProxy: coalesce controller broadcasts into a single reload [4/N] [HAProxy stability] - coalesce controller broadcasts into a single reload May 25, 2026
Under autoscaling churn the Serve controller fires `target_groups` and
`fallback_targets` broadcasts independently, often only tens of ms
apart. Without coalescing, each broadcast triggers its own config
regeneration and graceful reload via `-sf`.

HAProxyManager now marks state dirty on each broadcast and arms a
single sleeping coalesce task; updates arriving during the sleep
window are absorbed into the same pending apply. Window defaults to
100 ms via RAY_SERVE_HAPROXY_BROADCAST_COALESCE_S; set to 0 to
disable.

If an apply fails, the task re-arms and retries on the next tick (up
to 3 consecutive failures, then waits for a new broadcast — avoids
busy-spinning on a persistent error).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@harshit-anyscale harshit-anyscale force-pushed the serve-haproxy-coalesce-broadcasts branch from 9f1c879 to 4d487f0 Compare May 25, 2026 11:54
@akyang-anyscale
Copy link
Copy Markdown
Contributor

Under autoscaling churn the Serve controller fires target_groups and fallback_targets broadcasts independently, often only tens of ms apart.

Why are fallback_targets being broadcasted so frequently if it only include the head node serve proxy? We should fix that if that's the case.

Comment thread python/ray/serve/_private/haproxy.py Outdated
harshit-anyscale added a commit to harshit-anyscale/ray that referenced this pull request May 29, 2026
Add a serve_haproxy_update_latency_s histogram measuring seconds from
the first coalesced controller broadcast to the HAProxy reload
completing. It spans the coalesce window, any time queued behind an
in-flight reload, and the reload itself — so a slow-updates
investigation can see which part dominates.

Latency is measured from the first broadcast since the last apply
(_coalesce_armed_at), so coalesced broadcasts report one end-to-end
sample, and a failed-then-retried apply still reflects the full delay.

Addresses review request on ray-project#63623.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add a serve_haproxy_update_latency_s histogram measuring seconds from
the first coalesced controller broadcast to the HAProxy reload
completing. It spans the coalesce window, any time queued behind an
in-flight reload, and the reload itself — so a slow-updates
investigation can see which part dominates.

Latency is measured from the first broadcast since the last apply
(_coalesce_armed_at), so coalesced broadcasts report one end-to-end
sample, and a failed-then-retried apply still reflects the full delay.

All broadcasts now run through the apply task; a coalesce window of 0
just skips the sleep (apply immediately) instead of taking a separate
synchronous path. This keeps the latency metric recorded regardless of
the coalesce setting — including when it is disabled for debugging.

Addresses review request on ray-project#63623.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@harshit-anyscale harshit-anyscale force-pushed the serve-haproxy-coalesce-broadcasts branch from b785566 to 702207a Compare May 29, 2026 10:46
@harshit-anyscale
Copy link
Copy Markdown
Contributor Author

Under autoscaling churn the Serve controller fires target_groups and fallback_targets broadcasts independently, often only tens of ms apart.

Why are fallback_targets being broadcasted so frequently if it only include the head node serve proxy? We should fix that if that's the case.

my bad, target_groups are getting broadcasted very frequently, not fallback. thanks for pointing, change description as well.

@harshit-anyscale
Copy link
Copy Markdown
Contributor Author

You're right — fallback_targets is change-gated (broadcast_fallback_targets_if_changed) and the target it carries (the head-node proxy's ip / port / instance_id / name, a frozen pydantic Target) is stable, so it does not broadcast frequently in steady state. It only re-broadcasts when the fallback proxy's health flips (HEALTHY ↔ not, which toggles the value to/from {}) or the fallback actor is replaced.

My description overstated it: the frequent broadcaster under churn is target_groups (its replica set changes on essentially every reconcile that adds/removes a replica). Since both are emitted from the same control-loop step, coalescing mainly collapses (a) consecutive target_groups changes across reconciles and (b) the case where both change in the same ~100 ms tick and arrive microseconds apart.

The repeated fallback_targets churn we actually observed was the fallback proxy's health flapping under load — addressed separately by the lenient fallback health-check change, not by coalescing. I've updated the PR description to reflect this.

— Claude (Claude Code)

Comment thread python/ray/serve/_private/haproxy.py Outdated
_update_haproxy_backends previously returned create_task(_reload_haproxy())
and its only caller, _coalesce_and_apply, awaited that task. The task was
leftover indirection from when the synchronous long-poll callbacks called
this directly; coalescing moved the sync->async boundary up into
_schedule_haproxy_update, so the only caller is now async. Make
_update_haproxy_backends async and await _reload_haproxy() directly —
same failure propagation and serialization, one less task and no orphaned
reload if the coalesce task is cancelled.

Addresses review feedback on ray-project#63623.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
self._update_pending: bool = False
self._coalesce_task: Optional[asyncio.Task] = None
self._coalesce_armed_at: Optional[float] = None
self._haproxy_update_latency = metrics.Histogram(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include the metrics in the monitoring docs

"HAProxy reload completing. Includes the coalesce window, time "
"queued behind an in-flight reload, and the reload itself."
),
boundaries=[0.05, 0.1, 0.25, 0.5, 1, 2, 5, 10, 30],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move boundaries in constants.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests serve Ray Serve Related Issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants