
perf(server): bound the provider runtime-event bus #5

Merged
ronak-guliani merged 1 commit into main from perf/provider-runtime-pubsub-bus
Jun 10, 2026

@ronak-guliani (Owner) commented Jun 1, 2026

Summary

ProviderService exposes a unified provider runtime-event stream via a PubSub that internal consumers (ProviderRuntimeIngestion, CheckpointReactor) each subscribe to through streamEvents. The bus was created with PubSub.unbounded:

const runtimeEventPubSub = yield* PubSub.unbounded<ProviderRuntimeEvent>();

With an unbounded PubSub, each subscriber's backlog can grow without limit if a consumer lags during an event burst (streaming deltas, tool calls, large turns). Under sustained load this risks unbounded memory growth and, eventually, an out-of-memory (OOM) crash, a reliability failure mode that is worse than backpressure.
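
To make the growth concrete, here is a minimal model of one subscriber's backlog. It is an illustrative sketch, not code from this PR: `backlogAfter` and the per-tick rates are hypothetical, chosen only to show that any sustained gap between publish rate and drain rate accumulates linearly when nothing bounds the buffer.

```typescript
// Hypothetical model: one subscriber whose backlog grows whenever it drains
// more slowly than events are published. No bound is applied to the backlog.
function backlogAfter(
  ticks: number,
  publishedPerTick: number,
  drainedPerTick: number,
): number {
  let backlog = 0;
  for (let t = 0; t < ticks; t++) {
    backlog += publishedPerTick; // events arrive
    backlog -= Math.min(backlog, drainedPerTick); // consumer drains what it can
  }
  return backlog;
}

// A consumer that is 20% too slow accumulates 100 events per tick.
const lagging = backlogAfter(1_000, 500, 400);
// A consumer that keeps pace ends every tick with an empty backlog.
const keepingUp = backlogAfter(1_000, 500, 500);

console.log({ lagging, keepingUp }); // { lagging: 100000, keepingUp: 0 }
```

In this model the backlog is only limited by how long the burst lasts, which is the failure mode the bounded bus is meant to remove.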

Change

  • Switch to PubSub.bounded(RUNTIME_EVENT_BUS_CAPACITY) (capacity 4096).
  • Use the suspending (backpressure) strategy rather than sliding/dropping: runtime events feed event-sourced orchestration ingestion and must never be dropped. A lagging consumer now applies backpressure all the way up to the adapter stdio reader instead of silently accumulating memory.
  • Add a documented capacity constant (RUNTIME_EVENT_BUS_CAPACITY) that explains the rationale.
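
The suspending strategy described in the list above can be illustrated with a small hand-rolled buffer. This is a sketch in plain TypeScript under stated assumptions, not Effect's PubSub implementation and not the code in this PR: `SuspendingBuffer`, `runDemo`, and the capacity of 2 are hypothetical, and the sketch models a single consumer rather than a multi-subscriber bus. It shows the property the PR relies on: when the buffer is full, the producer waits instead of dropping events or growing the backlog.

```typescript
// Hand-rolled illustration of a bounded buffer with a suspending strategy.
// This is not Effect's PubSub implementation; all names here are hypothetical.
class SuspendingBuffer<A> {
  private readonly capacity: number;
  private readonly items: A[] = [];
  private readonly notFull: Array<() => void> = []; // suspended producers
  private readonly notEmpty: Array<() => void> = []; // suspended consumers

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get size(): number {
    return this.items.length;
  }

  get suspendedProducers(): number {
    return this.notFull.length;
  }

  // Suspends the caller while the buffer is full, so nothing is dropped.
  async offer(item: A): Promise<void> {
    while (this.items.length >= this.capacity) {
      await new Promise<void>((resume) => this.notFull.push(resume));
    }
    this.items.push(item);
    this.notEmpty.shift()?.(); // wake one waiting consumer, if any
  }

  // Suspends the caller while the buffer is empty.
  async take(): Promise<A> {
    while (this.items.length === 0) {
      await new Promise<void>((resume) => this.notEmpty.push(resume));
    }
    const item = this.items.shift() as A;
    this.notFull.shift()?.(); // wake one suspended producer, if any
    return item;
  }
}

// A fast producer and a slow consumer: the producer is held back once the
// buffer is full, and every event is still delivered in order.
async function runDemo(): Promise<{ received: number[]; peak: number }> {
  const buffer = new SuspendingBuffer<number>(2);
  let peak = 0;
  const producer = (async () => {
    for (let i = 0; i < 5; i++) {
      await buffer.offer(i);
      peak = Math.max(peak, buffer.size);
    }
  })();
  const received: number[] = [];
  for (let i = 0; i < 5; i++) {
    await new Promise((resolve) => setTimeout(resolve, 1)); // lagging consumer
    received.push(await buffer.take());
  }
  await producer;
  return { received, peak };
}

runDemo().then((result) => console.log(result));
```

A sliding or dropping strategy would also cap memory, but it would do so by discarding events, which the PR rules out because the events feed event-sourced ingestion. Suspension trades producer latency for a hard memory bound with no loss.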

This keeps behavior predictable under load (per the project's "reliability first / predictable under load" priorities) while preserving at-least-once delivery semantics.

Why this one

Selected as the highest-impact / lowest-effort item from a performance review (item #2): it removes an unbounded-memory failure mode on the core server event path, is a focused one-constant + one-call change, and carries no event-loss risk.

Verification

  • Server typecheck: no real errors (only pre-existing effect-language-service informational diagnostics, none referencing this change).
  • ProviderService.test.ts: 28/28 pass.
  • ProviderRuntimeIngestion.test.ts + CheckpointReactor.test.ts (the actual bus consumers): 57/57 pass, confirming that backpressure does not break event flow.
  • oxfmt --check: clean. oxlint: 0 errors (2 pre-existing unrelated warnings).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@github-actions (bot) added the size:S and vouch:trusted (PR author is trusted by repo permissions or the VOUCHED list) labels Jun 1, 2026

ProviderService exposes a unified runtime-event stream via a PubSub that
internal consumers (ProviderRuntimeIngestion, CheckpointReactor) each
subscribe to. The bus was `PubSub.unbounded`, so if any consumer lagged
during an event burst its per-subscriber backlog grew without bound,
risking unbounded memory growth and eventual OOM under load.

Switch to `PubSub.bounded` with a generous capacity. Because runtime
events feed event-sourced orchestration ingestion and must never be
dropped, this uses the suspending (backpressure) strategy: a slow
consumer now applies backpressure up to the adapter stdio reader instead
of silently accumulating memory. This keeps behavior predictable under
load while preserving at-least-once delivery semantics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ronak-guliani force-pushed the perf/provider-runtime-pubsub-bus branch from da27ca4 to 684c006 June 10, 2026 20:50
@ronak-guliani merged commit 3b08f98 into main Jun 10, 2026
5 of 7 checks passed
@ronak-guliani deleted the perf/provider-runtime-pubsub-bus branch June 10, 2026 21:14