perf(server): bound the provider runtime-event bus #5
Merged
ProviderService exposes a unified runtime-event stream via a PubSub that internal consumers (ProviderRuntimeIngestion, CheckpointReactor) each subscribe to. The bus was `PubSub.unbounded`, so if any consumer lagged during an event burst, its per-subscriber backlog grew without bound, risking eventual OOM under load.

Switch to `PubSub.bounded` with a generous capacity. Because runtime events feed event-sourced orchestration ingestion and must never be dropped, this uses the suspending (backpressure) strategy: a slow consumer now applies backpressure up to the adapter stdio reader instead of silently accumulating memory. This keeps behavior predictable under load while preserving at-least-once delivery semantics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
`ProviderService` exposes a unified provider runtime-event stream via a `PubSub` that internal consumers (`ProviderRuntimeIngestion`, `CheckpointReactor`) each subscribe to through `streamEvents`. The bus was created with `PubSub.unbounded`.

With an unbounded `PubSub`, each subscriber's backlog can grow without bound if a consumer lags during an event burst (streaming deltas, tool calls, large turns). Under sustained load this risks unbounded memory growth and eventual OOM, a reliability failure mode that is worse than backpressure.
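To make the failure mode concrete, here is a minimal, self-contained sketch of an unbounded fan-out bus. It is a toy model rather than the Effect `PubSub` implementation, and `UnboundedBus` and `simulateBurst` are hypothetical names introduced only for illustration. The assumption is that each subscriber owns a private backlog and that `publish` never waits.

```typescript
// Toy model (assumption): one private backlog per subscriber, publish never waits.
class UnboundedBus<A> {
  private readonly backlogs: A[][] = [];

  // Returns the subscriber's own backlog; the consumer drains it with shift().
  subscribe(): A[] {
    const backlog: A[] = [];
    this.backlogs.push(backlog);
    return backlog;
  }

  // Appends the event to every backlog, regardless of how far behind it is.
  publish(event: A): void {
    for (const backlog of this.backlogs) backlog.push(event);
  }
}

// Publishes `events` events. The fast consumer drains one event per publish;
// the slow consumer drains one event per `slowDrainEvery` publishes.
function simulateBurst(
  events: number,
  slowDrainEvery: number,
): { fast: number; slow: number } {
  const bus = new UnboundedBus<number>();
  const fast = bus.subscribe();
  const slow = bus.subscribe();
  for (let i = 1; i <= events; i++) {
    bus.publish(i);
    fast.shift();
    if (i % slowDrainEvery === 0) slow.shift();
  }
  return { fast: fast.length, slow: slow.length };
}
```

In this model, `simulateBurst(10000, 10)` leaves the fast backlog empty and the slow backlog holding 9000 events. The slow backlog grows linearly with the size of the burst, and nothing caps it.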
Change
- Switch to `PubSub.bounded(RUNTIME_EVENT_BUS_CAPACITY)` (capacity 4096).
- Use the suspending (backpressure) strategy rather than `sliding`/`dropping`: runtime events feed event-sourced orchestration ingestion and must never be dropped. A lagging consumer now applies backpressure all the way up to the adapter stdio reader instead of silently accumulating memory.

This keeps behavior predictable under load (per the project's "reliability first / predictable under load" priorities) while preserving at-least-once delivery semantics.
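The suspending strategy can be sketched in plain TypeScript as follows. This is a simplified model under stated assumptions (a single producer and one consumer per subscriber), not the Effect `PubSub` implementation or the code in this PR. `BoundedBus`, `Subscriber`, and `runDemo` are hypothetical names.

```typescript
// Toy model (assumption): single producer, one consumer per subscriber.
class Subscriber<A> {
  private readonly backlog: A[] = [];
  private readonly spaceWaiters: Array<() => void> = [];
  private readonly itemWaiters: Array<() => void> = [];
  private readonly capacity: number;
  maxBacklog = 0; // high-water mark, recorded for the demo only

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Suspends the caller while the backlog is full, instead of dropping or growing.
  async offer(event: A): Promise<void> {
    while (this.backlog.length >= this.capacity) {
      await new Promise<void>((resolve) => this.spaceWaiters.push(resolve));
    }
    this.backlog.push(event);
    this.maxBacklog = Math.max(this.maxBacklog, this.backlog.length);
    this.itemWaiters.shift()?.(); // wake a consumer waiting for an item
  }

  // Suspends the caller while the backlog is empty.
  async take(): Promise<A> {
    while (this.backlog.length === 0) {
      await new Promise<void>((resolve) => this.itemWaiters.push(resolve));
    }
    const event = this.backlog.shift() as A;
    this.spaceWaiters.shift()?.(); // wake a producer waiting for space
    return event;
  }
}

class BoundedBus<A> {
  private readonly subscribers: Subscriber<A>[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  subscribe(): Subscriber<A> {
    const subscriber = new Subscriber<A>(this.capacity);
    this.subscribers.push(subscriber);
    return subscriber;
  }

  // Resolves only after every subscriber has accepted the event.
  async publish(event: A): Promise<void> {
    for (const subscriber of this.subscribers) await subscriber.offer(event);
  }
}

// Publishes `events` numbers to one deliberately slow consumer.
async function runDemo(
  capacity: number,
  events: number,
): Promise<{ received: number[]; maxBacklog: number }> {
  const bus = new BoundedBus<number>(capacity);
  const slow = bus.subscribe();
  const consumer = (async () => {
    const received: number[] = [];
    for (let i = 0; i < events; i++) {
      received.push(await slow.take());
      // Extra yields stand in for slow per-event processing.
      for (let y = 0; y < 5; y++) await Promise.resolve();
    }
    return received;
  })();
  for (let i = 0; i < events; i++) await bus.publish(i);
  return { received: await consumer, maxBacklog: slow.maxBacklog };
}
```

Whatever the scheduling, the model keeps two invariants: the backlog never exceeds `capacity`, and every published event is received in order. The cost is that `publish` may suspend, which is the backpressure that reaches the producer.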
Why this one
Selected as the highest-impact / lowest-effort item from a performance review (item #2): it removes an unbounded-memory failure mode on the core server event path, is a focused one-constant + one-call change, and carries no event-loss risk.
Verification
- `ProviderService.test.ts`: 28/28 pass.
- `ProviderRuntimeIngestion.test.ts` + `CheckpointReactor.test.ts` (the actual bus consumers): 57/57 pass, confirming that backpressure does not break event flow.
- `oxfmt --check`: clean.
- `oxlint`: 0 errors (2 pre-existing unrelated warnings).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>