
🤖 perf: smooth text streaming (kill cascade re-renders, model-aware reveal) #3219

Merged

ammario merged 6 commits into main from streaming-r08f on May 2, 2026
Conversation

@ammar-agent (Collaborator) commented May 2, 2026

Summary

Streamed assistant text (and reasoning) was visibly jittery: periodic catch-up jumps every few seconds, a reveal rate stuck at ~72 chars/sec regardless of what the model emitted, and a sub-frame of re-render work across the entire chat list on every delta. This PR smooths the cadence with three ordered fixes plus a TPS-display fix discovered during review: leaf-subscribe the streaming-stats pill so it stops invalidating WorkspaceState, replace the smoothing engine's hard snap with a model-aware soft catch-up, compact streaming parts on append, and floor the TPS calculator's time span so a new stream's first deltas don't spike the displayed rate.

Background

The renderer has had a two-clock smoothing model (SmoothTextEngine + useSmoothStreamingText) for a while, but several regressions defeated it:

  1. WorkspaceState.streamingTokenCount / streamingTPS were computed inside the getWorkspaceState snapshot using Date.now(). Every coalesced delta produced a new snapshot reference, which cascaded WorkspaceShell → ChatPane → MessageRenderer through every row. useDeferredValue was bypassed for the entire stream by shouldBypassDeferredMessages, so reconciliation ran at the ingestion rate (see the sketch after this list).
  2. getAdaptiveRate(backlog) ignored the model's actual emission rate. With a fast model (~120 cps) and BASE_CHARS_PER_SEC=72, the visible cursor fell behind by ~5 chars per ingestion cycle until backlog crossed MAX_VISUAL_LAG_CHARS=120, at which point enforceMaxVisualLag snapped visible := full - 120 and zeroed the budget — that snap is exactly the visible "catch-up jump".
  3. requestIdleCallback({ timeout: 100 }) was used for streaming deltas. The smoothing engine should be the only pacing layer; idle batching just feeds (2).
  4. handleStreamDelta appended a fresh { type: "text" } part per chunk; mergeAdjacentParts re-merged on every render. For a 10k-char reply that's tens of thousands of merges per turn.
  5. calculateTPS divided by now - firstDelta.timestamp. With a single delta that span is typically a few milliseconds, so e.g. 50 tokens / 0.005s = 10000 t/s. Phase 1's microtask cadence exposed this (the prior idle-callback batching had masked it by sampling later), and Phase 2 wired TPS into the smoothing engine, amplifying its visibility.
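To make item (1) concrete, here is a minimal sketch of the problematic pattern. The types and field names are illustrative, not the real WorkspaceStore shapes:

```ts
// Hypothetical shape, not the actual WorkspaceStore code. The point is only
// that deriving time-based stats inside getSnapshot() forces a new object
// (and a new streamingTPS value, thanks to Date.now()) on every call, so
// useSyncExternalStore reports a change per coalesced delta and the whole
// subscriber tree re-renders.
interface WorkspaceInternal {
  messages: readonly string[];
  streamStartMs: number;
  tokensSoFar: number;
}

interface WorkspaceSnapshot {
  messages: readonly string[];
  streamingTokenCount: number;
  streamingTPS: number;
}

function getWorkspaceState(ws: WorkspaceInternal): WorkspaceSnapshot {
  const elapsedSec = Math.max((Date.now() - ws.streamStartMs) / 1000, 0.001);
  return {
    // Fresh object literal every call: the snapshot can never be cached by
    // reference, even when nothing the UI cares about has changed.
    messages: ws.messages,
    streamingTokenCount: ws.tokensSoFar,
    streamingTPS: ws.tokensSoFar / elapsedSec,
  };
}
```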

Implementation

Four phase commits (plus two review follow-ups described in the comments below), ordered so each phase is verifiable in isolation:

Phase 1 — leaf-subscribe streaming stats, microtask ingestion (775e9023c)

  • Removed streamingTokenCount / streamingTPS from WorkspaceState.
  • Added WorkspaceStreamingStats + streamingStatsStore (MapStore) + useWorkspaceStreamingStats(workspaceId) leaf hook (mirrors the existing useWorkspaceStatsSnapshot pattern at WorkspaceStore.ts:4127).
  • Replaced scheduleIdleStateBump with scheduleStreamingStateBump for streaming delta types (stream-delta, tool-call-delta, reasoning-delta). It coalesces on queueMicrotask instead of an idle callback. init-output and bash-output keep the idle path (terminal-style throughput).
  • Wired cancelPendingStreamingBump into stream-end / stream-abort / replay reset / removeWorkspace.
  • StreamingBarrier now reads via the leaf hook.
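
A rough sketch of the Phase 1 shape, for reviewers who want the mechanics without opening the diff. The exported names mirror the PR; the store internals and the coalescer wiring are assumptions, not the actual implementation:

```ts
import { useCallback, useSyncExternalStore } from "react";

// Assumed stats shape and store internals; only the exported names mirror the PR.
export interface WorkspaceStreamingStats {
  tokenCount: number;
  tps: number;
  charsPerSec: number;
}

class MapStore<T> {
  private values = new Map<string, T>();
  private listeners = new Map<string, Set<() => void>>();

  get(key: string): T | undefined {
    return this.values.get(key);
  }

  set(key: string, value: T): void {
    this.values.set(key, value);
    this.bump(key);
  }

  bump(key: string): void {
    this.listeners.get(key)?.forEach((notify) => notify());
  }

  subscribe(key: string, listener: () => void): () => void {
    const set = this.listeners.get(key) ?? new Set<() => void>();
    set.add(listener);
    this.listeners.set(key, set);
    return () => set.delete(listener);
  }
}

export const streamingStatsStore = new MapStore<WorkspaceStreamingStats>();

// Leaf hook: only components that actually render streaming stats subscribe
// to the per-workspace key, so a stats bump never touches WorkspaceState.
export function useWorkspaceStreamingStats(
  workspaceId: string
): WorkspaceStreamingStats | null {
  const subscribe = useCallback(
    (onChange: () => void) => streamingStatsStore.subscribe(workspaceId, onChange),
    [workspaceId]
  );
  return useSyncExternalStore(subscribe, () => streamingStatsStore.get(workspaceId) ?? null);
}

// Microtask coalescing for the streaming-state bump: many deltas arriving in
// one task collapse into a single notification, and terminal events cancel it.
const pendingBumps = new Map<string, () => void>();

export function scheduleStreamingStateBump(workspaceId: string, notify: () => void): void {
  if (pendingBumps.has(workspaceId)) return;
  pendingBumps.set(workspaceId, notify);
  queueMicrotask(() => {
    const queued = pendingBumps.get(workspaceId);
    if (queued === undefined) return; // cancelled before the microtask ran
    pendingBumps.delete(workspaceId);
    queued();
  });
}

export function cancelPendingStreamingBump(workspaceId: string): void {
  pendingBumps.delete(workspaceId);
}
```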

Phase 2 — model-aware smoothing engine, soft catch-up (85fb141da)

  • SmoothTextEngine.update() accepts an optional liveCharsPerSec. getAdaptiveRate(backlog, liveCps) combines a steady-state floor (max(BASE, liveCps)), a soft catch-up ramp that drains lag over SOFT_CATCHUP_DRAIN_MS once it exceeds SOFT_CATCHUP_LAG_CHARS=60, and the legacy backlog-pressure ramp (kept as upper bound).
  • Replaced the hard-snap discontinuity with the soft ramp. MAX_VISUAL_LAG_CHARS is now 1024 (was 120) — a defensive safety net for paused-tab pathological bursts that normal streams never hit.
  • Bumped MIN_FRAME_CHARS from 1 to 2 so reveals coalesce to ~30 Hz at the BASE rate (half the markdown re-parse cost; humans can't see the difference). Tail-end reveal still works because the gate is now min(MIN_FRAME_CHARS, backlog).
  • useSmoothStreamingText and TypewriterMarkdown thread liveCharsPerSec through; TypewriterMarkdown accepts a new workspaceId prop, forwarded from AssistantMessage and ReasoningMessage (via MessageRenderer).
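
A rough sketch of the rate combination and the reveal gate described above. The constants are the ones named in the PR; SOFT_CATCHUP_DRAIN_MS and the exact ramp shapes are assumptions, not the real SmoothTextEngine math:

```ts
// Constants named in the PR; SOFT_CATCHUP_DRAIN_MS and the ramp shapes below
// are assumptions.
const BASE_CHARS_PER_SEC = 72;
const SOFT_CATCHUP_LAG_CHARS = 60;
const SOFT_CATCHUP_DRAIN_MS = 1500; // assumed value
const MIN_FRAME_CHARS = 2;

function getAdaptiveRate(backlog: number, liveCps?: number): number {
  // Steady-state floor: never reveal slower than the model actually emits.
  const floor = Math.max(BASE_CHARS_PER_SEC, liveCps ?? 0);

  // Soft catch-up: drain any lag beyond the threshold over SOFT_CATCHUP_DRAIN_MS,
  // so the visible cursor converges without a discontinuous snap.
  const excessLag = Math.max(0, backlog - SOFT_CATCHUP_LAG_CHARS);
  const catchup = (excessLag * 1000) / SOFT_CATCHUP_DRAIN_MS;

  // Legacy backlog-pressure ramp, retained as an upper bound (assumed shape).
  const ceiling = Math.max(floor, BASE_CHARS_PER_SEC + backlog * 2);

  return Math.min(floor + catchup, ceiling);
}

// Reveal gate: coalesce reveals to at least MIN_FRAME_CHARS per frame, while
// the min() keeps the tail end of a stream from stalling below that threshold.
function charsToReveal(budget: number, backlog: number): number {
  const gate = Math.min(MIN_FRAME_CHARS, backlog);
  return budget >= gate ? Math.min(Math.floor(budget), backlog) : 0;
}
```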

Phase 3 — compact-on-append, clean prop surface (0a945ed7b)

  • StreamingMessageAggregator.handleStreamDelta / handleReasoningDelta append into the previous adjacent text/reasoning part in place. For a 10k-char reply this drops parts.length from thousands to one and mergeAdjacentParts cost from O(N) to O(1). Backend persistence (partial.json, chat.jsonl) is unaffected — those writers live backend-side; this aggregator's parts is pure display state.
  • TypewriterMarkdown: dropped the deltas: string[] shape (always passed as [content] literal — defeated React.memo) for content: string. Removed the manual React.memo and the inner useMemo for the streaming-context value (React Compiler handles both).
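
A minimal sketch of compact-on-append for the text path (handleReasoningDelta does the same for reasoning parts). The part shape is assumed; the real aggregator tracks more part types:

```ts
// Assumed display-part shape; the real aggregator tracks more part types.
type DisplayPart =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "tool-call"; name: string };

function appendTextDelta(parts: DisplayPart[], delta: string): void {
  const last = parts[parts.length - 1];
  if (last !== undefined && last.type === "text") {
    // Compact on append: extend the trailing text part in place instead of
    // pushing a new { type: "text" } part per chunk, so parts.length stays
    // constant during a stream and there is nothing left to merge per render.
    last.text += delta;
    return;
  }
  parts.push({ type: "text", text: delta });
}
```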

Phase 4 — TPS calculator floor + stream-error token cleanup (a476613be)

  • calculateTPS now floors the divisor at MIN_TPS_TIME_SPAN_MS = 1000. With one delta the rate becomes tokens / 1s instead of tokens / 0.005s. The reported TPS smoothly ramps up over the first second of a stream instead of spiking and "dropping abruptly". Slight under-statement during the settling window is the trade-off — strictly preferable to an order-of-magnitude over-statement.
  • The stream-error branch in applyWorkspaceChatEventToAggregator now calls clearTokenState, matching stream-end and stream-abort. Without it, the errored message's deltaHistory entry leaks into a follow-up stream's TPS calculation.
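
A minimal sketch of the floored calculator, assuming a (history, now) shape; it matches the behaviors the new tests assert (floor for tiny/zero spans, raw span above the floor, zero on clock skew):

```ts
// Assumed calculator shape; the divisor floor is the behavior Phase 4 adds.
const MIN_TPS_TIME_SPAN_MS = 1000;

interface TokenDelta {
  tokens: number;
  timestamp: number;
}

function calculateTPS(history: readonly TokenDelta[], now: number): number {
  if (history.length === 0) return 0;
  const rawSpanMs = now - history[0].timestamp;
  if (rawSpanMs < 0) return 0; // clock skew: report nothing rather than nonsense
  // Floor the divisor: a brand-new stream reports tokens-per-1s and ramps up
  // smoothly, instead of dividing by a few milliseconds and spiking.
  const spanMs = Math.max(rawSpanMs, MIN_TPS_TIME_SPAN_MS);
  const tokens = history.reduce((sum, d) => sum + d.tokens, 0);
  return (tokens * 1000) / spanMs;
}
```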

Validation

  • make typecheck
  • make lint
  • Targeted streaming surface: 1009+ tests pass / 0 fail across SmoothTextEngine, useSmoothStreamingText, StreamingMessageAggregator, applyWorkspaceChatEventToAggregator, StreamingTPSCalculator, TypewriterMarkdown, ReasoningMessage, StreamingBarrier{,View}, PinnedTodoList, WorkspaceStore, plus the broader src/browser/utils/messages/, src/browser/features/Messages/, src/browser/stores/, and src/browser/hooks/ suites.
  • New behavioral tests:
    • SmoothTextEngine.test.ts: rate tracks liveCharsPerSec; soft catch-up engaged for 60–1024 char lags without snap; hard snap still fires above the safety threshold.
    • StreamingTPSCalculator.test.ts: 1s floor applied for tiny / zero spans; raw span used once it exceeds the floor; negative spans (clock skew) return 0.
    • applyWorkspaceChatEventToAggregator.test.ts: stream-error calls clearTokenState.
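
For reference, a sketch of what the StreamingTPSCalculator assertions look like, assuming a vitest-style runner and the two-argument calculateTPS shape sketched under Phase 4; the import path is illustrative:

```ts
// Assumed runner and import path; calculateTPS here takes (history, now).
import { describe, expect, it } from "vitest";
import { calculateTPS } from "./StreamingTPSCalculator";

describe("calculateTPS floor", () => {
  it("floors tiny spans at 1s", () => {
    // 50 tokens observed 5 ms into the stream → 50 t/s, not 10000 t/s.
    expect(calculateTPS([{ tokens: 50, timestamp: 1_000 }], 1_005)).toBeCloseTo(50);
  });

  it("uses the raw span once it exceeds the floor", () => {
    // 100 tokens over 4 s → 25 t/s.
    const history = [
      { tokens: 60, timestamp: 0 },
      { tokens: 40, timestamp: 3_000 },
    ];
    expect(calculateTPS(history, 4_000)).toBeCloseTo(25);
  });

  it("returns 0 on negative spans (clock skew)", () => {
    expect(calculateTPS([{ tokens: 10, timestamp: 2_000 }], 1_000)).toBe(0);
  });
});
```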

Risks

Localized to the streaming display path; no protocol or persistence changes.

  • Re-render shape (Phase 1). Streaming deltas now bump WorkspaceState once per microtask drain instead of once per requestIdleCallback. Net effect under heavy load is less work because the snapshot stops invalidating per-delta TPS, but it's a behavioral shift — verified via the existing 106-test WorkspaceStore suite plus targeted StreamingBarrier tests.
  • Smoothing engine constants (Phase 2). MAX_VISUAL_LAG_CHARS jumped 120 → 1024 and MIN_FRAME_CHARS 1 → 2. Existing test "caps visual lag when incoming text jumps ahead" still passes against the new soft-ramp behavior, and the new "hard-snaps when lag exceeds the safety threshold" test confirms the safety net still functions.
  • Compact-on-append (Phase 3). Touches the in-memory parts array shape during streaming. The aggregator already had compaction at stream-end (compactMessageParts); we're just doing it eagerly. No on-disk format change. All StreamingMessageAggregator and applyWorkspaceChatEventToAggregator tests pass.
  • TPS floor (Phase 4). The reported rate during the first second of a stream now under-counts versus the previous (mathematically broken) value. Backend sessionTimingService also calls calculateTPS; same floor applies there but the backend's window is broader so the visible effect is smaller. No risk to persisted usage / cost calculations — those use usage.outputTokens / duration from the API, not the streaming TPS estimator.

Generated with mux • Model: anthropic:claude-opus-4-7 • Thinking: xhigh • Cost: $23.55

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a945ed7bc

Comment thread on src/browser/features/Messages/TypewriterMarkdown.tsx (outdated)
Streaming TPS pill could briefly show inflated values at the start of a
new stream then 'drop abruptly' to a real value, looking like cached
stale data. Two root causes:

1. calculateTPS divided by (now - firstDelta.timestamp). For a brand
   new stream's first delta that span is just a few ms — e.g.
   '50 tokens / 0.005s = 10000 t/s'. As more deltas accumulate the
   window broadens and TPS settles, hence the visible drop. Phase 1's
   microtask cadence exposed this where the prior idle-callback batching
   used to mask it. Floor the divisor to a 1s minimum window so the rate
   smoothly ramps up over the first second of a stream instead of
   spiking. Underestimation during the settling window is acceptable;
   order-of-magnitude overestimation isn't.

2. The stream-error event handler in applyWorkspaceChatEventToAggregator
   didn't call clearTokenState, leaving the errored message's
   deltaHistory entry to leak. Match stream-end / stream-abort and clear
   it so a follow-up stream starts with a clean slate.

Adds tests for both behaviors.

Codex P1 on PR #3219: every TypewriterMarkdown instance was subscribing
to useWorkspaceStreamingStats(workspaceId) regardless of streaming
state. Long transcripts of completed assistant messages then re-rendered
on every stream-delta of the new live message, undoing part of the
cascade-rerender fix.

Subscribe to the real workspace key only while the message is actively
streaming; completed messages subscribe to the empty-string sentinel
which is never bumped. Hook still runs unconditionally per rules of
hooks — only the key changes.
@ammar-agent (Collaborator, Author) commented:

@codex review

Addressed your P1: TypewriterMarkdown now subscribes to useWorkspaceStreamingStats with the real workspace key only while isStreaming is true. Completed historical messages subscribe to the stable empty-string sentinel (which is never bumped), so a long transcript of finished assistant messages no longer re-renders on every stream-delta of a new active stream.

Added regression tests in TypewriterMarkdown.test.tsx covering both branches (completed → empty key, streaming → real key).
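
For reviewers, a minimal sketch of the key-switching pattern (component and hook names are from the PR; the props, hook signature, and rendering are assumed or elided):

```ts
// Assumed hook signature; see the Phase 1 sketch for a possible implementation.
declare function useWorkspaceStreamingStats(
  workspaceId: string
): { charsPerSec: number } | null;

interface TypewriterMarkdownProps {
  content: string;
  isStreaming: boolean;
  workspaceId: string;
}

export function TypewriterMarkdown(props: TypewriterMarkdownProps) {
  // The hook always runs (rules of hooks); only the subscription key changes.
  // Completed messages subscribe to the empty-string sentinel, which is never
  // bumped, so long transcripts stop re-rendering on another stream's deltas.
  const stats = useWorkspaceStreamingStats(props.isStreaming ? props.workspaceId : "");
  const liveCharsPerSec = stats?.charsPerSec;
  // In the real component, props.content is revealed through the smoothing
  // engine, paced by liveCharsPerSec; the markdown rendering is elided here.
  return null;
}
```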

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ff12fced30

Comment thread on src/browser/stores/WorkspaceStore.ts
Codex P2 on PR #3219: streamingStatsStore was bumped on stream-end and
stream-abort but not on stream-error. Subscribers (TypewriterMarkdown
via useWorkspaceStreamingStats) could keep returning the failed
stream's TPS/charsPerSec until the next delta arrived, leaking stale
rates into the next stream's early renders.

Mirror the stream-end / stream-abort terminal-cleanup pattern in the
stream-error path: cancel any pending coalesced bumps, then bump
streamingStatsStore so consumers re-read and the snapshot collapses to
null (getActiveStreamMessageId is already undefined post-error).

Adds a regression test that drives stream-start + stream-delta +
caught-up to populate the cache, then asserts both that subscribers
are notified and that the post-error snapshot is null.
@ammar-agent (Collaborator, Author) commented:

@codex review

Addressed your P2: stream-error now mirrors the stream-end/stream-abort terminal-cleanup pattern: it cancels pending coalesced bumps and explicitly bumps streamingStatsStore so subscribers re-read and the snapshot collapses to null (since getActiveStreamMessageId is already cleared by handleStreamError). Added a regression test in WorkspaceStore.test.ts that drives stream-start → stream-delta → caught-up to populate the cache, then asserts both that subscribers are notified and that the post-error snapshot is null.
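
A minimal sketch of the stream-error branch after this change; the declared helpers stand in for the real implementations named above:

```ts
// Assumed wiring; function and store names are from the PR, the declarations
// below stand in for the real implementations.
declare function cancelPendingStreamingBump(workspaceId: string): void;
declare function clearTokenState(workspaceId: string): void;
declare const streamingStatsStore: { bump(key: string): void };

// stream-error now mirrors the stream-end / stream-abort terminal cleanup.
function onStreamError(workspaceId: string): void {
  // Drop any coalesced bump still queued for the failed stream.
  cancelPendingStreamingBump(workspaceId);
  // Clear per-stream token accounting so the next stream's TPS starts clean.
  clearTokenState(workspaceId);
  // Notify subscribers so the stats snapshot collapses to null
  // (getActiveStreamMessageId is already undefined after the error).
  streamingStatsStore.bump(workspaceId);
}
```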

@chatgpt-codex-connector (Bot) commented:

Codex Review: Didn't find any major issues. Nice work!

ammario merged commit af4912e into main on May 2, 2026
24 checks passed
ammario deleted the streaming-r08f branch on May 2, 2026 at 03:32
ammar-agent added a commit that referenced this pull request May 2, 2026
Upstream regression: @smithy/util-retry@4.3.7 was published at
2026-05-02T04:09:26Z with workspace:^ refs that escaped the smithy
monorepo. Any lockfile-free bun install (used by both
scripts/check-bench-agent.sh and scripts/smoke-test.sh — which mimic
'bun x mux@latest' for end users) fails to resolve those refs.

Reproduction outside this repo:
  mkdir t && cd t
  echo '{"dependencies":{"@aws-sdk/credential-providers":"^3.940.0"}}' > package.json
  bun install --ignore-scripts
  # → 'Workspace dependency "@smithy/types" not found'

This breaks every PR opened/pushed after 04:09Z (PR #3219 was lucky to
merge ~30 min before). Add an npm-style overrides pin to <=4.3.6 (the
last known-good release, 2026-04-28) until smithy republishes.

Verified locally: 'bash scripts/check-bench-agent.sh' now passes, and
@smithy/util-retry resolves to 4.3.6 in a fresh lockfile-free install.