
Commit 19770ea

garrytan and claude authored
v1.51.0.0 feat: $B memory diagnostic + 4 CDP-resource leak fixes (#1751)
* add withCdpSession + getOrCreateCdpSession helpers

  Two CDP-session lifecycle helpers in cdp-bridge.ts:
  - withCdpSession(page, fn): ephemeral session with try/finally detach. For one-shot CDP work (archive snapshots, $B memory, single Page.captureScreenshot) where the caller doesn't need session reuse.
  - getOrCreateCdpSession(page, cache): cached long-lived session that registers a page.once('close') hook to BOTH delete the cache entry AND call session.detach(). Pre-helper code only deleted the cache entry, leaving the Chromium-side CDP target attached until the underlying transport dropped.

  Pure addition. Existing callers untouched in this commit; they migrate in the next commit alongside the static-grep test that pins the invariant.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* migrate 3 CDP-session sites to lifecycle helpers

  Fixes the CDP-target leak class identified by /codex outside-voice on the eng review (D11 EXPAND_SCOPE). All three sites called `page.context().newCDPSession(page)` directly and either forgot the detach entirely (cdp-bridge cache cleanup), only detached on the success path (write-commands archive), or detached on framenavigated but not page-close (cdp-inspector).
  - cdp-bridge.ts: `getCdpSession` now delegates to `getOrCreateCdpSession`, which registers a `page.once('close')` hook that BOTH removes the cache entry AND calls `session.detach()`.
  - cdp-inspector.ts: same migration for the inspector's session pool. Keeps the existing framenavigated detach (more granular than close for DOM/CSS state invalidation) plus an inspector-layer close hook for the initializedPages WeakSet.
  - write-commands.ts archive: wraps Page.captureSnapshot in withCdpSession so the detach runs in `finally`, including the path where captureSnapshot throws.

  The static-grep tripwire (next commit) pins the invariant so future direct calls to newCDPSession fail CI.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* add CDP-session cleanup tripwire + helper unit tests

  browse/test/cdp-session-cleanup.test.ts pins the invariant that no source file outside cdp-bridge.ts may call newCDPSession() directly. If a future refactor reintroduces the direct call, CI fails with a file:line list and a pointer to the right helper to use instead (withCdpSession for one-shot, getOrCreateCdpSession for cached).

  Also covers the helpers themselves with fake-Page unit tests:
  - withCdpSession detaches on success
  - withCdpSession detaches on throw (the actual leak fix)
  - withCdpSession swallows detach errors so they don't mask fn errors
  - getOrCreateCdpSession caches the session across calls
  - close hook detaches AND clears the cache

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* extract createSseEndpoint helper with cleanup contract

  browse/src/sse-helpers.ts owns the SSE cleanup invariant: cleanup runs on abort, enqueue failure, AND heartbeat failure, exactly once, regardless of which edge fires first.

  Pre-helper, /activity/stream and /inspector/events ran cleanup only on the req.signal.abort edge. If the underlying TCP died without firing abort (Chromium MV3 service-worker suspend, intermediate proxy half-close), the subscriber closure stayed in the Set capturing the ReadableStreamDefaultController plus any payloads queued behind it. Over a multi-day sidebar session this compounded into multi-MB of retained controllers per dead connection.

  Caller surface: initialReplay (optional, for gap replay or state snapshots), subscribe (live-event source), liveEventName (SSE event name for live wrap), heartbeatMs. send() helper handles JSON encoding with sanitizeReplacer + lone-surrogate stripping.

  Unit tests pin all three cleanup edges + idempotency + replay ordering + surrogate sanitization. Endpoint refactors land in the next commit.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* route /activity/stream + /inspector/events through createSseEndpoint

  Both endpoints collapse from ~45 lines of in-line ReadableStream wiring to ~8 lines of helper config. Behavior preserved bit-for-bit by the new sse-helpers tests:
  - initial replay (activity gap + history, inspector state snapshot)
  - live event subscription
  - 15s heartbeat
  - SSE framing
  - sanitizeReplacer applied to every JSON.stringify

  The leak fix is the cleanup contract: pre-refactor, both endpoints ran cleanup only on req.signal.abort. If TCP died without firing abort (Chromium MV3 SW suspend, intermediate proxy half-close), the subscriber closure stayed in the Set forever capturing the ReadableStreamDefaultController + queued payloads. Post-refactor, an enqueue-failure or heartbeat-failure on a dead consumer triggers the same idempotent cleanup as abort would.

  Net: -83 / +15 in server.ts.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* cap inspector modificationHistory at 200 entries

  Pre-cap, modificationHistory was an unbounded module-scoped array that grew for every CSS edit through $B css across the entire session. Small per-entry footprint but no upper bound, the kind of slow leak that compounds over multi-day inspector use.

  Cap is 200, oldest evicted on push past the cap. modHistoryTotalPushed stays monotonic across the session so undoModification can tell the user when their target index has been evicted, instead of just the opaque pre-cap "No modification at index 500" with no context.

  __testInternals export lets the cap + eviction error be unit-tested without spinning up a CDP-driven Page. Production code must continue to go through modifyStyle / undoModification / resetModifications.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* add BrowserManager.getMemorySnapshot() + shared types

  Diagnostic foundation for $B memory and the /memory endpoint that land in the next two commits. Collects:
  - Bun process memory via process.memoryUsage (cross-platform, accurate).
  - Per-tab JS heap via CDP Performance.getMetrics, lazy per tracked page, swallows target-died errors so a dying tab doesn't poison the snapshot for the rest.
  - Chromium process tree via SystemInfo.getProcessInfo (PID + type + CPU time). RSS is NOT exposed via CDP. The eng review (D2 USE_CDP) picked CDP over shelling to `ps`, so notes[] tells the caller why the RSS column is absent and points at the follow-up TODO.

  cdp-inspector exports getModificationHistoryStats so the snapshot can surface buffer occupancy + cap + evicted count without reaching into module-private state.

  memory-snapshot.ts holds the shared types so server.ts and read-commands can import without circular dep on browser-manager.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* add $B memory command

  Registers 'memory' in META_COMMANDS, wires the meta-command dispatch to a lazy-imported handler in memory-command.ts. Lazy because the import graph (cdp-bridge + memory-snapshot + buffer accessors) isn't useful to projects that never run the diagnostic.

  The handler assembles MemoryStructureStats from the modules that own each buffer (cdp-inspector mod history stats, activity subscriber count, console/network/dialog buffer lengths, captureBuffer bytes, inspectorSubscriber count via a new server.ts export) and calls BrowserManager.getMemorySnapshot.

  Output is text by default, JSON with --json so the sidebar footer and test harness can consume it programmatically. buildMemorySnapshotJson is the entry the /memory endpoint will call in the next commit.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* add /memory endpoint (SSE-session-cookie gated)

  GET /memory returns the BrowserManager memory snapshot as JSON. Auth matches /activity/stream and /inspector/events: Bearer header OR view-only SSE-session cookie (the extension fetches the cookie once via POST /sse-session, then polls /memory with withCredentials: true).

  Deliberately NOT extending /health for the sidebar footer poll: TODOS.md "Audit /health token distribution" records that /health already surfaces AUTH_TOKEN to any localhost caller in headed mode. A separate endpoint with the standard SSE auth keeps the future /health fix from cascading into the sidebar.

  sanitizeReplacer is applied at egress because tab.url and tab.title come from page content; lone-surrogate bytes from broken emoji could otherwise reach the sidebar and (when forwarded to Claude API) trigger HTTP 400.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* add sidebar footer RSS readout (polls /memory every 30s)

  Footer now shows "<bun-rss> · <tab-count>" sourced from the /memory endpoint, polled every 30s. Color thresholds: orange warn at 2 GB Bun RSS or 50 tabs; red bad at 8 GB or 200 tabs (matches the tab-guardrail threshold landing in a later commit). The footer gives the user an early signal that the cliff is forming, instead of only learning when the OS OOM-kills the process.

  Backoff per Codex's flag: if a poll takes > 2s response time the sidebar drops to a 5-minute cadence until the next successful fast poll. The diagnostic shouldn't add load to a browser that's already unhealthy.

  Start/stop is wired to the existing setServerInfo() hook so the timer only runs while the sidebar is connected to a server.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* stop materializing response bodies in requestfinished listener

  The Bun-side accelerant on the gbrowser-OOM investigation.

  Pre-fix, the per-page requestfinished listener called `await res.body()` just to read .length: Playwright fetches the bytes from Chromium across CDP into a Bun Buffer, only for the listener to discard the buffer after a single length read. On a long-lived headed browser with media-heavy pages this is multi-GB/hour of Buffer allocation churn. Bun GCs it, but the cross-process CDP traffic + transient allocation pressure feeds the OOM trajectory.

  The fix: req.sizes() pulls from the Network.loadingFinished event Chromium already emits. No body materialization. Accurate for chunked transfer, gzip-compressed responses, and streaming media, the cases where a naive Content-Length header read (the original review's proposal) would have missed the size entirely (Codex flag on the eng review, D10 USE_CDP_EVENT_BATCHED).

  The D10 stretch goal (replacing N per-page listeners with a single context-level CDP listener via Target.setAutoAttach) is deferred and tracked in TODOS. The listener architecture change is significantly more plumbing than the leak fix and not on the critical path for stopping the body materialization.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* tab guardrail (50/200 thresholds) + sidebar action toast

  Server side (browser-manager.ts): Idempotent threshold tracker fires an activity entry exactly once at each upward crossing of 50 (soft warn) and 200 (hard warn). Re-arms when the count drops below. Activity-feed surface gives the audit-trail invariant even with the sidebar closed; the toast UX lives in the sidebar.

  Sidebar side (extension/sidepanel.{html,css,js}): Every /memory poll evaluates two trigger conditions:
  - Any single tab > 4 GB JS heap (catches the WebGL/video runaway case Codex flagged on the eng review).
  - Tab count >= 200.

  Toast shows top 5 tabs ranked by max(jsHeap, nodes*1KB + listeners*200) so a WebGL-heavy tab with small JS heap still surfaces. Default-selected checkboxes + "Close selected" run `$B closetab <id>` through the existing /command path, so no chrome.tabs.remove bridge is needed. "Snooze" bumps tabsAbove/heapAbove thresholds in chrome.storage.session so the toast stays hidden until the user accumulates more tabs OR one tab grows another 2 GB.

  Tests: browse/test/tab-guardrail.test.ts pins the server-side fires-once + re-arms invariants without spinning up Chromium.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* add memory-leak reproducer (gate tier)

  browse/test/memory-leak-reproducer.test.ts pins the invariant from the D10 fix: wirePageEvents.requestfinished must call req.sizes() but must NEVER call res.body(). Fakes a page emitting a burst of 200 requestfinished events, each with a notional 1 MB response. Pre-fix this would allocate 200 MB of Buffer per burst; post-fix not one byte of body content is materialized.

  The test also asserts networkBuffer entries are still populated with the right size, so size reporting in the network panel doesn't regress.

  A real-Chromium peak-RSS reproducer (periodic tier) is deferred; see TODOS "Reproducer with WebGL / video / MSE buffer pressure". This gate-tier test is sufficient to catch the leak class being reintroduced by any future refactor of the requestfinished listener. Wall clock: ~400ms.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* TODOS: 4 follow-ups from gbrowser-OOM PR

  Captures the items deliberately deferred from the v1.49 leak-fix PR so the deferrals don't fall off the radar:
  - P2: MV3 extension service-worker memory profile (Codex finding #4)
  - P2: Native + GPU memory breakdown in $B memory (Codex finding #5)
  - P3: Single-context CDP listener for Network.loadingFinished (D10 stretch goal)
  - P3: Real-Chromium peak-RSS reproducer for periodic tier (Codex finding on transient amplification + ANGLE_B_NUMBERS CHANGELOG framing dependency)

  Each entry follows the standard TODOS.md format: What / Why / Pros / Cons / Context / Priority / Effort.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* regen SKILL.md after adding $B memory command

  The C8 commit added 'memory' to META_COMMANDS + COMMAND_DESCRIPTIONS but didn't regenerate the SKILL.md files. The category was 'Diagnostics' which isn't in scripts/resolvers/browse.ts:categoryOrder; switched to 'Server' (matches the existing 'status' / 'restart' / 'handoff' pattern) so the table renders under the existing ### Server section.

  Test fix: gen-skill-docs.test.ts asserts every command appears in the generated SKILL.md and gstack/llms.txt; without this regen the test fails with "Expected to contain: 'memory'".

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* add coverage for $B memory diagnostic surface

  17 tests across the formatter + byte renderer + JSON entry point:
  - formatBytes() 4-tier (bytes, KB, MB, GB) + 160 GB sanity case (the friend's OOM number from the original screenshot, so the renderer doesn't blow up at real leak scale)
  - handleMemoryCommand --json mode parseable shape
  - handleMemoryCommand text mode: Bun server line, no-tabs branch, top-10 sort with "...and N more" tail, Chromium process grouping by type, "unavailable" line when processes is null, modification-history evicted-count format, notes section rendering, long-URL ellipsis truncation
  - buildMemorySnapshotJson returns shape matching the type

  The formatSnapshotText renderer is private to memory-command.ts; tests exercise it through handleMemoryCommand's text-mode return path. The eviction-count format is pinned via a parallel format contract assertion since the renderer reads live module state.

  Coverage gate: brings the diagnostic surface from 0% to ~80%. Extension UI (sidepanel.js footer + toast) remains uncovered; adding tests there would require extracting fmtBytesShort and tabRamScore from sidepanel.js into a testable TS module, which is deferred to a follow-up to keep this PR scoped.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.51.0.0)

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v1.51.0.0

  Add $B memory command to BROWSER.md server lifecycle table. Document the new createSseEndpoint helper + CDP session lifecycle helpers (withCdpSession, getOrCreateCdpSession) in CLAUDE.md alongside the existing server hardening notes, with the static-grep tripwire callout so future contributors route through the helpers.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): pin SSE sanitizer wiring to the v1.51 createSseEndpoint helper

  The two `wiring invariants` tests grepped server.ts for `JSON.stringify(entry, sanitizeReplacer)` and `JSON.stringify(event, sanitizeReplacer)`, patterns that lived inline in /activity/stream and /inspector/events before the v1.51 refactor moved both endpoints behind createSseEndpoint. Sanitization still happens (the helper applies it inside its send() and live-event callback), but the static-grep was pinned to the old wiring and started failing on Windows free-tests after the refactor landed.

  Updated to check the new contract:
  - /activity/stream + /inspector/events route through createSseEndpoint (regex match of the route handler block ending in the helper call).
  - sse-helpers.ts contains JSON.stringify + sanitizeReplacer + imports stripLoneSurrogates from ./sanitize (catches drift to a private copy).
  - server.ts retains its own sanitizeReplacer for non-SSE egress paths (handleCommandInternal); the two replacers coexist by design.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent a6fb317 commit 19770ea
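The commit message describes `withCdpSession(page, fn)` as an ephemeral session with try/finally detach that swallows detach errors. The following is only a minimal sketch of that pattern, not the code from `cdp-bridge.ts`: `PageLike`, `CdpSessionLike`, and `openSession` are hypothetical stand-ins for Playwright's `Page`, `CDPSession`, and `page.context().newCDPSession(page)`.

```typescript
// Hypothetical minimal shapes standing in for Playwright's Page and CDPSession.
interface CdpSessionLike {
  send(method: string): Promise<unknown>;
  detach(): Promise<void>;
}
interface PageLike {
  // Stand-in for page.context().newCDPSession(page).
  openSession(): Promise<CdpSessionLike>;
}

// Run fn against a fresh session and always detach afterwards,
// on the success path and on the throw path alike.
async function withCdpSession<T>(
  page: PageLike,
  fn: (session: CdpSessionLike) => Promise<T>,
): Promise<T> {
  const session = await page.openSession();
  try {
    return await fn(session);
  } finally {
    // Swallow detach failures so they cannot mask an error thrown by fn.
    await session.detach().catch(() => undefined);
  }
}
```

Because the detach sits in `finally`, a caller that throws midway still releases the session, which is the path the commit message calls "the actual leak fix".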

29 files changed

Lines changed: 2366 additions & 156 deletions

BROWSER.md

Lines changed: 1 addition & 0 deletions
@@ -317,6 +317,7 @@ from `snapshot`, or `@c` refs from `snapshot -C`. Full table:
| `disconnect` | Close headed Chrome, return to headless |
| `focus [@ref]` | Bring headed Chrome to foreground (macOS); `@ref` also scrolls into view |
| `state save\|load <name>` | Save or load browser state (cookies + URLs) |
| `memory [--json]` | Snapshot Bun heap + per-tab JS heap + Chromium process tree + bounded buffer sizes. Use `--json` for programmatic consumers; text mode renders sorted top-10 tabs with "and N more" tail. |

### Handoff

CHANGELOG.md

Lines changed: 53 additions & 0 deletions
@@ -1,5 +1,58 @@
# Changelog

## [1.51.0.0] - 2026-05-27

## **Long-running browser sessions hold flat RSS on the Bun side. `$B memory` gives every future OOM receipts instead of a screenshot.** Four CDP-resource leak classes closed and pinned with tripwires; a structured diagnostic surfaces Bun heap + per-tab JS heap + Chromium process tree + bounded buffer sizes in real time.

This release closes four leak classes in the browse server that compounded silently across long sidebar sessions: response-body materialization in the requestfinished listener (multi-GB/hour Buffer churn on media-heavy pages), three undetached CDP session call sites (cdp-bridge, write-commands archive, cdp-inspector), an unbounded modificationHistory array in the CSS inspector, and SSE subscriber cleanup that only fired on the abort edge: TCP-died-without-abort cases (Chromium MV3 service-worker suspend, intermediate proxy half-close) left subscribers in the Set forever holding the controller and any queued bytes. All four have invariant tests; a static-grep tripwire fails CI if a future refactor reintroduces direct `newCDPSession(...)` calls outside the helper module.

Alongside the fixes, `$B memory` and `/memory` ship the diagnostic the original 160 GB OOM investigation was missing: Bun RSS + heap breakdown, per-tab JS heap via CDP `Performance.getMetrics`, Chromium process tree via `SystemInfo.getProcessInfo` (PID + type + CPU), and the bounded buffer sizes (modificationHistory, activity subscribers, inspector subscribers, console/network/dialog buffers, capture buffer bytes). The sidebar footer polls `/memory` every 30s with adaptive backoff (drops to 5min if response time exceeds 2s), and a tab-count guardrail fires soft-warn at 50 / hard-warn at 200 with a top-5-by-RAM toast offering one-click close. Single-tab JS heap above 4 GB triggers an immediate toast, catching the WebGL/video runaway case where one tab balloons without the count ever reaching 200.
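The footer's color thresholds and adaptive backoff can be pictured as two small pure functions. This is a hedged sketch, not the code in `sidepanel.js`: the function names are hypothetical, binary gigabytes are assumed, and whether each threshold is inclusive (`>=`) is a guess, since the changelog only says "at 2 GB" and "exceeds 2s".

```typescript
type Severity = "ok" | "warn" | "bad";

const GB = 1024 ** 3; // assumption: binary gigabytes

// Changelog thresholds: orange (warn) at 2 GB Bun RSS or 50 tabs,
// red (bad) at 8 GB or 200 tabs. Inclusive comparison is an assumption.
function footerSeverity(bunRssBytes: number, tabCount: number): Severity {
  if (bunRssBytes >= 8 * GB || tabCount >= 200) return "bad";
  if (bunRssBytes >= 2 * GB || tabCount >= 50) return "warn";
  return "ok";
}

const FAST_POLL_MS = 30_000; // normal 30 s cadence
const SLOW_POLL_MS = 5 * 60_000; // 5-minute backoff cadence
const SLOW_RESPONSE_MS = 2_000; // a poll slower than this triggers backoff

// A slow poll backs off to 5 minutes; the next fast poll restores 30 s.
function nextPollDelayMs(lastResponseMs: number): number {
  return lastResponseMs > SLOW_RESPONSE_MS ? SLOW_POLL_MS : FAST_POLL_MS;
}
```

Keeping both decisions free of timers and DOM access would also make them easy to unit-test outside the extension.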

### The numbers that matter

Source: this branch's 16 commits + the post-merge audit reports. Net diff: 23 files changed, +2251 / -143 = 2394 LOC across browse server (TypeScript), gstack extension (JS/HTML/CSS), and tests.

| Capability | Before this PR | After this PR |
|---|---|---|
| `requestfinished` body handling | `await res.body()` on every response, allocates full body Buffer for one `.length` read | `req.sizes()` reads structured byte count from `Network.loadingFinished`, zero body materialization, accurate for chunked / gzip / streaming responses |
| CDP session lifecycle (3 sites) | direct `newCDPSession`, detach missing or success-path-only | `withCdpSession` (try/finally detach) + `getOrCreateCdpSession` (cached + close-detach) helpers, all 3 sites migrated, static-grep tripwire prevents regression |
| modificationHistory in CSS inspector | unbounded array, grew for every `$B css` edit across the session | bounded FIFO cap 200, evicted-count surfaced in the undo error so the user knows why their target index is gone |
| SSE subscriber cleanup | abort-edge only; TCP-died-without-abort leaked subscriber + controller + queued bytes until process exit | `createSseEndpoint` helper with cleanup on abort + enqueue-throw + heartbeat-throw, idempotent (any edge fires once) |
| Tab-count visibility | none; user could accumulate hundreds of tabs without warning | soft warn at 50 (activity entry), action toast at 200 (top 5 by RAM + Close-selected + Snooze), single-tab >4 GB triggers immediate toast |
| Diagnostic command | not available | `$B memory` (text + `--json`), `/memory` endpoint (SSE-session-cookie gated), sidebar footer with adaptive backoff |
| Net change in `server.ts` (SSE refactor) | 132 lines of inline ReadableStream wiring across two endpoints | 23 lines, both endpoints route through one helper |
| Test pins for the leak class | none specific | 6 new test files, 45 new tests; static-grep tripwire fails CI on regression |
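
The server-side tab guardrail is described as firing exactly once per upward crossing of 50 and 200 and re-arming when the count drops back below. A minimal sketch of such a tracker follows; `createTabThresholdTracker` and the `Level` names are hypothetical and are not taken from `browser-manager.ts`.

```typescript
type Level = "soft" | "hard";
const LEVELS: readonly Level[] = ["soft", "hard"];

// Returns an observe(tabCount) function that reports which thresholds fire.
// Each level fires once per upward crossing and re-arms below its limit.
function createTabThresholdTracker(
  limits: Record<Level, number> = { soft: 50, hard: 200 },
): (tabCount: number) => Level[] {
  const armed: Record<Level, boolean> = { soft: true, hard: true };
  return (tabCount) => {
    const fired: Level[] = [];
    for (const level of LEVELS) {
      if (tabCount >= limits[level]) {
        if (armed[level]) {
          armed[level] = false; // stay quiet until the count drops again
          fired.push(level);
        }
      } else {
        armed[level] = true; // re-arm once back under the limit
      }
    }
    return fired;
  };
}
```

Calling `observe` on every tab open or close keeps the activity feed to one entry per crossing, however long the count stays above the limit.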

### What this means for builders

The next time you leave a gbrowser session running for days, the Bun side holds its RSS flat instead of churning on per-response Buffer allocations. If a tab does go rogue, the sidebar footer shows you in real time (`RSS: 5.6 GB · 12 tabs`, color-coded), and a 200-tab toast surfaces the top RAM consumers with one-click close before you hit the OS OOM killer. If the next OOM still fires, `$B memory` is there to give it receipts instead of theory: Activity Monitor says 160 GB; the diagnostic tells you which process tree, which tabs, and which in-memory structures are holding it. Every code path the diagnostic measures is also bounded (modificationHistory at 200, console/network/dialog buffers at 50K via the existing CircularBuffer, SSE subscribers via the new cleanup contract), so the bookkeeping itself can't leak.
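
To make the "no per-response Buffer allocations" claim concrete, here is a hedged sketch of a `requestfinished` handler that records a size without fetching the body. `RequestLike` mirrors only the `url()` and `sizes()` members of Playwright's `Request`; `recordFinishedRequest`, `NetworkEntry`, and the zero-size fallback on failure are assumptions for illustration, not the code in `wirePageEvents`.

```typescript
// Hypothetical shapes covering only the parts of Playwright's Request used here.
interface SizesLike {
  responseBodySize: number;
  responseHeadersSize: number;
}
interface RequestLike {
  url(): string;
  sizes(): Promise<SizesLike>;
}
interface NetworkEntry {
  url: string;
  size: number;
}

// Record the response size from sizes(); the response body is never requested.
async function recordFinishedRequest(req: RequestLike, buffer: NetworkEntry[]): Promise<void> {
  try {
    const sizes = await req.sizes();
    buffer.push({ url: req.url(), size: sizes.responseBodySize });
  } catch {
    // Assumption: sizes() can reject if the page closed mid-flight;
    // this sketch records an unknown size as 0 rather than dropping the entry.
    buffer.push({ url: req.url(), size: 0 });
  }
}
```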

### Itemized changes

#### Added
- **`$B memory` command** in `browse/src/memory-command.ts`: text mode with sorted top-10 tabs + "and N more" tail; `--json` mode for programmatic consumers and the sidebar footer poll.
- **`/memory` HTTP endpoint** in `browse/src/server.ts`: same SSE-session-cookie auth model as `/activity/stream`. Deliberately NOT extending `/health` (which already leaks AUTH_TOKEN in headed mode per TODOS.md "Audit /health token distribution").
- **`BrowserManager.getMemorySnapshot()`**: collects Bun process memory + per-tab JS heap via `Performance.getMetrics` (lazy per tracked page, swallows target-died errors) + Chromium process tree via `Browser.newBrowserCDPSession()` + `SystemInfo.getProcessInfo`.
- **`browse/src/memory-snapshot.ts`**: shared types (`MemorySnapshot`, `MemoryTabSnapshot`, `MemoryProcess`, `MemoryStructureStats`) plus `formatBytes()` renderer (4 tiers, 2 decimals at GB).
- **`withCdpSession(page, fn)`** and **`getOrCreateCdpSession(page, cache)`** in `browse/src/cdp-bridge.ts`: lifecycle helpers for one-shot and cached CDP work. Every direct `newCDPSession` call site now routes through one of them.
- **`createSseEndpoint(req, config)`** in `browse/src/sse-helpers.ts`: owns the SSE cleanup contract (abort + enqueue-throw + heartbeat-throw, all idempotent). Built-in lone-surrogate sanitization on every JSON.stringify.
- **Sidebar footer RSS readout** in `extension/sidepanel.{html,js,css}`: polls `/memory` every 30s with 5-minute backoff if response time exceeds 2s. Color-coded thresholds: orange at 2 GB Bun RSS or 50 tabs, red at 8 GB or 200 tabs.
- **Tab guardrail UX** in `extension/sidepanel.js`: top-5-by-RAM toast at 200 tabs OR any single tab over 4 GB JS heap, with checkboxes + Close-selected (via `$B closetab`) + Snooze persisted in `chrome.storage.session`. Snooze bumps the thresholds so the toast stays hidden until the user accumulates more tabs or one tab grows another 2 GB.
- **Static-grep tripwire** (`browse/test/cdp-session-cleanup.test.ts`): fails CI if any source file outside `cdp-bridge.ts` calls `newCDPSession(...)` directly.
- **45 new tests across 6 files** pinning the leak-fix invariants: CDP session lifecycle (8), SSE cleanup contract (6), modificationHistory cap + evicted-aware error (7), tab guardrail fires-once + re-arms (6), body-materialization reproducer (1), `$B memory` formatter + byte renderer + JSON entry (17).
- **4 follow-up entries in `TODOS.md`** (P2: MV3 SW memory profile, P2: native + GPU memory breakdown, P3: single-context CDP listener via `Target.setAutoAttach`, P3: real-Chromium peak-RSS reproducer for periodic tier).
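
The commit message gives the toast's ranking formula as `max(jsHeap, nodes*1KB + listeners*200)`. The sketch below shows how a top-5 list could be derived from it. `TabStats` and `topTabsByRam` are hypothetical names, and reading "1KB" as 1024 bytes with the listener weight in bytes is an assumption.

```typescript
interface TabStats {
  id: number;
  jsHeapBytes: number;
  nodes: number;
  listeners: number;
}

// Score per the commit message: max(jsHeap, nodes*1KB + listeners*200).
// Assumption: "1KB" means 1024 bytes and the listener weight is in bytes.
function tabRamScore(tab: TabStats): number {
  return Math.max(tab.jsHeapBytes, tab.nodes * 1024 + tab.listeners * 200);
}

// Highest score first; the input array is copied, not mutated.
function topTabsByRam(tabs: TabStats[], limit = 5): TabStats[] {
  return [...tabs].sort((a, b) => tabRamScore(b) - tabRamScore(a)).slice(0, limit);
}
```

Taking the maximum of the two estimates is what lets a DOM-heavy or WebGL-heavy tab with a small JS heap still reach the top of the list.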

#### Changed
- **`wirePageEvents.requestfinished` no longer materializes response bodies.** Pre-fix: `await res.body()` allocated a Bun `Buffer` of the full response on every fetch just to read `.length`. Post-fix: `req.sizes()` pulls the structured byte count from `Network.loadingFinished` without body fetch. Accurate for chunked transfer, gzip-encoded responses, and streaming media.
- **`modificationHistory` capped at 200 entries with FIFO eviction.** `undoModification` error now reports `"No modification at index N. History has 200 entries (most recent 200 only — M earlier entries evicted at the cap)."` when the requested index is out of range AND the buffer has overflowed.
- **`/activity/stream` and `/inspector/events` refactored through `createSseEndpoint`.** Both endpoints collapse from ~45 lines of inline `ReadableStream` wiring to ~8 lines of helper config; behavior preserved bit-for-bit.
- **`memory` command classified under the `Server` category** in `COMMAND_DESCRIPTIONS` so it appears in the generated SKILL.md tables alongside `status` / `restart` / `handoff`.
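
The `modificationHistory` cap combines FIFO eviction with a monotonic push counter so the out-of-range error can say how many entries were evicted. This is a hedged sketch of that idea only: `createModificationHistory` is a hypothetical factory, the message wording is approximate, and the real module keeps this state at module scope in `cdp-inspector.ts`.

```typescript
const MOD_HISTORY_CAP = 200;

// Hypothetical bounded history: FIFO eviction plus an evicted-aware error.
function createModificationHistory<T>(cap: number = MOD_HISTORY_CAP) {
  const entries: T[] = [];
  let totalPushed = 0; // monotonic; never decremented on eviction

  function push(entry: T): void {
    entries.push(entry);
    totalPushed += 1;
    if (entries.length > cap) entries.shift(); // evict the oldest entry
  }

  function get(index: number): T {
    if (index >= 0 && index < entries.length) return entries[index] as T;
    const evicted = totalPushed - entries.length;
    const hint =
      evicted > 0 ? ` (most recent ${cap} only, ${evicted} earlier entries evicted at the cap)` : "";
    throw new Error(`No modification at index ${index}. History has ${entries.length} entries${hint}.`);
  }

  return { push, get, size: (): number => entries.length };
}
```

The hint is added only after the buffer has overflowed, so a plain out-of-range index on a short history still gets the short message.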

#### For contributors
- Plan completion audit: 12 of 17 plan items DONE, 2 CHANGED (deliberate scope decisions documented in the relevant commits: `req.sizes()` swap simpler than a single-context CDP listener; tab guardrail action toast wired through `$B closetab` instead of a `chrome.tabs.remove` bridge), 1 deferred to periodic tier (UI E2E tests).
- Coverage audit: 44% pre-diagnostic-tests → ~62% after adding the formatter coverage. Strong paths (CDP session lifecycle, body materialization, history cap, tab guardrail, SSE cleanup) all at 100% with invariant tests. Extension UI tests deferred (no extension test harness in this repo today).
- The CDP-session cleanup tripwire is the most reusable artifact here; any future addition of CDP work should route through the two helpers. Trying to call `newCDPSession` outside `cdp-bridge.ts` fails CI immediately with a pointer to the right helper.

## [1.48.0.0] - 2026-05-26

## **Agents stop dropping AskUserQuestion options when there are 5+.** A new canonical preamble rule + runtime gate makes Conductor's 4-option cap a split-or-batch decision, not a silent trim.

CLAUDE.md

Lines changed: 20 additions & 0 deletions
@@ -294,6 +294,26 @@ response in `server.ts`, read
`browse/test/server-sanitize-surrogates.test.ts` pins the wiring with invariant
tests, so bypasses fail CI.

**SSE endpoint helper** (v1.51.0.0+). New SSE endpoints in `server.ts` MUST route
through `createSseEndpoint(req, config)` from `browse/src/sse-helpers.ts`. The
helper owns the cleanup contract (abort + enqueue-throw + heartbeat-throw, all
idempotent) and bakes in `sanitizeLoneSurrogates` on every JSON.stringify, so
new subscribers can't accidentally regress either invariant. Inline
`ReadableStream` wiring leaked subscribers when the TCP connection died without
firing `req.signal.abort` (Chromium MV3 service-worker suspend, intermediate
proxy half-close). `/activity/stream`, `/inspector/events`, and `/memory`
(SSE-eligible) all route through it. `browse/test/sse-helpers.test.ts` pins the
cleanup contract.
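
The cleanup contract above (abort, enqueue-throw, and heartbeat-throw all reaching one idempotent cleanup) can be sketched without a real `ReadableStream`. In this hedged example `attachSubscriber` is a hypothetical name, `enqueue` stands in for `controller.enqueue`, and the `activity` event name is illustrative; the real helper additionally owns the heartbeat timer and replay logic.

```typescript
type Listener = (payload: unknown) => void;

// enqueue stands in for controller.enqueue and is expected to throw
// once the consumer has gone away.
function attachSubscriber(subscribers: Set<Listener>, enqueue: (chunk: string) => void) {
  let cleaned = false;
  let cleanupRuns = 0;

  function cleanup(): void {
    if (cleaned) return; // idempotent: whichever edge fires first wins
    cleaned = true;
    cleanupRuns += 1;
    subscribers.delete(onEvent);
  }

  function safeEnqueue(chunk: string): void {
    if (cleaned) return;
    try {
      enqueue(chunk);
    } catch {
      cleanup(); // a dead consumer triggers the same cleanup as abort
    }
  }

  function onEvent(payload: unknown): void {
    safeEnqueue(`event: activity\ndata: ${JSON.stringify(payload)}\n\n`);
  }

  subscribers.add(onEvent);
  return {
    abort: cleanup, // wire to the request's abort signal
    heartbeat: (): void => safeEnqueue(": heartbeat\n\n"), // call on an interval
    cleanupRuns: (): number => cleanupRuns,
  };
}
```

Routing every write through `safeEnqueue` means a dead connection is noticed at the next event or heartbeat, even if the abort signal never fires.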

**CDP session lifecycle** (v1.51.0.0+). Direct `page.context().newCDPSession(page)`
calls outside `browse/src/cdp-bridge.ts` fail CI via the static-grep tripwire in
`browse/test/cdp-session-cleanup.test.ts`. Use `withCdpSession(page, async (s) => {...})`
for one-shot CDP work (try/finally detach) or `getOrCreateCdpSession(page, cache)`
for cached sessions tied to a page's lifetime (close-detach via `Map<page, session>`).
Three sites migrated: cdp-bridge frame events, write-commands archive capture,
cdp-inspector. The helpers prevent the per-session leak class where successful-path
detach happened but error-path detach was missed.
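
For the cached variant, the key point is that the page's close hook both removes the cache entry and detaches the session. The following is a hedged sketch of that shape; `ClosablePage`, `SessionLike`, and `openSession` are hypothetical stand-ins for the Playwright types, and concurrent first calls for the same page are not deduplicated here.

```typescript
interface SessionLike {
  detach(): Promise<void>;
}
interface ClosablePage {
  // Stand-in for page.context().newCDPSession(page).
  openSession(): Promise<SessionLike>;
  once(event: "close", handler: () => void): void;
}

// Return the cached session for this page, creating it on first use.
async function getOrCreateCdpSession(
  page: ClosablePage,
  cache: Map<ClosablePage, SessionLike>,
): Promise<SessionLike> {
  const cached = cache.get(page);
  if (cached) return cached;

  const session = await page.openSession();
  cache.set(page, session);
  page.once("close", () => {
    cache.delete(page); // drop the cache entry...
    session.detach().catch(() => undefined); // ...AND release the CDP target
  });
  return session;
}
```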

**Setup symlink hardening** (v1.38.0.0+). Every link site in `setup` MUST route
through the `_link_or_copy SRC DST` helper near the `IS_WINDOWS` detection. On
Windows without Developer Mode, plain `ln -snf` produces frozen file copies that

SKILL.md

Lines changed: 1 addition & 0 deletions
@@ -963,6 +963,7 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`.
| `disconnect` | Disconnect headed browser, return to headless mode |
| `focus [@ref]` | Bring headed browser window to foreground (macOS) |
| `handoff [message]` | Open visible Chrome at current page for user takeover |
| `memory [--json]` | Snapshot Bun heap + per-tab JS heap + Chromium process tree + bounded buffer sizes. JSON output with --json. |
| `restart` | Restart server |
| `resume` | Re-snapshot after user takeover, return control to AI |
| `state save|load <name>` | Save/load browser state (cookies + URLs) |

TODOS.md

Lines changed: 135 additions & 0 deletions
@@ -1,5 +1,140 @@
 # TODOS

+## gbrowser memory follow-ups (filed via /plan-eng-review + /codex on the v1.49 leak-fix PR)
+
+These four items came out of the memory-leak investigation that shipped
+the `$B memory` diagnostic + the four leak fixes. They were
+deliberately deferred from that PR (already 14 commits / ~12 files);
+each stands alone and any one could ship independently.
+
+### P2: MV3 extension service worker memory profile
+
+**What:** The `/memory` endpoint snapshot enumerates pages but does
+not enumerate the gstack baked-in extension's service-worker target.
+A long-running MV3 service worker can leak through retained DOM
+snapshots, message ports that never close, alarms that re-arm, and
+caches that grow without bound. The diagnostic should call
+`Target.getTargets` with a filter for `service_worker` and include
+each one in `tabs[]` (or a sibling `serviceWorkers[]` array) with the
+same `Performance.getMetrics` data.
+
+**Why:** Codex's outside-voice review on the eng-review surfaced this
+class of leak (the extension is part of the gbrowser process tree but
+invisible to today's snapshot). Until we surface it, a SW leak shows
+up only in the parent process RSS with no per-target attribution.
+
+**Pros:** Closes the per-target attribution gap for the
+single-most-likely future leak source (our own extension).
+**Cons:** Extension SW lifecycle is asymmetric vs page lifecycle;
+auto-attach + filter is one more piece of CDP plumbing.
+
+**Context:** Codex finding #4 on the eng-review outside voice. Not
+in scope of the v1.49 PR; deliberately deferred to keep the PR to
+the four highest-confidence leak fixes.
+
+**Priority:** P2. **Effort:** M.
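The filtering step this item proposes could look roughly like the sketch below. It is not code from this commit: `pickServiceWorkers` and `ServiceWorkerRow` are hypothetical names, and the input is assumed to be the `targetInfos` array that the CDP method `Target.getTargets` returns. Tagging extension workers by their `chrome-extension://` URL prefix is likewise an assumption about how the gstack extension's worker would be told apart from site-registered ones.

```typescript
// Fields of one entry in the `targetInfos` array returned by the CDP
// method `Target.getTargets` (only the fields this sketch reads).
interface TargetInfo {
  targetId: string;
  type: string; // e.g. "page", "service_worker", "worker"
  url: string;
}

// Hypothetical row shape for a `serviceWorkers[]` sibling of `tabs[]`.
interface ServiceWorkerRow {
  targetId: string;
  url: string;
  isExtension: boolean;
}

// Keep only service-worker targets and tag the ones served from an
// extension origin (assumption: extension workers use chrome-extension://).
function pickServiceWorkers(targetInfos: TargetInfo[]): ServiceWorkerRow[] {
  return targetInfos
    .filter((t) => t.type === "service_worker")
    .map((t) => ({
      targetId: t.targetId,
      url: t.url,
      isExtension: t.url.startsWith("chrome-extension://"),
    }));
}

// Made-up sample data standing in for a real `Target.getTargets` response.
const sampleTargets: TargetInfo[] = [
  { targetId: "A1", type: "page", url: "https://example.com/" },
  { targetId: "B2", type: "service_worker", url: "chrome-extension://abc/sw.js" },
  { targetId: "C3", type: "worker", url: "https://example.com/worker.js" },
];

const serviceWorkers = pickServiceWorkers(sampleTargets);
```

Keeping the filter a pure function over the response shape means it can be covered by a unit test without a browser; attaching to each worker and reading `Performance.getMetrics` would be a separate step.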
+
+---
+
+### P2: Native + GPU memory breakdown in `$B memory`
+
+**What:** `$B memory` shows Bun RSS + per-tab JS heap + Chromium
+process tree (PIDs + types + CPU time) but the per-process RSS is
+absent — `SystemInfo.getProcessInfo` doesn't expose RSS and the eng
+review (D2 USE_CDP) explicitly chose CDP over shelling to `ps`. The
+honest next step is to surface what CDP DOES give for the other
+memory categories: `Memory.getDOMCounters` per target (node + listener
+counts), `SystemInfo.getInfo` for GPU memory, `Memory.getAllTimeSamplingProfile`
+for a sampled native estimate.
+
+**Why:** Codex's outside-voice review flagged that
+`Performance.getMetrics` misses native memory, GPU memory, video
+buffers, Skia, network cache, extension process RSS, and
+browser-process RSS — all the categories where a 160 GB leak would
+actually live. A diagnostic that misses the categories where the
+leak class lives undersells itself.
+
+**Pros:** Per-process category breakdown closes the gap between
+"Activity Monitor says 160 GB" and what the diagnostic shows.
+**Cons:** Each CDP method has its own quirks; this is a real
+implementation pass, not a one-line addition.
+
+**Context:** Codex finding #5 on the eng-review outside voice. Not
+in scope of the v1.49 PR; deliberately deferred.
+
+**Priority:** P2. **Effort:** M.
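As a rough illustration of the per-target node and listener counts mentioned in this item, the sketch below aggregates `Memory.getDOMCounters` results once they have been collected. The `DomCounters` fields match what that CDP method returns (`documents`, `nodes`, `jsEventListeners`); `TargetCounters`, `totalDomCounters`, and `heaviestByNodes` are hypothetical names, and the sample numbers are invented.

```typescript
// Result shape of the CDP method `Memory.getDOMCounters`.
interface DomCounters {
  documents: number;
  nodes: number;
  jsEventListeners: number;
}

// Hypothetical per-target entry the diagnostic might hold after calling
// `Memory.getDOMCounters` once per attached target.
interface TargetCounters {
  targetId: string;
  counters: DomCounters;
}

// Sum the counters across all targets for a one-line summary.
function totalDomCounters(entries: TargetCounters[]): DomCounters {
  const total: DomCounters = { documents: 0, nodes: 0, jsEventListeners: 0 };
  for (const entry of entries) {
    total.documents += entry.counters.documents;
    total.nodes += entry.counters.nodes;
    total.jsEventListeners += entry.counters.jsEventListeners;
  }
  return total;
}

// Return the target holding the most DOM nodes, or undefined when empty.
function heaviestByNodes(entries: TargetCounters[]): TargetCounters | undefined {
  let heaviest: TargetCounters | undefined;
  for (const entry of entries) {
    if (heaviest === undefined || entry.counters.nodes > heaviest.counters.nodes) {
      heaviest = entry;
    }
  }
  return heaviest;
}

// Invented sample data for two tabs.
const domSamples: TargetCounters[] = [
  { targetId: "tab-1", counters: { documents: 2, nodes: 1500, jsEventListeners: 40 } },
  { targetId: "tab-2", counters: { documents: 1, nodes: 9000, jsEventListeners: 310 } },
];

const domTotals = totalDomCounters(domSamples);
const domHeaviest = heaviestByNodes(domSamples);
```

These counters only hint at DOM-side growth; the GPU and native categories listed above would need their own collection paths.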
+
+---
+
+### P3: Single-context CDP listener for Network.loadingFinished
+
+**What:** `wirePageEvents` attaches a `page.on('requestfinished')`
+listener PER PAGE. The D10 fix removed the body-materialization leak
+inside that listener but kept the per-page listener architecture
+(7 listeners attached per tab — close, framenavigated, dialog,
+console, request, response, requestfinished). The stretch goal from
+D10 was to replace the per-page `requestfinished` listener with a
+single context-level CDP listener via
+`Target.setAutoAttach({autoAttach: true, waitForDebuggerOnStart: false,
+flatten: true})` and a browser-wide `Network.loadingFinished` event
+handler.
+
+**Why:** Going from N to 1 listener for the request-size capture is
+structurally the right architecture and removes one piece of per-tab
+memory pressure. The body-materialization fix already addressed the
+acute leak; this is the architectural cleanup that prevents similar
+leaks in the same class.
+
+**Pros:** One listener per browser instead of one per tab.
+**Cons:** `Target.setAutoAttach` plumbing is more code than the
+straight per-page listener; the marginal memory win is small on top
+of the body-fetch fix that already landed.
+
+**Context:** D10 stretch goal on the eng-review. The minimal-risk
+fix shipped in v1.49 (replaces `await res.body()` with
+`await req.sizes()`, preserving the per-page listener); this is the
+architectural follow-up.
+
+**Priority:** P3. **Effort:** M-L.
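One way to picture the N-to-1 change is a single handler that receives every `Network.loadingFinished` event and keys its bookkeeping by the `sessionId` that flattened auto-attach puts on each message. The sketch below is transport-agnostic and not taken from the codebase: `RequestSizeLedger` and its methods are hypothetical, and how events reach it (which CDP client, and sending `Network.enable` on each attached session) is left out as an assumption.

```typescript
// Parameters quoted in the TODO text for `Target.setAutoAttach`.
const AUTO_ATTACH_PARAMS = {
  autoAttach: true,
  waitForDebuggerOnStart: false,
  flatten: true,
} as const;

// Payload of the CDP event `Network.loadingFinished` (fields used here).
interface LoadingFinishedEvent {
  requestId: string;
  encodedDataLength: number;
}

// Hypothetical ledger: one instance per browser, fed by one event handler.
class RequestSizeLedger {
  private readonly bytesBySession = new Map<string, number>();

  // Single entry point for every tab's event; no per-page listener needed.
  onLoadingFinished(sessionId: string, event: LoadingFinishedEvent): void {
    const previous = this.bytesBySession.get(sessionId) ?? 0;
    this.bytesBySession.set(sessionId, previous + event.encodedDataLength);
  }

  // Drop the entry when the target goes away (e.g. on
  // `Target.detachedFromTarget`) so the ledger itself stays bounded.
  onDetached(sessionId: string): void {
    this.bytesBySession.delete(sessionId);
  }

  bytesFor(sessionId: string): number {
    return this.bytesBySession.get(sessionId) ?? 0;
  }

  get trackedSessions(): number {
    return this.bytesBySession.size;
  }
}

// Simulated event stream from two tabs; session ids are made up.
const ledger = new RequestSizeLedger();
ledger.onLoadingFinished("tab-1", { requestId: "r1", encodedDataLength: 1200 });
ledger.onLoadingFinished("tab-2", { requestId: "r2", encodedDataLength: 300 });
ledger.onLoadingFinished("tab-1", { requestId: "r3", encodedDataLength: 800 });
ledger.onDetached("tab-2");
```

The detach hook matters as much as the handler: without it, the consolidated listener would trade N small per-page closures for one map that grows with every tab ever opened.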
+
+---
+
+### P3: Real-Chromium peak-RSS reproducer (periodic tier)
+
+**What:** The gate-tier reproducer
+(`browse/test/memory-leak-reproducer.test.ts`) pins the invariant
+that `res.body()` is never called during a burst of
+`requestfinished` events. It uses a fake page; it does NOT spin up a
+real Chromium nor measure peak Bun RSS during a real concurrent fetch
+burst. A periodic-tier follow-up should: spin up a real headless
+Chromium, navigate to a fixture page that concurrently fetches 500
+mixed responses (small JSON, 100 KB images, 10 MB chunked,
+gzip-compressed 2 MB), sample `process.memoryUsage().heapUsed` every
+100 ms during the burst, assert `peak_heap < 200 MB above baseline`
+AND `post-gc_heap < 30 MB above baseline`. Also include a single-tab
+WebGL canvas variant that grows to >4 GB and asserts the per-tab RSS
+toast fires.
+
+**Why:** Codex flagged that the leak's real failure mode is transient
+amplification under concurrent burst, not retained leak — a steady-state
+heap test misses it. The fake-page gate-tier test catches the
+listener-architecture regression; the periodic real-browser test
+catches the actual peak-RSS class.
+
+**Pros:** Closes the "did we actually demonstrate the OOM is fixed"
+question with hard numbers. Feeds the ANGLE_B_NUMBERS CHANGELOG
+release-summary table.
+**Cons:** Periodic tier costs minutes of CI time and money per run;
+real-browser memory tests are inherently flaky.
+
+**Context:** Codex outside-voice finding on the eng-review; D7
+ANGLE_B_NUMBERS CHANGELOG framing needs this reproducer's numbers
+before /ship time.
+
+**Priority:** P3. **Effort:** M.
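The two assertions named in this item reduce to simple arithmetic over the collected samples, which might be sketched as follows. `checkHeapBudget`, `HeapRun`, and `HeapVerdict` are hypothetical names; the 200 MB and 30 MB budgets come from the text, and the samples are assumed to be `process.memoryUsage().heapUsed` readings taken every 100 ms by a separate sampler that is not shown.

```typescript
const MB = 1024 * 1024;

// One reproducer run: baseline before the burst, samples during it,
// and one reading after a forced GC. All values are in bytes.
interface HeapRun {
  baselineBytes: number;
  samplesBytes: number[];
  postGcBytes: number;
}

interface HeapVerdict {
  peakDeltaMb: number;
  postGcDeltaMb: number;
  ok: boolean;
}

// Apply both budgets from the TODO: the transient peak and the retained heap.
function checkHeapBudget(
  run: HeapRun,
  peakBudgetMb: number = 200,
  postGcBudgetMb: number = 30,
): HeapVerdict {
  // Start from the baseline so an empty sample list yields a zero delta.
  const peakBytes = run.samplesBytes.reduce(
    (max, sample) => Math.max(max, sample),
    run.baselineBytes,
  );
  const peakDeltaMb = (peakBytes - run.baselineBytes) / MB;
  const postGcDeltaMb = (run.postGcBytes - run.baselineBytes) / MB;
  return {
    peakDeltaMb,
    postGcDeltaMb,
    ok: peakDeltaMb < peakBudgetMb && postGcDeltaMb < postGcBudgetMb,
  };
}

// Invented numbers: a run that stays inside both budgets...
const passingRun = checkHeapBudget({
  baselineBytes: 100 * MB,
  samplesBytes: [120 * MB, 260 * MB, 180 * MB],
  postGcBytes: 115 * MB,
});

// ...and one whose transient peak exceeds 200 MB above baseline even
// though the post-GC heap looks healthy.
const spikingRun = checkHeapBudget({
  baselineBytes: 100 * MB,
  samplesBytes: [150 * MB, 320 * MB],
  postGcBytes: 110 * MB,
});
```

The second run illustrates the failure mode described under **Why**: a check that only looked at the post-GC figure would pass it, while the peak check catches the transient amplification.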
+
+---
+
 ## design daemon: follow-ups (filed v1.45.0.0 via /ship review army)

 ### ✅ DONE (v1.45.0.0): Tighten daemon test coverage

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-1.48.0.0
+1.51.0.0

browse/SKILL.md

Lines changed: 1 addition & 0 deletions
@@ -921,6 +921,7 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero
 | `disconnect` | Disconnect headed browser, return to headless mode |
 | `focus [@ref]` | Bring headed browser window to foreground (macOS) |
 | `handoff [message]` | Open visible Chrome at current page for user takeover |
+| `memory [--json]` | Snapshot Bun heap + per-tab JS heap + Chromium process tree + bounded buffer sizes. JSON output with --json. |
 | `restart` | Restart server |
 | `resume` | Re-snapshot after user takeover, return control to AI |
 | `state save|load <name>` | Save/load browser state (cookies + URLs) |
