|
1 | 1 | # TODOS |
2 | 2 |
|
| 3 | +## gbrowser memory follow-ups (filed via /plan-eng-review + /codex on the v1.49 leak-fix PR) |
| 4 | + |
| 5 | +These four items came out of the memory-leak investigation that shipped |
| 6 | +the `$B memory` diagnostic + the four leak fixes. They were |
| 7 | +deliberately deferred from that PR (already 14 commits / ~12 files); |
| 8 | +each stands alone and any one could ship independently. |
| 9 | + |
| 10 | +### P2: MV3 extension service worker memory profile |
| 11 | + |
| 12 | +**What:** The `/memory` endpoint snapshot enumerates pages but does |
| 13 | +not enumerate the gstack baked-in extension's service-worker target. |
| 14 | +A long-running MV3 service worker can leak through retained DOM |
| 15 | +snapshots, message ports that never close, alarms that re-arm, and |
| 16 | +caches that grow without bound. The diagnostic should call |
| 17 | +`Target.getTargets` with a filter for `service_worker` and include |
| 18 | +each one in `tabs[]` (or a sibling `serviceWorkers[]` array) with the |
| 19 | +same `Performance.getMetrics` data. |
| 20 | + |
| 21 | +**Why:** Codex's outside-voice review on the eng-review surfaced this |
| 22 | +class of leak (the extension is part of the gbrowser process tree but |
| 23 | +invisible to today's snapshot). Until we surface it, a SW leak shows |
| 24 | +up only in the parent process RSS with no per-target attribution. |
| 25 | + |
| 26 | +**Pros:** Closes the per-target attribution gap for the |
| 27 | +single-most-likely future leak source (our own extension). |
| 28 | +**Cons:** Extension SW lifecycle is asymmetric vs page lifecycle; |
| 29 | +auto-attach + filter is one more piece of CDP plumbing. |
| 30 | + |
| 31 | +**Context:** Codex finding #4 on the eng-review outside voice. Not |
| 32 | +in scope of the v1.49 PR; deliberately deferred to keep the PR to |
| 33 | +the four highest-confidence leak fixes. |
| 34 | + |
| 35 | +**Priority:** P2. **Effort:** M. |
| 36 | + |
| 37 | +--- |
| 38 | + |
| 39 | +### P2: Native + GPU memory breakdown in `$B memory` |
| 40 | + |
| 41 | +**What:** `$B memory` shows Bun RSS + per-tab JS heap + Chromium |
| 42 | +process tree (PIDs + types + CPU time) but the per-process RSS is |
| 43 | +absent — `SystemInfo.getProcessInfo` doesn't expose RSS and the eng |
| 44 | +review (D2 USE_CDP) explicitly chose CDP over shelling to `ps`. The |
| 45 | +honest next step is to surface what CDP DOES give for the other |
| 46 | +memory categories: `Memory.getDOMCounters` per target (node + listener |
| 47 | +counts), `SystemInfo.getInfo` for GPU memory, `Memory.getAllTimeSamplingProfile` |
| 48 | +for a sampled native estimate. |
| 49 | + |
| 50 | +**Why:** Codex's outside-voice review flagged that |
| 51 | +`Performance.getMetrics` misses native memory, GPU memory, video |
| 52 | +buffers, Skia, network cache, extension process RSS, and |
| 53 | +browser-process RSS — all the categories where a 160 GB leak would |
| 54 | +actually live. A diagnostic that misses the categories where the |
| 55 | +leak class lives undersells itself. |
| 56 | + |
| 57 | +**Pros:** Per-process category breakdown closes the gap between |
| 58 | +"Activity Monitor says 160 GB" and what the diagnostic shows. |
| 59 | +**Cons:** Each CDP method has its own quirks; this is a real |
| 60 | +implementation pass, not a one-line addition. |
| 61 | + |
| 62 | +**Context:** Codex finding #5 on the eng-review outside voice. Not |
| 63 | +in scope of the v1.49 PR; deliberately deferred. |
| 64 | + |
| 65 | +**Priority:** P2. **Effort:** M. |
| 66 | + |
| 67 | +--- |
| 68 | + |
| 69 | +### P3: Single-context CDP listener for Network.loadingFinished |
| 70 | + |
| 71 | +**What:** `wirePageEvents` attaches a `page.on('requestfinished')` |
| 72 | +listener PER PAGE. The D10 fix removed the body-materialization leak |
| 73 | +inside that listener but kept the per-page listener architecture |
| 74 | +(7 listeners attached per tab — close, framenavigated, dialog, |
| 75 | +console, request, response, requestfinished). The stretch goal from |
| 76 | +D10 was to replace the per-page `requestfinished` listener with a |
| 77 | +single context-level CDP listener via |
| 78 | +`Target.setAutoAttach({autoAttach: true, waitForDebuggerOnStart: false, |
| 79 | +flatten: true})` and a browser-wide `Network.loadingFinished` event |
| 80 | +handler. |
| 81 | + |
| 82 | +**Why:** Going from N to 1 listener for the request-size capture is |
| 83 | +structurally the right architecture and removes one piece of per-tab |
| 84 | +memory pressure. The body-materialization fix already addressed the |
| 85 | +acute leak; this is the architectural cleanup that prevents similar |
| 86 | +leaks in the same class. |
| 87 | + |
| 88 | +**Pros:** One listener per browser instead of one per tab. |
| 89 | +**Cons:** `Target.setAutoAttach` plumbing is more code than the |
| 90 | +straight per-page listener; the marginal memory win is small on top |
| 91 | +of the body-fetch fix that already landed. |
| 92 | + |
| 93 | +**Context:** D10 stretch goal on the eng-review. The minimal-risk |
| 94 | +fix shipped in v1.49 (replaces `await res.body()` with |
| 95 | +`await req.sizes()`, preserving the per-page listener); this is the |
| 96 | +architectural follow-up. |
| 97 | + |
| 98 | +**Priority:** P3. **Effort:** M-L. |
| 99 | + |
| 100 | +--- |
| 101 | + |
| 102 | +### P3: Real-Chromium peak-RSS reproducer (periodic tier) |
| 103 | + |
| 104 | +**What:** The gate-tier reproducer |
| 105 | +(`browse/test/memory-leak-reproducer.test.ts`) pins the invariant |
| 106 | +that `res.body()` is never called during a burst of |
| 107 | +`requestfinished` events. It uses a fake page; it does NOT spin up a |
| 108 | +real Chromium nor measure peak Bun RSS during a real concurrent fetch |
| 109 | +burst. A periodic-tier follow-up should: spin up a real headless |
| 110 | +Chromium, navigate to a fixture page that concurrently fetches 500 |
| 111 | +mixed responses (small JSON, 100 KB images, 10 MB chunked, |
| 112 | +gzip-compressed 2 MB), sample `process.memoryUsage().heapUsed` every |
| 113 | +100 ms during the burst, assert `peak_heap < 200 MB above baseline` |
| 114 | +AND `post-gc_heap < 30 MB above baseline`. Also include a single-tab |
| 115 | +WebGL canvas variant that grows to >4 GB and asserts the per-tab RSS |
| 116 | +toast fires. |
| 117 | + |
| 118 | +**Why:** Codex flagged that the leak's real failure mode is transient |
| 119 | +amplification under concurrent burst, not retained leak — a steady-state |
| 120 | +heap test misses it. The fake-page gate-tier test catches the |
| 121 | +listener-architecture regression; the periodic real-browser test |
| 122 | +catches the actual peak-RSS class. |
| 123 | + |
| 124 | +**Pros:** Closes the "did we actually demonstrate the OOM is fixed" |
| 125 | +question with hard numbers. Feeds the ANGLE_B_NUMBERS CHANGELOG |
| 126 | +release-summary table. |
| 127 | +**Cons:** Periodic tier costs minutes of CI time and money per run; |
| 128 | +real-browser memory tests are inherently flaky. |
| 129 | + |
| 130 | +**Context:** Codex outside-voice finding on the eng-review; D7 |
| 131 | +ANGLE_B_NUMBERS CHANGELOG framing needs this reproducer's numbers |
| 132 | +before /ship time. |
| 133 | + |
| 134 | +**Priority:** P3. **Effort:** M. |
| 135 | + |
| 136 | +--- |
| 137 | + |
3 | 138 | ## design daemon: follow-ups (filed v1.45.0.0 via /ship review army) |
4 | 139 |
|
5 | 140 | ### ✅ DONE (v1.45.0.0): Tighten daemon test coverage |
|
0 commit comments