Commit 3b4ec09

[kernel-1116] Add CDP Monitor (#213)
Introduces the foundational layer of the CDP monitor as a standalone reviewable chunk. No Monitor struct wiring, just the primitives that everything else builds on.

- types.go: CDP wire format (cdpMessage), all event type constants, internal state structs (networkReqState, targetInfo, CDP param shapes).
- util.go: Console arg extraction, MIME allow-list (isCapturedMIME), resource type filter (isTextualResource), per-MIME body size caps (bodyCapFor), UTF-8-safe body truncation (truncateBody).
- computed.go: State machine for the three derived events: network_idle (500ms debounce after all requests finish), layout_settled (1s after page_load with no layout shifts), navigation_settled (fires once all three flags converge). Timer invalidation via navSeq prevents stale AfterFunc callbacks from publishing for a previous navigation.
- domains.go: isPageLikeTarget predicate (pages and iframes get Page.* / PerformanceTimeline.*; workers don't), bindingName constant, interaction.js embed.
- interaction.js: Injected script tracking clicks, keydowns, and scroll-settled events via the __kernelEvent CDP binding.

<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **High Risk**
> Large new CDP capture subsystem that streams network/console data (including headers/bodies) and manages reconnect/timers across goroutines, which is complex and can impact stability, performance, and sensitive-data exposure if misconfigured.
>
> **Overview**
> Adds a full `cdpmonitor` implementation that connects to Chrome’s DevTools WebSocket, auto-attaches to targets, translates CDP notifications into typed `events.Event`s (console, network, page, interaction), and publishes monitor lifecycle events (disconnect/reconnect/init-failed) with capped-exponential reconnect backoff.
>
> Introduces per-session computed state machines (`network_idle`, `page_layout_settled`, `page_navigation_settled`), interaction tracking via embedded `interaction.js` with rate-limiting and sensitive-input suppression, and screenshot capture via `ffmpeg` with size downscaling and 2s rate limiting.
>
> Updates API wiring to pass a `slog` logger into `cdpmonitor.New`, abstracts `ApiService.cdpMonitor` behind an interface for easier stubbing in tests, and adds extensive fixtures/tests to validate CDP wire-type roundtrips and monitor behavior (events, redirects, detach handling, TTL sweeps, and reconnects).
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 7cc19d8. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
1 parent ba17894 commit 3b4ec09

30 files changed

Lines changed: 4266 additions & 8 deletions

server/cmd/api/api/api.go

Lines changed: 11 additions & 2 deletions
@@ -4,6 +4,7 @@ import (
 	"context"
 	"errors"
 	"fmt"
+	"log/slog"
 	"os"
 	"os/exec"
 	"sync"
@@ -20,6 +21,14 @@ import (
 	"github.com/kernel/kernel-images/server/lib/scaletozero"
 )

+type cdpMonitorController interface {
+	Start(ctx context.Context) error
+	Stop()
+	IsRunning() bool
+}
+
+var _ cdpMonitorController = (*cdpmonitor.Monitor)(nil)
+
 type ApiService struct {
 	// defaultRecorderID is used whenever the caller doesn't specify an explicit ID.
 	defaultRecorderID string
@@ -73,7 +82,7 @@ type ApiService struct {

 	// CDP event pipeline and cdpMonitor.
 	captureSession *events.CaptureSession
-	cdpMonitor *cdpmonitor.Monitor
+	cdpMonitor cdpMonitorController
 	monitorMu sync.Mutex
 	lifecycleCtx context.Context
 	lifecycleCancel context.CancelFunc
@@ -103,7 +112,7 @@ func New(
 		return nil, fmt.Errorf("captureSession cannot be nil")
 	}

-	mon := cdpmonitor.New(upstreamMgr, captureSession.Publish, displayNum)
+	mon := cdpmonitor.New(upstreamMgr, captureSession.Publish, displayNum, slog.Default())
 	ctx, cancel := context.WithCancel(context.Background())

 	return &ApiService{

server/cmd/api/api/capture_session_test.go

Lines changed: 7 additions & 0 deletions
@@ -248,5 +248,12 @@ func newTestService(t *testing.T, mgr recorder.RecordManager) *ApiService {
 	t.Helper()
 	svc, err := New(mgr, newMockFactory(), newTestUpstreamManager(), scaletozero.NewNoopController(), newMockNekoClient(t), newCaptureSession(t), 0)
 	require.NoError(t, err)
+	svc.cdpMonitor = &stubCdpMonitor{}
 	return svc
 }
+
+type stubCdpMonitor struct{}
+
+func (s *stubCdpMonitor) Start(_ context.Context) error { return nil }
+func (s *stubCdpMonitor) Stop() {}
+func (s *stubCdpMonitor) IsRunning() bool { return false }

server/lib/cdpmonitor/README.md

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
# CDP Monitor

The monitor is the browser-facing layer of the kernel browser logging pipeline. It connects to Chrome's DevTools endpoint, tracks all page sessions via CDP's `Target.setAutoAttach`, and converts raw CDP notifications into typed `events.Event` values for downstream consumers.

## Overview

`cdpmonitor` manages a Chrome DevTools Protocol (CDP) WebSocket connection to a running Chrome browser. It subscribes to CDP events across all attached tabs, translates them into structured `events.Event` values, and publishes them via a caller-supplied `PublishFunc`. It also derives synthetic events from sequences of CDP events and takes screenshots on significant page activity.

Chrome can restart independently of the monitor. When that happens, `UpstreamProvider` pushes a new DevTools URL and the monitor reconnects automatically, emitting lifecycle events so consumers can track continuity.

## Event taxonomy

**CDP-derived** (1-to-1 with a CDP notification): `console_log`, `console_error`, `network_request`, `network_response`, `network_loading_failed`, `page_tab_opened`, `page_navigation`, `page_dom_content_loaded`, `page_load`, `page_layout_shift`

**Computed** (inferred from sequences of CDP events): `network_idle` (fires when in-flight requests drop to zero), `page_layout_settled` (1 s after `page_load` with no intervening layout shifts), `page_navigation_settled` (fires once `page_dom_content_loaded` and `page_layout_settled` have both fired for the same navigation; intentionally independent of `network_idle` so that a single hung request cannot stall the event).

**Interaction** (fired by `interaction.js` via `Runtime.bindingCalled`): `interaction_click`, `interaction_key`, `interaction_scroll_settled`

**Monitor lifecycle** (emitted by the monitor itself, not by Chrome): `monitor_screenshot`, `monitor_disconnected`, `monitor_reconnected`, `monitor_reconnect_failed`, `monitor_init_failed`
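The convergence rule for `page_navigation_settled`, combined with the navSeq-based timer invalidation mentioned in the commit description, might look roughly like the sketch below. It is a simplified assumption, not the code in `computed.go`: `navState` and its methods are hypothetical, and the timer is modeled as an explicit call so the logic is easy to follow.

```go
package main

import (
	"fmt"
	"sync"
)

// navState is a hypothetical, reduced version of the per-session computed state.
type navState struct {
	mu            sync.Mutex
	navSeq        uint64 // incremented on every navigation
	domLoaded     bool
	layoutSettled bool
	published     bool // page_navigation_settled fires at most once per navigation
}

// onNavigation resets the flags and returns the new sequence number, which
// timer callbacks capture so they can be recognized as stale later.
func (s *navState) onNavigation() uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.navSeq++
	s.domLoaded, s.layoutSettled, s.published = false, false, false
	return s.navSeq
}

// onDOMContentLoaded records the flag and reports whether
// page_navigation_settled should be published now.
func (s *navState) onDOMContentLoaded() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.domLoaded = true
	return s.settleLocked()
}

// onLayoutTimer is what a time.AfterFunc callback would call once the quiet
// period has elapsed. A callback armed for an earlier navigation carries an
// old seq and is ignored.
func (s *navState) onLayoutTimer(seq uint64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if seq != s.navSeq {
		return false // stale timer from a previous navigation
	}
	s.layoutSettled = true
	return s.settleLocked()
}

// settleLocked must be called with mu held.
func (s *navState) settleLocked() bool {
	if s.published || !s.domLoaded || !s.layoutSettled {
		return false
	}
	s.published = true
	return true
}

func main() {
	var s navState
	first := s.onNavigation()
	second := s.onNavigation() // a second navigation starts before the first timer fires
	fmt.Println(s.onDOMContentLoaded())  // false: layout has not settled yet
	fmt.Println(s.onLayoutTimer(first))  // false: stale callback is dropped
	fmt.Println(s.onLayoutTimer(second)) // true: both flags set for the current navigation
}
```

Comparing a captured sequence number avoids having to cancel every outstanding timer reliably; a late callback simply finds that its navigation is no longer current.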
## Responsibilities

| Concern | Where |
| --- | --- |
| WebSocket lifecycle (connect, read, reconnect) | `monitor.go` |
| CDP domain setup per session | `domains.go` |
| Event translation (CDP params to `events.Event`) | `handlers.go` |
| Synthetic event state machines | `computed.go` |
| Screenshot capture via ffmpeg | `screenshot.go` |
| CDP protocol types | `cdp_proto.go`, `types.go` |
| Interaction tracking injected into the page | `interaction.js` |
| Body/MIME capture sizing and text truncation helpers | `util.go` |

## Internals

### Reconnect model

`subscribeToUpstream` listens to `UpstreamProvider.Subscribe()` for new DevTools URLs. On each URL change (indicating Chrome restarted), `handleUpstreamRestart` tears down the existing connection, dials the new URL with capped-exponential backoff (250 ms → 500 ms → 1 s → 2 s, up to 10 attempts), then restarts `readLoop` and re-initializes all CDP sessions. `restartMu` serializes concurrent restart signals so rapid Chrome restarts do not produce overlapping reconnects.
### Goroutines

| Goroutine | Lifetime | Tracked by |
| --- | --- | --- |
| `readLoop` | one per WebSocket connection | `done` channel |
| `subscribeToUpstream` | same as `lifecycleCtx` | `asyncWg` |
| `sweepPendingRequests` | same as `lifecycleCtx` | `asyncWg` |
| `initSession` | short-lived, one per connect or reconnect | `asyncWg` |
| `attachExistingTargets` wrapper | short-lived, one per existing target on reconnect | `asyncWg` |
| `enableDomains` + `injectScript` | short-lived, one per target attach | `asyncWg` |
| `fetchResponseBody` | one per completed network request | `asyncWg` |
| `captureScreenshot` | one per screenshot trigger | `asyncWg` |

`Stop()` cancels `lifecycleCtx`, waits for `readLoop` via `done`, then waits for all other goroutines via `asyncWg` before closing the connection.
### Lock ordering

Locks must be acquired left to right. Never hold a lock on the right while acquiring one further left.

```
restartMu -> lifeMu -> pendReqMu -> computed.mu -> pendMu
restartMu -> lifeMu -> sessionsMu
```

`computed.mu` and `sessionsMu` are never held simultaneously; `cs.stop()` and `cs.resetOnNavigation()` are called only after the relevant `sessionsMu` critical section is complete.

`bindingRateMu` is independent of this ordering and is always acquired alone.

| Lock | Protects |
| --- | --- |
| `restartMu` | Serializes `handleUpstreamRestart` to prevent overlapping reconnects from rapid Chrome restarts |
| `lifeMu` | `conn`, `lifecycleCtx`, `cancel`, `done`, `readReady` -- all fields that change during Start / Stop / reconnect |
| `pendReqMu` | `pendingRequests` (requestId -> `networkReqState`): in-flight network requests accumulating request/response metadata until `loadingFinished` |
| `computed.mu` | All `computedState` fields: counters and timers for the `network_idle`, `page_layout_settled`, and `page_navigation_settled` state machines |
| `pendMu` | `pending` (id -> reply channel): in-flight CDP commands waiting for a response from Chrome |
| `sessionsMu` | `sessions` (sessionID -> `targetInfo`): the set of currently attached CDP targets (tabs, iframes, workers) |
| `bindingRateMu` | `bindingLastSeen` (sessionID:eventType -> time): rate-limit state for `__kernelEvent` binding calls |

Fields that need no mutex use `sync/atomic`: `nextID`, `mainSessionID`, `running`, `lastScreenshotAt`, `screenshotInFlight`.
### WebSocket concurrency

`coder/websocket` guarantees one concurrent `Read` and one concurrent `Write` are safe on the same connection. `readLoop` is the sole reader. All writes go through `send`, which calls `conn.Write` directly -- `conn.Write` is internally serialized by the library, so no external write mutex is needed.

## Event data model

### Envelope and top-level fields

Every event arrives as an `Envelope`:

```json
{
  "capture_session_id": "cs_abc123",
  "seq": 42,
  "event": {
    "ts": 1746123456789000,
    "type": "network_request",
    "category": "network",
    "source": { ... },
    "data": { ... },
    "truncated": false
  }
}
```

| Field | Type | Description |
| --- | --- | --- |
| `capture_session_id` | string | Pipeline-assigned ID for the capture session (not a CDP concept). |
| `seq` | uint64 | Monotonically increasing per-capture-session sequence number. |
| `event.ts` | int64 | Wall-clock time the monitor emitted the event, as **Unix microseconds** (µs since epoch). |
| `event.type` | string | See [Event taxonomy](#event-taxonomy). |
| `event.category` | string | One of: `console`, `network`, `page`, `interaction`, `system`. |
| `event.truncated` | bool | `true` if `data` was nulled to fit the 1 MB pipeline limit. |
### Source object

```json
"source": {
  "kind": "cdp",
  "event": "Network.requestWillBeSent",
  "metadata": {
    "cdp_session_id": "...",
    "target_id": "...",
    "target_type": "page"
  }
}
```

| Field | Description |
| --- | --- |
| `event` | The raw CDP method that triggered the event (e.g. `Network.requestWillBeSent`). Empty for computed events. |
| `metadata.cdp_session_id` | The CDP WebSocket session multiplexer ID for this target. Changes if Chrome restarts. |
| `metadata.target_id` | Stable identifier for the browser target (tab/window). Survives navigations within the same tab. |
| `metadata.target_type` | Target type as reported by Chrome: `page`, `iframe`, `worker`, etc. |
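A downstream consumer might decode the envelope and source object along these lines. The struct definitions are a hypothetical consumer-side sketch built from the field names documented above; the real types live in the `events` package and may differ, and treating `metadata` as a string-to-string map is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Source follows the documented source object.
type Source struct {
	Kind     string            `json:"kind"`
	Event    string            `json:"event"`
	Metadata map[string]string `json:"metadata"` // assumed to hold string values only
}

// Event follows the documented event fields.
type Event struct {
	TS        int64           `json:"ts"` // Unix microseconds
	Type      string          `json:"type"`
	Category  string          `json:"category"`
	Source    Source          `json:"source"`
	Data      json.RawMessage `json:"data"` // left raw; decode per event type
	Truncated bool            `json:"truncated"`
}

// Envelope is the outer wrapper carrying the session ID and sequence number.
type Envelope struct {
	CaptureSessionID string `json:"capture_session_id"`
	Seq              uint64 `json:"seq"`
	Event            Event  `json:"event"`
}

// Time converts the microsecond timestamp to a time.Time in UTC.
func (e Event) Time() time.Time { return time.UnixMicro(e.TS).UTC() }

// decodeEnvelope parses one JSON-encoded envelope.
func decodeEnvelope(raw []byte) (Envelope, error) {
	var env Envelope
	err := json.Unmarshal(raw, &env)
	return env, err
}

func main() {
	raw := []byte(`{"capture_session_id":"cs_abc123","seq":42,"event":{` +
		`"ts":1746123456789000,"type":"network_request","category":"network",` +
		`"source":{"kind":"cdp","event":"Network.requestWillBeSent","metadata":{"target_type":"page"}},` +
		`"data":{"url":"https://example.com/"},"truncated":false}}`)
	env, err := decodeEnvelope(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(env.Seq, env.Event.Type, env.Event.Time().Format(time.RFC3339Nano))
}
```

Keeping `data` as `json.RawMessage` lets the consumer switch on `event.type` first and unmarshal the payload into a type-specific struct afterwards.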
### CDP identity primer

Five IDs appear across events. Understanding how they nest prevents confusion:

```
target_id                          <- one per tab/window; stable across navigations
└── cdp_session_id                 <- WebSocket multiplexer channel to that target; resets on Chrome restart
    └── frame_id                   <- one per frame (top-level or iframe); changes on navigation
        └── loader_id              <- one per document load; links a navigation to its network requests
            └── request_id         <- one per request (stable across redirects in a chain)
```

| ID | Where it appears | What it identifies |
| --- | --- | --- |
| `target_id` | `source.metadata`, most `data` objects | The browser tab. Use this to group all events from one tab session. |
| `cdp_session_id` | `source.metadata` | The WebSocket sub-channel. Not stable across reconnects. |
| `frame_id` | `page_navigation`, `network_request`, `network_response`, `network_loading_failed` | The frame the request or navigation belongs to. Top-level frame has no `parent_frame_id`. |
| `source_frame_id` | `page_layout_shift` | The frame where the layout shift occurred. Distinct from the nav context `frame_id`, which is always the top-level navigated frame. |
| `loader_id` | `page_navigation`, `network_request`, `network_response` | The document load that owns a request. Join `network_request.loader_id` to `page_navigation.loader_id` to correlate requests with the navigation that triggered them. |
| `request_id` | `network_request`, `network_response`, `network_loading_failed` | A single request chain (including redirects). Links request to its eventual response or failure. |
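The `loader_id` join described in the table can be sketched as a small grouping function. The `navigation` and `request` shapes and the `groupByNavigation` helper are hypothetical; they carry only the fields needed for the join.

```go
package main

import "fmt"

// navigation holds the page_navigation fields needed for the join.
type navigation struct {
	LoaderID string
	URL      string
}

// request holds the network_request fields needed for the join.
type request struct {
	RequestID string
	LoaderID  string
	URL       string
}

// groupByNavigation maps each navigation's loader_id to the requests it owns.
// Requests whose loader_id matches no known navigation are returned separately.
func groupByNavigation(navs []navigation, reqs []request) (map[string][]request, []request) {
	byLoader := make(map[string][]request, len(navs))
	for _, n := range navs {
		byLoader[n.LoaderID] = nil
	}
	var orphans []request
	for _, r := range reqs {
		if _, ok := byLoader[r.LoaderID]; !ok {
			orphans = append(orphans, r)
			continue
		}
		byLoader[r.LoaderID] = append(byLoader[r.LoaderID], r)
	}
	return byLoader, orphans
}

func main() {
	navs := []navigation{{LoaderID: "L1", URL: "https://example.com/"}}
	reqs := []request{
		{RequestID: "r1", LoaderID: "L1", URL: "https://example.com/"},
		{RequestID: "r2", LoaderID: "L1", URL: "https://example.com/app.js"},
		{RequestID: "r3", LoaderID: "L9", URL: "https://other.example/"},
	}
	grouped, orphans := groupByNavigation(navs, reqs)
	fmt.Println(len(grouped["L1"]), len(orphans))
}
```

Returning unmatched requests separately makes it visible when a request arrives for a document load whose `page_navigation` event was never seen.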
### Navigation context fields

Most event `data` objects include a nav context block stamped at the last `page_navigation`. These fields reflect the top-level frame most recently navigated in the session:

| Field | Description |
| --- | --- |
| `session_id` | Same as `source.metadata.cdp_session_id`. Repeated for data-only consumers. |
| `frame_id` | Frame ID of the navigated top-level frame. |
| `loader_id` | Loader ID of the current document. |
| `url` | URL of the current page at the time of the last navigation. |
| `nav_seq` | Monotonically increasing counter, incremented on each `page_navigation`. Use it to detect that the page has navigated between two events in the same session. |

### Per-event data fields

Fields below are the unique additions per event type. Unless otherwise noted, events also include the nav context fields described above. Network events are the exception: they carry their own `loader_id` and `frame_id` directly and do not include nav context.

#### Console events

| Event | Unique fields |
| --- | --- |
| `console_log` | `level` (CDP type string), `text` (first arg), `args` (all args as strings), `stack_trace` |
| `console_error` | Same as `console_log` when `source.event` is `Runtime.consoleAPICalled`. When `source.event` is `Runtime.exceptionThrown`: `text`, `line`, `column`, `source_url` (script file URL, not page URL), `stack_trace`. |

#### Network events

| Event | Fields |
| --- | --- |
| `network_request` | `request_id`, `loader_id`, `frame_id`, `document_url`, `method`, `url`, `headers`, `initiator_type`. Optional: `post_data`, `resource_type`, `is_redirect` + `redirect_url`. |
| `network_response` | `request_id`, `loader_id`, `frame_id`, `method`, `url`, `status`, `headers`. Optional: `status_text`, `mime_type`, `resource_type`, `body` (truncated text body for textual MIME types). |
| `network_loading_failed` | `request_id`, `error_text`, `canceled`. Optional (absent when the request record was not found): `url`, `loader_id`, `frame_id`, `resource_type`. |
#### Page events

| Event | Unique fields |
| --- | --- |
| `page_tab_opened` | `target_id`, `target_type`, `url`, `opener_id`, `title`. Emitted before the first navigation; no nav context. |
| `page_navigation` | `session_id`, `target_id`, `target_type`, `url`, `frame_id`, `parent_frame_id` (absent for top-level frames), `loader_id`. This event establishes the nav context stamped on all subsequent events for the session. |
| `page_dom_content_loaded` | Nav context + `cdp_timestamp` (CDP monotonic seconds; not a wall-clock timestamp -- use `event.ts` for ordering). |
| `page_load` | Nav context + `cdp_timestamp` (CDP monotonic seconds). |
| `page_layout_shift` | Nav context + `source_frame_id`, `time`, `duration`. Optional `layout_shift_details` object: `value`, `had_recent_input`. Optional `lcp_details` object: `render_time`, `load_time`, `size`, `element_id`, `url`, `node_id`. Chrome multiplexes LCP candidate data through the same `PerformanceTimeline.timelineEventAdded` notification, so both may appear on a single event. |

#### Computed events

`network_idle`, `page_layout_settled`, and `page_navigation_settled` carry nav context fields only.

#### Interaction events

All interaction events include nav context plus the fields below.

| Event | Unique fields |
| --- | --- |
| `interaction_click` | `x`, `y` (viewport coords), `selector` (CSS selector of clicked element), `tag`, `text` (element text; empty for sensitive inputs). |
| `interaction_key` | `key` (key name), `selector`, `tag`. Not emitted for sensitive input fields. |
| `interaction_scroll_settled` | `from_x`, `from_y`, `to_x`, `to_y` (scroll positions in px), `target_selector`. |

#### Monitor lifecycle events

Lifecycle events use `source.kind = "local_process"` and carry no nav context, except `monitor_screenshot` which includes nav context alongside the image payload.

| Event | Fields |
| --- | --- |
| `monitor_screenshot` | Nav context + `png` (base64-encoded PNG). |
| `monitor_disconnected` | `reason: "chrome_restarted"`. |
| `monitor_reconnected` | `reconnect_duration_ms`. |
| `monitor_reconnect_failed` | `reason: "reconnect_exhausted"`. |
| `monitor_init_failed` | `step` (name of the init step that failed, e.g. `"Target.setAutoAttach"`). |
