| 1 | +# CDP Monitor |
| 2 | + |
| 3 | +The monitor is the browser-facing layer of the kernel browser logging pipeline. It connects to Chrome's DevTools endpoint, tracks every attached target (tabs, iframes, workers) via CDP's `Target.setAutoAttach`, and converts raw CDP notifications into typed `events.Event` values for downstream consumers. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +`cdpmonitor` manages a Chrome DevTools Protocol (CDP) WebSocket connection to a running Chrome browser. It subscribes to CDP events across all attached tabs, translates them into structured `events.Event` values, and publishes them via a caller-supplied `PublishFunc`. It also derives synthetic events from sequences of CDP events and takes screenshots on significant page activity. |
| 8 | + |
| 9 | +Chrome can restart independently of the monitor. When that happens, `UpstreamProvider` pushes a new DevTools URL and the monitor reconnects automatically, emitting lifecycle events so consumers can track continuity. |
| 10 | + |
| 11 | +## Event taxonomy |
| 12 | + |
| 13 | +**CDP-derived** (1-to-1 with a CDP notification): `console_log`, `console_error`, `network_request`, `network_response`, `network_loading_failed`, `page_tab_opened`, `page_navigation`, `page_dom_content_loaded`, `page_load`, `page_layout_shift` |
| 14 | + |
| 15 | +**Computed** (inferred from sequences of CDP events): `network_idle` (fires when in-flight requests drop to zero), `page_layout_settled` (1 s after `page_load` with no intervening layout shifts), `page_navigation_settled` (fires once `page_dom_content_loaded` and `page_layout_settled` have both fired for the same navigation; intentionally independent of `network_idle` so that a single hung request cannot stall the event). |
| 16 | + |
| 17 | +**Interaction** (reported by `interaction.js` through the `__kernelEvent` binding and delivered as `Runtime.bindingCalled`): `interaction_click`, `interaction_key`, `interaction_scroll_settled` |
| 18 | + |
| 19 | +**Monitor lifecycle** (emitted by the monitor itself, not by Chrome): `monitor_screenshot`, `monitor_disconnected`, `monitor_reconnected`, `monitor_reconnect_failed`, `monitor_init_failed` |
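
The computed events above can be modeled as small state machines driven by CDP-derived events. The sketch below is a hypothetical simplification, not `computed.go`: the type, the method names, and the rule that a layout shift after `page_load` restarts the quiet period are assumptions, `settleDelay` stands in for the 1 s window, and per-navigation reset is omitted. It shows `network_idle` firing when the in-flight count reaches zero, and `page_navigation_settled` waiting only on DOMContentLoaded and layout settling, so a request that never finishes cannot hold it back.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// computedState is a hypothetical, simplified model of the computed-event
// state machines; field names and reset rules are assumptions.
type computedState struct {
	mu          sync.Mutex
	inFlight    int
	domLoaded   bool
	layoutDone  bool
	navSettled  bool
	timerGen    int // 0 until page_load; bumped on every (re)arm of the timer
	settleDelay time.Duration
	emit        func(eventType string)
}

func (cs *computedState) onRequestStarted() { cs.mu.Lock(); cs.inFlight++; cs.mu.Unlock() }
// onRequestDone emits network_idle when the in-flight count drops to zero.
func (cs *computedState) onRequestDone() {
	cs.mu.Lock()
	var out []string
	if cs.inFlight > 0 {
		cs.inFlight--
		if cs.inFlight == 0 {
			out = append(out, "network_idle")
		}
	}
	cs.finish(out)
}

func (cs *computedState) onDOMContentLoaded() {
	cs.mu.Lock()
	cs.domLoaded = true
	cs.finish(nil)
}

// page_load starts a quiet-period timer; a later layout shift restarts it.
func (cs *computedState) onPageLoad() { cs.arm(true) }
func (cs *computedState) onLayoutShift() { cs.arm(false) }

func (cs *computedState) arm(isLoad bool) {
	cs.mu.Lock()
	defer cs.mu.Unlock()
	if cs.layoutDone || (!isLoad && cs.timerGen == 0) {
		return // already settled, or no page_load yet
	}
	cs.timerGen++
	gen := cs.timerGen
	time.AfterFunc(cs.settleDelay, func() { cs.onQuiet(gen) })
}

func (cs *computedState) onQuiet(gen int) {
	cs.mu.Lock()
	var out []string
	if gen == cs.timerGen && !cs.layoutDone { // superseded timers do nothing
		cs.layoutDone = true
		out = append(out, "page_layout_settled")
	}
	cs.finish(out)
}

// finish is called with mu held. It appends page_navigation_settled once both
// prerequisites hold (inFlight is deliberately ignored), then unlocks and emits.
func (cs *computedState) finish(out []string) {
	if cs.domLoaded && cs.layoutDone && !cs.navSettled {
		cs.navSettled = true
		out = append(out, "page_navigation_settled")
	}
	cs.mu.Unlock()
	for _, t := range out {
		cs.emit(t)
	}
}

func runDemo() []string {
	var mu sync.Mutex
	var got []string
	emit := func(t string) { mu.Lock(); got = append(got, t); mu.Unlock() }
	cs := &computedState{settleDelay: 20 * time.Millisecond, emit: emit}
	cs.onRequestStarted()
	cs.onRequestDone() // in-flight count reaches zero: network_idle
	cs.onDOMContentLoaded()
	cs.onPageLoad()
	cs.onRequestStarted() // a hung request that never finishes
	time.Sleep(5 * time.Millisecond)
	cs.onLayoutShift() // restarts the quiet period
	time.Sleep(200 * time.Millisecond)
	mu.Lock()
	defer mu.Unlock()
	return append([]string(nil), got...)
}
func main() { fmt.Println(runDemo()) }
```

Emitting outside the lock keeps `emit` free to take other locks. The generation counter lets a re-armed timer supersede an older one without relying on `time.Timer.Stop` winning a race against a timer that is already firing.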
| 20 | + |
| 21 | +## Responsibilities |
| 22 | + |
| 23 | +| Concern | Where | |
| 24 | +| --- | --- | |
| 25 | +| WebSocket lifecycle (connect, read, reconnect) | `monitor.go` | |
| 26 | +| CDP domain setup per session | `domains.go` | |
| 27 | +| Event translation (CDP params to `events.Event`) | `handlers.go` | |
| 28 | +| Synthetic event state machines | `computed.go` | |
| 29 | +| Screenshot capture via ffmpeg | `screenshot.go` | |
| 30 | +| CDP protocol types | `cdp_proto.go`, `types.go` | |
| 31 | +| Interaction tracking injected into the page | `interaction.js` | |
| 32 | +| Body/MIME capture sizing and text truncation helpers | `util.go` | |
| 33 | + |
| 34 | +## Internals |
| 35 | + |
| 36 | +### Reconnect model |
| 37 | + |
| 38 | +`subscribeToUpstream` listens to `UpstreamProvider.Subscribe()` for new DevTools URLs. On each URL change (indicating that Chrome restarted), `handleUpstreamRestart` tears down the existing connection, dials the new URL with exponential backoff capped at 2 s (250 ms → 500 ms → 1 s → 2 s, up to 10 attempts), then restarts `readLoop` and re-initializes all CDP sessions. `restartMu` serializes concurrent restart signals so rapid Chrome restarts do not produce overlapping reconnects. |
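
A minimal sketch of the retry schedule, assuming a caller-supplied `dial` function. `backoffDelay`, `dialWithRetry`, and `simulateReconnect` are hypothetical names, not the monitor's API. Delays double from 250 ms and hold at 2 s once they reach the cap. Canceling the context aborts the wait between attempts.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// backoffDelay returns the wait after failed attempt n (0-based): 250 ms,
// doubling each time and capped at 2 s.
func backoffDelay(attempt int) time.Duration {
	const base, maxDelay = 250 * time.Millisecond, 2 * time.Second
	d := base
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// dialWithRetry (hypothetical) calls dial up to maxAttempts times, waiting
// delay(n) between failures, and stops early if ctx is canceled.
func dialWithRetry(ctx context.Context, maxAttempts int, delay func(int) time.Duration, dial func(context.Context) error) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = dial(ctx); err == nil {
			return nil
		}
		if attempt == maxAttempts-1 {
			break // out of attempts; do not sleep again
		}
		t := time.NewTimer(delay(attempt))
		select {
		case <-ctx.Done():
			t.Stop()
			return ctx.Err()
		case <-t.C:
		}
	}
	return fmt.Errorf("reconnect exhausted after %d attempts: %w", maxAttempts, err)
}

// simulateReconnect fails the first `failures` dials, using 1 ms delays so it
// runs quickly, and reports how many dials were made.
func simulateReconnect(failures int) (int, error) {
	calls := 0
	dial := func(context.Context) error {
		calls++
		if calls <= failures {
			return errors.New("connection refused")
		}
		return nil
	}
	fast := func(int) time.Duration { return time.Millisecond }
	err := dialWithRetry(context.Background(), 10, fast, dial)
	return calls, err
}

func main() {
	for n := 0; n < 5; n++ {
		fmt.Println(n, backoffDelay(n)) // 250ms, 500ms, 1s, 2s, 2s
	}
	fmt.Println(simulateReconnect(2))  // 3 <nil>
	fmt.Println(simulateReconnect(20)) // 10 reconnect exhausted after 10 attempts: connection refused
}
```

In this sketch, the error returned after the tenth failed dial is the point at which the monitor would report `monitor_reconnect_failed`.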
| 39 | + |
| 40 | +### Goroutines |
| 41 | + |
| 42 | +| Goroutine | Lifetime | Tracked by | |
| 43 | +| --- | --- | --- | |
| 44 | +| `readLoop` | one per WebSocket connection | `done` channel | |
| 45 | +| `subscribeToUpstream` | until `lifecycleCtx` is canceled | `asyncWg` | |
| 46 | +| `sweepPendingRequests` | until `lifecycleCtx` is canceled | `asyncWg` | |
| 47 | +| `initSession` | short-lived, one per connect or reconnect | `asyncWg` | |
| 48 | +| `attachExistingTargets` wrapper | short-lived, one per existing target on reconnect | `asyncWg` | |
| 49 | +| `enableDomains` + `injectScript` | short-lived, one per target attach | `asyncWg` | |
| 50 | +| `fetchResponseBody` | one per completed network request | `asyncWg` | |
| 51 | +| `captureScreenshot` | one per screenshot trigger | `asyncWg` | |
| 52 | + |
| 53 | +`Stop()` cancels `lifecycleCtx`, waits for `readLoop` via `done`, then waits for all other goroutines via `asyncWg` before closing the connection. |
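
The shutdown order can be sketched with a context, a `done` channel, and a `sync.WaitGroup`. This is a hypothetical skeleton: the goroutine bodies are stand-ins for `readLoop` and `sweepPendingRequests`, and `record` only logs the order in which things finish.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// monitor is a hypothetical skeleton with only the fields needed to show the
// shutdown order; log records what happened, in order.
type monitor struct {
	cancel  context.CancelFunc
	done    chan struct{}  // closed when readLoop returns
	asyncWg sync.WaitGroup // every other background goroutine
	logMu   sync.Mutex
	log     []string
}

func (m *monitor) record(s string) {
	m.logMu.Lock()
	defer m.logMu.Unlock()
	m.log = append(m.log, s)
}

func (m *monitor) Start() {
	ctx, cancel := context.WithCancel(context.Background())
	m.cancel, m.done = cancel, make(chan struct{})
	// Stand-in for readLoop, which would block in conn.Read(ctx).
	go func() {
		defer close(m.done)
		<-ctx.Done()
		m.record("readLoop exited")
	}()
	// Stand-in for sweepPendingRequests. Add runs before the goroutine starts.
	m.asyncWg.Add(1)
	go func() {
		defer m.asyncWg.Done()
		tick := time.NewTicker(10 * time.Millisecond)
		defer tick.Stop()
		for {
			select {
			case <-ctx.Done():
				m.record("sweeper exited")
				return
			case <-tick.C:
				// A real sweeper would expire stale pending requests here.
			}
		}
	}()
}

func (m *monitor) Stop() {
	m.cancel()              // 1. signal every goroutine
	<-m.done                // 2. wait for the sole reader
	m.asyncWg.Wait()        // 3. wait for every other goroutine
	m.record("conn closed") // 4. only now close the WebSocket
}

func runStopDemo() []string {
	m := &monitor{}
	m.Start()
	time.Sleep(15 * time.Millisecond)
	m.Stop()
	return m.log
}

func main() { fmt.Println(runStopDemo()) }
```

The `sync.WaitGroup` documentation requires that an `Add` with a positive delta made while the counter is zero happen before `Wait`. A real monitor therefore has to stop spawning short-lived helpers once `Stop()` begins waiting.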
| 54 | + |
| 55 | +### Lock ordering |
| 56 | + |
| 57 | +Locks must be acquired left to right. Never hold a lock on the right while acquiring one further left. |
| 58 | + |
| 59 | +``` |
| 60 | +restartMu -> lifeMu -> pendReqMu -> computed.mu -> pendMu |
| 61 | +restartMu -> lifeMu -> sessionsMu |
| 62 | +``` |
| 63 | + |
| 64 | +`computed.mu` and `sessionsMu` are never held simultaneously; `cs.stop()` and `cs.resetOnNavigation()` are called only after the relevant `sessionsMu` critical section is complete. |
| 65 | + |
| 66 | +`bindingRateMu` is independent of this ordering and is always acquired alone. |
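
The two rules above (nest only in diagram order, and never hold `sessionsMu` and `computed.mu` together) can be illustrated with a hypothetical fragment. The struct layout, method names, and map contents are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

type targetInfo struct{ targetID, targetType string }

// computedState stands in for the state guarded by computed.mu.
type computedState struct {
	mu       sync.Mutex
	inFlight int
	navs     int
}

func (cs *computedState) requestDone() {
	cs.mu.Lock()
	defer cs.mu.Unlock()
	if cs.inFlight > 0 {
		cs.inFlight--
	}
}

func (cs *computedState) resetOnNavigation() {
	cs.mu.Lock()
	defer cs.mu.Unlock()
	cs.navs++
}

// monitor is a hypothetical fragment with just the locks used below.
type monitor struct {
	pendReqMu       sync.Mutex
	pendingRequests map[string]string // requestID -> URL (simplified)
	sessionsMu      sync.Mutex
	sessions        map[string]targetInfo
	computed        *computedState
}

// onLoadingFinished nests locks in diagram order: pendReqMu is held while
// computed.mu is acquired, never the reverse.
func (m *monitor) onLoadingFinished(requestID string) {
	m.pendReqMu.Lock()
	defer m.pendReqMu.Unlock()
	if _, ok := m.pendingRequests[requestID]; !ok {
		return
	}
	delete(m.pendingRequests, requestID)
	m.computed.requestDone()
}

// onNavigated copies what it needs under sessionsMu and releases it before
// touching computed state, so sessionsMu and computed.mu never overlap.
func (m *monitor) onNavigated(sessionID string) (targetInfo, bool) {
	m.sessionsMu.Lock()
	info, ok := m.sessions[sessionID]
	m.sessionsMu.Unlock()
	if ok {
		m.computed.resetOnNavigation()
	}
	return info, ok
}

func newDemoMonitor() *monitor {
	return &monitor{
		pendingRequests: map[string]string{"req-1": "https://example.com/"},
		sessions:        map[string]targetInfo{"sess-1": {"tgt-1", "page"}},
		computed:        &computedState{inFlight: 1},
	}
}

func main() {
	m := newDemoMonitor()
	m.onLoadingFinished("req-1")
	info, ok := m.onNavigated("sess-1")
	fmt.Println(info.targetID, ok, m.computed.inFlight, m.computed.navs) // tgt-1 true 0 1
}
```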
| 67 | + |
| 68 | +| Lock | Protects | |
| 69 | +| --- | --- | |
| 70 | +| `restartMu` | Serializes `handleUpstreamRestart` to prevent overlapping reconnects from rapid Chrome restarts | |
| 71 | +| `lifeMu` | `conn`, `lifecycleCtx`, `cancel`, `done`, `readReady` -- all fields that change during Start / Stop / reconnect | |
| 72 | +| `pendReqMu` | `pendingRequests` (requestId -> `networkReqState`): in-flight network requests accumulating request/response metadata until `loadingFinished` | |
| 73 | +| `computed.mu` | All `computedState` fields: counters and timers for the `network_idle`, `page_layout_settled`, and `page_navigation_settled` state machines | |
| 74 | +| `pendMu` | `pending` (id -> reply channel): in-flight CDP commands waiting for a response from Chrome | |
| 75 | +| `sessionsMu` | `sessions` (sessionID -> `targetInfo`): the set of currently attached CDP targets (tabs, iframes, workers) | |
| 76 | +| `bindingRateMu` | `bindingLastSeen` (sessionID:eventType -> time): rate-limit state for `__kernelEvent` binding calls | |
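
A sketch of how a `bindingLastSeen` table could rate-limit `__kernelEvent` calls. It assumes the policy is "drop a call if the same session already reported the same event type within an interval". The interval value and the type and method names are hypothetical, since the document does not specify them.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bindingLimiter is a hypothetical sketch of the bindingLastSeen table. It
// forwards at most one call per (sessionID, eventType) per interval.
type bindingLimiter struct {
	mu       sync.Mutex // plays the role of bindingRateMu: always taken alone
	interval time.Duration
	lastSeen map[string]time.Time // key: sessionID + ":" + eventType
}

func newBindingLimiter(interval time.Duration) *bindingLimiter {
	return &bindingLimiter{interval: interval, lastSeen: make(map[string]time.Time)}
}

// allow reports whether a binding call observed at now should be forwarded.
// Dropped calls do not refresh the timestamp.
func (l *bindingLimiter) allow(sessionID, eventType string, now time.Time) bool {
	key := sessionID + ":" + eventType
	l.mu.Lock()
	defer l.mu.Unlock()
	if last, ok := l.lastSeen[key]; ok && now.Sub(last) < l.interval {
		return false
	}
	l.lastSeen[key] = now
	return true
}

func demoAllowed() []bool {
	l := newBindingLimiter(100 * time.Millisecond)
	t0 := time.Unix(0, 0)
	return []bool{
		l.allow("s1", "interaction_click", t0),
		l.allow("s1", "interaction_click", t0.Add(10*time.Millisecond)),  // too soon
		l.allow("s1", "interaction_key", t0.Add(10*time.Millisecond)),    // different type
		l.allow("s2", "interaction_click", t0.Add(10*time.Millisecond)),  // different session
		l.allow("s1", "interaction_click", t0.Add(150*time.Millisecond)), // interval elapsed
	}
}

func main() { fmt.Println(demoAllowed()) } // [true false true true true]
```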
| 77 | + |
| 78 | +Fields that need no mutex use `sync/atomic`: `nextID`, `mainSessionID`, `running`, `lastScreenshotAt`, `screenshotInFlight`. |
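
The two screenshot fields suggest a lock-free gate: a timestamp check plus a compare-and-swap on an in-flight flag. The sketch below is a hypothetical illustration. `minGap` and the method names are assumptions, and it uses the `atomic.Int64` and `atomic.Bool` types added in Go 1.19.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// screenshotGate is a hypothetical sketch of mutex-free screenshot throttling.
type screenshotGate struct {
	lastScreenshotAt   atomic.Int64 // Unix nanoseconds of the last accepted trigger
	screenshotInFlight atomic.Bool
	minGap             time.Duration // assumed minimum spacing between captures
}

// tryStart reports whether the caller may start a capture. On true, the
// caller owns the in-flight flag and must call finish when done.
func (g *screenshotGate) tryStart(now time.Time) bool {
	if now.UnixNano()-g.lastScreenshotAt.Load() < int64(g.minGap) {
		return false // too soon after the previous screenshot
	}
	if !g.screenshotInFlight.CompareAndSwap(false, true) {
		return false // another capture is still running
	}
	g.lastScreenshotAt.Store(now.UnixNano())
	return true
}

func (g *screenshotGate) finish() { g.screenshotInFlight.Store(false) }

// demoGate fires ten simultaneous triggers, then two later ones.
func demoGate() (wins int32, soon, later bool) {
	g := &screenshotGate{minGap: 500 * time.Millisecond}
	t0 := time.Now()
	var won atomic.Int32
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if g.tryStart(t0) {
				won.Add(1)
			}
		}()
	}
	wg.Wait()
	g.finish()
	soon = g.tryStart(t0.Add(10 * time.Millisecond))
	later = g.tryStart(t0.Add(600 * time.Millisecond))
	return won.Load(), soon, later
}

func main() { fmt.Println(demoGate()) } // 1 false true
```

The time check and the CAS are separate atomic operations, so two triggers can occasionally both pass the time check. The in-flight flag still ensures that at most one capture runs at a time.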
| 79 | + |
| 80 | +### WebSocket concurrency |
| 81 | + |
| 82 | +`coder/websocket` allows only one goroutine at a time to read from a connection, but `Write` is safe to call from multiple goroutines concurrently. `readLoop` is the sole reader. All writes go through `send`, which calls `conn.Write` directly; the library serializes concurrent writes internally, so no external write mutex is needed. |
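
A sketch of the `send` path. It allocates an ID from an atomic counter, registers a reply channel in `pending` under `pendMu`, writes the command, and waits. The `client` type, the `dispatch` helper, and the fake `write` are hypothetical. The `id`/`method`/`params`/`sessionId` message shape follows the CDP wire format for flattened sessions.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// cdpResponse is the subset of a CDP reply needed to match it to a command.
type cdpResponse struct {
	ID     int64           `json:"id"`
	Result json.RawMessage `json:"result,omitempty"`
	Error  *struct {
		Message string `json:"message"`
	} `json:"error,omitempty"`
}

// client is a hypothetical sketch. write stands in for conn.Write; dispatch
// is what the single reader would call for every frame carrying an "id".
type client struct {
	nextID  atomic.Int64
	pendMu  sync.Mutex
	pending map[int64]chan cdpResponse
	write   func(ctx context.Context, data []byte) error
}

func (c *client) send(ctx context.Context, method, sessionID string, params any) (json.RawMessage, error) {
	id := c.nextID.Add(1)
	reply := make(chan cdpResponse, 1) // buffered: dispatch never blocks
	c.pendMu.Lock()
	c.pending[id] = reply
	c.pendMu.Unlock()
	defer func() { c.pendMu.Lock(); delete(c.pending, id); c.pendMu.Unlock() }()

	msg := map[string]any{"id": id, "method": method}
	if params != nil {
		msg["params"] = params
	}
	if sessionID != "" {
		msg["sessionId"] = sessionID // routes the command to an attached target
	}
	data, err := json.Marshal(msg)
	if err != nil {
		return nil, err
	}
	if err := c.write(ctx, data); err != nil {
		return nil, err
	}
	select {
	case resp := <-reply:
		if resp.Error != nil {
			return nil, errors.New(resp.Error.Message)
		}
		return resp.Result, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

func (c *client) dispatch(resp cdpResponse) {
	c.pendMu.Lock()
	ch, ok := c.pending[resp.ID]
	c.pendMu.Unlock()
	if ok {
		select {
		case ch <- resp:
		default: // duplicate reply for this id; drop it
		}
	}
}

// demoSend wires write to a fake browser that answers every command.
func demoSend() (string, error) {
	c := &client{pending: make(map[int64]chan cdpResponse)}
	c.write = func(_ context.Context, data []byte) error {
		var req struct {
			ID int64 `json:"id"`
		}
		if err := json.Unmarshal(data, &req); err != nil {
			return err
		}
		go c.dispatch(cdpResponse{ID: req.ID, Result: json.RawMessage(`{"ok":true}`)})
		return nil
	}
	res, err := c.send(context.Background(), "Page.enable", "session-1", nil)
	return string(res), err
}

func main() { fmt.Println(demoSend()) } // {"ok":true} <nil>
```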
| 83 | + |
| 84 | +## Event data model |
| 85 | + |
| 86 | +### Envelope and top-level fields |
| 87 | + |
| 88 | +Every event arrives as an `Envelope`: |
| 89 | + |
| 90 | +```json |
| 91 | +{ |
| 92 | + "capture_session_id": "cs_abc123", |
| 93 | + "seq": 42, |
| 94 | + "event": { |
| 95 | + "ts": 1746123456789000, |
| 96 | + "type": "network_request", |
| 97 | + "category": "network", |
| 98 | + "source": { ... }, |
| 99 | + "data": { ... }, |
| 100 | + "truncated": false |
| 101 | + } |
| 102 | +} |
| 103 | +``` |
| 104 | + |
| 105 | +| Field | Type | Description | |
| 106 | +| --- | --- | --- | |
| 107 | +| `capture_session_id` | string | Pipeline-assigned ID for the capture session (not a CDP concept). | |
| 108 | +| `seq` | uint64 | Monotonically increasing per-capture-session sequence number. | |
| 109 | +| `event.ts` | int64 | Wall-clock time the monitor emitted the event, as **Unix microseconds** (µs since epoch). | |
| 110 | +| `event.type` | string | See [Event taxonomy](#event-taxonomy). | |
| 111 | +| `event.category` | string | One of: `console`, `network`, `page`, `interaction`, `system`. | |
| 112 | +| `event.truncated` | bool | `true` if `data` was nulled to fit the 1 MB pipeline limit. | |
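
A Go consumer might decode envelopes with types like the ones below. They are hypothetical consumer-side types, not the package's `events.Event`. `data` is kept as `json.RawMessage` so each event type can be decoded separately, and `ts` is converted with `time.UnixMicro` (Go 1.17+).

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Hypothetical consumer-side types mirroring the fields documented above.
type Source struct {
	Kind     string            `json:"kind"`
	Event    string            `json:"event,omitempty"`
	Metadata map[string]string `json:"metadata,omitempty"`
}

type Event struct {
	TS        int64           `json:"ts"` // Unix microseconds
	Type      string          `json:"type"`
	Category  string          `json:"category"`
	Source    Source          `json:"source"`
	Data      json.RawMessage `json:"data"` // decode per event type
	Truncated bool            `json:"truncated"`
}

type Envelope struct {
	CaptureSessionID string `json:"capture_session_id"`
	Seq              uint64 `json:"seq"`
	Event            Event  `json:"event"`
}

// Time converts the microsecond timestamp to a UTC time.Time.
func (e Event) Time() time.Time { return time.UnixMicro(e.TS).UTC() }

// HasData is false when data is absent or was nulled by truncation.
func (e Event) HasData() bool { return len(e.Data) > 0 && string(e.Data) != "null" }

const sample = `{"capture_session_id":"cs_abc123","seq":42,"event":{
  "ts":1746123456789000,"type":"network_request","category":"network",
  "source":{"kind":"cdp","event":"Network.requestWillBeSent",
    "metadata":{"target_type":"page"}},
  "data":null,"truncated":true}}`

func decodeSample() (Envelope, error) {
	var env Envelope
	err := json.Unmarshal([]byte(sample), &env)
	return env, err
}

func main() {
	env, err := decodeSample()
	if err != nil {
		panic(err)
	}
	fmt.Println(env.Seq, env.Event.Type, env.Event.Time().Format(time.RFC3339), env.Event.HasData())
}
```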
| 113 | + |
| 114 | +### Source object |
| 115 | + |
| 116 | +```json |
| 117 | +"source": { |
| 118 | + "kind": "cdp", |
| 119 | + "event": "Network.requestWillBeSent", |
| 120 | + "metadata": { |
| 121 | + "cdp_session_id": "...", |
| 122 | + "target_id": "...", |
| 123 | + "target_type": "page" |
| 124 | + } |
| 125 | +} |
| 126 | +``` |
| 127 | + |
| 128 | +| Field | Description | |
| 129 | +| --- | --- | |
| 130 | +| `event` | The raw CDP method that triggered the event (e.g. `Network.requestWillBeSent`). Empty for computed events. | |
| 131 | +| `metadata.cdp_session_id` | The CDP WebSocket session multiplexer ID for this target. Changes if Chrome restarts. | |
| 132 | +| `metadata.target_id` | Stable identifier for the browser target (tab/window). Survives navigations within the same tab. | |
| 133 | +| `metadata.target_type` | Target type as reported by Chrome: `page`, `iframe`, `worker`, etc. | |
| 134 | + |
| 135 | +### CDP identity primer |
| 136 | + |
| 137 | +Five IDs appear across events. They nest as follows: |
| 138 | + |
| 139 | +``` |
| 140 | +target_id <- one per tab/window; stable across navigations |
| 141 | +└── cdp_session_id <- WebSocket multiplexer channel to that target; resets on Chrome restart |
| 142 | + └── frame_id <- one per frame (top-level or iframe); changes on navigation |
| 143 | + └── loader_id <- one per document load; links a navigation to its network requests |
| 144 | + └── request_id <- one per request (stable across redirects in a chain) |
| 145 | +``` |
| 146 | + |
| 147 | +| ID | Where it appears | What it identifies | |
| 148 | +| --- | --- | --- | |
| 149 | +| `target_id` | `source.metadata`, most `data` objects | The browser tab. Use this to group all events from one tab session. | |
| 150 | +| `cdp_session_id` | `source.metadata` | The WebSocket sub-channel. Not stable across reconnects. | |
| 151 | +| `frame_id` | `page_navigation`, `network_request`, `network_response`, `network_loading_failed` | The frame the request or navigation belongs to. Top-level frame has no `parent_frame_id`. | |
| 152 | +| `source_frame_id` | `page_layout_shift` | The frame where the layout shift occurred. Distinct from the nav context `frame_id`, which is always the top-level navigated frame. | |
| 153 | +| `loader_id` | `page_navigation`, `network_request`, `network_response` | The document load that owns a request. Join `network_request.loader_id` to `page_navigation.loader_id` to correlate requests with the navigation that triggered them. | |
| 154 | +| `request_id` | `network_request`, `network_response`, `network_loading_failed` | A single request chain (including redirects). Links request to its eventual response or failure. | |
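
The `loader_id` join described above can be sketched as a single grouping pass. The `navigation` and `request` structs are hypothetical projections of the `page_navigation` and `network_request` payloads, keeping only the join key and a URL.

```go
package main

import "fmt"

// Hypothetical projections of two event payloads; only the join key matters.
type navigation struct{ LoaderID, URL string }

type request struct{ RequestID, LoaderID, URL string }

// groupByNavigation attaches each network_request to the page_navigation that
// shares its loader_id; requests with no matching navigation are orphans.
func groupByNavigation(navs []navigation, reqs []request) (map[string][]request, []request) {
	known := make(map[string]bool, len(navs))
	for _, n := range navs {
		known[n.LoaderID] = true
	}
	grouped := make(map[string][]request)
	var orphans []request
	for _, r := range reqs {
		if known[r.LoaderID] {
			grouped[r.LoaderID] = append(grouped[r.LoaderID], r)
		} else {
			orphans = append(orphans, r)
		}
	}
	return grouped, orphans
}

func demoGroups() (map[string][]request, []request) {
	navs := []navigation{{"L1", "https://example.com/"}, {"L2", "https://example.com/next"}}
	reqs := []request{
		{"r1", "L1", "https://example.com/app.js"},
		{"r2", "L1", "https://example.com/style.css"},
		{"r3", "L2", "https://example.com/api"},
		{"r4", "L9", "https://example.com/beacon"},
	}
	return groupByNavigation(navs, reqs)
}

func main() {
	grouped, orphans := demoGroups()
	fmt.Println(len(grouped["L1"]), len(grouped["L2"]), len(orphans)) // 2 1 1
}
```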
| 155 | + |
| 156 | +### Navigation context fields |
| 157 | + |
| 158 | +Most event `data` objects include a nav context block stamped at the last `page_navigation`. These fields reflect the top-level frame most recently navigated in the session: |
| 159 | + |
| 160 | +| Field | Description | |
| 161 | +| --- | --- | |
| 162 | +| `session_id` | Same as `source.metadata.cdp_session_id`. Repeated for data-only consumers. | |
| 163 | +| `frame_id` | Frame ID of the navigated top-level frame. | |
| 164 | +| `loader_id` | Loader ID of the current document. | |
| 165 | +| `url` | URL of the current page at the time of the last navigation. | |
| 166 | +| `nav_seq` | Monotonically increasing counter, incremented on each `page_navigation`. Use it to detect that the page has navigated between two events in the same session. | |
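
Comparing nav context across two events might look like the sketch below. The struct and the `uint64` type chosen for `nav_seq` are assumptions. The check compares `session_id` first because the description above scopes `nav_seq` comparisons to a single session.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// navContext mirrors the nav context fields above (hypothetical struct).
type navContext struct {
	SessionID string `json:"session_id"`
	FrameID   string `json:"frame_id"`
	LoaderID  string `json:"loader_id"`
	URL       string `json:"url"`
	NavSeq    uint64 `json:"nav_seq"`
}

// navigatedBetween reports whether the page navigated between events a and b.
// ok is false when the events come from different sessions, because nav_seq
// values are only compared within one session.
func navigatedBetween(a, b navContext) (navigated, ok bool) {
	if a.SessionID != b.SessionID {
		return false, false
	}
	return a.NavSeq != b.NavSeq, true
}

func parse(s string) navContext {
	var n navContext
	if err := json.Unmarshal([]byte(s), &n); err != nil {
		panic(err)
	}
	return n
}

func main() {
	click := parse(`{"session_id":"S1","url":"https://example.com/","nav_seq":3}`)
	key := parse(`{"session_id":"S1","url":"https://example.com/cart","nav_seq":4}`)
	fmt.Println(navigatedBetween(click, key)) // true true
}
```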
| 167 | + |
| 168 | +### Per-event data fields |
| 169 | + |
| 170 | +Fields below are the unique additions per event type. Unless otherwise noted, events also include the nav context fields described above. Network events are the exception: they carry their own `loader_id` and `frame_id` directly and do not include nav context. |
| 171 | + |
| 172 | +#### Console events |
| 173 | + |
| 174 | +| Event | Unique fields | |
| 175 | +| --- | --- | |
| 176 | +| `console_log` | `level` (CDP type string), `text` (first arg), `args` (all args as strings), `stack_trace` | |
| 177 | +| `console_error` | Same as `console_log` when `source.event` is `Runtime.consoleAPICalled`. When `source.event` is `Runtime.exceptionThrown`: `text`, `line`, `column`, `source_url` (script file URL, not page URL), `stack_trace`. | |
| 178 | + |
| 179 | +#### Network events |
| 180 | + |
| 181 | +| Event | Fields | |
| 182 | +| --- | --- | |
| 183 | +| `network_request` | `request_id`, `loader_id`, `frame_id`, `document_url`, `method`, `url`, `headers`, `initiator_type`. Optional: `post_data`, `resource_type`, `is_redirect` + `redirect_url`. | |
| 184 | +| `network_response` | `request_id`, `loader_id`, `frame_id`, `method`, `url`, `status`, `headers`. Optional: `status_text`, `mime_type`, `resource_type`, `body` (truncated text body for textual MIME types). | |
| 185 | +| `network_loading_failed` | `request_id`, `error_text`, `canceled`. Optional (absent when the request record was not found): `url`, `loader_id`, `frame_id`, `resource_type`. | |
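
Deciding which response bodies count as text and truncating them safely might look like the sketch below. The MIME allow-list and function names are hypothetical; the actual rules live in `util.go` and are not documented here. The truncation backs up to a rune boundary so that a cut never splits a multi-byte UTF-8 character.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
	"unicode/utf8"
)

// isTextualMIME is a hypothetical heuristic for deciding whether a response
// body is worth capturing as text.
func isTextualMIME(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType) // lowercases, drops params
	if err != nil {
		return false
	}
	switch {
	case strings.HasPrefix(mediaType, "text/"),
		mediaType == "application/json",
		mediaType == "application/javascript",
		mediaType == "application/xml",
		strings.HasSuffix(mediaType, "+json"),
		strings.HasSuffix(mediaType, "+xml"):
		return true
	}
	return false
}

// truncateUTF8 cuts s to at most maxBytes without splitting a multi-byte rune
// and reports whether anything was dropped.
func truncateUTF8(s string, maxBytes int) (string, bool) {
	if len(s) <= maxBytes {
		return s, false
	}
	cut := maxBytes
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut-- // step back off continuation bytes
	}
	return s[:cut], true
}

func main() {
	fmt.Println(isTextualMIME("text/html; charset=utf-8"), isTextualMIME("image/png")) // true false
	fmt.Println(truncateUTF8("héllo", 2))                                              // h true
}
```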
| 186 | + |
| 187 | +#### Page events |
| 188 | + |
| 189 | +| Event | Unique fields | |
| 190 | +| --- | --- | |
| 191 | +| `page_tab_opened` | `target_id`, `target_type`, `url`, `opener_id`, `title`. Emitted before the first navigation; no nav context. | |
| 192 | +| `page_navigation` | `session_id`, `target_id`, `target_type`, `url`, `frame_id`, `parent_frame_id` (absent for top-level frames), `loader_id`. This event establishes the nav context stamped on all subsequent events for the session. | |
| 193 | +| `page_dom_content_loaded` | Nav context + `cdp_timestamp` (CDP monotonic seconds; not a wall-clock timestamp -- use `event.ts` for ordering). | |
| 194 | +| `page_load` | Nav context + `cdp_timestamp` (CDP monotonic seconds). | |
| 195 | +| `page_layout_shift` | Nav context + `source_frame_id`, `time`, `duration`. Optional `layout_shift_details` object: `value`, `had_recent_input`. Optional `lcp_details` object: `render_time`, `load_time`, `size`, `element_id`, `url`, `node_id`. Chrome multiplexes LCP candidate data through the same `PerformanceTimeline.timelineEventAdded` notification, so both may appear on a single event. | |
| 196 | + |
| 197 | +#### Computed events |
| 198 | + |
| 199 | +`network_idle`, `page_layout_settled`, and `page_navigation_settled` carry nav context fields only. |
| 200 | + |
| 201 | +#### Interaction events |
| 202 | + |
| 203 | +All interaction events include nav context plus the fields below. |
| 204 | + |
| 205 | +| Event | Unique fields | |
| 206 | +| --- | --- | |
| 207 | +| `interaction_click` | `x`, `y` (viewport coords), `selector` (CSS selector of clicked element), `tag`, `text` (element text; empty for sensitive inputs). | |
| 208 | +| `interaction_key` | `key` (key name), `selector`, `tag`. Not emitted for sensitive input fields. | |
| 209 | +| `interaction_scroll_settled` | `from_x`, `from_y`, `to_x`, `to_y` (scroll positions in px), `target_selector`. | |
| 210 | + |
| 211 | +#### Monitor lifecycle events |
| 212 | + |
| 213 | +Lifecycle events use `source.kind = "local_process"` and carry no nav context, except `monitor_screenshot` which includes nav context alongside the image payload. |
| 214 | + |
| 215 | +| Event | Fields | |
| 216 | +| --- | --- | |
| 217 | +| `monitor_screenshot` | Nav context + `png` (base64-encoded PNG). | |
| 218 | +| `monitor_disconnected` | `reason: "chrome_restarted"`. | |
| 219 | +| `monitor_reconnected` | `reconnect_duration_ms`. | |
| 220 | +| `monitor_reconnect_failed` | `reason: "reconnect_exhausted"`. | |
| 221 | +| `monitor_init_failed` | `step` (name of the init step that failed, e.g. `"Target.setAutoAttach"`). | |