# Browser to API — Reference

Exhaustive reference for every script, flag, file format, and configuration knob the skill exposes.

## Scripts

All scripts are Node ESM (`type: module`) and depend only on the Node standard library. `discover.mjs` is the top-level dispatcher; `load.mjs`, `filter.mjs`, `normalize.mjs`, `infer.mjs`, and `emit.mjs` are the stage scripts it calls in order, and `open-swagger-ui.mjs` is a standalone previewer for the emitted spec. Run an individual stage with `discover.mjs --stage <name>` for debugging or partial reruns.

### `discover.mjs --run <path> [flags]`

Top-level dispatcher. Runs `load → filter → normalize → infer → emit` in order. With `--stage <name>`, it runs only that stage and assumes the prior stages have already written their intermediate files.
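
A minimal sketch of the stage selection described above, assuming the dispatcher keeps the stages in a fixed array. `stagesToRun` and `REQUIRED_INPUT` are hypothetical names; only `filter`'s input file is documented, and the other inputs are inferred from the stage outputs.

```javascript
// Hypothetical sketch; the real discover.mjs may be structured differently.
const STAGES = ["load", "filter", "normalize", "infer", "emit"];

// Intermediate file each stage expects an earlier stage to have written.
// filter's input is documented; the rest are assumptions from output names.
const REQUIRED_INPUT = {
  load: null, // reads the raw cdp/network/*.jsonl files instead
  filter: "intermediate/paired.jsonl",
  normalize: "intermediate/filtered.jsonl",
  infer: "intermediate/endpoints.jsonl",
  emit: "intermediate/endpoints.with-schemas.jsonl",
};

// All stages by default, or exactly one when --stage <name> is given.
function stagesToRun(stage) {
  if (stage === undefined) return [...STAGES];
  if (!STAGES.includes(stage)) {
    throw new Error(`unknown --stage "${stage}"; expected one of ${STAGES.join(", ")}`);
  }
  return [stage];
}
```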

### `open-swagger-ui.mjs (--run <path> | --spec <path>) [flags]`

Preview an emitted OpenAPI spec in a local Swagger UI checkout. The script serves the Swagger UI `dist/` assets and the generated spec from one local HTTP origin, injects a per-run `swagger-initializer.js`, opens the browser by default, and keeps the server alive until interrupted.

- `--run <path>` loads `<run>/api-spec/openapi.yaml`, falling back to `openapi.json`.
- `--spec <path>` previews an explicit OpenAPI YAML/JSON file.
- `--swagger-ui <path>` points at a Swagger UI checkout/package directory. If omitted, the script tries `$SWAGGER_UI_DIR`, `~/Developer/swagger-ui`, and `node_modules/swagger-ui-dist`.
- `--host <host>` defaults to `127.0.0.1`.
- `--port <port>` defaults to a random free port.
- `--no-open` prints the URL without opening a browser.
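
The resolution rules above can be sketched as plain functions. This is not the script's code: `resolveSpec`, `swaggerUiCandidates`, and `initializerSource` are hypothetical helpers, and the file-existence check is passed in so the sketch runs without touching the disk.

```javascript
// Hypothetical sketch of open-swagger-ui.mjs input resolution.
// `exists` is injected (for example fs.existsSync) to keep it testable.

// An explicit --spec is used as-is; --run prefers openapi.yaml, then openapi.json.
function resolveSpec({ run, spec }, exists) {
  if (spec) return spec;
  if (!run) throw new Error("pass --run <path> or --spec <path>");
  for (const name of ["openapi.yaml", "openapi.json"]) {
    const candidate = `${run}/api-spec/${name}`;
    if (exists(candidate)) return candidate;
  }
  throw new Error(`no openapi.yaml or openapi.json under ${run}/api-spec`);
}

// Documented lookup order for the Swagger UI directory; unset entries are skipped.
function swaggerUiCandidates(flag, env, home) {
  return [flag, env.SWAGGER_UI_DIR, `${home}/Developer/swagger-ui`, "node_modules/swagger-ui-dist"]
    .filter(Boolean);
}

// Per-run swagger-initializer.js. `url` and `dom_id` are standard
// SwaggerUIBundle options; which route serves the spec is up to the server.
function initializerSource(specUrl) {
  return [
    "window.onload = () => {",
    `  window.ui = SwaggerUIBundle({ url: ${JSON.stringify(specUrl)}, dom_id: "#swagger-ui" });`,
    "};",
    "",
  ].join("\n");
}
```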

### `load.mjs <run-path> <out-dir> [bodies-dir]`

- Reads `cdp/network/requests.jsonl` and `cdp/network/responses.jsonl`.
- Pairs requests with responses by `requestId`. Drops `OPTIONS` (CORS preflight) and pure redirects (status 3xx with a `Location` header and no body). A redirect is recorded as metadata on the *next* request in the chain when the `requestId` carries forward; otherwise it is dropped.
- Drops resource types other than `XHR`, `Fetch`, and `Document`: `Image`, `Stylesheet`, `Font`, `Media`, `Manifest`, `Other`, and `Script` are skipped unless the URL clearly looks like an API endpoint.
- **Body join**: if a `browse network` capture dir is provided (via `--bodies`, or auto-detected at `<run>/cdp/network/bodies/`), each subdir's `request.json` and `response.json` are read and joined to paired rows by `requestId`. The browse-network `id` field *is* the CDP `requestId` for XHR/Fetch resource types, so the join is exact rather than matched on URL or timestamp. Bodies that look like JSON are parsed; otherwise the raw string is preserved.
- Output: `intermediate/paired.jsonl`, one row per pair with `{ method, url, status, reqHeaders, reqBody, respHeaders, respBody, contentType, type, ts }`.
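
A sketch of the pairing step, assuming `requests.jsonl` rows carry the params of CDP `Network.requestWillBeSent` events and `responses.jsonl` rows carry `Network.responseReceived` params. The `redirects` field name is hypothetical, and the sketch omits the "looks like an API endpoint" escape hatch and the no-body check for redirects.

```javascript
// Hypothetical sketch of the pairing in load.mjs.
const KEEP_TYPES = new Set(["XHR", "Fetch", "Document"]);

// Header objects may use any casing, so look names up case-insensitively.
function headerValue(headers = {}, name) {
  const hit = Object.keys(headers).find((k) => k.toLowerCase() === name);
  return hit === undefined ? undefined : headers[hit];
}

function pairRequests(requests, responses) {
  // CDP reuses a requestId across a redirect chain: each hop arrives as a new
  // requestWillBeSent carrying the previous hop's 3xx in `redirectResponse`.
  // Keep the last hop and remember the earlier ones as metadata.
  const byId = new Map();
  for (const ev of requests) {
    const redirects = byId.has(ev.requestId) ? byId.get(ev.requestId).redirects : [];
    if (ev.redirectResponse) {
      redirects.push({ url: ev.redirectResponse.url, status: ev.redirectResponse.status });
    }
    byId.set(ev.requestId, { ...ev, redirects });
  }

  const responseById = new Map(responses.map((r) => [r.requestId, r]));
  const rows = [];
  for (const [requestId, req] of byId) {
    const res = responseById.get(requestId);
    if (!res) continue; // never answered: nothing to pair
    const { method, url, headers, postData } = req.request;
    const { status, headers: respHeaders, mimeType } = res.response;
    const type = res.type ?? req.type;
    if (method === "OPTIONS") continue; // CORS preflight
    if (!KEEP_TYPES.has(type)) continue;
    // Redirect that did not carry forward: 3xx plus Location (body not checked here).
    if (status >= 300 && status < 400 && headerValue(respHeaders, "location")) continue;

    let reqBody = postData ?? null;
    if (reqBody !== null) {
      try { reqBody = JSON.parse(reqBody); } catch { /* not JSON: keep the raw string */ }
    }
    rows.push({
      requestId, method, url, status, type,
      contentType: mimeType,
      reqHeaders: headers, reqBody,
      respHeaders, respBody: null, // filled in later by the body join
      ts: req.wallTime ? Math.round(req.wallTime * 1000) : null, // wallTime is in seconds
      redirects: req.redirects, // hypothetical field for redirect metadata
    });
  }
  return rows;
}
```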

### `filter.mjs <run-path>`

- Reads `intermediate/paired.jsonl`.
- Applies `--origins` first, then `--include` / `--exclude`.
- Applies the built-in exclude list (analytics hosts, sourcemaps, service workers, and fonts/CSS that snuck through).
- Output: `intermediate/filtered.jsonl`.
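
A sketch of the filter precedence, using only what the flag table documents (`--origins` first, `--include` patterns ORed, `--include` beating the built-in defaults). Two details are assumptions: `--origins` is compared against the exact URL origin, and a user `--exclude` still drops URLs that also match `--include`. `makeFilter` is a hypothetical name.

```javascript
// Hypothetical sketch of filter.mjs precedence.
function makeFilter({ origins = [], include = [], exclude = [], defaults = [] }) {
  const inc = include.map((p) => new RegExp(p));
  const exc = exclude.map((p) => new RegExp(p));
  const allowed = new Set(origins);
  return (url) => {
    // 1. --origins: anything not listed is dropped before include/exclude.
    if (allowed.size > 0 && !allowed.has(new URL(url).origin)) return false;
    // 2. --include patterns are ORed; when any are given, one must match.
    const included = inc.some((re) => re.test(url));
    if (inc.length > 0 && !included) return false;
    // 3. User --exclude (assumed to beat --include).
    if (exc.some((re) => re.test(url))) return false;
    // 4. Built-in defaults, unless an --include pattern matched.
    if (!included && defaults.some((re) => re.test(url))) return false;
    return true;
  };
}
```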

### `normalize.mjs <run-path>`

- Templatizes paths. Detection order per segment:
  1. UUID v1–v5 → `{id}` (`string`, `format: uuid`).
  2. Pure integer → `{id}` (`integer`).
  3. Hex/base62 ≥ 8 chars → `{id}` (`string`).
  4. Short alphabetic segment whose value varies at the same position across multiple samples → `{slug}` (`string`).
  5. Otherwise the segment is left static.
- Groups paired samples by `(origin, method, templatedPath)`.
- Collects query parameters across samples; marks `required: true` only when every sample carries the param.
- If two pre-normalization paths would collapse to the same template but have divergent response status/content-type signatures, they're kept split and flagged.
- Output: `intermediate/endpoints.jsonl`, one row per endpoint with `{ origin, method, path, samples[], queryParams, statusCodes, normalizationFlags }`.
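
The grouping and query-parameter rules can be sketched as follows. The path templater is passed in (the per-segment classifier is covered under Path templating heuristics), `samples` here are indices into the input array, and `groupEndpoints` is a hypothetical name. Row fields follow the `paired.jsonl` example.

```javascript
// Hypothetical sketch of normalize.mjs grouping.
function groupEndpoints(rows, template) {
  const groups = new Map();
  rows.forEach((row, index) => {
    const path = template(row.path);
    const endpointKey = `${row.method} ${row.origin}${path}`;
    if (!groups.has(endpointKey)) {
      groups.set(endpointKey, { endpointKey, origin: row.origin, method: row.method, path, members: [] });
    }
    groups.get(endpointKey).members.push({ row, index });
  });

  return [...groups.values()].map(({ members, ...endpoint }) => {
    const samples = members.map((m) => m.row);
    const names = [...new Set(samples.flatMap((r) => Object.keys(r.query ?? {})))].sort();
    return {
      ...endpoint,
      rawPaths: [...new Set(samples.map((r) => r.path))],
      queryParams: names.map((name) => ({
        name,
        in: "query",
        // required only when every sample in the group carries the param
        required: samples.every((r) => name in (r.query ?? {})),
        schema: { type: "string" },
      })),
      statusCodes: samples.map((r) => r.status),
      samples: members.map((m) => m.index),
      normalizationFlags: [],
    };
  });
}
```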

### `infer.mjs <run-path>`

- For each endpoint, runs JSON Schema inference across request bodies and (when present) response bodies.
- Merge rules: a property is required when present in all samples, types are the union of observed types, arrays infer an item schema, and an enum is detected when a field has ≤ 8 distinct values across ≥ 5 samples.
- Format hints: `date-time` (ISO-ish), `uri`, `email`, `uuid`.
- Picks a representative sample (the most recent successful 2xx) and writes its redacted request/response example to `samples/`.
- Output: `intermediate/endpoints.with-schemas.jsonl`.
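
A compact sketch of the merge rules, with thresholds defaulting to the documented 8 distinct values and 5 samples. The format regexes are assumptions about what "ISO-ish" and the other hints accept, and `infer` is a hypothetical name.

```javascript
// Hypothetical sketch of the documented merge rules; infer.mjs may differ.
function typeOf(v) {
  if (v === null) return "null";
  if (Array.isArray(v)) return "array";
  if (Number.isInteger(v)) return "integer";
  return typeof v; // "number" | "string" | "boolean" | "object"
}

// Checked in order; a format applies only if every observed string matches.
const FORMATS = [
  ["uuid", /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i],
  ["date-time", /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}/], // ISO-ish prefix
  ["email", /^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$/],
  ["uri", /^https?:\/\//],
];

// `values` holds every observation of one field (or of a whole body).
function infer(values, opts = { enumMax: 8, enumMin: 5 }) {
  const types = [...new Set(values.map(typeOf))];
  const schema = { type: types.length === 1 ? types[0] : types }; // union of observed types

  const objects = values.filter((v) => typeOf(v) === "object");
  if (objects.length > 0) {
    const keys = [...new Set(objects.flatMap((o) => Object.keys(o)))];
    schema.properties = {};
    for (const key of keys) {
      schema.properties[key] = infer(objects.filter((o) => key in o).map((o) => o[key]), opts);
    }
    // required = present in every observed object
    const required = keys.filter((key) => objects.every((o) => key in o));
    if (required.length > 0) schema.required = required;
  }

  // Arrays: pool every element across samples and infer one item schema.
  const items = values.filter(Array.isArray).flat();
  if (items.length > 0) schema.items = infer(items, opts);

  if (types.length === 1 && types[0] === "string") {
    const format = FORMATS.find(([, re]) => values.every((s) => re.test(s)));
    const distinct = [...new Set(values)];
    if (format) schema.format = format[0];
    else if (values.length >= opts.enumMin && distinct.length <= opts.enumMax) schema.enum = distinct.sort();
  }
  return schema;
}
```

In a real run the two thresholds would come from `DISCOVER_ENUM_MAX_DISTINCT` and `DISCOVER_ENUM_MIN_SAMPLES`.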

### `emit.mjs <run-path>`

- Builds the OpenAPI 3.1 document.
- Hoists structurally identical schemas into `components.schemas`, keyed by structural hash. Names are derived from path tokens (`Item`, `Item_List`, etc.), falling back to `Schema1`, `Schema2`, and so on when no path hint applies.
- Writes `openapi.yaml` and/or `openapi.json` (per `--format`), plus `report.md` and `confidence.json`.
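
A sketch of the hoisting step. It substitutes a sorted-key canonical serialization for the structural hash, hoists only schemas used more than once, and uses a made-up naming rule (last static path segment, singularized by dropping a trailing `s`); none of these choices is confirmed by `emit.mjs`, and nested schemas are not hoisted.

```javascript
// Canonical serialization: object keys sorted, so schemas differing only in
// key order serialize identically. Stands in for the structural hash.
function canonical(value) {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const fields = Object.keys(value).sort().map((k) => `${JSON.stringify(k)}:${canonical(value[k])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical naming heuristic: last static, non-version path segment,
// trailing "s" dropped, PascalCased; array schemas get a "_List" suffix.
function nameHint(path, schema) {
  const statics = path.split("/").filter((s) => s && !s.startsWith("{") && !/^v\d+$/.test(s));
  if (statics.length === 0) return null;
  const base = statics[statics.length - 1]
    .replace(/s$/, "")
    .replace(/(^|[-_])(\w)/g, (_, _sep, c) => c.toUpperCase());
  return schema.type === "array" ? `${base}_List` : base;
}

// usages: [{ path, schema }]. Schemas seen twice or more become $refs.
function hoist(usages) {
  const counts = new Map();
  for (const { schema } of usages) {
    const key = canonical(schema);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const components = {};
  const nameByKey = new Map();
  const used = new Set();
  let fallback = 0;
  const refs = usages.map(({ path, schema }) => {
    const key = canonical(schema);
    if (counts.get(key) < 2) return schema; // single use: keep inline
    if (!nameByKey.has(key)) {
      let name = nameHint(path, schema);
      if (!name || used.has(name)) name = `Schema${++fallback}`; // no usable path hint
      used.add(name);
      nameByKey.set(key, name);
      components[name] = schema;
    }
    return { $ref: `#/components/schemas/${nameByKey.get(key)}` };
  });
  return { components, refs };
}
```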

## File formats

### `intermediate/paired.jsonl`

Each row occupies one line in the file; the example is pretty-printed for readability.

```json
{
  "requestId": "12345.678",
  "method": "GET",
  "url": "https://api.example.com/v1/items/42?page=2",
  "origin": "https://api.example.com",
  "path": "/v1/items/42",
  "query": { "page": "2" },
  "status": 200,
  "type": "Fetch",
  "contentType": "application/json",
  "reqHeaders": { "accept": "application/json" },
  "reqBody": null,
  "respHeaders": { "content-type": "application/json" },
  "respBody": null,
  "ts": 1714400000000
}
```

`reqBody` is the verbatim `postData` from `Network.requestWillBeSent` (parsed if JSON). `respBody` is `null` unless a `browse network` capture dir was joined in (see below); `browse cdp` does not embed bodies.

### Joining `browse network` bodies

`browse network on` is a separate subcommand of the `browse` CLI that writes per-request `request.json` + `response.json` files (with full bodies) to a temp directory. Discover joins these into the trace by `requestId`.

Workflow:

```bash
# during capture, alongside browser-trace
browse network on
# ...drive the app...

# IMPORTANT: snapshot the dir before another `browse network on` reuses it
RUN_DIR=".o11y/<run>"  # replace <run> with the browser-trace run id
mkdir -p "$RUN_DIR/cdp/network/bodies"
cp -R "$(browse network path | jq -r .path)/." "$RUN_DIR/cdp/network/bodies/"
browse network off
```

Internals (implemented in `lib/io.mjs` and `load.mjs`):

- The browse-network entry's `request.json.id` field equals the CDP `requestId` for XHR/Fetch resource types. The join is by exact `requestId`, not URL or timestamp.
- For Document loads, the `id` field is a non-CDP UUID and won't match. Those bodies are silently skipped (Documents aren't useful for API spec inference anyway).
- `response.json` from `browse network` may have empty `status` / `headers` / `mimeType` for some loads. That's fine: those fields are taken from the CDP firehose, and only `body` is read.
- The capture dir is shared per `browse` daemon session (`/tmp/.../browse-default-network/`). After `browse network on`, snapshot the dir before another `browse network on` overwrites it.
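
The join itself reduces to a lookup keyed on `request.json.id`. This sketch assumes the capture subdirectories have already been read into `{ request, response }` objects; `joinBodies` and `parseBody` are hypothetical, and "looks like JSON" is approximated as a body starting with `{` or `[` that parses.

```javascript
// Parse bodies that look like JSON; otherwise keep the raw string.
function parseBody(raw) {
  if (typeof raw !== "string") return raw ?? null;
  const trimmed = raw.trim();
  if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
    try { return JSON.parse(trimmed); } catch { /* malformed: fall through to raw */ }
  }
  return raw;
}

// entries: [{ request: { id }, response: { body } }]. Only `body` is read.
function joinBodies(rows, entries) {
  const bodyById = new Map(entries.map((e) => [e.request.id, e.response.body]));
  return rows.map((row) => {
    // No match (e.g. Document loads with non-CDP UUID ids): leave the row as-is.
    if (!bodyById.has(row.requestId)) return row;
    return { ...row, respBody: parseBody(bodyById.get(row.requestId)) };
  });
}
```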

### `intermediate/endpoints.jsonl`

```json
{
  "endpointKey": "GET https://api.example.com/v1/items/{id}",
  "origin": "https://api.example.com",
  "method": "GET",
  "path": "/v1/items/{id}",
  "rawPaths": ["/v1/items/42", "/v1/items/97"],
  "pathParams": [{ "name": "id", "in": "path", "schema": { "type": "integer" } }],
  "queryParams": [{ "name": "page", "in": "query", "required": false, "schema": { "type": "string" } }],
  "statusCodes": [200, 200, 404],
  "samples": [0, 1, 2],
  "normalizationFlags": []
}
```

`samples` holds indices into `paired.jsonl`; the values shown are illustrative.

### `confidence.json`

```json
{
  "endpoints": [
    {
      "key": "GET /v1/items/{id}",
      "samples": 7,
      "statusCodes": [200, 404],
      "responseBodyKnown": false,
      "requestBodyKnown": false,
      "normalizationFlags": [],
      "confidence": "medium"
    }
  ]
}
```

`confidence` is a coarse bucket: `low` (1–2 samples or normalization flags), `medium` (3–9 samples, no flags), `high` (≥ 10 samples, multi-status, no flags).
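
The buckets translate into a small function. The rules above leave one case open (≥ 10 samples with a single status and no flags); this sketch assumes it falls into `medium`. `confidenceBucket` is a hypothetical name.

```javascript
// Hypothetical sketch of the documented confidence buckets.
function confidenceBucket({ samples, statusCodes, normalizationFlags }) {
  if (samples <= 2 || normalizationFlags.length > 0) return "low";
  const distinctStatuses = new Set(statusCodes).size;
  if (samples >= 10 && distinctStatuses > 1) return "high";
  return "medium"; // 3-9 samples, or >= 10 with a single status (assumed)
}
```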

## CLI flags (full)

| Flag | Default | Notes |
|---|---|---|
| `--run <path>` | required | Resolves `cdp/network/{requests,responses}.jsonl` underneath |
| `--out <path>` | `<run>/api-spec` | Output directory |
| `--bodies <path>` | auto | `browse network` capture dir to join into the trace. Auto-detected from `<run>/cdp/network/bodies/` when present |
| `--include <regex>` | none | Repeatable. ORed together. Applied after `--origins` |
| `--exclude <regex>` | (defaults) | Repeatable. Combined with built-in defaults |
| `--origins <list>` | none | Comma-separated. If set, anything *not* matching is dropped before include/exclude |
| `--format <yaml\|json\|both>` | `both` | Format of the emitted spec |
| `--title <string>` | derived | `info.title` in the OpenAPI doc |
| `--redact <list>` | (defaults) | Comma-separated extra header names / JSON keys to scrub. Adds to defaults; never replaces |
| `--min-samples <n>` | `1` | Drop endpoints below this threshold (still listed in the report) |
| `--stage <name>` | (all) | One of `load`, `filter`, `normalize`, `infer`, `emit` |

## Swagger UI preview flags

| Flag | Default | Notes |
|---|---|---|
| `--run <path>` | required unless `--spec` is set | Resolves a browser-trace run and previews `<run>/api-spec/openapi.yaml` or `openapi.json` |
| `--spec <path>` | required unless `--run` is set | Explicit OpenAPI YAML/JSON path |
| `--swagger-ui <path>` | auto | Checkout/package dir containing either `dist/index.html` or `index.html` + `swagger-ui-bundle.js` |
| `--host <host>` | `127.0.0.1` | Preview server bind host |
| `--port <port>` | random | Preview server bind port |
| `--no-open` | false | Print the URL without launching the browser |

## Default exclude list

URLs matching these patterns are dropped before any analysis (regexes, applied to the full URL):

- Analytics: `segment\.(io|com)`, `mixpanel\.com`, `google-analytics\.com`, `googletagmanager\.com`, `datadog(hq)?\.com`, `sentry\.io`, `amplitude\.com`, `fullstory\.com`, `hotjar\.com`, `intercom\.io`, `clarity\.ms`, `cloudflareinsights\.com`, `doubleclick\.net`, `facebook\.com/tr`
- Static-only file extensions: `\.(png|jpe?g|gif|svg|webp|ico|woff2?|ttf|eot|otf|css|map|mp4|webm|mp3)(\?|$)`
- Service worker / metadata: `/sw\.js`, `/service-worker\.js`, `/manifest\.json$`, `/robots\.txt$`, `/favicon\.ico$`

Override individual defaults with `--include`, which wins over the default excludes.
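
For reference, the same defaults written as JavaScript regex literals, plus the `--include` override. The single-array layout and the `excludedByDefault` helper are illustrative, not taken from `filter.mjs`.

```javascript
// The documented default excludes, tested against the full URL.
const DEFAULT_EXCLUDES = [
  // analytics
  /segment\.(io|com)/, /mixpanel\.com/, /google-analytics\.com/, /googletagmanager\.com/,
  /datadog(hq)?\.com/, /sentry\.io/, /amplitude\.com/, /fullstory\.com/, /hotjar\.com/,
  /intercom\.io/, /clarity\.ms/, /cloudflareinsights\.com/, /doubleclick\.net/, /facebook\.com\/tr/,
  // static-only file extensions
  /\.(png|jpe?g|gif|svg|webp|ico|woff2?|ttf|eot|otf|css|map|mp4|webm|mp3)(\?|$)/,
  // service worker / metadata
  /\/sw\.js/, /\/service-worker\.js/, /\/manifest\.json$/, /\/robots\.txt$/, /\/favicon\.ico$/,
];

// --include wins: a URL matching any include pattern is never dropped here.
function excludedByDefault(url, includes = []) {
  if (includes.some((re) => re.test(url))) return false;
  return DEFAULT_EXCLUDES.some((re) => re.test(url));
}
```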

## Default redactions

Headers (case-insensitive): `authorization`, `cookie`, `set-cookie`, `x-csrf-token`, `x-xsrf-token`, `x-api-key`, `proxy-authorization`, plus any header name matching `*token*`, `*secret*`, `*signature*`.

Body keys: `password`, `token`, `secret`, `api_key`, `apiKey`, `accessToken`, `refreshToken`, `creditCard`, `ssn`.

Body values (regex): JWTs (`^eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$`), email addresses (`@` + TLD), phone numbers (E.164-ish).

Redacted values are replaced with `"<redacted>"` so type information is preserved for schema inference.
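
A sketch of the default redactions. The email and E.164 regexes are assumed shapes (the text describes them only loosely), body keys are matched exactly and case-sensitively, and a matched key's whole value is replaced even when it is an object; all three are assumptions. The helper names are hypothetical.

```javascript
const REDACTED = "<redacted>";
const HEADER_NAMES = new Set([
  "authorization", "cookie", "set-cookie", "x-csrf-token", "x-xsrf-token", "x-api-key", "proxy-authorization",
]);
const HEADER_PATTERN = /token|secret|signature/i; // *token*, *secret*, *signature*
const BODY_KEYS = new Set(["password", "token", "secret", "api_key", "apiKey", "accessToken", "refreshToken", "creditCard", "ssn"]);
const VALUE_PATTERNS = [
  /^eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$/, // JWT (documented)
  /^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$/, // email (assumed shape)
  /^\+[1-9]\d{6,14}$/, // E.164-ish phone (assumed shape)
];

// Header names compare case-insensitively; `extra` models --redact additions.
function redactHeaders(headers, extra = []) {
  const extraSet = new Set(extra.map((h) => h.toLowerCase()));
  return Object.fromEntries(Object.entries(headers).map(([name, value]) => {
    const lower = name.toLowerCase();
    const hit = HEADER_NAMES.has(lower) || HEADER_PATTERN.test(lower) || extraSet.has(lower);
    return [name, hit ? REDACTED : value];
  }));
}

// Walks the body: sensitive keys are replaced outright, strings are checked by value.
function redactBody(value, extraKeys = []) {
  if (Array.isArray(value)) return value.map((v) => redactBody(v, extraKeys));
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(Object.entries(value).map(([k, v]) =>
      [k, BODY_KEYS.has(k) || extraKeys.includes(k) ? REDACTED : redactBody(v, extraKeys)]));
  }
  if (typeof value === "string" && VALUE_PATTERNS.some((re) => re.test(value))) return REDACTED;
  return value;
}
```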

## Path templating heuristics

Per-segment classifier in `scripts/lib/path-template.mjs`:

| Pattern | Replacement | OpenAPI schema |
|---|---|---|
| 8-4-4-4-12 hex (UUID) | `{id}` | `{ type: string, format: uuid }` |
| `\d+` | `{id}` | `{ type: integer }` |
| `[A-Za-z0-9]{8,}` (no vowels-only / dictionary check) | `{id}` | `{ type: string }` |
| Same-position alpha tokens varying across ≥ 2 samples | `{slug}` | `{ type: string }` |

When multiple variable segments exist in one path, names are suffixed: `{id}`, `{id2}`, `{id3}`. The `--name-params` flag (future) will use sibling segment hints (`/products/42` → `{productId}`).
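
A sketch of the classifier and the `{id}`, `{id2}` suffixing. It applies the base62 row literally, so an all-letter word of eight or more characters (such as `settings`) would become `{id}`; whatever guard the table's "no vowels-only / dictionary check" note refers to is not modeled. The caller supplies the positions that vary across samples, which the `{slug}` rule needs, and the UUID regex's version/variant check is an assumption based on the "UUID v1–v5" wording. All function names are hypothetical.

```javascript
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// Rules in the documented order; null means "leave the segment static".
function classify(segment, varies) {
  if (UUID.test(segment)) return { kind: "id", schema: { type: "string", format: "uuid" } };
  if (/^\d+$/.test(segment)) return { kind: "id", schema: { type: "integer" } };
  if (/^[A-Za-z0-9]{8,}$/.test(segment)) return { kind: "id", schema: { type: "string" } };
  if (varies && /^[A-Za-z][A-Za-z-]*$/.test(segment)) return { kind: "slug", schema: { type: "string" } };
  return null;
}

// Templates one raw path; repeated kinds get numeric suffixes ({id}, {id2}, ...).
function templatePath(path, varyingPositions = new Set()) {
  const counts = {};
  const params = [];
  const segments = path.split("/").map((seg, i) => {
    if (!seg) return seg; // leading "" from the initial slash
    const hit = classify(seg, varyingPositions.has(i));
    if (!hit) return seg;
    counts[hit.kind] = (counts[hit.kind] ?? 0) + 1;
    const name = counts[hit.kind] === 1 ? hit.kind : `${hit.kind}${counts[hit.kind]}`;
    params.push({ name, in: "path", required: true, schema: hit.schema });
    return `{${name}}`;
  });
  return { path: segments.join("/"), params };
}

// Segment positions whose values differ across raw paths of equal depth.
function varyingPositions(paths) {
  const split = paths.map((p) => p.split("/"));
  const out = new Set();
  for (let i = 0; i < split[0].length; i++) {
    if (new Set(split.map((s) => s[i])).size > 1) out.add(i);
  }
  return out;
}
```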

## Confidence flags

Possible entries in `normalizationFlags`:

- `divergent-response-shape`: pre-normalization paths collapsed to the same template but had structurally different responses. The skill keeps them split and emits both.
- `single-sample`: endpoint observed exactly once.
- `single-status`: only one status code observed; the spec lists only that response.
- `mixed-content-types`: different `content-type` values across samples.
- `request-body-only-on-some-samples`: POST/PUT seen with and without a body.
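
Four of the five flags can be computed from one endpoint's samples alone, as in the sketch below; `divergent-response-shape` needs the cross-template comparison done in `normalize.mjs` and is left out. Whether a single-sample endpoint also gets `single-status` is not stated; this sketch emits both. `computeFlags` is a hypothetical name.

```javascript
// Hypothetical sketch; rows are one endpoint's paired samples.
function computeFlags(method, rows) {
  const flags = [];
  if (rows.length === 1) flags.push("single-sample");
  if (new Set(rows.map((r) => r.status)).size === 1) flags.push("single-status");
  if (new Set(rows.map((r) => r.contentType)).size > 1) flags.push("mixed-content-types");
  const withBody = rows.filter((r) => r.reqBody !== null && r.reqBody !== undefined).length;
  if (["POST", "PUT"].includes(method) && withBody > 0 && withBody < rows.length) {
    flags.push("request-body-only-on-some-samples");
  }
  return flags;
}
```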

## OpenAPI extensions

The emitter writes a few `x-*` extensions on each operation:

- `x-confidence`: `{ samples, statusCodes, normalizationFlags }`
- `x-origin`: the origin this operation was observed on (when multiple servers are listed)
- `x-observed-auth`: array of auth-shaped header names seen on this endpoint (e.g. `["authorization", "x-api-key"]`)
- `x-sample-count`: total number of paired samples backing the operation

These extensions are stripped from `report.md` (which is human-facing) but preserved in the YAML/JSON.
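
A sketch of how the extensions could be assembled for one operation. Treating "auth-shaped" as the header names from the default redaction list is an assumption, as is the `multipleServers` switch for `x-origin`; `operationExtensions` is a hypothetical name. Header *names* survive redaction, so detection can run on redacted rows.

```javascript
// Assumed definition of "auth-shaped": names from the default redaction list.
const AUTH_HEADERS = new Set(["authorization", "proxy-authorization", "cookie", "x-api-key"]);
const AUTH_PATTERN = /token|secret|signature/;

function operationExtensions(endpoint, rows, { multipleServers }) {
  const observed = new Set();
  for (const row of rows) {
    for (const name of Object.keys(row.reqHeaders ?? {})) {
      const lower = name.toLowerCase();
      if (AUTH_HEADERS.has(lower) || AUTH_PATTERN.test(lower)) observed.add(lower);
    }
  }
  const ext = {
    "x-confidence": {
      samples: rows.length,
      statusCodes: [...new Set(endpoint.statusCodes)].sort((a, b) => a - b),
      normalizationFlags: endpoint.normalizationFlags,
    },
    "x-observed-auth": [...observed].sort(),
    "x-sample-count": rows.length,
  };
  if (multipleServers) ext["x-origin"] = endpoint.origin;
  return ext;
}
```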

## Configuration via env

| Var | Default | Effect |
|---|---|---|
| `O11Y_ROOT` | `.o11y` | Inherited from `browser-trace`. Used only when `--run` is a bare run id rather than a full path |
| `DISCOVER_ENUM_MAX_DISTINCT` | `8` | Max distinct values to consider a field an enum |
| `DISCOVER_ENUM_MIN_SAMPLES` | `5` | Min samples before enum detection runs |
| `SWAGGER_UI_DIR` | auto | Optional Swagger UI checkout/package dir for `open-swagger-ui.mjs` |
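
The environment handling might look like this. Treating any value without a `/` as a bare run id, and falling back to the default on unparseable or non-positive integers, are assumptions; the helper names are hypothetical. In a script the `env` argument would be `process.env`.

```javascript
// Bare run ids resolve under O11Y_ROOT; anything with a "/" is used as a path.
function resolveRun(run, env) {
  const root = env.O11Y_ROOT || ".o11y";
  return run.includes("/") ? run : `${root}/${run}`;
}

// Positive integer from the environment, else the documented default.
function intFromEnv(env, name, fallback) {
  const n = Number.parseInt(env[name] ?? "", 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

function enumThresholds(env) {
  return {
    enumMax: intFromEnv(env, "DISCOVER_ENUM_MAX_DISTINCT", 8),
    enumMin: intFromEnv(env, "DISCOVER_ENUM_MIN_SAMPLES", 5),
  };
}
```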

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `paired.jsonl` is empty | trace contains no `Network.requestWillBeSent` events for XHR/Fetch | re-run `browser-trace` exercising the dynamic flows; static-only sites won't yield endpoints |
| `openapi.yaml` has only `paths: {}` | every paired request was filtered out | check `--origins` and the default exclude list; `--include '.*'` overrides the default excludes, but not `--origins` |
| Path templating collapses too aggressively | numeric IDs being misread as enums, or dictionary words misread as slugs | add `--exclude` for the noisy paths and re-run, or file an issue with the trace |
| Schemas show `type: "string"` for everything | request/response bodies aren't valid JSON or weren't captured | check `paired.jsonl` for `reqBody`/`respBody` content; if `null`, bodies weren't in the trace (for responses, join a `browse network` capture via `--bodies`) |
| Spec validator complains about `info.version` | derived version is `0.1.0-discovered`, which some tools dislike | pass `--version 0.1.0` (TODO) or post-edit the file |
| `Swagger UI not found` | no local Swagger UI checkout/package was detected | clone `https://github.com/swagger-api/swagger-ui` to `~/Developer/swagger-ui`, or pass `--swagger-ui <path>` / set `SWAGGER_UI_DIR` |