Add browser-reverse skill — OpenAPI 3.1 from browser-trace captures by derekmeegan · Pull Request #88 · browserbase/skills

derekmeegan · 2026-04-29T18:16:00Z

Summary

browser-reverse consumes a browser-trace run directory and emits an OpenAPI 3.1 spec for the publicly-observable HTTP API of any website, plus a human-readable coverage report and per-endpoint confidence metadata. Pure offline post-processing — composes cleanly with the existing browser-trace skill rather than duplicating capture.

Pipeline (each stage is a discrete script for debuggability via --stage):

load → filter → normalize → infer → emit

Highlights

Path templating — UUIDs, integers, hex/base62 IDs, plus a second-pass slug detector for varying alpha segments. Multi-param paths get {id}, {id2}, etc.
Schema inference (lib/schema-merge.mjs) — JSON-Schema from samples with required-intersection, type unions, format hints (date-time, uri, email, uuid), and enum detection that requires meaningful repetition (not just low cardinality).
Component hoisting with $ref — recurses into nested object/array schemas, hoists when referenced ≥ 2 times OR when it's an object with ≥ 4 properties. Names derived from path tokens.
Redaction — credentials in headers (Authorization, Cookie, *-token, etc.), in body keys (password, apiKey, etc.), and value patterns (JWTs, emails, phone numbers). Replaces values with <redacted> to preserve types for inference.
browse network on integration — pass --bodies <path> (or stash bodies under <run>/cdp/network/bodies/, which is auto-detected) to join real response bodies into the trace by CDP requestId. Without it, the spec has request bodies but no response-body schemas (the browse cdp firehose doesn't embed bodies).
Cross-origin path collisions handled — when two origins serve the same (method, path), the higher-sample operation wins and other origins are recorded under x-also-served-from rather than silently dropped.
Honest reporting — report.md lists every endpoint with samples, statuses, confidence, and normalization flags (single-sample, single-status, mixed-content-types, divergent-response-shape, request-body-only-on-some-samples).
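As a rough illustration of the path-templating highlight, the first pass could look something like the sketch below. The regexes, the 16-character hex threshold, and the `templatizePath` / `looksLikeId` names are assumptions made for this example, not the contents of lib/path-template.mjs; the second-pass slug detector and base62 handling are not covered.

```javascript
// Hypothetical sketch: patterns and function names are assumptions,
// not the actual lib/path-template.mjs implementation.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const INT_RE = /^\d+$/;
const HEX_RE = /^[0-9a-f]{16,}$/i; // long hex only, so short words like "cafe" stay literal

function looksLikeId(segment) {
  return UUID_RE.test(segment) || INT_RE.test(segment) || HEX_RE.test(segment);
}

function templatizePath(pathname) {
  let count = 0;
  return pathname
    .split('/')
    .map((segment) => {
      if (!looksLikeId(segment)) return segment;
      count += 1;
      // First parameter is {id}; later ones are {id2}, {id3}, ...
      return count === 1 ? '{id}' : `{id${count}}`;
    })
    .join('/');
}

console.log(templatizePath('/pixel/123/visitor/456/cerebro'));
// → /pixel/{id}/visitor/{id2}/cerebro
```

Numbering parameters by position keeps the templated path stable across samples, so two requests that differ only in their IDs collapse into the same endpoint.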

Composition with browser-trace

node ../browser-trace/scripts/start-capture.mjs 9222 my-site
browse env local 9222
browse network on                                 # capture bodies (recommended)
browse open https://example.com
# ...drive flows...
cp -r "$(browse network path | jq -r .path)" .o11y/my-site/cdp/network/bodies/
browse network off
node ../browser-trace/scripts/stop-capture.mjs my-site
node ../browser-trace/scripts/bisect-cdp.mjs my-site

node scripts/discover.mjs --run .o11y/my-site

End-to-end testing

The pipeline ran clean against six sites; five real bugs surfaced and were fixed during this work:

| Site | Outcome |
| --- | --- |
| Hacker News | 7 endpoints; query-param type inference (`integer` vs `string` on `id` based on values) |
| jsonplaceholder.typicode.com | 6 endpoints; POST/PUT body schemas, multi-status `200`+`404`, header + body redaction |
| derekmeegan.com (Next.js) | 4 endpoints; `_rsc` query param, Vercel analytics body schema, mixed-content-types detection |
| browserbase.com | 39 endpoints across 14 origins; multi-param path `/pixel/{id}/visitor/{id2}/cerebro`, 12 components hoisted |
| browser-use.com | 23 endpoints; discovered `/api/md/<slug>` LLM-friendly markdown export endpoint |
| reddit.com | 20 endpoints, 30 components, 2 servers; full schema for `/svc/shreddit/events` (Reddit's internal telemetry, 18 nested types), live `ExposeVariant` GraphQL exposure capturing experiment names |

Bugs surfaced and fixed during E2E:

Enum over-detection — enum detection now requires distinct values ≤ floor(samples/2), so unique IDs don't become enums.
Component hoisting silently disabled — {...}.length is undefined; rewrote to use Object.keys(...).length and recurse into nested schemas.
Redaction double-counting — redactBody() called twice per body; redact once and reuse.
YAML emitter producing invalid scalars — @, `, #, etc. as first character now trigger quoting (was breaking on @vercel/analytics/react).
Cross-origin path collision data loss — paths.<path>.<method> is unique in OpenAPI; higher-sample winner now recorded with x-also-served-from extension.
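Two of the fixes above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical helper names (`detectEnum`, `propertyCount`, `shouldHoist`); only the `distinct ≤ floor(samples/2)` rule, the `Object.keys(...).length` fix, and the hoisting thresholds (≥ 2 references, or an object with ≥ 4 properties) come from this description. The real code in lib/schema-merge.mjs and emit.mjs may apply further conditions.

```javascript
// Hypothetical helpers; names are assumptions for illustration only.

// Enum rule: accept an enum only when values actually repeat.
function detectEnum(values) {
  const distinct = [...new Set(values)];
  const limit = Math.floor(values.length / 2);
  // All-unique values (e.g. IDs) have distinct === samples, which exceeds the limit.
  if (distinct.length === 0 || distinct.length > limit) return null;
  return distinct;
}

// Plain objects have no .length, so a check like props.length >= 4 is never true.
console.log({ a: 1, b: 2 }.length); // undefined

function propertyCount(schema) {
  return Object.keys(schema.properties ?? {}).length;
}

// Hoist when referenced at least twice, or when it is an object with 4+ properties.
function shouldHoist(schema, refCount) {
  return refCount >= 2 || (schema.type === 'object' && propertyCount(schema) >= 4);
}

console.log(detectEnum(['open', 'closed', 'open', 'open'])); // [ 'open', 'closed' ]
console.log(detectEnum(['a1', 'b2', 'c3', 'd4']));           // null
```

With this rule a field seen only once never becomes an enum, since floor(1/2) is 0.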

Files

SKILL.md / REFERENCE.md — skill docs, file format reference, jq recipes, troubleshooting
scripts/discover.mjs — top-level dispatcher with --stage for partial runs
scripts/{load,filter,normalize,infer,emit}.mjs — pipeline stages
scripts/lib/{io,redact,path-template,schema-merge,yaml}.mjs — pure helpers
BODY-CAPTURE-LIFT.md — design doc for adding native body capture to browser-trace (alternative to the current browse network on pairing). Open question for maintainers; no code change in this PR.

Test plan

Run end-to-end against a public site of your choice (e.g. your own marketing site or a public docs page) following the workflow in SKILL.md
Verify openapi.yaml parses with a YAML library (python -c "import yaml; yaml.safe_load(open('...'))")
Verify openapi.json parses (jq . openapi.json)
Confirm report.md correctly flags low-confidence endpoints
Try the --bodies flag with a browse network on capture and confirm response-body schemas appear in the spec

🤖 Generated with Claude Code

Note

Medium Risk
New end-to-end pipeline that processes potentially sensitive trace data (including optional request/response bodies) and emits schemas/specs; correctness and redaction behavior are important to avoid leaking secrets or producing misleading specs.

Overview
Adds a new browser-to-api skill that post-processes a browser-trace run into a best-effort OpenAPI 3.1 spec plus report.md, confidence.json, and per-endpoint redacted samples, implemented as a discover.mjs pipeline (load→filter→normalize→infer→emit).

The pipeline pairs CDP request/response events (optionally joining full bodies from browse network on via --bodies), filters noise via include/exclude/origin rules, templatizes paths (IDs + slug inference), infers JSON Schemas with redaction, hoists repeated schemas into components, and emits YAML/JSON using an in-repo YAML writer; docs (SKILL.md, REFERENCE.md) describe flags, file formats, and troubleshooting.
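The pairing step described above can be pictured roughly as follows. This is a sketch under stated assumptions: the record shapes (`{ requestId, method, url }` for requests, `{ requestId, status }` for responses), the `bodies` map, and the `pairByRequestId` name are hypothetical, and the on-disk format that load.mjs actually reads may differ.

```javascript
// Hypothetical sketch: record shapes and function name are assumptions,
// not the actual load.mjs implementation.
function pairByRequestId(requests, responses, bodies = new Map()) {
  const responseById = new Map(responses.map((r) => [r.requestId, r]));
  const pairs = [];
  for (const req of requests) {
    const res = responseById.get(req.requestId);
    if (!res) continue; // no response was captured for this request
    pairs.push({
      method: req.method,
      url: req.url,
      status: res.status ?? null,
      // Body is only present when a separate body capture was joined in.
      body: bodies.get(req.requestId) ?? null,
    });
  }
  return pairs;
}

const demoPairs = pairByRequestId(
  [{ requestId: '1', method: 'GET', url: 'https://example.com/api/items' }],
  [{ requestId: '1', status: 200 }],
  new Map([['1', '{"items":[]}']]),
);
console.log(demoPairs[0].status, demoPairs[0].body); // 200 {"items":[]}
```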

^{Reviewed by Cursor Bugbot for commit 9446f91. Bugbot is set up for automated code reviews on this repo. Configure here.}

…aptures

Consumes a browser-trace run (.o11y/<run>/), pairs CDP request/response events, templatizes paths, infers JSON schemas from samples, and emits an OpenAPI 3.1 document with a coverage report and confidence metadata.

Pipeline: load → filter → normalize → infer → emit. Each stage is a discrete script writing to intermediate/ for debuggability.

Optional --bodies <path> flag joins a `browse network on` capture by CDP requestId so response bodies feed into schema inference.

E2E tested against Hacker News, jsonplaceholder, derekmeegan.com, browserbase.com, browser-use.com, reddit.com.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

shrey150

Some things to fix or address before rereviewing

shrey150 · 2026-04-29T19:45:56Z

+
+```
+browser-trace        →  .o11y/<run>/cdp/network/{requests,responses}.jsonl
+discover-api-spec    →  .o11y/<run>/api-spec/openapi.yaml + report.md


Is this skill actually defined anywhere in this PR?

shrey150 · 2026-04-29T19:46:37Z

+
+`discover.mjs` auto-detects `<run>/cdp/network/bodies/`. To use a body capture from elsewhere (e.g. didn't snapshot, want the live `browse network` dir), pass `--bodies <path>` explicitly.
+
+Then deliver the artifacts to the user (`exec.sendFile()` for `openapi.yaml` and `report.md`).


exec.sendFile() is for bb not for general use right?

this was from a claude memory i think lol...

shrey150 · 2026-04-29T19:47:18Z

@@ -0,0 +1,118 @@
+# Adding Response Body Capture to `browser-trace` — Lift Estimate


Is this plan mode slop lol

shrey150 · 2026-04-29T19:47:59Z

@@ -0,0 +1,6 @@
+{
+  "name": "browser-reverse",


I would still like a different name, like /discover-api or /browser-to-api or /website-to-api

shrey150 · 2026-04-29T19:49:19Z

@@ -0,0 +1,240 @@
+# Browser Reverse — Reference


I'm not sure this is what a REFERENCE.md file should be - it should exhaustively describe all commands used by the skill, I would maybe recommend removing the Pipeline portion here

Renaming and doc cleanup (per shrey150):

- Rename skill from `browser-reverse` to `browser-to-api`. Updates SKILL.md frontmatter + heading, package.json, REFERENCE.md heading, the OpenAPI doc's `info.description`, and the report.md heading.
- Fix the stale `discover-api-spec` reference in SKILL.md's composition diagram (left over from an earlier rename).
- Drop `BODY-CAPTURE-LIFT.md` from the PR; it's a separate proposal.
- Remove the `exec.sendFile()` reference in SKILL.md (browserbase-internal, not a generic skill primitive).
- REFERENCE.md restructured to lead with the script/CLI/file-format reference rather than an architecture intro. Pipeline diagram dropped.

Bug fixes (per Cursor Bugbot):

- `filter.mjs`: rework precedence so `--include` actually rescues URLs that would be hit by a default exclude, matching the documented contract. User `--exclude` still wins. Added a unit-style test path.
- `infer.mjs`: skip response-body samples whose CDP status is null. Previously they were keyed under `"0"` but `emit.mjs` only iterates `ep.statusCodes` (which excludes nulls), silently discarding the body.
- `load.mjs`: fix the comment in `urlQuery()`: code is first-value-wins, not last-value-wins.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

derekmeegan · 2026-04-29T20:36:12Z

Pushed 9446f91 addressing all review comments:

@shrey150

✅ Renamed skill to browser-to-api. Updates frontmatter, heading, package.json, OpenAPI info.description, and report.md heading.
✅ Fixed the stale discover-api-spec reference in SKILL.md (line 17 in your comment).
✅ Removed BODY-CAPTURE-LIFT.md from this PR.
✅ Removed the exec.sendFile() reference.
✅ Restructured REFERENCE.md to lead with the script/CLI/file-format reference; dropped the architecture pipeline diagram.

@cursor[bot]

✅ filter.mjs — reworked precedence so --include rescues URLs hit by default excludes (matches the documented contract). User --exclude still wins. Verified with an inline test against app.map (sourcemap default-exclude) being rescued by --include 'app\.map'.
✅ infer.mjs — skip response-body samples whose CDP status is null instead of keying under "0" and having the data silently discarded by emit.
✅ load.mjs — fixed the misleading "Last value wins" comment; code is first-value-wins (which is fine for our use — we only need parameter names + a representative value for type inference).
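For reviewers, the filter precedence described in the first fix can be read as the sketch below. It is an illustration only: `DEFAULT_EXCLUDES`, `shouldKeep`, and the option names are hypothetical, and it assumes `--include` acts purely as a rescue (URLs matching nothing are kept), which may not match every detail of filter.mjs.

```javascript
// Hypothetical sketch of the precedence: user exclude > include > default exclude.
// Names and the single default pattern are assumptions, not filter.mjs itself.
const DEFAULT_EXCLUDES = [/\.map$/]; // e.g. sourcemaps, as in the app.map check

function shouldKeep(url, { include = [], exclude = [] } = {}) {
  if (exclude.some((re) => re.test(url))) return false; // user --exclude always wins
  if (include.some((re) => re.test(url))) return true;  // --include rescues default excludes
  return !DEFAULT_EXCLUDES.some((re) => re.test(url));  // otherwise the defaults apply
}

const url = 'https://example.com/static/app.map';
console.log(shouldKeep(url));                                                 // false
console.log(shouldKeep(url, { include: [/app\.map/] }));                      // true
console.log(shouldKeep(url, { include: [/app\.map/], exclude: [/static/] })); // false
```

Checking the user exclude first is what makes "User --exclude still wins" hold even when the same URL also matches an include pattern.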

Branch name is still add-browser-reverse-skill (pre-rename) but the skill itself is browser-to-api everywhere.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-04-29T20:44:20Z

+
+  function isKeySecret(name) {
+    const k = String(name).toLowerCase().replace(/[_-]/g, '');
+    return KEY_DENY.has(k) || extraKeys.has(k);


Extra redaction keys silently fail for body matching

Medium Severity

isKeySecret normalizes the input name by stripping underscores and hyphens via .replace(/[_-]/g, ''), but extraKeys stores user-provided --redact values with only toLowerCase() applied — no underscore/hyphen stripping. A user passing --redact my_secret_key stores my_secret_key in extraKeys, but the lookup normalizes the JSON key to mysecretkey, which never matches. User-specified body key redactions containing _ or - are silently ignored, potentially leaking credentials the user explicitly asked to scrub.

Additional Locations (1)

skills/browser-to-api/scripts/lib/redact.mjs#L23-L27
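One possible fix, sketched under assumptions: route both the stored `--redact` values and the looked-up key through the same normalizer. The `normalizeKey` and `makeKeyMatcher` names and the two-entry `KEY_DENY` set are hypothetical; only the `toLowerCase().replace(/[_-]/g, '')` normalization is taken from the diff above.

```javascript
// Hypothetical sketch: helper names and KEY_DENY contents are assumptions.
const normalizeKey = (name) => String(name).toLowerCase().replace(/[_-]/g, '');

const KEY_DENY = new Set(['password', 'apikey']);

function makeKeyMatcher(extraRedactKeys = []) {
  // Normalize user-supplied --redact values exactly like the keys being looked up.
  const extraKeys = new Set(extraRedactKeys.map(normalizeKey));
  return function isKeySecret(name) {
    const k = normalizeKey(name);
    return KEY_DENY.has(k) || extraKeys.has(k);
  };
}

const isKeySecret = makeKeyMatcher(['my_secret_key']);
console.log(isKeySecret('my_secret_key')); // true
console.log(isKeySecret('My-Secret-Key')); // true
console.log(isKeySecret('username'));      // false
```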

cursor · 2026-04-29T20:44:20Z

+    }
+  }
+  for (const [key, origins] of Object.entries(collisions)) {
+    const [m, p] = key.split(' ');


Collision key split breaks on space-containing paths

Low Severity

The collision key is built as `${m} ${ep.path}` and later destructured via key.split(' ') into [m, p]. If a path ever contained a literal space, split(' ') would produce more than two elements and p would only capture the first path segment, causing paths[p][m] to be undefined and throwing at runtime. Using indexOf(' ') to split on the first space only would be more robust.

Additional Locations (1)

skills/browser-to-api/scripts/emit.mjs#L229-L230
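The suggested `indexOf(' ')` approach could look like the sketch below. `splitCollisionKey` is a hypothetical helper name; the `${m} ${ep.path}` key format is the one quoted in the comment above.

```javascript
// Hypothetical helper: splits "<method> <path>" on the first space only,
// so any later spaces stay inside the path.
function splitCollisionKey(key) {
  const i = key.indexOf(' ');
  if (i === -1) return [key, ''];
  return [key.slice(0, i), key.slice(i + 1)];
}

console.log(splitCollisionKey('get /search/hello world'));
// → [ 'get', '/search/hello world' ]
```

An alternative would be to key the collision map on a nested structure (method, then path) and avoid string splitting altogether.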

shrey150 requested changes Apr 29, 2026

derekmeegan wants to merge 2 commits into main from add-browser-reverse-skill
