Add browser-reverse skill — OpenAPI 3.1 from browser-trace captures#88
derekmeegan wants to merge 2 commits into main
…aptures

Consumes a browser-trace run (.o11y/<run>/), pairs CDP request/response events, templatizes paths, infers JSON schemas from samples, and emits an OpenAPI 3.1 document with a coverage report and confidence metadata.

Pipeline: load → filter → normalize → infer → emit. Each stage is a discrete script writing to intermediate/ for debuggability.

Optional --bodies <path> flag joins a `browse network on` capture by CDP requestId so response bodies feed into schema inference.

E2E tested against Hacker News, jsonplaceholder, derekmeegan.com, browserbase.com, browser-use.com, reddit.com.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
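The request/response pairing step named in the commit message above could look roughly like the sketch below. The flattened event shape (`requestId`, `method`, `url`, `status`) and the function name `pairByRequestId` are assumptions for illustration; the actual records in `requests.jsonl` and `responses.jsonl` may be shaped differently.

```javascript
// Sketch only: event fields here are hypothetical, not the real on-disk format.
function pairByRequestId(requests, responses) {
  const responseById = new Map();
  for (const res of responses) {
    // Keep the first response seen for each requestId.
    if (!responseById.has(res.requestId)) responseById.set(res.requestId, res);
  }
  return requests.map((req) => {
    const res = responseById.get(req.requestId);
    return {
      requestId: req.requestId,
      method: req.method,
      url: req.url,
      // null marks a request whose response was never captured.
      status: res ? res.status : null,
    };
  });
}

const pairs = pairByRequestId(
  [
    { requestId: 'r1', method: 'GET', url: 'https://example.test/items/1' },
    { requestId: 'r2', method: 'POST', url: 'https://example.test/items' },
  ],
  [{ requestId: 'r1', status: 200 }],
);
console.log(pairs);
```

Keeping unmatched requests with a `null` status, rather than dropping them, leaves the decision about how to treat them to later stages.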
shrey150 left a comment
Some things to fix or address before rereviewing
browser-trace → .o11y/<run>/cdp/network/{requests,responses}.jsonl
discover-api-spec → .o11y/<run>/api-spec/openapi.yaml + report.md
Is this skill actually defined anywhere in this PR?
`discover.mjs` auto-detects `<run>/cdp/network/bodies/`. To use a body capture from elsewhere (e.g. didn't snapshot, want the live `browse network` dir), pass `--bodies <path>` explicitly.
Then deliver the artifacts to the user (`exec.sendFile()` for `openapi.yaml` and `report.md`).
exec.sendFile() is for bb not for general use right?
this was from a claude memory i think lol...
@@ -0,0 +1,118 @@
# Adding Response Body Capture to `browser-trace` — Lift Estimate
Is this plan mode slop lol
@@ -0,0 +1,6 @@
{
  "name": "browser-reverse",
I would still like a different name, like /discover-api or /browser-to-api or /website-to-api
@@ -0,0 +1,240 @@
# Browser Reverse — Reference
I'm not sure this is what a REFERENCE.md file should be. It should exhaustively describe all commands used by the skill; I would maybe recommend removing the Pipeline portion here.
Renaming and doc cleanup (per shrey150):
- Rename skill from `browser-reverse` to `browser-to-api`. Updates SKILL.md frontmatter + heading, package.json, REFERENCE.md heading, the OpenAPI doc's `info.description`, and the report.md heading.
- Fix the stale `discover-api-spec` reference in SKILL.md's composition diagram (left over from an earlier rename).
- Drop `BODY-CAPTURE-LIFT.md` from the PR; it's a separate proposal.
- Remove the `exec.sendFile()` reference in SKILL.md (browserbase-internal, not a generic skill primitive).
- REFERENCE.md restructured to lead with the script/CLI/file-format reference rather than an architecture intro. Pipeline diagram dropped.

Bug fixes (per Cursor Bugbot):
- `filter.mjs`: rework precedence so `--include` actually rescues URLs that would be hit by a default exclude, matching the documented contract. User `--exclude` still wins. Added a unit-style test path.
- `infer.mjs`: skip response-body samples whose CDP status is null. Previously they were keyed under `"0"` but `emit.mjs` only iterates `ep.statusCodes` (which excludes nulls), silently discarding the body.
- `load.mjs`: fix the comment in `urlQuery()`: the code is first-value-wins, not last-value-wins.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
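The `filter.mjs` precedence described in the commit message (user `--exclude` wins, `--include` rescues default-excluded URLs, defaults apply otherwise) can be sketched as follows. This is not the PR's code: `DEFAULT_EXCLUDES`, `shouldKeep`, and the use of regular expressions as patterns are all assumptions for illustration.

```javascript
// Hypothetical default excludes; the real list in filter.mjs is not shown in this PR page.
const DEFAULT_EXCLUDES = [/\.(png|css|js)(\?|$)/, /\/analytics\//];

function shouldKeep(url, { include = [], exclude = [] } = {}) {
  const matches = (patterns) => patterns.some((re) => re.test(url));
  if (matches(exclude)) return false; // 1. a user --exclude always wins
  if (matches(include)) return true;  // 2. --include rescues a default-excluded URL
  return !matches(DEFAULT_EXCLUDES);  // 3. otherwise the default excludes apply
}

const url = 'https://example.test/analytics/events';
console.log(shouldKeep(url));                                             // false
console.log(shouldKeep(url, { include: [/analytics\/events/] }));         // true
console.log(shouldKeep(url, { include: [/analytics/], exclude: [/events/] })); // false
```

Checking the user exclude first is what makes the ordering unambiguous: an include can only override the built-in defaults, never an explicit exclude.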
Pushed 9446f91 addressing all review comments:
@cursor[bot]
Branch name is still |
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 9446f91.
function isKeySecret(name) {
  const k = String(name).toLowerCase().replace(/[_-]/g, '');
  return KEY_DENY.has(k) || extraKeys.has(k);
Extra redaction keys silently fail for body matching
Medium Severity
isKeySecret normalizes the input name by stripping underscores and hyphens via .replace(/[_-]/g, ''), but extraKeys stores user-provided --redact values with only toLowerCase() applied — no underscore/hyphen stripping. A user passing --redact my_secret_key stores my_secret_key in extraKeys, but the lookup normalizes the JSON key to mysecretkey, which never matches. User-specified body key redactions containing _ or - are silently ignored, potentially leaking credentials the user explicitly asked to scrub.
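One way to address the mismatch described above is to run user-supplied `--redact` values through the same normalization as the lookup. The sketch below assumes a helper named `normalizeKey`, a factory named `makeKeyMatcher`, and a small stand-in deny-list; none of these are taken from the PR's `redact.mjs`.

```javascript
// Shared normalization: lowercase, then strip underscores and hyphens.
const normalizeKey = (name) => String(name).toLowerCase().replace(/[_-]/g, '');

// Hypothetical built-in deny-list, already stored in normalized form.
const KEY_DENY = new Set(['password', 'apikey', 'accesstoken']);

function makeKeyMatcher(userRedactKeys = []) {
  // Normalize --redact values the same way lookups are normalized,
  // so "my_secret_key" and "my-secret-key" both become "mysecretkey".
  const extraKeys = new Set(userRedactKeys.map(normalizeKey));
  return function isKeySecret(name) {
    const k = normalizeKey(name);
    return KEY_DENY.has(k) || extraKeys.has(k);
  };
}

const isKeySecret = makeKeyMatcher(['my_secret_key']);
console.log(isKeySecret('my_secret_key')); // true
console.log(isKeySecret('My-Secret-Key')); // true
console.log(isKeySecret('username'));      // false
```

Routing both the stored keys and the looked-up key through one function keeps the two sides from drifting apart again.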
    }
  }
  for (const [key, origins] of Object.entries(collisions)) {
    const [m, p] = key.split(' ');
Collision key split breaks on space-containing paths
Low Severity
The collision key is built as `${m} ${ep.path}` and later destructured via key.split(' ') into [m, p]. If a path ever contained a literal space, split(' ') would produce more than two elements and p would only capture the first path segment, causing paths[p][m] to be undefined and throwing at runtime. Using indexOf(' ') to split on the first space only would be more robust.
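The `indexOf(' ')` approach suggested above can be sketched as a small helper. The name `splitCollisionKey` is hypothetical, and the sketch assumes the method never contains a space, so the first space is always the separator.

```javascript
// Split "<method> <path>" on the first space only, so spaces in the path survive.
function splitCollisionKey(key) {
  const i = key.indexOf(' ');
  if (i === -1) throw new Error(`malformed collision key: ${key}`);
  return [key.slice(0, i), key.slice(i + 1)];
}

console.log(splitCollisionKey('get /users/{id}'));
console.log(splitCollisionKey('get /files/my report.pdf'));
```

An alternative that avoids string parsing altogether would be to store the method and path as separate fields on the collision record.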


Summary
`browser-reverse` consumes a `browser-trace` run directory and emits an OpenAPI 3.1 spec for the publicly-observable HTTP API of any website, plus a human-readable coverage report and per-endpoint confidence metadata. Pure offline post-processing: it composes cleanly with the existing `browser-trace` skill rather than duplicating capture.

Pipeline (each stage is a discrete script for debuggability via `--stage`): load → filter → normalize → infer → emit

Highlights
- … `{id}`, `{id2}`, etc.
- … (`lib/schema-merge.mjs`): JSON-Schema from samples with required-intersection, type unions, format hints (`date-time`, `uri`, `email`, `uuid`), and enum detection that requires meaningful repetition (not just low cardinality).
- … `$ref`: recurses into nested object/array schemas, hoists when referenced ≥ 2 times OR when it's an object with ≥ 4 properties. Names derived from path tokens.
- … `Authorization`, `Cookie`, `*-token`, etc.), in body keys (`password`, `apiKey`, etc.), and value patterns (JWTs, emails, phone numbers). Replaces values with `<redacted>` to preserve types for inference.
- `browse network on` integration: pass `--bodies <path>` (or stash bodies under `<run>/cdp/network/bodies/`, which is auto-detected) to join real response bodies into the trace by CDP `requestId`. Without it, the spec has request bodies but no response-body schemas (the `browse cdp` firehose doesn't embed bodies).
- … `(method, path)`, the higher-sample operation wins and other origins are recorded under `x-also-served-from` rather than silently dropped.
- … `report.md` lists every endpoint with samples, statuses, confidence, and normalization flags (`single-sample`, `single-status`, `mixed-content-types`, `divergent-response-shape`, `request-body-only-on-some-samples`).

Composition with browser-trace
End-to-end testing
Pipeline ran clean against six sites; five real bugs surfaced and fixed during this work:
- … `integer` vs `string` on `id` based on values)
- … `200` + `404`, header + body redaction
- … `_rsc` query param, Vercel analytics body schema, mixed-content-types detection
- … `/pixel/{id}/visitor/{id2}/cerebro`, 12 components hoisted
- … `/api/md/<slug>` LLM-friendly markdown export endpoint
- … `/svc/shreddit/events` (Reddit's internal telemetry, 18 nested types), live `ExposeVariant` GraphQL exposure capturing experiment names

Bugs surfaced and fixed during E2E:
- … `distinct ≤ floor(samples/2)` so unique IDs don't become enums.
- … `{...}.length` is `undefined`; rewrote to use `Object.keys(...).length` and recurse into nested schemas.
- … `redactBody()` called twice per body; redact once and reuse.
- … `@`, `` ` ``, `#`, etc. as first character now trigger quoting (was breaking on `@vercel/analytics/react`).
- … `paths.<path>.<method>` is unique in OpenAPI; higher-sample winner now recorded with `x-also-served-from` extension.

Files
- `SKILL.md` / `REFERENCE.md`: skill docs, file format reference, jq recipes, troubleshooting
- `scripts/discover.mjs`: top-level dispatcher with `--stage` for partial runs
- `scripts/{load,filter,normalize,infer,emit}.mjs`: pipeline stages
- `scripts/lib/{io,redact,path-template,schema-merge,yaml}.mjs`: pure helpers
- `BODY-CAPTURE-LIFT.md`: design doc for adding native body capture to `browser-trace` (alternative to the current `browse network on` pairing). Open question for maintainers; no code change in this PR.

Test plan
- … `SKILL.md`
- `openapi.yaml` parses with a YAML library (`python -c "import yaml; yaml.safe_load(open('...'))"`)
- `openapi.json` parses (`jq . openapi.json`)
- `report.md` correctly flags low-confidence endpoints
- … `--bodies` flag with a `browse network on` capture and confirm response-body schemas appear in the spec

🤖 Generated with Claude Code
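The enum rule quoted in the bug list above, `distinct ≤ floor(samples/2)`, can be illustrated with a short sketch. The function name `detectEnum` and the `maxDistinct` cap are assumptions added for the example; only the repetition threshold comes from the PR description.

```javascript
// Returns the distinct values if they repeat enough to look like an enum, else null.
function detectEnum(values, { maxDistinct = 8 } = {}) {
  if (values.length === 0) return null;
  const distinct = [...new Set(values)];
  // Rule from the PR description: require distinct <= floor(samples / 2),
  // so a column of unique IDs never qualifies.
  const repeatsEnough = distinct.length <= Math.floor(values.length / 2);
  // maxDistinct is a hypothetical extra cap, not taken from the PR.
  if (!repeatsEnough || distinct.length > maxDistinct) return null;
  return distinct;
}

console.log(detectEnum(['a1', 'b2', 'c3']));                 // null (all unique)
console.log(detectEnum(['open', 'closed', 'open', 'open'])); // [ 'open', 'closed' ]
```

With this threshold a single sample can never produce an enum, since `floor(1/2)` is 0.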
Note
Medium Risk
New end-to-end pipeline that processes potentially sensitive trace data (including optional request/response bodies) and emits schemas/specs; correctness and redaction behavior are important to avoid leaking secrets or producing misleading specs.
Overview
Adds a new `browser-to-api` skill that post-processes a `browser-trace` run into a best-effort OpenAPI 3.1 spec plus `report.md`, `confidence.json`, and per-endpoint redacted samples, implemented as a `discover.mjs` pipeline (`load` → `filter` → `normalize` → `infer` → `emit`).

The pipeline pairs CDP request/response events (optionally joining full bodies from `browse network on` via `--bodies`), filters noise via include/exclude/origin rules, templatizes paths (IDs + slug inference), infers JSON Schemas with redaction, hoists repeated schemas into `components`, and emits YAML/JSON using an in-repo YAML writer; docs (`SKILL.md`, `REFERENCE.md`) describe flags, file formats, and troubleshooting.
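The schema-inference step summarized above (type unions plus a `required` list built by intersecting keys across samples) might be sketched like this for flat objects. It is an illustration under assumptions, not the contents of `lib/schema-merge.mjs`: the names `jsonType` and `inferObjectSchema` are hypothetical, and nested objects, format hints, and enums are left out.

```javascript
// Map a JSON value to a JSON Schema type name.
function jsonType(v) {
  if (v === null) return 'null';
  if (Array.isArray(v)) return 'array';
  if (Number.isInteger(v)) return 'integer';
  return typeof v; // 'string' | 'number' | 'boolean' | 'object'
}

function inferObjectSchema(samples) {
  const types = new Map(); // property name -> Set of observed types
  let required = null;     // running intersection of keys across samples
  for (const sample of samples) {
    const keys = Object.keys(sample);
    for (const k of keys) {
      if (!types.has(k)) types.set(k, new Set());
      types.get(k).add(jsonType(sample[k]));
    }
    required = required === null
      ? new Set(keys)
      : new Set(keys.filter((k) => required.has(k)));
  }
  const properties = {};
  for (const [k, seen] of types) {
    const t = [...seen].sort();
    // A single observed type stays a string; several become a type union (array).
    properties[k] = { type: t.length === 1 ? t[0] : t };
  }
  return { type: 'object', properties, required: [...(required || [])].sort() };
}

const schema = inferObjectSchema([
  { id: 1, name: 'a' },
  { id: 'x2', name: 'b', tag: null },
]);
console.log(JSON.stringify(schema, null, 2));
```

Here `tag` appears in only one sample, so it is listed under `properties` but not under `required`, and `id` gets the union `["integer", "string"]`.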