feat(browse): add record command for video evidence of interactive bug repros #1483

Open
itstimwhite wants to merge 2 commits into garrytan:main from itstimwhite:feat/browse-record

@itstimwhite
Contributor

Why

Static screenshots can't capture the timing of a UI bug. Form-submission order, async loading flicker, drag/drop, scroll-triggered behavior, focus management, dialog timing — these all benefit from a few seconds of video. Annotated screenshots stay the right answer for static bugs (typo, clipped UI, missing element), but interactive bugs deserve a .webm so the developer fixing them can see the exact repro shape.

agent-browser (Vercel Labs) ships a dogfood skill that records video evidence as part of its structured QA workflow. I wanted the same evidence shape available behind $B — including from inside /qa and /qa-only — so this PR adds the primitive at the browse layer and leaves the QA-side flag for a follow-up.

What

A new record meta-command with three subactions:

$B record start [path] [--size WxH]    # rebuild context with recordVideo
$B record stop                          # close context, flush .webm, return paths
$B record status                        # is recording active?

Wraps Playwright's BrowserContext.recordVideo option. Implementation hooks into the existing recreateContext() save/restore path, so cookies, URLs, and open pages survive both start and stop. @e refs are invalidated by the recreate (same caveat as viewport --scale); record stop prints a hint reminding you to re-snapshot.
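
As a rough sketch of how a `--size WxH` flag could map onto Playwright's `recordVideo` context option (which takes a `dir` and an optional `size` with `width` and `height`), the snippet below builds the options object used when the context is recreated. The helper names `parseSize` and `withRecording` and the error text are hypothetical, not taken from this PR's diff.

```typescript
// Hypothetical helpers; names and error messages are illustrative only.
interface VideoSize {
  width: number;
  height: number;
}

// Shape of the Playwright `recordVideo` context option.
interface RecordVideoOptions {
  dir: string;
  size?: VideoSize;
}

// Parse a "WxH" string such as "1280x720"; throw on anything malformed.
function parseSize(raw: string): VideoSize {
  const match = /^(\d+)x(\d+)$/.exec(raw);
  if (!match) {
    throw new Error(`Invalid --size "${raw}": expected WxH, e.g. 1280x720`);
  }
  const width = Number(match[1]);
  const height = Number(match[2]);
  if (width === 0 || height === 0) {
    throw new Error(`Invalid --size "${raw}": dimensions must be positive`);
  }
  return { width, height };
}

// Merge recording settings into the options used to recreate the context.
function withRecording<T extends object>(
  base: T,
  dir: string,
  size?: string,
): T & { recordVideo: RecordVideoOptions } {
  const recordVideo: RecordVideoOptions = size
    ? { dir, size: parseSize(size) }
    : { dir };
  return { ...base, recordVideo };
}
```

The result would be passed to `browser.newContext(...)`. Playwright only finalizes the video file once the context is closed, which is why `record stop` has to rebuild the context.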

Defaults output to a timestamped subdirectory under TEMP_DIR so concurrent recordings never collide. Custom paths pass through the standard validateOutputPath policy.
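
One way to derive such a default directory is sketched below. The `record-` prefix, the timestamp format, and the random suffix are all assumptions for illustration; the PR's actual naming scheme may differ.

```typescript
// Hypothetical sketch: the directory naming scheme here is an assumption.
function defaultRecordingDir(tempDir: string, now: Date = new Date()): string {
  // ISO timestamps contain ":" and ".", which are awkward in file names.
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  // A short random suffix makes a clash unlikely even when two recordings
  // start within the same millisecond.
  const suffix = Math.random().toString(36).slice(2, 8);
  // Drop trailing slashes so the joined path has exactly one separator.
  return `${tempDir.replace(/\/+$/, "")}/record-${stamp}-${suffix}`;
}
```

A user-supplied path would skip this helper and go through `validateOutputPath` instead.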

Single-recording invariant: calling record start while already recording auto-stops the prior recording. Headed mode rejects with a clear error (the user can use their OS screen recorder in headed sessions).
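
The invariant can be modeled as a tiny state holder. This is a hypothetical model, not the code in browser-manager.ts: the class name, the return values, and the `Recording to …` status text are assumptions (only `Not recording.` comes from the commit message).

```typescript
// Hypothetical model of the single-recording invariant.
class RecordingState {
  private activeDir: string | null = null;
  private readonly headed: boolean;

  constructor(headed: boolean) {
    this.headed = headed;
  }

  // Starts a recording; returns the directory of the recording that was
  // auto-stopped to make room for it, or null if none was active.
  start(dir: string): string | null {
    if (this.headed) {
      throw new Error("record is not supported in headed mode; use an OS screen recorder");
    }
    const previous = this.stop();
    this.activeDir = dir;
    return previous;
  }

  // Stops the active recording; a no-op that returns null when idle.
  stop(): string | null {
    const dir = this.activeDir;
    this.activeDir = null;
    return dir;
  }

  status(): string {
    return this.activeDir ? `Recording to ${this.activeDir}` : "Not recording.";
  }
}
```

Routing `start` through `stop` means a double start takes the same path as an explicit stop, so the earlier recording is flushed the same way in both cases.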

Tests

9 integration tests in browse/test/record.test.ts:

  • record status reports not-recording before start
  • record stop is a no-op when not recording
  • record start → activity → record stop produces a non-empty .webm (verified by reading the EBML magic bytes 0x1A 0x45 0xDF 0xA3)
  • Browser remains functional after stop (state preserved across the context recreate)
  • record start while already recording auto-stops the prior recording
  • Unknown flag, malformed --size, missing action, unknown subaction all reject with usage errors
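
The magic-byte check in the third test can be sketched as follows. The function name `hasEbmlMagic` is hypothetical; the four bytes are the EBML header ID that opens every WebM file.

```typescript
// The four-byte EBML header ID found at the start of WebM (and Matroska) files.
const EBML_MAGIC = [0x1a, 0x45, 0xdf, 0xa3];

// True when the buffer is long enough and begins with the EBML header ID.
function hasEbmlMagic(bytes: Uint8Array): boolean {
  return (
    bytes.length >= EBML_MAGIC.length &&
    EBML_MAGIC.every((expected, i) => bytes[i] === expected)
  );
}
```

In a test, the bytes would come from reading the recorded file (for example with `readFileSync` from `node:fs`, whose `Buffer` result is a `Uint8Array`). The check confirms an EBML container rather than WebM specifically, which is enough to catch an empty or truncated file.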

All passing locally:

 9 pass
 0 fail
Ran 9 tests across 1 file. [2.29s]

Existing browse/test/commands.test.ts (223 tests) still green.

The pre-existing snapshot.test.ts flake (closetab last tab auto-creates new) reproduces on main without these changes; not related.

Commits (bisect-friendly)

  1. feat(browse): add record command … — implementation only (browser-manager.ts, commands.ts, meta-commands.ts, new record.test.ts). 352 insertions.
  2. docs(browse): document record command in SKILL.md.tmpl + regenerate: SKILL.md.tmpl change plus regenerated browse/SKILL.md, top-level SKILL.md, gstack/llms.txt. 67 insertions.

Template change and generated-doc regeneration intentionally split, per the gstack contributor guidance.

Out of scope

  • VERSION bump and CHANGELOG entry — left for the merge so you can write the entry in your voice. The implementation includes a NEW_IN_VERSION: 'record': '1.35.0.0' so the unknown-command hint works the moment this lands; adjust if you cut a different version.
  • /qa --evidence-per-finding flag (the QA-side complement) — separate PR coming, since it's a template-only change to two SKILL.md.tmpl files and is easier to review on its own.
  • Auto-cleanup of recorded .webm files. They're evidence; outliving the session is the point.

Tim White added 2 commits May 13, 2026 20:10
…bug repros

Wraps Playwright's `BrowserContext.recordVideo` option behind a `record
start|stop|status` subcommand. Useful when an interactive bug needs a
timing-faithful repro that a screenshot can't capture: form submission
order, async loading state flicker, drag/drop, scroll-triggered behavior,
focus management, dialog timing.

`record start [path] [--size WxH]`
  Saves session state, recreates the context with recordVideo enabled
  (path defaults to a timestamped dir under TEMP_DIR), restores state.
  Calling `start` while already recording auto-stops the prior recording
  (single-recording invariant).

`record stop`
  Collects each live page's `video()` path, clears the recordVideo flag,
  rebuilds the context (which flushes the .webm to disk), and returns the
  paths. No-op when not recording.

`record status`
  Prints the active recording directory or `Not recording.`

Implementation hooks into the existing `recreateContext()` save/restore
path, so cookies, URLs, and open pages survive both `start` and `stop`.
Headed mode rejects with a clear error (the user can use their OS's
screen recorder for headed sessions). Output paths pass through the
standard `validateOutputPath` policy so the command can't write outside
SAFE_DIRECTORIES.

9 integration tests cover: status before start, no-op stop, start →
activity → stop produces a non-empty WebM (verified by magic-byte
check), browser remains functional after stop (state preserved across
context recreate), auto-stop on double-start, malformed flag rejection,
malformed `--size` rejection, missing action error, unknown subaction
error.

Adds a 'Record video evidence for interactive bug repros' section to
browse/SKILL.md.tmpl right after the retina-screenshot section, explaining
the per-context recording model, the ref-invalidation caveat across start
and stop, and when video is the right evidence shape vs an annotated
screenshot.

Regenerates browse/SKILL.md, SKILL.md, and gstack/llms.txt via
`bun run gen:skill-docs --host all`. No host-specific outputs change
beyond the new command row in the COMMAND_REFERENCE tables.