Enable Chrome DevTools Protocol in devcontainer for semantic agent control #2129

@kantord

Summary

Enable the Chrome DevTools Protocol (CDP) on the Electron app inside the devcontainer (and optionally expose it off-container in dev/CI) so that AI agents driving the app can use accessibility-tree / DOM-level control (Playwright CLI, raw CDP, etc.) instead of pixel-based control via xdotool + screenshots.

Context

The devcontainer currently gives an agent a working but coarse control surface:

  • Drive input via DISPLAY=:99 xdotool …
  • Observe state via import -window root /tmp/shot.png + load image into model context

This works, but every observation costs a screenshot in the model's context. For a long agent loop (e.g. the experimental visual bug-fix flow on experiment/bug-fix-visual), token spend on images dominates the cost.

CDP gives an agent the same control surface that Chrome DevTools uses internally:

  • Accessibility tree with element refs ([button ref=e12] "Submit")
  • Click/fill/press by ref, not by pixel coordinate
  • Network/console/runtime inspection without screenshots
  • DOM/AX snapshots are plain text — cheap, structured, robust against CSS changes

Electron exposes CDP exactly like Chrome — pass --remote-debugging-port=<N> (or app.commandLine.appendSwitch('remote-debugging-port', N) programmatically), then any CDP client can attach.
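
To make "any CDP client can attach" concrete: once the port is open, the DevTools HTTP endpoint at /json lists the debuggable targets, and a client attaches over the `webSocketDebuggerUrl` of the one it wants. The sketch below only shows the selection step. The `CdpTarget` interface lists just the fields used here, `pickPageTarget` is a hypothetical helper name, and the sample payload is made up for illustration.

```typescript
// Subset of the fields found in one entry of the /json target list.
interface CdpTarget {
  id: string;
  type: string; // e.g. "page", "service_worker"
  title: string;
  url: string;
  webSocketDebuggerUrl?: string; // treated as optional in this sketch
}

// Pick the first renderer window that advertises a WebSocket endpoint.
// Returns undefined when no such target is listed.
function pickPageTarget(targets: CdpTarget[]): CdpTarget | undefined {
  return targets.find(
    (t) => t.type === "page" && typeof t.webSocketDebuggerUrl === "string",
  );
}

// Illustrative payload (invented values), shaped like the response to
// `curl http://localhost:9223/json`.
const sampleTargets: CdpTarget[] = [
  {
    id: "A1",
    type: "service_worker",
    title: "sw",
    url: "file:///sw.js",
    webSocketDebuggerUrl: "ws://localhost:9223/devtools/page/A1",
  },
  {
    id: "B2",
    type: "page",
    title: "Main window",
    url: "file:///index.html",
    webSocketDebuggerUrl: "ws://localhost:9223/devtools/page/B2",
  },
];

const chosen = pickPageTarget(sampleTargets);
console.log(chosen?.webSocketDebuggerUrl); // ws://localhost:9223/devtools/page/B2
```

An empty list, or a list with no "page" entry, is a cheap signal that the app has not finished starting, so a caller could retry before falling back to screenshots.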

Proposed work

  1. Enable CDP on Electron in dev mode.
    In scripts/devcontainer-entrypoint.sh, add --remote-debugging-port=9223 to the pnpm start invocation (or set the equivalent switch from Electron's main process). Pick a port distinct from the noVNC port (currently 6080) and document it.

  2. Forward the port off the container.
    In .devcontainer/devcontainer.json, add 9223 to forwardPorts. If we want host-port control, also add a runArgs entry such as "-p", "${localEnv:CDP_HOST_PORT}:9223", mirroring the existing NOVNC_HOST_PORT mapping. Default to localhost-only; never bind publicly.

  3. Document the workflow in the devcontainer-dev skill (.claude/skills/devcontainer-dev/SKILL.md):

    • How to confirm CDP is up: curl http://localhost:9223/json
    • Recommended client: Playwright CLI (npx playwright open --connect-over-cdp http://localhost:9223) or chromium.connectOverCDP(...) for scripts
    • Shared-control caveat: user clicks and agent clicks can race
    • Tier ladder: try AX/DOM first, fall back to xdotool/screenshots only when semantics aren't enough

  4. Optional: ship a small helper script (scripts/devcontainer-cdp.sh) that wraps the common one-liners (navigate, snapshot, click <ref>, fill <ref> <value>) so agents have a tight, well-bounded interface — analogous to how xdotool is the agent's input verb today.
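
One way to keep the helper's interface "tight and well-bounded" is to parse its arguments into a closed set of typed commands before anything touches CDP. The following is a minimal sketch, not a proposed implementation: the four verbs come from the list above, while the type names, `parseCommand`, and the assumption that refs look like `e12` (the letter e followed by digits) are all hypothetical.

```typescript
// Closed set of commands the hypothetical helper would accept.
type AgentCommand =
  | { kind: "navigate"; url: string }
  | { kind: "snapshot" }
  | { kind: "click"; ref: string }
  | { kind: "fill"; ref: string; value: string };

// Assumed ref shape, modelled on the e12 example: "e" followed by digits.
const REF_PATTERN = /^e\d+$/;

// Turn raw CLI arguments into a validated command, or throw a usage error.
function parseCommand(argv: string[]): AgentCommand {
  const [verb, ...rest] = argv;
  switch (verb) {
    case "navigate":
      if (rest.length !== 1) throw new Error("usage: navigate <url>");
      return { kind: "navigate", url: rest[0] };
    case "snapshot":
      if (rest.length !== 0) throw new Error("usage: snapshot");
      return { kind: "snapshot" };
    case "click":
      if (rest.length !== 1 || !REF_PATTERN.test(rest[0])) {
        throw new Error("usage: click <ref>");
      }
      return { kind: "click", ref: rest[0] };
    case "fill":
      if (rest.length < 2 || !REF_PATTERN.test(rest[0])) {
        throw new Error("usage: fill <ref> <value>");
      }
      // Join remaining words so an unquoted multi-word value survives.
      return { kind: "fill", ref: rest[0], value: rest.slice(1).join(" ") };
    default:
      throw new Error(`unknown command: ${String(verb)}`);
  }
}

console.log(parseCommand(["fill", "e3", "hello", "world"]));
```

Rejecting anything outside the four verbs, and anything that is not ref-shaped, means a malformed agent request fails fast with a usage message instead of reaching the app.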

Why this matters now

The experimental visual bug-fix agent (experiment/bug-fix-visual, PR #2120) is the immediate consumer. That experiment is currently stalled on an unrelated issue (claude-code-action workflow validation), but once it can run, screenshot-driven repro will be its dominant cost. Adding CDP turns that into AX-tree-driven repro for most of the loop, with screenshots reserved for genuine pixel bugs.

References

  • Electron CDP docs: enabling --remote-debugging-port
  • Playwright accessibility-tree snapshot output (text refs like [button ref=e12])
  • Existing devcontainer skill: .claude/skills/devcontainer-dev/SKILL.md (already mentions this as "Future: CDP access")

Out of scope

  • Changing the production agent (_bug-fix-agent.yml) to use CDP — that's a follow-up once CDP is wired.
  • Adding CDP to release builds. Dev/devcontainer only for now.
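
If the switch ends up being set from Electron's main process rather than the entrypoint script, the dev-only constraint could be enforced with a small guard. This is a sketch under assumptions: `cdpSwitchFor` and the `CDP_PORT` environment variable are hypothetical names, and the packaged/unpackaged distinction is assumed to come from Electron's `app.isPackaged`.

```typescript
// Inputs for the decision. In a main process these would be
// app.isPackaged and process.env; CDP_PORT is a hypothetical variable name.
interface CdpGateInput {
  isPackaged: boolean;
  env: Record<string, string | undefined>;
}

// Returns the [switch, value] pair to hand to app.commandLine.appendSwitch,
// or null when CDP should stay disabled.
function cdpSwitchFor(input: CdpGateInput): [string, string] | null {
  if (input.isPackaged) return null; // release builds never open the port
  const raw = input.env.CDP_PORT;
  if (raw === undefined) return null; // opt-in only
  const port = Number(raw);
  // Reject non-numeric, fractional, privileged, or out-of-range values.
  if (!Number.isInteger(port) || port < 1024 || port > 65535) return null;
  return ["remote-debugging-port", String(port)];
}

// Intended use, before the app's ready event (not executed here):
//   const sw = cdpSwitchFor({ isPackaged: app.isPackaged, env: process.env });
//   if (sw) app.commandLine.appendSwitch(sw[0], sw[1]);

console.log(cdpSwitchFor({ isPackaged: false, env: { CDP_PORT: "9223" } }));
```

Keeping the decision in a pure function makes the "never in release builds" rule easy to unit-test without launching Electron.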
