Skip to content

feat(browser): add AgentCore Gateway + BrowserCustom for PR screenshots#37

Open
MichaelWalker-git wants to merge 9 commits intomainfrom
feat/browser-screenshot-integration
Open

feat(browser): add AgentCore Gateway + BrowserCustom for PR screenshots#37
MichaelWalker-git wants to merge 9 commits intomainfrom
feat/browser-screenshot-integration

Conversation

@MichaelWalker-git
Copy link
Copy Markdown
Contributor

Summary

  • Add AgentBrowser CDK construct wiring AgentCore BrowserCustom and Gateway L2 constructs for headless browser screenshots
  • New browser-tool Lambda handler: starts CDP session over WebSocket, navigates to PR page, captures screenshot, uploads to S3, returns presigned URL
  • Agent-side browser.py module invokes the Lambda directly via boto3 (fail-open, feature-gated by BROWSER_TOOL_FUNCTION_NAME)
  • Pipeline integration: captures PR screenshot after ensure_pr() and appends ## Screenshots section to PR body
  • Gateway exposes the same Lambda as an MCP tool endpoint with Cognito client-credentials auth for future external agents

Architecture

Gateway (MCP endpoint, Cognito OAuth2)
  └── Lambda target (browser-tool handler)
        └── BrowserCustom (headless browser via CDP over WebSocket)
              └── S3 bucket (screenshots + recordings)

Agent post-hooks → Lambda invoke (boto3) → same Lambda handler

New files

File Description
cdk/src/handlers/browser-tool.ts Lambda: CDP session → navigate → screenshot → S3 → presigned URL
cdk/src/constructs/agent-browser.ts Construct: S3 bucket, BrowserCustom, Lambda, Gateway, grants
agent/src/browser.py Python: lazy boto3 Lambda client, capture_screenshot() fail-open
cdk/test/constructs/agent-browser.test.ts 9 construct tests
cdk/test/handlers/browser-tool.test.ts 4 handler tests
agent/tests/test_browser.py 4 Python tests

Modified files

File Change
cdk/src/stacks/agent.ts Wire AgentBrowser construct, env vars, grants, CfnOutputs
agent/src/models.py Add screenshot_urls field to TaskResult
agent/src/post_hooks.py Add capture_pr_screenshots(), _append_screenshots_to_pr()
agent/src/pipeline.py Wire screenshot capture after ensure_pr()
cdk/package.json Add ws, @types/ws, @aws-sdk/client-s3, @aws-sdk/s3-request-presigner
cdk/test/stacks/agent.test.ts Update Cognito pool count (1→2) for Gateway's auto-created pool

Design decisions

  • Gateway uses its own Cognito — auto-created with clientCredentials flow (M2M), separate from TaskApi's Cognito
  • Agent calls Lambda directly — post-hooks bypass Gateway MCP protocol for simplicity and speed
  • CDP over WebSocketws package bundled by esbuild into the Lambda
  • Fail-open throughout — screenshot failure never blocks PR creation or task completion
  • Feature-gated — activates only when BROWSER_TOOL_FUNCTION_NAME env var is set

Test plan

  • mise //cdk:compile — TypeScript compiles with zero errors
  • mise //cdk:test — All 680 CDK tests pass (43 suites), including 13 new tests
  • python -m pytest agent/tests/test_browser.py — All 4 Python tests pass
  • Deploy stack, submit a new_task, verify PR has ## Screenshots section with working image link
  • Verify Gateway URL is accessible with Cognito client-credentials token

@MichaelWalker-git MichaelWalker-git requested a review from a team as a code owner April 16, 2026 22:19
@krokoko
Copy link
Copy Markdown
Contributor

krokoko commented Apr 20, 2026

Thanks @MichaelWalker-git !

QQ:

  • Have you tested a deployment ?
  • Can you also fix the failing build ? I'll review in the meantime

MichaelWalker-git and others added 8 commits April 20, 2026 13:08
…PR screenshots

Add a new AgentBrowser CDK construct that wires AgentCore BrowserCustom
and Gateway L2 constructs so agents can capture screenshots of PRs and
attach them as visual proof. The browser-tool Lambda handler starts a
headless browser session via CDP over WebSocket, navigates to the PR
page, captures a screenshot, uploads to S3, and returns a presigned URL.

Two access paths:
- Internal: agent post-hooks invoke the Lambda directly via boto3
- External: Gateway exposes the same Lambda as an MCP tool endpoint
  with Cognito client-credentials auth for future external agents

The feature is fail-open throughout — screenshot failure never blocks
PR creation or task completion. Feature-gated by the
BROWSER_TOOL_FUNCTION_NAME env var.
The test polls /ping until 503 (set when _background_pipeline_failed
becomes True) then immediately asserts write_terminal was called. But
write_terminal runs after the flag is set in the same except block,
creating a race. Fix by joining background threads before asserting.
@MichaelWalker-git MichaelWalker-git force-pushed the feat/browser-screenshot-integration branch from 305c3dc to 7d1c947 Compare April 20, 2026 20:13
@MichaelWalker-git
Copy link
Copy Markdown
Contributor Author

MichaelWalker-git commented Apr 20, 2026

Thanks @krokoko!

Failing build: Fixed — race condition in test_server.py where the test asserted on write_terminal before the background thread finished. Joined the thread before asserting. Also rebased on latest main.

Deployment: Tested successfully in dev account (098305555551, us-east-1). All new resources created cleanly — BrowserCustom, Gateway, S3 screenshot bucket, Lambda, Cognito User Pool, and GatewayTarget. Stack outputs confirm BrowserId, GatewayUrl, and TokenEndpoint are provisioned.

@krokoko
Copy link
Copy Markdown
Contributor

krokoko commented Apr 20, 2026

Review:

I think I misunderstand the flow, is it:

  1. Agent creates a PR on GitHub
  2. Agent screenshots that PR on GitHub
  3. Agent embeds the screenshot back into the same PR?
    I was thinking of something like: the agent writes frontend code → starts a dev server → screenshots the running app → attaches to the PR as visual proof of "here's what I built". In that case, we would put the browser in VPC mode, wire it to the existing infra, and have the agent use it during development as an available tool when the dev server is running so it can take screenshots of the available app. Not sure it would work with runtime though, probably with ECS and EC2 options

Also in agent browser, the client bypasses the gateway entirely. It calls lambda directly via boto3. Since this addition doesn't use the gateway at all, I would remove it from this PR and we can add it later when we expose a tool via MCP (this creates now a second cognito user pool, ...)

Other automated issues (to see if relevant based on above)

  • 1)The browser-tool.ts handler passes event.url directly to Page.navigate via CDP with zero server-side validation. The Python-side
    startswith("https://github.com/") check is trivially bypassable and lives only in one caller — the Gateway MCP endpoint bypasses it entirely.

An attacker controlling task input (or any authenticated Gateway caller) could navigate the browser to:

Combined with the presigned URL being auto-published to the PR body, this creates a full exfiltration chain where sensitive data is automatically made public on GitHub.

Required fix: URL allowlist in the Lambda handler itself (not just the Python caller). At minimum: HTTPS-only, reject RFC1918/link-local/metadata IPs, and ideally restrict to github.com since that's the stated purpose.

    1. PRESIGNED_URL_EXPIRES_IN = 3600 produces URLs that become dead images in every PR body after 60 minutes. The PR body is a permanent artifact. Every PR will have broken Screenshot 1 images for all future viewers.

    Required fix: Maybe a simple way is to upload images as GitHub PR assets (persisted by GitHub) ?

    1. "Fail-open" implemented as "fail-silent" — configuration errors indistinguishable from transient failures

    The triple-layered except Exception chain (browser.py → post_hooks.py → pipeline.py) swallows everything uniformly:

    • Missing AWS_REGION → logged same as a Lambda timeout
    • Wrong IAM permissions → logged same as a network blip
    • Import errors → logged as "Screenshot capture failed (non-fatal)"
    • pipeline.py catch → logs zero context: no exception type, no message, no traceback

    The _get_lambda_client() deliberately raise ValueError(...) for misconfiguration, but the caller immediately swallows it in a bare except
    Exception. The design fights itself.

    A proper fail-open pattern distinguishes configuration errors (loud, logged at ERROR) from transient operational errors (acceptable to swallow
    at WARN).

    1. . Empty catch block in session cleanup (browser-tool.ts)

} catch {
// Best-effort cleanup
}

If StopBrowserSessionCommand fails, browser sessions leak silently. Over time: cost escalation, service quota exhaustion, and zero observability
into the leak rate. Even best-effort cleanup must log.

    1. gh pr edit return code not checked — false success logged

    The function logs "Appended 2 screenshot(s) to PR body" even when gh pr edit fails (rate limit, auth failure, PR was merged/closed). The operator is told it worked when it didn't.

    1. No tests for capture_pr_screenshots / _append_screenshots_to_pr

    These functions are the integration glue — they shell out to gh, mutate live PR bodies, and embody the fail-open contract. Zero test coverage. A
    regression here could corrupt PR descriptions in production (the function reads the body, transforms it, and writes it back).

    1. Missing FunctionError check on Lambda invocation response

    When the Lambda itself crashes (unhandled exception), the response contains a FunctionError key with the error payload. The Python code parses
    the payload as if it were a normal response, falls through to "Screenshot failed: unknown" rather than surfacing the actual Lambda crash reason.

    1. WebSocket error handling gaps (browser-tool.ts)
    • No persistent ws.on('error', ...) after connection — unhandled error events crash the Lambda
    • JSON.parse on WebSocket messages has no try/catch — malformed CDP messages crash the event loop
    • Four non-null assertions on SDK response fields — undefined values produce confusing downstream errors instead of clear "missing field"
      messages
    1. Type design: BrowserToolResponse permits illegal states

    The flat interface with all optional fields allows contradictory states like { status: 'success' } with no URL, or { status: 'error',
    presignedUrl: '...' }. This should be a discriminated union — it's a 5-line change that eliminates an entire class of bugs:

    type BrowserToolResponse =
    | { status: 'success'; screenshotS3Key: string; presignedUrl: string }
    | { status: 'error'; error: string };

    1. Duplicate ## Screenshots section on pipeline retry

    _append_screenshots_to_pr unconditionally appends. If the pipeline retries or screenshots are captured twice, the PR body accumulates duplicate
    sections. Should check for existing ## Screenshots and replace.

    1. Unsanitized taskId in S3 key path

    event.taskId goes directly into the S3 key. Path traversal characters (../../) could place objects at unexpected prefixes. Sanitize to
    [a-zA-Z0-9_-].

…simplification

- Remove Gateway from AgentBrowser construct (unused; agent calls Lambda
  directly via boto3, avoids extra Cognito user pool)
- Add URL allowlist in browser-tool Lambda: HTTPS-only, github.com only,
  reject RFC1918/link-local/IMDS to prevent SSRF
- Use discriminated union type for BrowserToolResponse
- Sanitize taskId in S3 key path to [a-zA-Z0-9_-]
- Add persistent WebSocket error handler, wrap JSON.parse in try/catch,
  replace non-null assertions with descriptive error checks
- Log session cleanup failures instead of silently swallowing
- Increase presigned URL expiry to 7 days (bounded by credential lifetime)
- Distinguish config errors (ERROR) from transient errors (WARN) in
  browser.py; check FunctionError on Lambda invoke response
- Check gh pr edit return code; deduplicate ## Screenshots section on retry
- Log exception details in pipeline.py screenshot catch block
- Add 9 tests for capture_pr_screenshots and _append_screenshots_to_pr
@MichaelWalker-git MichaelWalker-git force-pushed the feat/browser-screenshot-integration branch from 82b0af6 to 7395436 Compare April 20, 2026 23:39
@MichaelWalker-git
Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review @krokoko! All issues addressed in the latest push:

Architecture (Gateway removal): Agreed — removed the Gateway entirely. The agent calls Lambda directly via boto3; no MCP overhead or extra Cognito pool. The browser-during-development vision (VPC mode, dev server screenshots) makes sense for a future iteration with ECS/EC2 compute. This PR stays scoped to PR screenshot capture as a simpler starting point.

Issues fixed:

  1. SSRF protection — Added validateUrl() in the Lambda handler itself: HTTPS-only, github.com domain only, rejects RFC1918/link-local/localhost/IMDS IPs. Validation happens server-side before any browser session starts.

  2. Presigned URL expiry — Increased to 7 days (max viable with role credentials). Added comment noting the credential-lifetime bound. GitHub asset upload would be the proper long-term fix.

  3. Error classification — Config errors (missing AWS_REGION) now log at ERROR. Transient errors stay at WARN. FunctionError check added to surface Lambda crashes distinctly.

  4. Session cleanup logging — Empty catch replaced with console.error (with eslint-disable since no shared logger in this standalone handler).

  5. gh pr edit return code — Now checked; logs WARN on failure instead of false success.

  6. Tests for post_hooks glue — 9 new tests covering capture_pr_screenshots and _append_screenshots_to_pr (success, dedup, failure paths).

  7. FunctionError check — Lambda crash responses now detected and logged at ERROR before falling through.

  8. WebSocket hardening — Persistent ws.on('error') handler, try/catch around JSON.parse in CDP message handlers, non-null assertions replaced with descriptive checks.

  9. Discriminated union typeBrowserToolResponse is now { status: 'success'; screenshotS3Key: string; presignedUrl: string } | { status: 'error'; error: string }.

  10. Duplicate ## Screenshots — Regex-replaces existing section instead of appending duplicates.

  11. TaskId sanitization — Stripped to [a-zA-Z0-9_-] before S3 key construction.

Full build passes (746 CDK tests, 293 agent tests, ESLint, synth).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants