Skip to content

fix: pipe selection, CDP debugger stability, and E2E tests#9

Merged
DeliciousBuding merged 6 commits into
mainfrom
topic/e2e-fixes
May 19, 2026
Merged

fix: pipe selection, CDP debugger stability, and E2E tests#9
DeliciousBuding merged 6 commits into
mainfrom
topic/e2e-fixes

Conversation

@DeliciousBuding
Copy link
Copy Markdown
Owner

Summary

Real-world testing as Claude Code MCP server revealed critical bugs:

fix: stale pipe selection after reboot

Connect() picked pipes[0] blindly. After reboot, stale pipes remain that connect but return Browser session does not belong to this IAB pipe on RPC calls. Now tries all pipes with a getInfo health check before returning.

fix: CDP debugger detaches after ~2 calls

Chrome DevTools Protocol debugger silently detaches after a few CDP calls. Each cdpWithAttach now calls detach before attach to clear stale state, with one auto-retry on "not attached" errors.

feat: debug logging via env var

Optional BRIDGE_DEBUG_LOG env var tees logs to a file for MCP debugging. No hardcoded paths.

test: end-to-end tests

  • client/e2e_test.go: detach->attach->executeCdp sequence, isDebuggerError classification, WaitForLoad timeout
  • mcp/e2e_test.go: full session lifecycle (create->navigate->screenshot->evaluate->finalize), MCP init handshake, debugger retry for navigate/screenshot

Test plan

  • All 62 tests pass (go test ./...)
  • go vet ./... clean
  • Manual MCP smoke: bridge connects, tabs/screenshot/evaluate work end-to-end

Three fixes from real-world MCP session testing:

1. Pipe auto-discovery: try all pipes with a getInfo health check
   instead of blindly picking pipes[0]. After a reboot, stale pipes
   remain that connect but fail with "Browser session does not
   belong to this IAB pipe".

2. CDP debugger detach: call detach before attach to clear stale
   Chrome DevTools debugger state. With auto-retry on "not attached"
   errors, subsequent CDP calls no longer degrade after 2-4 uses.

3. Debug logging: optional BRIDGE_DEBUG_LOG env var to tee logs to
   a file for debugging when spawned by MCP clients that swallow
   stderr. No hardcoded paths.
- client/e2e_test.go: detach→attach→executeCdp sequence tests,
  isDebuggerError classification, WaitForLoad timeout
- mcp/e2e_test.go: full MCP session lifecycle (create→navigate→
  screenshot→evaluate→finalize), MCP init+handshake, debugger
  detach retry for navigate/screenshot, claim+close edge cases

All 62 tests pass, go vet clean.
Copilot AI review requested due to automatic review settings May 18, 2026 08:26
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the browser bridge's reliability by implementing a health-check mechanism during pipe discovery and a robust debugger attachment strategy that includes a proactive detach-attach cycle and retry logic. It also introduces comprehensive E2E tests, improved logging, and support for debug log teeing. Review feedback suggests optimizing the debugger attachment logic to reduce latency by attempting direct execution before repairing the state and recommends implementing a shorter timeout for the connection health check to prevent potential hangs during auto-discovery.

Comment thread internal/client/client.go
}
c := NewFromConn(conn, logger)
// Health check: send a quick getInfo to verify the pipe is usable
result, err := c.SendRequest("getInfo", nil)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The health check getInfo call uses the default RPC timeout of 60 seconds. If multiple stale pipes exist that accept connections but do not respond to RPCs, the Connect function could hang for several minutes during auto-discovery. Consider implementing a shorter timeout specifically for this health check to ensure fast failover to the next available pipe.

Comment on lines 234 to +253
func (c *Client) cdpWithAttach(tabID int, method string, params map[string]interface{}) (json.RawMessage, error) {
// Detach first to clear any stale debugger state from Chrome
_ = c.detachTab(tabID)
if err := c.attachTab(tabID); err != nil {
return nil, fmt.Errorf("attach failed for tab %d: %w", tabID, err)
}
return c.executeCdp(tabID, method, params)
raw, err := c.executeCdp(tabID, method, params)
if err != nil {
// If attach didn't take, try one more detach+attach+retry cycle
if isDebuggerError(err) {
_ = c.detachTab(tabID)
if err2 := c.attachTab(tabID); err2 != nil {
return nil, fmt.Errorf("retry attach failed for tab %d: %w", tabID, err2)
}
return c.executeCdp(tabID, method, params)
}
return nil, err
}
return raw, nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation of cdpWithAttach performs a proactive detach and attach cycle before every CDP command, which adds significant latency (3 RPCs per command). Since you've already implemented a retry mechanism for debugger errors, you can optimize this by attempting the executeCdp call directly first. This avoids the overhead when the debugger is already healthy and attached.

func (c *Client) cdpWithAttach(tabID int, method string, params map[string]interface{}) (json.RawMessage, error) {
	// Try executing directly first to avoid the latency of proactive attach/detach.
	raw, err := c.executeCdp(tabID, method, params)
	if err != nil && isDebuggerError(err) {
		// If the debugger is not attached or in a stale state, repair and retry once.
		_ = c.detachTab(tabID)
		if err2 := c.attachTab(tabID); err2 != nil {
			return nil, fmt.Errorf("attach failed for tab %d: %w", tabID, err2)
		}
		return c.executeCdp(tabID, method, params)
	}
	return raw, err
}

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Real-world MCP usage surfaced three pipe/debugger reliability bugs; this PR fixes them and adds end-to-end coverage. Connect() now iterates discovered pipes and validates each with a getInfo health check instead of blindly using the first one. cdpWithAttach proactively detaches before attaching and retries once on "not attached" errors to work around CDP debugger silently dropping after a few calls. An optional BRIDGE_DEBUG_LOG env var tees logs to a file for MCP-mode debugging, and new client/MCP E2E tests cover the lifecycle and retry paths.

Changes:

  • Pipe selection: try every discovered pipe, gated by a getInfo health check before returning a client.
  • CDP stability: cdpWithAttach does detach→attach→execute and retries once when the error matches "not attached".
  • Observability + tests: optional BRIDGE_DEBUG_LOG tee, plus internal/client/e2e_test.go and internal/mcp/e2e_test.go covering session lifecycle, retry, and timeout paths.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
internal/client/client.go Iterate pipes and health-check via getInfo before returning a client.
internal/client/browser.go Add detachTab; rewrite cdpWithAttach to detach+attach+execute with one retry; add isDebuggerError.
internal/client/e2e_test.go New E2E tests for detach→attach→executeCdp ordering, error classification, and WaitForLoad timeout.
internal/client/browser_rpc_test.go Updated call-sequence assertions to expect new detach step.
internal/mcp/e2e_test.go New MCP lifecycle, init/tools-list, debugger-retry, and tab-ID handling tests.
internal/mcp/handlers_test.go Updated recorded-method expectations to include the leading detach.
cmd/bridge/main.go Optional BRIDGE_DEBUG_LOG file tee and extra lifecycle logging in runMCP.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/client/client.go
Comment on lines +66 to +75
// Health check: send a quick getInfo to verify the pipe is usable
result, err := c.SendRequest("getInfo", nil)
if err != nil {
c.Close()
lastErr = err
if logger != nil {
logger.Printf("pipe %s health check failed: %v, trying next...", p.UUID, err)
}
continue
}
Comment on lines 234 to +252
func (c *Client) cdpWithAttach(tabID int, method string, params map[string]interface{}) (json.RawMessage, error) {
// Detach first to clear any stale debugger state from Chrome
_ = c.detachTab(tabID)
if err := c.attachTab(tabID); err != nil {
return nil, fmt.Errorf("attach failed for tab %d: %w", tabID, err)
}
return c.executeCdp(tabID, method, params)
raw, err := c.executeCdp(tabID, method, params)
if err != nil {
// If attach didn't take, try one more detach+attach+retry cycle
if isDebuggerError(err) {
_ = c.detachTab(tabID)
if err2 := c.attachTab(tabID); err2 != nil {
return nil, fmt.Errorf("retry attach failed for tab %d: %w", tabID, err2)
}
return c.executeCdp(tabID, method, params)
}
return nil, err
}
return raw, nil
Comment on lines +255 to 257
func isDebuggerError(err error) bool {
return err != nil && strings.Contains(err.Error(), "not attached")
}
Comment thread cmd/bridge/main.go
Comment on lines +32 to +37
if debugPath := os.Getenv("BRIDGE_DEBUG_LOG"); debugPath != "" {
if f, err := os.OpenFile(debugPath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644); err == nil {
logger = log.New(io.MultiWriter(os.Stderr, f), "[codex-bridge] ", log.LstdFlags)
defer f.Close()
}
}
Comment on lines +102 to +104
if err.Error() == "" || err.Error()[:7] != "timed o" {
t.Errorf("error = %v, should mention timeout", err)
}
Comment thread internal/mcp/e2e_test.go
}

// TestNavigateRetryOnDebuggerError verifies that when the navigate handler
// encounters a debugger detach error, it's propagated correctly.
Comment thread internal/mcp/e2e_test.go
Comment on lines +300 to +312
// TestCreateTabRejectsNonNumericClose verifies close_tab rejects non-numeric IDs.
func TestCreateTabRejectsNonNumericClose(t *testing.T) {
srv, _, cleanup := newServerWithPipe(t, func(req protocol.Request) (interface{}, *protocol.ErrorObject) {
return map[string]bool{"ok": true}, nil
})
defer cleanup()

tool := srv.toolMap["codex_close_tab"]
args, _ := json.Marshal(map[string]interface{}{"tab_id": "abc"})
if _, err := tool.Handler(args); err == nil {
t.Fatal("expected error for non-numeric close_tab, got nil")
}
}
Comment on lines +235 to +236
// Detach first to clear any stale debugger state from Chrome
_ = c.detachTab(tabID)
v2.x requires a migrated config format; pinning v1 keeps the existing
config working without a full migration.
v2.x is required by golangci-lint-action@v9. Key changes:
- linters.default: none replaces disable-all
- linters.settings + formatters.settings replace linters-settings
- linters.exclusions.rules replaces issues.exclude-rules
- gosimple merged into staticcheck
- run.timeout removed (no longer a config field in v2)
- errcheck: explicitly ignore Close() errors with _ = c.Close()
- revive: add doc comments on all exported types and functions
- staticcheck: remove trailing \n from error format strings
- Add space separators between concatenated error string parts
- Remove trailing punctuation from error messages
- Convert if-else chain to tagged switch in handlers_test.go
@DeliciousBuding DeliciousBuding merged commit 32162f5 into main May 19, 2026
2 checks passed
@DeliciousBuding DeliciousBuding deleted the topic/e2e-fixes branch May 19, 2026 13:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants