fix: pipe selection, CDP debugger stability, and E2E tests#9
Conversation
Three fixes from real-world MCP session testing: 1. Pipe auto-discovery: try all pipes with a getInfo health check instead of blindly picking pipes[0]. After a reboot, stale pipes remain that connect but fail with "Browser session does not belong to this IAB pipe". 2. CDP debugger detach: call detach before attach to clear stale Chrome DevTools debugger state. With auto-retry on "not attached" errors, subsequent CDP calls no longer degrade after 2-4 uses. 3. Debug logging: optional BRIDGE_DEBUG_LOG env var to tee logs to a file for debugging when spawned by MCP clients that swallow stderr. No hardcoded paths.
- client/e2e_test.go: detach→attach→executeCdp sequence tests, isDebuggerError classification, WaitForLoad timeout - mcp/e2e_test.go: full MCP session lifecycle (create→navigate→ screenshot→evaluate→finalize), MCP init+handshake, debugger detach retry for navigate/screenshot, claim+close edge cases All 62 tests pass, go vet clean.
There was a problem hiding this comment.
Code Review
This pull request enhances the browser bridge's reliability by implementing a health-check mechanism during pipe discovery and a robust debugger attachment strategy that includes a proactive detach-attach cycle and retry logic. It also introduces comprehensive E2E tests, improved logging, and support for debug log teeing. Review feedback suggests optimizing the debugger attachment logic to reduce latency by attempting direct execution before repairing the state and recommends implementing a shorter timeout for the connection health check to prevent potential hangs during auto-discovery.
| } | ||
| c := NewFromConn(conn, logger) | ||
| // Health check: send a quick getInfo to verify the pipe is usable | ||
| result, err := c.SendRequest("getInfo", nil) |
There was a problem hiding this comment.
The health check getInfo call uses the default RPC timeout of 60 seconds. If multiple stale pipes exist that accept connections but do not respond to RPCs, the Connect function could hang for several minutes during auto-discovery. Consider implementing a shorter timeout specifically for this health check to ensure fast failover to the next available pipe.
| func (c *Client) cdpWithAttach(tabID int, method string, params map[string]interface{}) (json.RawMessage, error) { | ||
| // Detach first to clear any stale debugger state from Chrome | ||
| _ = c.detachTab(tabID) | ||
| if err := c.attachTab(tabID); err != nil { | ||
| return nil, fmt.Errorf("attach failed for tab %d: %w", tabID, err) | ||
| } | ||
| return c.executeCdp(tabID, method, params) | ||
| raw, err := c.executeCdp(tabID, method, params) | ||
| if err != nil { | ||
| // If attach didn't take, try one more detach+attach+retry cycle | ||
| if isDebuggerError(err) { | ||
| _ = c.detachTab(tabID) | ||
| if err2 := c.attachTab(tabID); err2 != nil { | ||
| return nil, fmt.Errorf("retry attach failed for tab %d: %w", tabID, err2) | ||
| } | ||
| return c.executeCdp(tabID, method, params) | ||
| } | ||
| return nil, err | ||
| } | ||
| return raw, nil | ||
| } |
There was a problem hiding this comment.
The current implementation of cdpWithAttach performs a proactive detach and attach cycle before every CDP command, which adds significant latency (3 RPCs per command). Since you've already implemented a retry mechanism for debugger errors, you can optimize this by attempting the executeCdp call directly first. This avoids the overhead when the debugger is already healthy and attached.
func (c *Client) cdpWithAttach(tabID int, method string, params map[string]interface{}) (json.RawMessage, error) {
// Try executing directly first to avoid the latency of proactive attach/detach.
raw, err := c.executeCdp(tabID, method, params)
if err != nil && isDebuggerError(err) {
// If the debugger is not attached or in a stale state, repair and retry once.
_ = c.detachTab(tabID)
if err2 := c.attachTab(tabID); err2 != nil {
return nil, fmt.Errorf("attach failed for tab %d: %w", tabID, err2)
}
return c.executeCdp(tabID, method, params)
}
return raw, err
}There was a problem hiding this comment.
Pull request overview
Real-world MCP usage surfaced three pipe/debugger reliability bugs; this PR fixes them and adds end-to-end coverage. Connect() now iterates discovered pipes and validates each with a getInfo health check instead of blindly using the first one. cdpWithAttach proactively detaches before attaching and retries once on "not attached" errors to work around CDP debugger silently dropping after a few calls. An optional BRIDGE_DEBUG_LOG env var tees logs to a file for MCP-mode debugging, and new client/MCP E2E tests cover the lifecycle and retry paths.
Changes:
- Pipe selection: try every discovered pipe, gated by a
getInfohealth check before returning a client. - CDP stability:
cdpWithAttachdoes detach→attach→execute and retries once when the error matches "not attached". - Observability + tests: optional
BRIDGE_DEBUG_LOGtee, plusinternal/client/e2e_test.goandinternal/mcp/e2e_test.gocovering session lifecycle, retry, and timeout paths.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/client/client.go | Iterate pipes and health-check via getInfo before returning a client. |
| internal/client/browser.go | Add detachTab; rewrite cdpWithAttach to detach+attach+execute with one retry; add isDebuggerError. |
| internal/client/e2e_test.go | New E2E tests for detach→attach→executeCdp ordering, error classification, and WaitForLoad timeout. |
| internal/client/browser_rpc_test.go | Updated call-sequence assertions to expect new detach step. |
| internal/mcp/e2e_test.go | New MCP lifecycle, init/tools-list, debugger-retry, and tab-ID handling tests. |
| internal/mcp/handlers_test.go | Updated recorded-method expectations to include the leading detach. |
| cmd/bridge/main.go | Optional BRIDGE_DEBUG_LOG file tee and extra lifecycle logging in runMCP. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Health check: send a quick getInfo to verify the pipe is usable | ||
| result, err := c.SendRequest("getInfo", nil) | ||
| if err != nil { | ||
| c.Close() | ||
| lastErr = err | ||
| if logger != nil { | ||
| logger.Printf("pipe %s health check failed: %v, trying next...", p.UUID, err) | ||
| } | ||
| continue | ||
| } |
| func (c *Client) cdpWithAttach(tabID int, method string, params map[string]interface{}) (json.RawMessage, error) { | ||
| // Detach first to clear any stale debugger state from Chrome | ||
| _ = c.detachTab(tabID) | ||
| if err := c.attachTab(tabID); err != nil { | ||
| return nil, fmt.Errorf("attach failed for tab %d: %w", tabID, err) | ||
| } | ||
| return c.executeCdp(tabID, method, params) | ||
| raw, err := c.executeCdp(tabID, method, params) | ||
| if err != nil { | ||
| // If attach didn't take, try one more detach+attach+retry cycle | ||
| if isDebuggerError(err) { | ||
| _ = c.detachTab(tabID) | ||
| if err2 := c.attachTab(tabID); err2 != nil { | ||
| return nil, fmt.Errorf("retry attach failed for tab %d: %w", tabID, err2) | ||
| } | ||
| return c.executeCdp(tabID, method, params) | ||
| } | ||
| return nil, err | ||
| } | ||
| return raw, nil |
| func isDebuggerError(err error) bool { | ||
| return err != nil && strings.Contains(err.Error(), "not attached") | ||
| } |
| if debugPath := os.Getenv("BRIDGE_DEBUG_LOG"); debugPath != "" { | ||
| if f, err := os.OpenFile(debugPath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644); err == nil { | ||
| logger = log.New(io.MultiWriter(os.Stderr, f), "[codex-bridge] ", log.LstdFlags) | ||
| defer f.Close() | ||
| } | ||
| } |
| if err.Error() == "" || err.Error()[:7] != "timed o" { | ||
| t.Errorf("error = %v, should mention timeout", err) | ||
| } |
| } | ||
|
|
||
| // TestNavigateRetryOnDebuggerError verifies that when the navigate handler | ||
| // encounters a debugger detach error, it's propagated correctly. |
| // TestCreateTabRejectsNonNumericClose verifies close_tab rejects non-numeric IDs. | ||
| func TestCreateTabRejectsNonNumericClose(t *testing.T) { | ||
| srv, _, cleanup := newServerWithPipe(t, func(req protocol.Request) (interface{}, *protocol.ErrorObject) { | ||
| return map[string]bool{"ok": true}, nil | ||
| }) | ||
| defer cleanup() | ||
|
|
||
| tool := srv.toolMap["codex_close_tab"] | ||
| args, _ := json.Marshal(map[string]interface{}{"tab_id": "abc"}) | ||
| if _, err := tool.Handler(args); err == nil { | ||
| t.Fatal("expected error for non-numeric close_tab, got nil") | ||
| } | ||
| } |
| // Detach first to clear any stale debugger state from Chrome | ||
| _ = c.detachTab(tabID) |
v2.x requires a migrated config format; pinning v1 keeps the existing config working without a full migration.
v2.x is required by golangci-lint-action@v9. Key changes: - linters.default: none replaces disable-all - linters.settings + formatters.settings replace linters-settings - linters.exclusions.rules replaces issues.exclude-rules - gosimple merged into staticcheck - run.timeout removed (no longer a config field in v2)
- errcheck: explicitly ignore Close() errors with _ = c.Close() - revive: add doc comments on all exported types and functions - staticcheck: remove trailing \n from error format strings
- Add space separators between concatenated error string parts - Remove trailing punctuation from error messages - Convert if-else chain to tagged switch in handlers_test.go
Summary
Real-world testing as Claude Code MCP server revealed critical bugs:
fix: stale pipe selection after reboot
Connect()pickedpipes[0]blindly. After reboot, stale pipes remain that connect but returnBrowser session does not belong to this IAB pipeon RPC calls. Now tries all pipes with agetInfohealth check before returning.fix: CDP debugger detaches after ~2 calls
Chrome DevTools Protocol debugger silently detaches after a few CDP calls. Each
cdpWithAttachnow callsdetachbeforeattachto clear stale state, with one auto-retry on "not attached" errors.feat: debug logging via env var
Optional
BRIDGE_DEBUG_LOGenv var tees logs to a file for MCP debugging. No hardcoded paths.test: end-to-end tests
client/e2e_test.go: detach->attach->executeCdp sequence, isDebuggerError classification, WaitForLoad timeoutmcp/e2e_test.go: full session lifecycle (create->navigate->screenshot->evaluate->finalize), MCP init handshake, debugger retry for navigate/screenshotTest plan
go test ./...)go vet ./...clean