fix(proxy): clear SNI peek deadline before relay, fix socket leaks#15

Merged
nnemirovsky merged 1 commit into main from fix/sni-defer-deadline-leak on Apr 10, 2026


@nnemirovsky
Owner

Summary

  • Fix 10-second read deadline leak in SNI-deferred path that killed all long-running connections (streaming API responses, tool calls). The deadline set for SNI peeking was cleared via defer which only ran when handleConnect returned, but handleConnect blocks in relayData for the connection lifetime.
  • Fix relayData CLOSE_WAIT socket leak (75 leaked sockets observed on production server) by closing writer and setting a read deadline on target when the first relay direction completes.
  • Add IdleConnTimeout (90s) and MaxIdleConnsPerHost (4) to goproxy Transport to prevent stale pooled connections.
  • Suppress expected goproxy broken pipe and handshake EOF warnings via filtered logger.
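
The Transport settings named above can be sketched with the standard library alone. This is a minimal illustration, not the PR's actual code: the helper name `newProxyTransport` is hypothetical, and how the transport is attached to goproxy is not shown in this PR and is left out here.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newProxyTransport is a hypothetical helper. It builds an http.Transport
// with the two pooling limits named in the summary.
func newProxyTransport() *http.Transport {
	return &http.Transport{
		// Close pooled connections that have sat idle for 90 seconds,
		// so dead upstream connections do not linger in the pool.
		IdleConnTimeout: 90 * time.Second,
		// Keep at most 4 idle connections per upstream host.
		MaxIdleConnsPerHost: 4,
	}
}

func main() {
	tr := newProxyTransport()
	fmt.Println(tr.IdleConnTimeout, tr.MaxIdleConnsPerHost)
}
```

With `IdleConnTimeout` left at its zero value, `http.Transport` places no limit on how long an idle connection stays pooled, which matches the stale-connection symptom described in this PR.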

Test plan

  • go test ./... passes
  • golangci-lint run clean
  • Deployed to production server, verified:
    • OpenAI Codex agent runs complete successfully (previously 0/16 succeeded, now working)
    • Zero CLOSE_WAIT sockets (was 75)
    • FD count stable at ~40 (was 312 and growing)
    • Zero Telegram polling stalls (was every 12 minutes)
    • Zero broken pipe warnings in logs
    • Simple and tool-calling messages both work through the bot

The 10-second read deadline set for SNI peeking in the SNI-deferred
path was cleared via defer, which only runs when handleConnect returns.
Since handleConnect blocks in relayData for the connection lifetime,
the deadline persisted and killed every SNI-deferred connection after
10 seconds. This caused streaming API responses to be truncated
(manifesting as OpenAI "terminated" errors), tool call fetches to fail,
and periodic TLS handshake failures on chatgpt.com.

Fix: clear the deadline explicitly after SNI peek completes, before
the relay phase begins.
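
The difference between a leaked and an explicitly cleared deadline can be reproduced in isolation. The sketch below is an assumption-laden illustration, not the proxy's code: `peekWithDeadline` and `relayAfterPeek` are hypothetical names, `net.Pipe` stands in for the real client connection, and the timeout is shortened from 10 seconds to 50 milliseconds so the demo finishes quickly.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// peekWithDeadline reads up to n bytes under a temporary read deadline and
// clears that deadline before returning, so it cannot outlive the peek.
func peekWithDeadline(conn net.Conn, n int, timeout time.Duration) ([]byte, error) {
	if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return nil, err
	}
	buf := make([]byte, n)
	m, readErr := conn.Read(buf)
	// The zero time.Time removes the deadline.
	if err := conn.SetReadDeadline(time.Time{}); err != nil && readErr == nil {
		readErr = err
	}
	return buf[:m], readErr
}

// relayAfterPeek simulates a peek followed by a long-lived read phase.
// With clearDeadline=false the peek deadline is left in place, which mimics
// a deferred clear in a function that has not returned yet.
func relayAfterPeek(clearDeadline bool) error {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	go func() {
		client.Write([]byte("hello")) // consumed by the peek
		time.Sleep(100 * time.Millisecond)
		client.Write([]byte("world")) // arrives after the peek deadline
	}()

	const peekTimeout = 50 * time.Millisecond
	if clearDeadline {
		if _, err := peekWithDeadline(server, 5, peekTimeout); err != nil {
			return err
		}
	} else {
		// Buggy variant: the deadline is set but not cleared before the relay.
		server.SetReadDeadline(time.Now().Add(peekTimeout))
		if _, err := server.Read(make([]byte, 5)); err != nil {
			return err
		}
	}

	// "Relay" phase: a read that outlasts the peek timeout.
	_, err := server.Read(make([]byte, 5))
	return err
}

func main() {
	fmt.Println("deadline cleared:", relayAfterPeek(true))
	fmt.Println("deadline leaked: ", relayAfterPeek(false))
}
```

The first call is expected to return a nil error, while the second should fail with a timeout once the leftover deadline fires during the relay-phase read.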

Also fixes three secondary issues:

- relayData socket leak: when the first relay direction completed,
  the function could block indefinitely waiting for the second
  goroutine if goproxy held the MITM connection open. Close writer
  and set a read deadline on target to force cleanup. Eliminates
  CLOSE_WAIT socket accumulation (75 leaked sockets observed).

- goproxy Transport stale connections: the MITM proxy Transport had
  no IdleConnTimeout, causing dead pooled connections to persist
  indefinitely. Add IdleConnTimeout (90s) and MaxIdleConnsPerHost (4).

- goproxy log noise: suppress expected broken pipe and handshake EOF
  warnings via a filtered logger. These are normal for short-lived
  polling connections (Telegram getUpdates).
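
The relayData cleanup described in the first item above could look roughly like the following. This is a sketch under stated assumptions rather than the merged implementation: the function name `relay`, the `grace` parameter, and the choice to close the client side as the "writer" are all guesses, and `net.Pipe` replaces real sockets.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// relay copies data in both directions. When the first direction finishes,
// it closes the client side and puts a read deadline on the target so the
// second goroutine cannot block forever on an idle peer.
func relay(client, target net.Conn, grace time.Duration) {
	done := make(chan struct{}, 2)
	go func() { io.Copy(target, client); done <- struct{}{} }()
	go func() { io.Copy(client, target); done <- struct{}{} }()

	<-done // one direction has completed
	client.Close()
	target.SetReadDeadline(time.Now().Add(grace))
	<-done // the other direction is now forced to finish
}

// demoRelay reports whether relay returns after the client hangs up while
// the target peer stays connected and silent.
func demoRelay() bool {
	clientApp, clientSide := net.Pipe()
	targetSide, targetApp := net.Pipe()
	defer targetApp.Close() // target peer remains open during the test

	finished := make(chan struct{})
	go func() {
		relay(clientSide, targetSide, 50*time.Millisecond)
		targetSide.Close()
		close(finished)
	}()

	clientApp.Close() // client disconnects first

	select {
	case <-finished:
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("relay returned:", demoRelay())
}
```

Without the `Close` and `SetReadDeadline` calls, the second `<-done` would wait for as long as the target peer keeps its connection open, which is the blocking behavior this PR attributes to the CLOSE_WAIT accumulation.
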
@nnemirovsky nnemirovsky merged commit 3671534 into main Apr 10, 2026
6 checks passed
@nnemirovsky nnemirovsky deleted the fix/sni-defer-deadline-leak branch April 10, 2026 04:08