|
| 1 | +# Fix upstream TLS for SNI-deferred connections |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +When a SOCKS5 CONNECT arrives with a raw IP and the connection is SNI-deferred, the custom `handleConnect` calls `s.dial()` before `sniPolicyCheck`. The dial function routes through the injector with the raw IP as the CONNECT target. goproxy reconstructs the upstream URL as `https://<IP>:443/...` and connects to the real server using the IP for TLS ServerName verification. The real server's cert only has DNS SANs (e.g. `*.telegram.org`), not IP SANs, so verification fails with `x509: cannot validate certificate for <IP> because it doesn't contain any IP SANs`. |
| 6 | + |
| 7 | +The MITM cert generation is not the problem (goproxy's `sniAwareTLSConfig` correctly reads SNI from the ClientHello for client-side certs). The issue is the upstream connection using the raw IP as TLS ServerName. |
| 8 | + |
| 9 | +Observed in production: OpenClaw's Telegram client falls back to direct IP connections when DNS-resolved IPs are unreachable, causing all Telegram communication to break. |
| 10 | + |
| 11 | +**Acceptance criteria:** |
| 12 | +- SNI-deferred IP connections to HTTPS servers with hostname-only certs complete successfully |
| 13 | +- No x509 IP SANs errors in sluice logs for SNI-deferred connections |
| 14 | +- Credential injection works for IP-only Telegram connections |
| 15 | + |
| 16 | +## Context |
| 17 | + |
| 18 | +- Custom connect handler: `internal/proxy/server.go` (handleConnect, sniPolicyCheck) |
| 19 | +- SNI extraction: `internal/proxy/sni.go` (extractSNI, peekSNI) |
| 20 | +- Injector dial: `internal/proxy/server.go` (dial, dialThroughInjector) |
| 21 | +- FQDN context key: `internal/proxy/server.go` (ctxKeyFQDN) |
| 22 | +- goproxy upstream URL: goproxy reconstructs from CONNECT target host |
| 23 | + |
| 24 | +**Root cause:** In `handleConnect`, `s.dial()` is called at line ~1090 BEFORE `sniPolicyCheck` at line ~1107. By the time SNI is extracted, the injector already has CONNECT to the raw IP. The fix must extract SNI BEFORE dialing. |
| 25 | + |
| 26 | +## Development Approach |
| 27 | + |
| 28 | +- **Testing approach**: Regular (code first, then tests) |
| 29 | +- CRITICAL: every task MUST include new/updated tests |
| 30 | +- CRITICAL: all tests must pass before starting next task |
| 31 | + |
| 32 | +## Testing Strategy |
| 33 | + |
| 34 | +- Unit tests in `internal/proxy/sni_test.go` for SNI extraction (existing) |
| 35 | +- New integration tests in `internal/proxy/server_test.go` for the SNI-deferred MITM pipeline |
| 36 | +- Test infrastructure: SOCKS5 client sending IP-only CONNECT with a TLS ClientHello containing hostname SNI, upstream HTTPS server with hostname-only cert |
| 37 | + |
| 38 | +## Implementation Steps |
| 39 | + |
| 40 | +### Task 1: Restructure handleConnect for SNI-deferred connections |
| 41 | + |
| 42 | +**Files:** |
| 43 | +- Modify: `internal/proxy/server.go` |
| 44 | +- Test: `internal/proxy/server_test.go` |
| 45 | + |
| 46 | +- [ ] For SNI-deferred connections (ctxKeySNIDeferred is true), change the flow: |
| 47 | + 1. Send SOCKS5 CONNECT success to client (already done) |
| 48 | + 2. Peek ClientHello from client reader to extract SNI |
| 49 | + 3. Evaluate policy with recovered hostname (existing sniPolicyCheck logic) |
| 50 | + 4. Update context ctxKeyFQDN with the recovered hostname |
| 51 | + 5. THEN call s.dial() which routes through injector with correct hostname |
| 52 | + 6. Relay data with the peeked bytes prepended to client reader |
| 53 | +- [ ] Move sniPolicyCheck call BEFORE s.dial() in the SNI-deferred branch |
| 54 | +- [ ] Update sniPolicyCheck to return the recovered hostname so handleConnect can update context |
| 55 | +- [ ] For non-deferred connections, keep the existing flow unchanged |
| 56 | +- [ ] Write test: IP-only SOCKS5 CONNECT with hostname SNI routes upstream using hostname as TLS ServerName |
| 57 | +- [ ] Write test: IP-only CONNECT where SNI extraction fails (no TLS / malformed ClientHello) falls back to IP |
| 58 | +- [ ] Write test: policy denial after SNI extraction closes connection |
| 59 | +- [ ] Run tests - must pass before next task |
| 60 | + |
| 61 | +### Task 2: Verify acceptance criteria |
| 62 | + |
| 63 | +- [ ] Run full test suite: `go test ./... -v -timeout 30s` |
| 64 | +- [ ] Verify no x509 IP SANs warnings in test output |
| 65 | +- [ ] Run linter: `golangci-lint run ./...` |
| 66 | + |
| 67 | +### Task 3: [Final] Update documentation |
| 68 | + |
| 69 | +- [ ] Update CLAUDE.md SNI deferral section to note the dial ordering fix |
| 70 | +- [ ] Move this plan to `docs/plans/completed/` |
| 71 | + |
| 72 | +## Post-Completion |
| 73 | + |
| 74 | +**Manual verification:** |
| 75 | +- Deploy to knuth, restart stack |
| 76 | +- Verify OpenClaw's Telegram fallback IP connections work through MITM |
| 77 | +- Verify no `IP SANs` errors in sluice logs |
| 78 | +- Verify credential injection works for IP-only Telegram connections |
| 79 | +- Force fallback by temporarily blocking DNS for api.telegram.org |
0 commit comments