Skip to content

Commit b643af1

Browse files
authored
fix(proxy): extract SNI before dial for SNI-deferred connections (#14)
Restructure handleConnect so that SNI-deferred connections extract the TLS ClientHello SNI BEFORE dialing through the MITM proxy. Previously, dial was called first with the raw IP as the CONNECT target, causing goproxy to use the IP for the upstream TLS ServerName. The real server's cert has DNS SANs (e.g. *.telegram.org), not IP SANs, so TLS verification failed with "cannot validate certificate for <IP>". New flow for SNI-deferred connections: 1. Send SOCKS5 CONNECT success (so client starts TLS handshake) 2. Peek ClientHello to extract SNI hostname 3. Update context FQDN with recovered hostname 4. Evaluate policy with hostname 5. Dial through MITM with hostname (not IP) Also: refactor relay logic into reusable relayData method, extract sniSaveRule helper to reduce duplication. Also: revised ECH DNS hostname recovery plan from review feedback.
1 parent 391ca9d commit b643af1

4 files changed

Lines changed: 240 additions & 86 deletions

File tree

Lines changed: 104 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,104 @@
1+
# ECH-aware hostname recovery via HTTPS/SVCB DNS records
2+
3+
## Overview
4+
5+
Add HTTPS/SVCB DNS record parsing to the DNS interceptor. When a DNS response contains HTTPS/SVCB records with `ipv4hint`/`ipv6hint` SvcParams, extract the IP addresses and store hostname -> IP mappings in a separate SVCB cache. This cache is always populated but only used as a hostname source when SNI disagrees with it (ECH connections where the real SNI is encrypted and the outer SNI is a dummy).
6+
7+
The hostname recovery priority becomes:
8+
1. FQDN from SOCKS5 CONNECT (if client sends hostname)
9+
2. SNI from TLS ClientHello (happy path for most TLS)
10+
3. SVCB DNS hint (when SNI disagrees with SVCB cache for the same IP, indicating ECH)
11+
4. A/AAAA DNS reverse cache (existing, for non-TLS)
12+
5. Raw IP (last resort)
13+
14+
## Context
15+
16+
- DNS interceptor: `internal/proxy/dns.go` (HandleQuery, forwards queries, gates PopulateFromResponse on A/AAAA types)
17+
- DNS reverse cache: `internal/proxy/dns_reverse.go` (ReverseDNSCache, PopulateFromResponse, parseDNSName)
18+
- SNI policy check: `internal/proxy/server.go` (sniPolicyCheck, handleConnect)
19+
- DNS constants: `internal/proxy/dns_reverse.go` (dnsTypeA, dnsTypeAAAA, dnsHeaderLen)
20+
21+
## Development Approach
22+
23+
- **Testing approach**: Regular (code first, then tests)
24+
- CRITICAL: every task MUST include new/updated tests
25+
- CRITICAL: all tests must pass before starting next task
26+
27+
## Testing Strategy
28+
29+
- Unit tests for SVCB record parsing: `internal/proxy/dns_test.go`
30+
- Unit tests for SVCB cache lookup: `internal/proxy/dns_reverse_test.go` (new file)
31+
- Integration tests for ECH fallback flow: `internal/proxy/server_test.go`
32+
33+
## Solution Overview
34+
35+
1. Add a separate `svcbEntries` map to `ReverseDNSCache` with `StoreSVCB`/`LookupSVCB` methods
36+
2. Extend `PopulateFromResponse` to parse HTTPS (type 65) and SVCB (type 64) records, using `parseDNSName` for the target name field (DNS name compression applies per RFC 9460)
37+
3. Modify `HandleQuery` in `dns.go` to call `PopulateFromResponse` for HTTPS/SVCB query types (currently gated to A/AAAA only)
38+
4. In `sniPolicyCheck`: after extracting SNI, compare it against the SVCB-cached hostname for the destination IP. If they differ, prefer the SVCB hostname (the SNI is likely a dummy ECH outer SNI)
39+
40+
**ECH detection approach**: no explicit ECH detection. Simply compare extracted SNI against SVCB-cached hostname for the same IP. If the SVCB cache says `149.154.167.220 -> example.com` but SNI says `cloudflare-ech.com`, prefer `example.com`. This works because SVCB hints are authoritative for the queried domain. No `HasECH` method needed.
41+
42+
## Progress Tracking
43+
44+
- Mark completed items with `[x]` immediately when done
45+
- Add newly discovered tasks with + prefix
46+
47+
## Implementation Steps
48+
49+
### Task 1: Add SVCB cache and parse HTTPS/SVCB records
50+
51+
**Files:**
52+
- Modify: `internal/proxy/dns_reverse.go`
53+
- Modify: `internal/proxy/dns.go`
54+
- Create: `internal/proxy/dns_reverse_test.go`
55+
56+
- [ ] Add constants: `dnsTypeHTTPS = 65`, `dnsTypeSVCB = 64`, `svcKeyIPv4Hint = 4`, `svcKeyIPv6Hint = 6`
57+
- [ ] Add `svcbEntries` map to `ReverseDNSCache` (separate from existing `entries`) with same TTL/bounds
58+
- [ ] Add `StoreSVCB(ip, hostname)` and `LookupSVCB(ip) string` methods
59+
- [ ] In `PopulateFromResponse`, add cases for HTTPS and SVCB record types
60+
- [ ] Parse SVCB RDATA: priority (2 bytes), target name (use `parseDNSName` for DNS compression), then SvcParam key-value pairs
61+
- [ ] Extract `ipv4hint` (key 4, addresses are 4 bytes each) and `ipv6hint` (key 6, 16 bytes each)
62+
- [ ] Store each hint IP -> hostname in `svcbEntries`
63+
- [ ] In `HandleQuery` (`dns.go`): extend the `PopulateFromResponse` gate to also include HTTPS and SVCB query types
64+
- [ ] Write tests: parse HTTPS record with ipv4hint and ipv6hint
65+
- [ ] Write tests: parse SVCB record, verify StoreSVCB/LookupSVCB
66+
- [ ] Write tests: malformed SVCB RDATA does not crash
67+
- [ ] Write tests: HTTPS record with no hints is a no-op
68+
- [ ] Run tests - must pass before next task
69+
70+
### Task 2: Use SVCB cache as ECH fallback in SNI policy check
71+
72+
**Files:**
73+
- Modify: `internal/proxy/server.go`
74+
- Test: `internal/proxy/server_test.go`
75+
76+
- [ ] In `sniPolicyCheck`: after extracting SNI, look up the destination IP in SVCB cache via `dnsInterceptor.LookupSVCB(ipStr)`
77+
- [ ] If SVCB hostname exists AND differs from extracted SNI, prefer the SVCB hostname (ECH detected)
78+
- [ ] Log `[SNI-ECH] <ip>:<port> SNI=<outer> SVCB=<real> (using SVCB hostname)` for observability
79+
- [ ] Add comment block in `sniPolicyCheck` documenting the hostname recovery priority chain
80+
- [ ] Write test: SNI differs from SVCB cache -> SVCB hostname used
81+
- [ ] Write test: SNI matches SVCB cache -> SNI used (normal case)
82+
- [ ] Write test: no SVCB cache entry -> SNI used as-is
83+
- [ ] Write test: empty SNI with SVCB cache hit -> SVCB hostname used
84+
- [ ] Run tests - must pass before next task
85+
86+
### Task 3: Verify acceptance criteria
87+
88+
- [ ] Verify HTTPS/SVCB records populate the SVCB cache
89+
- [ ] Verify ECH connections recover hostname via SVCB cache
90+
- [ ] Verify non-ECH connections still use SNI
91+
- [ ] Run full test suite: `go test ./... -v -timeout 30s`
92+
- [ ] Run linter: `golangci-lint run ./...`
93+
94+
### Task 4: [Final] Update documentation
95+
96+
- [ ] Add one-line note to CLAUDE.md DNS section about HTTPS/SVCB parsing for ECH fallback
97+
- [ ] Move this plan to `docs/plans/completed/`
98+
99+
## Post-Completion
100+
101+
**Manual verification:**
102+
- Deploy to knuth and verify SVCB record parsing with a Cloudflare-protected domain
103+
- Test with an ECH-enabled server (e.g., `crypto.cloudflare.com`)
104+
- Verify sluice logs show `[SNI-ECH]` entries for ECH connections
File renamed without changes.

internal/proxy/server.go

Lines changed: 109 additions & 72 deletions
Original file line numberDiff line numberDiff line change
@@ -220,16 +220,20 @@ func (r *policyRuleSet) Allow(ctx context.Context, req *socks5.Request) (context
220220
}
221221

222222
dest := req.DestAddr.FQDN
223-
ipOnly := false // true when SOCKS5 CONNECT had no FQDN and DNS cache missed
223+
ipOnly := false // true when SOCKS5 CONNECT had no FQDN
224224
if dest == "" {
225225
if req.DestAddr.IP != nil {
226226
ipStr := req.DestAddr.IP.String()
227-
// Recover hostname from DNS reverse cache. tun2proxy operates
228-
// at the network level and sends IP-only CONNECT requests.
229-
// The DNS interceptor caches IP -> hostname mappings from
230-
// responses, so we can show hostnames in approval messages
231-
// and evaluate hostname-based policy rules.
232-
if r.dnsInterceptor != nil {
227+
port := req.DestAddr.Port
228+
229+
// For TLS ports, always defer to SNI extraction (happy path).
230+
// SNI from the TLS ClientHello is more reliable than the DNS
231+
// reverse cache because it comes directly from the client and
232+
// doesn't expire. DNS cache is used only for non-TLS protocols.
233+
if isTLSPort(port) {
234+
dest = ipStr
235+
ipOnly = true
236+
} else if r.dnsInterceptor != nil {
233237
if hostname := r.dnsInterceptor.ReverseLookup(ipStr); hostname != "" {
234238
dest = hostname
235239
} else {
@@ -1082,6 +1086,51 @@ func dialThroughInjector(injectorAddr, host string, port int, authToken, pinID s
10821086
// the TLS ClientHello SNI, re-evaluates policy with the recovered hostname,
10831087
// and blocks on the approval flow if needed before relaying data.
10841088
func (s *Server) handleConnect(ctx context.Context, writer io.Writer, request *socks5.Request) error {
1089+
// SNI-deferred connections: extract SNI BEFORE dialing so the MITM proxy
1090+
// uses the recovered hostname (not the raw IP) for the upstream TLS
1091+
// ServerName. Without this, the upstream TLS handshake fails because the
1092+
// real server's cert has DNS SANs (e.g. *.telegram.org) but not IP SANs.
1093+
//
1094+
// Hostname recovery priority:
1095+
// 1. FQDN from SOCKS5 CONNECT (if client sends hostname)
1096+
// 2. SNI from TLS ClientHello (this code path)
1097+
// 3. DNS reverse cache (fallback for non-TLS)
1098+
// 4. Raw IP (last resort)
1099+
clientReader := request.Reader
1100+
if deferred, _ := ctx.Value(ctxKeySNIDeferred).(bool); deferred {
1101+
// Send SOCKS5 CONNECT success early so the client starts the TLS
1102+
// handshake, allowing us to peek the ClientHello for SNI.
1103+
// Use the destination address as bind address. The client expects
1104+
// a valid address in the SOCKS5 reply, not 0.0.0.0:0.
1105+
bindAddr := &net.TCPAddr{IP: request.DestAddr.IP, Port: request.DestAddr.Port}
1106+
if sendErr := socks5.SendReply(writer, statute.RepSuccess, bindAddr); sendErr != nil {
1107+
return fmt.Errorf("failed to send reply: %w", sendErr)
1108+
}
1109+
1110+
// Set a read deadline so SNI peeking doesn't block forever if the
1111+
// client is slow to send the TLS ClientHello.
1112+
if conn, ok := writer.(net.Conn); ok {
1113+
conn.SetReadDeadline(time.Now().Add(10 * time.Second)) //nolint:errcheck
1114+
defer conn.SetReadDeadline(time.Time{}) //nolint:errcheck
1115+
}
1116+
1117+
var allow bool
1118+
clientReader, ctx, allow = s.sniPolicyCheckBeforeDial(ctx, request)
1119+
if !allow {
1120+
return nil
1121+
}
1122+
1123+
// Dial with the updated context (FQDN now contains the SNI hostname).
1124+
target, err := s.dial(ctx, "tcp", request.DestAddr.String())
1125+
if err != nil {
1126+
return fmt.Errorf("connect to %v failed: %w", request.RawDestAddr, err)
1127+
}
1128+
defer target.Close() //nolint:errcheck
1129+
1130+
return s.relayData(clientReader, writer, target)
1131+
}
1132+
1133+
// Normal (non-deferred) path: dial first, then relay.
10851134
target, err := s.dial(ctx, "tcp", request.DestAddr.String())
10861135
if err != nil {
10871136
msg := err.Error()
@@ -1102,15 +1151,11 @@ func (s *Server) handleConnect(ctx context.Context, writer io.Writer, request *s
11021151
return fmt.Errorf("failed to send reply: %w", sendErr)
11031152
}
11041153

1105-
// SNI-deferred policy check: peek client bytes for TLS ClientHello.
1106-
clientReader := request.Reader
1107-
if deferred, _ := ctx.Value(ctxKeySNIDeferred).(bool); deferred {
1108-
clientReader = s.sniPolicyCheck(ctx, request, target)
1109-
if clientReader == nil {
1110-
return nil // connection closed by policy check
1111-
}
1112-
}
1154+
return s.relayData(clientReader, writer, target)
1155+
}
11131156

1157+
// relayData bidirectionally copies data between the client and target.
1158+
func (s *Server) relayData(clientReader io.Reader, writer io.Writer, target net.Conn) error {
11141159
errCh := make(chan error, 2)
11151160
go func() {
11161161
_, cpErr := io.Copy(target, clientReader)
@@ -1135,18 +1180,25 @@ func (s *Server) handleConnect(ctx context.Context, writer io.Writer, request *s
11351180
return nil
11361181
}
11371182

1138-
// sniPolicyCheck peeks the first bytes from the client to extract TLS SNI,
1139-
// then re-evaluates policy with the recovered hostname. Returns a reader
1140-
// that replays the peeked bytes followed by the rest of the client stream.
1141-
// Returns nil if the connection should be closed (policy denied or error).
1142-
func (s *Server) sniPolicyCheck(ctx context.Context, request *socks5.Request, target net.Conn) io.Reader {
1143-
buf, sni, err := peekSNI(request.Reader, 4096)
1183+
// sniPolicyCheckBeforeDial peeks the first bytes from the client to extract
1184+
// TLS SNI, re-evaluates policy with the recovered hostname, and updates the
1185+
// context FQDN so that the subsequent dial uses the hostname (not the raw IP)
1186+
// for the MITM upstream connection. Called BEFORE dial for SNI-deferred
1187+
// connections. Returns the client reader (with peeked bytes prepended), the
1188+
// updated context, and whether the connection should proceed.
1189+
func (s *Server) sniPolicyCheckBeforeDial(ctx context.Context, request *socks5.Request) (io.Reader, context.Context, bool) {
1190+
buf, sni, err := peekSNI(request.Reader, 8192)
11441191
if err != nil || sni == "" {
1145-
// Not TLS or no SNI. Fall through with original data.
1192+
// Not TLS or no SNI. Fall through with original data and IP-based context.
1193+
hexPrefix := ""
1194+
if len(buf) >= 6 {
1195+
hexPrefix = fmt.Sprintf(" first6=%02x", buf[:6])
1196+
}
1197+
log.Printf("[SNI-PEEK] no SNI extracted (err=%v, bufLen=%d, sni=%q%s)", err, len(buf), sni, hexPrefix)
11461198
if len(buf) > 0 {
1147-
return io.MultiReader(bytes.NewReader(buf), request.Reader)
1199+
return io.MultiReader(bytes.NewReader(buf), request.Reader), ctx, true
11481200
}
1149-
return request.Reader
1201+
return request.Reader, ctx, true
11501202
}
11511203

11521204
sni = strings.TrimRight(sni, ".")
@@ -1156,6 +1208,9 @@ func (s *Server) sniPolicyCheck(ctx context.Context, request *socks5.Request, ta
11561208

11571209
log.Printf("[SNI] %s -> %s:%d (recovered hostname via TLS ClientHello)", ipStr, sni, port)
11581210

1211+
// Update context FQDN so dial() uses the hostname for the MITM upstream.
1212+
ctx = context.WithValue(ctx, ctxKeyFQDN, sni)
1213+
11591214
// Populate DNS reverse cache for future connections.
11601215
if s.dnsInterceptor != nil {
11611216
s.dnsInterceptor.StoreReverse(ipStr, sni)
@@ -1167,83 +1222,65 @@ func (s *Server) sniPolicyCheck(ctx context.Context, request *socks5.Request, ta
11671222
eng = s.rules.engine.Load()
11681223
}
11691224
verdict := eng.Evaluate(sni, port)
1225+
reader := io.MultiReader(bytes.NewReader(buf), request.Reader)
11701226

11711227
switch verdict {
11721228
case policy.Allow:
11731229
log.Printf("[SNI->ALLOW] %s:%d (hostname %s matched allow rule)", ipStr, port, sni)
1174-
return io.MultiReader(bytes.NewReader(buf), request.Reader)
1230+
return reader, ctx, true
11751231
case policy.Deny:
11761232
log.Printf("[SNI->DENY] %s:%d (hostname %s matched deny rule)", ipStr, port, sni)
1177-
_ = target.Close()
1178-
return nil
1233+
return nil, ctx, false
11791234
case policy.Ask:
11801235
if s.rules.broker == nil {
11811236
log.Printf("[SNI->DENY] %s:%d (hostname %s: ask treated as deny, no broker)", ipStr, port, sni)
1182-
_ = target.Close()
1183-
return nil
1237+
return nil, ctx, false
11841238
}
11851239
log.Printf("[SNI->ASK] %s:%d (hostname %s: waiting for approval)", ipStr, port, sni)
11861240
timeout := time.Duration(eng.TimeoutSec) * time.Second
11871241
proto := DetectProtocol(port)
11881242
resp, reqErr := s.rules.broker.Request(sni, port, proto.String(), timeout)
11891243
if reqErr != nil {
11901244
log.Printf("[SNI->DENY] %s:%d (hostname %s: approval timeout: %v)", ipStr, port, sni, reqErr)
1191-
_ = target.Close()
1192-
return nil
1245+
return nil, ctx, false
11931246
}
11941247
switch resp {
11951248
case channel.ResponseAllowOnce:
11961249
log.Printf("[SNI->ALLOW] %s:%d (hostname %s: user approved once)", ipStr, port, sni)
1197-
return io.MultiReader(bytes.NewReader(buf), request.Reader)
1250+
return reader, ctx, true
11981251
case channel.ResponseAlwaysAllow:
11991252
log.Printf("[SNI->ALLOW+SAVE] %s:%d (hostname %s: user approved always)", ipStr, port, sni)
1200-
s.rules.reloadMu.Lock()
1201-
func() {
1202-
defer s.rules.reloadMu.Unlock()
1203-
if s.rules.store != nil {
1204-
if _, storeErr := s.rules.store.AddRule("allow", store.RuleOpts{Destination: sni, Ports: []int{port}, Source: "approval"}); storeErr != nil {
1205-
log.Printf("[WARN] failed to persist allow rule for %s:%d: %v", sni, port, storeErr)
1206-
}
1207-
if newEng, recompErr := policy.LoadFromStore(s.rules.store); recompErr != nil {
1208-
log.Printf("[WARN] failed to recompile engine after SNI always-allow: %v", recompErr)
1209-
} else if valErr := newEng.Validate(); valErr != nil {
1210-
log.Printf("[WARN] engine validation failed after SNI always-allow: %v", valErr)
1211-
} else {
1212-
s.rules.engine.Store(newEng)
1213-
}
1214-
}
1215-
}()
1216-
return io.MultiReader(bytes.NewReader(buf), request.Reader)
1253+
s.sniSaveRule("allow", sni, port)
1254+
return reader, ctx, true
12171255
case channel.ResponseAlwaysDeny:
12181256
log.Printf("[SNI->DENY+SAVE] %s:%d (hostname %s: user denied always)", ipStr, port, sni)
1219-
s.rules.reloadMu.Lock()
1220-
func() {
1221-
defer s.rules.reloadMu.Unlock()
1222-
if s.rules.store != nil {
1223-
if _, storeErr := s.rules.store.AddRule("deny", store.RuleOpts{Destination: sni, Ports: []int{port}, Source: "approval"}); storeErr != nil {
1224-
log.Printf("[WARN] failed to persist deny rule for %s:%d: %v", sni, port, storeErr)
1225-
}
1226-
if newEng, recompErr := policy.LoadFromStore(s.rules.store); recompErr != nil {
1227-
log.Printf("[WARN] failed to recompile engine after SNI always-deny: %v", recompErr)
1228-
} else if valErr := newEng.Validate(); valErr != nil {
1229-
log.Printf("[WARN] engine validation failed after SNI always-deny: %v", valErr)
1230-
} else {
1231-
s.rules.engine.Store(newEng)
1232-
}
1233-
}
1234-
}()
1235-
_ = target.Close()
1236-
return nil
1257+
s.sniSaveRule("deny", sni, port)
1258+
return nil, ctx, false
12371259
default:
12381260
log.Printf("[SNI->DENY] %s:%d (hostname %s: user denied)", ipStr, port, sni)
1239-
_ = target.Close()
1240-
return nil
1261+
return nil, ctx, false
12411262
}
12421263
default:
1243-
// Default verdict (typically deny). Use it.
12441264
log.Printf("[SNI->DENY] %s:%d (hostname %s: default deny)", ipStr, port, sni)
1245-
_ = target.Close()
1246-
return nil
1265+
return nil, ctx, false
1266+
}
1267+
}
1268+
1269+
// sniSaveRule persists an allow or deny rule from an SNI-based approval.
1270+
func (s *Server) sniSaveRule(verdict, dest string, port int) {
1271+
s.rules.reloadMu.Lock()
1272+
defer s.rules.reloadMu.Unlock()
1273+
if s.rules.store != nil {
1274+
if _, storeErr := s.rules.store.AddRule(verdict, store.RuleOpts{Destination: dest, Ports: []int{port}, Source: "approval"}); storeErr != nil {
1275+
log.Printf("[WARN] failed to persist %s rule for %s:%d: %v", verdict, dest, port, storeErr)
1276+
}
1277+
if newEng, recompErr := policy.LoadFromStore(s.rules.store); recompErr != nil {
1278+
log.Printf("[WARN] failed to recompile engine after SNI %s: %v", verdict, recompErr)
1279+
} else if valErr := newEng.Validate(); valErr != nil {
1280+
log.Printf("[WARN] engine validation failed after SNI %s: %v", verdict, valErr)
1281+
} else {
1282+
s.rules.engine.Store(newEng)
1283+
}
12471284
}
12481285
}
12491286

0 commit comments

Comments
 (0)