Skip to content

Commit 41ecfd3

Browse files
committed
fix(proxy): extract SNI before dial for SNI-deferred connections
Restructure handleConnect so that SNI-deferred connections extract the TLS ClientHello SNI BEFORE dialing through the MITM proxy. Previously, dial was called first with the raw IP as the CONNECT target, causing goproxy to use the IP for the upstream TLS ServerName. The real server's cert has DNS SANs (e.g. *.telegram.org), not IP SANs, so TLS verification failed with "cannot validate certificate for <IP>". New flow for SNI-deferred connections: 1. Send SOCKS5 CONNECT success (so client starts TLS handshake) 2. Peek ClientHello to extract SNI hostname 3. Update context FQDN with recovered hostname 4. Evaluate policy with hostname 5. Dial through MITM with hostname (not IP) Also: refactor relay logic into reusable relayData method, extract sniSaveRule helper to reduce duplication. Also: revised ECH DNS hostname recovery plan from review feedback.
1 parent 391ca9d commit 41ecfd3

2 files changed

Lines changed: 186 additions & 64 deletions

File tree

Lines changed: 104 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,104 @@
1+
# ECH-aware hostname recovery via HTTPS/SVCB DNS records
2+
3+
## Overview
4+
5+
Add HTTPS/SVCB DNS record parsing to the DNS interceptor. When a DNS response contains HTTPS/SVCB records with `ipv4hint`/`ipv6hint` SvcParams, extract the IP addresses and store hostname -> IP mappings in a separate SVCB cache. This cache is always populated but only used as a hostname source when SNI disagrees with it (ECH connections where the real SNI is encrypted and the outer SNI is a dummy).
6+
7+
The hostname recovery priority becomes:
8+
1. FQDN from SOCKS5 CONNECT (if client sends hostname)
9+
2. SNI from TLS ClientHello (happy path for most TLS)
10+
3. SVCB DNS hint (when SNI disagrees with SVCB cache for the same IP, indicating ECH)
11+
4. A/AAAA DNS reverse cache (existing, for non-TLS)
12+
5. Raw IP (last resort)
13+
14+
## Context
15+
16+
- DNS interceptor: `internal/proxy/dns.go` (HandleQuery, forwards queries, gates PopulateFromResponse on A/AAAA types)
17+
- DNS reverse cache: `internal/proxy/dns_reverse.go` (ReverseDNSCache, PopulateFromResponse, parseDNSName)
18+
- SNI policy check: `internal/proxy/server.go` (sniPolicyCheck, handleConnect)
19+
- DNS constants: `internal/proxy/dns_reverse.go` (dnsTypeA, dnsTypeAAAA, dnsHeaderLen)
20+
21+
## Development Approach
22+
23+
- **Testing approach**: Regular (code first, then tests)
24+
- CRITICAL: every task MUST include new/updated tests
25+
- CRITICAL: all tests must pass before starting next task
26+
27+
## Testing Strategy
28+
29+
- Unit tests for SVCB record parsing: `internal/proxy/dns_test.go`
30+
- Unit tests for SVCB cache lookup: `internal/proxy/dns_reverse_test.go` (new file)
31+
- Integration tests for ECH fallback flow: `internal/proxy/server_test.go`
32+
33+
## Solution Overview
34+
35+
1. Add a separate `svcbEntries` map to `ReverseDNSCache` with `StoreSVCB`/`LookupSVCB` methods
36+
2. Extend `PopulateFromResponse` to parse HTTPS (type 65) and SVCB (type 64) records, using `parseDNSName` for the target name field (DNS name compression applies per RFC 9460)
37+
3. Modify `HandleQuery` in `dns.go` to call `PopulateFromResponse` for HTTPS/SVCB query types (currently gated to A/AAAA only)
38+
4. In `sniPolicyCheck`: after extracting SNI, compare it against the SVCB-cached hostname for the destination IP. If they differ, prefer the SVCB hostname (the SNI is likely a dummy ECH outer SNI)
39+
40+
**ECH detection approach**: no explicit ECH detection. Simply compare extracted SNI against SVCB-cached hostname for the same IP. If the SVCB cache says `149.154.167.220 -> example.com` but SNI says `cloudflare-ech.com`, prefer `example.com`. This works because SVCB hints are authoritative for the queried domain. No `HasECH` method needed.
41+
42+
## Progress Tracking
43+
44+
- Mark completed items with `[x]` immediately when done
45+
- Add newly discovered tasks with + prefix
46+
47+
## Implementation Steps
48+
49+
### Task 1: Add SVCB cache and parse HTTPS/SVCB records
50+
51+
**Files:**
52+
- Modify: `internal/proxy/dns_reverse.go`
53+
- Modify: `internal/proxy/dns.go`
54+
- Create: `internal/proxy/dns_reverse_test.go`
55+
56+
- [ ] Add constants: `dnsTypeHTTPS = 65`, `dnsTypeSVCB = 64`, `svcKeyIPv4Hint = 4`, `svcKeyIPv6Hint = 6`
57+
- [ ] Add `svcbEntries` map to `ReverseDNSCache` (separate from existing `entries`) with same TTL/bounds
58+
- [ ] Add `StoreSVCB(ip, hostname)` and `LookupSVCB(ip) string` methods
59+
- [ ] In `PopulateFromResponse`, add cases for HTTPS and SVCB record types
60+
- [ ] Parse SVCB RDATA: priority (2 bytes), target name (use `parseDNSName` for DNS compression), then SvcParam key-value pairs
61+
- [ ] Extract `ipv4hint` (key 4, addresses are 4 bytes each) and `ipv6hint` (key 6, 16 bytes each)
62+
- [ ] Store each hint IP -> hostname in `svcbEntries`
63+
- [ ] In `HandleQuery` (`dns.go`): extend the `PopulateFromResponse` gate to also include HTTPS and SVCB query types
64+
- [ ] Write tests: parse HTTPS record with ipv4hint and ipv6hint
65+
- [ ] Write tests: parse SVCB record, verify StoreSVCB/LookupSVCB
66+
- [ ] Write tests: malformed SVCB RDATA does not crash
67+
- [ ] Write tests: HTTPS record with no hints is a no-op
68+
- [ ] Run tests - must pass before next task
69+
70+
### Task 2: Use SVCB cache as ECH fallback in SNI policy check
71+
72+
**Files:**
73+
- Modify: `internal/proxy/server.go`
74+
- Test: `internal/proxy/server_test.go`
75+
76+
- [ ] In `sniPolicyCheck`: after extracting SNI, look up the destination IP in SVCB cache via `dnsInterceptor.LookupSVCB(ipStr)`
77+
- [ ] If SVCB hostname exists AND differs from extracted SNI, prefer the SVCB hostname (ECH detected)
78+
- [ ] Log `[SNI-ECH] <ip>:<port> SNI=<outer> SVCB=<real> (using SVCB hostname)` for observability
79+
- [ ] Add comment block in `sniPolicyCheck` documenting the hostname recovery priority chain
80+
- [ ] Write test: SNI differs from SVCB cache -> SVCB hostname used
81+
- [ ] Write test: SNI matches SVCB cache -> SNI used (normal case)
82+
- [ ] Write test: no SVCB cache entry -> SNI used as-is
83+
- [ ] Write test: empty SNI with SVCB cache hit -> SVCB hostname used
84+
- [ ] Run tests - must pass before next task
85+
86+
### Task 3: Verify acceptance criteria
87+
88+
- [ ] Verify HTTPS/SVCB records populate the SVCB cache
89+
- [ ] Verify ECH connections recover hostname via SVCB cache
90+
- [ ] Verify non-ECH connections still use SNI
91+
- [ ] Run full test suite: `go test ./... -v -timeout 30s`
92+
- [ ] Run linter: `golangci-lint run ./...`
93+
94+
### Task 4: [Final] Update documentation
95+
96+
- [ ] Add one-line note to CLAUDE.md DNS section about HTTPS/SVCB parsing for ECH fallback
97+
- [ ] Move this plan to `docs/plans/completed/`
98+
99+
## Post-Completion
100+
101+
**Manual verification:**
102+
- Deploy to knuth and verify SVCB record parsing with a Cloudflare-protected domain
103+
- Test with an ECH-enabled server (e.g., `crypto.cloudflare.com`)
104+
- Verify sluice logs show `[SNI-ECH]` entries for ECH connections

internal/proxy/server.go

Lines changed: 82 additions & 64 deletions
Original file line numberDiff line numberDiff line change
@@ -1082,6 +1082,41 @@ func dialThroughInjector(injectorAddr, host string, port int, authToken, pinID s
10821082
// the TLS ClientHello SNI, re-evaluates policy with the recovered hostname,
10831083
// and blocks on the approval flow if needed before relaying data.
10841084
func (s *Server) handleConnect(ctx context.Context, writer io.Writer, request *socks5.Request) error {
1085+
// SNI-deferred connections: extract SNI BEFORE dialing so the MITM proxy
1086+
// uses the recovered hostname (not the raw IP) for the upstream TLS
1087+
// ServerName. Without this, the upstream TLS handshake fails because the
1088+
// real server's cert has DNS SANs (e.g. *.telegram.org) but not IP SANs.
1089+
//
1090+
// Hostname recovery priority:
1091+
// 1. FQDN from SOCKS5 CONNECT (if client sends hostname)
1092+
// 2. SNI from TLS ClientHello (this code path)
1093+
// 3. DNS reverse cache (fallback for non-TLS)
1094+
// 4. Raw IP (last resort)
1095+
clientReader := request.Reader
1096+
if deferred, _ := ctx.Value(ctxKeySNIDeferred).(bool); deferred {
1097+
// Send SOCKS5 CONNECT success early so the client starts the TLS
1098+
// handshake, allowing us to peek the ClientHello for SNI.
1099+
if sendErr := socks5.SendReply(writer, statute.RepSuccess, nil); sendErr != nil {
1100+
return fmt.Errorf("failed to send reply: %w", sendErr)
1101+
}
1102+
1103+
var allow bool
1104+
clientReader, ctx, allow = s.sniPolicyCheckBeforeDial(ctx, request)
1105+
if !allow {
1106+
return nil
1107+
}
1108+
1109+
// Dial with the updated context (FQDN now contains the SNI hostname).
1110+
target, err := s.dial(ctx, "tcp", request.DestAddr.String())
1111+
if err != nil {
1112+
return fmt.Errorf("connect to %v failed: %w", request.RawDestAddr, err)
1113+
}
1114+
defer target.Close() //nolint:errcheck
1115+
1116+
return s.relayData(clientReader, writer, target)
1117+
}
1118+
1119+
// Normal (non-deferred) path: dial first, then relay.
10851120
target, err := s.dial(ctx, "tcp", request.DestAddr.String())
10861121
if err != nil {
10871122
msg := err.Error()
@@ -1102,15 +1137,11 @@ func (s *Server) handleConnect(ctx context.Context, writer io.Writer, request *s
11021137
return fmt.Errorf("failed to send reply: %w", sendErr)
11031138
}
11041139

1105-
// SNI-deferred policy check: peek client bytes for TLS ClientHello.
1106-
clientReader := request.Reader
1107-
if deferred, _ := ctx.Value(ctxKeySNIDeferred).(bool); deferred {
1108-
clientReader = s.sniPolicyCheck(ctx, request, target)
1109-
if clientReader == nil {
1110-
return nil // connection closed by policy check
1111-
}
1112-
}
1140+
return s.relayData(clientReader, writer, target)
1141+
}
11131142

1143+
// relayData bidirectionally copies data between the client and target.
1144+
func (s *Server) relayData(clientReader io.Reader, writer io.Writer, target net.Conn) error {
11141145
errCh := make(chan error, 2)
11151146
go func() {
11161147
_, cpErr := io.Copy(target, clientReader)
@@ -1135,18 +1166,20 @@ func (s *Server) handleConnect(ctx context.Context, writer io.Writer, request *s
11351166
return nil
11361167
}
11371168

1138-
// sniPolicyCheck peeks the first bytes from the client to extract TLS SNI,
1139-
// then re-evaluates policy with the recovered hostname. Returns a reader
1140-
// that replays the peeked bytes followed by the rest of the client stream.
1141-
// Returns nil if the connection should be closed (policy denied or error).
1142-
func (s *Server) sniPolicyCheck(ctx context.Context, request *socks5.Request, target net.Conn) io.Reader {
1169+
// sniPolicyCheckBeforeDial peeks the first bytes from the client to extract
1170+
// TLS SNI, re-evaluates policy with the recovered hostname, and updates the
1171+
// context FQDN so that the subsequent dial uses the hostname (not the raw IP)
1172+
// for the MITM upstream connection. Called BEFORE dial for SNI-deferred
1173+
// connections. Returns the client reader (with peeked bytes prepended), the
1174+
// updated context, and whether the connection should proceed.
1175+
func (s *Server) sniPolicyCheckBeforeDial(ctx context.Context, request *socks5.Request) (io.Reader, context.Context, bool) {
11431176
buf, sni, err := peekSNI(request.Reader, 4096)
11441177
if err != nil || sni == "" {
1145-
// Not TLS or no SNI. Fall through with original data.
1178+
// Not TLS or no SNI. Fall through with original data and IP-based context.
11461179
if len(buf) > 0 {
1147-
return io.MultiReader(bytes.NewReader(buf), request.Reader)
1180+
return io.MultiReader(bytes.NewReader(buf), request.Reader), ctx, true
11481181
}
1149-
return request.Reader
1182+
return request.Reader, ctx, true
11501183
}
11511184

11521185
sni = strings.TrimRight(sni, ".")
@@ -1156,6 +1189,9 @@ func (s *Server) sniPolicyCheck(ctx context.Context, request *socks5.Request, ta
11561189

11571190
log.Printf("[SNI] %s -> %s:%d (recovered hostname via TLS ClientHello)", ipStr, sni, port)
11581191

1192+
// Update context FQDN so dial() uses the hostname for the MITM upstream.
1193+
ctx = context.WithValue(ctx, ctxKeyFQDN, sni)
1194+
11591195
// Populate DNS reverse cache for future connections.
11601196
if s.dnsInterceptor != nil {
11611197
s.dnsInterceptor.StoreReverse(ipStr, sni)
@@ -1167,83 +1203,65 @@ func (s *Server) sniPolicyCheck(ctx context.Context, request *socks5.Request, ta
11671203
eng = s.rules.engine.Load()
11681204
}
11691205
verdict := eng.Evaluate(sni, port)
1206+
reader := io.MultiReader(bytes.NewReader(buf), request.Reader)
11701207

11711208
switch verdict {
11721209
case policy.Allow:
11731210
log.Printf("[SNI->ALLOW] %s:%d (hostname %s matched allow rule)", ipStr, port, sni)
1174-
return io.MultiReader(bytes.NewReader(buf), request.Reader)
1211+
return reader, ctx, true
11751212
case policy.Deny:
11761213
log.Printf("[SNI->DENY] %s:%d (hostname %s matched deny rule)", ipStr, port, sni)
1177-
_ = target.Close()
1178-
return nil
1214+
return nil, ctx, false
11791215
case policy.Ask:
11801216
if s.rules.broker == nil {
11811217
log.Printf("[SNI->DENY] %s:%d (hostname %s: ask treated as deny, no broker)", ipStr, port, sni)
1182-
_ = target.Close()
1183-
return nil
1218+
return nil, ctx, false
11841219
}
11851220
log.Printf("[SNI->ASK] %s:%d (hostname %s: waiting for approval)", ipStr, port, sni)
11861221
timeout := time.Duration(eng.TimeoutSec) * time.Second
11871222
proto := DetectProtocol(port)
11881223
resp, reqErr := s.rules.broker.Request(sni, port, proto.String(), timeout)
11891224
if reqErr != nil {
11901225
log.Printf("[SNI->DENY] %s:%d (hostname %s: approval timeout: %v)", ipStr, port, sni, reqErr)
1191-
_ = target.Close()
1192-
return nil
1226+
return nil, ctx, false
11931227
}
11941228
switch resp {
11951229
case channel.ResponseAllowOnce:
11961230
log.Printf("[SNI->ALLOW] %s:%d (hostname %s: user approved once)", ipStr, port, sni)
1197-
return io.MultiReader(bytes.NewReader(buf), request.Reader)
1231+
return reader, ctx, true
11981232
case channel.ResponseAlwaysAllow:
11991233
log.Printf("[SNI->ALLOW+SAVE] %s:%d (hostname %s: user approved always)", ipStr, port, sni)
1200-
s.rules.reloadMu.Lock()
1201-
func() {
1202-
defer s.rules.reloadMu.Unlock()
1203-
if s.rules.store != nil {
1204-
if _, storeErr := s.rules.store.AddRule("allow", store.RuleOpts{Destination: sni, Ports: []int{port}, Source: "approval"}); storeErr != nil {
1205-
log.Printf("[WARN] failed to persist allow rule for %s:%d: %v", sni, port, storeErr)
1206-
}
1207-
if newEng, recompErr := policy.LoadFromStore(s.rules.store); recompErr != nil {
1208-
log.Printf("[WARN] failed to recompile engine after SNI always-allow: %v", recompErr)
1209-
} else if valErr := newEng.Validate(); valErr != nil {
1210-
log.Printf("[WARN] engine validation failed after SNI always-allow: %v", valErr)
1211-
} else {
1212-
s.rules.engine.Store(newEng)
1213-
}
1214-
}
1215-
}()
1216-
return io.MultiReader(bytes.NewReader(buf), request.Reader)
1234+
s.sniSaveRule("allow", sni, port)
1235+
return reader, ctx, true
12171236
case channel.ResponseAlwaysDeny:
12181237
log.Printf("[SNI->DENY+SAVE] %s:%d (hostname %s: user denied always)", ipStr, port, sni)
1219-
s.rules.reloadMu.Lock()
1220-
func() {
1221-
defer s.rules.reloadMu.Unlock()
1222-
if s.rules.store != nil {
1223-
if _, storeErr := s.rules.store.AddRule("deny", store.RuleOpts{Destination: sni, Ports: []int{port}, Source: "approval"}); storeErr != nil {
1224-
log.Printf("[WARN] failed to persist deny rule for %s:%d: %v", sni, port, storeErr)
1225-
}
1226-
if newEng, recompErr := policy.LoadFromStore(s.rules.store); recompErr != nil {
1227-
log.Printf("[WARN] failed to recompile engine after SNI always-deny: %v", recompErr)
1228-
} else if valErr := newEng.Validate(); valErr != nil {
1229-
log.Printf("[WARN] engine validation failed after SNI always-deny: %v", valErr)
1230-
} else {
1231-
s.rules.engine.Store(newEng)
1232-
}
1233-
}
1234-
}()
1235-
_ = target.Close()
1236-
return nil
1238+
s.sniSaveRule("deny", sni, port)
1239+
return nil, ctx, false
12371240
default:
12381241
log.Printf("[SNI->DENY] %s:%d (hostname %s: user denied)", ipStr, port, sni)
1239-
_ = target.Close()
1240-
return nil
1242+
return nil, ctx, false
12411243
}
12421244
default:
1243-
// Default verdict (typically deny). Use it.
12441245
log.Printf("[SNI->DENY] %s:%d (hostname %s: default deny)", ipStr, port, sni)
1245-
_ = target.Close()
1246-
return nil
1246+
return nil, ctx, false
1247+
}
1248+
}
1249+
1250+
// sniSaveRule persists an allow or deny rule from an SNI-based approval.
1251+
func (s *Server) sniSaveRule(verdict, dest string, port int) {
1252+
s.rules.reloadMu.Lock()
1253+
defer s.rules.reloadMu.Unlock()
1254+
if s.rules.store != nil {
1255+
if _, storeErr := s.rules.store.AddRule(verdict, store.RuleOpts{Destination: dest, Ports: []int{port}, Source: "approval"}); storeErr != nil {
1256+
log.Printf("[WARN] failed to persist %s rule for %s:%d: %v", verdict, dest, port, storeErr)
1257+
}
1258+
if newEng, recompErr := policy.LoadFromStore(s.rules.store); recompErr != nil {
1259+
log.Printf("[WARN] failed to recompile engine after SNI %s: %v", verdict, recompErr)
1260+
} else if valErr := newEng.Validate(); valErr != nil {
1261+
log.Printf("[WARN] engine validation failed after SNI %s: %v", verdict, valErr)
1262+
} else {
1263+
s.rules.engine.Store(newEng)
1264+
}
12471265
}
12481266
}
12491267

0 commit comments

Comments
 (0)