Skip to content

Commit 89f06b5

Browse files
authored
feat(registries): harden registry fetches against SSRF (CWE-918, MCP-1076) (#745)
* feat(registries): harden registry fetches against SSRF (CWE-918, MCP-1076) Registry server-list fetches use a user-supplied URL (`registry add-source`), which CodeQL flags as go/request-forgery (#97/#98): a malicious or typo'd registry source could point the daemon at internal/metadata endpoints (169.254.169.254, RFC1918 ranges). Add an SSRF guard centered on a single `isBlockedIP` predicate (loopback, RFC1918, RFC6598 CGNAT, link-local incl. the cloud-metadata endpoint, IPv6 unique-local, unspecified, multicast; fails closed), applied at three layers: - dial time: a net.Dialer Control hook on the shared registry HTTP client rejects the actual resolved IP before connect — covers hostnames that resolve into blocked ranges and closes the DNS-rebinding TOCTOU window a parse-time check alone cannot. - fetch pre-flight: validateRegistryURL rejects a literal-IP host (defense in depth + an explicit barrier at the http.NewRequest sink). - add-source/edit-source: literal-IP fail-fast (no DNS, stays pure/offline) so a bad source is refused up front with the stable invalid_registry_url code. Opt out with a new top-level `allow_private_registry_fetch` config flag (default false) for operators who run a trusted registry mirror on an internal/private address. TDD: table tests for every blocked/allowed range (incl. CGNAT and RFC1918 boundaries), an end-to-end loopback dial block, add-source literal-IP rejection, and the two registry-add E2E daemon tests opt in via the new flag. * test(server): opt loopback registry fetch tests past the SSRF guard The in-process server tests that fetch a registry served on a loopback httptest server (consistency_official, mcp_add_from_registry's startTestRegistry helper) now hit the new SSRF dial guard, because the registries-package test bypass does not apply to the server package's test binary. Set AllowPrivateRegistryFetch on their SetRegistriesFromConfig configs — exactly how an operator opts in for a trusted internal mirror — so the fetch is permitted while the production guard stays active for the add-source rejection test. Fixes the linux/arm64 Build Binaries leg (TestCrossSurfaceConsistency_*, TestHandleUpstreamServers_AddFromRegistry_*). * fix(registries): close SSRF proxy-bypass with app-layer host resolution CodexReviewer (round 2) found a real gap: the registry client clones http.DefaultTransport, which keeps Proxy: ProxyFromEnvironment. With HTTP_PROXY/HTTPS_PROXY set, the dialer connects to the PROXY as the first hop, so registryDialControl only ever validates the proxy's IP — never the real target. A hostname resolving to loopback/private/link-local (incl. 169.254.169.254) could still be fetched through a proxy. Add an application-layer guard (guardRegistryTargetHost) that resolves the target host and rejects the fetch if ANY resolved IP is in a blocked range. It runs before the request in registryGet, independent of the transport/proxy, so it holds in the proxy case; the dial-time Control stays as defense-in-depth for the direct path and DNS-rebind TOCTOU. Fail-open on resolver error (a flaky resolver must not break every fetch); literal IPs already covered pre-flight; relaxed by the allow_private_registry_fetch opt-in. Regression coverage: a public-looking hostname resolving to 169.254.169.254 is refused with HTTP(S)_PROXY set; plus direct predicate tests (mixed public+private resolution blocked, all-public allowed, resolver-error fail-open, public literal allowed). Resolver is injectable via a package var for deterministic, network-free tests.
1 parent 85f60a9 commit 89f06b5

15 files changed

Lines changed: 548 additions & 2 deletions

docs/configuration.md

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -931,6 +931,19 @@ the `registries` array only to **add your own** custom source:
931931
| `protocol` | string | Registry protocol type |
932932
| `count` | number/string | Number of servers in registry (auto-populated) |
933933

934+
**SSRF guard (`allow_private_registry_fetch`).** Because the daemon fetches the
935+
URL you configure, registry fetches refuse any host that is — or resolves to — a
936+
non-routable address (loopback, RFC1918/CGNAT private, link-local including the
937+
`169.254.169.254` cloud-metadata endpoint). This bounds CWE-918 request forgery
938+
against internal services. Set this top-level flag to `true` **only** if you
939+
intentionally run a trusted registry mirror on an internal/private address:
940+
941+
```json
942+
{ "allow_private_registry_fetch": true }
943+
```
944+
945+
Default `false` (secure). See [Registries Documentation](registries.md#adding-your-own-registry-source).
946+
934947
**Default Registries** (shipped built-in, no configuration required):
935948
- `official` — Official MCP Registry (`modelcontextprotocol/registry`): primary, zero-config aggregator
936949
- `reference` — Reference Servers (`builtin/reference`): curated `@modelcontextprotocol` servers, shipped in-binary so the basics work offline

docs/registries.md

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -74,6 +74,26 @@ The ID is derived from the host when omitted; `--protocol` defaults to
7474
This requires a running daemon — the registry list is updated copy-on-write on the
7575
runtime config snapshot and persisted to `mcp_config.json`.
7676

77+
**SSRF guard (CWE-918).** Because the daemon fetches a URL you supply, registry
78+
fetches are constrained so a malicious or typo'd source cannot turn the daemon
79+
into a request-forgery vector against internal services:
80+
81+
- The URL must be `http`/`https` (add-source/edit require `https`).
82+
- A source whose host is a **literal IP in a non-routable range** — loopback
83+
(`127.0.0.0/8`, `::1`), RFC1918 private (`10/8`, `172.16/12`, `192.168/16`),
84+
IPv6 unique-local (`fc00::/7`), RFC6598 CGNAT (`100.64/10`), or link-local
85+
including the cloud metadata endpoint `169.254.169.254` — is **rejected up
86+
front** with `invalid_registry_url`.
87+
- Hostname sources are allowed at add time and re-checked before every fetch by
88+
an **application-layer resolution check** (resolve the target host, reject if
89+
any resolved IP is non-routable) *and* at **dial time** (validate the actual
90+
IP before connect). The dial check alone is bypassable when an
91+
`HTTP_PROXY`/`HTTPS_PROXY` is configured — the transport then dials the proxy,
92+
not the target — so the application-layer check is what holds in the proxy
93+
case; the dial check remains defense-in-depth for the direct path and DNS
94+
rebinding. The official protocol's cursor-follow pagination is also pinned to
95+
the configured host so a hostile `nextCursor` cannot redirect the request.
96+
7797
Equivalent surfaces:
7898

7999
- **REST:** `POST /api/v1/registries` with `{ "url": "https://…", "protocol": "…", "id": "…", "name": "…" }`.

internal/config/config.go

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -177,6 +177,15 @@ type Config struct {
177177
// beyond the add-source rejection.
178178
RegistriesLocked bool `json:"registries_locked,omitempty" mapstructure:"registries-locked"`
179179

180+
// AllowPrivateRegistryFetch opts out of the registry SSRF guard (MCP-1076,
181+
// CWE-918). By default (false) registry fetches refuse any host that is — or
182+
// resolves to — a non-routable address (loopback, RFC1918/CGNAT private,
183+
// link-local incl. the 169.254.169.254 cloud-metadata endpoint), so a
184+
// malicious or typo'd registry source cannot turn the daemon into a
185+
// request-forgery vector against internal services. Set true ONLY when you
186+
// intentionally run a trusted registry mirror on an internal/private address.
187+
AllowPrivateRegistryFetch bool `json:"allow_private_registry_fetch,omitempty" mapstructure:"allow-private-registry-fetch"`
188+
180189
// Deprecated: Features flags are unused and have no runtime effect. Kept for backward compatibility.
181190
Features *FeatureFlags `json:"features,omitempty" mapstructure:"features"`
182191

internal/registries/httpclient.go

Lines changed: 27 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@ import (
44
"context"
55
"fmt"
66
"io"
7+
"net"
78
"net/http"
89
"net/url"
910
"strings"
@@ -50,12 +51,21 @@ func sharedRegistryClient() *http.Client {
5051
return registryHTTPClient
5152
}
5253

53-
// buildRegistryClient constructs the tuned registry HTTP client.
54+
// buildRegistryClient constructs the tuned registry HTTP client. Its dialer
55+
// carries the SSRF Control hook (registryDialControl), so every connection is
56+
// checked against the blocked-range policy at the moment it dials the resolved
57+
// IP — the authoritative CWE-918 guard that also closes the DNS-rebinding window
58+
// a parse-time host check alone cannot.
5459
func buildRegistryClient() *http.Client {
5560
transport := http.DefaultTransport.(*http.Transport).Clone()
5661
transport.MaxIdleConns = 20
5762
transport.MaxIdleConnsPerHost = 10
5863
transport.IdleConnTimeout = 90 * time.Second
64+
transport.DialContext = (&net.Dialer{
65+
Timeout: 30 * time.Second,
66+
KeepAlive: 30 * time.Second,
67+
Control: registryDialControl,
68+
}).DialContext
5969
return &http.Client{
6070
Timeout: registryRequestTimeout,
6171
Transport: transport,
@@ -82,6 +92,14 @@ func registryGet(ctx context.Context, reg *RegistryEntry, reqURL string) ([]byte
8292
return nil, err
8393
}
8494

95+
// Application-layer SSRF guard: resolve the target host and reject if it
96+
// lands in a blocked range. This holds even with an HTTP(S)_PROXY set, where
97+
// the dialer connects to the proxy and the dial-time Control never sees the
98+
// real target host (CodeQL go/request-forgery — proxy bypass).
99+
if err := guardRegistryTargetHost(ctx, safeURL); err != nil {
100+
return nil, err
101+
}
102+
85103
client := sharedRegistryClient()
86104

87105
var lastErr error
@@ -180,6 +198,14 @@ func validateRegistryURL(reqURL string, reg *RegistryEntry) (string, error) {
180198
if u.Host == "" {
181199
return "", fmt.Errorf("registry request URL has no host")
182200
}
201+
// SSRF pre-flight (CWE-918): reject a literal-IP host in a blocked
202+
// (loopback/private/link-local/metadata) range before the request is built.
203+
// This is defense-in-depth alongside the authoritative dial-time guard
204+
// (registryDialControl), which also covers hostnames resolving into those
205+
// ranges. Relaxed by the allow_private_registry_fetch config opt-in.
206+
if err := hostLiteralBlocked(u.Host, registryAllowPrivateFetch.Load()); err != nil {
207+
return "", err
208+
}
183209
// Pin to the configured registry host so a tainted cursor/path cannot
184210
// redirect the request elsewhere.
185211
if reg != nil && reg.ServersURL != "" {

internal/registries/registry_data.go

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -59,6 +59,11 @@ func SetRegistriesFromConfig(cfg *config.Config) {
5959
}
6060

6161
registryList = merged
62+
63+
// Propagate the SSRF allow-policy (MCP-1076): off by default, opt-in via the
64+
// user's allow_private_registry_fetch flag. Done here so every config load /
65+
// hot-reload keeps the dial-time guard in sync with current config.
66+
SetAllowPrivateRegistryFetch(cfg != nil && cfg.AllowPrivateRegistryFetch)
6267
}
6368

6469
// IsTrusted reports whether this is an official, shipped-by-default registry.

internal/registries/ssrf.go

Lines changed: 186 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,186 @@
1+
package registries
2+
3+
import (
4+
"context"
5+
"errors"
6+
"fmt"
7+
"net"
8+
"net/url"
9+
"strings"
10+
"sync/atomic"
11+
"syscall"
12+
)
13+
14+
// registryResolveHost resolves a hostname to its IPs. A package var so tests can
15+
// simulate a hostname resolving into a blocked range without real DNS.
16+
var registryResolveHost = func(ctx context.Context, host string) ([]net.IP, error) {
17+
addrs, err := net.DefaultResolver.LookupIPAddr(ctx, host)
18+
if err != nil {
19+
return nil, err
20+
}
21+
ips := make([]net.IP, len(addrs))
22+
for i := range addrs {
23+
ips[i] = addrs[i].IP
24+
}
25+
return ips, nil
26+
}
27+
28+
// ErrBlockedRegistryHost is returned when a registry URL targets — or resolves
29+
// to — a non-routable / internal address. It bounds CWE-918 (request forgery):
30+
// a malicious or typo'd registry source must never let the daemon fetch from
31+
// loopback, RFC1918/CGNAT private space, link-local (incl. the
32+
// 169.254.169.254 cloud-metadata endpoint), or other internal ranges.
33+
var ErrBlockedRegistryHost = errors.New("registry host is not allowed (internal/non-routable address)")
34+
35+
// registryAllowPrivateFetch relaxes the SSRF guard so a fetch may reach
36+
// loopback/private targets. It is OFF by default (secure) and set from the
37+
// user's `allow_private_registry_fetch` config flag by SetRegistriesFromConfig
38+
// — the opt-in allow-policy for operators who run a trusted registry mirror on
39+
// an internal/private address. atomic.Bool because it is written on config
40+
// (re)load while concurrent fetches read it on the dial path. The registries
41+
// test binary also flips it on (httptest servers bind 127.0.0.1).
42+
var registryAllowPrivateFetch atomic.Bool
43+
44+
// testForceAllowPrivate pins the guard open regardless of config. It is set ONLY
45+
// by the registries test binary (httptest servers bind loopback) so a fetch
46+
// still works across a SetRegistriesFromConfig(defaultConfig) call that would
47+
// otherwise reset the flag to false. Always false in production builds.
48+
var testForceAllowPrivate atomic.Bool
49+
50+
// SetAllowPrivateRegistryFetch sets the SSRF allow-policy from config. Exposed so
51+
// SetRegistriesFromConfig (and only it) can propagate the user's flag. The
52+
// test-force override keeps loopback fetches working in the test binary.
53+
func SetAllowPrivateRegistryFetch(allow bool) {
54+
registryAllowPrivateFetch.Store(allow || testForceAllowPrivate.Load())
55+
}
56+
57+
// isBlockedIP reports whether ip falls in a range a registry fetch must never
58+
// reach. This is the single predicate behind both the pre-flight URL check and
59+
// the dial-time Control guard, so the policy lives in exactly one place.
60+
//
61+
// Blocked: loopback, RFC1918 private (10/8, 172.16/12, 192.168/16), IPv6
62+
// unique-local (fc00::/7, via IsPrivate), RFC6598 CGNAT (100.64/10), link-local
63+
// unicast (169.254/16, fe80::/10 — covers the cloud metadata endpoint),
64+
// link-local & interface-local multicast, any other multicast, and the
65+
// unspecified address. A nil/unparseable IP fails closed (blocked).
66+
func isBlockedIP(ip net.IP) bool {
67+
if ip == nil {
68+
return true // fail closed: an address we can't reason about is not safe
69+
}
70+
// RFC6598 carrier-grade NAT (100.64.0.0/10) is not covered by IsPrivate.
71+
if ip4 := ip.To4(); ip4 != nil {
72+
if ip4[0] == 100 && ip4[1] >= 64 && ip4[1] <= 127 {
73+
return true
74+
}
75+
}
76+
return ip.IsLoopback() ||
77+
ip.IsPrivate() ||
78+
ip.IsLinkLocalUnicast() ||
79+
ip.IsLinkLocalMulticast() ||
80+
ip.IsInterfaceLocalMulticast() ||
81+
ip.IsMulticast() ||
82+
ip.IsUnspecified()
83+
}
84+
85+
// hostLiteralBlocked returns a non-nil error if host is a LITERAL IP in a blocked
86+
// range. host may be a bare host, host:port, or a bracketed IPv6 literal. A
87+
// hostname (not an IP literal) returns nil here — hostnames are validated
88+
// authoritatively at dial time (registryDialControl), which also defeats
89+
// DNS-rebinding TOCTOU. allowPrivate (the config opt-in) short-circuits to nil.
90+
func hostLiteralBlocked(host string, allowPrivate bool) error {
91+
if allowPrivate {
92+
return nil
93+
}
94+
h := host
95+
if hh, _, err := net.SplitHostPort(host); err == nil {
96+
h = hh
97+
}
98+
h = strings.Trim(h, "[]")
99+
ip := net.ParseIP(h)
100+
if ip == nil {
101+
return nil // not a literal IP — defer to the dial-time guard
102+
}
103+
if isBlockedIP(ip) {
104+
return fmt.Errorf("%w: %s", ErrBlockedRegistryHost, ip)
105+
}
106+
return nil
107+
}
108+
109+
// ValidateRegistrySourceURL is the add-source / edit-source fail-fast: it rejects
110+
// a user-supplied registry URL whose host is a literal IP in a blocked range, so
111+
// `registry add-source https://169.254.169.254/...` is refused up front with a
112+
// clear error instead of failing later at fetch time. It performs NO DNS lookup
113+
// (keeping add/edit pure and offline) — hostname sources pass here and are
114+
// guarded authoritatively when the daemon actually dials them.
115+
func ValidateRegistrySourceURL(rawURL string) error {
116+
u, err := url.Parse(strings.TrimSpace(rawURL))
117+
if err != nil {
118+
return fmt.Errorf("invalid registry URL: %w", err)
119+
}
120+
return hostLiteralBlocked(u.Host, registryAllowPrivateFetch.Load())
121+
}
122+
123+
// registryDialControl is the authoritative SSRF guard. It is wired as the
124+
// net.Dialer Control hook on the shared registry HTTP client, so it runs with
125+
// the ACTUAL resolved address the connection is about to dial — after DNS
126+
// resolution and before connect. This catches hostnames that resolve into
127+
// blocked ranges and closes the DNS-rebinding TOCTOU window that a parse-time
128+
// check alone leaves open.
129+
func registryDialControl(_, address string, _ syscall.RawConn) error {
130+
if registryAllowPrivateFetch.Load() {
131+
return nil
132+
}
133+
host, _, err := net.SplitHostPort(address)
134+
if err != nil {
135+
host = address
136+
}
137+
ip := net.ParseIP(host)
138+
if isBlockedIP(ip) {
139+
return fmt.Errorf("%w: %s", ErrBlockedRegistryHost, address)
140+
}
141+
return nil
142+
}
143+
144+
// guardRegistryTargetHost is the application-layer SSRF guard: it resolves the
145+
// registry TARGET host and rejects the fetch if ANY resolved address is in a
146+
// blocked range. Unlike registryDialControl, it runs BEFORE the request and is
147+
// independent of the transport, so it holds even when an HTTP(S)_PROXY is set —
148+
// in which case the dialer connects to the proxy and the dial-time Control only
149+
// ever sees the proxy's IP, never the real target (the proxy resolves the host
150+
// itself). The dial-time guard remains as defense-in-depth for the direct
151+
// (no-proxy) path. Caveats: a literal-IP host is already covered by
152+
// validateRegistryURL; a DNS lookup failure is left to the request to surface
153+
// (fail-open on resolver error so a flaky resolver does not break every fetch —
154+
// the dial guard still covers the no-proxy path). Relaxed by the
155+
// allow_private_registry_fetch opt-in.
156+
func guardRegistryTargetHost(ctx context.Context, reqURL string) error {
157+
if registryAllowPrivateFetch.Load() {
158+
return nil
159+
}
160+
u, err := url.Parse(reqURL)
161+
if err != nil {
162+
return fmt.Errorf("invalid request URL: %w", err)
163+
}
164+
host := u.Hostname() // strips the port and unbrackets an IPv6 literal
165+
if host == "" {
166+
return nil
167+
}
168+
if ip := net.ParseIP(host); ip != nil {
169+
// Literal IP — already validated pre-flight; re-check defensively.
170+
if isBlockedIP(ip) {
171+
return fmt.Errorf("%w: %s", ErrBlockedRegistryHost, ip)
172+
}
173+
return nil
174+
}
175+
ips, err := registryResolveHost(ctx, host)
176+
if err != nil {
177+
// Resolution failed — let the request itself surface the failure.
178+
return nil //nolint:nilerr // fail-open on lookup error; dial guard covers no-proxy
179+
}
180+
for _, ip := range ips {
181+
if isBlockedIP(ip) {
182+
return fmt.Errorf("%w: host %q resolves to %s", ErrBlockedRegistryHost, host, ip)
183+
}
184+
}
185+
return nil
186+
}

0 commit comments

Comments
 (0)