Skip to content

Commit 8a0365e

Browse files
aron-muonclaude
andcommitted
Update bug report with root cause and fix summary
Two issues caused MCPRemoteProxy to fail connecting to upstream: X-Forwarded-Host redirect loop and missing redirect following. Both are now fixed. Document the investigation timeline and remaining known issues. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 31d868a commit 8a0365e

File tree

1 file changed

+60
-178
lines changed

1 file changed

+60
-178
lines changed

BUG_FIX_ISSUE.md

Lines changed: 60 additions & 178 deletions
Original file line numberDiff line numberDiff line change
@@ -1,213 +1,95 @@
1-
## Issue: MCPRemoteProxy fails to forward requests to upstream MCP servers returning 406
1+
## Issue: MCPRemoteProxy fails to connect to upstream MCP servers
22

33
### Summary
44

55
MCPRemoteProxy with embedded auth server (case 2: client auth, no upstream auth) successfully
6-
authenticates users via Okta but fails to establish an MCP connection because the upstream
7-
server rejects all proxied requests with HTTP 406 (Not Acceptable).
6+
authenticates users via Okta but fails to establish an MCP connection to the upstream server.
87

98
### Environment
109

1110
- ToolHive operator and proxyrunner from commit `3b834c81` (aron-muon/toolhive fork)
1211
- Upstream MCP: `https://api.gov.nominal.io/docs/mcp` (Nominal documentation MCP, public, no auth required)
12+
- Proxy: `nominal-docs.mcp.staging.muonspace.com` (MCPRemoteProxy)
1313
- Transport: `streamable-http`
1414
- Auth: Embedded auth server with `disableUpstreamTokenInjection: true`, Okta OIDC upstream
1515

16-
### Root Cause (Hypothesized)
16+
### Root Cause
1717

18-
The upstream MCP server requires the `Accept: application/json, text/event-stream` header on
19-
all POST requests (per the MCP streamable-http spec). Without it, the server returns:
18+
**Two issues combined** to prevent the proxy from reaching the upstream MCP server:
2019

21-
```json
22-
{"jsonrpc":"2.0","id":"server-error","error":{"code":-32600,"message":"Not Acceptable: Client must accept both application/json and text/event-stream"}}
23-
```
24-
HTTP 406.
25-
26-
Direct curl to the upstream with the correct `Accept` header succeeds (HTTP 200 with valid
27-
MCP response). Direct curl without it returns 406.
28-
29-
### What We Tried
30-
31-
1. **`headerForward.addPlaintextHeaders`** — configured `Accept: application/json, text/event-stream`
32-
on the MCPRemoteProxy spec. The middleware is registered (`"header forward middleware
33-
configured","headers":"Accept"`) and applied (`"applied middleware","name":"header-forward"`),
34-
but the upstream still returns 406.
35-
36-
2. **Middleware ordering verification** — the `header-forward` middleware is outermost (added
37-
last in `PopulateMiddlewareConfigs`, applied in reverse order), so it should set the `Accept`
38-
header before any other middleware processes the request. The Go `httputil.ReverseProxy` with
39-
`Rewrite` creates `pr.Out` as a shallow clone of `pr.In`, so headers set by middleware should
40-
propagate to the outbound request.
41-
42-
### Observed Behavior
43-
44-
- Proxy logs show `"applied middleware","name":"strip-auth"` and `"applied middleware","name":"header-forward"` — both are active.
45-
- Authenticated client requests (Claude Code via Okta OAuth) pass through auth/authz successfully — audit events show `outcome: success` with user identity.
46-
- All initialize requests complete in `duration_ms: 2` — far too fast for a real upstream round-trip, indicating the upstream rejects immediately.
47-
- The proxy's internal health check returns `status_code: 401` because it hits the OIDC middleware without a token (separate known issue, non-blocking).
48-
- After 155 attempts over 5 minutes, the proxy logs `"initialize not successful, but continuing"`.
49-
- Claude Code reports `Failed to reconnect to nominal-docs-staging`.
50-
51-
### Expected Behavior
52-
53-
The proxy should forward the `Accept: application/json, text/event-stream` header (either from
54-
the client's original request or from `headerForward` configuration) to the upstream, and the
55-
upstream should respond with HTTP 200 containing a valid MCP initialize response.
56-
57-
---
58-
59-
### Investigation Results
60-
61-
Exhaustive code-level analysis of the entire request path was performed. Findings below.
62-
63-
#### 1. Accept header DOES survive the reverse proxy (code is correct)
64-
65-
Traced the full path from middleware to upstream:
66-
67-
1. `pkg/transport/middleware/header_forward.go:130-134` — middleware calls
68-
`r.Header.Set("Accept", "application/json, text/event-stream")` on the inbound request.
69-
2. Go's `httputil.ReverseProxy.ServeHTTP` calls `outreq = req.Clone(ctx)` which calls
70-
`r.Header.Clone()` — a **deep copy** of all headers, including Accept.
71-
3. Before `Rewrite` is called, Go only strips `Forwarded`, `X-Forwarded-For`,
72-
`X-Forwarded-Host`, `X-Forwarded-Proto` — Accept is **not** stripped.
73-
4. The `Rewrite` function (`pkg/transport/proxy/transparent/transparent_proxy.go:505-545`)
74-
only modifies URL (scheme, host, path), Host header, X-Forwarded-\*, query params, and
75-
OTEL propagation headers — it **never touches Accept**.
76-
5. After `Rewrite`, Go's `removeHopByHopHeaders` only strips `Connection`, `Keep-Alive`,
77-
`Proxy-*`, `Te`, `Trailer`, `Transfer-Encoding`, `Upgrade`**not Accept**.
78-
6. `tracingTransport.RoundTrip` (`transparent_proxy.go:364-447`) only modifies `req.Host`
79-
**not Accept**.
80-
7. The `WithContext` shallow copy in Rewrite shares the same Header map (pointer), so all
81-
modifications are visible to the original `outreq` used by Go's ReverseProxy.
82-
83-
**Conclusion: No code path strips or modifies the Accept header. The header-forward middleware
84-
correctly propagates Accept through the reverse proxy to the upstream.**
20+
#### Issue 1: X-Forwarded-Host causes redirect loop (fixed in `258bd69c`)
8521

86-
#### 2. Nothing strips the Accept header
22+
The transparent proxy's `Rewrite` function called `pr.SetXForwarded()` for all upstreams,
23+
including remote third-party servers. This injected `X-Forwarded-Host: nominal-docs.mcp.staging.muonspace.com`
24+
(the proxy's own hostname) into outbound requests. The third-party upstream at
25+
`api.gov.nominal.io` used this header to construct a 307 redirect URL pointing back to the
26+
proxy, creating an infinite redirect loop:
8727

88-
The only header-stripping middleware is `stripAuthMiddleware`
89-
(`pkg/runner/middleware.go:316-327`), which calls `r.Header.Del("Authorization")` — only
90-
Authorization. Confirmed no other middleware or reverse proxy component touches Accept.
91-
92-
#### 3. MCP parser middleware does NOT intercept requests
93-
94-
`ParsingMiddleware` (`pkg/mcp/parser.go:69-105`) **always** calls `next.ServeHTTP(w, r)`. It
95-
reads the body, parses JSON-RPC, stores the parsed data in request context, and passes
96-
through. It never generates its own response or short-circuits the middleware chain.
97-
98-
#### 4. Path rewriting works correctly
99-
100-
The `Rewrite` function at `transparent_proxy.go:521-523` unconditionally replaces the outbound
101-
path with `remoteBasePath` (`/docs/mcp`). Both client paths `/mcp` and `/docs/mcp` get
102-
rewritten to `/docs/mcp`. The `WithContext` shallow copy shares the same `URL` pointer, so
103-
modifications are visible to the original `outreq`. Tests in
104-
`pkg/transport/proxy/transparent/remote_path_test.go` confirm this behavior.
105-
106-
#### 5. The "2ms / 155 attempts" observations are from the RUNNER's self-check, NOT from client requests
107-
108-
**This is the key finding that reframes the bug.**
109-
110-
At `pkg/runner/runner.go:414`, the runner calls `waitForInitializeSuccess()` which sends
111-
POST requests to **the proxy's own address** (`http://localhost:8080/docs/mcp`) at
112-
`runner.go:810-820`. These self-check requests:
113-
114-
- Have **no JWT token** (the runner doesn't authenticate with itself)
115-
- Go through the **full middleware chain**, including the OIDC auth middleware
116-
- Are immediately rejected by the auth middleware with **401 Unauthorized** (~2ms, no network)
117-
- Are retried with exponential backoff for 5 minutes (155 attempts)
118-
- Result in the `"initialize not successful, but continuing"` log at `runner.go:415`
119-
120-
The runner's self-check (`runner.go:818-820`) **does** set the correct Accept header:
121-
```go
122-
req.Header.Set("Content-Type", "application/json")
123-
req.Header.Set("Accept", "application/json, text/event-stream")
124-
req.Header.Set("MCP-Protocol-Version", "2024-11-05")
12528
```
126-
But it never reaches the upstream because the auth middleware rejects it first.
127-
128-
**The "2ms" timing and "155 attempts" are entirely explained by instant 401 rejections from
129-
the OIDC auth middleware on the runner's unauthenticated self-check requests. They do not
130-
reflect actual client-to-upstream behavior.**
131-
132-
#### 6. Actual client requests (with JWTs) should work
133-
134-
For real Claude Code requests with valid ToolHive JWTs, the flow is:
29+
Client → proxy (nominal-docs.mcp.staging.muonspace.com)
30+
→ upstream (api.gov.nominal.io)
31+
→ 307 redirect to nominal-docs.mcp.staging.muonspace.com ← loop!
32+
```
13533

136-
1. Auth middleware validates JWT — **passes** (confirmed by audit: `outcome: success`)
137-
2. `strip-auth` removes Authorization header — prevents JWT leaking to upstream
138-
3. `header-forward` sets `Accept: application/json, text/event-stream`
139-
4. Reverse proxy forwards to upstream with Accept header intact
140-
5. Upstream should receive Accept and return 200
34+
**Fix**: Skip `SetXForwarded()` when `isRemote` is true. Remote third-party upstreams don't
35+
need the proxy's internal hostname, and it causes redirect loops when the upstream constructs
36+
redirect URLs from `X-Forwarded-Host`. Local container backends still receive these headers.
14137

142-
**Based on code analysis, the header forwarding is correct for authenticated client requests.**
38+
#### Issue 2: Transparent proxy does not follow HTTP redirects (fixed in `31d868ae`)
14339

144-
### Revised Assessment
40+
After fixing the X-Forwarded-Host loop, the upstream still 307-redirected — this time a
41+
legitimate HTTPS→HTTP scheme redirect:
14542

146-
The original hypothesis ("the issue is in the HTTP header forwarding layer") appears to be
147-
**incorrect**. The code correctly forwards the Accept header for authenticated requests. The
148-
observed symptoms (2ms timing, 155 attempts, 401 status codes) are entirely explained by the
149-
runner's unauthenticated self-check hitting the OIDC middleware.
43+
```
44+
from: https://api.gov.nominal.io/docs/mcp/
45+
location: http://api.gov.nominal.io/docs/mcp
46+
```
15047

151-
The remaining failure (`Claude Code reports "Failed to reconnect"`) has a different root cause
152-
than the Accept header. Likely explanations, in order of probability:
48+
Go's `http.Transport.RoundTrip` does not follow redirects (that is `http.Client`'s job), but
49+
`httputil.ReverseProxy` uses the Transport directly. The 307 response was passed through to
50+
the MCP client, which cannot follow upstream redirects through the proxy.
15351

154-
1. **Infrastructure between the proxy pod and the upstream** — a service mesh sidecar
155-
(Istio/Envoy), egress proxy, network policy, or cloud NAT stripping or modifying the
156-
Accept header before it reaches `api.gov.nominal.io`. This is the most likely cause in a
157-
Kubernetes environment.
52+
**Fix**: Added `forwardFollowingRedirects` to `tracingTransport` that follows up to 10
53+
redirects transparently. 307/308 preserve method and body (RFC 7538); 301/302/303 change to
54+
GET (RFC 7231). Each followed redirect is logged at WARN level.
15855

159-
2. **The runner's failed self-check delays client registration** — the 5-minute
160-
`waitForInitializeSuccess` failure means the proxy never confirms upstream reachability
161-
before registering with client managers (`runner.go:425-431`). If Claude Code connects
162-
during or immediately after this window, it might experience session establishment issues
163-
unrelated to the Accept header.
56+
### Investigation Timeline
16457

165-
3. **Observation conflation** — the 406 status was observed via direct curl tests (without
166-
Accept header) during debugging, while the actual Claude Code failure may be a different
167-
issue (OAuth flow completion, session management, or reconnection logic).
58+
#### Initial hypothesis: Accept header not reaching upstream (INCORRECT)
16859

169-
### Next Steps
60+
The original report assumed the upstream was returning 406 because the `Accept` header was
61+
missing. Code analysis proved the header-forward middleware correctly propagates Accept through
62+
the reverse proxy — no code path strips or modifies it.
17063

171-
1. **Add debug logging to `tracingTransport.RoundTrip`** at
172-
`pkg/transport/proxy/transparent/transparent_proxy.go:364` to dump all outbound request
173-
headers (`req.Header`). This will definitively confirm whether Accept reaches the wire
174-
for authenticated client requests.
64+
#### Debug logging revealed the real status code
17565

176-
2. **Check for infrastructure header stripping** — inspect service mesh configs
177-
(Istio/Envoy sidecar), egress policies, and network proxies in the cluster that sit
178-
between the proxy pod and the upstream.
66+
Added outbound request and upstream response logging to `tracingTransport.RoundTrip`. This
67+
showed every upstream response was **307 Temporary Redirect**, not 406. The 406 observed
68+
during initial debugging was from direct curl tests without the Accept header — the proxy was
69+
never reaching the upstream because of the redirect issues.
17970

180-
3. **Capture actual network traffic** — use `tcpdump` or pod-level packet capture to see
181-
the exact HTTP request sent to `api.gov.nominal.io` and verify the Accept header is
182-
present on the wire.
71+
#### The "2ms / 155 attempts" were from the runner self-check
18372

184-
4. **Fix the runner self-check for OIDC-protected proxies**`waitForInitializeSuccess`
185-
should either skip when OIDC is configured, bypass the middleware chain by hitting the
186-
`/health` endpoint, or send the request directly to the upstream (like the pinger does)
187-
instead of through the proxy's own middleware. This is a separate bug that causes
188-
misleading logs and delays proxy readiness.
73+
The `waitForInitializeSuccess()` function (`runner.go:414`) sends POST requests to the
74+
proxy's own address without a JWT. The OIDC auth middleware immediately rejects these with
75+
401 (~2ms, no network round-trip). This produced the misleading "initialize not successful"
76+
logs and fast timing that was initially attributed to the upstream rejecting requests.
18977

190-
### Reproduction
78+
### Fixes Applied
19179

192-
```bash
193-
# Upstream works directly with correct Accept header:
194-
curl -s -w "\n%{http_code}" -X POST https://api.gov.nominal.io/docs/mcp \
195-
-H "Content-Type: application/json" \
196-
-H "Accept: application/json, text/event-stream" \
197-
-d '{"jsonrpc":"2.0","method":"initialize","id":1,"params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}'
198-
# Returns 200 with valid MCP response
80+
| Commit | Description |
81+
|--------|-------------|
82+
| `c1c36ba1` | Add debug logging for outbound request headers |
83+
| `d3f45069` | Add upstream response status logging |
84+
| `a187848a` | Log upstream redirect responses with Location URL |
85+
| `258bd69c` | Strip X-Forwarded-* headers for remote upstream servers |
86+
| `31d868ae` | Follow upstream HTTP redirects in transparent proxy |
19987

200-
# Upstream rejects without Accept header:
201-
curl -s -w "\n%{http_code}" -X POST https://api.gov.nominal.io/docs/mcp \
202-
-H "Content-Type: application/json" \
203-
-d '{"jsonrpc":"2.0","method":"initialize","id":1,"params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}'
204-
# Returns 406: "Not Acceptable: Client must accept both application/json and text/event-stream"
205-
```
88+
### Remaining Known Issues
20689

207-
### Impact
90+
1. **Runner self-check fails for OIDC-protected proxies**`waitForInitializeSuccess` sends
91+
unauthenticated requests through the middleware chain, always getting 401. Should skip or
92+
bypass auth for OIDC-protected proxies. Non-blocking (logs warning and continues).
20893

209-
This blocks MCPRemoteProxy use case 2 (embedded auth server + public upstream MCP) from
210-
working end-to-end. The auth changes (`disableUpstreamTokenInjection`, `strip-auth` middleware)
211-
are functioning correctly. The header forwarding code is also correct. The remaining failure
212-
is most likely environmental (infrastructure header stripping) or caused by the runner
213-
self-check bug producing misleading diagnostics.
94+
2. **Debug logging should be removed** before merging to upstream — the outbound request and
95+
response logging in `tracingTransport.RoundTrip` was added for diagnosis and is noisy.

0 commit comments

Comments
 (0)