Exempt the trusted-internal dispatch from edge middleware #5246

Merged: cafalchio merged 10 commits into fix/session-affinity-inprocess-dispatch from fix/internal-mcp-middleware-exempt on Jun 25, 2026

gandhipratik203 (Collaborator) commented Jun 16, 2026

Stacked on #4987 (fix/session-affinity-inprocess-dispatch); #5212 has merged into that branch, so this PR's diff shows only the work below.

Summary

The session-affinity and Rust runtime paths re-enter the gateway through an in-process / loopback dispatch to /_internal/mcp/rpc (and /_internal/a2a/*). Because that dispatch goes back through the ASGI stack, the edge middleware that already ran for the originating request runs a second time on the internal hop: token scoping, rate limiting, and token-usage logging.

This PR routes the trusted-internal hop around that middleware. Every skip is gated on a single shared trust check, so the exemption is contingent on the full trust boundary and never on a URL prefix alone.

It lands in two commits:

| Commit | Scope |
| --- | --- |
| consolidate the internal-runtime trust gate into one helper | One trust gate; callers delegate to it |
| exempt trusted-internal dispatch from edge middleware | Per-middleware exemptions + CSRF config change |

Architecture: before & after

The cross-worker forward and the trusted /_internal/mcp/rpc dispatch are unchanged (that landed in #5212). What changes here is what the edge middleware does when the trusted-internal hop re-enters the ASGI stack.

After #5212, the owning worker dispatches in-process to /_internal/mcp/rpc. Because that hop goes back through the full ASGI stack, every edge middleware that already ran for the originating request runs a second time on the internal replay: token scoping, rate limiting, and token-usage logging. That double-counts rate limits and writes duplicate usage rows. CSRF was sidestepped only by a static /_internal/mcp/ entry in csrf_exempt_paths, which left an externally reachable CSRF-free URL.

Before this PR (state after #5212):

edge request ──► [ edge middleware stack runs once ] ──► route handler
                                                            │
                                                            │ owning worker forwards in-process
                                                            ▼
                                POST /_internal/mcp/rpc   (loopback ASGITransport)
                                                            │
              ┌─────────────────────────────────────────────┴──────────────┐
              ▼   the SAME middleware stack runs AGAIN on the internal hop
        TokenScoping   re-scopes the already-scoped identity
        RateLimit      counts the request a SECOND time
        TokenUsage     writes a DUPLICATE usage row
        CSRF           skipped only by a static /_internal/mcp/ path entry
              │        (an externally reachable CSRF-free path)
              ▼
        route-level trust + HMAC verify + auth-context decode   ✓

After this PR:

edge request ──► [ edge middleware stack runs once ] ──► route handler
                                                            │
                                                            │ owning worker forwards in-process
                                                            │  (affinity marker + HMAC + auth-context)
                                                            ▼
                                POST /_internal/mcp/rpc   (loopback ASGITransport)
                                                            │
              ┌─────────────────────────────────────────────┴──────────────┐
              ▼   each middleware consults is_trusted_internal_mcp_request()
        TokenScoping   trusted hop ──► SKIP
        RateLimit      trusted hop ──► SKIP   (no double count)
        TokenUsage     trusted hop ──► SKIP   (no duplicate row)
        CSRF           trusted hop ──► SKIP via the gate (static path entry removed)
              │        forged HMAC / external caller ──► falls through to enforcement
              ▼
        route-level trust + HMAC verify + auth-context decode   ✓   (unchanged, still runs)

Net change:

#5212:  internal hop  →  full middleware stack re-runs            →  double work + a static CSRF hole
#5246:  internal hop  →  gate-recognized, duplicate work skipped  →  work happens once, gate-guarded

The trust gate is the single decision point. A genuine loopback forward (valid HMAC + encoded auth-context) skips the duplicated edge work, while anything that fails the gate (forged HMAC, external caller) flows through the normal middleware exactly as before. Route-level authorization is untouched and still runs on every internal request.

The trust gate

Three near-duplicate trust checks (in main, token_scoping, and the HMAC helper) are replaced by one gate in auth_context:

is_trusted_internal_runtime_request(request, *, allowed_prefixes, require_auth_context, path=None)
is_trusted_internal_mcp_request(request, *, path=None)   # MCP/A2A wrapper

A request is trusted only when all of the following hold:

path matches an allowed internal prefix      (/_internal/mcp or /_internal/a2a)
x-contextforge-mcp-runtime  in {rust, affinity}     (runtime marker)
x-contextforge-mcp-runtime-auth  is a valid HMAC    (the trust boundary)
x-contextforge-auth-context  present                (required for every route except */authenticate)
request.client.host  is loopback                    (127.0.0.1 / ::1, defense in depth)

Notes:

  • The HMAC plus the encoded auth context are the trust boundary. The loopback check is defense in depth only: with ProxyHeaders(trusted_hosts="*"), a genuinely external request can influence request.client.host through X-Forwarded-For.
  • Auth context is path-aware: */authenticate creates the context, so it is the only internal route that does not require the header.
  • The A2A feature guard is retained: /_internal/a2a/* is never trusted when A2A is disabled.
  • token_scoping previously trusted only the rust marker. It now delegates to the shared gate and so also trusts the affinity marker, closing the gap where the in-process affinity dispatch was still token-scoped.
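The conditions above can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the gateway's implementation: it takes plain values instead of a request object, and the HMAC input (path plus auth context), the SHA-256 hex encoding, and the names `sign` and `is_trusted_hop` are hypothetical. The A2A feature guard is omitted for brevity.

```python
import hashlib
import hmac

# Header names come from the PR description; everything else is illustrative.
RUNTIME_HEADER = "x-contextforge-mcp-runtime"
RUNTIME_AUTH_HEADER = "x-contextforge-mcp-runtime-auth"
AUTH_CONTEXT_HEADER = "x-contextforge-auth-context"

ALLOWED_PREFIXES = ("/_internal/mcp", "/_internal/a2a")
TRUSTED_MARKERS = {"rust", "affinity"}
LOOPBACK_HOSTS = {"127.0.0.1", "::1"}


def sign(secret: bytes, message: str) -> str:
    """Hypothetical signing scheme: hex HMAC-SHA256 over a message string."""
    return hmac.new(secret, message.encode(), hashlib.sha256).hexdigest()


def is_trusted_hop(path, headers, client_host, secret):
    """Return True only when every trust condition holds."""
    if not any(path == p or path.startswith(p + "/") for p in ALLOWED_PREFIXES):
        return False
    if headers.get(RUNTIME_HEADER) not in TRUSTED_MARKERS:
        return False
    auth_context = headers.get(AUTH_CONTEXT_HEADER, "")
    # */authenticate creates the context, so it alone may omit the header.
    if not auth_context and not path.endswith("/authenticate"):
        return False
    # Assumption: the HMAC covers the path plus the encoded auth context.
    expected = sign(secret, f"{path}\n{auth_context}")
    supplied = headers.get(RUNTIME_AUTH_HEADER, "")
    # Constant-time comparison on bytes avoids timing leaks and str/ASCII errors.
    if not hmac.compare_digest(expected.encode(), supplied.encode()):
        return False
    # Defense in depth only; the HMAC is the actual boundary.
    return client_host in LOOPBACK_HOSTS
```

In this sketch a forged HMAC fails even when the caller appears to be on loopback, which mirrors the point that the loopback check alone is never sufficient.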

Middleware exemptions

| Middleware | Behavior on a trusted hop | Why it matters |
| --- | --- | --- |
| TokenScoping | skipped (delegates to the gate) | the internal hop is not re-scoped |
| RateLimit | skipped | the edge request was already counted; no double count |
| TokenUsage | skipped | no duplicate usage row for the in-process replay |
| CSRF | skipped via the gate | see below |
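The skip pattern shared by these middleware can be illustrated with a toy rate limiter. This is a sketch, not the gateway's RateLimit middleware: `RateLimitSketch`, `toy_gate`, and the `x-demo-trusted` header are hypothetical stand-ins, and the real gate verifies an HMAC rather than a flag.

```python
from collections import Counter
from typing import Callable, Mapping


class RateLimitSketch:
    """Counts requests per client unless the gate marks the hop as trusted."""

    def __init__(self, is_trusted: Callable[[str, Mapping[str, str]], bool], limit: int):
        self._is_trusted = is_trusted
        self._limit = limit
        self.counts: Counter = Counter()

    def handle(self, client_id: str, path: str, headers: Mapping[str, str]) -> int:
        """Return an HTTP-style status: 200 when allowed, 429 when over the limit."""
        if self._is_trusted(path, headers):
            # Trusted internal replay: the edge request was already counted.
            return 200
        self.counts[client_id] += 1
        return 200 if self.counts[client_id] <= self._limit else 429


def toy_gate(path, headers):
    """Stand-in gate for the demo only; a flag header is NOT a real trust check."""
    return path.startswith("/_internal/mcp") and headers.get("x-demo-trusted") == "yes"


limiter = RateLimitSketch(toy_gate, limit=2)
limiter.handle("alice", "/servers/1/mcp", {})                             # edge request: counted
limiter.handle("alice", "/_internal/mcp/rpc", {"x-demo-trusted": "yes"})  # trusted replay: skipped
limiter.handle("alice", "/_internal/mcp/rpc", {})                         # fails the gate: counted
print(limiter.counts["alice"])  # 2
```

The edge request and its trusted replay together count once, while a request to the same internal path that fails the gate is counted like any other.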

CSRF change

/_internal/mcp/ is removed from the default csrf_exempt_paths. The skip is now contingent on the full trust gate rather than a static path prefix. This is strictly tighter: a path-only exemption left a CSRF-free URL reachable by any client, whereas the gate-based skip requires a valid HMAC from a loopback client. The legitimate loopback forward still skips CSRF; a forged external request to the same path now hits CSRF enforcement.

Loopback passthrough hardening

_LOOPBACK_SKIP_HEADERS is broadened to strip forwarded and client-IP headers (forwarded, x-forwarded-*, x-real-ip, cf-connecting-ip, true-client-ip) so the in-process replay cannot carry a spoofable client address into the internal hop.
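A minimal sketch of that stripping step, assuming headers are held in a plain dict. The header names are the ones listed above; the helper name `strip_client_address_headers` and the two constants are hypothetical and do not reflect how `_LOOPBACK_SKIP_HEADERS` is actually structured.

```python
# Header names listed in the PR description; the helper itself is illustrative.
_SKIP_EXACT = {"forwarded", "x-real-ip", "cf-connecting-ip", "true-client-ip"}
_SKIP_PREFIX = "x-forwarded-"


def strip_client_address_headers(headers):
    """Return a copy of `headers` without forwarded / client-IP entries."""
    kept = {}
    for name, value in headers.items():
        lowered = name.lower()  # HTTP header names are case-insensitive
        if lowered in _SKIP_EXACT or lowered.startswith(_SKIP_PREFIX):
            continue
        kept[name] = value
    return kept


incoming = {
    "Authorization": "Bearer t",
    "X-Forwarded-For": "127.0.0.1",
    "Forwarded": "for=127.0.0.1",
    "CF-Connecting-IP": "198.51.100.7",
}
print(strip_client_address_headers(incoming))  # {'Authorization': 'Bearer t'}
```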

What is intentionally NOT exempted

Route-level enforcement is unchanged and still runs on every internal request: the route-level trust check, the forwarded-user build, the internal-request authorization, the server-scope enforcement, method authorization, the HMAC verification, and the auth-context decode. The middleware exemptions remove duplicated edge work; they do not relax the route's own authorization.

Testing

Unit

  • New test_internal_runtime_trust.py: allow and deny for the gate (including the HMAC-is-the-boundary and X-Forwarded-For cases), the affinity marker, A2A enable/disable, the prefix allowlist, token_scoping trusting affinity, and the forwarded-header stripping.
  • Per middleware (RateLimit, TokenUsage, CSRF): an allow test (trusted hop is exempted) and a deny test (a forged HMAC fails the gate and the normal enforcement path runs).

Live verification (3 replicas × 24 workers, affinity on)

Built an image from this branch and ran the multi-worker session-affinity stack: make docker (tags mcpgateway/mcpgateway:latest), then make testing-up (3 replicas × 24 gunicorn workers, USE_STATEFUL_SESSIONS=true, MCPGATEWAY_SESSION_AFFINITY_ENABLED=true, CACHE_TYPE=redis, behind nginx on :8080). Requests round-robin through nginx across the 72 workers with no sticky load balancing, so they land on non-owner workers and must be forwarded to the owner — exactly the in-process /_internal/mcp/rpc hop this PR exempts from duplicate edge middleware.

Setup — counter server, registration, and the session driver

Minimal per-session counter (streamable HTTP). State is keyed by the upstream session, so a scatter onto a fresh connection would show a reset counter — which is exactly what the test detects:

# counter_server.py — listens on :9400, reachable from the gateway containers via host.docker.internal
from mcp.server.fastmcp import Context, FastMCP

mcp = FastMCP("repro-counter", host="0.0.0.0", port=9400)
_counters: dict[int, int] = {}   # keyed by the upstream ServerSession identity

@mcp.tool()
def increment(ctx: Context) -> int:
    key = id(ctx.session)
    _counters[key] = _counters.get(key, 0) + 1
    return _counters[key]

@mcp.tool()
def get_value(ctx: Context) -> int:
    return _counters.get(id(ctx.session), 0)

if __name__ == "__main__":
    mcp.run(transport="streamable-http")

Mint an admin token, register the counter as a gateway (discovery runs against the live counter), and scope its tools into a virtual server:

export TOKEN=$(python -m mcpgateway.utils.create_jwt_token --username admin@example.com --exp 1800)

python counter_server.py        # listens on :9400

# register the gateway
curl -sX POST http://localhost:8080/gateways -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"name":"repro-counter","url":"http://host.docker.internal:9400/mcp","transport":"STREAMABLEHTTP"}'

# discover the tool ids (the list field is `gatewayId` / `gatewaySlug`)
curl -s http://localhost:8080/tools -H "Authorization: Bearer $TOKEN" \
  | jq '.[] | select(.gatewaySlug=="repro-counter") | {id, name}'

# scope them into a virtual server (note the `server` wrapper)
curl -sX POST http://localhost:8080/servers -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"server":{"name":"repro-counter-vs","associated_tools":["<increment id>","<get_value id>"]}}'

Each test drives sessions against /servers/{id}/mcp:

import httpx

# `call` (not shown in the original) is assumed to issue one JSON-RPC tools/call
# for the named tool and return its result value.
def run_session(url, token, n):
    """Open one MCP session, call increment n times, then read the counter."""
    H = {"Authorization": f"Bearer {token}", "Content-Type": "application/json",
         "Accept": "application/json, text/event-stream"}
    c = httpx.Client(timeout=30, headers=H)
    # initialize: the gateway returns the session id in a response header
    r = c.post(url, json={"jsonrpc":"2.0","id":1,"method":"initialize","params":{
        "protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"t","version":"1.0"}}})
    c.headers["Mcp-Session-Id"] = r.headers["mcp-session-id"]   # sessionful -> real session id
    c.post(url, json={"jsonrpc":"2.0","method":"notifications/initialized"})
    inc = [call(c, url, "repro-counter-increment") for _ in range(n)]   # tools/call x n
    return c.headers["Mcp-Session-Id"], inc, call(c, url, "repro-counter-get-value")

T1 — single session, 25 increments (PASS)

Steps: one session; increment ×25, then get_value.

Result:

gateway 14bbdee9... reachable=True  tools=[increment, get_value]
session   : 758fc29b82ca4ba2a5fa95a393e10a0e
increments: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
get_value : 25
PASS

Requests round-robin through nginx across the 72 workers (3 replicas × 24), so they hit non-owner workers and forward to the owner; the strictly monotonic counter proves the forward reused the single bound upstream session (no scatter). Exempting the edge middleware on the internal hop did not break per-session isolation.
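The pass criterion for T1 can be written down as a small check. This is a sketch of the property being asserted, not the test harness used for the run; the function name `counter_is_consistent` is hypothetical.

```python
def counter_is_consistent(increments, final_value):
    """True when increments are exactly 1..n and get_value equals n.

    A forward that scattered onto a fresh upstream session would restart the
    counter at 1 somewhere in the sequence, which breaks the 1..n pattern.
    """
    n = len(increments)
    return list(increments) == list(range(1, n + 1)) and final_value == n


print(counter_is_consistent(list(range(1, 26)), 25))  # True: the T1 result
print(counter_is_consistent([1, 2, 1, 3, 2], 3))      # False: counter reset mid-session
```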

T2 — owner-worker kill / recovery (PASS)

Steps: bind a session and increment ×5; read its owner from Redis (mcpgw:pool_owner:<sid> = host:pid); kill -9 the owning gunicorn worker; hit the stale session id; then initialize a fresh session.

Result:

[1] bound session b690091d... | increments [1, 2, 3, 4, 5]
[2] pool_owner -> 794b607a573e:70
[3] owner worker pid 70 in container mcp-context-forge-gateway-3
[4] kill -9 70
[5] stale-sid request -> HTTP 404 | {"code": -32600, "message": "Session not found"}
[6] fresh initialize 59602cd0... | increments [1, 2, 3]
PASS

When the owning worker dies, the in-memory upstream is gone; a request on the stale id returns a clean structured 404 / JSON-RPC -32600 "Session not found" (not a hang or 5xx), and a fresh initialize binds and works. The middleware exemptions do not change this recovery contract.
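For step [2], the owner lookup reduces to building the key and splitting the stored value. The sketch below assumes only the key and value shapes quoted above (mcpgw:pool_owner:<sid> and host:pid); the helper names are hypothetical and the Redis read itself is left out.

```python
def pool_owner_key(session_id: str) -> str:
    """Build the Redis key that records which worker owns a session."""
    return f"mcpgw:pool_owner:{session_id}"


def parse_pool_owner(value: str):
    """Split a 'host:pid' owner value into (host, pid)."""
    # rpartition splits on the last colon, so a host containing colons survives.
    host, sep, pid = value.rpartition(":")
    if not sep or not host or not pid.isdigit():
        raise ValueError(f"unexpected pool owner value: {value!r}")
    return host, int(pid)


print(parse_pool_owner("794b607a573e:70"))  # ('794b607a573e', 70)
```

The parsed host identifies the container and the pid identifies the gunicorn worker to kill inside it.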

T3 — exemption is gated (PASS)

Steps: from outside the trust boundary, POST to the internal endpoints with missing or forged trust headers. Every case must be rejected — there is no blanket path exemption.

Result:

no headers                  -> HTTP 403  CSRF_TOKEN_INVALID
forged HMAC + XFF loopback  -> HTTP 403  CSRF_TOKEN_INVALID
valid bearer, no HMAC       -> HTTP 403  "Internal MCP dispatch is only available ..."
a2a internal endpoint       -> HTTP 403  CSRF_TOKEN_INVALID
PASS

A genuine loopback forward (valid HMAC + auth context) and a forged external request take the same /_internal/mcp/rpc path with opposite outcomes (200 vs 403), decided only by the trust gate. The forged-no-bearer case being rejected by CSRF confirms the path is no longer blanket-exempt.

Throughput

make benchmark-mcp-tools (125 users, 60s) × 3 runs

Tools-only benchmark against the same stack, three back-to-back 60s runs:

| Run | RPS | Failures |
| --- | --- | --- |
| 1 | 525.68 | 0 |
| 2 | 513.58 | 0 |
| 3 | 476.67 | 1 (0.00%) |

About 90,700 forwarded requests with a single failure, well above the broken-affinity baseline (~15 RPS). The exemptions only remove work on the internal hop, so they do not regress throughput. Absolute RPS is host-load dependent; these are indicative runs on a local colima VM.
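As a rough cross-check of the totals, the request count can be estimated from the reported RPS values. This assumes each run lasted exactly the nominal 60 seconds, which is why the estimate (near 91,000) lands close to, but not exactly on, the reported figure.

```python
# (RPS, failures) per run, copied from the table above.
runs = [(525.68, 0), (513.58, 0), (476.67, 1)]
NOMINAL_SECONDS = 60  # assumption: the actual run durations may differ slightly

estimated_requests = sum(rps * NOMINAL_SECONDS for rps, _ in runs)
failures = sum(failed for _, failed in runs)
failure_pct = 100 * failures / estimated_requests

print(round(estimated_requests))  # estimated total requests across the three runs
print(f"{failure_pct:.2f}%")      # a single failure rounds to 0.00%
```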

Compatibility

  • No new configuration. The CSRF behavior for the internal dispatch moves from a static path entry to the gate-based skip; the net effect for a legitimate loopback forward is unchanged.
  • No change to external request handling: a request that does not satisfy the full trust gate goes through the normal middleware exactly as before.

…one helper

Replace the three near-duplicate trust checks (main, token_scoping, and the HMAC
helper) with a single generic gate in auth_context: is_trusted_internal_runtime_request
plus a path-aware MCP/A2A wrapper. main and token_scoping delegate to it; token_scoping
gains the 'affinity' marker (previously rust-only), closing the gap where the in-process
session-affinity dispatch was still token-scoped.

- Path-aware auth context: required for every internal route except */authenticate.
- A2A feature guard retained for /_internal/a2a/*.
- HMAC + encoded auth-context are the trust boundary; loopback is defense-in-depth,
  documented as such under ProxyHeaders(trusted_hosts=*).
- Strip forwarded / client-IP headers from loopback passthrough so the in-process
  replay cannot carry a spoofable client address.
- Remove the now-dead duplicate statics, constants, and imports from token_scoping.

Tests cover allow/deny (incl. HMAC-as-boundary and XFF), affinity, A2A enable/disable,
prefix allowlist, token_scoping affinity, and forwarded-header stripping.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…ware

Route the in-process and cross-worker internal dispatch (/_internal/mcp/* and
/_internal/a2a/*) around edge middleware that already ran for the originating
request. Each skip is gated on the shared trust gate (loopback + HMAC + runtime
marker + auth context), not a path prefix:

- RateLimit: the internal hop is not counted a second time.
- TokenUsage: the internal hop does not log a duplicate usage row.
- HttpAuth: HTTP auth plugin hooks are not re-fired on the replay.
- CSRF: the skip is contingent on the full trust gate; drop the static
  /_internal/mcp/ entry from csrf_exempt_paths so no externally reachable
  path is left CSRF-free.

TokenScoping already delegates to the same gate. Each middleware gains an allow
test (trusted hop is exempted) and a deny test (a forged HMAC fails the gate and
the normal enforcement path runs).

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Draft to reuse when opening the stacked PR; not part of the runtime change.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
The draft is moved into the pull request body; it does not belong in
the tracked tree.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Base automatically changed from fix/session-affinity-auth-context to fix/session-affinity-inprocess-dispatch June 16, 2026 14:47
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Keep the trusted-internal middleware exemptions scoped to the cases that
produce wrong state on every deployment: rate-limit double counting,
duplicate token-usage rows, token-scoping rejecting scoped tokens, plus
the CSRF gate change. HttpAuthMiddleware is left running on the internal
hop; with no HTTP-auth plugin it is already a no-op there, so removing the
skip is a no-op for the default configuration.

Remove the is_trusted_internal_mcp_request import and skip block from
HttpAuthMiddleware, and drop the two exemption tests and their helper.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Drop the trusted-internal CSRF exemption added in 76acff1 so
/_internal/mcp/* is no longer CSRF-exempt. Remove the trust-gate skip
and its import from CSRFMiddleware, and drop the two trust-gate tests
that asserted the exemption.

The /_internal/mcp/ entry stays out of csrf_exempt_paths, so the hop is
genuinely enforced rather than falling back to a path-prefix skip.
Bearer-authenticated internal hops still pass via the existing bearer
short-circuit; non-bearer (OAuth/public-only) dispatch now requires CSRF.

The trust-gate exemption is preserved in 76acff1 and can be restored
with: git show 76acff1 -- mcpgateway/middleware/csrf_middleware.py

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
@cafalchio cafalchio merged commit 593913c into fix/session-affinity-inprocess-dispatch Jun 25, 2026
1 check passed
@cafalchio cafalchio deleted the fix/internal-mcp-middleware-exempt branch June 25, 2026 08:56