Commit 03e8ace

Merge pull request #235 from davidruzicka/feat/upstream-mcp-proxy-session-and-tool-discovery
feat: upstream MCP proxy - session foundation and tool discovery (Phase 1 + Phase 2)
2 parents: b4d8716 + cdce409

96 files changed

Lines changed: 17779 additions & 383 deletions

.claude/skills/auto-update-skills/SKILL.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ---
 name: auto-update-skills
-description: Propose creating a new skill or update existing after a correction reveals reusable existing knowledge, tools, policies, or preferred communication style. Trigger immediately for critical issues and on repetition for trivial patterns.
+description: After correction/feedback: propose new skill or update existing to capture reusable pattern. Covers knowledge, tools, policies, preferred style. Trigger immediately for critical issues, on repetition for trivial patterns.
 ---
 
 ## Goal

.cursorrules

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+# lean-ctx — Context Engineering Layer
+
+PREFER lean-ctx MCP tools over native equivalents for token savings:
+
+| PREFER | OVER | Why |
+|--------|------|-----|
+| `ctx_read(path)` | `Read` | Cached, 8 compression modes |
+| `ctx_shell(command)` | `Shell` | Pattern compression |
+| `ctx_search(pattern, path)` | `Grep` | Compact results |
+| `ctx_tree(path, depth)` | `ls` / `find` | Directory maps |
+| `ctx_edit(path, old_string, new_string)` | `Edit` (when Read unavailable) | Search-and-replace without native Read |
+
+Edit files: use native Edit/StrReplace if available. If Edit requires Read and Read is unavailable, use ctx_edit.
+Write, Delete, Glob — use normally. NEVER loop on Edit failures — switch to ctx_edit immediately.

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@ dist/
 coverage/
 junit.xml
 tmp/
+.claude/settings.local.json

.planning/PROJECT.md

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
+# mcp4openapi - Enterprise MCP Gateway
+
+## What This Is
+
+A centralized, enterprise-grade MCP gateway that acts as the single front door for all AI clients
+inside the company to reach upstream MCP servers. It authenticates clients via SSO/OIDC or API
+keys, enforces team-level tool access policies, and forwards tool calls to upstream remote HTTP MCP
+servers (internal services and third-party SaaS) using credentials supplied by the client at session
+initialization - the gateway itself stores no upstream secrets.
+
+Built on top of the existing mcp4openapi server, extending it from an OpenAPI-to-MCP adapter into a
+full MCP proxy/gate.
+
+## Core Value
+
+A security boundary between internal AI clients and all upstream MCP servers: one place to
+authenticate, authorize, audit, and proxy every tool call in the company.
+
+## Requirements
+
+### Validated
+
+- Existing capabilities already shipped and working:
+- ✓ MCP server over HTTP (SSE, sessions, MCP spec 2025-03-26) - existing
+- ✓ Profile-driven configuration with Zod-validated schemas - existing
+- ✓ OpenAPI-backed tool generation from REST APIs - existing
+- ✓ OAuth 2.0 provider (PKCE, DCR, token exchange) - existing
+- ✓ Multi-auth support (bearer, query, custom header, OAuth) - existing
+- ✓ Multi-tenant HTTP transport with session isolation - existing
+- ✓ Rate limiting, SSRF protection, token redaction - existing
+- ✓ Prometheus metrics emission (prom-client) - existing
+- ✓ Upstream MCP provider config schema (UpstreamMcpProvider type, Zod schemas) - existing (PR #219)
+
+### Validated
+
+- ✓ Upstream session lifecycle (Phase 01) - per-session `UpstreamConnectionManager` with lazy connect, concurrent-safe `getOrConnect`, heartbeat pings, and session-scoped cleanup wired into HTTP transport destruction lifecycle
+- ✓ Pass-through credential forwarding (Phase 01) - client-supplied Bearer token forwarded directly to upstream; profile-per-upstream model; no credential storage on gateway; `validateCredentials` with SSRF-protected `validation_endpoint` for early auth validation
+- ✓ Auth redaction hardening (Phase 01) - `sanitizeAuthErrorMessage` preserves last-4 Bearer suffix for debuggability; `redactString` fully redacts; token never appears in logs or error responses
+
+### Active
+- [ ] Upstream tool discovery and proxy - tools/list and tools/call forwarded to correct upstream
+provider; upstream tools appear in tools/list alongside (or instead of) OpenAPI-backed tools
+- [ ] Tool namespacing - upstream tool names prefixed/namespaced to prevent collisions across
+providers (#215)
+- [ ] Team-level allow/deny policy - each client identity (team/API key/SSO principal) maps to a
+policy that allows or denies specific upstream servers and/or tool names (#216)
+- [ ] Client authentication gate - SSO/OIDC (Entra ID / Okta / Keycloak) for interactive clients;
+API keys for M2M; identity resolved before any tool call is processed
+- [ ] Upstream notification forwarding - tools/list_changed and other server-initiated upstream
+notifications forwarded to downstream SSE clients with replay on reconnect (#214)
+- [ ] Audit log - structured persistent log of every tool call: client identity, team, tool name,
+upstream server, outcome, timestamp
+- [ ] Request tracing - OpenTelemetry trace context propagated through gateway and forwarded to
+upstream where possible
+- [ ] Third-party SaaS MCP proxy - remote HTTP MCP endpoints for services like GitHub, Slack, etc.
+supported through the same upstream provider config model
+- [ ] End-to-end documentation and test coverage for proxy mode (#218)
+
+### Out of Scope
+
+- Stdio upstream MCP processes - execution boundary undefined, risk of process isolation issues;
+deferred to a later phase behind an explicit feature gate (#217)
+- Server-side upstream credential storage - pass-through model replaces the need; vault integration
+adds complexity without benefit given the chosen auth model
+- Attribute-based access control (ABAC) - team-level allow/deny covers v1 needs; ABAC adds
+authoring overhead before any team has adopted the gateway
+- Public internet exposure - on-prem/private cloud deployment only; no multi-cloud SaaS distribution
+in scope
+
+## Context
+
+- **Existing codebase:** mcp4openapi is a TypeScript/Node.js MCP server (Express, MCP SDK 1.26.0,
+jose for JWT, Zod for schema validation). The HTTP transport already handles SSE sessions,
+multi-tenancy, OAuth provider, and interceptor chains (auth -> rate-limit -> retry -> fetch).
+- **Tracking issue:** davidruzicka/mcp4openapi#211 groups the full MCP proxy roadmap. Issues
+#213-#218 map directly to the active requirements above. #212 (upstream config schema) is done.
+- **Deployment target:** On-prem / private cloud. No public internet exposure. Docker/Kubernetes
+packaging assumed.
+- **Client auth model:** SSO/OIDC tokens from the company IdP (Entra ID, Okta, Keycloak) for
+interactive users; API keys for machine-to-machine. Both paths must resolve to a team identity
+before policy is checked.
+- **Upstream auth model:** Pass-through. Clients supply their own upstream credentials at HTTP
+session initialization. The gateway extracts and stores them in the session context, then forwards
+them on each upstream call. No credential storage or rotation responsibility on the gateway.
+- **Security posture:** SSRF protection already in place. Token redaction in logs. Trust boundaries:
+inbound client auth and upstream auth are fully separate layers.
+
+## Constraints
+
+- **Tech stack:** TypeScript 5 / Node.js 22 / ESM - no runtime changes; extend, don't replace
+- **MCP protocol:** MCP spec 2025-03-26 compliance must be preserved end-to-end (client <-> gateway
+<-> upstream)
+- **Security:** Inbound client identity must be verified before any upstream connection is
+established; upstream credentials must never leak into logs or error responses
+- **Compatibility:** Existing OpenAPI-backed tool generation must continue working unchanged;
+proxy mode is additive, not a replacement
+
+## Key Decisions
+
+| Decision | Rationale | Outcome |
+|----------|-----------|---------|
+| Pass-through upstream credentials | Gateway stores no secrets - client owns their own upstream tokens; simpler security model, no vault dependency | Validated in Phase 01 - profile-per-upstream model, `token: string \| undefined` passed directly |
+| Profile-per-upstream (not session-level credential aggregation) | Simpler than per-session credential bag; one profile = one upstream = one token env var | Validated in Phase 01 - dead X-Upstream-Authorization extractor removed |
+| Remote HTTP upstream first, stdio deferred | Stdio adds process isolation complexity; HTTP upstream covers the primary enterprise use case first | - Pending |
+| Build on mcp4openapi transport stack | Existing SSE session management, OAuth provider, multi-tenant HTTP transport are production-grade; extend rather than rewrite | - Pending |
+| Team-level allow/deny (not RBAC/ABAC) | Explicit allow/deny per team is auditable and predictable; ABAC adds authoring overhead before adoption | - Pending |
+| Tool namespacing by upstream provider | Prevents tool name collisions across providers; makes audit logs and policy rules unambiguous | - Pending |
+
+---
+*Last updated: 2026-03-30 after Phase 01 completion*
+
+## Evolution
+
+This document evolves at phase transitions and milestone boundaries.
+
+**After each phase transition** (via `/gsd:transition`):
+1. Requirements invalidated? -> Move to Out of Scope with reason
+2. Requirements validated? -> Move to Validated with phase reference
+3. New requirements emerged? -> Add to Active
+4. Decisions to log? -> Add to Key Decisions
+5. "What This Is" still accurate? -> Update if drifted
+
+**After each milestone** (via `/gsd:complete-milestone`):
+1. Full review of all sections
+2. Core Value check - still the right priority?
+3. Audit Out of Scope - reasons still valid?
+4. Update Context with current state
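
Reviewer note on the "lazy connect, concurrent-safe `getOrConnect`" item in PROJECT.md: the source of `UpstreamConnectionManager` is not visible in this view, so the following is only a minimal sketch of one common way to get that behaviour, by caching the in-flight promise. `LazyUpstreamConnection`, `UpstreamConnection`, and `ConnectFn` are hypothetical names introduced for illustration, not the project's actual types.

```typescript
// Hypothetical connection handle; the real type in the codebase may differ.
interface UpstreamConnection {
  readonly url: string;
  close(): Promise<void>;
}

// Hypothetical connect function, injected so the sketch stays transport-agnostic.
type ConnectFn = (url: string, token: string | undefined) => Promise<UpstreamConnection>;

class LazyUpstreamConnection {
  private readonly url: string;
  private readonly token: string | undefined;
  private readonly connect: ConnectFn;
  private pending: Promise<UpstreamConnection> | undefined;

  constructor(url: string, token: string | undefined, connect: ConnectFn) {
    this.url = url;
    this.token = token;
    this.connect = connect;
  }

  // Nothing is opened until the first call. Concurrent callers receive the
  // same in-flight promise, so at most one connect attempt runs at a time.
  getOrConnect(): Promise<UpstreamConnection> {
    const existing = this.pending;
    if (existing !== undefined) return existing;
    const attempt = this.connect(this.url, this.token);
    this.pending = attempt;
    // Forget a failed attempt so that a later call can retry.
    attempt.catch(() => {
      if (this.pending === attempt) this.pending = undefined;
    });
    return attempt;
  }

  // Session-scoped cleanup: close the connection only if one was ever requested.
  async close(): Promise<void> {
    const current = this.pending;
    this.pending = undefined;
    if (current === undefined) return;
    try {
      const connection = await current;
      await connection.close();
    } catch {
      // The connect attempt itself failed, so there is nothing to close.
    }
  }
}
```

In this sketch a session would hold one such object per upstream profile and call `close()` from its teardown path. Storing the promise rather than the resolved connection is what makes two overlapping first tool calls share a single connect attempt.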

.planning/REQUIREMENTS.md

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
+# Requirements - Enterprise MCP Gateway
+
+Generated: 2026-03-27
+Project: mcp4openapi enterprise MCP proxy/gate
+Milestone: v1 - Proxy foundation + security gate
+
+---
+
+## v1 Requirements
+
+### Proxy Core
+
+- [x] **PROXY-01**: A downstream client session connecting to a profile backed by an upstream MCP server
+creates a per-session upstream HTTP connection on first tool use (lazy, not at session init)
+- [x] **PROXY-02**: Client-supplied upstream credentials (Bearer token, custom header, OAuth token)
+provided at session initialization are stored in the session context and forwarded to the upstream
+MCP server for all requests in that session; the gateway stores no credentials server-side
+- [x] **PROXY-03**: A tools/list request from a downstream client returns the tool list fetched from
+the upstream MCP server defined in the active profile (same profile-per-upstream model as OpenAPI
+profiles; no aggregation or namespacing across providers)
+- [x] **PROXY-04**: A tools/call request is routed to the upstream MCP server defined in the active
+profile and the upstream response is returned to the downstream client with typed error mapping for
+upstream failure cases
+
+### Client Authentication
+
+- [ ] **AUTH-01**: Inbound client presenting a JWT is validated against the JWKS endpoint of the
+configured identity provider (Entra ID, Okta, or Keycloak); session is rejected if validation
+fails before any upstream connection is made
+- [ ] **AUTH-02**: Inbound M2M client presenting an API key is validated against a configured API
+key store; a valid key resolves to a client identity before session is established
+- [ ] **AUTH-03**: Client identity (resolved from SSO JWT or API key) is attached to the session
+context and included in every audit log entry for that session
+
+### Security
+
+- [x] **SEC-01**: Tool definitions received from an upstream MCP server are sanitized before being
+forwarded to downstream clients; tool names and descriptions are validated against a safe-string
+allowlist to prevent tool poisoning and prompt injection via upstream tool metadata
+- [x] **SEC-02**: Upstream credential values are redacted from all logs, error responses, and
+diagnostic output; existing token-redaction infrastructure is extended to cover the new
+upstream-credential session fields
+
+### Observability
+
+- [ ] **OBS-01**: Every tools/call request produces a structured audit log entry containing: session
+ID, resolved client identity, tool name, upstream server URL (host only, no credentials),
+invocation outcome (success/error code), and wall-clock duration
+- [ ] **OBS-02**: Prometheus metrics expose per-upstream and per-client-identity counters and
+latency histograms for tools/list and tools/call requests; existing prom-client registry is
+extended (no second registry)
+- [ ] **OBS-03**: GET /health returns 200 when the server is running; GET /ready returns 200 when
+at least one profile is loaded and the server can accept sessions; both endpoints are unauthenticated
+
+### Reliability
+
+- [x] **REL-01**: Application-level heartbeat pings are sent on upstream SSE connections at a
+configurable interval (default 30s) to detect silent disconnects before a tool call fails
+- [x] **REL-02**: A session reaper runs on a configurable interval (default 60s) and closes
+upstream connections for sessions that have been inactive beyond the session timeout; no upstream
+connections are leaked when downstream clients disconnect without explicit close
+- [x] **REL-03**: Upstream failure cases (connection timeout, auth failure, server unavailable,
+malformed response) produce typed error responses to the downstream client with correlation IDs;
+no raw stack traces or upstream credential fragments in error payloads
+- [x] **REL-04**: Upstream notifications/tools/list_changed events received on a live upstream
+session are forwarded to the connected downstream SSE client; if no stream is attached,
+notifications are queued and replayed on reconnect using existing SSE replay infrastructure
+
+---
+
+## v2 Requirements (Deferred)
+
+### Policy
+
+- Team-level allow/deny policy: client identity maps to a policy that allows or denies specific
+upstream MCP servers or tool name patterns - deferred until v1 adoption demonstrates which
+granularity teams need
+- Policy dry-run mode: evaluate policy without enforcing, surface what would be denied
+
+### Observability
+
+- OpenTelemetry request tracing with trace context propagated to upstream MCP servers - deferred
+until core pipeline is stable; audit log + Prometheus covers operational needs for v1
+- Per-tool budget and rate limiting by team identity
+
+### Upstream Sources
+
+- Third-party SaaS MCP endpoints (GitHub, Slack, etc.) - same model as internal HTTP upstreams,
+unblocked by v1; explicit phase for auth/trust configuration differences
+- Stdio upstream MCP processes - deferred until process isolation boundary is defined
+
+### Advanced Proxy
+
+- Tool definition pinning: administrator can pin upstream tool schemas to detect upstream rug-pull
+changes between deployments
+
+---
+
+## Out of Scope
+
+- Server-side upstream credential storage - pass-through model replaces the need; no vault
+integration in scope
+- Tool namespacing/aggregation across multiple upstream providers in a single profile - profile-
+per-upstream model is the architecture; aggregation is a separate product decision
+- Attribute-based access control (ABAC) - team allow/deny covers v1; ABAC adds authoring overhead
+- Admin UI - CLI and profile config files are the management interface
+- Public internet / multi-cloud SaaS distribution - on-prem/private cloud deployment only
+
+---
+
+## Traceability
+
+| REQ-ID | Phase | Status |
+|--------|-------|--------|
+| PROXY-01 | Phase 1 | Complete |
+| PROXY-02 | Phase 1 | Complete |
+| PROXY-03 | Phase 2 | Complete |
+| PROXY-04 | Phase 2 | Complete |
+| AUTH-01 | Phase 4 | Pending |
+| AUTH-02 | Phase 3 | Pending |
+| AUTH-03 | Phase 3 (partial), Phase 4 (complete) | Pending |
+| SEC-01 | Phase 2 | Complete |
+| SEC-02 | Phase 1 | Complete |
+| OBS-01 | Phase 5 | Pending |
+| OBS-02 | Phase 5 | Pending |
+| OBS-03 | Phase 5 | Pending |
+| REL-01 | Phase 1 | Complete |
+| REL-02 | Phase 1 | Complete |
+| REL-03 | Phase 1 | Complete |
+| REL-04 | Phase 2 | Complete |
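
Reviewer note on SEC-01 in REQUIREMENTS.md: the requirement calls for validating upstream tool names and descriptions against a safe-string allowlist, but the patterns themselves are not shown in this view. The sketch below is one possible reading of that requirement. The two regular expressions, the `UpstreamTool` shape, and the `sanitizeTools` name are assumptions made for illustration; a real implementation would also need to vet each tool's input schema, which is left out here for brevity.

```typescript
// Hypothetical shape of one entry in an upstream tools/list result.
interface UpstreamTool {
  name: string;
  description?: string;
}

interface SanitizeResult {
  accepted: UpstreamTool[];
  // Positions of rejected entries; unsafe strings are never echoed back.
  rejectedIndexes: number[];
}

// Assumed allowlists: short identifier-like names, and descriptions limited
// to printable ASCII plus newlines. The project's real patterns may differ.
const SAFE_TOOL_NAME = /^[A-Za-z0-9_.-]{1,64}$/;
const SAFE_DESCRIPTION = /^[\x20-\x7E\n]{0,1024}$/;

function sanitizeTools(tools: readonly UpstreamTool[]): SanitizeResult {
  const accepted: UpstreamTool[] = [];
  const rejectedIndexes: number[] = [];
  tools.forEach((tool, index) => {
    const nameOk = SAFE_TOOL_NAME.test(tool.name);
    const descriptionOk =
      tool.description === undefined || SAFE_DESCRIPTION.test(tool.description);
    if (nameOk && descriptionOk) {
      // Copy only the vetted fields instead of forwarding the upstream object.
      accepted.push(
        tool.description === undefined
          ? { name: tool.name }
          : { name: tool.name, description: tool.description },
      );
    } else {
      rejectedIndexes.push(index);
    }
  });
  return { accepted, rejectedIndexes };
}
```

Returning indexes rather than the offending names keeps unsafe upstream strings out of logs and error payloads, which fits the redaction goals stated in SEC-02 and REL-03. Whether to drop a rejected tool silently or fail the whole tools/list response is a policy choice this sketch does not settle.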
