From 5e55c0990397ef3433ac812572b7544e8a413ad1 Mon Sep 17 00:00:00 2001 From: Juan Antonio Osorio Date: Thu, 12 Mar 2026 09:03:11 +0200 Subject: [PATCH 1/9] RFC: MCPServerEntry CRD for direct remote MCP server backends Introduces a new MCPServerEntry CRD that lets VirtualMCPServer connect directly to remote MCP servers without MCPRemoteProxy infrastructure, resolving the forced-auth (#3104) and dual-boundary confusion (#4109) issues. Co-Authored-By: Claude Opus 4.6 --- ...X-mcpserverentry-direct-remote-backends.md | 1173 +++++++++++++++++ 1 file changed, 1173 insertions(+) create mode 100644 rfcs/THV-XXXX-mcpserverentry-direct-remote-backends.md diff --git a/rfcs/THV-XXXX-mcpserverentry-direct-remote-backends.md b/rfcs/THV-XXXX-mcpserverentry-direct-remote-backends.md new file mode 100644 index 0000000..7b061f1 --- /dev/null +++ b/rfcs/THV-XXXX-mcpserverentry-direct-remote-backends.md @@ -0,0 +1,1173 @@ +# RFC-XXXX: MCPServerEntry CRD for Direct Remote MCP Server Backends + +- **Status**: Draft +- **Author(s)**: Juan Antonio Osorio (@jaosorior) +- **Created**: 2026-03-12 +- **Last Updated**: 2026-03-12 +- **Target Repository**: toolhive +- **Related Issues**: [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104), [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109) + +## Summary + +Introduce a new `MCPServerEntry` CRD (short name: `mcpentry`) that allows +VirtualMCPServer to connect directly to remote MCP servers without deploying +MCPRemoteProxy infrastructure. MCPServerEntry is a lightweight, pod-less +configuration resource that declares a remote MCP endpoint and belongs to an +MCPGroup, enabling vMCP to reach remote servers with a single auth boundary +and zero additional pods. + +## Problem Statement + +vMCP currently relies on MCPRemoteProxy (which spawns `thv-proxyrunner` pods) +to reach remote MCP servers. This architecture creates three concrete problems: + +### 1. 
Forced Authentication on Public Remotes (Issue #3104) + +MCPRemoteProxy requires OIDC authentication configuration even when vMCP +already handles client authentication at its own boundary. This blocks +unauthenticated public remote MCP servers (e.g., context7, public API +gateways) from being placed behind vMCP without configuring unnecessary +auth on the proxy layer. + +### 2. Dual Auth Boundary Confusion (Issue #4109) + +MCPRemoteProxy's single `externalAuthConfigRef` field is used for both the +vMCP-to-proxy boundary AND the proxy-to-remote boundary. When vMCP needs +to authenticate to the remote server through the proxy, token exchange +becomes circular or broken because the same auth config serves two +conflicting purposes: + +``` +Client -> vMCP [boundary 1: client auth] + -> MCPRemoteProxy [boundary 2: vMCP auth + remote auth on SAME config] + -> Remote Server +``` + +The operator cannot express "use auth X for the proxy and auth Y for the +remote" because there is only one `externalAuthConfigRef`. + +### 3. Resource Waste + +Every remote MCP server behind vMCP requires a full Deployment + Service + +Pod just to make an HTTP call that vMCP could make directly. For +organizations with many remote MCP backends, this creates unnecessary +infrastructure cost and operational overhead. 
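The dual-boundary problem (2 above) can be made concrete with a sketch of the current shape. Only the `MCPRemoteProxy` kind and the `externalAuthConfigRef` field are taken from this RFC; the other field names are illustrative assumptions, not the actual MCPRemoteProxy schema:

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPRemoteProxy
metadata:
  name: salesforce-proxy
spec:
  remoteURL: https://mcp.salesforce.com   # assumed field name, for illustration
  # One reference, two conflicting jobs:
  #   boundary 2a: authenticate vMCP to this proxy
  #   boundary 2b: authenticate the proxy to the remote server
  externalAuthConfigRef:
    name: salesforce-auth
```

There is no second field in which to put a different auth config for the other boundary, which is exactly the gap MCPServerEntry closes.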
+ +### Who Is Affected + +- **Platform teams** deploying vMCP with remote MCP backends in Kubernetes +- **Product teams** wanting to register external MCP services behind vMCP +- **Organizations** running public or unauthenticated remote MCP servers + behind vMCP for aggregation + +## Goals + +- Enable vMCP to connect directly to remote MCP servers without + MCPRemoteProxy in the path +- Eliminate the dual auth boundary confusion by providing a single, + unambiguous auth config for the vMCP-to-remote boundary +- Allow unauthenticated remote MCP servers behind vMCP without workarounds +- Deploy zero additional infrastructure (no pods, services, or deployments) + for remote backend declarations +- Follow existing Kubernetes patterns (groupRef, externalAuthConfigRef) + consistent with MCPServer + +## Non-Goals + +- **Deprecating MCPRemoteProxy**: MCPRemoteProxy remains valuable for + standalone proxy use cases with its own auth middleware, audit logging, + and observability. MCPServerEntry is specifically for "behind vMCP" use + cases. +- **Adding health probing from the operator**: The operator controller + should NOT probe remote URLs. Reachability from the operator pod does not + imply reachability from the vMCP pod, and probing expands the operator's + attack surface. Health checking belongs in vMCP's existing runtime + infrastructure (`healthCheckInterval`, circuit breaker). +- **Cross-namespace references**: MCPServerEntry follows the same + namespace-scoped patterns as other ToolHive CRDs. +- **Supporting stdio or container-based transports**: MCPServerEntry is + exclusively for remote HTTP-based MCP servers. +- **CLI mode support**: MCPServerEntry is a Kubernetes-only CRD. CLI mode + already supports remote backends via direct configuration. + +## Proposed Solution + +### High-Level Design + +Introduce a new `MCPServerEntry` CRD that acts as a catalog entry for a +remote MCP endpoint. 
The naming follows the Istio `ServiceEntry` pattern, +communicating "this is a catalog entry, not an active workload." + +```mermaid +graph TB + subgraph "Client Layer" + Client[MCP Client] + end + + subgraph "Virtual MCP Server" + InAuth[Incoming Auth
Validates: aud=vmcp] + Router[Request Router] + AuthMgr[Backend Auth Manager] + end + + subgraph "Backend Layer (In-Cluster)" + MCPServer1[MCPServer: github-mcp
Pod + Service] + MCPServer2[MCPServer: jira-mcp
Pod + Service] + end + + subgraph "Backend Layer (Remote)" + Entry1[MCPServerEntry: context7
No pods - config only] + Entry2[MCPServerEntry: salesforce
No pods - config only] + end + + subgraph "External Services" + Remote1[context7.com/mcp] + Remote2[mcp.salesforce.com] + end + + Client -->|Token: aud=vmcp| InAuth + InAuth --> Router + Router --> AuthMgr + + AuthMgr -->|In-cluster call| MCPServer1 + AuthMgr -->|In-cluster call| MCPServer2 + AuthMgr -->|Direct HTTPS
+ externalAuthConfig| Remote1 + AuthMgr -->|Direct HTTPS
+ externalAuthConfig| Remote2 + + Entry1 -.->|Declares endpoint| Remote1 + Entry2 -.->|Declares endpoint| Remote2 + + style Entry1 fill:#fff3e0,stroke:#ff9800 + style Entry2 fill:#fff3e0,stroke:#ff9800 + style MCPServer1 fill:#e3f2fd,stroke:#2196f3 + style MCPServer2 fill:#e3f2fd,stroke:#2196f3 +``` + +The key insight is that MCPServerEntry deploys **no infrastructure**. It is +pure configuration that tells vMCP "there is a remote MCP server at this +URL, use this auth to reach it." VirtualMCPServer discovers MCPServerEntry +resources the same way it discovers MCPServer resources: via `groupRef`. + +### Auth Flow Comparison + +**Current (with MCPRemoteProxy) - Two boundaries, one config:** + +``` +Client -> (token: aud=vmcp) -> vMCP [incoming auth boundary] + -> MCPRemoteProxy [deploys pod] + externalAuthConfigRef used for BOTH: + - vMCP-to-proxy auth (boundary 2a) + - proxy-to-remote auth (boundary 2b) + -> Remote Server +``` + +**Proposed (with MCPServerEntry) - One clean boundary:** + +``` +Client -> (token: aud=vmcp) -> vMCP [incoming auth boundary] + -> MCPServerEntry: vMCP applies externalAuthConfigRef directly + -> Remote Server + (ONE boundary, ONE auth config, no confusion) +``` + +```mermaid +sequenceDiagram + participant Client + participant vMCP as Virtual MCP Server + participant IDP as Identity Provider + participant Remote as Remote MCP Server + + Client->>vMCP: MCP Request
Authorization: Bearer token (aud=vmcp) + + Note over vMCP: Validate incoming token
(existing auth middleware) + + Note over vMCP: Look up MCPServerEntry
for target backend + + alt externalAuthConfigRef is set + vMCP->>IDP: Token exchange request
(per MCPExternalAuthConfig) + IDP-->>vMCP: Exchanged token (aud=remote-api) + vMCP->>Remote: Forward request
Authorization: Bearer exchanged-token + else No auth configured (public remote) + vMCP->>Remote: Forward request
(no Authorization header) + end + + Remote-->>vMCP: MCP Response + vMCP-->>Client: Response +``` + +### Detailed Design + +#### MCPServerEntry CRD + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPServerEntry +metadata: + name: context7 + namespace: default +spec: + # REQUIRED: URL of the remote MCP server + remoteURL: https://mcp.context7.com/mcp + + # REQUIRED: Transport protocol + # +kubebuilder:validation:Enum=streamable-http;sse + transport: streamable-http + + # REQUIRED: Group membership (unlike MCPServer where it's optional) + # An MCPServerEntry without a group is dead config - it cannot be + # discovered by any VirtualMCPServer. + groupRef: engineering-team + + # OPTIONAL: Auth configuration for reaching the remote server. + # Omit entirely for unauthenticated public remotes (resolves #3104). + # Single unambiguous purpose: auth to the remote (resolves #4109). + externalAuthConfigRef: + name: salesforce-auth + + # OPTIONAL: Header forwarding configuration. + # Reuses existing pattern from MCPRemoteProxy (THV-0026). + headerForward: + addPlaintextHeaders: + X-Tenant-ID: "tenant-123" + addHeadersFromSecrets: + - headerName: X-API-Key + valueSecretRef: + name: remote-api-credentials + key: api-key + + # OPTIONAL: Custom CA bundle for private remote servers using + # internal/self-signed certificates. 
+ caBundleRef: + name: internal-ca-bundle + key: ca.crt +``` + +**Example: Unauthenticated public remote (resolves #3104):** + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPServerEntry +metadata: + name: context7 +spec: + remoteURL: https://mcp.context7.com/mcp + transport: streamable-http + groupRef: engineering-team + # No externalAuthConfigRef - public endpoint, no auth needed +``` + +**Example: Authenticated remote with token exchange:** + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPServerEntry +metadata: + name: salesforce-mcp +spec: + remoteURL: https://mcp.salesforce.com + transport: streamable-http + groupRef: engineering-team + externalAuthConfigRef: + name: salesforce-token-exchange +--- +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPExternalAuthConfig +metadata: + name: salesforce-token-exchange +spec: + type: tokenExchange + tokenExchange: + tokenUrl: https://keycloak.company.com/realms/myrealm/protocol/openid-connect/token + clientId: salesforce-exchange + clientSecretRef: + name: salesforce-oauth + key: client-secret + audience: mcp.salesforce.com + scopes: ["mcp:read", "mcp:write"] +``` + +**Example: Remote with static header auth:** + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPServerEntry +metadata: + name: internal-api-mcp +spec: + remoteURL: https://internal-mcp.corp.example.com/mcp + transport: sse + groupRef: engineering-team + headerForward: + addHeadersFromSecrets: + - headerName: Authorization + valueSecretRef: + name: internal-api-token + key: bearer-token + caBundleRef: + name: corp-ca-bundle + key: ca.crt +``` + +#### CRD Type Definitions + +```go +// MCPServerEntry declares a remote MCP server endpoint as a backend for +// VirtualMCPServer. Unlike MCPServer (which deploys container workloads) +// or MCPRemoteProxy (which deploys proxy pods), MCPServerEntry is a +// pure configuration resource that deploys no infrastructure. 
+// +// +kubebuilder:object:root=true +// +kubebuilder:subresource:status +// +kubebuilder:resource:shortName=mcpentry +// +kubebuilder:printcolumn:name="URL",type=string,JSONPath=`.spec.remoteURL` +// +kubebuilder:printcolumn:name="Transport",type=string,JSONPath=`.spec.transport` +// +kubebuilder:printcolumn:name="Group",type=string,JSONPath=`.spec.groupRef` +// +kubebuilder:printcolumn:name="Ready",type=string,JSONPath=`.status.conditions[?(@.type=="Ready")].status` +// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp` +type MCPServerEntry struct { + metav1.TypeMeta `json:",inline"` + metav1.ObjectMeta `json:"metadata,omitempty"` + + Spec MCPServerEntrySpec `json:"spec,omitempty"` + Status MCPServerEntryStatus `json:"status,omitempty"` +} + +type MCPServerEntrySpec struct { + // RemoteURL is the URL of the remote MCP server. + // Must use HTTPS unless the toolhive.stacklok.dev/allow-insecure + // annotation is set to "true" (for development only). + // +kubebuilder:validation:Required + // +kubebuilder:validation:Pattern=`^https?://` + RemoteURL string `json:"remoteURL"` + + // Transport specifies the MCP transport protocol. + // +kubebuilder:validation:Required + // +kubebuilder:validation:Enum=streamable-http;sse + Transport string `json:"transport"` + + // GroupRef is the name of the MCPGroup this entry belongs to. + // Required because an MCPServerEntry without a group cannot be + // discovered by any VirtualMCPServer. + // +kubebuilder:validation:Required + // +kubebuilder:validation:MinLength=1 + GroupRef string `json:"groupRef"` + + // ExternalAuthConfigRef references an MCPExternalAuthConfig in the + // same namespace for authenticating to the remote server. + // Omit for unauthenticated public endpoints. + // +optional + ExternalAuthConfigRef *ExternalAuthConfigRef `json:"externalAuthConfigRef,omitempty"` + + // HeaderForward configures additional headers to inject into + // requests forwarded to the remote server. 
+ // +optional + HeaderForward *HeaderForwardConfig `json:"headerForward,omitempty"` + + // CABundleRef references a ConfigMap or Secret containing a custom + // CA certificate bundle for TLS verification of the remote server. + // Useful for remote servers with private/internal CA certificates. + // +optional + CABundleRef *SecretKeyRef `json:"caBundleRef,omitempty"` +} + +type MCPServerEntryStatus struct { + // Conditions represent the latest available observations of the + // MCPServerEntry's state. + // +optional + Conditions []metav1.Condition `json:"conditions,omitempty"` + + // ObservedGeneration is the most recent generation observed. + // +optional + ObservedGeneration int64 `json:"observedGeneration,omitempty"` +} +``` + +**Condition types:** + +| Type | Purpose | When Set | +|------|---------|----------| +| `Ready` | Overall readiness | Always | +| `GroupRefValid` | Referenced MCPGroup exists | Always | +| `AuthConfigValid` | Referenced MCPExternalAuthConfig exists | Only when `externalAuthConfigRef` is set | +| `CABundleValid` | Referenced CA bundle exists | Only when `caBundleRef` is set | + +There is intentionally **no `RemoteReachable` condition**. The controller +should NOT probe remote URLs because: + +1. Reachability from the operator pod does not imply reachability from the + vMCP pod (different network policies, egress rules, DNS resolution). +2. Probing external URLs from the operator expands its attack surface and + requires egress network access it may not have. +3. It gives false confidence: a probe succeeding now doesn't mean it will + succeed when vMCP makes the actual request. +4. vMCP already has health checking infrastructure (`healthCheckInterval`, + circuit breaker) that operates at the right layer. 
+ +#### Status Example + +```yaml +status: + conditions: + - type: Ready + status: "True" + reason: ValidationSucceeded + message: "MCPServerEntry is valid and ready for discovery" + lastTransitionTime: "2026-03-12T10:00:00Z" + - type: GroupRefValid + status: "True" + reason: GroupExists + message: "MCPGroup 'engineering-team' exists" + lastTransitionTime: "2026-03-12T10:00:00Z" + - type: AuthConfigValid + status: "True" + reason: AuthConfigExists + message: "MCPExternalAuthConfig 'salesforce-auth' exists" + lastTransitionTime: "2026-03-12T10:00:00Z" + observedGeneration: 1 +``` + +#### Component Changes + +##### Operator: New CRD and Controller + +**New files:** +- `cmd/thv-operator/api/v1alpha1/mcpserverentry_types.go` - CRD type + definitions +- `cmd/thv-operator/controllers/mcpserverentry_controller.go` - + Validation-only controller + +The MCPServerEntry controller is intentionally simple. It performs +**validation only** and creates **no infrastructure**: + +```go +func (r *MCPServerEntryReconciler) Reconcile( + ctx context.Context, req ctrl.Request, +) (ctrl.Result, error) { + var entry mcpv1alpha1.MCPServerEntry + if err := r.Get(ctx, req.NamespacedName, &entry); err != nil { + return ctrl.Result{}, client.IgnoreNotFound(err) + } + + statusManager := NewStatusManager(&entry) + + // Validate groupRef exists + var group mcpv1alpha1.MCPGroup + if err := r.Get(ctx, client.ObjectKey{ + Namespace: entry.Namespace, + Name: entry.Spec.GroupRef, + }, &group); err != nil { + if apierrors.IsNotFound(err) { + statusManager.SetCondition("GroupRefValid", "GroupNotFound", + fmt.Sprintf("MCPGroup %q not found", entry.Spec.GroupRef), + metav1.ConditionFalse) + statusManager.SetCondition("Ready", "ValidationFailed", + "Referenced MCPGroup does not exist", metav1.ConditionFalse) + return r.updateStatus(ctx, &entry, statusManager) + } + return ctrl.Result{}, err + } + statusManager.SetCondition("GroupRefValid", "GroupExists", + fmt.Sprintf("MCPGroup %q exists", 
entry.Spec.GroupRef), + metav1.ConditionTrue) + + // Validate externalAuthConfigRef if set + if entry.Spec.ExternalAuthConfigRef != nil { + var authConfig mcpv1alpha1.MCPExternalAuthConfig + if err := r.Get(ctx, client.ObjectKey{ + Namespace: entry.Namespace, + Name: entry.Spec.ExternalAuthConfigRef.Name, + }, &authConfig); err != nil { + if apierrors.IsNotFound(err) { + statusManager.SetCondition("AuthConfigValid", + "AuthConfigNotFound", + fmt.Sprintf("MCPExternalAuthConfig %q not found", + entry.Spec.ExternalAuthConfigRef.Name), + metav1.ConditionFalse) + statusManager.SetCondition("Ready", "ValidationFailed", + "Referenced auth config does not exist", + metav1.ConditionFalse) + return r.updateStatus(ctx, &entry, statusManager) + } + return ctrl.Result{}, err + } + statusManager.SetCondition("AuthConfigValid", "AuthConfigExists", + fmt.Sprintf("MCPExternalAuthConfig %q exists", + entry.Spec.ExternalAuthConfigRef.Name), + metav1.ConditionTrue) + } + + // Validate HTTPS requirement + if !strings.HasPrefix(entry.Spec.RemoteURL, "https://") { + if entry.Annotations["toolhive.stacklok.dev/allow-insecure"] != "true" { + statusManager.SetCondition("Ready", "InsecureURL", + "remoteURL must use HTTPS (set annotation "+ + "toolhive.stacklok.dev/allow-insecure=true to override)", + metav1.ConditionFalse) + return r.updateStatus(ctx, &entry, statusManager) + } + } + + statusManager.SetCondition("Ready", "ValidationSucceeded", + "MCPServerEntry is valid and ready for discovery", + metav1.ConditionTrue) + + return r.updateStatus(ctx, &entry, statusManager) +} + +func (r *MCPServerEntryReconciler) SetupWithManager( + mgr ctrl.Manager, +) error { + return ctrl.NewControllerManagedBy(mgr). + For(&mcpv1alpha1.MCPServerEntry{}). + Watches(&mcpv1alpha1.MCPGroup{}, + handler.EnqueueRequestsFromMapFunc( + r.findEntriesForGroup, + )). + Watches(&mcpv1alpha1.MCPExternalAuthConfig{}, + handler.EnqueueRequestsFromMapFunc( + r.findEntriesForAuthConfig, + )). 
+ Complete(r) +} +``` + +No finalizers are needed because MCPServerEntry creates no infrastructure +to clean up. + +##### Operator: MCPGroup Controller Update + +The MCPGroup controller must be updated to watch MCPServerEntry resources +in addition to MCPServer resources, so that `status.servers` and +`status.serverCount` reflect both types of backends in the group. + +**Files to modify:** +- `cmd/thv-operator/controllers/mcpgroup_controller.go` - Add watch for + MCPServerEntry, update status aggregation + +##### Operator: VirtualMCPServer Controller Update + +**Static mode (`outgoingAuth.source: inline`):** The operator generates +the ConfigMap that vMCP reads at startup. This ConfigMap must now include +MCPServerEntry backends alongside MCPServer backends. + +The controller discovers MCPServerEntry resources in the group and +serializes them as remote backend entries in the ConfigMap: + +```yaml +# Generated ConfigMap content +backends: + # From MCPServer resources (existing) + - name: github-mcp + url: http://github-mcp.default.svc:8080 + transport: sse + type: container + auth: + type: token_exchange + # ... + + # From MCPServerEntry resources (new) + - name: context7 + url: https://mcp.context7.com/mcp + transport: streamable-http + type: entry # New backend type + # No auth - public endpoint + + - name: salesforce-mcp + url: https://mcp.salesforce.com + transport: streamable-http + type: entry + auth: + type: token_exchange + # ... 
+``` + +**Files to modify:** +- `cmd/thv-operator/controllers/virtualmcpserver_controller.go` - Discover + MCPServerEntry resources in group +- `cmd/thv-operator/controllers/virtualmcpserver_vmcpconfig.go` - Include + entry backends in ConfigMap generation + +##### vMCP: Backend Type and Discovery + +**New backend type:** + +```go +// In pkg/vmcp/types.go +const ( + BackendTypeContainer BackendType = "container" + BackendTypeProxy BackendType = "proxy" + BackendTypeEntry BackendType = "entry" // New +) +``` + +**Discovery updates:** + +```go +// In pkg/vmcp/workloads/k8s.go +func (m *K8sWorkloadManager) ListWorkloadsInGroup( + ctx context.Context, groupName string, +) ([]Backend, error) { + var backends []Backend + + // Existing: discover MCPServer resources + mcpServers, err := m.discoverMCPServers(ctx, groupName) + if err != nil { + return nil, fmt.Errorf("discovering MCPServers: %w", err) + } + backends = append(backends, mcpServers...) + + // New: discover MCPServerEntry resources + entries, err := m.discoverMCPServerEntries(ctx, groupName) + if err != nil { + return nil, fmt.Errorf("discovering MCPServerEntries: %w", err) + } + backends = append(backends, entries...) 
+ + return backends, nil +} + +func (m *K8sWorkloadManager) discoverMCPServerEntries( + ctx context.Context, groupName string, +) ([]Backend, error) { + var entryList mcpv1alpha1.MCPServerEntryList + if err := m.client.List(ctx, &entryList, + client.InNamespace(m.namespace), + client.MatchingFields{"spec.groupRef": groupName}, + ); err != nil { + return nil, err + } + + var backends []Backend + for _, entry := range entryList.Items { + backend := Backend{ + ID: fmt.Sprintf("%s/%s", entry.Namespace, entry.Name), + Name: entry.Name, + BaseURL: entry.Spec.RemoteURL, + Transport: entry.Spec.Transport, + Type: BackendTypeEntry, + } + + // Resolve auth if configured + if entry.Spec.ExternalAuthConfigRef != nil { + authConfig, err := m.resolveAuthConfig(ctx, + entry.Namespace, + entry.Spec.ExternalAuthConfigRef.Name, + ) + if err != nil { + return nil, fmt.Errorf( + "resolving auth for entry %s: %w", + entry.Name, err, + ) + } + backend.AuthConfig = authConfig + } + + // Resolve header forward config if set + if entry.Spec.HeaderForward != nil { + backend.HeaderForward = m.resolveHeaderForward( + ctx, entry.Namespace, entry.Spec.HeaderForward, + ) + } + + // Resolve CA bundle if set + if entry.Spec.CABundleRef != nil { + caBundle, err := m.resolveCABundle(ctx, + entry.Namespace, entry.Spec.CABundleRef, + ) + if err != nil { + return nil, fmt.Errorf( + "resolving CA bundle for entry %s: %w", + entry.Name, err, + ) + } + backend.CABundle = caBundle + } + + backends = append(backends, backend) + } + + return backends, nil +} +``` + +##### vMCP: HTTP Client for External TLS + +Backends of type `entry` connect to external URLs over HTTPS. The vMCP +HTTP client must be updated to: + +1. Use the system CA certificate pool by default (for public CAs). +2. Optionally append a custom CA bundle from `caBundleRef` (for private + CAs). +3. Apply the resolved `externalAuthConfigRef` credentials directly to + outgoing requests. 
+ +```go +// In pkg/vmcp/client/client.go +func (c *Client) createTransportForEntry( + backend *Backend, +) (*http.Transport, error) { + tlsConfig := &tls.Config{ + MinVersion: tls.VersionTLS12, + } + + if backend.CABundle != nil { + pool, err := x509.SystemCertPool() + if err != nil { + pool = x509.NewCertPool() + } + if !pool.AppendCertsFromPEM(backend.CABundle) { + return nil, fmt.Errorf("failed to parse CA bundle") + } + tlsConfig.RootCAs = pool + } + + return &http.Transport{ + TLSClientConfig: tlsConfig, + }, nil +} +``` + +##### vMCP: Dynamic Mode Reconciler Update + +For dynamic mode (`outgoingAuth.source: discovered`), the reconciler +infrastructure from THV-0014 must be extended to watch MCPServerEntry +resources. + +**Files to modify:** +- `pkg/vmcp/k8s/manager.go` - Register MCPServerEntry watcher +- `pkg/vmcp/k8s/mcpserverentry_watcher.go` (new) - MCPServerEntry + reconciler + +```go +type MCPServerEntryWatcher struct { + client client.Client + registry vmcp.DynamicRegistry + groupRef string +} + +func (w *MCPServerEntryWatcher) Reconcile( + ctx context.Context, req ctrl.Request, +) (ctrl.Result, error) { + backendID := req.NamespacedName.String() + + var entry mcpv1alpha1.MCPServerEntry + if err := w.client.Get(ctx, req.NamespacedName, &entry); err != nil { + if apierrors.IsNotFound(err) { + w.registry.Remove(backendID) + return ctrl.Result{}, nil + } + return ctrl.Result{}, err + } + + if entry.Spec.GroupRef != w.groupRef { + // Not in our group, remove if previously tracked + w.registry.Remove(backendID) + return ctrl.Result{}, nil + } + + backend, err := w.convertToBackend(ctx, &entry) + if err != nil { + return ctrl.Result{}, err + } + backend.ID = backendID + + if err := w.registry.Upsert(backend); err != nil { + return ctrl.Result{}, err + } + + return ctrl.Result{}, nil +} + +func (w *MCPServerEntryWatcher) SetupWithManager( + mgr ctrl.Manager, +) error { + return ctrl.NewControllerManagedBy(mgr). + For(&mcpv1alpha1.MCPServerEntry{}). 
+ Watches(&mcpv1alpha1.MCPExternalAuthConfig{}, + handler.EnqueueRequestsFromMapFunc( + w.findEntriesForAuthConfig, + )). + Watches(&corev1.Secret{}, + handler.EnqueueRequestsFromMapFunc( + w.findEntriesForSecret, + )). + Complete(w) +} +``` + +##### vMCP: Static Config Parser Update + +The static config parser must be updated to deserialize `type: entry` +backends from the ConfigMap and create appropriate HTTP clients with +external TLS support. + +**Files to modify:** +- `pkg/vmcp/config/` - Parse entry-type backends from static config + +## Security Considerations + +### Threat Model + +| Threat | Description | Mitigation | +|--------|-------------|------------| +| Man-in-the-middle on remote connection | Attacker intercepts vMCP-to-remote traffic | HTTPS required by default; custom CA bundles for private CAs | +| Credential exposure in CRD spec | Auth secrets visible in CRD manifest | Credentials stored in K8s Secrets, referenced via `externalAuthConfigRef` and `headerForward.addHeadersFromSecrets`; never inline in CRD spec | +| SSRF via remoteURL | Operator configures URL pointing to internal services | Mitigated by RBAC (only authorized users create MCPServerEntry); annotation required for non-HTTPS; NetworkPolicy should restrict vMCP egress | +| Auth config confusion (existing issue) | Dual-boundary auth leading to wrong tokens sent to wrong endpoints | Eliminated: MCPServerEntry has exactly one auth boundary with one purpose | +| Operator probing external URLs | Controller making network requests to untrusted URLs | Eliminated: controller performs validation only, no network probing | + +### Authentication and Authorization + +- **No new auth primitives**: MCPServerEntry reuses the existing + `MCPExternalAuthConfig` CRD and `externalAuthConfigRef` pattern. +- **Single boundary**: vMCP's incoming auth validates client tokens. + MCPServerEntry's `externalAuthConfigRef` handles outgoing auth to + the remote. These are cleanly separated. 
+- **RBAC**: Standard Kubernetes RBAC controls who can create/modify + MCPServerEntry resources. This enables fine-grained access: platform + teams manage VirtualMCPServer, product teams register MCPServerEntry + backends. +- **No privilege escalation**: MCPServerEntry grants no additional + permissions beyond what the referenced MCPExternalAuthConfig already + provides. + +### Data Security + +- **In transit**: HTTPS required for remote connections (with annotation + escape hatch for development). +- **At rest**: No sensitive data stored in MCPServerEntry spec. Auth + credentials are in K8s Secrets, referenced indirectly. +- **CA bundles**: Custom CA certificates referenced via `caBundleRef`, + stored in K8s Secrets/ConfigMaps with standard K8s encryption at rest. + +### Input Validation + +- **remoteURL**: Must match `^https?://` pattern. HTTPS enforced unless + annotation override. Validated by both CRD CEL rules and controller + reconciliation. +- **transport**: Enum validation (`streamable-http` or `sse`). +- **groupRef**: Required, validated to reference an existing MCPGroup. +- **externalAuthConfigRef**: When set, validated to reference an existing + MCPExternalAuthConfig. +- **headerForward**: Uses the same restricted header blocklist and + validation as MCPRemoteProxy (THV-0026). + +### Secrets Management + +- MCPServerEntry follows the same secret access patterns as MCPServer: + - **Dynamic mode**: vMCP reads secrets at runtime via K8s API + (namespace-scoped RBAC). + - **Static mode**: Operator mounts secrets as environment variables. +- Secret rotation follows existing patterns: + - **Dynamic mode**: Watch-based propagation, no pod restart needed. + - **Static mode**: Requires pod restart (Deployment rollout). + +### Audit and Logging + +- vMCP's existing audit middleware logs all requests routed to + MCPServerEntry backends, including user identity and target tool. 
+- The operator controller logs validation results (group existence, + auth config existence) at standard log levels. +- No sensitive data (URLs with credentials, auth tokens) is logged. + +### Mitigations + +1. **HTTPS enforcement**: Default requires HTTPS; annotation override + requires explicit operator action. +2. **No network probing**: Controller never connects to remote URLs. +3. **Single auth boundary**: Eliminates dual-boundary confusion. +4. **Existing patterns**: Reuses battle-tested secret access, RBAC, + and auth patterns from MCPServer. +5. **NetworkPolicy recommendation**: Documentation recommends restricting + vMCP pod egress to known remote endpoints. +6. **No new attack surface**: Zero additional pods deployed. + +## Alternatives Considered + +### Alternative 1: Add `remoteServerRefs` to VirtualMCPServer Spec + +Embed remote server configuration directly in the VirtualMCPServer CRD. + +```yaml +kind: VirtualMCPServer +spec: + groupRef: + name: engineering-team + remoteServerRefs: + - name: context7 + remoteURL: https://mcp.context7.com/mcp + transport: streamable-http + - name: salesforce + remoteURL: https://mcp.salesforce.com + transport: streamable-http + externalAuthConfigRef: + name: salesforce-auth +``` + +**Pros:** +- No new CRD needed +- Simple for small deployments + +**Cons:** +- Violates separation of concerns: VirtualMCPServer manages aggregation, + not backend declaration +- Breaks the `groupRef` discovery pattern: some backends discovered via + group, others embedded inline +- Bloats VirtualMCPServer spec +- Prevents independent lifecycle management: adding/removing a remote + backend requires editing the VirtualMCPServer, which may trigger + reconciliation of unrelated configuration +- Prevents fine-grained RBAC: only VirtualMCPServer editors can manage + remote backends + +**Why not chosen:** Inconsistent with existing patterns and prevents the +RBAC separation that makes MCPServerEntry valuable (platform teams manage +vMCP, 
product teams register backends). + +### Alternative 2: Extend MCPServer with Remote Mode + +Add a `mode: remote` field to the existing MCPServer CRD. + +```yaml +kind: MCPServer +spec: + mode: remote + remoteURL: https://mcp.context7.com/mcp + transport: streamable-http + groupRef: engineering-team +``` + +**Pros:** +- No new CRD +- Reuses existing MCPServer controller infrastructure + +**Cons:** +- MCPServer is fundamentally a container workload resource. Adding a + "don't deploy anything" mode creates confusing semantics: `spec.image` + becomes optional, `spec.resources` is meaningless, status conditions + designed for pod lifecycle don't apply. +- Controller logic becomes complex with conditional paths for + container vs remote modes. +- Existing MCPServer watchers (MCPGroup controller, VirtualMCPServer + controller) would need to handle both modes, adding complexity. +- The controller currently creates Deployments, Services, and ConfigMaps. + Adding a mode that creates none of these is a significant semantic + change. + +**Why not chosen:** Overloading MCPServer with remote-mode semantics +increases complexity and confusion. A separate CRD with clear "this is +configuration only" semantics is cleaner. + +### Alternative 3: Configure Remote Backends Only in vMCP Config + +Handle remote backends entirely in vMCP's configuration (ConfigMap or +runtime discovery) without a CRD. + +**Pros:** +- No CRD changes needed +- Simpler operator + +**Cons:** +- No Kubernetes-native resource to represent remote backends +- No status reporting, no `kubectl get` visibility +- No RBAC for who can manage remote backends +- Breaks the pattern where all backends are discoverable via `groupRef` +- MCPGroup status cannot reflect remote backends + +**Why not chosen:** Loses Kubernetes-native management, visibility, and +access control. 
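The RBAC separation argued for above (platform teams own VirtualMCPServer, product teams register backends) can be sketched as a namespace-scoped Role. The resource plural `mcpserverentries` is assumed from the CRD kind and is illustrative; the generated CRD manifest is authoritative:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: mcp-backend-editor
  namespace: default
rules:
  # Product teams: full control over remote backend declarations only.
  # VirtualMCPServer and MCPGroup rights stay with the platform team.
  - apiGroups: ["toolhive.stacklok.dev"]
    resources: ["mcpserverentries"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```

None of the alternatives considered above can express this split, because in each of them remote backends live inside a resource the platform team owns.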
+ +## Compatibility + +### Backward Compatibility + +MCPServerEntry is a purely additive change: + +- **No changes to existing CRDs**: MCPServer, MCPRemoteProxy, + VirtualMCPServer, MCPGroup, and MCPExternalAuthConfig are unchanged. +- **No changes to existing behavior**: VirtualMCPServer continues to + discover MCPServer resources via `groupRef`. MCPServerEntry adds a + new discovery source alongside the existing one. +- **MCPRemoteProxy still works**: Organizations using MCPRemoteProxy + can continue to do so. MCPServerEntry is an alternative, not a + replacement. +- **No migration required**: Existing deployments work without + modification after the upgrade. + +### Forward Compatibility + +- **Extensibility**: The `MCPServerEntrySpec` can be extended with + additional fields (e.g., rate limiting, tool filtering) without + breaking changes. +- **API versioning**: Starts at `v1alpha1`, consistent with all other + ToolHive CRDs. +- **Future deprecation path**: If MCPRemoteProxy use cases are eventually + subsumed, MCPServerEntry provides a clean migration target. + +## Implementation Plan + +### Phase 1: CRD and Controller + +1. Define `MCPServerEntry` types in + `cmd/thv-operator/api/v1alpha1/mcpserverentry_types.go` +2. Implement validation-only controller in + `cmd/thv-operator/controllers/mcpserverentry_controller.go` +3. Generate CRD manifests (`task operator-generate`, + `task operator-manifests`) +4. Update MCPGroup controller to watch MCPServerEntry resources +5. Add unit tests for controller validation logic + +### Phase 2: Static Mode Integration + +1. Update VirtualMCPServer controller to discover MCPServerEntry resources + in the group +2. Update ConfigMap generation to include entry-type backends +3. Update vMCP static config parser to deserialize entry backends +4. Add `BackendTypeEntry` to vMCP types +5. Implement external TLS transport creation for entry backends +6. Integration tests with envtest + +### Phase 3: Dynamic Mode Integration + +1. 
Create `MCPServerEntryWatcher` reconciler in `pkg/vmcp/k8s/` +2. Register watcher in the K8s manager alongside MCPServerWatcher +3. Update `ListWorkloadsInGroup()` to include MCPServerEntry +4. Resolve auth configs for entry backends at runtime +5. Integration tests for dynamic discovery of entry backends + +### Phase 4: Documentation and E2E + +1. CRD reference documentation +2. User guide with examples (public remote, authenticated remote, + private CA) +3. MCPRemoteProxy vs MCPServerEntry comparison guide +4. E2E Chainsaw tests for full lifecycle +5. E2E tests for mixed MCPServer + MCPServerEntry groups + +### Dependencies + +- THV-0014 (K8s-Aware vMCP) for dynamic mode support +- THV-0026 (Header Passthrough) for `headerForward` field reuse +- Existing MCPExternalAuthConfig CRD for auth configuration + +## Testing Strategy + +### Unit Tests + +- Controller validation: groupRef exists, authConfigRef exists, HTTPS + enforcement, annotation override +- CRD type serialization/deserialization +- Backend conversion from MCPServerEntry to internal Backend struct +- External TLS transport creation with and without custom CA bundles +- Static config parsing with entry-type backends + +### Integration Tests (envtest) + +- MCPServerEntry controller reconciliation with real API server +- VirtualMCPServer ConfigMap generation including entry backends +- MCPGroup status update with mixed MCPServer + MCPServerEntry members +- Dynamic mode: MCPServerEntry watcher reconciliation +- Auth config resolution for entry backends +- Secret change propagation to entry backends + +### End-to-End Tests (Chainsaw) + +- Full lifecycle: create MCPGroup, create MCPServerEntry, create + VirtualMCPServer, verify vMCP routes to remote backend +- Mixed group: MCPServer (container) + MCPServerEntry (remote) in same + group +- Unauthenticated public remote behind vMCP +- Authenticated remote with token exchange +- MCPServerEntry deletion removes backend from vMCP +- CA bundle configuration for 
private remotes + +### Security Tests + +- Verify HTTPS enforcement (HTTP URL without annotation is rejected) +- Verify RBAC separation (entry creation requires correct permissions) +- Verify no network probing from controller +- Verify secret values are not logged + +## Documentation + +- **CRD Reference**: Auto-generated CRD documentation for MCPServerEntry + fields, validation rules, and status conditions +- **User Guide**: How to add remote MCP backends to vMCP using + MCPServerEntry, with examples for common scenarios +- **Comparison Guide**: When to use MCPRemoteProxy vs MCPServerEntry: + + | Feature | MCPRemoteProxy | MCPServerEntry | + |---------|---------------|----------------| + | Deploys pods | Yes (proxy pod) | No | + | Own auth middleware | Yes (oidcConfig, authzConfig) | No | + | Own audit logging | Yes | No (uses vMCP's) | + | Standalone use | Yes | No (only via VirtualMCPServer) | + | GroupRef support | Yes (optional) | Yes (required) | + | Primary use case | Standalone proxy with full observability | Backend declaration for vMCP | + +- **Architecture Documentation**: Update `docs/arch/10-virtual-mcp-architecture.md` + to describe MCPServerEntry as a backend type + +## Open Questions + +1. **Should `remoteURL` strictly require HTTPS?** + Recommendation: Yes, with annotation override + (`toolhive.stacklok.dev/allow-insecure: "true"`) for development. + This prevents accidental plaintext credential transmission while + allowing local development workflows. + +2. **Should the CRD support custom CA bundles for private remote servers?** + Recommendation: Yes, via `caBundleRef` field referencing a Secret or + ConfigMap. This is essential for enterprises with internal CAs. The + current design includes this field. + +3. **Should there be a `disabled` field for temporarily removing an entry + from discovery without deleting it?** + This could be useful for maintenance windows or incident response. 
+   However, it adds complexity, and a similar effect can already be
   achieved by deleting the entry or by temporarily pointing `groupRef`
   at an unused group name (the field is required, so it cannot simply
   be removed). Defer to post-implementation feedback.

4. **Should MCPServerEntry support `toolConfigRef` for tool filtering?**
   MCPRemoteProxy supports tool filtering via `toolConfigRef`.
   VirtualMCPServer also has its own tool filtering/override configuration
   in `spec.aggregation.tools`. For MCPServerEntry, tool filtering should
   be configured at the VirtualMCPServer level (where it already exists)
   rather than duplicating it on the entry. Defer unless there is a clear
   use case for entry-level filtering.

## References

- [THV-0008: Virtual MCP Server](./THV-0008-virtual-mcp-server.md) -
  VirtualMCPServer design, auth boundaries, capability aggregation
- [THV-0009: Remote MCP Server Proxy](./THV-0009-remote-mcp-proxy.md) -
  MCPRemoteProxy CRD design
- [THV-0010: MCPGroup CRD](./THV-0010-kubernetes-mcpgroup-crd.md) -
  Group-based backend discovery pattern
- [THV-0014: K8s-Aware vMCP](./THV-0014-vmcp-k8s-aware-refactor.md) -
  Dynamic vs static discovery modes, reconciler infrastructure
- [THV-0026: Header Passthrough](./THV-0026-header-passthrough.md) -
  `headerForward` configuration pattern
- [Istio ServiceEntry](https://istio.io/latest/docs/reference/config/networking/service-entry/) -
  Naming pattern inspiration
- [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104) -
  MCPRemoteProxy forces OIDC auth on public remotes behind vMCP
- [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109) -
  Dual auth boundary confusion with externalAuthConfigRef

---

## RFC Lifecycle



### Review History

| Date | Reviewer | Decision | Notes |
|------|----------|----------|-------|
| 2026-03-12 | @jaosorior | Draft | Initial submission |

### Implementation Tracking

| Repository | PR | Status |
|------------|-----|--------|
| toolhive | TBD | Not started |

From e8b6c513d7ec7a53f1e9298d303b488b9395698a Mon Sep 17 00:00:00 2001
From: 
Juan Antonio Osorio Date: Thu, 12 Mar 2026 09:05:59 +0200 Subject: [PATCH 2/9] Rename RFC to match PR number THV-0055 Co-Authored-By: Claude Opus 4.6 --- ...nds.md => THV-0055-mcpserverentry-direct-remote-backends.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename rfcs/{THV-XXXX-mcpserverentry-direct-remote-backends.md => THV-0055-mcpserverentry-direct-remote-backends.md} (99%) diff --git a/rfcs/THV-XXXX-mcpserverentry-direct-remote-backends.md b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md similarity index 99% rename from rfcs/THV-XXXX-mcpserverentry-direct-remote-backends.md rename to rfcs/THV-0055-mcpserverentry-direct-remote-backends.md index 7b061f1..d18484e 100644 --- a/rfcs/THV-XXXX-mcpserverentry-direct-remote-backends.md +++ b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md @@ -1,4 +1,4 @@ -# RFC-XXXX: MCPServerEntry CRD for Direct Remote MCP Server Backends +# RFC-0055: MCPServerEntry CRD for Direct Remote MCP Server Backends - **Status**: Draft - **Author(s)**: Juan Antonio Osorio (@jaosorior) From 841b88698862e4e8aa300b1629c901eec3d4470d Mon Sep 17 00:00:00 2001 From: Juan Antonio Osorio Date: Thu, 12 Mar 2026 09:09:28 +0200 Subject: [PATCH 3/9] Remove Go code samples, replace with prose descriptions RFC should focus on design intent, not implementation code. Keep YAML/Mermaid examples, replace Go blocks with prose describing controller behavior, discovery logic, and TLS handling. 
Co-Authored-By: Claude Opus 4.6 --- ...5-mcpserverentry-direct-remote-backends.md | 430 ++++-------------- 1 file changed, 80 insertions(+), 350 deletions(-) diff --git a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md index d18484e..93e7dbc 100644 --- a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md +++ b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md @@ -310,77 +310,30 @@ spec: #### CRD Type Definitions -```go -// MCPServerEntry declares a remote MCP server endpoint as a backend for -// VirtualMCPServer. Unlike MCPServer (which deploys container workloads) -// or MCPRemoteProxy (which deploys proxy pods), MCPServerEntry is a -// pure configuration resource that deploys no infrastructure. -// -// +kubebuilder:object:root=true -// +kubebuilder:subresource:status -// +kubebuilder:resource:shortName=mcpentry -// +kubebuilder:printcolumn:name="URL",type=string,JSONPath=`.spec.remoteURL` -// +kubebuilder:printcolumn:name="Transport",type=string,JSONPath=`.spec.transport` -// +kubebuilder:printcolumn:name="Group",type=string,JSONPath=`.spec.groupRef` -// +kubebuilder:printcolumn:name="Ready",type=string,JSONPath=`.status.conditions[?(@.type=="Ready")].status` -// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp` -type MCPServerEntry struct { - metav1.TypeMeta `json:",inline"` - metav1.ObjectMeta `json:"metadata,omitempty"` - - Spec MCPServerEntrySpec `json:"spec,omitempty"` - Status MCPServerEntryStatus `json:"status,omitempty"` -} - -type MCPServerEntrySpec struct { - // RemoteURL is the URL of the remote MCP server. - // Must use HTTPS unless the toolhive.stacklok.dev/allow-insecure - // annotation is set to "true" (for development only). - // +kubebuilder:validation:Required - // +kubebuilder:validation:Pattern=`^https?://` - RemoteURL string `json:"remoteURL"` - - // Transport specifies the MCP transport protocol. 
- // +kubebuilder:validation:Required - // +kubebuilder:validation:Enum=streamable-http;sse - Transport string `json:"transport"` - - // GroupRef is the name of the MCPGroup this entry belongs to. - // Required because an MCPServerEntry without a group cannot be - // discovered by any VirtualMCPServer. - // +kubebuilder:validation:Required - // +kubebuilder:validation:MinLength=1 - GroupRef string `json:"groupRef"` - - // ExternalAuthConfigRef references an MCPExternalAuthConfig in the - // same namespace for authenticating to the remote server. - // Omit for unauthenticated public endpoints. - // +optional - ExternalAuthConfigRef *ExternalAuthConfigRef `json:"externalAuthConfigRef,omitempty"` - - // HeaderForward configures additional headers to inject into - // requests forwarded to the remote server. - // +optional - HeaderForward *HeaderForwardConfig `json:"headerForward,omitempty"` - - // CABundleRef references a ConfigMap or Secret containing a custom - // CA certificate bundle for TLS verification of the remote server. - // Useful for remote servers with private/internal CA certificates. - // +optional - CABundleRef *SecretKeyRef `json:"caBundleRef,omitempty"` -} - -type MCPServerEntryStatus struct { - // Conditions represent the latest available observations of the - // MCPServerEntry's state. - // +optional - Conditions []metav1.Condition `json:"conditions,omitempty"` - - // ObservedGeneration is the most recent generation observed. - // +optional - ObservedGeneration int64 `json:"observedGeneration,omitempty"` -} -``` +The `MCPServerEntry` CRD type is defined in +`cmd/thv-operator/api/v1alpha1/mcpserverentry_types.go`. It follows the +standard kubebuilder pattern with `Spec` and `Status` subresources. + +The resource uses the short name `mcpentry` and exposes print columns for +URL, Transport, Group, Ready status, and Age. 
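For illustration, a minimal entry for an unauthenticated public remote
would look like this (the `apiVersion` assumes the
`toolhive.stacklok.dev/v1alpha1` group used by the other ToolHive CRDs):

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServerEntry
metadata:
  name: context7
spec:
  remoteURL: https://mcp.context7.com/mcp
  transport: streamable-http
  groupRef: engineering-team
```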
+ +**Spec fields:** + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `remoteURL` | string | Yes | URL of the remote MCP server. Must match `^https?://`. HTTPS enforced unless `toolhive.stacklok.dev/allow-insecure` annotation is set. | +| `transport` | enum | Yes | MCP transport protocol: `streamable-http` or `sse`. | +| `groupRef` | string | Yes | Name of the MCPGroup this entry belongs to (min length: 1). | +| `externalAuthConfigRef` | object | No | Reference to an MCPExternalAuthConfig in the same namespace. Omit for unauthenticated endpoints. | +| `headerForward` | object | No | Header forwarding configuration. Reuses existing `HeaderForwardConfig` type from MCPRemoteProxy. | +| `caBundleRef` | object | No | Reference to a Secret containing a custom CA certificate bundle for TLS verification. | + +**Status fields:** + +| Field | Type | Description | +|-------|------|-------------| +| `conditions` | []Condition | Standard Kubernetes conditions (see table below). | +| `observedGeneration` | int64 | Most recent generation observed by the controller. | **Condition types:** @@ -437,99 +390,25 @@ status: Validation-only controller The MCPServerEntry controller is intentionally simple. 
It performs -**validation only** and creates **no infrastructure**: - -```go -func (r *MCPServerEntryReconciler) Reconcile( - ctx context.Context, req ctrl.Request, -) (ctrl.Result, error) { - var entry mcpv1alpha1.MCPServerEntry - if err := r.Get(ctx, req.NamespacedName, &entry); err != nil { - return ctrl.Result{}, client.IgnoreNotFound(err) - } - - statusManager := NewStatusManager(&entry) - - // Validate groupRef exists - var group mcpv1alpha1.MCPGroup - if err := r.Get(ctx, client.ObjectKey{ - Namespace: entry.Namespace, - Name: entry.Spec.GroupRef, - }, &group); err != nil { - if apierrors.IsNotFound(err) { - statusManager.SetCondition("GroupRefValid", "GroupNotFound", - fmt.Sprintf("MCPGroup %q not found", entry.Spec.GroupRef), - metav1.ConditionFalse) - statusManager.SetCondition("Ready", "ValidationFailed", - "Referenced MCPGroup does not exist", metav1.ConditionFalse) - return r.updateStatus(ctx, &entry, statusManager) - } - return ctrl.Result{}, err - } - statusManager.SetCondition("GroupRefValid", "GroupExists", - fmt.Sprintf("MCPGroup %q exists", entry.Spec.GroupRef), - metav1.ConditionTrue) - - // Validate externalAuthConfigRef if set - if entry.Spec.ExternalAuthConfigRef != nil { - var authConfig mcpv1alpha1.MCPExternalAuthConfig - if err := r.Get(ctx, client.ObjectKey{ - Namespace: entry.Namespace, - Name: entry.Spec.ExternalAuthConfigRef.Name, - }, &authConfig); err != nil { - if apierrors.IsNotFound(err) { - statusManager.SetCondition("AuthConfigValid", - "AuthConfigNotFound", - fmt.Sprintf("MCPExternalAuthConfig %q not found", - entry.Spec.ExternalAuthConfigRef.Name), - metav1.ConditionFalse) - statusManager.SetCondition("Ready", "ValidationFailed", - "Referenced auth config does not exist", - metav1.ConditionFalse) - return r.updateStatus(ctx, &entry, statusManager) - } - return ctrl.Result{}, err - } - statusManager.SetCondition("AuthConfigValid", "AuthConfigExists", - fmt.Sprintf("MCPExternalAuthConfig %q exists", - 
entry.Spec.ExternalAuthConfigRef.Name), - metav1.ConditionTrue) - } - - // Validate HTTPS requirement - if !strings.HasPrefix(entry.Spec.RemoteURL, "https://") { - if entry.Annotations["toolhive.stacklok.dev/allow-insecure"] != "true" { - statusManager.SetCondition("Ready", "InsecureURL", - "remoteURL must use HTTPS (set annotation "+ - "toolhive.stacklok.dev/allow-insecure=true to override)", - metav1.ConditionFalse) - return r.updateStatus(ctx, &entry, statusManager) - } - } - - statusManager.SetCondition("Ready", "ValidationSucceeded", - "MCPServerEntry is valid and ready for discovery", - metav1.ConditionTrue) - - return r.updateStatus(ctx, &entry, statusManager) -} - -func (r *MCPServerEntryReconciler) SetupWithManager( - mgr ctrl.Manager, -) error { - return ctrl.NewControllerManagedBy(mgr). - For(&mcpv1alpha1.MCPServerEntry{}). - Watches(&mcpv1alpha1.MCPGroup{}, - handler.EnqueueRequestsFromMapFunc( - r.findEntriesForGroup, - )). - Watches(&mcpv1alpha1.MCPExternalAuthConfig{}, - handler.EnqueueRequestsFromMapFunc( - r.findEntriesForAuthConfig, - )). - Complete(r) -} -``` +**validation only** and creates **no infrastructure**. + +The reconciliation logic: + +1. Fetches the MCPServerEntry resource (ignores not-found for deletions). +2. Validates that the referenced MCPGroup exists in the same namespace. + Sets `GroupRefValid` condition accordingly. +3. If `externalAuthConfigRef` is set, validates that the referenced + MCPExternalAuthConfig exists. Sets `AuthConfigValid` condition. +4. Validates the HTTPS requirement: if `remoteURL` does not use HTTPS, + the controller checks for the `toolhive.stacklok.dev/allow-insecure` + annotation. Without it, the `Ready` condition is set to false with + reason `InsecureURL`. +5. If all validations pass, sets `Ready` to true with reason + `ValidationSucceeded`. 
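As a sketch of the failure path in step 4, an entry with a plain-HTTP
`remoteURL` and no override annotation would surface a status along these
lines (field values are illustrative):

```yaml
status:
  observedGeneration: 1
  conditions:
    - type: Ready
      status: "False"
      reason: InsecureURL
      message: >-
        remoteURL must use HTTPS (set annotation
        toolhive.stacklok.dev/allow-insecure=true to override)
```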
+ +The controller watches MCPGroup and MCPExternalAuthConfig resources via +`EnqueueRequestsFromMapFunc` handlers, so that changes to referenced +resources trigger re-validation of affected MCPServerEntry resources. No finalizers are needed because MCPServerEntry creates no infrastructure to clean up. @@ -589,211 +468,62 @@ backends: ##### vMCP: Backend Type and Discovery -**New backend type:** +A new `BackendTypeEntry` constant (`"entry"`) is added to +`pkg/vmcp/types.go` alongside the existing `BackendTypeContainer` and +`BackendTypeProxy`. -```go -// In pkg/vmcp/types.go -const ( - BackendTypeContainer BackendType = "container" - BackendTypeProxy BackendType = "proxy" - BackendTypeEntry BackendType = "entry" // New -) -``` +The `ListWorkloadsInGroup()` function in `pkg/vmcp/workloads/k8s.go` is +extended to discover MCPServerEntry resources in addition to MCPServer +resources. For each MCPServerEntry in the group, vMCP: -**Discovery updates:** - -```go -// In pkg/vmcp/workloads/k8s.go -func (m *K8sWorkloadManager) ListWorkloadsInGroup( - ctx context.Context, groupName string, -) ([]Backend, error) { - var backends []Backend - - // Existing: discover MCPServer resources - mcpServers, err := m.discoverMCPServers(ctx, groupName) - if err != nil { - return nil, fmt.Errorf("discovering MCPServers: %w", err) - } - backends = append(backends, mcpServers...) - - // New: discover MCPServerEntry resources - entries, err := m.discoverMCPServerEntries(ctx, groupName) - if err != nil { - return nil, fmt.Errorf("discovering MCPServerEntries: %w", err) - } - backends = append(backends, entries...) 
- - return backends, nil -} - -func (m *K8sWorkloadManager) discoverMCPServerEntries( - ctx context.Context, groupName string, -) ([]Backend, error) { - var entryList mcpv1alpha1.MCPServerEntryList - if err := m.client.List(ctx, &entryList, - client.InNamespace(m.namespace), - client.MatchingFields{"spec.groupRef": groupName}, - ); err != nil { - return nil, err - } - - var backends []Backend - for _, entry := range entryList.Items { - backend := Backend{ - ID: fmt.Sprintf("%s/%s", entry.Namespace, entry.Name), - Name: entry.Name, - BaseURL: entry.Spec.RemoteURL, - Transport: entry.Spec.Transport, - Type: BackendTypeEntry, - } - - // Resolve auth if configured - if entry.Spec.ExternalAuthConfigRef != nil { - authConfig, err := m.resolveAuthConfig(ctx, - entry.Namespace, - entry.Spec.ExternalAuthConfigRef.Name, - ) - if err != nil { - return nil, fmt.Errorf( - "resolving auth for entry %s: %w", - entry.Name, err, - ) - } - backend.AuthConfig = authConfig - } - - // Resolve header forward config if set - if entry.Spec.HeaderForward != nil { - backend.HeaderForward = m.resolveHeaderForward( - ctx, entry.Namespace, entry.Spec.HeaderForward, - ) - } - - // Resolve CA bundle if set - if entry.Spec.CABundleRef != nil { - caBundle, err := m.resolveCABundle(ctx, - entry.Namespace, entry.Spec.CABundleRef, - ) - if err != nil { - return nil, fmt.Errorf( - "resolving CA bundle for entry %s: %w", - entry.Name, err, - ) - } - backend.CABundle = caBundle - } - - backends = append(backends, backend) - } - - return backends, nil -} -``` +1. Lists MCPServerEntry resources filtered by `spec.groupRef`. +2. Converts each entry to an internal `Backend` struct using the entry's + `remoteURL`, `transport`, and name. +3. Resolves `externalAuthConfigRef` if set (using existing auth resolution + logic). +4. Resolves `headerForward` configuration if set. +5. Resolves `caBundleRef` if set (fetching the CA certificate from the + referenced Secret). +6. 
Appends the resulting backends alongside MCPServer-sourced backends. ##### vMCP: HTTP Client for External TLS Backends of type `entry` connect to external URLs over HTTPS. The vMCP -HTTP client must be updated to: +HTTP client in `pkg/vmcp/client/client.go` must be updated to: 1. Use the system CA certificate pool by default (for public CAs). 2. Optionally append a custom CA bundle from `caBundleRef` (for private - CAs). -3. Apply the resolved `externalAuthConfigRef` credentials directly to + CAs) to the system pool. +3. Enforce a minimum TLS version of 1.2. +4. Apply the resolved `externalAuthConfigRef` credentials directly to outgoing requests. -```go -// In pkg/vmcp/client/client.go -func (c *Client) createTransportForEntry( - backend *Backend, -) (*http.Transport, error) { - tlsConfig := &tls.Config{ - MinVersion: tls.VersionTLS12, - } - - if backend.CABundle != nil { - pool, err := x509.SystemCertPool() - if err != nil { - pool = x509.NewCertPool() - } - if !pool.AppendCertsFromPEM(backend.CABundle) { - return nil, fmt.Errorf("failed to parse CA bundle") - } - tlsConfig.RootCAs = pool - } - - return &http.Transport{ - TLSClientConfig: tlsConfig, - }, nil -} -``` - ##### vMCP: Dynamic Mode Reconciler Update For dynamic mode (`outgoingAuth.source: discovered`), the reconciler infrastructure from THV-0014 must be extended to watch MCPServerEntry resources. 
+**New files:** +- `pkg/vmcp/k8s/mcpserverentry_watcher.go` - MCPServerEntry reconciler + **Files to modify:** - `pkg/vmcp/k8s/manager.go` - Register MCPServerEntry watcher -- `pkg/vmcp/k8s/mcpserverentry_watcher.go` (new) - MCPServerEntry - reconciler - -```go -type MCPServerEntryWatcher struct { - client client.Client - registry vmcp.DynamicRegistry - groupRef string -} - -func (w *MCPServerEntryWatcher) Reconcile( - ctx context.Context, req ctrl.Request, -) (ctrl.Result, error) { - backendID := req.NamespacedName.String() - - var entry mcpv1alpha1.MCPServerEntry - if err := w.client.Get(ctx, req.NamespacedName, &entry); err != nil { - if apierrors.IsNotFound(err) { - w.registry.Remove(backendID) - return ctrl.Result{}, nil - } - return ctrl.Result{}, err - } - - if entry.Spec.GroupRef != w.groupRef { - // Not in our group, remove if previously tracked - w.registry.Remove(backendID) - return ctrl.Result{}, nil - } - - backend, err := w.convertToBackend(ctx, &entry) - if err != nil { - return ctrl.Result{}, err - } - backend.ID = backendID - - if err := w.registry.Upsert(backend); err != nil { - return ctrl.Result{}, err - } - - return ctrl.Result{}, nil -} - -func (w *MCPServerEntryWatcher) SetupWithManager( - mgr ctrl.Manager, -) error { - return ctrl.NewControllerManagedBy(mgr). - For(&mcpv1alpha1.MCPServerEntry{}). - Watches(&mcpv1alpha1.MCPExternalAuthConfig{}, - handler.EnqueueRequestsFromMapFunc( - w.findEntriesForAuthConfig, - )). - Watches(&corev1.Secret{}, - handler.EnqueueRequestsFromMapFunc( - w.findEntriesForSecret, - )). - Complete(w) -} -``` + +The `MCPServerEntryWatcher` follows the same reconciler pattern as the +existing `MCPServerWatcher` from THV-0014. It holds a reference to the +`DynamicRegistry` and the target `groupRef`. On reconciliation: + +1. If the resource is deleted (not found), it removes the backend from the + registry by namespaced name. +2. 
If the entry's `groupRef` doesn't match the watcher's group, it removes + the backend (handles group reassignment). +3. Otherwise, it converts the MCPServerEntry to a `Backend` struct + (resolving auth, headers, CA bundle) and upserts it into the registry. + +The watcher also watches MCPExternalAuthConfig and Secret resources via +`EnqueueRequestsFromMapFunc` handlers, so changes to referenced auth +configs or secrets trigger re-reconciliation of affected entries. ##### vMCP: Static Config Parser Update From 0952cb2c8c2612bc4522653b056c8248f5a192e1 Mon Sep 17 00:00:00 2001 From: Juan Antonio Osorio Date: Thu, 12 Mar 2026 09:11:56 +0200 Subject: [PATCH 4/9] Remove file path lists from component changes section Implementation details like specific file paths belong in the implementation, not the RFC design document. Co-Authored-By: Claude Opus 4.6 --- ...5-mcpserverentry-direct-remote-backends.md | 38 +++---------------- 1 file changed, 5 insertions(+), 33 deletions(-) diff --git a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md index 93e7dbc..604f9da 100644 --- a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md +++ b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md @@ -383,12 +383,6 @@ status: ##### Operator: New CRD and Controller -**New files:** -- `cmd/thv-operator/api/v1alpha1/mcpserverentry_types.go` - CRD type - definitions -- `cmd/thv-operator/controllers/mcpserverentry_controller.go` - - Validation-only controller - The MCPServerEntry controller is intentionally simple. It performs **validation only** and creates **no infrastructure**. @@ -419,10 +413,6 @@ The MCPGroup controller must be updated to watch MCPServerEntry resources in addition to MCPServer resources, so that `status.servers` and `status.serverCount` reflect both types of backends in the group. 
-**Files to modify:** -- `cmd/thv-operator/controllers/mcpgroup_controller.go` - Add watch for - MCPServerEntry, update status aggregation - ##### Operator: VirtualMCPServer Controller Update **Static mode (`outgoingAuth.source: inline`):** The operator generates @@ -460,12 +450,6 @@ backends: # ... ``` -**Files to modify:** -- `cmd/thv-operator/controllers/virtualmcpserver_controller.go` - Discover - MCPServerEntry resources in group -- `cmd/thv-operator/controllers/virtualmcpserver_vmcpconfig.go` - Include - entry backends in ConfigMap generation - ##### vMCP: Backend Type and Discovery A new `BackendTypeEntry` constant (`"entry"`) is added to @@ -504,12 +488,6 @@ For dynamic mode (`outgoingAuth.source: discovered`), the reconciler infrastructure from THV-0014 must be extended to watch MCPServerEntry resources. -**New files:** -- `pkg/vmcp/k8s/mcpserverentry_watcher.go` - MCPServerEntry reconciler - -**Files to modify:** -- `pkg/vmcp/k8s/manager.go` - Register MCPServerEntry watcher - The `MCPServerEntryWatcher` follows the same reconciler pattern as the existing `MCPServerWatcher` from THV-0014. It holds a reference to the `DynamicRegistry` and the target `groupRef`. On reconciliation: @@ -531,9 +509,6 @@ The static config parser must be updated to deserialize `type: entry` backends from the ConfigMap and create appropriate HTTP clients with external TLS support. -**Files to modify:** -- `pkg/vmcp/config/` - Parse entry-type backends from static config - ## Security Considerations ### Threat Model @@ -738,12 +713,9 @@ MCPServerEntry is a purely additive change: ### Phase 1: CRD and Controller -1. Define `MCPServerEntry` types in - `cmd/thv-operator/api/v1alpha1/mcpserverentry_types.go` -2. Implement validation-only controller in - `cmd/thv-operator/controllers/mcpserverentry_controller.go` -3. Generate CRD manifests (`task operator-generate`, - `task operator-manifests`) +1. Define `MCPServerEntry` CRD types +2. Implement validation-only controller +3. 
Generate CRD manifests 4. Update MCPGroup controller to watch MCPServerEntry resources 5. Add unit tests for controller validation logic @@ -759,9 +731,9 @@ MCPServerEntry is a purely additive change: ### Phase 3: Dynamic Mode Integration -1. Create `MCPServerEntryWatcher` reconciler in `pkg/vmcp/k8s/` +1. Create MCPServerEntry reconciler for vMCP's dynamic registry 2. Register watcher in the K8s manager alongside MCPServerWatcher -3. Update `ListWorkloadsInGroup()` to include MCPServerEntry +3. Update workload discovery to include MCPServerEntry 4. Resolve auth configs for entry backends at runtime 5. Integration tests for dynamic discovery of entry backends From b70b8c958b824c678d2376bd156603dafdefe8f1 Mon Sep 17 00:00:00 2001 From: Juan Antonio Osorio Date: Thu, 12 Mar 2026 12:38:22 +0200 Subject: [PATCH 5/9] Address review feedback on MCPServerEntry RFC - Clarify groupRef is plain string for consistency with MCPServer/MCPRemoteProxy - Fix Alt 1 YAML example to use string form for groupRef - Change caBundleRef to reference ConfigMap (CA certs are public data) - Add SSRF rationale: CEL IP blocking omitted since internal servers are legitimate - Clarify auth resolution loads config only, token exchange deferred to request time - Specify CA bundle volume mount for static mode (PEM files, not env vars) - Document toolConfigRef migration path via aggregation.tools[].workload Co-Authored-By: Claude Opus 4.6 --- ...5-mcpserverentry-direct-remote-backends.md | 67 +++++++++++++------ 1 file changed, 48 insertions(+), 19 deletions(-) diff --git a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md index 604f9da..511f799 100644 --- a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md +++ b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md @@ -236,7 +236,8 @@ spec: key: api-key # OPTIONAL: Custom CA bundle for private remote servers using - # internal/self-signed certificates. 
+ # internal/self-signed certificates. References a ConfigMap (not Secret) + # because CA certificates are public data. caBundleRef: name: internal-ca-bundle key: ca.crt @@ -323,10 +324,10 @@ URL, Transport, Group, Ready status, and Age. |-------|------|----------|-------------| | `remoteURL` | string | Yes | URL of the remote MCP server. Must match `^https?://`. HTTPS enforced unless `toolhive.stacklok.dev/allow-insecure` annotation is set. | | `transport` | enum | Yes | MCP transport protocol: `streamable-http` or `sse`. | -| `groupRef` | string | Yes | Name of the MCPGroup this entry belongs to (min length: 1). | +| `groupRef` | string | Yes | Name of the MCPGroup this entry belongs to (min length: 1). Uses a plain string (not `LocalObjectReference`) for consistency with `MCPServer.spec.groupRef` and `MCPRemoteProxy.spec.groupRef`. | | `externalAuthConfigRef` | object | No | Reference to an MCPExternalAuthConfig in the same namespace. Omit for unauthenticated endpoints. | | `headerForward` | object | No | Header forwarding configuration. Reuses existing `HeaderForwardConfig` type from MCPRemoteProxy. | -| `caBundleRef` | object | No | Reference to a Secret containing a custom CA certificate bundle for TLS verification. | +| `caBundleRef` | object | No | Reference to a ConfigMap containing a custom CA certificate bundle for TLS verification. ConfigMap is used rather than Secret because CA certificates are public data, consistent with the `kube-root-ca.crt` pattern. | **Status fields:** @@ -463,8 +464,11 @@ resources. For each MCPServerEntry in the group, vMCP: 1. Lists MCPServerEntry resources filtered by `spec.groupRef`. 2. Converts each entry to an internal `Backend` struct using the entry's `remoteURL`, `transport`, and name. -3. Resolves `externalAuthConfigRef` if set (using existing auth resolution - logic). +3. 
If `externalAuthConfigRef` is set, loads the referenced
+   MCPExternalAuthConfig spec and stores the auth strategy (token exchange
+   endpoint, client credentials reference, audience) in the `Backend`
+   struct. Actual token exchange is deferred to request time because
+   tokens are short-lived and may be per-user.
 4. Resolves `headerForward` configuration if set.
-5. Resolves `caBundleRef` if set (fetching the CA certificate from the
-   referenced Secret).
+5. Resolves `caBundleRef` if set (fetching the CA certificate from the
+   referenced ConfigMap).
@@ -517,7 +521,7 @@ external TLS support.
 |--------|-------------|------------|
 | Man-in-the-middle on remote connection | Attacker intercepts vMCP-to-remote traffic | HTTPS required by default; custom CA bundles for private CAs |
 | Credential exposure in CRD spec | Auth secrets visible in CRD manifest | Credentials stored in K8s Secrets, referenced via `externalAuthConfigRef` and `headerForward.addHeadersFromSecrets`; never inline in CRD spec |
-| SSRF via remoteURL | Operator configures URL pointing to internal services | Mitigated by RBAC (only authorized users create MCPServerEntry); annotation required for non-HTTPS; NetworkPolicy should restrict vMCP egress |
+| SSRF via remoteURL | Operator configures URL pointing to internal services | Mitigated by RBAC (only authorized users create MCPServerEntry); annotation required for non-HTTPS; NetworkPolicy should restrict vMCP egress. Note: CEL-based IP range blocking (e.g., RFC 1918) is intentionally not applied because MCPServerEntry legitimately targets internal/corporate MCP servers. RBAC is the appropriate control layer since resource creation is restricted to trusted operators. |
 | Auth config confusion (existing issue) | Dual-boundary auth leading to wrong tokens sent to wrong endpoints | Eliminated: MCPServerEntry has exactly one auth boundary with one purpose |
 | Operator probing external URLs | Controller making network requests to untrusted URLs | Eliminated: controller performs validation only, no network probing |
@@ -543,7 +547,8 @@ external TLS support.
- **At rest**: No sensitive data stored in MCPServerEntry spec. Auth credentials are in K8s Secrets, referenced indirectly. - **CA bundles**: Custom CA certificates referenced via `caBundleRef`, - stored in K8s Secrets/ConfigMaps with standard K8s encryption at rest. + stored in K8s ConfigMaps. CA certificates are public data and do not + require Secret-level protection. ### Input Validation @@ -563,6 +568,17 @@ external TLS support. - **Dynamic mode**: vMCP reads secrets at runtime via K8s API (namespace-scoped RBAC). - **Static mode**: Operator mounts secrets as environment variables. +- **CA bundle propagation** differs from credential secrets because CA + certificates are multi-line PEM data that must be loaded from the + filesystem (Go's `crypto/tls` loads CA bundles via file reads, not + environment variables): + - **Dynamic mode**: vMCP reads the CA bundle data from the K8s API + at runtime (from the ConfigMap referenced by `caBundleRef`). + - **Static mode**: The operator mounts the ConfigMap referenced by + `caBundleRef` as a **volume** into the vMCP pod at a well-known + path (e.g., `/etc/toolhive/ca-bundles//ca.crt`). The + generated backend ConfigMap includes the mount path so vMCP can + construct the `tls.Config` at startup. - Secret rotation follows existing patterns: - **Dynamic mode**: Watch-based propagation, no pod restart needed. - **Static mode**: Requires pod restart (Deployment rollout). @@ -596,8 +612,7 @@ Embed remote server configuration directly in the VirtualMCPServer CRD. ```yaml kind: VirtualMCPServer spec: - groupRef: - name: engineering-team + groupRef: engineering-team remoteServerRefs: - name: context7 remoteURL: https://mcp.context7.com/mcp @@ -724,10 +739,14 @@ MCPServerEntry is a purely additive change: 1. Update VirtualMCPServer controller to discover MCPServerEntry resources in the group 2. Update ConfigMap generation to include entry-type backends -3. Update vMCP static config parser to deserialize entry backends -4. 
Add `BackendTypeEntry` to vMCP types -5. Implement external TLS transport creation for entry backends -6. Integration tests with envtest +3. Mount CA bundle ConfigMaps as volumes into the vMCP pod for entries + that specify `caBundleRef` (at a well-known path such as + `/etc/toolhive/ca-bundles//`) +4. Update vMCP static config parser to deserialize entry backends +5. Add `BackendTypeEntry` to vMCP types +6. Implement external TLS transport creation for entry backends + (loading CA bundles from mounted volume paths) +7. Integration tests with envtest ### Phase 3: Dynamic Mode Integration @@ -819,8 +838,10 @@ MCPServerEntry is a purely additive change: allowing local development workflows. 2. **Should the CRD support custom CA bundles for private remote servers?** - Recommendation: Yes, via `caBundleRef` field referencing a Secret or - ConfigMap. This is essential for enterprises with internal CAs. The + Recommendation: Yes, via `caBundleRef` field referencing a ConfigMap. + CA certificates are public data and ConfigMap is the semantically + appropriate resource type, consistent with the `kube-root-ca.crt` + pattern. This is essential for enterprises with internal CAs. The current design includes this field. 3. **Should there be a `disabled` field for temporarily removing an entry @@ -832,10 +853,18 @@ MCPServerEntry is a purely additive change: 4. **Should MCPServerEntry support `toolConfigRef` for tool filtering?** MCPRemoteProxy supports tool filtering via `toolConfigRef`. VirtualMCPServer also has its own tool filtering/override configuration - in `spec.aggregation.tools`. For MCPServerEntry, tool filtering should - be configured at the VirtualMCPServer level (where it already exists) - rather than duplicating it on the entry. Defer unless there is a clear - use case for entry-level filtering. + in `spec.aggregation.tools`, which supports per-backend filtering via + the `workload` field (e.g., `tools: [{workload: "salesforce", filter: [...]}]`). 
+ For MCPServerEntry, tool filtering should be configured at the + VirtualMCPServer level rather than duplicating it on the entry. + **Migration note:** Users migrating from MCPRemoteProxy who rely on + `toolConfigRef` for per-backend tool filtering should configure + equivalent filtering in `VirtualMCPServer.spec.aggregation.tools` + with the `workload` field set to the MCPServerEntry name. If + post-implementation feedback reveals that `aggregation.tools` is + insufficient for per-backend filtering use cases, `toolConfigRef` + can be added to MCPServerEntry in a follow-up without breaking + changes. ## References From 397ccb48393fb8c90cc7c7e1b27cf7e07cb96640 Mon Sep 17 00:00:00 2001 From: Juan Antonio Osorio Date: Tue, 7 Apr 2026 14:16:29 +0300 Subject: [PATCH 6/9] Add planned deprecation notice in favor of MCPRemoteEndpoint MCPServerEntry ships now to unblock near-term use cases. It will be superseded by MCPRemoteEndpoint, a unified CRD that combines direct and proxy remote connectivity under a single resource. Co-Authored-By: Claude Opus 4.6 (1M context) --- ...0055-mcpserverentry-direct-remote-backends.md | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md index 511f799..fc1b014 100644 --- a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md +++ b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md @@ -6,6 +6,7 @@ - **Last Updated**: 2026-03-12 - **Target Repository**: toolhive - **Related Issues**: [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104), [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109) +- **Planned Deprecation**: This CRD will be superseded by `MCPRemoteEndpoint` (see RFC-XXXX) which unifies direct and proxy remote connectivity into a single resource. MCPServerEntry ships now to unblock near-term use cases; migration guidance will accompany the MCPRemoteEndpoint RFC. 
## Summary @@ -74,10 +75,11 @@ infrastructure cost and operational overhead. ## Non-Goals -- **Deprecating MCPRemoteProxy**: MCPRemoteProxy remains valuable for +- **Replacing MCPRemoteProxy now**: MCPRemoteProxy remains valuable for standalone proxy use cases with its own auth middleware, audit logging, and observability. MCPServerEntry is specifically for "behind vMCP" use - cases. + cases. A future `MCPRemoteEndpoint` CRD will unify both direct and + proxy modes under a single resource. - **Adding health probing from the operator**: The operator controller should NOT probe remote URLs. Reachability from the operator pod does not imply reachability from the vMCP pod, and probing expands the operator's @@ -721,8 +723,14 @@ MCPServerEntry is a purely additive change: breaking changes. - **API versioning**: Starts at `v1alpha1`, consistent with all other ToolHive CRDs. -- **Future deprecation path**: If MCPRemoteProxy use cases are eventually - subsumed, MCPServerEntry provides a clean migration target. +- **Planned supersession by MCPRemoteEndpoint**: MCPServerEntry will be + superseded by `MCPRemoteEndpoint`, a unified CRD that combines direct + connectivity (equivalent to MCPServerEntry) and proxy connectivity + (replacing MCPRemoteProxy) under a single `type` discriminator field. + MCPServerEntry ships now to unblock immediate use cases. When + MCPRemoteEndpoint reaches GA, MCPServerEntry will enter a deprecation + window with migration tooling. See the MCPRemoteEndpoint RFC for the + full design and migration plan. 
## Implementation Plan From b285f34b14b8eb18e5fb19ffa1e02368ee94bbcd Mon Sep 17 00:00:00 2001 From: Juan Antonio Osorio Date: Tue, 7 Apr 2026 14:21:39 +0300 Subject: [PATCH 7/9] Update deprecation notice to reference THV-0067 Co-Authored-By: Claude Opus 4.6 (1M context) --- ...5-mcpserverentry-direct-remote-backends.md | 2 +- ...premoteendpoint-unified-remote-backends.md | 1022 +++++++++++++++++ 2 files changed, 1023 insertions(+), 1 deletion(-) create mode 100644 rfcs/THV-0067-mcpremoteendpoint-unified-remote-backends.md diff --git a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md index fc1b014..01aa7b5 100644 --- a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md +++ b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md @@ -6,7 +6,7 @@ - **Last Updated**: 2026-03-12 - **Target Repository**: toolhive - **Related Issues**: [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104), [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109) -- **Planned Deprecation**: This CRD will be superseded by `MCPRemoteEndpoint` (see RFC-XXXX) which unifies direct and proxy remote connectivity into a single resource. MCPServerEntry ships now to unblock near-term use cases; migration guidance will accompany the MCPRemoteEndpoint RFC. +- **Planned Deprecation**: This CRD will be superseded by `MCPRemoteEndpoint` (see [THV-0067](./THV-0067-mcpremoteendpoint-unified-remote-backends.md)) which unifies direct and proxy remote connectivity into a single resource. MCPServerEntry ships now to unblock near-term use cases; migration guidance will accompany the MCPRemoteEndpoint RFC. 
 ## Summary
 
diff --git a/rfcs/THV-0067-mcpremoteendpoint-unified-remote-backends.md b/rfcs/THV-0067-mcpremoteendpoint-unified-remote-backends.md
new file mode 100644
index 0000000..8aab210
--- /dev/null
+++ b/rfcs/THV-0067-mcpremoteendpoint-unified-remote-backends.md
@@ -0,0 +1,1022 @@
+# THV-0067: MCPRemoteEndpoint CRD — Unified Remote MCP Server Connectivity
+
+- **Status**: Draft
+- **Author(s)**: @ChrisJBurns, @jaosorior
+- **Created**: 2026-03-18
+- **Last Updated**: 2026-04-07
+- **Target Repository**: toolhive
+- **Supersedes**: [THV-0055](./THV-0055-mcpserverentry-direct-remote-backends.md) (MCPServerEntry CRD). MCPServerEntry ships first as a near-term solution; this RFC defines its long-term replacement.
+- **Related Issues**: [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104), [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109)
+
+## Summary
+
+Introduce a new `MCPRemoteEndpoint` CRD that unifies remote MCP server
+connectivity under a single resource with two explicit modes:
+
+- **`type: proxy`** — deploys a proxy pod with full auth middleware, authz
+  policy, and audit logging. Functionally equivalent to `MCPRemoteProxy` and
+  replaces it.
+- **`type: direct`** — no pod deployed; VirtualMCPServer connects directly to
+  the remote URL. Resolves forced-auth on public remotes
+  ([#3104](https://github.com/stacklok/toolhive/issues/3104)) and eliminates
+  unnecessary infrastructure for simple remote backends.
+
+`MCPRemoteProxy` is deprecated in favour of `MCPRemoteEndpoint` with
+`type: proxy`. Existing `MCPRemoteProxy` resources continue to function during
+the deprecation window with no immediate migration required.
+
+## Problem Statement
+
+### 1. Forced Authentication on Public Remotes (Issue #3104)
+
+`MCPRemoteProxy` requires OIDC authentication configuration even when
+VirtualMCPServer already handles client authentication at its own boundary.
+This blocks unauthenticated public remote MCP servers (e.g., context7, public +API gateways) from being placed behind vMCP without configuring unnecessary +auth on the proxy layer. + +### 2. Resource Waste + +Every remote MCP server behind vMCP requires a full Deployment + Service + Pod +just to forward HTTP requests that vMCP could make directly. For organisations +with many remote MCP backends, this creates unnecessary infrastructure cost and +operational overhead. + +### 3. CRD Proliferation and Overlapping Goals + +The original THV-0055 proposed `MCPServerEntry` as a companion resource to +`MCPRemoteProxy`. Both resources would have existed to serve the same +high-level user goal: connecting to a remote MCP server. Both reference a +`remoteURL`, join a `groupRef`, and support `externalAuthConfigRef`. The only +difference is whether a proxy pod is deployed. + +Having two separate CRDs for the same goal — differing only in their mechanism +— increases the API surface users must learn and makes the right choice +non-obvious before writing any YAML. The goal (`connect to a remote server`) +should be the abstraction; the mechanism (`via a proxy pod` vs `directly`) +should be a configuration choice within it. 
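+To make the "goal as abstraction, mechanism as configuration" point concrete,
+here is a minimal Go sketch of a discriminated spec. These are illustrative
+types only, not ToolHive's actual API: `RemoteEndpointSpec`, `ProxyConfig`,
+and `mechanism` are invented names for this example.

```go
package main

import "fmt"

// EndpointType is the discriminator: one resource (the goal), two mechanisms.
type EndpointType string

const (
	TypeProxy  EndpointType = "proxy"  // deploy a proxy pod
	TypeDirect EndpointType = "direct" // vMCP dials the remote itself
)

// ProxyConfig groups fields that only make sense when a proxy pod exists.
type ProxyConfig struct {
	ProxyPort int
}

// RemoteEndpointSpec sketches the unified shape: shared fields at the top
// level, proxy-only fields grouped under ProxyConfig.
type RemoteEndpointSpec struct {
	Type        EndpointType
	RemoteURL   string
	GroupRef    string
	ProxyConfig *ProxyConfig // only meaningful when Type == TypeProxy
}

// mechanism answers "what does the controller do for this spec?" —
// the choice is data inside one resource, not a second CRD.
func mechanism(s RemoteEndpointSpec) string {
	if s.Type == TypeProxy {
		return fmt.Sprintf("deploy proxy pod on port %d for %s", s.ProxyConfig.ProxyPort, s.RemoteURL)
	}
	return "no pod; vMCP connects directly to " + s.RemoteURL
}

func main() {
	direct := RemoteEndpointSpec{Type: TypeDirect, RemoteURL: "https://mcp.context7.com/mcp", GroupRef: "engineering-team"}
	fmt.Println(mechanism(direct))
}
```

+Under this shape, switching a backend between mechanisms is a one-field spec
+change (subject to the immutability rule discussed later), not a migration
+between resource kinds.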
+ +### Who Is Affected + +- **Platform teams** deploying vMCP with remote MCP backends in Kubernetes +- **Product teams** wanting to register external MCP services behind vMCP +- **Existing `MCPRemoteProxy` users** who will migrate to + `MCPRemoteEndpoint` with `type: proxy` + +## Goals + +- Provide a single, purpose-built CRD for all remote MCP server connectivity +- Enable vMCP to connect directly to remote MCP servers without a proxy pod + for simple use cases +- Allow unauthenticated remote MCP servers behind vMCP without workarounds +- Retain the full feature set of `MCPRemoteProxy` (auth middleware, authz, + audit logging) under `type: proxy` +- Deprecate `MCPRemoteProxy` with a clear migration path +- Reduce long-term CRD surface area rather than growing it + +## Non-Goals + +- **Removing `MCPRemoteProxy` immediately**: It remains functional during the + deprecation window. Removal is a follow-up once adoption of + `MCPRemoteEndpoint` is confirmed. +- **Adding health probing from the operator**: The controller should NOT probe + remote URLs. Health checking belongs in vMCP's existing runtime + infrastructure (`healthCheckInterval`, circuit breaker). +- **Cross-namespace references**: `MCPRemoteEndpoint` follows the same + namespace-scoped patterns as other ToolHive CRDs. +- **Supporting stdio or container-based transports**: `MCPRemoteEndpoint` is + exclusively for remote HTTP-based MCP servers. +- **CLI mode support**: `MCPRemoteEndpoint` is a Kubernetes-only CRD. +- **Multi-replica vMCP with `type: direct`**: Session state is in-process only. + See [Session Constraints](#session-constraints-in-direct-mode). + +## Mode Selection Guide + +| Scenario | Recommended Mode | Why | +|---|---|---| +| Public, unauthenticated remote (e.g., context7) | `direct` | No auth middleware needed; no pod required | +| Remote with outgoing auth handled by vMCP (token exchange, header injection, etc.) 
| `direct` | vMCP applies outgoing auth directly; one fewer hop | +| Remote requiring its own OIDC validation boundary | `proxy` | Proxy pod validates tokens independently | +| Remote requiring Cedar authz policies per-endpoint | `proxy` | Authz policies run in the proxy pod | +| Remote needing audit logging at the endpoint level | `proxy` | Proxy pod has its own audit middleware | +| Standalone use without VirtualMCPServer | `proxy` | Direct mode requires vMCP to function | +| Many remotes where pod-per-remote is too costly | `direct` | No Deployment/Service/Pod per remote | + +**Rule of thumb:** Use `direct` for simple, public remotes or any remote +fronted by vMCP where vMCP handles outgoing auth. Use `proxy` when you need an +independent auth/authz/audit boundary per remote, or when the backend needs to +be accessible standalone. + +## Proposed Solution + +### High-Level Design + +`MCPRemoteEndpoint` is a single CRD with a `type` discriminator field. Shared +fields sit at the top level. Fields only applicable to the proxy pod are grouped +under `proxyConfig`. + +```mermaid +graph TB + subgraph "Client Layer" + Client[MCP Client] + end + + subgraph "Virtual MCP Server" + InAuth[Incoming Auth] + Router[Request Router] + AuthMgr[Backend Auth Manager] + end + + subgraph "MCPRemoteEndpoint: type=proxy" + ProxyPod[Proxy Pod
OIDC + Authz + Audit] + end + + subgraph "MCPRemoteEndpoint: type=direct" + DirectEntry[Config Only
No pods] + end + + subgraph "External Services" + Remote1[remote.example.com/mcp] + Remote2[public-api.example.com/mcp] + end + + Client -->|Token: aud=vmcp| InAuth + InAuth --> Router + Router --> AuthMgr + AuthMgr -->|Via proxy pod| ProxyPod + ProxyPod -->|Authenticated HTTPS| Remote1 + AuthMgr -->|Direct HTTPS| Remote2 + DirectEntry -.->|Declares endpoint| Remote2 + + style DirectEntry fill:#fff3e0,stroke:#ff9800 + style ProxyPod fill:#e3f2fd,stroke:#2196f3 +``` + +### Mode Comparison + +| Capability | `type: proxy` | `type: direct` | +|---|---|---| +| Deploys proxy pod | Yes | No | +| Own OIDC validation | Yes | No (vMCP handles this) | +| Own authz policy | Yes | No | +| Own audit logging | Yes (proxy-level) | No (vMCP's audit middleware; see [Audit Limitations](#audit-limitations-in-direct-mode)) | +| Standalone use (without vMCP) | Yes | No | +| Outgoing auth to remote | Yes (`externalAuthConfigRef`) | Yes (`externalAuthConfigRef`) | +| Header forwarding | Yes (`headerForward`) | Yes (`headerForward`) | +| Custom CA bundle | Yes (`caBundleRef`) | Yes (`caBundleRef`) | +| Tool filtering | Yes (`toolConfigRef`) | Yes (`toolConfigRef`) | +| GroupRef support | Yes | Yes | +| Multi-replica vMCP | Yes | No — see [Session Constraints](#session-constraints-in-direct-mode) | +| Credential blast radius | Isolated per proxy pod | All credentials in vMCP pod — see [Security Considerations](#security-considerations) | + +### Auth Flow Comparison + +**`type: proxy` — two independent auth legs:** + +```mermaid +sequenceDiagram + participant C as Client + participant V as vMCP + participant P as Proxy Pod + participant R as Remote Server + + C->>V: Request (aud=vmcp token) + V->>V: Validate incoming token + V->>P: Forward (externalAuthConfigRef credential) + P->>P: oidcConfig validates incoming request + P->>R: Forward (externalAuthConfigRef as outgoing middleware) + R-->>P: Response + P-->>V: Response + V-->>C: Response +``` + +`externalAuthConfigRef` on a `type: proxy` 
endpoint is read by two separate +consumers: + +1. **vMCP** reads it at backend discovery time (`discoverRemoteProxyAuthConfig()` + in `pkg/vmcp/workloads/k8s.go`). The resolved strategy is applied by vMCP's + `authRoundTripper` when making outgoing calls **to the proxy pod**. +2. **The proxy pod** reads the same field via the operator-generated RunConfig + (`AddExternalAuthConfigOptions()` in `mcpremoteproxy_runconfig.go`). The pod + applies it as outgoing middleware when forwarding requests **to the remote server**. + +In direct mode, only consumer 1 applies — there is no proxy pod. + +`proxyConfig.oidcConfig` is a third, separate concern — it validates tokens +arriving at the proxy pod from vMCP. It is entirely independent of +`externalAuthConfigRef`. + +**`type: direct` — single auth boundary:** + +```mermaid +sequenceDiagram + participant C as Client + participant V as vMCP + participant R as Remote Server + + C->>V: Request (aud=vmcp token) + V->>V: Validate incoming token + V->>V: Apply externalAuthConfigRef as outgoing auth + V->>R: Request (with outgoing credentials) + R-->>V: Response + V-->>C: Response +``` + +vMCP reads `externalAuthConfigRef` and applies it when calling the remote +server directly. For `type: tokenExchange`, the client's validated incoming +token is used as the RFC 8693 `subject_token` to obtain a service token for +the remote. The token exchange server must trust the IdP that issued the +client's token. + +**Token exchange operational requirements (`type: direct`):** +- The STS must be configured to accept subject tokens from vMCP's IdP. +- Configure `audience` in the `MCPExternalAuthConfig` to match the remote + server's expected audience claim. + +**Unsupported `externalAuthConfigRef` types for `type: direct`:** + +The following types are **not valid** when `type: direct`: + +- **`embeddedAuthServer`**: Requires a running pod to host the OAuth2 server. + No pod exists in direct mode. 
+- **`awsSts`**: No converter is registered in vMCP's DefaultRegistry
+  (`pkg/vmcp/auth/converters`). The registry only registers `tokenExchange`,
+  `headerInjection`, and `unauthenticated`. Using `awsSts` in direct mode will
+  cause backend discovery to fail at runtime.
+
+The controller MUST reject these combinations and set `ConfigurationValid=False` with reason `UnsupportedAuthTypeForDirectMode`.
+
+### Detailed Design
+
+#### CRD Validation Rules
+
+CEL `XValidation` rules in Kubebuilder are **struct-level** markers — placed on
+the type being validated, not on a field within it. The pattern (from
+`virtualmcpserver_types.go:88`):
+
+```go
+// +kubebuilder:validation:XValidation:rule="...",message="..."
+type StructName struct { ... }
+```
+
+The four rules for `MCPRemoteEndpoint`, placed on their correct owning types:
+
+```go
+// MCPRemoteEndpointSpec struct-level rules:
+//
+// +kubebuilder:validation:XValidation:rule="self.type != 'direct' || !has(self.proxyConfig)",message="spec.proxyConfig must not be set when type is direct"
+// +kubebuilder:validation:XValidation:rule="self.type != 'proxy' || has(self.proxyConfig)",message="spec.proxyConfig is required when type is proxy"
+// +kubebuilder:validation:XValidation:rule="self.type == oldSelf.type",message="spec.type is immutable after creation"
+//
+//nolint:lll
+type MCPRemoteEndpointSpec struct { ... }
+
+// MCPRemoteEndpointProxyConfig — oidcConfig uses standard required marker:
+type MCPRemoteEndpointProxyConfig struct {
+    // +kubebuilder:validation:Required
+    OIDCConfig OIDCConfigRef `json:"oidcConfig"`
+    // ...
+}
+```
+
+**Important:** Because the immutability rule references `oldSelf`, Kubernetes
+treats it as a transition rule and evaluates it only on updates, never on
+create, so it cannot reject a new object. No `oldSelf == null` guard is
+needed; without `optionalOldSelf`, `oldSelf` is not even comparable to null.
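+As a cross-check on what the CEL rules enforce, the same invariants can be
+written as ordinary Go. This is an illustrative sketch with invented names
+(`Spec`, `validateSpec`, `validateTransition`), not operator code; note how
+the transition check only runs when an old object exists.

```go
package main

import (
	"errors"
	"fmt"
)

// Spec is a stand-in for MCPRemoteEndpointSpec; a nil ProxyConfig models
// has(self.proxyConfig) being false.
type Spec struct {
	Type        string
	ProxyConfig *struct{ ProxyPort int }
}

// validateSpec mirrors the two struct-level CEL rules.
func validateSpec(s Spec) error {
	if s.Type == "direct" && s.ProxyConfig != nil {
		return errors.New("spec.proxyConfig must not be set when type is direct")
	}
	if s.Type == "proxy" && s.ProxyConfig == nil {
		return errors.New("spec.proxyConfig is required when type is proxy")
	}
	return nil
}

// validateTransition mirrors the immutability rule. old == nil corresponds
// to create, where no prior state exists and the check does not apply.
func validateTransition(old *Spec, updated Spec) error {
	if old != nil && old.Type != updated.Type {
		return errors.New("spec.type is immutable after creation")
	}
	return nil
}

func main() {
	created := Spec{Type: "direct"}
	fmt.Println(validateSpec(created), validateTransition(nil, created))
}
```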
+ + +#### MCPRemoteEndpoint CRD + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPRemoteEndpoint +metadata: + name: context7 + namespace: default +spec: + # REQUIRED: Connectivity mode — IMMUTABLE after creation. + # Delete and recreate to change type. + # +kubebuilder:validation:Enum=proxy;direct + # +kubebuilder:default=proxy + # (immutability enforced by struct-level CEL rule, not here) + type: direct + + # REQUIRED: URL of the remote MCP server. + # +kubebuilder:validation:Pattern=`^https?://` + remoteURL: https://mcp.context7.com/mcp + + # REQUIRED: Transport protocol. + # streamable-http is RECOMMENDED. sse is the legacy 2024-11-05 transport, + # retained for backwards compatibility with servers that have not yet migrated. + # +kubebuilder:validation:Enum=streamable-http;sse + transport: streamable-http + + # REQUIRED: Group membership. MCPRemoteEndpoint only functions as part of + # an MCPGroup (aggregated by VirtualMCPServer), so groupRef is always required. + groupRef: engineering-team + + # OPTIONAL: Auth for outgoing requests to the remote server. + # In proxy mode: vMCP reads this for vMCP->proxy auth AND the proxy pod + # reads it for proxy->remote auth (two separate consumers, same field). + # In direct mode: vMCP reads this for vMCP->remote auth only. + # Omit for unauthenticated public remotes. + # NOT valid in direct mode: embeddedAuthServer, awsSts (see Auth Flow section). + externalAuthConfigRef: + name: salesforce-auth + + # OPTIONAL: Header forwarding. Applies to both modes. + headerForward: + addPlaintextHeaders: + # WARNING: values stored in plaintext in etcd and visible via kubectl. + # Never put API keys, tokens, or secrets here. + # Use addHeadersFromSecret for sensitive values. + X-Tenant-ID: "tenant-123" + addHeadersFromSecret: + - headerName: X-API-Key + valueSecretRef: + name: remote-api-credentials + key: api-key + + # OPTIONAL: Custom CA bundle (ConfigMap) for private remote servers. 
+ # NOTE: CA bundle ConfigMaps are trust anchors. Protect them with RBAC — + # anyone with ConfigMap write access in the namespace can inject a malicious + # CA and intercept TLS traffic to this backend. + caBundleRef: + name: internal-ca-bundle + key: ca.crt + + # OPTIONAL: Tool filtering. Applies to both modes. + toolConfigRef: + name: my-tool-config + + # OPTIONAL: Proxy pod configuration. + # REQUIRED when type: proxy. MUST NOT be set when type: direct. + # Validation is enforced by struct-level CEL rules on MCPRemoteEndpointSpec + # and MCPRemoteEndpointProxyConfig — not by field-level markers here. + proxyConfig: + oidcConfig: # REQUIRED within proxyConfig + type: kubernetes + authzConfig: + type: inline + inline: + policies: [...] + audit: + enabled: true + telemetry: + openTelemetry: + enabled: true + resources: + limits: + cpu: "500m" + memory: "128Mi" + serviceAccount: my-service-account + # +kubebuilder:default=8080 + proxyPort: 8080 + # +kubebuilder:validation:Enum=ClientIP;None + # +kubebuilder:default=ClientIP + # NOTE: ClientIP affinity is a rough approximation; Mcp-Session-Id + # header-based affinity is spec-correct but requires an ingress controller. 
+ sessionAffinity: ClientIP + # +kubebuilder:default=false + trustProxyHeaders: false + endpointPrefix: "" + resourceOverrides: {} +``` + +**Example: Unauthenticated public remote (direct mode):** + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPRemoteEndpoint +metadata: + name: context7 +spec: + type: direct + remoteURL: https://mcp.context7.com/mcp + transport: streamable-http + groupRef: engineering-team +``` + +**Example: Token exchange auth (direct mode):** + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPRemoteEndpoint +metadata: + name: salesforce-mcp +spec: + type: direct + remoteURL: https://mcp.salesforce.com + transport: streamable-http + groupRef: engineering-team + externalAuthConfigRef: + name: salesforce-token-exchange # type: tokenExchange +``` + +**Example: Standalone proxy with auth middleware (proxy mode):** + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPRemoteEndpoint +metadata: + name: internal-api-mcp +spec: + type: proxy + remoteURL: https://internal-mcp.corp.example.com/mcp + transport: streamable-http + groupRef: engineering-team + proxyConfig: + oidcConfig: + type: kubernetes + authzConfig: + type: inline + inline: + policies: ["permit(principal, action, resource);"] + audit: + enabled: true +``` + +#### CRD Metadata + +```go +// +kubebuilder:resource:shortName=mcpre +// +kubebuilder:printcolumn:name="Type",type="string",JSONPath=".spec.type" +// +kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase" +// +kubebuilder:printcolumn:name="Remote URL",type="string",JSONPath=".spec.remoteURL" +// +kubebuilder:printcolumn:name="URL",type="string",JSONPath=".status.url" +// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp" +``` + +Short name: `mcpre` (consistent with `mcpg` for MCPGroup, `vmcp` for +VirtualMCPServer, `extauth` for MCPExternalAuthConfig). 
+ +#### Spec Fields + +**Top-level (both modes):** + +| Field | Type | Required | Description | +|---|---|---|---| +| `type` | enum | Yes | `proxy` or `direct`. Default: `proxy`. **Immutable after creation.** | +| `remoteURL` | string | Yes | URL of the remote MCP server. | +| `transport` | enum | Yes | `streamable-http` (recommended) or `sse` (legacy 2024-11-05 transport). | +| `groupRef` | string | Yes | Name of the MCPGroup. | +| `externalAuthConfigRef` | object | No | Outgoing auth config. In proxy mode: read by both vMCP (vMCP→proxy) and the proxy pod (proxy→remote). In direct mode: read by vMCP only (vMCP→remote). Types `embeddedAuthServer` and `awsSts` are invalid in direct mode. | +| `headerForward` | object | No | Header injection. `addPlaintextHeaders` values are stored in plaintext in etcd — use `addHeadersFromSecret` for secrets. | +| `caBundleRef` | object | No | ConfigMap containing a custom CA bundle. Protect with RBAC — write access enables MITM. | +| `toolConfigRef` | object | No | Tool filtering. | + +**`proxyConfig` (only when `type: proxy`):** + +| Field | Type | Required | Description | +|---|---|---|---| +| `oidcConfig` | object | Yes | Validates tokens arriving at the proxy pod. | +| `authzConfig` | object | No | Cedar authorization policy. | +| `audit` | object | No | Audit logging for the proxy pod. | +| `telemetry` | object | No | Observability configuration. | +| `resources` | object | No | Container resource limits. | +| `serviceAccount` | string | No | Existing SA to use; auto-created if unset. | +| `proxyPort` | int | No | Port to expose. Default: 8080. | +| `sessionAffinity` | enum | No | `ClientIP` (default) or `None`. | +| `trustProxyHeaders` | bool | No | Trust X-Forwarded-* headers. Default: false. | +| `endpointPrefix` | string | No | Path prefix for ingress routing. | +| `resourceOverrides` | object | No | Metadata overrides for created resources. 
| + +#### Status Fields + +| Field | Type | Description | +|---|---|---| +| `conditions` | []Condition | Standard Kubernetes conditions. | +| `phase` | string | `Pending`, `Ready`, `Failed`, `Terminating`. | +| `url` | string | For `type: proxy`: cluster-internal Service URL (set once Deployment is ready). For `type: direct`: set to `spec.remoteURL` immediately upon validation. | +| `observedGeneration` | int64 | Most recent generation reconciled. | + +**`status.url` lifecycle note:** For `type: proxy`, `status.url` is empty until +the proxy Deployment becomes ready. Backend discoverers (static and dynamic) +MUST treat an empty `status.url` as "backend not yet available" and skip the +backend — not remove it from the registry. For `type: direct`, `status.url` is +set immediately after validation, so this race does not apply. + +**Condition types:** + +| Type | Purpose | When Set | +|---|---|---| +| `Ready` | Overall readiness | Always | +| `GroupRefValid` | MCPGroup exists | Always | +| `AuthConfigValid` | MCPExternalAuthConfig exists | When `externalAuthConfigRef` is set | +| `CABundleValid` | CA bundle ConfigMap exists | When `caBundleRef` is set | +| `DeploymentReady` | Proxy deployment healthy | Only when `type: proxy` | +| `ConfigurationValid` | All validation checks passed | Always | + +No `RemoteReachable` condition — the controller never probes remote URLs. + +#### Component Changes + +##### Operator: MCPRemoteEndpoint Controller + +**Pre-requisite: extract shared proxy logic.** `mcpremoteproxy_controller.go` +is ~1,125 lines with all proxy reconciliation logic bound to +`*mcpv1alpha1.MCPRemoteProxy` methods. Before Phase 1, extract the +Deployment/Service/ServiceAccount/RBAC creation functions into a shared +`pkg/operator/remoteproxy/` package that accepts an interface rather than the +concrete type. `MCPRemoteProxyReconciler` is then refactored to use it. +This is a refactoring-only step with no API changes — all existing tests must +pass unchanged. 
This is scoped as Phase 0 step 4.

**`type: proxy` path** — uses the extracted shared package:
1. Validates spec (OIDC config, group ref, auth config ref, CA bundle ref)
2. Ensures Deployment, Service, ServiceAccount, RBAC
3. Monitors deployment health, updates `Ready` condition
4. Sets `status.url` to the cluster-internal Service URL

**`type: direct` path** — validation only, no infrastructure:
1. Validates MCPGroup exists; sets `GroupRefValid`
2. If `externalAuthConfigRef` set, validates it exists; sets `AuthConfigValid`
3. If `externalAuthConfigRef` type is `embeddedAuthServer` or `awsSts`, sets
   `ConfigurationValid=False` with reason `UnsupportedAuthTypeForDirectMode`
4. If `caBundleRef` set, validates ConfigMap exists; sets `CABundleValid`
5. Sets `Ready=True` and `status.url = spec.remoteURL`

No finalizers for `type: direct`. `type: proxy` uses the same finalizer pattern
as the existing MCPRemoteProxy controller.

##### Operator: MCPGroup Controller Update

The MCPGroup controller currently watches MCPServer and MCPRemoteProxy. It must
be updated to also watch MCPRemoteEndpoint. Seven distinct changes are
required:

1. Register a field indexer for `MCPRemoteEndpoint.spec.groupRef` in
   `SetupFieldIndexers()` at manager startup — without this, `MatchingFields`
   queries for MCPRemoteEndpoint silently return empty results.
2. Add `findReferencingMCPRemoteEndpoints()` mirroring the existing
   `findReferencingMCPRemoteProxies()`.
3. Add `findMCPGroupForMCPRemoteEndpoint()` watch mapper.
4. Register the watch in `SetupWithManager()` via
   `Watches(&mcpv1alpha1.MCPRemoteEndpoint{}, ...)`.
5. Update `updateGroupMemberStatus()` to call the new function and populate
   new status fields.
6. Update `handleListFailure()` and `handleDeletion()` for MCPRemoteEndpoint
   membership.
7. 
Add RBAC markers — without these the operator gets a Forbidden error at + runtime: + ``` + // +kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpremoteendpoints,verbs=get;list;watch + // +kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpremoteendpoints/status,verbs=get;update;patch + ``` + +**Status fields (additive — no renames):** New fields `status.remoteEndpoints` +and `status.remoteEndpointCount` are added alongside the existing +`status.remoteProxies` and `status.remoteProxyCount`. Both are populated during +the deprecation window. Old fields are removed only when MCPRemoteProxy is +removed. This preserves backward compatibility for existing jsonpath queries +and monitoring dashboards. + +##### Operator: VirtualMCPServer Controller Update + +**`StaticBackendConfig` schema change required.** The current +`StaticBackendConfig` struct in `pkg/vmcp/config/config.go` has only `Name`, +`URL`, `Transport`, and `Metadata`. The vMCP binary uses `KnownFields(true)` +strict YAML parsing. Writing new fields (`Type`, `CABundlePath`, `Headers`) +to the ConfigMap before updating the vMCP binary will cause a startup failure. + +Implementation order: +1. Add `Type`, `CABundlePath`, and `HeaderEnvVars` fields to `StaticBackendConfig` +2. Update the vMCP binary and the roundtrip test in + `pkg/vmcp/config/crd_cli_roundtrip_test.go` +3. Deploy the updated vMCP image **before** the operator starts writing these + fields — co-ordinate Helm chart version bumping accordingly + +Additional touch points: +- `listMCPRemoteEndpointsAsMap()` — new function for ConfigMap generation +- `getExternalAuthConfigNameFromWorkload()` — add MCPRemoteEndpoint case +- Deployment volume mount logic for `caBundleRef` ConfigMaps + +**Header secret handling in static mode:** Secret values MUST NOT be inlined +into the backend ConfigMap. Instead, the operator uses the same `SecretKeyRef` +pattern that MCPRemoteProxy already uses: + +1. 
For each `type: direct` endpoint with `addHeadersFromSecret` entries, the
   operator adds `SecretKeyRef` environment variables to the **vMCP Deployment**
   (e.g. `TOOLHIVE_SECRET_HEADER_FORWARD_X_API_KEY_`).
2. The static backend ConfigMap stores only the env var names — never the
   secret values themselves.
3. At runtime, vMCP resolves header values via the existing
   `secrets.EnvironmentProvider`, identical to how MCPRemoteProxy pods handle
   this today.

This ensures no key material is written to ConfigMaps or stored in etcd in
plaintext. The trade-off is that adding or removing `addHeadersFromSecret`
entries on a direct endpoint triggers a vMCP Deployment update (and therefore
a pod restart), consistent with how CA bundle changes already behave in static
mode.

**CA bundle in static mode:** The operator mounts the `caBundleRef` ConfigMap
as a volume into the vMCP pod at `/etc/toolhive/ca-bundles/<name>/ca.crt`.
The generated backend ConfigMap includes the mount path so vMCP can construct
the correct `tls.Config`. A pod restart is required when a CA bundle changes in
static mode.

##### vMCP: Backend Discovery Update

Add `WorkloadTypeMCPRemoteEndpoint` to `pkg/vmcp/workloads/discoverer.go`.

Extend `ListWorkloadsInGroup()` and `GetWorkloadAsVMCPBackend()` in
`pkg/vmcp/workloads/k8s.go`. For MCPRemoteEndpoint:
- `type: proxy` — uses `status.url` (proxy Service URL), same as MCPRemoteProxy
- `type: direct` — uses `spec.remoteURL` directly

**Name collision prevention:** The MCPRemoteEndpoint controller MUST reject
creation if an MCPServer or MCPRemoteProxy with the same name already exists in
the namespace, setting `ConfigurationValid=False` with reason
`NameCollision`. Likewise, the MCPServer and MCPRemoteProxy controllers MUST
be updated to reject collisions with MCPRemoteEndpoint. This prevents
surprising fallback behaviour where deleting one resource type silently
activates a different resource with the same name.
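
Taken together, the discovery rules above and the `status.url` lifecycle note
from the Status Fields section reduce to a small routing decision per backend.
A minimal sketch in Go; `resolveBackendURL` is a hypothetical helper written
for illustration, not part of the toolhive codebase:

```go
package main

import "fmt"

// resolveBackendURL applies the routing rules for MCPRemoteEndpoint
// backends: a type: direct endpoint is reached at spec.remoteURL, while a
// type: proxy endpoint is reached via status.url and must be skipped
// (not removed) while that field is still empty.
func resolveBackendURL(endpointType, specRemoteURL, statusURL string) (url string, ready bool) {
	switch endpointType {
	case "direct":
		// status.url is set to spec.remoteURL immediately on validation,
		// so the spec field is authoritative here.
		return specRemoteURL, true
	case "proxy":
		// An empty status.url means the proxy Deployment is not ready
		// yet: treat the backend as "not yet available" and skip it.
		if statusURL == "" {
			return "", false
		}
		return statusURL, true
	default:
		return "", false
	}
}

func main() {
	url, ready := resolveBackendURL("proxy", "https://remote.example.com/mcp", "")
	fmt.Println(url, ready)
}
```

Both the static and dynamic discoverers can share this decision, so a proxy
whose Deployment is still rolling out is skipped rather than dropped from the
registry.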

`fetchBackendResource()` in `pkg/vmcp/k8s/backend_reconciler.go` retains its
existing resolution order (MCPServer → MCPRemoteProxy → MCPRemoteEndpoint) as
a defensive fallback, but the admission-time rejection above makes same-name
collisions a user error rather than an implicit resolution policy.

##### vMCP: HTTP Client for Direct Mode

For `type: direct` backends:
1. Use the system CA pool by default; optionally append the `caBundleRef` CA bundle
2. Enforce TLS 1.2 minimum
3. Apply `externalAuthConfigRef` credentials via `authRoundTripper`
4. Inject `MCP-Protocol-Version: <negotiated-version>` on every HTTP request
   after initialization — this is a MUST per MCP spec 2025-11-25 and applies
   to both POST (tool calls) and GET (server notification stream) requests

##### vMCP: Reconnection Handling for Direct Mode

When a `type: direct` backend connection drops, vMCP follows this sequence
per MCP spec 2025-11-25:

1. **Attempt stream resumption (SHOULD).** If the backend previously issued SSE
   event IDs, vMCP SHOULD issue an HTTP GET with `Last-Event-ID` set to the
   last received event ID before re-initializing. If the connection recovers
   and the session remains valid, no re-initialization is needed.

2. **Exponential backoff.** Initial: 1s, cap: 30s, jitter recommended.
   If the backend sends a `retry` field in an SSE event, that value overrides
   the local backoff for that attempt.

3. **Full re-initialization on HTTP 404 or session loss.** If HTTP 404 is
   returned on a request carrying an `MCP-Session-Id`, discard all session state
   and execute the full handshake:
   ```
   POST initialize request → InitializeResult (new MCP-Session-Id)
   POST notifications/initialized
   ```
   After initialization, re-discover ALL capabilities advertised in the new
   `InitializeResult` (tools, resources, prompts as applicable). Results from
   the prior session MUST NOT be reused.

4. **Re-establish GET stream.** See section below.

5. **Circuit breaker.** After 5 consecutive failed attempts, mark the backend
   `unavailable` and open the circuit breaker. The resource transitions to the
   `Failed` phase. A half-open probe at 60-second intervals tests recovery.

##### vMCP: Server-Initiated Notifications in Direct Mode

vMCP acts as an MCP **client** toward each `type: direct` backend and MUST
maintain a persistent HTTP GET SSE stream to each backend for server-initiated
messages.

After initialization (after sending `notifications/initialized`), vMCP MUST
issue:
```
GET <endpoint-url>
Accept: text/event-stream
MCP-Session-Id: <session-id> (if the server issued one)
MCP-Protocol-Version: 2025-11-25 (MUST be included per spec)
```

Notifications vMCP MUST handle:

| Notification | Action |
|---|---|
| `notifications/tools/list_changed` | Re-fetch `tools/list`, update routing table |
| `notifications/resources/list_changed` | Re-fetch `resources/list`, update routing table |
| `notifications/prompts/list_changed` | Re-fetch `prompts/list`, update routing table |

vMCP MUST only act on notifications for capabilities advertised with
`listChanged: true` in the `InitializeResult`. Other notifications should be
logged and discarded.

The GET stream MUST be re-established as step 4 of the reconnection sequence
above. If it cannot be established, the backend follows the circuit breaker path.

##### vMCP: Dynamic Mode Reconciler Update

Extend `BackendReconciler` in `pkg/vmcp/k8s/backend_reconciler.go` to watch
MCPRemoteEndpoint using the same `EnqueueRequestsFromMapFunc` pattern.
`fetchBackendResource()` gains a third type to try (see resolution order above).

##### Session Constraints in Direct Mode

**Why multi-replica fails.** The MCP `Mcp-Session-Id` is stored in
`LocalStorage`, which is a `sync.Map` held entirely in process memory
(`pkg/transport/session/storage_local.go`). 
A second vMCP replica has no
knowledge of sessions established by the first, causing HTTP 400 or 404 errors
on routed requests.

**Single replica is the only supported configuration.** `type: direct` endpoints
MUST be deployed with `replicas: 1` on the VirtualMCPServer Deployment.

**No distributed session backend exists.** `pkg/transport/session/storage.go`
defines a `Storage` interface that is Redis-compatible. The serialization
helpers in `serialization.go` are explicitly marked
`// nolint:unused // Will be used in Phase 4 for Redis/Valkey storage`. However,
no Redis implementation of `session.Storage` exists in the codebase — the Redis
code in `pkg/authserver/storage/redis.go` is for a different purpose (OAuth
server state via fosite) and is unrelated. A distributed session backend would
need to be built from scratch as a new `session.Storage` implementation and is
out of scope for this RFC.

## Security Considerations

### Threat Model

| Threat | Description | Mitigation |
|---|---|---|
| MITM on remote connection | Attacker intercepts vMCP-to-remote traffic | HTTPS required by default; custom CA bundles for private CAs |
| Credential exposure | Auth secrets visible in CRD manifest | Credentials stored in K8s Secrets; never inline. `addPlaintextHeaders` stores values in plaintext in etcd — use `addHeadersFromSecret` for sensitive values |
| SSRF via remoteURL | Compromised workload with CRD write access sets `remoteURL` to internal targets | RBAC + NetworkPolicy (see below) |
| Auth config confusion | Wrong credentials sent to wrong backend | Eliminated in direct mode: `externalAuthConfigRef` has one purpose (vMCP→remote). In proxy mode: see Auth Flow for the dual-consumer behaviour |
| Operator probing external URLs | Controller makes network requests to untrusted URLs | Eliminated: validation only, no probing |
| Expanded vMCP egress | vMCP pod makes outbound calls in direct mode | Acknowledged trade-off. See Credential Blast Radius below |
| Trust store injection | ConfigMap write access allows injecting malicious CA | CA bundle ConfigMaps are trust anchors; protect with RBAC |
| Token audience confusion | Exchanged token has broader scope than intended | Post-exchange audience validation MUST be implemented — see Phase 2 |

### SSRF Mitigation

The threat here is a compromised workload or user with CRD write access
setting `remoteURL` to internal cluster targets. The following mitigations
apply:

1. **RBAC (REQUIRED):** Only cluster administrators or trusted platform service
   accounts should have `create`/`update` permissions on MCPRemoteEndpoint.
2. **NetworkPolicy (RECOMMENDED):** Restrict vMCP pod egress to the expected
   remote endpoints; Phase 1 adds a default NetworkPolicy to the Helm chart.

### Credential Blast Radius in Direct Mode

In `type: proxy` mode, each proxy pod holds credentials for exactly one backend.
A compromised proxy pod yields credentials for one service.

In `type: direct` mode, the vMCP pod holds credentials for every direct backend
simultaneously. A compromised vMCP pod yields credentials for all backends.

**Recommendation for high-security environments:** Use `type: proxy` for
sensitive-credential backends. Reserve `type: direct` for unauthenticated or
low-sensitivity backends. Consider dedicated VirtualMCPServer instances (and
therefore dedicated MCPGroups) to isolate high-sensitivity backends.

### CA Bundle Trust Store Considerations

CA bundle ConfigMaps are trust anchors, not merely public data. Anyone with
`configmaps:update` in the namespace can inject a malicious CA certificate,
enabling MITM attacks against all `type: direct` backends referencing that
ConfigMap. CA bundle ConfigMaps MUST be protected with the same RBAC rigour as
the MCPRemoteEndpoint resource itself.

### Audit Limitations in Direct Mode

In `type: proxy` mode, the proxy pod logs: incoming request details, outgoing
URL, auth outcome, and remote response status.

In `type: direct` mode, vMCP's existing audit middleware logs incoming client
requests but does **not** currently log: the remote URL contacted, outgoing auth
outcome, or remote HTTP response status. This is a known gap.

**Required enhancement (Phase 2):** vMCP's audit middleware must be extended for
`type: direct` backends to log the remote URL, auth method, and remote HTTP
status code.

### Secrets Management

- **Dynamic mode**: vMCP reads secrets at runtime via K8s API.
- **Static mode**: Credentials mounted as environment variables; CA bundles
  mounted as volumes.
- **Routine secret rotation** (static mode): Deployment rollout — old pods
  continue serving until replaced.
- **Emergency revocation** (compromised credential): Use `strategy: Recreate`
  on the VirtualMCPServer Deployment, or trigger `kubectl rollout restart`
  immediately. RollingUpdate leaves old pods running with the revoked credential
  until replacement completes.

### Authentication and Authorization

- **No new auth primitives**: Reuses existing `MCPExternalAuthConfig` CRD.
- **Direct mode**: vMCP validates incoming client tokens; `externalAuthConfigRef`
  handles outgoing auth to the remote. Single, unambiguous boundary.
- **Proxy mode**: Two independent boundaries — see Auth Flow Comparison for
  the dual-consumer behaviour of `externalAuthConfigRef`.
- **Post-exchange audience validation**: The current token exchange implementation
  (`pkg/auth/tokenexchange/exchange.go`) does not validate the `aud` claim of
  the returned token against the configured `audience` parameter. This MUST be
  implemented before `type: direct` is considered secure for multi-backend
  deployments. Scoped to Phase 2.

## Deprecation

Both `MCPRemoteProxy` and `MCPServerEntry` (THV-0055) are deprecated as of
this RFC. MCPServerEntry ships first as a near-term solution; once
MCPRemoteEndpoint reaches GA, its `type: direct` mode provides equivalent
functionality and MCPServerEntry enters its deprecation window.

**Note on deprecation mechanism:** `+kubebuilder:deprecatedversion` only
deprecates API versions within the same CRD. It cannot deprecate one CRD in
favour of a different CRD. The deprecation is communicated via:
1. Warning events emitted on every MCPRemoteProxy and MCPServerEntry
   `Reconcile()` call
2. A `deprecated: "true"` field in the CRD description
3. Documentation updates

**Timeline:**

| Phase | Trigger | What Happens |
|---|---|---|
| MCPServerEntry ships | THV-0055 merges | MCPServerEntry available for near-term direct remote use cases |
| Announced | This RFC merges | Warning events on MCPRemoteProxy reconcile; CRD description updated |
| Feature freeze | MCPRemoteEndpoint Phase 1 merged | Bug fixes and security patches only for MCPRemoteProxy and MCPServerEntry |
| Migration window | MCPRemoteEndpoint reaches GA | Minimum 2 minor ToolHive operator releases |
| Removal | After migration window | MCPRemoteProxy, MCPServerEntry CRDs, controllers, Helm templates, RBAC removed |

### Migration: MCPRemoteProxy → MCPRemoteEndpoint

| `MCPRemoteProxy` field | `MCPRemoteEndpoint` equivalent | Notes |
|---|---|---|
| `spec.remoteURL` | `spec.remoteURL` | |
| `spec.port` (deprecated) | `spec.proxyConfig.proxyPort` | Use `proxyPort` |
| `spec.proxyPort` | `spec.proxyConfig.proxyPort` | |
| `spec.transport` | `spec.transport` | |
| `spec.groupRef` | `spec.groupRef` | |
| `spec.externalAuthConfigRef` | `spec.externalAuthConfigRef` | See Auth Flow — dual-consumer behaviour preserved |
| `spec.headerForward` | `spec.headerForward` | |
| `spec.toolConfigRef` | `spec.toolConfigRef` | |
| `spec.oidcConfig` | `spec.proxyConfig.oidcConfig` | |
| `spec.authzConfig` | `spec.proxyConfig.authzConfig` | |
| `spec.audit` | `spec.proxyConfig.audit` | |
| `spec.telemetry` | `spec.proxyConfig.telemetry` | |
| `spec.resources` | `spec.proxyConfig.resources` | |
| `spec.serviceAccount` | `spec.proxyConfig.serviceAccount` | |
| `spec.sessionAffinity` | `spec.proxyConfig.sessionAffinity` | |
| `spec.trustProxyHeaders` | `spec.proxyConfig.trustProxyHeaders` | |
| `spec.endpointPrefix` | `spec.proxyConfig.endpointPrefix` | |
| `spec.resourceOverrides` | `spec.proxyConfig.resourceOverrides` | |
| *(not present)* | `spec.type` | Set to `proxy` |
| *(not present)* | `spec.caBundleRef` | New field; not on MCPRemoteProxy |

### Migration: MCPServerEntry → MCPRemoteEndpoint

| `MCPServerEntry` field | `MCPRemoteEndpoint` equivalent | Notes |
|---|---|---|
| `spec.remoteURL` | `spec.remoteURL` | |
| `spec.transport` | `spec.transport` | |
| `spec.groupRef` | `spec.groupRef` | |
| `spec.externalAuthConfigRef` | `spec.externalAuthConfigRef` | |
| `spec.headerForward` | `spec.headerForward` | |
| `spec.caBundleRef` | `spec.caBundleRef` | |
| *(not present)* | `spec.type` | Set to `direct` |
| *(not present)* | `spec.toolConfigRef` | New field; not on MCPServerEntry |

## Alternatives Considered

### Alternative 1: Keep MCPServerEntry Permanently Alongside MCPRemoteProxy (THV-0055)

MCPServerEntry (THV-0055) ships first as a near-term solution for direct
remote backends behind vMCP. However, keeping both MCPServerEntry and
MCPRemoteProxy permanently means two CRDs with overlapping goals
(`remoteURL`, `groupRef`, `externalAuthConfigRef`, `headerForward` on both),
increasing cognitive load and long-term CRD surface area.
MCPRemoteEndpoint unifies both under a single resource, with its
`type: direct` mode covering MCPServerEntry's use case. MCPServerEntry will
enter a deprecation window once MCPRemoteEndpoint reaches GA.
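
The unification argument above rests on the discriminated-union shape of the
spec: shared fields at the top level, pod-only fields grouped under
`proxyConfig`. A rough Go sketch of how the types from the Spec Fields tables
might be declared; field names follow this RFC, but the struct layout and the
kubebuilder markers shown are illustrative assumptions, not the final API:

```go
package main

import "fmt"

// ProxyConfig groups the pod-deployment fields that only apply when
// type: proxy. Illustrative subset of the fields listed in this RFC.
type ProxyConfig struct {
	ProxyPort       int    `json:"proxyPort,omitempty"`
	ServiceAccount  string `json:"serviceAccount,omitempty"`
	SessionAffinity string `json:"sessionAffinity,omitempty"`
}

// MCPRemoteEndpointSpec sketches the proposed discriminated union.
// A struct-level CEL rule (Phase 1) would additionally require that
// ProxyConfig is set only when Type == "proxy".
type MCPRemoteEndpointSpec struct {
	// +kubebuilder:validation:Enum=proxy;direct
	// +kubebuilder:default=proxy
	Type      string `json:"type"`
	RemoteURL string `json:"remoteURL"`
	Transport string `json:"transport"`
	GroupRef  string `json:"groupRef"`
	// ProxyConfig must be nil when Type == "direct".
	ProxyConfig *ProxyConfig `json:"proxyConfig,omitempty"`
}

func main() {
	direct := MCPRemoteEndpointSpec{
		Type:      "direct",
		RemoteURL: "https://public-api.example.com/mcp",
		Transport: "streamable-http",
		GroupRef:  "default",
	}
	fmt.Println(direct.Type, direct.ProxyConfig == nil)
}
```

With this shape, a `direct: true` flag on MCPRemoteProxy (Alternative 2 below)
is unnecessary: the proxy-only surface lives behind one optional pointer
instead of polluting the top level.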

### Alternative 2: `direct: true` Flag on MCPRemoteProxy

**Why not chosen:** MCPRemoteProxy has ~9 pod-deployment-specific fields
that become inapplicable and confusing with a direct flag. Field pollution is
too high. The typed `proxyConfig` sub-object in MCPRemoteEndpoint solves this
cleanly.

### Alternative 3: Inline Remote Backends in VirtualMCPServer

**Why not chosen:** Prevents RBAC separation (only VirtualMCPServer editors
can manage backends) and couples backend lifecycle to vMCP reconciliation.

## Compatibility

### Backward Compatibility

- `MCPRemoteProxy` continues to function during the deprecation window
- `MCPServer` is unchanged
- `VirtualMCPServer`, `MCPGroup`, `MCPExternalAuthConfig` receive additive
  changes only (new watches, new status fields alongside existing ones)

### Forward Compatibility

- Starts at `v1alpha1`, graduation path to `v1beta1` as part of broader CRD
  revamp
- `type` field and typed sub-configs allow future modes without breaking changes

## Implementation Plan

### Phase 0: MCPRemoteProxy Deprecation + Controller Refactoring

1. Add `deprecated: "true"` to MCPRemoteProxy CRD description
2. Emit Warning events on every MCPRemoteProxy `Reconcile()` call
3. Update documentation
4. Extract shared proxy reconciliation logic from `mcpremoteproxy_controller.go`
   into `pkg/operator/remoteproxy/` — refactoring only, no API changes,
   all existing MCPRemoteProxy tests must pass

### Phase 1: CRD and Controller

1. Define `MCPRemoteEndpoint` CRD types with struct-level CEL rules (see CRD
   Validation Rules section for correct placement)
2. Implement controller with both code paths using the Phase 0 shared package
3. Generate CRD manifests; update Helm chart with default NetworkPolicy
4. Update MCPGroup controller — all 7 code changes listed above, including
   field indexer registration and RBAC markers
5. 
Unit tests for both controller paths; CEL rule tests

### Phase 2: Static Mode Integration

1. Add `Type`, `CABundlePath`, `HeaderEnvVars` fields to `StaticBackendConfig`;
   update vMCP binary and roundtrip test BEFORE operator starts writing them
2. Update VirtualMCPServer controller: `listMCPRemoteEndpointsAsMap()`,
   `getExternalAuthConfigNameFromWorkload()`, ConfigMap generation
3. Add `SecretKeyRef` env vars to vMCP Deployment for `addHeadersFromSecret`
   entries on `type: direct` endpoints; store env var names (not values) in
   backend ConfigMap
4. Mount CA bundle ConfigMaps as volumes
5. Implement post-exchange audience validation in token exchange strategy
6. Extend vMCP audit middleware for `type: direct` (remote URL, auth outcome,
   remote HTTP status)
7. Integration tests with envtest

### Phase 3: Dynamic Mode Integration

1. Add `WorkloadTypeMCPRemoteEndpoint` to `pkg/vmcp/workloads/discoverer.go`
2. Extend `BackendReconciler` and `ListWorkloadsInGroup()` / `GetWorkloadAsVMCPBackend()`
3. Implement `MCP-Protocol-Version` header injection for direct mode HTTP client
4. Implement reconnection handling (stream resumption, full re-init on 404,
   circuit breaker)
5. Implement persistent GET stream for server-initiated notifications
6. Integration tests for dynamic discovery

### Phase 4: Documentation and E2E

1. CRD reference documentation
2. Migration guide: MCPRemoteProxy → MCPRemoteEndpoint
3. User guide with mode selection guidance
4. Document single-replica constraint for `type: direct`
5. E2E Chainsaw tests for both modes and mixed groups

### Dependencies

- THV-0014 (K8s-Aware vMCP) — already merged; dynamic mode (Phase 3) is unblocked

## Open Questions

1. **`groupRef` required?** Resolved: Yes. Follow-up: consider requiring it on
   MCPServer and MCPRemoteProxy too for consistency.

2. **MCPRemoteProxy removal timing?** Resolved: After two minor releases
   post-MCPRemoteEndpoint GA.

3. 
**`toolConfigRef` placement?** Resolved: Shared top-level field.

4. **`disabled` field?** Deferred. `groupRef` is required; disabling an endpoint
   requires deletion. A `disabled: true` field can be added additively later if
   post-implementation feedback shows deletion is too disruptive.

5. **Multi-replica for `type: direct`?** Resolved as out-of-scope. Single-replica
   is the constraint. Redis session storage is a future follow-up requiring a new
   `session.Storage` implementation.

## References

- [THV-0055: MCPServerEntry CRD](./THV-0055-mcpserverentry-direct-remote-backends.md) — near-term solution for direct remote backends; superseded by this RFC once MCPRemoteEndpoint reaches GA
- [THV-0008: Virtual MCP Server](./THV-0008-virtual-mcp-server.md)
- [THV-0009: Remote MCP Server Proxy](./THV-0009-remote-mcp-proxy.md)
- [THV-0010: MCPGroup CRD](./THV-0010-kubernetes-mcpgroup-crd.md)
- [THV-0014: K8s-Aware vMCP](./THV-0014-vmcp-k8s-aware-refactor.md) (merged)
- [THV-0026: Header Passthrough](./THV-0026-header-passthrough.md)
- [MCP Specification 2025-11-25: Transports](https://modelcontextprotocol.io/specification/2025-11-25/basic/transports)
- [MCP Specification 2025-11-25: Lifecycle](https://modelcontextprotocol.io/specification/2025-11-25/basic/lifecycle)
- [RFC 8693: OAuth 2.0 Token Exchange](https://datatracker.ietf.org/doc/html/rfc8693)
- [Kubernetes API Conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md)
- [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104)
- [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109)

---

## RFC Lifecycle

| Date | Reviewer | Decision | Notes |
|---|---|---|---|
| 2026-03-18 | @ChrisJBurns, @jaosorior | Draft | Initial submission |
| 2026-03-18 | Review agents | Revision 1 | Addressed: CEL rules, type immutability, auth flows, session constraints, security hardening |
| 2026-03-18 | Review agents | 
Revision 2 | Fixed: CEL placement, auth flow accuracy, Redis claim, reconnection protocol, GET stream, embeddedAuthServer/awsSts restriction, MCPGroup RBAC markers, audience validation, broken anchors, CA bundle warning, emergency rotation, short name | +| 2026-04-07 | @jaosorior | Revision 3 | Split into separate RFC; MCPServerEntry (THV-0055) ships first as near-term solution | From 399dda1245e4ac8a1daafb74090a4533380842dc Mon Sep 17 00:00:00 2001 From: Juan Antonio Osorio Date: Tue, 7 Apr 2026 14:21:59 +0300 Subject: [PATCH 8/9] Remove THV-0067 from this branch (belongs in its own PR) Co-Authored-By: Claude Opus 4.6 (1M context) --- ...premoteendpoint-unified-remote-backends.md | 1022 ----------------- 1 file changed, 1022 deletions(-) delete mode 100644 rfcs/THV-0067-mcpremoteendpoint-unified-remote-backends.md diff --git a/rfcs/THV-0067-mcpremoteendpoint-unified-remote-backends.md b/rfcs/THV-0067-mcpremoteendpoint-unified-remote-backends.md deleted file mode 100644 index 8aab210..0000000 --- a/rfcs/THV-0067-mcpremoteendpoint-unified-remote-backends.md +++ /dev/null @@ -1,1022 +0,0 @@ -# THV-XXXX: MCPRemoteEndpoint CRD — Unified Remote MCP Server Connectivity - -- **Status**: Draft -- **Author(s)**: @ChrisJBurns, @jaosorior -- **Created**: 2026-03-18 -- **Last Updated**: 2026-04-07 -- **Target Repository**: toolhive -- **Supersedes**: [THV-0055](./THV-0055-mcpserverentry-direct-remote-backends.md) (MCPServerEntry CRD). MCPServerEntry ships first as a near-term solution; this RFC defines its long-term replacement. -- **Related Issues**: [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104), [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109) - -## Summary - -Introduce a new `MCPRemoteEndpoint` CRD that unifies remote MCP server -connectivity under a single resource with two explicit modes: - -- **`type: proxy`** — deploys a proxy pod with full auth middleware, authz - policy, and audit logging. 
Functionally equivalent to `MCPRemoteProxy` and - replaces it. -- **`type: direct`** — no pod deployed; VirtualMCPServer connects directly to - the remote URL. Resolves forced-auth on public remotes - ([#3104](https://github.com/stacklok/toolhive/issues/3104)) and eliminates - unnecessary infrastructure for simple remote backends. - -`MCPRemoteProxy` is deprecated in favour of `MCPRemoteEndpoint` with -`type: proxy`. Existing `MCPRemoteProxy` resources continue to function during -the deprecation window with no immediate migration required. - -## Problem Statement - -### 1. Forced Authentication on Public Remotes (Issue #3104) - -`MCPRemoteProxy` requires OIDC authentication configuration even when -VirtualMCPServer already handles client authentication at its own boundary. -This blocks unauthenticated public remote MCP servers (e.g., context7, public -API gateways) from being placed behind vMCP without configuring unnecessary -auth on the proxy layer. - -### 2. Resource Waste - -Every remote MCP server behind vMCP requires a full Deployment + Service + Pod -just to forward HTTP requests that vMCP could make directly. For organisations -with many remote MCP backends, this creates unnecessary infrastructure cost and -operational overhead. - -### 3. CRD Proliferation and Overlapping Goals - -The original THV-0055 proposed `MCPServerEntry` as a companion resource to -`MCPRemoteProxy`. Both resources would have existed to serve the same -high-level user goal: connecting to a remote MCP server. Both reference a -`remoteURL`, join a `groupRef`, and support `externalAuthConfigRef`. The only -difference is whether a proxy pod is deployed. - -Having two separate CRDs for the same goal — differing only in their mechanism -— increases the API surface users must learn and makes the right choice -non-obvious before writing any YAML. 
The goal (`connect to a remote server`) -should be the abstraction; the mechanism (`via a proxy pod` vs `directly`) -should be a configuration choice within it. - -### Who Is Affected - -- **Platform teams** deploying vMCP with remote MCP backends in Kubernetes -- **Product teams** wanting to register external MCP services behind vMCP -- **Existing `MCPRemoteProxy` users** who will migrate to - `MCPRemoteEndpoint` with `type: proxy` - -## Goals - -- Provide a single, purpose-built CRD for all remote MCP server connectivity -- Enable vMCP to connect directly to remote MCP servers without a proxy pod - for simple use cases -- Allow unauthenticated remote MCP servers behind vMCP without workarounds -- Retain the full feature set of `MCPRemoteProxy` (auth middleware, authz, - audit logging) under `type: proxy` -- Deprecate `MCPRemoteProxy` with a clear migration path -- Reduce long-term CRD surface area rather than growing it - -## Non-Goals - -- **Removing `MCPRemoteProxy` immediately**: It remains functional during the - deprecation window. Removal is a follow-up once adoption of - `MCPRemoteEndpoint` is confirmed. -- **Adding health probing from the operator**: The controller should NOT probe - remote URLs. Health checking belongs in vMCP's existing runtime - infrastructure (`healthCheckInterval`, circuit breaker). -- **Cross-namespace references**: `MCPRemoteEndpoint` follows the same - namespace-scoped patterns as other ToolHive CRDs. -- **Supporting stdio or container-based transports**: `MCPRemoteEndpoint` is - exclusively for remote HTTP-based MCP servers. -- **CLI mode support**: `MCPRemoteEndpoint` is a Kubernetes-only CRD. -- **Multi-replica vMCP with `type: direct`**: Session state is in-process only. - See [Session Constraints](#session-constraints-in-direct-mode). 
-
-## Mode Selection Guide
-
-| Scenario | Recommended Mode | Why |
-|---|---|---|
-| Public, unauthenticated remote (e.g., context7) | `direct` | No auth middleware needed; no pod required |
-| Remote with outgoing auth handled by vMCP (token exchange, header injection, etc.) | `direct` | vMCP applies outgoing auth directly; one fewer hop |
-| Remote requiring its own OIDC validation boundary | `proxy` | Proxy pod validates tokens independently |
-| Remote requiring Cedar authz policies per-endpoint | `proxy` | Authz policies run in the proxy pod |
-| Remote needing audit logging at the endpoint level | `proxy` | Proxy pod has its own audit middleware |
-| Standalone use without VirtualMCPServer | `proxy` | Direct mode requires vMCP to function |
-| Many remotes where pod-per-remote is too costly | `direct` | No Deployment/Service/Pod per remote |
-
-**Rule of thumb:** Use `direct` for simple, public remotes or any remote
-fronted by vMCP where vMCP handles outgoing auth. Use `proxy` when you need an
-independent auth/authz/audit boundary per remote, or when the backend needs to
-be accessible standalone.
-
-## Proposed Solution
-
-### High-Level Design
-
-`MCPRemoteEndpoint` is a single CRD with a `type` discriminator field. Shared
-fields sit at the top level. Fields only applicable to the proxy pod are grouped
-under `proxyConfig`.
-
-```mermaid
-graph TB
-    subgraph "Client Layer"
-        Client[MCP Client]
-    end
-
-    subgraph "Virtual MCP Server"
-        InAuth[Incoming Auth]
-        Router[Request Router]
-        AuthMgr[Backend Auth Manager]
-    end
-
-    subgraph "MCPRemoteEndpoint: type=proxy"
-        ProxyPod[Proxy Pod<br/>
OIDC + Authz + Audit] - end - - subgraph "MCPRemoteEndpoint: type=direct" - DirectEntry[Config Only
No pods] - end - - subgraph "External Services" - Remote1[remote.example.com/mcp] - Remote2[public-api.example.com/mcp] - end - - Client -->|Token: aud=vmcp| InAuth - InAuth --> Router - Router --> AuthMgr - AuthMgr -->|Via proxy pod| ProxyPod - ProxyPod -->|Authenticated HTTPS| Remote1 - AuthMgr -->|Direct HTTPS| Remote2 - DirectEntry -.->|Declares endpoint| Remote2 - - style DirectEntry fill:#fff3e0,stroke:#ff9800 - style ProxyPod fill:#e3f2fd,stroke:#2196f3 -``` - -### Mode Comparison - -| Capability | `type: proxy` | `type: direct` | -|---|---|---| -| Deploys proxy pod | Yes | No | -| Own OIDC validation | Yes | No (vMCP handles this) | -| Own authz policy | Yes | No | -| Own audit logging | Yes (proxy-level) | No (vMCP's audit middleware; see [Audit Limitations](#audit-limitations-in-direct-mode)) | -| Standalone use (without vMCP) | Yes | No | -| Outgoing auth to remote | Yes (`externalAuthConfigRef`) | Yes (`externalAuthConfigRef`) | -| Header forwarding | Yes (`headerForward`) | Yes (`headerForward`) | -| Custom CA bundle | Yes (`caBundleRef`) | Yes (`caBundleRef`) | -| Tool filtering | Yes (`toolConfigRef`) | Yes (`toolConfigRef`) | -| GroupRef support | Yes | Yes | -| Multi-replica vMCP | Yes | No — see [Session Constraints](#session-constraints-in-direct-mode) | -| Credential blast radius | Isolated per proxy pod | All credentials in vMCP pod — see [Security Considerations](#security-considerations) | - -### Auth Flow Comparison - -**`type: proxy` — two independent auth legs:** - -```mermaid -sequenceDiagram - participant C as Client - participant V as vMCP - participant P as Proxy Pod - participant R as Remote Server - - C->>V: Request (aud=vmcp token) - V->>V: Validate incoming token - V->>P: Forward (externalAuthConfigRef credential) - P->>P: oidcConfig validates incoming request - P->>R: Forward (externalAuthConfigRef as outgoing middleware) - R-->>P: Response - P-->>V: Response - V-->>C: Response -``` - -`externalAuthConfigRef` on a `type: proxy` 
endpoint is read by two separate -consumers: - -1. **vMCP** reads it at backend discovery time (`discoverRemoteProxyAuthConfig()` - in `pkg/vmcp/workloads/k8s.go`). The resolved strategy is applied by vMCP's - `authRoundTripper` when making outgoing calls **to the proxy pod**. -2. **The proxy pod** reads the same field via the operator-generated RunConfig - (`AddExternalAuthConfigOptions()` in `mcpremoteproxy_runconfig.go`). The pod - applies it as outgoing middleware when forwarding requests **to the remote server**. - -In direct mode, only consumer 1 applies — there is no proxy pod. - -`proxyConfig.oidcConfig` is a third, separate concern — it validates tokens -arriving at the proxy pod from vMCP. It is entirely independent of -`externalAuthConfigRef`. - -**`type: direct` — single auth boundary:** - -```mermaid -sequenceDiagram - participant C as Client - participant V as vMCP - participant R as Remote Server - - C->>V: Request (aud=vmcp token) - V->>V: Validate incoming token - V->>V: Apply externalAuthConfigRef as outgoing auth - V->>R: Request (with outgoing credentials) - R-->>V: Response - V-->>C: Response -``` - -vMCP reads `externalAuthConfigRef` and applies it when calling the remote -server directly. For `type: tokenExchange`, the client's validated incoming -token is used as the RFC 8693 `subject_token` to obtain a service token for -the remote. The token exchange server must trust the IdP that issued the -client's token. - -**Token exchange operational requirements (`type: direct`):** -- The STS must be configured to accept subject tokens from vMCP's IdP. -- Configure `audience` in the `MCPExternalAuthConfig` to match the remote - server's expected audience claim. - -**Unsupported `externalAuthConfigRef` types for `type: direct`:** - -The following types are **not valid** when `type: direct`: - -- **`embeddedAuthServer`**: Requires a running pod to host the OAuth2 server. - No pod exists in direct mode. 
- **`awsSts`**: No converter is registered in vMCP's DefaultRegistry
  (`pkg/vmcp/auth/converters`). The registry only registers `tokenExchange`,
  `headerInjection`, and `unauthenticated`. Using `awsSts` in direct mode will
  cause backend discovery to fail at runtime.

The controller MUST reject these combinations and set
`ConfigurationValid=False` with reason `UnsupportedAuthTypeForDirectMode`.

### Detailed Design

#### CRD Validation Rules

CEL `XValidation` rules in Kubebuilder are **struct-level** markers — placed on
the type being validated, not on a field within it. The pattern (from
`virtualmcpserver_types.go:88`):

```go
// +kubebuilder:validation:XValidation:rule="...",message="..."
type StructName struct { ... }
```

The rules for `MCPRemoteEndpoint`, placed on their correct owning types (three
CEL rules on the spec, plus a standard `Required` marker on
`proxyConfig.oidcConfig`):

```go
// MCPRemoteEndpointSpec struct-level rules:
//
// +kubebuilder:validation:XValidation:rule="self.type != 'direct' || !has(self.proxyConfig)",message="spec.proxyConfig must not be set when type is direct"
// +kubebuilder:validation:XValidation:rule="self.type != 'proxy' || has(self.proxyConfig)",message="spec.proxyConfig is required when type is proxy"
// +kubebuilder:validation:XValidation:rule="oldSelf == null || self.type == oldSelf.type",message="spec.type is immutable after creation"
//
//nolint:lll
type MCPRemoteEndpointSpec struct { ... }

// MCPRemoteEndpointProxyConfig — oidcConfig uses standard required marker:
type MCPRemoteEndpointProxyConfig struct {
	// +kubebuilder:validation:Required
	OIDCConfig OIDCConfigRef `json:"oidcConfig"`
	// ...
}
```

**Important:** Rules that reference `oldSelf` are *transition rules*: the API
server evaluates them only on update, when a previous object state exists, so
the immutability rule is skipped on create by design. The `oldSelf == null`
guard is therefore redundant on current Kubernetes versions, but it is harmless
and documents that the rule has nothing to check at creation time.
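The struct-level rules above can be exercised directly at admission time. A
minimal illustration (hypothetical resource name) of a manifest the first rule
rejects:

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPRemoteEndpoint
metadata:
  name: invalid-example
spec:
  type: direct
  remoteURL: https://example.com/mcp
  transport: streamable-http
  groupRef: engineering-team
  proxyConfig:        # invalid: proxyConfig is not allowed with type: direct
    oidcConfig:
      type: kubernetes
```

The API server denies this at admission with the configured message
(`spec.proxyConfig must not be set when type is direct`) before the controller
ever sees the object; no controller-side validation is involved.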
- - -#### MCPRemoteEndpoint CRD - -```yaml -apiVersion: toolhive.stacklok.dev/v1alpha1 -kind: MCPRemoteEndpoint -metadata: - name: context7 - namespace: default -spec: - # REQUIRED: Connectivity mode — IMMUTABLE after creation. - # Delete and recreate to change type. - # +kubebuilder:validation:Enum=proxy;direct - # +kubebuilder:default=proxy - # (immutability enforced by struct-level CEL rule, not here) - type: direct - - # REQUIRED: URL of the remote MCP server. - # +kubebuilder:validation:Pattern=`^https?://` - remoteURL: https://mcp.context7.com/mcp - - # REQUIRED: Transport protocol. - # streamable-http is RECOMMENDED. sse is the legacy 2024-11-05 transport, - # retained for backwards compatibility with servers that have not yet migrated. - # +kubebuilder:validation:Enum=streamable-http;sse - transport: streamable-http - - # REQUIRED: Group membership. MCPRemoteEndpoint only functions as part of - # an MCPGroup (aggregated by VirtualMCPServer), so groupRef is always required. - groupRef: engineering-team - - # OPTIONAL: Auth for outgoing requests to the remote server. - # In proxy mode: vMCP reads this for vMCP->proxy auth AND the proxy pod - # reads it for proxy->remote auth (two separate consumers, same field). - # In direct mode: vMCP reads this for vMCP->remote auth only. - # Omit for unauthenticated public remotes. - # NOT valid in direct mode: embeddedAuthServer, awsSts (see Auth Flow section). - externalAuthConfigRef: - name: salesforce-auth - - # OPTIONAL: Header forwarding. Applies to both modes. - headerForward: - addPlaintextHeaders: - # WARNING: values stored in plaintext in etcd and visible via kubectl. - # Never put API keys, tokens, or secrets here. - # Use addHeadersFromSecret for sensitive values. - X-Tenant-ID: "tenant-123" - addHeadersFromSecret: - - headerName: X-API-Key - valueSecretRef: - name: remote-api-credentials - key: api-key - - # OPTIONAL: Custom CA bundle (ConfigMap) for private remote servers. 
- # NOTE: CA bundle ConfigMaps are trust anchors. Protect them with RBAC — - # anyone with ConfigMap write access in the namespace can inject a malicious - # CA and intercept TLS traffic to this backend. - caBundleRef: - name: internal-ca-bundle - key: ca.crt - - # OPTIONAL: Tool filtering. Applies to both modes. - toolConfigRef: - name: my-tool-config - - # OPTIONAL: Proxy pod configuration. - # REQUIRED when type: proxy. MUST NOT be set when type: direct. - # Validation is enforced by struct-level CEL rules on MCPRemoteEndpointSpec - # and MCPRemoteEndpointProxyConfig — not by field-level markers here. - proxyConfig: - oidcConfig: # REQUIRED within proxyConfig - type: kubernetes - authzConfig: - type: inline - inline: - policies: [...] - audit: - enabled: true - telemetry: - openTelemetry: - enabled: true - resources: - limits: - cpu: "500m" - memory: "128Mi" - serviceAccount: my-service-account - # +kubebuilder:default=8080 - proxyPort: 8080 - # +kubebuilder:validation:Enum=ClientIP;None - # +kubebuilder:default=ClientIP - # NOTE: ClientIP affinity is a rough approximation; Mcp-Session-Id - # header-based affinity is spec-correct but requires an ingress controller. 
- sessionAffinity: ClientIP - # +kubebuilder:default=false - trustProxyHeaders: false - endpointPrefix: "" - resourceOverrides: {} -``` - -**Example: Unauthenticated public remote (direct mode):** - -```yaml -apiVersion: toolhive.stacklok.dev/v1alpha1 -kind: MCPRemoteEndpoint -metadata: - name: context7 -spec: - type: direct - remoteURL: https://mcp.context7.com/mcp - transport: streamable-http - groupRef: engineering-team -``` - -**Example: Token exchange auth (direct mode):** - -```yaml -apiVersion: toolhive.stacklok.dev/v1alpha1 -kind: MCPRemoteEndpoint -metadata: - name: salesforce-mcp -spec: - type: direct - remoteURL: https://mcp.salesforce.com - transport: streamable-http - groupRef: engineering-team - externalAuthConfigRef: - name: salesforce-token-exchange # type: tokenExchange -``` - -**Example: Standalone proxy with auth middleware (proxy mode):** - -```yaml -apiVersion: toolhive.stacklok.dev/v1alpha1 -kind: MCPRemoteEndpoint -metadata: - name: internal-api-mcp -spec: - type: proxy - remoteURL: https://internal-mcp.corp.example.com/mcp - transport: streamable-http - groupRef: engineering-team - proxyConfig: - oidcConfig: - type: kubernetes - authzConfig: - type: inline - inline: - policies: ["permit(principal, action, resource);"] - audit: - enabled: true -``` - -#### CRD Metadata - -```go -// +kubebuilder:resource:shortName=mcpre -// +kubebuilder:printcolumn:name="Type",type="string",JSONPath=".spec.type" -// +kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase" -// +kubebuilder:printcolumn:name="Remote URL",type="string",JSONPath=".spec.remoteURL" -// +kubebuilder:printcolumn:name="URL",type="string",JSONPath=".status.url" -// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp" -``` - -Short name: `mcpre` (consistent with `mcpg` for MCPGroup, `vmcp` for -VirtualMCPServer, `extauth` for MCPExternalAuthConfig). 
- -#### Spec Fields - -**Top-level (both modes):** - -| Field | Type | Required | Description | -|---|---|---|---| -| `type` | enum | Yes | `proxy` or `direct`. Default: `proxy`. **Immutable after creation.** | -| `remoteURL` | string | Yes | URL of the remote MCP server. | -| `transport` | enum | Yes | `streamable-http` (recommended) or `sse` (legacy 2024-11-05 transport). | -| `groupRef` | string | Yes | Name of the MCPGroup. | -| `externalAuthConfigRef` | object | No | Outgoing auth config. In proxy mode: read by both vMCP (vMCP→proxy) and the proxy pod (proxy→remote). In direct mode: read by vMCP only (vMCP→remote). Types `embeddedAuthServer` and `awsSts` are invalid in direct mode. | -| `headerForward` | object | No | Header injection. `addPlaintextHeaders` values are stored in plaintext in etcd — use `addHeadersFromSecret` for secrets. | -| `caBundleRef` | object | No | ConfigMap containing a custom CA bundle. Protect with RBAC — write access enables MITM. | -| `toolConfigRef` | object | No | Tool filtering. | - -**`proxyConfig` (only when `type: proxy`):** - -| Field | Type | Required | Description | -|---|---|---|---| -| `oidcConfig` | object | Yes | Validates tokens arriving at the proxy pod. | -| `authzConfig` | object | No | Cedar authorization policy. | -| `audit` | object | No | Audit logging for the proxy pod. | -| `telemetry` | object | No | Observability configuration. | -| `resources` | object | No | Container resource limits. | -| `serviceAccount` | string | No | Existing SA to use; auto-created if unset. | -| `proxyPort` | int | No | Port to expose. Default: 8080. | -| `sessionAffinity` | enum | No | `ClientIP` (default) or `None`. | -| `trustProxyHeaders` | bool | No | Trust X-Forwarded-* headers. Default: false. | -| `endpointPrefix` | string | No | Path prefix for ingress routing. | -| `resourceOverrides` | object | No | Metadata overrides for created resources. 
| - -#### Status Fields - -| Field | Type | Description | -|---|---|---| -| `conditions` | []Condition | Standard Kubernetes conditions. | -| `phase` | string | `Pending`, `Ready`, `Failed`, `Terminating`. | -| `url` | string | For `type: proxy`: cluster-internal Service URL (set once Deployment is ready). For `type: direct`: set to `spec.remoteURL` immediately upon validation. | -| `observedGeneration` | int64 | Most recent generation reconciled. | - -**`status.url` lifecycle note:** For `type: proxy`, `status.url` is empty until -the proxy Deployment becomes ready. Backend discoverers (static and dynamic) -MUST treat an empty `status.url` as "backend not yet available" and skip the -backend — not remove it from the registry. For `type: direct`, `status.url` is -set immediately after validation, so this race does not apply. - -**Condition types:** - -| Type | Purpose | When Set | -|---|---|---| -| `Ready` | Overall readiness | Always | -| `GroupRefValid` | MCPGroup exists | Always | -| `AuthConfigValid` | MCPExternalAuthConfig exists | When `externalAuthConfigRef` is set | -| `CABundleValid` | CA bundle ConfigMap exists | When `caBundleRef` is set | -| `DeploymentReady` | Proxy deployment healthy | Only when `type: proxy` | -| `ConfigurationValid` | All validation checks passed | Always | - -No `RemoteReachable` condition — the controller never probes remote URLs. - -#### Component Changes - -##### Operator: MCPRemoteEndpoint Controller - -**Pre-requisite: extract shared proxy logic.** `mcpremoteproxy_controller.go` -is ~1,125 lines with all proxy reconciliation logic bound to -`*mcpv1alpha1.MCPRemoteProxy` methods. Before Phase 1, extract the -Deployment/Service/ServiceAccount/RBAC creation functions into a shared -`pkg/operator/remoteproxy/` package that accepts an interface rather than the -concrete type. `MCPRemoteProxyReconciler` is then refactored to use it. -This is a refactoring-only step with no API changes — all existing tests must -pass unchanged. 
This is scoped as Phase 0 step 4. - -**`type: proxy` path** — uses the extracted shared package: -1. Validates spec (OIDC config, group ref, auth config ref, CA bundle ref) -2. Ensures Deployment, Service, ServiceAccount, RBAC -3. Monitors deployment health, updates `Ready` condition -4. Sets `status.url` to the cluster-internal Service URL - -**`type: direct` path** — validation only, no infrastructure: -1. Validates MCPGroup exists; sets `GroupRefValid` -2. If `externalAuthConfigRef` set, validates it exists; sets `AuthConfigValid` -3. If `externalAuthConfigRef` type is `embeddedAuthServer` or `awsSts`, sets - `ConfigurationValid=False` with reason `UnsupportedAuthTypeForDirectMode` -4. If `caBundleRef` set, validates ConfigMap exists; sets `CABundleValid` -5. Sets `Ready=True` and `status.url = spec.remoteURL` - -No finalizers for `type: direct`. `type: proxy` uses the same finalizer pattern -as the existing MCPRemoteProxy controller. - -##### Operator: MCPGroup Controller Update - -The MCPGroup controller currently watches MCPServer and MCPRemoteProxy. It must -be updated to also watch MCPRemoteEndpoint. The following changes are required -(this is not a single bullet point): - -1. Register a field indexer for `MCPRemoteEndpoint.spec.groupRef` in - `SetupFieldIndexers()` at manager startup — without this, `MatchingFields` - queries for MCPRemoteEndpoint silently return empty results. -2. Add `findReferencingMCPRemoteEndpoints()` mirroring the existing - `findReferencingMCPRemoteProxies()`. -3. Add `findMCPGroupForMCPRemoteEndpoint()` watch mapper. -4. Register the watch in `SetupWithManager()` via - `Watches(&mcpv1alpha1.MCPRemoteEndpoint{}, ...)`. -5. Update `updateGroupMemberStatus()` to call the new function and populate - new status fields. -6. Update `handleListFailure()` and `handleDeletion()` for MCPRemoteEndpoint - membership. -7. 
Add RBAC markers — without these the operator gets a Forbidden error at - runtime: - ``` - // +kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpremoteendpoints,verbs=get;list;watch - // +kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpremoteendpoints/status,verbs=get;update;patch - ``` - -**Status fields (additive — no renames):** New fields `status.remoteEndpoints` -and `status.remoteEndpointCount` are added alongside the existing -`status.remoteProxies` and `status.remoteProxyCount`. Both are populated during -the deprecation window. Old fields are removed only when MCPRemoteProxy is -removed. This preserves backward compatibility for existing jsonpath queries -and monitoring dashboards. - -##### Operator: VirtualMCPServer Controller Update - -**`StaticBackendConfig` schema change required.** The current -`StaticBackendConfig` struct in `pkg/vmcp/config/config.go` has only `Name`, -`URL`, `Transport`, and `Metadata`. The vMCP binary uses `KnownFields(true)` -strict YAML parsing. Writing new fields (`Type`, `CABundlePath`, `Headers`) -to the ConfigMap before updating the vMCP binary will cause a startup failure. - -Implementation order: -1. Add `Type`, `CABundlePath`, and `HeaderEnvVars` fields to `StaticBackendConfig` -2. Update the vMCP binary and the roundtrip test in - `pkg/vmcp/config/crd_cli_roundtrip_test.go` -3. Deploy the updated vMCP image **before** the operator starts writing these - fields — co-ordinate Helm chart version bumping accordingly - -Additional touch points: -- `listMCPRemoteEndpointsAsMap()` — new function for ConfigMap generation -- `getExternalAuthConfigNameFromWorkload()` — add MCPRemoteEndpoint case -- Deployment volume mount logic for `caBundleRef` ConfigMaps - -**Header secret handling in static mode:** Secret values MUST NOT be inlined -into the backend ConfigMap. Instead, the operator uses the same `SecretKeyRef` -pattern that MCPRemoteProxy already uses: - -1. 
For each `type: direct` endpoint with `addHeadersFromSecret` entries, the
   operator adds `SecretKeyRef` environment variables to the **vMCP Deployment**
   (e.g. `TOOLHIVE_SECRET_HEADER_FORWARD_X_API_KEY_`).
2. The static backend ConfigMap stores only the env var names — never the
   secret values themselves.
3. At runtime, vMCP resolves header values via the existing
   `secrets.EnvironmentProvider`, identical to how MCPRemoteProxy pods handle
   this today.

This ensures no key material is written to ConfigMaps or stored in etcd in
plaintext. The trade-off is that adding or removing `addHeadersFromSecret`
entries on a direct endpoint triggers a vMCP Deployment update (and therefore
a pod restart), consistent with how CA bundle changes already behave in static
mode.

**CA bundle in static mode:** The operator mounts the `caBundleRef` ConfigMap
as a volume into the vMCP pod at `/etc/toolhive/ca-bundles/<endpoint-name>/ca.crt`.
The generated backend ConfigMap includes the mount path so vMCP can construct
the correct `tls.Config`. Pod restart is required when a CA bundle changes in
static mode.

##### vMCP: Backend Discovery Update

Add `WorkloadTypeMCPRemoteEndpoint` to `pkg/vmcp/workloads/discoverer.go`.

Extend `ListWorkloadsInGroup()` and `GetWorkloadAsVMCPBackend()` in
`pkg/vmcp/workloads/k8s.go`. For MCPRemoteEndpoint:
- `type: proxy` — uses `status.url` (proxy Service URL), same as MCPRemoteProxy
- `type: direct` — uses `spec.remoteURL` directly

**Name collision prevention:** The MCPRemoteEndpoint controller MUST reject
creation if an MCPServer or MCPRemoteProxy with the same name already exists in
the namespace, setting `ConfigurationValid=False` with reason
`NameCollision`. Likewise, the MCPServer and MCPRemoteProxy controllers MUST
be updated to reject collisions with MCPRemoteEndpoint. This prevents
surprising fallback behaviour where deleting one resource type silently
activates a different resource with the same name.
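The admission-time collision rule can be sketched as follows. This is
illustrative only: `existsFn` is a hypothetical stand-in for the
controller-runtime lookups the real controller would perform against the
namespace:

```go
package main

import (
	"errors"
	"fmt"
)

// existsFn reports whether a resource of the given kind with the given name
// already exists in the namespace (stand-in for API server lookups).
type existsFn func(kind, name string) bool

// checkNameCollision mirrors the admission-time rule: an MCPRemoteEndpoint
// may not share a name with an MCPServer or MCPRemoteProxy in the same
// namespace. On collision the controller would set ConfigurationValid=False
// with reason NameCollision instead of proceeding.
func checkNameCollision(name string, exists existsFn) error {
	for _, kind := range []string{"MCPServer", "MCPRemoteProxy"} {
		if exists(kind, name) {
			return errors.New("NameCollision: " + kind + " '" + name + "' already exists in namespace")
		}
	}
	return nil
}

func main() {
	// Fake namespace inventory keyed by "Kind/name".
	inventory := map[string]bool{"MCPServer/github": true}
	lookup := func(kind, name string) bool { return inventory[kind+"/"+name] }

	fmt.Println(checkNameCollision("github", lookup))   // collision with MCPServer
	fmt.Println(checkNameCollision("context7", lookup)) // no collision
}
```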

`fetchBackendResource()` in `pkg/vmcp/k8s/backend_reconciler.go` retains its
existing resolution order (MCPServer → MCPRemoteProxy → MCPRemoteEndpoint) as
a defensive fallback, but the admission-time rejection above makes same-name
collisions a user error rather than an implicit resolution policy.

##### vMCP: HTTP Client for Direct Mode

For `type: direct` backends:
1. Use the system CA pool by default; optionally append the `caBundleRef` CA bundle
2. Enforce TLS 1.2 minimum
3. Apply `externalAuthConfigRef` credentials via `authRoundTripper`
4. Inject `MCP-Protocol-Version: <negotiated-version>` on every HTTP request
   after initialization — this is a MUST per MCP spec 2025-11-25 and applies
   to both POST (tool calls) and GET (server notification stream) requests

##### vMCP: Reconnection Handling for Direct Mode

When a `type: direct` backend connection drops, vMCP follows this sequence
per MCP spec 2025-11-25:

1. **Attempt stream resumption (SHOULD).** If the backend previously issued SSE
   event IDs, vMCP SHOULD issue an HTTP GET with `Last-Event-ID` set to the
   last received event ID before re-initializing. If the connection recovers
   and the session remains valid, no re-initialization is needed.

2. **Exponential backoff.** Initial: 1s, cap: 30s, jitter recommended.
   If the backend sends a `retry` field in an SSE event, that value overrides
   the local backoff for that attempt.

3. **Full re-initialization on HTTP 404 or session loss.** If HTTP 404 is
   returned on a request carrying an `MCP-Session-Id`, discard all session state
   and execute the full handshake:
   ```
   POST initialize request → InitializeResult (new MCP-Session-Id)
   POST notifications/initialized
   ```
   After initialization, re-discover ALL capabilities advertised in the new
   `InitializeResult` (tools, resources, prompts as applicable). Results from
   the prior session MUST NOT be reused.

4. **Re-establish GET stream.** See section below.

5.
**Circuit breaker.** After 5 consecutive failed attempts, mark the backend
   `unavailable` and open the circuit breaker. The resource transitions to the
   `Failed` phase. A half-open probe at 60-second intervals tests recovery.

##### vMCP: Server-Initiated Notifications in Direct Mode

vMCP acts as an MCP **client** toward each `type: direct` backend and MUST
maintain a persistent HTTP GET SSE stream to each backend for server-initiated
messages.

After initialization (after sending `notifications/initialized`), vMCP MUST
issue:
```
GET <mcp-endpoint-url>
Accept: text/event-stream
MCP-Session-Id: <session-id> (if the server issued one)
MCP-Protocol-Version: 2025-11-25 (MUST be included per spec)
```

Notifications vMCP MUST handle:

| Notification | Action |
|---|---|
| `notifications/tools/list_changed` | Re-fetch `tools/list`, update routing table |
| `notifications/resources/list_changed` | Re-fetch `resources/list`, update routing table |
| `notifications/prompts/list_changed` | Re-fetch `prompts/list`, update routing table |

vMCP MUST only act on notifications for capabilities advertised with
`listChanged: true` in the `InitializeResult`. Other notifications should be
logged and discarded.

The GET stream MUST be re-established as step 4 of the reconnection sequence
above. If it cannot be established, the backend follows the circuit breaker path.

##### vMCP: Dynamic Mode Reconciler Update

Extend `BackendReconciler` in `pkg/vmcp/k8s/backend_reconciler.go` to watch
MCPRemoteEndpoint using the same `EnqueueRequestsFromMapFunc` pattern.
`fetchBackendResource()` gains a third type to try (see resolution order above).

##### Session Constraints in Direct Mode

**Why multi-replica fails.** The MCP `Mcp-Session-Id` is stored in
`LocalStorage`, which is a `sync.Map` held entirely in process memory
(`pkg/transport/session/storage_local.go`).
A second vMCP replica has no
knowledge of sessions established by the first, causing HTTP 400 or 404 errors
on routed requests.

**Single replica is the only supported configuration.** `type: direct` endpoints
MUST be deployed with `replicas: 1` on the VirtualMCPServer Deployment.

**No distributed session backend exists.** `pkg/transport/session/storage.go`
defines a `Storage` interface that is Redis-compatible. The serialization
helpers in `serialization.go` are explicitly marked
`// nolint:unused // Will be used in Phase 4 for Redis/Valkey storage`. However,
no Redis implementation of `session.Storage` exists in the codebase — the Redis
code in `pkg/authserver/storage/redis.go` is for a different purpose (OAuth
server state via fosite) and is unrelated. A distributed session backend would
need to be built from scratch as a new `session.Storage` implementation and is
out of scope for this RFC.

## Security Considerations

### Threat Model

| Threat | Description | Mitigation |
|---|---|---|
| MITM on remote connection | Attacker intercepts vMCP-to-remote traffic | HTTPS required by default; custom CA bundles for private CAs |
| Credential exposure | Auth secrets visible in CRD manifest | Credentials stored in K8s Secrets; never inline. `addPlaintextHeaders` stores values in plaintext in etcd — use `addHeadersFromSecret` for sensitive values |
| SSRF via remoteURL | Compromised workload with CRD write access sets `remoteURL` to internal targets | RBAC + NetworkPolicy (see below) |
| Auth config confusion | Wrong credentials sent to wrong backend | Eliminated in direct mode: `externalAuthConfigRef` has one purpose (vMCP→remote). In proxy mode: see Auth Flow for the dual-consumer behaviour |
| Operator probing external URLs | Controller makes network requests to untrusted URLs | Eliminated: validation only, no probing |
| Expanded vMCP egress | vMCP pod makes outbound calls in direct mode | Acknowledged trade-off.
See Credential Blast Radius below |
| Trust store injection | ConfigMap write access allows injecting malicious CA | CA bundle ConfigMaps are trust anchors; protect with RBAC |
| Token audience confusion | Exchanged token has broader scope than intended | Post-exchange audience validation MUST be implemented — see Phase 2 |

### SSRF Mitigation

The threat here is a compromised workload with CRD write access pointing
`remoteURL` at internal targets. The following mitigations are required:

1. **RBAC (REQUIRED):** Only cluster administrators or trusted platform service
   accounts should have `create`/`update` permissions on MCPRemoteEndpoint.
2. **NetworkPolicy (RECOMMENDED):** Restrict the vMCP pod's egress to the
   expected remote endpoints; the Helm chart ships a default NetworkPolicy for
   this purpose (see Phase 1).

### Credential Blast Radius in Direct Mode

In `type: proxy` mode, each proxy pod holds credentials for exactly one backend.
A compromised proxy pod yields credentials for one service.

In `type: direct` mode, the vMCP pod holds credentials for every direct backend
simultaneously. A compromised vMCP pod yields credentials for all backends.

**Recommendation for high-security environments:** Use `type: proxy` for
sensitive-credential backends. Reserve `type: direct` for unauthenticated or
low-sensitivity backends. Consider dedicated VirtualMCPServer instances (and
therefore dedicated MCPGroups) to isolate high-sensitivity backends.

### CA Bundle Trust Store Considerations

CA bundle ConfigMaps are trust anchors, not merely public data. Anyone with
`configmaps:update` in the namespace can inject a malicious CA certificate,
enabling MITM attacks against all `type: direct` backends referencing that
ConfigMap. CA bundle ConfigMaps MUST be protected with the same RBAC rigour as
the MCPRemoteEndpoint resource itself.

### Audit Limitations in Direct Mode

In `type: proxy` mode, the proxy pod logs: incoming request details, outgoing
URL, auth outcome, and remote response status.
- -In `type: direct` mode, vMCP's existing audit middleware logs incoming client -requests but does **not** currently log: the remote URL contacted, outgoing auth -outcome, or remote HTTP response status. This is a known gap. - -**Required enhancement (Phase 2):** vMCP's audit middleware must be extended for -`type: direct` backends to log the remote URL, auth method, and remote HTTP -status code. - -### Secrets Management - -- **Dynamic mode**: vMCP reads secrets at runtime via K8s API. -- **Static mode**: Credentials mounted as environment variables; CA bundles - mounted as volumes. -- **Routine secret rotation** (static mode): Deployment rollout — old pods - continue serving until replaced. -- **Emergency revocation** (compromised credential): Use `strategy: Recreate` - on the VirtualMCPServer Deployment, or trigger `kubectl rollout restart` - immediately. RollingUpdate leaves old pods running with the revoked credential - until replacement completes. - -### Authentication and Authorization - -- **No new auth primitives**: Reuses existing `MCPExternalAuthConfig` CRD. -- **Direct mode**: vMCP validates incoming client tokens; `externalAuthConfigRef` - handles outgoing auth to the remote. Single, unambiguous boundary. -- **Proxy mode**: Two independent boundaries — see Auth Flow Comparison for - the dual-consumer behaviour of `externalAuthConfigRef`. -- **Post-exchange audience validation**: The current token exchange implementation - (`pkg/auth/tokenexchange/exchange.go`) does not validate the `aud` claim of - the returned token against the configured `audience` parameter. This MUST be - implemented before `type: direct` is considered secure for multi-backend - deployments. Scoped to Phase 2. - -## Deprecation - -Both `MCPRemoteProxy` and `MCPServerEntry` (THV-0055) are deprecated as of -this RFC. 
MCPServerEntry ships first as a near-term solution; once
MCPRemoteEndpoint reaches GA, its `type: direct` mode provides
equivalent functionality and MCPServerEntry enters its own deprecation window.

**Note on deprecation mechanism:** `+kubebuilder:deprecatedversion` only
deprecates API versions within the same CRD. It cannot deprecate one CRD in
favour of a different CRD. The deprecation is communicated via:
1. Warning events emitted on every MCPRemoteProxy and MCPServerEntry
   `Reconcile()` call
2. A `deprecated: "true"` field in the CRD description
3. Documentation updates

**Timeline:**

| Phase | Trigger | What Happens |
|---|---|---|
| MCPServerEntry ships | THV-0055 merges | MCPServerEntry available for near-term direct remote use cases |
| Announced | This RFC merges | Warning events on MCPRemoteProxy reconcile; CRD description updated |
| Feature freeze | MCPRemoteEndpoint Phase 1 merged | Bug fixes and security patches only for MCPRemoteProxy and MCPServerEntry |
| Migration window | MCPRemoteEndpoint reaches GA | Minimum 2 minor ToolHive operator releases |
| Removal | After migration window | MCPRemoteProxy, MCPServerEntry CRDs, controllers, Helm templates, RBAC removed |

### Migration: MCPRemoteProxy → MCPRemoteEndpoint

| `MCPRemoteProxy` field | `MCPRemoteEndpoint` equivalent | Notes |
|---|---|---|
| `spec.remoteURL` | `spec.remoteURL` | |
| `spec.port` (deprecated) | `spec.proxyConfig.proxyPort` | Use `proxyPort` |
| `spec.proxyPort` | `spec.proxyConfig.proxyPort` | |
| `spec.transport` | `spec.transport` | |
| `spec.groupRef` | `spec.groupRef` | |
| `spec.externalAuthConfigRef` | `spec.externalAuthConfigRef` | See Auth Flow — dual-consumer behaviour preserved |
| `spec.headerForward` | `spec.headerForward` | |
| `spec.toolConfigRef` | `spec.toolConfigRef` | |
| `spec.oidcConfig` | `spec.proxyConfig.oidcConfig` | |
| `spec.authzConfig` | `spec.proxyConfig.authzConfig` | |
| `spec.audit` |
`spec.proxyConfig.audit` | | -| `spec.telemetry` | `spec.proxyConfig.telemetry` | | -| `spec.resources` | `spec.proxyConfig.resources` | | -| `spec.serviceAccount` | `spec.proxyConfig.serviceAccount` | | -| `spec.sessionAffinity` | `spec.proxyConfig.sessionAffinity` | | -| `spec.trustProxyHeaders` | `spec.proxyConfig.trustProxyHeaders` | | -| `spec.endpointPrefix` | `spec.proxyConfig.endpointPrefix` | | -| `spec.resourceOverrides` | `spec.proxyConfig.resourceOverrides` | | -| *(not present)* | `spec.type` | Set to `proxy` | -| *(not present)* | `spec.caBundleRef` | New field; not on MCPRemoteProxy | - -### Migration: MCPServerEntry → MCPRemoteEndpoint - -| `MCPServerEntry` field | `MCPRemoteEndpoint` equivalent | Notes | -|---|---|---| -| `spec.remoteURL` | `spec.remoteURL` | | -| `spec.transport` | `spec.transport` | | -| `spec.groupRef` | `spec.groupRef` | | -| `spec.externalAuthConfigRef` | `spec.externalAuthConfigRef` | | -| `spec.headerForward` | `spec.headerForward` | | -| `spec.caBundleRef` | `spec.caBundleRef` | | -| *(not present)* | `spec.type` | Set to `direct` | -| *(not present)* | `spec.toolConfigRef` | New field; not on MCPServerEntry | - -## Alternatives Considered - -### Alternative 1: Keep MCPServerEntry Permanently Alongside MCPRemoteProxy (THV-0055) - -MCPServerEntry (THV-0055) ships first as a near-term solution for direct -remote backends behind vMCP. However, keeping both MCPServerEntry and -MCPRemoteProxy permanently means two CRDs with overlapping goals -(`remoteURL`, `groupRef`, `externalAuthConfigRef`, `headerForward` on both), -increasing cognitive load and long-term CRD surface area. -MCPRemoteEndpoint unifies both under a single resource, with MCPRemoteEndpoint's -`type: direct` mode covering the same use case. MCPServerEntry will enter a -deprecation window once MCPRemoteEndpoint reaches GA. 
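For concreteness, here is a minimal sketch of what the MCPServerEntry-to-MCPRemoteEndpoint migration in the mapping table above could look like in practice. The API group/version, resource names, URL, and transport value are illustrative assumptions, not normative:

```yaml
# Hypothetical before/after pair; field placement follows the migration
# table above, but names, URL, and apiVersion are invented for illustration.
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServerEntry
metadata:
  name: context7
spec:
  remoteURL: https://mcp.context7.example.com/mcp
  transport: streamable-http
  groupRef: my-group
---
# Equivalent MCPRemoteEndpoint after migration: the shared fields carry
# over unchanged; the only addition is the `type` discriminator.
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPRemoteEndpoint
metadata:
  name: context7
spec:
  type: direct
  remoteURL: https://mcp.context7.example.com/mcp
  transport: streamable-http
  groupRef: my-group
```

Because `type: direct` selects the pod-less code path, no `proxyConfig` sub-object is needed for migrated MCPServerEntry resources.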
- -### Alternative 2: `direct: true` Flag on MCPRemoteProxy - -**Why not chosen:** MCPRemoteProxy has ~9 pod-deployment-specific fields -that become inapplicable and confusing with a direct flag. Field pollution is -too high. The typed `proxyConfig` sub-object in MCPRemoteEndpoint solves this -cleanly. - -### Alternative 3: Inline Remote Backends in VirtualMCPServer - -**Why not chosen:** Prevents RBAC separation (only VirtualMCPServer editors -can manage backends) and couples backend lifecycle to vMCP reconciliation. - -## Compatibility - -### Backward Compatibility - -- `MCPRemoteProxy` continues to function during the deprecation window -- `MCPServer` is unchanged -- `VirtualMCPServer`, `MCPGroup`, `MCPExternalAuthConfig` receive additive - changes only (new watches, new status fields alongside existing ones) - -### Forward Compatibility - -- Starts at `v1alpha1`, graduation path to `v1beta1` as part of broader CRD - revamp -- `type` field and typed sub-configs allow future modes without breaking changes - -## Implementation Plan - -### Phase 0: MCPRemoteProxy Deprecation + Controller Refactoring - -1. Add `deprecated: "true"` to MCPRemoteProxy CRD description -2. Emit Warning events on every MCPRemoteProxy `Reconcile()` call -3. Update documentation -4. Extract shared proxy reconciliation logic from `mcpremoteproxy_controller.go` - into `pkg/operator/remoteproxy/` — refactoring only, no API changes, - all existing MCPRemoteProxy tests must pass - -### Phase 1: CRD and Controller - -1. Define `MCPRemoteEndpoint` CRD types with struct-level CEL rules (see CRD - Validation Rules section for correct placement) -2. Implement controller with both code paths using the Phase 0 shared package -3. Generate CRD manifests; update Helm chart with default NetworkPolicy -4. Update MCPGroup controller — all 7 code changes listed above, including - field indexer registration and RBAC markers -5. 
Unit tests for both controller paths; CEL rule tests - -### Phase 2: Static Mode Integration - -1. Add `Type`, `CABundlePath`, `HeaderEnvVars` fields to `StaticBackendConfig`; - update vMCP binary and roundtrip test BEFORE operator starts writing them -2. Update VirtualMCPServer controller: `listMCPRemoteEndpointsAsMap()`, - `getExternalAuthConfigNameFromWorkload()`, ConfigMap generation -3. Add `SecretKeyRef` env vars to vMCP Deployment for `addHeadersFromSecret` - entries on `type: direct` endpoints; store env var names (not values) in - backend ConfigMap -4. Mount CA bundle ConfigMaps as volumes -5. Implement post-exchange audience validation in token exchange strategy -6. Extend vMCP audit middleware for `type: direct` (remote URL, auth outcome, - remote HTTP status) -7. Integration tests with envtest - -### Phase 3: Dynamic Mode Integration - -1. Add `WorkloadTypeMCPRemoteEndpoint` to `pkg/vmcp/workloads/discoverer.go` -2. Extend `BackendReconciler` and `ListWorkloadsInGroup()` / `GetWorkloadAsVMCPBackend()` -3. Implement `MCP-Protocol-Version` header injection for direct mode HTTP client -4. Implement reconnection handling (stream resumption, full re-init on 404, - circuit breaker) -5. Implement persistent GET stream for server-initiated notifications -6. Integration tests for dynamic discovery - -### Phase 4: Documentation and E2E - -1. CRD reference documentation -2. Migration guide: MCPRemoteProxy → MCPRemoteEndpoint -3. User guide with mode selection guidance -4. Document single-replica constraint for `type: direct` -5. E2E Chainsaw tests for both modes and mixed groups - -### Dependencies - -- THV-0014 (K8s-Aware vMCP) — already merged; dynamic mode (Phase 3) is unblocked - -## Open Questions - -1. **`groupRef` required?** Resolved: Yes. Follow-up: consider requiring it on - MCPServer and MCPRemoteProxy too for consistency. - -2. **MCPRemoteProxy removal timing?** Resolved: After two minor releases - post-MCPRemoteEndpoint GA. - -3. 
**`toolConfigRef` placement?** Resolved: Shared top-level field. - -4. **`disabled` field?** Deferred. `groupRef` is required; disabling an endpoint - requires deletion. A `disabled: true` field can be added additively later if - post-implementation feedback shows deletion is too disruptive. - -5. **Multi-replica for `type: direct`?** Resolved as out-of-scope. Single-replica - is the constraint. Redis session storage is a future follow-up requiring a new - `session.Storage` implementation. - -## References - -- [THV-0055: MCPServerEntry CRD](./THV-0055-mcpserverentry-direct-remote-backends.md) — near-term solution for direct remote backends; superseded by this RFC once MCPRemoteEndpoint reaches GA -- [THV-0008: Virtual MCP Server](./THV-0008-virtual-mcp-server.md) -- [THV-0009: Remote MCP Server Proxy](./THV-0009-remote-mcp-proxy.md) -- [THV-0010: MCPGroup CRD](./THV-0010-kubernetes-mcpgroup-crd.md) -- [THV-0014: K8s-Aware vMCP](./THV-0014-vmcp-k8s-aware-refactor.md) (merged) -- [THV-0026: Header Passthrough](./THV-0026-header-passthrough.md) -- [MCP Specification 2025-11-25: Transports](https://modelcontextprotocol.io/specification/2025-11-25/basic/transports) -- [MCP Specification 2025-11-25: Lifecycle](https://modelcontextprotocol.io/specification/2025-11-25/basic/lifecycle) -- [RFC 8693: OAuth 2.0 Token Exchange](https://datatracker.ietf.org/doc/html/rfc8693) -- [Kubernetes API Conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md) -- [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104) -- [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109) - ---- - -## RFC Lifecycle - -| Date | Reviewer | Decision | Notes | -|---|---|---|---| -| 2026-03-18 | @ChrisJBurns, @jaosorior | Draft | Initial submission | -| 2026-03-18 | Review agents | Revision 1 | Addressed: CEL rules, type immutability, auth flows, session constraints, security hardening | -| 2026-03-18 | Review agents | 
Revision 2 | Fixed: CEL placement, auth flow accuracy, Redis claim, reconnection protocol, GET stream, embeddedAuthServer/awsSts restriction, MCPGroup RBAC markers, audience validation, broken anchors, CA bundle warning, emergency rotation, short name | -| 2026-04-07 | @jaosorior | Revision 3 | Split into separate RFC; MCPServerEntry (THV-0055) ships first as near-term solution | From 2d8603a9814c283ff7e688133ce4786b7acb8036 Mon Sep 17 00:00:00 2001 From: Juan Antonio Osorio Date: Tue, 7 Apr 2026 13:04:56 +0000 Subject: [PATCH 9/9] Address @ChrisJBurns review feedback on SSRF blast radius and CA bundle complexity Co-Authored-By: Claude Opus 4.6 (1M context) --- .../THV-0055-mcpserverentry-direct-remote-backends.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md index 01aa7b5..b8404ee 100644 --- a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md +++ b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md @@ -523,7 +523,7 @@ external TLS support. |--------|-------------|------------| | Man-in-the-middle on remote connection | Attacker intercepts vMCP-to-remote traffic | HTTPS required by default; custom CA bundles for private CAs | | Credential exposure in CRD spec | Auth secrets visible in CRD manifest | Credentials stored in K8s Secrets, referenced via `externalAuthConfigRef` and `headerForward.addHeadersFromSecrets`; never inline in CRD spec | -| SSRF via remoteURL | Operator configures URL pointing to internal services | Mitigated by RBAC (only authorized users create MCPServerEntry); annotation required for non-HTTPS; NetworkPolicy should restrict vMCP egress. Note: CEL-based IP range blocking (e.g., RFC 1918) is intentionally not applied because MCPServerEntry legitimately targets internal/corporate MCP servers. RBAC is the appropriate control layer since resource creation is restricted to trusted operators. 
| +| SSRF via remoteURL | Operator configures URL pointing to internal services | Mitigated by RBAC (only authorized users create MCPServerEntry); annotation required for non-HTTPS; NetworkPolicy should restrict vMCP egress. Note: CEL-based IP range blocking (e.g., RFC 1918) is intentionally not applied because MCPServerEntry legitimately targets internal/corporate MCP servers. RBAC is the appropriate control layer since resource creation is restricted to trusted operators. **Blast radius note:** With MCPRemoteProxy, outbound calls to remote URLs originate from an isolated proxy pod. With MCPServerEntry, those calls originate from the vMCP pod itself, which means a misconfigured `remoteURL` exposes vMCP's network position rather than a disposable proxy's. This is an intentional trade-off: the proxy pod's isolation was never a security boundary (it shares the same NetworkPolicy-governed namespace), but operators should be aware that vMCP's egress surface grows with each MCPServerEntry. Restricting vMCP egress via NetworkPolicy is strongly recommended. | | Auth config confusion (existing issue) | Dual-boundary auth leading to wrong tokens sent to wrong endpoints | Eliminated: MCPServerEntry has exactly one auth boundary with one purpose | | Operator probing external URLs | Controller making network requests to untrusted URLs | Eliminated: controller performs validation only, no network probing | @@ -581,6 +581,15 @@ external TLS support. path (e.g., `/etc/toolhive/ca-bundles//ca.crt`). The generated backend ConfigMap includes the mount path so vMCP can construct the `tls.Config` at startup. + - **Implementation note**: The VirtualMCPServer controller does not + currently mount arbitrary ConfigMaps as volumes, so this introduces + a new operator pattern. 
The controller will need to generate + per-entry volume and volumeMount entries in the vMCP Deployment spec, + and handle additions/removals of `caBundleRef` across MCPServerEntry + resources (which triggers a Deployment update and pod restart in + static mode). This is non-trivial but bounded: the same pattern + would be required by any solution that supports custom CA bundles, + including a `direct: true` flag on MCPRemoteProxy. - Secret rotation follows existing patterns: - **Dynamic mode**: Watch-based propagation, no pod restart needed. - **Static mode**: Requires pod restart (Deployment rollout).
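To make the implementation note above concrete, the following is a hypothetical sketch of the per-entry volume wiring the VirtualMCPServer controller would have to generate on the vMCP Deployment. The container name, entry name `internal-api`, ConfigMap name, and exact mount-path layout are all illustrative assumptions:

```yaml
# Hypothetical fragment of the controller-generated vMCP Deployment spec.
# One volume/volumeMount pair per MCPServerEntry that sets caBundleRef.
spec:
  template:
    spec:
      containers:
        - name: vmcp
          volumeMounts:
            - name: ca-bundle-internal-api
              mountPath: /etc/toolhive/ca-bundles/internal-api
              readOnly: true
      volumes:
        - name: ca-bundle-internal-api
          configMap:
            name: internal-api-ca   # ConfigMap referenced by caBundleRef
            items:
              - key: ca.crt
                path: ca.crt
```

Adding or removing a `caBundleRef` changes this fragment of the pod template, which is what triggers the Deployment rollout (and pod restart) described above for static mode.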