Virtual MCP Server Architecture

The Virtual MCP Server (vMCP) aggregates multiple MCP servers from a ToolHive group into a single unified interface. This document explains the architecture and design of vMCP.

Overview

vMCP solves the problem of MCP server sprawl. As organizations deploy more specialized MCP servers, clients need to connect to multiple endpoints. vMCP provides:

  • Unified endpoint - One URL for clients to access many backends
  • Tool aggregation - Combine tools from multiple servers
  • Conflict resolution - Handle duplicate tool names automatically
  • Composite workflows - Create new tools that orchestrate multiple backends
  • Centralized security - Single authentication and authorization point
  • Token management - Exchange and cache tokens for backend access
  • Shared telemetry - Reference an MCPTelemetryConfig via telemetryConfigRef for fleet-wide OpenTelemetry settings

Architecture

The vmcp package follows Domain-Driven Design principles with clear separation into bounded contexts:

graph TB
    subgraph "Virtual MCP Server"
        Server[Server<br/>HTTP + MCP Protocol]
        Discovery[Discovery Manager]
        Router[Router]
        BackendClient[Backend Client]
        Health[Health Monitor]
    end

    subgraph "Aggregation"
        Aggregator[Aggregator]
        Conflict[Conflict Resolver]
    end

    subgraph "Authentication"
        InAuth[Incoming Auth<br/>OIDC / Anonymous]
        OutAuth[Outgoing Auth<br/>Token Exchange / Headers]
    end

    subgraph "MCPGroup"
        B1[MCPServer]
        B2[MCPServer]
        B3[MCPRemoteProxy]
        B4[MCPServerEntry]
    end

    Client[MCP Client] --> Server
    Server --> InAuth
    InAuth --> Discovery
    Discovery --> Aggregator
    Aggregator --> Conflict
    Discovery --> Router
    Router --> OutAuth
    OutAuth --> BackendClient
    BackendClient --> B1
    BackendClient --> B2
    BackendClient --> B3
    BackendClient --> B4
    Health --> B1
    Health --> B2
    Health --> B3
    Health --> B4

    style Server fill:#90caf9
    style Aggregator fill:#81c784
    style Router fill:#fff59d

Core Concepts

| Concept | Purpose |
|---|---|
| Routing | Forward MCP requests (tools, resources, prompts) to appropriate backends |
| Aggregation | Discover capabilities, resolve conflicts, merge into unified view |
| Authentication | Two-boundary model: incoming (client → vMCP) and outgoing (vMCP → backend) |
| Composition | Execute multi-step workflows across multiple backends |
| Caching | Reduce auth overhead by caching exchanged tokens |

Implementation: pkg/vmcp/ (discovery: pkg/vmcp/discovery/, routing: pkg/vmcp/router/)

Backend Discovery

vMCP discovers backends from an MCPGroup. The group acts as a container for related MCP servers that should be exposed together.

graph LR
    vMCP[VirtualMCPServer] -->|references| Group[MCPGroup]
    Group -->|contains| S1[MCPServer]
    Group -->|contains| S2[MCPServer]
    Group -->|contains| R1[MCPRemoteProxy]
    Group -->|contains| E1[MCPServerEntry]

    style vMCP fill:#90caf9
    style Group fill:#ba68c8

Discovery process:

  1. VirtualMCPServer references an MCPGroup by name
  2. All MCPServers, MCPRemoteProxies, and MCPServerEntries in that group are discovered
  3. For each backend, URL, transport type, and auth config are extracted
  4. vMCP queries each backend for available tools, resources, and prompts

MCPServerEntry backends connect directly to remote MCP servers without deploying a proxy pod. They are zero-infrastructure catalog entries that declare a remote endpoint URL, optional external auth, and an optional CA bundle for TLS verification. CA bundle data is fetched from Kubernetes ConfigMaps at discovery time. In dynamic mode, the BackendReconciler watches ConfigMap changes and uses a field index on spec.caBundleRef.configMapRef.name to efficiently re-reconcile only the MCPServerEntry backends affected by a given ConfigMap update.

Implementation: pkg/vmcp/aggregator/

Aggregation Pipeline

Aggregation happens in four stages:

graph LR
    A[1. Discovery<br/>Find backends] --> B[2. Query<br/>Get capabilities]
    B --> C[3. Resolve<br/>Handle conflicts]
    C --> D[4. Merge<br/>Create routing table]

    style A fill:#e3f2fd
    style B fill:#e8f5e9
    style C fill:#fff3e0
    style D fill:#fce4ec
  1. Discovery - Find all backends in the MCPGroup
  2. Query - Ask each backend for its tools, resources, and prompts (parallel)
  3. Resolve - Handle naming conflicts using configured strategy
  4. Merge - Create unified routing table mapping names to backends
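
The parallel Query stage can be pictured with a small Go sketch. It is illustrative only: `Backend`, `listTools`, and `queryAll` are hypothetical names, and the "query" is simulated in memory rather than sent over MCP.

```go
package main

import (
	"fmt"
	"sync"
)

// Backend is a hypothetical stand-in for a discovered backend.
type Backend struct {
	Name  string
	Tools []string // what the backend would return from a tools listing
}

// listTools simulates querying one backend for its tools.
func listTools(b Backend) []string {
	return append([]string(nil), b.Tools...)
}

// queryAll queries every backend concurrently and collects the
// results keyed by backend name.
func queryAll(backends []Backend) map[string][]string {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = make(map[string][]string, len(backends))
	)
	for _, b := range backends {
		wg.Add(1)
		go func(b Backend) {
			defer wg.Done()
			tools := listTools(b)
			mu.Lock() // the map is shared across goroutines
			out[b.Name] = tools
			mu.Unlock()
		}(b)
	}
	wg.Wait()
	return out
}

func main() {
	caps := queryAll([]Backend{
		{Name: "github", Tools: []string{"create_issue"}},
		{Name: "jira", Tools: []string{"create_issue", "search"}},
	})
	fmt.Println(len(caps["github"]), len(caps["jira"]))
}
```

The collected map is the input to the Resolve and Merge stages.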

Conflict Resolution

When backends expose tools with the same name, vMCP resolves the conflict using one of three strategies:

| Strategy | Behavior |
|---|---|
| prefix | Prepend backend name to all tools (e.g., github_create_issue) |
| priority | First backend in priority order wins, others hidden |
| manual | Explicit mapping for each conflict |
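
As a rough illustration of the prefix and priority strategies, the following Go sketch builds a name-to-backend table both ways. The types and function names are assumptions made for this example and are not taken from pkg/vmcp/aggregator/.

```go
package main

import "fmt"

// Backend is a hypothetical backend with the tool names it exposes.
type Backend struct {
	Name  string
	Tools []string
}

// Route records which backend serves a tool and the name that
// backend knows the tool by.
type Route struct {
	Backend      string
	OriginalName string
}

// resolvePrefix exposes every tool as "<backend>_<tool>", so tools
// with the same name on different backends get distinct exposed names.
func resolvePrefix(backends []Backend) map[string]Route {
	table := make(map[string]Route)
	for _, b := range backends {
		for _, t := range b.Tools {
			table[b.Name+"_"+t] = Route{Backend: b.Name, OriginalName: t}
		}
	}
	return table
}

// resolvePriority keeps the first backend (in slice order) that
// claims a name and hides later duplicates.
func resolvePriority(backends []Backend) map[string]Route {
	table := make(map[string]Route)
	for _, b := range backends {
		for _, t := range b.Tools {
			if _, taken := table[t]; !taken {
				table[t] = Route{Backend: b.Name, OriginalName: t}
			}
		}
	}
	return table
}

func main() {
	backends := []Backend{
		{Name: "github", Tools: []string{"create_issue"}},
		{Name: "jira", Tools: []string{"create_issue", "search"}},
	}
	fmt.Println(resolvePrefix(backends)["jira_create_issue"])
	fmt.Println(resolvePriority(backends)["create_issue"])
}
```

Keeping the original name next to the backend is what later allows a renamed tool to be called under the name the backend expects.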

Tool Filtering

Beyond conflict resolution, vMCP can filter which tools are exposed through allow/deny lists, renaming, and description overrides.

Implementation: pkg/vmcp/aggregator/

Composite Tools

Composite tools are new tools defined in vMCP that orchestrate calls to multiple backend tools. They enable complex workflows without client awareness of the underlying backends.

graph LR
    subgraph "Composite Tool"
        Step1[Step 1]
        Step2[Step 2]
        Step3[Step 3]
    end

    Step1 --> Step2
    Step1 --> Step3

    style Step1 fill:#90caf9
    style Step2 fill:#81c784
    style Step3 fill:#81c784

Step dependencies form a DAG (Directed Acyclic Graph). Steps without dependencies execute in parallel, while dependent steps wait for prerequisites.

Steps can be of three types:

  • tool: Execute a backend tool
  • elicitation: Request user input via MCP elicitation protocol
  • forEach: Iterate over a collection from a previous step, executing an inner tool step per item with bounded parallelism
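
One simple way to execute such a DAG is in waves: run every step whose dependencies are complete, wait, then repeat. The Go sketch below shows that idea under stated assumptions; `Step` and `runDAG` are hypothetical, and a real scheduler may start a step as soon as its own prerequisites finish instead of waiting for the whole wave.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Step is a hypothetical workflow step: an ID, the IDs it depends
// on, and the work to perform.
type Step struct {
	ID        string
	DependsOn []string
	Run       func()
}

// runDAG executes steps in waves. Each wave holds every step whose
// dependencies have completed; steps in a wave run concurrently.
// It returns the waves in execution order, or an error when the
// remaining steps can never become ready (cycle or unknown ID).
func runDAG(steps []Step) ([][]string, error) {
	done := make(map[string]bool, len(steps))
	var waves [][]string
	for len(done) < len(steps) {
		var ready []Step
		for _, s := range steps {
			if done[s.ID] {
				continue
			}
			ok := true
			for _, d := range s.DependsOn {
				if !done[d] {
					ok = false
					break
				}
			}
			if ok {
				ready = append(ready, s)
			}
		}
		if len(ready) == 0 {
			return nil, errors.New("cycle or unknown dependency")
		}
		var wg sync.WaitGroup
		ids := make([]string, 0, len(ready))
		for _, s := range ready {
			ids = append(ids, s.ID)
			wg.Add(1)
			go func(s Step) {
				defer wg.Done()
				if s.Run != nil {
					s.Run()
				}
			}(s)
		}
		wg.Wait()
		for _, id := range ids {
			done[id] = true
		}
		waves = append(waves, ids)
	}
	return waves, nil
}

func main() {
	noop := func() {}
	waves, err := runDAG([]Step{
		{ID: "step1", Run: noop},
		{ID: "step2", DependsOn: []string{"step1"}, Run: noop},
		{ID: "step3", DependsOn: []string{"step1"}, Run: noop},
	})
	fmt.Println(waves, err)
}
```

With the three steps from the diagram, step1 forms the first wave and step2 and step3 share the second.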

Implementation: pkg/vmcp/composer/

Two-Boundary Authentication

vMCP uses separate authentication for incoming clients and outgoing backend calls:

graph LR
    subgraph "Boundary 1: Incoming"
        Client[Client] -->|JWT| vMCP[vMCP]
    end

    subgraph "Boundary 2: Outgoing"
        vMCP -->|Exchanged Token| Backend[Backend]
    end

    style Client fill:#e3f2fd
    style vMCP fill:#90caf9
    style Backend fill:#ffb74d

Incoming Authentication

Validates clients connecting to vMCP using OIDC token validation or anonymous access.

Outgoing Authentication

Authenticates vMCP to backend MCP servers using:

  • Token exchange - RFC 8693 exchange of client token for backend-specific token
  • Header injection - Static API key or header injection
  • Unauthenticated - For internal/trusted backends

Exchanged tokens are cached to avoid repeated exchange calls.

Implementation: pkg/vmcp/auth/, pkg/vmcp/cache/

Request Flow

sequenceDiagram
    participant Client
    participant Server as vMCP Server
    participant Router
    participant Backend

    Client->>Server: tools/call (tool_name)
    Server->>Server: Validate client auth
    Server->>Router: Route tool_name
    Router->>Server: BackendTarget
    Server->>Server: Apply outgoing auth
    Server->>Backend: tools/call (original_name)
    Backend->>Server: Tool result
    Server->>Client: Tool result

Key insight: If a tool was renamed during conflict resolution (e.g., github_create_issue), vMCP translates it back to the original name (create_issue) when calling the backend.
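
The translation can be sketched as a lookup from the exposed name to a target that remembers the backend's own name for the tool. `BackendTarget` and `RoutingTable.Tools` are named in this document; the fields and the `RouteTool` method below are assumptions made for illustration.

```go
package main

import "fmt"

// BackendTarget identifies the backend for a tool and the name
// that backend originally registered (fields are hypothetical).
type BackendTarget struct {
	BackendName  string
	OriginalName string
}

// RoutingTable maps exposed tool names to their targets.
type RoutingTable struct {
	Tools map[string]BackendTarget
}

// RouteTool resolves the name a client used into the backend and
// the original tool name to send in the outgoing tools/call.
func (rt RoutingTable) RouteTool(exposed string) (BackendTarget, error) {
	t, ok := rt.Tools[exposed]
	if !ok {
		return BackendTarget{}, fmt.Errorf("unknown tool %q", exposed)
	}
	return t, nil
}

func main() {
	rt := RoutingTable{Tools: map[string]BackendTarget{
		"github_create_issue": {BackendName: "github", OriginalName: "create_issue"},
	}}
	target, err := rt.RouteTool("github_create_issue")
	fmt.Println(target.BackendName, target.OriginalName, err)
}
```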

Request Processing Pipeline

vMCP uses a middleware chain to process incoming requests. The chain is configured in pkg/vmcp/server/server.go.

Middleware Execution Order

Middleware is applied by wrapping handlers, so execution order is outer-to-inner:

| Order | Middleware | Required | Purpose |
|---|---|---|---|
| 1 | Recovery | Always | Catches panics, returns HTTP 500 |
| 2 | Authentication | Optional | Validates incoming JWT tokens (OIDC/Anonymous) |
| 3 | Authorization | Optional | Evaluates Cedar policies (composed with auth) |
| 4 | Audit | Optional | Logs request events for compliance |
| 5 | Discovery | Always | Aggregates backend capabilities per session |
| 6 | Backend Enrichment | Optional | Adds backend name to audit context |
| 7 | Telemetry | Optional | OpenTelemetry instrumentation |

Discovery Middleware

The Discovery middleware (pkg/vmcp/discovery/middleware.go) is central to vMCP's multi-tenant design:

  • Initialize requests (no session ID): Discovers capabilities from all backends in the MCPGroup, stores routing table in session
  • Subsequent requests (with session ID): Retrieves cached capabilities from session

This lazy per-session discovery ensures:

  • Deterministic behavior within a session
  • Support for dynamic backends (Kubernetes)
  • No notification spam from redundant capability updates

Timeouts: Discovery has a 15-second timeout. A timeout returns HTTP 504, and a discovery failure returns HTTP 503.

Backend Enrichment Middleware

When Audit is configured, the Backend Enrichment middleware (pkg/vmcp/server/backend_enrichment.go) parses the MCP request to determine which backend will handle it:

| MCP Method | Lookup |
|---|---|
| tools/call | name → RoutingTable.Tools |
| resources/read | uri → RoutingTable.Resources |
| prompts/get | name → RoutingTable.Prompts |

This enriches audit events with the backend name for better observability.

Authentication Composition

When Authorization is configured, Authentication middleware is composed with MCP Parsing and Authorization:

Authentication → MCP Parsing → Authorization → Next Handler

This composition is created by pkg/vmcp/auth/factory/incoming.NewIncomingAuthMiddleware().

Implementation: pkg/vmcp/server/server.go, pkg/vmcp/discovery/middleware.go, pkg/vmcp/auth/factory/

Health Monitoring

vMCP monitors backend health with configurable intervals. Health status (healthy, degraded, unhealthy, unauthenticated, unknown) affects routing decisions and is reported in VirtualMCPServer status.

Implementation: pkg/vmcp/health/

Deployment

vMCP can be deployed in three ways:

  • Kubernetes - Via the VirtualMCPServer CRD managed by the operator
  • Local CLI (thv vmcp) - Recommended path for local and non-Kubernetes use; built into the main thv binary
  • Standalone vmcp binary - Preserved for backwards compatibility and advanced CLI use

Implementation:

  • Kubernetes: cmd/thv-operator/controllers/virtualmcpserver_controller.go
  • Local CLI: cmd/thv/app/vmcp.go, pkg/vmcp/cli/
  • Standalone binary: cmd/vmcp/

Local CLI Mode

thv vmcp is the recommended way to run a vMCP server outside of Kubernetes. It provides the same aggregation, tool routing, and optimizer capabilities as the Kubernetes-managed VirtualMCPServer, but runs as a local foreground process driven by Cobra CLI flags.

Key features:

  • Zero-config quick mode: thv vmcp serve --group <name> generates an in-memory config from a running ToolHive group — no YAML file required.
  • Config-file workflow: thv vmcp init → thv vmcp validate → thv vmcp serve --config for reproducible deployments.
  • Optimizer tiers: optional FTS5 keyword search (Tier 1) and managed TEI semantic search (Tier 2) reduce tool count for MCP clients.
  • Loopback-only binding: quick mode enforces a loopback-only host via ServeConfig.validateQuickModeHost. localhost, 127.0.0.1, ::1, or any other loopback IP is accepted; non-loopback addresses are rejected.
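
A loopback check of the kind described in the last bullet can be written with the standard net package. The function below is an illustrative stand-in, not the body of ServeConfig.validateQuickModeHost, and it assumes an exact match on the literal "localhost".

```go
package main

import (
	"fmt"
	"net"
)

// isLoopbackHost reports whether host is "localhost" or an IP
// literal in a loopback range (127.0.0.0/8 or ::1).
func isLoopbackHost(host string) bool {
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	for _, h := range []string{"localhost", "127.0.0.1", "::1", "127.0.0.5", "0.0.0.0", "192.168.1.10"} {
		fmt.Printf("%-13s loopback=%v\n", h, isLoopbackHost(h))
	}
}
```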

See Local vMCP CLI Mode for the full architecture, optimizer tier table, and TEI container lifecycle documentation.

Status Reporting

Status reporting enables the vMCP runtime to report its operational status directly instead of relying on the operator to infer state. Status reporting is optional and pluggable, so different environments (CLI vs Kubernetes) can consume status without duplicating discovery logic.

Why Status Reporting

  • Avoid duplicate backend discovery: vMCP already discovers backends for capability aggregation; we reuse that data for status instead of having the operator rediscover.
  • Provide authoritative runtime view: backend availability, phase, and conditions are produced at runtime by the component that actually talks to backends.
  • Enable multiple sinks: logging for CLI, Kubernetes CRD status for clusters, future file/metrics reporters.

Key Concepts

  • StatusReporter interface (pkg/vmcp/status/reporter.go): ReportStatus(ctx, *vmcp.Status) and Start(ctx) returning shutdown func.
  • Status model (pkg/vmcp/types.go):
    • Phase: Pending, Ready, Degraded, Failed
    • Conditions: metav1.Condition (ready, backends discovered, auth configured) using shared constants
    • DiscoveredBackends: backend URL/auth type/health with timestamps
  • CLI reporter: a logging-only reporter (no persistence) that always logs status updates.
  • Lifecycle hook: server starts the reporter, collects shutdown funcs, and stops them during graceful shutdown.

Integration in vMCP Runtime

  • Server config (pkg/vmcp/server/server.go): optional StatusReporter; nil disables status reporting.
  • Startup: reporter Start is invoked; failure is treated as fatal when configured. Shutdown funcs are collected and run on Stop.
  • Reporting: runtime components call ReportStatus as discovery and health change.

Extensibility

  • Additional reporters can be added under pkg/vmcp/status/ implementing Reporter and using shared vmcp.Status types.
  • Future sinks: Kubernetes status writer, file-based reporter for CLI (thv status), metrics exporter.

Implementation: pkg/vmcp/status/

Related Documentation