The Virtual MCP Server (vMCP) aggregates multiple MCP servers from a ToolHive group into a single unified interface. This document explains the architecture and design of vMCP.
vMCP solves the problem of MCP server sprawl. As organizations deploy more specialized MCP servers, clients need to connect to multiple endpoints. vMCP provides:
- Unified endpoint - One URL for clients to access many backends
- Tool aggregation - Combine tools from multiple servers
- Conflict resolution - Handle duplicate tool names automatically
- Composite workflows - Create new tools that orchestrate multiple backends
- Centralized security - Single authentication and authorization point
- Token management - Exchange and cache tokens for backend access
- Shared telemetry - Reference an MCPTelemetryConfig via telemetryConfigRef for fleet-wide OpenTelemetry settings
The vmcp package follows Domain-Driven Design principles with clear separation into bounded contexts:
graph TB
subgraph "Virtual MCP Server"
Server[Server<br/>HTTP + MCP Protocol]
Discovery[Discovery Manager]
Router[Router]
BackendClient[Backend Client]
Health[Health Monitor]
end
subgraph "Aggregation"
Aggregator[Aggregator]
Conflict[Conflict Resolver]
end
subgraph "Authentication"
InAuth[Incoming Auth<br/>OIDC / Anonymous]
OutAuth[Outgoing Auth<br/>Token Exchange / Headers]
end
subgraph "MCPGroup"
B1[MCPServer]
B2[MCPServer]
B3[MCPRemoteProxy]
B4[MCPServerEntry]
end
Client[MCP Client] --> Server
Server --> InAuth
InAuth --> Discovery
Discovery --> Aggregator
Aggregator --> Conflict
Discovery --> Router
Router --> OutAuth
OutAuth --> BackendClient
BackendClient --> B1
BackendClient --> B2
BackendClient --> B3
BackendClient --> B4
Health --> B1
Health --> B2
Health --> B3
Health --> B4
style Server fill:#90caf9
style Aggregator fill:#81c784
style Router fill:#fff59d
| Concept | Purpose |
|---|---|
| Routing | Forward MCP requests (tools, resources, prompts) to appropriate backends |
| Aggregation | Discover capabilities, resolve conflicts, merge into unified view |
| Authentication | Two-boundary model: incoming (client → vMCP) and outgoing (vMCP → backend) |
| Composition | Execute multi-step workflows across multiple backends |
| Caching | Reduce auth overhead by caching exchanged tokens |
Implementation: pkg/vmcp/ (discovery: pkg/vmcp/discovery/, routing: pkg/vmcp/router/)
vMCP discovers backends from an MCPGroup. The group acts as a container for related MCP servers that should be exposed together.
graph LR
vMCP[VirtualMCPServer] -->|references| Group[MCPGroup]
Group -->|contains| S1[MCPServer]
Group -->|contains| S2[MCPServer]
Group -->|contains| R1[MCPRemoteProxy]
Group -->|contains| E1[MCPServerEntry]
style vMCP fill:#90caf9
style Group fill:#ba68c8
Discovery process:
- VirtualMCPServer references an MCPGroup by name
- All MCPServers, MCPRemoteProxies, and MCPServerEntries in that group are discovered
- For each backend, URL, transport type, and auth config are extracted
- vMCP queries each backend for available tools, resources, and prompts
MCPServerEntry backends connect directly to remote MCP servers without deploying a proxy pod. They are zero-infrastructure catalog entries that declare a remote endpoint URL, optional external auth, and an optional CA bundle for TLS verification. CA bundle data is fetched from Kubernetes ConfigMaps at discovery time. In dynamic mode, the BackendReconciler watches ConfigMap changes and uses a field index on spec.caBundleRef.configMapRef.name to efficiently re-reconcile only the MCPServerEntry backends affected by a given ConfigMap update.
Implementation: pkg/vmcp/aggregator/
Aggregation happens in four stages:
graph LR
A[1. Discovery<br/>Find backends] --> B[2. Query<br/>Get capabilities]
B --> C[3. Resolve<br/>Handle conflicts]
C --> D[4. Merge<br/>Create routing table]
style A fill:#e3f2fd
style B fill:#e8f5e9
style C fill:#fff3e0
style D fill:#fce4ec
- Discovery - Find all backends in the MCPGroup
- Query - Ask each backend for its tools, resources, and prompts (parallel)
- Resolve - Handle naming conflicts using configured strategy
- Merge - Create unified routing table mapping names to backends
When backends expose tools with the same name, vMCP resolves the conflict using one of three strategies:
| Strategy | Behavior |
|---|---|
| prefix | Prepend backend name to all tools (e.g., github_create_issue) |
| priority | First backend in priority order wins, others hidden |
| manual | Explicit mapping for each conflict |
Beyond conflict resolution, vMCP can filter which tools are exposed using allow/deny lists, rename tools, and override their descriptions.
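The prefix and priority strategies can be illustrated with a minimal Go sketch. The `Tool` type and both function names are hypothetical and are not taken from pkg/vmcp/aggregator/; the sketch only assumes that each tool is known by its backend name and its original name.

```go
package main

import "fmt"

// Tool is a hypothetical, simplified view of a tool discovered on a backend.
type Tool struct {
	Backend string // backend that owns the tool
	Name    string // name as exposed by the backend
}

// ResolvePrefix prepends the backend name to every tool name and returns
// a map from the exposed name to the original tool.
func ResolvePrefix(tools []Tool) map[string]Tool {
	out := make(map[string]Tool, len(tools))
	for _, t := range tools {
		out[t.Backend+"_"+t.Name] = t
	}
	return out
}

// ResolvePriority keeps tool names unchanged. When two backends expose the
// same name, the backend listed earliest in order wins and the others are hidden.
func ResolvePriority(tools []Tool, order []string) map[string]Tool {
	rank := func(backend string) int {
		for i, b := range order {
			if b == backend {
				return i
			}
		}
		return len(order) // assumption: unlisted backends rank last
	}
	out := make(map[string]Tool)
	for _, t := range tools {
		cur, seen := out[t.Name]
		if !seen || rank(t.Backend) < rank(cur.Backend) {
			out[t.Name] = t
		}
	}
	return out
}

func main() {
	tools := []Tool{{"github", "create_issue"}, {"jira", "create_issue"}}
	fmt.Println(ResolvePrefix(tools))
	fmt.Println(ResolvePriority(tools, []string{"jira", "github"}))
}
```

With prefix, both tools stay visible under distinct names; with priority, only the jira tool remains under create_issue in this example.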
Implementation: pkg/vmcp/aggregator/
Composite tools are new tools defined in vMCP that orchestrate calls to multiple backend tools. They enable complex workflows without client awareness of the underlying backends.
graph LR
subgraph "Composite Tool"
Step1[Step 1]
Step2[Step 2]
Step3[Step 3]
end
Step1 --> Step2
Step1 --> Step3
style Step1 fill:#90caf9
style Step2 fill:#81c784
style Step3 fill:#81c784
Step dependencies form a DAG (Directed Acyclic Graph). Steps without dependencies execute in parallel, while dependent steps wait for prerequisites.
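One way to picture this scheduling is to group steps into waves, where every step in a wave depends only on steps from earlier waves. The sketch below is an illustration under that assumption; the `Levels` function and the map-based dependency shape are hypothetical and do not describe the composer's actual data model.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Levels groups steps into waves. All dependencies of a step are in earlier
// waves, so the steps within one wave could run in parallel.
// deps maps a step name to the names of the steps it depends on.
func Levels(deps map[string][]string) ([][]string, error) {
	done := make(map[string]bool, len(deps))
	var waves [][]string
	for len(done) < len(deps) {
		var wave []string
		for step, needs := range deps {
			if done[step] {
				continue
			}
			ready := true
			for _, n := range needs {
				if !done[n] {
					ready = false
					break
				}
			}
			if ready {
				wave = append(wave, step)
			}
		}
		if len(wave) == 0 {
			// Nothing can make progress: the graph is not a valid DAG.
			return nil, errors.New("cycle or unknown dependency among remaining steps")
		}
		sort.Strings(wave) // stable output for readability
		for _, s := range wave {
			done[s] = true
		}
		waves = append(waves, wave)
	}
	return waves, nil
}

func main() {
	deps := map[string][]string{
		"step1": nil,
		"step2": {"step1"},
		"step3": {"step1"},
	}
	waves, err := Levels(deps)
	fmt.Println(waves, err) // [[step1] [step2 step3]] <nil>
}
```

For the diagram above, step1 forms the first wave and step2 and step3 form the second.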
Steps can be of three types:
- tool: Execute a backend tool
- elicitation: Request user input via MCP elicitation protocol
- forEach: Iterate over a collection from a previous step, executing an inner tool step per item with bounded parallelism
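Bounded parallelism for a forEach-style step is commonly implemented with a counting semaphore. The following is a generic sketch of that technique, not the composer's code; `ForEach` and its signature are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// ForEach calls fn once per item with at most limit calls in flight.
// Results are returned in the same order as items.
func ForEach(items []string, limit int, fn func(string) string) []string {
	if limit < 1 {
		limit = 1
	}
	results := make([]string, len(items))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i, item := range items {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit calls are already running
		go func(i int, item string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i] = fn(item)    // each goroutine writes its own index
		}(i, item)
	}
	wg.Wait()
	return results
}

func main() {
	out := ForEach([]string{"a", "b", "c"}, 2, func(s string) string {
		return "processed " + s
	})
	fmt.Println(out)
}
```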
Implementation: pkg/vmcp/composer/
vMCP uses separate authentication for incoming clients and outgoing backend calls:
graph LR
subgraph "Boundary 1: Incoming"
Client[Client] -->|JWT| vMCP[vMCP]
end
subgraph "Boundary 2: Outgoing"
vMCP -->|Exchanged Token| Backend[Backend]
end
style Client fill:#e3f2fd
style vMCP fill:#90caf9
style Backend fill:#ffb74d
Incoming authentication validates clients connecting to vMCP using OIDC token validation or anonymous access.
Outgoing authentication authenticates vMCP to backend MCP servers using:
- Token exchange - RFC 8693 exchange of client token for backend-specific token
- Header injection - Inject a static API key or another preconfigured header value
- Unauthenticated - For internal/trusted backends
Exchanged tokens are cached to avoid repeated exchange calls.
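A token cache of this kind can be sketched as a map guarded by a mutex, with each entry expiring after the lifetime reported by the exchange. The types, the cache key format and the `exchange` callback below are assumptions for illustration and are not the API of pkg/vmcp/cache/.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type cachedToken struct {
	value     string
	expiresAt time.Time
}

// TokenCache caches exchanged tokens per key, for example "subject|backend".
type TokenCache struct {
	mu     sync.Mutex
	tokens map[string]cachedToken
}

func NewTokenCache() *TokenCache {
	return &TokenCache{tokens: make(map[string]cachedToken)}
}

// Get returns the cached token for key if it has not expired. Otherwise it
// calls exchange, which returns the token and its lifetime in seconds
// (mirroring the expires_in field of an RFC 8693 response), and caches it.
func (c *TokenCache) Get(key string, exchange func() (string, int, error)) (string, error) {
	// The lock is held during the exchange for simplicity; this serializes
	// concurrent callers, which a production cache may want to avoid.
	c.mu.Lock()
	defer c.mu.Unlock()
	if t, ok := c.tokens[key]; ok && time.Now().Before(t.expiresAt) {
		return t.value, nil
	}
	value, expiresIn, err := exchange()
	if err != nil {
		return "", err
	}
	c.tokens[key] = cachedToken{
		value:     value,
		expiresAt: time.Now().Add(time.Duration(expiresIn) * time.Second),
	}
	return value, nil
}

func main() {
	cache := NewTokenCache()
	calls := 0
	exchange := func() (string, int, error) {
		calls++
		return "backend-token", 300, nil
	}
	for i := 0; i < 3; i++ {
		tok, err := cache.Get("alice|github", exchange)
		fmt.Println(tok, err)
	}
	fmt.Println("exchange calls:", calls) // exchange calls: 1
}
```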
Implementation: pkg/vmcp/auth/, pkg/vmcp/cache/
sequenceDiagram
participant Client
participant Server as vMCP Server
participant Router
participant Backend
Client->>Server: tools/call (tool_name)
Server->>Server: Validate client auth
Server->>Router: Route tool_name
Router->>Server: BackendTarget
Server->>Server: Apply outgoing auth
Server->>Backend: tools/call (original_name)
Backend->>Server: Tool result
Server->>Client: Tool result
Key insight: If a tool was renamed during conflict resolution (e.g., github_create_issue), vMCP translates it back to the original name (create_issue) when calling the backend.
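The translation can be pictured as a lookup keyed by the exposed name that yields both the backend and the original name. BackendTarget and the routing table are named in this document, but the field layout and the `RouteTool` method below are hypothetical.

```go
package main

import "fmt"

// BackendTarget is a simplified, hypothetical routing entry.
type BackendTarget struct {
	Backend      string // backend that handles the call
	OriginalName string // name the backend itself uses for the tool
}

// RoutingTable maps exposed tool names to backend targets.
type RoutingTable struct {
	Tools map[string]BackendTarget
}

// RouteTool resolves an exposed tool name to its backend target.
func (rt RoutingTable) RouteTool(exposed string) (BackendTarget, error) {
	target, ok := rt.Tools[exposed]
	if !ok {
		return BackendTarget{}, fmt.Errorf("tool %q not found", exposed)
	}
	return target, nil
}

func main() {
	rt := RoutingTable{Tools: map[string]BackendTarget{
		"github_create_issue": {Backend: "github", OriginalName: "create_issue"},
	}}
	target, err := rt.RouteTool("github_create_issue")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The backend is called with its own name for the tool.
	fmt.Printf("call %s on backend %s\n", target.OriginalName, target.Backend)
}
```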
vMCP uses a middleware chain to process incoming requests. The chain is configured in pkg/vmcp/server/server.go.
Middleware is applied by wrapping handlers, so execution order is outer-to-inner:
| Order | Middleware | Required | Purpose |
|---|---|---|---|
| 1 | Recovery | Always | Catches panics, returns HTTP 500 |
| 2 | Authentication | Optional | Validates incoming JWT tokens (OIDC/Anonymous) |
| 3 | Authorization | Optional | Evaluates Cedar policies (composed with auth) |
| 4 | Audit | Optional | Logs request events for compliance |
| 5 | Discovery | Always | Aggregates backend capabilities per session |
| 6 | Backend Enrichment | Optional | Adds backend name to audit context |
| 7 | Telemetry | Optional | OpenTelemetry instrumentation |
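Wrapping handlers so that the first middleware in the list runs first is a standard net/http pattern. The sketch below demonstrates the ordering with generic tracing middleware; `Chain`, `Trace` and `RunOrder` are hypothetical helpers and do not mirror the code in pkg/vmcp/server/server.go.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Middleware wraps a handler with additional behavior.
type Middleware func(http.Handler) http.Handler

// Chain wraps h so that mws[0] is the outermost middleware and runs first.
func Chain(h http.Handler, mws ...Middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// Trace returns a middleware that records its name when a request passes through.
func Trace(name string, order *[]string) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*order = append(*order, name)
			next.ServeHTTP(w, r)
		})
	}
}

// RunOrder sends one request through a chain built from names and returns
// the order in which the middleware and the final handler ran.
func RunOrder(names ...string) []string {
	var order []string
	mws := make([]Middleware, 0, len(names))
	for _, n := range names {
		mws = append(mws, Trace(n, &order))
	}
	final := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		order = append(order, "handler")
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodPost, "/", nil)
	Chain(final, mws...).ServeHTTP(httptest.NewRecorder(), req)
	return order
}

func main() {
	fmt.Println(RunOrder("recovery", "authentication", "audit", "discovery"))
	// [recovery authentication audit discovery handler]
}
```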
The Discovery middleware (pkg/vmcp/discovery/middleware.go) is central to vMCP's multi-tenant design:
- Initialize requests (no session ID): Discovers capabilities from all backends in the MCPGroup, stores routing table in session
- Subsequent requests (with session ID): Retrieves cached capabilities from session
This lazy per-session discovery ensures:
- Deterministic behavior within a session
- Support for dynamic backends (Kubernetes)
- No notification spam from redundant capability updates
Timeouts: Discovery has a 15-second timeout. A timeout returns HTTP 504; any other discovery failure returns HTTP 503.
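The status mapping can be sketched with a context deadline. In this illustration `discover` is a stand-in for real backend discovery, `Run` takes millisecond values so the example finishes quickly, and returning 200 on success is an assumption made only so the helper has a value to return.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// discover simulates backend discovery: it takes delay to complete and
// stops early if the context is cancelled or its deadline passes.
func discover(ctx context.Context, delay time.Duration, fail bool) error {
	select {
	case <-time.After(delay):
		if fail {
			return errors.New("backend unreachable")
		}
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// StatusFor maps a discovery outcome to an HTTP status code.
func StatusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, context.DeadlineExceeded):
		return http.StatusGatewayTimeout // 504
	default:
		return http.StatusServiceUnavailable // 503
	}
}

// Run applies a timeout around discover and returns the resulting status.
func Run(timeoutMs, delayMs int, fail bool) int {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	return StatusFor(discover(ctx, time.Duration(delayMs)*time.Millisecond, fail))
}

func main() {
	fmt.Println(Run(50, 0, false))   // 200
	fmt.Println(Run(10, 200, false)) // 504
	fmt.Println(Run(50, 0, true))    // 503
}
```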
When Audit is configured, the Backend Enrichment middleware (pkg/vmcp/server/backend_enrichment.go) parses the MCP request to determine which backend will handle it:
| MCP Method | Lookup |
|---|---|
| tools/call | name → RoutingTable.Tools |
| resources/read | uri → RoutingTable.Resources |
| prompts/get | name → RoutingTable.Prompts |
This enriches audit events with the backend name for better observability.
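The lookup can be sketched as a switch on the MCP method. The `RoutingTable` shape used here (plain maps from key to backend name) and the `BackendFor` function are simplifications assumed for illustration.

```go
package main

import "fmt"

// RoutingTable is a simplified, hypothetical view: each map goes from the
// lookup key (tool name, resource URI, prompt name) to a backend name.
type RoutingTable struct {
	Tools     map[string]string
	Resources map[string]string
	Prompts   map[string]string
}

// BackendFor returns the backend that would handle the request, if any.
// params holds the already-parsed request parameters.
func BackendFor(rt RoutingTable, method string, params map[string]string) (string, bool) {
	switch method {
	case "tools/call":
		b, ok := rt.Tools[params["name"]]
		return b, ok
	case "resources/read":
		b, ok := rt.Resources[params["uri"]]
		return b, ok
	case "prompts/get":
		b, ok := rt.Prompts[params["name"]]
		return b, ok
	default:
		return "", false // other methods are not tied to a single backend
	}
}

func main() {
	rt := RoutingTable{
		Tools:     map[string]string{"github_create_issue": "github"},
		Resources: map[string]string{"file:///readme": "docs"},
	}
	fmt.Println(BackendFor(rt, "tools/call", map[string]string{"name": "github_create_issue"}))
	fmt.Println(BackendFor(rt, "resources/read", map[string]string{"uri": "file:///readme"}))
}
```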
When Authorization is configured, Authentication middleware is composed with MCP Parsing and Authorization:
Authentication → MCP Parsing → Authorization → Next Handler
This composition is created by pkg/vmcp/auth/factory/incoming.NewIncomingAuthMiddleware().
Implementation: pkg/vmcp/server/server.go, pkg/vmcp/discovery/middleware.go, pkg/vmcp/auth/factory/
vMCP monitors backend health with configurable intervals. Health status (healthy, degraded, unhealthy, unauthenticated, unknown) affects routing decisions and is reported in VirtualMCPServer status.
Implementation: pkg/vmcp/health/
vMCP can be deployed in three ways:
- Kubernetes - Via the VirtualMCPServer CRD managed by the operator
- Local CLI (thv vmcp) - Recommended path for local and non-Kubernetes use; built into the main thv binary
- Standalone vmcp binary - Preserved for backwards compatibility and advanced CLI use
Implementation:
- Kubernetes: cmd/thv-operator/controllers/virtualmcpserver_controller.go
- Local CLI: cmd/thv/app/vmcp.go, pkg/vmcp/cli/
- Standalone binary: cmd/vmcp/
thv vmcp is the recommended way to run a vMCP server outside of Kubernetes. It provides the same aggregation, tool routing, and optimizer capabilities as the Kubernetes-managed VirtualMCPServer, but runs as a local foreground process driven by Cobra CLI flags.
Key features:
- Zero-config quick mode: thv vmcp serve --group <name> generates an in-memory config from a running ToolHive group; no YAML file is required.
- Config-file workflow: thv vmcp init → thv vmcp validate → thv vmcp serve --config for reproducible deployments.
- Optimizer tiers: optional FTS5 keyword search (Tier 1) and managed TEI semantic search (Tier 2) reduce tool count for MCP clients.
- Loopback-only binding: quick mode enforces a loopback-only host via ServeConfig.validateQuickModeHost: localhost, 127.0.0.1, ::1, or any other loopback IP is accepted; non-loopback addresses are rejected.
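A loopback-only check like the one described in the last item can be written with the standard net package. This is a sketch of the idea only; it is not the body of ServeConfig.validateQuickModeHost, and `ValidateLoopbackHost` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net"
)

// ValidateLoopbackHost accepts "localhost" and any loopback IP address,
// and rejects everything else.
func ValidateLoopbackHost(host string) error {
	if host == "localhost" {
		return nil
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return fmt.Errorf("host %q is neither localhost nor an IP address", host)
	}
	if !ip.IsLoopback() {
		return fmt.Errorf("host %q is not a loopback address", host)
	}
	return nil
}

func main() {
	for _, h := range []string{"localhost", "127.0.0.1", "::1", "127.0.0.2", "0.0.0.0", "192.168.1.10"} {
		fmt.Printf("%-14s %v\n", h, ValidateLoopbackHost(h))
	}
}
```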
See Local vMCP CLI Mode for the full architecture, optimizer tier table, and TEI container lifecycle documentation.
Status reporting enables the vMCP runtime to report operational status directly instead of relying on the operator to infer state. Status reporting is optional and pluggable, so different environments (CLI vs Kubernetes) can consume status without duplicating discovery logic.
- Avoid duplicate backend discovery: vMCP already discovers backends for capability aggregation; we reuse that data for status instead of having the operator rediscover.
- Provide authoritative runtime view: backend availability, phase, and conditions are produced at runtime by the component that actually talks to backends.
- Enable multiple sinks: logging for CLI, Kubernetes CRD status for clusters, future file/metrics reporters.
- StatusReporter interface (pkg/vmcp/status/reporter.go): ReportStatus(ctx, *vmcp.Status) and Start(ctx) returning shutdown func.
- Status model (pkg/vmcp/types.go):
  - Phase: Pending, Ready, Degraded, Failed
  - Conditions: metav1.Condition (ready, backends discovered, auth configured) using shared constants
  - DiscoveredBackends: backend URL/auth type/health with timestamps
- CLI reporter: Logging-only reporter (no persistence) always logs status updates.
- Lifecycle hook: server starts the reporter, collects shutdown funcs, and stops them during graceful shutdown.
- Server config (pkg/vmcp/server/server.go): optional StatusReporter; nil disables status reporting.
- Startup: reporter Start is invoked; failure is treated as fatal when configured. Shutdown funcs are collected and run on Stop.
- Reporting: runtime components call ReportStatus as discovery and health change.
- Additional reporters can be added under pkg/vmcp/status/ implementing Reporter and using shared vmcp.Status types.
- Future sinks: Kubernetes status writer, file-based reporter for CLI (thv status), metrics exporter.
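The reporter lifecycle described above can be sketched as follows. The method names come from this document, but the exact signatures, the trimmed-down `Status` struct and the `Run` and `Demo` helpers are assumptions made for illustration, not the contents of pkg/vmcp/status/.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
)

// Phase mirrors the phases listed in the status model.
type Phase string

const (
	PhasePending  Phase = "Pending"
	PhaseReady    Phase = "Ready"
	PhaseDegraded Phase = "Degraded"
	PhaseFailed   Phase = "Failed"
)

// Status is a reduced, hypothetical stand-in for vmcp.Status.
type Status struct {
	Phase    Phase
	Backends int
}

// Reporter is a sketch of the reporter contract; return types are assumed.
type Reporter interface {
	ReportStatus(ctx context.Context, s *Status) error
	Start(ctx context.Context) (shutdown func(context.Context) error, err error)
}

// LoggingReporter writes every status update to Out and keeps no state.
type LoggingReporter struct{ Out io.Writer }

func (r LoggingReporter) ReportStatus(_ context.Context, s *Status) error {
	_, err := fmt.Fprintf(r.Out, "status: phase=%s backends=%d\n", s.Phase, s.Backends)
	return err
}

func (r LoggingReporter) Start(_ context.Context) (func(context.Context) error, error) {
	return func(context.Context) error { return nil }, nil // nothing to clean up
}

// Run starts the reporter, reports one status, and runs the shutdown func.
// A nil reporter disables status reporting.
func Run(ctx context.Context, r Reporter, s *Status) error {
	if r == nil {
		return nil
	}
	shutdown, err := r.Start(ctx)
	if err != nil {
		return fmt.Errorf("start status reporter: %w", err) // fatal when configured
	}
	defer shutdown(ctx)
	return r.ReportStatus(ctx, s)
}

// Demo runs the lifecycle against an in-memory writer and returns the log.
func Demo() string {
	var sb strings.Builder
	_ = Run(context.Background(), LoggingReporter{Out: &sb}, &Status{Phase: PhaseReady, Backends: 2})
	return sb.String()
}

func main() {
	fmt.Print(Demo()) // status: phase=Ready backends=2
}
```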
Implementation: pkg/vmcp/status/
- Core Concepts - Virtual MCP Server concept
- Groups - MCPGroup for backend organization
- Operator Architecture - CRD details
- Transport Architecture - Transport types used by backends
- Middleware Architecture - Shared middleware system (Authentication, Audit, Telemetry, etc.)
- Local vMCP CLI Mode - thv vmcp CLI surface, optimizer tiers, and TEI lifecycle
- vMCP Library Embedding - Embedding pkg/vmcp/ in downstream Go projects
- vMCP Scalability Limits and Constraints - Per-pod session cap, TTL mechanics, Redis sizing, and pod restart behaviour