Virtual MCP Server Architecture

The Virtual MCP Server (vMCP) aggregates multiple MCP servers from a ToolHive group into a single unified interface. This document explains the architecture and design of vMCP.

Overview

vMCP solves the problem of MCP server sprawl. As organizations deploy more specialized MCP servers, clients need to connect to multiple endpoints. vMCP provides:

Unified endpoint - One URL for clients to access many backends
Tool aggregation - Combine tools from multiple servers
Conflict resolution - Handle duplicate tool names automatically
Composite workflows - Create new tools that orchestrate multiple backends
Centralized security - Single authentication and authorization point
Token management - Exchange and cache tokens for backend access
Shared telemetry - Reference an MCPTelemetryConfig via telemetryConfigRef for fleet-wide OpenTelemetry settings

Architecture

The vmcp package follows Domain-Driven Design principles with clear separation into bounded contexts:

graph TB
    subgraph "Virtual MCP Server"
        Server[Server<br/>HTTP + MCP Protocol]
        Discovery[Discovery Manager]
        Router[Router]
        BackendClient[Backend Client]
        Health[Health Monitor]
    end

    subgraph "Aggregation"
        Aggregator[Aggregator]
        Conflict[Conflict Resolver]
    end

    subgraph "Authentication"
        InAuth[Incoming Auth<br/>OIDC / Anonymous]
        OutAuth[Outgoing Auth<br/>Token Exchange / Headers]
    end

    subgraph "MCPGroup"
        B1[MCPServer]
        B2[MCPServer]
        B3[MCPRemoteProxy]
        B4[MCPServerEntry]
    end

    Client[MCP Client] --> Server
    Server --> InAuth
    InAuth --> Discovery
    Discovery --> Aggregator
    Aggregator --> Conflict
    Discovery --> Router
    Router --> OutAuth
    OutAuth --> BackendClient
    BackendClient --> B1
    BackendClient --> B2
    BackendClient --> B3
    BackendClient --> B4
    Health --> B1
    Health --> B2
    Health --> B3
    Health --> B4

    style Server fill:#90caf9
    style Aggregator fill:#81c784
    style Router fill:#fff59d

Core Concepts

Concept	Purpose
Routing	Forward MCP requests (tools, resources, prompts) to appropriate backends
Aggregation	Discover capabilities, resolve conflicts, merge into unified view
Authentication	Two-boundary model: incoming (client → vMCP) and outgoing (vMCP → backend)
Composition	Execute multi-step workflows across multiple backends
Caching	Reduce auth overhead by caching exchanged tokens

Implementation: pkg/vmcp/ (discovery: pkg/vmcp/discovery/, routing: pkg/vmcp/router/)

Backend Discovery

vMCP discovers backends from an MCPGroup. The group acts as a container for related MCP servers that should be exposed together.

graph LR
    vMCP[VirtualMCPServer] -->|references| Group[MCPGroup]
    Group -->|contains| S1[MCPServer]
    Group -->|contains| S2[MCPServer]
    Group -->|contains| R1[MCPRemoteProxy]
    Group -->|contains| E1[MCPServerEntry]

    style vMCP fill:#90caf9
    style Group fill:#ba68c8

Discovery process:

VirtualMCPServer references an MCPGroup by name
All MCPServers, MCPRemoteProxies, and MCPServerEntries in that group are discovered
For each backend, URL, transport type, and auth config are extracted
vMCP queries each backend for available tools, resources, and prompts

MCPServerEntry backends connect directly to remote MCP servers without deploying a proxy pod. They are zero-infrastructure catalog entries that declare a remote endpoint URL, optional external auth, and an optional CA bundle for TLS verification. CA bundle data is fetched from Kubernetes ConfigMaps at discovery time. In dynamic mode, the BackendReconciler watches ConfigMap changes and uses a field index on spec.caBundleRef.configMapRef.name to efficiently re-reconcile only the MCPServerEntry backends affected by a given ConfigMap update.

Implementation: pkg/vmcp/aggregator/

Aggregation Pipeline

Aggregation happens in four stages:

graph LR
    A[1. Discovery<br/>Find backends] --> B[2. Query<br/>Get capabilities]
    B --> C[3. Resolve<br/>Handle conflicts]
    C --> D[4. Merge<br/>Create routing table]

    style A fill:#e3f2fd
    style B fill:#e8f5e9
    style C fill:#fff3e0
    style D fill:#fce4ec

Discovery - Find all backends in the MCPGroup
Query - Ask each backend for its tools, resources, and prompts (parallel)
Resolve - Handle naming conflicts using configured strategy
Merge - Create unified routing table mapping names to backends
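
The Query and Merge stages can be sketched as follows. This is a minimal illustration, not the code in pkg/vmcp/aggregator/: the `Backend` type, `queryAll`, and `buildRoutingTable` are hypothetical names, and the backend "query" is simulated by copying a slice instead of speaking MCP. Conflict resolution is left out, so a duplicate tool name is reported as an error.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Backend is a hypothetical stand-in for a discovered backend and the
// tool names it would report when queried.
type Backend struct {
	Name  string
	Tools []string
}

// queryAll asks every backend for its tools concurrently and collects
// the results keyed by backend name.
func queryAll(backends []Backend) map[string][]string {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = make(map[string][]string, len(backends))
	)
	for _, b := range backends {
		wg.Add(1)
		go func(b Backend) {
			defer wg.Done()
			tools := append([]string(nil), b.Tools...) // simulated query
			mu.Lock()
			out[b.Name] = tools
			mu.Unlock()
		}(b)
	}
	wg.Wait()
	return out
}

// buildRoutingTable maps each tool name to the backend that owns it.
// Backends are visited in sorted order so the result is deterministic.
func buildRoutingTable(caps map[string][]string) (map[string]string, error) {
	names := make([]string, 0, len(caps))
	for n := range caps {
		names = append(names, n)
	}
	sort.Strings(names)

	table := make(map[string]string)
	for _, backend := range names {
		for _, tool := range caps[backend] {
			if owner, dup := table[tool]; dup {
				return nil, fmt.Errorf("tool %q exposed by both %q and %q", tool, owner, backend)
			}
			table[tool] = backend
		}
	}
	return table, nil
}

func main() {
	caps := queryAll([]Backend{
		{Name: "github", Tools: []string{"create_issue"}},
		{Name: "slack", Tools: []string{"post_message"}},
	})
	table, err := buildRoutingTable(caps)
	fmt.Println(table, err)
}
```

Guarding the shared map with a mutex keeps the parallel queries safe without needing a channel per backend.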

Conflict Resolution

When backends expose tools with the same name, vMCP resolves the conflict using one of three strategies:

Strategy	Behavior
prefix	Prepend backend name to all tools (e.g., `github_create_issue`)
priority	First backend in priority order wins, others hidden
manual	Explicit mapping for each conflict
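
To make the first two strategies concrete, here is a small sketch. The `Tool` type and both function names are hypothetical. The underscore separator follows the `github_create_issue` example above. Ranking unlisted backends last in the priority strategy is an assumption, not documented behavior.

```go
package main

import "fmt"

// Tool pairs a tool name with the backend that exposes it.
type Tool struct {
	Backend string
	Name    string
}

// resolvePrefix prepends the backend name to every tool, so names from
// different backends cannot collide. Values keep the original tool.
func resolvePrefix(tools []Tool) map[string]Tool {
	out := make(map[string]Tool, len(tools))
	for _, t := range tools {
		out[t.Backend+"_"+t.Name] = t
	}
	return out
}

// resolvePriority keeps, for each tool name, the tool whose backend
// appears earliest in priority. Tools from other backends are hidden.
func resolvePriority(tools []Tool, priority []string) map[string]Tool {
	rank := make(map[string]int, len(priority))
	for i, b := range priority {
		rank[b] = i
	}
	rankOf := func(backend string) int {
		if r, ok := rank[backend]; ok {
			return r
		}
		return len(priority) // unlisted backends rank last (an assumption)
	}

	out := make(map[string]Tool)
	for _, t := range tools {
		cur, exists := out[t.Name]
		if !exists || rankOf(t.Backend) < rankOf(cur.Backend) {
			out[t.Name] = t
		}
	}
	return out
}

func main() {
	tools := []Tool{
		{Backend: "github", Name: "create_issue"},
		{Backend: "jira", Name: "create_issue"},
	}
	prefixed := resolvePrefix(tools)
	fmt.Println(len(prefixed), prefixed["github_create_issue"].Backend)

	winner := resolvePriority(tools, []string{"jira", "github"})
	fmt.Println(winner["create_issue"].Backend)
}
```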

Tool Filtering

Beyond conflict resolution, vMCP can filter which tools are exposed through allow/deny lists, renaming, and description overrides.

Implementation: pkg/vmcp/aggregator/

Composite Tools

Composite tools are new tools defined in vMCP that orchestrate calls to multiple backend tools. They enable complex workflows without client awareness of the underlying backends.

graph LR
    subgraph "Composite Tool"
        Step1[Step 1]
        Step2[Step 2]
        Step3[Step 3]
    end

    Step1 --> Step2
    Step1 --> Step3

    style Step1 fill:#90caf9
    style Step2 fill:#81c784
    style Step3 fill:#81c784

Step dependencies form a DAG (Directed Acyclic Graph). Steps without dependencies execute in parallel, while dependent steps wait for prerequisites.
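
One way to picture this scheduling rule is to group steps into "waves", where every step in a wave has all of its prerequisites in earlier waves. The sketch below is illustrative only: the `waves` function and the step names are hypothetical, and it does not reflect how pkg/vmcp/composer/ actually schedules work.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// waves groups steps into execution waves. deps maps each step to the
// steps it depends on. Steps within one wave have no dependencies on
// each other, so they could run in parallel.
func waves(deps map[string][]string) ([][]string, error) {
	done := make(map[string]bool, len(deps))
	var out [][]string

	for len(done) < len(deps) {
		var ready []string
		for step, reqs := range deps {
			if done[step] {
				continue
			}
			ok := true
			for _, r := range reqs {
				if !done[r] {
					ok = false
					break
				}
			}
			if ok {
				ready = append(ready, step)
			}
		}
		if len(ready) == 0 {
			// No progress is possible: the graph is not a valid DAG.
			return nil, errors.New("dependency cycle or unknown step")
		}
		sort.Strings(ready) // deterministic order within a wave
		for _, s := range ready {
			done[s] = true
		}
		out = append(out, ready)
	}
	return out, nil
}

func main() {
	plan, err := waves(map[string][]string{
		"step1": nil,
		"step2": {"step1"},
		"step3": {"step1"},
	})
	fmt.Println(plan, err)
}
```

For the three-step diagram above, this yields one wave containing `step1` followed by a wave containing `step2` and `step3`.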

Steps can be of three types:

tool: Execute a backend tool
elicitation: Request user input via MCP elicitation protocol
forEach: Iterate over a collection from a previous step, executing an inner tool step per item with bounded parallelism

Implementation: pkg/vmcp/composer/

Two-Boundary Authentication

vMCP uses separate authentication for incoming clients and outgoing backend calls:

graph LR
    subgraph "Boundary 1: Incoming"
        Client[Client] -->|JWT| vMCP[vMCP]
    end

    subgraph "Boundary 2: Outgoing"
        vMCP -->|Exchanged Token| Backend[Backend]
    end

    style Client fill:#e3f2fd
    style vMCP fill:#90caf9
    style Backend fill:#ffb74d

Incoming Authentication

Incoming authentication validates clients connecting to vMCP, using either OIDC token validation or anonymous access.

Outgoing Authentication

Authenticates vMCP to backend MCP servers using:

Token exchange - RFC 8693 exchange of client token for backend-specific token
Header injection - Inject a static API key or other fixed header value
Unauthenticated - For internal/trusted backends

Exchanged tokens are cached to avoid repeated exchange calls.

Implementation: pkg/vmcp/auth/, pkg/vmcp/cache/

Request Flow

sequenceDiagram
    participant Client
    participant Server as vMCP Server
    participant Router
    participant Backend

    Client->>Server: tools/call (tool_name)
    Server->>Server: Validate client auth
    Server->>Router: Route tool_name
    Router->>Server: BackendTarget
    Server->>Server: Apply outgoing auth
    Server->>Backend: tools/call (original_name)
    Backend->>Server: Tool result
    Server->>Client: Tool result

Key insight: If a tool was renamed during conflict resolution (e.g., github_create_issue), vMCP translates it back to the original name (create_issue) when calling the backend.
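
The name translation amounts to a lookup that returns both the owning backend and the backend's own name for the tool. In this sketch the `BackendTarget` name comes from the sequence diagram above, but its fields and the `resolve` function are hypothetical.

```go
package main

import "fmt"

// BackendTarget names the backend that owns a tool and the name the
// backend itself uses for it. The field names are illustrative.
type BackendTarget struct {
	Backend      string
	OriginalName string
}

// resolve looks up the client-facing tool name and returns where to
// send the call and which name to use when calling the backend.
func resolve(table map[string]BackendTarget, exposed string) (BackendTarget, error) {
	target, ok := table[exposed]
	if !ok {
		return BackendTarget{}, fmt.Errorf("unknown tool %q", exposed)
	}
	return target, nil
}

func main() {
	table := map[string]BackendTarget{
		"github_create_issue": {Backend: "github", OriginalName: "create_issue"},
	}
	target, err := resolve(table, "github_create_issue")
	fmt.Println(target.Backend, target.OriginalName, err)
}
```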

Request Processing Pipeline

vMCP uses a middleware chain to process incoming requests. The chain is configured in pkg/vmcp/server/server.go.

Middleware Execution Order

Middleware is applied by wrapping handlers, so a request passes through the layers from outermost to innermost, in the order listed below:

Order	Middleware	Required	Purpose
1	Recovery	Always	Catches panics, returns HTTP 500
2	Authentication	Optional	Validates incoming JWT tokens (OIDC/Anonymous)
3	Authorization	Optional	Evaluates Cedar policies (composed with auth)
4	Audit	Optional	Logs request events for compliance
5	Discovery	Always	Aggregates backend capabilities per session
6	Backend Enrichment	Optional	Adds backend name to audit context
7	Telemetry	Optional	OpenTelemetry instrumentation

Discovery Middleware

The Discovery middleware (pkg/vmcp/discovery/middleware.go) is central to vMCP's multi-tenant design:

Initialize requests (no session ID): Discovers capabilities from all backends in the MCPGroup, stores routing table in session
Subsequent requests (with session ID): Retrieves cached capabilities from session

This lazy per-session discovery ensures:

Deterministic behavior within a session
Support for dynamic backends (Kubernetes)
No notification spam from redundant capability updates
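
The per-session behavior described above can be sketched as a store that runs discovery once for a request without a session ID and serves later requests from the stored result. The `SessionStore` type, the session ID format, and the `RoutingTable` shape are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// RoutingTable maps a capability name to a backend name (simplified).
type RoutingTable map[string]string

// SessionStore keeps one routing table per session.
type SessionStore struct {
	mu       sync.Mutex
	sessions map[string]RoutingTable
	nextID   int
}

func NewSessionStore() *SessionStore {
	return &SessionStore{sessions: make(map[string]RoutingTable)}
}

// Capabilities returns the routing table for a request. An empty
// sessionID is treated as an initialize request: discover runs once and
// its result is stored under a new session ID. Any other ID is served
// from the stored table without calling discover again.
func (s *SessionStore) Capabilities(sessionID string, discover func() RoutingTable) (string, RoutingTable, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if sessionID == "" {
		s.nextID++
		id := fmt.Sprintf("session-%d", s.nextID)
		s.sessions[id] = discover()
		return id, s.sessions[id], nil
	}
	rt, ok := s.sessions[sessionID]
	if !ok {
		return "", nil, fmt.Errorf("unknown session %q", sessionID)
	}
	return sessionID, rt, nil
}

func main() {
	store := NewSessionStore()
	discoveries := 0
	discover := func() RoutingTable {
		discoveries++
		return RoutingTable{"create_issue": "github"}
	}
	id, _, _ := store.Capabilities("", discover) // initialize request
	_, rt, _ := store.Capabilities(id, discover) // follow-up request
	fmt.Println(id, rt["create_issue"], discoveries)
}
```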

Timeouts: Discovery has a 15-second timeout. A timeout returns HTTP 504; any other discovery failure returns HTTP 503.

Backend Enrichment Middleware

When Audit is configured, the Backend Enrichment middleware (pkg/vmcp/server/backend_enrichment.go) parses the MCP request to determine which backend will handle it:

MCP Method	Lookup
`tools/call`	`name` → `RoutingTable.Tools`
`resources/read`	`uri` → `RoutingTable.Resources`
`prompts/get`	`name` → `RoutingTable.Prompts`

This enriches audit events with the backend name for better observability.
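
The lookup in the table above can be sketched as a small JSON-RPC parse followed by a map access. The `RoutingTable` field names follow the table. The map value types, the `request` struct, and `backendFor` are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RoutingTable maps capability identifiers to backend names. Using
// plain strings as values is a simplification.
type RoutingTable struct {
	Tools     map[string]string
	Resources map[string]string
	Prompts   map[string]string
}

// request captures only the JSON-RPC fields needed for the lookup.
type request struct {
	Method string `json:"method"`
	Params struct {
		Name string `json:"name"`
		URI  string `json:"uri"`
	} `json:"params"`
}

// backendFor returns the backend that would handle the request body,
// or false if the body is invalid or the method is not routed.
func backendFor(body []byte, rt RoutingTable) (string, bool) {
	var req request
	if err := json.Unmarshal(body, &req); err != nil {
		return "", false
	}
	var (
		backend string
		ok      bool
	)
	switch req.Method {
	case "tools/call":
		backend, ok = rt.Tools[req.Params.Name]
	case "resources/read":
		backend, ok = rt.Resources[req.Params.URI]
	case "prompts/get":
		backend, ok = rt.Prompts[req.Params.Name]
	}
	return backend, ok
}

func main() {
	rt := RoutingTable{Tools: map[string]string{"github_create_issue": "github"}}
	body := []byte(`{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"github_create_issue"}}`)
	fmt.Println(backendFor(body, rt))
}
```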

Authentication Composition

When Authorization is configured, Authentication middleware is composed with MCP Parsing and Authorization:

Authentication → MCP Parsing → Authorization → Next Handler

This composition is created by pkg/vmcp/auth/factory/incoming.NewIncomingAuthMiddleware().

Implementation: pkg/vmcp/server/server.go, pkg/vmcp/discovery/middleware.go, pkg/vmcp/auth/factory/

Health Monitoring

vMCP monitors backend health with configurable intervals. Health status (healthy, degraded, unhealthy, unauthenticated, unknown) affects routing decisions and is reported in VirtualMCPServer status.

Implementation: pkg/vmcp/health/

Deployment

vMCP can be deployed in three ways:

Kubernetes - Via the VirtualMCPServer CRD managed by the operator
Local CLI (thv vmcp) - Recommended path for local and non-Kubernetes use; built into the main thv binary
Standalone vmcp binary - Preserved for backwards compatibility and advanced CLI use

Implementation:

Kubernetes: cmd/thv-operator/controllers/virtualmcpserver_controller.go
Local CLI: cmd/thv/app/vmcp.go, pkg/vmcp/cli/
Standalone binary: cmd/vmcp/

Local CLI Mode

thv vmcp is the recommended way to run a vMCP server outside of Kubernetes. It provides the same aggregation, tool routing, and optimizer capabilities as the Kubernetes-managed VirtualMCPServer, but runs as a local foreground process driven by Cobra CLI flags.

Key features:

Zero-config quick mode: thv vmcp serve --group <name> generates an in-memory config from a running ToolHive group — no YAML file required.
Config-file workflow: thv vmcp init → thv vmcp validate → thv vmcp serve --config for reproducible deployments.
Optimizer tiers: optional FTS5 keyword search (Tier 1) and managed TEI semantic search (Tier 2) reduce tool count for MCP clients.
Loopback-only binding: quick mode enforces a loopback-only host via ServeConfig.validateQuickModeHost — localhost, 127.0.0.1, ::1, or any other loopback IP is accepted; non-loopback addresses are rejected.

See Local vMCP CLI Mode for the full architecture, optimizer tier table, and TEI container lifecycle documentation.
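
The loopback-only rule for quick mode can be approximated with the standard `net` package. This is a sketch of the kind of check involved, not the body of `ServeConfig.validateQuickModeHost`; the `isLoopbackHost` name is hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopbackHost reports whether host is "localhost" or a literal
// loopback IP address. Other hostnames are rejected without a DNS
// lookup.
func isLoopbackHost(host string) bool {
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	for _, h := range []string{"localhost", "127.0.0.1", "::1", "0.0.0.0", "example.com"} {
		fmt.Printf("%s accepted=%v\n", h, isLoopbackHost(h))
	}
}
```

`net.IP.IsLoopback` covers the whole 127.0.0.0/8 range as well as `::1`, which matches the "any other loopback IP" wording above.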

Status Reporting

Status reporting lets the vMCP runtime report its operational status directly, instead of relying on the operator to infer state. It is optional and pluggable, so different environments (CLI or Kubernetes) can consume status without duplicating discovery logic.

Why Status Reporting

Avoid duplicate backend discovery: vMCP already discovers backends for capability aggregation; we reuse that data for status instead of having the operator rediscover.
Provide authoritative runtime view: backend availability, phase, and conditions are produced at runtime by the component that actually talks to backends.
Enable multiple sinks: logging for CLI, Kubernetes CRD status for clusters, future file/metrics reporters.

Key Concepts

StatusReporter interface (pkg/vmcp/status/reporter.go): ReportStatus(ctx, *vmcp.Status) and Start(ctx) returning shutdown func.
Status model (pkg/vmcp/types.go):
- Phase: Pending, Ready, Degraded, Failed
- Conditions: metav1.Condition (ready, backends discovered, auth configured) using shared constants
- DiscoveredBackends: backend URL/auth type/health with timestamps
CLI reporter: a logging-only reporter (no persistence) that logs every status update.
Lifecycle hook: server starts the reporter, collects shutdown funcs, and stops them during graceful shutdown.

Integration in vMCP Runtime

Server config (pkg/vmcp/server/server.go): optional StatusReporter; nil disables status reporting.
Startup: reporter Start is invoked; failure is treated as fatal when configured. Shutdown funcs are collected and run on Stop.
Reporting: runtime components call ReportStatus as discovery and health change.

Extensibility

Additional reporters can be added under pkg/vmcp/status/ implementing Reporter and using shared vmcp.Status types.
Future sinks: Kubernetes status writer, file-based reporter for CLI (thv status), metrics exporter.

Implementation: pkg/vmcp/status/
