The thv vmcp subcommand lets users run a Virtual MCP Server (vMCP) locally without Kubernetes. It aggregates multiple MCP server backends from a ToolHive group into a single unified endpoint that any MCP client can connect to.
```mermaid
graph TB
    Client[MCP Client] -->|HTTP/SSE/Streamable-HTTP| vMCP[thv vmcp serve<br/>pkg/vmcp/cli/serve.go]
    vMCP -->|discover| Groups[ToolHive Groups<br/>pkg/groups/]
    vMCP -->|aggregate| B1[Backend MCP Server 1]
    vMCP -->|aggregate| B2[Backend MCP Server 2]
    vMCP -->|aggregate| BN[Backend MCP Server N]
    vMCP -.->|optional| Optimizer[Optimizer<br/>pkg/vmcp/optimizer/]
    Optimizer -.->|Tier 2| TEI[TEI Container<br/>thv-embedding-*]
    style vMCP fill:#90caf9
    style Optimizer fill:#81c784
    style TEI fill:#ffb74d
    style Groups fill:#90caf9
```
The original vMCP deployment model required a Kubernetes cluster and a VirtualMCPServer CRD managed by the operator. This is well-suited for production multi-tenant environments but creates friction for local development and non-Kubernetes users.
thv vmcp provides the same aggregation, tool routing, and optimizer capabilities without requiring a cluster. It runs as a foreground process driven by Cobra CLI flags, with a zero-config quick mode for the common case of aggregating a local ToolHive group.
This path replaces the earlier Python StacklokLabs/mcp-optimizer project (see Migration from mcp-optimizer).
The thv vmcp command has three subcommands:
| Subcommand | Purpose |
|---|---|
| thv vmcp init | Generate a starter YAML config from a running group |
| thv vmcp validate | Validate a YAML config for syntax and semantic errors |
| thv vmcp serve | Start the aggregated vMCP server |
```mermaid
sequenceDiagram
    participant Client as MCP Client
    participant Cobra as Cobra CLI<br/>cmd/thv/app/vmcp.go
    participant Serve as pkg/vmcp/cli/serve.go
    participant Server as vMCP Server<br/>pkg/vmcp/server/
    participant Agg as Aggregator<br/>pkg/vmcp/aggregator/
    participant Backend as Backend MCP Server
    Client->>Cobra: thv vmcp serve [flags]
    Cobra->>Serve: vmcpcli.Serve(ServeConfig{...})
    Serve->>Serve: Load or generate config
    Serve->>Server: Build server with middleware chain
    Server->>Agg: Discover and connect backends
    Agg->>Backend: MCP initialize handshake
    Backend-->>Agg: capabilities
    Agg-->>Server: merged capability table
    Server-->>Client: server ready on :4483
    Client->>Server: tools/list
    Server->>Agg: route to backend(s)
    Agg->>Backend: tools/list
    Backend-->>Agg: tool list
    Agg-->>Client: merged tool list
```
Implementation: cmd/thv/app/vmcp.go, pkg/vmcp/cli/serve.go
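The Cobra layer is thin: it defines the vmcp subcommands, parses flags, and hands the values to pkg/vmcp/cli. The sketch below is illustrative only — the flag names match those documented on this page, but the handler body is a placeholder and the real definitions in cmd/thv/app/vmcp.go differ in detail.

```go
package main

import (
	"fmt"

	"github.com/spf13/cobra"
)

// Illustrative sketch of the serve command wiring; the real definition lives
// in cmd/thv/app/vmcp.go and delegates to vmcpcli.Serve(ServeConfig{...}).
// The handler body below is a placeholder, not ToolHive behaviour.
func newVMCPServeCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "serve",
		Short: "Start the aggregated vMCP server",
		RunE: func(cmd *cobra.Command, _ []string) error {
			group, _ := cmd.Flags().GetString("group")
			// This is where the flag values would be packed into a ServeConfig
			// and handed to pkg/vmcp/cli.
			fmt.Printf("would aggregate group %q on 127.0.0.1:4483\n", group)
			return nil
		},
	}
	cmd.Flags().String("config", "", "path to a vMCP YAML config")
	cmd.Flags().String("group", "", "ToolHive group to aggregate (quick mode when --config is omitted)")
	cmd.Flags().String("host", "", "listen host (loopback-only in quick mode)")
	cmd.Flags().Bool("optimizer", false, "expose find_tool/call_tool instead of all backend tools")
	cmd.Flags().Bool("optimizer-embedding", false, "also start a managed TEI container for semantic search")
	return cmd
}

func main() {
	_ = newVMCPServeCommand().Execute()
}
```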
When --config is omitted and --group is set, thv vmcp serve generates an in-memory YAML configuration from the named ToolHive group. No configuration file is required.
Security requirement: in quick mode, --host is still honoured, but validateQuickModeHost() rejects any value that is not a loopback address. Accepted values are the empty string (which defaults to 127.0.0.1), "localhost", or any IP for which net.IP.IsLoopback() returns true (e.g. ::1). Anything else is rejected to prevent an unauthenticated server from being exposed on the network.
Implementation: pkg/vmcp/cli/serve.go — generateQuickModeConfig()
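The loopback check itself is small. A minimal sketch, assuming exactly the behaviour described above (the real validateQuickModeHost() in pkg/vmcp/cli/serve.go may differ in detail):

```go
package sketch

import (
	"fmt"
	"net"
)

// validateQuickModeHost mirrors the rule above: accept the empty string
// (defaults to 127.0.0.1), "localhost", or any loopback IP; reject the rest.
func validateQuickModeHost(host string) error {
	if host == "" || host == "localhost" {
		return nil
	}
	if ip := net.ParseIP(host); ip != nil && ip.IsLoopback() {
		return nil // e.g. 127.0.0.1 or ::1
	}
	return fmt.Errorf("quick mode must bind to a loopback address, got %q", host)
}
```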
The recommended workflow for reproducible or customized deployments:
```bash
thv vmcp init --group <group-name> --output vmcp.yaml
# review and edit vmcp.yaml
thv vmcp validate --config vmcp.yaml
thv vmcp serve --config vmcp.yaml
```
thv vmcp init discovers running workloads in the given group and writes a starter YAML pre-populated with one backends entry per accessible workload.
Implementation: pkg/vmcp/cli/init.go
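For a rough picture of the generated file, the sketch below marshals a hypothetical starter config with one backends entry per workload. The field names (group, backends, name, url) are assumptions made for illustration; the authoritative schema is the Config struct in pkg/vmcp/config/config.go.

```go
package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

// Hypothetical config shape used only to illustrate "one backends entry per
// accessible workload"; see pkg/vmcp/config/config.go for the real schema.
type backendEntry struct {
	Name string `yaml:"name"`
	URL  string `yaml:"url"`
}

type starterConfig struct {
	Group    string         `yaml:"group"`
	Backends []backendEntry `yaml:"backends"`
}

func main() {
	cfg := starterConfig{
		Group: "dev-tools",
		Backends: []backendEntry{
			{Name: "fetch", URL: "http://127.0.0.1:21212/mcp"},
			{Name: "github", URL: "http://127.0.0.1:21213/mcp"},
		},
	}
	out, err := yaml.Marshal(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```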
thv vmcp serve supports an optional tool optimizer that exposes find_tool and call_tool instead of passing all backend tools through to the client. This is useful when the aggregated tool count is large.
| Tier | Flag(s) | Optimizer | External Service | Exposed Tools |
|---|---|---|---|---|
| 0 | (none) | None | None | All backend tools passed through |
| 1 | --optimizer | FTS5 keyword (SQLite in-process) | None | find_tool, call_tool only |
| 2 | --optimizer-embedding | FTS5 + TEI semantic | Managed TEI container | find_tool, call_tool only |
| 3 | optimizer.embeddingService in config YAML | FTS5 + external embedding service | User-managed | find_tool, call_tool only |
Tier 2 (--optimizer-embedding) implies --optimizer. The TEI container is started automatically and stopped on server shutdown.
Implementation: pkg/vmcp/optimizer/optimizer.go, pkg/vmcp/cli/embedding_manager.go
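The tier selection boils down to a precedence check: an embedding service configured in YAML wins over --optimizer-embedding, which wins over --optimizer. The helper below is a hypothetical illustration of that precedence, not a function from pkg/vmcp/optimizer.

```go
package sketch

// resolveOptimizerTier is a hypothetical helper showing the precedence implied
// by the tier table above; it is not part of the ToolHive codebase.
func resolveOptimizerTier(optimizer, optimizerEmbedding bool, embeddingServiceURL string) int {
	switch {
	case embeddingServiceURL != "":
		return 3 // optimizer.embeddingService set in config YAML: external service
	case optimizerEmbedding:
		return 2 // --optimizer-embedding: managed TEI container (implies --optimizer)
	case optimizer:
		return 1 // --optimizer: in-process FTS5 keyword search
	default:
		return 0 // no optimizer: all backend tools passed through
	}
}
```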
When --optimizer-embedding is set, ToolHive manages a HuggingFace Text Embeddings Inference (TEI) container for semantic search.
```mermaid
sequenceDiagram
    participant Serve as serve.go
    participant EM as EmbeddingServiceManager<br/>embedding_manager.go
    participant RT as Container Runtime
    participant TEI as TEI Container
    Serve->>EM: Start(ctx)
    EM->>EM: containerNameForModel(model)<br/>→ thv-embedding-<8-char-hash>
    EM->>RT: inspect existing container
    alt container exists and is running
        RT-->>EM: running
        EM->>EM: reuse; started=false (no ownership)
    else container absent or stopped
        EM->>RT: create container
        RT->>TEI: start thv-embedding-<hash>
        EM->>EM: poll /health with exponential backoff<br/>(2s → 4s → 8s … max 30s, until ctx cancelled)
        TEI-->>EM: 200 OK (model loaded)
        EM->>EM: started=true (owns container)
    end
    EM-->>Serve: embedding URL
    Serve->>Serve: run vMCP server
    Serve->>EM: Stop(ctx) on shutdown
    alt started==true
        EM->>RT: stop container
    else started==false
        EM->>EM: no-op (container not owned)
    end
```
Container naming: thv-embedding-<model-short-hash> where the hash is the first 8 hex characters of the SHA-256 of the model name. This avoids invalid container-name characters (e.g., slashes in BAAI/bge-small-en-v1.5).
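A minimal sketch of that naming rule, assuming the hash is taken directly over the model name as described:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// containerNameForModel sketches the naming rule above: the first 8 hex
// characters of the SHA-256 of the model name, prefixed with thv-embedding-,
// so slashes and other invalid characters never reach the container name.
func containerNameForModel(model string) string {
	sum := sha256.Sum256([]byte(model))
	return "thv-embedding-" + hex.EncodeToString(sum[:])[:8]
}

func main() {
	fmt.Println(containerNameForModel("BAAI/bge-small-en-v1.5"))
}
```

The same model name always maps to the same container name, which is what makes the reuse check described below possible.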
Ownership tracking: EmbeddingServiceManager sets an internal started flag only when it deploys the container itself (deployContainer). When it finds an already-running container and calls reuseContainer, started remains false.
Reuse semantics: if a container with the correct name is already running when thv vmcp serve starts (e.g. left running by another process or a previous invocation that did not shut down cleanly), ToolHive reuses it and does not stop it on exit. In the normal case — where thv vmcp serve itself deployed the container — it will stop it on shutdown, so the next invocation will redeploy from scratch.
Health polling: exponential backoff starting at 2 s, multiplier 2, cap at 30 s per interval. pollHealth() polls until the passed context.Context is cancelled — there is no built-in total-time budget. thv vmcp serve passes cmd.Context() without an additional deadline, so polling continues indefinitely until the user cancels (Ctrl-C) or the context is otherwise closed.
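A sketch of that loop, assuming a plain HTTP GET against the TEI /health endpoint; the real pollHealth() may structure its retries differently, but the backoff parameters and the context-only exit condition follow the description above.

```go
package sketch

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// pollHealth retries GET healthURL with exponential backoff (2s, 4s, 8s, ...
// capped at 30s per interval) and returns only on a 200 response or when the
// context is cancelled — there is no separate total-time budget.
func pollHealth(ctx context.Context, healthURL string) error {
	interval := 2 * time.Second
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, healthURL, nil)
		if err != nil {
			return err
		}
		if resp, err := http.DefaultClient.Do(req); err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil // model loaded, TEI is ready
			}
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("embedding service never became healthy: %w", ctx.Err())
		case <-time.After(interval):
		}
		if interval *= 2; interval > 30*time.Second {
			interval = 30 * time.Second
		}
	}
}
```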
Graceful shutdown: EmbeddingServiceManager.Stop() stops the TEI container only if this instance deployed it (started == true). It is a no-op when the container was reused from an external process.
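The ownership rule fits in a few lines. The types below are stand-ins (the real EmbeddingServiceManager wraps ToolHive's container runtime abstraction and carries more state); only the started-flag behaviour reflects what is described above.

```go
package sketch

import "context"

// containerStopper is a stand-in for ToolHive's container runtime interface.
type containerStopper interface {
	StopContainer(ctx context.Context, name string) error
}

// embeddingServiceManager sketches the ownership tracking described above:
// started is set only when this process deployed the TEI container itself.
type embeddingServiceManager struct {
	runtime       containerStopper
	containerName string
	started       bool // true after deployContainer, false after reuseContainer
}

// Stop tears the container down only when this instance owns it; a reused
// container is left running for whoever started it.
func (m *embeddingServiceManager) Stop(ctx context.Context) error {
	if !m.started {
		return nil // not owned — no-op
	}
	return m.runtime.StopContainer(ctx, m.containerName)
}
```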
Implementation: pkg/vmcp/cli/embedding_manager.go
The default TEI image (ghcr.io/huggingface/text-embeddings-inference:cpu-latest) is published as an amd64-only image. On Apple Silicon Macs, Docker/OrbStack runs it via Rosetta 2 x86-64 emulation. This works but is slower than native. A future improvement may select an ARM64-native image automatically; for now, cpu-latest is the only supported CPU path.
Key files:
| File | Role |
|---|---|
| cmd/thv/app/vmcp.go | Cobra command definitions; flag parsing |
| pkg/vmcp/cli/serve.go | Serve() entry point; config loading, optimizer wiring, server start |
| pkg/vmcp/cli/init.go | Init() entry point; workload discovery and YAML template generation |
| pkg/vmcp/cli/validate.go | Validate() entry point; config file validation |
| pkg/vmcp/cli/embedding_manager.go | TEI container lifecycle (Tier 2) |
| pkg/vmcp/optimizer/optimizer.go | GetAndValidateConfig, NewOptimizerFactory |
| pkg/vmcp/config/config.go | Config struct; OptimizerConfig.EmbeddingService for Tier 3 |
The Python StacklokLabs/mcp-optimizer project is deprecated in favour of the Go-native thv vmcp serve --optimizer. The Go implementation ships in every ToolHive release, requires no separate Python environment, and is fully integrated with ToolHive's container and group management.
| mcp-optimizer feature | thv vmcp equivalent |
|---|---|
| Keyword (FTS5) search | thv vmcp serve --optimizer |
| Semantic (embedding) search | thv vmcp serve --optimizer-embedding |
| Custom embedding model | --embedding-model <HuggingFace model name> |
| Custom TEI image | --embedding-image <image ref> |
| External embedding service | optimizer.embeddingService in config YAML (Tier 3) |
1. Stop the Python mcp-optimizer process.
2. Ensure ToolHive is up to date (thv version).
3. Run thv vmcp init --group <your-group> --output vmcp.yaml to generate a config from your current group.
4. Start with thv vmcp serve --group <your-group> --optimizer (quick mode) or thv vmcp serve --config vmcp.yaml --optimizer (config-file mode).
5. Update any MCP client configuration to point at the new thv vmcp endpoint (default http://127.0.0.1:4483).
- Virtual MCP Server Architecture — Kubernetes-side vMCP (CRD, operator, backend discovery)
- vMCP Library Embedding — Embedding pkg/vmcp/ in downstream Go projects
- Groups — ToolHive groups used as vMCP backend source
- Deployment Modes — Local vs Kubernetes deployment comparison