Merged
2 changes: 1 addition & 1 deletion .github/upstream-projects.yaml
@@ -35,7 +35,7 @@ projects:

- id: toolhive
repo: stacklok/toolhive
version: v0.23.1
version: v0.24.0
# toolhive is a monorepo covering the CLI, the Kubernetes
# operator, and the vMCP gateway. It also introduces cross-
# cutting features that land in concepts/, integrations/,
11 changes: 7 additions & 4 deletions docs/toolhive/concepts/vmcp.mdx
@@ -6,10 +6,11 @@ description:
for simpler client configuration.
---

This document explains Virtual MCP Server (vMCP), a feature of the ToolHive
Kubernetes Operator. You'll learn why it exists, when to use it, and how it
simplifies managing multiple MCP servers while enabling powerful multi-system
workflows.
This document explains Virtual MCP Server (vMCP), a ToolHive feature available
both through the [Kubernetes operator](../guides-vmcp/quickstart.mdx) and the
[local CLI](../guides-vmcp/local-cli.mdx). You'll learn why it exists, when to
use it, and how it simplifies managing multiple MCP servers while enabling
powerful multi-system workflows.

## The problem vMCP solves

@@ -176,6 +177,8 @@ teams managing multiple MCP servers.

- [Try the Quickstart](../guides-vmcp/quickstart.mdx) to deploy your first vMCP
on a Kubernetes cluster
- [Run vMCP locally with the CLI](../guides-vmcp/local-cli.mdx) for a quick
evaluation without Kubernetes
- [Learn how to deploy vMCP](../guides-vmcp/intro.mdx) for a full overview of
configuration and architecture

6 changes: 4 additions & 2 deletions docs/toolhive/guides-vmcp/index.mdx
@@ -17,10 +17,12 @@ connection.
- **Evaluating vMCP?** Read
[Understanding Virtual MCP Server](../concepts/vmcp.mdx) for the full picture
of what it does and when it's the right fit.
- **Ready to try it?** Follow the [Quickstart](./quickstart.mdx) to deploy your
first vMCP on a Kubernetes cluster.
- **Ready to deploy?** Follow the [Quickstart](./quickstart.mdx) to deploy your
first vMCP through the Kubernetes operator.
- **Already running vMCP?** Jump to [Configuration](./configuration.mdx) or
[Authentication](./authentication.mdx).
- **Local testing only?** Run [vMCP locally with the CLI](./local-cli.mdx) to
aggregate a local ToolHive group without a cluster.

## Contents

26 changes: 21 additions & 5 deletions docs/toolhive/guides-vmcp/intro.mdx
@@ -7,11 +7,25 @@ description:

## What is vMCP?

Virtual MCP Server (vMCP) is a feature of the ToolHive Kubernetes Operator that
acts as an aggregation proxy, consolidating multiple backend MCP servers into a
single unified interface. Instead of configuring clients to connect to each MCP
server individually, you connect once to vMCP and access all backend tools
through a single endpoint.
Virtual MCP Server (vMCP) is a ToolHive feature that acts as an aggregation
proxy, consolidating multiple backend MCP servers into a single unified
interface. Instead of configuring clients to connect to each MCP server
individually, you connect once to vMCP and access all backend tools through a
single endpoint.

You can run vMCP in two ways:

- **On Kubernetes** through the `VirtualMCPServer` custom resource managed by
the ToolHive operator. This is the recommended option for shared,
multi-tenant, or production deployments.
- **Locally from the CLI** with the `thv vmcp` command, which aggregates a local
[ToolHive group](../concepts/groups.mdx) without a cluster. See
[Run vMCP locally with the CLI](./local-cli.mdx).

The underlying aggregation, tool routing, and optimizer capabilities are the
same in both modes. The rest of this page focuses on the Kubernetes deployment
model. For individual development or local testing without a cluster, see
[Run vMCP locally with the CLI](./local-cli.mdx).

vMCP supports two types of backends:

@@ -110,6 +124,8 @@ guide.

- [Try the Quickstart](./quickstart.mdx) to deploy your first vMCP on a
Kubernetes cluster
- [Run vMCP locally with the CLI](./local-cli.mdx) to aggregate a local ToolHive
group without Kubernetes
- [Configure vMCP servers](./configuration.mdx) to set up groups, backends, and
tool aggregation

274 changes: 274 additions & 0 deletions docs/toolhive/guides-vmcp/local-cli.mdx
@@ -0,0 +1,274 @@
---
title: Run vMCP locally with the CLI
description:
Run Virtual MCP Server locally with the thv vmcp command to aggregate a
ToolHive group without Kubernetes.
---

Virtual MCP Server (vMCP) is usually deployed on Kubernetes through the
`VirtualMCPServer` custom resource, but you can also run it locally from the
ToolHive CLI. The `thv vmcp` subcommands aggregate the MCP servers in a local
[ToolHive group](../concepts/groups.mdx) behind a single endpoint, without a
cluster or operator.

Use this mode for local development, quick evaluation, or any case where you
want vMCP's aggregation, tool routing, and optimizer capabilities without the
operational overhead of Kubernetes.

## When to use the local CLI

- You are developing or evaluating vMCP on your workstation.
- You run MCP servers locally with `thv run` and want to expose them through a
single endpoint.
- You want to use the vMCP [optimizer](./optimizer.mdx) to reduce token usage
across a local group.
- You don't yet need the clustered, operator-managed deployment model covered in
the [Quickstart](./quickstart.mdx).

For production and multi-tenant deployments, use the Kubernetes
[`VirtualMCPServer`](./quickstart.mdx) resource instead.

## Prerequisites

- ToolHive CLI v0.24.0 or later. Check with `thv version`.
- A container runtime (Docker, Podman, or OrbStack) available to ToolHive.
- A ToolHive group with one or more running MCP servers. To create one:

```bash
thv group create my-group
thv run --group my-group fetch
thv run --group my-group github
```

See [Manage ToolHive groups](../guides-cli/group-management.mdx) for details.
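
If you script your setup, a version gate can catch an outdated CLI before anything else runs. The following is a minimal sketch, not part of ToolHive: `version_at_least` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD or macOS). How you extract the version string from `thv version` depends on its output format, which this page does not show, so the helper takes the version as an argument.

```bash
# Return 0 if version $1 is greater than or equal to version $2.
# A leading "v" is ignored, so "v0.24.0" and "0.24.0" compare as equal.
version_at_least() {
  # After a version sort, the smaller version comes first. If that is the
  # required version, the installed version is at least as new.
  [ "$(printf '%s\n%s\n' "${2#v}" "${1#v}" | sort -V | head -n 1)" = "${2#v}" ]
}

# Example: the second argument is the minimum from the prerequisites above.
if version_at_least "v0.24.0" "v0.24.0"; then
  echo "CLI version is new enough for thv vmcp"
fi
```

Replace the first argument with the version string reported by your installed CLI.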

## Subcommands at a glance

The `thv vmcp` command has three subcommands:

| Subcommand | Purpose |
| ------------------- | ----------------------------------------------------- |
| `thv vmcp init` | Generate a starter YAML config from a running group |
| `thv vmcp validate` | Validate a YAML config for syntax and semantic errors |
| `thv vmcp serve` | Start the aggregated vMCP server |

There are two ways to run the server:

- **Quick mode** uses `thv vmcp serve --group <name>` to generate an in-memory
config from a group. No YAML file is required.
- **Config-file mode** uses `thv vmcp init` → edit → `thv vmcp validate` →
`thv vmcp serve --config vmcp.yaml` for reproducible or customized setups.

## Quick mode

Quick mode is the fastest way to aggregate a local group. Run the server with
just a group name:

```bash
thv vmcp serve --group my-group
```

By default, the server binds to `127.0.0.1:4483`. Point your MCP client at
`http://127.0.0.1:4483` to access all tools from the group through a single
endpoint.
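
To confirm from a script that the server is accepting connections, you can poll the default address with a plain TCP check. This is a minimal sketch that makes no assumptions about vMCP's HTTP paths: it only tests whether the port is open. `port_is_open` and `wait_for_port` are hypothetical helper names, and the check relies on bash's `/dev/tcp` redirection, so run it with bash rather than plain `sh`.

```bash
# Return 0 if a TCP connection to host $1, port $2 succeeds.
port_is_open() {
  # The subshell opens and then discards the connection.
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Poll once per second until the port accepts connections or attempts run out.
# Usage: wait_for_port <host> <port> [attempts]
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-10}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if port_is_open "$host" "$port"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_port 127.0.0.1 4483` returns successfully once `thv vmcp serve --group my-group`, started in another terminal, is listening.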

:::note[Loopback-only]

Quick mode always uses anonymous authentication, so `thv vmcp serve --group`
only accepts loopback bind addresses (`127.0.0.1`, `::1`, `localhost`, or the
default empty value). Binding to a non-loopback interface is rejected to avoid
exposing an unauthenticated server on the network. To bind to a non-loopback
address, use [config-file mode](#config-file-mode) and configure client
authentication.

:::
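
If you wrap quick mode in a script, you can mirror the loopback rule as a preflight check so a bad bind address fails before the server starts. This is a minimal sketch of the rule as described in the note, not ToolHive's own validation logic. `is_loopback_host` is a hypothetical helper name, and it accepts only the four values listed above.

```bash
# Return 0 if $1 is one of the bind addresses that quick mode accepts:
# 127.0.0.1, ::1, localhost, or the empty default.
is_loopback_host() {
  case "$1" in
    "" | 127.0.0.1 | ::1 | localhost) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: refuse to continue with a non-loopback address.
host="127.0.0.1"
if ! is_loopback_host "$host"; then
  echo "quick mode needs a loopback address; use config-file mode for $host" >&2
fi
```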

### Enable the optimizer in quick mode

Add `--optimizer` or `--optimizer-embedding` to replace the full tool list with
`find_tool` and `call_tool` primitives:

```bash
# Tier 1: FTS5 keyword search (no external container)
thv vmcp serve --group my-group --optimizer

# Tier 2: FTS5 + semantic search using a managed TEI container
thv vmcp serve --group my-group --optimizer-embedding
```

See [Optimizer tiers](#optimizer-tiers) for the full comparison.

## Config-file mode

Config-file mode is recommended when you need to customize backend settings,
authentication, or aggregation rules, or when you want a reproducible setup
checked into version control.

### Step 1: Generate a starter config

`thv vmcp init` discovers running workloads in a group and writes a starter YAML
file with one backend entry per accessible workload:

```bash
thv vmcp init --group my-group --output vmcp.yaml
```

Omit `--output` to write the generated YAML to standard output instead.

The generated file includes inline comments describing each section. A minimal
example looks like this:

```yaml title="vmcp.yaml"
# Generated by `thv vmcp init`. Review and customize before use.

name: my-group-vmcp
groupRef: my-group

incomingAuth:
  type: anonymous

outgoingAuth:
  source: inline

aggregation:
  conflictResolution: prefix
  conflictResolutionConfig:
    prefixFormat: '{workload}_'

backends:
  - name: fetch
    url: http://127.0.0.1:12345/sse
    transport: sse
  - name: github
    url: http://127.0.0.1:12346/mcp
    transport: streamable-http
```

### Step 2: Review and edit

Customize the generated config. Common edits include:

- Changing `incomingAuth` from `anonymous` to `oidc` to require authenticated
clients.
- Adding tool filters, renames, or overrides under each backend.
- Configuring the [optimizer](./optimizer.mdx) under an `optimizer` section.

See [Configure vMCP](./configuration.mdx) for the full schema.

### Step 3: Validate the config

```bash
thv vmcp validate --config vmcp.yaml
```

Validation checks YAML syntax, required fields, middleware configuration, and
backend settings. It exits with status `0` on success. On failure, it exits with
a non-zero status and prints a descriptive message.

### Step 4: Start the server

```bash
thv vmcp serve --config vmcp.yaml
```

When both `--config` and `--group` are set, `--config` takes precedence.
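
The four steps above can be chained in a small wrapper. This is a minimal sketch that uses only the flags documented on this page. `vmcp_from_config` is a hypothetical function name, and skipping `init` when the file already exists is a design choice made here to protect manual edits, not documented ToolHive behavior.

```bash
# Generate (if needed), validate, and serve a vMCP config for one group.
# Usage: vmcp_from_config <group> [config-path]
vmcp_from_config() {
  local group="$1"
  local config="${2:-vmcp.yaml}"

  # Only generate a starter file when none exists, so manual edits survive.
  if [ ! -f "$config" ]; then
    thv vmcp init --group "$group" --output "$config" || return 1
  fi

  # Stop before serving if validation reports a problem.
  thv vmcp validate --config "$config" || return 1

  thv vmcp serve --config "$config"
}
```

Call it as `vmcp_from_config my-group vmcp.yaml`. Because `thv vmcp validate` exits non-zero on errors, the server only starts with a config that passed validation.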

## Optimizer tiers

`thv vmcp serve` supports four tiers of tool optimization. Tier 0 is the
default; tiers 1 through 3 replace the full backend tool list with `find_tool`
and `call_tool` primitives that search the aggregated tool set. Tier 1 uses FTS5
keyword search only; tiers 2 and 3 add semantic embeddings on top for hybrid
search.

| Tier | Flag or setting | Search | External service |
| ---- | ------------------------------------------- | ------------------------------- | ----------------------------- |
| 0 | (none) | None - all tools passed through | None |
| 1 | `--optimizer` | FTS5 keyword (in-process) | None |
| 2 | `--optimizer-embedding` | FTS5 + TEI semantic | Managed TEI container |
| 3 | `optimizer.embeddingService` in config YAML | FTS5 + external embedding | User-managed embedding server |

Tier 2 implies Tier 1: `--optimizer-embedding` also enables the keyword index.
For Tier 2, ToolHive automatically starts and stops a Hugging Face Text
Embeddings Inference (TEI) container named `thv-embedding-<hash>`. Customize the
model and image with `--embedding-model` and `--embedding-image`.

For the conceptual background and tuning parameters, see
[Optimize tool discovery](./optimizer.mdx) and
[Tool optimization](../concepts/tool-optimization.mdx).
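
If a script selects the tier from a setting, the table above can be expressed as a small lookup. This is a sketch under the assumptions in that table only. `optimizer_flag_for_tier` is a hypothetical helper name, and it treats Tier 3 as an error because that tier is configured in the YAML file rather than through a flag.

```bash
# Print the `thv vmcp serve` flag for optimizer tier 0, 1, or 2.
# Tier 0 prints nothing. Tier 3 and unknown values return non-zero.
optimizer_flag_for_tier() {
  case "$1" in
    0) : ;;
    1) printf '%s\n' '--optimizer' ;;
    2) printf '%s\n' '--optimizer-embedding' ;;
    3)
      echo "tier 3 is set with optimizer.embeddingService in the config file" >&2
      return 1
      ;;
    *)
      echo "unknown optimizer tier: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `thv vmcp serve --group my-group $(optimizer_flag_for_tier 1)` starts quick mode with the Tier 1 optimizer. Leave the substitution unquoted so that Tier 0 adds no argument.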

## Enable audit logging

Add `--enable-audit` to `thv vmcp serve` to turn on audit logging with default
settings when the loaded config doesn't already define an audit section:

```bash
thv vmcp serve --group my-group --enable-audit
```

For audit configuration options, see [Audit logging](./audit-logging.mdx).

## Command reference

All `thv vmcp` flags, with their defaults:

### `thv vmcp serve`

| Flag | Default | Description |
| ----------------------- | ---------------------------------------------------------- | -------------------------------------------------------------------- |
| `--config`, `-c` | (empty) | Path to a vMCP configuration file |
| `--group` | (empty) | ToolHive group name for quick mode (used when `--config` is not set) |
| `--host` | `127.0.0.1` | Bind address (quick mode requires a loopback address) |
| `--port` | `4483` | TCP port to listen on |
| `--enable-audit` | `false` | Enable audit logging with default configuration |
| `--optimizer` | `false` | Enable Tier 1 FTS5 keyword optimizer |
| `--optimizer-embedding` | `false` | Enable Tier 2 semantic optimizer (implies `--optimizer`) |
| `--embedding-model` | `BAAI/bge-small-en-v1.5` | Hugging Face model name for the managed TEI container |
| `--embedding-image` | `ghcr.io/huggingface/text-embeddings-inference:cpu-latest` | TEI container image |

### `thv vmcp init`

| Flag | Default | Description |
| ---------------- | ---------- | -------------------------------------------------- |
| `--group`, `-g` | (required) | ToolHive group name whose workloads are discovered |
| `--output`, `-o` | stdout | Output file path for the generated config |
| `--config`, `-c` | stdout | Alias for `--output` |

### `thv vmcp validate`

| Flag | Default | Description |
| ---------------- | ---------- | ----------------------------------------------- |
| `--config`, `-c` | (required) | Path to the vMCP configuration file to validate |

For full CLI help, run `thv vmcp --help` or see
[`thv vmcp`](../reference/cli/thv_vmcp.md) in the reference.
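
Several `thv vmcp serve` flags from the tables above can be combined in one invocation. The wrapper below is a sketch that uses only documented flags and their documented default values. `serve_with_embeddings` is a hypothetical function name, and the explicit `--host` and `--embedding-model` values simply restate the defaults to make the command self-describing.

```bash
# Start quick mode on a loopback port with the Tier 2 optimizer.
# Usage: serve_with_embeddings <group> [port]
serve_with_embeddings() {
  local group="$1" port="${2:-4483}"

  thv vmcp serve \
    --group "$group" \
    --host 127.0.0.1 \
    --port "$port" \
    --optimizer-embedding \
    --embedding-model BAAI/bge-small-en-v1.5
}
```

Run `serve_with_embeddings my-group 9000` to listen on port 9000 instead of the default `4483`.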

## Compared to the Kubernetes deployment

| Aspect | Local CLI (`thv vmcp`) | Kubernetes (`VirtualMCPServer` CRD) |
| ----------------- | ---------------------------------------------- | ----------------------------------------- |
| Runtime | Foreground process | Pod managed by the operator |
| Configuration | CLI flags or local YAML file | `VirtualMCPServer` custom resource |
| Backend discovery | Reads ToolHive groups on the local machine | Reads `MCPGroup` resources in the cluster |
| Authentication | Anonymous in quick mode; configurable in config-file mode | Full OIDC integration via CRD fields |
| Lifecycle | Tied to the terminal session | Managed declaratively, survives restarts |
| Embedding server | Managed TEI container (Tier 2) | `EmbeddingServer` custom resource |

The underlying aggregation, tool routing, and optimizer logic are the same. Use
the local CLI for development and single-user workflows; use the Kubernetes
deployment for shared, production, or multi-user environments.

## Next steps

- [Configure vMCP](./configuration.mdx) to customize backends, authentication,
and aggregation rules.
- [Optimize tool discovery](./optimizer.mdx) to tune `find_tool` and `call_tool`
for large toolsets.
- [Deploy vMCP on Kubernetes](./quickstart.mdx) when you're ready to move to a
production-grade deployment.

## Related information

- [Understanding Virtual MCP Server](../concepts/vmcp.mdx)
- [Manage ToolHive groups](../guides-cli/group-management.mdx)
- [`thv vmcp` CLI reference](../reference/cli/thv_vmcp.md)
1 change: 1 addition & 0 deletions docs/toolhive/reference/cli/thv.md
@@ -57,4 +57,5 @@ thv [flags]
* [thv status](thv_status.md) - Show detailed status of an MCP server
* [thv stop](thv_stop.md) - Stop one or more MCP servers
* [thv version](thv_version.md) - Show the version of ToolHive
* [thv vmcp](thv_vmcp.md) - Run and manage a Virtual MCP Server locally
