Commit d35a9ad

Add tool filtering example and CLI optimizer section (#795)
Add a "Filter tools per workload" subsection to local-cli.mdx showing how aggregation.tools + filter can reduce the exposed tool surface from all backend tools to a curated subset. Includes an override example and a link to the full tool-aggregation reference.

Add a "Local mode (CLI)" section to optimizer.mdx so users landing on the optimizer guide see the thv vmcp serve --optimizer / --optimizer-embedding flags without needing to find local-cli.mdx. Rename "Quick start" to "Quick start (Kubernetes)" to distinguish the two paths, and scope the "EmbeddingServer is always required" callout to Kubernetes only, because Tier 1 CLI mode needs no embedding server.

Closes: #794
1 parent 3b37699 commit d35a9ad

3 files changed

Lines changed: 222 additions & 60 deletions

docs/toolhive/guides-vmcp/local-cli.mdx

Lines changed: 48 additions & 1 deletion
@@ -150,9 +150,56 @@ Customize the generated config. Common edits include:
150150
151151
- Changing `incomingAuth` from `anonymous` to `oidc` to require authenticated
152152
clients.
153-
- Adding tool filters, renames, or overrides under each backend.
153+
- Adding tool filters, renames, or overrides under `aggregation.tools`.
154154
- Configuring the [optimizer](./optimizer.mdx) under an `optimizer` section.
155155

156+
#### Filter tools per workload
157+
158+
Use `aggregation.tools` to expose only a curated subset of tools from each
159+
backend. Tools not listed in `filter` are hidden from `tools/list` responses.
160+
161+
```yaml title="vmcp.yaml"
162+
aggregation:
163+
  conflictResolution: prefix
164+
  conflictResolutionConfig:
165+
    prefixFormat: '{workload}_'
166+
  tools:
167+
    - workload: fetch
168+
      filter:
169+
        - fetch
170+
    - workload: filesystem
171+
      filter:
172+
        - read_file
173+
        - write_file
174+
        - list_directory
175+
```
176+
177+
With this config, a client calling `tools/list` sees three tools
178+
(`filesystem_read_file`, `filesystem_write_file`, `filesystem_list_directory`)
179+
plus the single `fetch_fetch` tool — instead of all tools exposed by both
180+
backends.
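The effect of `filter` combined with the `prefix` strategy can be modeled in a few lines of Python. This is only an illustrative sketch, not vMCP's implementation: the backend inventories (including `move_file` and `search_files`) are hypothetical, and the rule that a workload without a `filter` entry exposes everything is an assumption.

```python
# Hypothetical backend inventories; real tool lists depend on the MCP servers.
BACKEND_TOOLS = {
    "fetch": ["fetch"],
    "filesystem": [
        "read_file", "write_file", "list_directory",
        "move_file", "search_files",  # illustrative extras that get filtered out
    ],
}

# Mirrors the aggregation.tools entries in the YAML above.
FILTERS = {
    "fetch": ["fetch"],
    "filesystem": ["read_file", "write_file", "list_directory"],
}

PREFIX_FORMAT = "{workload}_"


def exposed_tools(backend_tools, filters, prefix_format):
    """Return the prefixed names a client would see in tools/list."""
    names = []
    for workload, tools in backend_tools.items():
        allowed = filters.get(workload)  # None means no filter (assumed: expose all)
        for tool in tools:
            if allowed is not None and tool not in allowed:
                continue  # not in the allow-list, so hidden from tools/list
            names.append(prefix_format.format(workload=workload) + tool)
    return names


EXPOSED = exposed_tools(BACKEND_TOOLS, FILTERS, PREFIX_FORMAT)
print(EXPOSED)
```

The result contains `fetch_fetch` and the three prefixed `filesystem_` tools, matching the list described above.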
181+
182+
You can also rename tools or override descriptions without modifying the
183+
backends:
184+
185+
```yaml title="vmcp.yaml"
186+
aggregation:
187+
  tools:
188+
    - workload: fetch
189+
      overrides:
190+
        fetch:
191+
          description: 'Retrieve any URL and return its content as text'
192+
```
193+
194+
To hide all backend tools globally (or per workload) and expose only
195+
[composite tools](./composite-tools.mdx) to clients, use
196+
`aggregation.excludeAllTools` or `aggregation.tools[].excludeAll`. Hidden tools
197+
are removed from `tools/list` but remain routable internally. See
198+
[Excluding all tools](./tool-aggregation.mdx#excluding-all-tools) for examples.
199+
200+
For the full filter and override reference, see
201+
[Tool aggregation](./tool-aggregation.mdx).
202+
156203
See [Configure vMCP](./configuration.mdx) for the full schema.
157204

158205
### Step 3: Validate the config

docs/toolhive/guides-vmcp/optimizer.mdx

Lines changed: 127 additions & 57 deletions
@@ -10,59 +10,11 @@ number of tools exposed to clients can grow quickly. The optimizer addresses
1010
this by filtering tools per request, reducing token usage and improving tool
1111
selection accuracy.
1212

13-
For a step-by-step tutorial that walks through the full setup, see the
14-
[MCP Optimizer tutorial](../tutorials/mcp-optimizer.mdx). This guide covers the
15-
configuration details for the VirtualMCPServer and EmbeddingServer CRDs.
13+
This guide covers configuration for Kubernetes deployments and local CLI use.
14+
For a step-by-step Kubernetes tutorial, see the
15+
[MCP Optimizer tutorial](../tutorials/mcp-optimizer.mdx).
1616

17-
## Benefits
18-
19-
- **Reduced token usage**: Only relevant tools are included in context, not the
20-
entire toolset
21-
- **Improved tool selection**: The right tools surface for each query. With
22-
fewer tools to reason over, agents are more likely to choose correctly
23-
24-
## How it works
25-
26-
1. You send a prompt that requires tool assistance
27-
2. The AI calls `find_tool` with keywords extracted from the prompt
28-
3. vMCP performs hybrid semantic and keyword search across all backend tools
29-
4. Only the most relevant tools (up to 8 by default) are returned
30-
5. The AI calls `call_tool` to execute the selected tool, and vMCP routes the
31-
request to the appropriate backend
32-
33-
```mermaid
34-
flowchart TB
35-
subgraph vmcpGroup["VirtualMCPServer"]
36-
direction TB
37-
vmcp["vMCP (optimizer enabled)"]
38-
end
39-
subgraph embedding["EmbeddingServer"]
40-
direction TB
41-
tei["Text Embeddings Inference"]
42-
end
43-
subgraph backends["MCPGroup backends"]
44-
direction TB
45-
mcp1["MCP server"]
46-
mcp2["MCP server"]
47-
mcp3["MCP server"]
48-
end
49-
50-
client(["Client"]) <-- "find_tool / call_tool" --> vmcpGroup
51-
vmcp <-. "semantic search" .-> embedding
52-
vmcp <-. "discovers / routes" .-> backends
53-
```
54-
55-
:::info[How search works internally]
56-
57-
The optimizer uses an internal SQLite database for both keyword search (using
58-
full-text search) and storing semantic vectors. Keyword search runs locally
59-
against this database; semantic search uses vectors generated by an embedding
60-
server. To control how results from these two sources are blended, see the
61-
[parameter reference](#parameter-reference).
62-
63-
:::
64-
65-
## Quick start
17+
## Quick start (Kubernetes)
6618

6719
### Step 1: Create an EmbeddingServer
6820

@@ -163,6 +115,120 @@ spec:
163115

164116
:::
165117

118+
## Local mode (CLI)
119+
120+
You can enable the optimizer directly from the `thv vmcp` CLI without a
121+
Kubernetes cluster.
122+
123+
### Tier 1 — keyword search
124+
125+
Tier 1 uses FTS5 full-text search running in-process. No external service or
126+
container is required:
127+
128+
```bash
129+
thv vmcp serve --group my-group --optimizer
130+
```
131+
132+
Or add it to an existing config file:
133+
134+
```yaml title="vmcp.yaml"
135+
optimizer: {}
136+
```
137+
138+
Then start the server with:
139+
140+
```bash
141+
thv vmcp serve --config vmcp.yaml
142+
```
143+
144+
### Tier 2 — managed TEI container
145+
146+
Tier 2 adds vector similarity search on top of keyword search. ToolHive
147+
automatically starts and stops a
148+
[HuggingFace Text Embeddings Inference](https://github.com/huggingface/text-embeddings-inference)
149+
(TEI) container. A container runtime (Docker, Podman, or OrbStack) must be
150+
available:
151+
152+
```bash
153+
thv vmcp serve --group my-group --optimizer-embedding
154+
```
155+
156+
To customize the model or image used for the auto-managed container:
157+
158+
```bash
159+
thv vmcp serve --group my-group --optimizer-embedding \
160+
--embedding-model BAAI/bge-small-en-v1.5 \
161+
--embedding-image ghcr.io/huggingface/text-embeddings-inference:cpu-latest
162+
```
163+
164+
### Tier 3 — external embedding service
165+
166+
Tier 3 uses an embedding server you already manage. No container runtime is
167+
required. Set `embeddingService` in your existing config file to point at the
168+
server:
169+
170+
```yaml title="vmcp.yaml"
171+
optimizer:
172+
  embeddingService: http://127.0.0.1:8090
173+
```
174+
175+
Then start the server with:
176+
177+
```bash
178+
thv vmcp serve --config vmcp.yaml
179+
```
180+
181+
For the full optimizer tier comparison, see the
182+
[local CLI guide](./local-cli.mdx#optimizer-tiers).
183+
184+
## Benefits
185+
186+
- **Reduced token usage**: Only relevant tools are included in context, not the
187+
entire toolset
188+
- **Improved tool selection**: The right tools surface for each query. With
189+
fewer tools to reason over, agents are more likely to choose correctly
190+
191+
## How it works
192+
193+
1. You send a prompt that requires tool assistance
194+
2. The AI calls `find_tool` with keywords extracted from the prompt
195+
3. vMCP performs hybrid semantic and keyword search across all backend tools
196+
4. Only the most relevant tools (up to 8 by default) are returned
197+
5. The AI calls `call_tool` to execute the selected tool, and vMCP routes the
198+
request to the appropriate backend
199+
200+
```mermaid
201+
flowchart TB
202+
subgraph vmcpGroup["vMCP"]
203+
direction TB
204+
vmcp["vMCP (optimizer enabled)"]
205+
end
206+
subgraph embedding["Embedding service (Tiers 2 and 3)"]
207+
direction TB
208+
tei["Text Embeddings Inference"]
209+
end
210+
subgraph backends["MCP backends"]
211+
direction TB
212+
mcp1["MCP server"]
213+
mcp2["MCP server"]
214+
mcp3["MCP server"]
215+
end
216+
217+
client(["Client"]) <-- "find_tool / call_tool" --> vmcpGroup
218+
vmcp <-. "semantic search" .-> embedding
219+
vmcp <-. "discovers / routes" .-> backends
220+
```
221+
222+
:::info[How search works internally]
223+
224+
The optimizer uses an internal SQLite database for both keyword search (using
225+
full-text search) and storing semantic vectors. Keyword search runs locally
226+
against this database; semantic search uses vectors generated by an embedding
227+
server. To control how results from these two sources are blended, see the
228+
[parameter reference](#parameter-reference).
229+
230+
:::
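To make the idea of blending keyword and semantic results concrete, here is a minimal sketch that assumes a simple linear weighting controlled by a semantic ratio. The docs above do not state the formula the optimizer actually uses, so the `blend` function, the scores, and the weighting are all hypothetical. Only the default limit of 8 results and the meaning of a ratio of `0.0` (all keyword search) come from this guide.

```python
def blend(keyword_scores, semantic_scores, semantic_ratio, limit=8):
    """Rank tools by a weighted mix of two score sets (assumed linear blend)."""
    names = set(keyword_scores) | set(semantic_scores)
    combined = {
        name: (1.0 - semantic_ratio) * keyword_scores.get(name, 0.0)
        + semantic_ratio * semantic_scores.get(name, 0.0)
        for name in names
    }
    # Highest combined score first; ties are broken by name for stable output.
    return sorted(combined, key=lambda n: (-combined[n], n))[:limit]


# Made-up scores for three tools, for illustration only.
KEYWORD = {"read_file": 0.9, "write_file": 0.4, "fetch": 0.1}
SEMANTIC = {"read_file": 0.2, "write_file": 0.3, "fetch": 0.95}

KEYWORD_ONLY = blend(KEYWORD, SEMANTIC, 0.0)   # ratio 0.0: keyword scores decide
SEMANTIC_ONLY = blend(KEYWORD, SEMANTIC, 1.0)  # ratio 1.0: semantic scores decide
print(KEYWORD_ONLY)
print(SEMANTIC_ONLY)
```

Under this assumed model, a ratio of `0.0` reproduces the pure keyword ranking, which is consistent with the guide's description of `hybridSearchSemanticRatio` set to `"0.0"`.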
231+
166232
## Tune the optimizer
167233

168234
To customize optimizer behavior, add the `optimizer` block under `spec.config`
@@ -190,12 +256,16 @@ spec:
190256
exclude={['embeddingService']}
191257
/>
192258

193-
:::info[EmbeddingServer is always required]
259+
:::info[Kubernetes: EmbeddingServer is always required]
260+
261+
When using the Kubernetes operator, even if you set `hybridSearchSemanticRatio`
262+
to `"0.0"` (all keyword search), the optimizer still requires a configured
263+
`EmbeddingServer`. The EmbeddingServer won't be used at runtime when the
264+
semantic ratio is `0.0`, but the configuration must be present due to how the
265+
operator wires the resources internally.
194266

195-
Even if you set `hybridSearchSemanticRatio` to `"0.0"` (all keyword search), the
196-
optimizer still requires a configured EmbeddingServer. The EmbeddingServer won't
197-
be used at runtime when the semantic ratio is `0.0`, but the configuration must
198-
be present due to how the optimizer is wired internally.
267+
This restriction does not apply to local CLI mode. `thv vmcp serve --optimizer`
268+
runs keyword-only search with no EmbeddingServer and no container.
199269

200270
:::
201271

docs/toolhive/guides-vmcp/tool-aggregation.mdx

Lines changed: 47 additions & 2 deletions
@@ -125,8 +125,53 @@ spec:
125125
filter: ['create_issue', 'list_issues', 'get_issue']
126126
```
127127

128-
Only the listed tools are included; all others from that backend MCP server are
129-
excluded.
128+
Only the listed tools are advertised to clients; all others are hidden from
129+
`tools/list` responses. Hidden tools remain available in the internal routing
130+
table so composite tool workflows can still call them.
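The distinction between what is advertised and what is routable can be sketched as two separate views built from the same backend inventory. This is a conceptual model only, not vMCP's internal data structure. The `github` workload name and the `delete_repo` tool are hypothetical and chosen for illustration.

```python
# Hypothetical inventory for one backend; "delete_repo" is an illustrative name.
BACKEND_TOOLS = {
    "github": ["create_issue", "list_issues", "get_issue", "delete_repo"],
}
FILTERS = {"github": ["create_issue", "list_issues", "get_issue"]}


def build_views(backend_tools, filters):
    """Return (advertised tool names, routing table of tool -> workload)."""
    advertised = []
    routing = {}
    for workload, tools in backend_tools.items():
        allowed = filters.get(workload)
        for tool in tools:
            routing[tool] = workload  # every tool stays routable internally
            if allowed is None or tool in allowed:
                advertised.append(tool)  # only filtered tools reach tools/list
    return advertised, routing


ADVERTISED, ROUTING = build_views(BACKEND_TOOLS, FILTERS)
print(ADVERTISED)
print(ROUTING["delete_repo"])
```

In this model a client never sees `delete_repo` in the advertised list, yet a composite tool could still resolve it through the routing table.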
131+
132+
## Excluding all tools
133+
134+
To hide every tool from `tools/list` — globally or per workload — use
135+
`excludeAllTools` or `excludeAll`. These are the opt-out complement to `filter`
136+
(which is an allow-list): use them when you want clients to interact only
137+
through [composite tool](./composite-tools.mdx) workflows rather than raw
138+
backend tools.
139+
140+
Hidden tools are removed from `tools/list` responses but remain in the internal
141+
routing table, so composite tools can still call them.
142+
143+
### Hide all backend tools globally
144+
145+
Set `aggregation.excludeAllTools: true` to hide every tool from every backend:
146+
147+
```yaml title="VirtualMCPServer resource"
148+
spec:
149+
  config:
150+
    aggregation:
151+
      excludeAllTools: true # hide every backend tool from tools/list
152+
```
153+
154+
### Hide all tools for a specific workload
155+
156+
Set `excludeAll: true` inside a workload entry to hide all tools from one
157+
backend while leaving other backends unaffected:
158+
159+
```yaml title="VirtualMCPServer resource"
160+
spec:
161+
  config:
162+
    aggregation:
163+
      tools:
164+
        - workload: github
165+
          excludeAll: true # hide all github tools from tools/list
166+
        - workload: jira
167+
          filter: ['create_issue', 'search_issues']
168+
```
169+
170+
**When to use:** When composite tools are the only surface you intend to expose
171+
to clients. Set `excludeAllTools: true` (or `excludeAll: true` per workload) to
172+
prevent clients from calling raw backend tools directly, then define
173+
[composite tools](./composite-tools.mdx) that orchestrate the hidden tools
174+
internally.
130175

131176
## Tool overrides
132177
