@@ -10,59 +10,11 @@ number of tools exposed to clients can grow quickly. The optimizer addresses
10 10 this by filtering tools per request, reducing token usage and improving tool
11 11 selection accuracy.
12 12
13- For a step-by-step tutorial that walks through the full setup, see the
14- [MCP Optimizer tutorial](../tutorials/mcp-optimizer.mdx). This guide covers the
15- configuration details for the VirtualMCPServer and EmbeddingServer CRDs.
13+ This guide covers configuration for Kubernetes deployments and local CLI use.
14+ For a step-by-step Kubernetes tutorial, see the
15+ [MCP Optimizer tutorial](../tutorials/mcp-optimizer.mdx).
16 16
17- ## Benefits
18-
19- - **Reduced token usage**: Only relevant tools are included in context, not the
20- entire toolset
21- - **Improved tool selection**: The right tools surface for each query. With
22- fewer tools to reason over, agents are more likely to choose correctly
23-
24- ## How it works
25-
26- 1. You send a prompt that requires tool assistance
27- 2. The AI calls `find_tool` with keywords extracted from the prompt
28- 3. vMCP performs hybrid semantic and keyword search across all backend tools
29- 4. Only the most relevant tools (up to 8 by default) are returned
30- 5. The AI calls `call_tool` to execute the selected tool, and vMCP routes the
31- request to the appropriate backend
32-
33- ```mermaid
34- flowchart TB
35- subgraph vmcpGroup["VirtualMCPServer"]
36- direction TB
37- vmcp["vMCP (optimizer enabled)"]
38- end
39- subgraph embedding["EmbeddingServer"]
40- direction TB
41- tei["Text Embeddings Inference"]
42- end
43- subgraph backends["MCPGroup backends"]
44- direction TB
45- mcp1["MCP server"]
46- mcp2["MCP server"]
47- mcp3["MCP server"]
48- end
49-
50- client(["Client"]) <-- "find_tool / call_tool" --> vmcpGroup
51- vmcp <-. "semantic search" .-> embedding
52- vmcp <-. "discovers / routes" .-> backends
53- ```
54-
55- :::info[How search works internally]
56-
57- The optimizer uses an internal SQLite database for both keyword search (using
58- full-text search) and storing semantic vectors. Keyword search runs locally
59- against this database; semantic search uses vectors generated by an embedding
60- server. To control how results from these two sources are blended, see the
61- [parameter reference](#parameter-reference).
62-
63- :::
64-
65- ## Quick start
17+ ## Quick start (Kubernetes)
66 18
67 19 ### Step 1: Create an EmbeddingServer
68 20
@@ -163,6 +115,120 @@ spec:
163 115
164 116 :::
165 117
118+ ## Local mode (CLI)
119+
120+ You can enable the optimizer directly from the `thv vmcp` CLI without a
121+ Kubernetes cluster.
122+
123+ ### Tier 1 — keyword search
124+
125+ Tier 1 uses FTS5 full-text search running in-process. No external service or
126+ container is required:
127+
128+ ```bash
129+ thv vmcp serve --group my-group --optimizer
130+ ```
131+
132+ Or add it to an existing config file:
133+
134+ ```yaml title="vmcp.yaml"
135+ optimizer: {}
136+ ```
137+
138+ Then start the server with:
139+
140+ ```bash
141+ thv vmcp serve --config vmcp.yaml
142+ ```
143+
144+ ### Tier 2 — managed TEI container
145+
146+ Tier 2 adds vector similarity search on top of keyword search. ToolHive
147+ automatically starts and stops a
148+ [HuggingFace Text Embeddings Inference](https://github.com/huggingface/text-embeddings-inference)
149+ (TEI) container. A container runtime (Docker, Podman, or OrbStack) must be
150+ available:
151+
152+ ```bash
153+ thv vmcp serve --group my-group --optimizer-embedding
154+ ```
155+
156+ To customize the model or image used for the auto-managed container:
157+
158+ ```bash
159+ thv vmcp serve --group my-group --optimizer-embedding \
160+ --embedding-model BAAI/bge-small-en-v1.5 \
161+ --embedding-image ghcr.io/huggingface/text-embeddings-inference:cpu-latest
162+ ```
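
Because Tier 2 depends on a container runtime, it can help to confirm that one is on your `PATH` before starting the server. The following is a minimal sketch and not part of ToolHive: the `find_container_runtime` helper is hypothetical, and the sketch assumes that OrbStack exposes the standard `docker` CLI.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not a ToolHive command): prints the name of
# the first container CLI found on PATH and returns non-zero if none is found.
# Assumption: OrbStack users have the `docker` CLI available.
find_container_runtime() {
  for candidate in docker podman; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

if runtime="$(find_container_runtime)"; then
  echo "Found container runtime CLI: $runtime"
else
  echo "No container runtime CLI found; Tier 2 needs Docker, Podman, or OrbStack" >&2
fi
```

If the check fails, Tier 1 (keyword search only) still works, because it needs no container.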
163+
164+ ### Tier 3 — external embedding service
165+
166+ Tier 3 uses an embedding server you already manage. No container runtime is
167+ required. Set `embeddingService` in your existing config file to point at the
168+ server:
169+
170+ ```yaml title="vmcp.yaml"
171+ optimizer:
172+ embeddingService: http://127.0.0.1:8090
173+ ```
174+
175+ Then start the server with:
176+
177+ ```bash
178+ thv vmcp serve --config vmcp.yaml
179+ ```
180+
181+ For the full optimizer tier comparison, see the
182+ [local CLI guide](./local-cli.mdx#optimizer-tiers).
183+
184+ ## Benefits
185+
186+ - **Reduced token usage**: Only relevant tools are included in context, not the
187+ entire toolset
188+ - **Improved tool selection**: The right tools surface for each query. With
189+ fewer tools to reason over, agents are more likely to choose correctly
190+
191+ ## How it works
192+
193+ 1. You send a prompt that requires tool assistance
194+ 2. The AI calls `find_tool` with keywords extracted from the prompt
195+ 3. vMCP performs hybrid semantic and keyword search across all backend tools
196+ 4. Only the most relevant tools (up to 8 by default) are returned
197+ 5. The AI calls `call_tool` to execute the selected tool, and vMCP routes the
198+ request to the appropriate backend
199+
200+ ```mermaid
201+ flowchart TB
202+ subgraph vmcpGroup["vMCP"]
203+ direction TB
204+ vmcp["vMCP (optimizer enabled)"]
205+ end
206+ subgraph embedding["Embedding service (Tiers 2 and 3)"]
207+ direction TB
208+ tei["Text Embeddings Inference"]
209+ end
210+ subgraph backends["MCP backends"]
211+ direction TB
212+ mcp1["MCP server"]
213+ mcp2["MCP server"]
214+ mcp3["MCP server"]
215+ end
216+
217+ client(["Client"]) <-- "find_tool / call_tool" --> vmcpGroup
218+ vmcp <-. "semantic search" .-> embedding
219+ vmcp <-. "discovers / routes" .-> backends
220+ ```
221+
222+ :::info[How search works internally]
223+
224+ The optimizer uses an internal SQLite database for both keyword search (using
225+ full-text search) and storing semantic vectors. Keyword search runs locally
226+ against this database; semantic search uses vectors generated by an embedding
227+ server. To control how results from these two sources are blended, see the
228+ [parameter reference](#parameter-reference).
229+
230+ :::
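
To build intuition for how a semantic ratio such as `hybridSearchSemanticRatio` could weight the two result sources, the sketch below blends a keyword score and a semantic score by linear interpolation. This is an assumption for illustration only: the text above does not state the optimizer's actual blending formula, and the `blend_score` helper and the sample scores are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: linear blend of a keyword score and a semantic score.
# The formula is an assumption, not taken from the optimizer's implementation.
# Usage: blend_score KEYWORD_SCORE SEMANTIC_SCORE SEMANTIC_RATIO
blend_score() {
  awk -v kw="$1" -v sem="$2" -v ratio="$3" \
    'BEGIN { printf "%.3f\n", ratio * sem + (1 - ratio) * kw }'
}

blend_score 0.80 0.20 0.0   # ratio 0.0: only the keyword score counts
blend_score 0.80 0.20 1.0   # ratio 1.0: only the semantic score counts
blend_score 0.80 0.20 0.5   # ratio 0.5: both sources weighted equally
```

Under this assumed formula, a ratio of `0.0` reduces to pure keyword ranking, which matches the description of `"0.0"` as all keyword search.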
231+
166 232 ## Tune the optimizer
167 233
168 234 To customize optimizer behavior, add the `optimizer` block under `spec.config`
@@ -190,12 +256,16 @@ spec:
190 256 exclude={['embeddingService']}
191 257 />
192 258
193- :::info[EmbeddingServer is always required]
259+ :::info[Kubernetes: EmbeddingServer is always required]
260+
261+ When using the Kubernetes operator, even if you set `hybridSearchSemanticRatio`
262+ to `"0.0"` (all keyword search), the optimizer still requires a configured
263+ `EmbeddingServer`. The EmbeddingServer won't be used at runtime when the
264+ semantic ratio is `0.0`, but the configuration must be present due to how the
265+ operator wires the resources internally.
194 266
195- Even if you set `hybridSearchSemanticRatio` to `"0.0"` (all keyword search), the
196- optimizer still requires a configured EmbeddingServer. The EmbeddingServer won't
197- be used at runtime when the semantic ratio is `0.0`, but the configuration must
198- be present due to how the optimizer is wired internally.
267+ This restriction does not apply to local CLI mode. `thv vmcp serve --optimizer`
268+ runs keyword-only search with no EmbeddingServer and no container.
199 269
200 270 :::
201 271