Add tool filtering example and CLI optimizer section (#795)

yrobla · web-flow · commit d35a9adfb53c · 2026-04-23T10:28:47.000+02:00
Add a "Filter tools per workload" subsection to local-cli.mdx showing how aggregation.tools + filter can reduce the exposed tool surface from all backend tools to a curated subset. Includes an override example and a link to the full tool-aggregation reference.

Add a "Local mode (CLI)" section to optimizer.mdx so users landing on the optimizer guide see the thv vmcp serve --optimizer / --optimizer-embedding flags without needing to find local-cli.mdx.

Rename "Quick start" to "Quick start (Kubernetes)" to distinguish the two paths, and scope the "EmbeddingServer is always required" callout to Kubernetes only, since Tier 1 CLI mode needs no embedding server.

Closes: #794
diff --git a/docs/toolhive/guides-vmcp/local-cli.mdx b/docs/toolhive/guides-vmcp/local-cli.mdx
@@ -150,9 +150,56 @@ Customize the generated config. Common edits include:
 
 - Changing `incomingAuth` from `anonymous` to `oidc` to require authenticated
   clients.
-- Adding tool filters, renames, or overrides under each backend.
+- Adding tool filters, renames, or overrides under `aggregation.tools`.
 - Configuring the [optimizer](./optimizer.mdx) under an `optimizer` section.
 
+#### Filter tools per workload
+
+Use `aggregation.tools` to expose only a curated subset of tools from each
+backend. Tools not listed in `filter` are hidden from `tools/list` responses.
+
+```yaml title="vmcp.yaml"
+aggregation:
+  conflictResolution: prefix
+  conflictResolutionConfig:
+    prefixFormat: '{workload}_'
+  tools:
+    - workload: fetch
+      filter:
+        - fetch
+    - workload: filesystem
+      filter:
+        - read_file
+        - write_file
+        - list_directory
+```
+
+With this config, a client calling `tools/list` sees only four tools
+(`fetch_fetch`, `filesystem_read_file`, `filesystem_write_file`, and
+`filesystem_list_directory`) instead of every tool exposed by both
+backends.
+
+You can also rename tools or override descriptions without modifying the
+backends:
+
+```yaml title="vmcp.yaml"
+aggregation:
+  tools:
+    - workload: fetch
+      overrides:
+        fetch:
+          description: 'Retrieve any URL and return its content as text'
+```
+
+To hide all backend tools globally (or per workload) and expose only
+[composite tools](./composite-tools.mdx) to clients, use
+`aggregation.excludeAllTools` or `aggregation.tools[].excludeAll`. Hidden tools
+are removed from `tools/list` but remain routable internally. See
+[Excluding all tools](./tool-aggregation.mdx#excluding-all-tools) for examples.
+
+For the full filter and override reference, see
+[Tool aggregation](./tool-aggregation.mdx).
+
 See [Configure vMCP](./configuration.mdx) for the full schema.
 
 ### Step 3: Validate the config
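The filtering rules added to `local-cli.mdx` above can be modeled in a few lines. This is a minimal sketch, not ToolHive's implementation: the function name `aggregate`, the dictionary shapes, and the backend inventory (including `delete_file`) are hypothetical. It assumes, following the added text, that `filter` is an allow-list, `excludeAll` hides every tool of a workload, the `{workload}_` prefix is applied to each tool name, and hidden tools stay routable. Keying the routing table by prefixed name is an additional assumption.

```python
def aggregate(backends, tool_rules, prefix_format="{workload}_"):
    """Return (advertised, routes) for a set of backends.

    backends:   {workload: [tool names]} (hypothetical backend inventory)
    tool_rules: list of dicts mirroring aggregation.tools entries
    """
    rules = {rule["workload"]: rule for rule in tool_rules}
    advertised, routes = [], {}
    for workload, tools in backends.items():
        rule = rules.get(workload, {})
        allowed = rule.get("filter")  # None means no allow-list is set
        hide_all = rule.get("excludeAll", False)
        for tool in tools:
            exposed = prefix_format.format(workload=workload) + tool
            # Every tool stays routable (assumption: keyed by prefixed name).
            routes[exposed] = (workload, tool)
            if not hide_all and (allowed is None or tool in allowed):
                advertised.append(exposed)
    return advertised, routes


backends = {
    "fetch": ["fetch"],
    "filesystem": ["read_file", "write_file", "list_directory", "delete_file"],
}
tool_rules = [
    {"workload": "fetch", "filter": ["fetch"]},
    {"workload": "filesystem",
     "filter": ["read_file", "write_file", "list_directory"]},
]

advertised, routes = aggregate(backends, tool_rules)
print(advertised)
```

In this model `filesystem_delete_file` is absent from `advertised` but still present in `routes`, which mirrors the statement that hidden tools are removed from `tools/list` yet remain routable internally.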
diff --git a/docs/toolhive/guides-vmcp/optimizer.mdx b/docs/toolhive/guides-vmcp/optimizer.mdx
@@ -10,59 +10,11 @@ number of tools exposed to clients can grow quickly. The optimizer addresses
 this by filtering tools per request, reducing token usage and improving tool
 selection accuracy.
 
-For a step-by-step tutorial that walks through the full setup, see the
-[MCP Optimizer tutorial](../tutorials/mcp-optimizer.mdx). This guide covers the
-configuration details for the VirtualMCPServer and EmbeddingServer CRDs.
+This guide covers configuration for Kubernetes deployments and local CLI use.
+For a step-by-step Kubernetes tutorial, see the
+[MCP Optimizer tutorial](../tutorials/mcp-optimizer.mdx).
 
-## Benefits
-
-- **Reduced token usage**: Only relevant tools are included in context, not the
-  entire toolset
-- **Improved tool selection**: The right tools surface for each query. With
-  fewer tools to reason over, agents are more likely to choose correctly
-
-## How it works
-
-1. You send a prompt that requires tool assistance
-2. The AI calls `find_tool` with keywords extracted from the prompt
-3. vMCP performs hybrid semantic and keyword search across all backend tools
-4. Only the most relevant tools (up to 8 by default) are returned
-5. The AI calls `call_tool` to execute the selected tool, and vMCP routes the
-   request to the appropriate backend
-
-```mermaid
-flowchart TB
-  subgraph vmcpGroup["VirtualMCPServer"]
-    direction TB
-    vmcp["vMCP (optimizer enabled)"]
-  end
-  subgraph embedding["EmbeddingServer"]
-    direction TB
-    tei["Text Embeddings Inference"]
-  end
-  subgraph backends["MCPGroup backends"]
-    direction TB
-    mcp1["MCP server"]
-    mcp2["MCP server"]
-    mcp3["MCP server"]
-  end
-
-  client(["Client"]) <-- "find_tool / call_tool" --> vmcpGroup
-  vmcp <-. "semantic search" .-> embedding
-  vmcp <-. "discovers / routes" .-> backends
-```
-
-:::info[How search works internally]
-
-The optimizer uses an internal SQLite database for both keyword search (using
-full-text search) and storing semantic vectors. Keyword search runs locally
-against this database; semantic search uses vectors generated by an embedding
-server. To control how results from these two sources are blended, see the
-[parameter reference](#parameter-reference).
-
-:::
-
-## Quick start
+## Quick start (Kubernetes)
 
 ### Step 1: Create an EmbeddingServer
 
@@ -163,6 +115,120 @@ spec:
 
 :::
 
+## Local mode (CLI)
+
+You can enable the optimizer directly from the `thv vmcp` CLI without a
+Kubernetes cluster.
+
+### Tier 1 — keyword search
+
+Tier 1 uses FTS5 full-text search running in-process. No external service or
+container is required:
+
+```bash
+thv vmcp serve --group my-group --optimizer
+```
+
+Or add it to an existing config file:
+
+```yaml title="vmcp.yaml"
+optimizer: {}
+```
+
+Then start the server with:
+
+```bash
+thv vmcp serve --config vmcp.yaml
+```
+
+### Tier 2 — managed TEI container
+
+Tier 2 adds vector similarity search on top of keyword search. ToolHive
+automatically starts and stops a
+[HuggingFace Text Embeddings Inference](https://github.com/huggingface/text-embeddings-inference)
+(TEI) container. A container runtime (Docker, Podman, or OrbStack) must be
+available:
+
+```bash
+thv vmcp serve --group my-group --optimizer-embedding
+```
+
+To customize the model or image used for the auto-managed container:
+
+```bash
+thv vmcp serve --group my-group --optimizer-embedding \
+  --embedding-model BAAI/bge-small-en-v1.5 \
+  --embedding-image ghcr.io/huggingface/text-embeddings-inference:cpu-latest
+```
+
+### Tier 3 — external embedding service
+
+Tier 3 uses an embedding server you already manage. No container runtime is
+required. Set `embeddingService` in your existing config file to point at the
+server:
+
+```yaml title="vmcp.yaml"
+optimizer:
+  embeddingService: http://127.0.0.1:8090
+```
+
+Then start the server with:
+
+```bash
+thv vmcp serve --config vmcp.yaml
+```
+
+For the full optimizer tier comparison, see the
+[local CLI guide](./local-cli.mdx#optimizer-tiers).
+
+## Benefits
+
+- **Reduced token usage**: Only relevant tools are included in context, not the
+  entire toolset
+- **Improved tool selection**: The right tools surface for each query. With
+  fewer tools to reason over, agents are more likely to choose correctly
+
+## How it works
+
+1. You send a prompt that requires tool assistance
+2. The AI calls `find_tool` with keywords extracted from the prompt
+3. vMCP performs hybrid semantic and keyword search across all backend tools
+4. Only the most relevant tools (up to 8 by default) are returned
+5. The AI calls `call_tool` to execute the selected tool, and vMCP routes the
+   request to the appropriate backend
+
+```mermaid
+flowchart TB
+  subgraph vmcpGroup["vMCP"]
+    direction TB
+    vmcp["vMCP (optimizer enabled)"]
+  end
+  subgraph embedding["Embedding service (Tiers 2 and 3)"]
+    direction TB
+    tei["Text Embeddings Inference"]
+  end
+  subgraph backends["MCP backends"]
+    direction TB
+    mcp1["MCP server"]
+    mcp2["MCP server"]
+    mcp3["MCP server"]
+  end
+
+  client(["Client"]) <-- "find_tool / call_tool" --> vmcpGroup
+  vmcp <-. "semantic search" .-> embedding
+  vmcp <-. "discovers / routes" .-> backends
+```
+
+:::info[How search works internally]
+
+The optimizer uses an internal SQLite database for both keyword search (using
+full-text search) and storing semantic vectors. Keyword search runs locally
+against this database; semantic search uses vectors generated by an embedding
+server. To control how results from these two sources are blended, see the
+[parameter reference](#parameter-reference).
+
+:::
+
 ## Tune the optimizer
 
 To customize optimizer behavior, add the `optimizer` block under `spec.config`
@@ -190,12 +256,16 @@ spec:
   exclude={['embeddingService']}
 />
 
-:::info[EmbeddingServer is always required]
+:::info[Kubernetes: EmbeddingServer is always required]
+
+When using the Kubernetes operator, even if you set `hybridSearchSemanticRatio`
+to `"0.0"` (all keyword search), the optimizer still requires a configured
+`EmbeddingServer`. The EmbeddingServer won't be used at runtime when the
+semantic ratio is `0.0`, but the configuration must be present due to how the
+operator wires the resources internally.
 
-Even if you set `hybridSearchSemanticRatio` to `"0.0"` (all keyword search), the
-optimizer still requires a configured EmbeddingServer. The EmbeddingServer won't
-be used at runtime when the semantic ratio is `0.0`, but the configuration must
-be present due to how the optimizer is wired internally.
+This restriction does not apply to local CLI mode. `thv vmcp serve --optimizer`
+runs keyword-only search with no EmbeddingServer and no container.
 
 :::
 
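To make the keyword-only versus hybrid distinction in the `optimizer.mdx` changes above concrete, here is a rough sketch of a `find_tool`-style ranking. It is illustrative only: the linear blend, the word-overlap scorer (a stand-in for FTS5 ranking), and the names `keyword_score` and `semantic_scores` are assumptions, not the optimizer's actual algorithm. The only details taken from the docs are the default limit of 8 results and that a semantic ratio of `0.0` means all keyword search.

```python
def keyword_score(query, text):
    """Fraction of query words found in text (stand-in for FTS5 ranking)."""
    words = query.lower().split()
    haystack = set(text.lower().split())
    return sum(w in haystack for w in words) / len(words) if words else 0.0


def find_tool(query, tools, semantic_scores=None, semantic_ratio=0.0, limit=8):
    """Rank tools by a linear blend of keyword and semantic scores.

    tools:           {name: description}
    semantic_scores: {name: similarity in [0, 1]}, or None when no
                     embedding service is available (keyword-only mode)
    """
    if semantic_scores is None:
        semantic_ratio = 0.0  # nothing to blend without embeddings
        semantic_scores = {}
    ranked = []
    for name, description in tools.items():
        kw = keyword_score(query, f"{name} {description}")
        sem = semantic_scores.get(name, 0.0)
        score = (1 - semantic_ratio) * kw + semantic_ratio * sem
        if score > 0:
            ranked.append((score, name))
    # Highest score first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in ranked[:limit]]


tools = {
    "fetch_fetch": "Retrieve any URL and return its content as text",
    "filesystem_read_file": "Read a file from disk",
    "filesystem_write_file": "Write a file to disk",
}

keyword_only = find_tool("read file", tools)
semantic_only = find_tool(
    "download a web page", tools,
    semantic_scores={"fetch_fetch": 0.9}, semantic_ratio=1.0,
)
print(keyword_only, semantic_only)
```

With no `semantic_scores`, the function behaves like the keyword-only local mode. Passing scores and a non-zero ratio lets a query with no matching keywords still surface a semantically close tool.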
diff --git a/docs/toolhive/guides-vmcp/tool-aggregation.mdx b/docs/toolhive/guides-vmcp/tool-aggregation.mdx
@@ -125,8 +125,53 @@ spec:
           filter: ['create_issue', 'list_issues', 'get_issue']
 ```
 
-Only the listed tools are included; all others from that backend MCP server are
-excluded.
+Only the listed tools are advertised to clients; all others are hidden from
+`tools/list` responses. Hidden tools remain available in the internal routing
+table so composite tool workflows can still call them.
+
+## Excluding all tools
+
+To hide every tool from `tools/list`, globally or per workload, use
+`excludeAllTools` or `excludeAll`. Where `filter` is an allow-list, these
+options hide everything: use them when you want clients to interact only
+through [composite tool](./composite-tools.mdx) workflows rather than raw
+backend tools.
+
+Hidden tools are removed from `tools/list` responses but remain in the internal
+routing table, so composite tools can still call them.
+
+### Hide all backend tools globally
+
+Set `aggregation.excludeAllTools: true` to hide every tool from every backend:
+
+```yaml title="VirtualMCPServer resource"
+spec:
+  config:
+    aggregation:
+      excludeAllTools: true # hide every backend tool from tools/list
+```
+
+### Hide all tools for a specific workload
+
+Set `excludeAll: true` inside a workload entry to hide all tools from one
+backend while leaving other backends unaffected:
+
+```yaml title="VirtualMCPServer resource"
+spec:
+  config:
+    aggregation:
+      tools:
+        - workload: github
+          excludeAll: true # hide all github tools from tools/list
+        - workload: jira
+          filter: ['create_issue', 'search_issues']
+```
+
+**When to use:** When composite tools are the only surface you intend to expose
+to clients. Set `excludeAllTools: true` (or `excludeAll: true` per workload) to
+prevent clients from calling raw backend tools directly, then define
+[composite tools](./composite-tools.mdx) that orchestrate the hidden tools
+internally.
 
 ## Tool overrides