Commit d34bd76

JAORMX and claude committed
Address review feedback on THV-0059 vMCP local experience RFC
- Add goal to deprecate legacy mcp-optimizer Python project
- Add --embedding-image flag for TEI container image configurability
- Change TEI container naming to model-hash based for shared reuse
- Switch from fixed port to random port via pkg/networking.FindAvailable()
- Change from silent FTS5 fallback to fail-fast on TEI failure
- Add pkg/labels/ integration for container lifecycle ownership
- Remove commented-out backends from init output (single source of truth)
- Expand platform considerations with Rosetta 2 emulation details
- Note Unix socket transport as future security consideration
- Add docs website update and legacy optimizer removal to Documentation
- Add K8s dependency isolation as optional cleanup item

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent ccbdf0d commit d34bd76

1 file changed

Lines changed: 37 additions & 35 deletions

rfcs/THV-0059-vmcp-local-experience.md

@@ -3,7 +3,7 @@
33
- **Status**: Draft
44
- **Author(s)**: Juan Antonio Osorio (@JAORMX)
55
- **Created**: 2026-03-24
6-
- **Last Updated**: 2026-03-24
6+
- **Last Updated**: 2026-04-14
77
- **Target Repository**: toolhive
88
- **Related Issues**: N/A
99

@@ -33,7 +33,8 @@ All ToolHive users who manage more than one MCP server locally are affected. The
3333

3434
- Add a `thv vmcp` subcommand with `serve`, `validate`, and `init` sub-commands that integrate vMCP into the main CLI
3535
- Support a zero-config quickstart: `thv vmcp serve --group <name>` should work without a config file for simple aggregation
36-
- Bring the optimizer to the local experience with managed embedding service lifecycle (`--optimizer` flag auto-manages a TEI container)
36+
- Bring the optimizer to the local experience with managed embedding service lifecycle (`--optimizer-embedding` flag auto-manages a TEI container)
37+
- Deprecate and eventually remove the legacy standalone `mcp-optimizer` Python project (`StacklokLabs/mcp-optimizer`) once the native vMCP optimizer is stable
3738
- Maintain full feature parity with the standalone `vmcp` binary for users who need advanced configuration
3839
- Include vMCP in goreleaser so it ships with every ToolHive release (embedded in `thv` as a subcommand)
3940
- Document the library embedding pattern used by brood-box as an officially supported integration path
@@ -137,7 +138,7 @@ Flags:
137138
--optimizer Enable the tool optimizer (FTS5 keyword search)
138139
--optimizer-embedding Enable the optimizer with semantic search (auto-manages TEI container)
139140
--embedding-model string HuggingFace model for semantic search (default: "BAAI/bge-small-en-v1.5")
140-
--embedding-port int Port for the embedding service container (default: 8384)
141+
--embedding-image string TEI container image (default: "ghcr.io/huggingface/text-embeddings-inference:cpu-latest")
141142
```
142143

143144
Two modes of operation:
@@ -197,16 +198,9 @@ Example output:
197198
name: "engineering-vmcp"
198199
groupRef: "engineering"
199200
200-
# Backends discovered from group "engineering":
201-
# (These are auto-discovered at runtime from the group.
202-
# Uncomment and modify only if you need static overrides.)
203-
# backends:
204-
# - name: "github"
205-
# url: "http://localhost:8080/mcp"
206-
# transport: "streamable-http"
207-
# - name: "jira"
208-
# url: "http://localhost:8081/mcp"
209-
# transport: "streamable-http"
201+
# Backends are auto-discovered at runtime from the group.
202+
# To modify backends, use `thv run` flags or group configuration
203+
# rather than overriding them here.
210204
211205
aggregation:
212206
conflictResolution: prefix
@@ -311,7 +305,7 @@ type ServeConfig struct {
311305
Optimizer bool // Enable FTS5-only optimizer
312306
OptimizerEmbedding bool // Enable optimizer with managed TEI container
313307
EmbeddingModel string // HuggingFace model ID (default: BAAI/bge-small-en-v1.5)
314-
EmbeddingPort int // TEI container port (default: 8384)
308+
EmbeddingImage string // TEI container image (default: ghcr.io/huggingface/text-embeddings-inference:cpu-latest)
315309
}
316310

317311
// Serve starts a vMCP server with the given configuration.
@@ -419,8 +413,11 @@ thv vmcp serve --group default --optimizer-embedding
419413
420414
# With custom model:
421415
thv vmcp serve --group default --optimizer-embedding \
422-
--embedding-model BAAI/bge-small-en-v1.5 \
423-
--embedding-port 8384
416+
--embedding-model BAAI/bge-small-en-v1.5
417+
418+
# With a GPU-accelerated image:
419+
thv vmcp serve --group default --optimizer-embedding \
420+
--embedding-image ghcr.io/huggingface/text-embeddings-inference:turing-latest
424421
```
425422

426423
Lifecycle:
@@ -435,11 +432,11 @@ sequenceDiagram
435432
436433
User->>THV: --optimizer-embedding
437434
THV->>Runtime: Start TEI container<br/>(ghcr.io/huggingface/text-embeddings-inference:cpu-latest)
438-
Runtime->>TEI: Running on :8384
435+
Runtime->>TEI: Running on :{random port}
439436
THV->>TEI: GET /health (poll until ready)
440437
Note over THV,TEI: TEI downloads model on first start<br/>(cached in named volume)
441438
TEI-->>THV: 200 OK
442-
THV->>VMCP: Start with optimizer.embeddingService=http://localhost:8384
439+
THV->>VMCP: Start with optimizer.embeddingService=http://localhost:{port}
443440
VMCP-->>User: Listening on :4483/mcp
444441
445442
Note over User,VMCP: On shutdown (Ctrl+C / SIGTERM):
@@ -450,14 +447,14 @@ sequenceDiagram
450447

451448
Implementation details:
452449

453-
- **Container image**: `ghcr.io/huggingface/text-embeddings-inference:cpu-latest` (same image the K8s `EmbeddingServer` CRD uses)
454-
- **Container name**: `thv-embedding-<group>` (predictable, enables idempotent start/stop)
450+
- **Container image**: `ghcr.io/huggingface/text-embeddings-inference:cpu-latest` by default (same image the K8s `EmbeddingServer` CRD uses). Configurable via `--embedding-image` to support GPU-accelerated variants (e.g., `ghcr.io/huggingface/text-embeddings-inference:turing-latest` for CUDA) or architecture-specific images.
451+
- **Container name**: `thv-embedding-<model-short-hash>` (e.g., `thv-embedding-bge-sm-a1b2c3`), derived from a hash of the model name. This allows multiple vMCP instances using the same embedding model to share a single TEI container, avoiding unnecessary duplication. Different models get separate containers.
455452
- **Model cache**: Named volume `thv-embedding-model-cache` mounted at `/data` with `HF_HOME=/data`. This avoids re-downloading the model on every start (~130MB for `bge-small-en-v1.5`).
456453
- **Health check**: Poll `GET /health` with backoff until the TEI server reports ready. TEI must download and load the model on first start, which can take 30-60 seconds.
457-
- **Port binding**: Default `127.0.0.1:8384`. Chosen to avoid conflicting with the vMCP port (4483) or common dev ports.
458-
- **Lifecycle coupling**: The TEI container is started before the vMCP server and stopped after it shuts down. If the TEI container fails to start or becomes unhealthy, vMCP falls back to FTS5-only mode with a warning.
459-
- **Idempotent start**: If a `thv-embedding-<group>` container is already running (e.g., from a previous invocation), reuse it rather than creating a new one.
460-
- **Platform considerations**: The TEI CPU image is amd64. On ARM64 hosts (Apple Silicon), the container runs under emulation. A future enhancement could detect architecture and select an appropriate image variant.
454+
- **Port binding**: A random available port is allocated using ToolHive's existing `pkg/networking.FindAvailable()` pattern (the same mechanism `thv run` uses when no explicit port is given). The allocated port is reported in the logs. This avoids conflicts when multiple vMCP instances or other services are running.
455+
- **Lifecycle coupling**: The TEI container is started before the vMCP server and stopped after it shuts down. If the TEI container fails to start or become healthy when `--optimizer-embedding` was explicitly requested, `thv vmcp serve` exits with a clear error message. The `--optimizer-embedding` flag is a clear signal that the user wants semantic search — silently degrading to FTS5-only would mask environment issues (Docker not running, port conflicts, etc.) and produce confusingly poor results. Users who want keyword-only search should use `--optimizer` explicitly.
456+
- **Idempotent start**: If a TEI container with the matching name and ToolHive labels is already running (e.g., from a previous invocation or another vMCP instance with the same model), reuse it rather than creating a new one.
457+
- **Platform considerations**: The TEI CPU image is amd64-only. On ARM64 hosts (Apple Silicon), Docker Desktop handles this automatically via Rosetta 2 emulation — no `--platform` flag is needed. The overhead is roughly 5-15% for CPU-bound workloads, which is acceptable for embedding generation. The `--embedding-image` flag lets users select architecture-native images when available, making architecture management the user's responsibility rather than requiring auto-detection logic.
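
The model-hash naming described in the list above could be sketched roughly as follows. This is only an illustration: the RFC specifies the `thv-embedding-<model-short-hash>` shape but not the hash function or its length, so the use of SHA-256, the 8-character hex suffix, and the helper name `embeddingContainerName` are all assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// embeddingContainerName derives a deterministic container name from the
// model ID so that every vMCP instance using the same model resolves to the
// same TEI container. The hash function and the 8-character suffix are
// assumptions; the RFC only fixes the "thv-embedding-" prefix.
func embeddingContainerName(model string) string {
	sum := sha256.Sum256([]byte(model))
	return "thv-embedding-" + hex.EncodeToString(sum[:])[:8]
}
```

Because the name depends only on the model ID, two invocations with the same `--embedding-model` value look up the same container, while a different model yields a different name.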
461458

462459
**Tier 3: Full config control**
463460

@@ -496,17 +493,15 @@ A new component in `pkg/vmcp/cli/` manages the TEI container lifecycle:
496493
```go
497494
// EmbeddingServiceConfig holds parameters for the managed embedding container.
498495
type EmbeddingServiceConfig struct {
499-
Image string // default: ghcr.io/huggingface/text-embeddings-inference:cpu-latest
500-
Model string // default: BAAI/bge-small-en-v1.5
501-
Port int // default: 8384
502-
GroupRef string // used in container name: thv-embedding-<group>
496+
Image string // default: ghcr.io/huggingface/text-embeddings-inference:cpu-latest
497+
Model string // default: BAAI/bge-small-en-v1.5
503498
}
504499
505500
// EmbeddingServiceManager manages the lifecycle of a local TEI container.
506501
type EmbeddingServiceManager struct { ... }
507502
508503
// Start launches (or reuses) the TEI container and waits for readiness.
509-
// Returns the embedding service URL.
504+
// Returns the embedding service URL (with a dynamically allocated port).
510505
func (m *EmbeddingServiceManager) Start(ctx context.Context) (string, error) { ... }
511506
512507
// Stop gracefully stops the TEI container.
@@ -515,6 +510,8 @@ func (m *EmbeddingServiceManager) Stop(ctx context.Context) error { ... }
515510

516511
This uses ToolHive's existing container runtime abstraction (`pkg/container/`) to manage the TEI container, keeping it consistent with how `thv run` manages MCP server containers.
517512

513+
**Container labels for lifecycle ownership**: The TEI container is labeled using ToolHive's existing `pkg/labels/` system — specifically the `toolhive` label (marks it as ToolHive-managed) and the `toolhive-auxiliary` label (marks it as an auxiliary workload, similar to how the inspector container is labeled). This lets `thv` query by label to determine whether an existing TEI container is ToolHive-managed (safe to reuse/stop) or user-managed (hands off). On startup, the manager checks for a running container matching the expected name and labels — if one exists and is healthy, it reuses it; if one exists without the `toolhive` label, it assumes the user is managing their own TEI and does not touch it.
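
The ownership check in the paragraph above can be read as a small decision function. The sketch below is an assumption-laden illustration: the label key `toolhive` comes from the RFC text, but the `"true"` value, the `containerInfo` type, and the action names are hypothetical, and the RFC does not say what should happen to a ToolHive-managed container that is unhealthy, so that case is surfaced as its own outcome rather than resolved here.

```go
package main

// teiAction is the outcome of inspecting an existing container that has the
// expected name. All names here are illustrative, not ToolHive identifiers.
type teiAction int

const (
	actionCreate    teiAction = iota // no container with the expected name exists
	actionReuse                      // ToolHive-managed and healthy: reuse it
	actionHandsOff                   // lacks the toolhive label: user-managed, do not touch
	actionUnhealthy                  // ToolHive-managed but unhealthy: handling not specified by the RFC
)

// containerInfo is a minimal stand-in for what the container runtime reports.
type containerInfo struct {
	Labels  map[string]string
	Healthy bool
}

// decideTEIAction applies the reuse / hands-off rule. The label value "true"
// is an assumption about how the toolhive label is set.
func decideTEIAction(c *containerInfo) teiAction {
	if c == nil {
		return actionCreate
	}
	if c.Labels["toolhive"] != "true" {
		return actionHandsOff
	}
	if !c.Healthy {
		return actionUnhealthy
	}
	return actionReuse
}
```

Keeping the rule as a pure function makes the reuse and hands-off branches easy to cover in unit tests without a container runtime.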
514+
518515
#### API Changes
519516

520517
No REST API changes. The `thv vmcp` subcommand operates independently of the ToolHive API server (`thv serve`). vMCP exposes its own HTTP endpoint (default `127.0.0.1:4483/mcp`) using the Streamable HTTP transport for MCP protocol communication.
@@ -682,6 +679,7 @@ The `thv vmcp` subcommand introduces no new attack surface beyond what the stand
682679
- **Outgoing auth**: Per-backend authentication strategies (header injection, token exchange, unauthenticated) configured via the config file. Quick mode defaults to unauthenticated.
683680
- **Cedar authorization**: When configured, Cedar policies gate access to individual tools and resources. The library embedding path (brood-box) demonstrates this with `observe`, `safe-tools`, and `full-access` profiles.
684681
- **No privilege escalation**: The `thv vmcp` subcommand runs with the same privileges as the invoking user. No root or elevated permissions required.
682+
- **Unix socket transport (future)**: For local single-user scenarios, a Unix socket transport (similar to `thv serve --socket`) would sidestep port-binding and auth concerns entirely, relying on filesystem permissions for access control. This is a natural follow-up but out of scope for this RFC.
685683

686684
### Data Security
687685

@@ -718,7 +716,7 @@ The `thv vmcp` subcommand introduces no new attack surface beyond what the stand
718716
- Quick mode generates a conservative default config (anonymous auth, prefix conflict resolution) that is safe for local single-user scenarios.
719717
- Library consumers are guided toward infrastructure encapsulation and anti-corruption layers, reducing the risk of misusing internal vmcp APIs.
720718
- The managed TEI container binds to `127.0.0.1` only, uses the same container isolation as other ToolHive workloads, and downloads models exclusively from HuggingFace Hub (verified models).
721-
- If the TEI container becomes unhealthy, vMCP degrades gracefully to FTS5-only search rather than failing entirely.
719+
- If the TEI container fails to start when `--optimizer-embedding` was explicitly requested, `thv vmcp serve` exits with a clear error rather than silently degrading. Users who want keyword-only search should use `--optimizer` instead.
722720

723721
## Alternatives Considered
724722

@@ -783,9 +781,9 @@ The `thv vmcp` subcommand introduces no new attack surface beyond what the stand
783781
### Phase 4: Local Optimizer Support
784782

785783
- Implement `--optimizer` flag (FTS5-only, no container management — config wiring only)
786-
- Implement `EmbeddingServiceManager` in `pkg/vmcp/cli/` for TEI container lifecycle
787-
- Implement `--optimizer-embedding` flag with container start/stop, health polling, and graceful fallback
788-
- Add `--embedding-model` and `--embedding-port` flags
784+
- Implement `EmbeddingServiceManager` in `pkg/vmcp/cli/` for TEI container lifecycle with `pkg/labels/` integration for ownership tracking
785+
- Implement `--optimizer-embedding` flag with container start/stop, health polling, and fail-fast on TEI failure
786+
- Add `--embedding-model` and `--embedding-image` flags
789787
- Add named volume management for model caching
790788
- Add E2E tests for optimizer tiers (FTS5-only, managed TEI, config-file TEI)
791789

@@ -815,7 +813,7 @@ The `thv vmcp` subcommand introduces no new attack surface beyond what the stand
815813
- **Optimizer tests**:
816814
- FTS5-only: `thv vmcp serve --group default --optimizer` -> client sees only `find_tool`/`call_tool` -> `find_tool` discovers backend tools by keyword
817815
- Managed TEI: `thv vmcp serve --group default --optimizer-embedding` -> TEI container starts -> `find_tool` performs semantic search -> on shutdown, TEI container stops
818-
- Fallback: TEI container fails to start -> vMCP falls back to FTS5-only with warning
816+
- Fail-fast: TEI container fails to start -> `thv vmcp serve --optimizer-embedding` exits with a clear error
819817
- Idempotent: Running `thv vmcp serve --optimizer-embedding` twice reuses the existing TEI container
820818
- **Regression tests**: Verify that the standalone `vmcp serve` command still works identically after the refactor.
821819
- **Security tests**: Verify that quick mode binds to `127.0.0.1` only, that strict YAML parsing rejects unknown fields, and that HMAC session binding is enforced when configured.
@@ -826,6 +824,8 @@ The `thv vmcp` subcommand introduces no new attack surface beyond what the stand
826824
- **Architecture documentation**: `docs/arch/vmcp-local.md` covering local deployment, `docs/arch/vmcp-library.md` covering the library embedding pattern with brood-box as reference
827825
- **Examples**: `examples/vmcp-local-quickstart/` with a minimal setup, `examples/vmcp-advanced/` with auth, composite tools, and telemetry
828826
- **Existing docs updates**: Update `docs/arch/10-virtual-mcp-architecture.md` to reference the new CLI integration and library embedding path
827+
- **Docs website**: Update the ToolHive docs website to reflect the new `thv vmcp` subcommand and local optimizer story
828+
- **Legacy mcp-optimizer**: Once the native vMCP optimizer (Phase 4) is stable and documented, deprecate and archive the standalone `mcp-optimizer` Python project (`StacklokLabs/mcp-optimizer`). Remove references from the docs website and registry.
829829

830830
## Open Questions
831831

@@ -841,10 +841,12 @@ The `thv vmcp` subcommand introduces no new attack surface beyond what the stand
841841

842842
6. **TEI container lifecycle on crash**: If `thv vmcp serve` is killed ungracefully (SIGKILL, OOM), the TEI container will be left running. Should a cleanup mechanism be added (e.g., check for orphaned `thv-embedding-*` containers on startup)?
843843

844-
7. **GPU support for TEI**: The default `cpu-latest` image works everywhere but is slower. Should `--optimizer-embedding` detect GPU availability and select a GPU-accelerated TEI image variant (e.g., `ghcr.io/huggingface/text-embeddings-inference:latest` for CUDA)?
844+
7. **GPU support for TEI**: The `--embedding-image` flag lets users select a GPU-accelerated TEI image variant (e.g., `ghcr.io/huggingface/text-embeddings-inference:turing-latest` for CUDA). Should `thv` also auto-detect GPU availability and suggest an appropriate image?
845845

846846
8. **Embedding model recommendations**: Should `thv vmcp init` or docs recommend specific models for different use cases (small/fast vs. large/accurate)?
847847

848+
9. **K8s dependency isolation (optional)**: The `thv` binary already transitively pulls `k8s.io/client-go` and `sigs.k8s.io/controller-runtime` through `pkg/container/kubernetes/`, so importing `pkg/vmcp/` does not introduce new module dependencies. However, the K8s-specific vMCP packages (`pkg/vmcp/k8s/`, `pkg/vmcp/workloads/k8s.go`) include code that is never activated in CLI mode. Gating these behind build tags or interfaces could reduce dead code in the `thv` binary, but this is a cleanup item rather than a blocker.
849+
848850
## References
849851

850852
- [THV-0022: Optimizer Migration to vMCP](https://github.com/stacklok/toolhive-rfcs/blob/main/rfcs/THV-0022-optimizer-migration-to-vmcp.md) — Optimizer architecture (SQLite FTS5, TEI embeddings, session-scoped indexing)
