Address review feedback on THV-0059 vMCP local experience RFC
- Add goal to deprecate legacy mcp-optimizer Python project
- Add --embedding-image flag for TEI container image configurability
- Change TEI container naming to model-hash based for shared reuse
- Switch from fixed port to random port via pkg/networking.FindAvailable()
- Change from silent FTS5 fallback to fail-fast on TEI failure
- Add pkg/labels/ integration for container lifecycle ownership
- Remove commented-out backends from init output (single source of truth)
- Expand platform considerations with Rosetta 2 emulation details
- Note Unix socket transport as future security consideration
- Add docs website update and legacy optimizer removal to Documentation
- Add K8s dependency isolation as optional cleanup item
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rfcs/THV-0059-vmcp-local-experience.md (37 additions, 35 deletions)
@@ -3,7 +3,7 @@
 - **Status**: Draft
 - **Author(s)**: Juan Antonio Osorio (@JAORMX)
 - **Created**: 2026-03-24
-- **Last Updated**: 2026-03-24
+- **Last Updated**: 2026-04-14
 - **Target Repository**: toolhive
 - **Related Issues**: N/A

@@ -33,7 +33,8 @@ All ToolHive users who manage more than one MCP server locally are affected. The

 - Add a `thv vmcp` subcommand with `serve`, `validate`, and `init` sub-commands that integrate vMCP into the main CLI
 - Support a zero-config quickstart: `thv vmcp serve --group <name>` should work without a config file for simple aggregation
-- Bring the optimizer to the local experience with managed embedding service lifecycle (`--optimizer` flag auto-manages a TEI container)
+- Bring the optimizer to the local experience with managed embedding service lifecycle (`--optimizer-embedding` flag auto-manages a TEI container)
+- Deprecate and eventually remove the legacy standalone `mcp-optimizer` Python project (`StacklokLabs/mcp-optimizer`) once the native vMCP optimizer is stable
 - Maintain full feature parity with the standalone `vmcp` binary for users who need advanced configuration
 - Include vMCP in goreleaser so it ships with every ToolHive release (embedded in `thv` as a subcommand)
 - Document the library embedding pattern used by brood-box as an officially supported integration path
@@ -137,7 +138,7 @@ Flags:
       --optimizer                Enable the tool optimizer (FTS5 keyword search)
       --optimizer-embedding      Enable the optimizer with semantic search (auto-manages TEI container)
       --embedding-model string   HuggingFace model for semantic search (default: "BAAI/bge-small-en-v1.5")
-      --embedding-port int       Port for the embedding service container (default: 8384)
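One way to read the two optimizer flags is as a three-way search mode. The following is a minimal sketch under that reading, not the RFC's implementation: `SearchMode` and `resolveSearchMode` are hypothetical names, and letting `--optimizer-embedding` take precedence when both flags are set is an assumption the RFC does not state.

```go
package main

import "fmt"

// SearchMode is a hypothetical type for this sketch; the RFC does not define it.
type SearchMode int

const (
	SearchDisabled SearchMode = iota // neither flag given
	SearchKeyword                    // --optimizer: FTS5 keyword search, no container
	SearchSemantic                   // --optimizer-embedding: semantic search backed by TEI
)

func (m SearchMode) String() string {
	switch m {
	case SearchKeyword:
		return "keyword (FTS5)"
	case SearchSemantic:
		return "semantic (TEI)"
	default:
		return "disabled"
	}
}

// resolveSearchMode maps the two boolean flags to a single mode.
// Assumption: --optimizer-embedding wins if both flags are set.
func resolveSearchMode(optimizer, optimizerEmbedding bool) SearchMode {
	if optimizerEmbedding {
		return SearchSemantic
	}
	if optimizer {
		return SearchKeyword
	}
	return SearchDisabled
}

func main() {
	fmt.Println(resolveSearchMode(true, false))
	fmt.Println(resolveSearchMode(false, true))
}
```

Collapsing the flags into one value early keeps later code from re-checking flag combinations, and it makes the keyword-only path visibly free of any container management.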
- **Container image**: `ghcr.io/huggingface/text-embeddings-inference:cpu-latest` by default (same image the K8s `EmbeddingServer` CRD uses). Configurable via `--embedding-image` to support GPU-accelerated variants (e.g., `ghcr.io/huggingface/text-embeddings-inference:turing-latest` for CUDA) or architecture-specific images.
+- **Container name**: `thv-embedding-<model-short-hash>` (e.g., `thv-embedding-bge-sm-a1b2c3`), derived from a hash of the model name. This allows multiple vMCP instances using the same embedding model to share a single TEI container, avoiding unnecessary duplication. Different models get separate containers.
 - **Model cache**: Named volume `thv-embedding-model-cache` mounted at `/data` with `HF_HOME=/data`. This avoids re-downloading the model on every start (~130MB for `bge-small-en-v1.5`).
 - **Health check**: Poll `GET /health` with backoff until the TEI server reports ready. TEI must download and load the model on first start, which can take 30-60 seconds.
-- **Port binding**: Default `127.0.0.1:8384`. Chosen to avoid conflicting with the vMCP port (4483) or common dev ports.
-- **Lifecycle coupling**: The TEI container is started before the vMCP server and stopped after it shuts down. If the TEI container fails to start or becomes unhealthy, vMCP falls back to FTS5-only mode with a warning.
-- **Idempotent start**: If a `thv-embedding-<group>` container is already running (e.g., from a previous invocation), reuse it rather than creating a new one.
-- **Platform considerations**: The TEI CPU image is amd64. On ARM64 hosts (Apple Silicon), the container runs under emulation. A future enhancement could detect architecture and select an appropriate image variant.
+- **Port binding**: A random available port is allocated using ToolHive's existing `pkg/networking.FindAvailable()` pattern (the same mechanism `thv run` uses when no explicit port is given). The allocated port is reported in the logs. This avoids conflicts when multiple vMCP instances or other services are running.
+- **Lifecycle coupling**: The TEI container is started before the vMCP server and stopped after it shuts down. If the TEI container fails to start or become healthy when `--optimizer-embedding` was explicitly requested, `thv vmcp serve` exits with a clear error message. The `--optimizer-embedding` flag is a clear signal that the user wants semantic search; silently degrading to FTS5-only would mask environment issues (Docker not running, port conflicts, etc.) and produce confusingly poor results. Users who want keyword-only search should use `--optimizer` explicitly.
+- **Idempotent start**: If a TEI container with the matching name and ToolHive labels is already running (e.g., from a previous invocation or another vMCP instance with the same model), reuse it rather than creating a new one.
+- **Platform considerations**: The TEI CPU image is amd64-only. On ARM64 hosts (Apple Silicon), Docker Desktop handles this automatically via Rosetta 2 emulation, so no `--platform` flag is needed. The overhead is roughly 5-15% for CPU-bound workloads, which is acceptable for embedding generation. The `--embedding-image` flag lets users select architecture-native images when available, making architecture management the user's responsibility rather than requiring auto-detection logic.
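The naming and port rules above can be illustrated with the standard library alone. This is a sketch under stated assumptions rather than ToolHive's code: `embeddingContainerName` and `freeLoopbackPort` are hypothetical helpers, the use of SHA-256 truncated to 8 hex characters is an assumed scheme (the RFC's example name also carries a short model slug, which is omitted here), and binding to port 0 is a stand-in for whatever `pkg/networking.FindAvailable()` actually does.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net"
)

// embeddingContainerName derives a stable container name from the model name,
// so two vMCP instances using the same model compute the same name and can
// share one container. The hash function and length are assumptions.
func embeddingContainerName(model string) string {
	sum := sha256.Sum256([]byte(model))
	return "thv-embedding-" + hex.EncodeToString(sum[:])[:8]
}

// freeLoopbackPort asks the kernel for an unused TCP port on 127.0.0.1 by
// binding to port 0, then releases it. Another process could claim the port
// before the container binds it, so callers should be ready to retry.
func freeLoopbackPort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	fmt.Println(embeddingContainerName("BAAI/bge-small-en-v1.5"))
	port, err := freeLoopbackPort()
	if err != nil {
		fmt.Println("no port available:", err)
		return
	}
	fmt.Printf("embedding service would bind 127.0.0.1:%d\n", port)
}
```

Because the name depends only on the model, the idempotent-start check reduces to a lookup by name plus a label check, while the port can differ on every start without affecting reuse.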

 **Tier 3: Full config control**

@@ -496,17 +493,15 @@ A new component in `pkg/vmcp/cli/` manages the TEI container lifecycle:

 ```go
 // EmbeddingServiceConfig holds parameters for the managed embedding container.
 ```
This uses ToolHive's existing container runtime abstraction (`pkg/container/`) to manage the TEI container, keeping it consistent with how `thv run` manages MCP server containers.

+**Container labels for lifecycle ownership**: The TEI container is labeled using ToolHive's existing `pkg/labels/` system, specifically the `toolhive` label (marks it as ToolHive-managed) and the `toolhive-auxiliary` label (marks it as an auxiliary workload, similar to how the inspector container is labeled). This lets `thv` query by label to determine whether an existing TEI container is ToolHive-managed (safe to reuse/stop) or user-managed (hands off). On startup, the manager checks for a running container matching the expected name and labels: if one exists and is healthy, it reuses it; if one exists without the `toolhive` label, it assumes the user is managing their own TEI and does not touch it.
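The ownership rule in this paragraph amounts to a small decision function. The sketch below is illustrative only: `Action`, `runningContainer`, and `decide` are hypothetical names, the check tests only for the presence of the `toolhive` label key (its real value format in `pkg/labels/` is not shown in the RFC), and the container is assumed to have already passed its health check.

```go
package main

import "fmt"

// Action is what the manager should do on startup. Hypothetical type.
type Action string

const (
	ActionCreate   Action = "create"    // no container with the expected name is running
	ActionReuse    Action = "reuse"     // ToolHive-managed container found
	ActionHandsOff Action = "hands-off" // user-managed container found; do not touch
)

// managedLabel is the label key named in the RFC; checking key presence only
// is an assumption of this sketch.
const managedLabel = "toolhive"

// runningContainer is a minimal stand-in for a runtime's container record.
type runningContainer struct {
	Name   string
	Labels map[string]string
}

// decide takes the running container that matches the expected name, or nil
// if there is none, and returns the startup action.
func decide(existing *runningContainer) Action {
	if existing == nil {
		return ActionCreate
	}
	if _, ok := existing.Labels[managedLabel]; ok {
		return ActionReuse
	}
	return ActionHandsOff
}

func main() {
	managed := &runningContainer{
		Name:   "thv-embedding-example",
		Labels: map[string]string{"toolhive": "true", "toolhive-auxiliary": "true"},
	}
	fmt.Println(decide(nil), decide(managed))
}
```

Keeping the decision separate from the runtime calls makes the hands-off case easy to test, which matters because stopping a user-managed container by mistake would be the most disruptive failure here.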
+
 #### API Changes

 No REST API changes. The `thv vmcp` subcommand operates independently of the ToolHive API server (`thv serve`). vMCP exposes its own HTTP endpoint (default `127.0.0.1:4483/mcp`) using the Streamable HTTP transport for MCP protocol communication.
@@ -682,6 +679,7 @@ The `thv vmcp` subcommand introduces no new attack surface beyond what the stand
 - **Outgoing auth**: Per-backend authentication strategies (header injection, token exchange, unauthenticated) configured via the config file. Quick mode defaults to unauthenticated.
 - **Cedar authorization**: When configured, Cedar policies gate access to individual tools and resources. The library embedding path (brood-box) demonstrates this with `observe`, `safe-tools`, and `full-access` profiles.
 - **No privilege escalation**: The `thv vmcp` subcommand runs with the same privileges as the invoking user. No root or elevated permissions required.
+- **Unix socket transport (future)**: For local single-user scenarios, a Unix socket transport (similar to `thv serve --socket`) would sidestep port-binding and auth concerns entirely, relying on filesystem permissions for access control. This is a natural follow-up but out of scope for this RFC.

 ### Data Security

@@ -718,7 +716,7 @@ The `thv vmcp` subcommand introduces no new attack surface beyond what the stand
 - Quick mode generates a conservative default config (anonymous auth, prefix conflict resolution) that is safe for local single-user scenarios.
 - Library consumers are guided toward infrastructure encapsulation and anti-corruption layers, reducing the risk of misusing internal vmcp APIs.
 - The managed TEI container binds to `127.0.0.1` only, uses the same container isolation as other ToolHive workloads, and downloads models exclusively from HuggingFace Hub (verified models).
-- If the TEI container becomes unhealthy, vMCP degrades gracefully to FTS5-only search rather than failing entirely.
+- If the TEI container fails to start when `--optimizer-embedding` was explicitly requested, `thv vmcp serve` exits with a clear error rather than silently degrading. Users who want keyword-only search should use `--optimizer` instead.

 ## Alternatives Considered

@@ -783,9 +781,9 @@ The `thv vmcp` subcommand introduces no new attack surface beyond what the stand
 ### Phase 4: Local Optimizer Support

 - Implement `--optimizer` flag (FTS5-only, no container management — config wiring only)
-- Implement `EmbeddingServiceManager` in `pkg/vmcp/cli/` for TEI container lifecycle
-- Implement `--optimizer-embedding` flag with container start/stop, health polling, and graceful fallback
-- Add `--embedding-model` and `--embedding-port` flags
+- Implement `EmbeddingServiceManager` in `pkg/vmcp/cli/` for TEI container lifecycle with `pkg/labels/` integration for ownership tracking
+- Implement `--optimizer-embedding` flag with container start/stop, health polling, and fail-fast on TEI failure
+- Add `--embedding-model` and `--embedding-image` flags
- **Regression tests**: Verify that the standalone `vmcp serve` command still works identically after the refactor.
 - **Security tests**: Verify that quick mode binds to `127.0.0.1` only, that strict YAML parsing rejects unknown fields, and that HMAC session binding is enforced when configured.
@@ -826,6 +824,8 @@ The `thv vmcp` subcommand introduces no new attack surface beyond what the stand
 - **Architecture documentation**: `docs/arch/vmcp-local.md` covering local deployment, `docs/arch/vmcp-library.md` covering the library embedding pattern with brood-box as reference
 - **Examples**: `examples/vmcp-local-quickstart/` with a minimal setup, `examples/vmcp-advanced/` with auth, composite tools, and telemetry
 - **Existing docs updates**: Update `docs/arch/10-virtual-mcp-architecture.md` to reference the new CLI integration and library embedding path
+- **Docs website**: Update the ToolHive docs website to reflect the new `thv vmcp` subcommand and local optimizer story
+- **Legacy mcp-optimizer**: Once the native vMCP optimizer (Phase 4) is stable and documented, deprecate and archive the standalone `mcp-optimizer` Python project (`StacklokLabs/mcp-optimizer`). Remove references from the docs website and registry.

 ## Open Questions

@@ -841,10 +841,12 @@ The `thv vmcp` subcommand introduces no new attack surface beyond what the stand

 6. **TEI container lifecycle on crash**: If `thv vmcp serve` is killed ungracefully (SIGKILL, OOM), the TEI container will be left running. Should a cleanup mechanism be added (e.g., check for orphaned `thv-embedding-*` containers on startup)?

-7. **GPU support for TEI**: The default `cpu-latest` image works everywhere but is slower. Should `--optimizer-embedding` detect GPU availability and select a GPU-accelerated TEI image variant (e.g., `ghcr.io/huggingface/text-embeddings-inference:latest` for CUDA)?
+7. **GPU support for TEI**: The `--embedding-image` flag lets users select a GPU-accelerated TEI image variant (e.g., `ghcr.io/huggingface/text-embeddings-inference:turing-latest` for CUDA). Should `thv` also auto-detect GPU availability and suggest an appropriate image?

 8. **Embedding model recommendations**: Should `thv vmcp init` or docs recommend specific models for different use cases (small/fast vs. large/accurate)?

+9. **K8s dependency isolation (optional)**: The `thv` binary already transitively pulls `k8s.io/client-go` and `sigs.k8s.io/controller-runtime` through `pkg/container/kubernetes/`, so importing `pkg/vmcp/` does not introduce new module dependencies. However, the K8s-specific vMCP packages (`pkg/vmcp/k8s/`, `pkg/vmcp/workloads/k8s.go`) include code that is never activated in CLI mode. Gating these behind build tags or interfaces could reduce dead code in the `thv` binary, but this is a cleanup item rather than a blocker.