Create pkg/vmcp/cli/embedding_manager.go implementing EmbeddingServiceManager, the component responsible for managing the full lifecycle of a HuggingFace Text Embeddings Inference (TEI) container used by the Tier 2 semantic optimizer. The manager handles idempotent container create-or-reuse, deterministic naming via a model-name hash, HTTP health polling with exponential backoff, and fail-fast behaviour when TEI cannot start. This is Phase 4 of RFC THV-0059 and is the prerequisite for wiring --optimizer-embedding into the thv vmcp serve command in #4887.
Context
RFC THV-0059 introduces three optimizer tiers for thv vmcp serve. Tier 2 augments the FTS5 keyword search with semantic vector search by running a TEI container (ghcr.io/huggingface/text-embeddings-inference:cpu-latest) as a managed auxiliary workload. The EmbeddingServiceManager encapsulates all container orchestration details so that #4887 can simply call Start(ctx) and receive back a URL to inject into the optimizer config.
The inspector command (cmd/thv/app/inspector.go) is the closest existing prior art: it uses pkg/container.Factory.Create(), pkg/labels.LabelAuxiliary, runtime.DeployWorkloadOptions, an HTTP readiness poll, and signal-based cleanup. The EmbeddingServiceManager follows the same pattern but adds idempotent reuse (check for an existing container by name before creating a new one), named-volume model caching, and exponential backoff on health polling to accommodate the 30–60 second model-download window on first start.
Dependencies: depends on #4879 (see item_#4879_issue_body.md for the pkg/vmcp/cli/ package shape and conventions). Blocks: #4887.
Acceptance Criteria
pkg/vmcp/cli/embedding_manager.go exists with the required SPDX copyright header
EmbeddingServiceManager is a concrete struct (or interface + constructor) exported from pkg/vmcp/cli
Container name is derived as thv-embedding-<model-short-hash> where the hash is a short (8-character) hex digest of the model name string; same model always produces the same container name
Start(ctx context.Context) (teiURL string, err error) checks whether a container with the computed name is already running via runtime.Runtime.IsWorkloadRunning; if running, returns the existing container's URL without creating a new one (idempotent reuse)
When no running container exists, Start deploys a new container using runtime.Runtime.DeployWorkload with labels.LabelAuxiliary = labels.LabelToolHiveValue applied
A named Docker volume (thv-embedding-models) is mounted at the container's model cache path to persist downloaded models across restarts
Port allocation for the TEI container uses pkg/networking.FindAvailable() to avoid hardcoded ports
After deploying the container, Start polls the TEI HTTP health endpoint with exponential backoff (initial interval 2 s, multiplier 2, cap 30 s) until the service is ready or the context deadline is exceeded
If --optimizer-embedding is explicitly set and TEI fails to start (health check never succeeds within the context deadline), Start returns a descriptive error — no silent FTS5 fallback
Stop(ctx context.Context) error stops the TEI container via runtime.Runtime.StopWorkload; it is a no-op if the manager did not start the container in this process invocation (i.e., it reused an existing container)
NewEmbeddingServiceManager(factory ContainerFactory, cfg EmbeddingServiceManagerConfig) (*EmbeddingServiceManager, error) validates that cfg.Model is non-empty
All new code passes go build ./pkg/vmcp/cli/... and go vet ./pkg/vmcp/cli/...
No new external Go module dependencies are introduced
Unit tests cover the cases listed in the Testing Strategy section
All existing tests pass (no regressions)
Code reviewed and approved
Technical Approach
Recommended Implementation
Introduce a ContainerFactory interface (minimal subset of *container.Factory) in pkg/vmcp/cli/ so that unit tests can inject a mock without depending on a live container daemon. The production NewEmbeddingServiceManager call site (in #4887) passes container.NewFactory().
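The container name hash can use crypto/sha256 over the model string, then take the first 8 hex characters, matching the naming convention from the RFC (thv-embedding-<model-short-hash>). Because the hash is deterministic, concurrent thv vmcp serve --optimizer-embedding invocations using the same model resolve to the same container name and can share one TEI container: the first invocation deploys it and later ones reuse it. Two invocations that check at the same instant may both find no running container and race to deploy it.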
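A minimal, runnable sketch of that naming scheme, using only the stdlib crypto/sha256 and encoding/hex packages. It mirrors the unexported helpers listed under Component Interfaces; the main function and the second model name are illustrative only. Because the suffix is always lowercase hex, model names with slashes or mixed case (such as BAAI/bge-small-en-v1.5) still yield a valid container name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// modelShortHash returns the first 8 hex characters of the SHA-256 digest of model.
func modelShortHash(model string) string {
	sum := sha256.Sum256([]byte(model))
	return hex.EncodeToString(sum[:])[:8]
}

// containerNameForModel builds the canonical TEI container name.
// hex.EncodeToString emits only [0-9a-f], so the result never contains
// slashes or uppercase letters, whatever the model string looks like.
func containerNameForModel(model string) string {
	return "thv-embedding-" + modelShortHash(model)
}

func main() {
	for _, model := range []string{"BAAI/bge-small-en-v1.5", "sentence-transformers/all-MiniLM-L6-v2"} {
		fmt.Printf("%-40s -> %s\n", model, containerNameForModel(model))
	}
}
```

Running it twice prints the same names, which is what lets a second invocation find the first one's container by name.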
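The idempotency check calls rt.IsWorkloadRunning(ctx, containerName). If that returns true, reconstruct the URL from the container's host port. A reused container was deployed by another invocation, so this manager has no stored port for it and must derive the port from rt.ListWorkloads (label or ContainerInfo port inspection). When this manager deploys the container itself, it stores the port after DeployWorkload returns it. If IsWorkloadRunning returns false, deploy a new container via rt.DeployWorkload.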
Health polling should use a simple loop with an exponential-backoff helper (no external library needed). Wait between attempts with a timer inside a select on ctx.Done() rather than a bare time.Sleep, so that cancellation and the context deadline are honoured promptly. The poll target is http://localhost:<port>/health or http://localhost:<port>/, whichever TEI exposes; verify against the TEI image docs (the /health endpoint returns 200 when the model is loaded). Drain response bodies before closing to enable connection reuse (per research.md coding conventions).
The Stop method should only stop the container if the manager's started field is true. That flag is set only when DeployWorkload was called in this invocation, not when an existing container was reused. This prevents a thv vmcp serve invocation from stopping a TEI container that another invocation is still using.
Patterns & Frameworks
SPDX headers: // SPDX-FileCopyrightText: Copyright 2025 Stacklok, Inc. and // SPDX-License-Identifier: Apache-2.0 on every new file
Auxiliary container pattern: follow cmd/thv/app/inspector.go — container.NewFactory().Create(ctx), labels.AddStandardLabels, labelsMap[labels.LabelAuxiliary] = labels.LabelToolHiveValue, runtime.DeployWorkloadOptions with PortBindings bound to 127.0.0.1
Error wrapping: fmt.Errorf("...: %w", err) — no swallowing, no silent fallbacks
Immutable variable assignment: prefer single-assignment variables; use immediately-invoked anonymous functions where needed
No external deps: use only packages already in go.mod; crypto/sha256 and encoding/hex are stdlib
Interface for testability: define a narrow ContainerFactory interface so gomock can generate a mock; do not test against a live Docker daemon in unit tests
gomock generation: add //go:generate mockgen ... directive; mock generated by task gen
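The container-specific parts of the deploy call can be sketched as plain data before they are mapped onto runtime.DeployWorkloadOptions. These parts are the image default, the model argument, the named cache volume, and the loopback port binding. teiSpec and buildTEISpec are hypothetical; the real options struct has different fields. Labels (labels.AddStandardLabels plus LabelAuxiliary) are omitted. Several details are assumptions taken from TEI's usual docker run example and should be verified against the TEI image docs: the in-container port 80, the /data model cache path, and the --model-id argument. The image default uses an immediately-invoked function, per the single-assignment convention above.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

const (
	defaultTEIImage  = "ghcr.io/huggingface/text-embeddings-inference:cpu-latest"
	modelCacheVolume = "thv-embedding-models"

	// Assumed TEI image conventions: the server listens on port 80 inside the
	// container and caches downloaded model weights under /data.
	teiContainerPort = 80
	teiCachePath     = "/data"
)

// teiSpec is a hypothetical, simplified stand-in for the deploy options.
type teiSpec struct {
	Image       string
	Args        []string
	Volumes     []string          // "source:target"
	PortBinding map[string]string // container port -> "hostIP:hostPort"
}

func buildTEISpec(model, image string, hostPort int) (teiSpec, error) {
	if model == "" {
		return teiSpec{}, errors.New("embedding model must not be empty")
	}
	if hostPort <= 0 {
		return teiSpec{}, fmt.Errorf("invalid host port %d for TEI container", hostPort)
	}
	img := func() string {
		if image != "" {
			return image
		}
		return defaultTEIImage
	}()
	return teiSpec{
		Image:   img,
		Args:    []string{"--model-id", model},
		Volumes: []string{modelCacheVolume + ":" + teiCachePath},
		// Bind to loopback only, as the inspector command does.
		PortBinding: map[string]string{
			strconv.Itoa(teiContainerPort) + "/tcp": "127.0.0.1:" + strconv.Itoa(hostPort),
		},
	}, nil
}

func main() {
	spec, err := buildTEISpec("BAAI/bge-small-en-v1.5", "", 18080)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", spec)
}
```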
Code Pointers
cmd/thv/app/inspector.go — Primary pattern reference for the auxiliary container lifecycle: factory creation, label application, DeployWorkload call, HTTP health polling goroutine, and cleanup logic
pkg/container/factory.go — Factory.Create(ctx context.Context) (runtime.Runtime, error) entry point; NewFactory() for production use
pkg/container/runtime/types.go — Runtime interface: DeployWorkload, IsWorkloadRunning, StopWorkload, ListWorkloads; DeployWorkloadOptions struct; ContainerInfo for port inspection
pkg/container/runtime/mocks/mock_runtime.go — Generated mock for Runtime; use as the pattern for the ContainerFactory mock
test/integration/vmcp/helpers/vmcp_server.go — Functional-options pattern; reference for how to structure an options-accepting constructor
Component Interfaces
// pkg/vmcp/cli/embedding_manager.go

// ContainerFactory is the minimal interface over *container.Factory that
// EmbeddingServiceManager needs. Defined here to enable unit-test injection.
type ContainerFactory interface {
	Create(ctx context.Context) (runtime.Runtime, error)
}

// EmbeddingServiceManagerConfig holds the parameters for the manager.
type EmbeddingServiceManagerConfig struct {
	// Model is the HuggingFace model name (e.g. "BAAI/bge-small-en-v1.5").
	// Required; must be non-empty.
	Model string
	// Image is the TEI container image. Defaults to
	// "ghcr.io/huggingface/text-embeddings-inference:cpu-latest" if empty.
	Image string
}

// EmbeddingServiceManager manages the lifecycle of a TEI container used by
// the Tier 2 semantic optimizer.
type EmbeddingServiceManager struct {
	factory       ContainerFactory
	cfg           EmbeddingServiceManagerConfig
	containerName string
	port          int
	started       bool // true only if this instance called DeployWorkload
}

// NewEmbeddingServiceManager constructs a manager from the given factory and config.
// Returns an error if cfg.Model is empty.
func NewEmbeddingServiceManager(factory ContainerFactory, cfg EmbeddingServiceManagerConfig) (*EmbeddingServiceManager, error)

// Start ensures the TEI container is running and returns its HTTP base URL
// (e.g. "http://localhost:12345"). On first call it checks for an existing
// running container; if found, returns its URL without creating a new one.
// If not found, deploys a new container and polls health with exponential
// backoff until ready or ctx is cancelled.
// Returns a non-nil error if --optimizer-embedding was requested but TEI
// fails to start within the context deadline.
func (m *EmbeddingServiceManager) Start(ctx context.Context) (string, error)

// Stop stops the TEI container, but only if this manager instance started it.
// Returns nil without error if the container was already running (reused).
func (m *EmbeddingServiceManager) Stop(ctx context.Context) error

// Naming helpers (unexported)

// modelShortHash returns the first 8 hex characters of the SHA-256 hash of model.
func modelShortHash(model string) string {
	sum := sha256.Sum256([]byte(model))
	return hex.EncodeToString(sum[:])[:8]
}

// containerNameForModel returns the canonical TEI container name for a given model.
// Format: thv-embedding-<model-short-hash>
func containerNameForModel(model string) string {
	return "thv-embedding-" + modelShortHash(model)
}
Testing Strategy
Unit Tests (pkg/vmcp/cli/embedding_manager_test.go)
Use go.uber.org/mock/gomock with a MockContainerFactory and MockRuntime generated from the ContainerFactory and runtime.Runtime interfaces.
TestContainerNameForModel: verify containerNameForModel is deterministic and produces the thv-embedding-<8-char-hex> format for two different models; assert different models produce different names
TestNewEmbeddingServiceManager_EmptyModel: constructor returns an error when cfg.Model is empty
TestStart_ReuseExistingContainer: mock IsWorkloadRunning returns true; assert DeployWorkload is never called; assert returned URL matches the expected http://localhost:<port> form; assert started == false so Stop is a no-op
TestStart_DeployNewContainer: mock IsWorkloadRunning returns false; start a fake TEI with httptest.NewServer whose health endpoint responds 200 on the first poll, and have the mocked DeployWorkload return that server's port so the poll actually reaches it; assert the returned URL is http://localhost:<port>; assert started == true
TestStart_HealthPollTimeout: mock IsWorkloadRunning returns false; mock DeployWorkload succeeds; health endpoint never returns 200; context cancelled after one poll cycle; assert Start returns a non-nil error containing a human-readable message about TEI failing to start
TestStop_OwnsContainer: started == true; assert StopWorkload is called once
TestStop_ReuseContainer: started == false; assert StopWorkload is never called
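For TestStart_DeployNewContainer to reach the fake TEI, the test needs the httptest server's port as a plain integer so the mocked DeployWorkload can return it. A small helper such as the hypothetical portOf below does that with net/url and strconv. The main function only demonstrates it against a fake /health handler.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strconv"
)

// portOf extracts the numeric port from a URL such as "http://127.0.0.1:43210".
func portOf(rawURL string) (int, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return 0, fmt.Errorf("parsing %q: %w", rawURL, err)
	}
	p, err := strconv.Atoi(u.Port())
	if err != nil {
		return 0, fmt.Errorf("no numeric port in %q: %w", rawURL, err)
	}
	return p, nil
}

func main() {
	// Fake TEI: 200 on /health, 404 elsewhere.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/health" {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.NotFound(w, r)
	}))
	defer srv.Close()

	port, err := portOf(srv.URL)
	if err != nil {
		panic(err)
	}
	// In a gomock test, this value would be what the mocked DeployWorkload returns.
	fmt.Printf("expected TEI URL: http://localhost:%d\n", port)
}
```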
Edge Cases
DeployWorkload returns an error — Start propagates the error immediately without polling
Model name with slashes and mixed case (e.g. BAAI/bge-small-en-v1.5) produces a valid container name (no slashes, lowercase-safe)
Port returned by FindAvailable is 0 — Start returns a descriptive error before attempting deployment
Additional Code Pointers
pkg/labels/labels.go: LabelAuxiliary, LabelToolHiveValue, AddStandardLabels, SetGroup
pkg/networking/port.go: FindAvailable() for dynamic port allocation
pkg/vmcp/optimizer/optimizer.go: Config.EmbeddingService field that #4887 (Wire optimizer flags into thv vmcp serve) will populate with the TEI URL returned by Start
Out of Scope
Wiring --optimizer-embedding / --embedding-model / --embedding-image flags into thv vmcp serve (that is #4887, Wire optimizer flags into thv vmcp serve)
… thv vmcp process exit (that plumbing is in #4887's serve lifecycle)
… --optimizer-embedding flow (that is #4889, E2E tests: optimizer tiers and regression)
… pkg/vmcp/cli/
References
… EmbeddingServiceManager, container naming convention, health polling sequence, and named-volume model cache
cmd/thv/app/inspector.go: prior art for auxiliary container lifecycle in thv
docs/arch/10-virtual-mcp-architecture.md: existing vMCP architecture; Tier 2 optimizer context
.claude/rules/go-style.md: SPDX headers, error handling, drain response bodies, no nolint unless false positive
.claude/rules/testing.md: gomock usage, t.Cleanup, random ports in tests, Ginkgo/Gomega patterns