Implement EmbeddingServiceManager in pkg/vmcp/cli/ #4884

Description

@yrobla

Create pkg/vmcp/cli/embedding_manager.go implementing EmbeddingServiceManager, the component responsible for managing the full lifecycle of a HuggingFace Text Embeddings Inference (TEI) container used by the Tier 2 semantic optimizer. The manager handles idempotent container create-or-reuse, deterministic naming via a model-name hash, HTTP health polling with exponential backoff, and fail-fast behaviour when TEI cannot start. This is Phase 4 of RFC THV-0059 and is the prerequisite for wiring --optimizer-embedding into the thv vmcp serve command in #4887.

Context

RFC THV-0059 introduces three optimizer tiers for thv vmcp serve. Tier 2 augments the FTS5 keyword search with semantic vector search by running a TEI container (ghcr.io/huggingface/text-embeddings-inference:cpu-latest) as a managed auxiliary workload. The EmbeddingServiceManager encapsulates all container orchestration details so that #4887 can simply call Start(ctx) and receive back a URL to inject into the optimizer config.

The inspector command (cmd/thv/app/inspector.go) is the closest existing prior art: it uses pkg/container.Factory.Create(), pkg/labels.LabelAuxiliary, runtime.DeployWorkloadOptions, an HTTP readiness poll, and signal-based cleanup. The EmbeddingServiceManager follows the same pattern but adds idempotent reuse (check for an existing container by name before creating a new one), named-volume model caching, and exponential backoff on health polling to accommodate the 30–60 second model-download window on first start.

Dependencies: Depends on #4879 (see item_#4879_issue_body.md for the pkg/vmcp/cli/ package shape and conventions)
Blocks: #4887

Acceptance Criteria

  • pkg/vmcp/cli/embedding_manager.go exists with the required SPDX copyright header
  • EmbeddingServiceManager is a concrete struct (or interface + constructor) exported from pkg/vmcp/cli
  • Container name is derived as thv-embedding-<model-short-hash> where the hash is a short (8-character) hex digest of the model name string; same model always produces the same container name
  • Start(ctx context.Context) (teiURL string, err error) checks whether a container with the computed name is already running via runtime.Runtime.IsWorkloadRunning; if running, returns the existing container's URL without creating a new one (idempotent reuse)
  • When no running container exists, Start deploys a new container using runtime.Runtime.DeployWorkload with labels.LabelAuxiliary = labels.LabelToolHiveValue applied
  • A named Docker volume (thv-embedding-models) is mounted at the container's model cache path to persist downloaded models across restarts
  • Port allocation for the TEI container uses pkg/networking.FindAvailable() to avoid hardcoded ports
  • After deploying the container, Start polls the TEI HTTP health endpoint with exponential backoff (initial interval 2 s, multiplier 2, cap 30 s) until the service is ready or the context deadline is exceeded
  • If --optimizer-embedding is explicitly set and TEI fails to start (health check never succeeds within the context deadline), Start returns a descriptive error — no silent FTS5 fallback
  • Stop(ctx context.Context) error stops the TEI container via runtime.Runtime.StopWorkload; it is a no-op if the manager did not start the container in this process invocation (i.e., it reused an existing container)
  • NewEmbeddingServiceManager(factory ContainerFactory, cfg EmbeddingServiceManagerConfig) (*EmbeddingServiceManager, error) validates that cfg.Model is non-empty
  • All new code passes go build ./pkg/vmcp/cli/... and go vet ./pkg/vmcp/cli/...
  • No new external Go module dependencies are introduced
  • Unit tests cover the cases listed in the Testing Strategy section
  • All existing tests pass (no regressions)
  • Code reviewed and approved

Technical Approach

Recommended Implementation

Introduce a ContainerFactory interface (minimal subset of *container.Factory) in pkg/vmcp/cli/ so that unit tests can inject a mock without depending on a live container daemon. The production NewEmbeddingServiceManager call site (in #4887) passes container.NewFactory().

The container name hash can use crypto/sha256 over the model string, then take the first 8 hex characters — matching the naming convention from the RFC (thv-embedding-<model-short-hash>). Because the hash is deterministic, two concurrent thv vmcp serve --optimizer-embedding invocations using the same model naturally share one TEI container.

The idempotency check calls rt.IsWorkloadRunning(ctx, containerName). If it returns true, reconstruct the URL from the container's host port: use the port stored on the manager when this process deployed the container, or recover it from rt.ListWorkloads label inspection when the container was started by another process. If it returns false, deploy a new container via rt.DeployWorkload.

Health polling should use a simple loop with time.Sleep and an exponential-backoff helper (no external library needed). The poll target is http://localhost:<port>/health or http://localhost:<port>/ (whichever TEI exposes — verify against the TEI image docs; the /health endpoint returns 200 when the model is loaded). Drain response bodies before closing to enable connection reuse (per research.md coding conventions).

The Stop method should only stop the container if started bool is true on the manager struct — that flag is set to true only when DeployWorkload was called in this invocation, not when an existing container was reused. This prevents a thv vmcp serve invocation from stopping a TEI container that another invocation is still using.

Patterns & Frameworks

  • SPDX headers: // SPDX-FileCopyrightText: Copyright 2025 Stacklok, Inc. and // SPDX-License-Identifier: Apache-2.0 on every new file
  • Auxiliary container pattern: follow cmd/thv/app/inspector.go: container.NewFactory().Create(ctx), labels.AddStandardLabels, labelsMap[labels.LabelAuxiliary] = labels.LabelToolHiveValue, runtime.DeployWorkloadOptions with PortBindings bound to 127.0.0.1
  • Error wrapping: fmt.Errorf("...: %w", err) — no swallowing, no silent fallbacks
  • Immutable variable assignment: prefer single-assignment variables; use immediately-invoked anonymous functions where needed
  • No external deps: use only packages already in go.mod; crypto/sha256 and encoding/hex are stdlib
  • Interface for testability: define a narrow ContainerFactory interface so gomock can generate a mock; do not test against a live Docker daemon in unit tests
  • gomock generation: add //go:generate mockgen ... directive; mock generated by task gen

Code Pointers

  • cmd/thv/app/inspector.go: Primary pattern reference for the auxiliary container lifecycle: factory creation, label application, DeployWorkload call, HTTP health polling goroutine, and cleanup logic
  • pkg/container/factory.go: Factory.Create(ctx context.Context) (runtime.Runtime, error) entry point; NewFactory() for production use
  • pkg/container/runtime/types.go: Runtime interface: DeployWorkload, IsWorkloadRunning, StopWorkload, ListWorkloads; DeployWorkloadOptions struct; ContainerInfo for port inspection
  • pkg/container/runtime/mocks/mock_runtime.go: Generated mock for Runtime; use as the pattern for the ContainerFactory mock
  • pkg/labels/labels.go: LabelAuxiliary, LabelToolHiveValue, AddStandardLabels, SetGroup
  • pkg/networking/port.go: FindAvailable() for dynamic port allocation
  • pkg/vmcp/optimizer/optimizer.go: Config.EmbeddingService field that #4887 (Wire optimizer flags into thv vmcp serve) will populate with the TEI URL returned by Start
  • test/integration/vmcp/helpers/vmcp_server.go: Functional-options pattern; reference for how to structure an options-accepting constructor

Component Interfaces

// pkg/vmcp/cli/embedding_manager.go

// ContainerFactory is the minimal interface over *container.Factory that
// EmbeddingServiceManager needs. Defined here to enable unit-test injection.
type ContainerFactory interface {
    Create(ctx context.Context) (runtime.Runtime, error)
}

// EmbeddingServiceManagerConfig holds the parameters for the manager.
type EmbeddingServiceManagerConfig struct {
    // Model is the HuggingFace model name (e.g. "BAAI/bge-small-en-v1.5").
    // Required; must be non-empty.
    Model string
    // Image is the TEI container image. Defaults to
    // "ghcr.io/huggingface/text-embeddings-inference:cpu-latest" if empty.
    Image string
}

// EmbeddingServiceManager manages the lifecycle of a TEI container used by
// the Tier 2 semantic optimizer.
type EmbeddingServiceManager struct {
    factory       ContainerFactory
    cfg           EmbeddingServiceManagerConfig
    containerName string
    port          int
    started       bool // true only if this instance called DeployWorkload
}

// NewEmbeddingServiceManager constructs a manager from the given factory and config.
// Returns an error if cfg.Model is empty.
func NewEmbeddingServiceManager(factory ContainerFactory, cfg EmbeddingServiceManagerConfig) (*EmbeddingServiceManager, error)

// Start ensures the TEI container is running and returns its HTTP base URL
// (e.g. "http://localhost:12345"). On first call it checks for an existing
// running container; if found, returns its URL without creating a new one.
// If not found, deploys a new container and polls health with exponential
// backoff until ready or ctx is cancelled.
// Returns a non-nil error if --optimizer-embedding was requested but TEI
// fails to start within the context deadline.
func (m *EmbeddingServiceManager) Start(ctx context.Context) (string, error)

// Stop stops the TEI container, but only if this manager instance started it.
// Returns nil without error if the container was already running (reused).
func (m *EmbeddingServiceManager) Stop(ctx context.Context) error

// Naming helper (unexported)
// modelShortHash returns the first 8 hex characters of the SHA-256 hash of model.
func modelShortHash(model string) string {
    sum := sha256.Sum256([]byte(model))
    return hex.EncodeToString(sum[:])[:8]
}

// containerNameForModel returns the canonical TEI container name for a given model.
// Format: thv-embedding-<model-short-hash>
func containerNameForModel(model string) string {
    return "thv-embedding-" + modelShortHash(model)
}

Testing Strategy

Unit Tests (pkg/vmcp/cli/embedding_manager_test.go)

Use go.uber.org/mock/gomock with a MockContainerFactory and MockRuntime generated from the ContainerFactory and runtime.Runtime interfaces.

  • TestContainerNameForModel: verify containerNameForModel is deterministic and produces the thv-embedding-<8-char-hex> format for two different models; assert different models produce different names
  • TestNewEmbeddingServiceManager_EmptyModel: constructor returns an error when cfg.Model is empty
  • TestStart_ReuseExistingContainer: mock IsWorkloadRunning returns true; assert DeployWorkload is never called; assert returned URL matches the expected http://localhost:<port> form; assert started == false so Stop is a no-op
  • TestStart_DeployNewContainer: mock IsWorkloadRunning returns false; mock DeployWorkload returns a port; mock HTTP health endpoint (use httptest.NewServer) responds 200 on first poll; assert URL returned is http://localhost:<port>; assert started == true
  • TestStart_HealthPollTimeout: mock IsWorkloadRunning returns false; mock DeployWorkload succeeds; health endpoint never returns 200; context cancelled after one poll cycle; assert Start returns a non-nil error containing a human-readable message about TEI failing to start
  • TestStop_OwnsContainer: started == true; assert StopWorkload is called once
  • TestStop_ReuseContainer: started == false; assert StopWorkload is never called

Edge Cases

  • DeployWorkload returns an error — Start propagates the error immediately without polling
  • Model name with slashes and mixed case (e.g. BAAI/bge-small-en-v1.5) produces a valid container name (no slashes, lowercase-safe)
  • Port returned by FindAvailable is 0 — Start returns a descriptive error before attempting deployment
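
The model-name edge case above can be checked with a small program. It repeats the two naming helpers from the Component Interfaces section so that it runs on its own; namePattern is a hypothetical regular expression encoding the expected shape (fixed prefix plus 8 lowercase hex characters). Since hex.EncodeToString emits only the characters 0-9 and a-f, slashes and uppercase letters in the model name cannot reach the container name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
)

// modelShortHash returns the first 8 hex characters of the SHA-256 hash of model.
func modelShortHash(model string) string {
	sum := sha256.Sum256([]byte(model))
	return hex.EncodeToString(sum[:])[:8]
}

// containerNameForModel returns the canonical TEI container name for a model.
func containerNameForModel(model string) string {
	return "thv-embedding-" + modelShortHash(model)
}

// namePattern is a hypothetical check for the expected name shape.
var namePattern = regexp.MustCompile(`^thv-embedding-[0-9a-f]{8}$`)

func main() {
	for _, model := range []string{"BAAI/bge-small-en-v1.5", "baai/bge-small-en-v1.5"} {
		name := containerNameForModel(model)
		fmt.Println(model, "->", name, "valid:", namePattern.MatchString(name))
	}
}
```

The hash is computed over the raw bytes of the model string, so it is case-sensitive: model names that differ only in case are treated as different inputs. If such names should share a container, the name would need normalizing before hashing, which the issue does not specify.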

Out of Scope

References

  • RFC THV-0059 — Authoritative design; Phase 4 covers EmbeddingServiceManager, container naming convention, health polling sequence, and named-volume model cache
  • GitHub Issue #4808 — Parent tracking issue
  • cmd/thv/app/inspector.go — Prior art for auxiliary container lifecycle in thv
  • docs/arch/10-virtual-mcp-architecture.md — Existing vMCP architecture; Tier 2 optimizer context
  • .claude/rules/go-style.md — SPDX headers, error handling, drain response bodies, no nolint unless false positive
  • .claude/rules/testing.md — gomock usage, t.Cleanup, random ports in tests, Ginkgo/Gomega patterns

Labels: cli, enhancement, vmcp