26 changes: 26 additions & 0 deletions docs/arch/08-workloads-lifecycle.md
@@ -96,6 +96,32 @@ thv rm my-server

**Implementation**: `pkg/workloads/manager.go`

### Upgrade

```bash
thv upgrade check [my-server] # offline metadata comparison, never pulls
thv upgrade apply my-server --yes # verify + pull, then recreate
```

Upgrades apply only to **registry-sourced** workloads — those run by a registry entry name, which records `RunConfig.RegistryServerName`. A workload run from a raw image reference has no registry server name and is reported as `not-registry-sourced`.

**Check** is an offline comparison. The checker (`pkg/workloads/upgrade.Checker`) looks up the workload's `RegistryServerName` in the configured registry and compares the running image tag against the candidate the registry advertises. The comparison is semver-based and conservative: a `latest` tag, a different repository, or non-comparable tags yield `unknown`. When the registry advertises a strictly newer tag, the status is `upgrade-available`, and the result also surfaces **env-var drift** (variables the candidate now declares that the workload does not yet supply) and **posture drift** (transport, permission profile). Network isolation is a local-only choice the registry cannot express, so it is not reported as drift. Checks never pull images.
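As an illustration of the conservative comparison, here is a minimal sketch in Go. It is not the `Checker` implementation: the function names, the `up-to-date` status string, and the choice to treat an older candidate as up to date are all assumptions made for the example. It accepts only plain `MAJOR.MINOR.PATCH` tags (with an optional `v` prefix) and returns `unknown` for anything else.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Status strings. "upgrade-available" and "unknown" appear in the text above;
// "up-to-date" is an assumed spelling used only for this sketch.
const (
	statusUpToDate         = "up-to-date"
	statusUpgradeAvailable = "upgrade-available"
	statusUnknown          = "unknown"
)

// splitRef splits "repo:tag". A colon that precedes the last slash is a
// registry port, not a tag separator, so such references have no tag.
func splitRef(ref string) (repo, tag string, ok bool) {
	i := strings.LastIndex(ref, ":")
	if i < 0 || i < strings.LastIndex(ref, "/") {
		return ref, "", false
	}
	return ref[:i], ref[i+1:], true
}

// parseVersion accepts only MAJOR.MINOR.PATCH with an optional "v" prefix.
func parseVersion(tag string) ([3]int, bool) {
	var v [3]int
	parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// compareTags (hypothetical helper) reports an upgrade only when the
// candidate is in the same repository and carries a strictly newer version.
func compareTags(current, candidate string) string {
	curRepo, curTag, ok1 := splitRef(current)
	candRepo, candTag, ok2 := splitRef(candidate)
	if !ok1 || !ok2 || curRepo != candRepo {
		return statusUnknown
	}
	cur, ok1 := parseVersion(curTag)
	cand, ok2 := parseVersion(candTag)
	if !ok1 || !ok2 {
		return statusUnknown // e.g. "latest" or any non-comparable tag
	}
	for i := range cur {
		if cand[i] > cur[i] {
			return statusUpgradeAvailable
		}
		if cand[i] < cur[i] {
			return statusUpToDate // assumption: an older candidate is not an upgrade
		}
	}
	return statusUpToDate
}

func main() {
	const repo = "ghcr.io/stackloklabs/osv-mcp/server"
	fmt.Println(compareTags(repo+":0.1.1", repo+":0.1.3"))  // upgrade-available
	fmt.Println(compareTags(repo+":latest", repo+":0.1.3")) // unknown
}
```

Returning `unknown` rather than guessing keeps the check from recommending an upgrade it cannot justify, which matches the conservative behaviour described above.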

**Apply** is the single security-critical path (`pkg/workloads/upgrade.Applier`). It re-derives the check against the registry on every apply, which closes the time-of-check/time-of-use window: a `CheckResult` from an earlier `check` is never trusted. It then performs the following steps, in order:

1. Resolves and **verifies** the candidate image's provenance (by registry server name).
2. Builds a merged `RunConfig` that **preserves the entire user configuration** — env vars, secrets, OIDC/authz/audit/telemetry, tool filters, middleware, transport/posture — and changes only the image, any merged env/secrets supplied via `--env`/`--secret`, and the registry source URLs.
3. Runs the policy gate and performs the verified **pull**.
4. Only then asks the manager to recreate the workload via `UpdateWorkload` (stop → delete → start with the new config).

Steps 1–3 all complete before any destruction, so a failure while preparing the candidate leaves the running workload untouched. **There is no automatic rollback**: once recreation begins, the previous image/config is not restored — recovery is a forward operation. Posture drift (transport, permission profile) is **surfaced as a warning, not converged**: the upgraded workload keeps its full existing posture, including transport, network isolation, and permission profile.

> Runtime boundary: the "verified pull before destruction" guarantee holds precisely for local container runtimes (the scope here). On Kubernetes the verification and policy gate still precede recreation, but the byte-level pull is delegated to the kubelet and happens after recreation.

The same `Applier` backs both the CLI (`thv upgrade apply`) and the API (`POST /api/v1beta/workloads/{name}/upgrade`, with `GET .../{name}/upgrade-check` and `GET .../upgrade-check` for single and bulk checks), so the verify-then-pull ordering lives in exactly one place. `thv list --check-upgrades` annotates the workload list with each workload's check result.

**Implementation**: `pkg/workloads/upgrade/`, `cmd/thv/app/upgrade.go`

### List

Listing combines container workloads from the runtime with remote workloads from persisted state. The manager can filter workloads by label or group, and can optionally include stopped workloads.
31 changes: 31 additions & 0 deletions test/e2e/testdata/upgrade/registry-high.json
@@ -0,0 +1,31 @@
{
"version": "1.0.0",
"meta": {
"last_updated": "2026-01-15T10:00:00Z"
},
"data": {
"servers": [
{
"$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
"name": "osv-upgrade-test",
"title": "OSV (upgrade e2e fixture)",
"description": "OSV MCP server fixture for upgrade e2e tests (high tag). Mirrors the bundled osv entry's transport (streamable-http) and image exactly, pinned to the higher tag advertised as the upgrade candidate. The transport url port 8080 is the image-INTERNAL target port (becomes RunConfig.TargetPort); thv still assigns an OS-chosen host proxy port, so there is no fixed host-port clash. Only the image tag differs between this file and registry-low.json.",
"version": "1.0.0",
"repository": {
"url": "https://github.com/StacklokLabs/osv-mcp",
"source": "github"
},
"packages": [
{
"identifier": "ghcr.io/stackloklabs/osv-mcp/server:0.1.3",
"registryType": "oci",
"transport": {
"type": "streamable-http",
"url": "http://localhost:8080"
}
}
]
}
]
}
}
31 changes: 31 additions & 0 deletions test/e2e/testdata/upgrade/registry-low.json
@@ -0,0 +1,31 @@
{
"version": "1.0.0",
"meta": {
"last_updated": "2026-01-15T10:00:00Z"
},
"data": {
"servers": [
{
"$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
"name": "osv-upgrade-test",
"title": "OSV (upgrade e2e fixture)",
"description": "OSV MCP server fixture for upgrade e2e tests (low tag). Mirrors the bundled osv entry's transport (streamable-http) and image exactly, pinned to the lower tag so the upgrade checker reports an available upgrade once repointed to registry-high.json. The transport url port 8080 is the image-INTERNAL target port (becomes RunConfig.TargetPort); thv still assigns an OS-chosen host proxy port, so there is no fixed host-port clash. Only the image tag differs between this file and registry-high.json.",
"version": "1.0.0",
"repository": {
"url": "https://github.com/StacklokLabs/osv-mcp",
"source": "github"
},
"packages": [
{
"identifier": "ghcr.io/stackloklabs/osv-mcp/server:0.1.1",
"registryType": "oci",
"transport": {
"type": "streamable-http",
"url": "http://localhost:8080"
}
}
]
}
]
}
}
258 changes: 258 additions & 0 deletions test/e2e/upgrade_e2e_test.go
@@ -0,0 +1,258 @@
// SPDX-FileCopyrightText: Copyright 2026 Stacklok, Inc.
// SPDX-License-Identifier: Apache-2.0

package e2e_test

import (
"encoding/json"
"os"
"path/filepath"
"strings"
"time"

. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"

"github.com/stacklok/toolhive/pkg/runner"
"github.com/stacklok/toolhive/pkg/workloads/upgrade"
"github.com/stacklok/toolhive/test/e2e"
)

const (
// rawOSVImage is the concrete image the bundled "osv" registry entry resolves
// to. Running it by its raw reference (rather than by a registry name)
// produces a working MCP server whose RunConfig has no RegistryServerName,
// which is exactly the input the "not-registry-sourced" path needs.
rawOSVImage = "ghcr.io/stackloklabs/osv-mcp/server:0.1.3"

// osvImageLow and osvImageHigh are two real, pullable osv-mcp releases. The
// upgrade-flow fixtures advertise these tags so the checker reports an
// available upgrade from low to high. Both tags run identically (only a
// patch-level version bump), and osv declares no required env vars or secrets, so
// `upgrade apply --yes` runs non-interactively without prompting.
osvImageLow = "ghcr.io/stackloklabs/osv-mcp/server:0.1.1"
osvImageHigh = "ghcr.io/stackloklabs/osv-mcp/server:0.1.3"

// upgradeRegistryServerName is the server name the fixtures expose (a bare
// "name": "osv-upgrade-test", so the recorded RegistryServerName matches it
// exactly). `thv run` is invoked with this name so RegistryServerName is set.
upgradeRegistryServerName = "osv-upgrade-test"
)

// upgradeCheckResults mirrors the JSON emitted by `thv upgrade check --format
// json`, which is an array of upgrade.CheckResult. We decode into the real type
// so the test breaks if the wire shape changes.
type upgradeCheckResults []upgrade.CheckResult

var _ = Describe("Upgrade Command", Label("core", "upgrade", "e2e"), func() {
var (
config *e2e.TestConfig
serverName string

// tempHome / tempData isolate this spec's ToolHive config and workload
// state from the developer's / CI runner's real ToolHive directories.
// `thv config set-registry` writes to $XDG_CONFIG_HOME/toolhive, and
// workload state lives under $XDG_DATA_HOME/toolhive; routing every thv
// invocation (run, check, apply, export, set/unset-registry, and cleanup)
// through thvCmd keeps both off the real config and consistent across the
// run -> check -> apply -> export -> cleanup sequence so they all see the
// same workload. The container runtime (Docker/Podman socket) is not
// HOME-dependent, so an isolated HOME does not affect it; PATH and the rest
// of the environment are inherited unchanged.
tempHome string
tempData string

// thvCmd builds a THVCommand bound to the isolated config/home/data dirs.
thvCmd func(args ...string) *e2e.THVCommand
)

BeforeEach(func() {
config = e2e.NewTestConfig()
serverName = e2e.GenerateUniqueServerName("upgrade-test")
tempHome = GinkgoT().TempDir()
tempData = GinkgoT().TempDir()

thvCmd = func(args ...string) *e2e.THVCommand {
return e2e.NewTHVCommand(config, args...).
WithEnv(
"XDG_CONFIG_HOME="+tempHome,
"HOME="+tempHome,
"XDG_DATA_HOME="+tempData,
)
}

err := e2e.CheckTHVBinaryAvailable(config)
Expect(err).ToNot(HaveOccurred(), "thv binary should be available")
})

AfterEach(func() {
if config.CleanupAfter {
// Stop and remove the workload using the SAME isolated env so cleanup
// targets the workload created by this spec. Tolerate a missing
// workload (a spec may have failed before creating it).
_, _, _ = thvCmd("stop", serverName).Run()
_, _, _ = thvCmd("rm", serverName).Run()
}
// The registry config lives under the discarded temp dirs, so there is
// nothing to restore; the real ToolHive config is never touched.
})

// Negative / cheap path: a workload created from a raw image reference is not
// registry-sourced, so no upgrade can ever be determined for it. This needs
// no registry fixture and is the safety-net coverage.
Describe("Checking a workload that is not registry-sourced", func() {
Context("when the workload was run from a raw image reference", func() {
It("should report not-registry-sourced in text output", func() {
By("Running an MCP server from a raw image reference")
thvCmd("run", "--name", serverName, rawOSVImage).ExpectSuccess()

By("Waiting for the server to be running")
waitForIsolatedMCPServer(thvCmd, serverName, 60*time.Second)

By("Checking the workload for an available upgrade")
stdout, _ := thvCmd("upgrade", "check", serverName).ExpectSuccess()

By("Verifying the report says the workload is not registry-sourced")
Expect(stdout).To(ContainSubstring(serverName), "Report should name the workload")
Expect(stdout).To(ContainSubstring(string(upgrade.StatusNotRegistrySourced)),
"Report should show the not-registry-sourced status")
})

It("should emit parseable JSON with the expected fields", func() {
By("Running an MCP server from a raw image reference")
thvCmd("run", "--name", serverName, rawOSVImage).ExpectSuccess()

By("Waiting for the server to be running")
waitForIsolatedMCPServer(thvCmd, serverName, 60*time.Second)

By("Checking the workload for an available upgrade in JSON format")
stdout, _ := thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()

By("Verifying the JSON output is valid and contains the expected fields")
var results upgradeCheckResults
err := json.Unmarshal([]byte(stdout), &results)
Expect(err).ToNot(HaveOccurred(), "Output should be valid JSON")
Expect(results).To(HaveLen(1), "JSON should contain exactly one result")

result := results[0]
Expect(result.WorkloadName).To(Equal(serverName), "JSON should name the workload")
Expect(result.Status).To(Equal(upgrade.StatusNotRegistrySourced),
"JSON should show the not-registry-sourced status")
Expect(result.CurrentImage).To(Equal(rawOSVImage),
"JSON should record the raw image the workload is running")
Expect(result.RegistryServer).To(BeEmpty(),
"A raw-image workload should not have a registry server name")
Expect(result.CandidateImage).To(BeEmpty(),
"There should be no candidate image when no upgrade can be determined")
})
})
})

// Full upgrade flow: a registry-sourced workload whose registry advertises a
// newer tag is upgraded in place, and we verify it runs on the new image with
// its prior configuration preserved.
//
// Determinism: the two fixtures (testdata/upgrade/registry-{low,high}.json)
// advertise tags 0.1.1 and 0.1.3 of ghcr.io/stackloklabs/osv-mcp/server, both
// real, pullable releases that run identically. The fixtures are in the
// upstream MCP-registry format that `thv config set-registry` requires for a
// local file (the local provider rejects the legacy ToolHive-native format),
// and they mirror the bundled osv entry's transport (streamable-http) and
// internal target port (8080) exactly so the workload comes up the same way
// `thv run osv` would. osv declares no required env vars, so `upgrade apply
// --yes` never prompts. This runs by default in CI.
Describe("Applying an available upgrade to a registry-sourced workload", func() {
var (
tempDir string
fixtureLow string
fixtureHigh string
)

BeforeEach(func() {
tempDir = GinkgoT().TempDir()

// set-registry persists the path and resolves it later, so it must be
// absolute regardless of the working directory at resolution time.
var err error
fixtureLow, err = filepath.Abs(filepath.Join("testdata", "upgrade", "registry-low.json"))
Expect(err).ToNot(HaveOccurred())
fixtureHigh, err = filepath.Abs(filepath.Join("testdata", "upgrade", "registry-high.json"))
Expect(err).ToNot(HaveOccurred())
Expect(fixtureLow).To(BeAnExistingFile(), "low-tag registry fixture should exist")
Expect(fixtureHigh).To(BeAnExistingFile(), "high-tag registry fixture should exist")
})

It("should pull the candidate, recreate the workload, and preserve config", func() {
By("Pointing thv at the older-tag registry fixture")
thvCmd("config", "set-registry", fixtureLow).ExpectSuccess()

By("Running the server by its registry name with an environment variable")
thvCmd("run", "--name", serverName, "--env", "FOO=bar", upgradeRegistryServerName).ExpectSuccess()

By("Waiting for the server to be running")
waitForIsolatedMCPServer(thvCmd, serverName, 120*time.Second)

By("Confirming the workload is registry-sourced and up to date against the older fixture")
stdout, _ := thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()
var before upgradeCheckResults
Expect(json.Unmarshal([]byte(stdout), &before)).To(Succeed(), "Output should be valid JSON")
Expect(before).To(HaveLen(1))
Expect(before[0].RegistryServer).To(Equal(upgradeRegistryServerName),
"Running by registry name should record the registry server name")
Expect(before[0].CurrentImage).To(Equal(osvImageLow),
"The workload should be running the lower-tag image")
Expect(before[0].Status).To(Equal(upgrade.StatusUpToDate),
"No upgrade should be available against the older-tag fixture")

By("Repointing thv at the registry fixture advertising the newer tag")
thvCmd("config", "set-registry", fixtureHigh).ExpectSuccess()

By("Checking that an upgrade is now available")
stdout, _ = thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()
var avail upgradeCheckResults
Expect(json.Unmarshal([]byte(stdout), &avail)).To(Succeed(), "Output should be valid JSON")
Expect(avail).To(HaveLen(1))
Expect(avail[0].Status).To(Equal(upgrade.StatusUpgradeAvailable),
"An upgrade should be available after repointing to the newer tag")
Expect(avail[0].CandidateImage).To(Equal(osvImageHigh),
"The candidate image should carry the newer tag")

By("Applying the upgrade non-interactively")
thvCmd("upgrade", "apply", serverName, "--yes").ExpectSuccess()

By("Waiting for the upgraded workload to be running again")
waitForIsolatedMCPServer(thvCmd, serverName, 120*time.Second)

By("Verifying the recorded image carries the newer tag and config is preserved")
exportPath := filepath.Join(tempDir, "upgraded-export.json")
thvCmd("export", serverName, exportPath).ExpectSuccess()

fileContent, err := os.ReadFile(exportPath)
Expect(err).ToNot(HaveOccurred())

var runConfig runner.RunConfig
Expect(json.Unmarshal(fileContent, &runConfig)).To(Succeed(), "Export should be valid JSON")
Expect(runConfig.Image).To(Equal(osvImageHigh),
"The upgraded workload should record the newer image tag")
Expect(runConfig.EnvVars).To(HaveKeyWithValue("FOO", "bar"),
"The upgrade should preserve the workload's environment variables")
})
})
})

// waitForIsolatedMCPServer polls `thv list` (through the supplied isolated-env
// command builder) until the named workload reports running, or fails the spec
// on timeout. It mirrors e2e.WaitForMCPServer but runs every poll under the same
// isolated config/home/data env as the rest of the spec, so it observes the
// workload created in that isolated state rather than the real ToolHive config.
func waitForIsolatedMCPServer(thvCmd func(args ...string) *e2e.THVCommand, serverName string, timeout time.Duration) {
GinkgoHelper()
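Eventually(func() bool {
stdout, _, err := thvCmd("list").Run()
if err != nil {
return false
}
// Require the name and the status on the same output line so that another
// workload's "running" status cannot satisfy the check.
for _, line := range strings.Split(stdout, "\n") {
if strings.Contains(line, serverName) && strings.Contains(line, "running") {
return true
}
}
return false
}, timeout, 1*time.Second).Should(BeTrue(),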
"workload %q should be running within %s", serverName, timeout)
}