stacklok · JAORMX · Jun 2, 2026 · Jun 1, 2026
@@ -96,6 +96,32 @@ thv rm my-server
 
 **Implementation**: `pkg/workloads/manager.go`
 
+### Upgrade
+
+```bash
+thv upgrade check [my-server]      # offline metadata comparison, never pulls
+thv upgrade apply my-server --yes  # verify + pull, then recreate
+```
+
+Upgrades apply only to **registry-sourced** workloads: those started by registry entry name, which records the name in `RunConfig.RegistryServerName`. A workload run from a raw image reference has no registry server name and is reported as `not-registry-sourced`.
+
+**Check** is an offline comparison. The checker (`pkg/workloads/upgrade.Checker`) looks up the workload's `RegistryServerName` in the configured registry and compares the running image tag against the candidate the registry advertises. The comparison is semver-based and conservative: a `latest` tag, a different repository, or non-comparable tags yield `unknown`. When the registry advertises a strictly newer tag, the status is `upgrade-available`, and the result also surfaces **env-var drift** (variables the candidate now declares that the workload does not yet supply) and **posture drift** (transport, permission profile). Network isolation is a local-only choice the registry cannot express, so it is not reported as drift. Checks never pull images.
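Annotation (not part of the patch): the conservative comparison described in the Check paragraph can be sketched roughly as follows. This is a minimal illustration, not the code in `pkg/workloads/upgrade`. The helper names (`splitImage`, `parseVersion`, `compareImages`) are hypothetical, the `up-to-date` status string is an assumed value, and treating an older candidate as up to date is also an assumption. It accepts only plain `MAJOR.MINOR.PATCH` tags (optionally prefixed with `v`) and falls back to `unknown` for everything else.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// "upgrade-available" and "unknown" are named in the doc text;
// "up-to-date" is an assumed value used only for this illustration.
const (
	statusUpToDate         = "up-to-date"
	statusUpgradeAvailable = "upgrade-available"
	statusUnknown          = "unknown"
)

// splitImage separates "repo:tag". A colon that appears before the last
// slash belongs to a registry port, so it is not treated as a tag separator.
func splitImage(ref string) (repo, tag string) {
	slash := strings.LastIndex(ref, "/")
	colon := strings.LastIndex(ref, ":")
	if colon <= slash {
		return ref, ""
	}
	return ref[:colon], ref[colon+1:]
}

// parseVersion accepts only MAJOR.MINOR.PATCH with numeric parts.
// Anything else (latest, digests, pre-release suffixes) is not comparable.
func parseVersion(tag string) ([3]uint64, bool) {
	var v [3]uint64
	parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.ParseUint(p, 10, 32)
		if err != nil {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// compareImages reports an upgrade only when the candidate is in the same
// repository and carries a strictly newer version.
func compareImages(current, candidate string) string {
	curRepo, curTag := splitImage(current)
	candRepo, candTag := splitImage(candidate)
	if curRepo != candRepo {
		return statusUnknown
	}
	cur, okCur := parseVersion(curTag)
	cand, okCand := parseVersion(candTag)
	if !okCur || !okCand {
		return statusUnknown
	}
	for i := range cur {
		if cand[i] > cur[i] {
			return statusUpgradeAvailable
		}
		if cand[i] < cur[i] {
			return statusUpToDate // assumption: an older candidate is not an upgrade
		}
	}
	return statusUpToDate
}

func main() {
	const repo = "ghcr.io/stackloklabs/osv-mcp/server"
	fmt.Println(compareImages(repo+":0.1.1", repo+":0.1.3"))  // upgrade-available
	fmt.Println(compareImages(repo+":0.1.3", repo+":0.1.3"))  // up-to-date
	fmt.Println(compareImages(repo+":latest", repo+":0.1.3")) // unknown
}
```

Defaulting to `unknown` whenever either side cannot be parsed keeps the check from ever advertising an upgrade it cannot justify.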
+
+**Apply** is the single security-critical path (`pkg/workloads/upgrade.Applier`). It re-derives the check against the registry on every apply, which closes the time-of-check/time-of-use window: a `CheckResult` from an earlier `check` is never trusted. It then performs the following steps in order:
+
+1. Resolves and **verifies** the candidate image's provenance (by registry server name).
+2. Builds a merged `RunConfig` that **preserves the entire user configuration** — env vars, secrets, OIDC/authz/audit/telemetry, tool filters, middleware, transport/posture — and changes only the image, any merged env/secrets supplied via `--env`/`--secret`, and the registry source URLs.
+3. Runs the policy gate and performs the verified **pull**.
+4. Only then asks the manager to recreate the workload via `UpdateWorkload` (stop → delete → start with the new config).
+
+Steps 1–3 all complete before any destruction, so a failure while preparing the candidate leaves the running workload untouched. **There is no automatic rollback**: once recreation begins, the previous image/config is not restored — recovery is a forward operation. Posture drift (transport, permission profile) is **surfaced as a warning, not converged**: the upgraded workload keeps its full existing posture, including transport, network isolation, and permission profile.
+
+> Runtime boundary: the "verified pull before destruction" guarantee holds precisely for local container runtimes (the scope here). On Kubernetes the verification and policy gate still precede recreation, but the byte-level pull is delegated to the kubelet and happens after recreation.
+
+The same `Applier` backs both the CLI (`thv upgrade apply`) and the API (`POST /api/v1beta/workloads/{name}/upgrade`, with `GET .../{name}/upgrade-check` and `GET .../upgrade-check` for single and bulk checks), so the verify-then-pull ordering lives in exactly one place. `thv list --check-upgrades` annotates the workload list with each workload's check result.
+
+**Implementation**: `pkg/workloads/upgrade/`, `cmd/thv/app/upgrade.go`
+
 ### List
 
 Listing combines container workloads from the runtime with remote workloads from persisted state. The manager can filter workloads by label or group, and can optionally include stopped workloads.

@@ -0,0 +1,31 @@
+{
+  "version": "1.0.0",
+  "meta": {
+    "last_updated": "2026-01-15T10:00:00Z"
+  },
+  "data": {
+    "servers": [
+      {
+        "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
+        "name": "osv-upgrade-test",
+        "title": "OSV (upgrade e2e fixture)",
+        "description": "OSV MCP server fixture for upgrade e2e tests (high tag). Mirrors the bundled osv entry's transport (streamable-http) and image exactly, pinned to the higher tag advertised as the upgrade candidate. The transport url port 8080 is the image-INTERNAL target port (becomes RunConfig.TargetPort); thv still assigns an OS-chosen host proxy port, so there is no fixed host-port clash. Only the image tag differs between this file and registry-low.json.",
+        "version": "1.0.0",
+        "repository": {
+          "url": "https://github.com/StacklokLabs/osv-mcp",
+          "source": "github"
+        },
+        "packages": [
+          {
+            "identifier": "ghcr.io/stackloklabs/osv-mcp/server:0.1.3",
+            "registryType": "oci",
+            "transport": {
+              "type": "streamable-http",
+              "url": "http://localhost:8080"
+            }
+          }
+        ]
+      }
+    ]
+  }
+}
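Annotation (not part of the patch): the fixture's description notes that the port in the transport URL is the image-internal target port. As a rough illustration of which fields carry the image reference and that port, the sketch below decodes a trimmed copy of the fixture. The `registryFile` struct and `firstPackage` helper are hypothetical and mirror only the fields visible in the fixture. How `thv` itself maps these fields onto `RunConfig` is not shown in this diff.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/url"
)

// registryFile models only the fixture fields this sketch reads.
type registryFile struct {
	Data struct {
		Servers []struct {
			Name     string `json:"name"`
			Packages []struct {
				Identifier string `json:"identifier"`
				Transport  struct {
					Type string `json:"type"`
					URL  string `json:"url"`
				} `json:"transport"`
			} `json:"packages"`
		} `json:"servers"`
	} `json:"data"`
}

// fixture is a trimmed copy of the high-tag registry file.
const fixture = `{
  "data": {
    "servers": [
      {
        "name": "osv-upgrade-test",
        "packages": [
          {
            "identifier": "ghcr.io/stackloklabs/osv-mcp/server:0.1.3",
            "registryType": "oci",
            "transport": {"type": "streamable-http", "url": "http://localhost:8080"}
          }
        ]
      }
    ]
  }
}`

// firstPackage returns the image reference and the port from the transport
// URL of the first package of the first server.
func firstPackage(raw []byte) (image, port string, err error) {
	var f registryFile
	if err := json.Unmarshal(raw, &f); err != nil {
		return "", "", err
	}
	if len(f.Data.Servers) == 0 || len(f.Data.Servers[0].Packages) == 0 {
		return "", "", errors.New("registry file has no packages")
	}
	p := f.Data.Servers[0].Packages[0]
	u, err := url.Parse(p.Transport.URL)
	if err != nil {
		return "", "", err
	}
	return p.Identifier, u.Port(), nil
}

func main() {
	image, port, err := firstPackage([]byte(fixture))
	if err != nil {
		panic(err)
	}
	fmt.Println(image) // ghcr.io/stackloklabs/osv-mcp/server:0.1.3
	fmt.Println(port)  // 8080
}
```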
@@ -0,0 +1,31 @@
+{
+  "version": "1.0.0",
+  "meta": {
+    "last_updated": "2026-01-15T10:00:00Z"
+  },
+  "data": {
+    "servers": [
+      {
+        "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
+        "name": "osv-upgrade-test",
+        "title": "OSV (upgrade e2e fixture)",
+        "description": "OSV MCP server fixture for upgrade e2e tests (low tag). Mirrors the bundled osv entry's transport (streamable-http) and image exactly, pinned to the lower tag so the upgrade checker reports an available upgrade once repointed to registry-high.json. The transport url port 8080 is the image-INTERNAL target port (becomes RunConfig.TargetPort); thv still assigns an OS-chosen host proxy port, so there is no fixed host-port clash. Only the image tag differs between this file and registry-high.json.",
+        "version": "1.0.0",
+        "repository": {
+          "url": "https://github.com/StacklokLabs/osv-mcp",
+          "source": "github"
+        },
+        "packages": [
+          {
+            "identifier": "ghcr.io/stackloklabs/osv-mcp/server:0.1.1",
+            "registryType": "oci",
+            "transport": {
+              "type": "streamable-http",
+              "url": "http://localhost:8080"
+            }
+          }
+        ]
+      }
+    ]
+  }
+}
@@ -0,0 +1,258 @@
+// SPDX-FileCopyrightText: Copyright 2026 Stacklok, Inc.
+// SPDX-License-Identifier: Apache-2.0
+
+package e2e_test
+
+import (
+	"encoding/json"
+	"os"
+	"path/filepath"
+	"strings"
+	"time"
+
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+
+	"github.com/stacklok/toolhive/pkg/runner"
+	"github.com/stacklok/toolhive/pkg/workloads/upgrade"
+	"github.com/stacklok/toolhive/test/e2e"
+)
+
+const (
+	// rawOSVImage is the concrete image the bundled "osv" registry entry resolves
+	// to. Running it by its raw reference (rather than by a registry name)
+	// produces a working MCP server whose RunConfig has no RegistryServerName,
+	// which is exactly the input the "not-registry-sourced" path needs.
+	rawOSVImage = "ghcr.io/stackloklabs/osv-mcp/server:0.1.3"
+
+	// osvImageLow and osvImageHigh are two real, pullable osv-mcp releases. The
+	// upgrade-flow fixtures advertise these tags so the checker reports an
+	// available upgrade from low to high. Both tags run identically (only a patch
+	// version bump), and osv declares no required env vars or secrets, so
+	// `upgrade apply --yes` runs non-interactively without prompting.
+	osvImageLow  = "ghcr.io/stackloklabs/osv-mcp/server:0.1.1"
+	osvImageHigh = "ghcr.io/stackloklabs/osv-mcp/server:0.1.3"
+
+	// upgradeRegistryServerName is the server name the fixtures expose (a bare
+	// "name": "osv-upgrade-test", so the recorded RegistryServerName matches it
+	// exactly). `thv run` is invoked with this name so RegistryServerName is set.
+	upgradeRegistryServerName = "osv-upgrade-test"
+)
+
+// upgradeCheckResults mirrors the JSON emitted by `thv upgrade check --format
+// json`, which is an array of upgrade.CheckResult. We decode into the real type
+// so the test breaks if the wire shape changes.
+type upgradeCheckResults []upgrade.CheckResult
+
+var _ = Describe("Upgrade Command", Label("core", "upgrade", "e2e"), func() {
+	var (
+		config     *e2e.TestConfig
+		serverName string
+
+		// tempHome / tempData isolate this spec's ToolHive config and workload
+		// state from the developer's / CI runner's real ToolHive directories.
+		// `thv config set-registry` writes to $XDG_CONFIG_HOME/toolhive, and
+		// workload state lives under $XDG_DATA_HOME/toolhive; routing every thv
+		// invocation (run, check, apply, export, set/unset-registry, and cleanup)
+		// through thvCmd keeps both off the real config and consistent across the
+		// run -> check -> apply -> export -> cleanup sequence so they all see the
+		// same workload. The container runtime (Docker/Podman socket) is not
+		// HOME-dependent, so an isolated HOME does not affect it; PATH and the rest
+		// of the environment are inherited unchanged.
+		tempHome string
+		tempData string
+
+		// thvCmd builds a THVCommand bound to the isolated config/home/data dirs.
+		thvCmd func(args ...string) *e2e.THVCommand
+	)
+
+	BeforeEach(func() {
+		config = e2e.NewTestConfig()
+		serverName = e2e.GenerateUniqueServerName("upgrade-test")
+		tempHome = GinkgoT().TempDir()
+		tempData = GinkgoT().TempDir()
+
+		thvCmd = func(args ...string) *e2e.THVCommand {
+			return e2e.NewTHVCommand(config, args...).
+				WithEnv(
+					"XDG_CONFIG_HOME="+tempHome,
+					"HOME="+tempHome,
+					"XDG_DATA_HOME="+tempData,
+				)
+		}
+
+		err := e2e.CheckTHVBinaryAvailable(config)
+		Expect(err).ToNot(HaveOccurred(), "thv binary should be available")
+	})
+
+	AfterEach(func() {
+		if config.CleanupAfter {
+			// Stop and remove the workload using the SAME isolated env so cleanup
+			// targets the workload created by this spec. Tolerate a missing
+			// workload (a spec may have failed before creating it).
+			_, _, _ = thvCmd("stop", serverName).Run()
+			_, _, _ = thvCmd("rm", serverName).Run()
+		}
+		// The registry config lives under the discarded temp dirs, so there is
+		// nothing to restore; the real ToolHive config is never touched.
+	})
+
+	// Negative / cheap path: a workload created from a raw image reference is not
+	// registry-sourced, so no upgrade can ever be determined for it. This needs
+	// no registry fixture and is the safety-net coverage.
+	Describe("Checking a workload that is not registry-sourced", func() {
+		Context("when the workload was run from a raw image reference", func() {
+			It("should report not-registry-sourced in text output", func() {
+				By("Running an MCP server from a raw image reference")
+				thvCmd("run", "--name", serverName, rawOSVImage).ExpectSuccess()
+
+				By("Waiting for the server to be running")
+				waitForIsolatedMCPServer(thvCmd, serverName, 60*time.Second)
+
+				By("Checking the workload for an available upgrade")
+				stdout, _ := thvCmd("upgrade", "check", serverName).ExpectSuccess()
+
+				By("Verifying the report says the workload is not registry-sourced")
+				Expect(stdout).To(ContainSubstring(serverName), "Report should name the workload")
+				Expect(stdout).To(ContainSubstring(string(upgrade.StatusNotRegistrySourced)),
+					"Report should show the not-registry-sourced status")
+			})
+
+			It("should emit parseable JSON with the expected fields", func() {
+				By("Running an MCP server from a raw image reference")
+				thvCmd("run", "--name", serverName, rawOSVImage).ExpectSuccess()
+
+				By("Waiting for the server to be running")
+				waitForIsolatedMCPServer(thvCmd, serverName, 60*time.Second)
+
+				By("Checking the workload for an available upgrade in JSON format")
+				stdout, _ := thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()
+
+				By("Verifying the JSON output is valid and contains the expected fields")
+				var results upgradeCheckResults
+				err := json.Unmarshal([]byte(stdout), &results)
+				Expect(err).ToNot(HaveOccurred(), "Output should be valid JSON")
+				Expect(results).To(HaveLen(1), "JSON should contain exactly one result")
+
+				result := results[0]
+				Expect(result.WorkloadName).To(Equal(serverName), "JSON should name the workload")
+				Expect(result.Status).To(Equal(upgrade.StatusNotRegistrySourced),
+					"JSON should show the not-registry-sourced status")
+				Expect(result.CurrentImage).To(Equal(rawOSVImage),
+					"JSON should record the raw image the workload is running")
+				Expect(result.RegistryServer).To(BeEmpty(),
+					"A raw-image workload should not have a registry server name")
+				Expect(result.CandidateImage).To(BeEmpty(),
+					"There should be no candidate image when no upgrade can be determined")
+			})
+		})
+	})
+
+	// Full upgrade flow: a registry-sourced workload whose registry advertises a
+	// newer tag is upgraded in place, and we verify it runs on the new image with
+	// its prior configuration preserved.
+	//
+	// Determinism: the two fixtures (testdata/upgrade/registry-{low,high}.json)
+	// advertise tags 0.1.1 and 0.1.3 of ghcr.io/stackloklabs/osv-mcp/server, both
+	// real, pullable releases that run identically. The fixtures are in the
+	// upstream MCP-registry format that `thv config set-registry` requires for a
+	// local file (the local provider rejects the legacy ToolHive-native format),
+	// and they mirror the bundled osv entry's transport (streamable-http) and
+	// internal target port (8080) exactly so the workload comes up the same way
+	// `thv run osv` would. osv declares no required env vars, so `upgrade apply
+	// --yes` never prompts. This runs by default in CI.
+	Describe("Applying an available upgrade to a registry-sourced workload", func() {
+		var (
+			tempDir     string
+			fixtureLow  string
+			fixtureHigh string
+		)
+
+		BeforeEach(func() {
+			tempDir = GinkgoT().TempDir()
+
+			// set-registry persists the path and resolves it later, so it must be
+			// absolute regardless of the working directory at resolution time.
+			var err error
+			fixtureLow, err = filepath.Abs(filepath.Join("testdata", "upgrade", "registry-low.json"))
+			Expect(err).ToNot(HaveOccurred())
+			fixtureHigh, err = filepath.Abs(filepath.Join("testdata", "upgrade", "registry-high.json"))
+			Expect(err).ToNot(HaveOccurred())
+			Expect(fixtureLow).To(BeAnExistingFile(), "low-tag registry fixture should exist")
+			Expect(fixtureHigh).To(BeAnExistingFile(), "high-tag registry fixture should exist")
+		})
+
+		It("should pull the candidate, recreate the workload, and preserve config", func() {
+			By("Pointing thv at the older-tag registry fixture")
+			thvCmd("config", "set-registry", fixtureLow).ExpectSuccess()
+
+			By("Running the server by its registry name with an environment variable")
+			thvCmd("run", "--name", serverName, "--env", "FOO=bar", upgradeRegistryServerName).ExpectSuccess()
+
+			By("Waiting for the server to be running")
+			waitForIsolatedMCPServer(thvCmd, serverName, 120*time.Second)
+
+			By("Confirming the workload is registry-sourced and up to date against the older fixture")
+			stdout, _ := thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()
+			var before upgradeCheckResults
+			Expect(json.Unmarshal([]byte(stdout), &before)).To(Succeed(), "Output should be valid JSON")
+			Expect(before).To(HaveLen(1))
+			Expect(before[0].RegistryServer).To(Equal(upgradeRegistryServerName),
+				"Running by registry name should record the registry server name")
+			Expect(before[0].CurrentImage).To(Equal(osvImageLow),
+				"The workload should be running the lower-tag image")
+			Expect(before[0].Status).To(Equal(upgrade.StatusUpToDate),
+				"No upgrade should be available against the older-tag fixture")
+
+			By("Repointing thv at the registry fixture advertising the newer tag")
+			thvCmd("config", "set-registry", fixtureHigh).ExpectSuccess()
+
+			By("Checking that an upgrade is now available")
+			stdout, _ = thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()
+			var avail upgradeCheckResults
+			Expect(json.Unmarshal([]byte(stdout), &avail)).To(Succeed(), "Output should be valid JSON")
+			Expect(avail).To(HaveLen(1))
+			Expect(avail[0].Status).To(Equal(upgrade.StatusUpgradeAvailable),
+				"An upgrade should be available after repointing to the newer tag")
+			Expect(avail[0].CandidateImage).To(Equal(osvImageHigh),
+				"The candidate image should carry the newer tag")
+
+			By("Applying the upgrade non-interactively")
+			thvCmd("upgrade", "apply", serverName, "--yes").ExpectSuccess()
+
+			By("Waiting for the upgraded workload to be running again")
+			waitForIsolatedMCPServer(thvCmd, serverName, 120*time.Second)
+
+			By("Verifying the recorded image carries the newer tag and config is preserved")
+			exportPath := filepath.Join(tempDir, "upgraded-export.json")
+			thvCmd("export", serverName, exportPath).ExpectSuccess()
+
+			fileContent, err := os.ReadFile(exportPath)
+			Expect(err).ToNot(HaveOccurred())
+
+			var runConfig runner.RunConfig
+			Expect(json.Unmarshal(fileContent, &runConfig)).To(Succeed(), "Export should be valid JSON")
+			Expect(runConfig.Image).To(Equal(osvImageHigh),
+				"The upgraded workload should record the newer image tag")
+			Expect(runConfig.EnvVars).To(HaveKeyWithValue("FOO", "bar"),
+				"The upgrade should preserve the workload's environment variables")
+		})
+	})
+})
+
+// waitForIsolatedMCPServer polls `thv list` (through the supplied isolated-env
+// command builder) until the named workload reports running, or fails the spec
+// on timeout. It mirrors e2e.WaitForMCPServer but runs every poll under the same
+// isolated config/home/data env as the rest of the spec, so it observes the
+// workload created in that isolated state rather than the real ToolHive config.
+func waitForIsolatedMCPServer(thvCmd func(args ...string) *e2e.THVCommand, serverName string, timeout time.Duration) {
+	GinkgoHelper()
+	Eventually(func() bool {
+		stdout, _, err := thvCmd("list").Run()
+		if err != nil {
+			return false
+		}
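+		// Require the name and the status on the same row so that another
+		// workload's "running" row cannot satisfy the check.
+		for _, line := range strings.Split(stdout, "\n") {
+			if strings.Contains(line, serverName) && strings.Contains(line, "running") {
+				return true
+			}
+		}
+		return false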
+	}, timeout, 1*time.Second).Should(BeTrue(),
+		"workload %q should be running within %s", serverName, timeout)
+}