Add e2e coverage and lifecycle docs for upgrades (#5412)

JAORMX · claude · web-flow · commit 12fcd15769bd · 2026-06-02T18:56:41.000+03:00
* Add Applier for upgrading workloads in place

Detecting an available upgrade is only useful if users can apply it
while keeping their configuration. Add the apply path that the CLI and
API will drive.

Add upgrade.Applier: it reloads the workload's saved config, re-runs the
check on fresh state (so a stale result can never drive an apply),
resolves the candidate from the registry, and rebuilds the run config
preserving the full user configuration — auth, authz, audit, telemetry,
tools filters, volumes, secrets, ports, permission profile, and more —
changing only the image, merged env/secrets, and re-resolved registry
URLs. New required env vars surface through the injected validator.

Crucially, the candidate image is verified and pulled (and the policy
gate runs) before the destructive stop/delete/start, so a missing or
unverifiable image leaves the running workload untouched — there is no
rollback once UpdateWorkload begins. Verification uses the same path as
thv run.
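
The ordering described above can be sketched as follows. This is a
minimal illustration, not the Applier's actual code: `step`,
`applyUpgrade`, and `simulate` are hypothetical names, and the real
verify, pull, and recreate stages are replaced by stubs.

```go
package main

import (
	"errors"
	"fmt"
)

// step is a hypothetical stand-in for one stage of the apply path
// (verify, policy gate, pull, or the destructive recreate).
type step func() error

// applyUpgrade runs every preparatory step in order and calls the
// destructive recreate step only if all of them succeeded.
func applyUpgrade(prepare []step, recreate step) error {
	for i, s := range prepare {
		if err := s(); err != nil {
			return fmt.Errorf("prepare step %d failed, workload untouched: %w", i+1, err)
		}
	}
	// Past this point there is no rollback in this sketch either.
	return recreate()
}

// simulate reports whether the destructive step ran for a given
// verification outcome.
func simulate(verifyOK bool) (recreated bool, err error) {
	verify := func() error {
		if !verifyOK {
			return errors.New("image verification failed")
		}
		return nil
	}
	pull := func() error { return nil }
	recreate := func() error {
		recreated = true
		return nil
	}
	err = applyUpgrade([]step{verify, pull}, recreate)
	return recreated, err
}

func main() {
	for _, ok := range []bool{false, true} {
		recreated, err := simulate(ok)
		fmt.Printf("verifyOK=%v recreated=%v err=%v\n", ok, recreated, err)
	}
}
```

Keeping the destructive call as the last statement makes the
"failure leaves the workload untouched" property easy to see in review.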

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add upgrade apply for the CLI and API

With the Applier in place, expose it to users. This lets CLI users and
API clients apply an upgrade while preserving their configuration,
instead of manually re-running a workload with a new image.

Add a thv upgrade apply <name> command. It runs the check, shows the
candidate image, new env vars, and any permission/transport/network
posture drift, then prompts for confirmation. --dry-run reports the plan
without applying; --env/--secret supply values for newly required
variables; --yes (or a non-interactive shell) skips the prompt and fails
loudly on missing required values; --image-verification mirrors thv run.

Add POST /api/v1beta/workloads/{name}/upgrade, delegating to the same
Applier so all clients share one apply path. The API path is always
non-interactive (detached validator) and sources image verification from
server config; the request body can only supply env/secret values, never
redirect the image or weaken verification. Apply failures return a
sanitized 422 with the detailed cause logged server-side, so secret
references in an error chain are never echoed to the caller.
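
The sanitized-422 behaviour can be sketched with the standard library
alone. This is an assumption-laden illustration rather than the real
handler: `upgradeHandler`, `demo`, the response text, and the success
status are all hypothetical, and only the "log the cause, return a
generic 422" pattern comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// upgradeHandler wraps a hypothetical apply function. On failure it logs
// the detailed cause server-side and answers with a fixed, generic 422
// body, so nothing from the error chain reaches the caller.
func upgradeHandler(apply func() error) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		if err := apply(); err != nil {
			log.Printf("upgrade apply failed: %v", err)
			http.Error(w, "upgrade could not be applied", http.StatusUnprocessableEntity)
			return
		}
		// Placeholder success status; the real API's response is not shown here.
		w.WriteHeader(http.StatusOK)
	}
}

// demo sends one request whose apply step fails with an error that
// mentions a secret reference, and reports the status code and whether
// that reference leaked into the response body.
func demo() (status int, leaked bool) {
	apply := func() error {
		return errors.New("resolve secret ref db-password: not found")
	}
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/api/v1beta/workloads/demo/upgrade", nil)
	upgradeHandler(apply).ServeHTTP(rec, req)
	return rec.Code, strings.Contains(rec.Body.String(), "db-password")
}

func main() {
	status, leaked := demo()
	fmt.Printf("status=%d leaked=%v\n", status, leaked)
}
```

Writing a constant string to the response, instead of formatting the
error into it, is what keeps the sanitization robust as error chains
grow.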

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add e2e coverage and lifecycle docs for upgrades

The upgrade flow spans registry lookup, image verification, and a
destructive workload recreate, so it needs end-to-end coverage to catch
regressions the unit tests cannot.

Add an e2e test that exercises the full flow against real osv-mcp image
tags via custom registry fixtures: a workload run from the 0.1.1 entry
reports up-to-date, repointing the registry to 0.1.3 makes the check
report an available upgrade, and apply recreates the workload on the new
image while preserving a user-set env var. A negative spec confirms a
raw-image workload reports not-registry-sourced. All thv invocations run
under an isolated config/home/data dir so the suite never touches the
developer's real registry configuration.
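
The check outcomes these specs expect can be approximated with a small
tag comparison. This is a simplified sketch, not the Checker's code:
`splitRef`, `parseVersion`, and `compare` are hypothetical helpers, only
plain MAJOR.MINOR.PATCH tags are handled, the `up-to-date` status string
is assumed, and treating an older candidate as up to date is also an
assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitRef splits "repo:tag" at the last colon that follows the last
// slash, so a registry port such as "localhost:5000/img" is not a tag.
func splitRef(ref string) (repo, tag string, ok bool) {
	i := strings.LastIndex(ref, ":")
	if i < 0 || i < strings.LastIndex(ref, "/") {
		return ref, "", false
	}
	return ref[:i], ref[i+1:], true
}

// parseVersion accepts only numeric MAJOR.MINOR.PATCH tags (optional
// leading "v"); anything else, including "latest", is not comparable.
func parseVersion(tag string) ([3]int, bool) {
	var v [3]int
	parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// compare returns "upgrade-available" only for a strictly newer tag in
// the same repository, and "unknown" whenever the tags cannot be compared.
func compare(current, candidate string) string {
	curRepo, curTag, ok1 := splitRef(current)
	candRepo, candTag, ok2 := splitRef(candidate)
	if !ok1 || !ok2 || curRepo != candRepo {
		return "unknown"
	}
	cur, ok1 := parseVersion(curTag)
	cand, ok2 := parseVersion(candTag)
	if !ok1 || !ok2 {
		return "unknown"
	}
	for i := range cur {
		if cand[i] > cur[i] {
			return "upgrade-available"
		}
		if cand[i] < cur[i] {
			break
		}
	}
	return "up-to-date"
}

func main() {
	const repo = "ghcr.io/stackloklabs/osv-mcp/server"
	fmt.Println(compare(repo+":0.1.1", repo+":0.1.1"))
	fmt.Println(compare(repo+":0.1.1", repo+":0.1.3"))
	fmt.Println(compare(repo+":latest", repo+":0.1.3"))
}
```

Returning `unknown` for anything that does not parse keeps the check
conservative: it never claims an upgrade it cannot justify.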

Document the upgrade state transition in the workloads lifecycle
architecture doc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diff --git a/docs/arch/08-workloads-lifecycle.md b/docs/arch/08-workloads-lifecycle.md
@@ -96,6 +96,32 @@ thv rm my-server
 
 **Implementation**: `pkg/workloads/manager.go`
 
+### Upgrade
+
+```bash
+thv upgrade check [my-server]      # offline metadata comparison, never pulls
+thv upgrade apply my-server --yes  # verify + pull, then recreate
+```
+
+Upgrades apply only to **registry-sourced** workloads — those run by a registry entry name, which records `RunConfig.RegistryServerName`. A workload run from a raw image reference has no registry server name and is reported as `not-registry-sourced`.
+
+**Check** is an offline comparison. The checker (`pkg/workloads/upgrade.Checker`) looks up the workload's `RegistryServerName` in the configured registry and compares the running image tag against the candidate the registry advertises (semver, conservative: a `latest` tag, a different repository, or non-comparable tags yield `unknown`). When the registry advertises a strictly newer tag the status is `upgrade-available`, and the result also surfaces **env-var drift** (variables the candidate now declares that the workload does not yet supply) and **posture drift** (transport, permission profile — network isolation is a local-only choice the registry cannot express, so it is not reported as drift). Checks never pull images.
+
+**Apply** is the single security-critical path (`pkg/workloads/upgrade.Applier`). It re-derives the check against the registry on every apply (closing the time-of-check/time-of-use window — a `CheckResult` from an earlier `check` is never trusted), then, in order:
+
+1. Resolves and **verifies** the candidate image's provenance (by registry server name).
+2. Builds a merged `RunConfig` that **preserves the entire user configuration** — env vars, secrets, OIDC/authz/audit/telemetry, tool filters, middleware, transport/posture — and changes only the image, any merged env/secrets supplied via `--env`/`--secret`, and the registry source URLs.
+3. Runs the policy gate and performs the verified **pull**.
+4. Only then asks the manager to recreate the workload via `UpdateWorkload` (stop → delete → start with the new config).
+
+Steps 1–3 all complete before any destruction, so a failure while preparing the candidate leaves the running workload untouched. **There is no automatic rollback**: once recreation begins, the previous image/config is not restored — recovery is a forward operation. Posture drift (transport, permission profile) is **surfaced as a warning, not converged**: the upgraded workload keeps its full existing posture, including transport, network isolation, and permission profile.
+
+> Runtime boundary: the "verified pull before destruction" guarantee holds precisely for local container runtimes (the scope here). On Kubernetes the verification and policy gate still precede recreation, but the byte-level pull is delegated to the kubelet and happens after recreation.
+
+The same `Applier` backs both the CLI (`thv upgrade apply`) and the API (`POST /api/v1beta/workloads/{name}/upgrade`, with `GET .../{name}/upgrade-check` and `GET .../upgrade-check` for single and bulk checks), so the verify-then-pull ordering lives in exactly one place. `thv list --check-upgrades` annotates the workload list with each workload's check result.
+
+**Implementation**: `pkg/workloads/upgrade/`, `cmd/thv/app/upgrade.go`
+
 ### List
 
 Listing combines container workloads from the runtime with remote workloads from persisted state. The manager can filter workloads by label or group, and can optionally include stopped workloads.
diff --git a/test/e2e/testdata/upgrade/registry-high.json b/test/e2e/testdata/upgrade/registry-high.json
@@ -0,0 +1,31 @@
+{
+  "version": "1.0.0",
+  "meta": {
+    "last_updated": "2026-01-15T10:00:00Z"
+  },
+  "data": {
+    "servers": [
+      {
+        "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
+        "name": "osv-upgrade-test",
+        "title": "OSV (upgrade e2e fixture)",
+        "description": "OSV MCP server fixture for upgrade e2e tests (high tag). Mirrors the bundled osv entry's transport (streamable-http) and image exactly, pinned to the higher tag advertised as the upgrade candidate. The transport url port 8080 is the image-INTERNAL target port (becomes RunConfig.TargetPort); thv still assigns an OS-chosen host proxy port, so there is no fixed host-port clash. Only the image tag differs between this file and registry-low.json.",
+        "version": "1.0.0",
+        "repository": {
+          "url": "https://github.com/StacklokLabs/osv-mcp",
+          "source": "github"
+        },
+        "packages": [
+          {
+            "identifier": "ghcr.io/stackloklabs/osv-mcp/server:0.1.3",
+            "registryType": "oci",
+            "transport": {
+              "type": "streamable-http",
+              "url": "http://localhost:8080"
+            }
+          }
+        ]
+      }
+    ]
+  }
+}
diff --git a/test/e2e/testdata/upgrade/registry-low.json b/test/e2e/testdata/upgrade/registry-low.json
@@ -0,0 +1,31 @@
+{
+  "version": "1.0.0",
+  "meta": {
+    "last_updated": "2026-01-15T10:00:00Z"
+  },
+  "data": {
+    "servers": [
+      {
+        "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
+        "name": "osv-upgrade-test",
+        "title": "OSV (upgrade e2e fixture)",
+        "description": "OSV MCP server fixture for upgrade e2e tests (low tag). Mirrors the bundled osv entry's transport (streamable-http) and image exactly, pinned to the lower tag so the upgrade checker reports an available upgrade once repointed to registry-high.json. The transport url port 8080 is the image-INTERNAL target port (becomes RunConfig.TargetPort); thv still assigns an OS-chosen host proxy port, so there is no fixed host-port clash. Only the image tag differs between this file and registry-high.json.",
+        "version": "1.0.0",
+        "repository": {
+          "url": "https://github.com/StacklokLabs/osv-mcp",
+          "source": "github"
+        },
+        "packages": [
+          {
+            "identifier": "ghcr.io/stackloklabs/osv-mcp/server:0.1.1",
+            "registryType": "oci",
+            "transport": {
+              "type": "streamable-http",
+              "url": "http://localhost:8080"
+            }
+          }
+        ]
+      }
+    ]
+  }
+}
diff --git a/test/e2e/upgrade_e2e_test.go b/test/e2e/upgrade_e2e_test.go
@@ -0,0 +1,258 @@
+// SPDX-FileCopyrightText: Copyright 2026 Stacklok, Inc.
+// SPDX-License-Identifier: Apache-2.0
+
+package e2e_test
+
+import (
+	"encoding/json"
+	"os"
+	"path/filepath"
+	"strings"
+	"time"
+
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+
+	"github.com/stacklok/toolhive/pkg/runner"
+	"github.com/stacklok/toolhive/pkg/workloads/upgrade"
+	"github.com/stacklok/toolhive/test/e2e"
+)
+
+const (
+	// rawOSVImage is the concrete image the bundled "osv" registry entry resolves
+	// to. Running it by its raw reference (rather than by a registry name)
+	// produces a working MCP server whose RunConfig has no RegistryServerName,
+	// which is exactly the input the "not-registry-sourced" path needs.
+	rawOSVImage = "ghcr.io/stackloklabs/osv-mcp/server:0.1.3"
+
+	// osvImageLow and osvImageHigh are two real, pullable osv-mcp releases. The
+	// upgrade-flow fixtures advertise these tags so the checker reports an
+	// available upgrade from low to high. Both tags run identically (only a
+	// patch-level version bump), and osv declares no required env vars or secrets, so
+	// `upgrade apply --yes` runs non-interactively without prompting.
+	osvImageLow  = "ghcr.io/stackloklabs/osv-mcp/server:0.1.1"
+	osvImageHigh = "ghcr.io/stackloklabs/osv-mcp/server:0.1.3"
+
+	// upgradeRegistryServerName is the server name the fixtures expose (a bare
+	// "name": "osv-upgrade-test", so the recorded RegistryServerName matches it
+	// exactly). `thv run` is invoked with this name so RegistryServerName is set.
+	upgradeRegistryServerName = "osv-upgrade-test"
+)
+
+// upgradeCheckResults mirrors the JSON emitted by `thv upgrade check --format
+// json`, which is an array of upgrade.CheckResult. We decode into the real type
+// so the test breaks if the wire shape changes.
+type upgradeCheckResults []upgrade.CheckResult
+
+var _ = Describe("Upgrade Command", Label("core", "upgrade", "e2e"), func() {
+	var (
+		config     *e2e.TestConfig
+		serverName string
+
+		// tempHome / tempData isolate this spec's ToolHive config and workload
+		// state from the developer's / CI runner's real ToolHive directories.
+		// `thv config set-registry` writes to $XDG_CONFIG_HOME/toolhive, and
+		// workload state lives under $XDG_DATA_HOME/toolhive; routing every thv
+		// invocation (run, check, apply, export, set/unset-registry, and cleanup)
+		// through thvCmd keeps both off the real config and consistent across the
+		// run -> check -> apply -> export -> cleanup sequence so they all see the
+		// same workload. The container runtime (Docker/Podman socket) is not
+		// HOME-dependent, so an isolated HOME does not affect it; PATH and the rest
+		// of the environment are inherited unchanged.
+		tempHome string
+		tempData string
+
+		// thvCmd builds a THVCommand bound to the isolated config/home/data dirs.
+		thvCmd func(args ...string) *e2e.THVCommand
+	)
+
+	BeforeEach(func() {
+		config = e2e.NewTestConfig()
+		serverName = e2e.GenerateUniqueServerName("upgrade-test")
+		tempHome = GinkgoT().TempDir()
+		tempData = GinkgoT().TempDir()
+
+		thvCmd = func(args ...string) *e2e.THVCommand {
+			return e2e.NewTHVCommand(config, args...).
+				WithEnv(
+					"XDG_CONFIG_HOME="+tempHome,
+					"HOME="+tempHome,
+					"XDG_DATA_HOME="+tempData,
+				)
+		}
+
+		err := e2e.CheckTHVBinaryAvailable(config)
+		Expect(err).ToNot(HaveOccurred(), "thv binary should be available")
+	})
+
+	AfterEach(func() {
+		if config.CleanupAfter {
+			// Stop and remove the workload using the SAME isolated env so cleanup
+			// targets the workload created by this spec. Tolerate a missing
+			// workload (a spec may have failed before creating it).
+			_, _, _ = thvCmd("stop", serverName).Run()
+			_, _, _ = thvCmd("rm", serverName).Run()
+		}
+		// The registry config lives under the discarded temp dirs, so there is
+		// nothing to restore; the real ToolHive config is never touched.
+	})
+
+	// Negative / cheap path: a workload created from a raw image reference is not
+	// registry-sourced, so no upgrade can ever be determined for it. This needs
+	// no registry fixture and is the safety-net coverage.
+	Describe("Checking a workload that is not registry-sourced", func() {
+		Context("when the workload was run from a raw image reference", func() {
+			It("should report not-registry-sourced in text output", func() {
+				By("Running an MCP server from a raw image reference")
+				thvCmd("run", "--name", serverName, rawOSVImage).ExpectSuccess()
+
+				By("Waiting for the server to be running")
+				waitForIsolatedMCPServer(thvCmd, serverName, 60*time.Second)
+
+				By("Checking the workload for an available upgrade")
+				stdout, _ := thvCmd("upgrade", "check", serverName).ExpectSuccess()
+
+				By("Verifying the report says the workload is not registry-sourced")
+				Expect(stdout).To(ContainSubstring(serverName), "Report should name the workload")
+				Expect(stdout).To(ContainSubstring(string(upgrade.StatusNotRegistrySourced)),
+					"Report should show the not-registry-sourced status")
+			})
+
+			It("should emit parseable JSON with the expected fields", func() {
+				By("Running an MCP server from a raw image reference")
+				thvCmd("run", "--name", serverName, rawOSVImage).ExpectSuccess()
+
+				By("Waiting for the server to be running")
+				waitForIsolatedMCPServer(thvCmd, serverName, 60*time.Second)
+
+				By("Checking the workload for an available upgrade in JSON format")
+				stdout, _ := thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()
+
+				By("Verifying the JSON output is valid and contains the expected fields")
+				var results upgradeCheckResults
+				err := json.Unmarshal([]byte(stdout), &results)
+				Expect(err).ToNot(HaveOccurred(), "Output should be valid JSON")
+				Expect(results).To(HaveLen(1), "JSON should contain exactly one result")
+
+				result := results[0]
+				Expect(result.WorkloadName).To(Equal(serverName), "JSON should name the workload")
+				Expect(result.Status).To(Equal(upgrade.StatusNotRegistrySourced),
+					"JSON should show the not-registry-sourced status")
+				Expect(result.CurrentImage).To(Equal(rawOSVImage),
+					"JSON should record the raw image the workload is running")
+				Expect(result.RegistryServer).To(BeEmpty(),
+					"A raw-image workload should not have a registry server name")
+				Expect(result.CandidateImage).To(BeEmpty(),
+					"There should be no candidate image when no upgrade can be determined")
+			})
+		})
+	})
+
+	// Full upgrade flow: a registry-sourced workload whose registry advertises a
+	// newer tag is upgraded in place, and we verify it runs on the new image with
+	// its prior configuration preserved.
+	//
+	// Determinism: the two fixtures (testdata/upgrade/registry-{low,high}.json)
+	// advertise tags 0.1.1 and 0.1.3 of ghcr.io/stackloklabs/osv-mcp/server, both
+	// real, pullable releases that run identically. The fixtures are in the
+	// upstream MCP-registry format that `thv config set-registry` requires for a
+	// local file (the local provider rejects the legacy ToolHive-native format),
+	// and they mirror the bundled osv entry's transport (streamable-http) and
+	// internal target port (8080) exactly so the workload comes up the same way
+	// `thv run osv` would. osv declares no required env vars, so `upgrade apply
+	// --yes` never prompts. This runs by default in CI.
+	Describe("Applying an available upgrade to a registry-sourced workload", func() {
+		var (
+			tempDir     string
+			fixtureLow  string
+			fixtureHigh string
+		)
+
+		BeforeEach(func() {
+			tempDir = GinkgoT().TempDir()
+
+			// set-registry persists the path and resolves it later, so it must be
+			// absolute regardless of the working directory at resolution time.
+			var err error
+			fixtureLow, err = filepath.Abs(filepath.Join("testdata", "upgrade", "registry-low.json"))
+			Expect(err).ToNot(HaveOccurred())
+			fixtureHigh, err = filepath.Abs(filepath.Join("testdata", "upgrade", "registry-high.json"))
+			Expect(err).ToNot(HaveOccurred())
+			Expect(fixtureLow).To(BeAnExistingFile(), "low-tag registry fixture should exist")
+			Expect(fixtureHigh).To(BeAnExistingFile(), "high-tag registry fixture should exist")
+		})
+
+		It("should pull the candidate, recreate the workload, and preserve config", func() {
+			By("Pointing thv at the older-tag registry fixture")
+			thvCmd("config", "set-registry", fixtureLow).ExpectSuccess()
+
+			By("Running the server by its registry name with an environment variable")
+			thvCmd("run", "--name", serverName, "--env", "FOO=bar", upgradeRegistryServerName).ExpectSuccess()
+
+			By("Waiting for the server to be running")
+			waitForIsolatedMCPServer(thvCmd, serverName, 120*time.Second)
+
+			By("Confirming the workload is registry-sourced and up to date against the older fixture")
+			stdout, _ := thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()
+			var before upgradeCheckResults
+			Expect(json.Unmarshal([]byte(stdout), &before)).To(Succeed(), "Output should be valid JSON")
+			Expect(before).To(HaveLen(1))
+			Expect(before[0].RegistryServer).To(Equal(upgradeRegistryServerName),
+				"Running by registry name should record the registry server name")
+			Expect(before[0].CurrentImage).To(Equal(osvImageLow),
+				"The workload should be running the lower-tag image")
+			Expect(before[0].Status).To(Equal(upgrade.StatusUpToDate),
+				"No upgrade should be available against the older-tag fixture")
+
+			By("Repointing thv at the registry fixture advertising the newer tag")
+			thvCmd("config", "set-registry", fixtureHigh).ExpectSuccess()
+
+			By("Checking that an upgrade is now available")
+			stdout, _ = thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()
+			var avail upgradeCheckResults
+			Expect(json.Unmarshal([]byte(stdout), &avail)).To(Succeed(), "Output should be valid JSON")
+			Expect(avail).To(HaveLen(1))
+			Expect(avail[0].Status).To(Equal(upgrade.StatusUpgradeAvailable),
+				"An upgrade should be available after repointing to the newer tag")
+			Expect(avail[0].CandidateImage).To(Equal(osvImageHigh),
+				"The candidate image should carry the newer tag")
+
+			By("Applying the upgrade non-interactively")
+			thvCmd("upgrade", "apply", serverName, "--yes").ExpectSuccess()
+
+			By("Waiting for the upgraded workload to be running again")
+			waitForIsolatedMCPServer(thvCmd, serverName, 120*time.Second)
+
+			By("Verifying the recorded image carries the newer tag and config is preserved")
+			exportPath := filepath.Join(tempDir, "upgraded-export.json")
+			thvCmd("export", serverName, exportPath).ExpectSuccess()
+
+			fileContent, err := os.ReadFile(exportPath)
+			Expect(err).ToNot(HaveOccurred())
+
+			var runConfig runner.RunConfig
+			Expect(json.Unmarshal(fileContent, &runConfig)).To(Succeed(), "Export should be valid JSON")
+			Expect(runConfig.Image).To(Equal(osvImageHigh),
+				"The upgraded workload should record the newer image tag")
+			Expect(runConfig.EnvVars).To(HaveKeyWithValue("FOO", "bar"),
+				"The upgrade should preserve the workload's environment variables")
+		})
+	})
+})
+
+// waitForIsolatedMCPServer polls `thv list` (through the supplied isolated-env
+// command builder) until the named workload reports running, or fails the spec
+// on timeout. It mirrors e2e.WaitForMCPServer but runs every poll under the same
+// isolated config/home/data env as the rest of the spec, so it observes the
+// workload created in that isolated state rather than the real ToolHive config.
+func waitForIsolatedMCPServer(thvCmd func(args ...string) *e2e.THVCommand, serverName string, timeout time.Duration) {
+	GinkgoHelper()
+	Eventually(func() bool {
+		stdout, _, err := thvCmd("list").Run()
+		if err != nil {
+			return false
+		}
+		return strings.Contains(stdout, serverName) && strings.Contains(stdout, "running")
+	}, timeout, 1*time.Second).Should(BeTrue(),
+		"workload %q should be running within %s", serverName, timeout)
+}