diff --git a/docs/arch/08-workloads-lifecycle.md b/docs/arch/08-workloads-lifecycle.md
index 01daf676e2..0bef4ce759 100644
--- a/docs/arch/08-workloads-lifecycle.md
+++ b/docs/arch/08-workloads-lifecycle.md
@@ -96,6 +96,32 @@ thv rm my-server
 
 **Implementation**: `pkg/workloads/manager.go`
 
+### Upgrade
+
+```bash
+thv upgrade check [my-server]        # offline metadata comparison, never pulls
+thv upgrade apply my-server --yes    # verify + pull, then recreate
+```
+
+Upgrades apply only to **registry-sourced** workloads: those run by a registry entry name, which records `RunConfig.RegistryServerName`. A workload run from a raw image reference has no registry server name and is reported as `not-registry-sourced`.
+
+**Check** is an offline comparison. The checker (`pkg/workloads/upgrade.Checker`) looks up the workload's `RegistryServerName` in the configured registry and compares the running image tag against the candidate the registry advertises (semver, conservative: a `latest` tag, a different repository, or non-comparable tags yield `unknown`). When the registry advertises a strictly newer tag the status is `upgrade-available`, and the result also surfaces **env-var drift** (variables the candidate now declares that the workload does not yet supply) and **posture drift** (transport, permission profile; network isolation is a local-only choice the registry cannot express, so it is not reported as drift). Checks never pull images.
+
+**Apply** is the single security-critical path (`pkg/workloads/upgrade.Applier`). It re-derives the check against the registry on every apply (closing the time-of-check/time-of-use window: a `CheckResult` from an earlier `check` is never trusted), then, in order:
+
+1. Resolves and **verifies** the candidate image's provenance (by registry server name).
+2. Builds a merged `RunConfig` that **preserves the entire user configuration** (env vars, secrets, OIDC/authz/audit/telemetry, tool filters, middleware, transport/posture) and changes only the image, any merged env/secrets supplied via `--env`/`--secret`, and the registry source URLs.
+3. Runs the policy gate and performs the verified **pull**.
+4. Only then asks the manager to recreate the workload via `UpdateWorkload` (stop → delete → start with the new config).
+
+Steps 1–3 all complete before any destruction, so a failure while preparing the candidate leaves the running workload untouched. **There is no automatic rollback**: once recreation begins, the previous image/config is not restored; recovery is a forward operation. Posture drift (transport, permission profile) is **surfaced as a warning, not converged**: the upgraded workload keeps its full existing posture, including transport, network isolation, and permission profile.
+
+> Runtime boundary: the "verified pull before destruction" guarantee holds precisely for local container runtimes (the scope here). On Kubernetes the verification and policy gate still precede recreation, but the byte-level pull is delegated to the kubelet and happens after recreation.
+
+The same `Applier` backs both the CLI (`thv upgrade apply`) and the API (`POST /api/v1beta/workloads/{name}/upgrade`, with `GET .../{name}/upgrade-check` and `GET .../upgrade-check` for single and bulk checks), so the verify-then-pull ordering lives in exactly one place. `thv list --check-upgrades` annotates the workload list with each workload's check result.
+
+**Implementation**: `pkg/workloads/upgrade/`, `cmd/thv/app/upgrade.go`
+
 ### List
 
 Listing combines container workloads from the runtime with remote workloads from persisted state. The manager can filter workloads by label or group, and can optionally include stopped workloads.
diff --git a/test/e2e/testdata/upgrade/registry-high.json b/test/e2e/testdata/upgrade/registry-high.json
new file mode 100644
index 0000000000..11f367aea1
--- /dev/null
+++ b/test/e2e/testdata/upgrade/registry-high.json
@@ -0,0 +1,31 @@
+{
+  "version": "1.0.0",
+  "meta": {
+    "last_updated": "2026-01-15T10:00:00Z"
+  },
+  "data": {
+    "servers": [
+      {
+        "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
+        "name": "osv-upgrade-test",
+        "title": "OSV (upgrade e2e fixture)",
+        "description": "OSV MCP server fixture for upgrade e2e tests (high tag). Mirrors the bundled osv entry's transport (streamable-http) and image exactly, pinned to the higher tag advertised as the upgrade candidate. The transport url port 8080 is the image-INTERNAL target port (becomes RunConfig.TargetPort); thv still assigns an OS-chosen host proxy port, so there is no fixed host-port clash. Only the image tag differs between this file and registry-low.json.",
+        "version": "1.0.0",
+        "repository": {
+          "url": "https://github.com/StacklokLabs/osv-mcp",
+          "source": "github"
+        },
+        "packages": [
+          {
+            "identifier": "ghcr.io/stackloklabs/osv-mcp/server:0.1.3",
+            "registryType": "oci",
+            "transport": {
+              "type": "streamable-http",
+              "url": "http://localhost:8080"
+            }
+          }
+        ]
+      }
+    ]
+  }
+}
diff --git a/test/e2e/testdata/upgrade/registry-low.json b/test/e2e/testdata/upgrade/registry-low.json
new file mode 100644
index 0000000000..79aa66dcc4
--- /dev/null
+++ b/test/e2e/testdata/upgrade/registry-low.json
@@ -0,0 +1,31 @@
+{
+  "version": "1.0.0",
+  "meta": {
+    "last_updated": "2026-01-15T10:00:00Z"
+  },
+  "data": {
+    "servers": [
+      {
+        "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
+        "name": "osv-upgrade-test",
+        "title": "OSV (upgrade e2e fixture)",
+        "description": "OSV MCP server fixture for upgrade e2e tests (low tag). Mirrors the bundled osv entry's transport (streamable-http) and image exactly, pinned to the lower tag so the upgrade checker reports an available upgrade once repointed to registry-high.json. The transport url port 8080 is the image-INTERNAL target port (becomes RunConfig.TargetPort); thv still assigns an OS-chosen host proxy port, so there is no fixed host-port clash. Only the image tag differs between this file and registry-high.json.",
+        "version": "1.0.0",
+        "repository": {
+          "url": "https://github.com/StacklokLabs/osv-mcp",
+          "source": "github"
+        },
+        "packages": [
+          {
+            "identifier": "ghcr.io/stackloklabs/osv-mcp/server:0.1.1",
+            "registryType": "oci",
+            "transport": {
+              "type": "streamable-http",
+              "url": "http://localhost:8080"
+            }
+          }
+        ]
+      }
+    ]
+  }
+}
diff --git a/test/e2e/upgrade_e2e_test.go b/test/e2e/upgrade_e2e_test.go
new file mode 100644
index 0000000000..5e6e4eb42a
--- /dev/null
+++ b/test/e2e/upgrade_e2e_test.go
@@ -0,0 +1,258 @@
+// SPDX-FileCopyrightText: Copyright 2026 Stacklok, Inc.
+// SPDX-License-Identifier: Apache-2.0
+
+package e2e_test
+
+import (
+	"encoding/json"
+	"os"
+	"path/filepath"
+	"strings"
+	"time"
+
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+
+	"github.com/stacklok/toolhive/pkg/runner"
+	"github.com/stacklok/toolhive/pkg/workloads/upgrade"
+	"github.com/stacklok/toolhive/test/e2e"
+)
+
+const (
+	// rawOSVImage is the concrete image the bundled "osv" registry entry resolves
+	// to. Running it by its raw reference (rather than by a registry name)
+	// produces a working MCP server whose RunConfig has no RegistryServerName,
+	// which is exactly the input the "not-registry-sourced" path needs.
+	rawOSVImage = "ghcr.io/stackloklabs/osv-mcp/server:0.1.3"
+
+	// osvImageLow and osvImageHigh are two real, pullable osv-mcp releases. The
+	// upgrade-flow fixtures advertise these tags so the checker reports an
+	// available upgrade from low to high. Both tags run identically (only a patch
+	// version bump), and osv declares no required env vars or secrets, so
+	// `upgrade apply --yes` runs non-interactively without prompting.
+	osvImageLow  = "ghcr.io/stackloklabs/osv-mcp/server:0.1.1"
+	osvImageHigh = "ghcr.io/stackloklabs/osv-mcp/server:0.1.3"
+
+	// upgradeRegistryServerName is the server name the fixtures expose (a bare
+	// "name": "osv-upgrade-test", so the recorded RegistryServerName matches it
+	// exactly). `thv run` is invoked with this name so RegistryServerName is set.
+	upgradeRegistryServerName = "osv-upgrade-test"
+)
+
+// upgradeCheckResults mirrors the JSON emitted by `thv upgrade check --format
+// json`, which is an array of upgrade.CheckResult. We decode into the real type
+// so the test breaks if the wire shape changes.
+type upgradeCheckResults []upgrade.CheckResult
+
+var _ = Describe("Upgrade Command", Label("core", "upgrade", "e2e"), func() {
+	var (
+		config     *e2e.TestConfig
+		serverName string
+
+		// tempHome / tempData isolate this spec's ToolHive config and workload
+		// state from the developer's / CI runner's real ToolHive directories.
+		// `thv config set-registry` writes to $XDG_CONFIG_HOME/toolhive, and
+		// workload state lives under $XDG_DATA_HOME/toolhive; routing every thv
+		// invocation (run, check, apply, export, set/unset-registry, and cleanup)
+		// through thvCmd keeps both off the real config and consistent across the
+		// run -> check -> apply -> export -> cleanup sequence so they all see the
+		// same workload. The container runtime (Docker/Podman socket) is not
+		// HOME-dependent, so an isolated HOME does not affect it; PATH and the rest
+		// of the environment are inherited unchanged.
+		tempHome string
+		tempData string
+
+		// thvCmd builds a THVCommand bound to the isolated config/home/data dirs.
+		thvCmd func(args ...string) *e2e.THVCommand
+	)
+
+	BeforeEach(func() {
+		config = e2e.NewTestConfig()
+		serverName = e2e.GenerateUniqueServerName("upgrade-test")
+		tempHome = GinkgoT().TempDir()
+		tempData = GinkgoT().TempDir()
+
+		thvCmd = func(args ...string) *e2e.THVCommand {
+			return e2e.NewTHVCommand(config, args...).
+				WithEnv(
+					"XDG_CONFIG_HOME="+tempHome,
+					"HOME="+tempHome,
+					"XDG_DATA_HOME="+tempData,
+				)
+		}
+
+		err := e2e.CheckTHVBinaryAvailable(config)
+		Expect(err).ToNot(HaveOccurred(), "thv binary should be available")
+	})
+
+	AfterEach(func() {
+		if config.CleanupAfter {
+			// Stop and remove the workload using the SAME isolated env so cleanup
+			// targets the workload created by this spec. Tolerate a missing
+			// workload (a spec may have failed before creating it).
+			_, _, _ = thvCmd("stop", serverName).Run()
+			_, _, _ = thvCmd("rm", serverName).Run()
+		}
+		// The registry config lives under the discarded temp dirs, so there is
+		// nothing to restore; the real ToolHive config is never touched.
+	})
+
+	// Negative / cheap path: a workload created from a raw image reference is not
+	// registry-sourced, so no upgrade can ever be determined for it. This needs
+	// no registry fixture and is the safety-net coverage.
+	Describe("Checking a workload that is not registry-sourced", func() {
+		Context("when the workload was run from a raw image reference", func() {
+			It("should report not-registry-sourced in text output", func() {
+				By("Running an MCP server from a raw image reference")
+				thvCmd("run", "--name", serverName, rawOSVImage).ExpectSuccess()
+
+				By("Waiting for the server to be running")
+				waitForIsolatedMCPServer(thvCmd, serverName, 60*time.Second)
+
+				By("Checking the workload for an available upgrade")
+				stdout, _ := thvCmd("upgrade", "check", serverName).ExpectSuccess()
+
+				By("Verifying the report says the workload is not registry-sourced")
+				Expect(stdout).To(ContainSubstring(serverName), "Report should name the workload")
+				Expect(stdout).To(ContainSubstring(string(upgrade.StatusNotRegistrySourced)),
+					"Report should show the not-registry-sourced status")
+			})
+
+			It("should emit parseable JSON with the expected fields", func() {
+				By("Running an MCP server from a raw image reference")
+				thvCmd("run", "--name", serverName, rawOSVImage).ExpectSuccess()
+
+				By("Waiting for the server to be running")
+				waitForIsolatedMCPServer(thvCmd, serverName, 60*time.Second)
+
+				By("Checking the workload for an available upgrade in JSON format")
+				stdout, _ := thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()
+
+				By("Verifying the JSON output is valid and contains the expected fields")
+				var results upgradeCheckResults
+				err := json.Unmarshal([]byte(stdout), &results)
+				Expect(err).ToNot(HaveOccurred(), "Output should be valid JSON")
+				Expect(results).To(HaveLen(1), "JSON should contain exactly one result")
+
+				result := results[0]
+				Expect(result.WorkloadName).To(Equal(serverName), "JSON should name the workload")
+				Expect(result.Status).To(Equal(upgrade.StatusNotRegistrySourced),
+					"JSON should show the not-registry-sourced status")
+				Expect(result.CurrentImage).To(Equal(rawOSVImage),
+					"JSON should record the raw image the workload is running")
+				Expect(result.RegistryServer).To(BeEmpty(),
+					"A raw-image workload should not have a registry server name")
+				Expect(result.CandidateImage).To(BeEmpty(),
+					"There should be no candidate image when no upgrade can be determined")
+			})
+		})
+	})
+
+	// Full upgrade flow: a registry-sourced workload whose registry advertises a
+	// newer tag is upgraded in place, and we verify it runs on the new image with
+	// its prior configuration preserved.
+	//
+	// Determinism: the two fixtures (testdata/upgrade/registry-{low,high}.json)
+	// advertise tags 0.1.1 and 0.1.3 of ghcr.io/stackloklabs/osv-mcp/server, both
+	// real, pullable releases that run identically. The fixtures are in the
+	// upstream MCP-registry format that `thv config set-registry` requires for a
+	// local file (the local provider rejects the legacy ToolHive-native format),
+	// and they mirror the bundled osv entry's transport (streamable-http) and
+	// internal target port (8080) exactly so the workload comes up the same way
+	// `thv run osv` would. osv declares no required env vars, so `upgrade apply
+	// --yes` never prompts. This runs by default in CI.
+	Describe("Applying an available upgrade to a registry-sourced workload", func() {
+		var (
+			tempDir     string
+			fixtureLow  string
+			fixtureHigh string
+		)
+
+		BeforeEach(func() {
+			tempDir = GinkgoT().TempDir()
+
+			// set-registry persists the path and resolves it later, so it must be
+			// absolute regardless of the working directory at resolution time.
+			var err error
+			fixtureLow, err = filepath.Abs(filepath.Join("testdata", "upgrade", "registry-low.json"))
+			Expect(err).ToNot(HaveOccurred())
+			fixtureHigh, err = filepath.Abs(filepath.Join("testdata", "upgrade", "registry-high.json"))
+			Expect(err).ToNot(HaveOccurred())
+			Expect(fixtureLow).To(BeAnExistingFile(), "low-tag registry fixture should exist")
+			Expect(fixtureHigh).To(BeAnExistingFile(), "high-tag registry fixture should exist")
+		})
+
+		It("should pull the candidate, recreate the workload, and preserve config", func() {
+			By("Pointing thv at the older-tag registry fixture")
+			thvCmd("config", "set-registry", fixtureLow).ExpectSuccess()
+
+			By("Running the server by its registry name with an environment variable")
+			thvCmd("run", "--name", serverName, "--env", "FOO=bar", upgradeRegistryServerName).ExpectSuccess()
+
+			By("Waiting for the server to be running")
+			waitForIsolatedMCPServer(thvCmd, serverName, 120*time.Second)
+
+			By("Confirming the workload is registry-sourced and up to date against the older fixture")
+			stdout, _ := thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()
+			var before upgradeCheckResults
+			Expect(json.Unmarshal([]byte(stdout), &before)).To(Succeed(), "Output should be valid JSON")
+			Expect(before).To(HaveLen(1))
+			Expect(before[0].RegistryServer).To(Equal(upgradeRegistryServerName),
+				"Running by registry name should record the registry server name")
+			Expect(before[0].CurrentImage).To(Equal(osvImageLow),
+				"The workload should be running the lower-tag image")
+			Expect(before[0].Status).To(Equal(upgrade.StatusUpToDate),
+				"No upgrade should be available against the older-tag fixture")
+
+			By("Repointing thv at the registry fixture advertising the newer tag")
+			thvCmd("config", "set-registry", fixtureHigh).ExpectSuccess()
+
+			By("Checking that an upgrade is now available")
+			stdout, _ = thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()
+			var avail upgradeCheckResults
+			Expect(json.Unmarshal([]byte(stdout), &avail)).To(Succeed(), "Output should be valid JSON")
+			Expect(avail).To(HaveLen(1))
+			Expect(avail[0].Status).To(Equal(upgrade.StatusUpgradeAvailable),
+				"An upgrade should be available after repointing to the newer tag")
+			Expect(avail[0].CandidateImage).To(Equal(osvImageHigh),
+				"The candidate image should carry the newer tag")
+
+			By("Applying the upgrade non-interactively")
+			thvCmd("upgrade", "apply", serverName, "--yes").ExpectSuccess()
+
+			By("Waiting for the upgraded workload to be running again")
+			waitForIsolatedMCPServer(thvCmd, serverName, 120*time.Second)
+
+			By("Verifying the recorded image carries the newer tag and config is preserved")
+			exportPath := filepath.Join(tempDir, "upgraded-export.json")
+			thvCmd("export", serverName, exportPath).ExpectSuccess()
+
+			fileContent, err := os.ReadFile(exportPath)
+			Expect(err).ToNot(HaveOccurred())
+
+			var runConfig runner.RunConfig
+			Expect(json.Unmarshal(fileContent, &runConfig)).To(Succeed(), "Export should be valid JSON")
+			Expect(runConfig.Image).To(Equal(osvImageHigh),
+				"The upgraded workload should record the newer image tag")
+			Expect(runConfig.EnvVars).To(HaveKeyWithValue("FOO", "bar"),
+				"The upgrade should preserve the workload's environment variables")
+		})
+	})
+})
+
+// waitForIsolatedMCPServer polls `thv list` (through the supplied isolated-env
+// command builder) until the named workload reports running, or fails the spec
+// on timeout. It mirrors e2e.WaitForMCPServer but runs every poll under the same
+// isolated config/home/data env as the rest of the spec, so it observes the
+// workload created in that isolated state rather than the real ToolHive config.
+func waitForIsolatedMCPServer(thvCmd func(args ...string) *e2e.THVCommand, serverName string, timeout time.Duration) {
+	GinkgoHelper()
+	Eventually(func() bool {
+		stdout, _, err := thvCmd("list").Run()
+		if err != nil {
+			return false
+		}
+		return strings.Contains(stdout, serverName) && strings.Contains(stdout, "running")
+	}, timeout, 1*time.Second).Should(BeTrue(),
+		"workload %q should be running within %s", serverName, timeout)
+}
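
Review note: the docs change in this patch describes the check as a conservative semver comparison in which a `latest` tag, a different repository, or non-comparable tags yield `unknown`. The following is a minimal sketch of that decision rule, not the code in `pkg/workloads/upgrade`. It assumes plain `MAJOR.MINOR.PATCH` tags; the helper names (`splitImage`, `parseVersion`, `compareImages`), the `up-to-date` status spelling, and the choice to report an older candidate as `up-to-date` are all illustrative assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// "upgrade-available" and "unknown" are quoted in the docs; "up-to-date" is an
// assumed spelling used only for this sketch.
const (
	statusUpgradeAvailable = "upgrade-available"
	statusUpToDate         = "up-to-date"
	statusUnknown          = "unknown"
)

// splitImage separates "repo:tag". The tag colon must come after the last
// slash, so a registry port ("localhost:5000/app") is not mistaken for a tag.
func splitImage(ref string) (repo, tag string, ok bool) {
	slash := strings.LastIndex(ref, "/")
	colon := strings.LastIndex(ref, ":")
	if colon <= slash {
		return ref, "", false // no tag present
	}
	return ref[:colon], ref[colon+1:], true
}

// parseVersion accepts only MAJOR.MINOR.PATCH made of digits (optional "v").
func parseVersion(tag string) ([3]int, bool) {
	var v [3]int
	parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		if p == "" || strings.Trim(p, "0123456789") != "" {
			return v, false // empty or non-digit component
		}
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// compareImages (hypothetical) returns a status for current vs. candidate.
// Anything it cannot compare with certainty becomes "unknown".
func compareImages(current, candidate string) string {
	curRepo, curTag, okCur := splitImage(current)
	candRepo, candTag, okCand := splitImage(candidate)
	if !okCur || !okCand || curRepo != candRepo {
		return statusUnknown
	}
	cur, okCur := parseVersion(curTag)
	cand, okCand := parseVersion(candTag)
	if !okCur || !okCand {
		return statusUnknown // covers "latest" and other non-semver tags
	}
	for i := range cur {
		if cand[i] > cur[i] {
			return statusUpgradeAvailable
		}
		if cand[i] < cur[i] {
			return statusUpToDate // assumption: an older candidate is not an upgrade
		}
	}
	return statusUpToDate
}

func main() {
	const repo = "ghcr.io/stackloklabs/osv-mcp/server"
	fmt.Println(compareImages(repo+":0.1.1", repo+":0.1.3"))
	fmt.Println(compareImages(repo+":0.1.3", repo+":0.1.3"))
	fmt.Println(compareImages(repo+":latest", repo+":0.1.3"))
	fmt.Println(compareImages("example.com/other/server:0.1.1", repo+":0.1.3"))
}
```

The first two calls use the same image references as the e2e fixtures (0.1.1 and 0.1.3). Returning `unknown` rather than guessing keeps the rule conservative in the same spirit as the documented behavior: a workload is only flagged when both tags are comparable and the repository matches.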