Skip to content

Commit 12fcd15

Browse files
JAORMXclaude
andauthored
Add e2e coverage and lifecycle docs for upgrades (#5412)
* Add Applier for upgrading workloads in place Detecting an available upgrade is only useful if users can apply it while keeping their configuration. Add the apply path that the CLI and API will drive. Add upgrade.Applier: it reloads the workload's saved config, re-runs the check on fresh state (so a stale result can never drive an apply), resolves the candidate from the registry, and rebuilds the run config preserving the full user configuration — auth, authz, audit, telemetry, tools filters, volumes, secrets, ports, permission profile, and more — changing only the image, merged env/secrets, and re-resolved registry URLs. New required env vars surface through the injected validator. Crucially, the candidate image is verified and pulled (and the policy gate runs) before the destructive stop/delete/start, so a missing or unverifiable image leaves the running workload untouched — there is no rollback once UpdateWorkload begins. Verification uses the same path as thv run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Add upgrade apply for the CLI and API With the Applier in place, expose it to users. This lets CLI users and API clients apply an upgrade while preserving their configuration, instead of manually re-running a workload with a new image. Add a thv upgrade apply <name> command. It runs the check, shows the candidate image, new env vars, and any permission/transport/network posture drift, then prompts for confirmation. --dry-run reports the plan without applying; --env/--secret supply values for newly required variables; --yes (or a non-interactive shell) skips the prompt and fails loudly on missing required values; --image-verification mirrors thv run. Add POST /api/v1beta/workloads/{name}/upgrade, delegating to the same Applier so all clients share one apply path. The API path is always non-interactive (detached validator) and sources image verification from server config; the request body can only supply env/secret values, never redirect the image or weaken verification. Apply failures return a sanitized 422 with the detailed cause logged server-side, so secret references in an error chain are never echoed to the caller. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Add e2e coverage and lifecycle docs for upgrades The upgrade flow spans registry lookup, image verification, and a destructive workload recreate, so it needs end-to-end coverage to catch regressions the unit tests cannot. Add an e2e test that exercises the full flow against real osv-mcp image tags via custom registry fixtures: a workload run from the 0.1.1 entry reports up-to-date, repointing the registry to 0.1.3 makes the check report an available upgrade, and apply recreates the workload on the new image while preserving a user-set env var. A negative spec confirms a raw-image workload reports not-registry-sourced. All thv invocations run under an isolated config/home/data dir so the suite never touches the developer's real registry configuration. Document the upgrade state transition in the workloads lifecycle architecture doc. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent fc64061 commit 12fcd15

4 files changed

Lines changed: 346 additions & 0 deletions

File tree

docs/arch/08-workloads-lifecycle.md

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -96,6 +96,32 @@ thv rm my-server
9696

9797
**Implementation**: `pkg/workloads/manager.go`
9898

99+
### Upgrade
100+
101+
```bash
102+
thv upgrade check [my-server] # offline metadata comparison, never pulls
103+
thv upgrade apply my-server --yes # verify + pull, then recreate
104+
```
105+
106+
Upgrades apply only to **registry-sourced** workloads — those run by a registry entry name, which records `RunConfig.RegistryServerName`. A workload run from a raw image reference has no registry server name and is reported as `not-registry-sourced`.
107+
108+
**Check** is an offline comparison. The checker (`pkg/workloads/upgrade.Checker`) looks up the workload's `RegistryServerName` in the configured registry and compares the running image tag against the candidate the registry advertises (semver, conservative: a `latest` tag, a different repository, or non-comparable tags yield `unknown`). When the registry advertises a strictly newer tag the status is `upgrade-available`, and the result also surfaces **env-var drift** (variables the candidate now declares that the workload does not yet supply) and **posture drift** (transport, permission profile — network isolation is a local-only choice the registry cannot express, so it is not reported as drift). Checks never pull images.
109+
110+
**Apply** is the single security-critical path (`pkg/workloads/upgrade.Applier`). It re-derives the check against the registry on every apply (closing the time-of-check/time-of-use window — a `CheckResult` from an earlier `check` is never trusted), then, in order:
111+
112+
1. Resolves and **verifies** the candidate image's provenance (by registry server name).
113+
2. Builds a merged `RunConfig` that **preserves the entire user configuration** — env vars, secrets, OIDC/authz/audit/telemetry, tool filters, middleware, transport/posture — and changes only the image, any merged env/secrets supplied via `--env`/`--secret`, and the registry source URLs.
114+
3. Runs the policy gate and performs the verified **pull**.
115+
4. Only then asks the manager to recreate the workload via `UpdateWorkload` (stop → delete → start with the new config).
116+
117+
Steps 1–3 all complete before any destruction, so a failure while preparing the candidate leaves the running workload untouched. **There is no automatic rollback**: once recreation begins, the previous image/config is not restored — recovery is a forward operation. Posture drift (transport, permission profile) is **surfaced as a warning, not converged**: the upgraded workload keeps its full existing posture, including transport, network isolation, and permission profile.
118+
119+
> Runtime boundary: the "verified pull before destruction" guarantee holds precisely for local container runtimes (the scope here). On Kubernetes the verification and policy gate still precede recreation, but the byte-level pull is delegated to the kubelet and happens after recreation.
120+
121+
The same `Applier` backs both the CLI (`thv upgrade apply`) and the API (`POST /api/v1beta/workloads/{name}/upgrade`, with `GET .../{name}/upgrade-check` and `GET .../upgrade-check` for single and bulk checks), so the verify-then-pull ordering lives in exactly one place. `thv list --check-upgrades` annotates the workload list with each workload's check result.
122+
123+
**Implementation**: `pkg/workloads/upgrade/`, `cmd/thv/app/upgrade.go`
124+
99125
### List
100126

101127
Listing combines container workloads from the runtime with remote workloads from persisted state. The manager can filter workloads by label or group, and can optionally include stopped workloads.
Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
{
2+
"version": "1.0.0",
3+
"meta": {
4+
"last_updated": "2026-01-15T10:00:00Z"
5+
},
6+
"data": {
7+
"servers": [
8+
{
9+
"$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
10+
"name": "osv-upgrade-test",
11+
"title": "OSV (upgrade e2e fixture)",
12+
"description": "OSV MCP server fixture for upgrade e2e tests (high tag). Mirrors the bundled osv entry's transport (streamable-http) and image exactly, pinned to the higher tag advertised as the upgrade candidate. The transport url port 8080 is the image-INTERNAL target port (becomes RunConfig.TargetPort); thv still assigns an OS-chosen host proxy port, so there is no fixed host-port clash. Only the image tag differs between this file and registry-low.json.",
13+
"version": "1.0.0",
14+
"repository": {
15+
"url": "https://github.com/StacklokLabs/osv-mcp",
16+
"source": "github"
17+
},
18+
"packages": [
19+
{
20+
"identifier": "ghcr.io/stackloklabs/osv-mcp/server:0.1.3",
21+
"registryType": "oci",
22+
"transport": {
23+
"type": "streamable-http",
24+
"url": "http://localhost:8080"
25+
}
26+
}
27+
]
28+
}
29+
]
30+
}
31+
}
Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
{
2+
"version": "1.0.0",
3+
"meta": {
4+
"last_updated": "2026-01-15T10:00:00Z"
5+
},
6+
"data": {
7+
"servers": [
8+
{
9+
"$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
10+
"name": "osv-upgrade-test",
11+
"title": "OSV (upgrade e2e fixture)",
12+
"description": "OSV MCP server fixture for upgrade e2e tests (low tag). Mirrors the bundled osv entry's transport (streamable-http) and image exactly, pinned to the lower tag so the upgrade checker reports an available upgrade once repointed to registry-high.json. The transport url port 8080 is the image-INTERNAL target port (becomes RunConfig.TargetPort); thv still assigns an OS-chosen host proxy port, so there is no fixed host-port clash. Only the image tag differs between this file and registry-high.json.",
13+
"version": "1.0.0",
14+
"repository": {
15+
"url": "https://github.com/StacklokLabs/osv-mcp",
16+
"source": "github"
17+
},
18+
"packages": [
19+
{
20+
"identifier": "ghcr.io/stackloklabs/osv-mcp/server:0.1.1",
21+
"registryType": "oci",
22+
"transport": {
23+
"type": "streamable-http",
24+
"url": "http://localhost:8080"
25+
}
26+
}
27+
]
28+
}
29+
]
30+
}
31+
}

test/e2e/upgrade_e2e_test.go

Lines changed: 258 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,258 @@
1+
// SPDX-FileCopyrightText: Copyright 2026 Stacklok, Inc.
2+
// SPDX-License-Identifier: Apache-2.0
3+
4+
package e2e_test
5+
6+
import (
7+
"encoding/json"
8+
"os"
9+
"path/filepath"
10+
"strings"
11+
"time"
12+
13+
. "github.com/onsi/ginkgo/v2"
14+
. "github.com/onsi/gomega"
15+
16+
"github.com/stacklok/toolhive/pkg/runner"
17+
"github.com/stacklok/toolhive/pkg/workloads/upgrade"
18+
"github.com/stacklok/toolhive/test/e2e"
19+
)
20+
21+
const (
22+
// rawOSVImage is the concrete image the bundled "osv" registry entry resolves
23+
// to. Running it by its raw reference (rather than by a registry name)
24+
// produces a working MCP server whose RunConfig has no RegistryServerName,
25+
// which is exactly the input the "not-registry-sourced" path needs.
26+
rawOSVImage = "ghcr.io/stackloklabs/osv-mcp/server:0.1.3"
27+
28+
// osvImageLow and osvImageHigh are two real, pullable osv-mcp releases. The
29+
// upgrade-flow fixtures advertise these tags so the checker reports an
30+
// available upgrade from low to high. Both tags run identically (only a minor
31+
// version bump), and osv declares no required env vars or secrets, so
32+
// `upgrade apply --yes` runs non-interactively without prompting.
33+
osvImageLow = "ghcr.io/stackloklabs/osv-mcp/server:0.1.1"
34+
osvImageHigh = "ghcr.io/stackloklabs/osv-mcp/server:0.1.3"
35+
36+
// upgradeRegistryServerName is the server name the fixtures expose (a bare
37+
// "name": "osv-upgrade-test", so the recorded RegistryServerName matches it
38+
// exactly). `thv run` is invoked with this name so RegistryServerName is set.
39+
upgradeRegistryServerName = "osv-upgrade-test"
40+
)
41+
42+
// upgradeCheckResults mirrors the JSON emitted by `thv upgrade check --format
43+
// json`, which is an array of upgrade.CheckResult. We decode into the real type
44+
// so the test breaks if the wire shape changes.
45+
type upgradeCheckResults []upgrade.CheckResult
46+
47+
var _ = Describe("Upgrade Command", Label("core", "upgrade", "e2e"), func() {
48+
var (
49+
config *e2e.TestConfig
50+
serverName string
51+
52+
// tempHome / tempData isolate this spec's ToolHive config and workload
53+
// state from the developer's / CI runner's real ToolHive directories.
54+
// `thv config set-registry` writes to $XDG_CONFIG_HOME/toolhive, and
55+
// workload state lives under $XDG_DATA_HOME/toolhive; routing every thv
56+
// invocation (run, check, apply, export, set/unset-registry, and cleanup)
57+
// through thvCmd keeps both off the real config and consistent across the
58+
// run -> check -> apply -> export -> cleanup sequence so they all see the
59+
// same workload. The container runtime (Docker/Podman socket) is not
60+
// HOME-dependent, so an isolated HOME does not affect it; PATH and the rest
61+
// of the environment are inherited unchanged.
62+
tempHome string
63+
tempData string
64+
65+
// thvCmd builds a THVCommand bound to the isolated config/home/data dirs.
66+
thvCmd func(args ...string) *e2e.THVCommand
67+
)
68+
69+
BeforeEach(func() {
70+
config = e2e.NewTestConfig()
71+
serverName = e2e.GenerateUniqueServerName("upgrade-test")
72+
tempHome = GinkgoT().TempDir()
73+
tempData = GinkgoT().TempDir()
74+
75+
thvCmd = func(args ...string) *e2e.THVCommand {
76+
return e2e.NewTHVCommand(config, args...).
77+
WithEnv(
78+
"XDG_CONFIG_HOME="+tempHome,
79+
"HOME="+tempHome,
80+
"XDG_DATA_HOME="+tempData,
81+
)
82+
}
83+
84+
err := e2e.CheckTHVBinaryAvailable(config)
85+
Expect(err).ToNot(HaveOccurred(), "thv binary should be available")
86+
})
87+
88+
AfterEach(func() {
89+
if config.CleanupAfter {
90+
// Stop and remove the workload using the SAME isolated env so cleanup
91+
// targets the workload created by this spec. Tolerate a missing
92+
// workload (a spec may have failed before creating it).
93+
_, _, _ = thvCmd("stop", serverName).Run()
94+
_, _, _ = thvCmd("rm", serverName).Run()
95+
}
96+
// The registry config lives under the discarded temp dirs, so there is
97+
// nothing to restore; the real ToolHive config is never touched.
98+
})
99+
100+
// Negative / cheap path: a workload created from a raw image reference is not
101+
// registry-sourced, so no upgrade can ever be determined for it. This needs
102+
// no registry fixture and is the safety-net coverage.
103+
Describe("Checking a workload that is not registry-sourced", func() {
104+
Context("when the workload was run from a raw image reference", func() {
105+
It("should report not-registry-sourced in text output", func() {
106+
By("Running an MCP server from a raw image reference")
107+
thvCmd("run", "--name", serverName, rawOSVImage).ExpectSuccess()
108+
109+
By("Waiting for the server to be running")
110+
waitForIsolatedMCPServer(thvCmd, serverName, 60*time.Second)
111+
112+
By("Checking the workload for an available upgrade")
113+
stdout, _ := thvCmd("upgrade", "check", serverName).ExpectSuccess()
114+
115+
By("Verifying the report says the workload is not registry-sourced")
116+
Expect(stdout).To(ContainSubstring(serverName), "Report should name the workload")
117+
Expect(stdout).To(ContainSubstring(string(upgrade.StatusNotRegistrySourced)),
118+
"Report should show the not-registry-sourced status")
119+
})
120+
121+
It("should emit parseable JSON with the expected fields", func() {
122+
By("Running an MCP server from a raw image reference")
123+
thvCmd("run", "--name", serverName, rawOSVImage).ExpectSuccess()
124+
125+
By("Waiting for the server to be running")
126+
waitForIsolatedMCPServer(thvCmd, serverName, 60*time.Second)
127+
128+
By("Checking the workload for an available upgrade in JSON format")
129+
stdout, _ := thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()
130+
131+
By("Verifying the JSON output is valid and contains the expected fields")
132+
var results upgradeCheckResults
133+
err := json.Unmarshal([]byte(stdout), &results)
134+
Expect(err).ToNot(HaveOccurred(), "Output should be valid JSON")
135+
Expect(results).To(HaveLen(1), "JSON should contain exactly one result")
136+
137+
result := results[0]
138+
Expect(result.WorkloadName).To(Equal(serverName), "JSON should name the workload")
139+
Expect(result.Status).To(Equal(upgrade.StatusNotRegistrySourced),
140+
"JSON should show the not-registry-sourced status")
141+
Expect(result.CurrentImage).To(Equal(rawOSVImage),
142+
"JSON should record the raw image the workload is running")
143+
Expect(result.RegistryServer).To(BeEmpty(),
144+
"A raw-image workload should not have a registry server name")
145+
Expect(result.CandidateImage).To(BeEmpty(),
146+
"There should be no candidate image when no upgrade can be determined")
147+
})
148+
})
149+
})
150+
151+
// Full upgrade flow: a registry-sourced workload whose registry advertises a
152+
// newer tag is upgraded in place, and we verify it runs on the new image with
153+
// its prior configuration preserved.
154+
//
155+
// Determinism: the two fixtures (testdata/upgrade/registry-{low,high}.json)
156+
// advertise tags 0.1.1 and 0.1.3 of ghcr.io/stackloklabs/osv-mcp/server, both
157+
// real, pullable releases that run identically. The fixtures are in the
158+
// upstream MCP-registry format that `thv config set-registry` requires for a
159+
// local file (the local provider rejects the legacy ToolHive-native format),
160+
// and they mirror the bundled osv entry's transport (streamable-http) and
161+
// internal target port (8080) exactly so the workload comes up the same way
162+
// `thv run osv` would. osv declares no required env vars, so `upgrade apply
163+
// --yes` never prompts. This runs by default in CI.
164+
Describe("Applying an available upgrade to a registry-sourced workload", func() {
165+
var (
166+
tempDir string
167+
fixtureLow string
168+
fixtureHigh string
169+
)
170+
171+
BeforeEach(func() {
172+
tempDir = GinkgoT().TempDir()
173+
174+
// set-registry persists the path and resolves it later, so it must be
175+
// absolute regardless of the working directory at resolution time.
176+
var err error
177+
fixtureLow, err = filepath.Abs(filepath.Join("testdata", "upgrade", "registry-low.json"))
178+
Expect(err).ToNot(HaveOccurred())
179+
fixtureHigh, err = filepath.Abs(filepath.Join("testdata", "upgrade", "registry-high.json"))
180+
Expect(err).ToNot(HaveOccurred())
181+
Expect(fixtureLow).To(BeAnExistingFile(), "low-tag registry fixture should exist")
182+
Expect(fixtureHigh).To(BeAnExistingFile(), "high-tag registry fixture should exist")
183+
})
184+
185+
It("should pull the candidate, recreate the workload, and preserve config", func() {
186+
By("Pointing thv at the older-tag registry fixture")
187+
thvCmd("config", "set-registry", fixtureLow).ExpectSuccess()
188+
189+
By("Running the server by its registry name with an environment variable")
190+
thvCmd("run", "--name", serverName, "--env", "FOO=bar", upgradeRegistryServerName).ExpectSuccess()
191+
192+
By("Waiting for the server to be running")
193+
waitForIsolatedMCPServer(thvCmd, serverName, 120*time.Second)
194+
195+
By("Confirming the workload is registry-sourced and up to date against the older fixture")
196+
stdout, _ := thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()
197+
var before upgradeCheckResults
198+
Expect(json.Unmarshal([]byte(stdout), &before)).To(Succeed(), "Output should be valid JSON")
199+
Expect(before).To(HaveLen(1))
200+
Expect(before[0].RegistryServer).To(Equal(upgradeRegistryServerName),
201+
"Running by registry name should record the registry server name")
202+
Expect(before[0].CurrentImage).To(Equal(osvImageLow),
203+
"The workload should be running the lower-tag image")
204+
Expect(before[0].Status).To(Equal(upgrade.StatusUpToDate),
205+
"No upgrade should be available against the older-tag fixture")
206+
207+
By("Repointing thv at the registry fixture advertising the newer tag")
208+
thvCmd("config", "set-registry", fixtureHigh).ExpectSuccess()
209+
210+
By("Checking that an upgrade is now available")
211+
stdout, _ = thvCmd("upgrade", "check", serverName, "--format", "json").ExpectSuccess()
212+
var avail upgradeCheckResults
213+
Expect(json.Unmarshal([]byte(stdout), &avail)).To(Succeed(), "Output should be valid JSON")
214+
Expect(avail).To(HaveLen(1))
215+
Expect(avail[0].Status).To(Equal(upgrade.StatusUpgradeAvailable),
216+
"An upgrade should be available after repointing to the newer tag")
217+
Expect(avail[0].CandidateImage).To(Equal(osvImageHigh),
218+
"The candidate image should carry the newer tag")
219+
220+
By("Applying the upgrade non-interactively")
221+
thvCmd("upgrade", "apply", serverName, "--yes").ExpectSuccess()
222+
223+
By("Waiting for the upgraded workload to be running again")
224+
waitForIsolatedMCPServer(thvCmd, serverName, 120*time.Second)
225+
226+
By("Verifying the recorded image carries the newer tag and config is preserved")
227+
exportPath := filepath.Join(tempDir, "upgraded-export.json")
228+
thvCmd("export", serverName, exportPath).ExpectSuccess()
229+
230+
fileContent, err := os.ReadFile(exportPath)
231+
Expect(err).ToNot(HaveOccurred())
232+
233+
var runConfig runner.RunConfig
234+
Expect(json.Unmarshal(fileContent, &runConfig)).To(Succeed(), "Export should be valid JSON")
235+
Expect(runConfig.Image).To(Equal(osvImageHigh),
236+
"The upgraded workload should record the newer image tag")
237+
Expect(runConfig.EnvVars).To(HaveKeyWithValue("FOO", "bar"),
238+
"The upgrade should preserve the workload's environment variables")
239+
})
240+
})
241+
})
242+
243+
// waitForIsolatedMCPServer polls `thv list` (through the supplied isolated-env
244+
// command builder) until the named workload reports running, or fails the spec
245+
// on timeout. It mirrors e2e.WaitForMCPServer but runs every poll under the same
246+
// isolated config/home/data env as the rest of the spec, so it observes the
247+
// workload created in that isolated state rather than the real ToolHive config.
248+
func waitForIsolatedMCPServer(thvCmd func(args ...string) *e2e.THVCommand, serverName string, timeout time.Duration) {
249+
GinkgoHelper()
250+
Eventually(func() bool {
251+
stdout, _, err := thvCmd("list").Run()
252+
if err != nil {
253+
return false
254+
}
255+
return strings.Contains(stdout, serverName) && strings.Contains(stdout, "running")
256+
}, timeout, 1*time.Second).Should(BeTrue(),
257+
"workload %q should be running within %s", serverName, timeout)
258+
}

0 commit comments

Comments
 (0)