Add upgrade-check REST endpoints for workloads #5408
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation
Alternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
CLI and API users have no way to discover when a newer version of a registry-sourced MCP server is available; only Studio implements drift detection, in its frontend. Introduce a backend package that all clients can consume. Add pkg/workloads/upgrade with a Checker that compares a running workload's image tag against its registry entry (semver-aware, with a string fallback) and reports environment-variable and configuration (transport / permission-profile / network-isolation) drift. Comparison degrades safely to "unknown" for :latest, digest refs, repository changes, and non-registry-sourced workloads, so only a strictly-newer tag on the same repository yields "upgrade-available". This is the read-only detection core (RFC THV-0068, phase A); the apply path, API endpoints, and CLI follow in later changes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
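The comparison rule in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in pkg/workloads/upgrade: the names `Status`, `splitRef`, `parseSemver`, and `compare` are hypothetical, the parser only handles plain MAJOR.MINOR.PATCH tags, and the string fallback is assumed to mean "identical tags are up to date, anything else is unknown". Treating an older candidate as up to date is also an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Status is a hypothetical result type for this sketch.
type Status string

const (
	StatusUpToDate         Status = "up-to-date"
	StatusUpgradeAvailable Status = "upgrade-available"
	StatusUnknown          Status = "unknown"
)

// splitRef splits "repo:tag". ok is false for digest references and for
// references without an explicit tag.
func splitRef(ref string) (repo, tag string, ok bool) {
	if strings.Contains(ref, "@") {
		return "", "", false
	}
	i := strings.LastIndex(ref, ":")
	// A colon before the last slash is a registry port, not a tag separator.
	if i < 0 || i < strings.LastIndex(ref, "/") {
		return "", "", false
	}
	return ref[:i], ref[i+1:], true
}

// parseSemver accepts "1.2.3" or "v1.2.3". Pre-release and build metadata
// are deliberately not supported in this sketch.
func parseSemver(tag string) ([3]int, bool) {
	var v [3]int
	parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// compare reports whether candidate is strictly newer than running.
func compare(running, candidate string) Status {
	rRepo, rTag, ok1 := splitRef(running)
	cRepo, cTag, ok2 := splitRef(candidate)
	if !ok1 || !ok2 || rRepo != cRepo || rTag == "latest" || cTag == "latest" {
		return StatusUnknown
	}
	rv, okR := parseSemver(rTag)
	cv, okC := parseSemver(cTag)
	if !okR || !okC {
		// Assumed string fallback: equal tags are up to date, otherwise undecidable.
		if rTag == cTag {
			return StatusUpToDate
		}
		return StatusUnknown
	}
	for i := range rv {
		if cv[i] > rv[i] {
			return StatusUpgradeAvailable
		}
		if cv[i] < rv[i] {
			return StatusUpToDate // candidate is older (assumed mapping)
		}
	}
	return StatusUpToDate
}

func main() {
	fmt.Println(compare("ghcr.io/org/srv:1.2.0", "ghcr.io/org/srv:1.3.0"))
	fmt.Println(compare("ghcr.io/org/srv:latest", "ghcr.io/org/srv:1.3.0"))
	fmt.Println(compare("ghcr.io/org/srv:1.2.0", "ghcr.io/other/srv:1.3.0"))
}
```

The point of the structure is that every undecidable input exits early with "unknown", so "upgrade-available" is reachable only through the numeric comparison on the same repository.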
b1361f7 to 3350917
0224c57 to 4574212
Codecov Report
❌ Patch coverage is
Additional details and impacted files

Coverage Diff

| Metric | main | #5408 | +/- |
|---|---|---|---|
| Coverage | 68.82% | 68.79% | -0.03% |
| Files | 632 | 633 | +1 |
| Lines | 64085 | 64181 | +96 |
| Hits | 44105 | 44154 | +49 |
| Misses | 16722 | 16760 | +38 |
| Partials | 3258 | 3267 | +9 |
Large PR justification has been provided. Thank you!
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.
4574212 to 6faa7be
- Lowercase an uppercase "V" tag prefix so semver comparison works; "V1.2.0" vs "V1.3.0" no longer falls through to undecidable and hides a real upgrade.
- Drop the raw provider error from CheckResult.Reason (it is serialized into the API response and can leak internal addressing); log it at DEBUG and return a fixed string. Same for the CheckAll path.
- Add a defensive default to the comparison switch so an unexpected value yields StatusUnknown rather than the least-safe StatusUpToDate.
- Stop reporting network-isolation drift: the registry has no network-isolation field, so it fired for every isolated workload regardless of the candidate version. Remove the ConfigDrift field and the now-unused BoolChange type.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
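The first three fixes in this commit can be illustrated with a small sketch. All names here (`normalizeTag`, `statusFor`, `reasonFor`, and the fixed reason string) are hypothetical and chosen for illustration; the actual identifiers and message text in the PR may differ.

```go
package main

import (
	"fmt"
	"log/slog"
	"strings"
)

// Status is a hypothetical result type for this sketch.
type Status string

const (
	StatusUpToDate         Status = "up-to-date"
	StatusUpgradeAvailable Status = "upgrade-available"
	StatusUnknown          Status = "unknown"
)

// reasonLookupFailed is an assumed fixed string; it never embeds the raw error.
const reasonLookupFailed = "registry lookup failed"

// normalizeTag lowercases a leading uppercase "V" so that "V1.2.0" is
// handled like "v1.2.0" by a semver parser.
func normalizeTag(tag string) string {
	if strings.HasPrefix(tag, "V") {
		return "v" + tag[1:]
	}
	return tag
}

// statusFor maps a three-way comparison of candidate vs. running
// (-1 older, 0 equal, +1 newer) to a Status.
func statusFor(cmp int) Status {
	switch cmp {
	case 1:
		return StatusUpgradeAvailable
	case 0, -1:
		return StatusUpToDate
	default:
		// Defensive default: an unexpected value must not read as "up to date".
		return StatusUnknown
	}
}

// reasonFor logs the raw provider error at DEBUG and returns a fixed,
// non-sensitive string for the serialized response.
func reasonFor(err error) string {
	slog.Debug("registry lookup failed", "error", err)
	return reasonLookupFailed
}

func main() {
	fmt.Println(normalizeTag("V1.2.0"))
	fmt.Println(statusFor(1), statusFor(7))
	fmt.Println(reasonFor(fmt.Errorf("dial tcp 10.0.0.5:443: connection refused")))
}
```

With the default `slog` level the raw error is not printed at all, and the caller only ever sees the fixed string, which is the property the commit is after.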
CLI, Studio, and automation need a single backend source of truth for
upgrade availability instead of each client reimplementing registry
drift detection. Expose the Phase A checker over the existing workloads
API.
Add GET /api/v1beta/workloads/upgrade-check (batch) and
GET /api/v1beta/workloads/{name}/upgrade-check (single). The batch
handler reuses the exact group/all authorization scoping of the list
endpoint and intersects it with the enumerated run configs, so it can
never report a workload outside the caller's scope. Responses carry only
non-sensitive CheckResult metadata; secret env-var defaults are cleared.
Both routes are read-only and skip image pulls, so they use the standard
timeout. Regenerate the OpenAPI spec.
The dedicated apply endpoint (POST .../upgrade) follows once the Applier
lands (RFC THV-0068, phase B).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
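Two properties described in this commit message, scope intersection and clearing secret defaults, can be sketched as below. The types and function names (`EnvVar`, `intersectScope`, `redactSecretDefaults`) are assumptions made for illustration and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// EnvVar is a hypothetical stand-in for registry env-var metadata.
type EnvVar struct {
	Name    string
	Default string
	Secret  bool
}

// intersectScope keeps only run-config names that also appear in the
// caller's authorized workload list, so nothing outside the caller's
// scope can be reported.
func intersectScope(authorized, runConfigs []string) []string {
	allowed := make(map[string]struct{}, len(authorized))
	for _, name := range authorized {
		allowed[name] = struct{}{}
	}
	var out []string
	for _, name := range runConfigs {
		if _, ok := allowed[name]; ok {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

// redactSecretDefaults returns a copy with the default value of every
// secret variable cleared; the input slice is left untouched.
func redactSecretDefaults(vars []EnvVar) []EnvVar {
	out := make([]EnvVar, len(vars))
	copy(out, vars)
	for i := range out {
		if out[i].Secret {
			out[i].Default = ""
		}
	}
	return out
}

func main() {
	fmt.Println(intersectScope(
		[]string{"fetch", "github"},
		[]string{"github", "stale-on-disk"},
	))
	fmt.Println(redactSecretDefaults([]EnvVar{
		{Name: "API_TOKEN", Default: "s3cr3t", Secret: true},
		{Name: "PORT", Default: "8080"},
	}))
}
```

Filtering against the authorized set, rather than trusting the enumerated run configs, means a stale config left on disk is dropped instead of leaking into the response.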
FilterByGroup returns an empty slice (nil error) for a group that does not exist, so a typo'd ?group= silently returned 200 with an empty list instead of the documented 404. Check group existence explicitly via the group manager before filtering, in both the upgrade-check and the listWorkloads handlers, so the advertised 404 is real. Add a bulk upgrade-check test covering the unknown-group path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
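A minimal sketch of the explicit existence check, using only the standard library. The `groupChecker` interface, the `staticGroups` type, and the handler wiring are hypothetical; the real handlers go through the project's group manager, whose signature is not shown in this PR page.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// groupChecker is a hypothetical stand-in for the group manager.
type groupChecker interface {
	Exists(name string) bool
}

// staticGroups is a fixed in-memory set of group names for the demo.
type staticGroups map[string]bool

func (g staticGroups) Exists(name string) bool { return g[name] }

// upgradeCheckHandler rejects an unknown ?group= with 404 before any
// filtering happens, instead of returning 200 with an empty list.
func upgradeCheckHandler(groups groupChecker, list func(group string) []string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		group := r.URL.Query().Get("group")
		if group != "" && !groups.Exists(group) {
			http.Error(w, "group not found", http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(list(group))
	}
}

// statusCode runs the handler against target and returns the HTTP status.
func statusCode(h http.Handler, target string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	h := upgradeCheckHandler(
		staticGroups{"default": true},
		func(string) []string { return []string{} },
	)
	fmt.Println(statusCode(h, "/api/v1beta/workloads/upgrade-check?group=default")) // 200
	fmt.Println(statusCode(h, "/api/v1beta/workloads/upgrade-check?group=defualt")) // 404
}
```

An existing but empty group still returns 200 with an empty list, which is what distinguishes it from a typo.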
The upgrade detection change removed the ConfigDrift.NetworkIsolation field and the BoolChange type, so regenerate the committed OpenAPI spec to drop the stale schema and property. Fixes the Verify Swagger Documentation CI check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
0431342 to 6cb9506
6faa7be to 664e594
Summary
The upgrade checker (PR #5407) is only useful if clients can reach it. CLI, Studio, and automation should share one backend source of truth for upgrade availability instead of each reimplementing registry drift detection. This exposes the checker over the existing workloads API. Part of RFC THV-0068, local scope.
- Add `GET /api/v1beta/workloads/upgrade-check` (batch) and `GET /api/v1beta/workloads/{name}/upgrade-check` (single).
- Responses carry only non-sensitive `CheckResult` metadata; secret env-var defaults are cleared. Both routes are read-only and skip image pulls, so they use the standard request timeout.

Type of change
Test plan
- Tests (`task test`): single (up-to-date / available / 404 / 400), batch (mixed results, group filter, stale-on-disk config excluded, unloadable config skipped-not-fatal), a no-secret-leak assertion, and a chi route-ordering test (`/upgrade-check` does not resolve to `getWorkload`).
- Lint (`task lint-fix`)

Changes
- pkg/api/v1/workloads_upgrade.go
- pkg/api/v1/workloads.go
- pkg/api/v1/workload_types.go
- docs/server/*

Does this introduce a user-facing change?
Yes — two new read-only REST endpoints for querying upgrade availability. No existing endpoints change.
Large PR Justification
This PR is ~1,434 lines, but only 231 are hand-written source. The rest is mechanically generated or test code that cannot be meaningfully split out:
- Generated OpenAPI docs (`docs/server/docs.go`, `swagger.json`, `swagger.yaml`), produced by `task docs` from the swag annotations on the two new handlers. These must be regenerated and committed in the same PR (a `Verify Swagger Documentation` CI check fails otherwise), and they cannot be split.

Special notes for reviewers
PR 2 of 6 in the RFC THV-0068 stack; based on #5407. The dedicated apply endpoint (`POST .../upgrade`) lands in PR 5, reusing the response types added here.

🤖 Generated with Claude Code