Commit 164c251
Freeze MCPServer generation per pod via downward API (#5364)
* Add tests for MCPServer generation freeze invariant
Lock-in test in pkg/container/kubernetes/client_test.go documenting
the gate's blind spot when two callers pass equal MCPServerGeneration
with different images.
TDD test in cmd/thv-proxyrunner/app/run_test.go for the env-var override
(THV_MCPSERVER_GENERATION) that the fix in #5360 will introduce. Fails
today; goes green when the override is implemented.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Freeze proxyrunner MCPServerGeneration via env var
The /etc/runconfig ConfigMap volume mounts live (no subPath), so a
restarted old-RS proxyrunner pod re-reads runconfig.json after a helm
upgrade and picks up the new MCPServer generation. Both old and new
pods then call DeployWorkload with the same ourGen, defeating the
strict-greater-than gate at shouldSkipStatefulSetApply.
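The blind spot can be illustrated with a minimal sketch. The function below is a hypothetical stand-in for the gate, not the body of shouldSkipStatefulSetApply; the name shouldSkipApply and its two-integer signature are assumptions made for illustration only.

```go
package main

import "fmt"

// shouldSkipApply is a hypothetical stand-in for the gate: the caller skips
// its apply only when the generation already stamped on the StatefulSet
// (theirs) is strictly greater than the caller's own generation (ours).
func shouldSkipApply(theirs, ours int64) bool {
	return theirs > ours
}

func main() {
	// Intended case: the old pod still carries the older generation.
	fmt.Println(shouldSkipApply(3, 2)) // true: the stale pod is blocked

	// Blind spot: the restarted old pod re-read the file and also reports 3.
	fmt.Println(shouldSkipApply(3, 3)) // false: the stale pod applies anyway
}
```

With equal generations the comparison cannot tell the two callers apart, so the pod holding the old image is free to overwrite the newer template.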
Honor THV_MCPSERVER_GENERATION (sourced via downward API in a
follow-up operator change) as an override on the file value, freezing
the generation per pod at creation time. Parallel to how the image is
already frozen via the CLI positional arg.
Part of #5360.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Project MCPServer generation into proxyrunner via downward API
Stamp the MCPServer generation as a pod-template annotation on the
proxy Deployment, and project that annotation into the proxyrunner
container as the THV_MCPSERVER_GENERATION env var via the downward
API. The env var is bound to the pod's own annotations at creation
time, so a restarted old-RS pod cannot acquire the new generation by
re-reading the live-mounted RunConfig ConfigMap.
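A rough sketch of the wiring described above follows. The struct types are local stand-ins that mirror the shape of the corev1 types of the same names in k8s.io/api/core/v1, so the example compiles without external modules. The annotation key is hypothetical: this message only refers to it as the mcpserver-generation annotation. The helper names are likewise invented for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// Local stand-ins mirroring corev1.ObjectFieldSelector, corev1.EnvVarSource
// and corev1.EnvVar, reduced to the fields this sketch needs.
type ObjectFieldSelector struct {
	APIVersion string
	FieldPath  string
}

type EnvVarSource struct {
	FieldRef *ObjectFieldSelector
}

type EnvVar struct {
	Name      string
	ValueFrom *EnvVarSource
}

// generationAnnotation is a hypothetical key used only in this sketch.
const generationAnnotation = "example.com/mcpserver-generation"

// podTemplateAnnotations stamps the MCPServer generation on the pod template.
func podTemplateAnnotations(generation int64) map[string]string {
	return map[string]string{
		generationAnnotation: strconv.FormatInt(generation, 10),
	}
}

// generationEnvVar projects that annotation into the container through the
// downward API. The value is resolved from the pod's own annotations when the
// container starts, so it does not follow later ConfigMap updates.
func generationEnvVar() EnvVar {
	return EnvVar{
		Name: "THV_MCPSERVER_GENERATION",
		ValueFrom: &EnvVarSource{
			FieldRef: &ObjectFieldSelector{
				// Set explicitly; a later change in this series explains why.
				APIVersion: "v1",
				FieldPath:  fmt.Sprintf("metadata.annotations['%s']", generationAnnotation),
			},
		},
	}
}

func main() {
	env := generationEnvVar()
	fmt.Println(env.Name, "<-", env.ValueFrom.FieldRef.FieldPath)
	fmt.Println(podTemplateAnnotations(4))
}
```

The annotation key appears in two places (the pod template and the FieldPath), which is why a typo in either one would yield an empty env var.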
The proxyrunner already honors this env var as an override on the
file value (see prior commit). With both sides wired, two coexisting
proxyrunner pods during a rolling update carry distinct generations
again and the strict-greater-than gate at shouldSkipStatefulSetApply
fires correctly.
Mirror the new env var and annotation in deploymentNeedsUpdate so
drift detection stays stable.
Closes #5360.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add envtest coverage for generation-freeze contract
Two new integration specs assert the operator side of the freeze
end-to-end against a real apiserver:
- On initial reconcile, the proxy Deployment's pod template carries
the mcpserver-generation annotation set to MCPServer.metadata.generation,
and the proxyrunner container declares THV_MCPSERVER_GENERATION via
a downward-API FieldRef pointing at that exact annotation key.
- On a Spec.Image patch the controller bumps the annotation value to
the new generation and keeps the FieldRef wired correctly. This is
the rolling-update path: new pods get a strictly-greater frozen
generation than any restarted old-RS pod.
Catches typos in either the annotation key or the FieldRef path that
would silently produce an empty env var and let the proxyrunner fall
through to the (live-mounted) file value. Complements the unit-level
TestDeploymentForMCPServer_MCPServerGenerationDownwardAPI and
TestTryLoadConfigFromFile_MCPServerGenerationEnvOverride.
Part of #5360.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Address review feedback on #5360 freeze
- Defend applyMCPServerGenerationOverride against negative env values.
metadata.generation is monotonic non-negative per K8s convention, so a
negative override cannot come from a legitimate downward-API projection
and must not be allowed to silently disable the apply-gate stamp.
Add a Debug log on the success path so the override is observable in
pod logs when diagnosing future occurrences.
- Add TestApplyMCPServerGenerationOverride exercising the empty,
valid, zero, unparseable, and negative branches in isolation.
- Soften the position-mirror comment in deploymentNeedsUpdate. The
prior text referenced the Redis password slot, but
deploymentNeedsUpdate does not include the Redis password env var
(pre-existing latent drift bug, tracked separately).
- Add a header comment to TestResourceOverrides explaining that the
"0" mcpserver-generation annotation value comes from the fake client
not auto-incrementing metadata.generation. Realistic generation
tracking is exercised by the envtest in
mcpserver_generation_freeze_integration_test.go.
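The branches named in the first two bullets can be sketched as below. This is a hypothetical reconstruction, not the code of applyMCPServerGenerationOverride: the function name, signature, and fallback behaviour are assumptions. In particular, this message does not say how the real function treats zero; the sketch accepts it, and it falls back to the file value for unparsable and negative input.

```go
package main

import (
	"fmt"
	"strconv"
)

// applyGenerationOverride is a hypothetical sketch of the override logic.
// It returns the generation the proxyrunner should use, given the value read
// from the file and the raw THV_MCPSERVER_GENERATION string.
func applyGenerationOverride(fileGen int64, envValue string) int64 {
	if envValue == "" {
		return fileGen // env var not set: keep the file value
	}
	gen, err := strconv.ParseInt(envValue, 10, 64)
	if err != nil {
		return fileGen // unparsable: keep the file value
	}
	if gen < 0 {
		return fileGen // negative: cannot come from metadata.generation
	}
	return gen // valid (zero accepted in this sketch): the env value wins
}

func main() {
	const fileGen = 3 // value read from the live-mounted runconfig.json
	for _, env := range []string{"", "2", "0", "abc", "-1"} {
		fmt.Printf("env=%q -> generation %d\n", env, applyGenerationOverride(fileGen, env))
	}
}
```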
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Fix CI: codespell typo and gofmt alignment
- "unparseable" → "unparsable" (codespell) in run.go, run_test.go.
- Realign env-var map literals to gofmt's preferred column for the
longest key (THV_MCPSERVER_GENERATION).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Set FieldRef.APIVersion=v1 to avoid perpetual drift
ObjectFieldSelector.APIVersion is defaulted to "v1" by the API server on
persistence. The drift comparator at deploymentNeedsUpdate was rebuilding
the expected env var with an empty APIVersion, so equality.Semantic.DeepEqual
returned false on every reconcile and the controller perpetually rewrote
the proxy Deployment. That constant churn re-triggered the exact rolling-
update race the freeze was supposed to close — confirmed by the failing
VirtualMCPServer Optimizer + Circuit Breaker recovery spec in
test-e2e-lifecycle: stale-image proxyrunner pods kept clobbering the new
StatefulSet apply, leaving the backend mcp container in ImagePullBackOff.
Explicitly set APIVersion: "v1" on both the construction site and the
drift comparator so they match the API-server-defaulted value.
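The mismatch can be reproduced in miniature. The sketch below uses reflect.DeepEqual from the standard library as a stand-in for equality.Semantic.DeepEqual, and a local fieldRef struct in place of ObjectFieldSelector; the helper names and the annotation key are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// fieldRef is a local stand-in for ObjectFieldSelector.
type fieldRef struct {
	APIVersion string
	FieldPath  string
}

// defaultedByServer mimics the API server filling in APIVersion on persistence.
func defaultedByServer(in fieldRef) fieldRef {
	if in.APIVersion == "" {
		in.APIVersion = "v1"
	}
	return in
}

// needsUpdate reports drift between the stored object and the expected one.
func needsUpdate(stored, expected fieldRef) bool {
	return !reflect.DeepEqual(stored, expected)
}

func main() {
	path := "metadata.annotations['example.com/mcpserver-generation']" // hypothetical key
	stored := defaultedByServer(fieldRef{FieldPath: path})

	// Expected value rebuilt with an empty APIVersion: drift on every reconcile.
	fmt.Println(needsUpdate(stored, fieldRef{FieldPath: path})) // true

	// Expected value rebuilt with APIVersion "v1": no spurious drift.
	fmt.Println(needsUpdate(stored, fieldRef{APIVersion: "v1", FieldPath: path})) // false
}
```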
Add unit-level assertion in TestDeploymentForMCPServer_MCPServerGenerationDownwardAPI
and a new envtest spec "Does not flag spurious drift on a no-op reconcile"
that watches the Deployment's resourceVersion across multiple reconciles
and fails fast if drift detection misfires.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* TEMP debug: promote gate skip log + override log to INFO
Temporary diagnostic for #5360 v1.33/v1.34 E2E failure. Will be reverted
before merge.
* Revert debug INFO logs; port STS-template wait to optimizer test
The debug commit (8ed6e39) confirmed the override + gate are working
correctly end-to-end on a real cluster:
pod env=3 (latest): file=3 env=3, apply proceeds — writes GOOD image
pod env=2 (stale): file=3 env=2, gate skips (theirs=3 > ours=2)
Revert the INFO promotion now that diagnosis is complete; both logs go
back to Debug.
The remaining E2E flake is a pre-existing race in
virtualmcp_optimizer_circuit_breaker_test.go: the test deletes the
Pending StatefulSet pod immediately after patching Spec.Image without
waiting for the proxyrunner to apply the new template, so the pod can
be recreated against the stale (bad) template and the StatefulSet
controller may not re-roll an already-unhealthy pod afterwards. Mirrors
the same guard added to virtualmcp_circuit_breaker_test.go in #5079.
Before the apply-gate, the race "resolved" by the stale-image apply
periodically resetting the template, letting the test catch a lucky
good window. With the gate in place the apply order is deterministic,
which makes the missing wait visible — hence the test fix lands here.
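The missing wait amounts to polling a condition before deleting the pod. The helper below is a generic standard-library sketch of that pattern, not the code added to the E2E test; waitUntil and the simulated condition are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitUntil polls cond every interval until it returns true or the timeout
// elapses. cond is always evaluated at least once.
func waitUntil(timeout, interval time.Duration, cond func() bool) error {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for condition")
		}
		time.Sleep(interval)
	}
}

func main() {
	polls := 0
	// Simulated condition: the StatefulSet template shows the new image on
	// the third poll. A real test would read the object from the cluster.
	templateUpdated := func() bool {
		polls++
		return polls >= 3
	}

	if err := waitUntil(time.Second, 10*time.Millisecond, templateUpdated); err != nil {
		fmt.Println("template never updated:", err)
		return
	}
	fmt.Println("template updated; the Pending pod can now be deleted")
}
```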
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 4bb80f0 commit 164c251
8 files changed
Lines changed: 660 additions & 27 deletions
File tree
- cmd
- thv-operator
- controllers
- test-integration/mcp-server
- thv-proxyrunner/app
- pkg/container/kubernetes
- test/e2e/thv-operator/virtualmcp
Lines changed: 107 additions & 27 deletions