Skip to content

Commit bbe8b85

Browse files
authored
Introduce MutateAndPatchSpec and adopt across spec-patch sites (#5004)
* Introduce MutateAndPatchSpec and adopt across spec-patch sites The inline DeepCopy + Patch(MergeFromWithOptimisticLock) pattern from #4914 landed at five MCPServer spec-write sites, each carrying the same four-line rationale comment. Extract it into a MutateAndPatchSpec[T] generic helper in cmd/thv-operator/pkg/controllerutil, siblinged with the existing MutateAndPatchStatus. The helper mirrors its sibling exactly -- same reflection-based nil-guard, same DeepCopy-then-mutate-then-Patch flow -- with two deliberate differences: - Uses MergeFromWithOptions(original, MergeFromWithOptimisticLock{}) so concurrent writers get 409-and-requeue instead of silent clobber. This is the property that defends spec.authzConfig, which the forthcoming authorization controller will own, from being zeroed on every reconcile. - No no-op short-circuit. MergeFromWithOptimisticLock always emits metadata.resourceVersion into the body, so the status helper's "body == {}" check never fires; and every current spec call site carries a real mutation. All five inline sites (mcpserver_controller.go finalizer add/remove and restart-annotation stamp; toolconfig_controller.go and mcpexternalauthconfig_controller.go config-hash stamps) adopt the helper. Pure refactor at the call sites -- no behavior change. Tests mirror the status helper's shape: happy-path + optimistic-lock wire signal, DeepCopy isolation, 409 Conflict propagation, nil-obj rejection, and disjoint-spec-field preservation (the regression guard that would fire if the helper were ever swapped back to r.Update). Existing TestMCPServerSpecPatchesAreOptimisticLock and the mcpserver_spec_patch_integration_test.go envtest remain green. The operator-rules section on spec/metadata patching is updated to reference the helper as the canonical pattern. * Address MoE review feedback on MutateAndPatchSpec Three of five review findings applied (two skipped with rebuttal in the PR thread): - Add TestMutateAndPatchSpec_NoOpMutateStillPatches. The existing AppliesMutationWithOptimisticLock test asserts resourceVersion is present in the body when a patch is issued; it cannot detect a short-circuit regression that issues zero patches. Pin the documented divergence from MutateAndPatchStatus (which DOES short-circuit) with a direct no-op-mutate wire-level assertion. - Fix the godoc "Typical usage" example in both patch.go and status.go to use the ctrlutil alias. operator.md already uses this alias, and every real caller aliases the package to avoid colliding with controller-runtime's own controllerutil package. patch.go is updated to match; status.go is brought along for symmetry. - Document that obj is mutated before Patch is issued, so on error the caller's in-memory copy is post-mutation. Callers must re-fetch rather than retrying in place. Added to both helpers. The standard reconciler return-and-requeue pattern is the correct retry path; the sentence names it explicitly so a future caller reading the doc does not invent an in-place retry loop. Also tightened the "don't use this for status" cross-reference in status.go to name the sibling MutateAndPatchSpec directly.
1 parent 037a0e8 commit bbe8b85

8 files changed

Lines changed: 437 additions & 70 deletions

File tree

.claude/rules/operator.md

Lines changed: 11 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -75,11 +75,11 @@ so any field our local copy does not track (e.g. `spec.authzConfig`
7575
written by a separate authorization controller) gets zeroed on every
7676
reconcile.
7777

78-
Use an **optimistic-lock merge patch** instead. The merge-patch body
79-
only contains fields the caller changed, and `MergeFromWithOptimisticLock`
80-
sends `resourceVersion` as a precondition: if the server moved between
81-
our Get and Patch, the apiserver returns 409 and controller-runtime
82-
requeues with a fresh Get.
78+
Use `controllerutil.MutateAndPatchSpec` instead. The helper wraps an
79+
optimistic-lock merge patch: the body only contains fields the caller
80+
changed, and `MergeFromWithOptimisticLock` sends `resourceVersion` as a
81+
precondition, so if the server moved between our Get and Patch the
82+
apiserver returns 409 and controller-runtime requeues with a fresh Get.
8383

8484
This is what protects `metadata.finalizers`. Merge-patch has no
8585
array-append semantics — arrays are replaced wholesale — so when our
@@ -90,20 +90,16 @@ fails our precondition, and the next reconcile observes it via a fresh
9090
Get before recomputing the diff.
9191

9292
```go
93-
original := mcpServer.DeepCopy()
94-
controllerutil.AddFinalizer(mcpServer, MCPServerFinalizerName)
95-
if err := r.Patch(ctx, mcpServer,
96-
client.MergeFromWithOptions(original, client.MergeFromWithOptimisticLock{})); err != nil {
93+
if err := ctrlutil.MutateAndPatchSpec(ctx, r.Client, mcpServer, func(m *mcpv1beta1.MCPServer) {
94+
controllerutil.AddFinalizer(m, MCPServerFinalizerName)
95+
}); err != nil {
9796
return ctrl.Result{}, err
9897
}
9998
```
10099

101-
Do not put unrelated work between the `DeepCopy` and the `Patch`: the
102-
wire diff is computed from that snapshot, so anything you mutate in
103-
between leaks into the patch body.
104-
105100
Expect 409s as routine log noise once the external controller lands —
106101
the guard doing its job, not a bug.
107102

108-
Status-subresource patching follows the same "never `Update`" rule and
109-
is covered separately once the shared helper lands (#4633).
103+
Status-subresource patching uses the sibling helper
104+
`controllerutil.MutateAndPatchStatus` (see the "Status Writes" section
105+
above).

cmd/thv-operator/controllers/mcpexternalauthconfig_controller.go

Lines changed: 7 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -177,19 +177,13 @@ func (r *MCPExternalAuthConfigReconciler) handleConfigHashChange(
177177
logger.Info("Triggering reconciliation of MCPServer due to MCPExternalAuthConfig change",
178178
"mcpserver", server.Name, "externalAuthConfig", externalAuthConfig.Name)
179179

180-
// Use an optimistic-lock merge patch rather than Update so we do not
181-
// silently overwrite spec fields (e.g. spec.authzConfig) that are owned
182-
// by another controller. The resourceVersion is sent, so concurrent
183-
// writers cause a 409 Conflict and a clean requeue.
184-
original := server.DeepCopy()
185-
// Add an annotation to the MCPServer to trigger reconciliation
186-
if server.Annotations == nil {
187-
server.Annotations = make(map[string]string)
188-
}
189-
server.Annotations["toolhive.stacklok.dev/externalauthconfig-hash"] = configHash
190-
191-
if err := r.Patch(ctx, &server,
192-
client.MergeFromWithOptions(original, client.MergeFromWithOptimisticLock{})); err != nil {
180+
// Add an annotation to the MCPServer to trigger reconciliation.
181+
if err := ctrlutil.MutateAndPatchSpec(ctx, r.Client, &server, func(m *mcpv1beta1.MCPServer) {
182+
if m.Annotations == nil {
183+
m.Annotations = make(map[string]string)
184+
}
185+
m.Annotations["toolhive.stacklok.dev/externalauthconfig-hash"] = configHash
186+
}); err != nil {
193187
logger.Error(err, "Failed to patch MCPServer annotation", "mcpserver", server.Name)
194188
// Continue with other servers even if one fails
195189
}

cmd/thv-operator/controllers/mcpserver_controller.go

Lines changed: 12 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -190,14 +190,9 @@ func (r *MCPServerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (
190190
return ctrl.Result{}, err
191191
}
192192

193-
// Use an optimistic-lock merge patch rather than Update so we do not
194-
// silently overwrite spec fields (e.g. spec.authzConfig) that are owned
195-
// by another controller. The resourceVersion is sent, so concurrent
196-
// writers cause a 409 Conflict and a clean requeue.
197-
original := mcpServer.DeepCopy()
198-
controllerutil.RemoveFinalizer(mcpServer, MCPServerFinalizerName)
199-
if err := r.Patch(ctx, mcpServer,
200-
client.MergeFromWithOptions(original, client.MergeFromWithOptimisticLock{})); err != nil {
193+
if err := ctrlutil.MutateAndPatchSpec(ctx, r.Client, mcpServer, func(m *mcpv1beta1.MCPServer) {
194+
controllerutil.RemoveFinalizer(m, MCPServerFinalizerName)
195+
}); err != nil {
201196
return ctrl.Result{}, err
202197
}
203198
}
@@ -206,14 +201,9 @@ func (r *MCPServerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (
206201

207202
// Add finalizer for this CR
208203
if !controllerutil.ContainsFinalizer(mcpServer, MCPServerFinalizerName) {
209-
// Use an optimistic-lock merge patch rather than Update so we do not
210-
// silently overwrite spec fields (e.g. spec.authzConfig) that are owned
211-
// by another controller. The resourceVersion is sent, so concurrent
212-
// writers cause a 409 Conflict and a clean requeue.
213-
original := mcpServer.DeepCopy()
214-
controllerutil.AddFinalizer(mcpServer, MCPServerFinalizerName)
215-
if err := r.Patch(ctx, mcpServer,
216-
client.MergeFromWithOptions(original, client.MergeFromWithOptimisticLock{})); err != nil {
204+
if err := ctrlutil.MutateAndPatchSpec(ctx, r.Client, mcpServer, func(m *mcpv1beta1.MCPServer) {
205+
controllerutil.AddFinalizer(m, MCPServerFinalizerName)
206+
}); err != nil {
217207
return ctrl.Result{}, err
218208
}
219209
}
@@ -759,17 +749,12 @@ func (r *MCPServerReconciler) handleRestartAnnotation(ctx context.Context, mcpSe
759749
}
760750

761751
// Update the last processed restart timestamp in annotations.
762-
// Use an optimistic-lock merge patch rather than Update so we do not
763-
// silently overwrite spec fields (e.g. spec.authzConfig) that are owned
764-
// by another controller. The resourceVersion is sent, so concurrent
765-
// writers cause a 409 Conflict and a clean requeue.
766-
original := mcpServer.DeepCopy()
767-
if mcpServer.Annotations == nil {
768-
mcpServer.Annotations = make(map[string]string)
769-
}
770-
mcpServer.Annotations[LastProcessedRestartAnnotationKey] = currentRestartedAt
771-
if err := r.Patch(ctx, mcpServer,
772-
client.MergeFromWithOptions(original, client.MergeFromWithOptimisticLock{})); err != nil {
752+
if err := ctrlutil.MutateAndPatchSpec(ctx, r.Client, mcpServer, func(m *mcpv1beta1.MCPServer) {
753+
if m.Annotations == nil {
754+
m.Annotations = make(map[string]string)
755+
}
756+
m.Annotations[LastProcessedRestartAnnotationKey] = currentRestartedAt
757+
}); err != nil {
773758
return false, fmt.Errorf("failed to update MCPServer with last processed restart annotation: %w", err)
774759
}
775760

cmd/thv-operator/controllers/toolconfig_controller.go

Lines changed: 6 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -159,18 +159,12 @@ func (r *ToolConfigReconciler) handleConfigHashChange(
159159
logger.Info("Triggering reconciliation of MCPServer due to MCPToolConfig change",
160160
"mcpserver", server.Name, "toolconfig", toolConfig.Name)
161161

162-
// Use an optimistic-lock merge patch rather than Update so we do not
163-
// silently overwrite spec fields (e.g. spec.authzConfig) that are owned
164-
// by another controller. The resourceVersion is sent, so concurrent
165-
// writers cause a 409 Conflict and a clean requeue.
166-
original := server.DeepCopy()
167-
if server.Annotations == nil {
168-
server.Annotations = make(map[string]string)
169-
}
170-
server.Annotations["toolhive.stacklok.dev/toolconfig-hash"] = configHash
171-
172-
if err := r.Patch(ctx, &server,
173-
client.MergeFromWithOptions(original, client.MergeFromWithOptimisticLock{})); err != nil {
162+
if err := ctrlutil.MutateAndPatchSpec(ctx, r.Client, &server, func(m *mcpv1beta1.MCPServer) {
163+
if m.Annotations == nil {
164+
m.Annotations = make(map[string]string)
165+
}
166+
m.Annotations["toolhive.stacklok.dev/toolconfig-hash"] = configHash
167+
}); err != nil {
174168
logger.Error(err, "Failed to patch MCPServer annotation", "mcpserver", server.Name)
175169
}
176170
}

cmd/thv-operator/pkg/controllerutil/doc.go

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@
1717
// - podtemplatespec_builder.go: PodTemplateSpec builder for constructing pod template patches
1818
// - maps.go: Map comparison utilities (e.g. subset checks for annotations)
1919
// - status.go: Status-subresource merge-patch helper (MutateAndPatchStatus)
20+
// - patch.go: Spec/metadata optimistic-lock merge-patch helper (MutateAndPatchSpec)
2021
//
2122
// These utilities are used by multiple controllers including MCPServer, MCPRemoteProxy,
2223
// and ToolConfig controllers to maintain consistent behavior across the operator.
Lines changed: 76 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,76 @@
1+
// SPDX-FileCopyrightText: Copyright 2025 Stacklok, Inc.
2+
// SPDX-License-Identifier: Apache-2.0
3+
4+
package controllerutil
5+
6+
import (
7+
"context"
8+
"fmt"
9+
"reflect"
10+
11+
"sigs.k8s.io/controller-runtime/pkg/client"
12+
)
13+
14+
// MutateAndPatchSpec captures the current state of obj, applies mutate, and
15+
// patches the object using a JSON merge patch with optimistic concurrency.
16+
// A concurrent writer that advances resourceVersion between our read and our
17+
// Patch triggers a 409 Conflict; controller-runtime then re-Gets, recomputes
18+
// the diff, and writes on a fresh view — preserving cross-writer coexistence
19+
// on the same resource.
20+
//
21+
// This is the canonical idiom for every spec or metadata write on a CR that
22+
// another controller may also write (see #4767). A full PUT (r.Update) is a
23+
// bug trap: any field the operator's local copy does not track — most
24+
// importantly spec.authzConfig on MCPServer, which a separate authorization
25+
// controller will own — is zeroed on every reconcile. A merge-patch body
26+
// only carries fields the caller actually changed, so untouched fields never
27+
// hit the wire and cannot be clobbered. MergeFromWithOptimisticLock sends
28+
// resourceVersion as a precondition, giving 409-on-collision semantics for
29+
// concurrent writers and defending metadata.finalizers (which has no
30+
// array-merge semantics under RFC 7396 merge-patch) against wholesale
31+
// replacement when another controller is mid-flight adding its own entry.
32+
//
33+
// Unlike MutateAndPatchStatus, this helper does NOT short-circuit on an
34+
// empty diff. MergeFromWithOptimisticLock always emits metadata.resourceVersion
35+
// into the patch body, so the status helper's "body == {}" check never fires;
36+
// and every current call site carries a real mutation (finalizer add/remove,
37+
// annotation stamp), so there is no no-op caller to optimize for.
38+
//
39+
// Do NOT use for status writes. Status-subresource writes are scoped to the
40+
// status stanza, and forcing a 409 on every disjoint-field overlap would
41+
// produce permanent churn with nothing gained — use MutateAndPatchStatus.
42+
//
43+
// If Patch returns an error, obj has already been mutated; callers must
44+
// re-fetch obj before retrying rather than reusing the modified in-memory
45+
// copy. The standard reconciler pattern — returning the error so
46+
// controller-runtime requeues with a fresh Get — is the correct retry path.
47+
//
48+
// Typical usage:
49+
//
50+
// err := ctrlutil.MutateAndPatchSpec(ctx, r.Client, mcpServer,
51+
// func(m *mcpv1beta1.MCPServer) {
52+
// controllerutil.AddFinalizer(m, MCPServerFinalizerName)
53+
// })
54+
// if err != nil {
55+
// return ctrl.Result{}, err
56+
// }
57+
//
58+
// Expect 409s as routine log noise once external writers land — the guard
59+
// doing its job, not a bug.
60+
func MutateAndPatchSpec[T client.Object](
61+
ctx context.Context, c client.Client, obj T, mutate func(T),
62+
) error {
63+
// Reject both a true-nil interface and a typed-nil pointer. T is
64+
// constrained to client.Object; every real implementer is a pointer
65+
// to a struct, so a nil obj is always a programmer error. Returning
66+
// an explicit error is nicer than the raw panic that the subsequent
67+
// .(T) type assertion would produce.
68+
v := reflect.ValueOf(obj)
69+
if !v.IsValid() || (v.Kind() == reflect.Pointer && v.IsNil()) {
70+
return fmt.Errorf("MutateAndPatchSpec: obj must be non-nil")
71+
}
72+
original := obj.DeepCopyObject().(T)
73+
mutate(obj)
74+
return c.Patch(ctx, obj, client.MergeFromWithOptions(
75+
original, client.MergeFromWithOptimisticLock{}))
76+
}

0 commit comments

Comments
 (0)