Skip to content

fix(rbg): use SSA Apply with distinct field managers on RBG writes#309

Open
sebest wants to merge 1 commit into
sgl-project:mainfrom
sebest:fix/rbg-ssa-field-manager
Open

fix(rbg): use SSA Apply with distinct field managers on RBG writes#309
sebest wants to merge 1 commit into
sgl-project:mainfrom
sebest:fix/rbg-ssa-field-manager

Conversation

@sebest
Copy link
Copy Markdown
Contributor

@sebest sebest commented Apr 26, 2026

Summary

Three RBG-controller write paths used full-object client.Update or MergeFrom patches on RoleBasedGroup, which the API server records under the default "manager" field manager (the operator binary is /manager). On any chart-value bump that touched a passively co-owned field (containers, env, volumes, tolerations, ...), Helm v4 SSA Apply with ForceConflicts=false aborted with field_conflict — even though the RBG controller never authoritatively writes those sibling fields.

Replace the three writes with narrow SSA Apply calls and assign each its own field manager:

Apply path Field manager Claim
ensureDiscoveryConfigMode (RBG controller) rbg-discovery metadata.annotations.discovery-config-mode
updateRoleReplicas (RBGSA controller) rbg-replicas spec.roles[].{name, replicas}
updateExistingRBGs (RBGSet controller) rbg-set-sync metadata.{labels, annotations}, spec.roles

A shared "rbg" field manager across these three paths would not work. Per SSA claim-release semantics, when the same field manager re-applies with a narrower claim, fields it previously owned but no longer claims are released and (if no other manager owns them) removed by the API server. With three Apply paths claiming disjoint subsets of the same RBG, a shared field manager makes them ping-pong — each Apply silently strips the others' writes, hot-looping the controller. Distinct field managers give each path an independent ownership boundary.

pkg/utils.PatchObjectApplyConfigurationWithFieldManager is the new field-manager-aware variant; PatchObjectApplyConfiguration remains a thin wrapper that defaults to the legacy "rbg" field manager — preserved for callers that don't share the RBG main subresource with other rbg-controller Apply paths (RBG status writes, RBGSA writes, ConfigMap writes — all on different objects/subresources, no ping-pong risk).

The retry.RetryOnConflict wrappers from the previous Update paths are dropped: SSA Apply with Force=true does not 409 on field-manager conflicts, and no resourceVersion is sent in the apply payload, so version-skew conflicts do not arise either.

Stripping pre-existing legacy "manager" Update entries on already-deployed resources is intentionally out of scope; that is the responsibility of the upstream SSA writer (e.g. via csaupgrade.UpgradeManagedFields or a one-off managedFields strip).

Test plan

  • go build ./... clean
  • go vet ./... clean
  • gofmt -l clean
  • go test ./internal/controller/workloads/... ./pkg/utils/... green (incl. new tests for each Apply path: TestUpdateRoleReplicas, TestBuildSyncedMetadata, TestToRoleSpecApplyConfigurations, TestRoleBasedGroupSetReconciler_updateExistingRBGs_SSA, extended TestEnsureDiscoveryConfigMode)
  • End-to-end on kind cluster:
    • Apply RBG via kubectl apply --server-side --field-manager=external-actor with scalingAdapter.enable=true, scale auto-created RBGSA to drive updateRoleReplicas, apply parent RBGSet to drive updateExistingRBGs.
    • Verify generation is stable (would hot-loop at ~20 Hz with a shared field manager).
    • Verify discovery-config-mode annotation persists alongside external-actor's annotations.
    • Verify external-actor retains ownership of containers/env/image/resources/scalingAdapter.
    • Verify kubectl get rbg --show-managed-fields shows three independent rbg-discovery, rbg-replicas, rbg-set-sync managers each owning only its narrow claim.
    • Verify external-actor SSA chart bump of HEDWIG_CLIENT_ID.value (the original failure mode) with --force-conflicts=false applies cleanly.

Follow-up

#310 — RBGSet child RBGs carry a residual manager Update entry from client.Create in newRBGForSet. Predates this PR and doesn't affect the i8s-bridge scenario (which deploys standalone RBGs, not RBGSets), but worth fixing for hygiene.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request migrates the RoleBasedGroup, RoleBasedGroupScalingAdapter, and RoleBasedGroupSet controllers to use Server-Side Apply (SSA) for resource updates, replacing previous update and patch mechanisms that relied on manual retry logic. Key changes include the introduction of a JSON-based conversion helper for complex spec fields and refactored metadata synchronization. Review feedback correctly identifies that the new SSA configurations in the RoleBasedGroup and RoleBasedGroupSet controllers are missing required Kind and APIVersion fields, which would lead to failures when the API server attempts to identify the target resource.

Comment thread internal/controller/workloads/rolebasedgroup_controller.go
Comment thread internal/controller/workloads/rolebasedgroupset_controller.go
@sebest sebest force-pushed the fix/rbg-ssa-field-manager branch from 762f04a to e03df62 Compare April 26, 2026 18:10
@sebest sebest marked this pull request as draft April 26, 2026 18:55
@sebest sebest force-pushed the fix/rbg-ssa-field-manager branch from 31795e0 to 7c4d584 Compare April 26, 2026 20:06
@sebest sebest closed this Apr 26, 2026
@sebest sebest reopened this Apr 26, 2026
Three RBG-controller write paths used full-object client.Update or
MergeFrom patches on RoleBasedGroup, which the API server records under
the default "manager" field manager (binary name is /manager). On any
chart-value bump that touched a passively co-owned field (containers,
env, volumes, tolerations, ...), Helm v4 SSA Apply with
ForceConflicts=false aborted with field_conflict — even though the RBG
controller never authoritatively writes those sibling fields.

Replace the three writes with narrow SSA Apply calls and assign each its
own field manager:

  - ensureDiscoveryConfigMode → "rbg-discovery"
    Applies only metadata.annotations.discovery-config-mode.
  - updateRoleReplicas (RBGSA controller) → "rbg-replicas"
    Applies only spec.roles[].{name, replicas}.
  - updateExistingRBGs (RBGSet controller) → "rbg-set-sync"
    Applies metadata.{labels, annotations} and spec.roles.

A single shared "rbg" field manager would not work here. SSA's
claim-release semantics: when the same field manager re-applies with a
narrower claim, fields it previously owned but no longer claims are
released and (if no other manager owns them) removed by the API server.
With three Apply paths claiming disjoint subsets of the same RBG, a
shared field manager makes them ping-pong — each Apply silently strips
the others' writes, hot-looping the controller. Distinct field managers
give each path an independent ownership boundary.

The retry.RetryOnConflict wrappers from the previous Update path are
dropped: SSA Apply with Force=true does not 409 on field-manager
conflicts, and no resourceVersion is sent in the apply payload, so
version-skew conflicts do not arise either.

pkg/utils.PatchObjectApplyConfigurationWithFieldManager is the new
field-manager-aware variant; PatchObjectApplyConfiguration remains a
thin wrapper that defaults to the legacy "rbg" field manager for callers
that don't share the RBG main subresource with other rbg-controller
Apply paths (RBG status writes, RBGSA writes, ConfigMap writes — all on
different objects/subresources, no ping-pong risk).

Stripping pre-existing legacy "manager" Update entries on already-
deployed resources is intentionally out of scope; that is the
responsibility of the upstream SSA writer (e.g. via
csaupgrade.UpgradeManagedFields or a one-off managedFields strip).

Validated end-to-end on kind: applied an RBG via
'kubectl apply --server-side --field-manager=external-actor' with
scalingAdapter.enable=true, scaled the auto-created RBGSA to drive
updateRoleReplicas, and applied a parent RBGSet to drive
updateExistingRBGs. Generation is stable (would hot-loop at ~20 Hz with
a shared field manager), the discovery-config-mode annotation persists
alongside external-actor's annotations, external-actor retains
ownership of containers/env/image/resources/scalingAdapter, and
managedFields shows three independent rbg-* field managers each owning
only its narrow claim.
@sebest sebest force-pushed the fix/rbg-ssa-field-manager branch from 7c4d584 to 3b1ad5a Compare April 26, 2026 21:13
@sebest sebest changed the title fix(rbg): use SSA Apply for spec writes to avoid claiming "manager" field manager fix(rbg): use SSA Apply with distinct field managers on RBG writes Apr 26, 2026
@sebest sebest marked this pull request as ready for review April 26, 2026 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant