chore: patch metadata-only pod updates by weicao · Pull Request #10398 · apecloud/kubeblocks

weicao · 2026-06-17T16:58:13Z

Problem

Targets #10369.

This PR is stacked on #10394 so the resource-comparison fix stays separate from the config-hash/status convergence fix.

After a successful reconfigure, the remaining in-place Pod diff can be only metadata such as the config-hash annotation. Planning that as a full Pod update can race with concurrent status writes and leave the config-hash/status stale, which keeps the OpsRequest running even though the database-side reconfigure action returned OK.

The previous head 04f6d7e90770ed07d32a9c9b9435cab45782692d used a normal-update-before-resize approach and failed the Redis Parameters x replication-twemproxy focused runtime gate. This head replaces that attempt with the narrower metadata-only patch path.

Changes

Add an explicit ObjectTree option for planning PATCH actions.
Use PATCH for safe metadata-only in-place Pod updates in the InstanceSet and Instance reconcilers.
Keep resource-only changes on the /resize subresource path.
Keep non-metadata Pod changes on the existing switchover/full-update path.
Add kubebuilderx plan-builder coverage and update the focused InstanceSet test to assert metadata-only PATCH planning.

Tests

go test ./pkg/controller/kubebuilderx ./pkg/controller/instance ./pkg/controller/instanceset -count=1
git diff --check origin/bugfix/instanceset-resize-resource-compare..HEAD
CI for head 31908d0cb7aa64c79c882e6ee71dfbce0e668516: green / expected-skip, including make-test and push-pre-check (test).

Runtime status

Previous head 04f6d7e90770ed07d32a9c9b9435cab45782692d: exact-image Redis Parameters x replication-twemproxy focused gate failed, PASS 14 / FAIL 1 / SKIP 0; OpsRequest stayed Running; Component/InstanceSet status and config-hash did not converge.
Current head 31908d0cb7aa64c79c882e6ee71dfbce0e668516: exact-image Redis Parameters x replication-twemproxy focused gate passed, PASS 34 / FAIL 0 / SKIP 0.
Runtime identity for the passing run: local sideload image built from this head, imagePullPolicy: Never, manager pod kubeblocks-6dfbccf448-8vbpw, live imageID sha256:80435fad50cb92d42df0aaff1bb7a23a8650559d44ff8c4f0bbb1f7ae15ff523, restartCount 0.
Convergence evidence: 3 reconfigure OpsRequests reached Succeed; Cluster rds-params-twemproxy reached Running; Redis Component generation/observedGeneration reached 5/5; Redis InstanceSet generation/observedGeneration reached 4/4 with ready/updated/available replicas 2/2 and configHash 6b4845ccf7.
Data-path evidence: both Redis pods reported maxmemory-policy=allkeys-lru, hz=50, and databases=32; dynamic changes kept pod UIDs unchanged; the static databases change rolled pods as expected; twemproxy SET/GET and replica readback passed.
Cleanup evidence: namespace redis-params-twemproxy-r12-31908-r1 was NotFound after cleanup and PV residue was empty.

Boundaries

Ready for human review for this stacked delta.
Stacked on fix: converge omitted-request resize resources #10394; final merge path still depends on accepting, rebasing, or retargeting the base chain.
This is patch-version focused validation for Redis Parameters x replication-twemproxy only, not a Redis addon release-ready claim or a broader matrix result.

When a desired pod template sets CPU or memory limits but omits requests, Kubernetes can preserve a live request value defaulted from an older limit. The in-place resource comparison should not infer a desired request from the desired limit in that case; otherwise a post-resize reconcile keeps reporting a request difference even though the desired template does not own that request. Compare CPU and memory requests only when the desired template explicitly sets that request. Continue to compare limits normally.

codecov · 2026-06-17T17:10:47Z

Codecov Report

❌ Patch coverage is 79.54545% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.90%. Comparing base (00dc1b7) to head (31908d0).

Files with missing lines	Patch %	Lines
pkg/controller/instance/reconciler_update.go	0.00%	7 Missing ⚠️
pkg/controller/instanceset/reconciler_update.go	71.42%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #10398      +/-   ##
==========================================
+ Coverage   61.87%   61.90%   +0.02%     
==========================================
  Files         533      533              
  Lines       63609    63623      +14     
==========================================
+ Hits        39360    39384      +24     
+ Misses      20661    20651      -10     
  Partials     3588     3588

Flag	Coverage Δ
unittests	`61.90% <79.54%> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

weicao · 2026-06-17T17:39:00Z

Runtime validation for head 04f6d7e90770ed07d32a9c9b9435cab45782692d is FAIL.

Exact image / controller identity:

Image: kubeblocks:pr-10398-04f6d7e-stella with imagePullPolicy: Never.
Image tar sha256 used for sideload: dbb5ca2ab49bbeedbacf8f0d63d46d8d414a0f76071c791912bc53adf32aca2e.
Live manager pod: kubeblocks-7dcfbd744c-5k278, imageID sha256:73f857a01e42c780257ac4755dd01b26bed8591e50b15e71f43c52bb4a98d8a3, restartCount 0, startTime 2026-06-17T17:07:29Z.
The tag was imported on all 3 vcluster nodes.

Focused gate result:

Redis Parameters x replication-twemproxy: PASS 14 / FAIL 1 / SKIP 0.
Failed case: A02 dynamic maxmemory-policy.
OpsRequest rds-params-twemproxy-reconf-87783-13901 stayed Running for 900s with WaitForProgressing=True, Validated=True, and ReconfigureStarted=True.
Cluster rds-params-twemproxy stayed Updating.
Component rds-params-twemproxy-redis stayed Updating, Progressing=True, reason WorkloadNotUpdated.
InstanceSet rds-params-twemproxy-redis had generation 2, but status observedGeneration stayed 1; pod config-hash status stayed on old 7fd467d678 while desired spec config hash was 75d4889cd7.
Redis data pods stayed Running with restartCount 0, and the pod-side udf-reconfigure-cmpd-redis-replication-config action returned OK.
Controller logs show the component controller patched/updated the InstanceSet at 2026-06-17T17:13:25Z, then the InstanceSet controller repeatedly reported successful pod reconfigure for configHash 75d4889cd7 without status convergence.

First-blocker classification: control plane. The setup, exact image identity, runner wait, and probe target are sufficient for this focused result; the Redis product/action path is not the first blocker in this run.

Boundary: #10398 remains draft. This head does not unblock the Redis addon runtime gate or #2819; a follow-up controller fix is required before another exact-image validation.

After a successful reconfigure, the remaining pod diff can be only config-hash metadata. A full Pod update can conflict with concurrent kubelet/status writes and leave the annotation stale, which keeps the OpsRequest running even after the runtime config changed. Add an explicit ObjectTree patch option and use it for metadata-only in-place pod updates in the InstanceSet and Instance reconcilers.

weicao · 2026-06-17T17:54:15Z

Updated this draft PR to a narrower follow-up head: 31908d0cb7aa64c79c882e6ee71dfbce0e668516.

What changed from the failed 04f6d7e head:

Removed the normal-update-before-resize approach.
Added ObjectTree PATCH planning and use it only for safe metadata-only in-place Pod updates.
Kept resource-only changes on /resize and non-metadata Pod changes on the existing switchover/full-update path.

Local checks for the new head:

go test ./pkg/controller/kubebuilderx ./pkg/controller/instance ./pkg/controller/instanceset -count=1 PASS.
git diff --check origin/bugfix/instanceset-resize-resource-compare..HEAD clean.

Boundary: this is still draft. The previous exact-image runtime run failed, and the new head still needs CI plus exact-image Redis Parameters x replication-twemproxy validation before it can be treated as a candidate fix for #10369.

weicao · 2026-06-17T18:41:07Z

Runtime validation for head 31908d0cb7aa64c79c882e6ee71dfbce0e668516 is PASS.

Exact controller identity:

Local sideload image built from this head with imagePullPolicy: Never.
Live manager pod: kubeblocks-6dfbccf448-8vbpw.
Live imageID: sha256:80435fad50cb92d42df0aaff1bb7a23a8650559d44ff8c4f0bbb1f7ae15ff523.
restartCount 0.

Focused gate result:

Redis Parameters x replication-twemproxy: PASS 34 / FAIL 0 / SKIP 0.
A02 dynamic maxmemory-policy: OpsRequest Succeed, ConfigMap updated, both Redis pods reported CONFIG maxmemory-policy=allkeys-lru, pod UIDs unchanged, twemproxy SET/GET passed.
A03 dynamic hz: OpsRequest Succeed, ConfigMap updated, both Redis pods reported CONFIG hz=50, pod UIDs unchanged, twemproxy SET/GET passed.
A04 static databases: OpsRequest Succeed, ConfigMap updated, both Redis pods reported CONFIG databases=32, pod UIDs changed as expected for rolling restart, twemproxy SET/GET passed.
A05 replica readback passed.
A06 cleanup passed: namespace redis-params-twemproxy-r12-31908-r1 was NotFound and PV residue was empty.

Convergence evidence:

Cluster rds-params-twemproxy reached Running.
Redis Component generation/observedGeneration reached 5/5.
Redis InstanceSet generation/observedGeneration reached 4/4.
InstanceSet status shows readyReplicas 2, updatedReplicas 2, availableReplicas 2, and configHash 6b4845ccf7 on both Redis pods.

Acceptance classification: patch-version focused validation passed for the Redis Parameters x replication-twemproxy gate. No first blocker was observed in this focused run.

Boundary: this validates the current #10398 metadata-only PATCH path for this #10369-focused gate, but it is not a Redis addon release-ready claim or a broader matrix result. The PR is still stacked on #10394, so the final merge path depends on accepting, rebasing, or retargeting the base chain.

The base branch was changed.

leon-ape · 2026-06-18T05:52:57Z

-		}
-		if !equalField(oc.Resources.Requests, realRequests) {
-			return false
+		for _, resourceName := range []corev1.ResourceName{corev1.ResourceCPU, corev1.ResourceMemory} {


[P1] Preserve explicit non-CPU/memory resource changes. VerticalScaling validation still accepts hugepages-* resources, but this loop now compares only CPU and memory requests, and equalField for ResourceList also ignores non-CPU/memory limits. A desired pod that changes a hugepages-* request/limit can therefore be treated as equal, so getPodUpdatePolicy returns noOps and the scaling operation can converge without applying the requested resource change. Keep the new omitted CPU/memory request handling scoped to CPU/memory defaulting, but continue comparing explicit non-CPU/memory requests/limits or reject them before this path. The same issue exists in pkg/controller/instance/in_place_update_utils.go.

weicao force-pushed the bugfix/instanceset-normal-update-before-resize branch from 04f6d7e to 31908d0 Compare June 17, 2026 17:50

weicao changed the title ~~fix: commit pod metadata before resize~~ fix: patch metadata-only pod updates Jun 17, 2026

weicao marked this pull request as ready for review June 17, 2026 18:41

weicao requested review from a team and leon-ape as code owners June 17, 2026 18:41

weicao mentioned this pull request Jun 17, 2026

fix(redis): rework reconfigure action to file-based diff with freshness guard apecloud/kubeblocks-addons#2819

Closed

leon-ape previously approved these changes Jun 18, 2026

View reviewed changes

apecloud-bot added the approved PR Approved Test label Jun 18, 2026

leon-ape added nopick Not auto cherry-pick when PR merged and removed approved PR Approved Test labels Jun 18, 2026

leon-ape changed the title ~~fix: patch metadata-only pod updates~~ chore: patch metadata-only pod updates Jun 18, 2026

leon-ape changed the base branch from bugfix/instanceset-resize-resource-compare to main June 18, 2026 05:50

github-actions Bot added the size/L Denotes a PR that changes 100-499 lines. label Jun 18, 2026

leon-ape reviewed Jun 18, 2026

View reviewed changes

leon-ape closed this Jun 18, 2026

github-actions Bot added this to the Release 1.2.0 milestone Jun 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore: patch metadata-only pod updates#10398

chore: patch metadata-only pod updates#10398
weicao wants to merge 2 commits into
mainfrom
bugfix/instanceset-normal-update-before-resize

weicao commented Jun 17, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 17, 2026 •

edited

Loading

Uh oh!

weicao commented Jun 17, 2026

Uh oh!

weicao commented Jun 17, 2026

Uh oh!

weicao commented Jun 17, 2026

Uh oh!

leon-ape Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

weicao commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

Tests

Runtime status

Boundaries

Uh oh!

codecov Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

weicao commented Jun 17, 2026

Uh oh!

weicao commented Jun 17, 2026

Uh oh!

weicao commented Jun 17, 2026

Uh oh!

leon-ape Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

weicao commented Jun 17, 2026 •

edited

Loading

codecov Bot commented Jun 17, 2026 •

edited

Loading