[FLINK-39509][flink-kubernetes-webhook] Fix mutating webhook producing spurious object modifications on no-op patches #1093

Merged: gyfora merged 1 commit into apache:main from Dennis-Mircea:hotfix/flink-mutator on Apr 21, 2026

@Dennis-Mircea
Contributor

What is the purpose of the change

The mutating webhook's FlinkMutator always performs a Jackson convertValue serialization round-trip (HasMetadata -> typed class -> back) on every CREATE/UPDATE admission request. When no FlinkResourceMutator actually modifies the resource, the round-tripped object can serialize differently from the original input (e.g., field ordering, null handling), causing the webhook to return a JSON patch to the API server even though nothing was logically changed.

This leads to inconsistent behavior where kubectl patch reports patched instead of patched (no change) for resources like FlinkDeployment, even when the patch contains no effective change (e.g., setting restartNonce to its current value). While this does not trigger an actual reconciliation (the operator correctly detects no spec diff), it is confusing and causes unnecessary resourceVersion bumps on the Kubernetes object.
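The effect described above can be reproduced in miniature without Jackson or Kubernetes. The sketch below is only a model: `RoundTripSketch`, `TypedSpec`, and the null-omitting serializer behavior are assumptions made for illustration, not the operator's classes or its actual `ObjectMapper` configuration. It shows how converting an untyped object to a typed one and back can yield a different document even though no field was logically changed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the webhook's round-trip; not the operator's code.
final class RoundTripSketch {

    // Hypothetical typed view of a spec; the real resource classes have many more fields.
    static final class TypedSpec {
        final String image;
        final Long restartNonce;

        TypedSpec(String image, Long restartNonce) {
            this.image = image;
            this.restartNonce = restartNonce;
        }
    }

    // Untyped -> typed: only the fields the typed class knows about are read.
    static TypedSpec toTyped(Map<String, Object> raw) {
        Object nonce = raw.get("restartNonce");
        return new TypedSpec(
                (String) raw.get("image"),
                nonce == null ? null : ((Number) nonce).longValue());
    }

    // Typed -> untyped: fields are written in declaration order and nulls are
    // omitted (an assumed serializer setting for this sketch).
    static Map<String, Object> toRaw(TypedSpec spec) {
        Map<String, Object> out = new LinkedHashMap<>();
        if (spec.image != null) {
            out.put("image", spec.image);
        }
        if (spec.restartNonce != null) {
            out.put("restartNonce", spec.restartNonce);
        }
        return out;
    }

    static Map<String, Object> roundTrip(Map<String, Object> raw) {
        return toRaw(toTyped(raw));
    }

    public static void main(String[] args) {
        Map<String, Object> original = new LinkedHashMap<>();
        original.put("restartNonce", 1L);
        original.put("image", null); // explicit null in the submitted object

        Map<String, Object> after = roundTrip(original);
        System.out.println("original:      " + original);
        System.out.println("round-tripped: " + after);
        System.out.println("identical:     " + original.equals(after));
    }
}
```

Running it prints `identical: false`: the explicit null is dropped on the way back, so a diff between the submitted object and the returned object is non-empty even though no mutator ran.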

Brief change log

  • For each mutate* method in FlinkMutator, capture a snapshot of the typed object before running mutators via mapper.valueToTree(), then compare it against the state after mutators run
  • If nothing changed, return the original HasMetadata resource, bypassing the serialization round-trip entirely
  • Applied consistently to all three resource types: FlinkDeployment, FlinkSessionJob, FlinkStateSnapshot
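The snapshot-and-compare approach in the list above can be sketched as follows. This is a minimal stdlib-only model, not the code merged in this PR: `NoOpMutationSketch`, `TypedSpec`, `snapshot()`, and the use of `Consumer` as a mutator type are all hypothetical, and `snapshot()` merely stands in for the `mapper.valueToTree()` call that the change log mentions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of the fix; names and types are illustrative only.
final class NoOpMutationSketch {

    // Hypothetical typed resource with two fields.
    static final class TypedSpec {
        String image;
        Long restartNonce;

        // Value snapshot of the typed object; stands in for mapper.valueToTree(typed).
        Map<String, Object> snapshot() {
            Map<String, Object> tree = new LinkedHashMap<>();
            tree.put("image", image);
            tree.put("restartNonce", restartNonce);
            return tree;
        }
    }

    static TypedSpec toTyped(Map<String, Object> raw) {
        TypedSpec spec = new TypedSpec();
        spec.image = (String) raw.get("image");
        Object nonce = raw.get("restartNonce");
        spec.restartNonce = nonce == null ? null : ((Number) nonce).longValue();
        return spec;
    }

    // Returns the very same 'original' instance when no mutator changed anything,
    // so the caller has nothing to diff and no patch is produced.
    static Map<String, Object> mutate(
            Map<String, Object> original, List<Consumer<TypedSpec>> mutators) {
        TypedSpec typed = toTyped(original);
        Map<String, Object> before = typed.snapshot(); // state before mutators run
        for (Consumer<TypedSpec> mutator : mutators) {
            mutator.accept(typed);
        }
        Map<String, Object> after = typed.snapshot(); // state after mutators run
        if (before.equals(after)) {
            return original; // bypass the serialization round-trip entirely
        }
        return after; // a real change: return the re-serialized object
    }

    public static void main(String[] args) {
        Map<String, Object> original = new LinkedHashMap<>();
        original.put("restartNonce", 1L);

        Consumer<TypedSpec> sameNonce = spec -> spec.restartNonce = 1L;
        Consumer<TypedSpec> setImage = spec -> spec.image = "flink:1.20";

        System.out.println(mutate(original, List.of(sameNonce)) == original);
        System.out.println(mutate(original, List.of(setImage)) == original);
    }
}
```

The comparison is deliberately made between two snapshots of the typed object rather than between the typed object and the raw input, so differences introduced by the conversion itself (ordering, nulls) never count as a mutation.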

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Manually verified the change by:

  • Deploying the operator with the fix to a minikube cluster
  • Patching a FlinkDeployment with an identical restartNonce value and confirming kubectl now correctly reports patched (no change)
  • Patching a FlinkSessionJob with an identical restartNonce value and confirming the same patched (no change) output
  • Verifying that patches with actual changes (new restartNonce values) still correctly report patched and trigger reconciliation

Concretely, this change can be manually verified by deploying the basic-session-deployment-and-job.yaml example and patching it with an already-applied restartNonce value:

Before (without fix):

$ kubectl patch flinkdeployment basic-session-deployment-example --type=merge -p '{"spec":{"restartNonce": 1}}'
flinkdeployment.flink.apache.org/basic-session-deployment-example patched

After (with fix):

$ kubectl patch flinkdeployment basic-session-deployment-example --type=merge -p '{"spec":{"restartNonce": 1}}'
flinkdeployment.flink.apache.org/basic-session-deployment-example patched (no change)

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., are there any changes to the CustomResourceDescriptors: no
  • Core observer or reconciler logic that is regularly executed: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@gyfora
Contributor

gyfora commented Apr 21, 2026

can we please create a jira ticket and test for this?

@Dennis-Mircea changed the title from "[hotfix][flink-kubernetes-webhook] Fix mutating webhook producing spurious object modifications on no-op patches" to "[FLINK-39509][flink-kubernetes-webhook] Fix mutating webhook producing spurious object modifications on no-op patches" on Apr 21, 2026
@Dennis-Mircea
Contributor Author

Dennis-Mircea commented Apr 21, 2026

> can we please create a jira ticket and test for this?

Sure, I created FLINK-39509. The tests are covered as part of PR #1094 and the FLINK-39508 JIRA. I suggest proceeding with the current PR as-is first.

@gyfora gyfora merged commit 24b0d99 into apache:main Apr 21, 2026
120 checks passed
2 participants