Skip to content

Handle conflict errors on CertificateRequest updates#447

Open
ARichman555 wants to merge 1 commit into
mainfrom
fix-reconcile-errors
Open

Handle conflict errors on CertificateRequest updates#447
ARichman555 wants to merge 1 commit into
mainfrom
fix-reconcile-errors

Conversation

@ARichman555
Copy link
Copy Markdown
Contributor

Issue #446

Closes #446.

Reason for this change

When cert-manager updates a CertificateRequest (e.g., setting lastTransitionTime) concurrently with the aws-privateca-issuer updating the same resource, an optimistic locking conflict occurs. Previously this error bubbled up to controller-runtime, incrementing controller_runtime_reconcile_errors_total and causing false positive alerts.

Now conflict errors on Update trigger a requeue without returning an error, which is the standard Kubernetes pattern for handling optimistic locking conflicts.

Description of changes

  • Checks if the error received was a conflict error. If so, this has updated the logic to just re-queue the request and not surface the error to the controller.

Describe any new or updated permissions being added

N/A

Description of how you validated changes

Re-produced using customer provided steps. I was able to confirm seeing the conflict errors in the logs were not present after making this change

@cert-manager-prow
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign irbekrm for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

When cert-manager updates a CertificateRequest (e.g., setting lastTransitionTime)
concurrently with the aws-privateca-issuer updating the same resource, an optimistic
locking conflict occurs. Previously this error bubbled up to controller-runtime,
incrementing controller_runtime_reconcile_errors_total and causing false positive alerts.

Now conflict errors on Update trigger a requeue without returning an error, which is
the standard Kubernetes pattern for handling optimistic locking conflicts.

Fixes: certificate renewal race condition causing spurious error metrics
Signed-off-by: Alex Richman <wrichman@amazon.com>
@cert-manager-prow
Copy link
Copy Markdown
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Transient reconciler errors when AWS PCA certificate is renewed

1 participant