OCPBUGS-92817: delete MAPI MachineSets before CAPI in e2e cleanup #611
pmeida wants to merge 1 commit
When a test creates both a CAPI MachineSet and a MAPI MachineSet with the same name and authoritativeAPI: ClusterAPI, the sync controller manages deletion through reconcileCAPItoMAPIMachineSetDeletionNormal.

Deleting CAPI first causes the sync controller to issue deletion to MAPI and then loop, waiting for the CAPI-specific finalizer (cluster.x-k8s.io/machineset) to be removed. The sync controller's constant requeues conflict with the CAPI controller's finalizer removal patch, causing a deadlock.

Deleting MAPI first triggers reconcileCAPItoMAPIMachineSetDeletionCAPINotDeleting, which removes the sync finalizer from CAPI immediately. The CAPI MachineSet can then be deleted cleanly with only its own finalizer to manage.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
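The finalizer handling described above can be illustrated with a toy model. This is only a sketch, not the sync controller's code: `removeFinalizer` is a hypothetical helper, and `syncFinalizer` is a placeholder because the real name of the sync finalizer is not given in this description. Only `cluster.x-k8s.io/machineset` comes from the text.

```go
package main

import "fmt"

const (
	// CAPI-specific finalizer, as named in the description above.
	capiFinalizer = "cluster.x-k8s.io/machineset"
	// Placeholder only: the sync finalizer's real name is not stated here.
	syncFinalizer = "example.invalid/sync-finalizer"
)

// removeFinalizer returns a copy of finalizers without the named entry.
// Hypothetical helper for illustration.
func removeFinalizer(finalizers []string, name string) []string {
	out := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	// A CAPI MachineSet that is mirrored by the sync controller carries both
	// its own finalizer and the sync finalizer.
	capi := []string{capiFinalizer, syncFinalizer}

	// MAPI deleted first: the sync finalizer is dropped from the CAPI object,
	// leaving only the CAPI controller's own finalizer to manage.
	capi = removeFinalizer(capi, syncFinalizer)
	fmt.Println(capi) // prints [cluster.x-k8s.io/machineset]
}
```

In this model, once MAPI has been deleted, only the CAPI controller still has a finalizer on the object, so no second writer competes with its finalizer removal patch.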
@pmeida: This pull request references Jira Issue OCPBUGS-92817, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
The bug has been updated to refer to the pull request using the external bug tracker.
This fixes the test's cleanup order to use the supported deletion path (MAPI first). When MAPI is deleted first, The root cause of the deadlock is in |
/test e2e-aws-capi-techpreview |
/assign @theobarberbany |
@pmeida: all tests passed!
Summary
Fixes a deadlock in cleanupMachineSetTestResources that causes e2e-aws-capi-techpreview to fail with a 15-minute timeout.

When a test creates both a CAPI MachineSet and a MAPI MachineSet with the same name and authoritativeAPI: ClusterAPI, deleting CAPI first causes the sync controller to loop in reconcileCAPItoMAPIMachineSetDeletionNormal: it waits for the CAPI-specific finalizer (cluster.x-k8s.io/machineset) to be removed, but its own constant requeues conflict with the CAPI controller's finalizer removal patch, deadlocking cleanup.

Deleting MAPI first instead triggers reconcileCAPItoMAPIMachineSetDeletionCAPINotDeleting, which removes the sync finalizer from CAPI immediately. The CAPI MachineSet can then be deleted cleanly with no sync interference.

Test plan
e2e-aws-capi-techpreview passes without the "Should have deleted MachineSet openshift-cluster-api/capi-ms-auth-capi-*" timeout

Fixes: https://issues.redhat.com/browse/OCPBUGS-92817