*: proposed follow-ups after cluster deletion investigation #8422

stevekuznetsov wants to merge 3 commits into openshift:main
Conversation
Pipeline controller notification: this repository is configured in LGTM mode.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
⛔ Files ignored due to path filters (3)
📒 Files selected for processing (19)
💤 Files with no reviewable changes (17)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough

Introduces a new exported ConditionType.

Sequence Diagram(s):

sequenceDiagram
participant HC as HostedCluster Controller
participant API as API Status
participant Resources as Cloud Resources (CAPI, Endpoint, PSC, HCP)
participant NS as Namespace (Control Plane)
HC->>API: Ensure HostedClusterDeleting = False (AsExpectedReason)
Note over HC,Resources: Deletion sequence starts
HC->>Resources: Wait for CAPI deletion
HC->>API: setDeletionProgress(DeletionWaitingForCAPIClusterDeletion, msg)
HC->>Resources: Wait for Endpoint Service deletion
HC->>API: setDeletionProgress(DeletionWaitingForEndpointServiceDeletion, msg)
HC->>Resources: Wait for GCP Private Connect deletion
HC->>API: setDeletionProgress(DeletionWaitingForPrivateConnectDeletion, msg)
HC->>Resources: Wait for HostedControlPlane deletion
HC->>API: setDeletionProgress(DeletionWaitingForControlPlaneDeletion, msg)
HC->>NS: Fetch Namespace status & blocking conditions
alt Namespace deletion blocked
HC->>API: setDeletionProgress(DeletionWaitingForNamespaceDeletion, detailed msg)
else Namespace deletion timed out
HC->>API: update CloudResourcesDestroyed (CloudResourcesDeletionTimedOutReason)
else Namespace removed
HC->>API: setDeletionProgress(DeletionCompleted, "Deletion completed")
end
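To make the diagram concrete, here is a minimal sketch of what the `setDeletionProgress` helper could look like, assuming the `HostedClusterDeleting` condition type named in the diagram and standard `metav1.Condition` handling. The package name, helper signature, and wiring are assumptions for illustration, not the PR's actual code.

```go
package hostedcluster

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	hyperv1 "github.com/openshift/hypershift/api/hypershift/v1beta1"
)

// setDeletionProgress records which deletion phase the controller is
// currently waiting on, e.g. DeletionWaitingForCAPIClusterDeletion or
// DeletionCompleted, so progress is visible in status rather than logs.
// Hypothetical sketch: the real helper's signature may differ.
func setDeletionProgress(hc *hyperv1.HostedCluster, reason, message string) {
	meta.SetStatusCondition(&hc.Status.Conditions, metav1.Condition{
		Type:               string(hyperv1.HostedClusterDeleting),
		Status:             metav1.ConditionTrue,
		Reason:             reason,
		Message:            message,
		ObservedGeneration: hc.Generation,
	})
}
```

Using `meta.SetStatusCondition` keeps `LastTransitionTime` stable when only the reason or message changes, which makes the condition cheap to update once per reconcile.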
🚥 Pre-merge checks: ✅ Passed checks (12 passed)
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: stevekuznetsov. Needs approval from an approver in each of these files; approvers can indicate their approval by writing /approve in a comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: ed0cae26-f1bf-447e-b7f1-e65403530ad0
⛔ Files ignored due to path filters (1)
vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (20)
- api/hypershift/v1beta1/hostedcluster_conditions.go
- control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go
- control-plane-operator/controllers/gcpprivateserviceconnect/observer.go
- control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
- control-plane-operator/hostedclusterconfigoperator/controllers/drainer/drainer.go
- control-plane-operator/hostedclusterconfigoperator/controllers/globalps/globalps.go
- control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go
- control-plane-operator/hostedclusterconfigoperator/controllers/inplaceupgrader/inplaceupgrader.go
- control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.go
- control-plane-operator/hostedclusterconfigoperator/controllers/node/node.go
- control-plane-operator/hostedclusterconfigoperator/controllers/nodecount/controller.go
- control-plane-operator/hostedclusterconfigoperator/controllers/resources/kas/admissionpolicies.go
- control-plane-operator/hostedclusterconfigoperator/controllers/resources/registry/admissionpolicies.go
- control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
- hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
- hypershift-operator/controllers/hostedclustersizing/hostedclustersizing_validation_controller.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- hypershift-operator/controllers/platform/aws/controller.go
- hypershift-operator/controllers/scheduler/aws/autoscaler.go
- hypershift-operator/controllers/scheduler/azure/controller.go
💤 Files with no reviewable changes (17)
- hypershift-operator/controllers/scheduler/azure/controller.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.go
- control-plane-operator/hostedclusterconfigoperator/controllers/node/node.go
- control-plane-operator/hostedclusterconfigoperator/controllers/drainer/drainer.go
- control-plane-operator/hostedclusterconfigoperator/controllers/inplaceupgrader/inplaceupgrader.go
- hypershift-operator/controllers/scheduler/aws/autoscaler.go
- control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go
- control-plane-operator/hostedclusterconfigoperator/controllers/resources/registry/admissionpolicies.go
- control-plane-operator/controllers/gcpprivateserviceconnect/observer.go
- hypershift-operator/controllers/hostedclustersizing/hostedclustersizing_validation_controller.go
- control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
- control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go
- control-plane-operator/hostedclusterconfigoperator/controllers/nodecount/controller.go
- hypershift-operator/controllers/platform/aws/controller.go
- control-plane-operator/hostedclusterconfigoperator/controllers/resources/kas/admissionpolicies.go
- control-plane-operator/hostedclusterconfigoperator/controllers/globalps/globalps.go
  // check if cleanup has been skipped or timed out
  if resourcesDestroyedCond != nil && resourcesDestroyedCond.Status == metav1.ConditionFalse &&
-     resourcesDestroyedCond.Reason == string(hyperv1.CloudResourcesCleanupSkippedReason) {
+     (resourcesDestroyedCond.Reason == string(hyperv1.CloudResourcesCleanupSkippedReason) ||
+         resourcesDestroyedCond.Reason == string(hyperv1.CloudResourcesDeletionTimedOutReason)) {
      log.Info("Cleanup has been skipped", "reason", resourcesDestroyedCond.Message)
      return true, nil
Make terminal-state log message accurate for timeout path.
This branch now handles both skipped and timed-out cleanup, but the log still says cleanup was skipped, which can mislead triage.
Suggested patch
- log.Info("Cleanup has been skipped", "reason", resourcesDestroyedCond.Message)
+ log.Info("Cloud resource cleanup reached terminal state",
+ "reason", resourcesDestroyedCond.Reason,
+ "message", resourcesDestroyedCond.Message)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`
around lines 2434 - 2439, The log message in hostedcontrolplane_controller.go
inside the condition checking resourcesDestroyedCond (where Status ==
metav1.ConditionFalse and Reason is CloudResourcesCleanupSkippedReason or
CloudResourcesDeletionTimedOutReason) is always "Cleanup has been skipped" which
is incorrect for the timeout case; update the log in that block (the code around
resourcesDestroyedCond handling in the Reconcile function) to inspect
resourcesDestroyedCond.Reason and log a precise terminal message such as
"Cleanup has been skipped" when Reason == CloudResourcesCleanupSkippedReason and
"Cleanup timed out" (or similar) when Reason ==
CloudResourcesDeletionTimedOutReason, ensuring you still include
resourcesDestroyedCond.Message for context.
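Applying that prompt literally might produce something like the sketch below: branch on the condition's reason so the terminal log line is accurate for both paths. The helper name and packaging are hypothetical; the reason constants and condition fields come from the diff above.

```go
package hostedcontrolplane

import (
	"github.com/go-logr/logr"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	hyperv1 "github.com/openshift/hypershift/api/hypershift/v1beta1"
)

// logTerminalCleanupState emits a terminal-state message that matches the
// condition's actual reason, keeping the condition message for context.
func logTerminalCleanupState(log logr.Logger, cond *metav1.Condition) {
	switch cond.Reason {
	case string(hyperv1.CloudResourcesCleanupSkippedReason):
		log.Info("Cleanup has been skipped", "message", cond.Message)
	case string(hyperv1.CloudResourcesDeletionTimedOutReason):
		log.Info("Cleanup timed out", "message", cond.Message)
	}
}
```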
Force-pushed from d0aaacb to e63a899 (Compare)
Codecov Report

❌ Patch coverage — additional details and impacted files:

@@            Coverage Diff             @@
##             main    #8422      +/-   ##
==========================================
+ Coverage   37.23%   37.37%   +0.13%
==========================================
  Files         752      751       -1
  Lines       91829    91842      +13
==========================================
+ Hits        34195    34324     +129
+ Misses      54993    54882     -111
+ Partials     2641     2636       -5
... and 32 files with indirect coverage changes
Flags with carried forward coverage won't be shown.
Force-pushed from e63a899 to debced8 (Compare)
@stevekuznetsov: all tests passed! Full PR test history. Your PR dashboard.
@csrwng: Closed this PR.
control-plane-operator: signal that we're giving up on deletion
When the CPO decides to give up on deleting cloud resources, it emits a
log but otherwise does not expose this behavior through conditions,
making it hard to know that this happened.
Signed-off-by: Steve Kuznetsov <stekuznetsov@microsoft.com>
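As a rough illustration of this commit's intent (not its actual diff), giving up could be surfaced by flipping the CloudResourcesDestroyed condition to a timed-out reason. The helper below is a hypothetical sketch that reuses the reason constant visible in the review diff above.

```go
package hostedcontrolplane

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	hyperv1 "github.com/openshift/hypershift/api/hypershift/v1beta1"
)

// markCloudResourceCleanupTimedOut surfaces the give-up decision through the
// CloudResourcesDestroyed condition instead of only a log line, so users and
// tooling can observe it from the HostedControlPlane status.
func markCloudResourceCleanupTimedOut(hcp *hyperv1.HostedControlPlane, message string) {
	meta.SetStatusCondition(&hcp.Status.Conditions, metav1.Condition{
		Type:               string(hyperv1.CloudResourcesDestroyed),
		Status:             metav1.ConditionFalse,
		Reason:             string(hyperv1.CloudResourcesDeletionTimedOutReason),
		Message:            message,
		ObservedGeneration: hcp.Generation,
	})
}
```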
hypershift-operator: emit deletion progress condition
Instead of putting deletion progress into logs, where it's hard to
notice and spammy, present a user-facing condition that tracks the
phases. While doing that, improve the Namespace deletion code to be more
verbose. No top-level Namespace deletion condition seems to exist, so
cherry-pick a few we know are useful to highlight. We can always add
more later.
Signed-off-by: Steve Kuznetsov <stekuznetsov@microsoft.com>
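The "cherry-pick a few" idea maps naturally onto the upstream Namespace status conditions. Below is a sketch of summarizing the blocking ones, assuming the standard corev1 namespace condition types; the helper itself and its selection of conditions are hypothetical, not the PR's code.

```go
package hostedcluster

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// namespaceDeletionBlockers summarizes why a Namespace is stuck deleting by
// inspecting a few well-known upstream conditions, producing a message
// suitable for a deletion-progress condition.
func namespaceDeletionBlockers(ns *corev1.Namespace) string {
	interesting := []corev1.NamespaceConditionType{
		corev1.NamespaceDeletionContentFailure,
		corev1.NamespaceContentRemaining,
		corev1.NamespaceFinalizersRemaining,
	}
	var blockers []string
	for _, cond := range ns.Status.Conditions {
		for _, t := range interesting {
			if cond.Type == t && cond.Status == corev1.ConditionTrue {
				blockers = append(blockers, fmt.Sprintf("%s: %s", cond.Type, cond.Message))
			}
		}
	}
	return strings.Join(blockers, "; ")
}
```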
*: remove low-quality logs
These log messages provide no clear information and are emitted
unconditionally, leading any log period from their controllers to be
full of them.
Signed-off-by: Steve Kuznetsov <stekuznetsov@microsoft.com>
Summary by CodeRabbit

New Features
- Deletion progress is now surfaced through a user-facing HostedCluster condition, including when cloud-resource cleanup is skipped or times out.

Bug Fixes
- Namespace deletion now reports more detail about what is blocking it.

Chores
- Low-quality log messages that were emitted unconditionally have been removed.