test: e2e test for UIPlugin post-uninstall cascade cleanup (COO-1404)#1087
test: e2e test for UIPlugin post-uninstall cascade cleanup (COO-1404)#1087DavidRajnoha wants to merge 3 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughThis pull request adds test infrastructure and validation for UIPlugin cleanup behavior on OpenShift when operators are uninstalled via OLM. The changes introduce an OpenShift-conditional CSV container patching capability to the e2e test script, add a postpone-restoration flag for timing control in operator restoration sequences, implement comprehensive OLM cleanup with label selectors, and add a detailed uninstall test that validates cascade deletion of UIPlugin operands and absence of stale resources. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: DavidRajnoha The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
test/e2e/uiplugin_uninstall_test.go (2)
113-131: ⚡ Quick winSaved Subscription loses metadata like labels/annotations and
installPlanApprovalmay be insufficient on recreate.The cleanup recreates a
Subscriptionwith onlyName,Namespace, andSpec. OLM-related labels/annotations (e.g.,operators.coreos.com/...) and any cluster-specific metadata will be lost. The recreated Subscription may behave slightly differently than the original (e.g., different label selectors used bydelete_olm_subscriptioninrun-e2e.sh). If post-test cleanup or subsequent test runs rely on those labels, this could cause flakes.Consider capturing the full original (e.g.,
savedSub := sub.DeepCopy()and clearingResourceVersion/UID) so the recreated resource matches the original modulo server-managed fields.♻️ Suggested change
- savedSub := &olmv1alpha1.Subscription{ - ObjectMeta: metav1.ObjectMeta{ - Name: sub.Name, - Namespace: sub.Namespace, - }, - Spec: sub.Spec.DeepCopy(), - } + savedSub := sub.DeepCopy() + savedSub.ResourceVersion = "" + savedSub.UID = "" + savedSub.Status = olmv1alpha1.SubscriptionStatus{}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/uiplugin_uninstall_test.go` around lines 113 - 131, The cleanup recreates savedSub with only Name/Namespace/Spec which loses labels/annotations and other metadata; update the cleanup to store the full original Subscription by doing savedSub := sub.DeepCopy() (or similar) before modifications, then before calling f.K8sClient.Create clear server-managed fields like ResourceVersion, UID, CreationTimestamp and ManagedFields but keep labels, annotations and Spec (including installPlanApproval) intact so the recreated Subscription matches the original; reference savedSub in the Cleanup closure and call forceDeleteAllUIPlugins as before, then create the cleaned savedSub.
281-294: ⚡ Quick win
waitForResourceAbsentmutates the caller'sobjand swallows non-NotFound errors.Two subtle issues:
objis reused across poll iterations and is populated byGeton each success path — that's fine here because each call site passes a freshly allocated value, but it's a footgun for future callers.- Errors other than
NotFound(transient API server errors, RBAC, etc.) are silently treated as "not yet absent". A persistent error will only surface as a generic timeout viawait.Interrupted, making test failures harder to diagnose.Consider logging the last error before failing:
♻️ Suggested change
func waitForResourceAbsent(t *testing.T, name, namespace string, obj client.Object, timeout time.Duration) { t.Helper() key := client.ObjectKey{Name: name, Namespace: namespace} + var lastErr error err := wait.PollUntilContextTimeout(context.Background(), 5*time.Second, timeout, true, func(ctx context.Context) (bool, error) { - if err := f.K8sClient.Get(ctx, key, obj); apierrors.IsNotFound(err) { + err := f.K8sClient.Get(ctx, key, obj) + if apierrors.IsNotFound(err) { return true, nil } + lastErr = err return false, nil }) if wait.Interrupted(err) { kind := fmt.Sprintf("%T", obj) - t.Fatalf("%s %s/%s was not cleaned up (waited %v)", kind, namespace, name, timeout) + t.Fatalf("%s %s/%s was not cleaned up (waited %v, lastErr=%v)", kind, namespace, name, timeout, lastErr) } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/uiplugin_uninstall_test.go` around lines 281 - 294, waitForResourceAbsent currently reuses the caller-supplied obj and swallows all non-NotFound errors; change it so each poll iteration allocates a fresh object instance (e.g., via obj.DeepCopyObject() / type-specific new object) and pass that to f.K8sClient.Get to avoid mutating the caller's obj, and surface non-NotFound errors by returning them from the poll function (i.e., return false, err) while recording the last error; on timeout (in wait.Interrupted(err)) include the captured last error in the t.Fatalf message so failures show the real API error instead of only a timeout.test/run-e2e.sh (1)
185-187: ⚡ Quick winUse an array for
extra_argsto handle empty/multi-value cases safely.Static analysis flags
$extra_argsas unquoted (SC2086). A safer pattern is an array so empty stays empty and any future multi-arg case is preserved.♻️ Suggested change
- local extra_args="" - [[ -n "$POSTPONE_RESTORATION" ]] && extra_args="-postpone-restoration=$POSTPONE_RESTORATION" - go test -v -timeout $TEST_TIMEOUT ./test/e2e/... -run "$RUN_REGEX" -count 1 -args -operatorInstallNS="$OPERATORS_NS" $extra_args | tee "$LOGS_DIR/e2e.log" || ret=1 + local -a extra_args=() + [[ -n "$POSTPONE_RESTORATION" ]] && extra_args+=("-postpone-restoration=$POSTPONE_RESTORATION") + go test -v -timeout "$TEST_TIMEOUT" ./test/e2e/... -run "$RUN_REGEX" -count 1 -args -operatorInstallNS="$OPERATORS_NS" "${extra_args[@]}" | tee "$LOGS_DIR/e2e.log" || ret=1🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/run-e2e.sh` around lines 185 - 187, The current use of a scalar extra_args is unsafe; change it to an array (e.g., extra_args=()) and when POSTPONE_RESTORATION is set append the argument as an array element (e.g., extra_args+=("--postpone-restoration=$POSTPONE_RESTORATION")), then invoke the go test command passing the array with "${extra_args[@]}" alongside -operatorInstallNS="$OPERATORS_NS" to avoid word-splitting and handle empty or multi-value cases safely; update references to the extra_args variable in the go test invocation accordingly.pkg/operator/operator.go (1)
437-443: 💤 Low valueUse the manager's cached client where possible, or document why a fresh client is required.
Creating a new direct client at shutdown is fine if the manager's cache has already stopped serving reads, but it bypasses any client-side optimizations and may surprise future maintainers. Consider a brief comment explaining why
o.manager.GetClient()is not used here (e.g., cache is torn down by the timectx.Done()fires).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/operator/operator.go` around lines 437 - 443, The shutdown code currently creates a fresh direct client via client.New (assigned to directClient) which bypasses the manager cache; either switch to using the manager's cached client by calling o.manager.GetClient() for shutdown cleanup, or add a concise comment above the client.New call explaining why a fresh client is required (e.g., the manager cache may already be stopped when ctx.Done() fires), referencing directClient, client.New, o.manager.GetClient(), and o.restConfig so future maintainers understand the choice.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/operator/operator.go`:
- Around line 396-400: The shutdown runnable added via
op.newShutdownCleanupRunnable() (registered with mgr.Add when
cfg.FeatureGates.OpenShift.Enabled is true) unconditionally calls
cleanupUIPluginsFromConsole on any shutdown; change it so it only runs when the
operator is actually being uninstalled: either remove the shutdown-time
deregistration entirely and rely on reconciliation/owner-ref GC, or modify
newShutdownCleanupRunnable()/cleanupUIPluginsFromConsole to first detect an
uninstall signal (for example, verify the UIPlugin CRD is no longer served via
discovery or the Console/operator CR shows a DeletionTimestamp or a specific
uninstall sentinel) and only proceed to patch the Console when that uninstall
condition is true; ensure the check is performed inside the runnable before
listing UIPlugin CRs so normal restarts (rolling updates, leader loss, SIGTERM)
do not remove Console entries.
---
Nitpick comments:
In `@pkg/operator/operator.go`:
- Around line 437-443: The shutdown code currently creates a fresh direct client
via client.New (assigned to directClient) which bypasses the manager cache;
either switch to using the manager's cached client by calling
o.manager.GetClient() for shutdown cleanup, or add a concise comment above the
client.New call explaining why a fresh client is required (e.g., the manager
cache may already be stopped when ctx.Done() fires), referencing directClient,
client.New, o.manager.GetClient(), and o.restConfig so future maintainers
understand the choice.
In `@test/e2e/uiplugin_uninstall_test.go`:
- Around line 113-131: The cleanup recreates savedSub with only
Name/Namespace/Spec which loses labels/annotations and other metadata; update
the cleanup to store the full original Subscription by doing savedSub :=
sub.DeepCopy() (or similar) before modifications, then before calling
f.K8sClient.Create clear server-managed fields like ResourceVersion, UID,
CreationTimestamp and ManagedFields but keep labels, annotations and Spec
(including installPlanApproval) intact so the recreated Subscription matches the
original; reference savedSub in the Cleanup closure and call
forceDeleteAllUIPlugins as before, then create the cleaned savedSub.
- Around line 281-294: waitForResourceAbsent currently reuses the
caller-supplied obj and swallows all non-NotFound errors; change it so each poll
iteration allocates a fresh object instance (e.g., via obj.DeepCopyObject() /
type-specific new object) and pass that to f.K8sClient.Get to avoid mutating the
caller's obj, and surface non-NotFound errors by returning them from the poll
function (i.e., return false, err) while recording the last error; on timeout
(in wait.Interrupted(err)) include the captured last error in the t.Fatalf
message so failures show the real API error instead of only a timeout.
In `@test/run-e2e.sh`:
- Around line 185-187: The current use of a scalar extra_args is unsafe; change
it to an array (e.g., extra_args=()) and when POSTPONE_RESTORATION is set append
the argument as an array element (e.g.,
extra_args+=("--postpone-restoration=$POSTPONE_RESTORATION")), then invoke the
go test command passing the array with "${extra_args[@]}" alongside
-operatorInstallNS="$OPERATORS_NS" to avoid word-splitting and handle empty or
multi-value cases safely; update references to the extra_args variable in the go
test invocation accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 573f555c-d1e9-43a2-86be-22dfc1e3f0fc
📒 Files selected for processing (8)
hack/dev-deploy.shpkg/controllers/uiplugin/controller.gopkg/controllers/uiplugin/plugin_info_builder.gopkg/operator/operator.gotest/e2e/framework/framework.gotest/e2e/main_test.gotest/e2e/uiplugin_uninstall_test.gotest/run-e2e.sh
f8dce61 to
cda37d2
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/aaa_uiplugin_uninstall_test.go`:
- Around line 259-261: The test only fails when both foundCSV and foundSub are
nil; change the guard to require both resources be present before proceeding by
failing if either is missing (i.e., replace the current conditional on foundCSV
and foundSub with a check that errors when foundCSV == nil || foundSub == nil)
so the uninstall cannot run with a partial lookup and leave the cluster in a bad
state; update the t.Fatal message to reflect that both Subscription and CSV are
required.
- Around line 271-276: The polling helpers waitForResourceGone and
waitForResourceAbsent currently swallow all non-NotFound errors by returning
(false, nil) in the wait.PollUntilContextTimeout callback; change the callback
so that when f.K8sClient.Get(ctx, key, obj) returns a non-nil error that is not
apierrors.IsNotFound(err) you return (false, err) to propagate the failure to
the poller (instead of retrying silently). Also adjust the callers of
waitForResourceGone / waitForResourceAbsent to check for and handle the returned
error from wait.PollUntilContextTimeout (or from the helper) rather than
treating everything as a timeout/interruption.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: c4abc1c6-2091-4fbe-8236-42f2fc2232ab
📒 Files selected for processing (3)
test/e2e/aaa_uiplugin_uninstall_test.gotest/e2e/main_test.gotest/run-e2e.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- test/e2e/main_test.go
- test/run-e2e.sh
| if foundCSV == nil && foundSub == nil { | ||
| t.Fatal("Could not find COO Subscription or CSV — operator may not be installed via OLM") | ||
| } |
There was a problem hiding this comment.
Require both Subscription and CSV before proceeding with uninstall.
This currently only fails when both are missing. If one lookup fails, the test can perform a partial uninstall and skip restoration setup, which can leave the cluster in a bad state for later tests.
Suggested fix
- if foundCSV == nil && foundSub == nil {
- t.Fatal("Could not find COO Subscription or CSV — operator may not be installed via OLM")
- }
+ if foundCSV == nil || foundSub == nil {
+ t.Fatalf(
+ "Could not find both COO Subscription and CSV (foundCSV=%t, foundSub=%t) — refusing partial uninstall",
+ foundCSV != nil,
+ foundSub != nil,
+ )
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if foundCSV == nil && foundSub == nil { | |
| t.Fatal("Could not find COO Subscription or CSV — operator may not be installed via OLM") | |
| } | |
| if foundCSV == nil || foundSub == nil { | |
| t.Fatalf( | |
| "Could not find both COO Subscription and CSV (foundCSV=%t, foundSub=%t) — refusing partial uninstall", | |
| foundCSV != nil, | |
| foundSub != nil, | |
| ) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e/aaa_uiplugin_uninstall_test.go` around lines 259 - 261, The test
only fails when both foundCSV and foundSub are nil; change the guard to require
both resources be present before proceeding by failing if either is missing
(i.e., replace the current conditional on foundCSV and foundSub with a check
that errors when foundCSV == nil || foundSub == nil) so the uninstall cannot run
with a partial lookup and leave the cluster in a bad state; update the t.Fatal
message to reflect that both Subscription and CSV are required.
| err := wait.PollUntilContextTimeout(context.Background(), 5*time.Second, timeout, true, func(ctx context.Context) (bool, error) { | ||
| if err := f.K8sClient.Get(ctx, key, obj); apierrors.IsNotFound(err) { | ||
| return true, nil | ||
| } | ||
| return false, nil | ||
| }) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n test/e2e/aaa_uiplugin_uninstall_test.go | sed -n '260,300p'Repository: rhobs/observability-operator
Length of output: 2053
Propagate non-NotFound API errors from polling helpers instead of silently swallowing them.
Both waitForResourceGone (line 272) and waitForResourceAbsent (line 285) check only for IsNotFound errors and ignore all other errors. Any transient API failure, RBAC denial, or network error triggers return false, nil, which causes the polling loop to retry silently until timeout. This masks real failures and misreports them as cleanup timeouts.
Restructure the error handling to return (false, err) for non-NotFound errors, allowing the polling mechanism to immediately surface the actual error:
if err := f.K8sClient.Get(ctx, key, obj); err != nil {
if apierrors.IsNotFound(err) {
return true, nil
}
return false, err
}
return false, nilThen update the callers to handle the propagated error appropriately instead of relying solely on wait.Interrupted.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| err := wait.PollUntilContextTimeout(context.Background(), 5*time.Second, timeout, true, func(ctx context.Context) (bool, error) { | |
| if err := f.K8sClient.Get(ctx, key, obj); apierrors.IsNotFound(err) { | |
| return true, nil | |
| } | |
| return false, nil | |
| }) | |
| err := wait.PollUntilContextTimeout(context.Background(), 5*time.Second, timeout, true, func(ctx context.Context) (bool, error) { | |
| if err := f.K8sClient.Get(ctx, key, obj); err != nil { | |
| if apierrors.IsNotFound(err) { | |
| return true, nil | |
| } | |
| return false, err | |
| } | |
| return false, nil | |
| }) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e/aaa_uiplugin_uninstall_test.go` around lines 271 - 276, The polling
helpers waitForResourceGone and waitForResourceAbsent currently swallow all
non-NotFound errors by returning (false, nil) in the
wait.PollUntilContextTimeout callback; change the callback so that when
f.K8sClient.Get(ctx, key, obj) returns a non-nil error that is not
apierrors.IsNotFound(err) you return (false, err) to propagate the failure to
the poller (instead of retrying silently). Also adjust the callers of
waitForResourceGone / waitForResourceAbsent to check for and handle the returned
error from wait.PollUntilContextTimeout (or from the helper) rather than
treating everything as a timeout/interruption.
Adds TestUIPluginUninstallCleanup to verify that deleting a UIPlugin CR after the operator has been uninstalled via OLM correctly cascade-deletes all child resources (Deployments, Services, ServiceAccounts, ClusterRoles, ClusterRoleBindings, pods) through Kubernetes garbage collection. Per OLM design, uninstalling an operator (deleting CSV + Subscription) does NOT remove CRDs or CRs. The admin is expected to delete CRs manually. This test verifies that when they do, the OwnerReference chain works without the operator running. Before the finalizer fix: UIPlugin CR gets stuck in Terminating forever. After the fix: CR deletes immediately, GC cascades to all children. Also adds: - --postpone-restoration flag to run-e2e.sh for manual cluster inspection - --openshift flag forwarding for UIPlugin tests on OpenShift clusters Co-authored-by: Cursor <cursoragent@cursor.com>
Rename to aaa_uiplugin_uninstall_test.go so TestUIPluginUninstallCleanup runs before all other tests. Verifies that Subscription restoration works and subsequent tests are not affected by the destructive uninstall. Temporary experiment — revert after CI confirms.
- Set InstallPlanApproval to Automatic on recreated Subscription (operator-sdk run bundle uses Manual, so OLM was waiting forever for approval) - Wait for CSV to reach Succeeded phase before proceeding - Re-enable --openshift.enabled=true on reinstalled CSV (the flag is added by enable_openshift() at deploy time and lost on reinstall) - Always wait for operator deployment readiness (the previous isLastRegisteredTest check was buggy — alphabetical vs file order) - Remove test registration mechanism (registerTest/isLastRegisteredTest) Co-authored-by: Cursor <cursoragent@cursor.com>
fa2ba91 to
bf0060a
Compare
Summary
TestUIPluginUninstallCleanupe2e test that verifies deleting a UIPlugin CR after the operator has been uninstalled via OLM correctly cascade-deletes all child resources through Kubernetes garbage collection (OwnerReferences).Also adds:
CapturePodLogsandFindPodByLabelhelpers to the e2e framework for operator shutdown diagnostics--postpone-restorationflag torun-e2e.shfor delayed Subscription restoration (allows manual cluster inspection)--openshiftflag support inrun-e2e.shfor UIPlugin tests on OpenShift clustersDependencies
Test plan
TestUIPluginUninstallCleanupon OCP cluster with the finalizer fix — should pass (CR deletes immediately, children cascade-deleted)TestUIPluginUninstallCleanupon OCP cluster without the finalizer fix — should fail at "Waiting for UIPlugin CR to be fully deleted" (finalizer blocks deletion)Supersedes #1083.
Made with Cursor