Adding toggle that removes finalizers post-delete#1347
Adding toggle that removes finalizers post-delete#1347
Conversation
|
@chapmanc How urgent is this to review could this wait until next week? |
|
@david-yu - I talked to Chappie about this already, it's not super urgent, he's just trying to self-serve on this feature to make things in cloud a bit easier in case something goes awry in cleaning up clusters. Seems like in the past they have had issues where the resources they've created haven't been deleted properly before uninstalling the operator and when they delete the namespace it sees orphaned resources and can't cascade delete them due to the finalizers. Since finalizers are a safe-guard against leaving around side-effects that really should be cleaned up, I'm going to call this additional flag a "feature" not a "bug" since the behavior of leaving around CR instances with finalizers post uninstall of the operator is intended (and, at least in my experience, fairly common). Definitely fine with making an option for finalizer removal (or even CR + CRD removal as well) for those who really want it, but think it should definitely be opt-in. |
|
Fixed some lint issues for you, hopefully should pass all checks. |
andrewstucki
left a comment
There was a problem hiding this comment.
A few changes, also, if possible I'd like an acceptance test to verify the finalizer removal/namespace deletion unblock. LMK if you need help with that.
|
This PR is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
|
Addressed all review feedback from @andrewstucki in 379591f:
|
|
Added a Without this, CI's |
Adds a new `finalizerRemoval.enabled` boolean (default false) to the operator chart values. This will gate a pre-delete helm hook job that strips finalizers from all operator-managed CRs on uninstall. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds SA name helper and SA object for the upcoming pre-delete finalizer removal job, following the same pattern as MigrationJobServiceAccount. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a pre-delete helm hook Job that runs the finalizer-removal subcommand on helm uninstall when finalizerRemoval.enabled=true. Mirrors the pattern from PostUpgradeMigrationJob (f1112cb). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds RBAC bundle (reusing operator's v2-manager rules) for the pre-delete finalizer removal job and wires the job + SA into the chart render function. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the finalizer-removal CLI subcommand that lists all operator-managed CR types (RedpandaRole, User, Group, Topic, Schema, ShadowLink, Redpanda, NodePool) across all namespaces and removes the operator finalizer from each. If a CRD is not installed, the error is logged and skipped gracefully. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the finalizer-removal CLI subcommand that discovers all CR types in the operator's API groups (cluster.redpanda.com, redpanda.vectorized.io) from the scheme at runtime, then lists and removes the operator finalizer from each. New types are automatically picked up without code changes. If a CRD is not installed the error is logged and that type is skipped gracefully so the job succeeds even in partial-install scenarios. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Regenerates templates, schema, and partial values struct to include the new finalizerRemoval opt-in field and pre-delete hook job. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If a namespace is being deleted, the controller no longer adds the operator finalizer to CRs. Previously a re-created CR in a terminating namespace would have a finalizer added, blocking namespace deletion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verifies that the reconciler does not add the operator finalizer when the object's namespace has DeletionTimestamp set. envtest has no namespace controller so deleting a namespace leaves its objects intact, exactly modelling the window the check guards against. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Running post-delete means the operator Deployment is already gone before finalizers are stripped, eliminating the race where the controller could re-add them. A failed job no longer blocks the uninstall since chart resources are already deleted by the time the hook runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Swap import order of corev1 and apierrors in resource_controller.go to satisfy goimports, and add missing finalizerRemoval entries to the operator chart README. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Switch to post-delete hook (already done) and delete stale _pre-delete-finalizer-removal-job.go.tpl template - Use crds.All() instead of scheme reflection to discover operator CRD types for finalizer removal - Remove terminating namespace check from resource controller as the post-delete hook is the proper mechanism for orphaned CR cleanup - Remove corresponding terminating namespace test - Fix all pre-delete -> post-delete references in values.yaml, README, changelog, and package doc - Add envtest-based tests for finalizer removal covering orphaned CRs, multiple CR types, missing CRDs, and end-to-end deletion unblock Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set spec.cluster.clusterRef.name on test CRs so they pass CRD validation (Topic/User require cluster or kafkaApiSpec) - Remove trailing blank line in resource_controller_test.go that caused git diff --exit-code lint failure Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a TestTemplate test case that exercises the finalizerRemoval toggle, ensuring the post-delete Job, ServiceAccount, ClusterRole, and ClusterRoleBinding are rendered correctly when enabled. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
231674d to
63ffeb6
Compare
|
This PR is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
Two related fixes for orphaned CRs with finalizers blocking namespace/operator teardown:
Pre-delete finalizer removal job (finalizerRemoval.enabled=false by default)
Adds an opt-in pre-delete helm hook that strips operator.redpanda.com/finalizer from all operator-managed CRs before helm uninstall completes. Discovers CR types dynamically from the scheme — no code changes needed when new types are added.
Skip finalizer addition in terminating namespaces
The reconciler now checks if the CR's namespace has DeletionTimestamp set before adding the operator finalizer. Prevents the controller from re-adding finalizers to objects in a namespace that's already being deleted, which was the root cause of the blocking behavior.