Skip to content

🐛 OCPBUGS-62942: Fix ClusterExtension deletion when BoxcutterRuntime is enabled#2299

Closed
tmshort wants to merge 3 commits intooperator-framework:mainfrom
tmshort:fix-OCPBUGS-62942
Closed

🐛 OCPBUGS-62942: Fix ClusterExtension deletion when BoxcutterRuntime is enabled#2299
tmshort wants to merge 3 commits intooperator-framework:mainfrom
tmshort:fix-OCPBUGS-62942

Conversation

@tmshort
Copy link
Copy Markdown
Contributor

@tmshort tmshort commented Nov 3, 2025

Summary

Fixes OCPBUGS-62942 where ClusterExtensions cannot be deleted after enabling the BoxcutterRuntime feature gate when the original catalog is no longer available.

Problem

When BoxcutterRuntime feature gate is enabled after ClusterExtensions have been installed, attempting to delete those ClusterExtensions fails with:

error walking catalogs: error getting package "X" from catalog "Y": cache for catalog "Y" not found

Root Cause

The controller's reconcile loop attempted to resolve bundles even when a ClusterExtension was being deleted. If the original catalog was no longer available (deleted or cache cleared), this resolution would fail and prevent deletion from completing.

Solution

Move the deletion timestamp check in the reconcile() function to occur immediately after finalizer error handling. This ensures that resolution, unpacking, and installation are skipped entirely when a ClusterExtension is being deleted, regardless of whether finalizers have been updated.

Changes

  1. Controller Fix: Modified clusterextension_controller.go to check for deletion timestamp before attempting resolution
  2. Test Coverage: Added TestClusterExtensionDeletionWithUnavailableCatalog to verify the fix
  3. Documentation: Added specification document in contrib/spec-OCPBUGS-62942.md

Testing

  • ✅ All existing unit tests pass
  • ✅ New regression test verifies deletion succeeds when catalog is unavailable
  • ✅ Code formatting verified with make fmt

Related Issues

  • Fixes: OCPBUGS-62942

🤖 Generated with Claude Code via /jira:solve OCPBUGS-62942 origin

Co-Authored-By: Claude noreply@anthropic.com

tmshort and others added 3 commits November 3, 2025 15:24
Add detailed specification document explaining the root cause and
solution for OCPBUGS-62942, where ClusterExtensions cannot be deleted
when the BoxcutterRuntime feature gate is enabled and catalogs are
unavailable.

The specification documents:
- Problem statement and root cause analysis
- Solution design with code flow analysis
- Implementation steps and testing plan
- Risks and mitigations

🤖 Generated with [Claude Code](https://claude.com/claude-code) via /jira:solve OCPBUGS-62942 origin

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
Move the deletion timestamp check to occur immediately after finalizer
error handling in the reconcile loop. This ensures that resolution,
unpacking, and installation are skipped entirely when a ClusterExtension
is being deleted, regardless of whether finalizers have been updated.

This fixes OCPBUGS-62942 where ClusterExtensions could not be deleted
after enabling the BoxcutterRuntime feature gate, because the controller
would attempt to resolve bundles even during deletion. If the original
catalog was no longer available (deleted or cache cleared), this would
fail with "cache for catalog not found" error and prevent deletion.

The fix ensures that when a ClusterExtension has a deletion timestamp,
the reconcile loop returns early without attempting any resolution or
installation operations.

Fixes: OCPBUGS-62942

🤖 Generated with [Claude Code](https://claude.com/claude-code) via /jira:solve OCPBUGS-62942 origin

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
Add TestClusterExtensionDeletionWithUnavailableCatalog to verify that
when a ClusterExtension is being deleted, the reconcile loop does not
attempt resolution even if the catalog is unavailable (which would cause
a "cache for catalog not found" error).

This test ensures that both the RevisionStatesGetter and Resolver are
not called during deletion, preventing errors that would block deletion
when catalogs are unavailable.

The test creates a ClusterExtension with a finalizer, deletes it, and
verifies that reconciliation succeeds without calling the resolver or
revision states getter, both of which are configured to return errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code) via /jira:solve OCPBUGS-62942 origin

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 3, 2025
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Nov 3, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tmshort for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@netlify
Copy link
Copy Markdown

netlify Bot commented Nov 3, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 0dd2057
🔍 Latest deploy log https://app.netlify.com/projects/olmv1/deploys/69090fd64bb4ec0008ba9cd6
😎 Deploy Preview https://deploy-preview-2299--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@tmshort tmshort changed the title OCPBUGS-62942: Fix ClusterExtension deletion when BoxcutterRuntime is enabled 🐛 OCPBUGS-62942: Fix ClusterExtension deletion when BoxcutterRuntime is enabled Nov 3, 2025
@tmshort
Copy link
Copy Markdown
Contributor Author

tmshort commented Nov 3, 2025

Closing this PR - the fix doesn't actually change the logic. Just reordering two early-return checks that both check for the same condition (deletion) doesn't fix anything. Need to investigate the actual root cause more carefully.

@tmshort tmshort closed this Nov 3, 2025
@codecov
Copy link
Copy Markdown

codecov Bot commented Nov 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.24%. Comparing base (18142b3) to head (0dd2057).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2299      +/-   ##
==========================================
- Coverage   74.32%   74.24%   -0.09%     
==========================================
  Files          90       90              
  Lines        7008     7008              
==========================================
- Hits         5209     5203       -6     
- Misses       1392     1395       +3     
- Partials      407      410       +3     
Flag Coverage Δ
e2e 45.93% <100.00%> (-0.04%) ⬇️
experimental-e2e 48.16% <100.00%> (-0.07%) ⬇️
unit 58.83% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant