🐛 Fix catalogd ha readiness #2674
Open: tmshort wants to merge 2 commits into operator-framework:main from tmshort:fix-catalogd-ha-readiness
New file (`@@ -0,0 +1,19 @@`):

```gherkin
Feature: HA failover for catalogd

  When catalogd is deployed with multiple replicas, the remaining pods must
  elect a new leader and resume serving catalogs if the leader pod is lost.

  Background:
    Given OLM is available
    And an image registry is available

  @CatalogdHA
  Scenario: Catalogd resumes serving catalogs after leader pod failure
    Given a catalog "test" with packages:
      | package | version | channel | replaces | contents                   |
      | test    | 1.0.0   | stable  |          | CRD, Deployment, ConfigMap |
    And catalogd is ready to reconcile resources
    And catalog "test" is reconciled
    When the catalogd leader pod is force-deleted
    Then a new catalogd leader is elected
    And catalog "test" reports Serving as True with Reason Available
```
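The last step of the scenario passes once the catalog's `Serving` condition is `True` with reason `Available`. A minimal stdlib sketch of that check, assuming the step reads the catalog's `.status` as JSON (for example from a `kubectl get ... -o json` call) and that conditions use the standard Kubernetes `type`/`status`/`reason` fields. `hasCondition` and `catalogStatus` are hypothetical names, not helpers from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition keeps only the fields of a Kubernetes condition this check needs;
// encoding/json ignores the remaining fields.
type condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
	Reason string `json:"reason"`
}

// catalogStatus is a hypothetical, minimal view of a catalog's .status block.
type catalogStatus struct {
	Conditions []condition `json:"conditions"`
}

// hasCondition reports whether the JSON-encoded status contains a condition of
// the given type whose status and reason both match.
func hasCondition(statusJSON []byte, condType, status, reason string) (bool, error) {
	var st catalogStatus
	if err := json.Unmarshal(statusJSON, &st); err != nil {
		return false, fmt.Errorf("decode status: %w", err)
	}
	for _, c := range st.Conditions {
		if c.Type == condType {
			return c.Status == status && c.Reason == reason, nil
		}
	}
	// The condition has not been reported yet.
	return false, nil
}

func main() {
	// Illustrative status only; not captured from a real cluster.
	status := []byte(`{"conditions":[{"type":"Serving","status":"True","reason":"Available"}]}`)
	ok, err := hasCondition(status, "Serving", "True", "Available")
	fmt.Println(ok, err)
}
```

In Go code that already depends on `k8s.io/apimachinery`, `meta.FindStatusCondition` performs the same lookup on a `[]metav1.Condition` without hand-written JSON types.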
New file (`@@ -0,0 +1,63 @@`):

```go
package steps

import (
	"context"
	"fmt"
	"strings"

	"k8s.io/component-base/featuregate"
)

// catalogdHAFeature gates scenarios that require a multi-node cluster.
// It is set to true in BeforeSuite when the cluster has at least 2 nodes,
// which is the case for the experimental e2e suite (kind-config-2node.yaml)
// but not the standard suite.
const catalogdHAFeature featuregate.Feature = "CatalogdHA"
```
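The `catalogdHAFeature` comment describes two decisions: turning the gate on when the cluster has at least two nodes, and running gated scenarios only when it is on. A hypothetical stdlib sketch of both, assuming the node list comes from `kubectl get nodes -o name` (one `node/<name>` per line) and that a scenario tagged `@CatalogdHA` is skipped while the gate is off. `haGateEnabled` and `shouldRun` are illustrative names; the real suite tracks the gate with `k8s.io/component-base/featuregate`, and its wiring is not shown in this diff.

```go
package main

import (
	"fmt"
	"strings"
)

// haGateEnabled reports whether the CatalogdHA gate should be enabled, given
// the output of `kubectl get nodes -o name`. Assumption: two or more nodes
// are enough for the HA scenarios.
func haGateEnabled(nodeList string) bool {
	// Node names contain no whitespace, so Fields yields one entry per node.
	return len(strings.Fields(nodeList)) >= 2
}

// shouldRun reports whether a scenario may run. A tag naming a known gate
// (for example "@CatalogdHA") requires that gate to be enabled; tags that do
// not name a gate are ignored.
func shouldRun(tags []string, gates map[string]bool) bool {
	for _, tag := range tags {
		enabled, known := gates[strings.TrimPrefix(tag, "@")]
		if known && !enabled {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical two-node kind cluster.
	nodes := "node/kind-control-plane\nnode/kind-control-plane2\n"
	gates := map[string]bool{"CatalogdHA": haGateEnabled(nodes)}
	fmt.Println(shouldRun([]string{"@CatalogdHA"}, gates))
}
```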
```go
// CatalogdLeaderPodIsForceDeleted force-deletes the catalogd leader pod to simulate leader loss.
// The pod is identified from sc.leaderPods["catalogd"] (populated by a prior
// "catalogd is ready to reconcile resources" step). Force-deletion is equivalent to
// an abrupt process crash: the lease is no longer renewed and the surviving pod
// acquires leadership after the lease expires.
//
// Note: stopping the kind node container is not used here because both nodes in the
// experimental 2-node cluster are control-plane nodes that run etcd — stopping either
// would break etcd quorum and make the API server unreachable for the rest of the test.
func CatalogdLeaderPodIsForceDeleted(ctx context.Context) error {
	sc := scenarioCtx(ctx)
	leaderPod := sc.leaderPods["catalogd"]
	if leaderPod == "" {
		return fmt.Errorf("catalogd leader pod not found in scenario context; run 'catalogd is ready to reconcile resources' first")
	}

	logger.Info("Force-deleting catalogd leader pod", "pod", leaderPod)
	if _, err := k8sClient("delete", "pod", leaderPod, "-n", olmNamespace,
		"--force", "--grace-period=0"); err != nil {
		return fmt.Errorf("failed to force-delete catalogd leader pod %q: %w", leaderPod, err)
	}
	return nil
}
```
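The doc comment above relies on lease expiry: once the force-deleted leader stops renewing, the surviving replica can take over only after the lease duration has elapsed since the last renewal. The following is a deliberately simplified, in-memory model of that rule, not client-go's implementation: real leader election also uses optimistic concurrency on the Lease object and measures expiry from when a candidate last observed a change. The 15-second duration, the pod names, and the `lease`/`at` helpers are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// epoch is an arbitrary fixed start time for the simulation.
var epoch = time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)

// at returns the simulated time sec seconds after epoch.
func at(sec int) time.Time { return epoch.Add(time.Duration(sec) * time.Second) }

// lease is a simplified stand-in for a coordination.k8s.io Lease.
type lease struct {
	holder    string
	renewTime time.Time
	duration  time.Duration
}

func newLease(seconds int) *lease {
	return &lease{duration: time.Duration(seconds) * time.Second}
}

// tryAcquireOrRenew lets the current holder renew at any time; any other
// candidate may take over only after renewTime+duration has passed.
func (l *lease) tryAcquireOrRenew(id string, now time.Time) bool {
	expired := now.After(l.renewTime.Add(l.duration))
	if l.holder != "" && l.holder != id && !expired {
		return false
	}
	l.holder = id
	l.renewTime = now
	return true
}

func main() {
	l := newLease(15)
	l.tryAcquireOrRenew("catalogd-a", at(0)) // catalogd-a becomes leader
	l.tryAcquireOrRenew("catalogd-a", at(2)) // last renewal before it is force-deleted

	// catalogd-b retries once per simulated second and wins only once the
	// last renewal is more than 15s old.
	for s := 3; s <= 20; s++ {
		if l.tryAcquireOrRenew("catalogd-b", at(s)) {
			fmt.Printf("catalogd-b acquired the lease at t=%ds\n", s)
			break
		}
	}
}
```

With these values `catalogd-b` takes over at t=18s, the first retry after the lease from the last renewal (t=2s) has expired.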
```go
// NewCatalogdLeaderIsElected polls the catalogd leader election lease until the holder
// identity changes to a pod other than the deleted leader. It updates
// sc.leaderPods["catalogd"] with the new leader pod name.
func NewCatalogdLeaderIsElected(ctx context.Context) error {
	sc := scenarioCtx(ctx)
	oldLeader := sc.leaderPods["catalogd"]

	waitFor(ctx, func() bool {
		holder, err := k8sClient("get", "lease", leaseNames["catalogd"], "-n", olmNamespace,
			"-o", "jsonpath={.spec.holderIdentity}")
		if err != nil || holder == "" {
			return false
		}
		newPod := strings.Split(strings.TrimSpace(holder), "_")[0]
		if newPod == oldLeader {
			return false
		}
		sc.leaderPods["catalogd"] = newPod
		logger.Info("New catalogd leader elected", "pod", newPod)
		return true
	})
	return nil
}
```
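The step keeps everything before the first `_` in the Lease's `holderIdentity` and polls until that pod differs from the deleted leader. A stdlib-only sketch of the same logic follows; `podFromHolderIdentity`, `waitForNewLeader`, and `errNoNewLeader` are hypothetical names, and the `<pod-name>_<suffix>` identity format is the assumption the step itself makes. The step returns `nil` after `waitFor`, whose definition is not part of this diff, so whether a timeout fails the scenario depends on that helper; this sketch instead returns an explicit error after a bounded number of attempts.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// podFromHolderIdentity extracts the pod name from a Lease holderIdentity of
// the assumed form "<pod-name>_<suffix>". Pod names cannot contain '_', so the
// text before the first '_' is the pod name; without '_' the trimmed input is
// returned unchanged.
func podFromHolderIdentity(holder string) string {
	pod, _, _ := strings.Cut(strings.TrimSpace(holder), "_")
	return pod
}

var errNoNewLeader = errors.New("no new leader elected before giving up")

// waitForNewLeader calls getHolder up to attempts times, sleeping interval
// between calls, and returns the first non-empty leader pod that differs from
// oldLeader. Read errors are treated as "not yet" and retried.
func waitForNewLeader(oldLeader string, getHolder func() (string, error), attempts int, interval time.Duration) (string, error) {
	for i := 0; i < attempts; i++ {
		if holder, err := getHolder(); err == nil {
			if pod := podFromHolderIdentity(holder); pod != "" && pod != oldLeader {
				return pod, nil
			}
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return "", errNoNewLeader
}

func main() {
	// Simulated lease reads with made-up pod names: the old leader still
	// holds the lease for two polls, then the surviving replica takes over.
	reads := []string{"catalogd-old_1111", "catalogd-old_1111", "catalogd-new_2222"}
	i := 0
	getHolder := func() (string, error) {
		if i >= len(reads) {
			return reads[len(reads)-1], nil
		}
		h := reads[i]
		i++
		return h, nil
	}
	pod, err := waitForNewLeader("catalogd-old", getHolder, 5, 10*time.Millisecond)
	fmt.Println(pod, err)
}
```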