OCPCLOUD-3507: Add OTE e2e tests for ClusterAPIMachineManagement feature gate#606
OCPCLOUD-3507: Add OTE e2e tests for ClusterAPIMachineManagement feature gate#606pmeida wants to merge 1 commit into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@pmeida: This pull request references OCPCLOUD-3507 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughA new feature-gated ordered e2e suite verifies Cluster API operator and controller deployment health, CRD establishment, and the presence of the management cluster kubeconfig Secret. New framework constants are added for Cluster API operator-related names. ChangesCluster API Machine Management E2E Suite
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 13 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (13 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/ok-to-test |
|
To confirm if e2e tests go trough |
|
@pmeida: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command |
|
@pmeida: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command |
|
/payload-job-with-prs periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-single-node-techpreview-serial openshift/origin#31307 |
|
@pmeida: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7d80c200-6ef1-11f1-8a6b-01a933de5987-0 |
|
/payload-job-with-prs periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-single-node-techpreview openshift/origin#31307 |
|
@pmeida: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f9d50680-6f15-11f1-876b-e0aef72228f3-0 |
|
/payload-job-with-prs periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-single-node-techpreview openshift/origin#31307 |
|
@pmeida: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/0da6c440-6f17-11f1-88d3-1d650fa5fcee-0 |
|
/payload-job-with-prs periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview openshift/origin#31307 |
|
@pmeida: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ccc72ee0-6f18-11f1-8b1e-eefb4a048f63-0 |
OTE run validation. Has to be done like this until we get the origin PR merged. Afterwards, we can also see OTE tests run through the |
from: periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview All OTE tests passing ✅ |
| deployment := &appsv1.Deployment{} | ||
| Eventually(func(g Gomega) { | ||
| g.Expect(cl.Get(ctx, client.ObjectKey{ | ||
| Name: "capi-operator", | ||
| Namespace: framework.CAPIOperatorNamespace, | ||
| }, deployment)).To(Succeed()) | ||
| g.Expect(deployment.Status.Conditions).To(ContainElement(SatisfyAll( | ||
| HaveField("Type", Equal(appsv1.DeploymentAvailable)), | ||
| HaveField("Status", Equal(corev1.ConditionTrue)), | ||
| ))) | ||
| }).WithTimeout(framework.WaitMedium).WithPolling(framework.RetryMedium).Should(Succeed()) |
There was a problem hiding this comment.
Please can you write these so that the test assertion is done by Eventually, not inside the closure. This means we get a much better error message if it fails. Something like:
| deployment := &appsv1.Deployment{} | |
| Eventually(func(g Gomega) { | |
| g.Expect(cl.Get(ctx, client.ObjectKey{ | |
| Name: "capi-operator", | |
| Namespace: framework.CAPIOperatorNamespace, | |
| }, deployment)).To(Succeed()) | |
| g.Expect(deployment.Status.Conditions).To(ContainElement(SatisfyAll( | |
| HaveField("Type", Equal(appsv1.DeploymentAvailable)), | |
| HaveField("Status", Equal(corev1.ConditionTrue)), | |
| ))) | |
| }).WithTimeout(framework.WaitMedium).WithPolling(framework.RetryMedium).Should(Succeed()) | |
| Eventually(func(g Gomega) *appsv1.Deployment { | |
| // N.B. This closure can be replaced by komega.Object if you import that | |
| deployment := &appsv1.Deployment{} | |
| err := cl.Get(ctx, client.ObjectKey{ | |
| Name: "capi-operator", | |
| Namespace: framework.CAPIOperatorNamespace, | |
| }, deployment) | |
| if err != nil { | |
| return nil, err | |
| } | |
| }).WithTimeout(framework.WaitMedium).WithPolling(framework.RetryMedium).Should( | |
| HaveField("Status.Conditions", test.HasCondition(appsv1.DeploymentAvailable)).WithStatus(corev1.ConditionTrue)) | |
| ) |
The repo has a Condition matcher which you should also use. I'll have used it incorrectly here, but it's something like that.
The advantage of this structure is that Gomega has the full context when it fails and can produce a better error.
There was a problem hiding this comment.
I think I implemented the requested with a slight difference.
pkg/test.HaveCondition - can't import pkg/test from e2e/*.go files. The chain e2e → pkg/test → pkg/providerimages → manifests-gen causes go mod tidy to fail during make vendor because manifests-gen has a placeholder version (v0.0.0-00010101000000-000000000000) that doesn't exist in any registry.
Reference: OTE binary (go build ./openshift-tests-extension/cmd/) imports e2e/ as a regular package → any regular .go file in e2e/ is compiled into the binary → if those files import pkg/test, go mod tidy traces the chain to manifests-gen and fails. _test.go files are ignored by go build so importing there is fine.
The go work vendor path works, but make vendor runs go mod tidy per-module first and hits this wall.
I used HaveField + ContainElement + SatisfyAll which produces equivalent failure output.
The fix could be to move provider_fixtures.go (the only file in pkg/test that pulls in providerimages) to its own sub-package - all its callers are _test.go files so the impact is contained.
Happy to do that as a follow-up if that's preferred.
| Context("Provider images & initialization", func() { | ||
| It("should have the capi-installer-images ConfigMap present", func() { | ||
| cm := &corev1.ConfigMap{} | ||
| Expect(cl.Get(ctx, client.ObjectKey{ | ||
| Name: framework.CAPIInstallerImagesConfigMapName, | ||
| Namespace: framework.CAPIOperatorNamespace, | ||
| }, cm)).To(Succeed()) | ||
| Expect(cm.Data).NotTo(BeEmpty()) | ||
| }) | ||
| }) |
There was a problem hiding this comment.
I would prefer we did not test this in an e2e. It's an implementation detail which may change without affecting the readiness of the component. And if it was not present, all the other e2es would also fail anyway. I think this is noise.
There was a problem hiding this comment.
You are absolutely right!
I was trying to extend the tests to 10. So I proposed this in the PR waiting for possible negative feedback.
5 tests are enough to fulfil the promotion requirements.
| }).WithTimeout(framework.WaitMedium).WithPolling(framework.RetryMedium).Should(Succeed()) | ||
| }) | ||
|
|
||
| It("should have the cluster-api ClusterOperator reporting healthy", func() { |
There was a problem hiding this comment.
Do we need this? I guess it doesn't hurt, but a failure here would definitely trigger other failures. I wonder if it's redundant. @simkam thoughts?
There was a problem hiding this comment.
I kept it for now. A deployment being Available and a ClusterOperator reporting healthy are two different signals. It also is the primary user-visible health indicator for the stack, so verifying the operator writes it correctly seemed worth the test.
what do you think @simkam
| It("should have platform-specific infrastructure CRDs installed", func() { | ||
| cluster := &clusterv1.Cluster{} | ||
| Expect(cl.Get(ctx, client.ObjectKey{Name: clusterName, Namespace: framework.CAPINamespace}, cluster)).To(Succeed()) | ||
|
|
||
| mapping, err := cl.RESTMapper().RESTMapping(cluster.Spec.InfrastructureRef.GroupKind()) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
|
|
||
| crdName := mapping.Resource.Resource + "." + mapping.Resource.Group | ||
| crd := &apiextensionsv1.CustomResourceDefinition{} | ||
|
|
||
| By("Fetching the infrastructure CRD " + crdName) | ||
| Expect(cl.Get(ctx, client.ObjectKey{Name: crdName}, crd)).To(Succeed()) | ||
|
|
||
| By("Verifying the infrastructure CRD is established") | ||
| Expect(crd.Status.Conditions).To(ContainElement(SatisfyAll( | ||
| HaveField("Type", Equal(apiextensionsv1.Established)), | ||
| HaveField("Status", Equal(apiextensionsv1.ConditionTrue)), | ||
| ))) | ||
| }) |
There was a problem hiding this comment.
We MUST NOT test this here. It is not controlled by this feature gate. This is controlled by multiple feature-specific gates, e.g. ClusterAPIMachineManagementAWS.
When we do test it, the tests should be specific to a platform and hard-code the expected names rather than infer them through heuristics.
There was a problem hiding this comment.
again doing an abstracted test for the same reasons as above
now removed
…ure gate Add 10 non-disruptive health-check tests registered with the OTE binary that validate the Cluster API stack is correctly installed and operational when the ClusterAPIMachineManagement feature gate is enabled. Tests provide the Sippy signal required for feature gate promotion. Tests (all blocking, capio/parallel): - capi-operator deployment available in openshift-cluster-api-operator - cluster-api ClusterOperator reporting Available, not Degraded, not Progressing - capi-controllers deployment available in openshift-cluster-api - capi-installer deployment available in openshift-cluster-api-operator - capi-installer-images ConfigMap present in openshift-cluster-api-operator - Core CAPI CRDs (clusters, machines, machinesets) installed and established - Platform-specific infrastructure CRDs installed (derived from Cluster.Spec.InfrastructureRef) - Management Cluster object present with InfrastructureReady=True - Platform-specific infrastructure cluster object present (AWSCluster etc.) - Management cluster kubeconfig Secret present Platform-specific tests derive the infra Kind from the live Cluster object's InfrastructureRef rather than a hardcoded platform map, so no code changes are needed when new platforms are added. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
|
/test e2e-aws-capi-techpreview |
|
/payload-job-with-prs periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview openshift/origin#31307 |
|
@pmeida: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d19fcd00-70ad-11f1-9de7-ebc4da7c8a95-0 |
|
ci/prow/e2e-aws-capi-techpreview flake addressed in #610 |
|
/test e2e-aws-capi-techpreview |
|
new flake addressed in #611 |
|
/test e2e-aws-capi-techpreview |
|
@pmeida: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |

Summary
Adds 6 non-disruptive health-check e2e tests registered with the OTE binary that validate the Cluster API stack is correctly installed and operational when the
ClusterAPIMachineManagementfeature gate is enabled. These tests provide the Sippy signal required for feature gate promotion.Tests
All tests are
blocking,capio/parallel → openshift/conformance/parallel:Design notes
InfrastructureReadycheck usesWaitLong(15 min) since infra reconciliation can be slow on degraded clustersbin/cluster-capi-operator-tests-ext run-suite capio/paralleland viamake e2eDependencies
Summary by CodeRabbit