Skip to content

fix(e2e): topology-rbac and kubernetes-rbac tests in k8s jobs#4656

Open
sanketpathak wants to merge 1 commit intoredhat-developer:mainfrom
sanketpathak:fix-K8s-jobs-topology-tests
Open

fix(e2e): topology-rbac and kubernetes-rbac tests in k8s jobs#4656
sanketpathak wants to merge 1 commit intoredhat-developer:mainfrom
sanketpathak:fix-K8s-jobs-topology-tests

Conversation

@sanketpathak
Copy link
Copy Markdown
Contributor

Description

Please explain the changes you made here.
Fixing E2E Failures on K8s jobs in topology-rbac and kubernetes-rbac tests

Which issue(s) does this PR fix

https://redhat.atlassian.net/browse/RHDHBUGS-2817

  • Fixes #?

PR acceptance criteria

Please make sure that the following steps are complete:

  • GitHub Actions are completed and successful
  • Unit Tests are updated and passing
  • E2E Tests are updated and passing
  • Documentation is updated if necessary (requirement for new features)
  • Add a screenshot if the change is UX/UI related

How to test changes / Special notes to the reviewer

@rhdh-qodo-merge
Copy link
Copy Markdown

rhdh-qodo-merge Bot commented Apr 21, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0) 📎 Requirement gaps (1)

Grey Divider


Action required

1. OCP Tekton webhook race 🐞 Bug ☼ Reliability ⭐ New
Description
cluster_setup_ocp_helm() no longer waits for the OpenShift Pipelines/Tekton webhook
deployment+endpoint to be ready after operator::install_pipelines(), but Tekton
Pipeline/PipelineRun manifests are applied shortly after. This can make oc apply fail
intermittently (webhook not ready), leaving topology/kubernetes RBAC E2E tests without the expected
resources.
Code

.ci/pipelines/utils.sh[R480-482]

cluster_setup_ocp_helm() {
  operator::install_pipelines
-
-  # Wait for OpenShift Pipelines to be ready before proceeding
-  log::info "Waiting for OpenShift Pipelines to be ready..."
-  k8s_wait::deployment "${OPERATOR_NAMESPACE}" "pipelines" 30 10 || return 1
-  k8s_wait::endpoint "${TEKTON_PIPELINES_WEBHOOK}" "openshift-pipelines" 1800 10 || return 1
-
  operator::install_postgres_ocp
Relevance

⭐⭐⭐ High

Team previously fixed webhook/CRD race flakiness by adding endpoint/CRD waits; same Tekton webhook
readiness issue.

PR-#3314
PR-#3843

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
operator::install_pipelines() only waits for CRDs and explicitly states the caller must still wait
for deployment readiness and the tekton-pipelines-webhook endpoint; cluster_setup_ocp_helm() now
proceeds immediately without those waits. Shortly after cluster setup, base_deployment() calls
apply_yaml_files(), which applies Tekton Pipeline/PipelineRun manifests (including topology
test resources), which are subject to Tekton's validating webhook being ready.

.ci/pipelines/utils.sh[480-497]
.ci/pipelines/lib/operators.sh[89-109]
.ci/pipelines/utils.sh[524-535]
.ci/pipelines/utils.sh[348-367]
.ci/pipelines/resources/topology_test/topology-test.yaml[1-31]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
On OpenShift jobs, `cluster_setup_ocp_helm()` (and similarly `cluster_setup_ocp_operator()`) calls `operator::install_pipelines()` but no longer waits for Tekton/OpenShift Pipelines webhook readiness. Tekton resources are applied soon after, which can fail if the webhook service/endpoints aren’t up yet.

### Issue Context
`operator::install_pipelines()` currently waits for CRDs only and documents that the calling script should still wait for deployment readiness and webhook endpoint availability.

### Fix Focus Areas
- .ci/pipelines/utils.sh[480-497]
- .ci/pipelines/lib/operators.sh[89-109]

### Suggested fix
Either:
1) Re-introduce the explicit waits in `cluster_setup_ocp_helm()` and `cluster_setup_ocp_operator()` right after `operator::install_pipelines`:
  - `k8s_wait::deployment "${OPERATOR_NAMESPACE}" "pipelines" 30 10 || return 1`
  - `k8s_wait::endpoint "${TEKTON_PIPELINES_WEBHOOK}" "openshift-pipelines" 1800 10 || return 1`

or
2) Move those waits into `operator::install_pipelines()` itself so all callers are safe by default.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. test.fixme disabled in kubernetes-rbac 📎 Requirement gap ☼ Reliability
Description
The PR removes the non-OpenShift skip gate for kubernetes-rbac.spec.ts without adding any
handling/assertion to ensure Kubernetes objects load successfully (and no retrieval-warning banner
appears) in Kubernetes CI jobs. This risks reintroducing locator timeouts and failing the RBAC E2E
requirement on AKS/EKS/GKE.
Code

e2e-tests/playwright/e2e/plugins/kubernetes/kubernetes-rbac.spec.ts[R8-10]

test.describe("Test Kubernetes Plugin", () => {
-  // TODO: https://issues.redhat.com/browse/RHDHBUGS-2817
-  test.fixme(() => process.env.IS_OPENSHIFT === "false");
+  // test.fixme(() => process.env.IS_OPENSHIFT === "false");
Evidence
Compliance requires that in Kubernetes-based CI runs the UI must not show the Kubernetes object
retrieval warning banner and that expected objects render so Playwright locators succeed. The change
comments out the environment-based test.fixme guard for non-OpenShift runs, but no accompanying
change is made to ensure resources load or that the banner does not appear when these tests now run
on Kubernetes jobs.

Kubernetes/Topology RBAC E2E tests must not show Kubernetes object retrieval warning banner
e2e-tests/playwright/e2e/plugins/kubernetes/kubernetes-rbac.spec.ts[8-10]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`kubernetes-rbac.spec.ts` no longer skips when `IS_OPENSHIFT === "false"`, but the PR does not add any mitigation to ensure Kubernetes resources reliably render and the UI does not show the "There was a problem retrieving Kubernetes objects" warning banner in Kubernetes CI jobs.

## Issue Context
Compliance requires Kubernetes/Topology RBAC E2E tests to pass consistently on AKS/EKS/GKE without the retrieval-warning banner and with expected resources visible within existing locator timeouts.

## Fix Focus Areas
- e2e-tests/playwright/e2e/plugins/kubernetes/kubernetes-rbac.spec.ts[8-10]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

3. Commented fixme dead code 🐞 Bug ⚙ Maintainability
Description
Both specs retain a commented-out test.fixme line, which obscures intent and makes future triage
harder. The repo already provides a shared helper (skipIfIsOpenShift) and constants for platform
gating, so leaving a commented inline env check is unnecessary and inconsistent.
Code

e2e-tests/playwright/e2e/plugins/topology/topology-rbac.spec.ts[8]

+  // test.fixme(() => process.env.IS_OPENSHIFT === "false");
Evidence
The specs now contain a commented-out platform guard instead of either removing it entirely (if
tests must run everywhere) or expressing the gating via the shared helper and constants intended for
this purpose.

e2e-tests/playwright/e2e/plugins/kubernetes/kubernetes-rbac.spec.ts[8-10]
e2e-tests/playwright/e2e/plugins/topology/topology-rbac.spec.ts[7-9]
e2e-tests/playwright/utils/helper.ts[88-107]
e2e-tests/playwright/utils/constants.ts[52-60]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Commented-out `test.fixme(...)` lines are dead code and make it unclear whether platform gating is intended.

### Issue Context
If these tests are meant to run on non-OpenShift, the commented lines should be deleted. If gating is still desired, use the shared helper (`skipIfIsOpenShift(IS_OPENSHIFT_VALUES.FALSE)`) instead of an inline env-string check.

### Fix Focus Areas
- e2e-tests/playwright/e2e/plugins/kubernetes/kubernetes-rbac.spec.ts[8-10]
- e2e-tests/playwright/e2e/plugins/topology/topology-rbac.spec.ts[7-9]
- e2e-tests/playwright/utils/helper.ts[88-107]
- e2e-tests/playwright/utils/constants.ts[52-60]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Previous review results

Review updated until commit e46aa3b

Results up to commit b9f2d56


🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (1)


Remediation recommended
1. test.fixme disabled in kubernetes-rbac 📎 Requirement gap ☼ Reliability
Description
The PR removes the non-OpenShift skip gate for kubernetes-rbac.spec.ts without adding any
handling/assertion to ensure Kubernetes objects load successfully (and no retrieval-warning banner
appears) in Kubernetes CI jobs. This risks reintroducing locator timeouts and failing the RBAC E2E
requirement on AKS/EKS/GKE.
Code

e2e-tests/playwright/e2e/plugins/kubernetes/kubernetes-rbac.spec.ts[R8-10]

test.describe("Test Kubernetes Plugin", () => {
-  // TODO: https://issues.redhat.com/browse/RHDHBUGS-2817
-  test.fixme(() => process.env.IS_OPENSHIFT === "false");
+  // test.fixme(() => process.env.IS_OPENSHIFT === "false");
Evidence
Compliance requires that in Kubernetes-based CI runs the UI must not show the Kubernetes object
retrieval warning banner and that expected objects render so Playwright locators succeed. The change
comments out the environment-based test.fixme guard for non-OpenShift runs, but no accompanying
change is made to ensure resources load or that the banner does not appear when these tests now run
on Kubernetes jobs.

Kubernetes/Topology RBAC E2E tests must not show Kubernetes object retrieval warning banner
e2e-tests/playwright/e2e/plugins/kubernetes/kubernetes-rbac.spec.ts[8-10]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`kubernetes-rbac.spec.ts` no longer skips when `IS_OPENSHIFT === "false"`, but the PR does not add any mitigation to ensure Kubernetes resources reliably render and the UI does not show the "There was a problem retrieving Kubernetes objects" warning banner in Kubernetes CI jobs.

## Issue Context
Compliance requires Kubernetes/Topology RBAC E2E tests to pass consistently on AKS/EKS/GKE without the retrieval-warning banner and with expected resources visible within existing locator timeouts.

## Fix Focus Areas
- e2e-tests/playwright/e2e/plugins/kubernetes/kubernetes-rbac.spec.ts[8-10]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments
2. Commented fixme dead code 🐞 Bug ⚙ Maintainability
Description
Both specs retain a commented-out test.fixme line, which obscures intent and makes future triage
harder. The repo already provides a shared helper (skipIfIsOpenShift) and constants for platform
gating, so leaving a commented inline env check is unnecessary and inconsistent.
Code

e2e-tests/playwright/e2e/plugins/topology/topology-rbac.spec.ts[8]

+  // test.fixme(() => process.env.IS_OPENSHIFT === "false");
Evidence
The specs now contain a commented-out platform guard instead of either removing it entirely (if
tests must run everywhere) or expressing the gating via the shared helper and constants intended for
this purpose.

e2e-tests/playwright/e2e/plugins/kubernetes/kubernetes-rbac.spec.ts[8-10]
e2e-tests/playwright/e2e/plugins/topology/topology-rbac.spec.ts[7-9]
e2e-tests/playwright/utils/helper.ts[88-107]
e2e-tests/playwright/utils/constants.ts[52-60]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Commented-out `test.fixme(...)` lines are dead code and make it unclear whether platform gating is intended.

### Issue Context
If these tests are meant to run on non-OpenShift, the commented lines should be deleted. If gating is still desired, use the shared helper (`skipIfIsOpenShift(IS_OPENSHIFT_VALUES.FALSE)`) instead of an inline env-string check.

### Fix Focus Areas
- e2e-tests/playwright/e2e/plugins/kubernetes/kubernetes-rbac.spec.ts[8-10]
- e2e-tests/playwright/e2e/plugins/topology/topology-rbac.spec.ts[7-9]
- e2e-tests/playwright/utils/helper.ts[88-107]
- e2e-tests/playwright/utils/constants.ts[52-60]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Qodo Logo

@sanketpathak
Copy link
Copy Markdown
Contributor Author

/test e2e-eks-helm-nightly

@rhdh-qodo-merge
Copy link
Copy Markdown

rhdh-qodo-merge Bot commented Apr 21, 2026

Review Summary by Qodo

Re-enable kubernetes-rbac and topology-rbac E2E tests

🐞 Bug fix 🧪 Tests

Grey Divider

Walkthroughs

Description
• Uncomment kubernetes-rbac and topology-rbac E2E tests
• Remove test.fixme() skip conditions for non-OpenShift environments
• Re-enable previously disabled tests in K8s job pipelines
Diagram
flowchart LR
  A["E2E Tests<br/>Disabled"] -- "Uncomment fixme<br/>conditions" --> B["E2E Tests<br/>Re-enabled"]
  C["RHDHBUGS-2817<br/>Issue"] -- "Fix applied" --> B
Loading

Grey Divider

File Changes

1. e2e-tests/playwright/e2e/plugins/kubernetes/kubernetes-rbac.spec.ts 🧪 Tests +1/-2

Re-enable kubernetes-rbac E2E tests

• Commented out test.fixme() condition that was skipping tests
• Removed TODO reference to [RHDHBUGS-2817](https://redhat.atlassian.net/browse/RHDHBUGS-2817) issue
• Re-enables kubernetes-rbac tests for non-OpenShift environments

e2e-tests/playwright/e2e/plugins/kubernetes/kubernetes-rbac.spec.ts


2. e2e-tests/playwright/e2e/plugins/topology/topology-rbac.spec.ts 🧪 Tests +1/-2

Re-enable topology-rbac E2E tests

• Commented out test.fixme() condition that was skipping tests
• Removed TODO reference to [RHDHBUGS-2817](https://redhat.atlassian.net/browse/RHDHBUGS-2817) issue
• Re-enables topology-rbac tests for non-OpenShift environments

e2e-tests/playwright/e2e/plugins/topology/topology-rbac.spec.ts


Grey Divider

Qodo Logo

@github-actions
Copy link
Copy Markdown
Contributor

Image was built and published successfully. It is available at:

@sanketpathak sanketpathak force-pushed the fix-K8s-jobs-topology-tests branch from b9f2d56 to bce3f73 Compare April 21, 2026 16:03
@github-actions
Copy link
Copy Markdown
Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@sanketpathak
Copy link
Copy Markdown
Contributor Author

/test e2e-aks-helm-nightly
/test e2e-gke-helm-nightly
/test e2e-eks-helm-nightly

@sanketpathak sanketpathak force-pushed the fix-K8s-jobs-topology-tests branch from bce3f73 to 5767ff5 Compare April 22, 2026 04:24
@sanketpathak
Copy link
Copy Markdown
Contributor Author

/test e2e-aks-helm-nightly
/test e2e-gke-helm-nightly
/test e2e-eks-helm-nightly

@github-actions
Copy link
Copy Markdown
Contributor

Image was built and published successfully. It is available at:

@sanketpathak sanketpathak force-pushed the fix-K8s-jobs-topology-tests branch from 5767ff5 to 833598c Compare April 22, 2026 06:59
@github-actions
Copy link
Copy Markdown
Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@sanketpathak sanketpathak force-pushed the fix-K8s-jobs-topology-tests branch from 833598c to 588eee7 Compare April 22, 2026 07:29
@github-actions
Copy link
Copy Markdown
Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@sanketpathak sanketpathak force-pushed the fix-K8s-jobs-topology-tests branch from 588eee7 to e9180e3 Compare April 22, 2026 07:33
@github-actions
Copy link
Copy Markdown
Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@sanketpathak
Copy link
Copy Markdown
Contributor Author

/test e2e-aks-helm-nightly
/test e2e-gke-helm-nightly
/test e2e-eks-helm-nightly

@sanketpathak
Copy link
Copy Markdown
Contributor Author

/test e2e-gke-helm-nightly

1 similar comment
@sanketpathak
Copy link
Copy Markdown
Contributor Author

/test e2e-gke-helm-nightly

Comment thread .ci/pipelines/utils.sh Outdated
Comment on lines +525 to +528
log::info "Waiting for Tekton Pipelines to be ready..."
k8s_wait::deployment "tekton-pipelines" "tekton-pipelines-webhook" 30 10 || return 1
k8s_wait::endpoint "tekton-pipelines-webhook" "tekton-pipelines" 1800 10 || return 1
k8s_wait::crd "pipelines.tekton.dev" 120 5 || return 1
Copy link
Copy Markdown
Member

@jrichter1 jrichter1 Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these would make sense as part of install_tekton I think

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this to make it consistent with the pipeline setup in OCP

rhdh/.ci/pipelines/utils.sh

Lines 507 to 508 in 9045e7b

k8s_wait::deployment "${OPERATOR_NAMESPACE}" "pipelines" 30 10 || return 1
k8s_wait::endpoint "${TEKTON_PIPELINES_WEBHOOK}" "openshift-pipelines" 1800 10 || return 1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why waiting for the operator shouldn't be part of the respective install function in either case. Both are always called, then followed by the waits, which is also duplicating code.

If you want to be consistent, merge both install functions with the waits :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of waitfor_tekton_pipelines

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jrichter1. I'd merge all the installations with the waits to keep it readable, consistent, and easy to understand.

The waitfor_tekton_pipelines was probably orphaned during refactorings.

I recall that at some point, the installation was intentionally split from the waits, so that installation of multiple resources could be triggered, and then we could wait for all of them to be ready. But I don't see much value in that and it's making the scripts overly complicated.

@sanketpathak sanketpathak force-pushed the fix-K8s-jobs-topology-tests branch from e9180e3 to 74703d2 Compare April 23, 2026 12:17
@github-actions
Copy link
Copy Markdown
Contributor

Image was built and published successfully. It is available at:

@sanketpathak
Copy link
Copy Markdown
Contributor Author

/test e2e-aks-helm-nightly
/test e2e-gke-helm-nightly
/test e2e-eks-helm-nightly

@sanketpathak
Copy link
Copy Markdown
Contributor Author

/test e2e-gke-helm-nightly

@sanketpathak sanketpathak force-pushed the fix-K8s-jobs-topology-tests branch from 74703d2 to e46aa3b Compare April 23, 2026 13:51
@sanketpathak
Copy link
Copy Markdown
Contributor Author

/test e2e-aks-helm-nightly
/test e2e-gke-helm-nightly
/test e2e-eks-helm-nightly

@github-actions
Copy link
Copy Markdown
Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@sonarqubecloud
Copy link
Copy Markdown

@sanketpathak
Copy link
Copy Markdown
Contributor Author

/test e2e-gke-helm-nightly

@zdrapela
Copy link
Copy Markdown
Member

/agentic_review

@rhdh-qodo-merge
Copy link
Copy Markdown

rhdh-qodo-merge Bot commented Apr 24, 2026

Persistent review updated to latest commit e46aa3b

Copy link
Copy Markdown
Member

@zdrapela zdrapela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! It's cool to see it passing at all the platforms 👍 LGTM

@zdrapela
Copy link
Copy Markdown
Member

/retest

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 24, 2026

@sanketpathak: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-ocp-helm e46aa3b link true /test e2e-ocp-helm

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants