Skip to content

NO-JIRA: Clarify why Envoy pods have no CPU or memory limit#31145

Merged
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
rikatz:remove-sail-exception
Jun 3, 2026
Merged

NO-JIRA: Clarify why Envoy pods have no CPU or memory limit#31145
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
rikatz:remove-sail-exception

Conversation

@rikatz

@rikatz rikatz commented May 7, 2026

Copy link
Copy Markdown
Member

Justify why the exception for Envoy CPU memory and limits is required, adding comments on the exception with proper information about the exception needs

Summary by CodeRabbit

  • Tests
    • Clarified test comments about Istio (Envoy) proxy CPU limits and why removing those limits is not desirable; no test logic or behavior changed.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 7, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@rikatz: This pull request references Jira Issue OCPBUGS-55050, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Previously, there was a need for exception on Gateway API Proxy pods to allow resources limits, as this wasn't properly configurable.

The situation is now fixed on cluster ingress operator, and the exception can be removed. The fix is implemented on openshift/cluster-ingress-operator#1440

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2026
@coderabbitai

coderabbitai Bot commented May 7, 2026

Copy link
Copy Markdown

Walkthrough

This PR updates a comment in test/extended/operators/resources.go to clarify that the entries refer to Istio (Envoy) Proxy Pods and explains why removing Envoy CPU limits is not desirable; no code or test data entries were modified.

Changes

Istio comment clarification

Layer / File(s) Summary
Istio (Envoy) Proxy Pods comment
test/extended/operators/resources.go (lines 60–63)
Replaced the generic "Istio pods" comment above knownBrokenPods with an expanded multi-line note about Envoy proxy pods, including a rationale and an issue reference. No executable logic or data in knownBrokenPods changed.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed Test names are stable and deterministic. Describe("[sig-arch] Managed cluster") and It("should set requests but not limits [Early]") contain no dynamic content.
Test Structure And Quality ✅ Passed The PR only updates a comment in test/extended/operators/resources.go, adding documentation about Envoy CPU limits. No test logic, assertions, or infrastructure code was modified.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests were added in this PR. The only changes are comment updates in the existing test file to better document why Envoy CPU limits must be retained.
Single Node Openshift (Sno) Test Compatibility ✅ Passed The new test validates pod resource limits in platform namespaces and has no multi-node assumptions, making it SNO-compatible.
Topology-Aware Scheduling Compatibility ✅ Passed PR only updates comments in test/extended/operators/resources.go explaining resource limit exceptions; no scheduling constraints, manifests, or controller code are modified.
Ote Binary Stdout Contract ✅ Passed The PR only changes comments in knownBrokenPods map. All logging (e2e.Logf, e2e.Failf) is inside g.It blocks, not process-level. No stdout violations detected.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed Test is cluster-internal with no IPv4 assumptions, hardcoded addresses, IPv4-specific parsing, or external connectivity requirements.
No-Weak-Crypto ✅ Passed PR only modifies comments in a test file. No cryptographic code, weak crypto patterns, or custom crypto implementations are present or modified.
Container-Privileges ✅ Passed PR only modifies a comment in a Go test file with no K8s manifest or privilege escalation configurations like privileged, hostPID, hostNetwork, hostIPC, SYS_ADMIN, or allowPrivilegeEscalation.
No-Sensitive-Data-In-Logs ✅ Passed PR adds comments and entries with pod metadata and public issue URLs only. No passwords, tokens, API keys, PII, or sensitive data exposed.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title describes updating a comment to clarify why Envoy pods lack CPU/memory limits, which matches the file change (updating the knownBrokenPods comment for Istio/Envoy pods). However, the PR's actual objective is removing the Gateway API Proxy exception, not just clarifying documentation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@rikatz rikatz force-pushed the remove-sail-exception branch from d1b0eef to 275387f Compare May 7, 2026 16:30

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/extended/operators/resources.go (1)

123-131: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent swallowing of non-NotFound Deployment lookup errors may cause spurious test failures.

The previous code explicitly handled apierrors.IsNotFound (orphaned RS → keep RS ref) versus other errors (some alternate path). The new code collapses all error cases into a no-op: any error—transient 5xx, RBAC denial, rate-limiting—leaves the ref as kind=ReplicaSet, so the constructed controller key will be apps/v1/ReplicaSet/<ns>/<rs-name> instead of apps/v1/Deployment/<ns>/<deploy-name>.

Consequence: if such an error occurs at test runtime, any matching entry in exceptionGranted or knownBrokenPods (which are Deployment-keyed) will not match, and the violation is promoted from waitingForFix/excepted → notAllowed, causing a hard test failure unrelated to the actual resource-limit compliance.

Consider at minimum logging the unexpected error so it's visible in test output:

🛡️ Proposed fix
 if deploy, err := oc.KubeFramework().ClientSet.AppsV1().Deployments(pod.Namespace).Get(context.Background(), name, metav1.GetOptions{}); err == nil {
     ref.Name = deploy.Name
     ref.Kind = "Deployment"
     ref.APIVersion = "apps/v1"
+} else if !apierrors.IsNotFound(err) {
+    e2e.Logf("WARNING: unexpected error resolving Deployment for ReplicaSet %s/%s: %v", pod.Namespace, ref.Name, err)
 }

And restore the apierrors import:

+   "k8s.io/apimachinery/pkg/api/errors"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/operators/resources.go` around lines 123 - 131, When resolving
ReplicaSet refs in the switch case (the block that calls
oc.KubeFramework().ClientSet.AppsV1().Deployments(...).Get and then sets
ref.Name/ref.Kind/ref.APIVersion), do not silently ignore all errors: import
k8s.io/apimachinery/pkg/api/errors (apierrors), check apierrors.IsNotFound(err)
and keep the ReplicaSet ref in that case, but for any other non-nil error log
the unexpected error (with context: namespace, rs name and the error) and avoid
treating it as a benign NotFound — this preserves the original behavior for
orphaned RS while surfacing transient/permission errors so they don't produce
spurious test failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@test/extended/operators/resources.go`:
- Around line 123-131: When resolving ReplicaSet refs in the switch case (the
block that calls oc.KubeFramework().ClientSet.AppsV1().Deployments(...).Get and
then sets ref.Name/ref.Kind/ref.APIVersion), do not silently ignore all errors:
import k8s.io/apimachinery/pkg/api/errors (apierrors), check
apierrors.IsNotFound(err) and keep the ReplicaSet ref in that case, but for any
other non-nil error log the unexpected error (with context: namespace, rs name
and the error) and avoid treating it as a benign NotFound — this preserves the
original behavior for orphaned RS while surfacing transient/permission errors so
they don't produce spurious test failures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: f7326b7f-5fca-45c9-8e2c-f06703cc9d14

📥 Commits

Reviewing files that changed from the base of the PR and between 65f4e3a and d1b0eef.

📒 Files selected for processing (1)
  • test/extended/operators/resources.go

@openshift-merge-bot openshift-merge-bot Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label May 7, 2026
@rikatz

rikatz commented May 7, 2026

Copy link
Copy Markdown
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 7, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@rikatz: This pull request references Jira Issue OCPBUGS-55050, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@rikatz

rikatz commented May 7, 2026

Copy link
Copy Markdown
Member Author

I am leaving this as a draft just while openshift/cluster-ingress-operator#1440 is not merged yet

@bentito

bentito commented May 13, 2026

Copy link
Copy Markdown
Contributor

/assign @Thealisyed @gcs278

OCPBUGS-55050 originally requests that a CPU limit is added to Envoy proxies
created for OCP Gateway API.

This commit adds some more explanation on why we should not do it, and why
the exception must be persisted
@rikatz rikatz force-pushed the remove-sail-exception branch from 275387f to 6acfc4f Compare June 1, 2026 12:35
@rikatz rikatz marked this pull request as ready for review June 1, 2026 12:36
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 1, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@rikatz: This pull request references Jira Issue OCPBUGS-55050, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request.

Details

In response to this:

Previously, there was a need for exception on Gateway API Proxy pods to allow resources limits, as this wasn't properly configurable.

The situation is now fixed on cluster ingress operator, and the exception can be removed. The fix is implemented on openshift/cluster-ingress-operator#1440

Summary by CodeRabbit

  • Tests
  • Clarified test comments about Istio (Envoy) proxy CPU limits and why removing those limits is not desirable; no test logic or behavior changed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from bparees and sdodson June 1, 2026 12:38
@rikatz

rikatz commented Jun 1, 2026

Copy link
Copy Markdown
Member Author

/verified by @rikatz this is just a comment change

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 1, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@rikatz: This PR has been marked as verified by @rikatz this is just a comment change.

Details

In response to this:

/verified by @rikatz this is just a comment change

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@openshift-ci-robot

Copy link
Copy Markdown

@rikatz: This pull request references Jira Issue OCPBUGS-55050, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request.

Details

In response to this:

Justify why the exception for Envoy CPU memory and limits is required, adding comments on the exception with proper information about the exception needs

Summary by CodeRabbit

  • Tests
  • Clarified test comments about Istio (Envoy) proxy CPU limits and why removing those limits is not desirable; no test logic or behavior changed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@gcs278

gcs278 commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Seems reasonable to leave the exception as Envoy will utilize all of our the available cores that it can see. Users will likely need to customize this for their specific use cases, but that is something to be done in a separate effort.

/approve
/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 1, 2026
@rikatz

rikatz commented Jun 1, 2026

Copy link
Copy Markdown
Member Author

/retest
quay was out during the morning

@rikatz rikatz changed the title OCPBUGS-55050: Remove exception for Gateway API Proxy resources limits NO-JIRA: Clarify why Envoy pods have no CPU or memory limit Jun 1, 2026
@openshift-ci-robot openshift-ci-robot removed jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Jun 1, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@rikatz: This pull request explicitly references no jira issue.

Details

In response to this:

Justify why the exception for Envoy CPU memory and limits is required, adding comments on the exception with proper information about the exception needs

Summary by CodeRabbit

  • Tests
  • Clarified test comments about Istio (Envoy) proxy CPU limits and why removing those limits is not desirable; no test logic or behavior changed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-trt

openshift-trt Bot commented Jun 1, 2026

Copy link
Copy Markdown

Job Failure Risk Analysis for sha: 6acfc4f

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-fips IncompleteTests
Tests for this run (26) are below the historical average (3143): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 IncompleteTests
Tests for this run (26) are below the historical average (1825): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 IncompleteTests
Tests for this run (25) are below the historical average (1781): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn IncompleteTests
Tests for this run (27) are below the historical average (3181): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi IncompleteTests
Tests for this run (28) are below the historical average (3057): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@dgoodwin

dgoodwin commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

/lgtm

Exception will be permanent given the advice for Envoy.

@openshift-ci

openshift-ci Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, gcs278, rikatz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 2, 2026
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

/retest-required

Remaining retests: 0 against base HEAD d2f5b59 and 2 for PR HEAD 6acfc4f in total

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

/retest-required

Remaining retests: 0 against base HEAD 95613d1 and 1 for PR HEAD 6acfc4f in total

@openshift-ci

openshift-ci Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

@rikatz: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 05c80ba into openshift:main Jun 3, 2026
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants