NO-JIRA: Clarify why Envoy pods have no CPU or memory limit#31145
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@rikatz: This pull request references Jira Issue OCPBUGS-55050, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughThis PR updates a comment in test/extended/operators/resources.go to clarify that the entries refer to Istio (Envoy) Proxy Pods and explains why removing Envoy CPU limits is not desirable; no code or test data entries were modified. ChangesIstio comment clarification
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes 🚥 Pre-merge checks | ✅ 15✅ Passed checks (15 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Skipping CI for Draft Pull Request. |
d1b0eef to
275387f
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/extended/operators/resources.go (1)
123-131:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSilent swallowing of non-NotFound Deployment lookup errors may cause spurious test failures.
The previous code explicitly handled
apierrors.IsNotFound(orphaned RS → keep RS ref) versus other errors (some alternate path). The new code collapses all error cases into a no-op: any error—transient 5xx, RBAC denial, rate-limiting—leaves therefaskind=ReplicaSet, so the constructedcontrollerkey will beapps/v1/ReplicaSet/<ns>/<rs-name>instead ofapps/v1/Deployment/<ns>/<deploy-name>.Consequence: if such an error occurs at test runtime, any matching entry in
exceptionGrantedorknownBrokenPods(which are Deployment-keyed) will not match, and the violation is promoted fromwaitingForFix/excepted →notAllowed, causing a hard test failure unrelated to the actual resource-limit compliance.Consider at minimum logging the unexpected error so it's visible in test output:
🛡️ Proposed fix
if deploy, err := oc.KubeFramework().ClientSet.AppsV1().Deployments(pod.Namespace).Get(context.Background(), name, metav1.GetOptions{}); err == nil { ref.Name = deploy.Name ref.Kind = "Deployment" ref.APIVersion = "apps/v1" +} else if !apierrors.IsNotFound(err) { + e2e.Logf("WARNING: unexpected error resolving Deployment for ReplicaSet %s/%s: %v", pod.Namespace, ref.Name, err) }And restore the
apierrorsimport:+ "k8s.io/apimachinery/pkg/api/errors"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/operators/resources.go` around lines 123 - 131, When resolving ReplicaSet refs in the switch case (the block that calls oc.KubeFramework().ClientSet.AppsV1().Deployments(...).Get and then sets ref.Name/ref.Kind/ref.APIVersion), do not silently ignore all errors: import k8s.io/apimachinery/pkg/api/errors (apierrors), check apierrors.IsNotFound(err) and keep the ReplicaSet ref in that case, but for any other non-nil error log the unexpected error (with context: namespace, rs name and the error) and avoid treating it as a benign NotFound — this preserves the original behavior for orphaned RS while surfacing transient/permission errors so they don't produce spurious test failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@test/extended/operators/resources.go`:
- Around line 123-131: When resolving ReplicaSet refs in the switch case (the
block that calls oc.KubeFramework().ClientSet.AppsV1().Deployments(...).Get and
then sets ref.Name/ref.Kind/ref.APIVersion), do not silently ignore all errors:
import k8s.io/apimachinery/pkg/api/errors (apierrors), check
apierrors.IsNotFound(err) and keep the ReplicaSet ref in that case, but for any
other non-nil error log the unexpected error (with context: namespace, rs name
and the error) and avoid treating it as a benign NotFound — this preserves the
original behavior for orphaned RS while surfacing transient/permission errors so
they don't produce spurious test failures.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: f7326b7f-5fca-45c9-8e2c-f06703cc9d14
📒 Files selected for processing (1)
test/extended/operators/resources.go
|
/jira refresh |
|
@rikatz: This pull request references Jira Issue OCPBUGS-55050, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
I am leaving this as a draft just while openshift/cluster-ingress-operator#1440 is not merged yet |
|
/assign @Thealisyed @gcs278 |
OCPBUGS-55050 originally requests that a CPU limit is added to Envoy proxies created for OCP Gateway API. This commit adds some more explanation on why we should not do it, and why the exception must be persisted
275387f to
6acfc4f
Compare
|
@rikatz: This pull request references Jira Issue OCPBUGS-55050, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/verified by @rikatz this is just a comment change |
|
@rikatz: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Scheduling required tests: |
|
@rikatz: This pull request references Jira Issue OCPBUGS-55050, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Seems reasonable to leave the exception as Envoy will utilize all of our the available cores that it can see. Users will likely need to customize this for their specific use cases, but that is something to be done in a separate effort. /approve |
|
/retest |
|
@rikatz: This pull request explicitly references no jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Job Failure Risk Analysis for sha: 6acfc4f
|
|
/lgtm Exception will be permanent given the advice for Envoy. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dgoodwin, gcs278, rikatz The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@rikatz: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Justify why the exception for Envoy CPU memory and limits is required, adding comments on the exception with proper information about the exception needs
Summary by CodeRabbit