OCPBUGS-78775: Add namespace constraints to scheduler and controller-manager PromQL queries #16296
…omQL queries The SCHEDULERS_UP and CONTROLLER_MANAGERS_UP queries lacked namespace selectors, causing false positive control plane degradation when user workloads create Prometheus targets with job="scheduler" or job="kube-controller-manager". Scope both queries to their respective openshift-kube-* namespaces, consistent with the existing API_SERVERS_UP query which already constrains to namespace="openshift-kube-apiserver".
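The dilution described above can be sketched in a few lines of TypeScript. This is only an illustrative model, not the console's actual code: the `UpSample` type, the `upPercentage` helper, and the `user-workload` namespace are hypothetical names, and the helper assumes the queries compute "targets up / targets matched × 100".

```typescript
// Hypothetical model of Prometheus `up` samples; not the console's real types.
interface UpSample {
  job: string;
  namespace: string;
  value: 0 | 1; // 1 = scrape succeeded, 0 = scrape failed
}

// Percentage of matching targets that are up. When `namespace` is omitted,
// every target with the given job label is counted, whatever its namespace.
function upPercentage(samples: UpSample[], job: string, namespace?: string): number {
  const matched = samples.filter(
    (s) => s.job === job && (namespace === undefined || s.namespace === namespace),
  );
  if (matched.length === 0) {
    return NaN; // no matching series
  }
  const up = matched.filter((s) => s.value === 1).length;
  return (up / matched.length) * 100;
}

// Three healthy scheduler targets plus one user workload that happens to
// carry job="scheduler" (the "user-workload" namespace is an assumption).
const samples: UpSample[] = [
  { job: "scheduler", namespace: "openshift-kube-scheduler", value: 1 },
  { job: "scheduler", namespace: "openshift-kube-scheduler", value: 1 },
  { job: "scheduler", namespace: "openshift-kube-scheduler", value: 1 },
  { job: "scheduler", namespace: "user-workload", value: 0 },
];

const unscoped = upPercentage(samples, "scheduler");
const scoped = upPercentage(samples, "scheduler", "openshift-kube-scheduler");
console.log(`unscoped: ${unscoped}%, scoped: ${scoped}%`);
```

Without the namespace argument the user workload's failing target is counted alongside the three real scheduler targets; with it, only the targets in `openshift-kube-scheduler` contribute.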
Walkthrough
The Prometheus metric queries in the console app for monitoring control plane component health were updated to scope the …
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: Leo6Leo.
@Leo6Leo: This pull request references Jira Issue OCPBUGS-77017, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
@coderabbitai review
✅ Actions performed: Review triggered.
@Leo6Leo: This pull request references Jira Issue OCPBUGS-78775, which is invalid:
/retest-required
@Leo6Leo: The following test failed, say
Summary
- Add `namespace="openshift-kube-scheduler"` constraint to the `SCHEDULERS_UP` PromQL query
- Add `namespace="openshift-kube-controller-manager"` constraint to the `CONTROLLER_MANAGERS_UP` PromQL query
- Prevent false-positive control plane degradation when user workloads create Prometheus targets with `job="scheduler"` or `job="kube-controller-manager"`

Root Cause
The `SCHEDULERS_UP` and `CONTROLLER_MANAGERS_UP` queries lacked namespace selectors, so they matched any Prometheus target with the corresponding job label across all namespaces. When a user creates a `Service` named `scheduler` with a `ServiceMonitor`, Prometheus assigns `job="scheduler"` to that scrape target. Since the user workload doesn't expose valid Prometheus metrics, the `up` metric returns `0`, diluting the response rate calculation (e.g., `3/4 * 100 = 75%`) and triggering a false "Degraded" status in Overview > Control Plane.

This fix aligns both queries with the existing `API_SERVERS_UP` query, which already constrains to `namespace="openshift-kube-apiserver"`.

Reproduction Steps
- Create a `Service` named `scheduler` and a `ServiceMonitor` targeting it

Verification
Tested on a live 4.18 cluster with the fake workload present:
- `75%`: false degradation
- `100%`: correct

Test plan
- … (`job="scheduler"`) in a user namespace

🤖 Generated with Claude Code