LOG-9356: Implement Collector ServiceMonitor Changes by Clee2691 · Pull Request #3270 · openshift/cluster-logging-operator

Clee2691 · 2026-05-05T21:16:52Z

Description

This PR makes use of metrics collection profiles.

The CMO supports metrics collection profiles that select ServiceMonitors by the monitoring.openshift.io/collection-profile label. This change adds that label to all ServiceMonitors created by CLO, enabling cluster admins to reduce metrics cardinality by switching to the minimal profile.

Because the minimal ServiceMonitor has a different resource name (e.g., minimal-collector) but targets the same backing service, a serviceName parameter was added to ensure TLS ServerName verification uses the correct service DNS name.

Minimal Metrics - Collector:

logcollector_component_event_unmatched_count
vector_buffer_byte_size
vector_buffer_discarded_events_total
vector_buffer_events
vector_buffer_sent_events_total
vector_component_discarded_events_total
vector_component_errors_total
vector_component_received_bytes_total
vector_component_received_event_bytes_total
vector_component_received_events_total
vector_component_sent_bytes_total
vector_events_in_total
vector_http_client_errors_total
vector_http_client_requests_sent_total
vector_http_client_responses_total
vector_open_files

Minimal Metrics - LFME:

log_logged_bytes_total

/cc @vparfonov
/assign @jcantrill

Links

JIRA: https://redhat.atlassian.net/browse/LOG-9356

Summary by CodeRabbit

New Features
- Metrics scraping now supports distinct "full" and "minimal" collection profiles, creating separate monitoring endpoints and adding a profile label to ServiceMonitors.
- Prometheus query helper added for test-time validation of scraped metrics.
Tests
- Expanded unit and end-to-end tests to validate relabeling rules, profile-specific relabel configs, TLS server name, and that full/minimal profiles expose expected metrics.

openshift-ci-robot · 2026-05-05T21:16:56Z

@Clee2691: This pull request references LOG-9356 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.8.0" version, but no target version was set.

Details

In response to this:

Description

Add MetricRelabelConfigs to the collector ServiceMonitor to keep only the metrics needed for alerts, dashboards, and telemetry

Drop transform component metrics except errors and discarded events to further reduce cardinality

Allowed Metrics:

logcollector_component_event_unmatched_count

vector_buffer_byte_size

vector_buffer_discarded_events_total

vector_buffer_events

vector_buffer_sent_events_total

vector_component_discarded_events_total

vector_component_errors_total

vector_component_received_bytes_total

vector_component_received_event_bytes_total

vector_component_received_events_total

vector_component_sent_bytes_total

vector_events_in_total

vector_http_client_errors_total

vector_http_client_requests_sent_total

vector_http_client_responses_total

vector_open_files

/cc @vparfonov
/assign @jcantrill

Links

JIRA: https://redhat.atlassian.net/browse/LOG-9356

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

qodo-code-review · 2026-05-05T21:17:06Z

Review Summary by Qodo

Add metric filtering to collector ServiceMonitor

✨ Enhancement

Walkthroughs

Description

• Add metric filtering to ServiceMonitor with keep/drop relabel configs
• Keep only essential metrics for alerts, dashboards, telemetry
• Drop transform component metrics except errors and discarded events
• Refactor relabel config structure for improved readability

Diagram

flowchart LR
  SM["ServiceMonitor<br/>MetricRelabelConfigs"]
  R1["Relabel 1:<br/>Replace - with _"]
  R2["Relabel 2:<br/>Keep allowed metrics"]
  R3["Relabel 3:<br/>Drop transform metrics"]
  SM --> R1
  SM --> R2
  SM --> R3
  R2 --> Filter["Filtered metrics<br/>for monitoring"]
  R3 --> Filter

File Changes

1. internal/metrics/service_monitor.go ✨ Enhancement +14/-9

Add metric filtering relabel configurations

• Removed comment explaining label replacement logic
• Reformatted first relabel config for consistency
• Added second relabel config with keep action to filter metrics by name
• Added third relabel config with drop action to exclude transform component metrics

internal/metrics/service_monitor.go

2. internal/metrics/service_monitor_test.go 🧪 Tests +14/-0

Add MetricRelabelConfigs verification tests

• Added test section verifying MetricRelabelConfigs structure
• Validates three relabel configs are present
• Verifies first config replaces hyphens with underscores
• Verifies second config keeps only allowed metrics
• Verifies third config drops transform component metrics

internal/metrics/service_monitor_test.go

ⓘ You are approaching your monthly quota for Qodo. Upgrade your plan

qodo-code-review · 2026-05-05T21:17:08Z

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0)

Context used

✅ Tickets: [LOG-9356](https://redhat.atlassian.net/browse/LOG-9356)

1. LFME metrics filtered out 🐞 Bug ≡ Correctness

Description

newServiceMonitor() now applies a metric "keep" allowlist to all ServiceMonitors, including
LogFileMetricsExporter’s ServiceMonitor, which will drop log_logged_bytes_total from Prometheus.
This breaks the shipped recording rule and dashboard panels that query log_logged_bytes_total.

Code

internal/metrics/service_monitor.go[R40-44]

+				{
+					Action:       "keep",
+					SourceLabels: []monitoringv1.LabelName{"__name__"},
+					Regex:        "logcollector_component_event_unmatched_count|vector_buffer_byte_size|vector_buffer_discarded_events_total|vector_buffer_events|vector_buffer_sent_events_total|vector_component_discarded_events_total|vector_component_errors_total|vector_component_received_bytes_total|vector_component_received_event_bytes_total|vector_component_received_events_total|vector_component_sent_bytes_total|vector_events_in_total|vector_http_client_errors_total|vector_http_client_requests_sent_total|vector_http_client_responses_total|vector_open_files",
+				},

Evidence
The new metric relabel "keep" allowlist in newServiceMonitor() only allows vector_* and one
logcollector_* metric name, so LogFileMetricsExporter metrics (e.g., log_logged_bytes_total) will be
filtered out. The LogFileMetricsExporter controller reuses metrics.ReconcileServiceMonitor(), so it
will receive the same filtering; meanwhile, repository-shipped PrometheusRule recording and
dashboard queries depend on log_logged_bytes_total being present in Prometheus.
internal/metrics/service_monitor.go[33-50]
internal/metrics/logfilemetricexporter/metric_exporter.go[78-82]
config/prometheus/collector_alerts.yaml[97-107]
internal/metrics/dashboard/openshift-logging-dashboard.json[585-593]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`metrics.newServiceMonitor()` now installs a metric relabel `keep` allowlist for **every** ServiceMonitor created via `metrics.ReconcileServiceMonitor()`. This unintentionally filters out LogFileMetricsExporter metrics (notably `log_logged_bytes_total`), breaking shipped recording rules and dashboard panels.

## Issue Context
- LogFileMetricsExporter reconciles its ServiceMonitor through the same `metrics.ReconcileServiceMonitor()` path used by the collector.
- The new keep-regex only allows a vector/logcollector metric subset, excluding `log_logged_bytes_total`.

## Fix Focus Areas
- internal/metrics/service_monitor.go[33-50]
- internal/metrics/logfilemetricexporter/metric_exporter.go[78-82]
- config/prometheus/collector_alerts.yaml[97-107]

### Expected fix direction
- Make ServiceMonitor relabeling **component-specific**:
 - Either split into `newCollectorServiceMonitor()` (with allowlist/drop rules) and a generic `newServiceMonitor()` (without restrictive keep), or
 - Add a parameter to `newServiceMonitor(..., metricRelabelConfigs []*monitoringv1.RelabelConfig)` and pass the collector-only configs from the collector reconciler.
- Ensure LFME-required metrics (at least `log_logged_bytes_total`) are not filtered out of Prometheus.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

ⓘ You are approaching your monthly quota for Qodo. Upgrade your plan

coderabbitai · 2026-05-05T21:17:09Z

Warning

Rate limit exceeded

@Clee2691 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 16 minutes and 30 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3cc41c1c-1c00-4211-9ce1-58112f007950

📥 Commits

Reviewing files that changed from the base of the PR and between 99d21e3 and d94ab5f.

📒 Files selected for processing (16)

internal/constants/constants.go
internal/constants/labels.go
internal/controller/observability/collector.go
internal/controller/observability/collector_test.go
internal/metrics/logfilemetricexporter/metric_exporter.go
internal/metrics/logfilemetricexporter/metric_exporter_test.go
internal/metrics/relabel.go
internal/metrics/relabel_test.go
internal/metrics/service_monitor.go
internal/metrics/service_monitor_test.go
test/e2e/collection/metrics/profile_test.go
test/e2e/collection/metrics/suite_test.go
test/e2e/flowcontrol/utils.go
test/e2e/logfilesmetricexporter/lfme_test.go
test/e2e/operator/metrics/e2e_test.go
test/helpers/prometheus/prometheus.go

Walkthrough

Reconciler now creates two ServiceMonitors (full and minimal) per component, with configurable Prometheus relabel configs and a metrics-collection-profile label. New relabel builder produces precomputed config sets; ServiceMonitor construction/signature and tests updated to accept and validate metricRelabelConfigs, TLS ServerName, and profile labeling.

Changes

Metrics relabel & ServiceMonitor flow

Layer / File(s)	Summary
Constants / Labels `internal/constants/...`, `internal/constants/labels.go`	Introduce/relocate metrics-related constants and add `LabelMetricsCollectionProfile` constant.
Relabel Data & Builders `internal/metrics/relabel.go`	Add `buildRelabelConfigs` with allowlist/drop helpers; export `CollectorMinimalRelabelConfigs`, `LFMEMinimalRelabelConfigs`, and `FullRelabelConfigs`.
Relabel Tests `internal/metrics/relabel_test.go`	New tests validating ordering and contents of generated relabel configs across scenarios; asserts prebuilt sets' lengths/actions/regexes.
ServiceMonitor API `internal/metrics/service_monitor.go`	`newServiceMonitor` and `ReconcileServiceMonitor` signatures updated to accept `serviceName`, `metricRelabelConfigs`, and `profile`; endpoint `TLSConfig.ServerName` set from `serviceName`; endpoint `MetricRelabelConfigs` set from input; profile label applied.
ServiceMonitor Tests `internal/metrics/service_monitor_test.go`	Expanded tests for full/minimal profiles asserting exact `MetricRelabelConfigs` counts and specific `Regex`/`TargetLabel`/`Action`/`SourceLabels` ordering and TLS server name.
Controller Wiring `internal/controller/observability/collector.go`, `internal/metrics/logfilemetricexporter/metric_exporter.go`	Replace single ServiceMonitor reconciliation with two reconciliations (full and minimal) per component, passing appropriate relabel config sets and profile labels; adjust names/logging.
Controller & Exporter Tests `internal/controller/observability/collector_test.go`, `internal/metrics/logfilemetricexporter/metric_exporter_test.go`	Update tests to expect both full and minimal ServiceMonitors and validate endpoint presence and relabel config counts/actions.
E2E Prometheus Helpers `test/helpers/prometheus/prometheus.go`	Add Prometheus/Thanos query helper: token/host discovery via `oc`, HTTPS query with bearer token, JSON parsing, and `HasResults` helper.
E2E Tests Updated/Added `test/e2e/...`	Add e2e suite and tests asserting ServiceMonitor profile labels and that both allowlisted and non-allowlisted metrics are scraped; adapt utilities to use Prometheus helper and add Prometheus-based assertions in existing e2e tests.
Test Utils `test/e2e/flowcontrol/utils.go`	Simplify/remove prior direct Prometheus HTTP logic; delegate queries to `test/helpers/prometheus`.

Sequence Diagram

sequenceDiagram
    participant Controller as Controller
    participant Relabel as Relabel Builder
    participant K8s as Kubernetes API
    participant PromOp as Prometheus Operator

    Controller->>Relabel: Request relabel configs (full/minimal)
    Relabel-->>Controller: Return relabel config sets
    Controller->>K8s: Create/Update ServiceMonitor (endpoints, TLS ServerName, metricRelabelConfigs, profile label)
    K8s->>PromOp: Prometheus Operator observes ServiceMonitor
    PromOp->>K8s: Begin scraping targets per ServiceMonitor spec

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 8 | ❌ 4

❌ Failed checks (4 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality	⚠️ Warning	Test quality issues: (1) profile_test.go violates single responsibility - tests two unrelated metrics in one test block; (2) prometheus.go hard Fail() breaks Eventually polling with transient errors	Separate profile_test.go metrics test into individual tests. Fix prometheus.go error handling per review: replace Fail() with graceful logging and empty-map returns
Microshift Test Compatibility	⚠️ Warning	New e2e tests reference openshift-monitoring namespace, prometheus-k8s, thanos-querier, and ServiceMonitor API unavailable on MicroShift. No skip mechanisms present.	Add [Skipped:MicroShift] labels to tests in profile_test.go, e2e_test.go, and lfme_test.go; or guard with exutil.IsMicroShiftCluster() + Skip(). Update prometheus helper for graceful error handling.
Ipv6 And Disconnected Network Test Compatibility	⚠️ Warning	IPv6 issue in test/helpers/prometheus/prometheus.go line 49: fmt.Sprintf("https://%s/...", host) fails for IPv6 addresses. Also hard-fails on errors instead of graceful retry.	Use net.JoinHostPort() for IPv6-safe URLs. Replace Fail() calls with graceful error logging and empty-map returns. Add HTTP timeout.

✅ Passed checks (8 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: implementing Collector ServiceMonitor changes related to metrics collection profiles support.
Description check	✅ Passed	The description includes mandatory sections (Description with context and rationale), reviewer assignment (/cc), approver assignment (/assign), and links including JIRA reference. It adequately explains the implementation of metrics collection profiles and lists allowed minimal metrics.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	All 31 test declarations use static string literals with no dynamic content, pod name suffixes, timestamps, UUIDs, IPs, or random identifiers. Test names are descriptive and stable.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	New e2e tests are SNO-compatible. DaemonSets work on SNO (1 node). Tests use standard OpenShift APIs. No multi-node assumptions detected.
Topology-Aware Scheduling Compatibility	✅ Passed	Changes only modify ServiceMonitors and metrics configs. No pod affinity, topology spread constraints, nodeSelector, or replica adjustments based on topology introduced.
Ote Binary Stdout Contract	✅ Passed	All Fail() calls are within test contexts (Eventually/It blocks). No module-level stdout writes. Suite setup uses standard Ginkgo patterns correctly.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

qodo-code-review · 2026-05-05T21:21:22Z

+				{
+					Action:       "keep",
+					SourceLabels: []monitoringv1.LabelName{"__name__"},
+					Regex:        "logcollector_component_event_unmatched_count|vector_buffer_byte_size|vector_buffer_discarded_events_total|vector_buffer_events|vector_buffer_sent_events_total|vector_component_discarded_events_total|vector_component_errors_total|vector_component_received_bytes_total|vector_component_received_event_bytes_total|vector_component_received_events_total|vector_component_sent_bytes_total|vector_events_in_total|vector_http_client_errors_total|vector_http_client_requests_sent_total|vector_http_client_responses_total|vector_open_files",
+				},


1. Lfme metrics filtered out 🐞 Bug ≡ Correctness

newServiceMonitor() now applies a metric "keep" allowlist to all ServiceMonitors, including LogFileMetricsExporter’s ServiceMonitor, which will drop log_logged_bytes_total from Prometheus. This breaks the shipped recording rule and dashboard panels that query log_logged_bytes_total.

Agent Prompt

## Issue description `metrics.newServiceMonitor()` now installs a metric relabel `keep` allowlist for **every** ServiceMonitor created via `metrics.ReconcileServiceMonitor()`. This unintentionally filters out LogFileMetricsExporter metrics (notably `log_logged_bytes_total`), breaking shipped recording rules and dashboard panels. ## Issue Context - LogFileMetricsExporter reconciles its ServiceMonitor through the same `metrics.ReconcileServiceMonitor()` path used by the collector. - The new keep-regex only allows a vector/logcollector metric subset, excluding `log_logged_bytes_total`. ## Fix Focus Areas - internal/metrics/service_monitor.go[33-50] - internal/metrics/logfilemetricexporter/metric_exporter.go[78-82] - config/prometheus/collector_alerts.yaml[97-107] ### Expected fix direction - Make ServiceMonitor relabeling **component-specific**: - Either split into `newCollectorServiceMonitor()` (with allowlist/drop rules) and a generic `newServiceMonitor()` (without restrictive keep), or - Add a parameter to `newServiceMonitor(..., metricRelabelConfigs []*monitoringv1.RelabelConfig)` and pass the collector-only configs from the collector reconciler. - Ensure LFME-required metrics (at least `log_logged_bytes_total`) are not filtered out of Prometheus.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

internal/metrics/service_monitor.go (1)

43-43: 💤 Low value

Consider extracting the keep-list regex to a named constant.

The allowlist is the core deliverable of this PR and a maintenance hotspot (new metrics are added, removed, or renamed). Inlining it as a 500+ character string makes diffs noisy and typos hard to spot.

♻️ Suggested refactor

+const collectorMetricsKeepRegex = "logcollector_component_event_unmatched_count|" +
+    "vector_buffer_byte_size|vector_buffer_discarded_events_total|" +
+    "vector_buffer_events|vector_buffer_sent_events_total|" +
+    "vector_component_discarded_events_total|vector_component_errors_total|" +
+    "vector_component_received_bytes_total|vector_component_received_event_bytes_total|" +
+    "vector_component_received_events_total|vector_component_sent_bytes_total|" +
+    "vector_events_in_total|vector_http_client_errors_total|" +
+    "vector_http_client_requests_sent_total|vector_http_client_responses_total|" +
+    "vector_open_files"

 // in newServiceMonitor:
-                Regex: "logcollector_component_event_unmatched_count|vector_buffer_byte_size|...",
+                Regex: collectorMetricsKeepRegex,

The constant can then be referenced in the test as well, removing duplication.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/metrics/service_monitor.go` at line 43, Extract the long inline
allowlist string used in the Regex field into a clearly named package-level
constant (e.g., allowedMetricsRegex or keepListRegex) in
internal/metrics/service_monitor.go, replace the inlined string in the struct
literal's Regex field with that constant, and update any tests to
import/reference the same constant instead of duplicating the string so the
allowlist is a single source of truth.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/metrics/service_monitor_test.go`:
- Around line 83-84: The test currently omits asserting the allowlist regex so
changes silently pass; update the test in service_monitor_test.go to assert
relabelConfigs[1].Regex equals the expected allowlist regex (or, better, compare
it to the constant defined in service_monitor.go if you extract the regex
there), i.e., add an
Expect(relabelConfigs[1].Regex).To(Equal(<expectedRegexOrConstant>)) alongside
the existing Action and SourceLabels assertions.

---

Nitpick comments:
In `@internal/metrics/service_monitor.go`:
- Line 43: Extract the long inline allowlist string used in the Regex field into
a clearly named package-level constant (e.g., allowedMetricsRegex or
keepListRegex) in internal/metrics/service_monitor.go, replace the inlined
string in the struct literal's Regex field with that constant, and update any
tests to import/reference the same constant instead of duplicating the string so
the allowlist is a single source of truth.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c0367dba-620a-4f1b-abb3-d1fbd24ba9de

📥 Commits

Reviewing files that changed from the base of the PR and between ff06c7a and 962597a.

📒 Files selected for processing (2)

internal/metrics/service_monitor.go
internal/metrics/service_monitor_test.go

coderabbitai · 2026-05-05T21:25:40Z

+		Expect(string(relabelConfigs[1].Action)).To(Equal("keep"))
+		Expect(relabelConfigs[1].SourceLabels).To(Equal([]monitoringv1.LabelName{"__name__"}))


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The keep-list regex is unasserted — the PR's core change has no test validation.

relabelConfigs[1].Regex is never checked. A typo, accidental truncation, or future metric name change in the allowlist would silently pass the test suite.

✅ Proposed fix

Expect(string(relabelConfigs[1].Action)).To(Equal("keep")) Expect(relabelConfigs[1].SourceLabels).To(Equal([]monitoringv1.LabelName{"__name__"})) +Expect(relabelConfigs[1].Regex).To(Equal( + "logcollector_component_event_unmatched_count|vector_buffer_byte_size|" + + "vector_buffer_discarded_events_total|vector_buffer_events|" + + "vector_buffer_sent_events_total|vector_component_discarded_events_total|" + + "vector_component_errors_total|vector_component_received_bytes_total|" + + "vector_component_received_event_bytes_total|vector_component_received_events_total|" + + "vector_component_sent_bytes_total|vector_events_in_total|" + + "vector_http_client_errors_total|vector_http_client_requests_sent_total|" + + "vector_http_client_responses_total|vector_open_files", +))

(Even better: extract the regex to a constant in service_monitor.go and reference it here directly.)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Expect(string(relabelConfigs[1].Action)).To(Equal("keep"))

Expect(relabelConfigs[1].SourceLabels).To(Equal([]monitoringv1.LabelName{"__name__"}))

Expect(string(relabelConfigs[1].Action)).To(Equal("keep"))

Expect(relabelConfigs[1].SourceLabels).To(Equal([]monitoringv1.LabelName{"__name__"}))

Expect(relabelConfigs[1].Regex).To(Equal(

"logcollector_component_event_unmatched_count|vector_buffer_byte_size|" +

"vector_buffer_discarded_events_total|vector_buffer_events|" +

"vector_buffer_sent_events_total|vector_component_discarded_events_total|" +

"vector_component_errors_total|vector_component_received_bytes_total|" +

"vector_component_received_event_bytes_total|vector_component_received_events_total|" +

"vector_component_sent_bytes_total|vector_events_in_total|" +

"vector_http_client_errors_total|vector_http_client_requests_sent_total|" +

"vector_http_client_responses_total|vector_open_files",

))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/metrics/service_monitor_test.go` around lines 83 - 84, The test currently omits asserting the allowlist regex so changes silently pass; update the test in service_monitor_test.go to assert relabelConfigs[1].Regex equals the expected allowlist regex (or, better, compare it to the constant defined in service_monitor.go if you extract the regex there), i.e., add an Expect(relabelConfigs[1].Regex).To(Equal(<expectedRegexOrConstant>)) alongside the existing Action and SourceLabels assertions.

Clee2691 · 2026-05-06T03:09:36Z

/retest

jcantrill · 2026-05-06T14:14:57Z

/hold

simonpasquier · 2026-05-06T15:30:47Z

Have you considered implementing collection profiles?

The general idea is that the admin defines the level of details for metrics at the cluster level. We have 3 profiles:

full (default) => collects all metrics available
minimal => collects only metrics relevant for alerting rules, dashboards and telemetry
telemetry => collects only metrics required for telemetry

We can help with the implementation (in summary, you need to define 1 additional service monitor per profile). Happy to discuss directly on your forum.

cc @jan--f and @r2d2rnd (who informed us about the work in progress, thanks!)

coderabbitai

🧹 Nitpick comments (2)

internal/metrics/relabel.go (2)
13-17: 💤 Low value

excludeMetrics is semantically ambiguous — consider droppedMetrics.

The field name excludeMetrics can be read as "metrics excluded from the drop action" (i.e., kept), but the implementation puts them inside the drop rule regex, meaning they are the metrics that get dropped for the given label combination. droppedMetrics or metricsToDropForLabel would remove this ambiguity for future maintainers.
♻️ Proposed rename
 type metricDropConfig struct {
 	labelName      string
 	labelValue     string
-	excludeMetrics []string
+	droppedMetrics []string
 }
 var collectorMinimalDropConfigs = []metricDropConfig{
 	{
 		labelName:  "component_kind",
 		labelValue: "transform",
-		excludeMetrics: []string{
+		droppedMetrics: []string{
And in buildRelabelConfigs:
-		Regex: drop.labelValue + ";(" + strings.Join(drop.excludeMetrics, "|") + ")",
+		Regex: drop.labelValue + ";(" + strings.Join(drop.droppedMetrics, "|") + ")",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/metrics/relabel.go` around lines 13 - 17, Rename the ambiguous
struct field excludeMetrics in metricDropConfig to a clearer name like
droppedMetrics (or metricsToDropForLabel) and update all references accordingly;
specifically change the field declaration in metricDropConfig and adjust any
code that reads or writes metricDropConfig (e.g., the buildRelabelConfigs
function and any callers or JSON/YAML tags) so they use the new symbol name to
reflect that these are the metrics that will be dropped for the given label
combination.
89-95: 💤 Low value

Add a length guard for excludeMetrics to match the allowlist pattern.

The allowlist block guards len(allowlist.allowedMetrics) > 0 before appending a keep rule, but the drop loop has no equivalent guard. An empty excludeMetrics slice produces the regex labelValue;() with action: drop. While Prometheus's anchored full-string matching makes this effectively a no-op for real metrics (would only match a sample with an empty __name__), it emits a spurious relabel rule and is inconsistent with the allowlist guard style.
♻️ Proposed fix
 	for _, drop := range dropConfigs {
+		if len(drop.excludeMetrics) == 0 {
+			continue
+		}
 		configs = append(configs, &monitoringv1.RelabelConfig{
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/metrics/relabel.go` around lines 89 - 95, The drop loop appends a
"drop" relabel rule even when drop.excludeMetrics is empty, producing a spurious
regex like "labelValue;()"; add the same length guard used for the allowlist:
check len(drop.excludeMetrics) > 0 before creating/appending the
monitoringv1.RelabelConfig in the loop (the loop iterating over dropConfigs that
currently builds SourceLabels/Regex), so only append the drop rule when
excludeMetrics is non-empty.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@internal/metrics/relabel.go`:
- Around line 13-17: Rename the ambiguous struct field excludeMetrics in
metricDropConfig to a clearer name like droppedMetrics (or
metricsToDropForLabel) and update all references accordingly; specifically
change the field declaration in metricDropConfig and adjust any code that reads
or writes metricDropConfig (e.g., the buildRelabelConfigs function and any
callers or JSON/YAML tags) so they use the new symbol name to reflect that these
are the metrics that will be dropped for the given label combination.
- Around line 89-95: The drop loop appends a "drop" relabel rule even when
drop.excludeMetrics is empty, producing a spurious regex like "labelValue;()";
add the same length guard used for the allowlist: check len(drop.excludeMetrics)
> 0 before creating/appending the monitoringv1.RelabelConfig in the loop (the
loop iterating over dropConfigs that currently builds SourceLabels/Regex), so
only append the drop rule when excludeMetrics is non-empty.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: da0fa976-0000-4188-84d0-d85035bde888

📥 Commits

Reviewing files that changed from the base of the PR and between 962597a and ef5f21d.

📒 Files selected for processing (10)

internal/constants/constants.go
internal/constants/labels.go
internal/controller/observability/collector.go
internal/controller/observability/collector_test.go
internal/metrics/logfilemetricexporter/metric_exporter.go
internal/metrics/logfilemetricexporter/metric_exporter_test.go
internal/metrics/relabel.go
internal/metrics/relabel_test.go
internal/metrics/service_monitor.go
internal/metrics/service_monitor_test.go

✅ Files skipped from review due to trivial changes (2)

internal/constants/labels.go
internal/constants/constants.go

Clee2691 · 2026-05-07T20:30:04Z

Have you considered implementing collection profiles?

The general idea is that the admin defines the level of details for metrics at the cluster level. We have 3 profiles:
* full (default) => collects all metrics available

* minimal => collects only metrics relevant for alerting rules, dashboards and telemetry

* telemetry => collects only metrics required for telemetry
We can help with the implementation (in summary, you need to define 1 additional service monitor per profile). Happy to discuss directly on your forum.

cc @jan--f and @r2d2rnd (who informed us about the work in progress, thanks!)

@simonpasquier I've made changes to this to implement profiles. Can someone on the monitoring team also review?

jcantrill

/approve

openshift-ci · 2026-05-07T20:53:42Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Clee2691, jcantrill

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [jcantrill]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2026-05-07T21:42:20Z

@Clee2691: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/helpers/prometheus/prometheus.go`:
- Around line 42-74: In QueryPrometheus replace the no-timeout http.Client with
one that has a 30s Timeout, stop using Fail(...) on request/response errors and
instead log the error and return an empty map[string]interface{} (so callers can
retry), validate response.StatusCode (expect 200) before attempting to
read/unmarshal the body and handle non-200 by logging and returning empty map,
and ensure response.Body is always closed; update references in this function
(QueryPrometheus, the http.Client/Request handling, and the json.Unmarshal
logic) accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9624b7e9-9df4-450c-918c-851df0a88fa1

📥 Commits

Reviewing files that changed from the base of the PR and between ef5f21d and 99d21e3.

📒 Files selected for processing (16)

internal/constants/constants.go
internal/constants/labels.go
internal/controller/observability/collector.go
internal/controller/observability/collector_test.go
internal/metrics/logfilemetricexporter/metric_exporter.go
internal/metrics/logfilemetricexporter/metric_exporter_test.go
internal/metrics/relabel.go
internal/metrics/relabel_test.go
internal/metrics/service_monitor.go
internal/metrics/service_monitor_test.go
test/e2e/collection/metrics/profile_test.go
test/e2e/collection/metrics/suite_test.go
test/e2e/flowcontrol/utils.go
test/e2e/logfilesmetricexporter/lfme_test.go
test/e2e/operator/metrics/e2e_test.go
test/helpers/prometheus/prometheus.go

✅ Files skipped from review due to trivial changes (3)

internal/constants/labels.go
internal/metrics/logfilemetricexporter/metric_exporter_test.go
internal/constants/constants.go

🚧 Files skipped from review as they are similar to previous changes (4)

internal/controller/observability/collector.go
internal/metrics/logfilemetricexporter/metric_exporter.go
internal/controller/observability/collector_test.go
internal/metrics/service_monitor.go

openshift-ci Bot assigned jcantrill May 5, 2026

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 5, 2026

openshift-ci Bot requested a review from vparfonov May 5, 2026 21:16

qodo-code-review Bot reviewed May 5, 2026

View reviewed changes

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

Clee2691 force-pushed the LOG-9356 branch from 962597a to 3287076 Compare May 5, 2026 21:51

jcantrill added the release/6.6 label May 6, 2026

openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 6, 2026

jcantrill reviewed May 6, 2026

View reviewed changes

Comment thread internal/metrics/service_monitor.go Outdated

Comment thread internal/metrics/service_monitor.go Outdated

Clee2691 changed the title ~~LOG-9356: Implement Collector ServiceMontitor Changes~~ LOG-9356: Implement Collector ServiceMonitor Changes May 6, 2026

Clee2691 force-pushed the LOG-9356 branch from 3287076 to ef5f21d Compare May 7, 2026 19:30

coderabbitai Bot reviewed May 7, 2026

View reviewed changes

jcantrill reviewed May 7, 2026

View reviewed changes

Comment thread internal/metrics/relabel.go

Comment thread internal/metrics/relabel.go

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 7, 2026

Clee2691 force-pushed the LOG-9356 branch from ef5f21d to 99d21e3 Compare May 8, 2026 22:43

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

Comment thread test/helpers/prometheus/prometheus.go

LOG-9356: Implement Collector ServiceMonitor Changes

d94ab5f

Clee2691 force-pushed the LOG-9356 branch from 99d21e3 to d94ab5f Compare May 8, 2026 23:27

		Expect(string(relabelConfigs[1].Action)).To(Equal("keep"))
		Expect(relabelConfigs[1].SourceLabels).To(Equal([]monitoringv1.LabelName{"__name__"}))

-		Expect(string(relabelConfigs[1].Action)).To(Equal("keep"))
-		Expect(relabelConfigs[1].SourceLabels).To(Equal([]monitoringv1.LabelName{"__name__"}))
+		Expect(string(relabelConfigs[1].Action)).To(Equal("keep"))
+		Expect(relabelConfigs[1].SourceLabels).To(Equal([]monitoringv1.LabelName{"__name__"}))
+		Expect(relabelConfigs[1].Regex).To(Equal(
+		    "logcollector_component_event_unmatched_count|vector_buffer_byte_size|" +
+		    "vector_buffer_discarded_events_total|vector_buffer_events|" +
+		    "vector_buffer_sent_events_total|vector_component_discarded_events_total|" +
+		    "vector_component_errors_total|vector_component_received_bytes_total|" +
+		    "vector_component_received_event_bytes_total|vector_component_received_events_total|" +
+		    "vector_component_sent_bytes_total|vector_events_in_total|" +
+		    "vector_http_client_errors_total|vector_http_client_requests_sent_total|" +
+		    "vector_http_client_responses_total|vector_open_files",
+		))

Conversation

Clee2691 commented May 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Links

Summary by CodeRabbit

Uh oh!

openshift-ci-robot commented May 5, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Links

Uh oh!

qodo-code-review Bot commented May 5, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

qodo-code-review Bot commented May 5, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

coderabbitai Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

❌ Failed checks (4 warnings)

Uh oh!

qodo-code-review Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

Clee2691 commented May 6, 2026

Uh oh!

jcantrill commented May 6, 2026

Uh oh!

Uh oh!

Uh oh!

simonpasquier commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Clee2691 commented May 7, 2026

Uh oh!

jcantrill left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

openshift-ci Bot commented May 7, 2026

Uh oh!

openshift-ci Bot commented May 7, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Clee2691 commented May 5, 2026 •

edited by coderabbitai Bot

Loading

openshift-ci-robot commented May 5, 2026 •

edited by openshift-ci Bot

Loading

qodo-code-review Bot commented May 5, 2026 •

edited by openshift-ci Bot

Loading

coderabbitai Bot commented May 5, 2026 •

edited

Loading

simonpasquier commented May 6, 2026 •

edited

Loading