Skip to content

[WIP] Add configurable metric attributes deny list to config-observability#9090

Open
creydr wants to merge 6 commits into
knative:mainfrom
creydr:srvke-1840-high-cardinality-metrics
Open

[WIP] Add configurable metric attributes deny list to config-observability#9090
creydr wants to merge 6 commits into
knative:mainfrom
creydr:srvke-1840-high-cardinality-metrics

Conversation

@creydr
Copy link
Copy Markdown
Member

@creydr creydr commented May 20, 2026

Proposed Changes

  • 🎁 Add a metrics.attributes.deny key to the config-observability ConfigMap that accepts a comma-separated list of metric attribute keys to filter out
    • When configured, an OTel View strips the specified attributes from all kn.eventing.* metrics at the SDK level
    • Default is empty (existing behavior preserved, no filtering)

Example usage to strip high-cardinality attributes:

metrics.attributes.deny: "cloudevents.type,messaging.destination.name"
Add `metrics.attributes.deny` option to config-observability ConfigMap to allow filtering out configurable metric attributes from kn.eventing.* metrics, helping prevent potential OOM issues caused by unbounded metric cardinality in production.

Add a `metrics.high-cardinality.disable` key to the config-observability
ConfigMap that allows stripping unbounded metric attributes
(cloudevents.type, messaging.destination.name) via an OTel View.
This helps prevent OOM issues caused by high metric cardinality in
production without changing any instrumentation code.
@knative-prow knative-prow Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 20, 2026
@knative-prow
Copy link
Copy Markdown

knative-prow Bot commented May 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: creydr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow Bot requested review from Leo6Leo and aliok May 20, 2026 07:21
@knative-prow knative-prow Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 20, 2026
@creydr creydr requested a review from maschmid May 20, 2026 07:22
@creydr creydr changed the title Add config option to disable high-cardinality metric attributes [WIP] Add config option to disable high-cardinality metric attributes May 20, 2026
@knative-prow knative-prow Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 20, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 82.60870% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.03%. Comparing base (504f7e0) to head (99cabfb).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pkg/observability/otel/otel.go 71.42% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9090      +/-   ##
==========================================
+ Coverage   50.99%   51.03%   +0.04%     
==========================================
  Files         411      411              
  Lines       22112    22133      +21     
==========================================
+ Hits        11276    11296      +20     
- Misses       9967     9969       +2     
+ Partials      869      868       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

creydr added 2 commits May 20, 2026 10:03
Replace DisableHighCardinalityMetrics boolean with a configurable
metrics.attributes.deny comma-separated list in config-observability.
Users can specify exactly which metric attribute keys to strip from
kn.eventing.* metrics, allowing fine-grained control over cardinality.
Default is empty (no filtering), preserving existing behavior.
@creydr creydr changed the title [WIP] Add config option to disable high-cardinality metric attributes Add configurable metric attributes deny list May 20, 2026
@knative-prow knative-prow Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 20, 2026
@creydr
Copy link
Copy Markdown
Member Author

creydr commented May 20, 2026

Waiting until all pods in namespace cert-manager are up......................................................................................................................................................
ERROR: timeout waiting for pods to come up

/test reconciler-tests

@creydr creydr changed the title Add configurable metric attributes deny list Add config option to disable high-cardinality metric attributes May 20, 2026
@creydr creydr changed the title Add config option to disable high-cardinality metric attributes Add configurable metric attributes deny list to config-observability May 20, 2026
Comment thread pkg/observability/otel/otel.go Outdated
keys[i] = attribute.Key(k)
}
return metric.NewView(
metric.Instrument{Name: "kn.eventing.*"},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean the filter would only filter the kn_eventing metrics, and not touch e.g. the http metrics, like

http_client_request_body_size_bytes_count ? (because , those also have stuff like

(from prometheus:)

http_client_request_body_size_bytes_count{cloudevents_datacontenttype="application/json",cloudevents_source="ocf-qe/test/imc-broker-k8s-counter-0/counter-0",cloudevents_specversion="1.0",cloudevents_type="com.redhat.cee.gitlab.ocf-qe.ocf-qe-testsuite.structured",http_request_method="POST",http_response_status_code="202",kn_broker_name="broker-0",kn_broker_namespace="imc-broker-k8s-counter-0",messaging_destination_name="trigger:receiver-0.imc-broker-k8s-counter-0",messaging_operation_name="send",messaging_system="knative",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="receiver-0.imc-broker-k8s-counter-0.svc.cluster.local",url_scheme="http"} 5

which might be problematic.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this filter applies only to kn_eventing metrics (I thought, that the "problematic" ones are from there only).
But we could also add another config field, which allows the user to set this (defaulting to "kn.eventing.*" for example) 🤷

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, looking at mt-broker-filter and mt-broker-ingress, the most problematic ones are in the http.*

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok. then lets use * as filter (instead of adding another config option)

The deny filter was scoped to kn.eventing.* instruments only, but
high-cardinality attributes like cloudevents.type also appear on
http.* metrics via the otelhttp instrumentation. Apply the filter
to all instruments so it covers every metric that inherits labels
from the context labeler.
@creydr creydr requested a review from maschmid May 20, 2026 15:16
Comment thread pkg/observability/config.go Outdated
EnableSinkEventErrorReporting bool `json:"enableSinkEventErrorReporting"`

// MetricAttributesDenyList is a list of metric attribute keys to filter out
// from kn.eventing.* metrics (e.g. cloudevents.type, messaging.destination.name)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment should be updated (not specific to kn.eventing.* anymore)

@maschmid
Copy link
Copy Markdown
Contributor

I have tried building and running the patch on top of serverless release-v1.21 , it seems to work for the mt-broker filter and ingress, but not for the imc-dispatcher.

Claude says:

Check if the in_memory channel_dispatcher initializes the otel in such way that this new setting also applies there.                                                                                                                                                                    

● Explore(Trace channel dispatcher OTel init)
  ⎿  Done (29 tool uses · 53.0k tokens · 1m 20s)
  (ctrl+o to expand)

● No, the in-memory channel dispatcher will NOT pick up the deny filter. It uses a completely different observability init path.

  The channel dispatcher calls sharedmain.MainWithContext() from knative.dev/pkg, which has its own SetupObservabilityOrDie — separate from eventing's version in pkg/observability/otel/otel.go. The eventing-specific metricAttributesDenyFilter view is never injected into that path.

  Here's how the two paths compare:

  ┌────────────────────────────────────────┬──────────────────────────────────────────────────────┬───────────────────┐
  │               Component                │                      Init path                       │ Gets deny filter? │
  ├────────────────────────────────────────┼──────────────────────────────────────────────────────┼───────────────────┤
  │ broker/filter, broker/ingress, jobsink │ eventing's otel.SetupObservabilityOrDie              │ Yes               │
  ├────────────────────────────────────────┼──────────────────────────────────────────────────────┼───────────────────┤
  │ in-memory channel dispatcher           │ knative.dev/pkg/injection/sharedmain.MainWithContext │ No                │
  └────────────────────────────────────────┴──────────────────────────────────────────────────────┴───────────────────┘

  The dispatcher's controller (pkg/reconciler/inmemorychannel/dispatcher/controller.go:86) does call otel.GetMeterProvider() to get the global, but that global was set by sharedmain — not by eventing's setup function — so it has no views attached.

  To fix this, knative.dev/pkg's sharedmain already supports WithOTelViews(ctx, ...metric.View) via context. In cmd/in_memory/channel_dispatcher/main.go, you could read the observability configmap, parse the deny list, and call sharedmain.WithOTelViews(ctx, 
  metricAttributesDenyFilter(denyList)) before sharedmain.MainWithContext(). Alternatively, you could push the deny-list view support into knative.dev/pkg's sharedmain itself so all components get it automatically.

@creydr creydr changed the title Add configurable metric attributes deny list to config-observability [WIP] Add configurable metric attributes deny list to config-observability May 21, 2026
@knative-prow knative-prow Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 21, 2026
@creydr
Copy link
Copy Markdown
Member Author

creydr commented May 21, 2026

@maschmid I've started moving this functionality into knative.dev/pkg directly: knative/pkg#3356

The main reason is that components like the IMC channel dispatcher use sharedmain.SetupObservabilityOrDie from knative.dev/pkg instead of eventing's own observability setup as you pointed out. To make the deny filter work there from eventing, we'd need to do a one-shot kubeclient call in the dispatcher's main.go to read the config-observability ConfigMap, parse the deny list, construct the View, and inject it via sharedmain.WithOTelViews(ctx, ...) all before calling sharedmain.MainWithContext, having the following issues:

  • It duplicates config reading that sharedmain already does internally (it reads config-observability as part of SetupObservabilityOrDie)
  • It requires building a kubeclient earlier than normal just to read one config key
  • Every component using sharedmain would need the same boilerplate

By adding metrics-attributes-deny in knative.dev/pkg (alongside metrics-protocol, metrics-endpoint, etc.), the deny filter is automatically wired into sharedmain.SetupObservabilityOrDie. This means every Knative component using sharedmain (eventing, serving, etc.) gets deny-list support without any component-specific code.

Once the knative.dev/pkg PR is merged, we don't need this PR here anymore and only need to revendor it.

Once the knative.dev/pkg PR is merged and it got vendored into eventing-core, we can rebase this PR (we still need it for components which not use sharedmain.SetupObservabilityOrDie (like mt-broker-filter and -ingress)

@creydr
Copy link
Copy Markdown
Member Author

creydr commented May 21, 2026

/hold

@knative-prow knative-prow Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 21, 2026
Remove eventing-specific MetricAttributesDenyList field, parsing, and
filter function. Use cfg.Metrics.AttributesDenyList() and
metrics.MetricAttributesDenyFilter from knative.dev/pkg instead.

Rename ConfigMap key from metrics.attributes.deny to
metrics-attributes-deny to match knative.dev/pkg convention.

Depends on: knative/pkg#3356
@knative-prow knative-prow Bot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 21, 2026
@knative-prow knative-prow Bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 21, 2026
@knative-prow
Copy link
Copy Markdown

knative-prow Bot commented May 21, 2026

@creydr: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
build-tests_eventing_main c288ebb link true /test build-tests
unit-tests_eventing_main c288ebb link true /test unit-tests
upgrade-tests_eventing_main c288ebb link true /test upgrade-tests
reconciler-tests_eventing_main c288ebb link true /test reconciler-tests
conformance-tests_eventing_main c288ebb link true /test conformance-tests

Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants