[WIP] Add configurable metric attributes deny list to config-observability#9090
[WIP] Add configurable metric attributes deny list to config-observability#9090creydr wants to merge 6 commits into
Conversation
Add a `metrics.high-cardinality.disable` key to the config-observability ConfigMap that allows stripping unbounded metric attributes (cloudevents.type, messaging.destination.name) via an OTel View. This helps prevent OOM issues caused by high metric cardinality in production without changing any instrumentation code.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: creydr The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9090 +/- ##
==========================================
+ Coverage 50.99% 51.03% +0.04%
==========================================
Files 411 411
Lines 22112 22133 +21
==========================================
+ Hits 11276 11296 +20
- Misses 9967 9969 +2
+ Partials 869 868 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Replace DisableHighCardinalityMetrics boolean with a configurable metrics.attributes.deny comma-separated list in config-observability. Users can specify exactly which metric attribute keys to strip from kn.eventing.* metrics, allowing fine-grained control over cardinality. Default is empty (no filtering), preserving existing behavior.
/test reconciler-tests |
| keys[i] = attribute.Key(k) | ||
| } | ||
| return metric.NewView( | ||
| metric.Instrument{Name: "kn.eventing.*"}, |
There was a problem hiding this comment.
Does this mean the filter would only filter the kn_eventing metrics, and not touch e.g. the http metrics, like
http_client_request_body_size_bytes_count ? (because , those also have stuff like
(from prometheus:)
http_client_request_body_size_bytes_count{cloudevents_datacontenttype="application/json",cloudevents_source="ocf-qe/test/imc-broker-k8s-counter-0/counter-0",cloudevents_specversion="1.0",cloudevents_type="com.redhat.cee.gitlab.ocf-qe.ocf-qe-testsuite.structured",http_request_method="POST",http_response_status_code="202",kn_broker_name="broker-0",kn_broker_namespace="imc-broker-k8s-counter-0",messaging_destination_name="trigger:receiver-0.imc-broker-k8s-counter-0",messaging_operation_name="send",messaging_system="knative",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.64.0",server_address="receiver-0.imc-broker-k8s-counter-0.svc.cluster.local",url_scheme="http"} 5
which might be problematic.
There was a problem hiding this comment.
yes this filter applies only to kn_eventing metrics (I thought, that the "problematic" ones are from there only).
But we could also add another config field, which allows the user to set this (defaulting to "kn.eventing.*" for example) 🤷
There was a problem hiding this comment.
well, looking at mt-broker-filter and mt-broker-ingress, the most problematic ones are in the http.*
There was a problem hiding this comment.
ah ok. then lets use * as filter (instead of adding another config option)
The deny filter was scoped to kn.eventing.* instruments only, but high-cardinality attributes like cloudevents.type also appear on http.* metrics via the otelhttp instrumentation. Apply the filter to all instruments so it covers every metric that inherits labels from the context labeler.
| EnableSinkEventErrorReporting bool `json:"enableSinkEventErrorReporting"` | ||
|
|
||
| // MetricAttributesDenyList is a list of metric attribute keys to filter out | ||
| // from kn.eventing.* metrics (e.g. cloudevents.type, messaging.destination.name) |
There was a problem hiding this comment.
comment should be updated (not specific to kn.eventing.* anymore)
|
I have tried building and running the patch on top of serverless release-v1.21 , it seems to work for the mt-broker filter and ingress, but not for the imc-dispatcher. Claude says: |
|
@maschmid I've started moving this functionality into The main reason is that components like the IMC channel dispatcher use
By adding
Once the |
|
/hold |
Remove eventing-specific MetricAttributesDenyList field, parsing, and filter function. Use cfg.Metrics.AttributesDenyList() and metrics.MetricAttributesDenyFilter from knative.dev/pkg instead. Rename ConfigMap key from metrics.attributes.deny to metrics-attributes-deny to match knative.dev/pkg convention. Depends on: knative/pkg#3356
|
@creydr: The following tests failed, say
DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Proposed Changes
metrics.attributes.denykey to theconfig-observabilityConfigMap that accepts a comma-separated list of metric attribute keys to filter outkn.eventing.*metrics at the SDK levelExample usage to strip high-cardinality attributes: