Commit 297b775

committed
cardinality of user metrics should be configurable
1 parent 33435ce commit 297b775

4 files changed

Lines changed: 154 additions & 5 deletions

metrics-collector/README.md

Lines changed: 80 additions & 1 deletion
@@ -184,9 +184,88 @@ The following 3 metrics are used to monitor the collector itself:
 #### Metric Labels

 All container metrics include the following labels:
-- `ibm_codeengine_instance_name`: Name of the pod instance
 - `ibm_codeengine_component_type`: Type of component (`app`, `job`, or `build`)
 - `ibm_codeengine_component_name`: Name of the Code Engine component
+- `ibm_codeengine_instance_name`: Name of the pod instance (optional, see cardinality control below)
+- `ibm_codeengine_subcomponent_name`: Name of the app revision (optional, see cardinality control below)
+
+#### User Metrics Scraping
+
+The metrics collector can automatically discover and scrape custom Prometheus metrics from your Code Engine applications. To enable this feature, add the following annotations to your application:
+
+**Required annotation:**
+- `codeengine.cloud.ibm.com/userMetricsScrape: 'true'` - Enables metrics scraping for this application
+
+**Optional annotations:**
+- `codeengine.cloud.ibm.com/userMetricsPath: '/metrics'` - Custom metrics endpoint path (default: `/metrics`)
+- `codeengine.cloud.ibm.com/userMetricsPort: '2112'` - Custom metrics port
+
+**Example:**
+```bash
+kubectl patch ksvc myapp --type merge -p '{
+  "spec": {
+    "template": {
+      "metadata": {
+        "annotations": {
+          "codeengine.cloud.ibm.com/userMetricsScrape": "true",
+          "codeengine.cloud.ibm.com/userMetricsPath": "/metrics",
+          "codeengine.cloud.ibm.com/userMetricsPort": "2112"
+        }
+      }
+    }
+  }
+}'
+```
+
+#### Cardinality Control for User Metrics
+
+To manage metric cardinality and reduce costs, you can control which labels are included in scraped user metrics using the following annotations:
+
+**Cardinality control annotations:**
+- `codeengine.cloud.ibm.com/userMetricsInstance: 'true'` - Include the `ibm_codeengine_instance_name` label (pod name)
+- `codeengine.cloud.ibm.com/userMetricsSubcomponent: 'true'` - Include the `ibm_codeengine_subcomponent_name` label (app revision name)
+
+**Default behavior:** By default, both `ibm_codeengine_instance_name` and `ibm_codeengine_subcomponent_name` labels are **excluded** from user metrics to minimize cardinality. These labels can create high cardinality because:
+- Instance names change with each pod restart or scale event
+- Revision names change with each application update
+
+**When to enable these labels:**
+- Enable `userMetricsInstance` when you need to track metrics per individual pod instance
+- Enable `userMetricsSubcomponent` when you need to compare metrics across different application revisions
+
+**Example with cardinality control:**
+```bash
+# Enable user metrics scraping with instance-level granularity
+kubectl patch ksvc myapp --type merge -p '{
+  "spec": {
+    "template": {
+      "metadata": {
+        "annotations": {
+          "codeengine.cloud.ibm.com/userMetricsScrape": "true",
+          "codeengine.cloud.ibm.com/userMetricsInstance": "true"
+        }
+      }
+    }
+  }
+}'
+
+# Enable user metrics scraping with both instance and revision granularity
+kubectl patch ksvc myapp --type merge -p '{
+  "spec": {
+    "template": {
+      "metadata": {
+        "annotations": {
+          "codeengine.cloud.ibm.com/userMetricsScrape": "true",
+          "codeengine.cloud.ibm.com/userMetricsInstance": "true",
+          "codeengine.cloud.ibm.com/userMetricsSubcomponent": "true"
+        }
+      }
+    }
+  }
+}'
+```
+
+**Note:** Only set these annotations to `'true'` when you specifically need the additional label granularity. Keeping them disabled (default) helps reduce metric cardinality and associated monitoring costs.

 #### Example Metrics Output
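
Reviewer note: the cardinality argument in the README change above can be made concrete with a small sketch. It counts distinct label sets (one per time series) with and without the two optional labels. The metric name, app name, and revision/pod naming pattern below are hypothetical sample data, not output from the collector.

```python
def count_series(samples, dropped_labels=()):
    """Count distinct time series after removing the given label names."""
    series = set()
    for labels in samples:
        kept = tuple(sorted(
            (name, value) for name, value in labels.items()
            if name not in dropped_labels
        ))
        series.add(kept)
    return len(series)


# Hypothetical data: one metric seen across 3 revisions with 4 pods each.
samples = [
    {
        "__name__": "http_requests_total",
        "ibm_codeengine_component_name": "myapp",
        "ibm_codeengine_subcomponent_name": f"myapp-0000{rev}",
        "ibm_codeengine_instance_name": f"myapp-0000{rev}-pod-{pod}",
    }
    for rev in range(1, 4)
    for pod in range(4)
]

all_labels = count_series(samples)
no_instance = count_series(samples, ("ibm_codeengine_instance_name",))
defaults = count_series(
    samples,
    ("ibm_codeengine_instance_name", "ibm_codeengine_subcomponent_name"),
)
print(all_labels, no_instance, defaults)
```

With this sample data, every pod restart or new revision adds series when the optional labels are kept, while the default label set stays at a single series per metric.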

metrics-collector/prometheus.yml.template

Lines changed: 9 additions & 3 deletions
@@ -49,9 +49,12 @@ scrape_configs:
       - action: replace
         replacement: '${CE_PROJECT_NAME}'
         target_label: ibm_codeengine_project_name
-      - source_labels: [__meta_kubernetes_pod_name]
+      # only set ibm_codeengine_instance_name if annotation codeengine.cloud.ibm.com/userMetricsInstance is set to 'true'
+      - source_labels: [__meta_kubernetes_pod_name, __meta_kubernetes_pod_annotation_codeengine_cloud_ibm_com_userMetricsInstance]
         action: replace
         target_label: ibm_codeengine_instance_name
+        regex: ([^;]+);true
+        replacement: ${1}
       - source_labels: [__meta_kubernetes_pod_label_serving_knative_dev_service]
         action: replace
         target_label: ibm_codeengine_component_name
@@ -60,9 +63,12 @@ scrape_configs:
         regex: (.+)
         replacement: app
         target_label: ibm_codeengine_component_type
-      - source_labels: [__meta_kubernetes_pod_label_serving_knative_dev_revision]
+      # only set ibm_codeengine_subcomponent_name if annotation codeengine.cloud.ibm.com/userMetricsSubcomponent is set to 'true'
+      - source_labels: [__meta_kubernetes_pod_label_serving_knative_dev_revision, __meta_kubernetes_pod_annotation_codeengine_cloud_ibm_com_userMetricsSubcomponent]
         action: replace
         target_label: ibm_codeengine_subcomponent_name
+        regex: ([^;]+);true
+        replacement: ${1}
       - source_labels: [__meta_kubernetes_pod_label_serving_knative_dev_revisionUID]
         action: replace
         regex: (.+)
@@ -96,4 +102,4 @@ remote_write:
       # Dropping scrape metrics (e.g. scrape_duration_seconds)
       - source_labels: [__name__]
         regex: 'scrape_duration_seconds|scrape_samples_scraped|scrape_series_added|scrape_samples_post_metric_relabeling'
-        action: drop
+        action: drop
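
Reviewer note: the conditional rules in this file depend on three documented Prometheus relabeling behaviors. Source label values are joined with `;`, the regex must match the whole joined string, and a `replace` rule that does not match leaves the target label untouched. The sketch below is a simplified Python model of that behavior, not Prometheus code; the function name `relabel_replace` and the sample pod name are made up for illustration.

```python
import re

POD = "__meta_kubernetes_pod_name"
ANNOTATION = (
    "__meta_kubernetes_pod_annotation_"
    "codeengine_cloud_ibm_com_userMetricsInstance"
)


def relabel_replace(labels, source_labels, target_label, regex, replacement,
                    separator=";"):
    """Simplified model of the Prometheus `replace` relabel action."""
    # A missing source label contributes an empty string.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # The regex is anchored at both ends, so use fullmatch.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels stay unchanged
    # Translate Prometheus-style ${1} into Python's \g<1> template syntax.
    template = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


def apply_instance_rule(labels):
    """Apply the instance-name rule from this diff to a label set."""
    return relabel_replace(
        labels, [POD, ANNOTATION], "ibm_codeengine_instance_name",
        r"([^;]+);true", "${1}",
    )


enabled = apply_instance_rule({POD: "myapp-00001-pod-abc", ANNOTATION: "true"})
missing = apply_instance_rule({POD: "myapp-00001-pod-abc"})
disabled = apply_instance_rule({POD: "myapp-00001-pod-abc", ANNOTATION: "false"})
```

In this model, only the `enabled` case gains the `ibm_codeengine_instance_name` label. A pod without the annotation produces the joined value `myapp-00001-pod-abc;`, which the anchored regex rejects.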

metrics-examples/run

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ for i in "${languages[@]}"; do
       --src "." \
       --memory 0.5G \
       --cpu 0.25 \
-      --env HTTPBIN_BASE_URL=https://httpbin.284uby30ujzw.ca-tor.codeengine.appdomain.cloud
+      --env "HTTPBIN_BASE_URL=$httbin_url"

     # Wait for the Kubernetes service resource to be ready before patching
     echo "Waiting for ksvc '$app_name' to be ready..."

metrics-examples/run-job

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+#!/bin/bash
+set -eo pipefail
+
+PREFIX="${PREFIX:=metrics-example}"
+LANGUAGE="${LANGUAGE:=all}"
+
+# Create the Code Engine project if it does not exist, yet
+if ! ibmcloud ce project get --name "$PREFIX" >/dev/null 2>&1;then
+  ibmcloud ce project create --name "$PREFIX"
+fi
+
+# Select the Code Engine project
+ibmcloud ce project select --name "$PREFIX" --kubecfg
+
+httbin_url=https://httpbin.org
+if ibmcloud ce app get --name httpbin >/dev/null 2>&1; then
+  httbin_url=`ibmcloud ce app get --name httpbin --output url`
+fi
+echo "Using HTTPBin backend URL: '$httbin_url'"
+
+languages=(
+  python
+)
+
+for i in "${languages[@]}"; do
+
+  if [[ "$LANGUAGE" == "all" || "$LANGUAGE" == "$i" ]];then
+
+    # Create or update the job
+    echo "Deploying Code Engine job for $i ..."
+    job_name="${PREFIX}-job-$i"
+
+    create_or_update=update
+    if ! ibmcloud ce job get --name $job_name >/dev/null 2>&1; then
+      echo -e "\nCreating the job '$job_name' ..."
+      create_or_update=create
+    else
+      echo -e "\nUpdating the job '$job_name' ..."
+    fi
+
+    # Create or update the job
+    ibmcloud ce job $create_or_update \
+      --name $job_name \
+      --context-dir "$i/" \
+      --src "." \
+      --memory 0.5G \
+      --cpu 0.25 \
+      --env "HTTPBIN_BASE_URL=$httbin_url" \
+      --wait
+
+
+    kubectl patch jobdefinition "$job_name" --type=json -p='[
+      {"op":"add","path":"/metadata/labels/codeengine.cloud.ibm.com~1userMetricsScrape","value":"true"},
+      {"op":"add","path":"/metadata/labels/codeengine.cloud.ibm.com~1userMetricsPath","value":"/metrics"},
+      {"op":"add","path":"/metadata/labels/codeengine.cloud.ibm.com~1userMetricsPort","value":"2112"}
+    ]'
+
+    # Submit a jobrun
+  else
+    continue;
+  fi
+done
+
+echo "Done"
