File: metrics-collector/README.md
Lines changed: 26 additions & 43 deletions (+26 -43)
@@ -1,8 +1,6 @@
1
1
# IBM Cloud Code Engine - Metrics Collector
2
2
3
-
Code Engine job that demonstrates how to collect resource metrics (CPU, memory and disk usage) of running Code Engine apps, jobs, and builds.
4
-
5
-
Those metrics can either be rendered
3
+
Code Engine job that demonstrates how to collect resource metrics (CPU, memory and disk usage) of running Code Engine apps, jobs, and builds. Those metrics can either be rendered
6
4
7
5
in **IBM Cloud Monitoring** (see [instructions](#send-metrics-to-ibm-cloud-monitoring))
8
6
@@ -12,8 +10,19 @@ or in **IBM Cloud Logs** (see [instructions](#ibm-cloud-logs-setup))
1. The metrics collector exposes Prometheus metrics on `localhost:9100/metrics`
21
+
2. The embedded Prometheus agent scrapes these metrics every 30 seconds
22
+
3. The agent also discovers and scrapes pods with the `codeengine.cloud.ibm.com/userMetricsScrape: 'true'` annotation
23
+
4. All metrics are forwarded to IBM Cloud Monitoring via remote write
24
+
5. If either the collector or Prometheus agent crashes, the container exits with a non-zero code to trigger a restart
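
Step 5 above can be illustrated with a small supervisor loop. This is only a sketch of the idea, not the project's actual entrypoint: the function name `supervise` and both command arguments are hypothetical, and the polling approach is an assumption chosen for portability across POSIX shells.

```shell
#!/bin/sh
# Hypothetical supervisor sketch: the function name and both command
# arguments are illustrative and not taken from the project.
supervise() {
  collector_cmd=$1
  agent_cmd=$2

  # Start both processes in the background and remember their PIDs.
  sh -c "$collector_cmd" &
  collector_pid=$!
  sh -c "$agent_cmd" &
  agent_pid=$!

  # Poll until either process is gone (kill -0 only tests for existence).
  while kill -0 "$collector_pid" 2>/dev/null &&
        kill -0 "$agent_pid" 2>/dev/null; do
    sleep 1
  done

  # Stop the survivor, reap both children, then report failure so the
  # platform can restart the container.
  kill "$collector_pid" "$agent_pid" 2>/dev/null || true
  wait "$collector_pid" "$agent_pid" 2>/dev/null || true
  echo "collector or Prometheus agent exited; shutting down" >&2
  return 1
}
```

An entrypoint script could call `supervise "<collector command>" "<agent command>"` and then `exit 1`. The function always returns non-zero because, for two long-running daemons, any exit is treated as a failure. On bash 4.3 or later, `wait -n` could replace the polling loop.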
25
+
17
26
### Setup Instructions
18
27
19
28
**Step 1:** You need an IBM Cloud Monitoring instance
@@ -82,46 +91,6 @@ curl -X POST https://$REGION.monitoring.cloud.ibm.com/api/v3/dashboards \
82
91
83
92
**Note:** A more elaborate approach to managing custom Cloud Monitoring dashboards can be found [here](setup/ibm-cloud-monitoring/README.md)
84
93
85
-
### How It Works
86
-
87
-
1. The metrics collector exposes Prometheus metrics on `localhost:9100/metrics`
88
-
2. The embedded Prometheus agent scrapes these metrics every 30 seconds
89
-
3. The agent also discovers and scrapes pods with the `codeengine.cloud.ibm.com/userMetricsScrape: 'true'` annotation
90
-
4. All metrics are forwarded to IBM Cloud Monitoring via remote write
91
-
5. If either the collector or Prometheus agent crashes, the container exits with a non-zero code to trigger a restart
92
-
93
-
### Required Environment Variables for Prometheus Integration
94
-
95
-
- **`METRICS_ENABLED=true`**: Enables the Prometheus agent
96
-
- **`METRICS_REMOTE_WRITE_FQDN`**: IBM Cloud Monitoring ingestion endpoint FQDN (required when `METRICS_ENABLED=true`)
97
-
- **Secret Mount**: `/etc/secrets/monitoring-apikey` must contain your IBM Cloud Monitoring API key
98
-
99
-
### Troubleshooting
100
-
101
-
If the container fails to start with `METRICS_ENABLED=true`, check the logs for:
102
-
- Missing `/etc/secrets/monitoring-apikey` file
103
-
- Missing or wrong `METRICS_REMOTE_WRITE_FQDN` environment variable
104
-
105
-
### Configuration
106
-
107
-
By default, the metrics collector collects memory and CPU statistics, such as `usage`, `current` and `configured`.
108
-
109
-
#### Environment Variables
110
-
111
-
- **`INTERVAL`** (default: `30`): Collection interval in seconds (minimum 30 seconds). Controls how frequently metrics are collected in daemon mode.
112
-
- **`COLLECT_DISKUSAGE`** (default: `false`): Set to `true` to collect disk space usage. Note: The metrics collector calculates the overall file size stored in the pod's filesystem, which includes files from the container image, ephemeral storage, and mounted COS buckets. This metric cannot be used to calculate ephemeral storage usage alone.
113
-
- **`METRICS_ENABLED`** (default: `false`): Set to `true` to enable the HTTP metrics server. When disabled, the collector still runs and logs metrics to stdout but does not expose the HTTP endpoint.
114
-
- **`METRICS_PORT`** (default: `9100`): HTTP server port for the Prometheus metrics endpoint. Only used when `METRICS_ENABLED=true` in daemon mode.
115
-
116
-
### Prometheus Metrics Endpoint
117
-
118
-
When running in **daemon mode** with **`METRICS_ENABLED=true`**, the metrics collector exposes an HTTP server on port 9100 (configurable via `METRICS_PORT`) with a `/metrics` endpoint that provides Prometheus-compatible metrics.
119
-
120
-
**Note**: The HTTP server is only started when `METRICS_ENABLED=true`. When disabled, the collector continues to run and log metrics to stdout in JSON format, but does not expose the HTTP endpoint.
121
-
122
-
#### Accessing the Metrics Endpoint
123
-
124
-
The metrics endpoint is available at `http://<pod-ip>:9100/metrics` and can be scraped by Prometheus or accessed directly.
125
94
126
95
#### Exposed Metrics
127
96
@@ -232,3 +201,17 @@ app:"codeengine" AND message.metric:"instance-resources"
If the container fails to start with `METRICS_ENABLED=true`, check the logs for:
208
+
- Missing `/etc/secrets/monitoring-apikey` file
209
+
- Missing or wrong `METRICS_REMOTE_WRITE_FQDN` environment variable
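
The two causes listed above can be checked before the container starts. The following is a minimal sketch, assuming a hypothetical helper named `check_metrics_config`; the `APIKEY_FILE` override is also an assumption added so the check can be tried locally, since the README only documents the fixed path `/etc/secrets/monitoring-apikey`. It can detect a missing FQDN but not an incorrect one.

```shell
#!/bin/sh
# Hypothetical preflight check mirroring the two failure causes listed above.
check_metrics_config() {
  # Nothing to verify unless the Prometheus integration is switched on.
  [ "${METRICS_ENABLED:-false}" = "true" ] || return 0

  # APIKEY_FILE is an assumption for local testing; the documented path
  # is /etc/secrets/monitoring-apikey.
  apikey_file="${APIKEY_FILE:-/etc/secrets/monitoring-apikey}"
  problems=0

  if [ ! -s "$apikey_file" ]; then
    echo "missing or empty API key file: $apikey_file" >&2
    problems=1
  fi
  if [ -z "${METRICS_REMOTE_WRITE_FQDN:-}" ]; then
    echo "METRICS_REMOTE_WRITE_FQDN is not set" >&2
    problems=1
  fi
  return "$problems"
}
```

Calling `check_metrics_config || exit 1` at the top of a start script would surface both problems in one log pass instead of failing on the first.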
210
+
211
+
#### Environment Variables
212
+
213
+
- **`INTERVAL`** (default: `30`): Collection interval in seconds (minimum 30 seconds). Controls how frequently metrics are collected from the Kubernetes API endpoint in daemon mode.
214
+
- **`COLLECT_DISKUSAGE`** (default: `false`): Set to `true` to collect disk space usage. Note: The metrics collector calculates the overall file size stored in the pod's filesystem, which includes files from the container image, ephemeral storage, and mounted COS buckets. This metric cannot be used to calculate ephemeral storage usage alone.
215
+
- **`METRICS_ENABLED`** (default: `false`): Set to `true` to enable the HTTP metrics server. When disabled, the collector still runs and logs metrics to stdout but does not expose the HTTP endpoint.
216
+
- **`METRICS_REMOTE_WRITE_FQDN`**: IBM Cloud Monitoring ingestion endpoint FQDN (required when `METRICS_ENABLED=true`)
217
+
- **`METRICS_PORT`** (default: `9100`): HTTP server port for the Prometheus metrics endpoint. Only used when `METRICS_ENABLED=true` in daemon mode.
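
The `INTERVAL` rule (default `30`, minimum 30 seconds) can be sketched as follows. The helper name `effective_interval` is hypothetical, and clamping low or non-numeric values to 30 is an assumption; the README does not say whether the collector clamps or rejects such values.

```shell
#!/bin/sh
# Hypothetical helper: prints the interval a collector might use.
# Clamping (rather than rejecting) invalid input is an assumption.
effective_interval() {
  interval="${INTERVAL:-30}"

  # Fall back to the default for anything that is not a plain integer.
  case "$interval" in
    ''|*[!0-9]*) interval=30 ;;
  esac

  # Enforce the documented minimum of 30 seconds.
  if [ "$interval" -lt 30 ]; then
    interval=30
  fi
  echo "$interval"
}
```

For example, `INTERVAL=10` would yield `30`, while `INTERVAL=120` would be passed through unchanged.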