Commit 954371a

Updated the readme
1 parent 10118f0 commit 954371a

2 files changed

Lines changed: 26 additions & 43 deletions

metrics-collector/README.md

Lines changed: 26 additions & 43 deletions
@@ -1,8 +1,6 @@
 # IBM Cloud Code Engine - Metrics Collector

-Code Engine job that demonstrates how to collect resource metrics (CPU, memory and disk usage) of running Code Engine apps, jobs, and builds.
-
-Those metrics can either be render
+Code Engine job that demonstrates how to collect resource metrics (CPU, memory and disk usage) of running Code Engine apps, jobs, and builds. Those metrics can either be render

 in **IBM Cloud Monitoring** (see [instructions](#Send-metrics-to-IBM-Cloud-Monitoring))

@@ -12,8 +10,19 @@ or in **IBM Cloud Logs** (see [instructions](#ibm-cloud-logs-setup))

 ![Dashboard overview](./images/icl-dashboard-overview.png)

+
 ## Send metrics to IBM Cloud Monitoring

+### How It Works
+
+![](./images/metrics-collector.overview.png)
+
+1. The metrics collector exposes Prometheus metrics on `localhost:9100/metrics`
+2. The embedded Prometheus agent scrapes these metrics every 30 seconds
+3. The agent also discovers and scrapes pods with the `codeengine.cloud.ibm.com/userMetricsScrape: 'true'` annotation
+4. All metrics are forwarded to IBM Cloud Monitoring via remote write
+5. If either the collector or Prometheus agent crashes, the container exits with a non-zero code to trigger a restart
+
 ### Setup Instructions

 **Step 1:** You need an IBM Cloud Monitoring instance
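
Reviewer sketch for step 5 of the "How It Works" list in the hunk above: one way a container entrypoint could exit non-zero as soon as either of two child processes stops. This is a minimal illustration, not the project's actual entrypoint. The function name `supervise`, the polling approach, and the choice to treat even a clean exit (code 0) as a failure are all assumptions.

```python
import subprocess
import time


def supervise(commands, poll_seconds=0.1):
    """Start every command; when the first one exits, stop the rest.

    Returns a non-zero exit code so that the caller can pass it to
    sys.exit() and let the platform restart the container.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            for proc in procs:
                code = proc.poll()  # None while the process is still running
                if code is not None:
                    # Assumption: any exit of a supervised process is a failure.
                    # Exit code 0 and signal deaths (negative on POSIX) map to 1.
                    return code if code > 0 else 1
            time.sleep(poll_seconds)
    finally:
        # Stop whatever is still running, then reap all children.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            proc.wait()
```

An entrypoint built on this would call `sys.exit(supervise([collector_cmd, agent_cmd]))`, where both command lists are placeholders for the real collector and Prometheus agent invocations.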
@@ -82,46 +91,6 @@ curl -X POST https://$REGION.monitoring.cloud.ibm.com/api/v3/dashboards \

 **Note:** A more elaborated approach to manage custom Cloud Monitoring dashboards can be found [here](setup/ibm-cloud-monitoring/README.md)

-### How It Works
-
-1. The metrics collector exposes Prometheus metrics on `localhost:9100/metrics`
-2. The embedded Prometheus agent scrapes these metrics every 30 seconds
-3. The agent also discovers and scrapes pods with the `codeengine.cloud.ibm.com/userMetricsScrape: 'true'` annotation
-4. All metrics are forwarded to IBM Cloud Monitoring via remote write
-5. If either the collector or Prometheus agent crashes, the container exits with a non-zero code to trigger a restart
-
-### Required Environment Variables for Prometheus Integration
-
-- **`METRICS_ENABLED=true`**: Enables the Prometheus agent
-- **`METRICS_REMOTE_WRITE_FQDN`**: IBM Cloud Monitoring ingestion endpoint FQDN (required when `METRICS_ENABLED=true`)
-- **Secret Mount**: `/etc/secrets/monitoring-apikey` must contain your IBM Cloud Monitoring API key
-
-### Troubleshooting
-
-If the container fails to start with `METRICS_ENABLED=true`, check the logs for:
-- Missing `/etc/secrets/monitoring-apikey` file
-- Missing or wrong `METRICS_REMOTE_WRITE_FQDN` environment variable
-
-### Configuration
-
-Per default the metrics collector collects memory and CPU statistics, like `usage`, `current` and `configured`.
-
-#### Environment Variables
-
-- **`INTERVAL`** (default: `30`): Collection interval in seconds (minimum 30 seconds). Controls how frequently metrics are collected in daemon mode.
-- **`COLLECT_DISKUSAGE`** (default: `false`): Set to `true` to collect disk space usage. Note: The metrics collector calculates the overall file size stored in the pod's filesystem, which includes files from the container image, ephemeral storage, and mounted COS buckets. This metric cannot be used to calculate ephemeral storage usage alone.
-- **`METRICS_ENABLED`** (default: `false`): Set to `true` to enable the HTTP metrics server. When disabled, the collector still runs and logs metrics to stdout but does not expose the HTTP endpoint.
-- **`METRICS_PORT`** (default: `9100`): HTTP server port for the Prometheus metrics endpoint. Only used when `METRICS_ENABLED=true` in daemon mode.
-
-### Prometheus Metrics Endpoint
-
-When running in **daemon mode** with **`METRICS_ENABLED=true`**, the metrics collector exposes an HTTP server on port 9100 (configurable via `METRICS_PORT`) with a `/metrics` endpoint that provides Prometheus-compatible metrics.
-
-**Note**: The HTTP server is only started when `METRICS_ENABLED=true`. When disabled, the collector continues to run and log metrics to stdout in JSON format, but does not expose the HTTP endpoint.
-
-#### Accessing the Metrics Endpoint
-
-The metrics endpoint is available at `http://<pod-ip>:9100/metrics` and can be scraped by Prometheus or accessed directly.

 #### Exposed Metrics

@@ -232,3 +201,17 @@ app:"codeengine" AND message.metric:"instance-resources"

 ![Logs overview](./images/icl-logs-view-overview.png)

+
+### Troubleshooting & Configuration
+
+If the container fails to start with `METRICS_ENABLED=true`, check the logs for:
+- Missing `/etc/secrets/monitoring-apikey` file
+- Missing or wrong `METRICS_REMOTE_WRITE_FQDN` environment variable
+
+#### Environment Variables
+
+- **`INTERVAL`** (default: `30`): Collection interval in seconds (minimum 30 seconds). Controls how frequently metrics are collected from the Kubernetes API endpoint in daemon mode.
+- **`COLLECT_DISKUSAGE`** (default: `false`): Set to `true` to collect disk space usage. Note: The metrics collector calculates the overall file size stored in the pod's filesystem, which includes files from the container image, ephemeral storage, and mounted COS buckets. This metric cannot be used to calculate ephemeral storage usage alone.
+- **`METRICS_ENABLED`** (default: `false`): Set to `true` to enable the HTTP metrics server. When disabled, the collector still runs and logs metrics to stdout but does not expose the HTTP endpoint.
+- **`METRICS_REMOTE_WRITE_FQDN`**: IBM Cloud Monitoring ingestion endpoint FQDN (required when `METRICS_ENABLED=true`)
+- **`METRICS_PORT`** (default: `9100`): HTTP server port for the Prometheus metrics endpoint. Only used when `METRICS_ENABLED=true` in daemon mode.
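
Reviewer sketch for the "Troubleshooting & Configuration" section added above: a small preflight check that mirrors the documented settings, so the two listed startup failures can be caught before deploying. The variable names, defaults, and secret path are taken from the README text. The function `preflight` itself is hypothetical, as are the exact-match comparison against `true` and the decision to report an `INTERVAL` below 30 as a problem (the collector might clamp the value instead).

```python
from pathlib import Path

DEFAULT_SECRET_PATH = "/etc/secrets/monitoring-apikey"


def preflight(env, secret_path=DEFAULT_SECRET_PATH):
    """Return a list of configuration problems; an empty list means none found."""
    problems = []

    # INTERVAL: documented default 30, documented minimum 30 seconds.
    raw_interval = env.get("INTERVAL", "30")
    try:
        if int(raw_interval) < 30:
            problems.append(f"INTERVAL={raw_interval} is below the 30 second minimum")
    except ValueError:
        problems.append(f"INTERVAL={raw_interval!r} is not an integer")

    # The remaining checks only matter when the Prometheus integration is on.
    # Assumption: the collector compares against the literal string "true".
    if env.get("METRICS_ENABLED", "false") != "true":
        return problems

    if not env.get("METRICS_REMOTE_WRITE_FQDN", "").strip():
        problems.append("METRICS_REMOTE_WRITE_FQDN is required when METRICS_ENABLED=true")

    secret = Path(secret_path)
    if not secret.is_file() or not secret.read_text().strip():
        problems.append(f"{secret_path} is missing or empty")

    # METRICS_PORT: documented default 9100.
    raw_port = env.get("METRICS_PORT", "9100")
    if not raw_port.isdigit() or not 1 <= int(raw_port) <= 65535:
        problems.append(f"METRICS_PORT={raw_port!r} is not a valid TCP port")

    return problems
```

Calling `preflight(os.environ)` inside the container (after `import os`) and printing each returned string gives a quick answer to the two troubleshooting bullets without reading through the startup logs.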