Commit 10118f0

Streamlined the Readme
1 parent 78015b5 commit 10118f0

8 files changed

Lines changed: 53 additions & 82 deletions

metrics-collector/README.md

Lines changed: 53 additions & 82 deletions
@@ -1,54 +1,50 @@
11
# IBM Cloud Code Engine - Metrics Collector
22

3-
Code Engine job that demonstrates how to collect resource metrics (CPU, memory and disk usage) of running Code Engine apps, jobs, and builds
3+
Code Engine job that demonstrates how to collect resource metrics (CPU, memory and disk usage) of running Code Engine apps, jobs, and builds.
44

5-
![Dashboard overview](./images/icl-dashboard-overview.png)
6-
7-
## Installation
5+
Those metrics can either be rendered
86

9-
## Capture metrics every n seconds
7+
in **IBM Cloud Monitoring** (see [instructions](#send-metrics-to-ibm-cloud-monitoring))
108

11-
* Create Code Engine job template
12-
```
13-
$ ibmcloud ce job create \
14-
--name metrics-collector \
15-
--src . \
16-
--mode daemon \
17-
--cpu 0.25 \
18-
--memory 0.5G \
19-
--wait
20-
```
9+
![](./images/monitoring-dashboard-ce-component-resources.png)
2110

22-
* Submit a daemon job that collects metrics in an endless loop. The daemon job queries the Metrics API every 30 seconds
23-
```
24-
$ ibmcloud ce jobrun submit \
25-
--job metrics-collector \
26-
--env INTERVAL=30
27-
```
11+
or in **IBM Cloud Logs** (see [instructions](#ibm-cloud-logs-setup))
2812

13+
![Dashboard overview](./images/icl-dashboard-overview.png)
2914

3015
## Send metrics to IBM Cloud Monitoring
3116

32-
When `METRICS_ENABLED=true`, the metrics collector runs an embedded Prometheus agent that scrapes metrics from the local `/metrics` endpoint and forwards them to IBM Cloud Monitoring.
33-
34-
![](./images/monitoring-dashboard-ce-component-resources.png)
35-
36-
### Prerequisites
37-
38-
1. **IBM Cloud Monitoring Instance**: You need an IBM Cloud Monitoring instance with an API key
39-
2. **Code Engine project**: The collector must run in a Code Engine project
40-
4117
### Setup Instructions
4218

43-
**Step 1: Create a secret with your IBM Cloud Monitoring API key**
19+
**Step 1:** You need an IBM Cloud Monitoring instance
4420
```bash
45-
ibmcloud ce secret create --name monitoring-apikey --from-literal monitoring-apikey=<YOUR_IBM_CLOUD_MONITORING_API_KEY>
21+
REGION=<yourMonitoringInstanceRegion>
22+
MONITORING_INSTANCE_NAME="<yourMonitoringInstanceName>"
23+
MONITORING_INSTANCE_GUID=$(ibmcloud resource service-instance "$MONITORING_INSTANCE_NAME" -o JSON|jq -r '.[0].guid')
24+
echo "MONITORING_INSTANCE_GUID: '$MONITORING_INSTANCE_GUID'"
4625
```
26+
**Step 2:** The collector must run in a Code Engine project
27+
```bash
28+
# Create new Code Engine project
29+
ibmcloud ce project create --name <yourCodeEngineProjectName>
4730

48-
**Step 2: Determine your IBM Cloud Monitoring ingestion endpoint**
31+
# Or select an existing Code Engine project
32+
ibmcloud ce project select --name <yourCodeEngineProjectName>
33+
```
4934
35+
**Step 3:** Create a secret with your IBM Cloud Monitoring API token
36+
```bash
37+
# Obtain the Monitoring API token of the IBM Cloud Monitoring instance
38+
# using the IAM access token of the current IBM Cloud CLI session
39+
MONITORING_INSTANCE_MONITORING_API_KEY=$(curl --silent -X GET https://$REGION.monitoring.cloud.ibm.com/api/token -H "Authorization: $(ibmcloud iam oauth-tokens --output JSON|jq -r '.iam_token')" -H "IBMInstanceID: $MONITORING_INSTANCE_GUID" -H "content-type: application/json"|jq -r '.token.key')
40+
41+
# Create a Code Engine secret that stores the Monitoring API Key
42+
ibmcloud ce secret create \
43+
--name monitoring-apikey \
44+
--from-literal monitoring-apikey="$MONITORING_INSTANCE_MONITORING_API_KEY"
45+
```
5046
51-
**Step 3: Update your job with the required configuration**
47+
**Step 4:** Create your metrics-collector job with the required configuration
5248
```bash
5349
ibmcloud ce job create \
5450
--name metrics-collector \
@@ -64,18 +60,32 @@ ibmcloud ce job create \
6460
--mount-secret /etc/secrets=monitoring-apikey
6561
```
6662
67-
**Step 4: Submit a job run**
63+
**Step 5:** Submit a daemon job run
6864
```bash
6965
ibmcloud ce jobrun submit \
7066
--job metrics-collector
7167
```
7268
73-
**Step 5: Setup the Cloud Monitoring dashboard as decribed [here](setup/ibm-cloud-monitoring/README.md)**
69+
**Step 6:** Import the "IBM Cloud Code Engine - Component Resource Overview" dashboard
70+
```bash
71+
# Load the most recent dashboard configuration
72+
CE_MONITORING_DASHBOARD=$(curl -sL https://raw.githubusercontent.com/IBM/CodeEngine/main/metrics-collector/setup/ibm-cloud-monitoring/code-engine-component-resource-overview.json)
73+
74+
# Import the dashboard
75+
curl -X POST https://$REGION.monitoring.cloud.ibm.com/api/v3/dashboards \
76+
-H "Authorization: $(ibmcloud iam oauth-tokens --output JSON|jq -r '.iam_token')" \
77+
-H "IBMInstanceID: $MONITORING_INSTANCE_GUID" \
78+
-H "Content-Type: application/json" \
79+
-d "{\"dashboard\": $CE_MONITORING_DASHBOARD}"
80+
81+
```
82+
83+
**Note:** A more elaborate approach to managing custom Cloud Monitoring dashboards is described [here](setup/ibm-cloud-monitoring/README.md)
7484
7585
### How It Works
7686
7787
1. The metrics collector exposes Prometheus metrics on `localhost:9100/metrics`
78-
2. The embedded Prometheus agent scrapes these metrics every 15 seconds
88+
2. The embedded Prometheus agent scrapes these metrics every 30 seconds
7989
3. The agent also discovers and scrapes pods with the `codeengine.cloud.ibm.com/userMetricsScrape: 'true'` annotation
8090
4. All metrics are forwarded to IBM Cloud Monitoring via remote write
8191
5. If either the collector or Prometheus agent crashes, the container exits with a non-zero code to trigger a restart
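
The last item describes a fail-together policy: when one of the two processes stops, the whole container should exit with a non-zero code. A minimal shell sketch of such a policy is shown here for illustration. It is not the collector's actual entrypoint, and the function name `supervise` and the commands passed to it are hypothetical.

```bash
# Run two commands side by side and fail as soon as either one stops.
# Hypothetical helper for illustration; not taken from the metrics-collector source.
supervise() {
  # Start both commands in the background and remember their PIDs.
  sh -c "$1" &
  pid_a=$!
  sh -c "$2" &
  pid_b=$!

  # Poll until at least one of the two processes has gone away.
  while kill -0 "$pid_a" 2>/dev/null && kill -0 "$pid_b" 2>/dev/null; do
    sleep 1
  done

  # Stop the survivor, collect both children, and report failure so that
  # the surrounding platform can restart the container.
  kill "$pid_a" "$pid_b" 2>/dev/null || true
  wait "$pid_a" "$pid_b" 2>/dev/null || true
  return 1
}

# Example call (placeholder commands):
# supervise "exec ./collector" "exec ./prometheus-agent" || exit 1
```

In this sketch any exit of either process counts as a failure, even an exit with code 0, which is the usual expectation for a daemon that is meant to run indefinitely.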
@@ -90,7 +100,7 @@ ibmcloud ce jobrun submit \
90100
91101
If the container fails to start with `METRICS_ENABLED=true`, check the logs for:
92102
- Missing `/etc/secrets/monitoring-apikey` file
93-
- Missing `METRICS_REMOTE_WRITE_FQDN` environment variable
103+
- Missing or wrong `METRICS_REMOTE_WRITE_FQDN` environment variable
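
Both conditions can also be checked before a job run is submitted. The following sketch assumes only what the list above states (the mounted secret file and the `METRICS_REMOTE_WRITE_FQDN` variable); the function name `check_metrics_prereqs` is hypothetical and the check does not validate whether the FQDN value is actually correct.

```bash
# Verify the two startup prerequisites named in the troubleshooting list.
# $1: path of the mounted API key file, $2: value of METRICS_REMOTE_WRITE_FQDN
check_metrics_prereqs() {
  secret_file="$1"
  fqdn="$2"
  status=0

  # The API key file must exist and must not be empty.
  if [ ! -s "$secret_file" ]; then
    echo "missing or empty API key file: $secret_file" >&2
    status=1
  fi

  # The remote write FQDN must be set to a non-empty value.
  if [ -z "$fqdn" ]; then
    echo "METRICS_REMOTE_WRITE_FQDN is not set" >&2
    status=1
  fi

  return "$status"
}

# Example call inside the container:
# check_metrics_prereqs /etc/secrets/monitoring-apikey "$METRICS_REMOTE_WRITE_FQDN"
```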
94104
95105
### Configuration
96106
@@ -132,26 +142,22 @@ The following 3 metrics are used to monitor the collector itself:
132142
#### Metric Labels
133143
134144
All container metrics include the following labels:
135-
- `instance_name`: Name of the pod instance
136-
- `component_type`: Type of component (`app`, `job`, or `build`)
137-
- `component_name`: Name of the Code Engine component
145+
- `ibm_codeengine_instance_name`: Name of the pod instance
146+
- `ibm_codeengine_component_type`: Type of component (`app`, `job`, or `build`)
147+
- `ibm_codeengine_component_name`: Name of the Code Engine component
138148
139149
#### Example Metrics Output
140150
141151
```prometheus
142152
# HELP ibm_codeengine_instance_cpu_usage_millicores Current CPU usage in millicores
143153
# TYPE ibm_codeengine_instance_cpu_usage_millicores gauge
144-
ibm_codeengine_instance_cpu_usage_millicores{pod_name="myapp-00001-deployment-abc123",component_type="app",component_name="myapp"} 250
154+
ibm_codeengine_instance_cpu_usage_millicores{ibm_codeengine_instance_name="myapp-00001-deployment-abc123",ibm_codeengine_component_type="app",ibm_codeengine_component_name="myapp"} 250
145155
146156
# HELP ibm_codeengine_instance_memory_usage_bytes Current memory usage in bytes
147157
# TYPE ibm_codeengine_instance_memory_usage_bytes gauge
148-
ibm_codeengine_instance_memory_usage_bytes{pod_name="myapp-00001-deployment-abc123",component_type="app",component_name="myapp"} 134217728
158+
ibm_codeengine_instance_memory_usage_bytes{ibm_codeengine_instance_name="myapp-00001-deployment-abc123",ibm_codeengine_component_type="app",ibm_codeengine_component_name="myapp"} 134217728
149159
```
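
To spot-check the renamed labels, output in the format above can be reduced to one line per component. The sketch below is an assumption-laden helper (the name `metric_values` is hypothetical): it reads Prometheus text format from stdin and prints the `ibm_codeengine_component_name` label next to the sample value. It assumes label values contain no spaces, as in the example output.

```bash
# Print "<component name> <value>" for every sample of the given metric.
# $1: metric name; input: Prometheus text exposition format on stdin.
metric_values() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP and TYPE comment lines
    index($0, name "{") == 1 {         # sample lines of the requested metric
      if (match($0, /ibm_codeengine_component_name="[^"]*"/)) {
        label = substr($0, RSTART, RLENGTH)
        sub(/^[^"]*"/, "", label)      # drop the label key and opening quote
        sub(/"$/, "", label)           # drop the closing quote
        print label, $NF               # the value is the last field
      }
    }'
}

# Example call from inside the collector container:
# curl -s localhost:9100/metrics | metric_values ibm_codeengine_instance_cpu_usage_millicores
```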
150160
151-
#### Prometheus Scrape Configuration
152-
153-
**Note**: The HTTP server is only started when `METRICS_ENABLED=true` and running in daemon mode (`JOB_MODE != "task"`). In task mode, metrics are collected once and logged to stdout without starting the HTTP server. When `METRICS_ENABLED` is not set to `true`, the collector runs in daemon mode but only logs metrics to stdout without exposing the HTTP endpoint.
154-
155161
## IBM Cloud Logs setup
156162
157163
Once your IBM Cloud Code Engine project has detected a corresponding IBM Cloud Logs instance, which is configured to receive platform logs, you can consume the resource metrics in IBM Cloud Logs. Use the filter `metric:instance-resources` to filter for log lines that print resource metrics for each detected IBM Cloud Code Engine instance that is running in a project.
@@ -214,8 +220,6 @@ app:"codeengine" AND message.metric:"instance-resources"
214220
215221
* In the top-right corner, select `1-line` as view mode
216222
217-
![View](./images/icl-logs-view-mode.png)
218-
219223
* In the graph title it says "**Count** all grouped by **Severity**". Click on `Severity` and select `message.component_name` instead. Furthermore, select `Max` as aggregation metric and choose `message.memory.usage` as aggregation field
220224
221225
![Graph](./images/icl-logs-view-graph.png)
@@ -228,36 +232,3 @@ app:"codeengine" AND message.metric:"instance-resources"
228232
229233
![Logs overview](./images/icl-logs-view-overview.png)
230234
231-
### Log graphs
232-
233-
Best is to create IBM Cloud Logs Board, in order to visualize the CPU and Memory usage per Code Engine component.
234-
235-
1. In your log instance navigate to Boards
236-
1. Give it a proper name, enter `metric:instance-resources` as query and submit by clicking `Add Graph`
237-
![New Board](./images/new-board.png)
238-
1. Now the graph shows the overall amount of logs captured for the specified query per time interval
239-
![Count of metrics log lines ](./images/count-of-metrics-lines.png)
240-
1. Click on the filter icon above the graph and put in `metric:instance-resources AND component_name:<app-name>`
241-
1. Switch the metric of the Graph to `Maximums`
242-
1. Below the graph Add a new plot`cpu.usage` as field and choose `ANY` as field values
243-
![Configure Graph plots](./images/configure-plots.png)
244-
1. Add another plot for the field `memory.usage` and values `ANY`
245-
1. Finally delete the plot `metrics:instance-resources` and adjust the plot colors to your likings
246-
![Resource Usage graph](./images/resource-usage-graph.png)
247-
1. The usage graph above renders the utilization in % of the CPU and Memory
248-
249-
#### Add CPU utilization
250-
1. Duplicate the graph, change its name to CPU and replace its plots with `cpu.configured` and `cpu.current`.
251-
- The resulting graph will render the actual CPU usage compared to the configured limit. The the unit is milli vCPUs (1000 -> 1 vCPU).
252-
![](./images/cpu-utilization.png)
253-
254-
#### Add memory utilization
255-
1. Duplicate the graph, change its name to Memory and replace its plots with `memory.configured` and `memory.current`.
256-
1. The resulting graph will render the actual memory usage compared to the configured limit. The the unit is MB (1000 -> 1 GB).
257-
![](./images/memory-utilization.png)
258-
259-
#### Add disk utilization
260-
1. Duplicate the graph or create a new one, change its name to "Disk usage" and replace its plots with `disk_usage.current`.
261-
1. The resulting graph will render the actual disk usage. While this does not allow to identify the usage of disk space compared with the configured ephemeral storage limit, this graph gives an impression on whether the disk usage is growing over time. The the unit is MB (1000 -> 1 GB).
262-
![](./images/disk-utilization.png)
263-
-55 KB
Binary file not shown.
-41.1 KB
Binary file not shown.
-39.2 KB
Binary file not shown.
-378 KB
Binary file not shown.
-44.4 KB
Binary file not shown.
-30.7 KB
Binary file not shown.
-54.9 KB
Binary file not shown.
