
Commit be63b64

Merge branch 'master' into rob-1126_headers_for_centralized_prometheus

2 parents 93d87b9 + edc11dd

6 files changed: 425 additions & 45 deletions
File tree

docs/images/logs-to-metrics.png (263 KB)
(image, 132 KB)
(image, 27.8 KB)
docs/playbook-reference/index.rst

Lines changed: 1 addition & 0 deletions
@@ -12,3 +12,4 @@
    automatic-remediation-examples/index
    prometheus-examples/index
    kubernetes-examples/index
+   logs-triggers/index
Lines changed: 378 additions & 0 deletions
@@ -0,0 +1,378 @@

:hide-toc:

Automate Responses to Application Logs
==========================================

This tutorial walks you through building an automation that detects specific patterns in Kubernetes pod logs and responds automatically. For example, we'll show how to restart a pod whenever it logs a database connection error.

To achieve this, we'll use:

- **Fluent Bit**: Monitors pod logs and converts matching log lines into Prometheus metrics.
- **Prometheus**: Stores the metrics and triggers alerts based on them. (We'll use Robusta's bundled Prometheus here, but any other Prometheus distribution works too.)
- **Robusta**: Executes an automated playbook when an alert fires, such as restarting the affected pod.

.. image:: /images/logs-to-metrics.png

Let's get started!

**Step 1: Create a namespace for the demo**
----------------------------------------------------

.. code-block:: bash

    kubectl create namespace log-triggers
    kubectl config set-context --current --namespace log-triggers

**Step 2: Parse Logs into Metrics with Fluent Bit**
----------------------------------------------------

First, let's configure Fluent Bit to monitor your pod logs and generate Prometheus metrics for specific log patterns:

- We'll deploy Fluent Bit as a DaemonSet, so there is one pod on each Kubernetes node.
- In the example below, we define two log matchers that create two different metrics.
- We'll use the Prometheus exporter output to deliver the metrics to Prometheus.

This is our Fluent Bit configuration (``fluentbit-values.yaml``):

.. code-block:: yaml

    config:
      service: |
        [SERVICE]
            Flush         1
            Daemon        Off
            Log_Level     info
            HTTP_Server   On
            HTTP_Listen   0.0.0.0
            HTTP_Port     2020

      inputs: |
        [INPUT]
            Name              tail
            Tag               kube.*
            Path              /var/log/containers/*.log
            Parser            wrap_raw_line
            DB                /var/log/flb_kube.db
            Mem_Buf_Limit     5MB
            Skip_Long_Lines   On
            Refresh_Interval  10

        [INPUT]
            Name   dummy
            Tag    dummy.alive
            Dummy  {"log":"keepalive"}

      parsers: |
        [PARSER]
            Name    wrap_raw_line
            Format  regex
            Regex   ^(?<log>.*)$

      filters: |
        [FILTER]
            Name                 kubernetes
            Match                kube.*
            K8S-Logging.Parser   On
            K8S-Logging.Exclude  On

        [FILTER]
            name                log_to_metrics
            match               *
            tag                 log_metrics
            metric_mode         counter
            metric_name         mysql_connection_error
            metric_description  MySql connection errors
            regex               log .*mysql connection error.*
            add_label           pod $kubernetes['pod_name']
            add_label           namespace $kubernetes['namespace_name']
            add_label           container $kubernetes['container_name']

        [FILTER]
            name                log_to_metrics
            match               *
            tag                 log_metrics
            metric_mode         counter
            metric_name         dns_error
            metric_description  DNS Resolution errors
            regex               log .*dns error.*
            add_label           pod $kubernetes['pod_name']
            add_label           namespace $kubernetes['namespace_name']
            add_label           container $kubernetes['container_name']

        [FILTER]
            Name                log_to_metrics
            Match               dummy.alive
            Metric_Name         fluentbit_keepalive
            Metric_Description  Dummy metric to keep /metrics available
            Metric_Mode         counter
            Tag                 log_metrics
            Regex               log .*keepalive.*
            # Process and flush metrics every 10 seconds
            Flush_Interval_Sec  10

      outputs: |
        [OUTPUT]
            Name   prometheus_exporter
            Match  log_metrics

        [OUTPUT]
            Name   stdout
            Match  log_metrics

    # export metrics
    metrics:
      enabled: true

    extraPorts:
      - name: metrics
        targetPort: metrics
        protocol: TCP
        port: 2021
        containerPort: 2021

    serviceMonitor:
      enabled: true
      additionalEndpoints:
        - port: metrics
          path: /metrics
          honorLabels: true  # important - keep the original labels on the metrics (pod, namespace, container)

.. note::

    By default, the ``log_to_metrics`` filter adds the ``log_metric_counter_`` prefix to every metric name.
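
For illustration, here is roughly what one of the exported metrics looks like on the Fluent Bit ``/metrics`` endpoint (the pod name below is a made-up example):

.. code-block:: text

    # HELP log_metric_counter_mysql_connection_error MySql connection errors
    # TYPE log_metric_counter_mysql_connection_error counter
    log_metric_counter_mysql_connection_error{namespace="log-triggers",pod="postlog1-7d4b9c6f5-abcde",container="postlog1"} 1
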
.. raw:: html

    <details>
    <summary><strong>Understanding the Configuration</strong></summary>
    <ul>
    <li>The <code>tail</code> INPUT section reads all Kubernetes container logs</li>
    <li>The <code>dummy</code> INPUT section defines a keepalive input - it's required so that at least one metric is always active</li>
    <li>The <code>kubernetes</code> FILTER section adds the Kubernetes labels to the log lines</li>
    <li>The 1st <code>log_to_metrics</code> FILTER matches any log line containing "mysql connection error", increases the <code>mysql_connection_error</code> counter, and adds the pod labels to the metric</li>
    <li>The 2nd <code>log_to_metrics</code> FILTER matches any log line containing "dns error", increases the <code>dns_error</code> counter, and adds the pod labels to the metric</li>
    <li>The 3rd <code>log_to_metrics</code> FILTER creates the keepalive metric</li>
    <li>The <code>prometheus_exporter</code> OUTPUT exposes the Prometheus metrics</li>
    <li>The <code>stdout</code> OUTPUT is used for debugging - it prints the metrics to the Fluent Bit pod logs and is not required for production deployments</li>
    </ul>
    </details>

Let's deploy the Fluent Bit DaemonSet:

.. code-block:: bash

    helm repo add fluent https://fluent.github.io/helm-charts && helm repo update
    helm install metrics-fluent-bit fluent/fluent-bit -f ./fluentbit-values.yaml

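Before wiring up Prometheus, you can sanity-check the exporter directly. A minimal check, assuming the chart's standard labels and a service named ``metrics-fluent-bit`` exposing port 2021 (names can vary with your release name):

.. code-block:: bash

    # confirm a Fluent Bit pod is running on each node
    kubectl get pods -l app.kubernetes.io/instance=metrics-fluent-bit

    # port-forward the metrics port and look for the exported counters
    kubectl port-forward svc/metrics-fluent-bit 2021:2021 &
    curl -s http://localhost:2021/metrics | grep log_metric_counter
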
**Step 3: Configure Prometheus**
----------------------------------------------------

In this step, we will configure Prometheus to:

1. **Collect metrics from Fluent Bit** via a ``ServiceMonitor``
2. **Configure an alert** based on the metrics extracted from the logs

1. **Configure Prometheus to read the new ServiceMonitor**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Assuming you're using Robusta's bundled Prometheus, add this to your ``generated_values.yaml``:

.. code-block:: yaml

    kube-prometheus-stack:
      prometheus:
        prometheusSpec:
          serviceMonitorSelectorNilUsesHelmValues: false

This ensures that Prometheus reads all the ServiceMonitors defined in the cluster, not just those installed by the same Helm release (which is the default behavior).

To apply it, upgrade with Helm:

.. code-block:: bash

    helm upgrade robusta robusta/robusta -f generated_values.yaml --set clusterName=YOUR_CLUSTER

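You can verify that Prometheus picked up the new target from its web UI. A quick check, assuming the bundled Prometheus service is named ``robusta-kube-prometheus-st-prometheus`` (the name may differ depending on your release name):

.. code-block:: bash

    # port-forward the Prometheus UI
    kubectl port-forward svc/robusta-kube-prometheus-st-prometheus 9090:9090 &
    # then open http://localhost:9090/targets in a browser and look for
    # the metrics-fluent-bit ServiceMonitor endpoints
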

2. **Configure an Alert**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This alerting rule fires a ``MySqlConnectionErrors`` alert when MySQL connection errors are detected in the logs (``mysql-alert.yaml``):

.. code-block:: yaml

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        release: robusta
      name: log-alerting-rule
      namespace: log-triggers
    spec:
      groups:
      - name: log-alerting
        rules:
        - alert: MySqlConnectionErrors
          annotations:
            description: 'Pod {{$labels.namespace}}/{{$labels.pod}} logs had {{ printf "%.0f" $value }} MySql connection errors'
            summary: Increase in MySql connection errors in the pod logs
          expr: increase(log_metric_counter_mysql_connection_error[5m]) > 1
          for: 1m
          labels:
            severity: critical

.. note::

    - This alert will fire starting from the 2nd time the log line appears. Catching the first occurrence is not possible due to how Fluent Bit works: it only creates the metric after the log line appears at least once.
    - The label ``release: robusta`` is required for Robusta's Prometheus to read this alerting rule. Make sure the release name matches the name of your Robusta release.

To apply it, run:

.. code-block:: bash

    kubectl apply -f mysql-alert.yaml

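To confirm the rule object was created, you can list the ``PrometheusRule`` objects in the namespace with standard ``kubectl`` commands:

.. code-block:: bash

    # the log-alerting-rule object should appear in the list
    kubectl get prometheusrules -n log-triggers
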

**Step 4: Adding a Robusta playbook**
----------------------------------------------------

Now we'd like to configure an automated action that will run each time this alert fires. For that, we'll use a Robusta ``playbook``.

This is the playbook we're going to use (add it as a ``customPlaybooks`` section in your ``generated_values.yaml`` file):

.. code-block:: yaml

    customPlaybooks:
    - triggers:
      - on_prometheus_alert:
          alert_name: MySqlConnectionErrors  # Run whenever the MySqlConnectionErrors alert starts firing
      actions:
      - logs_enricher: {}  # Add the pod logs to the alert notification
      - delete_pod: {}  # Delete (restart) the pod the alert was fired on
      - template_enricher:  # Add a note to the alert notification that the pod was restarted
          template: "**Automated Action**: Pod **${namespace}/${name}** restarted due to MySQL connection errors"

To apply it, upgrade with Helm:

.. code-block:: bash

    helm upgrade robusta robusta/robusta -f generated_values.yaml --set clusterName=YOUR_CLUSTER

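If you want to confirm the new playbook was loaded, one option is to check the Robusta runner logs after the upgrade. A rough check, assuming the runner deployment is named ``robusta-runner`` and runs in the namespace where you installed Robusta (adjust ``-n`` accordingly):

.. code-block:: bash

    # the runner reloads its configuration after the Helm upgrade;
    # look for log lines mentioning playbook loading
    kubectl logs deploy/robusta-runner -n default | grep -i playbook
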

**Step 5: See It in Action**
----------------------------------------------------

Let's test the full automation pipeline by generating a log line that simulates a MySQL connection error.

1. **Deploy a demo pod**

Use this manifest to deploy a demo pod that prints whatever is sent to its API to the logs (``postlog.yaml``):

.. code-block:: yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: postlog1
      namespace: log-triggers
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: postlog1
      template:
        metadata:
          labels:
            app: postlog1
        spec:
          containers:
          - name: postlog1
            image: me-west1-docker.pkg.dev/robusta-development/development/postlog:2.0
            ports:
            - containerPort: 8000
            resources:
              requests:
                memory: "128Mi"
                cpu: "50m"
              limits:
                memory: "256Mi"
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: postlog1
      namespace: log-triggers
    spec:
      selector:
        app: postlog1
      ports:
      - port: 80
        targetPort: 8000
      type: ClusterIP

Apply it to your cluster:

.. code-block:: bash

    kubectl apply -f postlog.yaml


2. **Generate MySQL errors in the logs**

Call the pod's API to print some simulated MySQL errors.

Since the metric has no initial value, we'll call it twice to simulate an increase. First, with 1 log line:

.. code-block:: bash

    kubectl run curl --rm -it --image=curlimages/curl --restart=Never -- \
      curl -X POST http://postlog1.log-triggers.svc.cluster.local/api/log \
      -H "Content-Type: application/json" \
      -d '{"content": "mysql connection error", "count": 1}'

Then, after 60 seconds, with 10 log lines:

.. code-block:: bash

    kubectl run curl --rm -it --image=curlimages/curl --restart=Never -- \
      curl -X POST http://postlog1.log-triggers.svc.cluster.local/api/log \
      -H "Content-Type: application/json" \
      -d '{"content": "mysql connection error", "count": 10}'

This will produce 10 log lines containing the error. Fluent Bit will match the log lines and emit metrics, which Prometheus will collect.
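
Before the alert fires, you can watch the counter rise by running the same query the alerting rule uses. A quick way, assuming the Prometheus port-forward from Step 3 is still running:

.. code-block:: bash

    # evaluate the alert expression against the Prometheus HTTP API
    curl -s 'http://localhost:9090/api/v1/query' \
      --data-urlencode 'query=increase(log_metric_counter_mysql_connection_error[5m])'
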

3. **Trigger the alert and observe the automation**

Wait a few minutes (typically up to 5) for the alert to fire. This delay is due to the ``for`` condition in the alert and Alertmanager's ``group_interval``.

Once the alert fires, you'll see the ``MySqlConnectionErrors`` alert in:

- The Robusta UI (if installed)
- Slack, Microsoft Teams, or your configured destination

.. image:: /images/mysql-connection-error-alert.png
    :alt: Robusta alert screenshot
    :class: with-shadow
    :width: 700px
    :height: 700px

You should also see that the ``postlog1`` pod was restarted:

.. image:: /images/postlog-pod-restart.png
    :alt: Automated pod restart
    :class: with-shadow
    :width: 500px
    :height: 500px

.. note::

    This example used a restart-pod automation, but you can replace it with any other Robusta action, such as creating a Jira ticket, scaling a deployment, or notifying a human.

🎉 That's it! You've now built a full pipeline that watches logs, turns them into alerts, and takes automated action.
