:hide-toc:

Automate Responses to Application Logs
======================================

This tutorial walks you through building an automation that detects specific patterns in Kubernetes pod logs and responds automatically.

For example, we'll show how to restart a pod when it logs a database connection error.

To achieve this, we'll use:

- **Fluent Bit**: Monitors pod logs and converts matching log lines into Prometheus metrics.
- **Prometheus**: Stores the metrics and triggers alerts based on them. (We'll use Robusta's bundled Prometheus here, but any other Prometheus distribution works too.)
- **Robusta**: Executes an automated playbook when an alert fires, such as restarting the affected pod.

.. image:: /images/logs-to-metrics.png

Let's get started!

**Step 1: Create a namespace for the demo**
-------------------------------------------

.. code-block:: bash

    kubectl create namespace log-triggers
    kubectl config set-context --current --namespace log-triggers

**Step 2: Parse Logs into Metrics with Fluent Bit**
---------------------------------------------------

First, let's configure Fluent Bit to monitor your pod logs and generate Prometheus metrics for specific log patterns.

- We'll configure Fluent Bit as a DaemonSet, so there will be a pod on each Kubernetes node
- In the example below, we define two log matchers that create two different metrics
- We'll use the Prometheus exporter output to deliver the metrics to Prometheus

This is our Fluent Bit configuration (``fluentbit-values.yaml``):

.. code-block:: yaml

    config:
      service: |
        [SERVICE]
            Flush             1
            Daemon            Off
            Log_Level         info
            HTTP_Server       On
            HTTP_Listen       0.0.0.0
            HTTP_Port         2020

      inputs: |
        [INPUT]
            Name              tail
            Tag               kube.*
            Path              /var/log/containers/*.log
            Parser            json_message
            DB                /var/log/flb_kube.db
            Mem_Buf_Limit     5MB
            Skip_Long_Lines   On
            Refresh_Interval  10

        [INPUT]
            Name              dummy
            Tag               dummy.alive
            Dummy             {"log":"keepalive"}

      parsers: |
        [PARSER]
            Name              wrap_raw_line
            Format            regex
            Regex             ^(?<log>.*)$

      filters: |
        [FILTER]
            Name                kubernetes
            Match               kube.*
            K8S-Logging.Parser  On
            K8S-Logging.Exclude On

        [FILTER]
            Name                log_to_metrics
            Match               *
            Tag                 log_metrics
            Metric_Mode         counter
            Metric_Name         mysql_connection_error
            Metric_Description  MySql connection errors
            Regex               log .*mysql connection error.*
            Add_Label           pod $kubernetes['pod_name']
            Add_Label           namespace $kubernetes['namespace_name']
            Add_Label           container $kubernetes['container_name']

        [FILTER]
            Name                log_to_metrics
            Match               *
            Tag                 log_metrics
            Metric_Mode         counter
            Metric_Name         dns_error
            Metric_Description  DNS Resolution errors
            Regex               log .*dns error.*
            Add_Label           pod $kubernetes['pod_name']
            Add_Label           namespace $kubernetes['namespace_name']
            Add_Label           container $kubernetes['container_name']

        [FILTER]
            Name                log_to_metrics
            Match               dummy.alive
            Tag                 log_metrics
            Metric_Mode         counter
            Metric_Name         fluentbit_keepalive
            Metric_Description  Dummy metric to keep /metrics available
            Regex               log .*keepalive.*
            # Process and flush metrics every 10 seconds
            Flush_Interval_Sec  10

      outputs: |
        [OUTPUT]
            Name   prometheus_exporter
            Match  log_metrics

        [OUTPUT]
            Name   stdout
            Match  log_metrics

    # export metrics
    metrics:
      enabled: true

    extraPorts:
      - name: metrics
        targetPort: metrics
        protocol: TCP
        port: 2021
        containerPort: 2021

    serviceMonitor:
      enabled: true
      additionalEndpoints:
        - port: metrics
          path: /metrics
          honorLabels: true  # important - keep the original labels on the metrics (pod, namespace, container)

.. note::

    By default, the ``log_to_metrics`` filter adds the ``log_metric_counter_`` prefix to every metric.

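For example, the first matcher above will be exposed under the prefixed name ``log_metric_counter_mysql_connection_error``. An illustrative sample of what the exporter emits (label values depend on your workloads):

.. code-block:: text

    log_metric_counter_mysql_connection_error{namespace="default", pod="my-app-1", container="app"} 3
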
.. raw:: html

   <details>
   <summary><strong>Understanding the Configuration</strong></summary>
   <ul>
   <li>The <code>tail</code> INPUT section defines all Kubernetes container logs as input</li>
   <li>The <code>dummy</code> INPUT section defines a keepalive input - it's required to create at least 1 active metric</li>
   <li>The <code>kubernetes</code> FILTER section adds the Kubernetes labels to the log lines</li>
   <li>The 1st <code>log_to_metrics</code> FILTER matches any log line containing "mysql connection error", increments the <code>mysql_connection_error</code> counter, and adds the pod labels to the metric</li>
   <li>The 2nd <code>log_to_metrics</code> FILTER matches any log line containing "dns error", increments the <code>dns_error</code> counter, and adds the pod labels to the metric</li>
   <li>The 3rd <code>log_to_metrics</code> FILTER is for the keepalive metric</li>
   <li>The <code>prometheus_exporter</code> OUTPUT exports the Prometheus metrics</li>
   <li>The <code>stdout</code> OUTPUT is used for debugging. It prints the metrics to the Fluent Bit pod logs. Not required for production deployments</li>
   </ul>
   </details>

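Before deploying, you can sanity-check a matcher pattern locally. The ``Regex`` values are ordinary regular expressions applied to the ``log`` field, so ``grep -E`` approximates what Fluent Bit will match (a local sketch, not an exact replica of Fluent Bit's regex engine):

.. code-block:: bash

    # Two sample log lines - only the first matches the mysql matcher
    printf '%s\n' \
      "2025-06-01 12:00:01 app: mysql connection error (code 2003)" \
      "2025-06-01 12:00:02 app: request served in 12ms" \
      | grep -E '.*mysql connection error.*'
    # prints only the first line
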
Let's deploy the Fluent Bit DaemonSet:

.. code-block:: bash

    helm repo add fluent https://fluent.github.io/helm-charts && helm repo update
    helm install metrics-fluent-bit fluent/fluent-bit -f ./fluentbit-values.yaml

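Once the DaemonSet is running, you can confirm metrics are exposed before wiring up Prometheus. Assuming the Helm release created a service named ``metrics-fluent-bit`` (the name may differ in your setup), port-forward it and query the metrics endpoint:

.. code-block:: bash

    # Forward the metrics port locally (2021, as configured in extraPorts)
    kubectl port-forward svc/metrics-fluent-bit 2021:2021 &
    # The keepalive counter should appear even before any real log line matches
    curl -s http://localhost:2021/metrics | grep log_metric_counter
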
**Step 3: Configure Prometheus**
--------------------------------

In this step, we will configure Prometheus to:

1. **Collect metrics from Fluent Bit** via a ``ServiceMonitor``
2. **Configure an alert** based on the metrics extracted from the logs

1. **Configure Prometheus to read the new ServiceMonitor**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Assuming you're using Robusta's bundled Prometheus, add this to your ``generated_values.yaml``:

.. code-block:: yaml

    kube-prometheus-stack:
      prometheus:
        prometheusSpec:
          serviceMonitorSelectorNilUsesHelmValues: false

This ensures that Prometheus reads all the ServiceMonitors defined in the cluster, not just those installed by the same Helm release (which is the default behavior).

To apply it, upgrade with Helm:

.. code-block:: bash

    helm upgrade robusta robusta/robusta -f generated_values.yaml --set clusterName=YOUR_CLUSTER

2. **Configure an Alert**
~~~~~~~~~~~~~~~~~~~~~~~~~

This alerting rule fires a ``MySqlConnectionErrors`` alert when MySQL connection errors are detected in the logs (``mysql-alert.yaml``):

.. code-block:: yaml

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        release: robusta
      name: log-alerting-rule
      namespace: log-triggers
    spec:
      groups:
      - name: log-alerting
        rules:
        - alert: MySqlConnectionErrors
          annotations:
            description: 'Pod {{$labels.namespace}}/{{$labels.pod}} logs had {{ printf "%.0f" $value }} MySql connection errors'
            summary: Increase in MySql connection errors in the pod logs
          expr: increase(log_metric_counter_mysql_connection_error[5m]) > 1
          for: 1m
          labels:
            severity: critical

.. note::

    - This alert will fire starting from the 2nd time the log line appears - catching the first occurrence is not possible due to how Fluent Bit works (it only creates the metric after the log line appears at least once)
    - The label ``release: robusta`` is required for Robusta's Prometheus to read this alerting rule. Make sure the release name matches the name of your Robusta release

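To see why the threshold is ``> 1``, consider what ``increase()`` computes. Roughly speaking (the real function also extrapolates to the window boundaries), it is the difference between the counter's last and first values inside the 5-minute window:

.. code-block:: bash

    # Simplified sketch of the alert condition:
    # the counter was 1 at the start of the window and 11 at the end
    start=1
    end=11
    increase=$((end - start))   # ~10 matching log lines in 5 minutes
    if [ "$increase" -gt 1 ]; then
      echo "MySqlConnectionErrors would fire"
    fi
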
To apply it, run:

.. code-block:: bash

    kubectl apply -f mysql-alert.yaml

**Step 4: Adding a Robusta playbook**
-------------------------------------

Now we'd like to configure an automated action that runs each time this alert fires.
For that, we'll use a Robusta ``playbook``.

This is the playbook we're going to use (add it as a ``customPlaybooks`` section in your ``generated_values.yaml`` file):

.. code-block:: yaml

    customPlaybooks:
    - triggers:
      - on_prometheus_alert:
          alert_name: MySqlConnectionErrors  # Run whenever the MySqlConnectionErrors alert starts firing
      actions:
      - logs_enricher: {}  # Add the pod logs to the alert notification
      - delete_pod: {}  # Delete (restart) the pod the alert was fired on
      - template_enricher:  # Add a note to the alert notification that the pod was restarted
          template: "**Automated Action**: Pod **${namespace}/${name}** restarted due to MySQL connection errors"

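The same pattern extends to the ``dns_error`` metric from Step 2. As a sketch, a hypothetical ``DnsErrors`` alert (you would need to add a matching PrometheusRule on ``log_metric_counter_dns_error``, which this tutorial doesn't define) could notify without restarting anything:

.. code-block:: yaml

    customPlaybooks:
    - triggers:
      - on_prometheus_alert:
          alert_name: DnsErrors  # hypothetical alert on log_metric_counter_dns_error
      actions:
      - logs_enricher: {}  # Attach the pod logs, but take no corrective action
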
To apply it, upgrade with Helm:

.. code-block:: bash

    helm upgrade robusta robusta/robusta -f generated_values.yaml --set clusterName=YOUR_CLUSTER

**Step 5: See It in Action**
----------------------------

Let's test the full automation pipeline by generating a log line that simulates a MySQL connection error.

1. **Deploy a demo pod**

Use this manifest to deploy a demo pod that prints to its logs whatever is sent to its API (``postlog.yaml``):

.. code-block:: yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: postlog1
      namespace: log-triggers
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: postlog1
      template:
        metadata:
          labels:
            app: postlog1
        spec:
          containers:
          - name: postlog1
            image: me-west1-docker.pkg.dev/robusta-development/development/postlog:2.0
            ports:
            - containerPort: 8000
            resources:
              requests:
                memory: "128Mi"
                cpu: "50m"
              limits:
                memory: "256Mi"
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: postlog1
      namespace: log-triggers
    spec:
      selector:
        app: postlog1
      ports:
      - port: 80
        targetPort: 8000
      type: ClusterIP

Apply it to your cluster:

.. code-block:: bash

    kubectl apply -f postlog.yaml

2. **Generate MySQL errors in the logs**

Call the pod's API to print some simulated MySQL errors.

Since the metric has no initial value, we'll call it twice to simulate an increase.
First time with 1 log line:

.. code-block:: bash

    kubectl run curl --rm -it --image=curlimages/curl --restart=Never -- \
      curl -X POST http://postlog1.log-triggers.svc.cluster.local/api/log \
      -H "Content-Type: application/json" \
      -d '{"content": "mysql connection error", "count": 1}'

Then, after 60 seconds, with 10 log lines:

.. code-block:: bash

    kubectl run curl --rm -it --image=curlimages/curl --restart=Never -- \
      curl -X POST http://postlog1.log-triggers.svc.cluster.local/api/log \
      -H "Content-Type: application/json" \
      -d '{"content": "mysql connection error", "count": 10}'

This will produce 10 log lines containing the error. Fluent Bit will match the log lines and emit metrics, which Prometheus will collect.

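To watch the counter rise between the two calls, you can query Prometheus directly. Assuming Robusta's bundled Prometheus with its default service name (``robusta-kube-prometheus-st-prometheus``; adjust the service name and namespace if yours differ):

.. code-block:: bash

    # Forward the Prometheus port locally
    kubectl port-forward svc/robusta-kube-prometheus-st-prometheus 9090:9090 -n YOUR_ROBUSTA_NAMESPACE &
    # Query the counter generated from the logs
    curl -s 'http://localhost:9090/api/v1/query?query=log_metric_counter_mysql_connection_error'
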
3. **Trigger the alert and observe the automation**

Wait a few minutes (typically up to 5) for the alert to fire. This delay is due to the ``for`` condition in the alert and Alertmanager's ``group_interval``.

Once the alert fires, you'll see the ``MySqlConnectionErrors`` alert in:

- The Robusta UI (if installed)
- Slack, Microsoft Teams, or your configured destination

.. image:: /images/mysql-connection-error-alert.png
   :alt: Robusta alert screenshot
   :class: with-shadow
   :width: 700px

You should also see that the ``postlog1`` pod was restarted:

.. image:: /images/postlog-pod-restart.png
   :alt: Automated pod restart
   :class: with-shadow
   :width: 500px

.. note::

    This example used a restart-pod automation, but you can replace it with any other action in Robusta, such as creating a Jira ticket, scaling a deployment, or notifying a human.

🎉 That's it! You've now built a full pipeline that watches logs, turns them into alerts, and takes automated action.