|
89 | 89 | key: password |
90 | 90 | ``` |
91 | 91 |
|
| 92 | +## Log-Based Alerts |
| 93 | +
|
| 94 | +Once log aggregation is wired up, you can build alerts directly from your log stream using Plural's log-based monitors. A monitor is a recurring, server-side log query that runs on a cron schedule, buckets results over a lookback window, and fires a Plural alert whenever an aggregate (max, min, or avg) of the bucketed counts crosses a threshold you define. Because monitors run as part of the Plural control plane, they work across clusters out of the box and reuse the same ElasticSearch (or other) backend you've already configured for log search. |
| 95 | +
|
| 96 | +### How it works |
| 97 | +
|
| 98 | +Each monitor is attached to a Service Deployment and is composed of three pieces: |
| 99 | +
|
| 100 | +* **Log query** -- the query string, lookback `duration` (e.g. `1h`, `30m`), `bucketSize` used to aggregate matches over time (e.g. `5m`), an `operator` (`AND` / `OR`) for multi-term queries, and an optional list of key/value `facets` (for example `namespace=payments` or a pod label) layered on top of the query. |
| 101 | +* **Threshold** -- an `aggregate` function (`max`, `min`, or `avg`) applied across the bucketed counts and a numeric `value` the aggregate must cross to trigger the alert. |
| 102 | +* **Schedule** -- a standard cron expression (for example `*/5 * * * *` or `@daily`) that controls how often the monitor is re-evaluated. |
| 103 | + |
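To make the three pieces concrete, here is a minimal Python sketch of how a monitor's settings fit together. The class names (`Monitor`, `LogQuery`, `Threshold`), the `parse_duration` helper, and the example values are hypothetical and only mirror the fields described above; this is not Plural's actual schema or API.

```python
from dataclasses import dataclass, field

# Assumption: only the `m` and `h` suffixes shown in the docs are handled here.
UNIT_SECONDS = {"m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a string such as '30m' or '1h' into seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


@dataclass
class LogQuery:
    query: str
    duration: str = "1h"        # lookback window
    bucket_size: str = "5m"     # `bucketSize` in the docs
    operator: str = "AND"       # how multi-term queries are combined
    facets: dict = field(default_factory=dict)  # e.g. {"namespace": "payments"}

    def bucket_count(self) -> int:
        """Number of buckets the lookback window is split into (rounded up)."""
        total = parse_duration(self.duration)
        size = parse_duration(self.bucket_size)
        return -(-total // size)  # ceiling division


@dataclass
class Threshold:
    aggregate: str  # "max", "min", or "avg"
    value: float


@dataclass
class Monitor:
    name: str
    log_query: LogQuery
    threshold: Threshold
    schedule: str   # cron expression, e.g. "*/5 * * * *"


# Hypothetical example values.
monitor = Monitor(
    name="payments-errors",
    log_query=LogQuery(query="error timeout", facets={"namespace": "payments"}),
    threshold=Threshold(aggregate="max", value=10),
    schedule="*/5 * * * *",
)
```

With the defaults above, a `1h` lookback and a `5m` bucket size give twelve buckets per evaluation.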
| 104 | +On every tick, Plural runs the log query against your configured logging driver, builds a vector of per-bucket counts, and aggregates that vector into a single number. If that number crosses the threshold, the monitor transitions to `firing` and a Plural alert is created (or updated); when a later evaluation comes back under the threshold, the alert is automatically marked `resolved`. Alerts produced this way are first-class `Alert` objects in the Plural Console: they show up alongside Datadog/Grafana alerts and can be routed through your existing notification routers, attached to AI insights, and surfaced in service dashboards. |
| 105 | + |
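The evaluation step described above can be sketched in a few lines of Python. This is an illustration of the bucket-then-aggregate idea only, not Plural's implementation: the function names are hypothetical, and treating "crosses" as strictly greater than the threshold is an assumption.

```python
from statistics import mean

AGGREGATES = {"max": max, "min": min, "avg": mean}


def bucket_counts(timestamps, start, duration_s, bucket_s):
    """Count matching log lines per bucket across the lookback window."""
    buckets = -(-duration_s // bucket_s)  # ceiling division
    counts = [0] * buckets
    for ts in timestamps:
        offset = ts - start
        if 0 <= offset < duration_s:
            counts[int(offset // bucket_s)] += 1
    return counts


def evaluate(counts, aggregate, value):
    """Return 'firing' when the aggregated count exceeds the threshold.

    Assumption: the comparison is strictly greater-than.
    """
    if not counts:
        return "resolved"
    observed = AGGREGATES[aggregate](counts)
    return "firing" if observed > value else "resolved"


# Six matching log lines (seconds since the window start) in a 1h window
# split into 5m buckets.
counts = bucket_counts([0, 10, 310, 320, 330, 3599], start=0,
                       duration_s=3600, bucket_s=300)
state_max = evaluate(counts, "max", 2)
state_avg = evaluate(counts, "avg", 2)
```

The same counts can fire under one aggregate and stay quiet under another: `max` reacts to a single busy bucket, while `avg` only moves when the whole window is noisy.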
| 106 | +### Creating a monitor in the UI |
| 107 | + |
| 108 | + |
| 109 | + |
| 110 | + |
| 111 | +Monitors are managed per-service under `Service` -> `Observability` -> `Monitors`. Click `Create Monitor` and the wizard walks through three steps: |
| 112 | + |
| 113 | +1. **Log query** -- pick a lookback duration and bucket size, type a query, and optionally pin facet filters from the same label picker used in the standard logs view. A live preview of matching log lines is rendered in the side panel so you can validate the query before saving. |
| 114 | + |
| 115 | + |
| 116 | + |
| 117 | +2. **Threshold config** -- enter the numeric threshold and choose the aggregate (`max` / `min` / `avg`) used to compare against it. |
| 118 | + |
| 119 | + |
| 120 | + |
| 121 | +3. **Description** -- give the monitor a name, an evaluation cron, an optional severity, and an optional alert template. |
| 122 | + |
| 123 | + |
| 124 | + |
| 125 | + |
| 126 | +The alert template is rendered with [Liquid](https://shopify.github.io/liquid/) and has access to the full monitor context, so you can interpolate dynamic values into the alert body, e.g.: |
| 127 | + |
| 128 | +```text |
| 129 | +Monitor {{ monitor.name }} is firing for {{ monitor.service.name }} on |
| 130 | +{{ monitor.service.cluster.name }} -- threshold {{ monitor.threshold.value }} |
| 131 | +({{ monitor.threshold.aggregate }}) was breached. |
| 132 | +``` |
| 133 | + |
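To show what the template above resolves against, here is a small Python stand-in that substitutes dotted `{{ ... }}` paths from a nested context. It is deliberately not a Liquid implementation (no filters, tags, or control flow), and the `render` helper and the context values are hypothetical; only the variable paths come from the template above.

```python
import re


def render(template: str, context: dict) -> str:
    """Replace each {{ dotted.path }} with the value found in `context`."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


# Hypothetical monitor context; the real one is supplied by Plural.
context = {
    "monitor": {
        "name": "payments-errors",
        "service": {"name": "payments", "cluster": {"name": "prod"}},
        "threshold": {"value": 10, "aggregate": "max"},
    }
}

template = (
    "Monitor {{ monitor.name }} is firing for {{ monitor.service.name }} on "
    "{{ monitor.service.cluster.name }} -- threshold {{ monitor.threshold.value }} "
    "({{ monitor.threshold.aggregate }}) was breached."
)

message = render(template, context)
```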
| 134 | +If no template is provided, Plural renders a default Markdown summary that includes the firing service, the threshold settings, and a JSON dump of the log query and the bucketed results that triggered the alert -- handy for triage and for the AI insight engine. |
| 135 | + |
| 136 | +### Routing and AI integration |
| 137 | + |
| 138 | +Because log-based alerts flow through the same `Alert` pipeline as third-party providers, they automatically benefit from the rest of the Plural observability stack: |
| 139 | + |
| 140 | +* **NotificationRouters** can fan them out to Slack, email, or any configured sink, with the same severity- and tag-based filters used for other alert sources. |
| 141 | +* **Alert resolutions** authored against firing monitors are vectorized into ElasticSearch (see below) and reused by Plural AI to suggest fixes the next time a similar monitor fires. |
| 142 | +* **AI Insights** can correlate the firing monitor with recent service logs, deployments, and pull requests to produce a Root Cause Analysis without you having to leave the alert view. |
| 143 | + |
| 144 | + |
| 145 | + |
| 146 | +This makes log-based monitors a particularly low-friction way to bootstrap alerting on a new service: write a query you'd run in the logs view anyway, set a threshold, and Plural handles scheduling, deduplication, notification, and AI-assisted triage from there. |
| 147 | + |
92 | 148 | ## ElasticSearch as a Vector Store |
93 | 149 |
|
94 | 150 | Beyond the single-pane-of-glass benefits, log data significantly enhances the dataset used by Plural AI, which is why we highly recommend enabling log aggregation for production deployments. We lean on ElasticSearch as the default log store because it's a broadly usable data store with recent support for vector search, enabling Plural AI to: |
|