Commit 54c14ce

feat(php-fpm-status)!: multi-pool support and rework of alerting
`--url` can now be specified multiple times to check several pools on the same host in one plugin run; the overall state is the worst of all pools. Basic auth is supported per URL via embedded userinfo (`http://user:pw@host/path`), delegated to `lib.url.split_basic_auth` from linuxfabrik-lib 3.2.0.

All perfdata labels are now prefixed with the pool name in `<pool>_snake_case` form, matching the procs / disk-io convention now documented in CONTRIBUTING.md. The wrongly named `queue usage` metric has been renamed to `saturation` and now correctly reports `active / total worker processes` instead of the socket backlog fill rate. The cumulative counters `accepted conn` and `slow requests` have been replaced by `<pool>_accepted_conn_rate` (req/s) and `<pool>_slow_requests_delta` (new since last run), following the "no continuous counters" rule in CONTRIBUTING.md and making the slow-request thresholds actually actionable. The `max children reached` metric and its implicit WARN have been removed and will return later with delta semantics.

The new `--severity warn|crit` parameter controls the state reported for unreachable pools (default `warn`). `--lengthy` now defaults to off; when set, it also includes idle workers in the process table so the table is always populated, even on quiet pools. Process-table formatting has been made friendlier for narrow Icinga views: request URIs are stripped of their query strings (just `?` is kept as a visual cue), script paths are abbreviated by reducing every directory component to its first character (`/usr/share/icingaweb2/public/index.php` → `/u/s/i/p/index.php`), request durations are shown at millisecond precision (no µs noise), and worker start times drop seconds and the trailing "ago".

CONTRIBUTING.md documents the snake_case perfdata label preference with the `re.sub(r'\W+', '_', name)` sanitizer recipe for per-instance prefixes. Unit tests run against real PHP-FPM status JSON captured from UBI8/PHP 7.2, UBI9/PHP 8.0, Debian 12/PHP 8.2, and Ubuntu 24.04/PHP 8.3 containers.

BREAKING CHANGE: perfdata labels renamed to `<pool>_snake_case`; Grafana dashboards using this plugin need updating (the bundled dashboard has been rewritten to use a `$pool` template variable).
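The two process-table abbreviations described in the commit message (query-string stripping and first-character directory components) can be illustrated with a minimal Python sketch. The function names `strip_query` and `abbreviate_path` are hypothetical and are not taken from the plugin source; only the input/output pair for the path example comes from the commit message.

```python
def strip_query(uri):
    """Drop the query string but keep a bare '?' as a visual cue."""
    path, sep, _query = uri.partition('?')
    return path + sep


def abbreviate_path(path):
    """Reduce every directory component to its first character.

    The file name (last component) is kept in full.
    """
    head, sep, filename = path.rpartition('/')
    if not sep:
        # no directory part at all, nothing to abbreviate
        return path
    short = [part[0] if part else '' for part in head.split('/')]
    return '/'.join(short) + '/' + filename


print(abbreviate_path('/usr/share/icingaweb2/public/index.php'))  # /u/s/i/p/index.php
print(strip_query('/index.php?module=foo&id=42'))                 # /index.php?
```

A URI without a query string passes through `strip_query` unchanged, so the trailing `?` only appears where something was actually removed.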
1 parent e7c00d1 commit 54c14ce

23 files changed, +1359 -365 lines changed

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ Monitoring Plugins:
1818

1919
* haproxy-status: `--username` and `--password` were removed in favour of HTTP basic auth embedded in the URL itself, e.g. `--url=https://stats:s3cret@webserver:8443/server-status`. The plugin strips the credentials from the netloc before the request is sent and instead carries them in an `Authorization: Basic` header so they never reach the request line or any proxy access log. Existing Icinga services that still pass `--username` / `--password` will hard-UNKNOWN with a clear migration pointer; migrate the URL field to include the credentials and drop the username/password fields from the service template
2020
* mailq: the alerting semantic has flipped from "number of mails in the queue" to "age of the oldest mail in the queue". `--warning` and `--critical` now take a duration string with a unit suffix (`1h`, `3D`, `30m`, `72h`, ...) and the defaults are `1h` / `3D` (down from `2` / `250` mails). The rationale is that a queue with 100 fresh mails is still OK when they are delivered within minutes, while a single mail stuck for more than an hour is always interesting and is exactly when an admin wants to look. The mail count stays in perfdata (`mailq`) so Grafana trending keeps working, and the new `oldest_mail_age` perfdata metric carries the age in seconds. Existing Icinga services that set `mailq_warning=10` and `mailq_critical=500` need to be migrated to duration strings (e.g. `mailq_warning=1h`, `mailq_critical=3D`). Also adds `--mta=auto|postfix|exim|sendmail` to override MTA autodetection: Postfix is now read via `postqueue -j` (JSON with `arrival_time` as Unix epoch) for a rock-solid timestamp, Exim still uses `mailq` (= `exim -bp`) and its built-in age literals, and everything else falls back to `mailq` with `Date:` line parsing ([#781](https://github.com/Linuxfabrik/monitoring-plugins/issues/781))
21-
* php-fpm-status: multi-pool support plus full rework of the alerting and perfdata semantics. `--url` can now be specified multiple times to check several pools on the same host in a single plugin run; the overall state is the worst of all pools, and the summary line calls out which pool is in trouble. Basic authentication can be embedded in the URL as `http://user:password@host/pool-status`, with the credentials stripped from the request URL and sent via an `Authorization` header. All perfdata labels are now prefixed with the pool name (for example `nextcloud saturation` instead of `saturation`), including in the single-pool case, so existing Grafana panels and InfluxDB queries need to be updated; the bundled Grafana dashboard has been rewritten to use a `$pool` template variable that lets the admin switch between pools. The wrongly-named `queue usage` metric has been renamed to `saturation` and now correctly reports the percentage of busy worker processes (`active_processes / total_processes`) instead of the socket backlog fill rate, which was never "pool near capacity". The cumulative counters `accepted conn` and `slow requests` have been replaced by the computed `accepted conn rate` (requests per second since the previous run) and `slow requests delta` (new slow requests since the previous run), which follow the "no continuous counters" rule in CONTRIBUTING and make the `--warning-slowreq` / `--critical-slowreq` thresholds actually actionable again (previously they alerted permanently after the first historical slow request, until the next FPM restart). The `max children reached` metric and its implicit WARN have been removed and will return with delta-based semantics in a later release. A new `--severity warn|crit` parameter controls the state reported for pools that are unreachable or whose status JSON cannot be parsed; the default is `warn`. 
The `--lengthy` flag now correctly defaults to off (matching the project convention) and adds extended columns to both the pool overview table and the per-process table. The plugin uses a local SQLite cache (`linuxfabrik-monitoring-plugins-php-fpm-status.db`) to compute deltas between runs; the first run after install, and the first run after an FPM restart, report OK for the delta metrics with a `baseline captured, waiting for more data` note while the saturation metric is alerted normally. Unit tests now run against real PHP-FPM status JSON captured from four container platforms (UBI8 + PHP 7.2, UBI9 + PHP 8.0, Debian 12 + PHP 8.2, Ubuntu 24.04 + PHP 8.3)
21+
* php-fpm-status: multi-pool support plus full rework of the alerting and perfdata semantics. `--url` can now be specified multiple times to check several pools on the same host in a single plugin run; the overall state is the worst of all pools, and the summary line calls out which pool is in trouble. Basic authentication can be embedded in the URL as `http://user:password@host/pool-status`, with the credentials stripped from the request URL and sent via an `Authorization` header. All perfdata labels are now prefixed with the pool name in `<pool>_snake_case` form (for example `nextcloud_saturation`), including in the single-pool case, matching the `procs` / `disk-io` convention now documented in `CONTRIBUTING.md`. Existing Grafana panels and InfluxDB queries need to be updated; the bundled Grafana dashboard has been rewritten to use a `$pool` template variable that lets the admin switch between pools. The wrongly-named `queue usage` metric has been renamed to `saturation` and now correctly reports the percentage of busy worker processes (`active_processes / total_processes`) instead of the socket backlog fill rate, which was never "pool near capacity". The cumulative counters `accepted conn` and `slow requests` have been replaced by the computed `<pool>_accepted_conn_rate` (requests per second since the previous run) and `<pool>_slow_requests_delta` (new slow requests since the previous run), which follow the "no continuous counters" rule in CONTRIBUTING and make the `--warning-slowreq` / `--critical-slowreq` thresholds actually actionable again (previously they alerted permanently after the first historical slow request, until the next FPM restart). The `max children reached` metric and its implicit WARN have been removed and will return with delta-based semantics in a later release. A new `--severity warn|crit` parameter controls the state reported for pools that are unreachable or whose status JSON cannot be parsed; the default is `warn`. 
The `--lengthy` flag now correctly defaults to off (matching the project convention) and adds extended columns to both the pool overview table and the per-process table; in `--lengthy` it also includes idle workers in the process table so the table is always populated, even on quiet pools. The process table itself has been made friendlier for narrow Icinga views: request URIs are stripped of their query strings (just `?` is kept as a visual cue), script paths are abbreviated by reducing every directory component to its first character (`/usr/share/icingaweb2/public/index.php` → `/u/s/i/p/index.php`), request durations are shown at millisecond precision (no µs noise), and worker start times drop seconds and the trailing "ago". The plugin uses a local SQLite cache (`linuxfabrik-monitoring-plugins-php-fpm-status.db`) to compute deltas between runs; the first run after install, and the first run after an FPM restart, report OK for the delta metrics with a `baseline captured, waiting for more data` note while the saturation metric is alerted normally. Unit tests now run against real PHP-FPM status JSON captured from four container platforms (UBI8 + PHP 7.2, UBI9 + PHP 8.0, Debian 12 + PHP 8.2, Ubuntu 24.04 + PHP 8.3)
2222
* procs: `--argument`, `--command` and `--username` now use regular expressions instead of substring/startswith matching. Existing filters like `--command=httpd` still work but now match anywhere in the name. Use `--command='^httpd'` for the previous startswith behavior, or `--username='^apache$'` for exact matches.
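The difference between the old substring/startswith filters and the new regular-expression filters in `procs` can be seen in a small Python sketch. It assumes the plugin applies the pattern with search semantics (match anywhere), which is what the changelog entry implies; the helper name `matches` is hypothetical.

```python
import re


def matches(pattern, value):
    """Return True if the regular expression matches anywhere in value."""
    return re.search(pattern, value) is not None


# An unanchored pattern now matches anywhere in the name ...
print(matches('httpd', 'lighttpd'))      # True
# ... so anchor it to get the previous startswith behavior,
print(matches('^httpd', 'lighttpd'))     # False
print(matches('^httpd', 'httpd'))        # True
# and anchor both ends for an exact match.
print(matches('^apache$', 'apache2'))    # False
```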
2323
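Both haproxy-status and php-fpm-status now take credentials embedded in the URL and move them into an `Authorization` header. The following is a rough sketch of that idea using only the Python standard library. It is not the `lib.url.split_basic_auth` implementation from linuxfabrik-lib, whose exact signature and return value are not shown in this commit; the name `split_userinfo` and its return shape are assumptions.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_userinfo(url):
    """Return (url_without_credentials, headers) for a URL with userinfo.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    userinfo, sep, hostport = parts.netloc.rpartition('@')
    if not sep:
        # no credentials embedded, nothing to do
        return url, {}
    user, _, password = userinfo.partition(':')
    # userinfo may be percent-encoded inside a URL
    credentials = '{}:{}'.format(unquote(user), unquote(password))
    token = base64.b64encode(credentials.encode('utf-8')).decode('ascii')
    clean = urlunsplit(
        (parts.scheme, hostport, parts.path, parts.query, parts.fragment)
    )
    return clean, {'Authorization': 'Basic ' + token}


url, headers = split_userinfo('https://stats:s3cret@webserver:8443/server-status')
print(url)  # https://webserver:8443/server-status
```

Because the credentials are removed from the netloc before the request is built, they do not appear in the request line or in proxy access logs, which is the motivation given in the haproxy-status entry.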

2424
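The php-fpm-status entry replaces cumulative counters with a rate and a delta, and reports only a baseline on the first run or after an FPM restart. A minimal Python sketch of that logic follows. The dictionary keys are the field names of the PHP-FPM status JSON; the function names, the restart check (changed `start time` or a counter that went backwards), and passing the previous sample as a plain dict instead of reading it from the SQLite cache are all assumptions made for illustration.

```python
def delta_metrics(previous, current, elapsed_seconds):
    """Return rate and delta since the previous run, or None for a baseline run.

    previous and current are PHP-FPM status dicts; previous is None
    when no earlier sample exists.
    """
    if previous is None or elapsed_seconds <= 0:
        return None
    if current['start time'] != previous['start time']:
        # assumed restart indicator: the pool's start time changed
        return None
    accepted = current['accepted conn'] - previous['accepted conn']
    slow = current['slow requests'] - previous['slow requests']
    if accepted < 0 or slow < 0:
        # counters went backwards, treat as a restart as well
        return None
    return {
        'accepted_conn_rate': accepted / elapsed_seconds,  # requests per second
        'slow_requests_delta': slow,                       # new since last run
    }


def saturation_percent(current):
    """Percentage of busy workers: active_processes / total_processes."""
    total = current['total processes']
    if total <= 0:
        return 0.0
    return 100.0 * current['active processes'] / total
```

When `delta_metrics` returns `None`, a caller would report OK for the delta metrics with the baseline note, while `saturation_percent` can still be evaluated against its thresholds because it needs no history.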

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions
@@ -633,13 +633,13 @@ Format (space-separated label/value pairs):
633633
Rules:
634634

635635
* Labels may contain any characters except `=` (equals) and `'` (single quote).
636-
* Single quotes around the label are optional but required if the label contains spaces.
636+
* **Prefer `snake_case` labels.** Multi-word labels should use underscores, not spaces (`active_processes`, not `active processes`). Per-instance labels should prefix the instance name with an underscore (`<instance>_<metric>`), for example `sda_read_bytes`, `www_saturation`, `procs_cpu_percent`. This matches the convention used by `procs`, `disk-io`, and other per-instance plugins, makes Grafana regex and InfluxDB tag matching trivial, and avoids the need for single quotes around labels in the `STATUS_TEXT | perfdata` line. Sanitize pool/instance names with `re.sub(r'\W+', '_', name)` so exotic names like `my-app` become `my_app_<metric>`.
637+
* Single quotes around the label are optional but required if the label contains spaces. Prefer underscores over spaces so the quotes are never needed.
637638
* The first 19 characters of a label should be unique (RRD data source limitation).
638639
* `value`, `min`, and `max` must match the character class `[-0-9.]` and share the same UOM.
639640
* `warn` and `crit` use the range format (see [Threshold and Ranges](#threshold-and-ranges)).
640641
* `min` and `max` are not required for percentage (`%`) UOM.
641642
* Trailing unfilled semicolons may be dropped.
642-
* `label` doesn't need to be machine friendly, so `Pages scanned=100;;;;;` is as valuable as `pages-scanned=100;;;;;`.
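The sanitizer recipe from the new `snake_case` rule can be tried out with a short Python sketch. Only the `re.sub(r'\W+', '_', name)` call comes from CONTRIBUTING.md; the helper names `perfdata_label` and `perfdata_item` are hypothetical, and `perfdata_item` only builds the `label=value[UOM]` part without thresholds.

```python
import re


def perfdata_label(instance, metric):
    """Build an <instance>_<metric> label with a sanitized instance name."""
    prefix = re.sub(r'\W+', '_', instance)
    return '{}_{}'.format(prefix, metric)


def perfdata_item(label, value, uom=''):
    """Format a single perfdata item; no quotes needed for snake_case labels."""
    return '{}={}{}'.format(label, value, uom)


print(perfdata_label('my-app', 'saturation'))                        # my_app_saturation
print(perfdata_item(perfdata_label('www', 'saturation'), 25, '%'))   # www_saturation=25%
```

Since `\W+` collapses each run of non-word characters into a single underscore, a name like `my app.v2` becomes `my_app_v2`, so the resulting label never contains spaces, `=`, or single quotes.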
643643

644644
UOM suffixes:
645645
