
Commit 64aa4f7

fix(docker-stats,podman-stats): restore per-container CPU and memory perfdata (fix #1104)
v2026041002 replaced the per-container cpu_usage / mem_usage perfdata with aggregates ('containers_running', 'cpu' for docker; plus 'block_input', 'block_output', 'images', 'net_rx', 'net_tx', 'ram' for podman) on the grounds that container names come and go and bloat the time-series backend. That broke long-term trending of individual workloads, which is the primary use case for these checks.

Re-emit <container>_cpu_usage and <container>_mem_usage per running container alongside the aggregates. Names are still shortened via shorten() unless --full-name is passed, matching v2025022501 semantics.

Extend the unit-test assertions to pin the per-container perfdata labels so this can't regress silently again. Bump __version__ to 2026051201.
1 parent b467025 commit 64aa4f7

7 files changed

Lines changed: 89 additions & 10 deletions
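The naming rule from the commit message can be sketched as follows. `shorten_name()` is a hypothetical stand-in for the plugin's `shorten()`: it assumes that shortening only strips a trailing Docker Swarm task ID (25 lowercase alphanumeric characters), which reproduces the two `traefik_traefik.2...` unit-test cases but is not necessarily what `shorten()` actually does.

```python
import re

# Hypothetical stand-in for the plugin's shorten(): assume it only strips a
# trailing Swarm task ID segment such as '.1idw12p2yqpxutlzkcwign4at'.
_TASK_ID = re.compile(r'\.[a-z0-9]{25}$')


def shorten_name(name: str) -> str:
    """Drop a trailing '.<task-id>' segment, if present."""
    return _TASK_ID.sub('', name)


def perfdata_labels(name: str, full_name: bool = False) -> tuple[str, str]:
    """Return the per-container CPU and memory perfdata labels."""
    base = name if full_name else shorten_name(name)
    return f'{base}_cpu_usage', f'{base}_mem_usage'


if __name__ == '__main__':
    raw = 'traefik_traefik.2.1idw12p2yqpxutlzkcwign4at'
    print(perfdata_labels(raw))
    # ('traefik_traefik.2_cpu_usage', 'traefik_traefik.2_mem_usage')
    print(perfdata_labels(raw, full_name=True))
```

Names without a task ID suffix, such as `mongo` or the GitLab runner containers, pass through unchanged in this sketch.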


CHANGELOG.md

Lines changed: 5 additions & 0 deletions
```diff
@@ -85,6 +85,11 @@ Grafana:
 * `schemaVersion` fixed to `42`; Grafana 12 was failing to import the date-encoded value
 
 
+Monitoring Plugins:
+
+* docker-stats, podman-stats: per-container CPU and memory perfdata restored. The previous release reported only aggregate values, breaking long-term trending of individual containers ([#1104](https://github.com/Linuxfabrik/monitoring-plugins/issues/1104))
+
+
 ### Removed
 
 Monitoring Plugins:
```

check-plugins/docker-stats/README.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -108,8 +108,12 @@ myconti_ds_1 ! 0.0 ! 11.42
 
 ## Perfdata / Metrics
 
+The plugin emits one CPU and one memory metric per container so individual workloads can be plotted long-term. Because container names appear and disappear as workloads come and go, the time-series backend (Graphite, InfluxDB, ...) will keep stale entries until they are pruned.
+
 | Name | Type | Description |
 |----|----|----|
+| `<container>_cpu_usage` | Percentage | Per-container CPU usage, normalized by host CPU count. |
+| `<container>_mem_usage` | Percentage | Per-container memory usage, relative to the container memory limit or host memory. |
 | containers_running | Number | Number of running containers. |
 | cpu | Number | Number of host CPUs. |
 
```

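The stale-entry trade-off described in the README can be handled on the backend side. A minimal sketch, assuming the backend or a helper script records when each label was last written; `find_stale_labels()` and the seven-day cut-off are illustrative only and are not part of the plugin, Graphite or InfluxDB.

```python
from datetime import datetime, timedelta


def find_stale_labels(last_seen: dict[str, datetime],
                      now: datetime,
                      max_age: timedelta = timedelta(days=7)) -> list[str]:
    """Return labels (e.g. 'mongo_cpu_usage') not written within max_age."""
    return sorted(label for label, ts in last_seen.items() if now - ts > max_age)


if __name__ == '__main__':
    now = datetime(2026, 5, 12)
    last_seen = {
        'mongo_cpu_usage': now - timedelta(hours=1),
        'runner-7ayh6h5f-project-19-concurrent-0-99f0211c36d59d01-build_cpu_usage':
            now - timedelta(days=30),
    }
    print(find_stale_labels(last_seen, now))
```

Only the runner series is reported here, because its container has not produced data for 30 days; how the listed series are then deleted depends on the backend.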

check-plugins/docker-stats/docker-stats

Lines changed: 21 additions & 1 deletion
```diff
@@ -22,7 +22,7 @@ import lib.txt
 from lib.globals import STATE_CRIT, STATE_OK, STATE_UNKNOWN, STATE_WARN
 
 __author__ = 'Linuxfabrik GmbH, Zurich/Switzerland'
-__version__ = '2026041002'
+__version__ = '2026051201'
 
 DESCRIPTION = """Reports CPU and memory usage for all running Docker containers. CPU usage is
 normalized by dividing by the number of available host CPU cores. CPU alerts only
@@ -220,6 +220,26 @@ def main():
         cpu_usage = round(float(cpu_percent.replace('%', '').strip()) / host_cpus, 1)
         mem_usage = round(float(mem_percent.replace('%', '').strip()), 1)
 
+        # per-container perfdata for long-term trending of individual workloads
+        perfdata += lib.base.get_perfdata(
+            f'{name}_cpu_usage',
+            cpu_usage,
+            uom='%',
+            warn=args.WARN_CPU,
+            crit=args.CRIT_CPU,
+            _min=0,
+            _max=100,
+        )
+        perfdata += lib.base.get_perfdata(
+            f'{name}_mem_usage',
+            mem_usage,
+            uom='%',
+            warn=args.WARN_MEM,
+            crit=args.CRIT_MEM,
+            _min=0,
+            _max=100,
+        )
+
         # save trend data to local sqlite database, limited to "count" rows max.
         lib.base.coe(
             lib.db_sqlite.insert(
```

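Each `lib.base.get_perfdata()` call above contributes one perfdata item to the plugin output. The sketch below assumes the item follows the Nagios plugin guidelines format `'label'=value[UOM];[warn];[crit];[min];[max]`; `format_perfdata()` is a hypothetical helper, not the Linuxfabrik library function, the values come from the unit-test data, and the thresholds are made up.

```python
def format_perfdata(label, value, uom='', warn=None, crit=None, _min=None, _max=None):
    """Render one perfdata item; unset fields stay empty, trailing ';' trimmed."""
    fields = [warn, crit, _min, _max]
    tail = ';'.join('' if f is None else str(f) for f in fields).rstrip(';')
    item = f"'{label}'={value}{uom}"
    return f'{item};{tail} ' if tail else f'{item} '


if __name__ == '__main__':
    perfdata = ''
    # (name, normalized CPU %, memory %) taken from the unit-test data
    for name, cpu_usage, mem_usage in [('mongo', 0.3, 1.9), ('traefik_traefik.2', 0.0, 0.0)]:
        perfdata += format_perfdata(f'{name}_cpu_usage', cpu_usage, uom='%',
                                    warn=80, crit=90, _min=0, _max=100)
        perfdata += format_perfdata(f'{name}_mem_usage', mem_usage, uom='%',
                                    warn=90, crit=95, _min=0, _max=100)
    print(perfdata)
```

The unit tests only pin the `'label'=value%` prefix of each item, so they stay valid whatever thresholds and min/max fields follow.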

check-plugins/docker-stats/unit-test/run

Lines changed: 13 additions & 0 deletions
```diff
@@ -33,6 +33,9 @@ TESTS = [
             'Container ! CPU % ! Mem %',
             '------------------+-------+------',
             'traefik_traefik.2 ! 0.0 ! 0.0',
+            # per-container perfdata (https://github.com/Linuxfabrik/monitoring-plugins/issues/1104)
+            "'traefik_traefik.2_cpu_usage'=0.0%",
+            "'traefik_traefik.2_mem_usage'=0.0%",
         ],
     },
     {
@@ -45,6 +48,8 @@ TESTS = [
             'Container ! CPU % ! Mem %',
             '--------------------------------------------+-------+------',
             'traefik_traefik.2.1idw12p2yqpxutlzkcwign4at ! 0.0 ! 0.0',
+            "'traefik_traefik.2.1idw12p2yqpxutlzkcwign4at_cpu_usage'=0.0%",
+            "'traefik_traefik.2.1idw12p2yqpxutlzkcwign4at_mem_usage'=0.0%",
         ],
     },
     {
@@ -58,6 +63,12 @@ TESTS = [
             'elasticsearch ! 188.8 ! 16.7',
             'graylog ! 204.2 ! 5.7',
             'mongo ! 0.3 ! 1.9',
+            "'elasticsearch_cpu_usage'=188.8%",
+            "'elasticsearch_mem_usage'=16.7%",
+            "'graylog_cpu_usage'=204.2%",
+            "'graylog_mem_usage'=5.7%",
+            "'mongo_cpu_usage'=0.3%",
+            "'mongo_mem_usage'=1.9%",
         ],
     },
     {
@@ -71,6 +82,8 @@ TESTS = [
             'runner-7ayh6h5f-project-107-concurrent-0-37b2c7aee9359db9-build ! 95.0 ! 1.2',
             'runner-7ayh6h5f-project-19-concurrent-0-99f0211c36d59d01-build ! 59.5 ! 1.0',
             'runner-7ayh6h5f-project-49-concurrent-0-e180afe41fc754dc-predefined ! 79.5 ! 0.1',
+            "'runner-7ayh6h5f-project-107-concurrent-0-37b2c7aee9359db9-build_cpu_usage'=95.0%",
+            "'runner-7ayh6h5f-project-49-concurrent-0-e180afe41fc754dc-predefined_mem_usage'=0.1%",
         ],
     },
 ]
```

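The new assertions are plain expected strings, so an output that drops the per-container items fails the test run. A minimal sketch of that check, assuming the runner simply looks for each expected string somewhere in the plugin output; `missing_expectations()` and both sample outputs are hypothetical, and the real `unit-test/run` harness may work differently.

```python
def missing_expectations(output: str, expected: list[str]) -> list[str]:
    """Return the expected substrings that do not occur in the plugin output."""
    return [e for e in expected if e not in output]


EXPECTED = [
    'traefik_traefik.2 ! 0.0 ! 0.0',
    "'traefik_traefik.2_cpu_usage'=0.0%",
    "'traefik_traefik.2_mem_usage'=0.0%",
]

# hypothetical outputs: aggregates only (as in v2026041002) vs. per-container restored
AGGREGATE_ONLY = "traefik_traefik.2 ! 0.0 ! 0.0|'containers_running'=1 'cpu'=4"
RESTORED = (AGGREGATE_ONLY
            + " 'traefik_traefik.2_cpu_usage'=0.0%;;;0;100"
            + " 'traefik_traefik.2_mem_usage'=0.0%;;;0;100")

if __name__ == '__main__':
    print(missing_expectations(AGGREGATE_ONLY, EXPECTED))  # both perfdata labels missing
    print(missing_expectations(RESTORED, EXPECTED))        # []
```

Substring matching keeps the expectations independent of the threshold and min/max fields that follow each value.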

check-plugins/podman-stats/README.md

Lines changed: 12 additions & 8 deletions
```diff
@@ -109,16 +109,20 @@ myconti_ds_1 ! 0.0 ! 11.42
 
 ## Perfdata / Metrics
 
+The plugin emits one CPU and one memory metric per container so individual workloads can be plotted long-term. Because container names appear and disappear as workloads come and go, the time-series backend (Graphite, InfluxDB, ...) will keep stale entries until they are pruned.
+
 | Name | Type | Description |
 |----|----|----|
-| block_input | Bytes | Total data read from block device across all containers. |
-| block_output | Bytes | Total data written to block device across all containers. |
-| containers_running | Number | Number of running containers. |
-| cpu | Number | Number of host CPUs. |
-| images | Number | Number of images. |
-| net_rx | Bytes | Total network bytes received across all containers. |
-| net_tx | Bytes | Total network bytes transmitted across all containers. |
-| ram | Bytes | Total host memory. |
+| `<container>_cpu_usage` | Percentage | Per-container CPU usage, normalized by host CPU count. |
+| `<container>_mem_usage` | Percentage | Per-container memory usage, relative to the container memory limit or host memory. |
+| block_input | Bytes | Total data read from block device across all containers. |
+| block_output | Bytes | Total data written to block device across all containers. |
+| containers_running | Number | Number of running containers. |
+| cpu | Number | Number of host CPUs. |
+| images | Number | Number of images. |
+| net_rx | Bytes | Total network bytes received across all containers. |
+| net_tx | Bytes | Total network bytes transmitted across all containers. |
+| ram | Bytes | Total host memory. |
 
 
 ## Credits, License
```

check-plugins/podman-stats/podman-stats

Lines changed: 21 additions & 1 deletion
```diff
@@ -22,7 +22,7 @@ import lib.shell
 from lib.globals import STATE_CRIT, STATE_OK, STATE_UNKNOWN, STATE_WARN
 
 __author__ = 'Linuxfabrik GmbH, Zurich/Switzerland'
-__version__ = '2026041002'
+__version__ = '2026051201'
 
 DESCRIPTION = """Reports CPU and memory usage for all running Podman containers. CPU usage is
 normalized by dividing by the number of available host CPU cores. CPU alerts only
@@ -232,6 +232,26 @@ def main():
         cpu_usage = round(float(container.get('CPU', 0)) / host_cpus, 1)
         mem_usage = round(float(container.get('MemPerc', 0)), 1)
 
+        # per-container perfdata for long-term trending of individual workloads
+        perfdata += lib.base.get_perfdata(
+            f'{name}_cpu_usage',
+            cpu_usage,
+            uom='%',
+            warn=args.WARN_CPU,
+            crit=args.CRIT_CPU,
+            _min=0,
+            _max=100,
+        )
+        perfdata += lib.base.get_perfdata(
+            f'{name}_mem_usage',
+            mem_usage,
+            uom='%',
+            warn=args.WARN_MEM,
+            crit=args.CRIT_MEM,
+            _min=0,
+            _max=100,
+        )
+
         # accumulate totals for aggregate perfdata
         total_block_input += int(container.get('BlockInput', 0))
         total_block_output += int(container.get('BlockOutput', 0))
```

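In podman-stats the per-container items are now emitted inside the same loop that accumulates the aggregate totals. A simplified sketch of that loop, assuming each container's stats arrive as a dict with the `CPU`, `MemPerc`, `BlockInput` and `BlockOutput` keys used above and that the display name has already been resolved (and possibly shortened) elsewhere; `collect()` and the sample data are hypothetical, and thresholds, alerting and the sqlite trend storage are left out.

```python
def collect(containers: list[tuple[str, dict]], host_cpus: int):
    """Return per-container (label, value) pairs plus aggregate block I/O totals."""
    per_container = []
    total_block_input = 0
    total_block_output = 0
    for name, container in containers:
        cpu_usage = round(float(container.get('CPU', 0)) / host_cpus, 1)
        mem_usage = round(float(container.get('MemPerc', 0)), 1)
        # per-container values keep individual workloads trendable
        per_container.append((f'{name}_cpu_usage', cpu_usage))
        per_container.append((f'{name}_mem_usage', mem_usage))
        # aggregate totals are still collected alongside them
        total_block_input += int(container.get('BlockInput', 0))
        total_block_output += int(container.get('BlockOutput', 0))
    aggregates = {'block_input': total_block_input, 'block_output': total_block_output}
    return per_container, aggregates


if __name__ == '__main__':
    sample = [
        ('mongo', {'CPU': 0.6, 'MemPerc': 1.9, 'BlockInput': 1024, 'BlockOutput': 0}),
        ('graylog', {'CPU': 8.2, 'MemPerc': 5.7, 'BlockInput': 0, 'BlockOutput': 4096}),
    ]
    print(collect(sample, host_cpus=2))
```

Emitting both kinds of values from one pass means the aggregate panels and the per-workload trends are always computed from the same `podman stats` snapshot.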

check-plugins/podman-stats/unit-test/run

Lines changed: 13 additions & 0 deletions
```diff
@@ -33,6 +33,9 @@ TESTS = [
             'Container ! CPU % ! Mem %',
             '------------------+-------+------',
             'traefik_traefik.2 ! 0.0 ! 0.0',
+            # per-container perfdata (https://github.com/Linuxfabrik/monitoring-plugins/issues/1104)
+            "'traefik_traefik.2_cpu_usage'=0.0%",
+            "'traefik_traefik.2_mem_usage'=0.0%",
         ],
     },
     {
@@ -45,6 +48,8 @@ TESTS = [
             'Container ! CPU % ! Mem %',
             '--------------------------------------------+-------+------',
             'traefik_traefik.2.1idw12p2yqpxutlzkcwign4at ! 0.0 ! 0.0',
+            "'traefik_traefik.2.1idw12p2yqpxutlzkcwign4at_cpu_usage'=0.0%",
+            "'traefik_traefik.2.1idw12p2yqpxutlzkcwign4at_mem_usage'=0.0%",
         ],
     },
     {
@@ -58,6 +63,12 @@ TESTS = [
             'elasticsearch ! 188.8 ! 16.7',
             'graylog ! 204.2 ! 5.7',
             'mongo ! 0.3 ! 1.9',
+            "'elasticsearch_cpu_usage'=188.8%",
+            "'elasticsearch_mem_usage'=16.7%",
+            "'graylog_cpu_usage'=204.2%",
+            "'graylog_mem_usage'=5.7%",
+            "'mongo_cpu_usage'=0.3%",
+            "'mongo_mem_usage'=1.9%",
         ],
     },
     {
@@ -71,6 +82,8 @@ TESTS = [
             'runner-7ayh6h5f-project-107-concurrent-0-37b2c7aee9359db9-build ! 95.0 ! 1.2',
             'runner-7ayh6h5f-project-19-concurrent-0-99f0211c36d59d01-build ! 59.5 ! 1.0',
             'runner-7ayh6h5f-project-49-concurrent-0-e180afe41fc754dc-predefined ! 79.5 ! 0.1',
+            "'runner-7ayh6h5f-project-107-concurrent-0-37b2c7aee9359db9-build_cpu_usage'=95.0%",
+            "'runner-7ayh6h5f-project-49-concurrent-0-e180afe41fc754dc-predefined_mem_usage'=0.1%",
         ],
     },
 ]
```
