
Commit 636b71a

fix(cpu-usage): skip bogus all-zero CPU samples on Windows with 64+ cores (#626)
1 parent d669cbf commit 636b71a

2 files changed (+26 -2 lines)


CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -28,6 +28,7 @@ Monitoring Plugins:
 
 Build, CI/CD:
 
+* Add MkDocs-based documentation site, deployed automatically to GitHub Pages via `tools/build-docs` and a GitHub Actions workflow
 * Add support for sle15 packages
 * Add support for sle16 packages
 
@@ -124,6 +125,7 @@ Monitoring Plugins:
 * about-me: error in perfdata if using `--dmidecode` and there is no HW information
 * about-me: fix various errors with `sys_dimensions` on some machines ([#1006](https://github.com/Linuxfabrik/monitoring-plugins/issues/1006))
 * by-ssh: add missing `--verbose` parameter
+* cpu-usage: fix false 100% readings on Windows with 64+ cores caused by all-zero CPU time samples from psutil ([#626](https://github.com/Linuxfabrik/monitoring-plugins/issues/626))
 * file-age: handle `FileNotFoundError` race condition when files disappear on busy file systems
 * fs-ro: ignore `/run/credentials` (https://systemd.io/CREDENTIALS/)
 * keycloak-stats: fix incorrect symlink for lib
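The cpu-usage changelog entry attributes the false 100% readings to all-zero samples. The minimal Python sketch below reproduces that arithmetic under stated assumptions: the helper names `cpu_usage_from_sample`, `is_bogus_sample` and `guarded_cpu_usage` are hypothetical and not part of the plugin, and rounding to one decimal place is assumed because the diff cuts off the arguments of `round()`. Since the plugin computes usage as 100 minus idle minus nice, a sample with every field at 0% yields exactly 100%. The guard mirrors the condition added in this commit, which checks only idle, user and system.

```python
# Minimal sketch with hypothetical helper names (not the plugin's code):
# shows why an all-zero sample turns "100 - idle - nice" into a false 100%,
# and how a guard like the one added in this commit skips such samples.
from typing import Optional


def cpu_usage_from_sample(stats: dict) -> float:
    """Overall usage as 100% minus idle and nice (rounding is assumed)."""
    return round(100.0 - stats['idle'] - stats['nice'], 1)


def is_bogus_sample(stats: dict) -> bool:
    """True if idle, user and system are all 0%, which cannot happen on a
    real system because some CPU time must pass between two samples."""
    return stats['idle'] == 0 and stats['user'] == 0 and stats['system'] == 0


def guarded_cpu_usage(stats: dict) -> Optional[float]:
    """Return the usage, or None if the sample should be skipped."""
    if is_bogus_sample(stats):
        return None
    return cpu_usage_from_sample(stats)


normal = {'user': 12.0, 'system': 3.0, 'idle': 84.0, 'nice': 1.0}
all_zero = {'user': 0.0, 'system': 0.0, 'idle': 0.0, 'nice': 0.0}

print(cpu_usage_from_sample(normal))    # 15.0
print(cpu_usage_from_sample(all_zero))  # 100.0 (the false reading)
print(guarded_cpu_usage(all_zero))      # None (sample skipped)
```

Returning `None` here stands in for the plugin's early exit with "Waiting for more data"; the real plugin closes its database connection and reports OK instead of storing the sample.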

check-plugins/cpu-usage/cpu-usage

Lines changed: 24 additions & 2 deletions
@@ -29,7 +29,7 @@ except ImportError:
 
 
 __author__ = 'Linuxfabrik GmbH, Zurich/Switzerland'
-__version__ = '2026040701'
+__version__ = '2026040801'
 
 DESCRIPTION = """Reports CPU utilization percentages for all available time categories
 (user, system, idle, nice, iowait, irq, softirq, steal, guest, guest_nice) plus the overall
@@ -292,6 +292,28 @@ def main():
     stats['system'] = getattr(cpu_times_percent, 'system', 0)
     stats['user'] = getattr(cpu_times_percent, 'user', 0)
 
+    # Guard against bogus all-zero samples (#626).
+    #
+    # psutil.cpu_times_percent() can return 0% for ALL fields (including idle)
+    # when the cumulative CPU time counters do not change between two samples.
+    # This happens on some Windows systems with many cores (64+, multiple
+    # processor groups), where the underlying GetSystemTimes() counters
+    # occasionally stall or go backwards. psutil clips negative deltas to zero
+    # (see psutil issues #392, #645, #1210), which can result in a total delta
+    # of zero. In that case psutil returns 0% for every field.
+    #
+    # Without this guard, our formula "100 - idle(0) - nice(0)" would
+    # incorrectly report 100% CPU usage. We detect this physically impossible
+    # state (some CPU time MUST pass) and skip the sample entirely, so no
+    # bogus data is stored or alerted on.
+    if stats['idle'] == 0 and stats['user'] == 0 and stats['system'] == 0:
+        lib.db_sqlite.close(conn)
+        lib.base.oao(
+            'Waiting for more data (got an all-zero CPU sample, skipping).',
+            STATE_OK,
+            always_ok=args.ALWAYS_OK,
+        )
+
     # this is what we want to warn about: 100% - idle - nice
     stats['cpu_usage'] = round(
         100.0 - stats['idle'] - stats['nice'],
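The comment in this hunk explains that psutil clips negative per-field deltas to zero, so a counter that stalls or runs backwards can leave a total delta of zero. The sketch below is a simplified model of that calculation and not psutil's actual code: `times_percent` and the sample values are hypothetical, and it handles a zero total by returning 0% for every field, which reproduces the outcome the comment describes rather than psutil's exact scaling logic.

```python
# Simplified model (NOT psutil's code) of deriving per-field CPU percentages
# from two cumulative CPU-time snapshots, with negative deltas clipped to 0.
from typing import Dict

Times = Dict[str, float]


def times_percent(t1: Times, t2: Times) -> Times:
    # Per-field delta between the snapshots; a counter that went backwards
    # is clipped to 0 instead of producing a negative percentage.
    deltas = {field: max(0.0, t2[field] - t1[field]) for field in t1}
    total = sum(deltas.values())
    # If no CPU time passed at all, avoid dividing by zero: every field,
    # including idle, ends up at 0%.
    scale = 100.0 / total if total > 0 else 0.0
    return {field: round(delta * scale, 1) for field, delta in deltas.items()}


before = {'user': 1000.0, 'system': 400.0, 'idle': 8000.0}
healthy = {'user': 1012.0, 'system': 403.0, 'idle': 8085.0}
# Counters stalled (user) or went backwards (system, idle).
stalled = {'user': 1000.0, 'system': 399.5, 'idle': 7999.0}

print(times_percent(before, healthy))  # {'user': 12.0, 'system': 3.0, 'idle': 85.0}
print(times_percent(before, stalled))  # {'user': 0.0, 'system': 0.0, 'idle': 0.0}
```

The stalled case shows why the plugin cannot trust such a sample: with idle at 0%, "100 - idle - nice" would report full load even though nothing was measured.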
@@ -383,5 +405,5 @@ def main():
 if __name__ == '__main__':
     try:
         main()
-    except Exception: # pylint: disable=W0703
+    except Exception:
         lib.base.cu()
