The following standards apply to all Linuxfabrik repositories.
Please read and follow our Code of Conduct.
Open issues are tracked on GitHub Issues in the respective repository.
Some repositories use pre-commit for automated linting and formatting checks. If the repository contains a .pre-commit-config.yaml, install pre-commit and configure the hooks after cloning:
`pre-commit install`

Commit messages follow the Conventional Commits specification:
<type>(<scope>): <subject>
If there is a related issue, append (fix #N):
<type>(<scope>): <subject> (fix #N)
<type> must be one of:
- `chore`: Changes to the build process or auxiliary tools and libraries
- `docs`: Documentation only changes
- `feat`: A new feature
- `fix`: A bug fix
- `perf`: A code change that improves performance
- `refactor`: A code change that neither fixes a bug nor adds a feature
- `style`: Changes that do not affect the meaning of the code (whitespace, formatting, etc.)
- `test`: Adding missing tests
Document all changes in CHANGELOG.md following Keep a Changelog. Sort entries within sections alphabetically.
Code, comments, commit messages, and documentation must be written in English.
GitHub Actions in `.github/workflows/` are pinned by commit SHA, not by tag. Dependabot's `github-actions` ecosystem keeps these pins up to date.
Python packages installed via pip inside workflows follow a two-tier policy:
- `pre-commit` is installed from a hash-pinned requirements file at `.github/pre-commit/requirements.txt`, generated with `pip-compile --generate-hashes --strip-extras` from `.github/pre-commit/requirements.in`. Dependabot's `pip` ecosystem watches that directory and maintains both files.
- One-shot installs such as `ansible-builder`, `build`, `mkdocs`, `pdoc`, and `ruff` in release, docs, or test workflows are version-pinned only (`package==X.Y.Z`) and kept fresh by Dependabot. Scorecard's `pipCommand not pinned by hash` findings for these are considered acceptable risk and may be dismissed.
- Sort variables, parameters, lists, and similar items alphabetically where possible.
- Always use long parameters when using shell commands.
- Use RFC 5737, 3849, 7042, and 2606 in examples and documentation:
  - IPv4: `192.0.2.0/24`, `198.51.100.0/24`, `203.0.113.0/24`
  - IPv6: `2001:DB8::/32`
  - MAC: `00-00-5E-00-53-00` through `00-00-5E-00-53-FF` (unicast), `01-00-5E-90-10-00` through `01-00-5E-90-10-FF` (multicast)
  - Domains: `*.example`, `example.com`
Use the example plugin as a skeleton for new plugins. It demonstrates all standard patterns, library functions, and coding conventions described below.
Monitoring an application can be complex and produce a wide variety of data. In order to standardize the handling of threshold values on the command line, to reduce the number of command line parameters and their interdependencies and to enable independent and thus extended designs of the Grafana panels, each topic should be dealt with in a separate check (following the Linux mantra: "one tool, one task").
Avoid an extensive check that covers a wide variety of aspects:
```
myapp --action threading --warning 1500 --critical 2000
myapp --action memory-usage --warning 80 --critical 90
myapp --action deployment-status
```

(the last action does not support the warning and critical command line options)
Better write three separate checks:
```
myapp-threading --warning 1500 --critical 2000
myapp-memory-usage --warning 80 --critical 90
myapp-deployment-status
```
All plugins are written in Python and will be licensed under the UNLICENSE, which is a license with no conditions whatsoever that dedicates works to the public domain.
All plugins are coded using Python 3.9. Simply clone the libraries and monitoring plugins and start working:
```
git clone git@github.com:Linuxfabrik/lib.git
git clone git@github.com:Linuxfabrik/monitoring-plugins.git
```

Checklist:
- The plugin itself, tested on RHEL and Debian.
- README file explaining "How?" and "Why?"
- A free, monochrome, transparent SVG icon from https://simpleicons.org or https://fontawesome.com/search?ic=free, placed in the `icon` directory.
- Optional: `unit-test/run` - the unittest file (see Unit Tests)
- Optional: `requirements.txt`
- If providing performance data: Grafana dashboard (see GRAFANA) and `.ini` file for the Icinga Web 2 Grafana Module
- Icinga Director Basket Config for the check plugin (`build-basket`)
- Icinga Service Set in `all-the-rest.json`
- Optional: sudoers file (see sudoers File)
- Optional: A screenshot of the plugin's output from within Icinga, resized to 423x106, using background color `#f5f9fa`, hosted on download.linuxfabrik.ch, and listed alphabetically in the project's README.
- CHANGELOG
- Be brief by default. Report what needs to be reported to fix a problem. If there is more information that might help the admin, support a `--lengthy` parameter. If the default output still grows unbounded on large systems (thousands of disk mounts, DHCP scopes, backends, services), also support a `--brief` parameter that hides rows within the thresholds. See "Verbosity parameter convention" below.
- The plugin should be "self-configuring" and/or use best-practice defaults, so that it runs without parameters wherever possible.
- Develop with a minimal Linux in mind.
- Develop with Icinga2 in mind.
- Avoid complicated or fancy (and therefore unreadable) Python statements.
- If possible avoid libraries that have to be installed.
- Validate user input.
- It is ok to use temp files if needed.
- Much better: use a local SQLite database if you want to use a temp file.
- Keep in mind: Plugins have a limited runtime - typically 10 seconds max. Therefore it is ideal if the plugin executes fast and uses minimal resources (CPU time, memory etc.).
- Timeout gracefully on errors (for example `df` on a failed network drive) and return WARN.
- Return UNKNOWN on missing dependencies or wrong parameters.
- Mainly return WARN. Only return CRIT if the operators want to or have to wake up at night. CRIT means "react immediately".
- EAFP: Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements.
- Pick the right unit-test flavor. If the plugin parses the output of a shell command, the body of a file, or an HTTP endpoint that returns a stable text format, write fixture-based tests driven by `lib.lftest.run()` and a `TESTS` list. They run in a fraction of a second, are fully reproducible, and cover the full `tox` / Python matrix. Only reach for container-based tests (via `lib.lftest.run_container()` and testcontainers-python) when the check's behaviour really depends on live runtime state of the service (log markers, cluster topology, write-then-read flows, version-dependent API responses that cannot be captured statically).
- Combine container tests with fixtures for real coverage. Container tests anchor the happy path against a real service, but they rarely expose the interesting edge cases: a service that just crashed, a stale cache, a half-configured cluster, a component that responds with a 503, a counter that overflowed, a config that is syntactically valid but semantically broken. Those behaviours almost only show up in real operation, not in a freshly-started clean container. The pragmatic pattern is: one testcontainers scenario for the nominal state (so we notice when the vendor changes their API), plus a handful of fixture-based testcases that capture the weird states - ideally captured from real incidents, or synthesised from the plugin code and the vendor's documentation. Both flavours live side-by-side in the same `unit-test/run` file; `tools/run-unit-tests --no-container` picks the fixture path for the fast matrix and `tools/run-container-tests` picks the live scenarios for the integration runner.
Plugins must return one of the following POSIX-compliant exit codes. Use the constants from lib.base:
| Exit Code | Status | Constant | Meaning |
|---|---|---|---|
| 0 | OK | `STATE_OK` | Service functioning properly |
| 1 | Warning | `STATE_WARN` | Service above warning threshold or not working properly |
| 2 | Critical | `STATE_CRIT` | Service not running or above critical threshold |
| 3 | Unknown | `STATE_UNKNOWN` | Invalid arguments, missing dependencies, or internal plugin failures |
Guidelines:
- Return `STATE_UNKNOWN` on missing dependencies, wrong parameters, or when `--help`/`--version` is requested.
- Return `STATE_WARN` for most alert conditions. Only return `STATE_CRIT` if the situation requires immediate human intervention ("wake up at night").
- Never return any exit code other than 0, 1, 2, or 3.
- Use `lib.base.oao()` ("output and out") to print the result and exit with the appropriate state in a single call.
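As a plain-Python sketch of this contract: the constants below mirror those in `lib.base` (their values are fixed by the Monitoring Plugins standard), while `evaluate()` is a hypothetical helper invented for this illustration, not a library function.

```python
# Exit-code constants, mirroring lib.base (values fixed by the
# Monitoring Plugins standard).
STATE_OK = 0
STATE_WARN = 1
STATE_CRIT = 2
STATE_UNKNOWN = 3

def evaluate(value, warn, crit):
    """Hypothetical helper: map a measured value to a state.
    WARN for most alerts, CRIT only for 'wake up at night' situations."""
    if value >= crit:
        return STATE_CRIT
    if value >= warn:
        return STATE_WARN
    return STATE_OK

state = evaluate(85, warn=80, crit=95)
# A real plugin would now print the message and exit in one call via
# lib.base.oao(); the bare equivalent would be:
#     print('CPU usage is 85%')
#     sys.exit(state)   # never anything other than 0, 1, 2 or 3
```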
Short:
- Use `txt.to_text()` and `txt.to_bytes()`.

The theory:

- Data coming into your plugins must be bytes, encoded with UTF-8.
- Decode incoming bytes as soon as possible (best by using the `txt` library), producing unicode.
- Use unicode throughout your plugin.
- When outputting data, use library functions; they should do output conversions for you. Library functions like `base.oao` or `url.fetch_json` will take care of the conversion to and from bytes.
See https://nedbatchelder.com/text/unipain.html for details.
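As a rough stdlib-only illustration of what the `txt` helpers do (the real library signatures may differ; this is a sketch of the decode-early/encode-late idea, not the actual implementation):

```python
def to_text(data, encoding='utf-8'):
    """Decode bytes to str as early as possible (sketch of txt.to_text())."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data

def to_bytes(data, encoding='utf-8'):
    """Encode str to bytes only at the output boundary (sketch of txt.to_bytes())."""
    if isinstance(data, str):
        return data.encode(encoding)
    return data
```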
The plugin name should match the regex `^[a-zA-Z0-9\-\_]*$`. This allows the plugin name to be used as the Grafana dashboard UID.
There are a few Nagios-compatible reserved options that should not be used for other purposes:
-a, --authentication authentication password
-C, --community SNMP community
-c, --critical critical threshold
-h, --help help
-H, --hostname hostname
-l, --logname login name
-p, --password password
-p, --port network port
-t, --timeout timeout
-u, --url URL
-u, --username username
-V, --version version
-v, --verbose verbose
-w, --warning warning threshold
Every plugin must support at least `--help` and `--version`:

- `--help` (`-h`): Print a short usage statement followed by a detailed description of all options with their defaults. Keep the output within 80 characters width. Exit with `STATE_UNKNOWN` (3).
- `--version` (`-V`): Print the plugin name and version (`__version__`). Exit with `STATE_UNKNOWN` (3).
Positional arguments are not allowed. All parameters must be named options.
For all other options, use long parameters only. Separate words with a `-`. We recommend reusing names from this list where applicable:
--activestate
--alarm-duration
--always-ok
--argument
--authtype
--brief
--cache-expire
--command
--community
--config
--count
--critical
--critical-count
--critical-cpu
--critical-maxchildren
--critical-mem
--critical-pattern
--critical-regex
--critical-slowreq
--database
--datasource
--date
--device
--donor
--filename
--filter
--full
--hide-ok
--hostname
--icinga-callback
--icinga-password
--icinga-service-name
--icinga-url
--icinga-username
--idsite
--ignore
--input
--insecure
--instance
--interface
--interval
--ipv6
--key
--latest
--lengthy
--loadstate
--message
--message-key
--metric
--mib
--mibdir
--mode
--module
--mount
--no-kthreads
--no-proxy
--no-summary
--node
--only-dirs
--only-files
--password
--path
--pattern
--perfdata
--perfdata-key
--period
--port
--portname
--prefix
--privlevel
--response
--service
--severity
--snmp-version
--starttype
--state
--state-key
--status
--substate
--suppress-lines
--task
--team
--test
--timeout
--timerange
--token
--trigger
--type
--unit
--unitfilestate
--url
--username
--version
--virtualenv
--warning
--warning-count
--warning-cpu
--warning-maxchildren
--warning-mem
--warning-pattern
--warning-regex
--warning-slowreq
Parameter types are usually:
- `type=float`
- `type=int`
- `type=lib.args.csv`
- `type=lib.args.float_or_none`
- `type=lib.args.int_or_none`
- `type=str` (the default)
- `choices=['udp', 'udp6', 'tcp', 'tcp6']`
- `action='store_true'`, `action='store_false'` for switches
Threshold parameters (`--warning`, `--critical`) in new plugins must use `type=str` (not `int` or `float`) to support Nagios range expressions like `80`, `10:`, `~:50`, `@10:20`. In `main()`, use `lib.base.get_state(value, args.WARN, args.CRIT, _operator='range')`. See the example plugin and the Threshold and Ranges section for details.
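For illustration, a minimal sketch of such an argument definition, using only stdlib `argparse` (the range evaluation itself stays in `lib.base.get_state()` and is not shown here):

```python
import argparse

parser = argparse.ArgumentParser()
# type=str keeps Nagios range expressions like '80', '10:', '~:50'
# or '@10:20' intact; they are evaluated later in main() by
# lib.base.get_state(value, args.WARN, args.CRIT, _operator='range')
parser.add_argument('--warning', dest='WARN', type=str, default='80')
parser.add_argument('--critical', dest='CRIT', type=str, default='90')

args = parser.parse_args(['--warning', '10:', '--critical', '~:50'])
```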
Hints:
- For complex parameter tuples, use the `csv` type. `--input='Name, Value, Warn, Crit'` results in `['Name', 'Value', 'Warn', 'Crit']`.
- For repeating parameters, use the `append` action. A `default` value has to be a list then. `--input=a --input=b` results in `['a', 'b']`.
- If you combine the `csv` type and the `append` action, you get a two-dimensional list: `--repeating-csv='1, 2, 3' --repeating-csv='a, b, c'` results in `[['1', '2', '3'], ['a', 'b', 'c']]`.
- If you want to provide default values together with `append`, leave the `default` as `None` in `parser.add_argument()`. If after `parse_args()` the value is still `None`, put the desired default list (or any other object) there. The primary purpose of the parser is to parse the command line - to figure out what the user wants to tell you. There's nothing wrong with tweaking (and checking) the `args` Namespace after parsing. (According to https://bugs.python.org/issue16399)
- When it comes to parameters, stay backwards compatible. If you have to rename or drop parameters, keep the old ones, but silently ignore them. This helps admins deploy the monitoring plugins to thousands of servers, while the monitoring server is updated later for various reasons. To be as tolerant as possible, replace the parameter's help text with `help=argparse.SUPPRESS`:
```python
def parse_args():
    """Parse command line arguments using argparse.
    """
    parser = argparse.ArgumentParser(description=DESCRIPTION)
    parser.add_argument(
        '--my-old-and-deprecated-parameter',
        help=argparse.SUPPRESS,
        dest='MY_OLD_VAR',
    )
```

- A plugin should tolerate unknown parameters. Imagine a monitoring system that checks a thousand hosts. You want to update a plugin offering a new parameter that is essential for you, so you adjust the service definition, add the new parameter, and update the plugin on one host. The non-updated plugin on the other 999 hosts will throw an UNKNOWN error when argparse is used with `parser.parse_args()`. This would significantly disrupt operations and cause stress. Therefore, it makes more sense to be tolerant and use `parser.parse_known_args()`.
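A minimal sketch of the tolerant variant (`--brand-new-parameter` is a made-up option for the demonstration):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--warning', dest='WARN', type=str, default='80')

# parse_known_args() silently collects options this (not yet updated)
# plugin version does not know, instead of aborting with an error
args, unknown = parser.parse_known_args(
    ['--warning', '90', '--brand-new-parameter', 'x'])
```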
Help texts must be consistent across all plugins. Each property of a parameter goes on its own line (using Python implicit string concatenation). This makes it easy to scan, compare, and maintain. The order is:
- Purpose (what the parameter does)
- Data type or format (if not obvious)
- Regex or case-sensitivity note (if applicable)
- Repeating note (if applicable)
- Nagios range support (if applicable)
- Example (if helpful)
- Default value (always last, always present if there is one)
Standard help texts for common parameters are defined centrally in `lib.args.HELP_TEXTS`. Use `lib.args.help('--parameter-name')` wherever possible instead of writing help text inline:

```python
# standard parameter - use lib.args.help()
parser.add_argument(
    '--timeout',
    help=lib.args.help('--timeout'),
    dest='TIMEOUT',
    type=int,
    default=DEFAULT_TIMEOUT,
)

# plugin-specific parameter - write help text inline, same format
parser.add_argument(
    '--token',
    help='Software API token.',
    dest='TOKEN',
    required=True,
)

# plugin-specific prefix + global help text
parser.add_argument(
    '--url',
    help='GitLab health URL endpoint. ' + lib.args.help('--url'),
    dest='URL',
    default=DEFAULT_URL,
)
```

Rules:
- Use `%(default)s` for defaults, never hardcode the value. Omit the default for `store_true`/`store_false` switches (e.g. `--always-ok`, `--insecure`, `--no-proxy`, `--lengthy`) since they are always False when not specified.
- Defaults and examples go on their own lines.
- Say "Can be specified multiple times." for `action='append'` parameters (not "(repeating)").
- Say "Supports Nagios ranges." when `lib.base.get_state()` is used with the value.
- Always state case-sensitivity explicitly: "Case-insensitive." or "Case-sensitive."
- Say "Uses Python regular expressions." when the parameter accepts a regex.
- End every help text with a period.
- Parameters that are identical across plugins must use identical help texts.
Use the plugin name as commit scope:
`fix(about-me): cryptography deprecation warning (fix #341)`

For the first commit, use the message `Add <plugin-name>`.
If a threshold has to be handled as a range parameter, this is how to interpret them. Compatible with the Monitoring Plugins Development Guidelines and the Nagios Plugin Development Guidelines.
The generalized range format is `[@]start:end`:

- `start` must be less than or equal to `end`.
- `start` and `:` are not required if `start` is 0.
- simple value: a range from 0 up to and including the value
- empty value after `:`: positive infinity
- `~`: negative infinity
- `@`: if the range starts with "@", then alert if inside this range (including endpoints)
- An alert is raised if the metric is outside the range (inclusive of endpoints). The `@` prefix inverts this logic.
Examples:
| -w, -c | OK if result is | WARN/CRIT if |
|---|---|---|
| 10 | in (0..10) | not in (0..10) |
| -10:0 | in (-10..0) | not in (-10..0) |
| 10: | in (10..inf) | not in (10..inf) |
| : | in (0..inf) | not in (0..inf) |
| ~:10 | in (-inf..10) | not in (-inf..10) |
| 10:20 | in (10..20) | not in (10..20) |
| @10:20 | not in (10..20) | in (10..20) |
| @~:20 | not in (-inf..20) | in (-inf..20) |
| @ | not in (0..inf) | in (0..inf) |
So, a definition like `--warning 2:100 --critical 1:150` should return the states:

```
val  0   1   2   ..  100  101  ..  150  151
-w   WA  WA  OK       OK  WA       WA   WA
-c   CR  OK  OK       OK  OK       OK   CR
=>   CR  WA  OK       OK  WA       WA   CR
```

Another example: `--warning 190: --critical 200:`

```
val  189  190  191  ..  199  200  201
-w   WA   OK   OK        OK  OK   OK
-c   CR   CR   CR        CR  OK   OK
=>   CR   CR   CR        CR  OK   OK
```

Another example: `--warning ~:0 --critical 10`

```
val  -2  -1   0   1  ..   9  10   11
-w   OK  OK  OK   WA      WA  WA  WA
-c   CR  CR  OK   OK      OK  OK  CR
=>   CR  CR  OK   OK      OK  OK  CR
```
Have a look at procs on how to implement this.
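The rules above can be condensed into a small stdlib-only sketch. This only illustrates the range semantics; it is not the library implementation (`parse_range()` and `outside()` are hypothetical names):

```python
def parse_range(spec):
    """Parse a Nagios range like '10', '10:', '~:10' or '@10:20'
    into (low, high, invert)."""
    invert = spec.startswith('@')
    if invert:
        spec = spec[1:]
    if ':' in spec:
        start, _, end = spec.partition(':')
    else:
        # simple value: a range from 0 up to and including the value
        start, end = '0', spec
    lo = float('-inf') if start == '~' else float(start or 0)
    hi = float('inf') if end == '' else float(end)
    return lo, hi, invert

def outside(value, spec):
    """True if the value should raise an alert for this range spec."""
    lo, hi, invert = parse_range(spec)
    alert = value < lo or value > hi   # outside, endpoints inclusive
    return not alert if invert else alert
```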
Use `cache` if you need a simple key-value store, for example as used in `nextcloud-version`. Otherwise, use `db_sqlite` as used in `cpu-usage`.
- Catch exceptions using `try`/`except`, especially in functions. Never use a bare `except:` without specifying the exception type. Use `except Exception:` as the broadest acceptable catch-all.
- In functions, if you have to catch exceptions, on such an exception always return `(False, errormessage)`. Otherwise return `(True, result)` if the function succeeds in any way. For example, returning `(True, False)` means that the function has not raised an exception and its result is simply `False`.
- A function calling a function with such extended error handling has to return a `(retc, result)` tuple itself.
- In `main()` you can use `lib.base.coe()` to simplify error handling.
- Have a look at `nextcloud-version` for details.
By the way, when running the compiled variants, this gives the nice and intended error if the module is missing:
```python
try:
    import psutil
except ImportError:
    print('Python module "psutil" is not installed.')
    sys.exit(STATE_UNKNOWN)
```

while this leads to an ugly multi-exception stacktrace:

```python
try:
    import psutil
except ImportError:
    lib.base.cu('Python module "psutil" is not installed.')
```

Plugins have a limited runtime - typically 10 seconds max. Every plugin must handle timeouts gracefully to prevent hanging processes (e.g. `df` on a failed network drive, unresponsive API endpoints, stuck database connections).
- Always support a `--timeout` parameter (default: 8 seconds, leaving headroom for Icinga's own 10s timeout).
- Use `lib.base.coe(lib.url.fetch(..., timeout=args.TIMEOUT))` for HTTP requests - the library handles timeouts.
- For shell commands, pass a timeout to `lib.shell.shell_exec()`.
- If a timeout occurs, return `STATE_WARN` with a meaningful message (e.g. "Timeout after 8s while connecting to ...").
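The library functions wrap this behaviour for you; as a stdlib-only sketch of what happens underneath (`run_with_timeout()` is a hypothetical name, not the `lib.shell` API):

```python
import subprocess

STATE_WARN = 1

def run_with_timeout(cmd, timeout=8):
    """Run a command list without a shell, with a hard timeout."""
    try:
        completed = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout)
        return True, completed.stdout
    except subprocess.TimeoutExpired:
        # a plugin would return STATE_WARN together with this message
        return False, 'Timeout after {}s while running {}.'.format(
            timeout, cmd[0])

success, result = run_with_timeout(['echo', 'hello'])
```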
- External commands: When executing system commands, use `lib.shell.shell_exec()`. Avoid `os.system()` or `subprocess` with `shell=True`, as these are vulnerable to shell injection. The official Monitoring Plugins guidelines require full paths for all external commands to prevent PATH-based trojan hijacking. Our `lib.shell.shell_exec()` uses `subprocess` with `shell=False`, which eliminates shell injection. We accept PATH-based command resolution for cross-platform compatibility (paths differ across distributions), but be aware that a compromised PATH could still redirect commands.
- Input validation: Validate all user-supplied input. Use `argparse` type converters (`type=int`, `type=float`, `type=lib.args.csv`) to enforce expected types.
- Temporary files: Avoid temporary files where possible. Prefer a local SQLite database via `lib.db_sqlite` or `lib.cache`. If temp files are unavoidable, fail cleanly if the file cannot be created, and delete it when done.
- Symlinks: If a plugin opens or reads files, ensure it does not follow symlinks to unintended locations.
- Credentials: Never log or print passwords, tokens, or other secrets in plugin output - not even in verbose mode.
- Network communication: Use HTTPS by default. Support `--insecure` to allow self-signed certificates where needed, but never make insecure the default.
Plugins must only print to STDOUT. Never print to STDERR, as Icinga/Nagios does not capture it.
The output structure follows the Monitoring Plugins standard:
STATUS_TEXT - summary message | perfdata
detailed line 1
detailed line 2 | more_perfdata
The first line is the most important - Icinga/Nagios uses it for notifications, web interface display, and SMS alerts. Everything after the first newline is considered "long output" and only shown in detail views.
Rules:
- Print a short, concise message in the first line, within the first 80 characters if possible.
- Use multi-line output for details (`msg_body`), with the most important output in the first line (`msg_header`).
- Performance data is separated from text output by a pipe (`|`) character. Additional perfdata can follow on subsequent lines after a pipe.
- Do not use the pipe character (`|`) in the text output itself, as Icinga/Nagios uses it as a delimiter to separate text from performance data. `lib.base.oao()` automatically replaces stray pipes in the message.
- Don't print "OK".
- Print "[WARNING]" or "[CRITICAL]" for clarification next to a specific item using `lib.base.state2str()`.
- If possible, give a help text to solve the problem.
- Multiple items checked, and ...
  - ... everything ok? Print "Everything is ok." or the most important output in the first line, and optionally the items and their data attached in multiple lines.
  - ... there are warnings or errors? Print "There are warnings." or "There are errors." or the most important output in the first line, and optionally the items and their data attached in multiple lines.
- Based on parameters etc. nothing is checked in the end? Print "Nothing checked."
- Wrong username or password? Print "Failed to authenticate."
- Use short "Units of Measurement" without white space, including these terms:
  - Bits: use `human.bits2human()`
  - Bytes: use `human.bytes2human()`
  - I/O and Throughput: `human.bytes2human() + '/s'` (bytes per second)
  - Network: "Rx/s", "Tx/s", use `human.bps2human()`
  - Numbers: use `human.number2human()`
  - Percentage: 93.2%
  - Read/Write: "R/s", "W/s", "IO/s"
  - Seconds, Minutes etc.: use `human.seconds2human()`
  - Temperatures: 7.3C, 45F
- Use ISO format for date or datetime ("yyyy-mm-dd", "yyyy-mm-dd hh:mm:ss")
- Print human readable datetimes and time periods ("Up 3d 4h", "2019-12-31 23:59:59", "1.5s")
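Putting the rules together, the assembled output looks roughly like this. `build_output()` is a hypothetical stand-in for what `lib.base.oao()` assembles internally, shown only to make the structure concrete:

```python
def build_output(msg_header, msg_body='', perfdata=''):
    """Assemble 'STATUS_TEXT - summary | perfdata' plus long output."""
    # the pipe is reserved as the perfdata delimiter, so strip it
    # from the human-readable text
    msg = msg_header.replace('|', '')
    if msg_body:
        msg += '\n' + msg_body   # everything after \n is "long output"
    if perfdata:
        msg += '|' + perfdata
    return msg

out = build_output(
    'There are warnings.',
    'mount /data is 91% full [WARNING]',
    'data_usage_percent=91;80;95',
)
```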
If a plugin supports `-v`/`--verbose`, it should implement up to three verbosity levels (stackable: `-v -v -v` or `--verbose --verbose --verbose`):
| Level | Output |
|---|---|
| 0 (default) | Single-line summary, minimal output |
| 1 (`-v`) | Single-line with additional detail (e.g. list of affected items) |
| 2 (`-v -v`) | Multi-line with configuration debug info (e.g. commands executed, API endpoints queried) |
| 3 (`-v -v -v`) | Extensive diagnostic detail for troubleshooting |
Note: Most of our plugins use `--lengthy` instead of `-v` for extended output. The verbosity levels above apply if the plugin explicitly supports `--verbose`.
`--lengthy` and `--brief` are the two verbosity knobs admins use to tune what a plugin prints. They are orthogonal (not mutually exclusive) and control different axes of the output:
| Parameter | Axis | Effect |
|---|---|---|
| default | rows × columns | Show all checked items with the core columns. |
| `--lengthy` | columns | Add extra columns to every row (e.g. full details, debug info). |
| `--brief` | rows | Hide rows that are within the thresholds. Show only items in WARN/CRIT state. |
| `--lengthy --brief` | rows × columns | Hide OK rows, show extra columns on the rows that remain. |
Rules:
- Perfdata is always complete. `--brief` and `--lengthy` only reshape the human-readable message. Every checked item still emits perfdata so Grafana can trend everything.
- Alerting is unaffected. All items (including the ones `--brief` hides) still drive the overall check state. `--brief` is a display filter, not a threshold.
- When `--brief` hides everything, the plugin prints only the summary header ("Everything is ok. (thresholds)"), not an empty table. Admins on a quiet system see one line.
- `--lengthy` and `--brief` are always combinable. Do not mark them mutually exclusive in `argparse`.
- When to support `--brief`: add it whenever the default output can grow unbounded on large systems (hundreds of disk mounts, thousands of DHCP scopes, hundreds of HAProxy backends, etc.). Reference implementations: `check-plugins/disk-usage` and `check-plugins/dhcp-scope-usage`.
- Help text for `--brief` should describe the filter semantics and explicitly state that perfdata and alerting are unaffected, so the admin understands that `--brief` is safe to use in production without losing trending data.
"UOM" means "Unit of Measurement".
Format (space-separated label/value pairs):

```
'label'=value[UOM];[warn];[crit];[min];[max]
```
Rules:
- Labels may contain any characters except `=` (equals) and `'` (single quote).
- Prefer `snake_case` labels. Multi-word labels should use underscores, not spaces (`active_processes`, not `active processes`). Per-instance labels should prefix the instance name with an underscore (`<instance>_<metric>`), for example `sda_read_bytes`, `www_saturation`, `procs_cpu_percent`. This matches the convention used by `procs`, `disk-io`, and other per-instance plugins, makes Grafana regex and InfluxDB tag matching trivial, and avoids the need for single quotes around labels in the `STATUS_TEXT | perfdata` line. Sanitize pool/instance names with `re.sub(r'\W+', '_', name)` so exotic names like `my-app` become `my_app_<metric>`.
- Single quotes around the label are optional but required if the label contains spaces. Prefer underscores over spaces so the quotes are never needed.
- The first 19 characters of a label should be unique (RRD data source limitation).
- `value`, `min`, and `max` must match the character class `[-0-9.]` and share the same UOM.
- `warn` and `crit` use the range format (see Threshold and Ranges).
- `min` and `max` are not required for the percentage (`%`) UOM.
- Trailing unfilled semicolons may be dropped.
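A small sketch of building such a chunk, including the label sanitization (`perfdata()` is a hypothetical helper invented here; the real plugins use the helpers in `lib.base`):

```python
import re

def perfdata(label, value, uom='', warn='', crit='', mn='', mx=''):
    """Build a label=value[UOM];[warn];[crit];[min];[max] chunk."""
    # sanitize exotic instance names ('my-app' -> 'my_app') so the
    # label never needs single quotes
    label = re.sub(r'\W+', '_', label)
    chunk = '{}={}{};{};{};{};{}'.format(
        label, value, uom, warn, crit, mn, mx)
    # trailing unfilled semicolons may be dropped
    return chunk.rstrip(';')
```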
UOM suffixes:
- no unit specified - assume a number (int or float) of things (e.g. users, processes, load averages)
- `s` - seconds (also `us`, `ms` etc.)
- `%` - percentage
- `B` - bytes (also `KB`, `MB`, `TB` etc.). Bytes preferred, they are exact.
- `c` - a continuous counter (such as bytes transmitted on an interface, so instead of `B`) - do not use
Do not use continuous counters (`c`). Instead, calculate the delta between two measurements in the plugin itself and emit the result as an absolute value with a real unit (#320). Store the previous measurement in a local SQLite database using `lib.db_sqlite`. This approach:
- Avoids forcing Grafana to compute `non_negative_difference()` over millions of data points on every panel refresh.
- Enables correct aggregation in Grafana (`mean()`, `min()`, `max()` work as expected on absolute values, but produce wrong results on cumulative counters).
- Allows meaningful legend tables (first, min, mean, max, last) in Grafana panels.
- Preserves the actual unit of measurement (`B`, `%`, `s`, etc.) in perfdata.
- Saves resources on the monitoring server by doing the calculation once per check run instead of repeatedly in Grafana.
See the example plugin for a complete implementation of this pattern.
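A stdlib-only sketch of the delta pattern (the real plugins use `lib.db_sqlite`; the function name and table schema here are made up for illustration):

```python
import sqlite3
import time

def counter_rate(db_path, key, current, now=None):
    """Persist the current counter reading and return the rate per
    second relative to the previous check run, or None."""
    now = time.time() if now is None else now
    conn = sqlite3.connect(db_path)
    conn.execute('CREATE TABLE IF NOT EXISTS counters '
                 '(key TEXT PRIMARY KEY, value REAL, ts REAL)')
    row = conn.execute('SELECT value, ts FROM counters WHERE key = ?',
                       (key,)).fetchone()
    conn.execute('REPLACE INTO counters VALUES (?, ?, ?)',
                 (key, current, now))
    conn.commit()
    conn.close()
    if row is None or current < row[0] or now <= row[1]:
        return None  # first run, counter wrap, or clock skew: no rate yet
    # absolute value with a real unit instead of a 'c' counter
    return (current - row[0]) / (now - row[1])
```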
Wherever possible, prefer percentages over absolute values to assist users in comparing different systems with different absolute sizes.
Be aware of already-aggregated values returned by systems and applications. Apache for example returns a value "137.5 kB/request". Sounds good, but this is not a value at the current time of measurement. Instead, it is the average of all requests during the lifetime of the Apache worker process. If you use this in some sort of Grafana panel, you just get a boring line which converges towards a constant value very fast. Not useful at all.
A monitoring plugin always has to calculate such values on its own. If this is not possible because of missing data, discard them.
We use PEP 8 -- Style Guide for Python Code where it makes sense.
String quoting: Use single quotes as the default. Use double quotes only inside f-string expressions (e.g. f'{lib.base.state2str(state, prefix=" ")}') or when the string itself contains single quotes (e.g. 'Python module "psutil" is not installed.'). Use """ for all triple-quoted strings (docstrings, DESCRIPTION, SQL, etc.). This is enforced by ruff format.
Every plugin must define a DESCRIPTION variable that is passed to argparse.ArgumentParser(description=DESCRIPTION). Rules:
- At least 2-3 sentences that explain what the plugin does, from the perspective of the admin deploying it.
- Written in fluent English, no implementation details (no mention of library functions, class names, or internal patterns).
- The first sentence must describe the purpose: what the plugin monitors or collects (e.g. "Monitors CPU utilization on ...", "Checks the installed ... version against ...").
- Must include at least one "Alerts when ..." or "Alerts if ..." clause that tells the admin under which conditions the plugin raises a warning or critical state.
- Use `"""` triple quotes.
- Keep the line length around 90 characters.
- Plugins of the same type (e.g. all `-version` checks, all `huawei-dorado-*` checks) must use identical or near-identical DESCRIPTION text, with only the product name swapped. Consistency across plugin families is mandatory.
- If the plugin has a sudoers file in `assets/sudoers/`, the DESCRIPTION must end with "Requires root or sudo."
- If the plugin uses `lib.smb`, the DESCRIPTION must mention SMB share support.
- If the plugin uses `--count` for consecutive threshold violations (Handles Periods), the DESCRIPTION must mention this behavior, e.g. "Alerts only if the threshold has been exceeded for a configurable number of consecutive check runs (default: 5), suppressing short spikes."
- If the plugin supports `--lengthy`, the DESCRIPTION must mention "Supports extended reporting via --lengthy."
- The README Overview must include at least the text from the DESCRIPTION.
We document our Libraries using numpydoc docstrings, so that calling `pydoc lib/base.py` works, for example.
We use ruff as the primary linter and formatter. It covers PEP 8 enforcement, import sorting (replaces isort), and common bug patterns. Configuration is in pyproject.toml. Both ruff-check and ruff-format run automatically as pre-commit hooks.
```
# check a single plugin
ruff check check-plugins/my-check/my-check

# format a single plugin
ruff format check-plugins/my-check/my-check
```

PyLint runs as a second linter after ruff in the pre-commit hooks. It catches additional issues that ruff does not cover (e.g. undefined variables across module boundaries).

```
# lint a single plugin
pylint check-plugins/my-check/my-check
```

Unit tests are implemented using the unittest framework (https://docs.python.org/3/library/unittest.html) with a declarative, data-driven approach. Test definitions are a list of dicts (or a list of platform/image items for container tests), materialised into one real unittest test method per item via `lib.lftest.attach_tests()` or `lib.lftest.attach_each()`. See the example plugin for the reference implementation.
```
check-plugins/my-check/unit-test/
├── run                       # the test file
└── stdout/                   # test data files (fixtures)
    ├── empty-response        # scenario-based names, not EXAMPLE01
    ├── three-nodes-healthy
    └── three-nodes-one-down
```
Only create `stderr/` if a test actually needs to inject stderr data. Do not create empty `retc/` or `stderr/` directories.
Fixture files in stdout/ are named after the scenario they represent, not after an expected plugin state. The expected state depends on the combination of fixture content and plugin parameters (thresholds, filters, switches) and therefore cannot be encoded in the fixture filename alone.
Use descriptive, lowercase, hyphenated names that describe the shape of the data:

- `empty-response`, `single-node`, `three-nodes-healthy`
- `cpu-80-percent`, `disk-nearly-full`, `memory-400mb-used`
- `three-nodes-one-down`, `malformed-json`, `service-unreachable`
The same fixture may (and should) be reused by multiple testcases that vary the plugin parameters to reach different states. For example, a single `cpu-80-percent` fixture can drive an `ok-below-warn` test with `--warning 90 --critical 95`, a `warn-above-warn` test with `--warning 70 --critical 95`, and a `crit-above-crit` test with `--warning 50 --critical 75`.
The expected state is encoded in the testcase id instead (see below).
Define a `TESTS` list and use `lib.lftest.attach_tests()` to materialise one real unittest test method per testcase. The testcase id should lead with the expected state (`ok-`, `warn-`, `crit-`, `unknown-`), followed by a short description of what is being verified. The id becomes the test method name, so it shows up in `./run -v` output and in the unittest test count.
```python
#!/usr/bin/env python3
import sys

sys.path.insert(0, '..')

import unittest

from lib.globals import STATE_CRIT, STATE_OK, STATE_UNKNOWN, STATE_WARN
import lib.lftest

TESTS = [
    # Same fixture, three different threshold combinations.
    {
        'id': 'ok-below-warn',
        'test': 'stdout/cpu-80-percent,,0',
        'params': '--warning 90 --critical 95',
        'assert-retc': STATE_OK,
        'assert-in': ['80%'],
    },
    {
        'id': 'warn-above-warn',
        'test': 'stdout/cpu-80-percent,,0',
        'params': '--warning 70 --critical 95',
        'assert-retc': STATE_WARN,
        'assert-regex': r'80%.*\[WARNING\]',
    },
    {
        'id': 'crit-above-crit',
        'test': 'stdout/cpu-80-percent,,0',
        'params': '--warning 50 --critical 75',
        'assert-retc': STATE_CRIT,
        'assert-regex': r'80%.*\[CRITICAL\]',
    },
    # Different fixture, different scenario.
    {
        'id': 'unknown-malformed-json',
        'test': 'stdout/malformed-json,,0',
        'params': '--warning 80 --critical 90',
        'assert-retc': STATE_UNKNOWN,
    },
]


class TestCheck(unittest.TestCase):
    check = '../my-check'


lib.lftest.attach_tests(TestCheck, TESTS)

if __name__ == '__main__':
    unittest.main()
```

The reason for `attach_tests()` over a plain for loop with `subTest()` is reporting accuracy. A plain `for ... subTest()` loop runs every testcase, but unittest only counts the surrounding method, so `./run` reports `Ran 1 test in ...s` regardless of how many fixtures the file actually exercises. `attach_tests()` materialises one `test_<id>` method per entry in `TESTS`, so `./run` reports the real count and `./run -v` lists every scenario by name. Failures still surface in either pattern, but the count is misleading without the helper.
Naming rules for the testcase id:

- Lead with the expected state: `ok-`, `warn-`, `crit-`, `unknown-`.
- Follow with a short description of what the test verifies (not the fixture name): `ok-below-warn`, `warn-above-warn`, `crit-disk-full`, `unknown-missing-dependency`.
- `id` must be unique within the `TESTS` list.
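Both rules can be enforced with a small guard at the top of the test file. This is a hypothetical helper, not part of `lib.lftest`:

```python
def validate_test_ids(tests):
    """Assert that every testcase id is unique and leads with a state prefix."""
    ids = [t['id'] for t in tests]
    assert len(ids) == len(set(ids)), 'testcase ids must be unique'
    for tid in ids:
        assert tid.startswith(('ok-', 'warn-', 'crit-', 'unknown-')), \
            f'id must lead with the expected state: {tid}'
```

Call it once, e.g. `validate_test_ids(TESTS)`, before attaching the tests.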
Available assertion keys in each testcase dict:

- `assert-retc` (int, required): Expected return code (`STATE_OK`, `STATE_WARN`, `STATE_CRIT`, `STATE_UNKNOWN`).
- `assert-in` (list of str, optional): Strings that must appear in stdout.
- `assert-not-in` (list of str, optional): Strings that must not appear in stdout.
- `assert-regex` (str, optional): Regex pattern that must match stdout.
- `assert-stderr` (str, optional): Expected stderr content. Default: `''`.
Two iteration shapes show up in the test files, and each has its own helper. Both materialise one real unittest test method per item so `./run` reports an accurate count and `./run -v` names every case.

- `TESTS` list (the default): a list of testcase dicts executed by `lib.lftest.run()`. Use `lib.lftest.attach_tests(TestCheck, TESTS)`. This is the right shape for everything that injects fixture data via `--test=stdout/...`.
- Platform list (container-based tests): a list of images, Containerfiles, or scenario dicts where each item needs its own setup (spin up a container, build an image, reset a cache DB). Use `lib.lftest.attach_each(TestCheck, ITEMS, action, id_func=...)` with an `action(test, item)` callable that does the per-item work. The `id_func` turns one item into the test method name. Examples in-tree:
  - `check-plugins/mysql-connections/unit-test/run` iterates over an `IMAGES` list of `(image, label)` tuples, with `id_func=lambda it: it[1]`.
  - `check-plugins/cpu-usage/unit-test/run` iterates over a `CONTAINERFILES` list of strings, with the default `id_func=str`.
  - `check-plugins/apache-httpd-status/unit-test/run` iterates over a `SCENARIOS` list of dicts where the action resets a cache DB and then replays a multi-step sequence.
Do not fall back to a plain `for ... subTest()` loop. It still executes every case and failures still surface, but unittest collapses the whole loop into a single method and reports `Ran 1 test`, which hides the real coverage count from `./run` and from the tox summary.
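The counting difference is easy to reproduce with plain unittest; the loop body here is trivial on purpose:

```python
import unittest


class ViaSubTest(unittest.TestCase):
    def test_all(self):
        # runs three cases, but unittest counts only this one method
        for i in range(3):
            with self.subTest(i=i):
                self.assertGreaterEqual(i, 0)


loader = unittest.defaultTestLoader
print(loader.loadTestsFromTestCase(ViaSubTest).countTestCases())  # 1
```

All three subtests execute and can fail individually, yet the reported test count stays at 1.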
Unit tests come in two flavors:

- Fast tests use `--test` to inject fixture data and run in a fraction of a second. They are safe for CI and for the multi-Python-version `tox` matrix.
- Container tests build a podman image per target OS, inject the plugin and `lib/`, and exercise the check against a live service. They need podman on the host and take minutes per plugin. A plugin counts as a container test when its `unit-test/` directory has a `containerfiles/` subdirectory.
Everyday commands:
```shell
# single plugin (from its unit-test directory)
cd check-plugins/my-check/unit-test
./run

# single plugin (from the repo root)
python tools/run-unit-tests my-check

# all plugins (fast + container)
python tools/run-unit-tests

# only the fast tests (used by tox)
python tools/run-unit-tests --no-container

# only the container tests (also available as a thin wrapper)
python tools/run-unit-tests --only-container
python tools/run-container-tests
```

Multi-Python coverage via tox:

```shell
tox          # all supported Python versions, fast tests only
tox -e py39  # single environment
```

tox invokes `tools/run-unit-tests --no-container`, so the multi-Python matrix skips the container suite. Run `tools/run-container-tests` separately before a release for full integration coverage.
tox builds each Python environment from sdist, and linuxfabrik-lib pulls in netifaces, which has no binary wheels on PyPI. For every Python version in the tox matrix you want to run locally, the matching development headers must be installed on the host; otherwise pip falls back to building netifaces from source and aborts with `fatal error: Python.h: No such file or directory`:

```shell
# Fedora
sudo dnf install python3.9-devel python3.10-devel python3.11-devel \
    python3.12-devel python3.13-devel python3.14-devel

# Debian / Ubuntu
sudo apt install python3.9-dev python3.10-dev python3.11-dev \
    python3.12-dev python3.13-dev python3.14-dev
```

`skip_missing_interpreters = true` already skips environments for Python versions that are not installed at all.
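A `tox.ini` wiring consistent with the behavior described above might look roughly like this. This is a hedged sketch; the env list and deps are assumptions, so check the repository's actual `tox.ini` for the authoritative configuration:

```ini
[tox]
env_list = py39, py310, py311, py312, py313, py314
skip_missing_interpreters = true

[testenv]
deps = -r requirements.txt
commands = python tools/run-unit-tests --no-container
```

With `skip_missing_interpreters = true`, environments whose interpreter is missing are reported as skipped rather than failing the run.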
Container-based tests come in two shapes:

- Plugin runs from the host, service runs in the container. Used when the plugin talks to a real service over the network (Keycloak, Redis, a database, a web API). Pull an upstream service image, expose its port, and point the plugin at the container from outside. This is the common case.
- Plugin runs inside the container. Used when the plugin reads host-local resources (`/proc`, `/sys`, distro-shipped binaries, distro-specific Python/psutil field availability) and there is no meaningful way to fixture the input. The plugin is bind-mounted into the container and executed via `container.exec()`. See `check-plugins/cpu-usage/unit-test/run` for the reference implementation.
Use the `lib.lftest.run_container()` helper from the linuxfabrik-lib package. It wraps testcontainers-python so that container lifecycle, port exposure, environment variables and log-based readiness waits are declarative rather than hand-rolled podman orchestration.
Minimal example (see `check-plugins/keycloak-version/unit-test/run` for the full reference):
```python
import subprocess
import sys
import unittest

sys.path.append('..')

import lib.lftest
from lib.globals import STATE_CRIT, STATE_OK, STATE_WARN

IMAGES = [
    ('quay.io/keycloak/keycloak:25.0.6', 'v25'),
    ('quay.io/keycloak/keycloak:26.6', 'v26'),
]


class TestCheck(unittest.TestCase):
    pass


def _check_image(test, image_pair):
    image, version_tag = image_pair
    with lib.lftest.run_container(
        image,
        env={
            'KEYCLOAK_ADMIN': 'admin',
            'KEYCLOAK_ADMIN_PASSWORD': 'admin',
        },
        ports=[8080],
        command='start-dev',
        wait_log='Listening on:',
    ) as container:
        url = f'http://{container.get_container_host_ip()}:{container.get_exposed_port(8080)}'
        result = subprocess.run(
            ['python3', '../keycloak-version',
             f'--url={url}', '--username=admin', '--password=admin',
             '--path=/nonexistent'],
            capture_output=True, text=True,
        )
        test.assertRegex(
            result.stdout + result.stderr,
            rf'Keycloak\s+{version_tag}',
        )
        test.assertIn(
            result.returncode, (STATE_OK, STATE_WARN, STATE_CRIT),
        )


lib.lftest.attach_each(TestCheck, IMAGES, _check_image, id_func=lambda it: it[1])
```

Rules and tips:
- Pull upstream images whenever possible. You do not need a custom `Containerfile` that injects Python into the service image, because the plugin runs from the host and connects to the container via the exposed port. That is the common case for API-driven checks.
- Wait on a log marker, not a sleep. The `wait_log` argument takes a substring that the service writes to stdout/stderr when it is ready (e.g. `Listening on:` for Keycloak, `ready for connections.` for MariaDB). Use `wait_log_timeout` for services that take longer than 2 minutes to start.
- Do not hardcode state-shifting assertions. If the plugin reports something that depends on today's date (EOL windows, "last seen N days ago", "expires in X days"), assert only that the plugin returned a valid state (any of `STATE_OK`, `STATE_WARN`, `STATE_CRIT`) and that the output contains the expected version / service identifier. Locking in a specific state will break the test every time the calendar moves past a boundary.
- Multi-version matrix goes in an `IMAGES` list (or `CONTAINERFILES`, `SCENARIOS`, ...) at the top of the test file, materialised into one real test method per item via `lib.lftest.attach_each()`. Add a new major release at the bottom of the list when it becomes available upstream. See the "Two iteration shapes" subsection above for the rationale.
- Rootless podman: testcontainers-python works, but the Ryuk cleanup container needs to be disabled. Set `TESTCONTAINERS_RYUK_DISABLED=true` and `CONTAINER_HOST=unix:///run/user/$UID/podman/podman.sock` before running the tests. `tools/run-unit-tests` sets both automatically when it detects a container-based test.
- Do not run container tests via `tox`. They are integration tests and belong in `tools/run-container-tests`, not in the multi-Python matrix. `tools/run-unit-tests` detects them automatically by inspecting the `run` file for `podman` or `testcontainers` references.
- Keep hand-rolled podman orchestration out of new tests. If you find a plugin that still builds containers via `subprocess.run(['podman', 'build', ...])`, migrate it to `lib.lftest.run_container()`; the old pattern is being retired.
Some plugins can only be tested meaningfully when they run inside a distribution they target, because the data source is host-local (`/proc`, `/sys`, a distro binary like `mariadb --version`, or a psutil field whose availability depends on kernel version + Python version + distro packaging). There is no network endpoint we can redirect, and a static fixture would hide the thing we actually want to test: "does this plugin run cleanly on the distros our customers run?"
For that case, use a Containerfile per target distro under `unit-test/containerfiles/<distro>-v<version>` that installs python3 and the plugin's requirements, and keeps the container alive via `CMD ["sleep", "infinity"]`. Iterate over the file list via `lib.lftest.attach_each()` so every distro shows up as its own `test_<distro>` method. Bind-mount `lib/` and the plugin script into `/tmp` and run them via `container.exec()`.
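A per-distro Containerfile following that recipe could look roughly like this. This is a hypothetical `debian-v12` sketch; the exact package names and bootstrap steps are assumptions, and cpu-usage's `containerfiles/` are the authoritative reference:

```dockerfile
# unit-test/containerfiles/debian-v12 (sketch)
FROM docker.io/library/debian:12

RUN apt-get update \
    && apt-get install --assume-yes python3 python3-venv \
    && rm --recursive --force /var/lib/apt/lists/*

# keep the container alive; the test bind-mounts lib/ and the plugin
# into /tmp and executes them via container.exec()
CMD ["sleep", "infinity"]
```

The image itself does nothing but idle; all test logic stays in the `run` file on the host.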
The canonical distro matrix is the cpu-usage `CONTAINERFILES` list. Where possible, new "plugin runs inside the container" tests should target the same OS platforms so the coverage stays consistent across plugins and adding a new distro is a single-line change everywhere:

- `archlinux-vlatest`
- `debian-v11` / `-v12` / `-v13`
- `fedora-v35` / `-v40` / `-v41` / `-v42` / `-v43`
- `rhel-v8` / `-v9` / `-v10`
- `sles-v15` / `-v16`
- `ubuntu-v2004` / `-v2204` / `-v2404` / `-v2604`
Rules and tips:

- Reuse cpu-usage's `containerfiles/` as a starting point for a new plugin: the per-distro bootstrap (pacman / apt / dnf / zypper + venv + `pip install -r requirements.txt --require-hashes`) is identical, only the bind-mount path for the plugin script changes.
- `clean_up=False` on `DockerImage`. Testcontainers' default cleans up the built image and prunes dangling parent layers on exit, which turns every run into a full rebuild. `clean_up=False` keeps the image around so subsequent runs hit podman's layer cache and finish in seconds.
- `,Z` on bind mounts. On SELinux-enforcing hosts (RHEL, Fedora, Rocky) unrelabelled bind mounts are denied by the container runtime. `mode='ro,Z'` relabels the source so the container can read it; without the `Z` flag the plugin inside the container sees "Permission denied" on `import lib`.
- Rootless podman caveats, same as for the service-container pattern: `TESTCONTAINERS_RYUK_DISABLED=true` must be set, and `CONTAINER_HOST`/`DOCKER_HOST` must point at the rootless socket. `tools/run-unit-tests` does this automatically.
If the plugin requires sudo permissions to run, add the plugin to the sudoers files for all supported operating systems in `assets/sudoers/`. The OS name should match the Ansible variables `ansible_facts['distribution']` + `ansible_facts['distribution_major_version']` (e.g. `CentOS7`). Use symbolic links to prevent duplicate files.
Attention: the newline at the end of the file is required!
Each plugin should provide its required Director config in form of a Director basket. The basket usually contains at least one Command, one Service Template and some associated Datafields. The rest of the Icinga Director configuration (Host Templates, Service Sets, Notification Templates, Tag Lists, etc) can be placed in the assets/icingaweb2-module-director/all-the-rest.json file.
The Icinga Director Basket for one or all plugins can be created using the build-basket tool.
Always review the basket before committing.
After writing a new check called new-check, generate a basket file using:

```shell
./tools/build-basket --plugin-file check-plugins/new-check/new-check
```

The basket will be saved as `check-plugins/new-check/icingaweb2-module-director/new-check.json`. Inspect the basket, paying special attention to:
- Command: `timeout`
- ServiceTemplate: `check_interval`
- ServiceTemplate: `criticality`
- ServiceTemplate: `enable_perfdata`
- ServiceTemplate: `max_check_attempts`
- ServiceTemplate: `retry_interval`
Never edit a basket JSON file directly. If adjustments must be made to the basket, create a YAML config file for build-basket.
For example, to set the timeout to 30 seconds, enable perfdata, and set some other options, the config in `check-plugins/new-check/icingaweb2-module-director/new-check.yml` should look as follows:
```yaml
---
variants:
  - linux
  - windows
overwrites:
  '["Command"]["cmd-check-new-check"]["command"]': '/usr/bin/sudo /usr/lib64/nagios/plugins/new-check'
  '["Command"]["cmd-check-new-check"]["timeout"]': 30
  '["ServiceTemplate"]["tpl-service-new-check"]["check_command"]': 'cmd-check-new-check-sudo'
  '["ServiceTemplate"]["tpl-service-new-check"]["check_interval"]': 3600
  '["ServiceTemplate"]["tpl-service-new-check"]["enable_perfdata"]': true
  '["ServiceTemplate"]["tpl-service-new-check"]["max_check_attempts"]': 5
  '["ServiceTemplate"]["tpl-service-new-check"]["retry_interval"]': 30
  '["ServiceTemplate"]["tpl-service-new-check"]["use_agent"]': false
  '["ServiceTemplate"]["tpl-service-new-check"]["vars"]["criticality"]': 'C'
```

Then, re-run build-basket to apply the overwrites:

```shell
./tools/build-basket --plugin-file check-plugins/new-check/new-check
```

If a parameter was added, changed or deleted in the plugin, simply re-run build-basket to update the basket file.
The build-basket tool can also generate so-called variants of the checks (different flavours of the check command call to run on different operating systems):

- `linux`: This is the default, and will be used if no other variant is defined. It generates a `cmd-check-...`, `tpl-service-...` and the associated datafields.
- `windows`: Generates a `cmd-check-...-windows`, `cmd-check-...-windows-python`, `tpl-service-...-windows` and the associated datafields.
- `sudo`: Generates a `cmd-check-...-sudo` importing the `cmd-check-...`, but with `/usr/bin/sudo` prepended to the command, and a `tpl-service-...-sudo` importing the `tpl-service-...`, but with the `cmd-check-...-sudo` as the check command.
- `no-agent`: Generates a `tpl-service-...-no-agent` importing the `tpl-service-...`, but with the command endpoint set to the Icinga2 master.
Specify them in the `check-plugins/new-check/icingaweb2-module-director/new-check.yml` configuration as follows:

```yaml
---
variants:
  - linux
  - sudo
  - windows
  - no-agent
```

To run build-basket against all checks, for example due to a change in the build-basket script itself, use:

```shell
./tools/build-basket --auto
```

If you want to create a Service Set, edit `assets/icingaweb2-module-director/all-the-rest.json` and append the definition using JSON. Provide new unique UUIDs. Do a syntax check using `cat assets/icingaweb2-module-director/all-the-rest.json | jq` afterwards.
If you want to move a service from one Service Set to another, you have to create a new UUID for the new service (this isn't even possible in the Icinga Director GUI).
Each plugin README follows a fixed structure. Use `check-plugins/example/README.md` as the reference template for the structure, and `check-plugins/php-fpm-status/README.md` as the reference for the level of detail and especially for the troubleshooting depth that a Linux system engineer expects. When updating or writing READMEs, orient yourself on these two. The sections are:

- Overview: Describes what the plugin does. A leading sentence stating the main purpose. This must include at least the text from the plugin's `DESCRIPTION` variable. Followed by subsections:
  - Important Notes (optional, but comes first if present): Operational edge cases the admin must know before deploying, for example: "Requires sudo", "Only works with Redis 3.0+", "First run returns OK with 'Waiting for more data.'", "After a reboot, counters reset and the check waits for a new baseline". No implementation details, only things that affect deployment and daily operations.
  - Data Collection: How data is gathered (shell command, API, psutil, etc.), filtering options, SQLite usage, non-blocking measurement.
- Fact Sheet: Key properties as a table (download link, check name, check interval, parameters required, Windows support, 3rd party modules, state file path, etc.). Only list applicable rows.

  | Fact | Value |
  |---|---|
  | Check Plugin Download | https://github.com/Linuxfabrik/monitoring-plugins/tree/main/check-plugins/example |
  | Nagios/Icinga Check Name | `check_example` (for SEO: helps admins find the plugin when searching for the traditional Nagios-style name). Always use underscores, never dashes. |
  | Check Interval Recommendation | Every minute, Every 5/15/30 minutes, Every hour, Every 4/8/12 hours, Every day, Every week |
  | Can be called without parameters | Yes/No |
  | Runs on | Cross-platform / Linux / Windows. Use "Cross-platform" by default since Python runs everywhere. Only use "Linux" if the plugin uses Linux-specific APIs (`/proc`, `systemd`, `dmesg`, `dnf`/`apt`/`yum`, `journalctl`, etc.). The absence of a `.windows` file does not mean the plugin is Linux-only. |
  | Compiled for Windows | Yes (when `.windows` file exists) / No (runs with Python interpreter) |
  | Requirements | command-line tool foo; User with higher permissions |
  | 3rd Party Python modules | `module-name` |
  | Handles Periods | Yes (alerts only after `--count` consecutive threshold violations) |
  | Uses State File | `$TEMP/linuxfabrik-monitoring-plugins-<plugin-name>.db` |

- Help: The full `--help` output in a code block. Regenerate via `tools/update-readmes`.
- Usage Examples: One or more realistic invocations with their output. Show at least one OK case. If the plugin has `--lengthy`, show both variants.
- States: Describes when the plugin returns which state. Be precise about OK, WARN, CRIT, UNKNOWN conditions (e.g. "WARN if the percentage value is >= `--warning`"). Include `--always-ok` behavior, consecutive-run requirements, and first-run/reboot edge cases.
- Perfdata / Metrics: Table with columns Name, Type, Description. Types: `Bytes`, `Number`, `Percentage`, `Seconds`. Where possible, use the metric descriptions from the vendor's official documentation (e.g. Redis INFO, psutil docs, API references).
- Troubleshooting (optional): Known error messages with their solutions. Format: error message in backticks on its own line, followed by two trailing spaces for a Markdown line break, solution on the next line. Separate entries with a blank line.
- Credits, License: Always present.
The title of the dashboard should be capitalized; the name has to match the folder/plugin name (spaces will be replaced with `-`, `/` will be ignored; e.g. "Network I/O" will become `network-io`). Each Grafana panel should be meaningful, especially when comparing it to other related panels (e.g. memory usage and CPU usage).
Dashboard definitions are stored as YAML files in check-plugins/<plugin-name>/grafana/. Only define properties that differ from Grafana defaults to keep files minimal and maintainable.
Dashboards are currently managed using Grizzly (`apiVersion: grizzly.grafana.com/v1alpha1`). Grizzly is being phased out in favor of grafanactl (`apiVersion: dashboard.grafana.app/v1`), which requires Grafana 12+. Continue using the Grizzly format for now. A migration to grafanactl is planned (#1062).
Incomplete list of special features in some check-plugins.
README explains Python regular expression negative lookaheads to exclude matches:
Lists "Top X" values (search for the `--top` parameter):
Alerts only after a certain number of calls (search for the `--count` parameter):
Cuts (truncates) its SQLite database table:
Pure/raw network communication using byte-structs and sockets:
Checks for a minimum required 3rd party library version:
"Learns" thresholds on its own (implementing some kind of "threshold warm-up"):
Ports of applications:
- disk-smart: port of GSmartControl to Python.
- All mysql-* plugins: Port of MySQLTuner to Python.
Makes use of FREE and USED wording in parameters:
`--perfdata-regex` parameter lets you filter for a subset of performance data:
Is aware of its acknowledgement status in Icinga, and will suppress further warnings if it has been ACKed:
Calculates mean and median perfdata over a set of individual items:
Supports human-readable Nagios ranges for bytes:
Sanitizes complex data before querying MySQL/MariaDB:
Reads a file line-by-line, but backwards:
Makes heavy use of patterns versus compiled regexes, matching `any()` of them:
Uses the application's config file for authentication:
- All mysql-* plugins
Optionally uses an asset:
- php-status: relies on `monitoring.php`, which can provide more PHP insight in the context of the web server
Provides useful feedback from Redis' Memory Doctor:
Works without the `jolokia.war` plugin and uses the native API:
- All wildfly-* checks
Supports human-readable Nagios ranges for durations:
Differentiates between Windows and Linux (search for `lib.base.LINUX` or `lib.base.WINDOWS`):
Unit tests use Docker/Podman to test against a range of versions or a range of operating systems:
- cpu-usage
- keycloak-version (checking the filesystem in the container as well as the API)
Read ini files (example use case: password file parsing):