Commit e713a76

feat: add kubectl-get-pods
1 parent 7aee5c7 commit e713a76

10 files changed

Lines changed: 2518 additions & 0 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
Monitoring Plugins:

* kubectl-get-pods: checks the health and status of Kubernetes pods by running `kubectl get pods` and parsing the results
* redfish-sel: add support for Supermicro ([#866](https://github.com/Linuxfabrik/monitoring-plugins/issues/866))
* systemd-unit: implement support for `systemctl --machine` and `--user`

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
Check kubectl-get-pods
======================

Overview
--------

Checks the health and status of Kubernetes Pods by running ``kubectl get pods`` and parsing the results. Prints a table listing namespace, pod name, readiness, status, restart count, pod age, and IP address. Adds performance data for each pod status (Running, Pending, Failed, Succeeded, Unknown).

By default, only shows pods from the current namespace. Use ``--all-namespaces`` to check across all namespaces. The plugin also stores a temporary local SQLite database during runtime (no persistent history). Results can therefore be filtered with a custom SQL ``--query`` (e.g., by namespace, pod name, or status). See the README for more details.

Pending and Failed pods can trigger a WARNING or CRITICAL state (configurable via ``--severity``), while Unknown pods result in an UNKNOWN state. Intended for use with Nagios/Icinga to detect Kubernetes pod issues like stuck, failing, or unreachable pods.

For the ``--query`` parameter, the following columns can be used:

* namespace (TEXT)
* name (TEXT)
* ready (TEXT)
* status (TEXT)
* restarts (INT)
* age (INT)
* ip (TEXT)

Hints:

* OIDC-based login to Kubernetes is not yet supported by this plugin.

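The column list above maps naturally onto a single SQLite table. The following is a minimal sketch, not the plugin's actual code, of how a ``--query`` fragment could serve as the ``WHERE`` clause of a ``SELECT``. The table name ``pods``, the helper ``filter_pods``, the sample rows, and the reading of ``age`` as seconds are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical sample rows; the column order follows the list in this README.
PODS = [
    ("mynamespace", "prod-web-1", "1/1", "Running", 0, 93600, "192.0.2.11"),
    ("mynamespace", "prod-db-1", "0/1", "Pending", 3, 120, "192.0.2.12"),
    ("other", "batch-job-1", "0/1", "Succeeded", 0, 7200, "192.0.2.13"),
]


def filter_pods(where="1"):
    """Load PODS into an in-memory table and apply a WHERE fragment.

    The default fragment "1" is always true, so every row is returned.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pods (namespace TEXT, name TEXT, ready TEXT, "
        "status TEXT, restarts INT, age INT, ip TEXT)"
    )
    conn.executemany("INSERT INTO pods VALUES (?, ?, ?, ?, ?, ?, ?)", PODS)
    # The fragment is interpolated verbatim, so it must come from a trusted source.
    rows = conn.execute(f"SELECT name, status FROM pods WHERE {where}").fetchall()
    conn.close()
    return rows


print(filter_pods("namespace = 'mynamespace' and name like 'prod-%' and status != 'Running'"))
# prints [('prod-db-1', 'Pending')]
```

Note that SQLite compares TEXT values case-sensitively with ``=`` and ``!=`` by default, so in this sketch the status literal has to match the stored spelling.
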
Fact Sheet
----------

.. csv-table::
    :widths: 30, 70

    "Check Plugin Download", "https://github.com/Linuxfabrik/monitoring-plugins/tree/main/check-plugins/kubectl-get-pods"
    "Check Interval Recommendation", "Once a minute"
    "Can be called without parameters", "Yes"
    "Compiled for", "Linux"
    "Requirements", "Command-line tool ``kubectl``. The kubectl version must be within one minor version of your cluster: for example, a v1.32 client can communicate with v1.31, v1.32, and v1.33 control planes. Using the latest compatible version of kubectl helps avoid unforeseen issues. See `installation <https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/>`__."
    "Uses SQLite DBs", "``$TEMP/linuxfabrik-monitoring-plugins-kubectl-get-pods.db``"

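The version requirement in the fact sheet can be expressed as a small check. This is only an illustrative sketch: the function names ``parse_major_minor`` and ``within_supported_skew`` are hypothetical, the plugin itself is not known to perform such a check, and version strings with suffixes on the minor number are not handled.

```python
def parse_major_minor(version):
    """Return (major, minor) from a string such as 'v1.32' or 'v1.32.4'."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def within_supported_skew(client, server, max_skew=1):
    """True if client and server share a major version and differ by at most max_skew minors."""
    c_major, c_minor = parse_major_minor(client)
    s_major, s_minor = parse_major_minor(server)
    return c_major == s_major and abs(c_minor - s_minor) <= max_skew


for server in ("v1.30", "v1.31", "v1.32", "v1.33"):
    print(server, within_supported_skew("v1.32", server))
```
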
Help
----

.. code-block:: text

    usage: kubectl-get-pods [-h] [-V] [--always-ok] [--all-namespaces]
                            [--kubeconfig KUBECONFIG] [--query QUERY]
                            [--severity {warn,crit}] [--test TEST]

    Checks the health and status of Kubernetes Pods by running `kubectl get pods`
    and parsing the results. Prints a table listing namespace, pod name,
    readiness, status, restart count, pod age, and IP address. Adds performance
    data for each pod status (Running, Pending, Failed, Succeeded, Unknown). By
    default, only shows pods from the current namespace. Use `--all-namespaces` to
    check across all namespaces. The plugin also stores a temporary local SQLite
    database during runtime (no persistent history). Results can therefore be
    filtered with a custom SQL `--query` (e.g., by namespace, pod name, or
    status). See the README for more details. Pending and Failed pods can trigger
    a WARNING or CRITICAL state (configurable via `--severity`), while Unknown
    pods result in an UNKNOWN state. Intended for use with Nagios/Icinga to detect
    Kubernetes pod issues like stuck, failing, or unreachable pods.

    options:
      -h, --help            show this help message and exit
      -V, --version         show program's version number and exit
      --always-ok           Always returns OK.
      --all-namespaces      If present, list the requested object(s) across all
                            namespaces. Namespace in current context is ignored
                            even if specified with `--namespace`. Default: False
      --kubeconfig KUBECONFIG
                            Path to the kubeconfig file. Default:
                            /var/spool/icinga2/.kubeconfig
      --query QUERY         Provide the SQL `WHERE` statement part to narrow down
                            results. Example: `namespace = 'mynamespace' and name
                            like 'prod-%' and status != 'running'`. Have a look at
                            the README for a list of available columns. Default: 1
      --severity {warn,crit}
                            Severity for alerting. Default: crit
      --test TEST           For unit tests. Needs "path-to-stdout-file,path-to-
                            stderr-file,expected-retc".

Usage Examples
--------------

.. code-block:: bash

    ./kubectl-get-pods \
        --all-namespaces \
        --kubeconfig /var/spool/icinga2/.kubeconfig \
        --query='namespace = "mynamespace" and name like "mycontainer-%"'

Output:

.. code-block:: text

    NAMESPACE   ! NAME                                ! RDY ! RSTRT ! AGE    ! IP         ! STATUS
    ------------+-------------------------------------+-----+-------+--------+------------+--------
    mynamespace ! mycontainer-mariadb-555df66f6c-5z8h ! 1/1 ! 0     ! 1D 2h  ! 192.0.2.11 ! Running
    mynamespace ! mycontainer-postgres-775cb466bb-qkw ! 1/1 ! 0     ! 1M 11h ! 192.0.2.12 ! Running

States
------

* Depending on the ``--severity`` given, returns CRIT (default) or WARN if a pod is in 'Pending' or 'Failed' state,
* UNKNOWN if it is in 'Unknown' state,
* and OK in all other cases.

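The state rules above can be sketched as a small mapping from pod status to the conventional Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names are hypothetical, and the precedence used when several pods disagree (CRIT over WARN over UNKNOWN over OK) is an assumption; this README does not state how the plugin combines states.

```python
STATE_OK, STATE_WARN, STATE_CRIT, STATE_UNKNOWN = 0, 1, 2, 3


def pod_state(status, severity="crit"):
    """Map one pod status to a state, following the rules listed above."""
    if status in ("Pending", "Failed"):
        return STATE_CRIT if severity == "crit" else STATE_WARN
    if status == "Unknown":
        return STATE_UNKNOWN
    return STATE_OK


def overall_state(statuses, severity="crit"):
    """Combine per-pod states; assumed precedence: CRIT > WARN > UNKNOWN > OK."""
    states = {pod_state(status, severity) for status in statuses}
    for candidate in (STATE_CRIT, STATE_WARN, STATE_UNKNOWN):
        if candidate in states:
            return candidate
    return STATE_OK


print(overall_state(["Running", "Pending"]))          # prints 2
print(overall_state(["Running", "Pending"], "warn"))  # prints 1
print(overall_state(["Running", "Unknown"]))          # prints 3
```
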
Perfdata / Metrics
------------------

.. csv-table::
    :widths: 25, 15, 60
    :header-rows: 1

    Name, Type, Description
    failed, Number, Number of failed pods
    pending, Number, Number of pending pods
    running, Number, Number of running pods
    succeeded, Number, Number of succeeded pods
    unknown, Number, Number of unknown pods

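As an illustration of the metrics table above, the following sketch counts pods per status and renders the counts as ``label=value`` pairs in the style of Nagios performance data. The helper name ``perfdata`` is hypothetical, and the exact formatting the plugin emits (quoting, thresholds, min/max) is not shown in this README, so treat the output format as an assumption.

```python
from collections import Counter

# Metric names taken from the table above.
METRICS = ("failed", "pending", "running", "succeeded", "unknown")


def perfdata(statuses):
    """Count pod statuses and render one label=value pair per metric."""
    counts = Counter(status.lower() for status in statuses)
    # Statuses outside the five metrics are simply not reported in this sketch.
    return " ".join(f"{name}={counts[name]}" for name in METRICS)


print(perfdata(["Running", "Running", "Pending"]))
# prints failed=0 pending=1 running=2 succeeded=0 unknown=0
```
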
Credits, License
----------------

* Authors: `Linuxfabrik GmbH, Zurich <https://www.linuxfabrik.ch>`_
* License: The Unlicense, see `LICENSE file <https://unlicense.org/>`_.
Lines changed: 218 additions & 0 deletions
@@ -0,0 +1,218 @@
apiVersion: grizzly.grafana.com/v1alpha1
kind: Dashboard
metadata:
    folder: linuxfabrik-monitoring-plugins
    name: kubectl-get-pods
spec:
    schemaVersion: 2023041201
    tags:
      - Linuxfabrik
      - Grizzly
      - static
    time:
        from: now-90d
        to: now
    timepicker:
        hidden: false
        refresh_intervals:
          - 1m
    timezone: browser
    title: Kubectl Get Pods
    uid: linuxfabrik-monitoring-plugins-kubectl-get-pods
    editable: true
    liveNow: true
    refresh: 1m
    templating:
        list:
          - label: Command
            name: command
            query: SHOW MEASUREMENTS WITH MEASUREMENT =~ /.*kubectl-get-pods.*/
            current:
                text: cmd-check-kubectl-get-pods
                value: cmd-check-kubectl-get-pods
            refresh: 1
            sort: 1
            type: query
          - label: Hostname
            name: hostname
            query: SHOW TAG VALUES FROM "$command" WITH KEY = "hostname"
            refresh: 1
            sort: 1
            type: query
          - label: Service
            name: service
            query: SHOW TAG VALUES FROM "$command" WITH KEY = "service" WHERE hostname = '$hostname'
            refresh: 1
            sort: 1
            type: query

    panels:

      - title: Kubectl Get Pods
        type: timeseries
        gridPos:
            h: 8
            w: 12
            x: 12
            y: 0
        fieldConfig:
            defaults:
                color:
                    mode: palette-classic
                custom:
                    lineInterpolation: smooth
                    spanNulls: true
                decimals: 1
                min: 0
                unit: none
        options:
            legend:
                calcs:
                  - first
                  - min
                  - mean
                  - max
                  - last
                displayMode: table
                placement: bottom
                showLegend: true
            tooltip:
                mode: multi
                sort: none

        targets:

          - alias: failed
            refId: failed
            groupBy:
              - params:
                  - $interval
                type: time
            measurement: $command
            resultFormat: time_series
            select:
              - - params:
                    - value
                  type: field
                - params: []
                  type: mean
            tags:
              - key: hostname
                operator: '='
                value: $hostname
              - condition: AND
                key: service
                operator: '='
                value: $service
              - condition: AND
                key: metric
                operator: '='
                value: failed

          - alias: pending
            refId: pending
            groupBy:
              - params:
                  - $interval
                type: time
            measurement: $command
            resultFormat: time_series
            select:
              - - params:
                    - value
                  type: field
                - params: []
                  type: mean
            tags:
              - key: hostname
                operator: '='
                value: $hostname
              - condition: AND
                key: service
                operator: '='
                value: $service
              - condition: AND
                key: metric
                operator: '='
                value: pending

          - alias: running
            refId: running
            groupBy:
              - params:
                  - $interval
                type: time
            measurement: $command
            resultFormat: time_series
            select:
              - - params:
                    - value
                  type: field
                - params: []
                  type: mean
            tags:
              - key: hostname
                operator: '='
                value: $hostname
              - condition: AND
                key: service
                operator: '='
                value: $service
              - condition: AND
                key: metric
                operator: '='
                value: running

          - alias: succeeded
            refId: succeeded
            groupBy:
              - params:
                  - $interval
                type: time
            measurement: $command
            resultFormat: time_series
            select:
              - - params:
                    - value
                  type: field
                - params: []
                  type: mean
            tags:
              - key: hostname
                operator: '='
                value: $hostname
              - condition: AND
                key: service
                operator: '='
                value: $service
              - condition: AND
                key: metric
                operator: '='
                value: succeeded

          - alias: unknown
            refId: unknown
            groupBy:
              - params:
                  - $interval
                type: time
            measurement: $command
            resultFormat: time_series
            select:
              - - params:
                    - value
                  type: field
                - params: []
                  type: mean
            tags:
              - key: hostname
                operator: '='
                value: $hostname
              - condition: AND
                key: service
                operator: '='
                value: $service
              - condition: AND
                key: metric
                operator: '='
                value: unknown
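
The five targets in this dashboard differ only in their ``alias``, ``refId``, and the value of the ``metric`` tag. As a hedged sketch of that repetition (not how the repository generates its dashboards, which this commit does not show), the same structure could be built from a hypothetical helper ``build_target``:

```python
# Metric names as used by the dashboard targets.
METRICS = ("failed", "pending", "running", "succeeded", "unknown")


def build_target(metric):
    """Return one target dict mirroring the structure of the YAML targets."""
    return {
        "alias": metric,
        "refId": metric,
        "groupBy": [{"params": ["$interval"], "type": "time"}],
        "measurement": "$command",
        "resultFormat": "time_series",
        "select": [[
            {"params": ["value"], "type": "field"},
            {"params": [], "type": "mean"},
        ]],
        "tags": [
            {"key": "hostname", "operator": "=", "value": "$hostname"},
            {"condition": "AND", "key": "service", "operator": "=", "value": "$service"},
            {"condition": "AND", "key": "metric", "operator": "=", "value": metric},
        ],
    }


targets = [build_target(metric) for metric in METRICS]
print([target["refId"] for target in targets])
# prints ['failed', 'pending', 'running', 'succeeded', 'unknown']
```
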
