
Commit 5cecee5

fix(service): handle Windows service names with spaces and improve filter diagnostics
The plugin used to apply the --starttype filter before the --service regex, so a service whose name matched but whose start_type did not was silently dropped, and the user saw a misleading "does not match any service name" error for a service that actually existed. Regex matching now runs first. When the regex finds candidates that are all filtered out by --starttype, the plugin now reports exactly which start_types it saw and which services matched so the user can adjust the filter instead of chasing a phantom match error. This fixes the reporter's "RAS Telegraf" case (technical name with a space, start_type=manual) and the same class of bug for any other non-automatic Windows service. Closes #921
1 parent 08191f5 commit 5cecee5
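The ordering problem described in the commit message can be reproduced in a few lines. This is a minimal standalone sketch, not the plugin's code. `filter_then_match` and `match_then_filter` are hypothetical names, the sample rows come from the plugin's `windows-services-mixed` test fixture, and `{"automatic"}` stands in for the default `--starttype` value mentioned in the unit-test comments.

```python
import re

# Sample rows (name, start_type) from the windows-services-mixed fixture.
SERVICES = [
    ("ProfSvc", "automatic"),
    ("RAS Telegraf", "manual"),
    ("RasAuto", "manual"),
]


def filter_then_match(pattern, starttypes):
    """Old order: drop by start_type first, then apply the regex."""
    regex = re.compile(pattern, re.IGNORECASE)
    kept = [(n, t) for n, t in SERVICES if t in starttypes]
    return [n for n, _ in kept if regex.search(n)]


def match_then_filter(pattern, starttypes):
    """New order: regex first, so the candidates survive for diagnostics."""
    regex = re.compile(pattern, re.IGNORECASE)
    candidates = [(n, t) for n, t in SERVICES if regex.search(n)]
    selected = [n for n, t in candidates if t in starttypes]
    return candidates, selected


# Old order: an empty list, indistinguishable from "no such service".
print(filter_then_match(r"^RAS Telegraf$", {"automatic"}))  # → []
# New order: one candidate was found, and the start_type filter removed it.
print(match_then_filter(r"^RAS Telegraf$", {"automatic"}))
# → ([('RAS Telegraf', 'manual')], [])
```

The old order returns an empty result in both situations. Keeping the regex candidates separate from the filtered selection is what lets the plugin tell "no such service" apart from "filtered out by `--starttype`".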

4 files changed, 139 additions, 42 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -101,6 +101,7 @@ Monitoring Plugins:
 * redis-status, valkey-status: modernize code and unify both plugins again after [PR #954](https://github.com/Linuxfabrik/monitoring-plugins/pull/954)
 * rocketchat-stats: improve output
 * scanrootkit: kernel symbol matching is now exact per symbol instead of a substring search, so a signature like `is_invisible` no longer accidentally matches an unrelated legitimate symbol named `is_invisible_helper`. False positives on clean systems that previously had such symbol-name collisions will disappear.
+* service: Windows services whose technical name contains a space (e.g. `"RAS Telegraf"`) are now correctly matched by `--service`. The plugin also produces a much more helpful error when the regex matches a service but it is filtered out by `--starttype` (previously it reported "does not match any service name", which was misleading because the service existed but had a different start type) ([#921](https://github.com/Linuxfabrik/monitoring-plugins/issues/921))
 * statuspal: replace `flatdict` dependency with a recursive approach ([#1044](https://github.com/Linuxfabrik/monitoring-plugins/issues/1044))
 * systemd-units-failed: show failed unit names in the first output line for better dashboard and SMS alert readability ([#967](https://github.com/Linuxfabrik/monitoring-plugins/issues/967))
 * updates: adapt to updated powershell.py library

check-plugins/service/service

Lines changed: 85 additions & 42 deletions
@@ -28,11 +28,11 @@ except ImportError:
 
 
 __author__ = 'Linuxfabrik GmbH, Zurich/Switzerland'
-__version__ = '2026040801'
+__version__ = '2026041402'
 
-DESCRIPTION = """Checks the state of one or more Windows services. Accepts the case-insensitive service
-name (not the display name) and supports regular expressions to match multiple services.
-Alerts on services that are not in the expected state."""
+DESCRIPTION = """Checks the state of one or more Windows services. Accepts the case-insensitive
+service name (not the display name) and supports regular expressions to match multiple
+services. Alerts on services that are not in the expected state."""
 
 DEFAULT_CRIT = None
 DEFAULT_SEVERITY = 'warn'
@@ -132,6 +132,49 @@ def parse_args():
     return args
 
 
+def collect_matches(compiled_regex, test_arg):
+    """Return every Windows service whose name matches `compiled_regex`.
+
+    Walks `psutil.win_service_iter()` in production and the
+    `--test` fixture (CSV lines of `name, display_name, status,
+    start_type`) in unit-test mode. Each match is a flat dict, so
+    the caller can apply the start-type and status filters without
+    having to re-wrap anything. A `psutil.NoSuchProcess` raised
+    mid-iteration (race condition: a service disappears between
+    enumeration and inspection) is swallowed so the caller still
+    gets whatever was already collected.
+    """
+    matches = []
+    try:
+        if test_arg is None:
+            for s in psutil.win_service_iter():
+                if re.search(compiled_regex, s.name()):
+                    matches.append(
+                        {
+                            'name': s.name(),
+                            'display_name': s.display_name(),
+                            'status': s.status(),
+                            'start_type': s.start_type(),
+                        }
+                    )
+            return matches
+        stdout, _, _ = lib.lftest.test(test_arg)
+        for line in stdout.splitlines():
+            name, display_name, status, start_type = line.split(', ')
+            if re.search(compiled_regex, name):
+                matches.append(
+                    {
+                        'name': name,
+                        'display_name': display_name,
+                        'status': status,
+                        'start_type': start_type,
+                    }
+                )
+    except psutil.NoSuchProcess:
+        pass
+    return matches
+
+
 def main():
     """The main function. This is where the magic happens."""
 
@@ -165,51 +208,51 @@ def main():
     }
     svcstate_cnt = 0 # this is the overall count of matching service states
 
-    # fetch and analyze data
+    # regex-match first, start_type-filter second: otherwise a service
+    # that would match the user's regex but happens to have a
+    # non-matching start_type is silently dropped before the regex even
+    # runs, and the user sees a "does not match any service name" error
+    # for a service that very much exists. By splitting the two phases
+    # we can give an actionable error when the regex found candidates
+    # but the start_type filter removed all of them ([#921]).
     try:
         compiled_service_regex = re.compile(args.SERVICE, re.IGNORECASE)
-
-        # fetch data
-        if args.TEST is None:
-            for s in psutil.win_service_iter():
-                # print(f'{s.name()}, {s.display_name()}, {s.status()}, {s.start_type()}')
-                if s.start_type() not in args.STARTTYPE:
-                    continue
-                matches = re.search(compiled_service_regex, s.name())
-                if matches:
-                    # count the service states of interest
-                    if s.status() in args.STATUS:
-                        svcstates[s.status()] += 1
-                        svcstate_cnt += 1
-                    table_data.append(s.as_dict())
-        else:
-            # do not call the command, put in test data
-            stdout, _, _ = lib.lftest.test(args.TEST)
-            for s in stdout.splitlines():
-                name, display_name, status, start_type = s.split(', ')
-                if start_type not in args.STARTTYPE:
-                    continue
-                matches = re.search(compiled_service_regex, name)
-                if matches:
-                    # count the service states of interest
-                    if status in args.STATUS:
-                        svcstates[status] += 1
-                        svcstate_cnt += 1
-                    table_data.append(
-                        {
-                            'name': name,
-                            'display_name': display_name,
-                            'status': status,
-                            'start_type': start_type,
-                        }
-                    )
     except re.error as rerr:
         lib.base.cu(f'Invalid regex "{args.SERVICE}": {rerr}')
-    except psutil.NoSuchProcess:
+
+    # fetch data
+    regex_matches = collect_matches(compiled_service_regex, args.TEST)
+
+    # no regex match at all — the name truly does not exist
+    if not regex_matches:
         lib.base.cu(f'r`{args.SERVICE}` does not match any service name.')
 
+    # regex matched, now apply the start_type filter
+    for match in regex_matches:
+        if match['start_type'] not in args.STARTTYPE:
+            continue
+        if match['status'] in args.STATUS:
+            svcstates[match['status']] += 1
+            svcstate_cnt += 1
+        table_data.append(match)
+
+    # regex found candidates but all were rejected by --starttype: tell
+    # the user exactly which start_types we saw so they can fix the
+    # filter instead of chasing a phantom "does not match" error.
     if not table_data:
-        lib.base.cu(f'r`{args.SERVICE}` does not match any service name.')
+        seen_types = sorted({m['start_type'] for m in regex_matches})
+        sample = ', '.join(
+            f'{m["name"]} ({m["start_type"]})' for m in regex_matches[:5]
+        )
+        if len(regex_matches) > 5:
+            sample += ', ...'
+        lib.base.cu(
+            f'r`{args.SERVICE}` matched {len(regex_matches)} '
+            f'{lib.txt.pluralize("service", len(regex_matches))}, but none '
+            f'have a start_type in {sorted(args.STARTTYPE)}. Matched '
+            f'start_types: {seen_types}. Matched services: {sample}. '
+            f'Adjust --starttype to include them.'
+        )
     svc_cnt = len(table_data)
 
     # build the message
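The new `--starttype` hint at the end of the `main()` hunk is plain string assembly over the regex candidates. The following sketch reproduces it outside the plugin so the message shape can be checked in isolation. `starttype_hint` is a hypothetical name, the `limit=5` parameter replaces the hard-coded 5, and `pluralize` is a local stand-in for `lib.txt.pluralize`. The stand-in is assumed to add an `s` unless the count is 1; the real helper may behave differently.

```python
def pluralize(noun, count):
    """Stand-in for lib.txt.pluralize (assumption: add 's' unless count == 1)."""
    return noun if count == 1 else noun + "s"


def starttype_hint(pattern, starttypes, regex_matches, limit=5):
    """Build the message used when --starttype rejected every regex candidate."""
    seen_types = sorted({m["start_type"] for m in regex_matches})
    # List at most `limit` services, then append an ellipsis.
    sample = ", ".join(
        f'{m["name"]} ({m["start_type"]})' for m in regex_matches[:limit]
    )
    if len(regex_matches) > limit:
        sample += ", ..."
    return (
        f"r`{pattern}` matched {len(regex_matches)} "
        f"{pluralize('service', len(regex_matches))}, but none "
        f"have a start_type in {sorted(starttypes)}. Matched "
        f"start_types: {seen_types}. Matched services: {sample}. "
        f"Adjust --starttype to include them."
    )


print(starttype_hint("^RAS Telegraf$", ["automatic"],
                     [{"name": "RAS Telegraf", "start_type": "manual"}]))
```

For the `RAS Telegraf` case, the resulting text contains every substring that the `unknown-ras-telegraf-default-starttype-filter` test asserts on. Capping the sample keeps the message short when a broad regex matches dozens of services.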

check-plugins/service/unit-test/run

Lines changed: 52 additions & 0 deletions
@@ -50,6 +50,58 @@ TESTS = [
             "'start_pending', 'stop_pending', 'stopped'] (thresholds 0/None) [WARNING].",
         ],
     },
+
+    # regression tests for issue #921: service names that contain a
+    # space (e.g. "RAS Telegraf") must match just like any other name.
+    # The fixture has the service with start_type=manual, so a default
+    # call (--starttype=automatic implicit) must now give an
+    # actionable error instead of the misleading "does not match any
+    # service name".
+    {
+        'id': 'ok-ras-telegraf-manual',
+        'test': 'stdout/windows-services-mixed,,0',
+        'params': (
+            "--service='^RAS Telegraf$' "
+            '--starttype=manual --status=running --warning=1:1'
+        ),
+        'assert-retc': STATE_OK,
+        'assert-in': [
+            "Everything is ok. 1 service named r`^RAS Telegraf$` and start "
+            "type ['manual'] found, 1 in status ['running'] "
+            '(thresholds 1:1/None).',
+            'RAS Telegraf',
+        ],
+    },
+    {
+        'id': 'unknown-ras-telegraf-default-starttype-filter',
+        'test': 'stdout/windows-services-mixed,,0',
+        # The user runs the plugin without thinking about --starttype;
+        # the default filter [automatic] hides the RAS Telegraf service
+        # (which is manual), but we now report that the regex did find
+        # a candidate and tell them exactly how to include it.
+        'params': "--service='^RAS Telegraf$'",
+        'assert-retc': STATE_UNKNOWN,
+        'assert-in': [
+            'r`^RAS Telegraf$` matched 1 service',
+            "start_type in ['automatic']",
+            "Matched start_types: ['manual']",
+            'RAS Telegraf (manual)',
+            'Adjust --starttype',
+        ],
+    },
+    {
+        'id': 'unknown-regex-does-not-match-any-service',
+        'test': 'stdout/windows-services-mixed,,0',
+        # The regex really does not match any service in the fixture.
+        # The old "does not match any service name" wording is kept for
+        # this case so existing monitoring setups that grep on the
+        # error text still work.
+        'params': "--service='^doesnotexist$'",
+        'assert-retc': STATE_UNKNOWN,
+        'assert-in': [
+            'r`^doesnotexist$` does not match any service name.',
+        ],
+    },
 ]
 
 
check-plugins/service/unit-test/stdout/windows-services-mixed

Lines changed: 1 addition & 0 deletions
@@ -94,6 +94,7 @@ PrintNotify, Printer Extensions and Notifications, stopped, manual
 ProfSvc, User Profile Service, running, automatic
 PushToInstall, Windows PushToInstall Service, stopped, manual
 QWAVE, Quality Windows Audio Video Experience, stopped, manual
+RAS Telegraf, RAS Telegraf, running, manual
 RasAuto, Remote Access Auto Connection Manager, stopped, manual
 RasMan, Remote Access Connection Manager, stopped, manual
 RemoteAccess, Routing and Remote Access, stopped, disabled
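With the `RAS Telegraf, RAS Telegraf, running, manual` row in the fixture, the three #921 regression tests in `unit-test/run` reduce to a three-way decision. The following is a minimal sketch of that decision. It assumes the default `--starttype` is `automatic`, as the test comments state, and it leaves out status and threshold handling. `evaluate` and its string results are hypothetical labels, not the plugin's API. The plugin itself reports the two error cases as UNKNOWN through `lib.base.cu()`.

```python
import re

# Subset of the windows-services-mixed fixture: (name, status, start_type).
FIXTURE = [
    ("RAS Telegraf", "running", "manual"),
    ("RasAuto", "stopped", "manual"),
    ("RemoteAccess", "stopped", "disabled"),
]


def evaluate(pattern, starttypes=("automatic",)):
    """Classify a --service/--starttype combination like the #921 tests do."""
    regex = re.compile(pattern, re.IGNORECASE)
    candidates = [row for row in FIXTURE if regex.search(row[0])]
    if not candidates:
        return "no-match"   # "... does not match any service name."
    selected = [row for row in candidates if row[2] in starttypes]
    if not selected:
        return "filtered"   # actionable "Adjust --starttype" hint
    return "selected"       # normal status/threshold evaluation follows


print(evaluate(r"^RAS Telegraf$", ("manual",)))  # ok-ras-telegraf-manual
print(evaluate(r"^RAS Telegraf$"))               # unknown-ras-telegraf-default-starttype-filter
print(evaluate(r"^doesnotexist$"))               # unknown-regex-does-not-match-any-service
```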
