Commit dd0df5e

SNMP: mdadm: add install and verification docs
1 parent ea0d665 commit dd0df5e

2 files changed

Lines changed: 399 additions & 0 deletions

snmp/mdadm/README.md

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
# LibreNMS mdadm SNMP Extension

This directory contains a LibreNMS SNMP agent for Linux software RAID. The
`mdadm` script is a self-contained net-snmp **`pass_persist`** agent: `snmpd`
launches it, keeps it resident, and queries it on demand. It collects array and
device status directly from sysfs and `mdadm(8)` and serves every object in
`MDADM-MIB` (enterprise OID `1.3.6.1.4.1.60652.101`).

Because it runs under `pass_persist`, there is **no cache file, cron job, or
systemd timer**. Data is refreshed in-process at most once per `ttl` seconds
(default 60).
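
The in-process refresh described above can be pictured as a small time-based cache. The following is a minimal sketch, not the agent's actual code: `TtlCache`, `collect`, and the injectable `clock` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Optional


class TtlCache:
    """Re-run `collect` at most once per `ttl` seconds; otherwise reuse the last result."""

    def __init__(self, collect: Callable[[], dict], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._collect = collect      # hypothetical callable that gathers array status
        self._ttl = ttl
        self._clock = clock          # injectable so the behaviour can be tested
        self._data: Optional[dict] = None
        self._stamp = 0.0

    def get(self) -> dict:
        now = self._clock()
        # Refresh on first use, or once the cached data is at least `ttl` seconds old.
        if self._data is None or now - self._stamp >= self._ttl:
            self._data = self._collect()
            self._stamp = now
        return self._data
```

Every SNMP request would call `get()`; only the first request after the interval expires pays the cost of a fresh collection, which is why no external timer is needed.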

## Install

Run the installer as root:

```bash
sudo ./mdadm_install.sh
```

For non-interactive installs, set `AUTO_YES=1`:

```bash
sudo AUTO_YES=1 ./mdadm_install.sh
```

The installer downloads the agent, writes a default config, installs the
sudoers rule, and adds the `pass_persist` line to the `snmpd` include snippet.

## Installed Files

- Agent script: `/usr/local/lib/snmpd/mdadm`
- Configuration: `/etc/snmp/extension/mdadm.yaml`
- Sudoers rule: `/etc/sudoers.d/mdadm`
- SNMP snippet: `/etc/snmp/snmpd.conf.d/librenms.conf`

The snippet contains a single line:

```text
pass_persist .1.3.6.1.4.1.60652.101 /usr/local/lib/snmpd/mdadm
```

## Sudoers

`snmpd` typically runs as an unprivileged user (`Debian-snmp` or `snmp`). The
agent reads most data from sysfs, but `mdadm --detail` and `mdadm -E`
(`--examine`) need root for device counts, array UUID, and event counters. The
installed `sudoers.d-mdadm` rule grants exactly those two read-only commands
without a password. Without it, the agent still works but reports reduced detail.

## Configuration

See `mdadm.yaml.example` for the full, commented template. The default config
discovers all md arrays:

```yaml
---
log_level: WARNING
ttl: 60
devices: []
```

To limit polling to specific arrays, edit `/etc/snmp/extension/mdadm.yaml`:

```yaml
devices:
- name: md0
description: Root filesystem
- name: md1
description: Data volume
```

Recognised keys:

- `log_level`: `DEBUG`, `VERBOSE`, `INFO`, `NOTICE`, `WARNING`, `ERROR`.
- `log_file`: log path (defaults to `mdadm.log` beside the script).
- `ttl`: in-process refresh interval in seconds (default 60).
- `devices`: list of arrays to poll; empty auto-discovers all md arrays.
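
The `devices` rule can be illustrated with a short sketch. It assumes the list of discovered array names has already been gathered from sysfs; `select_arrays` and its parameters are hypothetical and are not taken from the agent.

```python
from typing import Dict, List


def select_arrays(devices: List[Dict[str, str]], discovered: List[str]) -> List[str]:
    """Return the md array names to poll.

    An empty `devices` list means auto-discovery: poll everything found.
    Otherwise poll only the configured names that were also discovered.
    """
    if not devices:
        return sorted(discovered)
    for entry in devices:
        if "name" not in entry:
            # The README maps a device entry without `name` to error code 5.
            raise ValueError("device entry is missing 'name'")
    return [entry["name"] for entry in devices if entry["name"] in discovered]
```

An empty result for a non-empty `devices` list corresponds to the "configured devices listed but none exist in sysfs" case in the error table.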

CLI flags (`--config`, `--ttl`, `--log-level`, `--log-file`) override the
config file.
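
One way to picture that precedence (built-in defaults, then the config file, then CLI flags) is the sketch below. It is an illustration under assumptions: `effective_settings` and `DEFAULTS` are hypothetical names, and the defaults shown are only the ones listed in the default config above.

```python
import argparse
from typing import Any, Dict, List, Optional

# Defaults as shown in the README's default config (assumed to match the agent).
DEFAULTS: Dict[str, Any] = {"log_level": "WARNING", "ttl": 60, "devices": []}


def effective_settings(file_cfg: Dict[str, Any],
                       argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Merge defaults, config-file values, and CLI flags, in rising priority."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config")
    parser.add_argument("--ttl", type=int)
    parser.add_argument("--log-level", dest="log_level")
    parser.add_argument("--log-file", dest="log_file")
    args = vars(parser.parse_args(argv))
    args.pop("config")  # selects which file to read; not a setting itself
    settings = {**DEFAULTS, **file_cfg}
    # Only flags that were actually given on the command line override the file.
    settings.update({k: v for k, v in args.items() if v is not None})
    return settings
```

For example, a config file with `ttl: 120` combined with `--ttl 30` on the command line yields an effective `ttl` of 30, while keys the CLI does not mention keep their file values.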

## Verification

Run the agent by hand and speak the `pass_persist` protocol on stdin:

```bash
printf 'PING\ngetnext\n.1.3.6.1.4.1.60652.101\n' \
| /usr/local/lib/snmpd/mdadm --config /etc/snmp/extension/mdadm.yaml
```

You should see `PONG` followed by the first OID, type, and value.
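
For readers unfamiliar with the exchange, here is a minimal sketch of a `pass_persist` responder that handles `PING`, `get`, and `getnext`. It is not the agent's code: `respond`, `oid_key`, and `TABLE` are hypothetical, and the table is toy data (it assumes the two error scalars sit directly under the base OID, with placeholder values), not the real `MDADM-MIB` layout.

```python
from typing import Dict, Iterable, Iterator, Tuple

BASE = ".1.3.6.1.4.1.60652.101"

# Toy table: OID -> (type, value). Placeholder contents, for illustration only.
TABLE: Dict[str, Tuple[str, str]] = {
    BASE + ".1.1.3": ("integer", "0"),
    BASE + ".1.1.4": ("string", ""),
}


def oid_key(oid: str) -> Tuple[int, ...]:
    """Turn '.1.3.6' into (1, 3, 6) so OIDs compare numerically, not as text."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def respond(lines: Iterable[str], table: Dict[str, Tuple[str, str]]) -> Iterator[str]:
    """Yield reply lines for a stream of pass_persist request lines."""
    rows = sorted((oid_key(o), o, t, v) for o, (t, v) in table.items())
    it = iter(lines)
    for line in it:
        cmd = line.strip()
        if cmd == "PING":
            yield "PONG"
            continue
        if cmd not in ("get", "getnext"):
            return  # blank line or unsupported command (e.g. set): stop
        raw = next(it, "").strip()
        if not raw:
            return  # request was cut off before the OID arrived
        want = oid_key(raw)
        if cmd == "get":
            match = next((r for r in rows if r[0] == want), None)
        else:
            match = next((r for r in rows if r[0] > want), None)
        if match is None:
            yield "NONE"
        else:
            yield from match[1:]  # OID, type, value on three lines
```

Fed the same three lines as the `printf` command above, this sketch answers `PONG` and then the first OID after the base with its type and value. A real agent would read from stdin and flush stdout after every reply, since `snmpd` waits on the pipe.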

Once `snmpd` has reloaded the snippet, walk the MIB (replace `public` with your
read community string):

```bash
snmpwalk -v2c -c public localhost .1.3.6.1.4.1.60652.101
```

With `MDADM-MIB.mib` in a directory that net-snmp searches for MIBs, load it to
get symbolic names:

```bash
snmpwalk -v2c -c public -m +MDADM-MIB localhost mdadmMIB
```

Restart `snmpd` after installation if your system does not automatically reload
the include directory.

## Error reporting

The agent is a resident `pass_persist` process, so it does not communicate
failures through a process exit code (`snmpd` would simply respawn it). Instead
the most recent collection result is published in-band as two scalars:

- `mdadmError` (`.1.1.3`): numeric code, `0` on success.
- `mdadmErrorString` (`.1.1.4`): human-readable description, empty on success.

| Code | Meaning | Behaviour |
|------|---------|-----------|
| 0 | All arrays collected cleanly | Normal data served |
| 1 | `mdadm` binary not in `$PATH` | Cleanup: empty tables, sensors removed |
| 2 | Auto-discovery found no arrays | Cleanup: empty tables, sensors removed |
| 3 | `/sys/block` unreadable | Skip: last good data preserved |
| 5 | Configured device entry missing `name` | Skip: last good data preserved |
| 6 | Some arrays had read errors, or `sudo mdadm` access is missing (data still served from sysfs) | Normal data served, error flagged |
| 7 | Configured devices listed but none exist in sysfs | Cleanup: empty tables, sensors removed |

"Cleanup" codes serve empty tables so LibreNMS prunes stale sensors; "Skip"
codes are transient and keep the last good data so sensors are not lost on a
momentary failure. (Code 4, output-write failure, does not apply: there is no
output file in `pass_persist` mode.)
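
The table's three behaviours can be summarised in a few lines. This is a sketch of the policy as documented here, not the agent's implementation; `apply_result`, `CLEANUP_CODES`, and `SKIP_CODES` are hypothetical names.

```python
from typing import Dict, Tuple

CLEANUP_CODES = {1, 2, 7}  # serve empty tables so stale sensors get pruned
SKIP_CODES = {3, 5}        # transient: keep serving the last good data
# Codes 0 and 6 serve the freshly collected data (6 also flags an error).


def apply_result(last_good: Dict, fresh: Dict, code: int,
                 message: str = "") -> Tuple[Dict, int, str]:
    """Pick which data set to serve, alongside the error code and message."""
    if code in CLEANUP_CODES:
        served: Dict = {}
    elif code in SKIP_CODES:
        served = last_good
    else:
        served = fresh
    return served, code, message
```

The returned code and message stand in for what would be published as `mdadmError` and `mdadmErrorString`.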

If the sudoers rule is not installed, `sudo mdadm --detail` / `sudo mdadm -E`
are refused and the agent reports code 6 with an actionable message: arrays are
still discovered from sysfs, but per-array and per-device enrichment (UUIDs,
event counts, exact device-role counts) is skipped. Install `sudoers.d-mdadm`
to clear it. The agent calls sudo with `-n` (non-interactive) so a missing rule
fails fast instead of blocking on a password prompt.
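
A non-interactive sudo call of this kind might look like the following sketch. It assumes `mdadm` resolves through `PATH` in a way that matches the sudoers rule; `sudo_cmd`, `try_detail`, and the 10-second timeout are hypothetical choices, not details taken from the agent.

```python
import subprocess
from typing import List, Optional


def sudo_cmd(mdadm_args: List[str]) -> List[str]:
    """Build a non-interactive sudo invocation of mdadm (-n: never prompt)."""
    return ["sudo", "-n", "mdadm", *mdadm_args]


def try_detail(array_dev: str, timeout: float = 10.0) -> Optional[str]:
    """Return `mdadm --detail` output, or None if the call is refused or fails."""
    try:
        proc = subprocess.run(sudo_cmd(["--detail", array_dev]),
                              capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return None  # sudo not installed, or the command hung
    # With -n, a missing sudoers rule makes sudo exit non-zero immediately.
    return proc.stdout if proc.returncode == 0 else None
```

A caller that gets `None` back could fall back to sysfs-only data and flag code 6, in line with the behaviour described above.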
