|
1 | 1 | # Nagios plugin to check health of a NetApp Ontap cluster |
2 | 2 |
|
3 | | -### Idea |
| 3 | +## Idea |
4 | 4 |
|
5 | | -This Perl script is able to monitor most components of a NetApp Ontap cluster, such as volume, aggregate, |
| 5 | +This Perl script is able to monitor most components of a NetApp Ontap cluster, such as volume, aggregate, |
6 | 6 | snapshot, quota, snapmirror, filer hardware, port, interface, cluster and disk health. |
7 | 7 |
|
8 | | -### Status |
| 8 | +## Status |
9 | 9 |
|
10 | 10 | Production ready. |
11 | 11 |
|
12 | | -### How To |
| 12 | +## How To |
13 | 13 |
|
14 | 14 | This script requires NetApp Manageability SDK for Perl to be installed. |
15 | 15 | It can be found at https://mysupport.netapp.com/NOW/cgi-bin/software |
16 | 16 |
|
17 | | -Please visit https://outsideit.net/monitoring-netapp-ontap/ for more information on how to use this plugin. |
| 17 | +<!-- Please visit https://outsideit.net/monitoring-netapp-ontap/ for more information on how to use this plugin. --> |
18 | 18 |
|
19 | | -### Help |
| 19 | +<!-- Contents from https://outsideit.net/monitoring-netapp-ontap/ (from Google Cache) --> |
| 20 | + |
| 21 | +There are of course numerous ways to monitor your NetApp Ontap storage, but this post focuses for now on how to achieve |
| 22 | +quality monitoring with the help of a Nagios plugin, which was originally developed by John Murphy. The plugin |
| 23 | +definitely has some flaws, so any help improving it is welcome. Read the post about debugging Perl scripts, make a |
| 24 | +fork of the project on GitHub and start experimenting. |
| 25 | + |
| 26 | +The plugin is able to monitor multiple critical NetApp Ontap components, from disks to aggregates to volumes. |
| 27 | +It can also alert you if it finds any unhealthy components. |
| 28 | + |
| 29 | +<!-- missing image: NetApp Ontap Logical View --> |
| 30 | + |
| 31 | +### How to monitor NetApp Ontap with Nagios? |
| 32 | + |
| 33 | +Download the latest release from GitHub to a temp directory and then navigate to it. |
| 34 | + |
| 35 | +Copy the contents of `NetApp/*` to your `/usr/lib/perl5` or `/usr/lib64/perl5` directory to install the required version |
| 36 | +of the NetApp Perl SDK (confirmed to work with SDK 5.1 and 5.2). |
| 37 | + |
| 38 | +Copy the `check_netapp_ontap.pl` script to your Nagios libexec folder and configure the correct permissions. |
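The three installation steps above can be scripted. The following is a minimal Python sketch, not part of the plugin: the function name `install_plugin`, the layout of the unpacked release (a `NetApp/` folder next to `check_netapp_ontap.pl`) and `0755` as "the correct permissions" are all assumptions.

```python
import shutil
from pathlib import Path


def install_plugin(release_dir: Path, perl_lib_dir: Path, libexec_dir: Path) -> Path:
    """Hypothetical helper: copy the bundled SDK modules and the plugin into place.

    Assumes the release is unpacked in `release_dir`, with the SDK modules
    in `release_dir / "NetApp"` and the plugin script at its top level.
    """
    sdk_src = release_dir / "NetApp"
    # Copy the *contents* of NetApp/ into the Perl library directory,
    # mirroring `cp -r NetApp/* /usr/lib64/perl5/`.
    for entry in sdk_src.iterdir():
        target = perl_lib_dir / entry.name
        if entry.is_dir():
            shutil.copytree(entry, target, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, target)

    # Copy the plugin into the Nagios libexec directory.
    plugin = libexec_dir / "check_netapp_ontap.pl"
    shutil.copy2(release_dir / "check_netapp_ontap.pl", plugin)
    # Assumed permissions: owner rwx, group/others rx.
    plugin.chmod(0o755)
    return plugin
```

For example, `install_plugin(Path("/tmp/check_netapp_ontap"), Path("/usr/lib64/perl5"), Path("/usr/local/nagios/libexec"))`; the libexec path is an assumed source-install default, so adjust it to your Nagios installation.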
| 39 | + |
| 40 | +**Parameters:** |
| 41 | + |
| 42 | +* `--hostname`, `-H` => Hostname or address of the cluster administrative interface. |
| 43 | +* `--node`, `-n` => Name of a vserver or cluster-node to restrict this query to. |
| 44 | +* `--user`, `-u` => Username of a NetApp ONTAPI-enabled user. |
| 45 | +* `--password`, `-p` => Password for the NetApp ONTAPI-enabled user. |
| 46 | +* `--option`, `-o` => The name of the option you want to check. See the option and threshold list in the Options section below. |
| 47 | +* `--warning`, `-w` => A custom warning threshold value. See the option and threshold list in the Options section below. |
| 48 | +* `--critical`, `-c` => A custom critical threshold value. See the option and threshold list in the Options section below. |
| 49 | +* `--modifier`, `-m` => This modifier is used to set an inclusive or exclusive filter on what you want to monitor. |
| 50 | +* `--help`, `-h` => Display the help text. |
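To show how the parameters above fit together, here is a hypothetical Python wrapper (`build_command` and `run_check` are illustrative names, not part of the plugin). It assembles the short-form flags and maps the exit code using the standard Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). Threshold strings are passed through unchanged; their accepted syntax is described under Options.

```python
import subprocess
from typing import List, Optional

# Standard Nagios plugin exit codes and the state names they map to.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_command(plugin: str, hostname: str, user: str, password: str,
                  option: str, warning: Optional[str] = None,
                  critical: Optional[str] = None, node: Optional[str] = None,
                  modifier: Optional[str] = None) -> List[str]:
    """Assemble an argv list from the short flags listed under Parameters."""
    argv = [plugin, "-H", hostname, "-u", user, "-p", password, "-o", option]
    # Optional flags are appended only when a value is supplied.
    optional = (("-w", warning), ("-c", critical), ("-n", node), ("-m", modifier))
    for flag, value in optional:
        if value is not None:
            argv += [flag, value]
    return argv


def run_check(argv: List[str]) -> str:
    """Run the plugin and translate its exit code into a Nagios state name.

    Any exit code outside 0-3 is treated as UNKNOWN.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    return NAGIOS_STATES.get(result.returncode, "UNKNOWN")
```

Passing an argv list rather than a single shell string avoids quoting problems when a password or threshold contains characters such as `%` or spaces.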
| 51 | + |
| 52 | +### Options |
| 53 | + |
| 54 | +**volume_health** |
| 55 | + |
| 56 | +Check the space and inode health of a vServer volume on a NetApp Ontap cluster. If space % and space in *B are both defined, the smaller value of the two will be used when deciding if the volume is in a warning or critical state. This allows you to better accommodate large volume monitoring. Thresholds: space % used, space in *B (e.g. MB) remaining, inode count remaining, inode % used (usage example: `80%i`), `offline` keyword. Node: the node option restricts this check by vserver name. |
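The "smaller value of the two" rule is easiest to see with numbers. Below is a minimal Python sketch of one reading of it: both thresholds are converted to "free space that must remain" and the smaller one is used. The token formats (`80%`, `80%i`, `500MB`, `offline`) come from the description above, but the function names, the binary (1024-based) unit table and the strict `<` comparison are assumptions, not the plugin's actual Perl code.

```python
import re

# Assumed 1024-based units; the plugin's own accepted suffixes may differ.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_threshold(token: str):
    """Classify a threshold token such as '80%', '80%i', '500GB' or 'offline'."""
    token = token.strip()
    if token == "offline":
        return ("keyword", token)
    m = re.fullmatch(r"(\d+(?:\.\d+)?)%i", token)
    if m:
        return ("inode_pct_used", float(m.group(1)))
    m = re.fullmatch(r"(\d+(?:\.\d+)?)%", token)
    if m:
        return ("space_pct_used", float(m.group(1)))
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?B)", token, re.IGNORECASE)
    if m:
        return ("space_bytes_remaining", float(m.group(1)) * UNITS[m.group(2).upper()])
    raise ValueError(f"unrecognised threshold: {token!r}")


def space_breached(total: float, used: float, pct_used: float,
                   bytes_remaining: float) -> bool:
    """One interpretation of 'the smaller value of the two will be used'.

    The % threshold is converted to the free space it implies, and the
    smaller of that and the absolute free-space threshold is enforced.
    """
    remaining = total - used
    pct_as_bytes = total * (100 - pct_used) / 100
    return remaining < min(pct_as_bytes, bytes_remaining)
```

With thresholds of 80% and 1 TB, a 100 TB volume with 15 TB free is not flagged (the 1 TB limit is smaller than the 20 TB implied by 80%), while a 100 GB volume with 15 GB free is flagged (20 GB is smaller than 1 TB).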
| 57 | + |
| 58 | +**aggregate_health** |
| 59 | + |
| 60 | +Check the space and inode health of a cluster aggregate on a NetApp Ontap cluster. If space % and space in *B are both defined, the smaller value of the two will be used when deciding if the aggregate is in a warning or critical state. This allows you to better accommodate large aggregate monitoring. Thresholds: space % used, space in *B (e.g. MB) remaining, inode count remaining, inode % used (usage example: `80%i`), `offline` keyword, `is-home` keyword. Node: the node option restricts this check by cluster-node name. |
| 61 | + |
| 62 | +**snapshot_health** |
| 63 | + |
| 64 | +Check the space and inode health of a vServer snapshot. If space % and space in *B are both defined, the smaller value of the two will be used when deciding if the snapshot is in a warning or critical state. This allows you to better accommodate large snapshot monitoring. Thresholds: space % used, space in *B (e.g. MB) remaining, inode count remaining, inode % used (usage example: `80%i`), `offline` keyword. Node: the node option restricts this check by vserver name. |
| 65 | + |
| 66 | +**quota_health** |
| 67 | +Check that the space and file thresholds defined on a quota have not been crossed. Thresholds: N/A (defined on the storage system). Node: the node option restricts this check by vserver name. **snapmirror_health**: Check the lag time and health flag of the snapmirror relationships. Thresholds: snapmirror lag time (valid intervals are s, m, h, d). Node: the node option restricts this check by snapmirror destination cluster-node name. |
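The snapmirror lag threshold takes a number followed by one of the interval suffixes listed above. Below is a minimal Python sketch of converting such thresholds to seconds and comparing an observed lag against them. The names `lag_to_seconds` and `lag_state`, the integer-only format and the `>=` comparison are assumptions for illustration, and the sketch ignores the health flag.

```python
import re

# Seconds per interval suffix (s, m, h, d) from the snapmirror_health description.
INTERVALS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def lag_to_seconds(threshold: str) -> int:
    """Convert a lag threshold such as '90m' or '1d' into seconds."""
    m = re.fullmatch(r"(\d+)([smhd])", threshold.strip())
    if not m:
        raise ValueError(f"expected <number><s|m|h|d>, got {threshold!r}")
    return int(m.group(1)) * INTERVALS[m.group(2)]


def lag_state(lag_seconds: int, warning: str, critical: str) -> str:
    """Map an observed lag (in seconds) to a Nagios state name."""
    if lag_seconds >= lag_to_seconds(critical):
        return "CRITICAL"
    if lag_seconds >= lag_to_seconds(warning):
        return "WARNING"
    return "OK"
```

For instance, with `-w 1h -c 1d` a relationship lagging two hours would be reported as WARNING.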
| 68 | + |
| 69 | +**filer_hardware_health** |
| 70 | + |
| 71 | +Check the environment hardware health of the filers (fan, PSU, temperature, battery). Thresholds: component name (fan, psu, temperature, battery); there is no default alert level, so they MUST be defined. Node: the node option restricts this check by cluster-node name. **port_health**: Check the state of a physical network port. Thresholds: N/A, not customizable. Node: the node option restricts this check by cluster-node name. |
| 72 | + |
| 73 | +**interface_health** |
| 74 | + |
| 75 | +Check that a LIF is in the correctly configured state and that it is on its home node and port. Additionally, it checks the state of a physical port. Thresholds: N/A, not customizable. Node: the node option restricts this check by vserver name. |
| 76 | + |
| 77 | +**netapp_alarms** |
| 78 | + |
| 79 | +Check for NetApp console alarms. Thresholds: N/A, not customizable. Node: the node option restricts this check by cluster-node name. **cluster_health**: Check the cluster disks for failure or other potentially undesirable states. Thresholds: N/A, not customizable. Node: the node option restricts this check by cluster-node name. **disk_health**: Check the health of the disks in the cluster. Thresholds: not customizable yet. Node: the node option restricts this check by cluster-node name. For keyword thresholds, if you want to ignore alerts for a particular keyword, set it to the same threshold that the alert defaults to. |
| 80 | + |
| 81 | +## Help |
20 | 82 |
|
21 | 83 | In case you find a bug or have a feature request, please open an issue on GitHub. |
22 | 84 |
|
23 | | -### On Nagios Exchange |
| 85 | +## On Nagios Exchange |
24 | 86 |
|
25 | 87 | https://exchange.nagios.org/directory/Plugins/Hardware/Storage-Systems/SAN-and-NAS/NetApp/Check-Netapp-Ontap/details |
26 | 88 |
|
27 | | -### Copyright |
| 89 | +## Copyright |
28 | 90 |
|
29 | | -This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public |
30 | | -License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later |
31 | | -version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the |
32 | | -implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more |
| 91 | +This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public |
| 92 | +License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later |
| 93 | +version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the |
| 94 | +implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more |
33 | 95 | details at <http://www.gnu.org/licenses/>. |