Commit 2bc15e2

oneswig authored and Alex-Welsh committed
Extra details added for SMART monitoring setup
Based on recent experience getting NVMe and SSD monitoring set up.
1 parent eb24667 commit 2bc15e2

1 file changed: +27 -0 lines changed

doc/source/configuration/monitoring.rst

Lines changed: 27 additions & 0 deletions
@@ -69,6 +69,12 @@ present, the workaround is to go into each node running Grafana and manually
 restart the process with ``systemctl restart kolla-grafana-container.service``
 and then try the reconfigure command again.)

+.. note::
+   If the environment defines additional Prometheus Node Exporter startup parameters
+   via ``prometheus_node_exporter_cmdline_extras``, the parameters should be updated
+   to include the textfile collector used by SMART monitoring:
+   ``--collector.textfile.directory=/var/lib/node_exporter/textfile_collector``
+
 Once the reconfigure has completed you can now run the custom playbook which
 copies over the scripts and sets up the cron jobs to start SMART monitoring
 on the overcloud hosts:
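
The note added in this hunk asks operators who already set ``prometheus_node_exporter_cmdline_extras`` to make sure the textfile collector flag is part of that string. As a rough illustration of the check, here is a minimal Python sketch. The helper name ``ensure_textfile_collector`` is hypothetical and not part of Kayobe or Kolla Ansible, and it assumes the variable holds a single space-separated string of flags.

```python
import shlex

# Flag and directory quoted from the note in the diff above.
TEXTFILE_FLAG = "--collector.textfile.directory"
TEXTFILE_DIR = "/var/lib/node_exporter/textfile_collector"


def ensure_textfile_collector(extras: str) -> str:
    """Return `extras` with the textfile collector flag appended if missing.

    Hypothetical helper for illustration only. If the flag is already
    present (even with a different directory), the string is returned
    unchanged so that an operator's explicit choice is not overridden.
    """
    args = shlex.split(extras)
    for arg in args:
        if arg == TEXTFILE_FLAG or arg.startswith(TEXTFILE_FLAG + "="):
            return extras
    args.append(f"{TEXTFILE_FLAG}={TEXTFILE_DIR}")
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # An environment that already enables the systemd collector.
    print(ensure_textfile_collector("--collector.systemd"))
```

The returned string is what would be assigned to ``prometheus_node_exporter_cmdline_extras`` in the environment's configuration; where exactly that variable lives depends on the deployment and is not shown in this commit.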
@@ -81,6 +87,27 @@ on the overcloud hosts:
 SMART reporting should now be enabled along with a Prometheus alert for
 unhealthy disks and a Grafana dashboard called ``Hardware Overview``.

+Monitoring Drive Writes Per Day
+-------------------------------
+
+Drives can be monitored for the level of write intensity of the
+workload, and alerts defined for drives that are persistently
+exceeding their stated level of write endurance. To enable this
+feature, set the flag ``create_dwpd_ratings``:
+
+.. code-block:: console
+
+   (kayobe) [stack@node ~]$ cd etc/kayobe
+   (kayobe) [stack@node kayobe]$ kayobe playbook run ansible/deployment/smartmon-tools.yml -e create_dwpd_ratings=true
+
+This flag scans for NVME/SSD devices in the system and creates a new
+file, ``dwpd-ratings.yml``, in the directory of the current environment.
+
+.. note::
+   The playbook assigns placeholder values for write endurance for each
+   drive model. These values should be updated with specifications from
+   vendor datasheets.
+
 Alertmanager, Slack and Microsoft Teams
 =======================================
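
The new section asks for the placeholder endurance values to be replaced with figures from vendor datasheets, which often quote endurance as terabytes written (TBW) over a warranty period rather than as drive writes per day (DWPD). The sketch below shows the usual conversion and a simple comparison of observed against rated DWPD. It is an illustration under stated assumptions: the function names are hypothetical, 1 TB is taken as 10^12 bytes, a year as 365 days, and nothing here reflects how the playbook or the Prometheus alert actually computes its values or how ``dwpd-ratings.yml`` is laid out.

```python
def rated_dwpd_from_tbw(tbw_terabytes: float, capacity_bytes: int,
                        warranty_years: float) -> float:
    """Convert a datasheet TBW figure into a DWPD rating.

    DWPD = total bytes the drive is rated to absorb, divided by
    (drive capacity x number of days in the warranty period).
    Assumes decimal terabytes (10**12 bytes) and 365-day years.
    """
    warranty_days = warranty_years * 365
    return (tbw_terabytes * 1e12) / (capacity_bytes * warranty_days)


def observed_dwpd(bytes_written_start: int, bytes_written_end: int,
                  capacity_bytes: int, days: float) -> float:
    """Average drive writes per day over an observation window.

    The two byte counters are cumulative totals (for example as
    reported by SMART) sampled at the start and end of the window.
    """
    written = bytes_written_end - bytes_written_start
    return written / (capacity_bytes * days)


def exceeds_rating(observed: float, rated: float) -> bool:
    """True when the observed write rate is above the rated endurance."""
    return observed > rated


if __name__ == "__main__":
    capacity = 1_920_000_000_000  # a 1.92 TB drive (illustrative value)
    rated = rated_dwpd_from_tbw(3504, capacity, warranty_years=5)
    seen = observed_dwpd(0, 2 * capacity, capacity, days=1)
    print(f"rated={rated:.2f} observed={seen:.2f} "
          f"over_rating={exceeds_rating(seen, rated)}")
```

With these illustrative figures, a 1.92 TB drive rated at 3504 TBW over five years works out to 1 DWPD, so a workload that rewrites the full capacity twice in one day would be flagged as exceeding its rating.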
