@@ -69,6 +69,12 @@ present, the workaround is to go into each node running Grafana and manually
 restart the process with ``systemctl restart kolla-grafana-container.service``
 and then try the reconfigure command again.)
 
+.. note::
+   If the environment defines additional Prometheus Node Exporter startup parameters
+   via ``prometheus_node_exporter_cmdline_extras``, the parameters should be updated
+   to include the textfile collector used by SMART monitoring:
+   ``--collector.textfile.directory=/var/lib/node_exporter/textfile_collector``
+
 Once the reconfigure has completed you can now run the custom playbook which
 copies over the scripts and sets up the cron jobs to start SMART monitoring
 on the overcloud hosts:
@@ -81,6 +87,27 @@ on the overcloud hosts:
 SMART reporting should now be enabled along with a Prometheus alert for
 unhealthy disks and a Grafana dashboard called ``Hardware Overview``.
 
+Monitoring Drive Writes Per Day
+-------------------------------
+
+Drives can be monitored for the write intensity of their workload,
+and alerts defined for drives that persistently exceed their
+rated write endurance. To enable this feature, set the flag
+``create_dwpd_ratings``:
+
+.. code-block:: console
+
+   (kayobe) [stack@node ~]$ cd etc/kayobe
+   (kayobe) [stack@node kayobe]$ kayobe playbook run ansible/deployment/smartmon-tools.yml -e create_dwpd_ratings=true
+
+With this flag set, the playbook scans for NVMe/SSD devices in the system
+and creates a new file, ``dwpd-ratings.yml``, in the directory of the current environment.
+
+.. note::
+   The playbook assigns placeholder values for write endurance for each
+   drive model. These values should be updated with specifications from
+   vendor datasheets.
+
 Alertmanager, Slack and Microsoft Teams
 =======================================
 