Extra details added for SMART monitoring setup

oneswig · Alex-Welsh · commit 2bc15e2c0722 · 2026-03-18T14:29:34.000Z
Based on recent experience getting NVMe and SSD monitoring set up.
diff --git a/doc/source/configuration/monitoring.rst b/doc/source/configuration/monitoring.rst
@@ -69,6 +69,12 @@ present, the workaround is to go into each node running Grafana and manually
 restart the process with ``systemctl restart kolla-grafana-container.service``
 and then try the reconfigure command again.)
 
+.. note::
+   If the environment defines additional Prometheus Node Exporter startup parameters
+   via ``prometheus_node_exporter_cmdline_extras``, update those parameters to
+   include the textfile collector directory used by SMART monitoring:
+   ``--collector.textfile.directory=/var/lib/node_exporter/textfile_collector``
+
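+As a minimal sketch, assuming the extras are set in the Kolla globals file
+(for example ``etc/kayobe/kolla/globals.yml``; the location is an assumption
+and may differ per environment), the variable might look like the following.
+The ``--collector.systemd`` flag is only a stand-in for whatever parameters
+the environment already defines:
+
+.. code-block:: yaml
+
+    # Illustrative only: keep any existing parameters and append the
+    # textfile collector directory used by SMART monitoring.
+    prometheus_node_exporter_cmdline_extras: "--collector.systemd --collector.textfile.directory=/var/lib/node_exporter/textfile_collector"
+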
 Once the reconfigure has completed you can now run the custom playbook which
 copies over the scripts and sets up the cron jobs to start SMART monitoring
 on the overcloud hosts:
@@ -81,6 +87,27 @@ on the overcloud hosts:
 SMART reporting should now be enabled along with a Prometheus alert for
 unhealthy disks and a Grafana dashboard called ``Hardware Overview``.
 
+Monitoring Drive Writes Per Day
+-------------------------------
+
+Drives can be monitored for the write intensity of their workload,
+with alerts defined for drives that persistently exceed their rated
+write endurance, expressed in drive writes per day (DWPD). To enable
+this feature, set the flag ``create_dwpd_ratings``:
+
+.. code-block:: console
+
+    (kayobe) [stack@node ~]$ cd etc/kayobe
+    (kayobe) [stack@node kayobe]$ kayobe playbook run ansible/deployment/smartmon-tools.yml -e create_dwpd_ratings=true
+
+With this flag set, the playbook scans the system for NVMe and SSD devices
+and creates a new file, ``dwpd-ratings.yml``, in the directory of the current environment.
+
+.. note::
+   The playbook assigns placeholder values for write endurance for each
+   drive model. These values should be updated with specifications from
+   vendor datasheets.
+
 Alertmanager, Slack and Microsoft Teams
 =======================================