Skip device plugin alert when devicePlugin is disabled in ClusterPolicy
When devicePlugin.enabled is set to false in the ClusterPolicy, the
nvidia-node-status-exporter still monitors the device_plugin_devices_total
metric, which reports 0 because no device plugin pods are running. This
triggers a false-positive GPUOperatorNodeDeploymentFailed alert.
Fix: The operator now injects a DEVICE_PLUGIN_ENABLED env var into the
node-status-exporter daemonset based on the ClusterPolicy. When set to
"false", the exporter skips device plugin validation entirely, so the
metric is never emitted and the alert does not fire.
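A minimal sketch of the exporter-side check, not the actual patch. Only
the DEVICE_PLUGIN_ENABLED variable name comes from this commit; the
devicePluginEnabled helper, its lookup parameter, and the choice to treat
an unset or unparsable value as enabled are assumptions made for
illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Name of the env var the operator injects, as described in this commit.
const envDevicePluginEnabled = "DEVICE_PLUGIN_ENABLED"

// devicePluginEnabled is a hypothetical helper. It reports whether device
// plugin validation should run. The lookup function is passed in so the
// logic can be exercised without touching the process environment.
// Assumption: an unset or unparsable value means "enabled", which keeps
// the pre-existing behavior when the variable is absent.
func devicePluginEnabled(lookup func(string) (string, bool)) bool {
	raw, ok := lookup(envDevicePluginEnabled)
	if !ok {
		return true
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return true
	}
	return enabled
}

func main() {
	if !devicePluginEnabled(os.LookupEnv) {
		// Skip validation entirely so the metric is never emitted.
		fmt.Println("device plugin disabled: skipping validation")
		return
	}
	fmt.Println("device plugin enabled: running validation")
}
```

Defaulting to enabled means a daemonset without the variable behaves as
it did before this change.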
Fixes: #2237
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Harshal Patil <12152047+harche@users.noreply.github.com>