[ ceph ][ healthcheck ] change severity#75
Conversation
📝 WalkthroughWalkthroughThe Helm chart version for ceph-operations is incremented from 1.8.17 to 1.8.18. Additionally, the severity label of the Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
charts/ceph-operations/alerts/healthchecks.yaml (1)
22-35:⚠️ Potential issue | 🟡 MinorVerify intentionality of missing
inhibited_bylabels in critical alerts.Elevating
CephDaemonSlowOpstocriticalwill change alert routing in Alertmanager. Other alert files (cluster.yaml,pgr.yaml,osd.yaml,mon.yaml) consistently addinhibited_by: cluster-maintenanceorinhibited_by: node-maintenanceto all critical alerts to suppress them during planned maintenance windows.In contrast, all critical alerts in
healthchecks.yaml—including the updatedCephDaemonSlowOpsand the Hardware* alerts (HardwareStorageError, HardwareMemoryError, HardwareProcessorError, HardwareNetworkError, HardwarePowerError, HardwareFanError)—lack this label.Please confirm this pattern is intentional (e.g., health checks should always fire even during maintenance) or add the appropriate
inhibited_bylabel for consistency.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@charts/ceph-operations/alerts/healthchecks.yaml` around lines 22 - 35, The CephDaemonSlowOps alert (and other critical alerts in healthchecks.yaml like HardwareStorageError, HardwareMemoryError, HardwareProcessorError, HardwareNetworkError, HardwarePowerError, HardwareFanError) are marked severity: critical but do not include the inhibited_by label used elsewhere (e.g., inhibited_by: cluster-maintenance or node-maintenance), which changes Alertmanager routing; decide whether these health checks should always fire during maintenance or be suppressed—if they should be suppressible, add the appropriate inhibited_by label (match the pattern used in cluster.yaml/pgr.yaml/osd.yaml/mon.yaml) to each critical alert (e.g., add inhibited_by: cluster-maintenance or inhibited_by: node-maintenance under labels for CephDaemonSlowOps and the listed Hardware* alerts) so routing is consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@charts/ceph-operations/alerts/healthchecks.yaml`:
- Around line 22-35: The CephDaemonSlowOps alert (and other critical alerts in
healthchecks.yaml like HardwareStorageError, HardwareMemoryError,
HardwareProcessorError, HardwareNetworkError, HardwarePowerError,
HardwareFanError) are marked severity: critical but do not include the
inhibited_by label used elsewhere (e.g., inhibited_by: cluster-maintenance or
node-maintenance), which changes Alertmanager routing; decide whether these
health checks should always fire during maintenance or be suppressed—if they
should be suppressible, add the appropriate inhibited_by label (match the
pattern used in cluster.yaml/pgr.yaml/osd.yaml/mon.yaml) to each critical alert
(e.g., add inhibited_by: cluster-maintenance or inhibited_by: node-maintenance
under labels for CephDaemonSlowOps and the listed Hardware* alerts) so routing
is consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cd5d5266-da78-4dc7-8fa3-236aa9d5b770
📒 Files selected for processing (2)
charts/ceph-operations/Chart.yamlcharts/ceph-operations/alerts/healthchecks.yaml
Summary by CodeRabbit