Skip to content

[ ceph ][ healthcheck ] change severity#75

Merged
adic9000 merged 2 commits into
mainfrom
i551092-patch01
Mar 27, 2026
Merged

[ ceph ][ healthcheck ] change severity#75
adic9000 merged 2 commits into
mainfrom
i551092-patch01

Conversation

@adic9000
Copy link
Copy Markdown
Contributor

@adic9000 adic9000 commented Mar 27, 2026

Summary by CodeRabbit

  • Chores
    • Updated Helm chart version to 1.8.18
    • Modified CephDaemonSlowOps alert severity from warning to critical

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 27, 2026

📝 Walkthrough

Walkthrough

The Helm chart version for ceph-operations is incremented from 1.8.17 to 1.8.18. Additionally, the severity label of the CephDaemonSlowOps alert rule is elevated from warning to critical, changing its classification level.

Changes

Cohort / File(s) Summary
Ceph Operations Chart Updates
charts/ceph-operations/Chart.yaml, charts/ceph-operations/alerts/healthchecks.yaml
Chart version bumped to 1.8.18; CephDaemonSlowOps alert severity escalated from warning to critical.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A version hops forward with careful stride,
While warnings grow teeth and fiercer pride,
Daemon slow-ops now cry "CRITICAL!" loud,
One point-eight-eighteen makes the ops team proud! 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'change severity' only partially captures the changeset, which includes both a Helm chart version bump and an alert severity update. While it references one aspect of the change, it misses the version update and is somewhat vague. Consider a more specific title that captures both main changes, such as 'Bump ceph-operations chart to 1.8.18 and change CephDaemonSlowOps alert severity to critical' or focus on the primary change if the version bump is incidental.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch i551092-patch01

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
charts/ceph-operations/alerts/healthchecks.yaml (1)

22-35: ⚠️ Potential issue | 🟡 Minor

Verify intentionality of missing inhibited_by labels in critical alerts.

Elevating CephDaemonSlowOps to critical will change alert routing in Alertmanager. Other alert files (cluster.yaml, pgr.yaml, osd.yaml, mon.yaml) consistently add inhibited_by: cluster-maintenance or inhibited_by: node-maintenance to all critical alerts to suppress them during planned maintenance windows.

In contrast, all critical alerts in healthchecks.yaml—including the updated CephDaemonSlowOps and the Hardware* alerts (HardwareStorageError, HardwareMemoryError, HardwareProcessorError, HardwareNetworkError, HardwarePowerError, HardwareFanError)—lack this label.

Please confirm this pattern is intentional (e.g., health checks should always fire even during maintenance) or add the appropriate inhibited_by label for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/ceph-operations/alerts/healthchecks.yaml` around lines 22 - 35, The
CephDaemonSlowOps alert (and other critical alerts in healthchecks.yaml like
HardwareStorageError, HardwareMemoryError, HardwareProcessorError,
HardwareNetworkError, HardwarePowerError, HardwareFanError) are marked severity:
critical but do not include the inhibited_by label used elsewhere (e.g.,
inhibited_by: cluster-maintenance or node-maintenance), which changes
Alertmanager routing; decide whether these health checks should always fire
during maintenance or be suppressed—if they should be suppressible, add the
appropriate inhibited_by label (match the pattern used in
cluster.yaml/pgr.yaml/osd.yaml/mon.yaml) to each critical alert (e.g., add
inhibited_by: cluster-maintenance or inhibited_by: node-maintenance under labels
for CephDaemonSlowOps and the listed Hardware* alerts) so routing is consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@charts/ceph-operations/alerts/healthchecks.yaml`:
- Around line 22-35: The CephDaemonSlowOps alert (and other critical alerts in
healthchecks.yaml like HardwareStorageError, HardwareMemoryError,
HardwareProcessorError, HardwareNetworkError, HardwarePowerError,
HardwareFanError) are marked severity: critical but do not include the
inhibited_by label used elsewhere (e.g., inhibited_by: cluster-maintenance or
node-maintenance), which changes Alertmanager routing; decide whether these
health checks should always fire during maintenance or be suppressed—if they
should be suppressible, add the appropriate inhibited_by label (match the
pattern used in cluster.yaml/pgr.yaml/osd.yaml/mon.yaml) to each critical alert
(e.g., add inhibited_by: cluster-maintenance or inhibited_by: node-maintenance
under labels for CephDaemonSlowOps and the listed Hardware* alerts) so routing
is consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cd5d5266-da78-4dc7-8fa3-236aa9d5b770

📥 Commits

Reviewing files that changed from the base of the PR and between bb39ba6 and 806773f.

📒 Files selected for processing (2)
  • charts/ceph-operations/Chart.yaml
  • charts/ceph-operations/alerts/healthchecks.yaml

@adic9000 adic9000 merged commit 4377c1c into main Mar 27, 2026
5 checks passed
@adic9000 adic9000 deleted the i551092-patch01 branch March 27, 2026 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant