Merged
3 changes: 3 additions & 0 deletions charts/openstack-hypervisor-operator/alerts/eviction.yaml
@@ -11,6 +11,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
+ playbook: docs/compute/kvm/playbooks/evictionfailed
annotations:
summary: "Eviction {{ $labels.name }} has failed"
description: "The eviction {{ $labels.name }} for hypervisor {{ $labels.hypervisor }} has reached a terminal failure state. Manual intervention is required — check if the hypervisor exists in OpenStack."
@@ -24,6 +25,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
+ playbook: docs/compute/kvm/playbooks/evictionmigrationfailing
annotations:
summary: "Eviction {{ $labels.name }} has failing instance migrations for over 1 hour"
description: "The eviction {{ $labels.name }} has had MigratingInstance=Failed for more than 1 hour while still running. Instances may be in ERROR state, blocking eviction progress."
@@ -37,6 +39,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
+ playbook: docs/compute/kvm/playbooks/evictionoutstandingram
annotations:
summary: "Eviction {{ $labels.name }} has outstanding RAM for over 6 hours"
description: "The eviction {{ $labels.name }} has had {{ $value }}MB of outstanding RAM for more than 6 hours. Check for stuck live-migrations or instances that cannot be moved."
7 changes: 7 additions & 0 deletions charts/openstack-hypervisor-operator/alerts/operator.yaml
@@ -11,6 +11,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
+ playbook: docs/compute/kvm/playbooks/hypervisoronboardingstuck

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify operator playbook documentation exists
# Expected: Find references to these playbook paths in the repository or documentation

echo "Searching for operator playbook path references..."
rg -i "hypervisoronboardingstuck|hypervisorevictionstuck|hypervisorevictedtoolong|hypervisortraitsyncfailed|hypervisoraggregatesyncfailed|hypervisorreconcileerrors|hypervisoroperatordown" \
  --type md --type yaml -g '!alerts/' -C2

echo -e "\nSearching for playbook documentation structure..."
fd -t f -e md -i hypervisor | head -20

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 676


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Search for exact referenced playbook paths..."
rg -n --hidden --no-ignore-vcs \
  -S "docs/compute/kvm/playbooks/(hypervisoronboardingstuck|hypervisorevictionstuck|hypervisorevictedtoolong|hypervisortraitsyncfailed|hypervisoraggregatesyncfailed|hypervisorreconcileerrors|hypervisoroperatordown)" \
  . || true

echo
echo "2) List any files whose path includes docs/compute/kvm/playbooks/"
fd -t f -i --type f "hypervisoronboardingstuck" . || true
fd -t f -i --type f "hypervisorevictionstuck" . || true
fd -t f -i --type f "hypervisorevictedtoolong" . || true
fd -t f -i --type f "hypervisortraitsyncfailed" . || true
fd -t f -i --type f "hypervisoraggregatesyncfailed" . || true
fd -t f -i --type f "hypervisorreconcileerrors" . || true
fd -t f -i --type f "hypervisoroperatordown" . || true

echo
echo "3) Check if docs/compute/kvm/playbooks exists and show a sample listing..."
if [ -d "docs/compute/kvm/playbooks" ]; then
  ls -la "docs/compute/kvm/playbooks" | head -200
else
  echo "Directory not found: docs/compute/kvm/playbooks"
fi

echo
echo "4) Search for 'playbooks/' directory references in charts docs..."
rg -n "compute/kvm/playbooks/" -S . || true

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 2620


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Search for exact referenced playbook paths..."
rg -n --hidden --no-ignore-vcs \
  -S "docs/compute/kvm/playbooks/(hypervisoronboardingstuck|hypervisorevictionstuck|hypervisorevictedtoolong|hypervisortraitsyncfailed|hypervisoraggregatesyncfailed|hypervisorreconcileerrors|hypervisoroperatordown)" \
  . || true

echo
echo "2) Locate any files by playbook name substring (any location, case-insensitive)..."
fd -t f -i "hypervisoronboardingstuck" .
fd -t f -i "hypervisorevictionstuck" .
fd -t f -i "hypervisorevictedtoolong" .
fd -t f -i "hypervisortraitsyncfailed" .
fd -t f -i "hypervisoraggregatesyncfailed" .
fd -t f -i "hypervisorreconcileerrors" .
fd -t f -i "hypervisoroperatordown" .

echo
echo "3) Check if docs/compute/kvm/playbooks exists and show a sample listing..."
if [ -d "docs/compute/kvm/playbooks" ]; then
  ls -la "docs/compute/kvm/playbooks" | head -200
else
  echo "Directory not found: docs/compute/kvm/playbooks"
fi

echo
echo "4) Search for playbooks/ directory references..."
rg -n "compute/kvm/playbooks/" -S . || true

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 2620


Fix broken playbook doc references (docs/compute/kvm/playbooks/*) in operator.yaml

charts/openstack-hypervisor-operator/alerts/operator.yaml adds playbook links for:

  • hypervisoronboardingstuck, hypervisorevictionstuck, hypervisorevictedtoolong, hypervisortraitsyncfailed, hypervisoraggregatesyncfailed, hypervisorreconcileerrors, hypervisoroperatordown

However, docs/compute/kvm/playbooks/ does not exist anywhere in this repo: a search finds only the references themselves, including the ones in alerts/eviction.yaml. Before merging, either add the corresponding playbook pages under docs/compute/kvm/playbooks/ or update the paths to point at the correct location.
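
A check along the lines of the searches above could be kept as a small helper so that new `playbook:` labels are validated whenever the alert files change. The following is only a sketch under stated assumptions: the function name `check_playbooks` is hypothetical, label values are assumed to be unquoted and free of whitespace, and a playbook page is assumed to be either a path with the exact label value or the same path with a `.md` suffix. It does not settle where the playbook pages are actually meant to live.

```shell
#!/bin/sh
# Sketch: report every `playbook:` label in the alert rules that has no
# matching page under a docs root. Assumptions (not confirmed by the repo):
# label values are unquoted, contain no whitespace, and a page is either a
# path of that exact name or the same path with a ".md" suffix.

check_playbooks() {
  alerts_dir=$1   # directory holding the alert rule YAML files
  docs_root=$2    # directory the playbook paths are relative to
  missing=0

  # Collect the distinct label values, e.g. docs/compute/kvm/playbooks/foo
  playbooks=$(grep -rhoE 'playbook:[[:space:]]*[^[:space:]]+' "$alerts_dir" \
    | sed -E 's/^playbook:[[:space:]]*//' | sort -u)

  for pb in $playbooks; do
    if [ -e "$docs_root/$pb" ] || [ -e "$docs_root/$pb.md" ]; then
      echo "ok      $pb"
    else
      echo "missing $pb"
      missing=1
    fi
  done

  # Non-zero when at least one label does not resolve
  return "$missing"
}
```

Run from the repository root, this might be invoked as `check_playbooks charts/openstack-hypervisor-operator/alerts .`; the non-zero return status when any path is missing makes it usable as a CI gate. If the playbooks are hosted in a separate documentation repository, the second argument would point at a checkout of that repository instead.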

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@charts/openstack-hypervisor-operator/alerts/operator.yaml` at line 14,
operator.yaml contains broken playbook links pointing at
docs/compute/kvm/playbooks/* (e.g., hypervisoronboardingstuck,
hypervisorevictionstuck, hypervisorevictedtoolong, hypervisortraitsyncfailed,
hypervisoraggregatesyncfailed, hypervisorreconcileerrors,
hypervisoroperatordown) that do not exist; either create corresponding
documentation pages at docs/compute/kvm/playbooks/<playbook-name> (preferably
markdown files with the playbook content and frontmatter) or update the paths in
charts/openstack-hypervisor-operator/alerts/operator.yaml (and any other files
referencing the same paths such as alerts/eviction.yaml) to point to the correct
existing doc location so all listed playbook links resolve.

annotations:
summary: "Hypervisor {{ $labels.name }} onboarding stuck for over 1 hour"
description: "The hypervisor {{ $labels.name }} in zone {{ $labels.zone }} has been onboarding for more than 1 hour. Check nova registration, test VM status, or trait/aggregate sync."
@@ -22,6 +23,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
+ playbook: docs/compute/kvm/playbooks/hypervisorevictionstuck
annotations:
summary: "Hypervisor {{ $labels.name }} eviction running for over 4 hours"
description: "The hypervisor {{ $labels.name }} in zone {{ $labels.zone }} has had an active eviction for more than 4 hours. Check for stuck live-migrations or failed VMs."
@@ -35,6 +37,7 @@ groups:
labels:
severity: info
type: hypervisor_operator
+ playbook: docs/compute/kvm/playbooks/hypervisorevictedtoolong
annotations:
summary: "Hypervisor {{ $labels.name }} has been evicted for over 7 days"
description: "The hypervisor {{ $labels.name }} in zone {{ $labels.zone }} has been evicted for more than 7 days without being offboarded. Consider re-enabling or decommissioning."
@@ -50,6 +53,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
+ playbook: docs/compute/kvm/playbooks/hypervisortraitsyncfailed
annotations:
summary: "Hypervisor {{ $labels.name }} trait sync has been failing"
description: "The hypervisor {{ $labels.name }} in zone {{ $labels.zone }} has had TraitsUpdated=False for more than 30 minutes outside of onboarding. Check OpenStack Placement API connectivity."
@@ -65,6 +69,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
+ playbook: docs/compute/kvm/playbooks/hypervisoraggregatesyncfailed
annotations:
summary: "Hypervisor {{ $labels.name }} aggregate sync has been failing"
description: "The hypervisor {{ $labels.name }} in zone {{ $labels.zone }} has had AggregatesUpdated=False for more than 30 minutes outside of onboarding and eviction. Check OpenStack Nova API connectivity."
@@ -78,6 +83,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
+ playbook: docs/compute/kvm/playbooks/hypervisorreconcileerrors
annotations:
summary: "Hypervisor operator controller {{ $labels.controller }} has persistent reconcile errors"
description: "The controller {{ $labels.controller }} has been producing sustained reconciliation errors for more than 15 minutes."
@@ -89,6 +95,7 @@ groups:
labels:
severity: critical
type: hypervisor_operator
+ playbook: docs/compute/kvm/playbooks/hypervisoroperatordown
annotations:
summary: "Hypervisor operator is down"
description: "The hypervisor operator metrics endpoint has been unreachable for more than 5 minutes."