126 changes: 126 additions & 0 deletions monitoring/prometheus-rules/cvi.state.yaml
@@ -0,0 +1,126 @@
- name: virtualization.cvi.state
rules:
- alert: D8VirtualizationClusterVirtualImageStuckInPendingPhase
expr: d8_virtualization_clustervirtualimage_status_phase{phase="Pending"} == 1
labels:
severity_level: "9"
tier: cluster
for: 60m
annotations:
plk_protocol_version: "1"
plk_markup_format: "markdown"
plk_create_group_if_not_exists__d8_virtualization_clustervirtualimage_state: "D8VirtualizationClusterVirtualImageState,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
plk_grouped_by__d8_virtualization_clustervirtualimage_state: "D8VirtualizationClusterVirtualImageState,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
summary: ClusterVirtualImage has been stuck in the `Pending` phase for too long.
description: |
The cluster virtual image `{{ $labels.name }}` has been stuck in the `Pending` phase for more than 60 minutes.

### Common Causes

- Missing or not ready source object (VirtualImage, ClusterVirtualImage, VirtualDisk, or VirtualImageSnapshot)
- Scheduling issues on the node
- Cluster resource shortage (CPU, memory)
- Exhausted quotas (e.g., CPU, memory limits)

### Recommended Actions

1. Check the cluster virtual image status:
```bash
d8 k get cvi {{ $labels.name }} -o jsonpath="{.status}" | jq
```

2. Inspect conditions for details:
```bash
d8 k get cvi {{ $labels.name }} -o jsonpath="{.status.conditions}" | jq
```

3. Check related events:
```bash
d8 k get events -A --field-selector involvedObject.name={{ $labels.name }}
```

4. Check whether the source VirtualImage, ClusterVirtualImage, VirtualDisk, or VirtualImageSnapshot exists and is Ready:
```bash
d8 k get vd,vi,cvi,vis -A
```
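After fixing the cause, you can wait for the image to leave the `Pending` phase. This is a minimal sketch that assumes the phase is reported in `.status.phase`, as the metric name suggests. The function name `wait_cvi_phase_change`, the 10-second polling interval, and the default timeout are illustrative choices, not part of the module.

```bash
# Hypothetical helper (not shipped with the module): poll a ClusterVirtualImage
# until its phase is no longer Pending or the timeout expires.
# Assumes the phase is exposed as .status.phase.
wait_cvi_phase_change() {
  local name="$1"
  local timeout="${2:-600}"   # seconds to wait before giving up
  local interval=10           # seconds between checks
  local elapsed=0 phase

  while [ "$elapsed" -lt "$timeout" ]; do
    phase="$(d8 k get cvi "$name" -o jsonpath='{.status.phase}')"
    if [ -n "$phase" ] && [ "$phase" != "Pending" ]; then
      echo "$phase"   # print the new phase and report success
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done

  echo "ClusterVirtualImage $name is still Pending after ${timeout}s" >&2
  return 1
}
```

For example: `wait_cvi_phase_change {{ $labels.name }} 900`. On recent kubectl versions, `d8 k wait --for=jsonpath=...` may achieve the same result without a custom loop.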


- alert: D8VirtualizationClusterVirtualImageStuckInWaitForUserUploadPhase
expr: d8_virtualization_clustervirtualimage_status_phase{phase="WaitForUserUpload"} == 1
labels:
severity_level: "9"
tier: cluster
for: 60m
annotations:
plk_protocol_version: "1"
plk_markup_format: "markdown"
plk_create_group_if_not_exists__d8_virtualization_clustervirtualimage_state: "D8VirtualizationClusterVirtualImageState,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
plk_grouped_by__d8_virtualization_clustervirtualimage_state: "D8VirtualizationClusterVirtualImageState,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
summary: ClusterVirtualImage has been stuck in the `WaitForUserUpload` phase for too long.
description: |
The cluster virtual image `{{ $labels.name }}` has been waiting for a user image upload for more than 60 minutes.

This means that no image was uploaded to provision the cluster virtual image.

### What You Need to Do

Upload the required image using one of the provided URLs:

- From outside the cluster:
```bash
d8 k get cvi {{ $labels.name }} -o jsonpath="{.status.imageUploadURLs.external}"
```

- From inside the cluster (node):
```bash
d8 k get cvi {{ $labels.name }} -o jsonpath="{.status.imageUploadURLs.inCluster}"
```

- Upload the image with `curl`, `wget`, or any other HTTP client, using the `PUT` method and the `application/octet-stream` content type.

Example:
```bash
curl -X PUT --data-binary @image.qcow2 \
-H "Content-Type: application/octet-stream" \
$(d8 k get cvi {{ $labels.name }} -o jsonpath="{.status.imageUploadURLs.external}")
```
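The upload steps above can be wrapped in a small function that checks its inputs first. This is a sketch under stated assumptions: `cvi_upload` is a hypothetical helper name, and it reuses the `imageUploadURLs` fields shown above. It uses `curl -T`, which sends the file with a `PUT` request and streams it instead of loading it into memory as `--data-binary @file` does.

```bash
# Hypothetical helper (not shipped with the module): upload a local image file
# to a ClusterVirtualImage that is waiting in the WaitForUserUpload phase.
# Usage: cvi_upload <cvi-name> <image-file> [external|inCluster]
cvi_upload() {
  local name="$1" file="$2" kind="${3:-external}" url

  if [ ! -f "$file" ]; then
    echo "Image file not found: $file" >&2
    return 1
  fi

  # Read the upload URL of the requested kind from the resource status.
  url="$(d8 k get cvi "$name" -o jsonpath="{.status.imageUploadURLs.${kind}}")"
  if [ -z "$url" ]; then
    echo "No ${kind} upload URL reported for $name" >&2
    return 1
  fi

  # -T streams the file with a PUT request;
  # --fail makes curl exit non-zero on HTTP errors.
  curl --fail -T "$file" -H "Content-Type: application/octet-stream" "$url"
}
```

For example: `cvi_upload {{ $labels.name }} ./image.qcow2`, or pass `inCluster` as the third argument when running the command from a cluster node.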


- alert: D8VirtualizationClusterVirtualImageFailed
expr: d8_virtualization_clustervirtualimage_status_phase{phase="Failed"} == 1
labels:
severity_level: "6"
tier: cluster
for: 0m
annotations:
plk_protocol_version: "1"
plk_markup_format: "markdown"
plk_create_group_if_not_exists__d8_virtualization_clustervirtualimage_state: "D8VirtualizationClusterVirtualImageState,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
plk_grouped_by__d8_virtualization_clustervirtualimage_state: "D8VirtualizationClusterVirtualImageState,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
summary: ClusterVirtualImage is in the `Failed` phase.
description: |
The cluster virtual image `{{ $labels.name }}` is in the `Failed` phase.

This may indicate one or more of the following issues:

- Wrong image URL
- Wrong container image
- Network issues
- Storage issues

### Recommended Actions

1. Check the full status of the cluster virtual image:
```bash
d8 k get cvi {{ $labels.name }} -o jsonpath="{.status}" | jq
```

2. Inspect the conditions for details:
```bash
d8 k get cvi {{ $labels.name }} -o jsonpath="{.status.conditions}" | jq
```

3. Review events related to this cluster virtual image:
```bash
d8 k get events -A --field-selector involvedObject.name={{ $labels.name }}
```
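For a quick overview of why the image failed, you can print the conditions one per line without `jq`. This is a sketch that assumes each condition has the standard Kubernetes `type`, `status`, `reason`, and `message` fields. `cvi_conditions` is a hypothetical helper name.

```bash
# Hypothetical helper (not shipped with the module): print one tab-separated
# line per status condition of a ClusterVirtualImage:
#   type, status, reason, message
# Uses kubectl JSONPath range syntax, so jq is not required.
cvi_conditions() {
  d8 k get cvi "$1" -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}
```

For example: `cvi_conditions {{ $labels.name }}`. Conditions with `status` set to `False` usually carry the most useful `reason` and `message`.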
87 changes: 81 additions & 6 deletions monitoring/prometheus-rules/dvcr.yaml
@@ -14,9 +14,49 @@
plk_grouped_by__d8_virtualization_dvcr_health: "D8VirtualizationDVCRHealth,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
summary: The dvcr Pod is NOT Ready.
description: |
The recommended course of action:
1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy dvcr`
2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l app=dvcr`
One or more Pods of the dvcr deployment in the d8-virtualization namespace are not in a Ready state.

The dvcr component serves as a local registry for virtual machine images and disks. If its Pods are not ready, image uploads and VirtualDisk provisioning may be affected.

Recommended diagnosis steps:

1. Check the status of dvcr Pods.
```bash
d8 k -n d8-virtualization get pods -l app=dvcr
```

2. Describe the Deployment to check replicas and events.
```bash
d8 k -n d8-virtualization describe deploy dvcr
```

3. Get detailed information about the Pod, including events and container statuses.
```bash
d8 k -n d8-virtualization describe pod -l app=dvcr
```

4. View logs from the affected Pod.
```bash
d8 k -n d8-virtualization logs <pod-name>
```

5. If the Pod has restarted, check logs from the previous instance.
```bash
d8 k -n d8-virtualization logs <pod-name> --previous
```

Recommended actions:

- Investigate readiness probe failures, container crashes, or scheduling issues based on the output of the commands above.

- If the issue persists, consider restarting the dvcr Deployment.
```bash
d8 k -n d8-virtualization rollout restart deploy dvcr
```

- Ensure that required storage volumes (such as PVCs) are available and healthy.

- Verify that no node issues, such as disk or memory pressure, are affecting scheduling.
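Instead of copying each `<pod-name>` by hand, you can collect the logs of all dvcr Pods in one pass. This is a minimal sketch that relies on the `app=dvcr` label used above. `dvcr_logs` is a hypothetical helper name.

```bash
# Hypothetical helper (not shipped with the module): print the current and,
# where available, the previous-container logs of every dvcr Pod.
dvcr_logs() {
  local ns=d8-virtualization pod
  # "-o name" prints entries like "pod/dvcr-xxxx", which "logs" accepts directly.
  for pod in $(d8 k -n "$ns" get pods -l app=dvcr -o name); do
    echo "===== $pod (current) ====="
    d8 k -n "$ns" logs "$pod" --all-containers
    echo "===== $pod (previous) ====="
    # --previous fails if no container has restarted yet; ignore that case.
    d8 k -n "$ns" logs "$pod" --all-containers --previous 2>/dev/null || true
  done
}
```

Redirect the output to a file, for example `dvcr_logs > dvcr.log`, to attach it to a support request.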

- alert: D8VirtualizationDVCRPodIsNotRunning
expr: absent(kube_pod_status_phase{namespace="d8-virtualization",phase="Running",pod=~"dvcr-.*"})
@@ -31,6 +71,41 @@
plk_grouped_by__d8_virtualization_dvcr_health: "D8VirtualizationDVCRHealth,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
summary: The dvcr Pod is NOT Running.
description: |
The recommended course of action:
1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy dvcr`
2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l app=dvcr`
No running Pods were found for the dvcr deployment in the d8-virtualization namespace for more than 2 minutes.

The dvcr component serves as a local registry for virtual machine images and disks. Its unavailability may block image uploads and provisioning of new VirtualDisks or VirtualMachines.

Recommended diagnosis steps:

1. Check if any dvcr Pods exist and what their status is.
```bash
d8 k -n d8-virtualization get pods -l app=dvcr
```

2. Describe the Deployment to check replicas and events.
```bash
d8 k -n d8-virtualization describe deploy dvcr
```

3. If a Pod has restarted, get the logs of its previous container instance.
```bash
d8 k -n d8-virtualization logs <pod-name> --previous
```

4. Review the events of the dvcr Pods.
```bash
d8 k -n d8-virtualization describe pod -l app=dvcr
```

Recommended actions:

- If the Deployment has zero replicas, scale it back up.
```bash
d8 k -n d8-virtualization scale deploy dvcr --replicas=1
```

- If Pods are crashing, inspect their logs and events.

- Ensure that required storage volumes (such as PVCs) are available and healthy.

- Verify that no node issues, such as disk or memory pressure, are affecting scheduling.
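The "scale it back up" step can be guarded so that it only runs when the Deployment is actually at zero replicas. This is a sketch under that assumption. `dvcr_ensure_replicas` is a hypothetical helper name, and the target of one replica mirrors the command above.

```bash
# Hypothetical helper (not shipped with the module): scale the dvcr Deployment
# to one replica only if it is currently scaled down to zero.
dvcr_ensure_replicas() {
  local ns=d8-virtualization replicas
  replicas="$(d8 k -n "$ns" get deploy dvcr -o jsonpath='{.spec.replicas}')"
  if [ "$replicas" = "0" ]; then
    d8 k -n "$ns" scale deploy dvcr --replicas=1
  else
    echo "dvcr Deployment has ${replicas:-unknown} replica(s) configured; not scaling."
  fi
}
```

If the Deployment already has replicas configured but no Pod is running, the problem is more likely scheduling or crashes than scaling. In that case, continue with the diagnosis steps above.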
@@ -15,8 +15,8 @@
summary: The cdi-apiserver Pod is NOT Ready.
description: |
The recommended course of action:
1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy cdi-apiserver`
2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l cdi.internal.virtualization.deckhouse.io=cdi-apiserver`
1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy cdi-apiserver`
2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l cdi.internal.virtualization.deckhouse.io=cdi-apiserver`

- alert: D8InternalVirtualizationCDIAPIServerPodIsNotRunning
expr: absent(kube_pod_status_phase{namespace="d8-virtualization",phase="Running",pod=~"cdi-apiserver-.*"})
@@ -32,5 +32,5 @@
summary: The cdi-apiserver Pod is NOT Running.
description: |
The recommended course of action:
1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy cdi-apiserver`
2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l cdi.internal.virtualization.deckhouse.io=cdi-apiserver`
1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy cdi-apiserver`
2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l cdi.internal.virtualization.deckhouse.io=cdi-apiserver`
@@ -15,8 +15,8 @@
summary: The cdi-deployment Pod is NOT Ready.
description: |
The recommended course of action:
1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy cdi-deployment`
2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l app=containerized-data-importer`
1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy cdi-deployment`
2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l app=containerized-data-importer`

- alert: D8InternalVirtualizationCDIDeploymentPodIsNotRunning
expr: absent(kube_pod_status_phase{namespace="d8-virtualization",phase="Running",pod=~"cdi-deployment-.*"})
@@ -32,5 +32,5 @@
summary: The cdi-deployment Pod is NOT Running.
description: |
The recommended course of action:
1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy cdi-deployment`
2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l app=containerized-data-importer`
1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy cdi-deployment`
2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l app=containerized-data-importer`
@@ -15,8 +15,8 @@
summary: The cdi-operator Pod is NOT Ready.
description: |
The recommended course of action:
1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy cdi-operator`
2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l app=cdi-operator`
1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy cdi-operator`
2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l app=cdi-operator`

- alert: D8InternalVirtualizationCDIOperatorPodIsNotRunning
expr: absent(kube_pod_status_phase{namespace="d8-virtualization",phase="Running",pod=~"cdi-operator-.*"})
@@ -32,5 +32,5 @@
summary: The cdi-operator Pod is NOT Running.
description: |
The recommended course of action:
1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy cdi-operator`
2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l app=cdi-operator`
1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy cdi-operator`
2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l app=cdi-operator`
@@ -15,8 +15,8 @@
summary: The virt-api Pod is NOT Ready.
description: |
The recommended course of action:
1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy virt-api`
2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-api`
1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy virt-api`
2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-api`

- alert: D8InternalVirtualizationVirtAPIPodIsNotRunning
expr: absent(kube_pod_status_phase{namespace="d8-virtualization",phase="Running",pod=~"virt-api-.*"})
@@ -32,5 +32,5 @@
summary: The virt-api Pod is NOT Running.
description: |
The recommended course of action:
1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy virt-api`
2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-api`
1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy virt-api`
2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-api`
@@ -15,8 +15,8 @@
summary: The virt-controller Pod is NOT Ready.
description: |
The recommended course of action:
1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy virt-controller`
2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-controller`
1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy virt-controller`
2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-controller`

- alert: D8InternalVirtualizationVirtControllerPodIsNotRunning
expr: absent(kube_pod_status_phase{namespace="d8-virtualization",phase="Running",pod=~"virt-controller-.*"})
@@ -32,5 +32,5 @@
summary: The virt-controller Pod is NOT Running.
description: |
The recommended course of action:
1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy virt-controller`
2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-controller`
1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy virt-controller`
2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-controller`
@@ -15,8 +15,8 @@
summary: The virt-handler Pod is NOT Ready.
description: |
The recommended course of action:
1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe daemonset virt-handler`
2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod --field-selector=spec.nodeName={{ $labels.node }} -l kubevirt.internal.virtualization.deckhouse.io=virt-handler`
1. Retrieve details of the DaemonSet: `d8 k -n d8-virtualization describe daemonset virt-handler`
2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod --field-selector=spec.nodeName={{ $labels.node }} -l kubevirt.internal.virtualization.deckhouse.io=virt-handler`

- alert: D8InternalVirtualizationVirtHandlerPodIsNotRunning
expr: absent(avg by(node,pod,namespace)(kube_pod_info{}) * on(pod, namespace) group_right(node) kube_pod_status_phase{namespace="d8-virtualization",phase="Running",pod=~"virt-handler-.*"})