Commit 673d41d

committed
feat(module): upd
Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
1 parent 4ed335d commit 673d41d

19 files changed

Lines changed: 1055 additions & 94 deletions
Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
+- name: virtualization.vi.state
+  rules:
+    - alert: D8VirtualizationClusterVirtualImageStuckInPendingPhase
+      expr: d8_virtualization_clustervirtualimage_status_phase{phase="Pending"} == 1
+      labels:
+        severity_level: "9"
+        tier: cluster
+      for: 60m
+      annotations:
+        plk_protocol_version: "1"
+        plk_markup_format: "markdown"
+        plk_create_group_if_not_exists__d8_virtualization_clustervirtualimage_stuck_in_pending_phase: "D8VirtualizationClusterVirtualImageStuckInPendingPhase,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
+        plk_grouped_by__d8_virtualization_clustervirtualimage_stuck_in_pending_phase: "D8VirtualizationClusterVirtualImageStuckInPendingPhase,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
+        summary: ClusterVirtualImage has been stuck in the `Pending` phase for a long time.
+        description: |
+          The cluster virtual image `{{ $labels.name }}` has been stuck in the `Pending` phase for more than 60 minutes.
+
+          ### Common Causes
+
+          - Missing or not ready source VirtualImage, ClusterVirtualImage, VirtualDisk or snapshot
+          - Scheduling issues on the node
+          - Cluster resource shortage (CPU, memory)
+          - Exhausted quotas (e.g., CPU, memory limits)
+
+          ### Recommended Actions
+
+          1. Check the cluster virtual image status:
+             ```bash
+             d8 k get cvi {{ $labels.name }} -o jsonpath="{.status}" | jq
+             ```
+
+          2. Inspect conditions for details:
+             ```bash
+             d8 k get cvi {{ $labels.name }} -o jsonpath="{.status.conditions}" | jq
+             ```
+
+          3. Check related events:
+             ```bash
+             d8 k get events --field-selector involvedObject.name={{ $labels.name }}
+             ```
+
+          4. Check that the source VirtualImage, ClusterVirtualImage, VirtualDisk or snapshot exists and is Ready:
+             ```bash
+             d8 k get vd,vi,cvi,vis -A
+             ```
+
+
+    - alert: D8VirtualizationClusterVirtualImageStuckInWaitForUserUploadPhase
+      expr: d8_virtualization_clustervirtualimage_status_phase{phase="WaitForUserUpload"} == 1
+      labels:
+        severity_level: "9"
+        tier: cluster
+      for: 60m
+      annotations:
+        plk_protocol_version: "1"
+        plk_markup_format: "markdown"
+        plk_create_group_if_not_exists__d8_virtualization_clustervirtualimage_stuck_in_waitforuserupload_phase: "D8VirtualizationClusterVirtualImageStuckInWaitForUserUploadPhase,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
+        plk_grouped_by__d8_virtualization_clustervirtualimage_stuck_in_waitforuserupload_phase: "D8VirtualizationClusterVirtualImageStuckInWaitForUserUploadPhase,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
+        summary: ClusterVirtualImage has been stuck in the `WaitForUserUpload` phase for a long time.
+        description: |
+          The cluster virtual image `{{ $labels.name }}` has been waiting for a user image upload for more than 60 minutes.
+
+          This means that no image was uploaded to provision the cluster virtual image.
+
+          ### What You Need to Do
+
+          Upload the required image using one of the provided URLs:
+
+          - From outside the cluster:
+            ```bash
+            d8 k get cvi {{ $labels.name }} -o jsonpath="{.status.imageUploadURLs.external}"
+            ```
+
+          - From inside the cluster (node):
+            ```bash
+            d8 k get cvi {{ $labels.name }} -o jsonpath="{.status.imageUploadURLs.inCluster}"
+            ```
+
+          - Use `curl`, `wget`, or any HTTP client with the `PUT` method and the appropriate content type (`application/octet-stream`) to upload the image.
+
+          Example:
+          ```bash
+          curl -X PUT --data-binary @image.qcow2 \
+            -H "Content-Type: application/octet-stream" \
+            $(d8 k get cvi {{ $labels.name }} -o jsonpath="{.status.imageUploadURLs.external}")
+          ```
+
+
+    - alert: D8VirtualizationClusterVirtualImageFailed
+      expr: d8_virtualization_clustervirtualimage_status_phase{phase="Failed"} == 1
+      labels:
+        severity_level: "6"
+        tier: cluster
+      for: 0m
+      annotations:
+        plk_protocol_version: "1"
+        plk_markup_format: "markdown"
+        plk_create_group_if_not_exists__d8_virtualization_clustervirtualimage__failed: "D8VirtualizationClusterVirtualImageFailed,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
+        plk_grouped_by__d8_virtualization_clustervirtualimage__failed: "D8VirtualizationClusterVirtualImageFailed,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
+        summary: ClusterVirtualImage is in the `Failed` phase.
+        description: |
+          The cluster virtual image `{{ $labels.name }}` is in the `Failed` phase.
+
+          This may indicate one or more of the following issues:
+
+          - Wrong image URL
+          - Wrong container image
+          - Network issues
+          - Storage issues
+
+          ### Recommended Actions
+
+          1. Check the full status of the cluster virtual image:
+             ```bash
+             d8 k get cvi {{ $labels.name }} -o jsonpath="{.status}" | jq
+             ```
+
+          2. Inspect the conditions for details:
+             ```bash
+             d8 k get cvi {{ $labels.name }} -o jsonpath="{.status.conditions}" | jq
+             ```
+
+          3. Review events related to this cluster virtual image:
+             ```bash
+             d8 k get events --field-selector involvedObject.name={{ $labels.name }}
+             ```
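
Review note on the `WaitForUserUpload` alert in the file above: the `curl` example substitutes the upload URL inline, so an empty URL (for instance, if the image has already left the `WaitForUserUpload` phase) would produce a confusing `curl` error. A minimal sketch of a guarded variant follows. The helper name `print_upload_cmd` is hypothetical and not part of this commit, `--fail` is added here as an assumption so that HTTP errors surface as a non-zero exit code, and the command is only printed, not executed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): validates its inputs and
# prints the upload command so it can be reviewed before running.
# Assumes the file path and URL contain no single quotes.
print_upload_cmd() {
  url="$1"
  file="$2"
  if [ -z "$url" ]; then
    echo "upload URL is empty; the image may no longer be in WaitForUserUpload" >&2
    return 1
  fi
  if [ ! -r "$file" ]; then
    echo "cannot read image file: $file" >&2
    return 1
  fi
  # Same request shape as the example in the alert text, plus --fail.
  printf "curl --fail -X PUT --data-binary @'%s' -H \"Content-Type: application/octet-stream\" '%s'\n" "$file" "$url"
}

# Usage sketch (requires cluster access; <name> is a placeholder):
#   url=$(d8 k get cvi <name> -o jsonpath="{.status.imageUploadURLs.external}")
#   print_upload_cmd "$url" image.qcow2
```

Printing rather than executing keeps the sketch safe to try; the printed line can be pasted into a shell once the URL looks right.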

monitoring/prometheus-rules/dvcr.yaml

Lines changed: 81 additions & 6 deletions
@@ -14,9 +14,49 @@
         plk_grouped_by__d8_virtualization_dvcr_health: "D8VirtualizationDVCRHealth,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
         summary: The dvcr Pod is NOT Ready.
         description: |
-          The recommended course of action:
-          1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy dvcr`
-          2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l app=dvcr`
+          One or more Pods of the dvcr deployment in the d8-virtualization namespace are not in a Ready state.
+
+          The dvcr component serves as a local registry for virtual machine images and disks. If its Pods are not ready, image uploads and VirtualDisk provisioning may be affected.
+
+          Recommended diagnosis steps:
+
+          1. Check the status of dvcr Pods.
+             ```bash
+             d8 k -n d8-virtualization get pods -l app=dvcr
+             ```
+
+          2. Describe the Deployment to check replicas and events.
+             ```bash
+             d8 k -n d8-virtualization describe deploy dvcr
+             ```
+
+          3. Get detailed information about the Pod, including events and container statuses.
+             ```bash
+             d8 k -n d8-virtualization describe pod -l app=dvcr
+             ```
+
+          4. View logs from the affected Pod.
+             ```bash
+             d8 k -n d8-virtualization logs <pod-name>
+             ```
+
+          5. If the Pod has restarted, check logs from the previous instance.
+             ```bash
+             d8 k -n d8-virtualization logs <pod-name> --previous
+             ```
+
+          Recommended actions:
+
+          - Investigate readiness probe failures, container crashes, or scheduling issues based on the output of the commands above.
+
+          - If the issue persists, consider restarting the dvcr Deployment.
+            ```bash
+            d8 k -n d8-virtualization rollout restart deploy dvcr
+            ```
+
+          - Ensure that required storage volumes (such as PVCs) are available and healthy.
+
+          - Verify that there are no node issues such as disk pressure or memory limits affecting scheduling.
 
     - alert: D8VirtualizationDVCRPodIsNotRunning
       expr: absent(kube_pod_status_phase{namespace="d8-virtualization",phase="Running",pod=~"dvcr-.*"})
@@ -31,6 +71,41 @@
         plk_grouped_by__d8_virtualization_dvcr_health: "D8VirtualizationDVCRHealth,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
         summary: The dvcr Pod is NOT Running.
         description: |
-          The recommended course of action:
-          1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy dvcr`
-          2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l app=dvcr`
+          No running Pods were found for the dvcr deployment in the d8-virtualization namespace for more than 2 minutes.
+
+          The dvcr component serves as a local registry for virtual machine images and disks. Its unavailability may block image uploads and provisioning of new VirtualDisks or VirtualMachines.
+
+          Recommended diagnosis steps:
+
+          1. Check if any dvcr Pods exist and what their status is.
+             ```bash
+             d8 k -n d8-virtualization get pods -l app=dvcr
+             ```
+
+          2. Describe the Deployment to check replicas and events.
+             ```bash
+             d8 k -n d8-virtualization describe deploy dvcr
+             ```
+
+          3. Get logs from previous Pods (if they have been restarted).
+             ```bash
+             d8 k -n d8-virtualization logs <pod-name> --previous
+             ```
+
+          4. Get events related to the dvcr Deployment and Pods.
+             ```bash
+             d8 k -n d8-virtualization describe pod -l app=dvcr
+             ```
+
+          Recommended actions:
+
+          - If the Deployment has zero replicas, scale it back up.
+            ```bash
+            d8 k -n d8-virtualization scale deploy dvcr --replicas=1
+            ```
+
+          - If Pods are crashing, inspect their logs and events.
+
+          - Ensure that required storage volumes (such as PVCs) are available and healthy.
+
+          - Verify that there are no node issues such as disk pressure or memory limits affecting scheduling.
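
Review note on the dvcr diagnosis steps above: the `logs` commands take a `<pod-name>` placeholder that has to be picked by eye from the `get pods` output. A small sketch of how the not-ready Pods could be picked out automatically follows. The helper name `not_ready_pods` is hypothetical and not part of this commit; it assumes the default `get pods` column order (NAME, READY, ...) and the `--no-headers` flag.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): reads `get pods --no-headers`
# output on stdin and prints the names of Pods whose READY column is not N/N.
not_ready_pods() {
  awk '{ split($2, ready, "/"); if (ready[1] != ready[2]) print $1 }'
}

# Usage sketch (requires cluster access):
#   d8 k -n d8-virtualization get pods -l app=dvcr --no-headers | not_ready_pods
```

Each printed name can then be passed to the `logs` and `logs --previous` commands from the alert description.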

monitoring/prometheus-rules/internal-virtualization-cdi-apiservier.yaml

Lines changed: 4 additions & 4 deletions
@@ -15,8 +15,8 @@
         summary: The cdi-apiserver Pod is NOT Ready.
         description: |
           The recommended course of action:
-          1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy cdi-apiserver`
-          2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l cdi.internal.virtualization.deckhouse.io=cdi-apiserver`
+          1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy cdi-apiserver`
+          2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l cdi.internal.virtualization.deckhouse.io=cdi-apiserver`
 
     - alert: D8InternalVirtualizationCDIAPIServerPodIsNotRunning
       expr: absent(kube_pod_status_phase{namespace="d8-virtualization",phase="Running",pod=~"cdi-apiserver-.*"})
@@ -32,5 +32,5 @@
         summary: The cdi-apiserver Pod is NOT Running.
         description: |
           The recommended course of action:
-          1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy cdi-apiserver`
-          2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l cdi.internal.virtualization.deckhouse.io=cdi-apiserver`
+          1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy cdi-apiserver`
+          2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l cdi.internal.virtualization.deckhouse.io=cdi-apiserver`

monitoring/prometheus-rules/internal-virtualization-cdi-deployment.yaml

Lines changed: 4 additions & 4 deletions
@@ -15,8 +15,8 @@
         summary: The cdi-deployment Pod is NOT Ready.
         description: |
           The recommended course of action:
-          1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy cdi-deployment`
-          2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l app=containerized-data-importer`
+          1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy cdi-deployment`
+          2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l app=containerized-data-importer`
 
     - alert: D8InternalVirtualizationCDIDeploymentPodIsNotRunning
       expr: absent(kube_pod_status_phase{namespace="d8-virtualization",phase="Running",pod=~"cdi-deployment-.*"})
@@ -32,5 +32,5 @@
         summary: The cdi-deployment Pod is NOT Running.
         description: |
           The recommended course of action:
-          1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy cdi-deployment`
-          2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l app=containerized-data-importer`
+          1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy cdi-deployment`
+          2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l app=containerized-data-importer`

monitoring/prometheus-rules/internal-virtualization-cdi-operator.yaml

Lines changed: 4 additions & 4 deletions
@@ -15,8 +15,8 @@
         summary: The cdi-operator Pod is NOT Ready.
         description: |
           The recommended course of action:
-          1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy cdi-operator`
-          2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l app=cdi-operator`
+          1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy cdi-operator`
+          2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l app=cdi-operator`
 
     - alert: D8InternalVirtualizationCDIOperatorPodIsNotRunning
       expr: absent(kube_pod_status_phase{namespace="d8-virtualization",phase="Running",pod=~"cdi-operator-.*"})
@@ -32,5 +32,5 @@
         summary: The cdi-operator Pod is NOT Running.
         description: |
           The recommended course of action:
-          1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy cdi-operator`
-          2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l app=cdi-operator`
+          1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy cdi-operator`
+          2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l app=cdi-operator`

monitoring/prometheus-rules/internal-virtualization-virt-api.yaml

Lines changed: 4 additions & 4 deletions
@@ -15,8 +15,8 @@
         summary: The virt-api Pod is NOT Ready.
         description: |
           The recommended course of action:
-          1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy virt-api`
-          2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-api`
+          1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy virt-api`
+          2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-api`
 
     - alert: D8InternalVirtualizationVirtAPIPodIsNotRunning
       expr: absent(kube_pod_status_phase{namespace="d8-virtualization",phase="Running",pod=~"virt-api-.*"})
@@ -32,5 +32,5 @@
         summary: The virt-api Pod is NOT Running.
         description: |
           The recommended course of action:
-          1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy virt-api`
-          2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-api`
+          1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy virt-api`
+          2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-api`

monitoring/prometheus-rules/internal-virtualization-virt-controller.yaml

Lines changed: 4 additions & 4 deletions
@@ -15,8 +15,8 @@
         summary: The virt-controller Pod is NOT Ready.
         description: |
           The recommended course of action:
-          1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy virt-controller`
-          2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-controller`
+          1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy virt-controller`
+          2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-controller`
 
     - alert: D8InternalVirtualizationVirtControllerPodIsNotRunning
       expr: absent(kube_pod_status_phase{namespace="d8-virtualization",phase="Running",pod=~"virt-controller-.*"})
@@ -32,5 +32,5 @@
         summary: The virt-controller Pod is NOT Running.
         description: |
           The recommended course of action:
-          1. Retrieve details of the Deployment: `kubectl -n d8-virtualization describe deploy virt-controller`
-          2. View the status of the Pod and try to figure out why it is not running: `kubectl -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-controller`
+          1. Retrieve details of the Deployment: `d8 k -n d8-virtualization describe deploy virt-controller`
+          2. View the status of the Pod and try to figure out why it is not running: `d8 k -n d8-virtualization describe pod -l kubevirt.internal.virtualization.deckhouse.io=virt-controller`
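
Review note: the five `internal-virtualization-*` files above all receive the same mechanical substitution, with inline `kubectl -n ...` commands becoming `d8 k -n ...`. How the author actually made the change is not stated in the commit. The sketch below only illustrates one way such a rewrite could be expressed and checked; `rewrite_kubectl` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): rewrites inline-code
# `kubectl -n ...` commands on stdin to the `d8 k -n ...` form.
# Only occurrences that directly follow a backtick are touched.
rewrite_kubectl() {
  sed 's/`kubectl -n /`d8 k -n /g'
}

# Usage sketch (path is a placeholder):
#   rewrite_kubectl < path/to/rule.yaml | diff path/to/rule.yaml -
```

Anchoring the pattern on the leading backtick keeps the substitution away from any prose mention of `kubectl` outside inline code.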
