Commit f452f57

chore: making new branch (#1478)
merging personal branch to a new branch.
Add conditional cronjob and increase backoff limit
Refactor time-stamper.sh script inclusion in ConfigMap
Time stamper cron (#1479)
Change chart version from 1.13.0 to 0.1.0
removing not fit for purpose test
Update time-stamper.sh
removing the logic for if the pvc wasn't mounted. now only annotates mounted pvcs.
Base logic for pvc auto deletion
Sorting out the weekly cronjob for pvc auto deletion, also adding something to values yaml to turn it off
Added Del perm del s
changing name to be more readable
fix: if there isn't a last_used check if not null
Fix for time-stamper.sh as it was annotating all PVCs
pvc deletion test fix
removing test yaml
added affinity and tolerations to cronjobs
added an affinity to the cronjob.
1 parent 65b7e75 commit f452f57

8 files changed

Lines changed: 270 additions & 17 deletions

helm/blueapi/README.md

Lines changed: 6 additions & 2 deletions
@@ -32,8 +32,12 @@ A Helm chart deploying a worker pod that runs Bluesky plans
 | podAnnotations | object | `{}` | |
 | podLabels | object | `{}` | |
 | podSecurityContext | object | `{}` | |
+| pvcAutoDeletion.enabled | bool | `true` | |
 | readinessProbe | object | `{"failureThreshold":2,"httpGet":{"path":"/healthz","port":"http"},"periodSeconds":10}` | Readiness probe, if configured kubernetes will not route traffic to this pod if failed consecutively. This could allow the service time to recover if it is being overwhelmed by traffic, but without the to ability to load balance or scale up/outwards, upstream services will need to know to back off. This is automatically disabled when in debug mode. |
-| resources | object | `{"limits":{"cpu":"2000m","memory":"4000Mi"},"requests":{"cpu":"200m","memory":"400Mi"}}` | Sets the compute resources available to the pod. These defaults are appropriate when using debug mode or an internal PVC and therefore running VS Code server in the pod. In the Diamond cluster, requests must be >= 0.1*limits When not using either of the above, the limits may be lowered. When idle but connected, blueapi consumes ~400MB of memory and 1% cpu and may struggle when allocated less. |
+| resources.limits.cpu | string | `"2000m"` | |
+| resources.limits.memory | string | `"4000Mi"` | |
+| resources.requests.cpu | string | `"200m"` | |
+| resources.requests.memory | string | `"400Mi"` | |
 | restartOnConfigChange | bool | `true` | If enabled the blueapi pod will restart on changes to `worker` |
 | securityContext.runAsNonRoot | bool | `true` | |
 | securityContext.runAsUser | int | `1000` | |
@@ -44,13 +48,13 @@ A Helm chart deploying a worker pod that runs Bluesky plans
 | serviceAccount.create | bool | `false` | |
 | serviceAccount.name | string | `""` | |
 | startupProbe | object | `{"failureThreshold":5,"httpGet":{"path":"/healthz","port":"http"},"periodSeconds":10}` | A more lenient livenessProbe to allow the service to start fully. This is automatically disabled when in debug mode. |
+| timeStampCron.enabled | bool | `true` | |
 | tolerations | list | `[]` | May be required to run on specific nodes (e.g. the control machine) |
 | tracing | object | `{"fastapi":{"excludedURLs":"/healthz"},"otlp":{"enabled":false,"protocol":"http/protobuf","server":{"host":"http://opentelemetry-collector.tracing","port":4318}}}` | Exclude health probe requests from tracing by default to prevent spamming |
 | volumeMounts | list | `[{"mountPath":"/config","name":"worker-config","readOnly":true}]` | Additional volumeMounts on the output StatefulSet definition. Define how volumes are mounted to the container referenced by using the same name. |
 | volumes | list | `[]` | Additional volumes on the output StatefulSet definition. Define volumes from e.g. Secrets, ConfigMaps or the Filesystem |
 | worker | object | `{"api":{"url":"http://0.0.0.0:8000/"},"env":{"sources":[{"kind":"planFunctions","module":"dodal.plans"},{"kind":"planFunctions","module":"dodal.plan_stubs.wrapped"}]},"logging":{"graylog":{"enabled":false,"url":"tcp://graylog-log-target.diamond.ac.uk:12231/"},"level":"INFO"},"scratch":{"repositories":[],"root":"/workspace"},"stomp":{"auth":{"password":"guest","username":"guest"},"enabled":false,"url":"tcp://rabbitmq:61613/"}}` | Config for the worker goes here, will be mounted into a config file |
 | worker.api.url | string | `"http://0.0.0.0:8000/"` | 0.0.0.0 required to allow non-loopback traffic If using hostNetwork, the port must be free on the host |
 | worker.env.sources | list | `[{"kind":"planFunctions","module":"dodal.plans"},{"kind":"planFunctions","module":"dodal.plan_stubs.wrapped"}]` | modules (must be installed in the venv) to fetch devices/plans from |
-| worker.logging | object | `{"graylog":{"enabled":false,"url":"tcp://graylog-log-target.diamond.ac.uk:12231/"},"level":"INFO"}` | Configures logging. Port 12231 is the `dodal` input on graylog which will be renamed `blueapi` |
 | worker.scratch | object | `{"repositories":[],"root":"/workspace"}` | If initContainer is enabled the default branch of python projects in this section are installed into the venv *without their dependencies* |
 | worker.stomp | object | `{"auth":{"password":"guest","username":"guest"},"enabled":false,"url":"tcp://rabbitmq:61613/"}` | Message bus configuration for returning status to GDA/forwarding documents downstream Password may be in the form ${ENV_VAR} to be fetched from an environment variable e.g. mounted from a SealedSecret |

helm/blueapi/files/scripts/pvc-deletion.sh

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+#!/bin/sh
+# Get all PVCs in the release namespace
+ALL_PVCS=$(kubectl get pvc -n $RELEASE_NAMESPACE -o=jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | sort -u)
+BLUEAPI_PVCS=$( echo $ALL_PVCS | tr ' ' '\n' | grep blueapi-scratch)
+NOW=$(date +%s)
+#loop through all blueapi scratch pvcs.
+for pvc in $BLUEAPI_PVCS; do
+    #check if pvc has last-used annotation
+    if kubectl get pvc $pvc -n $RELEASE_NAMESPACE -o=jsonpath='{.metadata.annotations.last-used}'
+    then
+        #get last used annotation
+        LAST_USED=$(kubectl get pvc $pvc -n $RELEASE_NAMESPACE -o=jsonpath='{.metadata.annotations.last-used}')
+        #checking that it is not empty
+        if [ -n "$LAST_USED" ]; then
+            #check if last_used is older than 3 months
+            if [ $(($NOW - LAST_USED)) -gt 7884000 ]; then
+                #checking if the pvc is protected, if it is protected skip deletion
+                if [ "$(kubectl get pvc $pvc -n $RELEASE_NAMESPACE -o=jsonpath='{.metadata.annotations.protected}')" = "true" ]; then
+                    echo "PVC $pvc is protected, skipping deletion"
+                    continue
+                fi
+                #PVC has not been used for more than three months, delete it
+                kubectl delete pvc "$pvc" -n $RELEASE_NAMESPACE
+            fi
+        fi
+    else
+        echo "PVC $pvc does not have last-used annotation, skipping deletion"
+    fi
+done
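
The deletion rule in pvc-deletion.sh can be read as a small decision function. The sketch below is an illustration, not part of the commit: `pvc_action` and `MAX_AGE_SECONDS` are hypothetical names, and the annotation values are passed in as arguments so it runs without a cluster. It treats an empty `last-used` value as "unstamped", on the assumption that `kubectl get -o jsonpath` prints nothing and still exits successfully when the annotation is missing (in which case the outer `if` in the script would not reach its `else` branch).

```shell
#!/bin/sh
# Hypothetical helper mirroring the checks in pvc-deletion.sh.
# Arguments: current epoch seconds, last-used value (may be empty), protected value.
MAX_AGE_SECONDS=7884000  # the three-month threshold used by the script

pvc_action() {
    now=$1
    last_used=$2
    protected=$3
    if [ -z "$last_used" ]; then
        # No timestamp recorded: never delete.
        echo "skip-unstamped"
    elif [ $(( $now - $last_used )) -le "$MAX_AGE_SECONDS" ]; then
        # Used within the threshold: keep.
        echo "keep"
    elif [ "$protected" = "true" ]; then
        # Old, but explicitly protected.
        echo "skip-protected"
    else
        echo "delete"
    fi
}

pvc_action 10000000 "" ""
pvc_action 10000000 9000000 ""
pvc_action 10000000 1000000 true
pvc_action 10000000 1000000 ""
```

Reading the annotation once and branching on the value, rather than on the exit status of `kubectl get`, is one way to make the "no annotation" message reachable.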

helm/blueapi/files/scripts/time-stamper.sh

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+#!/bin/sh
+# Get all PVCs currently mounted by running pods
+MOUNTED_PVCS=$(kubectl get pods -n $RELEASE_NAMESPACE \
+    -o=jsonpath='{.items[*].spec.volumes[*].persistentVolumeClaim.claimName}' | tr ' ' '\n' | sort -u)
+BLUEAPI_PVCS=$( echo $MOUNTED_PVCS | tr ' ' '\n' | grep blueapi-scratch)
+#loop through all the pvcs annotating ones that are mounted
+NOW=$(date +%s)
+for pvc in $BLUEAPI_PVCS; do
+    kubectl annotate --overwrite pvc "$pvc" -n $RELEASE_NAMESPACE last-used="$NOW"
+done
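
The name filtering that both scripts apply (split on spaces, de-duplicate, keep names containing `blueapi-scratch`) can be tried without a cluster. This is a minimal sketch; `filter_scratch_pvcs` is a hypothetical name and the claim names in the example call are made up.

```shell
#!/bin/sh
# Hypothetical helper: reduce a space-separated list of claim names
# to the unique blueapi scratch PVCs, one per line.
filter_scratch_pvcs() {
    # tr puts one name per line, sort -u removes duplicates, grep keeps
    # names containing "blueapi-scratch"; "|| true" keeps the exit status
    # zero when nothing matches.
    printf '%s\n' "$1" | tr ' ' '\n' | sort -u | grep blueapi-scratch || true
}

filter_scratch_pvcs "data blueapi-scratch-b blueapi-scratch-a data blueapi-scratch-a"
```

Because the match is a substring match, any PVC in the namespace whose name contains `blueapi-scratch` is selected, not only those created by this release.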

helm/blueapi/templates/configmap.yaml

Lines changed: 2 additions & 2 deletions
@@ -31,6 +31,6 @@ data:
   init_config.yaml: |-
     scratch:
       {{- toYaml .Values.worker.scratch | nindent 6 }}
-{{- end }}

----
+---
+{{- end }}
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+{{- if .Values.timeStampCron.enabled }}
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name : {{include "blueapi.fullname" . }}-pvc-stamper-script
+data:
+  {{- $files := .Files }}
+  time-stamper.sh: |-
+{{ $files.Get "files/scripts/time-stamper.sh" | indent 4 }}
+---
+{{- end }}
+
+{{- if .Values.pvcAutoDeletion.enabled }}
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name : {{include "blueapi.fullname" . }}-pvc-auto-deletion-script
+data:
+  {{- $files := .Files }}
+  pvc-deletion.sh: |-
+{{ $files.Get "files/scripts/pvc-deletion.sh" | indent 4 }}
+{{- end }}
Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
+{{- if .Values.timeStampCron.enabled }}
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: {{ include "blueapi.fullname" . }}-last-used-stamper
+  namespace: {{ .Release.Namespace }}
+automountServiceAccountToken: true
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: {{ include "blueapi.fullname" . }}-last-used-stamper
+  namespace: {{ .Release.Namespace }}
+rules:
+  - apiGroups: [""]
+    resources: ["pods", "persistentvolumeclaims"]
+    verbs: ["get", "list", "patch"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: {{ include "blueapi.fullname" . }}-last-used-stamper
+  namespace: {{ .Release.Namespace }}
+subjects:
+  - kind: ServiceAccount
+    name: {{ include "blueapi.fullname" . }}-last-used-stamper
+    namespace: {{ .Release.Namespace }}
+roleRef:
+  kind: Role
+  name: {{ include "blueapi.fullname" . }}-last-used-stamper
+  apiGroup: rbac.authorization.k8s.io
+---
+apiVersion: batch/v1
+kind: CronJob
+metadata:
+  name: {{ include "blueapi.fullname" . }}-last-used-stamper
+  namespace: {{ .Release.Namespace }}
+spec:
+  concurrencyPolicy: Forbid
+  successfulJobsHistoryLimit: 3
+  failedJobsHistoryLimit: 1
+  schedule: "*/5 * * * *"
+
+  jobTemplate:
+    spec:
+      # number of retries before the job is marked as failed
+      backoffLimit: 3
+      # job stops after 180 seconds
+      activeDeadlineSeconds: 180
+      template:
+        spec:
+          serviceAccountName: {{ include "blueapi.fullname" . }}-last-used-stamper
+          {{- with .Values.tolerations }}
+          tolerations:
+            {{- toYaml . | nindent 12 }}
+          {{- end }}
+          {{- with .Values.nodeSelector }}
+          nodeSelector:
+            {{- toYaml . | nindent 12 }}
+          {{- end }}
+          {{- with .Values.affinity }}
+          affinity:
+            {{- toYaml . | nindent 12 }}
+          {{- end }}
+          volumes:
+            - name: {{include "blueapi.fullname" . }}-pvc-stamper-script
+              configMap:
+                name: {{include "blueapi.fullname" . }}-pvc-stamper-script
+                defaultMode: 0555
+          containers:
+            - name: last-used-stamper
+              env:
+                - name: RELEASE_NAME
+                  value: {{ .Release.Name }}
+                - name: RELEASE_NAMESPACE
+                  value: {{ .Release.Namespace }}
+              volumeMounts:
+                - name: {{include "blueapi.fullname" . }}-pvc-stamper-script
+                  mountPath: /scripts
+              image: bitnami/kubectl:latest
+              imagePullPolicy: IfNotPresent
+              command: ["/scripts/time-stamper.sh"]
+          restartPolicy: OnFailure
+{{- end }}
+{{- if .Values.pvcAutoDeletion.enabled }}
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: {{ include "blueapi.fullname" . }}-pvc-auto-deletion
+  namespace: {{ .Release.Namespace }}
+automountServiceAccountToken: true
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: {{ include "blueapi.fullname" . }}-pvc-auto-deletion
+  namespace: {{ .Release.Namespace }}
+rules:
+  - apiGroups: [""]
+    resources: ["pods", "persistentvolumeclaims"]
+    verbs: ["get", "list", "patch","delete"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: {{ include "blueapi.fullname" . }}-pvc-auto-deletion
+  namespace: {{ .Release.Namespace }}
+subjects:
+  - kind: ServiceAccount
+    name: {{ include "blueapi.fullname" . }}-pvc-auto-deletion
+    namespace: {{ .Release.Namespace }}
+roleRef:
+  kind: Role
+  name: {{ include "blueapi.fullname" . }}-pvc-auto-deletion
+  apiGroup: rbac.authorization.k8s.io
+---
+apiVersion: batch/v1
+kind: CronJob
+metadata:
+  name: {{ include "blueapi.fullname" . }}-pvc-auto-deletion
+  namespace: {{ .Release.Namespace }}
+spec:
+  concurrencyPolicy: Forbid
+  successfulJobsHistoryLimit: 3
+  failedJobsHistoryLimit: 1
+  schedule: "@weekly"
+
+  jobTemplate:
+    spec:
+      # number of retries before the job is marked as failed
+      backoffLimit: 3
+      # job stops after 300 seconds
+      activeDeadlineSeconds: 300
+      template:
+        spec:
+          serviceAccountName: {{ include "blueapi.fullname" . }}-pvc-auto-deletion
+          {{- with .Values.tolerations }}
+          tolerations:
+            {{- toYaml . | nindent 12 }}
+          {{- end }}
+          {{- with .Values.nodeSelector }}
+          nodeSelector:
+            {{- toYaml . | nindent 12 }}
+          {{- end }}
+          {{- with .Values.affinity }}
+          affinity:
+            {{- toYaml . | nindent 12 }}
+          {{- end }}
+          volumes:
+            - name: {{include "blueapi.fullname" . }}-pvc-auto-deletion-script
+              configMap:
+                name: {{include "blueapi.fullname" . }}-pvc-auto-deletion-script
+                defaultMode: 0555
+          containers:
+            - name: pvc-auto-deletion
+              env:
+                - name: RELEASE_NAME
+                  value: {{ .Release.Name }}
+                - name: RELEASE_NAMESPACE
+                  value: {{ .Release.Namespace }}
+              volumeMounts:
+                - name: {{include "blueapi.fullname" . }}-pvc-auto-deletion-script
+                  mountPath: /scripts
+              image: bitnami/kubectl:latest
+              imagePullPolicy: IfNotPresent
+              command: ["/scripts/pvc-deletion.sh"]
+          restartPolicy: OnFailure
+{{- end }}
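
Since the weekly CronJob runs pvc-deletion.sh, which skips any PVC whose `protected` annotation equals `true`, a PVC can be exempted by setting that annotation. The helper below is a sketch only: `protect_pvc` and the `KUBECTL` override are hypothetical conveniences, and the command mirrors the `kubectl annotate --overwrite` form already used in time-stamper.sh.

```shell
#!/bin/sh
# Hypothetical helper: mark a PVC as protected so the deletion script skips it.
# KUBECTL may be overridden (for example with "echo") to preview the command
# instead of running it against a cluster.
protect_pvc() {
    pvc=$1
    namespace=$2
    "${KUBECTL:-kubectl}" annotate --overwrite pvc "$pvc" -n "$namespace" protected=true
}
```

For example, `protect_pvc <pvc-name> <namespace>` annotates one PVC; setting `KUBECTL=echo` beforehand prints the command without executing it.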

helm/blueapi/values.schema.json

Lines changed: 16 additions & 2 deletions
@@ -174,6 +174,14 @@
     "podSecurityContext": {
       "type": "object"
     },
+    "pvcAutoDeletion": {
+      "type": "object",
+      "properties": {
+        "enabled": {
+          "type": "boolean"
+        }
+      }
+    },
     "readinessProbe": {
       "description": "Readiness probe, if configured kubernetes will not route traffic to this pod if failed consecutively. This could allow the service time to recover if it is being overwhelmed by traffic, but without the to ability to load balance or scale up/outwards, upstream services will need to know to back off. This is automatically disabled when in debug mode.",
       "type": "object",
@@ -198,7 +206,6 @@
       }
     },
     "resources": {
-      "description": "Sets the compute resources available to the pod. These defaults are appropriate when using debug mode or an internal PVC and therefore running VS Code server in the pod. In the Diamond cluster, requests must be \u003e= 0.1*limits When not using either of the above, the limits may be lowered. When idle but connected, blueapi consumes ~400MB of memory and 1% cpu and may struggle when allocated less.",
       "type": "object",
       "properties": {
         "limits": {
@@ -292,6 +299,14 @@
         }
       }
     },
+    "timeStampCron": {
+      "type": "object",
+      "properties": {
+        "enabled": {
+          "type": "boolean"
+        }
+      }
+    },
     "tolerations": {
       "description": "May be required to run on specific nodes (e.g. the control machine)",
       "type": "array"
@@ -389,7 +404,6 @@
           }
         },
         "logging": {
-          "description": "Configures logging. Port 12231 is the `dodal` input on graylog which will be renamed `blueapi`",
           "type": "object",
           "properties": {
             "graylog": {

helm/blueapi/values.yaml

Lines changed: 16 additions & 11 deletions
@@ -36,8 +36,7 @@ podAnnotations: {}
 # For more information checkout: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
 podLabels: {}

-podSecurityContext: {}
-  # fsGroup: 2000
+podSecurityContext: {} # fsGroup: 2000

 securityContext:
   # https://github.com/DiamondLightSource/blueapi/issues/1096
@@ -48,7 +47,7 @@ securityContext:
   # drop:
   # - ALL

-# This is for setting up a service more information can be found here: https://kubernetes.io/docs/concepts/services-networking/service/
+# This is for setting up a service more information can be found here: https://kubernetes.io/docs/concepts/services-networking/service/
 service:
   # This sets the service type more information can be found here: https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types
   # -- To make blueapi available on an IP outside of the cluster prior to an Ingress being created, change this to LoadBalancer
@@ -76,13 +75,13 @@ ingress:
   # hosts:
   # - chart-example.local

-# -- Sets the compute resources available to the pod.
-# These defaults are appropriate when using debug mode or an internal PVC and therefore
-# running VS Code server in the pod.
-# In the Diamond cluster, requests must be >= 0.1*limits
-# When not using either of the above, the limits may be lowered.
-# When idle but connected, blueapi consumes ~400MB of memory and 1% cpu
-# and may struggle when allocated less.
+# -- Sets the compute resources available to the pod.
+# These defaults are appropriate when using debug mode or an internal PVC and therefore
+# running VS Code server in the pod.
+# In the Diamond cluster, requests must be >= 0.1*limits
+# When not using either of the above, the limits may be lowered.
+# When idle but connected, blueapi consumes ~400MB of memory and 1% cpu
+# and may struggle when allocated less.
 resources:
   # We usually recommend not to specify default resources and to leave this as a conscious
   # choice for the user. This also increases chances charts run on environments with little
@@ -205,7 +204,7 @@ worker:
     repositories: []
     # - name: "dodal"
     # remote_url: https://github.com/DiamondLightSource/dodal.git
-  # -- Configures logging. Port 12231 is the `dodal` input on graylog which will be renamed `blueapi`
+  # -- Configures logging. Port 12231 is the `dodal` input on graylog which will be renamed `blueapi`
   logging:
     level: "INFO"
     graylog:
@@ -224,6 +223,12 @@ initContainer:
     # -- Size of persistent volume
     size: "1Gi"

+timeStampCron:
+  enabled: true
+
+pvcAutoDeletion:
+  enabled: true
+
 debug:
   # -- If enabled, runs debugpy, allowing port-forwarding to expose port 5678 or attached vscode instance
   enabled: false
