23 changes: 17 additions & 6 deletions README.md
@@ -43,12 +43,23 @@ Instructions on generating a Fernet key can be found at [How-to Guides: Securing
Example:
```
poetry run python3
>>> from cryptography.fernet import Fernet
>>> fernet_key= Fernet.generate_key()
>>> decoded_fernet_key = fernet_key.decode()
from cryptography.fernet import Fernet
fernet_key= Fernet.generate_key()
decoded_fernet_key = fernet_key.decode()
quit()
echo -n $decoded_fernet_key | base64
```
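
Note that a variable assigned inside the Python session is not visible to the shell after `quit()`, so `$decoded_fernet_key` is empty unless the printed value is exported by hand. A minimal sketch that produces the key and its base64-encoded secret value in one script is shown here. It assumes only the standard library and relies on the Fernet key format (32 random bytes, URL-safe base64 encoded) instead of calling `Fernet.generate_key()`; the function names `generate_fernet_key` and `to_secret_value` are hypothetical helpers, not part of this repository.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return a Fernet-format key: 32 random bytes, URL-safe base64 encoded."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode()


def to_secret_value(value: str) -> str:
    """Base64-encode a short string, as `echo -n "$value" | base64` would."""
    return base64.b64encode(value.encode()).decode()


if __name__ == "__main__":
    key = generate_fernet_key()
    # Print only the encoded form, ready to paste into the secret manifest.
    print(to_secret_value(key))
```

If the `cryptography` package is installed, `Fernet.generate_key().decode()` can be passed to `to_secret_value` instead of the stdlib helper.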

Airflow 3 uses a secret key to encode and decode JWTs to authenticate to public and private APIs. To generate the airflow-jwt-secret-key, you can do:
```
poetry run python3
import secrets
jwt_key = secrets.token_urlsafe(16)
print(jwt_key)
quit()
echo -n $jwt_key | base64
```
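
The JWT key can also be generated and encoded in a single Python step, which avoids copying the value from the interpreter into a shell variable. This is a sketch under the assumption that the secret manifest expects the base64 encoding of the key string, as the `echo -n ... | base64` line suggests; `generate_jwt_secret` is a hypothetical name.

```python
import base64
import secrets


def generate_jwt_secret(nbytes: int = 16) -> tuple[str, str]:
    """Return (raw_key, base64_encoded_key) for the airflow-jwt-secret-key."""
    raw = secrets.token_urlsafe(nbytes)  # same call as in the README snippet
    encoded = base64.b64encode(raw.encode()).decode()
    return raw, encoded


if __name__ == "__main__":
    raw_key, encoded_key = generate_jwt_secret()
    print(encoded_key)
```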

Once you have generated the desired keys and password apply it to your Kubernetes cluster using:
```
export NAMESPACE=<my-namespace>
@@ -64,7 +75,7 @@ With [Docker Desktop](https://docs.docker.com/desktop/), [Helm](https://helm.sh/
export NAMESPACE=<my-namespace>
kubectl -n $NAMESPACE apply -f pv-volume.yaml
envsubst < airflow-values.yaml > ns-airflow-values.yaml
helm -n $NAMESPACE install --version 22.7.3 -f ns-airflow-values.yaml airflow oci://registry-1.docker.io/bitnamicharts/airflow
helm -n $NAMESPACE install --version=25.0.2 -f ns-airflow-values.yaml airflow oci://registry-1.docker.io/bitnamicharts/airflow
```

Note: in the `pv-volume.yaml` file you must use a storageClass that supports ReadWriteMany. If you do not specify a storageClassName, the default storageClass for your cluster will be used.
@@ -74,11 +85,11 @@ To upgrade or to reinitialize the airflow release when configuration changes are
```
envsubst < airflow-values.yaml > ns-airflow-values.yaml
export PASSWORD=$(kubectl get secret -n $NAMESPACE airflow-postgresql -o jsonpath="{.data.password}" | base64 -d)
helm -n $NAMESPACE upgrade --install --version 22.7.3 --set global.postgresql.auth.password=$PASSWORD -f ns-airflow-values.yaml airflow oci://registry-1.docker.io/bitnamicharts/airflow
helm -n $NAMESPACE upgrade --install --version=25.0.2 --set global.postgresql.auth.password=$PASSWORD -f ns-airflow-values.yaml airflow oci://registry-1.docker.io/bitnamicharts/airflow
```

## Run the DAGs to Anonymize Data
1. Either create an ingress for the airflow service that resolves to a hostname, or simply port-forward the airflow service to your local browser: `kubectl -n $NAMESPACE port-forward svc/airflow 8080:8080`
1. Either create an ingress for the airflow service that resolves to a hostname, or simply port-forward the airflow service to your local browser: `kubectl -n $NAMESPACE port-forward svc/airflow-web 8080:8080`
1. Open your browser and go to localhost:8080 (or the URL if you created a hostname and ingress).
1. Login with the user airflow and the airflow-password created with the secret.yaml file.
1. Create a new Connection under Admin > Connections.
139 changes: 105 additions & 34 deletions airflow-values.yaml
@@ -21,6 +21,11 @@ configuration:
auth:
username: airflow
existingSecret: airflow-user
image:
repository: bitnamilegacy/airflow
global:
security:
allowInsecureImages: true
dags:
enabled: true
repositories:
@@ -36,10 +41,14 @@ plugins:
name: "plugins"
path: /folio_data_anonymization/plugins
postgresql:
image:
repository: bitnamilegacy/postgresql
tag: 17.6.0-debian-12-r2
primary:
resourcesPreset: "medium"
setupDBJob:
resourcesPreset: "medium"
automountServiceAccountToken: true
extraVolumes:
- name: airflow-dependencies
persistentVolumeClaim:
@@ -59,7 +68,7 @@ setupDBJob:
initContainers:
- name: install-dependencies
image: "{{ include \"airflow.image\" . }}"
imagePullPolicy: "{{ .Values.image.pullPolicy }}"
imagePullPolicy: IfNotPresent
env:
- name: PYTHONPATH
value: "/opt/bitnami/airflow:/opt/bitnami/airflow/lib"
@@ -76,50 +85,42 @@ setupDBJob:
subPath: requirements.txt
- name: airflow-dependencies
mountPath: /opt/bitnami/airflow/lib
worker:
dagProcessor:
automountServiceAccountToken: true
extraVolumeMounts:
- name: airflow-dependencies
mountPath: "/opt/bitnami/airflow/lib"
- name: airflow-logs
mountPath: "/opt/bitnami/airflow_logs"
extraVolumes:
- name: airflow-dependencies
persistentVolumeClaim:
claimName: airflow-dependencies
- name: airflow-logs
persistentVolumeClaim:
claimName: airflow-logs
extraEnvVars:
- name: PYTHONPATH
value: "/opt/bitnami/airflow:/opt/bitnami/airflow/lib"
resourcesPreset: "large"
web:
replicaCount: 1
automountServiceAccountToken: true
extraVolumeMounts:
- name: airflow-dependencies
mountPath: "/opt/bitnami/airflow/lib"
- name: airflow-logs
mountPath: "/opt/bitnami/airflow_logs"
extraVolumes:
- name: airflow-dependencies
persistentVolumeClaim:
claimName: airflow-dependencies
- name: airflow-logs
persistentVolumeClaim:
claimName: airflow-logs
- name: airflow-dependencies
persistentVolumeClaim:
claimName: airflow-dependencies
- name: airflow-logs
persistentVolumeClaim:
claimName: airflow-logs
extraEnvVars:
- name: PYTHONPATH
value: "/opt/bitnami/airflow:/opt/bitnami/airflow/lib"
resourcesPreset: "xlarge"
readinessProbe:
resources:
requests:
cpu: 2.5
memory: 5376Mi
ephemeral-storage: 50Mi
limits:
cpu: 3.5
memory: 6144Mi
ephemeral-storage: 2Gi
livenessProbe:
enabled: true
initialDelaySeconds: 35
periodSeconds: 15
timeoutSeconds: 5
failureThreshold: 10
initialDelaySeconds: 180
periodSeconds: 20
timeoutSeconds: 60
failureThreshold: 300
successThreshold: 1
readinessProbe:
enabled: true
failureThreshold: 20
scheduler:
replicaCount: 1
automountServiceAccountToken: true
@@ -157,6 +158,76 @@ scheduler:
readinessProbe:
enabled: true
failureThreshold: 20
triggerer:
replicaCount: 1
automountServiceAccountToken: true
extraVolumeMounts:
- name: airflow-dependencies
mountPath: "/opt/bitnami/airflow/lib"
- name: airflow-logs
mountPath: "/opt/bitnami/airflow_logs"
extraVolumes:
- name: airflow-dependencies
persistentVolumeClaim:
claimName: airflow-dependencies
- name: airflow-logs
persistentVolumeClaim:
claimName: airflow-logs
extraEnvVars:
- name: PYTHONPATH
value: "/opt/bitnami/airflow:/opt/bitnami/airflow/lib"
resourcesPreset: "xlarge"
readinessProbe:
enabled: true
initialDelaySeconds: 35
periodSeconds: 15
timeoutSeconds: 5
failureThreshold: 10
successThreshold: 1
web:
replicaCount: 1
automountServiceAccountToken: true
extraVolumeMounts:
- name: airflow-dependencies
mountPath: "/opt/bitnami/airflow/lib"
- name: airflow-logs
mountPath: "/opt/bitnami/airflow_logs"
extraVolumes:
- name: airflow-dependencies
persistentVolumeClaim:
claimName: airflow-dependencies
- name: airflow-logs
persistentVolumeClaim:
claimName: airflow-logs
extraEnvVars:
- name: PYTHONPATH
value: "/opt/bitnami/airflow:/opt/bitnami/airflow/lib"
resourcesPreset: "xlarge"
readinessProbe:
enabled: true
initialDelaySeconds: 35
periodSeconds: 15
timeoutSeconds: 5
failureThreshold: 10
successThreshold: 1
worker:
automountServiceAccountToken: true
extraVolumeMounts:
- name: airflow-dependencies
mountPath: "/opt/bitnami/airflow/lib"
- name: airflow-logs
mountPath: "/opt/bitnami/airflow_logs"
extraVolumes:
- name: airflow-dependencies
persistentVolumeClaim:
claimName: airflow-dependencies
- name: airflow-logs
persistentVolumeClaim:
claimName: airflow-logs
extraEnvVars:
- name: PYTHONPATH
value: "/opt/bitnami/airflow:/opt/bitnami/airflow/lib"
resourcesPreset: "large"
extraDeploy:
- apiVersion: v1
kind: ConfigMap
@@ -167,4 +238,4 @@ extraDeploy:
requirements.txt: |-
faker==37.1.0
jsonpath-ng==1.7.0
pydantic==2.11.4
pydantic==2.11.4
4 changes: 2 additions & 2 deletions folio_data_anonymization/dags/anonymize_data.py
@@ -3,8 +3,8 @@
import logging
from datetime import timedelta

from airflow.decorators import dag
from airflow.operators.empty import EmptyOperator
from airflow.sdk import dag
from airflow.providers.standard.operators.empty import EmptyOperator


try:
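
The import change in `anonymize_data.py` moves `dag` from `airflow.decorators` to `airflow.sdk` and `EmptyOperator` from `airflow.operators.empty` to `airflow.providers.standard.operators.empty`. If the DAG ever needs to load under both the old and new layouts, one possible approach is to try the new module path first and fall back to the old one. The sketch below is an assumption, not what this change does: `first_importable` and `load_airflow_symbols` are hypothetical helpers, and only the four module paths come from the diff.

```python
import importlib
from types import ModuleType


def first_importable(candidates: list[str]) -> ModuleType:
    """Import and return the first module in `candidates` that is available."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of these modules could be imported: {candidates}")


# New (Airflow 3) path first, then the path used before this change.
DAG_MODULES = ["airflow.sdk", "airflow.decorators"]
EMPTY_OPERATOR_MODULES = [
    "airflow.providers.standard.operators.empty",
    "airflow.operators.empty",
]


def load_airflow_symbols():
    """Resolve `dag` and `EmptyOperator` from whichever layout is installed."""
    dag = getattr(first_importable(DAG_MODULES), "dag")
    empty_operator = getattr(first_importable(EMPTY_OPERATOR_MODULES), "EmptyOperator")
    return dag, empty_operator
```

Calling `load_airflow_symbols()` requires Airflow to be installed; `first_importable` itself works with any module names.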