darwin-kubernetes: port split-background manifests + lock convention in AGENTS.md

rajivml · claude · rajivml · commit 017f1276de96 · 2026-06-03T14:12:36.000+05:30
The bg-scaling commit (03d1649) added 5 new k8s manifests under `deployment/kubernetes/` that split the combined background pod into beat / celery / indexer-scheduler / dask-scheduler / dask-worker. But Darwin doesn't apply from `deployment/kubernetes/`: its prod manifests live under `darwin-kubernetes/`, and the two trees aren't kept in sync.

Porting all five into `darwin-kubernetes/` with Darwin conventions:

- Image registry sfbrdevhelmweacr.azurecr.io/danswer/danswer-backend
- configMapRef env-configmap, secretKeyRef danswer-secrets
- POSTGRES_USER / POSTGRES_PASSWORD wired everywhere that talks to PG
- REDIS_PASSWORD wired as optional secretKeyRef (the latent footgun flagged in MIGRATION.md §10a is now closed for the Darwin path)
- indexcpu nodeAffinity + darwin/indexing toleration on every indexing-side pod (celery, indexer-scheduler, dask-scheduler, dask-worker); beat stays on the default pool (lightweight)
- dynamic-pvc + file-connector-pvc volume mounts where any task may stage files

The existing `darwin-kubernetes/background-deployment.yaml` (combined beat+celery+indexer via supervisord) is intentionally LEFT IN PLACE: the split is an opt-in rollout, not a forced cutover. To switch: apply the new five, verify the new pods are healthy, scale the combined deployment to 0.

Also lock the convention in AGENTS.md so this doesn't recur:

- New divergence-table row noting darwin-kubernetes/ is source of truth for prod.
- New "Critical facts that bite" §9 documenting the two-tree split, when to touch which, and the per-pod adaptation checklist (image registry, configmap, secrets, REDIS_PASSWORD, affinity, PVCs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
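
Reviewer note: the per-pod adaptation checklist described in the message above could be checked mechanically. The following is a minimal sketch, not part of this commit. The helper names `missing_conventions` and `_secret_env` are hypothetical, and the function assumes the manifest has already been parsed into a plain Python dict (for example by a YAML loader, which is not shown here). The registry, configmap, secret, pool, and PVC names are taken from the manifests in this diff.

```python
from __future__ import annotations

from typing import Any

# Values copied from the darwin-kubernetes manifests in this commit.
REGISTRY_PREFIX = "sfbrdevhelmweacr.azurecr.io/danswer/danswer-backend:"
SECRET = "danswer-secrets"
CONFIGMAP = "env-configmap"


def _secret_env(container: dict[str, Any], name: str) -> dict[str, Any] | None:
    """Return the secretKeyRef of env var `name`, or None if it is absent."""
    for var in container.get("env", []):
        if var.get("name") == name:
            return var.get("valueFrom", {}).get("secretKeyRef")
    return None


def missing_conventions(deployment: dict[str, Any], indexing: bool) -> list[str]:
    """List the Darwin conventions a parsed Deployment manifest lacks.

    `indexing` selects the extra checks for indexing-side pods
    (node affinity, toleration, PVC-backed volumes).
    """
    pod = deployment["spec"]["template"]["spec"]
    problems: list[str] = []
    for c in pod["containers"]:
        if not c.get("image", "").startswith(REGISTRY_PREFIX):
            problems.append("image registry")
        refs = [e.get("configMapRef", {}).get("name") for e in c.get("envFrom", [])]
        if CONFIGMAP not in refs:
            problems.append("env-configmap")
        for name in ("POSTGRES_USER", "POSTGRES_PASSWORD"):
            ref = _secret_env(c, name)
            if not ref or ref.get("name") != SECRET:
                problems.append(name)
        redis = _secret_env(c, "REDIS_PASSWORD")
        if not redis or redis.get("name") != SECRET or redis.get("optional") is not True:
            problems.append("REDIS_PASSWORD")
    if indexing:
        terms = (
            pod.get("affinity", {})
            .get("nodeAffinity", {})
            .get("requiredDuringSchedulingIgnoredDuringExecution", {})
            .get("nodeSelectorTerms", [])
        )
        pinned = any(
            e.get("key") == "agentpool"
            and e.get("operator") == "In"
            and "indexcpu" in e.get("values", [])
            for t in terms
            for e in t.get("matchExpressions", [])
        )
        if not pinned:
            problems.append("indexcpu affinity")
        tolerated = any(
            t.get("key") == "darwin"
            and t.get("value") == "indexing"
            and t.get("effect") == "NoSchedule"
            for t in pod.get("tolerations", [])
        )
        if not tolerated:
            problems.append("darwin/indexing toleration")
        claims = {
            v.get("persistentVolumeClaim", {}).get("claimName")
            for v in pod.get("volumes", [])
        }
        for claim in ("dynamic-pvc", "file-connector-pvc"):
            if claim not in claims:
                problems.append(claim)
    return problems


# An upstream-shaped pod (generic image, no Darwin wiring) fails every
# non-indexing check.
generic = {"spec": {"template": {"spec": {"containers": [
    {"image": "danswer/danswer-backend:latest"}
]}}}}
print(missing_conventions(generic, indexing=False))
```

Run against the five new manifests, a check like this would pass `indexing=False` for beat and `indexing=True` for the other four. It only inspects the fields named in the checklist, so an empty result does not mean the manifest is otherwise valid.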
diff --git a/AGENTS.md b/AGENTS.md
@@ -75,6 +75,7 @@ moved on substantially. This table is the explicit map.
 | Test buckets | `backend/tests/{unit,external_dependency_unit,integration}` + Playwright e2e | No comparable structure here. Most code lacks tests; add tests with the change if practical, otherwise note in PR. |
 | Plan template | The "Creating a Plan" section in their `CLAUDE.md` (Issues / Notes / Strategy / Tests) | Useful template; can be borrowed for non-trivial changes here too. |
 | Frontend stack | Next.js 15+, React 18+ | Next.js 14.2.x (App Router), React 18 |
+| K8s manifest path | `deployment/kubernetes/*` is what upstream documents | **`darwin-kubernetes/*` is the source of truth for the Darwin prod cluster.** `deployment/kubernetes/*` is upstream legacy / scratch — Darwin doesn't apply from there. New manifests for Darwin go in `darwin-kubernetes/`. See critical fact §9. |
 
 **Rule of thumb when reading upstream code or upstream guidance:** assume
 it doesn't apply unless you can verify the same construct exists here.
@@ -340,6 +341,35 @@ auto-parse entirely with a raw `requests.get` against the
 `/drives/{drive_id}/items/{item_id}/content` endpoint using the bearer
 token. Don't reintroduce the lossy re-serialization.
 
+### 9. `darwin-kubernetes/` is the source of truth for the Darwin cluster
+
+The repo has two parallel k8s manifest trees and they are **not** kept
+in sync:
+
+| Path | What it is | When to touch |
+|---|---|---|
+| `darwin-kubernetes/*.yaml` | **The actual manifests applied to Darwin's AKS cluster (the `darwin` kube context).** Image registry is `sfbrdevhelmweacr.azurecr.io/...`, configmap is `env-configmap`, secrets is `danswer-secrets`, indexing pods have `indexcpu`-pool affinity + `darwin/indexing` toleration, env vars come from the Darwin configmap. | **Edit here for any prod-affecting change**, including new deployments. |
+| `deployment/kubernetes/*.yaml` | Upstream-style manifests inherited from Onyx / authored to match the OSS docker-compose. Generic image (`danswer/danswer-backend:latest`), no Azure-specific affinity / tolerations, no Darwin-specific configmap wiring. | Reference only — not deployed to Darwin. Useful for seeing the "upstream shape" of a new component before adapting it to `darwin-kubernetes/`. |
+
+When upstream (or a branch like `feature/backgroundscaling`) adds a
+new manifest in `deployment/kubernetes/`, the corresponding
+`darwin-kubernetes/` version must be hand-ported with:
+
+- Image: `sfbrdevhelmweacr.azurecr.io/danswer/danswer-backend:<tag>`
+- `envFrom: configMapRef name: env-configmap`
+- POSTGRES_USER / POSTGRES_PASSWORD via `secretKeyRef name: danswer-secrets`
+- REDIS_PASSWORD via `secretKeyRef name: danswer-secrets, optional: true`
+  (so unauth'd in-cluster Redis still works)
+- For indexing-related pods: `nodeAffinity` on `agentpool=indexcpu` +
+  `tolerations` for `darwin/indexing/NoSchedule` + `dynamic-pvc` /
+  `file-connector-pvc` volume mounts.
+
+A drop-in port that misses any of these will boot in Darwin but
+mis-route, miss secrets, or end up on the wrong node pool. The
+existing `darwin-kubernetes/background-deployment.yaml` and
+`api_server-service-deployment.yaml` are the canonical templates for
+the conventions.
+
 ---
 
 ## Common workflows
diff --git a/darwin-kubernetes/background-beat-deployment.yaml b/darwin-kubernetes/background-beat-deployment.yaml
@@ -0,0 +1,73 @@
+# Celery beat — periodic-task scheduler. (Darwin variant.)
+#
+# MUST be a singleton: two beats on the same broker fire every
+# crontab entry twice. `Recreate` strategy guarantees no overlap
+# during rollout (the old pod is terminated before the new one
+# starts), at the cost of a brief beat outage during deploy. That's
+# acceptable because beat-fired tasks are all "check / catch-up"
+# style — missing one cycle is harmless, the next one cleans up.
+#
+# Beat is light (~100MB RSS); doesn't need the indexcpu node pool
+# the indexing-side pods sit on. Stays on the default pool.
+#
+# This deployment is part of the split-background topology
+# (beat / celery / indexer-scheduler / dask-scheduler / dask-worker).
+# Apply alongside the other four to retire `background-deployment.yaml`.
+# Keep the old combined deployment in place during cutover so you can
+# scale it to 0 once the new pods are healthy.
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: background-beat-deployment
+spec:
+  replicas: 1
+  strategy:
+    type: Recreate
+  selector:
+    matchLabels:
+      app: background-beat
+  template:
+    metadata:
+      labels:
+        app: background-beat
+    spec:
+      containers:
+      - name: beat
+        image: sfbrdevhelmweacr.azurecr.io/danswer/danswer-backend:vha-5
+        imagePullPolicy: IfNotPresent
+        command:
+          - celery
+          - -A
+          - danswer.background.celery.celery_run:celery_app
+          - beat
+          - --loglevel=INFO
+        env:
+        - name: POSTGRES_USER
+          valueFrom:
+            secretKeyRef:
+              key: postgres_user
+              name: danswer-secrets
+        - name: POSTGRES_PASSWORD
+          valueFrom:
+            secretKeyRef:
+              key: postgres_password
+              name: danswer-secrets
+        # Beat itself doesn't mutate personas, but stays wired for
+        # parity with the other split-background pods. Optional secret
+        # so an unauth'd in-cluster Redis still works.
+        - name: REDIS_PASSWORD
+          valueFrom:
+            secretKeyRef:
+              key: redis_password
+              name: danswer-secrets
+              optional: true
+        envFrom:
+        - configMapRef:
+            name: env-configmap
+        resources:
+          requests:
+            cpu: "50m"
+            memory: "128Mi"
+          limits:
+            cpu: "200m"
+            memory: "256Mi"
diff --git a/darwin-kubernetes/background-celery-deployment.yaml b/darwin-kubernetes/background-celery-deployment.yaml
@@ -0,0 +1,102 @@
+# Celery worker — executes periodic / on-demand tasks (prune, sync,
+# retention, deletion, etc.). (Darwin variant.)
+#
+# Horizontally scalable. Beat fires tasks → broker (Postgres) → any
+# worker pulls the task. Postgres-level row locks plus the
+# per-cc-pair advisory locks (DELE/RETENTIO namespaces) prevent
+# duplicate execution of the same task even when many workers race.
+#
+# Indexing is NOT in this deployment — see
+# background-indexer-scheduler-deployment.yaml. Slack listener is
+# still in its own deployment.
+#
+# Pool=threads (not prefork) is required because of the Celery +
+# SQLAlchemy SIGSEGV issue documented at the top of supervisord.conf.
+#
+# Lives on the indexcpu node pool with the same toleration as the old
+# combined background deployment — connector deletion / retention can
+# do heavy file-store work and that's where the dynamic / file-
+# connector PVCs are wired.
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: background-celery-deployment
+spec:
+  replicas: 2
+  selector:
+    matchLabels:
+      app: background-celery
+  template:
+    metadata:
+      labels:
+        app: background-celery
+    spec:
+      affinity:
+        nodeAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            nodeSelectorTerms:
+            - matchExpressions:
+              - key: agentpool
+                operator: In
+                values:
+                - indexcpu
+      containers:
+      - name: celery
+        image: sfbrdevhelmweacr.azurecr.io/danswer/danswer-backend:vha-5
+        imagePullPolicy: IfNotPresent
+        command:
+          - celery
+          - -A
+          - danswer.background.celery.celery_run:celery_app
+          - worker
+          - --pool=threads
+          - --autoscale=3,10
+          - --loglevel=INFO
+        env:
+        - name: POSTGRES_USER
+          valueFrom:
+            secretKeyRef:
+              key: postgres_user
+              name: danswer-secrets
+        - name: POSTGRES_PASSWORD
+          valueFrom:
+            secretKeyRef:
+              key: postgres_password
+              name: danswer-secrets
+        # Celery tasks may transitively touch persona-cache invalidation
+        # via shared db/ code paths (e.g. cleanup tasks calling into
+        # functions imported from db/persona.py). Wired even when the
+        # task path doesn't currently need it — cheap, fail-open.
+        - name: REDIS_PASSWORD
+          valueFrom:
+            secretKeyRef:
+              key: redis_password
+              name: danswer-secrets
+              optional: true
+        envFrom:
+        - configMapRef:
+            name: env-configmap
+        volumeMounts:
+        - name: dynamic-storage
+          mountPath: /home/storage
+        - name: file-connector-storage
+          mountPath: /home/file_connector_storage
+        resources:
+          requests:
+            cpu: "200m"
+            memory: "512Mi"
+          limits:
+            cpu: "1"
+            memory: "2Gi"
+      tolerations:
+      - effect: NoSchedule
+        key: darwin
+        operator: Equal
+        value: indexing
+      volumes:
+      - name: dynamic-storage
+        persistentVolumeClaim:
+          claimName: dynamic-pvc
+      - name: file-connector-storage
+        persistentVolumeClaim:
+          claimName: file-connector-pvc
diff --git a/darwin-kubernetes/background-indexer-scheduler-deployment.yaml b/darwin-kubernetes/background-indexer-scheduler-deployment.yaml
@@ -0,0 +1,96 @@
+# Indexer-scheduler — runs the polling loop in
+# `danswer/background/update.py`. Every 10s it scans Postgres for
+# cc-pairs due for re-indexing, creates index_attempt rows, and
+# submits `run_indexing_entrypoint` tasks to the remote Dask
+# scheduler service. The actual indexing CPU/RAM work happens on
+# dask-worker pods, NOT here. (Darwin variant.)
+#
+# Singleton (replicas: 1, strategy: Recreate). Scaling indexing
+# concurrency = scaling dask-worker pods, not this one. Two
+# scheduler loops would race on `index_attempt` table inserts.
+#
+# DASK_SCHEDULER_ADDRESS env switches `update.py` from in-process
+# LocalCluster to the remote-scheduler client mode — this is the
+# topology flip the bg-scaling work introduced.
+#
+# Lives on the indexcpu pool because it polls the indexing state and
+# needs the file-connector volume mounts (some connectors stage files
+# locally before handing off to dask-worker).
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: background-indexer-scheduler-deployment
+spec:
+  replicas: 1
+  strategy:
+    type: Recreate
+  selector:
+    matchLabels:
+      app: background-indexer-scheduler
+  template:
+    metadata:
+      labels:
+        app: background-indexer-scheduler
+    spec:
+      affinity:
+        nodeAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            nodeSelectorTerms:
+            - matchExpressions:
+              - key: agentpool
+                operator: In
+                values:
+                - indexcpu
+      containers:
+      - name: indexer-scheduler
+        image: sfbrdevhelmweacr.azurecr.io/danswer/danswer-backend:vha-5
+        imagePullPolicy: IfNotPresent
+        command: ["python", "danswer/background/update.py"]
+        env:
+        - name: DASK_SCHEDULER_ADDRESS
+          value: "tcp://dask-scheduler-service:8786"
+        - name: CURRENT_PROCESS_IS_AN_INDEXING_JOB
+          value: "true"
+        - name: POSTGRES_USER
+          valueFrom:
+            secretKeyRef:
+              key: postgres_user
+              name: danswer-secrets
+        - name: POSTGRES_PASSWORD
+          valueFrom:
+            secretKeyRef:
+              key: postgres_password
+              name: danswer-secrets
+        - name: REDIS_PASSWORD
+          valueFrom:
+            secretKeyRef:
+              key: redis_password
+              name: danswer-secrets
+              optional: true
+        envFrom:
+        - configMapRef:
+            name: env-configmap
+        volumeMounts:
+        - name: dynamic-storage
+          mountPath: /home/storage
+        - name: file-connector-storage
+          mountPath: /home/file_connector_storage
+        resources:
+          requests:
+            cpu: "200m"
+            memory: "512Mi"
+          limits:
+            cpu: "500m"
+            memory: "1Gi"
+      tolerations:
+      - effect: NoSchedule
+        key: darwin
+        operator: Equal
+        value: indexing
+      volumes:
+      - name: dynamic-storage
+        persistentVolumeClaim:
+          claimName: dynamic-pvc
+      - name: file-connector-storage
+        persistentVolumeClaim:
+          claimName: file-connector-pvc
diff --git a/darwin-kubernetes/dask-scheduler-service-deployment.yaml b/darwin-kubernetes/dask-scheduler-service-deployment.yaml
diff --git a/darwin-kubernetes/dask-worker-deployment.yaml b/darwin-kubernetes/dask-worker-deployment.yaml