@@ -122,33 +122,60 @@ as the `redis` ClusterIP Service on 6379. Pod restart drops the cache
122122— that's intentional; the source of truth is Postgres, and counters
123123self-heal as windows expire.
124124
125- ### 5b. Dask scaling topology (optional, future)
126-
127- The bg-scaling commit adds these in `deployment/kubernetes/` (the
128- upstream-style path, **not** the darwin-specific path):
129-
130- - `background-beat-deployment.yaml`
131- - `background-celery-deployment.yaml`
132- - `background-indexer-scheduler-deployment.yaml`
133- - `dask-scheduler-service-deployment.yaml`
134- - `dask-worker-deployment.yaml`
135- - `docker-compose.dask-distributed.yml` (compose variant)
125+ ### 5b. Dask scaling topology (opt-in, Darwin manifests ready)
126+
127+ The bg-scaling commit added 5 upstream-style manifests under
128+ `deployment/kubernetes/` (legacy / reference tree; **not** what
129+ Darwin applies from — see AGENTS.md "Critical fact §9"). A later
130+ commit on this branch ported each one to Darwin conventions under
131+ `darwin-kubernetes/`, with the right image registry, configmap /
132+ secrets wiring, REDIS_PASSWORD (optional), indexcpu node affinity,
133+ darwin/indexing toleration, and PVCs:
134+
135+ - `darwin-kubernetes/background-beat-deployment.yaml`
136+ - `darwin-kubernetes/background-celery-deployment.yaml`
137+ - `darwin-kubernetes/background-indexer-scheduler-deployment.yaml`
138+ - `darwin-kubernetes/dask-scheduler-service-deployment.yaml`
139+ - `darwin-kubernetes/dask-worker-deployment.yaml`
140+
141+ Plus `deployment/docker_compose/docker-compose.dask-distributed.yml`
142+ (compose variant, for local reproduction of the remote-scheduler
143+ topology — not part of the prod deploy).
136144
137145Darwin currently runs `darwin-kubernetes/background-deployment.yaml`
138- (a single combined background pod). **The new manifests are NOT
139- applied automatically** and don't conflict with the existing
139- `background-deployment.yaml`.
146+ (a single combined beat+celery+indexer pod via supervisord). **The new
147+ manifests are NOT applied automatically** by `kubectl apply -f
148+ darwin-kubernetes/` because the combined deployment is still in place
149+ — you apply each new file explicitly when you want to switch.
141150
142- If/when you want to switch Darwin to the new topology:
151+ To switch Darwin to the split topology:
143152
144- 1. Mirror the Redis env wiring from `darwin-kubernetes/background-deployment.yaml`
145- into each of the new manifests (none of them currently have
146- `REDIS_PASSWORD` wired; see §10).
147- 2. Apply the new manifests, scale the old background-deployment to 0.
148- 3. Verify each pod boots and the worker logs show clean startup.
153+ ```bash
154+ # 1. Apply the five new manifests (order doesn't matter; they
155+ # self-discover the scheduler Service once it's up).
156+ kubectl apply -f darwin-kubernetes/dask-scheduler-service-deployment.yaml
157+ kubectl apply -f darwin-kubernetes/dask-worker-deployment.yaml
158+ kubectl apply -f darwin-kubernetes/background-beat-deployment.yaml
159+ kubectl apply -f darwin-kubernetes/background-celery-deployment.yaml
160+ kubectl apply -f darwin-kubernetes/background-indexer-scheduler-deployment.yaml
161+
162+ # 2. Confirm all five are Ready (add -w to watch until they are).
163+ kubectl get pods -l 'app in (background-beat,background-celery,background-indexer-scheduler,dask-scheduler,dask-worker)'
164+
165+ # 3. Once they are healthy and you've seen an indexing attempt
166+ # dispatch through the new dask-scheduler-service (check the
167+ # indexer-scheduler pod logs), scale the old combined deployment to 0:
168+ kubectl scale deploy/background-deployment --replicas=0
169+
170+ # 4. If anything goes wrong, scale back up:
171+ kubectl scale deploy/background-deployment --replicas=1
172+ # The split pods keep running after this, so both sets are live.
173+ # Scale the split deployments to 0 too (see the warning below).
174+ ```
149175
150- Out of scope for this PR; the relevant files are present so the
151- migration is a one-step "apply" later.
176+ Both deployments can coexist briefly during cutover, but **do NOT
177+ run both at non-zero replicas long-term** — two beat schedulers on
178+ the same Postgres broker fire every crontab task twice.
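The double-beat warning above can be turned into a quick check. The following is a minimal sketch, not part of the PR: it assumes the split beat Deployment is named `background-beat` (inferred from the manifest filename and the `app` label used in step 2, not confirmed), while `background-deployment` is the name used in the cutover commands.

```bash
#!/usr/bin/env bash
# Sketch: detect the "two beat schedulers" state described above.

# Pure check: returns 0 (safe) unless both replica counts are non-zero.
beat_overlap_ok() {
  local combined="$1" split_beat="$2"
  if [ "$combined" -gt 0 ] && [ "$split_beat" -gt 0 ]; then
    echo "WARNING: combined=$combined and split beat=$split_beat are both live" >&2
    return 1
  fi
  return 0
}

# Reads the desired replica counts from the cluster.
# "background-beat" is an ASSUMED Deployment name; check metadata.name in
# darwin-kubernetes/background-beat-deployment.yaml, or override it via
# the (hypothetical) SPLIT_BEAT_DEPLOY environment variable.
check_cluster_beat_overlap() {
  local combined split_beat
  combined=$(kubectl get deploy background-deployment \
    -o jsonpath='{.spec.replicas}') || return 2
  split_beat=$(kubectl get deploy "${SPLIT_BEAT_DEPLOY:-background-beat}" \
    -o jsonpath='{.spec.replicas}') || return 2
  beat_overlap_ok "$combined" "$split_beat"
}
```

Sourcing the file and running `check_cluster_beat_overlap` after step 3 or step 4 should exit non-zero while both sets are scaled up, which is tolerable during cutover but not as a steady state.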
152179
153180---
154181
@@ -274,25 +301,21 @@ All features default OFF means even without revert, setting all
274301
275302## 10. Known footguns
276303
277- ### 10a. Bg-scaling k8s manifests don't have `REDIS_PASSWORD` wired
278-
279- The new `deployment/kubernetes/{background-celery,background-beat,
280- background-indexer-scheduler,dask-scheduler-service,dask-worker}-deployment.yaml`
281- files don't include the `REDIS_PASSWORD` env var pattern that the
282- existing `darwin-kubernetes/background-deployment.yaml` has.
283-
284- - **Today's impact: none.** Darwin runs the darwin-kubernetes/ tree,
285- not the upstream-style deployment/kubernetes/ tree, and none of the
286- bg-scaling processes currently invoke persona-mutating code paths
287- that would need Redis access.
288- - **Becomes a real concern if** Darwin adopts the new topology AND
289- `PERSONA_CACHE_ENABLED=true` AND a future Celery task ever mutates a
290- Persona / Persona__User / Persona__UserGroup row. In that scenario
291- the mutation succeeds, the cache bust logs a warning, and `/persona`
292- serves stale data for up to 24h (TTL backstop).
293- - **Fix when relevant**: mirror the `REDIS_PASSWORD` `secretKeyRef`
294- block from `darwin-kubernetes/background-deployment.yaml` into each
295- of the new manifests.
304+ ### 10a. ~~Bg-scaling k8s manifests don't have `REDIS_PASSWORD` wired~~ (RESOLVED)
305+
306+ **Closed for the Darwin path.** The 5 ported manifests under
307+ `darwin-kubernetes/` (added in `19335e31`) all wire `REDIS_PASSWORD`
308+ via `secretKeyRef` with `optional: true`, matching the existing
309+ `darwin-kubernetes/background-deployment.yaml` pattern. So
310+ persona-cache invalidation from any future Celery / indexer-scheduler /
311+ dask-worker task path will work correctly once you switch to the
312+ split topology.
313+
314+ The upstream `deployment/kubernetes/*` files are still missing
315+ `REDIS_PASSWORD` env wiring, but **Darwin doesn't apply from that
316+ tree** — it's reference-only (see AGENTS.md "Critical fact §9").
317+ Leave them alone unless/until you adopt the upstream-style
318+ deployment shape outside Darwin.
296319
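One rough way to spot-check the claim in 10a is to scan each tree for the five manifest names and report any file that never mentions `REDIS_PASSWORD`. This is a sketch only: the function name is hypothetical, and a text match does not prove the `secretKeyRef` block is correct, only that the variable appears somewhere in the file.

```bash
#!/usr/bin/env bash
# Sketch: list which of the five bg-scaling manifests in a directory
# never mention REDIS_PASSWORD. Filenames are taken from section 5b.
manifests_missing_redis_password() {
  local dir="$1" name file
  for name in background-beat background-celery background-indexer-scheduler \
              dask-scheduler-service dask-worker; do
    file="${dir}/${name}-deployment.yaml"
    # Report files that are not present at all, then move on.
    [ -f "$file" ] || { echo "absent: $file"; continue; }
    # Plain text match; does not validate the YAML structure.
    grep -q 'REDIS_PASSWORD' "$file" || echo "missing: $file"
  done
}
```

If 10a is accurate, `manifests_missing_redis_password darwin-kubernetes` run from the repo root should print nothing, while `manifests_missing_redis_password deployment/kubernetes` should list all five files as missing.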
297320### 10b. `backend/scripts/seed_assistants.py` bypasses persona-cache invalidation
298321
@@ -353,11 +376,14 @@ These need eyes — automated coverage doesn't catch them:
353376
354377## 12. Branch contents at-a-glance
355378
356- 14 commits on top of `rajiv/add-claude` (PR #45):
379+ 16 commits on top of `rajiv/add-claude` (PR #45):
357380
358381```
382+ [BG-scale] darwin-kubernetes: port split-background manifests + lock convention in AGENTS.md
359383[BG-scale] Scale indexing via remote Dask scheduler topology
360384
385+ [Docs] docs: add MIGRATION.md covering Redis / bg-scaling / UX
386+
361387[UX] Gallery: column picker as dropdown to match Sort
362388[UX] Gallery: user-controllable column count (segmented control, persists)
363389[UX] Show document-set names on assistant cards (was: count only)
@@ -374,4 +400,4 @@ These need eyes — automated coverage doesn't catch them:
374400[Redis] docs: add Redis caching & scaling plan
375401```
376402
377- Total: **45 files changed, +5857 / −499**. 63 unit tests pass.
403+ Total: **51 files changed, +6372 / −499**. 63 unit tests pass.