6 changes: 6 additions & 0 deletions k8s/APPLY-CHECKLIST.md
@@ -14,6 +14,12 @@ This checklist applies to:
Per CLAUDE.md rule 15: **this repo has no auto-apply by design.** Manifest
apply is a deliberate, human-driven step.

> **Stateful data-tier manifests** (`k8s/data/*` — postgres-customers,
> mongodb, redis-provision, nats: PVCs, NetworkPolicy, pg_hba lockdown,
> PriorityClass/PDBs) have their own apply order + verification gates in
> **`k8s/DATA-TIER-APPLY-RUNBOOK.md`**. Use that for S1/S2/R6/R7. This file
> is for the api/worker/provisioner Deployment manifests only.

---

## Hard rules
292 changes: 292 additions & 0 deletions k8s/DATA-TIER-APPLY-RUNBOOK.md
@@ -0,0 +1,292 @@
# Data-Tier Apply Runbook — `instant-data` stateful hardening

> Companion to `k8s/APPLY-CHECKLIST.md` (which covers the api/worker/
> provisioner **Deployment** manifests). This runbook covers the **stateful
> data-tier** manifests in `k8s/data/` — the ones that hold real customer data
> and therefore must be applied deliberately, in order, in a maintenance
> window. **This repo has no auto-apply (CLAUDE.md rule 15).**
>
> **CRITICAL: never `kubectl apply -f k8s/app.yaml`** (stale vs prod — strips
> `imagePullSecrets`, resets images). The files below are individually
> applyable; apply them one at a time and read each `--dry-run=server` diff.

This runbook is the operator apply checklist for four changes that are
**committed but NOT yet applied to prod** (infra has no auto-apply):

| Tag | File | What it does | Customer-visible risk if mis-applied |
|---|---|---|---|
| **S1** | `k8s/data/postgres-customers-lockdown.yaml` + the patched `postgres-customers.yaml` | pg_hba that REJECTS the admin/superuser roles (`instanode_admin`, `instant_cust`) from the public path; preserves `usr_*` customer roles | LOW — admin-only reject; customers unaffected. Detailed runbook: `POSTGRES-CUSTOMERS-LOCKDOWN-RUNBOOK.md` |
| **S2** | `k8s/data/networkpolicy.yaml` | ingress NetworkPolicy: only provisioner/migrator/worker (+ nats-proxy for 4222) may reach the data pods | **HIGH — can break ALL customers** if the pg-proxy allow-rule is missing. See §S2 below. |
| **R6** | `k8s/data/nats.yaml` | JetStream `emptyDir{}` → PVC (`nats-jetstream-pvc`, 5Gi) so queue data survives restarts | LOW: the cutover restarts nats and discards the old `emptyDir{}` JetStream state, which was never durable (see §R6). |
| **R7** | `k8s/data/stateful-priority.yaml` + resource requests in `{postgres-customers,mongodb,redis-provision}.yaml` | PriorityClass `instant-data-critical` + one PDB per stateful pod + right-sized requests (BestEffort → Burstable) | LOW — eviction-ordering + drain-gating only; no data-path change. |

---

## Pre-flight (every apply below)

```bash
# 1. Confirm context — NEVER run against the wrong cluster.
kubectl config current-context
# Expected for prod: do-nyc3-instant-prod

# 2. Snapshot current data-tier state for rollback reference.
kubectl get pods,pvc,netpol,pdb,priorityclass -n instant-data -o wide
kubectl get priorityclass instant-data-critical 2>/dev/null || echo "no priorityclass yet"

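# 3. Server-side dry-run EACH file, then read the diff line by line.
#    `--dry-run=server` only validates and reports created/configured/unchanged;
#    `kubectl diff` prints the field-level diff (exit code 1 = differences found).
kubectl apply --dry-run=server -f <file>
kubectl diff -f <file>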
```
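The context check in step 1 relies on the operator reading the output. A minimal sketch of turning it into a hard gate is shown here; `require_context` is a hypothetical helper name (not something in this repo), and the expected context string is the one quoted in the pre-flight block.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): refuse to continue unless the
# active kubectl context matches the cluster you intend to apply against.
require_context() {
  local expected="$1" actual="$2"
  if [ "$actual" != "$expected" ]; then
    echo "ABORT: context is '$actual', expected '$expected'" >&2
    return 1
  fi
  echo "context OK: $actual"
}

# Usage against a live cluster (assumes kubectl is on PATH):
#   require_context do-nyc3-instant-prod "$(kubectl config current-context)" || exit 1
```

Passing the actual context in as an argument keeps the comparison separate from the `kubectl` call, so the guard can be exercised without a cluster.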

Apply in a **maintenance window**. The recommended order is **R7 → R6 → S1 →
S2** — least-risky and reversible first, the customer-breaking NetworkPolicy
LAST so it is the freshest thing in your head if customers report errors.

---

## R7 — PriorityClass + PDBs + resource requests (apply FIRST)

Pure eviction-protection; no data-path change. Two parts.

**Part A — the PriorityClass + PDBs:**

```bash
kubectl apply --dry-run=server -f k8s/data/stateful-priority.yaml # read diff
kubectl apply -f k8s/data/stateful-priority.yaml

# Verify
kubectl get priorityclass instant-data-critical
kubectl get pdb -n instant-data
# Expect 4 PDBs (postgres-customers / mongodb / redis-provision / nats),
# each ALLOWED DISRUPTIONS reading 0 (single replica, minAvailable 1 → the one
# pod is "not disruptable" by voluntary eviction, which is the point).
```
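The PDB expectation above (four PDBs, each with zero allowed disruptions) can be checked mechanically rather than by eye. This is a sketch only: `check_pdbs` is a hypothetical helper, and it assumes the input is one `<name> <disruptionsAllowed>` pair per line, as produced by the jsonpath query in the usage comment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "<pdb-name> <disruptionsAllowed>" lines on stdin
# and fails unless there are exactly 4 PDBs, each allowing 0 disruptions.
check_pdbs() {
  local count=0 bad=0 name allowed
  while read -r name allowed; do
    [ -n "$name" ] || continue
    count=$((count + 1))
    if [ "$allowed" != "0" ]; then
      echo "UNEXPECTED: $name allows $allowed disruptions" >&2
      bad=1
    fi
  done
  if [ "$count" -ne 4 ]; then
    echo "UNEXPECTED: found $count PDBs, expected 4" >&2
    bad=1
  fi
  [ "$bad" -eq 0 ] && echo "PDBs OK"
}

# Usage against a live cluster:
#   kubectl get pdb -n instant-data -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.disruptionsAllowed}{"\n"}{end}' | check_pdbs
```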

**Part B — the resource requests + the priorityClassName patch.** The requests
ship INSIDE each workload manifest (`mongodb.yaml`, `redis-provision.yaml`,
`postgres-customers.yaml`). Re-applying those manifests rolls the pod (Recreate
strategy → brief downtime per workload — do this in the window). Because the
PriorityClass is deliberately NOT inlined in the Deployments (so the priority
rollout is one auditable step), patch `priorityClassName` in the same roll:

```bash
# postgres-customers carries the S1 pg_hba mount already — apply it as part of
# S1 below (§S1) to avoid two rolls. For mongodb + redis-provision, roll now:
for w in mongodb redis-provision; do
  kubectl apply --dry-run=server -f "k8s/data/$w.yaml"  # read diff: only resources{} added
  kubectl apply -f "k8s/data/$w.yaml"
  kubectl patch "deploy/$w" -n instant-data --type=merge \
    -p '{"spec":{"template":{"spec":{"priorityClassName":"instant-data-critical"}}}}'
  kubectl rollout status "deploy/$w" -n instant-data --timeout=180s
done

# Verify QoS flipped from BestEffort → Burstable and priority is set:
kubectl get pod -n instant-data -l app=mongodb \
-o jsonpath='{.items[0].status.qosClass}{" "}{.items[0].spec.priorityClassName}{"\n"}'
# Expect: Burstable instant-data-critical
```
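The verification above queries only mongodb; the same check arguably applies to every workload rolled in Part B. A small sketch follows, assuming each workload's pods carry an `app=<workload>` label (the source shows this for mongodb and nats; redis-provision is an assumption). `check_qos` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: $1 is the workload name, $2 is the
# "<qosClass> <priorityClassName>" string printed by the jsonpath query.
check_qos() {
  local workload="$1" line="$2"
  if [ "$line" = "Burstable instant-data-critical" ]; then
    echo "$workload OK"
  else
    echo "$workload UNEXPECTED: '$line'" >&2
    return 1
  fi
}

# Usage against a live cluster (label selector is an assumption):
#   for w in mongodb redis-provision; do
#     check_qos "$w" "$(kubectl get pod -n instant-data -l app="$w" \
#       -o jsonpath='{.items[0].status.qosClass} {.items[0].spec.priorityClassName}')"
#   done
```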

> nats already declared requests; just patch its `priorityClassName` (do it in
> the R6 roll below so nats only restarts once).

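**Rollback R7:** first remove `priorityClassName` from the patched Deployments
(re-patch with `"priorityClassName":null`), then run
`kubectl delete -f k8s/data/stateful-priority.yaml` to remove the PDBs +
PriorityClass. Running pods are unaffected by the delete, but a pod that still
references a deleted PriorityClass is rejected at admission the next time it is
recreated, so do the re-patch first.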

---

## R6 — NATS JetStream emptyDir → PVC

`k8s/data/nats.yaml` now declares `nats-jetstream-pvc` (5Gi, default
StorageClass = `do-block-storage` on DOKS) and mounts it at `/data/jetstream`.

> **Data note:** pre-cutover JetStream state lived in `emptyDir{}` and is
> **already non-durable** (every prior restart wiped it). Switching to the PVC
> does NOT migrate that old ephemeral state; there is nothing durable to
> migrate. Existing `legacy_open` queue resources re-establish their streams
> when clients reconnect (same as any nats restart today). Schedule during low
> queue traffic; clients reconnect automatically.

```bash
kubectl apply --dry-run=server -f k8s/data/nats.yaml # read diff: PVC added, volume swapped

# The Deployment uses strategy.type: Recreate (RWO volume — required). Applying
# rolls the pod: old pod terminates, PVC binds, new pod starts on /data/jetstream.
kubectl apply -f k8s/data/nats.yaml

# Patch priorityClassName in the SAME step so nats restarts once (R7 part B):
kubectl patch deploy/nats -n instant-data --type=merge \
-p '{"spec":{"template":{"spec":{"priorityClassName":"instant-data-critical"}}}}'

kubectl rollout status deploy/nats -n instant-data --timeout=180s

# Verify the PVC bound and JetStream is on it:
kubectl get pvc nats-jetstream-pvc -n instant-data # STATUS Bound
kubectl exec -n instant-data deploy/nats -- ls -la /data/jetstream
kubectl get pod -n instant-data -l app=nats \
-o jsonpath='{.items[0].status.qosClass}{" "}{.items[0].spec.priorityClassName}{"\n"}'
# Expect a jetstream dir on the mounted PVC + Burstable instant-data-critical.

# Durability proof (the whole point): publish to a subject that a JetStream
# stream captures (a plain publish with no stream bound is never persisted),
# delete the pod, then confirm the message survives the restart.
# kubectl exec ... nats pub test.durability hello ; kubectl delete pod -n instant-data -l app=nats ;
# (after Ready) nats stream info / consumer next: the message must still be there.
```
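Because the 5Gi request is deliberately small, it may be worth pairing the verification above with a headroom check on the mounted volume. The sketch below parses POSIX `df -P` output; it assumes the nats image ships a `df` binary (the `ls` call above suggests a shell-capable image, but that is not confirmed), and the 80% threshold is an arbitrary illustrative default. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking PVC headroom.

# Reads `df -P <mountpoint>` output on stdin and prints the Capacity column
# of the data row as a bare integer (e.g. 20 for "20%").
pvc_usage_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeeds when usage ($1) is at or above the threshold ($2, default 80).
needs_expansion() {
  local pct="$1" threshold="${2:-80}"
  [ "$pct" -ge "$threshold" ]
}

# Usage against a live cluster (assumes df exists in the container):
#   pct="$(kubectl exec -n instant-data deploy/nats -- df -P /data/jetstream | pvc_usage_pct)"
#   needs_expansion "$pct" && echo "consider growing nats-jetstream-pvc"
```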

**Rollback R6:** revert the `nats.yaml` change and re-apply (volume goes back to
`emptyDir{}`). The PVC can be left bound (it costs a few cents) or deleted with
`kubectl delete pvc nats-jetstream-pvc -n instant-data` once nats is off it.

---

## S1 — postgres-customers admin lockdown

Full procedure (root-cause, role analysis, proxy-IP SNAT caveat, the live
pg_hba stopgap) is in **`POSTGRES-CUSTOMERS-LOCKDOWN-RUNBOOK.md`** — follow THAT
for S1; this is the short pointer + the verification gate.

Apply order: the `postgres-customers-hba` ConfigMap (in
`postgres-customers-lockdown.yaml`) FIRST, then the patched
`postgres-customers.yaml` (which mounts the hba file via subPath + sets
`hba_file=/etc/postgresql/pg_hba.conf` and now also carries the R7
resource requests). Roll postgres-customers ONCE for both:

```bash
kubectl apply -f k8s/data/postgres-customers-lockdown.yaml # ConfigMap (+ any docs)
kubectl apply -f k8s/data/postgres-customers.yaml # mounts hba + R7 requests
kubectl patch deploy/postgres-customers -n instant-data --type=merge \
-p '{"spec":{"template":{"spec":{"priorityClassName":"instant-data-critical"}}}}'
kubectl rollout status deploy/postgres-customers -n instant-data --timeout=300s
```

> ⚠️ Read `POSTGRES-CUSTOMERS-LOCKDOWN-RUNBOOK.md §3a` BEFORE applying — the
> pg-proxy SNATs customer traffic to a pod IP, so the lockdown rejects the
> admin role BY ROLE NAME (`instanode_admin` AND `instant_cust`), not by source
> IP. If the runbook's proxy-pod-IP reject lines are stale, fix them first.

### S1 verification gate (the load-bearing check)

After the roll, the **external admin path MUST FAIL** while in-cluster admin and
customer paths keep working:

```bash
# (a) EXTERNAL admin connect MUST be REJECTED by pg_hba (NOT a password prompt
# that proceeds). SAFE: connection-rejection probe only — no SQL/DDL.
PGCONNECT_TIMEOUT=5 psql \
"host=pg.instanode.dev port=5432 user=instant_cust dbname=instant_customers sslmode=require" \
-c '\q' 2>&1 | head
# EXPECT: FATAL 'pg_hba.conf rejects connection for host ...' (a reject rule
# matched) or 'no pg_hba.conf entry for host ...' (no rule matched), or FATAL
# 28000 from the proxy. FAILURE TO REJECT = lockdown not in effect: STOP,
# investigate.

# Repeat for the OTHER admin role (the confirmed truehomie role):
PGCONNECT_TIMEOUT=5 psql \
"host=pg.instanode.dev port=5432 user=instanode_admin dbname=instant_customers sslmode=require" \
-c '\q' 2>&1 | head
# EXPECT: rejected.

# (b) In-cluster admin still works (provisioner path is intact):
kubectl exec -n instant-data deploy/postgres-customers -- \
psql -U instant_cust -d instant_customers -tAc 'select 1;' # expect: 1

# (c) A real customer usr_<token> still connects through the public path
# (regression check — the lockdown must NOT catch customer roles).
# Use a known test-tenant DSN from the dashboard / a /db/new claim.
```
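The pass/fail decision for probe (a) can be made explicit instead of being read off the terminal. The following is a sketch, not the runbook's official gate: `classify_admin_probe` is a hypothetical helper that pattern-matches standard PostgreSQL client messages. A proxy-generated error with different wording would land in the inconclusive branch and still need a human look.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads the combined stdout/stderr of the psql probe on
# stdin and classifies it. Exit 0 = rejected by pg_hba (the desired outcome).
classify_admin_probe() {
  local out
  out="$(cat)"
  case "$out" in
    "")
      # `-c '\q'` prints nothing on a successful connect.
      echo "NOT-REJECTED: connection succeeded"; return 1 ;;
    *"pg_hba.conf rejects connection"*|*"no pg_hba.conf entry"*)
      echo "REJECTED" ;;
    *"password authentication failed"*|*"Password for user"*)
      echo "NOT-REJECTED: reached password auth"; return 1 ;;
    *)
      echo "INCONCLUSIVE"; return 2 ;;
  esac
}

# Usage (same probe as (a) above, with stderr folded into stdout):
#   PGCONNECT_TIMEOUT=5 psql "host=pg.instanode.dev ... user=instant_cust ..." \
#     -c '\q' 2>&1 | classify_admin_probe || echo "STOP, investigate"
```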

**Rollback S1:** see `POSTGRES-CUSTOMERS-LOCKDOWN-RUNBOOK.md §Rollback` (revert
the manifest, the pod falls back to the stock catch-all pg_hba; the live file
backup is at `$PGDATA/pg_hba.conf.bak.2026-06-03`).

---

## S2 — data-tier ingress NetworkPolicy (apply LAST — highest risk)

`k8s/data/networkpolicy.yaml` adds a default-deny ingress policy per data pod,
allowing ONLY provisioner / migrator / worker (+ nats-proxy for 4222/8222).

### ⚠️ The pg-proxy allow-rule — this is the customer-breaking trap

The `postgres-customers-ingress` policy as committed **does NOT list
`instant-pg-proxy`** — the allow-rule for it is **DORMANT** (commented out at
`networkpolicy.yaml` lines ~88–103). If the public customer connect path is
`pg.instanode.dev → ingress-nginx tcp-services → instant-pg-proxy
(instant ns) → postgres-customers`, then applying this policy AS-IS
**default-denies the proxy and BREAKS EVERY CUSTOMER POSTGRES CONNECTION.**

`POSTGRES-CUSTOMERS-LOCKDOWN-RUNBOOK.md §L4` records that as of 2026-06-06 the
NetworkPolicy is **NOT applied in prod** and that applying it as-is would
default-deny + break the proxy path. **Do not apply S2 until you have:**

1. **Confirmed the live proxy deployment's namespace + pod labels:**
```bash
kubectl get pods -A -l app=instant-pg-proxy -o wide
# (the proxy manifest lives in the separate InstaNode-dev/instant-pg-proxy
# repo, NOT here — read the real ns + labels off the live cluster.)
```
2. **Uncommented + edited the dormant pg-proxy block** in
`networkpolicy.yaml` (lines ~88–103) to match those real ns/labels.
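3. **Confirmed Cilium (the CNI) actually enforces NetworkPolicy** in this
cluster. `kubectl get ds -n kube-system | grep cilium` only shows that the
Cilium DaemonSet is present and Ready; it does not by itself prove that
policies are enforced.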

Only then:

```bash
kubectl apply --dry-run=server -f k8s/data/networkpolicy.yaml # read diff
kubectl apply -f k8s/data/networkpolicy.yaml
```

### S2 verification gate (must run IMMEDIATELY after apply)

```bash
# (a) Legit in-cluster caller (provisioner) still reaches postgres-customers:
kubectl exec -n instant-infra deploy/instant-provisioner -- \
sh -c 'nc -z -w5 postgres-customers.instant-data.svc.cluster.local 5432 && echo OK'
# Expect: OK

# (b) THE CUSTOMER PATH still works — connect a real customer usr_<token>
# through pg.instanode.dev (same DSN as S1 check (c)). If this now FAILS
# where it worked pre-apply, the pg-proxy allow-rule is missing/wrong:
# kubectl delete -f k8s/data/networkpolicy.yaml # IMMEDIATE rollback
# then fix the dormant pg-proxy block and re-apply.

# (c) The 4 NetworkPolicies are present:
kubectl get networkpolicy -n instant-data
# Expect: postgres-customers-ingress, redis-provision-ingress, mongodb-ingress,
# nats-ingress.
```
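Check (c) can be scripted so that a missing policy is named explicitly. This is a sketch under the assumption that the four policy names listed above are exact; `missing_policies` is a hypothetical helper that takes the space-separated names found in the namespace.

```bash
#!/usr/bin/env bash
# Hypothetical helper: $1 is a space-separated list of NetworkPolicy names
# found in the namespace. Prints any expected policy that is absent and
# returns non-zero if at least one is missing.
missing_policies() {
  local found=" $1 " name missing=""
  for name in postgres-customers-ingress redis-provision-ingress \
              mongodb-ingress nats-ingress; do
    case "$found" in
      *" $name "*) ;;
      *) missing="$missing $name" ;;
    esac
  done
  echo "${missing# }"
  [ -z "$missing" ]
}

# Usage against a live cluster:
#   missing_policies "$(kubectl get networkpolicy -n instant-data \
#     -o jsonpath='{.items[*].metadata.name}')" || echo "policies missing"
```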

**Rollback S2 (do this fast if customers report connection errors):**

```bash
kubectl delete -f k8s/data/networkpolicy.yaml
# Removing the policies returns the pods to allow-all ingress (the pre-apply
# state). No data loss; instant effect.
```

---

## Apply-order summary

| # | Tag | Command | Verify | Reversible |
|---|---|---|---|---|
| 1 | R7-A | `kubectl apply -f k8s/data/stateful-priority.yaml` | `kubectl get pdb,priorityclass -n instant-data` | `kubectl delete -f …` |
| 2 | R7-B | apply `mongodb.yaml`,`redis-provision.yaml` + patch `priorityClassName` | QoS = Burstable | re-apply prior manifest |
| 3 | R6 | `kubectl apply -f k8s/data/nats.yaml` (+ priorityClassName patch) | PVC Bound + durability publish/restart test | revert manifest |
| 4 | S1 | apply lockdown ConfigMap + `postgres-customers.yaml` (+ patch) | **external admin psql REJECTED** | per lockdown runbook |
| 5 | S2 | **edit dormant pg-proxy rule FIRST**, then apply `networkpolicy.yaml` | provisioner reaches pg AND customer path works | `kubectl delete -f …` |

After every step, sanity-check the platform hot path:

```bash
curl -sS https://api.instanode.dev/healthz | jq .
curl -sS https://api.instanode.dev/readyz | jq . # data-tier deep readiness
```

---

## Related

- `k8s/APPLY-CHECKLIST.md` — the api/worker/provisioner Deployment apply rules.
- `POSTGRES-CUSTOMERS-LOCKDOWN-RUNBOOK.md` — the full S1 procedure + root cause.
- `NATS-AUTH-RUNBOOK.md` — NATS operator-mode key generation (separate from R6).
- `k8s/data/networkpolicy.yaml` — the S2 policy with the dormant pg-proxy block.
- CLAUDE.md rule 15 — why this repo has no auto-apply.
11 changes: 11 additions & 0 deletions k8s/data/mongodb.yaml
@@ -32,6 +32,17 @@ spec:
          image: mongo:7
          ports:
            - containerPort: 27017
          # R7 (2026-06-10): requests added so this pod is Burstable, not BestEffort
          # (BestEffort = first evicted under the cluster's memory overcommit).
          # WiredTiger sizes its cache to max(50% of (RAM - 1GB), 256MB) by default;
          # the 1Gi limit keeps that bounded for the free-tier nosql footprint.
          # Bump both if dedicated/Team mongodb lands here.
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "1Gi"
          env:
            - name: MONGO_INITDB_ROOT_USERNAME
              value: root
47 changes: 45 additions & 2 deletions k8s/data/nats.yaml
@@ -72,6 +72,43 @@ data:
resolver: MEMORY

---
# JetStream durability (R6, 2026-06-10). Before this PVC, the JetStream
# store_dir (/data/jetstream) was an emptyDir{} — every pod restart (the
# Recreate rollout, an OOMKill, or a node drain) WIPED all stream + consumer
# state and every persisted message. For a queue product that promises
# "queue data survives pod restarts" that is a durability lie. This PVC backs
# /data/jetstream with real block storage so stream/consumer state + messages
# persist across restarts.
#
# 5Gi is conservative: it is sized to today's tiny queue volume, NOT to the
# JetStream config's max_file_store: 50GB ceiling. Grow with
# `kubectl edit pvc nats-jetstream-pvc` (do-block-storage / EBS support online
# expansion when allowVolumeExpansion=true on the StorageClass) if file_store
# usage approaches the request. Do NOT pre-allocate 50Gi; that is the hard
# ceiling, not the working set.
#
# storageClassName is OMITTED → falls back to the cluster default. On DOKS prod
# that default is `do-block-storage` (see k8s/self-hosted-runner.yaml:152 and
# k8s/data/postgres-customers.yaml, which use the same omit-for-default
# convention). Local dev (Rancher Desktop / k3s) gets `local-path` via the
# cluster default there, or layer a kustomize overlay setting
# storageClassName: local-path. Block storage is RWO single-attach, which is
# why the Deployment below MUST stay strategy.type: Recreate (a RollingUpdate
# would Multi-Attach-deadlock the new pod against the old holder — same
# constraint postgres-customers.yaml documents).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nats-jetstream-pvc
  namespace: instant-data
  labels:
    app: nats
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
@@ -172,9 +209,15 @@ spec:
             # Secret + restart. Once the Secret exists the pod converges.
             optional: false
         - name: rendered-conf
-          emptyDir: {}
+          emptyDir: {}  # render scratch only; operator.conf is re-rendered
+                        # from the nats-operator Secret by the initContainer on
+                        # every pod start, so this one stays ephemeral by design.
         - name: jetstream-data
-          emptyDir: {}  # TODO: convert to PVC for prod durability
+          # R6 (2026-06-10): was emptyDir{}, now PVC-backed so JetStream
+          # stream/consumer state + persisted messages survive pod restarts.
+          # See the nats-jetstream-pvc PersistentVolumeClaim above.
+          persistentVolumeClaim:
+            claimName: nats-jetstream-pvc
 ---
 apiVersion: v1
 kind: Service