
Commit 73ff5c6

sec(data): persist admin-lockdown into postgres-customers Deployment + drill log (#63)
Follow-up to the 2026-06-06 apply of PR #61. The lockdown was applied to prod via `kubectl patch` (imperative). This makes the durable repo manifest match the live state so a future apply of postgres-customers.yaml does NOT silently revert the lockdown back to the vulnerable catch-all pg_hba:

- mount the postgres-customers-hba ConfigMap at /etc/postgresql/pg_hba.conf (subPath)
- start postgres with `-c hba_file=... -c password_encryption=scram-sha-256`
- strategy RollingUpdate → Recreate (the RWO PVC deadlocks a rolling update on a Multi-Attach error; Recreate terminates the old pod first, at the cost of brief downtime that is acceptable for a single-replica stateful workload)

Runbook §9 Drill Log records the apply result: external admin (instanode_admin + instant_cust) now REJECTED at pg_hba (verified live; baseline reached scram), all in-cluster admin + customer usr_* paths preserved (verified), no rollback. Lists the operator follow-ups (durable pg-proxy role-gate; proxy-IP churn refresh; networkpolicy.yaml apply-exclude).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1 parent e308143 commit 73ff5c6

2 files changed

Lines changed: 48 additions & 0 deletions


POSTGRES-CUSTOMERS-LOCKDOWN-RUNBOOK.md

Lines changed: 18 additions & 0 deletions
@@ -301,3 +301,21 @@ is already shipped (audit doc):
 Together: this runbook removes the *unaudited external admin DROP capability*;
 the chokepoint ensures every *sanctioned* drop is recorded; the CI guard ensures a
 *new* unaudited drop call site cannot be merged.
+
+---
+
+## 9. Drill Log
+
+| Date | Operator | Action | Result |
+|---|---|---|---|
+| 2026-06-06 | Claude (operator-authorized apply, "no customers, low blast radius") | **APPLIED to do-nyc3-instant-prod.** Merged PR #61 (squash, merge commit `78cb6677`) after fixing the manifest for two live findings (see below). Applied ConfigMap `postgres-customers-hba`; patched `deploy/postgres-customers` to mount it + `-c hba_file=/etc/postgresql/pg_hba.conf -c password_encryption=scram-sha-256`; changed strategy `RollingUpdate→Recreate` (RWO PVC Multi-Attach). Did NOT apply `networkpolicy.yaml` (verified NOT enforced in prod; applying as-is would default-deny the proxy path). | **SUCCESS.** External admin REJECTED at pg_hba (both `instanode_admin` + `instant_cust`, error names the SNAT'd proxy pod IP); the baseline beforehand reached scram (vector was OPEN). In-cluster admin preserved: provisioner `instant_cust` CREATE/DROP smoke OK, api/worker `instanode_admin` connect + `pg_database_size` OK, customer `usr_*` path still reaches scram. No rollback. |
+
+**Manifest fixes made before apply (live pre-apply verification):**
+1. **`instanode_admin` was missing.** Prod has TWO superusers: `instanode_admin` (api/worker `CUSTOMER_DATABASE_URL`, the CONFIRMED truehomie vector) and `instant_cust` (provisioner `POSTGRES_CUSTOMERS_URL`). The original PR rejected only `instant_cust`; `instanode_admin` would have matched the catch-all customer allow → vector still open. Both now rejected.
+2. **pg-proxy SNAT defeats source-CIDR.** instant-pg-proxy (in-cluster, no hostNetwork) re-originates TCP, so external admin arrives SNAT'd to a proxy pod IP inside `10.0.0.0/8`, and a plain `10.0.0.0/8 allow` matches it. Added proxy-pod-IP `reject` lines (`10.109.4.113`, `10.109.0.101`) ordered BEFORE the in-cluster allow. **Verified in the reject error message** (`rejects connection for host "10.109.0.101"`). ⚠️ Churn dependency, see §3a.
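
Why the ordering in fix 2 matters can be sketched in a few lines of Python. PostgreSQL applies the first pg_hba record that matches a connection and does not fall through to later records, so a `reject` placed after a broader in-cluster record never fires. The rule set below is illustrative only and is not the contents of `postgres-customers-hba`; `HbaRule`, `first_match`, the role `usr_demo`, and the in-cluster address `10.109.1.7` are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class HbaRule:
    users: frozenset  # role names, or {"all"}
    cidr: str         # client address range
    method: str       # "reject", "scram-sha-256", ...


def first_match(rules, user, client_ip):
    """Return the method of the first rule matching (user, client_ip).

    Mirrors pg_hba semantics: the first matching record wins and later
    records are never consulted. None means no record matched.
    """
    addr = ip_address(client_ip)
    for rule in rules:
        user_ok = "all" in rule.users or user in rule.users
        if user_ok and addr in ip_network(rule.cidr):
            return rule.method
    return None


ADMINS = frozenset({"instanode_admin", "instant_cust"})

# Illustrative records only (assumption), modelled on the runbook text.
allow_in_cluster = HbaRule(frozenset({"all"}), "10.0.0.0/8", "scram-sha-256")
reject_proxies = [
    HbaRule(ADMINS, "10.109.4.113/32", "reject"),
    HbaRule(ADMINS, "10.109.0.101/32", "reject"),
]

locked_down = reject_proxies + [allow_in_cluster]  # rejects evaluated first
wrong_order = [allow_in_cluster] + reject_proxies  # broad record shadows rejects

# Admin arriving via the proxy pod IP.
print(first_match(locked_down, "instanode_admin", "10.109.0.101"))  # reject
print(first_match(wrong_order, "instanode_admin", "10.109.0.101"))  # scram-sha-256
# Admin from a hypothetical in-cluster pod, and a customer role via the proxy.
print(first_match(locked_down, "instanode_admin", "10.109.1.7"))    # scram-sha-256
print(first_match(locked_down, "usr_demo", "10.109.0.101"))         # scram-sha-256
```

With the rejects listed first, only the admin roles coming from the proxy pod IPs are refused; in-cluster admin and customer `usr_*` connections still reach scram authentication, which matches the drill result recorded in the table.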
+
+**Operator follow-ups created by this apply:**
+- **Ship the durable pg-proxy role-gate** (`PG_PROXY_DENIED_ROLES` in `InstaNode-dev/instant-pg-proxy`, staged per memory) so the closure no longer depends on the churning proxy-pod-IP reject lines in the ConfigMap.
+- **On any `instant-pg-proxy` reschedule:** refresh the two `host all instanode_admin/instant_cust <proxy-ip>/32 reject` lines in `postgres-customers-lockdown.yaml`, re-apply, `SELECT pg_reload_conf()`. Add a proxy-pod-restart alert.
+- **`k8s/data/postgres-customers.yaml` updated** to carry the mount/args/Recreate-strategy so a future repo apply does not silently revert the lockdown (shipped in the same follow-up PR).
+- The repo `apply.yml` workflow now includes `postgres-customers-lockdown.yaml` (safe: it is a ConfigMap) but ALSO `networkpolicy.yaml`; running that workflow WOULD create the unenforced-today NetPol and default-deny the proxy path. Add it to the apply EXCLUDE list or add the pg-proxy ingress rule before anyone runs the workflow.
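
The reschedule follow-up above is a mechanical text refresh, so it could be scripted. The following is a minimal sketch, assuming the current proxy pod IPs are already known (for example, read from `kubectl get pods -o wide`) and that one record per role and IP is wanted; the runbook's shorthand does not pin down the exact line layout, so treat that as an assumption. `render_reject_lines` is a hypothetical helper, not part of the repository.

```python
from ipaddress import IPv4Address

ADMIN_ROLES = ("instanode_admin", "instant_cust")


def render_reject_lines(proxy_pod_ips, roles=ADMIN_ROLES):
    """Build pg_hba `reject` records for each proxy pod IP and admin role.

    IPs are de-duplicated and sorted numerically so the output is stable
    across runs. A malformed IPv4 address raises ValueError.
    """
    lines = []
    for ip in sorted({IPv4Address(raw) for raw in proxy_pod_ips}):
        for role in roles:
            lines.append(f"host all {role} {ip}/32 reject")
    return lines


if __name__ == "__main__":
    # The two proxy pod IPs named in the drill log; after a reschedule
    # these would be replaced by the new pod IPs.
    for line in render_reject_lines(["10.109.4.113", "10.109.0.101"]):
        print(line)
```

The generated records still have to be placed before the in-cluster record in `pg_hba.conf`. One caveat worth checking against the live cluster: Kubernetes does not propagate ConfigMap updates into `subPath` mounts, so the refreshed file may only reach the container once the pod is recreated, in which case `pg_reload_conf()` alone would not pick up the change.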

k8s/data/postgres-customers.yaml

Lines changed: 30 additions & 0 deletions
@@ -19,6 +19,13 @@ metadata:
   namespace: instant-data
 spec:
   replicas: 1
+  # Recreate (NOT RollingUpdate): the PVC is ReadWriteOnce, so a rolling update
+  # deadlocks on a Multi-Attach error (new pod can't attach the volume while the
+  # old pod still holds it on another node). Recreate terminates the old pod
+  # first → brief downtime, acceptable for a single-replica stateful workload.
+  # (truehomie admin-lockdown apply 2026-06-06 hit + fixed this.)
+  strategy:
+    type: Recreate
   selector:
     matchLabels:
       app: postgres-customers
@@ -30,6 +37,17 @@ spec:
       containers:
         - name: postgres
           image: pgvector/pgvector:pg16
+          # truehomie-db-drop admin lockdown (2026-06-03 → applied 2026-06-06):
+          # start postgres with the custom pg_hba (mounted below) that rejects the
+          # admin superuser roles (instanode_admin/instant_cust) from the public
+          # pg-proxy path while preserving the in-cluster admin + customer usr_*
+          # paths. See k8s/data/postgres-customers-lockdown.yaml +
+          # POSTGRES-CUSTOMERS-LOCKDOWN-RUNBOOK.md.
+          args:
+            - "-c"
+            - "hba_file=/etc/postgresql/pg_hba.conf"
+            - "-c"
+            - "password_encryption=scram-sha-256"
           ports:
             - containerPort: 5432
           env:
@@ -45,6 +63,12 @@ spec:
           volumeMounts:
             - mountPath: /var/lib/postgresql/data
               name: data
+            # Admin-lockdown pg_hba (ConfigMap postgres-customers-hba). subPath
+            # mounts just the single file so PGDATA is untouched.
+            - name: hba
+              mountPath: /etc/postgresql/pg_hba.conf
+              subPath: pg_hba.conf
+              readOnly: true
           readinessProbe:
             exec:
               command: ["pg_isready", "-U", "instant_cust", "-d", "instant_customers"]
@@ -54,6 +78,12 @@ spec:
         - name: data
           persistentVolumeClaim:
             claimName: postgres-customers-pvc
+        - name: hba
+          configMap:
+            name: postgres-customers-hba
+            items:
+              - key: pg_hba.conf
+                path: pg_hba.conf
 ---
 apiVersion: v1
 kind: Service
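
Since the point of this manifest change is to stop a future apply from silently reverting the lockdown, a small drift check could guard it in CI. The sketch below is one possible shape, not something the repository is known to contain: `lockdown_drift` is a hypothetical function that inspects an already-parsed Deployment (a plain `dict`), and the `GOOD` fixture only mirrors the fields shown in the diff above. Loading the real file would need a YAML parser such as PyYAML, which is left out to keep the example standard-library only.

```python
import copy


def lockdown_drift(deployment):
    """Return a list of problems; an empty list means no drift was found."""
    problems = []
    spec = deployment.get("spec", {})
    if spec.get("strategy", {}).get("type") != "Recreate":
        problems.append("strategy.type is not Recreate")

    pod = spec.get("template", {}).get("spec", {})
    containers = {c.get("name"): c for c in pod.get("containers", [])}
    pg = containers.get("postgres", {})

    # postgres settings are passed as ("-c", "name=value") argument pairs.
    args = pg.get("args", [])
    settings = {args[i + 1] for i in range(len(args) - 1) if args[i] == "-c"}
    for wanted in ("hba_file=/etc/postgresql/pg_hba.conf",
                   "password_encryption=scram-sha-256"):
        if wanted not in settings:
            problems.append(f"missing arg: -c {wanted}")

    mounts = {m.get("mountPath"): m for m in pg.get("volumeMounts", [])}
    hba = mounts.get("/etc/postgresql/pg_hba.conf")
    if hba is None or hba.get("subPath") != "pg_hba.conf":
        problems.append("pg_hba.conf subPath mount is missing")
    else:
        volumes = {v.get("name"): v for v in pod.get("volumes", [])}
        source = volumes.get(hba.get("name"), {}).get("configMap", {})
        if source.get("name") != "postgres-customers-hba":
            problems.append("hba volume is not backed by postgres-customers-hba")
    return problems


# Fixture mirroring only the fields visible in the diff (assumption).
GOOD = {
    "spec": {
        "strategy": {"type": "Recreate"},
        "template": {"spec": {
            "containers": [{
                "name": "postgres",
                "args": ["-c", "hba_file=/etc/postgresql/pg_hba.conf",
                         "-c", "password_encryption=scram-sha-256"],
                "volumeMounts": [
                    {"name": "data", "mountPath": "/var/lib/postgresql/data"},
                    {"name": "hba", "mountPath": "/etc/postgresql/pg_hba.conf",
                     "subPath": "pg_hba.conf", "readOnly": True},
                ],
            }],
            "volumes": [
                {"name": "data", "persistentVolumeClaim":
                    {"claimName": "postgres-customers-pvc"}},
                {"name": "hba", "configMap": {"name": "postgres-customers-hba"}},
            ],
        }},
    },
}

REVERTED = copy.deepcopy(GOOD)
del REVERTED["spec"]["strategy"]  # simulates an older manifest being applied

print(lockdown_drift(GOOD))      # []
print(lockdown_drift(REVERTED))  # ['strategy.type is not Recreate']
```

Returning a list of problems instead of raising on the first one lets a CI job report every reverted field in a single run.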
