|
| 1 | +# postgres-customers Admin Lockdown Runbook |
| 2 | + |
| 3 | +> **Status: DORMANT. Operator-applied in a maintenance window. This repo has no |
| 4 | +> auto-apply (rule 15). Nothing here runs automatically.** |
| 5 | +> |
| 6 | +> **HIGH BLAST RADIUS — touches the shared customer-Postgres data tier. Requires |
| 7 | +> USER/OPERATOR review and a maintenance window before apply.** |
| 8 | +> |
| 9 | +> Closes the OPEN root cause of the **truehomie-db DROP incident (2026-06-03)**: |
| 10 | +> a direct, public-internet admin connection to `postgres-customers` that could |
| 11 | +> `DROP DATABASE` with no `audit_log` row. Memory: |
| 12 | +> `project_truehomie_db_drop_incident_2026_06_03`. Audit: |
| 13 | +> `docs/ci/DATA-INTEGRITY-DROP-PATH-AUDIT.md` (§truehomie root-cause hypotheses, |
| 14 | +> H1 = confirmed vector). |
| 15 | +
|
| 16 | +--- |
| 17 | + |
| 18 | +## 1. What was confirmed vs hypothesis (verify-don't-assert) |
| 19 | + |
| 20 | +### CONFIRMED — via config + SAFE checks (2026-06-06) |
| 21 | + |
| 22 | +| # | Finding | How confirmed | |
| 23 | +|---|---|---| |
| 24 | +| C1 | `pg.instanode.dev` → `152.42.154.144` — the SAME IP as `api`/`redis`/`mongo.instanode.dev` (the shared DO LoadBalancer fronting ingress-nginx). pg is **publicly DNS-routed**. | `dig +short pg.instanode.dev` (and the three siblings) | |
| 25 | +| C2 | `pg.instanode.dev:5432` **answers a TCP handshake from the public internet.** | `nc -z -w5 pg.instanode.dev 5432` → "succeeded" (**TCP only — no auth, no SQL, no DDL attempted**) | |
| 26 | +| C3 | The `postgres-customers` pod runs the **stock `pgvector/pgvector:pg16` image** with **NO** custom `pg_hba.conf` / `postgresql.conf` / `POSTGRES_HOST_AUTH_METHOD` and **no config volume mount** → the image default `host all all all scram-sha-256` (a **catch-all**) is in effect. The admin/superuser role (`instant_cust`, the `POSTGRES_USER`) can authenticate from any source that reaches the listener. | `k8s/data/postgres-customers.yaml` (only the data PVC is mounted; no `command`/`args`/`POSTGRES_HOST_AUTH_METHOD`); no `pg_hba.conf`/`postgresql.conf` anywhere in the infra repo | |
| 27 | +| C4 | The `postgres-customers` **Service is ClusterIP** (no `type:` field). The public exposure is via the external `instant-pg-proxy` + ingress-nginx `tcp-services`, **NOT** this Service. | `k8s/data/postgres-customers.yaml` Service spec | |
| 28 | +| C5 | A `postgres-customers-ingress` NetworkPolicy already exists allowing ingress on 5432 only from provisioner/migrator/worker (all `instant-infra`). It does **not** list pg-proxy. **Its prod-apply state is unverified** (infra has no auto-apply). | `k8s/data/networkpolicy.yaml` | |
| 29 | + |
| 30 | +### CONFIRMED LIVE during the 2026-06-06 apply session (supersedes H1–H3 below) |
| 31 | + |
| 32 | +| # | Finding | How confirmed | |
| 33 | +|---|---|---| |
| 34 | +| L1 | **TWO superuser roles exist on the prod customer pod:** `instanode_admin` (rolsuper=t — the role api/worker connect with via `CUSTOMER_DATABASE_URL`) AND `instant_cust` (rolsuper=t + createdb + createrole — the `POSTGRES_USER` the provisioner connects with via `POSTGRES_CUSTOMERS_URL`). The PR's pg_hba listed only `instant_cust`. **Manifest FIXED to reject BOTH** before apply — else the catch-all customer rule re-opens the vector for `instanode_admin` (the confirmed truehomie role). | `psql -tAc "select rolname,rolsuper from pg_roles where rolsuper"`; `kubectl get secret instant-secrets -o jsonpath CUSTOMER_DATABASE_URL` → `instanode_admin`; provisioner deploy env `POSTGRES_CUSTOMERS_URL` → `instant_cust` | |
| 35 | +| L2 | **A LIVE pg_hba stopgap was already on the pod** (from the 2026-06-03 incident): `host all instanode_admin <pod-ip>/32 reject` for the THEN proxy pod IPs (`10.109.3.201`, `10.109.0.101`), plus catch-all `host all all all scram`. One rejected IP (`10.109.3.201`) is now **STALE** (the proxy rescheduled to `10.109.4.113`), so the stopgap is partially broken. This ConfigMap replaces it, but per §3a its reject is still keyed on the CURRENT proxy pod IPs and must be refreshed whenever the proxy reschedules; the churn-proof closer is the proxy's own role gate. Live file backed up at `$PGDATA/pg_hba.conf.bak.2026-06-03`. | `kubectl exec … cat $PGDATA/pg_hba.conf`; `kubectl get pods -l app=instant-pg-proxy -o wide` |
| 36 | +| L3 | **pg-proxy is a custom TCP proxy that SNATs.** `instant-pg-proxy:v0.1.0` (in `instant` ns) routes by Redis prefix `pg_route:` with `PG_PROXY_FALLBACK_BACKEND=postgres-customers.instant-data.svc:5432`. Being a TCP proxy it terminates inbound + re-originates, so customer traffic arrives at postgres-customers as the **proxy pod IP (10.x)**. This confirms the role-based (not source-IP) reject is the correct boundary, and confirms the **fallback** would forward an admin connection straight through (the live vector). | `kubectl get deploy/instant-pg-proxy -o jsonpath env` | |
| 37 | +| L4 | **The `postgres-customers-ingress` NetworkPolicy is NOT applied in prod** (`kubectl get netpol -n instant-data` → "No resources found"). Cilium IS the CNI (policies would enforce if applied). So the network layer provides **zero** protection today — the pg_hba role-reject is the **sole** boundary. The NetworkPolicy was therefore **NOT applied** in this session (applying it as-is would default-deny + break the proxy path, which is not in its allow-list). | `kubectl get networkpolicy -n instant-data`; `kubectl get ds -n kube-system \| grep cilium` | |
| 38 | +| L5 | **No committed public-admin automation.** `grep -rI pg.instanode.dev` across all repos finds it only as the customer-facing `POSTGRES_PUBLIC_HOST` (the `usr_*` path); nothing pairs it with an admin DSN. `tcp-services` cm currently maps `5432 → instant/instant-pg-proxy` (its `last-applied` annotation shows it was ORIGINALLY `5432 → instant-data/postgres-customers`, i.e. a former direct-to-pod route — historical corroboration of the vector). | `grep`; `kubectl get cm -n ingress-nginx tcp-services -o yaml` | |
| 39 | + |
| 40 | +> **Net:** H1's *vector* is now fully corroborated end-to-end (public DNS → LB → |
| 41 | +> ingress tcp-services → SNATting pg-proxy with a fallback → catch-all pg_hba); |
| 42 | +> the proxy behaviour (H2) and NetPol non-enforcement (H3) are RESOLVED above. We |
| 43 | +> still did NOT attempt auth as the admin role (no destructive pentest); the |
| 44 | +> apply-time external test (§5b) uses a connection-rejection probe only. |
| 45 | +
|
| 46 | +### ORIGINAL HYPOTHESES (pre-apply; superseded by L1–L5 above) |
| 47 | + |
| 48 | +| # | Open item | Why it could not be confirmed at PR time | |
| 49 | +|---|---|---| |
| 50 | +| H1 | That an external actor **actually authenticated** as the admin role over the public path. | We deliberately did **not** attempt auth (out-of-scope noisy/destructive pentest). C1–C3 prove the path is *open*, not that it was *used*. | |
| 51 | +| H2 | The `instant-pg-proxy`'s own role-gate / `pg_hba` behaviour and whether it already blocks the admin role. | The proxy config lives in the **separate repo `InstaNode-dev/instant-pg-proxy`**. **RESOLVED L3:** it SNATs + has an open fallback (no role gate in v0.1.0). | |
| 52 | +| H3 | Whether the existing `postgres-customers-ingress` NetworkPolicy is enforced in prod. | infra has no auto-apply; requires a live `kubectl get netpol -n instant-data` (operator). **RESOLVED L4:** NOT applied. | |
| 53 | + |
| 54 | +> **Net:** the exposure (public-reachable customer-Postgres listener + a |
| 55 | +> catch-all default pg_hba that lets the admin role auth from anywhere) is |
| 56 | +> **CONFIRMED at the config + reachability level**. Whether it was the actual |
| 57 | +> truehomie dropper, and the proxy's own gate, remain hypothesis. The hardening |
| 58 | +> agent's #1 hypothesis is therefore **corroborated, not refuted**. |
| 59 | +
|
| 60 | +--- |
| 61 | + |
| 62 | +## 2. Legitimate consumers of postgres-customers (must NOT break) |
| 63 | + |
| 64 | +| Consumer | How it connects | Role | Preserved by this lockdown? | |
| 65 | +|---|---|---|---| |
| 66 | +| **instant-provisioner** (`instant-infra`) | `POSTGRES_CUSTOMERS_URL` admin DSN, in-cluster to `postgres-customers.instant-data.svc:5432` | **admin** (`instant_cust`) — CREATE/DROP `db_<token>` + `usr_<token>` | **Yes** — pg_hba allows `instant_cust` from `10.0.0.0/8` (pod CIDR). NetPol already allows provisioner. | |
| 67 | +| **instant-migrator** (`instant-infra`) | in-cluster, resource migrations (CopyData/Verify) | admin or per-tenant | **Yes** — same pod-CIDR admin allow + NetPol already allows migrator. | |
| 68 | +| **instant-worker** (`instant-infra`) | in-cluster, read-only `pg_database_size` (quota tick) | admin (read) | **Yes** — pod-CIDR admin allow + NetPol already allows worker. | |
| 69 | +| **Customers** | public `pg.instanode.dev:5432` → pg-proxy → `db_<token>` | per-tenant **`usr_<token>`** (non-superuser) | **Yes**: the LAST pg_hba rule, `host all all 0.0.0.0/0 scram-sha-256`, still allows customer roles from anywhere. The admin reject does NOT catch them, because a `usr_<token>` role name is neither `instanode_admin` nor `instant_cust` (the two rejected roles, L1). |
| 70 | +| **backup CronJob** (`postgres-customers-backup`) | in-cluster `pg_dumpall` (BACKUP-RESTORE-RUNBOOK) | admin | **Yes** — runs in-cluster (pod CIDR) as the admin role. *Operator: verify its pod lands in 10.0.0.0/8 — it does, all DOKS pods are in pod CIDR.* | |
| 71 | +| **restore-drill sidecar** | throwaway namespace, never touches the live pod | n/a | Unaffected. | |
| 72 | + |
| 73 | +**The one thing this CLOSES:** a direct `psql -h pg.instanode.dev -U instant_cust` |
| 74 | +(or `-U instanode_admin`, the other superuser per L1) from **outside** the cluster. That is the truehomie vector. |
| 75 | + |
| 76 | +> **Unverifiable-consumer caution:** if any **ad-hoc operator/CI workflow** |
| 77 | +> currently connects to the admin role over the **public** `pg.instanode.dev` |
| 78 | +> (e.g. a migration run from a laptop or a GitHub Action), the lockdown will |
| 79 | +> **break it by design** — that path IS the vulnerability. Before apply, the |
| 80 | +> operator MUST confirm no legitimate automation depends on public admin access |
| 81 | +> (search CI secrets / workflows for `pg.instanode.dev` + an admin DSN). If one |
| 82 | +> exists, migrate it to an in-cluster runner / `kubectl exec` first. |
| 83 | +
|
| 84 | +--- |
| 85 | + |
| 86 | +## 3. Pre-apply verification (do this FIRST, in the window) |
| 87 | + |
| 88 | +```bash |
| 89 | +kubectl config current-context # MUST be do-nyc3-instant-prod |
| 90 | + |
| 91 | +# (a) Is the existing ingress NetworkPolicy enforced? (H3) |
| 92 | +kubectl get netpol -n instant-data postgres-customers-ingress -o yaml | sed -n '1,60p' |
| 93 | + |
| 94 | +# (b) Where is pg-proxy, and does customer traffic SNAT through it? (H2) |
| 95 | +# The proxy manifest is in the SEPARATE instant-pg-proxy repo; find it live: |
| 96 | +kubectl get pods -A | grep -i pg-proxy |
| 97 | +kubectl get svc,cm -A | grep -iE 'tcp-services|pg-proxy' |
| 98 | +# Inspect ingress-nginx tcp-services to see what 5432 maps to: |
| 99 | +kubectl get cm -n ingress-nginx tcp-services -o yaml 2>/dev/null |
| 100 | + |
| 101 | +# (c) Does a real customer connection currently work end-to-end? (baseline to |
| 102 | +# compare AFTER lockdown — use a KNOWN test tenant's usr_/db_, NOT admin) |
| 103 | +# (operator runs from a real customer connection string they own) |
| 104 | + |
| 105 | +# (d) Confirm the admin role name actually is `instant_cust` (POSTGRES_USER) on |
| 106 | +# the live pod (don't trust the manifest blindly): |
| 107 | +kubectl exec -n instant-data deploy/postgres-customers -- \ |
| 108 | + psql -U instant_cust -d instant_customers -tAc "select rolname,rolsuper from pg_roles where rolsuper;" |
| 109 | +# Expect BOTH `instant_cust` and `instanode_admin` with rolsuper=t (finding L1). |
| 110 | +# If the set of superuser roles differs, EDIT the pg_hba ConfigMap to match BEFORE apply. |
| 111 | + |
| 112 | +# (e) Confirm no legitimate automation uses PUBLIC admin access: |
| 113 | +# (search your CI secrets/workflows + local shell history for the DSN) |
| 114 | +# grep your repos for: pg.instanode.dev .* instant_cust (or :@pg.instanode.dev) |
| 115 | +``` |
| 116 | + |
| 117 | +**Decision gate:** |
| 118 | +- If (d) shows a different admin role → fix the ConfigMap, re-run pre-apply. |
| 119 | +- If (e) finds a public-admin automation → migrate it in-cluster FIRST. |
| 120 | +- If (b) shows pg-proxy SNATs and (a) shows the NetPol enforced and customers |
| 121 | + currently work → the NetPol must already allow the proxy somehow; do NOT touch |
| 122 | + the NetPol, rely on the pg_hba role-reject alone. **If customers do NOT |
| 123 | + currently work, that is a pre-existing issue — do not conflate it with this |
| 124 | + lockdown.** |
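
Step (e) can be partly automated. The sketch below is one possible way to flag lines in a checkout that mention the public host together with a superuser role. The function name `find_public_admin_refs` is illustrative, and the role list assumes the two superusers named in finding L1. It only sees plaintext files, so CI secret stores and shell history still need a manual check.

```shell
# Sketch (bash): print file:line:content for every line that pairs the public
# host with an admin role name. Exit status is grep's: 0 = hits, 1 = none.
# Assumption: the admin roles are the two superusers from finding L1.
find_public_admin_refs() {
  local dir="${1:-.}"
  local roles='(instant_cust|instanode_admin)'
  local host='pg\.instanode\.dev'
  # Match either order on the same line (role before host, or host before role).
  grep -rInE "${roles}.*${host}|${host}.*${roles}" "$dir"
}
```

Run it as `find_public_admin_refs /path/to/repo` for each repository; any hit is a candidate workflow to migrate in-cluster before apply. Customer (`usr_*`) connection strings on the same host are not reported.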
| 125 | + |
| 126 | +### 3a. ⚠️ The pg-proxy SNAT problem — proxy-pod-IP reject is REQUIRED (and churns) |
| 127 | + |
| 128 | +**LIVE-VERIFIED 2026-06-06, and it changes the design:** `instant-pg-proxy` is a |
| 129 | +normal in-cluster pod (not hostNetwork) that terminates the inbound TCP and |
| 130 | +re-originates to `postgres-customers`. So EVERY public connection — including an |
| 131 | +external `psql -U instanode_admin` — arrives SNAT'd to a **proxy pod IP inside |
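| 132 | +10.109.x (i.e. inside 10.0.0.0/8)**. A plain allow rule for `instanode_admin` from `10.0.0.0/8` |
| 133 | +(pg_hba has no `allow` keyword; in practice this is an auth-method line such as `scram-sha-256`) would therefore MATCH a SNAT'd external admin and NOT close the vector. Baseline |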
| 134 | +probe before apply confirmed the live vector is OPEN: `psql -U instanode_admin` |
| 135 | +over `pg.instanode.dev` returns `password authentication failed` (it REACHED |
| 136 | +scram). The proxy v0.1.0 has no role gate and an open fallback. |
| 137 | + |
| 138 | +**Consequence for the ConfigMap:** the admin reject MUST list the CURRENT proxy |
| 139 | +pod IPs and be ordered BEFORE the `10.0.0.0/8` allow (first-match wins). Get them: |
| 140 | + |
| 141 | +```bash |
| 142 | +kubectl get pods -n instant -l app=instant-pg-proxy -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' |
| 143 | +# Put these into the `host all instanode_admin <ip>/32 reject` (and instant_cust) |
| 144 | +# lines at the TOP of the admin block in postgres-customers-lockdown.yaml. |
| 145 | +``` |
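
To avoid hand-typing the reject lines, the IP list can be turned into pg_hba rules mechanically. This is a minimal sketch, not the ConfigMap's actual generator: `render_proxy_rejects` is a hypothetical helper, and the two role names are taken from finding L1.

```shell
# Sketch (bash): read proxy pod IPs (one per line) on stdin and print one
# `reject` rule per admin role per IP, in the form used by the ConfigMap.
# Assumption: the roles to reject are the two superusers from finding L1.
render_proxy_rejects() {
  local ip role
  while IFS= read -r ip; do
    [ -n "$ip" ] || continue          # skip blank lines
    for role in instanode_admin instant_cust; do
      printf 'host all %s %s/32 reject\n' "$role" "$ip"
    done
  done
}
```

Piping the `kubectl get pods ... -o jsonpath` output above into `render_proxy_rejects` yields the lines to paste at the top of the admin block; review them before applying.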
| 146 | + |
| 147 | +**⚠️ CHURN: these IPs change when the proxy reschedules** — that is exactly how |
| 148 | +the 2026-06-03 hand-stopgap silently broke (it listed `10.109.3.201`, now dead). |
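| 149 | +After ANY proxy reschedule, re-run the command above, update the reject lines (one per admin role per proxy pod IP), |
| 150 | +re-apply the ConfigMap, and `SELECT pg_reload_conf()`. Note that a ConfigMap mounted via `subPath` (as in §4) is not refreshed in a running pod, so the pod must be restarted before the new file is visible. The **durable** churn-proof |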
| 151 | +closer is the pg-proxy's own privileged-role deny (`PG_PROXY_DENIED_ROLES`, staged |
| 152 | +in repo `InstaNode-dev/instant-pg-proxy` per memory) — once that ships and is |
| 153 | +deployed, the proxy rejects admin roles before forwarding and these IP lines |
| 154 | +become redundant belt-and-suspenders. **Operator follow-up: ship the proxy |
| 155 | +role-gate; add an alert on proxy pod restarts so the pg_hba IPs can be refreshed.** |
| 156 | + |
| 157 | +- If the proxy-pod-IP reject lines in the ConfigMap do NOT match the live proxy |
| 158 | + IPs at apply time → FIX them first, else the lockdown is a no-op for the live |
| 159 | + public path. |
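
The ordering requirement follows from pg_hba's first-match evaluation. The toy evaluator below illustrates it under stated assumptions: it handles only `host` lines, ignores the database column, accepts only `all` or CIDR addresses, and the sample allow line (`scram-sha-256` from `10.0.0.0/8`) is a guess at the ConfigMap's form, not a copy of it. It is not a substitute for testing against the real server.

```shell
# Sketch (bash): return the auth method of the FIRST pg_hba line that matches
# a given role and source IP, mimicking PostgreSQL's first-match rule.
ip_to_int() {
  local IFS=.
  set -- $1                                   # split dotted quad on "."
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

in_cidr() {  # in_cidr IP NET/BITS
  local ip net bits mask
  ip=$(ip_to_int "$1"); net=$(ip_to_int "${2%/*}"); bits=${2#*/}
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

first_match() {  # first_match ROLE IP < rules
  local role=$1 ip=$2 type db ruser addr method
  while read -r type db ruser addr method; do
    [ "$type" = host ] || continue            # skip comments and other types
    [ "$ruser" = all ] || [ "$ruser" = "$role" ] || continue
    [ "$addr" = all ] || in_cidr "$ip" "$addr" || continue
    echo "$method"; return 0
  done
  echo "implicit-reject"                      # no line matched
}
```

With the reject line for a proxy IP placed first, `first_match instanode_admin <proxy-ip>` returns `reject`; if the `10.0.0.0/8` line were moved above it, the same probe would return `scram-sha-256`, which is the open vector described above.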
| 160 | + |
| 161 | +--- |
| 162 | + |
| 163 | +## 4. Apply (online pg_hba reload first; pod patch is the durable step) |
| 164 | + |
| 165 | +The pg_hba change is **online-reloadable** — no customer downtime for the |
| 166 | +config itself. The pod patch (to mount the ConfigMap + start with the custom |
| 167 | +`hba_file`) is a **pod restart** (single-replica → brief connect blip; provisioner |
| 168 | +retries; customers reconnect). |
| 169 | + |
| 170 | +```bash |
| 171 | +kubectl config current-context # do-nyc3-instant-prod — re-confirm |
| 172 | + |
| 173 | +# 1. Apply the ConfigMap (inert until mounted — safe to apply anytime). |
| 174 | +kubectl apply -f k8s/data/postgres-customers-lockdown.yaml |
| 175 | + |
| 176 | +# 2. Patch the Deployment to mount the ConfigMap and start postgres with the |
| 177 | +# custom hba_file. (Single, reviewable strategic patch — read the diff first.) |
| 178 | +kubectl patch deploy/postgres-customers -n instant-data --type=strategic -p ' |
| 179 | +spec: |
| 180 | + template: |
| 181 | + spec: |
| 182 | + containers: |
| 183 | + - name: postgres |
| 184 | + args: ["-c", "hba_file=/etc/postgresql/pg_hba.conf", "-c", "password_encryption=scram-sha-256"] |
| 185 | + volumeMounts: |
| 186 | + - name: hba |
| 187 | + mountPath: /etc/postgresql/pg_hba.conf |
| 188 | + subPath: pg_hba.conf |
| 189 | + readOnly: true |
| 190 | + volumes: |
| 191 | + - name: hba |
| 192 | + configMap: |
| 193 | + name: postgres-customers-hba |
| 194 | + items: |
| 195 | + - key: pg_hba.conf |
| 196 | + path: pg_hba.conf |
| 197 | +' |
| 198 | +# NOTE: the container name on the live pod is `postgres` (per the manifest). |
| 199 | +# Confirm with: kubectl get deploy/postgres-customers -n instant-data \ |
| 200 | +# -o jsonpath='{.spec.template.spec.containers[0].name}' |
| 201 | + |
| 202 | +# 3. Wait for the new pod to be Ready. |
| 203 | +kubectl rollout status deploy/postgres-customers -n instant-data --timeout=180s |
| 204 | + |
| 205 | +# (Later pg_hba changes: the mounted file is read-only, so update the ConfigMap |
| 206 | +# rather than editing on the pod. A subPath mount, as patched above, does NOT |
| 207 | +# receive ConfigMap updates in a running pod, so a pod restart is still needed; |
| 208 | +# the online reload below only takes effect once the file in the pod has changed:) |
| 209 | +# kubectl exec -n instant-data deploy/postgres-customers -- psql -U instant_cust -d instant_customers -c "SELECT pg_reload_conf();" |
| 210 | +``` |
| 211 | + |
| 212 | +--- |
| 213 | + |
| 214 | +## 5. Verify AFTER apply |
| 215 | + |
| 216 | +### 5a. Legitimate access STILL works (do these FIRST) |
| 217 | + |
| 218 | +```bash |
| 219 | +# Provisioner admin path (in-cluster) — must still authenticate: |
| 220 | +kubectl exec -n instant-data deploy/postgres-customers -- \ |
| 221 | + psql -U instant_cust -d instant_customers -tAc "select 1;" # expect: 1 |
| 222 | + |
| 223 | +# A provisioning smoke test through the real API (creates + lists a db): |
| 224 | +curl -sS -X POST https://api.instanode.dev/db/new | jq '.connection_string!=null' # expect true |
| 225 | +# then connect to the returned connection string as the customer (usr_ role) |
| 226 | +# from OUTSIDE the cluster and run `select 1;` — expect SUCCESS (customer path |
| 227 | +# preserved). |
| 228 | + |
| 229 | +# Backup CronJob smoke (or wait for the nightly): trigger a manual run and |
| 230 | +# confirm it still dumps (BACKUP-RESTORE-RUNBOOK §verify). |
| 231 | +kubectl create job -n instant-data --from=cronjob/postgres-customers-backup pg-lockdown-verify |
| 232 | +kubectl logs -n instant-data job/pg-lockdown-verify --follow # expect a clean dumpall |
| 233 | +``` |
| 234 | + |
| 235 | +### 5b. External ADMIN access is CLOSED (the whole point) |
| 236 | + |
| 237 | +```bash |
| 238 | +# From a machine OUTSIDE the cluster, attempt the ADMIN role over the public host. |
| 239 | +# EXPECT: rejected by pg_hba ("pg_hba.conf rejects connection for host ..."), NOT a |
| 240 | +# password prompt that proceeds. This is a SAFE connection-rejection test: it does |
| 241 | +# NOT require valid credentials and runs NO SQL. Repeat with user=instanode_admin (L1). |
| 242 | +PGCONNECT_TIMEOUT=5 psql "host=pg.instanode.dev port=5432 user=instant_cust dbname=instant_customers sslmode=require" -c '\q' 2>&1 | head |
| 243 | +# PASS = FATAL "pg_hba.conf rejects connection for host ... user \"instant_cust\" ..." |
| 244 | +# (an explicit reject rule matched), or "no pg_hba.conf entry for host ..." (no rule matched). |
| 245 | +# FAIL = a password prompt / "password authentication failed" (means the hba |
| 246 | +# rule did NOT reject — admin is still reachable; ROLL BACK + investigate). |
| 247 | +# (The TCP handshake will still succeed — that is expected; the boundary is the |
| 248 | +# pg_hba role reject, not the port. The customer usr_* path is unaffected.) |
| 249 | +``` |
| 250 | + |
| 251 | +> The TCP port stays open (customers need it). The boundary is the **role-level |
| 252 | +> reject** at pg_hba. If you want the port itself closed to the public, that is a |
| 253 | +> separate, larger change in the `instant-pg-proxy` repo + ingress-nginx |
| 254 | +> tcp-services (do not attempt as part of this lockdown). |
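
The PASS/FAIL reading of the probe output can be made mechanical. The sketch below classifies captured `psql` output by message text; `classify_probe` is a hypothetical helper. PostgreSQL reports a matched `reject` rule as `pg_hba.conf rejects connection ...` and the absence of any matching rule as `no pg_hba.conf entry ...`, and both count as PASS here. Anything that shows the server asked for a password counts as FAIL.

```shell
# Sketch (bash): read probe output on stdin and print PASS, FAIL, or UNKNOWN.
classify_probe() {
  local out
  out=$(cat)
  case "$out" in
    *"pg_hba.conf rejects connection"*|*"no pg_hba.conf entry"*)
      echo PASS ;;      # refused at pg_hba, before any password exchange
    *"password authentication failed"*|*"Password for user"*|*"no password supplied"*)
      echo FAIL ;;      # server requested a password: admin role still reachable
    *)
      echo UNKNOWN ;;   # timeout, TLS error, etc.: investigate, do not assume PASS
  esac
}
```

For example, pipe the §5b probe through it (`... -c '\q' 2>&1 | classify_probe`) once per admin role. An `UNKNOWN` result is treated as inconclusive rather than as a pass.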
| 255 | +
|
| 256 | +--- |
| 257 | + |
| 258 | +## 6. Rollback |
| 259 | + |
| 260 | +```bash |
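| 261 | +# Revert the pod patch (drops the custom hba_file + mount → back to image default). |
|  | +# I = list index of the `hba` entry. The §4 patch adds it NEXT TO the existing |
|  | +# data-PVC mount (C3), so index 0 is most likely the DATA volume; never remove that. |
|  | +# The "test" ops make the whole patch fail, changing nothing, if I is wrong. |
| 262 | +I=1; kubectl patch deploy/postgres-customers -n instant-data --type=json -p '[ |
|  | + {"op":"test","path":"/spec/template/spec/containers/0/volumeMounts/'"$I"'/name","value":"hba"}, |
|  | + {"op":"test","path":"/spec/template/spec/volumes/'"$I"'/name","value":"hba"}, |
| 263 | + {"op":"remove","path":"/spec/template/spec/containers/0/args"}, |
| 264 | + {"op":"remove","path":"/spec/template/spec/containers/0/volumeMounts/'"$I"'"}, |
| 265 | + {"op":"remove","path":"/spec/template/spec/volumes/'"$I"'"} |
| 266 | +]' |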
| 267 | +kubectl rollout status deploy/postgres-customers -n instant-data --timeout=180s |
| 268 | + |
| 269 | +# Optionally delete the ConfigMap (inert either way): |
| 270 | +kubectl delete cm -n instant-data postgres-customers-hba --ignore-not-found |
| 271 | +``` |
| 272 | + |
| 273 | +Rollback restores the (vulnerable) catch-all default. Only roll back if a |
| 274 | +**legitimate** consumer breaks — and capture which one, because that maps to a |
| 275 | +consumer the analysis missed. |
| 276 | + |
| 277 | +--- |
| 278 | + |
| 279 | +## 7. What this does NOT do (scope honesty) |
| 280 | + |
| 281 | +- It does **not** close the public TCP port on 5432 (customers connect there). |
| 282 | + The admin boundary is the pg_hba **role reject**, not the port. |
| 283 | +- It does **not** touch the `instant-pg-proxy` repo (the proxy's own role-gate / |
| 284 | + pg_hba is the durable fix tracked separately, per memory + the audit doc). |
| 285 | +- It does **not** prove the truehomie dropper used this path (H1 remains |
| 286 | + hypothesis) — it removes the *capability*, which is the right action regardless. |
| 287 | +- It does **not** by itself add an audit trail for in-cluster admin DROPs — that |
| 288 | + is the provisioner `guardedDrop` chokepoint (already shipped, audit doc §Layer 1) |
| 289 | + + the DDL-logging trap set on the cluster (memory). |
| 290 | + |
| 291 | +--- |
| 292 | + |
| 293 | +## 8. Defense-in-depth context (already shipped elsewhere) |
| 294 | + |
| 295 | +This lockdown is the **infra** half of the truehomie fix. The **application** half |
| 296 | +is already shipped (audit doc): |
| 297 | +- provisioner `guardedDrop` chokepoint + DDL-audit log + `instant_provisioner_drop_total` (PR #50) |
| 298 | +- CI guard test: no raw DROP outside the chokepoint (PR #50) |
| 299 | +- NR alert + dashboard tile + catalog row for the drop metric (infra PR #60, merged) |
| 300 | + |
| 301 | +Together: this runbook removes the *unaudited external admin DROP capability*; |
| 302 | +the chokepoint ensures every *sanctioned* drop is recorded; the CI guard ensures a |
| 303 | +*new* unaudited drop call site cannot be merged. |