docs(lockdown): mark durable pg-proxy role-gate DONE (shipped+deployed+verified) (#64)
The churn-proof PG_PROXY_DENIED_ROLES role-gate is now live in prod
(InstaNode-dev/instant-pg-proxy PR #1, image v0.2.0, deployed with
PG_PROXY_DENIED_ROLES=instanode_admin,instant_cust,postgres,doadmin).
Live-verified pod-IP-independent: external admin rejected at the PROXY layer
(28000) even though the proxy now runs on new IPs the pg_hba reject lines don't
name; customer usr_* still forwarded; in-cluster provisioning via ClusterIP svc
unaffected. The pg_hba proxy-IP reject lines are now redundant belt-and-suspenders.
Updates §3a (churn warning → mitigated), §7 (scope), §9 Drill Log (new row +
follow-up closed).
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
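The role-gate this commit marks as done decides from the StartupMessage alone, before any backend is resolved. Below is a minimal Python sketch of that decision under stated assumptions. It is not the `instant-pg-proxy` source, whose language and structure are not shown here. The helper names `parse_denied_roles`, `build_startup`, `startup_user` and `should_deny` are hypothetical. The only protocol facts it relies on are the PostgreSQL v3 StartupMessage layout: an Int32 length, then Int32 version 196608, then NUL-terminated key/value pairs closed by one extra NUL. It assumes `PG_PROXY_DENIED_ROLES` is a comma-separated list in which an empty value disables the gate, as the commit describes. SSL/GSSENC negotiation is deliberately left out.

```python
from __future__ import annotations

import struct

PROTOCOL_V3 = 196608  # (3 << 16) | 0: frontend/backend protocol version 3.0


def parse_denied_roles(raw: str) -> frozenset[str]:
    """Split a comma-separated env value into role names; empty means inert."""
    return frozenset(r.strip() for r in raw.split(",") if r.strip())


def build_startup(params: dict[str, str]) -> bytes:
    """Encode a v3 StartupMessage (used here only to drive the example)."""
    body = b"".join(k.encode() + b"\0" + v.encode() + b"\0" for k, v in params.items())
    body += b"\0"  # terminator after the last key/value pair
    # The Int32 length counts itself and the version field, plus the body.
    return struct.pack("!ii", 8 + len(body), PROTOCOL_V3) + body


def startup_user(msg: bytes) -> str | None:
    """Extract the `user` parameter from one complete v3 StartupMessage.

    SSLRequest / GSSENCRequest / CancelRequest handling is out of scope here.
    """
    length, version = struct.unpack("!ii", msg[:8])
    if version != PROTOCOL_V3 or length != len(msg):
        raise ValueError("not a complete v3 StartupMessage")
    # Body is key\0value\0...key\0value\0\0: drop the final NUL, then pair fields.
    parts = msg[8:length - 1].split(b"\0")
    params = dict(zip(parts[0::2], parts[1::2]))
    user = params.get(b"user")
    return user.decode("utf-8") if user is not None else None


def should_deny(msg: bytes, denied: frozenset[str]) -> bool:
    """Decide, before resolving or dialing a backend, whether to refuse."""
    return startup_user(msg) in denied


if __name__ == "__main__":
    denied = parse_denied_roles("instanode_admin,instant_cust,postgres,doadmin")
    admin = build_startup({"user": "instanode_admin", "database": "postgres"})
    customer = build_startup({"user": "usr_example", "database": "db_example"})
    print(should_deny(admin, denied), should_deny(customer, denied))  # True False
```

Nothing in the check looks at the client or proxy address, so rescheduling the proxy onto new pod IPs cannot change its outcome. This is the pod-IP independence the commit reports verifying live.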
+env `PG_PROXY_DENIED_ROLES=instanode_admin,instant_cust,postgres,doadmin`. The
+proxy now rejects admin roles at the StartupMessage with a FATAL `28000`
+ErrorResponse BEFORE resolving/dialing — **independent of pod IPs / pg_hba.**
+Live-verified: after the rollout the proxy runs on NEW IPs (`10.109.6.132`,
+`10.109.4.98`) that the pg_hba reject lines do NOT name, yet external admin is
+still rejected (at the proxy, not pg_hba). The pg_hba proxy-IP reject lines are
+therefore now **redundant belt-and-suspenders** — left in place (harmless), no
+longer the sole boundary, no longer require churn-refresh on reschedule.
+**Remaining operator follow-up:** add an alert on `instant-pg-proxy` pod restarts
+(defense-in-depth visibility), and ensure any future redeploy preserves the
+`PG_PROXY_DENIED_ROLES` env (it lives only on the live Deployment patch — fold it
+into a committed manifest when one is created for the proxy).

 - If the proxy-pod-IP reject lines in the ConfigMap do NOT match the live proxy
   IPs at apply time → FIX them first, else the lockdown is a no-op for the live
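The FATAL `28000` reply in the added lines is an ordinary PostgreSQL v3 `ErrorResponse`. A proxy can write it to the client and close the socket without ever dialing a backend. The Python sketch below encodes such a reply. It assumes only the message layout from the PostgreSQL frontend/backend protocol: type byte `E`, an Int32 length that counts itself but not the type byte, then typed NUL-terminated fields and a final NUL. `error_response` is a hypothetical helper name, not taken from `instant-pg-proxy`. The message text reuses the one recorded in the Drill Log.

```python
import struct


def error_response(sqlstate: str, message: str, severity: str = "FATAL") -> bytes:
    """Encode a PostgreSQL v3 ErrorResponse ('E') message."""
    fields = [
        (b"S", severity),  # severity (may be localized by a real server)
        (b"V", severity),  # severity, never localized (PostgreSQL 9.6+)
        (b"C", sqlstate),  # SQLSTATE; 28000 = invalid_authorization_specification
        (b"M", message),   # primary human-readable message
    ]
    body = b"".join(code + text.encode("utf-8") + b"\0" for code, text in fields)
    body += b"\0"  # terminator after the last field
    return b"E" + struct.pack("!i", 4 + len(body)) + body


if __name__ == "__main__":
    reply = error_response("28000", "role is not permitted over the public endpoint")
    print(reply[:1], struct.unpack("!i", reply[1:5])[0] == len(reply) - 1)  # b'E' True
```

Because this reply is produced before a backend is chosen, the client sees a FATAL error carrying SQLSTATE 28000 rather than a pg_hba rejection naming a pod IP. That difference is how the Drill Log tells the two layers apart.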
@@ -280,8 +293,11 @@ consumer the analysis missed.

 - It does **not** close the public TCP port on 5432 (customers connect there).
   The admin boundary is the pg_hba **role reject**, not the port.
-- It does **not** touch the `instant-pg-proxy` repo (the proxy's own role-gate /
-  pg_hba is the durable fix tracked separately, per memory + the audit doc).
+- ~~It does **not** touch the `instant-pg-proxy` repo~~ — **superseded 2026-06-06:**
+  the durable fix (the proxy's own `PG_PROXY_DENIED_ROLES` role-gate) IS now shipped
+  + deployed (repo `InstaNode-dev/instant-pg-proxy` PR #1, image v0.2.0). This
+  runbook's pg_hba lockdown is now the belt-and-suspenders layer behind that gate.
+  See §3a + the §9 Drill Log row 2.
 - It does **not** prove the truehomie dropper used this path (H1 remains
   hypothesis) — it removes the *capability*, which is the right action regardless.
 - It does **not** by itself add an audit trail for in-cluster admin DROPs — that
@@ -309,13 +325,14 @@ the chokepoint ensures every *sanctioned* drop is recorded; the CI guard ensures
 | Date | Operator | Action | Result |
 |---|---|---|---|
 | 2026-06-06 | Claude (operator-authorized apply, "no customers, low blast radius") | **APPLIED to do-nyc3-instant-prod.** Merged PR #61 (squash, merge commit `78cb6677`) after fixing the manifest for two live findings (see below). Applied ConfigMap `postgres-customers-hba`; patched `deploy/postgres-customers` to mount it + `-c hba_file=/etc/postgresql/pg_hba.conf -c password_encryption=scram-sha-256`; changed strategy `RollingUpdate→Recreate` (RWO PVC Multi-Attach). Did NOT apply `networkpolicy.yaml` (verified NOT enforced in prod; applying as-is would default-deny the proxy path). | **SUCCESS.** External admin REJECTED at pg_hba (both `instanode_admin` + `instant_cust`, error names the SNAT'd proxy pod IP) — baseline beforehand reached scram (vector was OPEN). In-cluster admin preserved: provisioner `instant_cust` CREATE/DROP smoke OK, api/worker `instanode_admin` connect + `pg_database_size` OK, customer `usr_*` path still reaches scram. No rollback. |
+| 2026-06-06 | Claude (operator-authorized, "no customers, low blast radius") | **DURABLE FIX SHIPPED + DEPLOYED — the churn-proof pg-proxy role-gate.** Created the `InstaNode-dev/instant-pg-proxy` repo (did not exist before — the proxy source was a loose, un-versioned local dir; live image was `ghcr.io/mastermanas805/instant-pg-proxy:v0.1.0` applied by hand, no committed manifest). Merged PR #1 (squash, merge commit `5a86c93`): the proxy parses the StartupMessage `user` and, if in `PG_PROXY_DENIED_ROLES`, returns a FATAL `28000` ErrorResponse (`role is not permitted over the public endpoint`) BEFORE resolving/dialing — default empty = inert. Built+pushed `ghcr.io/mastermanas805/instant-pg-proxy:v0.2.0`; `kubectl patch deploy/instant-pg-proxy -n instant` → image v0.2.0 + `PG_PROXY_DENIED_ROLES=instanode_admin,instant_cust,postgres,doadmin`. | **SUCCESS — durable closure verified, pod-IP-independent.** Rollout landed new pods at `10.109.6.132`/`10.109.4.98` (NOT the `10.109.4.113`/`10.109.0.101` the pg_hba reject lines name — those lines now point at DEAD pods, yet admin is STILL rejected, proving independence). External `instanode_admin`/`instant_cust`/`postgres` over `pg.instanode.dev` → **proxy 28000** (`role is not permitted over the public endpoint`), NOT a pg_hba reject naming a pod IP. Proxy logged `user_denied_public` for all three. Customer `usr_*` → FORWARDED (reached postgres scram → `password authentication failed`, not 28000). In-cluster admin via ClusterIP svc UNAFFECTED: `instant_cust` CREATE+DROP OK (`INCLUSTER_PROVISION_PATH_OK`), `pg_database_size` quota read OK. Provisioner DSN confirmed → `postgres-customers.instant-data.svc.cluster.local:5432` (svc, NOT the public proxy). The pg_hba proxy-IP reject lines are now redundant belt-and-suspenders (left in place, harmless). |

 **Manifest fixes made before apply (live pre-apply verification):**
 1. **`instanode_admin` was missing.** Prod has TWO superusers — `instanode_admin` (api/worker `CUSTOMER_DATABASE_URL`, the CONFIRMED truehomie vector) and `instant_cust` (provisioner `POSTGRES_CUSTOMERS_URL`). The original PR rejected only `instant_cust`; `instanode_admin` would have matched the catch-all customer allow → vector still open. Both now rejected.
 2. **pg-proxy SNAT defeats source-CIDR.** instant-pg-proxy (in-cluster, no hostNetwork) re-originates TCP, so external admin arrives SNAT'd to a proxy pod IP inside `10.0.0.0/8` — a plain `10.0.0.0/8 allow` matches it. Added proxy-pod-IP `reject` lines (`10.109.4.113`, `10.109.0.101`) ordered BEFORE the in-cluster allow. **Verified in the reject error message** (`rejects connection for host "10.109.0.101"`). ⚠️ Churn dependency, see §3a.

 **Operator follow-ups created by this apply:**
-- **Ship the durable pg-proxy role-gate** (`PG_PROXY_DENIED_ROLES` in `InstaNode-dev/instant-pg-proxy`, staged per memory) so the closure no longer depends on the churning proxy-pod-IP reject lines in the ConfigMap.
-- **On any `instant-pg-proxy` reschedule:** refresh the two `host all instanode_admin/instant_cust <proxy-ip>/32 reject` lines in `postgres-customers-lockdown.yaml`, re-apply, `SELECT pg_reload_conf()`. Add a proxy-pod-restart alert.
+- ~~**Ship the durable pg-proxy role-gate**~~ ✅ **DONE 2026-06-06.** `PG_PROXY_DENIED_ROLES` shipped (repo `InstaNode-dev/instant-pg-proxy` created + PR #1, merge `5a86c93`), image `v0.2.0` built+pushed, deployed to `deploy/instant-pg-proxy` with `PG_PROXY_DENIED_ROLES=instanode_admin,instant_cust,postgres,doadmin`. Live-verified the closure is now pod-IP-independent (see §3a + Drill Log row 2). The closure no longer depends on the churning proxy-pod-IP reject lines.
+- ~~**On any `instant-pg-proxy` reschedule:** refresh the proxy-IP reject lines~~ — **no longer required for the security boundary** (the role-gate is now the durable boundary). The pg_hba IP reject lines are redundant belt-and-suspenders; leave them. Still recommended: add a proxy-pod-restart alert for visibility, and persist `PG_PROXY_DENIED_ROLES` into a committed proxy Deployment manifest (currently the env lives only on the live `kubectl patch` — a manual re-create of the Deployment would drop it).
 - **`k8s/data/postgres-customers.yaml` updated** to carry the mount/args/Recreate-strategy so a future repo apply does not silently revert the lockdown (shipped in the same follow-up PR).
 - The repo `apply.yml` workflow now includes `postgres-customers-lockdown.yaml` (safe — ConfigMap) but ALSO `networkpolicy.yaml`; running that workflow WOULD create the unenforced-today NetPol and default-deny the proxy path. Add it to the apply EXCLUDE list or add the pg-proxy ingress rule before anyone runs the workflow.
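The SNAT finding and the churn problem both follow from how `pg_hba.conf` is evaluated. Lines are checked top to bottom, and the first line whose connection type, database, user and address all match decides the outcome. The Python sketch below models only the user and address columns with the standard `ipaddress` module, which is a deliberate simplification. The rule list is illustrative: it is modeled on the reject and allow lines described in the manifest fixes, not copied from the committed ConfigMap. The `scram-sha-256` auth method on the allow line is an assumption. `HbaRule` and `first_match` are hypothetical names.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class HbaRule:
    users: frozenset[str] | None  # None stands for pg_hba's "all"
    cidr: str
    action: str                   # "reject" or an auth method (assumed scram-sha-256)


def first_match(rules: list[HbaRule], user: str, source_ip: str) -> str | None:
    """Return the action of the first rule matching both user and source address."""
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        user_ok = rule.users is None or user in rule.users
        if user_ok and addr in ipaddress.ip_network(rule.cidr):
            return rule.action
    return None  # PostgreSQL refuses a connection that matches no line


ADMINS = frozenset({"instanode_admin", "instant_cust"})

# Illustrative order: proxy-pod-IP rejects come BEFORE the in-cluster allow.
RULES = [
    HbaRule(ADMINS, "10.109.4.113/32", "reject"),
    HbaRule(ADMINS, "10.109.0.101/32", "reject"),
    HbaRule(None, "10.0.0.0/8", "scram-sha-256"),
]

if __name__ == "__main__":
    # External admin arriving SNAT'd from a proxy pod that a reject line names.
    print(first_match(RULES, "instanode_admin", "10.109.0.101"))  # reject
    # The same client after a reschedule onto an IP no reject line names.
    print(first_match(RULES, "instanode_admin", "10.109.6.132"))  # scram-sha-256
```

The second call shows the churn dependency. After a reschedule, an admin connection SNAT'd from a new proxy IP falls through to the in-cluster allow and reaches password authentication. This is why the IP reject lines could never be the durable boundary, and why the role-gate, which ignores addresses, replaces them as the primary control.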