
feat(alerts): pg-proxy role-gate disabled / proxy down (truehomie residual) #65

Merged
mastermanas805 merged 1 commit into master from feat/pg-proxy-role-gate-alert on Jun 6, 2026

@mastermanas805 (Member)

What

Closes the last residual of the 2026-06-03 truehomie-db DROP durable fix. The instant-pg-proxy role-gate (PG_PROXY_DENIED_ROLES) bars privileged Postgres roles from the public pg.instanode.dev:5432 path. The env var is now committed to a manifest (instant-pg-proxy#2, merged), but until now nothing alerted if the gate was disabled or the proxy went down.
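To make the gate's behavior concrete, here is a minimal sketch of what such a role check could look like, written in Go. It assumes PG_PROXY_DENIED_ROLES is a comma-separated list of role names matched exactly against the StartupMessage `user` parameter; the real proxy's parsing and matching rules may differ, and `parseDeniedRoles`, `allowedOnPublicPath`, and the `app_user` role are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseDeniedRoles turns a PG_PROXY_DENIED_ROLES value into a lookup set.
// ASSUMPTION: comma-separated role names, whitespace-trimmed, exact match.
func parseDeniedRoles(raw string) map[string]struct{} {
	denied := make(map[string]struct{})
	for _, role := range strings.Split(raw, ",") {
		role = strings.TrimSpace(role)
		if role != "" {
			denied[role] = struct{}{}
		}
	}
	return denied
}

// allowedOnPublicPath reports whether the StartupMessage "user" may be
// forwarded on the public path; roles in the denied set are rejected.
func allowedOnPublicPath(denied map[string]struct{}, user string) bool {
	_, blocked := denied[user]
	return !blocked
}

func main() {
	denied := parseDeniedRoles(os.Getenv("PG_PROXY_DENIED_ROLES"))
	// ASSUMPTION: len(denied) is what the boot log reports as
	// denied_role_count; 0 means every role is let through.
	fmt.Println("denied_role_count:", len(denied))
	// "app_user" is a hypothetical unprivileged role for illustration.
	for _, user := range []string{"postgres", "instanode_admin", "app_user"} {
		fmt.Printf("%s allowed=%v\n", user, allowedOnPublicPath(denied, user))
	}
}
```

An unset or empty variable yields an empty set, which is exactly the "gate disabled" state the P0 alert is meant to catch.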

Why log-based (interim)

The proxy is a thin TCP proxy that only logs to stdout via slog; it exposes no /metrics endpoint, so a Prometheus-metric rule is impossible today. The lowest-effort reliable signal is the proxy's own log: each pod logs pgproxy.role_gate{denied_role_count} on boot (count>0 = gate ON, 0 = exposure) and pgproxy.user_denied_public on every rejected privileged role. Verified live: denied_role_count:4, plus active user_denied_public events for instanode_admin, instant_cust, and postgres. The newrelic-logging Fluent Bit DaemonSet (confirmed running on all nodes) ships proxy stdout to NR Logs.
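The boot-line semantics (count>0 = gate ON, 0 = exposure) can be sketched as a classifier over one JSON log record. This assumes the proxy uses a slog JSON handler, so the event name lands in `msg` and the count in a top-level `denied_role_count` field; that layout is consistent with the `"denied_role_count":0` match used by the P0 alert but is not confirmed here. `classify` and the `gateState` values are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
)

type gateState int

const (
	notRoleGateLine gateState = iota // any other log line
	gateOn                           // denied_role_count > 0
	gateOff                          // denied_role_count == 0: exposure
)

// roleGateRecord holds only the fields this check needs.
// ASSUMPTION: slog JSON handler layout (event name in "msg").
type roleGateRecord struct {
	Msg             string `json:"msg"`
	DeniedRoleCount *int   `json:"denied_role_count"`
}

// classify inspects one JSON log line and reports the gate state it implies.
func classify(line []byte) gateState {
	var rec roleGateRecord
	if err := json.Unmarshal(line, &rec); err != nil {
		return notRoleGateLine
	}
	if rec.Msg != "pgproxy.role_gate" || rec.DeniedRoleCount == nil {
		return notRoleGateLine
	}
	if *rec.DeniedRoleCount > 0 {
		return gateOn
	}
	return gateOff
}

func main() {
	// Emit a boot line the way a slog JSON logger would, then classify it.
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logger.Info("pgproxy.role_gate", "denied_role_count", 4)
	fmt.Println(classify(bytes.TrimSpace(buf.Bytes())) == gateOn) // true
}
```

Using a pointer for the count distinguishes "field absent" from "field present and zero", so an unrelated event such as pgproxy.user_denied_public is never mistaken for an exposure.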

Files

  • newrelic/alerts/pg-proxy-role-gate-disabled.json: P0/CRITICAL, fires on a pgproxy.role_gate line with "denied_role_count":0 (gate disabled).
  • newrelic/alerts/pg-proxy-down.json: P1/CRITICAL, fires after 10m of zero proxy logs (proxy down / path broken).
  • newrelic/dashboards/admin-defense.json — new "pg-proxy public-path gate" page (4 tiles).
  • observability/METRICS-CATALOG.md — catalog row (rule 25).
  • POSTGRES-CUSTOMERS-LOCKDOWN-RUNBOOK.md — §3a + §9 + Drill Log row: residual closed, manifest no-op-verified vs live, alert documented.

Durable upgrade (documented follow-up)

Add a pgproxy_role_gate_denied_roles gauge + /metrics listener to the proxy, scrape it, alert on gauge == 0, AND add a worker synthetic-reject prober leg (raw StartupMessage to pg.instanode.dev as instanode_admin, assert FATAL 28000). Until then the log alerts are the alarm.

Scope

Operator-apply only (infra has no auto-apply). No live behavior changed — JSON/MD only.

🤖 Generated with Claude Code


Adds the alert for the last residual of the 2026-06-03 truehomie-db DROP
durable fix: the instant-pg-proxy role-gate (PG_PROXY_DENIED_ROLES) is now
committed to a manifest (InstaNode-dev/instant-pg-proxy k8s/), but nothing
alerted if the gate were ever disabled or the proxy went down.

The proxy is a thin TCP proxy with slog-to-stdout only — it exposes NO
/metrics endpoint, so a Prometheus-metric rule is not possible today. The
lowest-effort reliable signal is the proxy's startup log line
`pgproxy.role_gate{denied_role_count}` (count>0 = gate ON, 0 = exposure),
shipped to NR via the newrelic-logging Fluent Bit DaemonSet (verified
running on all nodes). Two log-based NR alerts (operator-apply):

- pg-proxy-role-gate-disabled.json (P0) — fires on denied_role_count==0
- pg-proxy-down.json (P1) — fires on 10m proxy log silence

Plus an admin-defense dashboard page ("pg-proxy public-path gate", 4 tiles)
and a METRICS-CATALOG row (rule 25). Runbook §3a + §9 updated: residual
closed, manifest no-op-verified vs live, alert documented. Proper durable
upgrade documented: add a pgproxy_role_gate_denied_roles gauge + /metrics +
a worker synthetic-reject prober leg.

Operator-apply only (no auto-apply on infra). No live behavior changed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mastermanas805 enabled auto-merge (squash) June 6, 2026 15:42
mastermanas805 merged commit 4f26343 into master on Jun 6, 2026
3 checks passed