Commit d426ffe
fix(zero): add replicator health check and statement timeout (tldraw#8437)
After the 16+ hour Zero replication stall with no automated detection,
this adds a health check endpoint and a safety net for stuck queries.
**Health check** (`/health-check/zero-replicator`): queries
`pg_stat_replication` for the `zero-replicator` application and returns
500 if it's disconnected, stalled (`write_lsn IS NULL`), or lagging
(`write_lag > 1 minute`). Uses the existing Kysely pool, same pattern as
`/health-check/db`. Configure Updown.io to hit this endpoint every 60s
with the `HEALTH_CHECK_BEARER_TOKEN`.
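The three failure conditions above can be sketched as a small decision function. This is an illustration, not the code from this commit: the row shape, the `evaluateReplicatorHealth` name, and the SQL text are assumptions, and the real handler runs its query through the existing Kysely pool. The sketch assumes `write_lag` has already been converted to seconds and parsed into a number.

```typescript
// Hypothetical SQL: fetch the replicator's row and convert write_lag to seconds.
const REPLICATOR_QUERY = `
  SELECT write_lsn, EXTRACT(EPOCH FROM write_lag) AS write_lag_seconds
  FROM pg_stat_replication
  WHERE application_name = 'zero-replicator'
`

// Hypothetical row shape for the query above (values already parsed to numbers).
interface ReplicationRow {
	write_lsn: string | null
	write_lag_seconds: number | null
}

interface HealthResult {
	status: 200 | 500
	reason: 'ok' | 'disconnected' | 'stalled' | 'lagging'
}

// Threshold from the commit message: lagging means write_lag > 1 minute.
const MAX_WRITE_LAG_SECONDS = 60

function evaluateReplicatorHealth(rows: ReplicationRow[]): HealthResult {
	const row = rows[0]
	// No row for the application means the replicator is not connected.
	if (row === undefined) return { status: 500, reason: 'disconnected' }
	// A connected replicator with no write position is treated as stalled.
	if (row.write_lsn === null) return { status: 500, reason: 'stalled' }
	// A null write_lag is treated as "no measurable lag", not as a failure.
	if (row.write_lag_seconds !== null && row.write_lag_seconds > MAX_WRITE_LAG_SECONDS) {
		return { status: 500, reason: 'lagging' }
	}
	return { status: 200, reason: 'ok' }
}
```

Keeping the decision separate from the query makes each condition easy to unit test without a database; whether the actual handler is structured this way is not shown in the commit message.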
**Statement timeout**: changes `statement_timeout=0` (infinite) to
`statement_timeout=1800000` (30 min) on Zero's connection strings.
Prevents stuck queries from blocking forever while still allowing
initial sync to complete.
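As a rough illustration of the timeout change, the sketch below rewrites a `statement_timeout` value embedded in a connection string. The `withStatementTimeout` helper and the example URL are hypothetical; the commit message does not show how Zero's connection strings are actually built.

```typescript
// 30 minutes expressed in milliseconds, the unit statement_timeout uses by default.
const THIRTY_MINUTES_MS = 30 * 60 * 1000

// Hypothetical helper: replace any existing statement_timeout value in the string.
// Strings that do not contain the setting are returned unchanged.
function withStatementTimeout(connectionString: string, timeoutMs: number): string {
	return connectionString.replace(/statement_timeout=\d+/g, `statement_timeout=${timeoutMs}`)
}

// Assumed example URL, for illustration only.
const before = 'postgres://user:secret@db.example.com:5432/postgres?statement_timeout=0'
const after = withStatementTimeout(before, THIRTY_MINUTES_MS)
```

In PostgreSQL a value of `0` disables the timeout entirely, which is why the old setting allowed a stuck query to run indefinitely.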
### Change type
- [x] `improvement`
### Test plan
1. Deploy to staging
2. Hit `/health-check/zero-replicator` with bearer token — should return
200
3. Stop the zero-replicator process — endpoint should return 500 after
Updown confirmation
4. Verify Zero still boots and completes initial sync with the 30-min
statement timeout
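Step 2 of the test plan can be scripted along these lines. This is a sketch under assumptions: it supposes the token is sent as an `Authorization: Bearer …` header, and `healthCheckRequest`, `replicatorIsHealthy`, and the `FetchLike` type are hypothetical names introduced here so the example does not depend on a particular runtime's `fetch` typings.

```typescript
// Minimal structural type for a fetch-style function (hypothetical, for illustration).
type FetchLike = (
	url: string,
	init: { headers: Record<string, string> }
) => Promise<{ status: number }>

// Build the URL and headers for the replicator health check.
function healthCheckRequest(baseUrl: string, token: string) {
	return {
		// Strip trailing slashes so the path is not doubled up.
		url: `${baseUrl.replace(/\/+$/, '')}/health-check/zero-replicator`,
		headers: { Authorization: `Bearer ${token}` },
	}
}

// Resolves to true when the endpoint answers 200, false for any other status.
async function replicatorIsHealthy(
	fetchFn: FetchLike,
	baseUrl: string,
	token: string
): Promise<boolean> {
	const { url, headers } = healthCheckRequest(baseUrl, token)
	const res = await fetchFn(url, { headers })
	return res.status === 200
}
```

On a runtime with a global `fetch`, passing a thin wrapper around it as `fetchFn` along with the staging base URL and the value of `HEALTH_CHECK_BEARER_TOKEN` should cover steps 2 and 3.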
### Code changes
| Section | LOC change |
| -------------- | ---------- |
| Apps | +25 / -0 |
| Config/tooling | +1 / -1 |

1 parent 309af39, commit d426ffe
2 files changed
Lines changed: 33 additions & 2 deletions