Update documentation across all relevant files:
- README: add Maintenance Mode Integration to features list
- API reference: add MaintenanceSpec type, MaintenanceMode condition,
  StartMaintenance/EndMaintenance upgrade phases
- Architecture: add Maintenance Jobs to diagram and reconciliation
  strategy, add maintenance_jobs.go to project structure
- Safe upgrade runbook: add Maintenance Mode section with YAML examples,
  update upgrade order and phases table
Amp-Thread-ID: https://ampcode.com/threads/T-019ccbea-b6d3-7583-8ac6-4f8a88c21dbd
Co-authored-by: Amp <amp@ampcode.com>
docs/api-reference/v1alpha1.md (+15 -1: 15 additions, 1 deletion)
@@ -32,6 +32,7 @@ This document describes all Custom Resource Definitions (CRDs) managed by the Pe
 | `paused` | `bool` | No | `false` | When true, the operator stops reconciling this cluster. |
 | `upgradePolicy` | [`UpgradePolicy`](#upgradepolicy) | No | `Automatic` | Controls how version upgrades are applied. Enum: `Automatic`, `Manual`. |
 | `maintenanceWindow` | [`MaintenanceWindow`](#maintenancewindow) | No | — | Time window for automatic upgrades. Only used when `upgradePolicy` is `Automatic`. |
+| `maintenance` | [`MaintenanceSpec`](#maintenancespec) | No | — | Configures PeerDB maintenance mode for graceful upgrades. When set, the operator pauses mirrors before upgrading and resumes them after. |

 ### PeerDBClusterStatus

@@ -205,6 +206,16 @@ Defines a time window during which automatic upgrades may be applied.
 | `end` | `string` | **Yes** | — | End time in 24-hour `HH:MM` format. |
 | `timeZone` | `*string` | No | `UTC` | IANA timezone name (e.g., `America/New_York`). |

+### MaintenanceSpec
+
+Configuration for PeerDB maintenance mode during upgrades. When configured, the operator runs maintenance Jobs (`ghcr.io/peerdb-io/flow-maintenance`) to gracefully pause all mirrors before upgrading and resume them after.
+
+| Field | Type | Required | Default | Description |
 | `startedAt` | `*metav1.Time` | Timestamp when the upgrade started. |
 | `message` | `string` | Human-readable message about the upgrade state. |

@@ -361,6 +372,7 @@ The following condition types are used in `PeerDBCluster` status:
 | `Degraded` | Set to `True` when one or more components are unhealthy but the cluster is partially operational. |
 | `UpgradeInProgress` | Set to `True` when a version upgrade is in progress. |
 | `BackupSafe` | Whether it is safe to take a backup. `True` when no upgrade or rolling restart is in progress. `False` with reason `BackupInProgress` when the `peerdb.io/backup-in-progress` annotation is set, or `BackupUnsafe` when an upgrade/rollout is active. |
+| `MaintenanceMode` | Set to `True` when PeerDB maintenance mode is active (mirrors are paused for an upgrade). Set to `False` with reason `MaintenanceComplete` after mirrors are resumed. |

 ### Annotations

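As a rough illustration of how a client might read the new `MaintenanceMode` condition, here is a minimal Python sketch. It assumes the conditions live under `status.conditions` in the usual Kubernetes shape (`type`, `status`, `reason`); the helper name `maintenance_active` and the sample status are hypothetical and are not taken from the operator.

```python
from typing import Optional


def maintenance_active(status: dict) -> Optional[bool]:
    """Return True/False from the MaintenanceMode condition, or None if it is absent.

    Assumption: `status` is the PeerDBCluster status as a dict, with conditions
    stored under "conditions" in the standard Kubernetes condition shape.
    """
    for cond in status.get("conditions", []):
        if cond.get("type") == "MaintenanceMode":
            # Condition statuses are the strings "True", "False", or "Unknown".
            return cond.get("status") == "True"
    return None


# Hypothetical status after mirrors have been resumed.
example_status = {
    "conditions": [
        {"type": "UpgradeInProgress", "status": "True"},
        {"type": "MaintenanceMode", "status": "False", "reason": "MaintenanceComplete"},
    ]
}

print(maintenance_active(example_status))  # prints False
```

Returning `None` when the condition is missing keeps "never configured" distinct from "maintenance finished".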
@@ -383,9 +395,11 @@ The `UpgradeStatus.phase` field tracks progress through a rolling upgrade:
 |-------|-------------|
 | `Waiting` | Upgrade is pending (e.g., waiting for a maintenance window). |
 | `Blocked` | Upgrade is blocked (e.g., manual policy requires acknowledgement). |
+| `StartMaintenance` | Running the StartMaintenance Job to pause mirrors before upgrade. |
 | `Config` | Updating shared ConfigMap and configuration. |
 | `InitJobs` | Re-running init jobs if needed. |
 | `FlowAPI` | Rolling out the Flow API Deployment. |
 | `PeerDBServer` | Rolling out the PeerDB Server Deployment. |
 | `UI` | Rolling out the PeerDB UI Deployment. |
+| `EndMaintenance` | Running the EndMaintenance Job to resume mirrors after upgrade. |
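The phase table implies an ordering that depends on whether `spec.maintenance` is set. A minimal Python sketch of that ordering follows. The function `upgrade_phases` is a hypothetical helper, not part of the operator's code, and it assumes `Waiting` and `Blocked` are gates before the rollout rather than rollout steps.

```python
from typing import List

# Rollout phases in the order listed in the phases table.
CORE_PHASES = ["Config", "InitJobs", "FlowAPI", "PeerDBServer", "UI"]


def upgrade_phases(maintenance_configured: bool) -> List[str]:
    """Return the rollout phases in order (hypothetical helper).

    The maintenance phases wrap the core rollout only when
    `spec.maintenance` is configured on the cluster.
    """
    if not maintenance_configured:
        return list(CORE_PHASES)
    return ["StartMaintenance", *CORE_PHASES, "EndMaintenance"]


print(" -> ".join(upgrade_phases(True)))
```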
docs/runbooks/safe-upgrade.md (+46 -1: 46 additions, 1 deletion)
@@ -70,10 +70,11 @@ For more control, use the manual upgrade policy:
 The controller enforces a specific rollout order to minimize disruption:

 ```
-ConfigMap/Secrets → Init Jobs → Flow API → PeerDB Server → UI
+[StartMaintenance →] ConfigMap/Secrets → Init Jobs → Flow API → PeerDB Server → UI [→ EndMaintenance]
 ```

 Each step must complete successfully before the next begins. This ensures:
+- Mirrors are gracefully paused before any component restarts (when `spec.maintenance` is configured).
 - Configuration is propagated before any component restarts.
 - The Flow API (gRPC backend) is ready before the Server and UI that depend on it.
 - The UI is upgraded last since it's the least critical component.
@@ -102,6 +103,48 @@ spec:
 - Remove or omit `maintenanceWindow` to allow upgrades at any time.
 - If `timeZone` is not specified, it defaults to UTC.

+## Maintenance Mode
+
+PeerDB has a built-in maintenance mode that gracefully pauses all running mirrors before an upgrade and resumes them after. The operator integrates this via Kubernetes Jobs:
+
+```yaml
+apiVersion: peerdb.peerdb.io/v1alpha1
+kind: PeerDBCluster
+metadata:
+  name: peerdb
+spec:
+  version: "v0.37.0"
+  maintenance: {}
+  # ... rest of spec
+```
+
+When `spec.maintenance` is set, the upgrade flow becomes:
+
+1. **StartMaintenance** — A Job runs using the `flow-maintenance` image with `start` command. This triggers PeerDB's `StartMaintenance` Temporal workflow, which waits for running snapshots, enables maintenance mode (`PEERDB_MAINTENANCE_MODE_ENABLED`), and pauses all running mirrors.
+2. **Normal upgrade** — Config, init jobs, Flow API, Server, and UI are rolled out in order.
+3. **EndMaintenance** — A Job runs with the `end` command, resuming all previously paused mirrors and disabling maintenance mode.
+
+While maintenance mode is active, mirrors cannot be created or mutated through PeerDB.
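To make the StartMaintenance and EndMaintenance steps more concrete, here is a hedged Python sketch that builds a `batch/v1` Job manifest for one step. Only the image name `ghcr.io/peerdb-io/flow-maintenance` and the `start`/`end` commands come from the docs in this diff. The Job name, the image tag following `spec.version`, the use of `args`, and the `restartPolicy` are assumptions for illustration and may not match what the operator generates.

```python
import json

# Image name as given in the API reference; everything else below is illustrative.
MAINTENANCE_IMAGE = "ghcr.io/peerdb-io/flow-maintenance"


def maintenance_job(cluster: str, version: str, action: str) -> dict:
    """Build a batch/v1 Job manifest for one maintenance step (hypothetical helper)."""
    if action not in ("start", "end"):
        raise ValueError(f"unknown maintenance action: {action!r}")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        # Assumption: naming scheme invented for this sketch.
        "metadata": {"name": f"{cluster}-maintenance-{action}"},
        "spec": {
            "template": {
                "spec": {
                    # Jobs require Never or OnFailure; Never is an assumption here.
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "maintenance",
                            # Assumption: the image tag follows spec.version.
                            "image": f"{MAINTENANCE_IMAGE}:{version}",
                            # Assumption: the command is passed as a container arg.
                            "args": [action],
                        }
                    ],
                }
            }
        },
    }


print(json.dumps(maintenance_job("peerdb", "v0.37.0", "start"), indent=2))
```

Rejecting anything other than `start` or `end` up front keeps a typo from producing a Job that runs an unintended command.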