.claude/skills/pgro-status/SKILL.md (2 changes: 2 additions, 0 deletions)
@@ -36,6 +36,8 @@ Look for:
- Restore objects: each replica should have **exactly one** `Active` restore in steady state. A transient `Pending` / `Restoring` / `Ready` / `Switching` restore is normal during a cycle. More than one `Active` indicates the sweep isn't pruning.
- Pending pod count > 0 is worth digging into before reporting healthy: it could be a scheduling problem (Karpenter, taints, resource pressure).
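
The restore-count rule can be checked mechanically once the restore objects have been fetched. The Python below is a minimal sketch, not part of the skill or the operator: `active_restore_problems` is a hypothetical helper, and it assumes each restore has already been reduced to a mapping with hypothetical `replica` and `phase` keys (the real resource's field layout is not shown here). It counts only `Active` restores per replica and reports every replica whose count is not exactly one.

```python
from collections import Counter
from typing import Iterable, Mapping


def active_restore_problems(
    replicas: Iterable[str],
    restores: Iterable[Mapping[str, str]],
) -> dict[str, int]:
    """Map each replica whose Active-restore count is not exactly 1 to that count.

    Hypothetical input shape: each restore is a mapping with a "replica" key
    (the owning replica's name) and a "phase" key (its status phase).
    """
    # Pending / Restoring / Ready / Switching restores are normal mid-cycle,
    # so only Active restores are counted. Steady state is exactly one.
    active = Counter(r["replica"] for r in restores if r["phase"] == "Active")
    return {name: active[name] for name in replicas if active[name] != 1}


if __name__ == "__main__":
    restores = [
        {"replica": "a", "phase": "Active"},
        {"replica": "b", "phase": "Active"},
        {"replica": "b", "phase": "Active"},
        {"replica": "c", "phase": "Restoring"},
    ]
    print(active_restore_problems(["a", "b", "c"], restores))  # {'b': 2, 'c': 0}
```

A count above one matches the "sweep isn't pruning" symptom. A count of zero falls outside what the rule describes, so treat it as something to look into rather than a diagnosis.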

**A `Ready` phase replica is not necessarily healthy.** `Ready` only means the operator's switchover state machine is at rest; the previous restore is still serving traffic. If `consecutiveRestoreFailures > 0` and growing, *every restore attempt since the last good one has failed*, so the data is staler than its `lastRestoreCompletedAt` claims. To users, "the replica isn't working" usually means the data is days behind, not that connections are refused. Always cross-check `consecutiveRestoreFailures` against `lastRestoreCompletedAt` and the replica's expected cadence before calling a `Ready` replica healthy.
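
The cross-check can be sketched in Python. This is a minimal sketch under stated assumptions, not the operator's own logic. `ReplicaStatus`, `staleness_warnings`, and the 1.5x `grace` factor are hypothetical. The expected cadence has to be supplied by whoever knows the replica's schedule. Spotting a *growing* failure count needs an earlier reading, passed in as `previous_failures`. The function deliberately ignores the phase, because `Ready` alone says nothing about freshness.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class ReplicaStatus:
    # Hypothetical snake_case mirror of the status fields named above.
    consecutive_restore_failures: int
    last_restore_completed_at: datetime


def staleness_warnings(
    status: ReplicaStatus,
    cadence: timedelta,
    now: datetime,
    previous_failures: Optional[int] = None,
    grace: float = 1.5,  # hypothetical slack before a restore counts as overdue
) -> List[str]:
    """Reasons a replica may be serving stale data, whatever its phase says."""
    warnings: List[str] = []
    failures = status.consecutive_restore_failures
    if failures > 0:
        warnings.append(f"{failures} consecutive restore failure(s)")
        # "Growing" needs two observations, so compare with an earlier reading if given.
        if previous_failures is not None and failures > previous_failures:
            warnings.append("failure count is growing")
    # Compare the time since the last completed restore with the expected cadence.
    age = now - status.last_restore_completed_at
    if age > cadence * grace:
        warnings.append(f"last restore completed {age} ago; expected roughly every {cadence}")
    return warnings


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    status = ReplicaStatus(
        consecutive_restore_failures=3,
        last_restore_completed_at=now - timedelta(days=4),
    )
    for warning in staleness_warnings(status, timedelta(days=1), now, previous_failures=2):
        print(warning)
```

An empty list is not proof of health. It only means neither signal tripped under the assumed cadence and grace factor.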

### Phase 2: per-replica detail

For each replica that looks off, and whenever a thorough check is requested, fetch the key status fields and conditions: