
docs(skill): flag that Ready phase isn't necessarily healthy #63

Merged
passcod merged 1 commit into main from skill-ready-not-healthy on Jun 5, 2026


passcod commented Jun 5, 2026


🤖

Summary

Adds a note to the `pgro-status` skill's Phase 1 overview that a replica in `Ready` phase isn't automatically healthy: `Ready` only means the operator's switchover state machine is at rest, not that data is fresh. A replica with `consecutiveRestoreFailures > 0` and growing is serving increasingly stale data even though its phase looks fine.

This came out of a recent incident where I reported "the dev replicas are healthy" based on the Phase column alone, missing that they hadn't had a successful restore in 28–38h. To users, "the replica isn't working" usually means stale data, not refused connections, so the skill should explicitly tell future agents to cross-check the failure counter against `lastRestoreCompletedAt` and the schedule.
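The cross-check can be sketched as a small function. This is a minimal Python sketch, not part of the `pgro-status` skill or the operator: the `ReplicaStatus` type, `assess_replica`, the `grace_intervals` threshold and the 12h restore interval in the usage example are all hypothetical. Only the `Ready` phase and the `consecutiveRestoreFailures` / `lastRestoreCompletedAt` fields come from this PR. Detecting a *growing* counter needs two observations over time, which this single-snapshot check does not attempt.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ReplicaStatus:
    """Hypothetical subset of a replica's status; fields mirror the names in this PR."""
    phase: str                                   # e.g. "Ready"
    consecutive_restore_failures: int            # consecutiveRestoreFailures
    last_restore_completed_at: Optional[datetime]  # lastRestoreCompletedAt


def assess_replica(
    status: ReplicaStatus,
    restore_interval: timedelta,
    now: datetime,
    grace_intervals: int = 2,  # hypothetical: tolerate this many missed restores
) -> list[str]:
    """Return concerns about a replica; an empty list means no staleness signal found.

    A Ready phase alone is not treated as healthy: the failure counter and the
    age of the last successful restore are checked against the schedule too.
    """
    concerns: list[str] = []
    if status.phase != "Ready":
        concerns.append(f"phase is {status.phase}, not Ready")
    if status.consecutive_restore_failures > 0:
        concerns.append(f"{status.consecutive_restore_failures} consecutive restore failures")
    if status.last_restore_completed_at is None:
        concerns.append("no successful restore recorded")
    else:
        age = now - status.last_restore_completed_at
        # Stale if more than `grace_intervals` scheduled restores have been missed.
        if age > restore_interval * grace_intervals:
            hours = age.total_seconds() / 3600
            concerns.append(f"last successful restore {hours:.0f}h ago")
    return concerns


if __name__ == "__main__":
    now = datetime(2026, 6, 5, 12, tzinfo=timezone.utc)
    stale = ReplicaStatus("Ready", 4, now - timedelta(hours=28))
    print(assess_replica(stale, timedelta(hours=12), now))
```

Run against a `Ready` replica whose last successful restore was 28h ago, the sketch reports both the failure counter and the restore age, even though the phase alone looks fine.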

A Ready replica can have a stale Active restore and a growing
consecutiveRestoreFailures counter — every recent restore attempt has
failed, so its data is older than lastRestoreCompletedAt suggests. To
users "the replica isn't working" usually means stale data, not
refused connections. Note this in the pgro-status skill's overview
checks so future agents cross-reference the failure counter against
the last successful restore before declaring a Ready replica healthy.
passcod merged commit b952fc8 into main Jun 5, 2026
17 checks passed
passcod deleted the skill-ready-not-healthy branch June 5, 2026 13:34