feat(watchdog): alert when upstream sync is stuck (fork far behind) #426
Merged
Root-cause prevention for the 2026-06-19 upstream-sync stall: the daily sync ran 'ok' every day (it escalated with an [UPSTREAM] issue) but never MERGED, so the fork silently fell to 301 commits behind and nobody noticed for days. check_jobs only sees the job ran, not that nothing landed. Add check_upstream_lag(): reads the real distance to upstream/main (git rev-list --count HEAD..upstream/main) and alerts the owner when the fork exceeds the auto-merge ceiling (UPSTREAM_BEHIND_ALERT=80). The watchdog runs as a no_agent script outside the repo, so _resolve_repo_dir() locates it (EVOLUTION_REPO_DIR env, in-tree, or common install paths) and the check is silent when the repo/upstream remote can't be found — never a false alarm. 7 tests (over/under/at threshold, git failure, garbage output, spawn error, unresolved repo).
Contributor
🔎 Lint report:

| Rule | Count |
|---|---|
| unresolved-attribute | 2 |
First entries
run_agent.py:3223: [unresolved-attribute] unresolved-attribute: Object of type `Self@get_credits_spent_micros` has no attribute `_credits_session_start_micros`
tests/run_agent/test_credits_notices_toggle.py:76: [unresolved-attribute] unresolved-attribute: Unresolved attribute `_credits_session_start_micros` on type `AIAgent`
✅ Fixed issues (1):
| Rule | Count |
|---|---|
| invalid-assignment | 1 |
First entries
tests/run_agent/test_credits_notices_toggle.py:76: [invalid-assignment] invalid-assignment: Object of type `None` is not assignable to attribute `_credits_session_start_micros` of type `int`
Unchanged: 6081 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
Why
Root-cause prevention for the upstream-sync stall just cleared in #405/#403. The daily `evolution-upstream-sync` ran "ok" every day: it escalated (filed an `[UPSTREAM]` issue) but never merged, so the fork silently fell to 301 commits behind and nobody noticed for days. The watchdog's `check_jobs` only sees that the job ran, not that nothing landed.
What
Add `check_upstream_lag()` to `scripts/evolution_watchdog.py`: it reads the real distance to `upstream/main` (`git rev-list --count HEAD..upstream/main`) and alerts the owner (via the watchdog's normal chat delivery) when the fork exceeds the auto-merge ceiling (`UPSTREAM_BEHIND_ALERT = 80`, the same threshold the sync uses to decide auto vs escalate).

- The watchdog runs as a `no_agent` script copied outside the repo, so `_resolve_repo_dir()` locates the repo (`EVOLUTION_REPO_DIR` env → in-tree → common install/agent-clone paths).
- Silent when the repo or the `upstream/main` ref can't be found, or git errors: never a false alarm from a missing remote.
- Called from `main()` alongside the existing checks.

With this, the next time the sync freezes, the owner is pinged within a day instead of discovering it weeks later.
Tests
`tests/scripts/test_evolution_watchdog.py::TestUpstreamLag`: 7 cases (over/at/under threshold, git failure, garbage output, spawn error, unresolved repo). Full module: 35 passed.
Note
This is the watchdog (visibility) half of the follow-up. The other half — making the sync incremental (merge to the last conflict-free point each day so one far conflict can't freeze it) — is a larger behavioral change to the upstream-sync skill, deferred separately.