feat(watchdog): alert when upstream sync is stuck (fork far behind)#426

Merged
Lexus2016 merged 1 commit into main from evolution/watchdog-upstream-lag-alert
Jun 21, 2026
Conversation

@Lexus2016

Owner

Why

Root-cause prevention for the upstream-sync stall just cleared in #405/#403. The daily evolution-upstream-sync reported "ok" every day because it escalated (filed an [UPSTREAM] issue), but it never merged, so the fork silently fell to 301 commits behind and nobody noticed for days. The watchdog's check_jobs only sees that the job ran, not that nothing landed.

What

Add check_upstream_lag() to scripts/evolution_watchdog.py: it reads the real distance to upstream/main (git rev-list --count HEAD..upstream/main) and alerts the owner (via the watchdog's normal chat delivery) when the fork exceeds the auto-merge ceiling (UPSTREAM_BEHIND_ALERT = 80 — the same threshold the sync uses to decide auto vs escalate).

  • The watchdog runs as a no_agent script copied outside the repo, so _resolve_repo_dir() locates the repo (EVOLUTION_REPO_DIR env → in-tree → common install/agent-clone paths).
  • Fail-safe / silent when the repo or upstream/main ref can't be found, or git errors — never a false alarm from a missing remote.
  • Wired into main() alongside the existing checks.
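For readers without the diff open, the shape of the check can be sketched roughly as follows. This is an illustrative approximation, not the code merged in this PR: the function names `resolve_repo_dir`, `commits_behind` and `upstream_lag_message`, the injectable `run` parameter, the `candidates` argument, the 30-second timeout, and the `.git` existence test are all assumptions. Only the git command, the `EVOLUTION_REPO_DIR` variable, and the threshold of 80 come from the description above.

```python
import os
import subprocess
from pathlib import Path
from typing import Callable, Iterable, Optional

UPSTREAM_BEHIND_ALERT = 80  # threshold stated in the PR description


def resolve_repo_dir(candidates: Iterable[str] = ()) -> Optional[Path]:
    """Return the first directory that looks like a git checkout, else None.

    EVOLUTION_REPO_DIR is tried first; `candidates` is a hypothetical
    stand-in for the in-tree and install/agent-clone paths.
    """
    env = os.environ.get("EVOLUTION_REPO_DIR")
    paths = ([env] if env else []) + list(candidates)
    for raw in paths:
        path = Path(raw).expanduser()
        if (path / ".git").exists():
            return path
    return None


def commits_behind(repo_dir: Path, run: Callable = subprocess.run) -> Optional[int]:
    """Count commits on upstream/main that HEAD lacks; None on any failure."""
    try:
        proc = run(
            ["git", "rev-list", "--count", "HEAD..upstream/main"],
            cwd=repo_dir, capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None  # git not installed, spawn error, or timeout
    if proc.returncode != 0:
        return None  # e.g. the upstream/main ref does not exist
    try:
        return int(proc.stdout.strip())
    except ValueError:
        return None  # output was not a number


def upstream_lag_message(repo_dir: Optional[Path],
                         run: Callable = subprocess.run) -> Optional[str]:
    """Return an alert string only when the fork is strictly over the ceiling."""
    if repo_dir is None:
        return None  # unresolved repo: stay silent
    behind = commits_behind(repo_dir, run)
    if behind is None or behind <= UPSTREAM_BEHIND_ALERT:
        return None
    return (f"Fork is {behind} commits behind upstream/main "
            f"(alert threshold {UPSTREAM_BEHIND_ALERT}).")
```

Every failure path returns `None` rather than raising, which is what keeps a missing remote from turning into a false alarm. Treating exactly 80 commits behind as "no alert" is an assumption read from the word "exceeds"; the PR's at-threshold test case pins down the actual behavior.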

With this, the next time the sync freezes, the owner is pinged within a day instead of discovering it weeks later.

Tests

tests/scripts/test_evolution_watchdog.py::TestUpstreamLag — 7 cases: over/at/under threshold, git failure, garbage output, spawn error, unresolved repo. Full module: 35 passed.
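The git-dependent cases listed above can be exercised without a real repository by patching `subprocess.run`. The sketch below shows one way to lay that out as a table. It is not the contents of `TestUpstreamLag`, and `lag_alert_needed` is a compact hypothetical stand-in for the function under test, assumed to alert only when strictly over the threshold. The seventh case (unresolved repo) does not involve git and is omitted here.

```python
import subprocess
from types import SimpleNamespace
from unittest import mock

THRESHOLD = 80  # mirrors UPSTREAM_BEHIND_ALERT from the PR description


def lag_alert_needed() -> bool:
    """Compact stand-in for the check under test (not the PR's code)."""
    try:
        proc = subprocess.run(
            ["git", "rev-list", "--count", "HEAD..upstream/main"],
            capture_output=True, text=True, timeout=30,
        )
        return proc.returncode == 0 and int(proc.stdout.strip()) > THRESHOLD
    except (OSError, subprocess.SubprocessError, ValueError):
        return False  # fail-safe: any error means no alert


def result(stdout="", returncode=0):
    """Build a minimal object shaped like subprocess.CompletedProcess."""
    return SimpleNamespace(stdout=stdout, returncode=returncode)


# (case name, kwargs for the mocked subprocess.run, expected alert?)
CASES = [
    ("over threshold", dict(return_value=result("301\n")), True),
    ("at threshold", dict(return_value=result("80\n")), False),
    ("under threshold", dict(return_value=result("12\n")), False),
    ("git failure", dict(return_value=result("", 128)), False),
    ("garbage output", dict(return_value=result("not-a-number")), False),
    ("spawn error", dict(side_effect=FileNotFoundError("git")), False),
]


def run_cases():
    """Run each case with subprocess.run patched; map name -> passed?"""
    outcomes = {}
    for name, patch_kwargs, expected in CASES:
        with mock.patch("subprocess.run", **patch_kwargs):
            outcomes[name] = lag_alert_needed() == expected
    return outcomes
```

Calling `run_cases()` returns a dict keyed by case name, which makes it easy to see which scenario regressed if the fail-safe behavior changes.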

Note

This is the watchdog (visibility) half of the follow-up. The other half — making the sync incremental (merge to the last conflict-free point each day so one far conflict can't freeze it) — is a larger behavioral change to the upstream-sync skill, deferred separately.

Root-cause prevention for the 2026-06-19 upstream-sync stall: the daily
sync ran 'ok' every day (it escalated with an [UPSTREAM] issue) but never
MERGED, so the fork silently fell to 301 commits behind and nobody noticed
for days. check_jobs only sees the job ran, not that nothing landed.

Add check_upstream_lag(): reads the real distance to upstream/main
(git rev-list --count HEAD..upstream/main) and alerts the owner when the
fork exceeds the auto-merge ceiling (UPSTREAM_BEHIND_ALERT=80). The
watchdog runs as a no_agent script outside the repo, so _resolve_repo_dir()
locates it (EVOLUTION_REPO_DIR env, in-tree, or common install paths) and
the check is silent when the repo/upstream remote can't be found — never a
false alarm. 7 tests (over/under/at threshold, git failure, garbage output,
spawn error, unresolved repo).
@github-actions

Contributor

🔎 Lint report: evolution/watchdog-upstream-lag-alert vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 11559 on HEAD, 11556 on base (🆕 +3)

🆕 New issues (2):

Rule Count
unresolved-attribute 2
First entries
run_agent.py:3223: [unresolved-attribute] unresolved-attribute: Object of type `Self@get_credits_spent_micros` has no attribute `_credits_session_start_micros`
tests/run_agent/test_credits_notices_toggle.py:76: [unresolved-attribute] unresolved-attribute: Unresolved attribute `_credits_session_start_micros` on type `AIAgent`

✅ Fixed issues (1):

Rule Count
invalid-assignment 1
First entries
tests/run_agent/test_credits_notices_toggle.py:76: [invalid-assignment] invalid-assignment: Object of type `None` is not assignable to attribute `_credits_session_start_micros` of type `int`

Unchanged: 6081 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@Lexus2016 Lexus2016 merged commit aa0b498 into main Jun 21, 2026
39 checks passed
@Lexus2016 Lexus2016 deleted the evolution/watchdog-upstream-lag-alert branch June 21, 2026 10:35