fix(gastown): restore agent working status on heartbeat after dispatch timeout race (#1359)
* fix(gastown): restore agent working status on heartbeat after dispatch timeout race (#1358)
Three compounding fixes for the 5-minute bead reset cycle caused by a
timing race between startAgentInContainer's 60s timeout and slow cold
starts:
1. touchAgent restores idle→working on heartbeat — a heartbeat is proof
the agent is alive in the container regardless of its recorded status.
2. reconcileBeads Rule 3 checks last_activity_at freshness — defense in
depth so an agent with a recent heartbeat is never treated as lost,
even if its status field is wrong.
3. dispatchAgent !started path no longer sets agent to idle — leaves it
working so the reconciler doesn't reset the bead. reconcileAgents
catches truly dead agents after 90s of missing heartbeats.
Closes #1358
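The heartbeat rule in fix 1 can be sketched roughly as follows. This is a minimal illustration, not the actual gastown code: the `HeartbeatAgent` shape and the pure-function signature of `touchAgent` are assumptions, since the commit only names the function and the idle→working transition.

```typescript
// Hypothetical record shape; the real gastown agent type is not shown in this commit.
interface HeartbeatAgent {
  id: string;
  status: string; // e.g. "idle" or "working"
  last_activity_at: string; // ISO 8601 timestamp
}

// On heartbeat: always refresh last_activity_at, and if the recorded status
// is "idle", restore it to "working". Any other status is left unchanged.
function touchAgent(agent: HeartbeatAgent, now: Date = new Date()): HeartbeatAgent {
  return {
    ...agent,
    status: agent.status === "idle" ? "working" : agent.status,
    last_activity_at: now.toISOString(),
  };
}
```

Returning a new record rather than mutating in place is only a choice made for this sketch; the real implementation presumably writes to the Durable Object's storage.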
* fix(gastown): add cold start grace period for container_status not_found
The container status pre-phase polls /agents/:id/status on every alarm
tick. During a cold start (git clone + worktree), the agent hasn't
registered in the process manager yet, so the container returns 404.
This was immediately setting the agent to idle, undoing the dispatch
timeout fix.
Add a 3-minute grace period for not_found status: if the agent was
dispatched recently (last_activity_at is less than 3 minutes old),
ignore the 404.
Truly dead agents are still caught by reconcileAgents after 90s of
missing heartbeats.
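A minimal sketch of the grace-period check, assuming last_activity_at is stored as an ISO 8601 string. The names `NOT_FOUND_GRACE_MS` and `shouldIgnoreNotFound` are hypothetical and do not come from the commit.

```typescript
// Hypothetical constant: the 3-minute cold start grace period from the commit message.
const NOT_FOUND_GRACE_MS = 3 * 60 * 1000;

// Returns true when a 404 from the container status endpoint should be ignored
// because the agent was active (dispatched) within the grace period.
function shouldIgnoreNotFound(lastActivityAt: string, nowMs: number): boolean {
  const lastMs = Date.parse(lastActivityAt);
  // An unparseable timestamp gets no grace: treat the 404 as real.
  if (Number.isNaN(lastMs)) return false;
  return nowMs - lastMs < NOT_FOUND_GRACE_MS;
}
```

Parsing both sides to epoch milliseconds keeps the comparison numeric and independent of the timestamp's string format.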
* fix(gastown): fix SQLite datetime comparison bug that prevented stuck bead recovery
reconcileBeads Rule 3 compared ISO 8601 timestamps (2026-03-21T05:55:50Z)
against SQLite datetime() output (2026-03-21 05:55:50). SQLite compares
these as plain strings, and since 'T' (ASCII 84) > ' ' (ASCII 32), the
comparison last_activity_at > datetime('now', '-90 seconds') was true for
any timestamp on the same UTC date, so the heartbeat check effectively
never expired. Rule 3 thought every hooked agent had a fresh heartbeat and
never recovered stuck in_progress beads.
Fix: use strftime('%Y-%m-%dT%H:%M:%fZ', ...) to produce ISO 8601 format
matching the stored timestamps.
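The effect of the format mismatch can be reproduced with plain string comparison, which orders characters the same way for these ASCII timestamps. The sketch below is illustrative only; `isFresh` is a hypothetical helper standing in for the SQL predicate last_activity_at > cutoff.

```typescript
// Hypothetical stand-in for the SQL predicate: a lexicographic string comparison.
function isFresh(lastActivityAt: string, cutoff: string): boolean {
  return lastActivityAt > cutoff;
}

const stored = "2026-03-21T05:55:50Z"; // ISO 8601, as stored
const datetimeCutoff = "2026-03-21 06:30:00"; // datetime() style, 34 minutes later
const strftimeCutoff = "2026-03-21T06:30:00.000Z"; // strftime('%Y-%m-%dT%H:%M:%fZ') style

// Buggy: the strings first differ at index 10, where 'T' sorts after ' ',
// so the stale timestamp looks fresh.
const buggyResult = isFresh(stored, datetimeCutoff); // true

// Fixed: both sides share the same format, so the hour digits decide.
const fixedResult = isFresh(stored, strftimeCutoff); // false
```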
Also: move invariant violation logging from console.error (spamming Workers
logs every 5s per town) to analytics events for observability dashboards.
Closes #1361
* fix(gastown): set refinery to idle after gt_done so next review can start immediately
The refinery's gt_done path unhooks the agent but doesn't set it to
idle. The refinery stays 'working' with no hook until agentCompleted
fires (when the container process exits, which can take 10-30s after
gt_done). During that time processReviewQueue sees the refinery as
non-idle and won't pop the next MR bead.
Set the refinery to idle immediately after unhooking in agentDone.
The container process continues running but the DO knows the refinery
is available for new reviews.
* fix(gastown): set working agents with no hook to idle in reconcileAgents
Working agents with fresh heartbeats but no hook are running in the
container doing nothing — gt_done already ran and unhooked them, or the
hook was cleared by another path. Without this, the refinery stays
'working' indefinitely (heartbeats keep it alive), blocking
processReviewQueue from dispatching it for the next review.
Also skip the mayor in the working-agent check (mayors are always
working with no hook — that's normal). This eliminates the invariant 7
false positive from #1364.
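A rough sketch of the reconcileAgents check described here. The `ReconcileAgent` fields (`role`, `status`, `hooked_bead_id`) and the helper name are assumptions made for illustration; the commit does not show the real schema.

```typescript
// Hypothetical agent shape for the reconciler; field names are assumed.
interface ReconcileAgent {
  role: string; // e.g. "mayor", "refinery"
  status: string; // e.g. "idle", "working"
  hooked_bead_id: string | null;
}

// True when a working agent with no hook should be set back to idle.
// Mayors are skipped: working with no hook is their normal state.
function shouldResetUnhookedAgent(agent: ReconcileAgent): boolean {
  if (agent.role === "mayor") return false;
  return agent.status === "working" && agent.hooked_bead_id === null;
}
```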
* style: run oxfmt formatter
* fix(gastown): guard agentCompleted idle transition against re-dispatched agents
agentCompleted unconditionally set the agent to idle, which could
clobber a live dispatch if the agent was re-hooked and dispatched
for new work between gt_done and the container's completion callback.
Add a guard: don't set to idle if the agent is working AND has a
hook (re-dispatched). Only set to idle if the agent is working with
no hook (gt_done completed, waiting for process exit) or already idle.
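The guard can be expressed as a small predicate. This is a sketch under assumed names: `CompletedAgent`, `hooked_bead_id`, and `shouldSetIdleOnCompletion` are hypothetical and only mirror the conditions stated above.

```typescript
// Hypothetical agent shape; the real fields are not shown in this commit.
interface CompletedAgent {
  status: string; // e.g. "idle", "working"
  hooked_bead_id: string | null;
}

// Decide whether the completion callback may set the agent to idle.
// Allowed: working with no hook (gt_done ran, process just exited), or already idle.
// Blocked: working with a hook, which means the agent was re-dispatched.
function shouldSetIdleOnCompletion(agent: CompletedAgent): boolean {
  const finishedAndUnhooked = agent.status === "working" && agent.hooked_bead_id === null;
  return finishedAndUnhooked || agent.status === "idle";
}
```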
* style: format review-queue.ts