feature: add heartbeat loop for periodic health checks #29

Merged
benvinegar merged 3 commits into main from benvinegar/heartbeat-loop
Feb 17, 2026
Conversation

@benvinegar
Member

What

Adds a periodic heartbeat loop to the control agent so Baudbot becomes proactive instead of purely reactive.

New files

  • pi/extensions/heartbeat.ts — pi extension that runs a setTimeout timer, reads ~/.pi/agent/HEARTBEAT.md on each tick, and injects it as a follow-up prompt
  • pi/skills/control-agent/HEARTBEAT.md — default checklist deployed to ~/.pi/agent/HEARTBEAT.md

How it works

  1. Extension starts a timer on session_start (default: 10 min interval)
  2. On each tick, reads HEARTBEAT.md — if empty/missing, skips silently (zero token cost)
  3. If content exists, injects it as a followUp message with triggerTurn: true
  4. Agent processes the checklist, takes action on anything broken, responds briefly if healthy
  5. Timer re-arms after each fire
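
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in pi/extensions/heartbeat.ts: the `TickDeps` interface, `tick`, and `startLoop` are hypothetical names, and `send` is only a stand-in for however the extension actually injects a followUp message.

```typescript
// Hypothetical message shape; the real pi extension API is not shown in this PR.
type FollowUp = { content: string; triggerTurn: boolean };

interface TickDeps {
  readChecklist: () => string | null; // returns null when HEARTBEAT.md is missing
  send: (msg: FollowUp) => void;      // stand-in for the follow-up injection call
}

// Runs one heartbeat tick. Returns true only if a prompt was injected.
function tick(deps: TickDeps): boolean {
  const text = deps.readChecklist();
  if (text === null || text.trim() === "") {
    return false; // empty or missing file: skip silently, no tokens spent
  }
  deps.send({ content: text, triggerTurn: true });
  return true;
}

// Arms a one-shot timer and re-arms it after every fire.
// Returns a function that stops the loop.
function startLoop(deps: TickDeps, intervalMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let stopped = false;
  const arm = (): void => {
    if (stopped) return;
    timer = setTimeout(() => {
      try {
        tick(deps);
      } finally {
        arm(); // re-arm even if the tick threw
      }
    }, intervalMs);
  };
  arm();
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

A one-shot setTimeout that re-arms itself, rather than setInterval, lets the delay change between ticks, which is what makes a per-tick backoff possible.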

Default checklist checks

  • Agent sessions alive (sentry-agent, orphaned dev-agents)
  • Slack bridge responsive
  • Email monitor running
  • Stale worktrees with no matching active todo
  • Stuck todos (in-progress >2h, no dev-agent)

Features

  • Error backoff: 2x exponential per consecutive failure, max 1 hour — prevents token burn
  • Min interval floor: 2 minutes hard limit
  • heartbeat tool: status / pause / resume / trigger / config
  • State persisted via appendEntry — run count survives session restarts
  • Configurable: HEARTBEAT_INTERVAL_MS, HEARTBEAT_FILE, HEARTBEAT_ENABLED env vars
  • Deploy: deploy.sh copies HEARTBEAT.md (always overwrites — admin-managed)
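
To make the interval floor and the backoff cap concrete, here is a small sketch using the numbers quoted above (10 min default, 2 min floor, 2x per consecutive failure, 1 hour cap). The PR names `computeBackoffMs` and `resolveConfig`, but their real signatures are not shown, so the signature below and the `resolveIntervalMs` helper are assumptions. Falling back to the default on unparsable input is also an assumption.

```typescript
const DEFAULT_INTERVAL_MS = 10 * 60 * 1000; // 10 minutes
const MIN_INTERVAL_MS = 2 * 60 * 1000;      // 2-minute hard floor
const MAX_BACKOFF_MS = 60 * 60 * 1000;      // 1-hour cap

// Hypothetical helper: parses a HEARTBEAT_INTERVAL_MS-style value.
// Assumed behavior: unset or unparsable input falls back to the default.
function resolveIntervalMs(raw: string | undefined): number {
  const parsed = raw === undefined ? NaN : parseInt(raw, 10);
  if (isNaN(parsed) || parsed <= 0) return DEFAULT_INTERVAL_MS;
  return Math.max(parsed, MIN_INTERVAL_MS); // enforce the floor
}

// Assumed signature: doubles the delay per consecutive error, capped at 1 hour.
function computeBackoffMs(intervalMs: number, consecutiveErrors: number): number {
  if (consecutiveErrors <= 0) return intervalMs;
  // Bound the exponent so the multiplier stays a finite, exact number.
  const factor = Math.pow(2, Math.min(consecutiveErrors, 30));
  return Math.min(intervalMs * factor, MAX_BACKOFF_MS);
}
```

With the default interval, the delay under this sketch goes 10 min, 20 min, 40 min, and then stays at the 1-hour cap from the third consecutive failure onward.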

Inspired by

OpenClaw's HEARTBEAT.md + cron service pattern, simplified for our architecture (no separate CronService, no isolated sessions, no job persistence — just a timer + Markdown file).

Docs updated

  • AGENTS.md — repo layout
  • README.md — new Heartbeat section + architecture diagram
  • CONFIGURATION.md — new env vars
  • Control agent skill — heartbeat section + startup checklist

New heartbeat.ts extension — a periodic timer that reads HEARTBEAT.md
and injects it as a follow-up prompt (default: every 10 min).

- Configurable via env vars (HEARTBEAT_INTERVAL_MS, HEARTBEAT_FILE,
  HEARTBEAT_ENABLED)
- Error backoff: exponential delay on consecutive failures (2x per
  error, max 1 hour) to prevent token burn
- heartbeat tool: status/pause/resume/trigger/config actions
- Default checklist checks agent sessions, Slack bridge, email
  monitor, stale worktrees, and stuck todos
- If HEARTBEAT.md is empty or missing, no heartbeat fires (zero cost)
- deploy.sh deploys HEARTBEAT.md (always overwrites — admin-managed)

Inspired by OpenClaw's HEARTBEAT.md pattern.
Comment thread pi/extensions/heartbeat.ts Outdated
Tests cover all pure functions:
- readHeartbeatFile: missing/empty/comments-only/headings-only/valid
- resolveConfig: interval parsing, minimum floor, file path, defaults
- isDisabledByEnv: all boolean-ish values (0/false/no/1/true/yes/null)
- computeBackoffMs: exponential progression, MAX cap, monotonicity
- Deploy checklist: HEARTBEAT.md exists and has actionable content
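
For illustration, the two simplest cases in that list might look like the sketch below. It is not the tested code. `isDisabledByEnv` is named in the commit message, but treating an unset value as "enabled" is an assumption, and `hasActionableContent` is a hypothetical pure stand-in for the content check inside `readHeartbeatFile`. It assumes "comments" means HTML comments in the Markdown file.

```typescript
// Assumed semantics for a HEARTBEAT_ENABLED-style value:
// only an explicit 0/false/no disables the heartbeat.
function isDisabledByEnv(value: string | null | undefined): boolean {
  if (value === null || value === undefined) return false; // unset: enabled
  const v = value.trim().toLowerCase();
  return v === "0" || v === "false" || v === "no";
}

// Hypothetical helper: true if the checklist has at least one line that is
// neither blank, a Markdown heading, nor inside an HTML comment.
function hasActionableContent(text: string): boolean {
  const withoutComments = text.replace(/<!--[\s\S]*?-->/g, "");
  const lines = withoutComments.split("\n");
  for (const line of lines) {
    const t = line.trim();
    if (t === "") continue;                 // blank line
    if (/^#{1,6}(\s|$)/.test(t)) continue;  // heading only
    return true;                            // something the agent can act on
  }
  return false;
}
```

Keeping these checks pure (string in, boolean out) is what makes the table-style tests above cheap to write, since no file system or timer needs to be mocked.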
Comment thread pi/extensions/heartbeat.ts Outdated
Bug: if pi.sendMessage() or saveState() threw, the exception propagated
uncaught from the setTimeout callback. The armTimer() call at the end
was never reached, permanently killing the heartbeat loop after a single
failure. The consecutiveErrors counter was also never incremented, making
the backoff machinery dead code.

Fix:
- try/catch/finally around the entire fire path
- Success resets consecutiveErrors to 0
- Catch increments consecutiveErrors (drives exponential backoff)
- finally always calls armTimer() so the loop never dies
- saveState() in catch is best-effort (nested try/catch)
- Removed stale comment referencing nonexistent agent_end handler

Added 6 tests simulating the error handling contract.
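
The contract described in the fix can be sketched as below. This is a simplified model rather than the actual diff: `HeartbeatState`, `FireDeps`, and `fire` are hypothetical names, and the three callbacks stand in for `pi.sendMessage()`, `saveState()`, and `armTimer()`.

```typescript
interface HeartbeatState {
  consecutiveErrors: number;
  runCount: number;
}

interface FireDeps {
  send: () => void;      // stand-in for pi.sendMessage()
  saveState: () => void; // stand-in for saveState()
  armTimer: () => void;  // schedules the next tick
}

// One fire of the heartbeat: count errors, never let the loop die.
function fire(state: HeartbeatState, deps: FireDeps): void {
  try {
    deps.send();
    state.runCount += 1;
    state.consecutiveErrors = 0; // success resets the backoff
    deps.saveState();
  } catch {
    state.consecutiveErrors += 1; // drives exponential backoff
    try {
      deps.saveState();
    } catch {
      // best-effort: a failed save must not escape the handler
    }
  } finally {
    deps.armTimer(); // always re-arm, whatever happened above
  }
}
```

Because `armTimer()` sits in `finally`, a throw from either call still schedules the next tick, which is the behavior the original code lacked.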
@benvinegar benvinegar merged commit ef7313e into main Feb 17, 2026
7 of 8 checks passed