This document covers how agents execute work: the runtime, tools, context building, and the worker model.
Sabbatical uses Google ADK (Agent Development Kit) as the agent runtime. LLM calls are routed through OpenRouter via ADK's LiteLLM adapter.
When the dispatcher claims a task, it spawns a worker coroutine that:
- Loads the agent, task, comments, and organization from the database.
- Builds the context payload (system prompt + user message).
- Creates an ADK runner with the agent's tools.
- Iterates over runner events, recording steps and flushing comments.
- After completion, applies routing based on the agent's last comment.
The context is assembled in four blocks, structured to maximize LLM prompt caching.
Block A is a fixed template injected into every agent's system prompt. It covers:
- How the comment thread works
- Private vs. public work (tool calls are private, comments are public)
- Available tools and their usage
- The `add_comment` protocol (single-mode, last-comment routing)
- Handoff rules (last valid `@tag` routing, thread-based fallback)
- Comment formatting guidelines (plain prose, no headers, no markdown formatting beyond lists and code blocks)
- Iteration budget awareness
- Error handling instructions (document blockers, escalate)
This block is identical across all agents and runs.
Block B is injected for organizational awareness:
- Organization name and description/purpose
- Agent roster rendered as a hierarchical ASCII tree showing names and descriptions
lead_dev — Lead developer, owns code quality and architecture
├── backend_dev — Backend Python specialist, APIs and database
└── test_writer — Testing specialist, unit and integration tests
Block C is the agent's specific profile:
- Agent name
- Full instructions content (read from the instructions file)
- Boss and direct reports
- Guidance to delegate to subordinates and escalate to boss
- Workspace path
- Iteration budget for this run
Block D is the only part that changes per task:
- Task ID, title, and organization
- Full task description
- Comment thread history (all comments chronologically, with author and timestamp)
- Instructions to act and post comments with `@tag` for routing
Caching benefit: Blocks A-C form a stable prefix that LLM providers can cache across runs. Block D at the bottom ensures the cache stays valid even as the task evolves.
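The block ordering can be sketched as follows. This is a minimal illustration, not the real `build_context_payload()`: the `ContextPayload` type, the `build_payload_sketch` name, and the assumption that Blocks A-C go into the system prompt while Block D goes into the user message are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContextPayload:
    system_prompt: str  # Blocks A-C: assumed stable across runs of one agent
    user_message: str   # Block D: changes with every task and comment


def build_payload_sketch(protocol: str, org_block: str, agent_block: str,
                         task_block: str) -> ContextPayload:
    """Join the stable blocks first so the prompt prefix stays byte-identical."""
    system_prompt = "\n\n".join([protocol, org_block, agent_block])
    return ContextPayload(system_prompt=system_prompt, user_message=task_block)


first = build_payload_sketch("PROTOCOL", "ORG", "AGENT lead_dev", "TASK 1: fix bug")
second = build_payload_sketch("PROTOCOL", "ORG", "AGENT lead_dev", "TASK 2: add tests")
```

Here `first` and `second` share the same `system_prompt`, which is the property a provider-side prefix cache relies on. Only the trailing task block differs.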
All workspace tools are sandboxed to the organization's workspace directory. Path validation ensures no file operations can escape the workspace boundary.
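One common way to enforce such a boundary is to resolve every requested path and check that the result still lies under the workspace root. The sketch below assumes that approach; the function name `resolve_in_workspace` is hypothetical and the actual validation may differ.

```python
from pathlib import Path


def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    """Resolve user_path against workspace and reject anything outside it."""
    root = workspace.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise ValueError(f"path escapes workspace: {user_path}")
    return candidate
```

Absolute paths such as `/etc/passwd` are rejected by the same check, because joining an absolute path onto the root discards the root.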
Read files from the workspace with multiple modes:
| Mode | Parameters | Description |
|---|---|---|
| `view` | `path` | Display full file content with line numbers. If `path` is a directory, lists entries. |
| `lines` | `path`, `start_line`, `end_line` | Show a line range (1-indexed). |
| `search` | `path`, `search_pattern` | Search for a regex pattern. If `path` is a directory, searches recursively. Uses `grep -rn`. |
| `find` | `path` | List files and directories recursively (up to 200 entries). |
Write content to a file. Creates parent directories if needed. Returns the byte count written.
Make targeted edits without rewriting entire files:
| Command | Parameters | Description |
|---|---|---|
| `str_replace` | `path`, `old_str`, `new_str` | Replace `old_str` with `new_str`. `old_str` must appear exactly once. |
| `insert` | `path`, `new_str`, `line` | Insert text at a specific line number (1-indexed). |
| `undo_edit` | `path` | Revert the last edit to this file. Only one level of undo is supported. |
The editor maintains an in-memory undo history per file path within a single run.
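The exactly-once rule and the single-level undo can be illustrated with an in-memory stand-in. The class below is a hypothetical sketch that keeps files in a dictionary instead of on disk. It covers only `str_replace` and `undo_edit`, and the error messages are invented.

```python
class EditorSketch:
    """In-memory stand-in for the edit tool; files map path -> text."""

    def __init__(self, files: dict[str, str]):
        self.files = files
        self._undo: dict[str, str] = {}  # one saved version per path

    def str_replace(self, path: str, old_str: str, new_str: str) -> None:
        content = self.files[path]
        count = content.count(old_str)
        if count != 1:
            # Zero or multiple matches are ambiguous, so refuse to edit.
            raise ValueError(f"old_str must appear exactly once, found {count}")
        self._undo[path] = content  # overwrite: only one undo level is kept
        self.files[path] = content.replace(old_str, new_str, 1)

    def undo_edit(self, path: str) -> None:
        if path not in self._undo:
            raise ValueError("nothing to undo for this path")
        self.files[path] = self._undo.pop(path)
```

Because the undo history lives in memory, it would not survive past the end of a run, which matches the per-run scope described above.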
Execute a shell command in the workspace directory:
- Default timeout: 120 seconds
- Returns stdout, stderr (prefixed with `[stderr]`), and exit code
- Runs with `shell=True` for full shell syntax support
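A rough sketch of such a tool using the standard library's `subprocess.run` is shown below. The function name `run_shell_sketch`, the exact output layout, and the timeout message are assumptions for illustration. Only the 120-second default, the `[stderr]` prefix, and `shell=True` come from the description above.

```python
import subprocess


def run_shell_sketch(command: str, cwd: str = ".", timeout: float = 120.0) -> str:
    """Run command through the shell and format stdout, stderr, and exit code."""
    try:
        proc = subprocess.run(command, shell=True, cwd=cwd,
                              capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"[error] command timed out after {timeout} seconds"
    parts = []
    if proc.stdout:
        parts.append(proc.stdout.rstrip("\n"))
    if proc.stderr:
        parts.append("[stderr] " + proc.stderr.rstrip("\n"))
    parts.append(f"exit code: {proc.returncode}")
    return "\n".join(parts)
```

Running with `shell=True` is what allows pipes, redirects, and `&&` chains, at the cost of relying on the workspace sandbox rather than argument quoting for safety.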
The only way agents can write to the task's comment thread. A single-mode tool — post a comment and keep working.
All comments are written to the database immediately when flushed (after each LLM event). There is no "final" comment concept — every comment is equal. When the agent's execution ends (the LLM stops producing tool calls, or max iterations is hit), the routing engine reads the agent's last comment and extracts the last valid @tag for routing.
- No tag validation at comment time. Tags are validated at routing time.
- No `is_final` parameter. No double-submission guards.
- Agents can post as many comments as they want throughout execution.
Comments are queued in thread_state and flushed to the database by the worker event loop after each LLM event.
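The queue-then-flush pattern might look like the sketch below. The `ThreadStateSketch` fields, the tool's return string, and the use of a plain list as a stand-in for the database are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadStateSketch:
    pending: list[str] = field(default_factory=list)  # queued, not yet stored
    comment_count: int = 0                            # comments flushed so far


def add_comment(state: ThreadStateSketch, text: str) -> str:
    """Tool side: queue the comment and return immediately."""
    state.pending.append(text)
    return "comment queued"


def flush_comments(state: ThreadStateSketch, db: list[str]) -> int:
    """Worker side: move every pending comment into the database stand-in."""
    flushed = len(state.pending)
    db.extend(state.pending)
    state.pending.clear()
    state.comment_count += flushed
    return flushed
```

Keeping the tool call free of database writes means the agent never blocks on storage. The worker loop decides when comments become visible.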
- Load agent, task, comments, and organization records from the database.
- Validate that the workspace directory exists.
- Build the context payload via `build_context_payload()`.
- Determine the model (agent-specific override or system default).
- Build the set of valid routing targets (all active agent names + "user").
- Create the ADK runner with tools and initial heartbeat.
The worker iterates over ADK runner events:
- LLM reasoning: Extract text from event content parts. Record as an `llm_reasoning` step.
- Tool calls: Extract function calls. Record each as a `tool_call` step with tool name and arguments.
- Comment flushing: Check `thread_state` for pending `add_comment` calls. Flush all pending comments to the database.
- Token counting: Aggregate input and output tokens from `usage_metadata`.
- Heartbeat: Update `last_heartbeat` on the run record. Check `cancel_requested` flag.
- Iteration limit: If `iteration_count >= max_iterations`, raise `MaxIterationsExceeded`.
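The per-event bookkeeping above can be sketched over plain dictionaries instead of real ADK events. Everything here is a simplification under stated assumptions: the event shape, the `process_events` name, and counting one iteration per event are hypothetical. Comment flushing and heartbeats are omitted.

```python
class MaxIterationsExceeded(Exception):
    """Raised when the run uses up its iteration budget."""


def process_events(events: list[dict], max_iterations: int):
    steps = []
    tokens = {"input": 0, "output": 0}
    iteration_count = 0
    for event in events:
        if event.get("text"):
            steps.append({"type": "llm_reasoning", "text": event["text"]})
        for call in event.get("tool_calls", []):
            steps.append({"type": "tool_call", "name": call["name"],
                          "args": call["args"]})
        usage = event.get("usage", {})
        tokens["input"] += usage.get("input", 0)
        tokens["output"] += usage.get("output", 0)
        # Assumption: each event counts as one iteration against the budget.
        iteration_count += 1
        if iteration_count >= max_iterations:
            raise MaxIterationsExceeded(f"{iteration_count} iterations")
    return steps, tokens
```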
Steps are flushed to the database after each step, enabling real-time progress visibility.
On successful completion:
- Flush any remaining pending comments.
- If the agent posted no comments (`comment_count == 0`), insert a system note: "[SYSTEM: Agent completed execution without submitting a response.]"
- Compute cost via `openrouter_cost()` (uses LiteLLM pricing tables).
- Update run: `status='success'`, fill token counts, cost, and execution steps.
- Call `handle_routing()` to route the task based on the last valid `@tag` in the agent's last comment.
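Extracting the last valid tag from a comment could look like the sketch below. The tag syntax (an `@` followed by word characters) and the function name `last_valid_tag` are assumptions. The thread-based fallback used by `handle_routing()` when no valid tag is found is not shown.

```python
import re
from typing import Optional

# Assumed tag syntax: "@" followed by letters, digits, or underscores.
TAG_PATTERN = re.compile(r"@(\w+)")


def last_valid_tag(comment: str, valid_targets: set[str]) -> Optional[str]:
    """Return the last tag in comment that names a valid routing target."""
    valid = [tag for tag in TAG_PATTERN.findall(comment) if tag in valid_targets]
    return valid[-1] if valid else None


targets = {"lead_dev", "backend_dev", "user"}
chosen = last_valid_tag("Thanks @backend_dev, over to @lead_dev and @ghost", targets)
```

In this example `chosen` is `lead_dev`: `@ghost` appears later in the comment but is skipped because it is not a valid target.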
| Error | Run Status | Task Status | System Comment |
|---|---|---|---|
| `MaxIterationsExceeded` | `success` | routed normally | "Agent reached iteration limit (N iterations)" |
| `RunTimedOut` | `failed` | `failed` | Timeout message with duration |
| `CancelledError` | `preempted` | unchanged (already handled by preempt/cancel) | - |
| Generic exception | `failed` | `failed` | Sanitized error message |
Max iterations vs. timeout: Max iterations means the agent did useful work but used up its iteration budget — remaining comments are flushed, a system note is posted, and routing proceeds normally using the agent's last comment. Timeout usually means something is stuck (hung shell command, infinite loop), so the task is marked as failed with no routing attempted.
Error sanitization converts technical errors into user-friendly messages:
- Context window / token errors -> "Context window exceeded"
- 429 / rate limit -> "LLM rate limit reached"
- Connection / network errors -> "Network error"
- Others -> First 120 chars with pointer to `run view`
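The mapping above can be sketched as a simple substring classifier. The matched keywords, the function name `sanitize_error`, and the wording of the fallback pointer are assumptions. Only the three friendly messages and the 120-character limit come from the list above.

```python
def sanitize_error(exc: Exception) -> str:
    """Map a technical exception to a short user-facing message."""
    message = str(exc)
    lowered = message.lower()
    if "context window" in lowered or "token" in lowered:
        return "Context window exceeded"
    if "429" in lowered or "rate limit" in lowered:
        return "LLM rate limit reached"
    if "connection" in lowered or "network" in lowered:
        return "Network error"
    # Fallback: truncate and point the user at the detailed run view.
    return f"{message[:120]} (see `run view` for details)"
```

Order matters in a classifier like this: a rate-limit message that also mentions tokens would be caught by the first branch, so a real implementation would need more specific patterns.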
Every event triggers `_heartbeat_and_check_cancel()`:
- Updates `last_heartbeat` to current UTC time.
- Reads `cancel_requested` from the database.
- If `cancel_requested` is set, raises `asyncio.CancelledError`.
This enables:
- Liveness detection: The dispatcher identifies orphaned workers (heartbeat > 60s stale) and marks them as failed.
- Cooperative cancellation: User preemption and dispatcher shutdown set the flag; workers check it on every event and self-terminate cleanly.
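Both sides of this mechanism can be sketched with a dictionary standing in for the run record. The sketch is synchronous for brevity, and the function names and the `stale_after_s` parameter are hypothetical. The real check presumably reads and writes the database.

```python
import asyncio
from datetime import datetime, timedelta, timezone


def heartbeat_and_check_cancel(run: dict) -> None:
    """Worker side: record liveness, then honor a pending cancel request."""
    run["last_heartbeat"] = datetime.now(timezone.utc)
    if run.get("cancel_requested"):
        raise asyncio.CancelledError()


def is_orphaned(run: dict, now: datetime, stale_after_s: float = 60.0) -> bool:
    """Dispatcher side: a heartbeat older than the threshold marks an orphan."""
    return (now - run["last_heartbeat"]).total_seconds() > stale_after_s


run = {"cancel_requested": False}
heartbeat_and_check_cancel(run)
later = run["last_heartbeat"] + timedelta(seconds=61)
```

With these values, `is_orphaned(run, later)` is true because the heartbeat is 61 seconds old. Setting `cancel_requested` makes the next heartbeat call raise instead of returning.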
Cost is computed using LiteLLM's cost_per_token() function:
`cost = openrouter_cost(model, input_tokens, output_tokens)`

If the model isn't in LiteLLM's pricing table, cost defaults to 0.0. Costs are stored per run and aggregated at query time for tasks, agents, organizations, and the system.
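The fallback behavior can be illustrated without LiteLLM by using a small price table. The sketch below is not the real `openrouter_cost()`: the table, the model name `example/model`, and the per-token prices are invented placeholders.

```python
# Hypothetical per-token prices in USD: (input price, output price).
PRICES_PER_TOKEN = {"example/model": (1e-06, 2e-06)}


def cost_sketch(model: str, input_tokens: int, output_tokens: int) -> float:
    """Price a run from token counts; unknown models cost 0.0."""
    prices = PRICES_PER_TOKEN.get(model)
    if prices is None:
        return 0.0  # mirrors the documented default for unpriced models
    input_price, output_price = prices
    return input_tokens * input_price + output_tokens * output_price
```

One consequence of the 0.0 default is that aggregated costs silently undercount runs on models missing from the pricing table.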
The model used for a run is determined by:
- Agent-specific model (if set via `agent edit --model`): Takes priority.
- System default (`config.llm.default_model`): Falls back to this if the agent has no override.
The default model is minimax/minimax-m2.7. Models are specified in OpenRouter format (e.g., anthropic/claude-3-5-sonnet-20241022).
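The precedence rule reduces to a short fallback. The function name `resolve_model` is hypothetical, and treating an empty string the same as an unset override is an assumption.

```python
from typing import Optional


def resolve_model(agent_model: Optional[str], default_model: str) -> str:
    """Prefer the agent's override; otherwise use the system default."""
    return agent_model if agent_model else default_model


override = resolve_model("anthropic/claude-3-5-sonnet-20241022", "minimax/minimax-m2.7")
fallback = resolve_model(None, "minimax/minimax-m2.7")
```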