Agent Execution

This document covers how agents execute work: the runtime, tools, context building, and the worker model.

Runtime

Sabbatical uses Google ADK (Agent Development Kit) as the agent runtime. LLM calls are routed through OpenRouter via ADK's LiteLLM adapter.

When the dispatcher claims a task, it spawns a worker coroutine that:

  1. Loads the agent, task, comments, and organization from the database.
  2. Builds the context payload (system prompt + user message).
  3. Creates an ADK runner with the agent's tools.
  4. Iterates over runner events, recording steps and flushing comments.
  5. After completion, applies routing based on the agent's last comment.

Context Payload

The context is assembled in four blocks, structured to maximize LLM prompt caching.

Block A: System Rules (Static)

A fixed template injected into every agent's system prompt. It covers:

  • How the comment thread works
  • Private vs. public work (tool calls are private, comments are public)
  • Available tools and their usage
  • The add_comment protocol (single-mode, last-comment routing)
  • Handoff rules (last valid @tag routing, thread-based fallback)
  • Comment formatting guidelines (plain prose, no headers, no markdown formatting beyond lists and code blocks)
  • Iteration budget awareness
  • Error handling instructions (document blockers, escalate)

This block is identical across all agents and runs.

Block B: Organization Topology (Static per org)

Injected for organizational awareness:

  • Organization name and description/purpose
  • Agent roster rendered as a hierarchical ASCII tree showing names and descriptions:

```
lead_dev — Lead developer, owns code quality and architecture
├── backend_dev — Backend Python specialist, APIs and database
└── test_writer — Testing specialist, unit and integration tests
```

Block C: Agent Identity (Static per agent)

The agent's specific profile:

  • Agent name
  • Full instructions content (read from the instructions file)
  • Boss and direct reports
  • Guidance to delegate to subordinates and escalate to boss
  • Workspace path
  • Iteration budget for this run

Block D: Task Briefing (Dynamic)

The only part that changes per task:

  • Task ID, title, and organization
  • Full task description
  • Comment thread history (all comments chronologically, with author and timestamp)
  • Instructions to act and post comments with @tag for routing

Caching benefit: Blocks A-C form a stable prefix that LLM providers can cache across runs. Because Block D sits at the bottom, the cached prefix remains valid even as the task evolves.
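The block ordering can be sketched as follows. This is a hypothetical illustration of `build_context_payload()`; the `render_*` helpers and template strings are made-up stand-ins, not the real implementation.

```python
# Hypothetical sketch of the four-block payload assembly. The render helpers
# and template contents are illustrative, not the real implementation.

SYSTEM_RULES = "You are an agent. Tool calls are private; comments are public."

def render_topology(org):
    return f"Organization: {org['name']} - {org['description']}"

def render_identity(agent):
    return f"You are {agent['name']}. Instructions: {agent['instructions']}"

def render_briefing(task, comments):
    thread = "\n".join(f"[{c['author']}] {c['body']}" for c in comments)
    return f"Task {task['id']}: {task['title']}\n{task['description']}\n{thread}"

def build_context_payload(org, agent, task, comments):
    # Blocks A-C are static, so they form a cacheable system-prompt prefix;
    # Block D (the only part that changes per task) becomes the user message.
    system_prompt = "\n\n".join([
        SYSTEM_RULES,            # Block A: identical for every run
        render_topology(org),    # Block B: static per organization
        render_identity(agent),  # Block C: static per agent
    ])
    user_message = render_briefing(task, comments)  # Block D: dynamic
    return system_prompt, user_message
```

The key property is that the system prompt is byte-identical across runs for the same agent, so the provider's prompt cache hits on every run.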

Tools

Workspace Tools

All workspace tools are sandboxed to the organization's workspace directory. Path validation ensures no file operations can escape the workspace boundary.
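The boundary check can be sketched with `pathlib`. This is a minimal sketch assuming Python 3.9+; the function name and exact validation logic are illustrative, not the real code.

```python
from pathlib import Path

def resolve_in_workspace(workspace: str, relative: str) -> Path:
    """Resolve a tool-supplied path and reject anything outside the workspace.

    A minimal sketch of the boundary check; the real validation may differ.
    """
    root = Path(workspace).resolve()
    candidate = (root / relative).resolve()  # collapses "..", symlinks, etc.
    if not candidate.is_relative_to(root):   # Python 3.9+
        raise ValueError(f"path escapes workspace: {relative}")
    return candidate
```

Resolving before comparing is what defeats `../`-style traversal: the comparison runs against the canonical absolute path, not the raw string.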

file_read

Read files from the workspace with multiple modes:

| Mode | Parameters | Description |
| --- | --- | --- |
| `view` | `path` | Display full file content with line numbers. If `path` is a directory, lists entries. |
| `lines` | `path`, `start_line`, `end_line` | Show a line range (1-indexed). |
| `search` | `path`, `search_pattern` | Search for a regex pattern. If `path` is a directory, searches recursively. Uses `grep -rn`. |
| `find` | `path` | List files and directories recursively (up to 200 entries). |
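The 1-indexed, inclusive semantics of the `lines` mode can be sketched as a slice. This is a hypothetical helper for illustration, not the tool's actual code.

```python
def file_read_lines(text: str, start_line: int, end_line: int) -> str:
    """Illustrative 'lines' mode: inclusive, 1-indexed line range."""
    lines = text.splitlines()
    # Shift to 0-indexed slicing; the end bound is inclusive, so no -1 there.
    return "\n".join(lines[start_line - 1:end_line])
```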

file_write

Write content to a file. Creates parent directories if needed. Returns the byte count written.

editor

Make targeted edits without rewriting entire files:

| Command | Parameters | Description |
| --- | --- | --- |
| `str_replace` | `path`, `old_str`, `new_str` | Replace `old_str` with `new_str`. `old_str` must appear exactly once. |
| `insert` | `path`, `new_str`, `line` | Insert text at a specific line number (1-indexed). |
| `undo_edit` | `path` | Revert the last edit to this file. Only one level of undo is supported. |

The editor maintains an in-memory undo history per file path within a single run.
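The `str_replace` uniqueness rule and single-level undo can be sketched as below. Helper and variable names are assumptions, not the real implementation.

```python
# Illustrative sketch of str_replace with one level of undo, keeping an
# in-memory history per path for the duration of a single run.

_undo_history: dict[str, str] = {}  # path -> content before the last edit

def str_replace(path: str, old_str: str, new_str: str) -> None:
    with open(path, encoding="utf-8") as f:
        content = f.read()
    # The target must be unambiguous: exactly one occurrence.
    if content.count(old_str) != 1:
        raise ValueError(f"old_str occurs {content.count(old_str)} times; expected 1")
    _undo_history[path] = content  # only one level of undo is kept
    with open(path, "w", encoding="utf-8") as f:
        f.write(content.replace(old_str, new_str))

def undo_edit(path: str) -> None:
    if path not in _undo_history:
        raise ValueError("nothing to undo for this file")
    with open(path, "w", encoding="utf-8") as f:
        f.write(_undo_history.pop(path))
```

Requiring exactly one occurrence forces the agent to supply enough surrounding context to make the edit unambiguous, which is what makes targeted edits safe without rewriting the whole file.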

shell

Execute a shell command in the workspace directory:

  • Default timeout: 120 seconds
  • Returns stdout, stderr (prefixed with [stderr]), and exit code
  • Runs with shell=True for full shell syntax support
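The contract above maps directly onto `subprocess.run`. A minimal sketch, assuming the output format (stderr prefix, exit-code line) shown in the bullets:

```python
import subprocess

def shell(command: str, workspace: str, timeout: int = 120) -> str:
    """Run a command in the workspace; a sketch of the shell tool's contract."""
    result = subprocess.run(
        command,
        shell=True,           # full shell syntax (pipes, globs, &&) is allowed
        cwd=workspace,
        capture_output=True,
        text=True,
        timeout=timeout,      # default 120 seconds, per the tool description
    )
    parts = [result.stdout]
    if result.stderr:
        parts.append(f"[stderr] {result.stderr}")  # stderr is prefixed
    parts.append(f"exit code: {result.returncode}")
    return "\n".join(parts)
```

On timeout, `subprocess.run` raises `TimeoutExpired`, which the worker would surface as a tool error rather than hanging the run.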

Communication Tool

add_comment

The only way agents can write to the task's comment thread. A single-mode tool — post a comment and keep working.

All comments are written to the database immediately when flushed (after each LLM event). There is no "final" comment concept — every comment is equal. When the agent's execution ends (the LLM stops producing tool calls, or max iterations is hit), the routing engine reads the agent's last comment and extracts the last valid @tag for routing.

  • No tag validation at comment time. Tags are validated at routing time.
  • No is_final parameter. No double-submission guards.
  • Agents can post as many comments as they want throughout execution.

Comments are queued in thread_state and flushed to the database by the worker event loop after each LLM event.
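The last-comment, last-valid-tag rule can be sketched as a small scan. The function name and regex are illustrative assumptions; only the behavior (validate at routing time, last valid tag wins) comes from the text above.

```python
import re
from typing import Optional

def extract_route(last_comment: str, valid_targets: set) -> Optional[str]:
    """Return the last valid @tag in the agent's final comment, if any.

    A sketch of last-comment routing: tags are only validated here, at
    routing time, never when the comment is posted.
    """
    tags = re.findall(r"@([A-Za-z0-9_]+)", last_comment)
    for tag in reversed(tags):  # scan backwards: the last *valid* tag wins
        if tag in valid_targets:
            return tag
    return None                 # no valid tag -> fall back (e.g. thread-based)
```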

Worker Lifecycle

Setup Phase

  1. Load agent, task, comments, and organization records from the database.
  2. Validate that the workspace directory exists.
  3. Build the context payload via build_context_payload().
  4. Determine the model (agent-specific override or system default).
  5. Build the set of valid routing targets (all active agent names + "user").
  6. Create the ADK runner with tools and initial heartbeat.

Execution Loop

The worker iterates over ADK runner events:

  1. LLM reasoning: Extract text from event content parts. Record as an llm_reasoning step.
  2. Tool calls: Extract function calls. Record each as a tool_call step with tool name and arguments.
  3. Comment flushing: Check thread_state for pending add_comment calls. Flush all pending comments to the database.
  4. Token counting: Aggregate input and output tokens from usage_metadata.
  5. Heartbeat: Update last_heartbeat on the run record. Check cancel_requested flag.
  6. Iteration limit: If iteration_count >= max_iterations, raise MaxIterationsExceeded.

Steps are flushed to the database after each step, enabling real-time progress visibility.
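The six-step loop above can be condensed into a sketch. Event shape, helper names, and the stub bodies are all assumptions; the real ADK event API differs.

```python
# Condensed sketch of the worker event loop over runner events. Event and
# helper names are illustrative stand-ins for the real ADK/DB interfaces.

class MaxIterationsExceeded(Exception):
    pass

def flush_pending_comments(run):      # stub: would write queued comments to DB
    run["comments_flushed"] = run.get("comments_flushed", 0) + 1

def heartbeat_and_check_cancel(run):  # stub: would update last_heartbeat in DB
    if run.get("cancel_requested"):
        raise RuntimeError("cancelled")

def run_events(events, run, max_iterations):
    steps, iteration_count = [], 0
    for event in events:
        iteration_count += 1
        if event.get("text"):                       # 1. LLM reasoning
            steps.append({"type": "llm_reasoning", "text": event["text"]})
        for call in event.get("tool_calls", []):    # 2. tool calls
            steps.append({"type": "tool_call", "tool": call["name"],
                          "args": call["args"]})
        flush_pending_comments(run)                 # 3. comment flushing
        run["tokens"] += event.get("tokens", 0)     # 4. token counting
        heartbeat_and_check_cancel(run)             # 5. heartbeat + cancel flag
        if iteration_count >= max_iterations:       # 6. iteration limit
            raise MaxIterationsExceeded(iteration_count)
    return steps
```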

Completion

On successful completion:

  1. Flush any remaining pending comments.
  2. If the agent posted no comments (comment_count == 0), insert a system note: "[SYSTEM: Agent completed execution without submitting a response.]"
  3. Compute cost via openrouter_cost() (uses LiteLLM pricing tables).
  4. Update run: status='success', fill token counts, cost, and execution steps.
  5. Call handle_routing() to route the task based on the last valid @tag in the agent's last comment.

Error Handling

| Error | Run Status | Task Status | System Comment |
| --- | --- | --- | --- |
| `MaxIterationsExceeded` | `success` | routed normally | "Agent reached iteration limit (N iterations)" |
| `RunTimedOut` | `failed` | `failed` | Timeout message with duration |
| `CancelledError` | `preempted` | unchanged (already handled by preempt/cancel) | - |
| Generic exception | `failed` | `failed` | Sanitized error message |

Max iterations vs. timeout: Max iterations means the agent did useful work but used up its iteration budget — remaining comments are flushed, a system note is posted, and routing proceeds normally using the agent's last comment. Timeout usually means something is stuck (hung shell command, infinite loop), so the task is marked as failed with no routing attempted.

Error sanitization converts technical errors into user-friendly messages:

  • Context window / token errors -> "Context window exceeded"
  • 429 / rate limit -> "LLM rate limit reached"
  • Connection / network errors -> "Network error"
  • Others -> First 120 chars with pointer to run view
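The mapping above can be sketched as a chain of substring checks. The function name and exact patterns are assumptions; the real matcher is likely broader.

```python
def sanitize_error(exc: Exception, run_id: int) -> str:
    """Map technical errors to user-friendly messages; a sketch of the rules
    above. Matching is order-sensitive: the first rule that fires wins."""
    msg = str(exc).lower()
    if "context window" in msg or "token" in msg:
        return "Context window exceeded"
    if "429" in msg or "rate limit" in msg:
        return "LLM rate limit reached"
    if "connection" in msg or "network" in msg:
        return "Network error"
    # Fallback: truncate and point the user at the full run record.
    return f"{str(exc)[:120]} (see run {run_id} for details)"
```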

Heartbeat and Cancellation

Every event triggers _heartbeat_and_check_cancel():

  1. Updates last_heartbeat to current UTC time.
  2. Reads cancel_requested from the database.
  3. If cancel_requested is set, raises asyncio.CancelledError.

This enables:

  • Liveness detection: The dispatcher identifies orphaned workers (heartbeat > 60s stale) and marks them as failed.
  • Cooperative cancellation: User preemption and dispatcher shutdown set the flag; workers check it on every event and self-terminate cleanly.

Cost Computation

Cost is computed using LiteLLM's cost_per_token() function:

```python
cost = openrouter_cost(model, input_tokens, output_tokens)
```

If the model isn't in LiteLLM's pricing table, cost defaults to 0.0. Costs are stored per run and aggregated at query time for tasks, agents, organizations, and the system.
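The pricing lookup with its 0.0 fallback can be sketched as follows. The real implementation delegates to LiteLLM's pricing tables; the table and per-token prices below are made-up illustration values, not real OpenRouter pricing.

```python
PRICE_PER_TOKEN = {
    # model: (input USD/token, output USD/token) - hypothetical values
    "example/model-a": (1e-6, 3e-6),
}

def openrouter_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Sketch of the cost computation; unknown models default to 0.0."""
    prices = PRICE_PER_TOKEN.get(model)
    if prices is None:
        return 0.0  # model not in the pricing table
    in_price, out_price = prices
    return input_tokens * in_price + output_tokens * out_price
```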

Model Selection

The model used for a run is determined by:

  1. Agent-specific model (if set via agent edit --model): Takes priority.
  2. System default (config.llm.default_model): Falls back to this if the agent has no override.

The default model is minimax/minimax-m2.7. Models are specified in OpenRouter format (e.g., anthropic/claude-3-5-sonnet-20241022).