Agent Orchestrator is a long-running Go daemon that supervises multiple parallel AI coding agent sessions. Each session runs in an isolated git worktree with its own runtime, while the daemon coordinates lifecycle, observes external state, and routes feedback.
- Mental Model
- System Overview
- Core Architectural Principles
- Component Architecture
- Data Flows
- Persistence and CDC
- Status Derivation
- Lifecycle Management
- Observation Loops
- HTTP Layer
- Terminal Multiplexing
The fundamental architecture follows a simple three-stage pipeline:
flowchart LR
A[OBSERVE<br/>External Facts] --> B[UPDATE<br/>Durable Facts]
B --> C[DERIVE<br/>Display Status / ACT]
Key insight: Display status is never stored. It is computed at read time from durable facts.
The only persistent session state is:
- activity_state — What the agent last reported (active, idle, waiting_input, exited)
- is_terminated — Whether the session should be treated as over
- PR facts — pr, pr_checks, pr_comment tables
Display statuses such as working, needs_input, ci_failed, and mergeable are computed at read time by the service layer from the durable facts above.
graph TB
subgraph Frontend
FE[Electron + React UI]
CLI[ao CLI]
end
subgraph HTTP["HTTP Daemon (127.0.0.1)"]
Controllers[REST Controllers]
SSE[SSE Events]
Terminal[Terminal WebSocket]
end
subgraph Core["Core Services"]
SessionSvc[Session Service]
ProjectSvc[Project Service]
PRSvc[PR Service]
ReviewSvc[Review Service]
SessionMgr[Session Manager]
LCM[Lifecycle Manager]
end
subgraph Observe["Observation Layer"]
SCMObserver[SCM Observer]
Reaper[Runtime Reaper]
end
subgraph Storage["Persistence Layer"]
SQLite[(SQLite DB)]
CDC[CDC Poller]
Broadcaster[Event Broadcaster]
end
subgraph Adapters["Adapters"]
AgentAdapter[Agent Adapters]
RuntimeAdapter[Runtime tmux/conpty]
WorkspaceAdapter[Workspace git worktree]
SCMAdapter[SCM GitHub]
end
FE -->|REST/SSE| Controllers
CLI -->|REST| Controllers
Controllers --> SessionSvc
Controllers --> ProjectSvc
Controllers --> PRSvc
SessionSvc --> SessionMgr
SessionMgr --> LCM
SessionMgr --> AgentAdapter
SessionMgr --> RuntimeAdapter
SessionMgr --> WorkspaceAdapter
LCM --> SQLite
LCM --> AgentAdapter
SCMObserver --> SCMAdapter
SCMObserver --> SQLite
SCMObserver --> LCM
Reaper --> RuntimeAdapter
Reaper --> SQLite
Reaper --> LCM
CDC -->|poll| SQLite
CDC --> Broadcaster
Broadcaster --> SSE
Terminal --> RuntimeAdapter
Core code never depends on concrete implementations. All external systems are accessed through port interfaces defined in backend/internal/ports/:
graph LR
Core[Core Services] -->|consumes| Ports[Port Interfaces]
Adapters[Adapters] -->|implement| Ports
External[External Systems] -->|wrapped by| Adapters
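The port pattern above can be sketched in Go as follows. This is a minimal illustration, not the actual contents of backend/internal/ports/: the `Runtime` interface, its two methods, `fakeRuntime`, and `launch` are all hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Runtime is a hypothetical port: the only view core code has of tmux/conpty.
type Runtime interface {
	Create(sessionID string) (handle string, err error)
	IsAlive(handle string) (bool, error)
}

// fakeRuntime is an illustrative in-memory adapter that implements the port.
type fakeRuntime struct {
	alive map[string]bool
}

func newFakeRuntime() *fakeRuntime {
	return &fakeRuntime{alive: map[string]bool{}}
}

func (f *fakeRuntime) Create(sessionID string) (string, error) {
	if sessionID == "" {
		return "", errors.New("empty session id")
	}
	handle := "rt-" + sessionID
	f.alive[handle] = true
	return handle, nil
}

func (f *fakeRuntime) IsAlive(handle string) (bool, error) {
	return f.alive[handle], nil
}

// launch stands in for core code: it compiles against the interface only,
// so any adapter (real or fake) can be passed in.
func launch(rt Runtime, sessionID string) (string, error) {
	handle, err := rt.Create(sessionID)
	if err != nil {
		return "", fmt.Errorf("create runtime: %w", err)
	}
	return handle, nil
}

func main() {
	rt := newFakeRuntime()
	handle, err := launch(rt, "s1")
	fmt.Println(handle, err)
}
```

Because `launch` never names a concrete type, swapping tmux for conpty (or for a test fake) would not require touching core code.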
The storage layer persists minimal facts; the service layer computes display status on demand:
flowchart LR
SQLite[(SQLite)] -->|raw facts| Service[Session Service]
Service -->|compute| Status[Display Status]
Service -->|enrich| UI[Dashboard/UI]
SQLite -->|activity_state| Service
SQLite -->|is_terminated| Service
SQLite -->|PR facts| Service
SQLite -->|runtime_handle| Service
Observation is separated from action:
- Observe layer — SCM Observer, Runtime Reaper poll external state
- Lifecycle layer — Reduces observations into durable facts
- Service layer — Computes display status from facts
All durable changes flow through a CDC pipeline:
flowchart LR
DB[(SQLite)] -->|triggers| ChangeLog[change_log table]
ChangeLog -->|tail| Poller[CDC Poller]
Poller -->|Event| Broadcaster[Event Broadcaster]
Broadcaster -->|fan-out| Subscribers[Subscribers]
Subscribers -->|SSE| Clients[Dashboard Clients]
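The fan-out stage of this pipeline might look roughly like the sketch below. The `Event` fields, the `Broadcaster` API, and the drop-when-full policy are assumptions made for illustration; the source does not specify how slow subscribers are handled.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical decoded change_log entry.
type Event struct {
	Seq   int64
	Table string
	Op    string
}

// Broadcaster fans each event out to every subscriber channel.
type Broadcaster struct {
	mu   sync.Mutex
	subs map[int]chan Event
	next int
}

func NewBroadcaster() *Broadcaster {
	return &Broadcaster{subs: map[int]chan Event{}}
}

// Subscribe returns a buffered channel and a cancel function that
// unregisters and closes it.
func (b *Broadcaster) Subscribe(buf int) (<-chan Event, func()) {
	b.mu.Lock()
	defer b.mu.Unlock()
	id := b.next
	b.next++
	ch := make(chan Event, buf)
	b.subs[id] = ch
	cancel := func() {
		b.mu.Lock()
		defer b.mu.Unlock()
		if c, ok := b.subs[id]; ok {
			delete(b.subs, id)
			close(c)
		}
	}
	return ch, cancel
}

// Publish delivers ev to every subscriber with buffer space and returns how
// many received it. Dropping on a full buffer is an assumed policy so that
// one slow client cannot stall the others.
func (b *Broadcaster) Publish(ev Event) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	delivered := 0
	for _, ch := range b.subs {
		select {
		case ch <- ev:
			delivered++
		default: // subscriber is behind; skip it
		}
	}
	return delivered
}

func main() {
	b := NewBroadcaster()
	sse, cancel := b.Subscribe(8)
	defer cancel()
	n := b.Publish(Event{Seq: 1, Table: "sessions", Op: "INSERT"})
	fmt.Println(n, <-sse)
}
```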
backend/internal/
├── domain/ # Shared vocabulary and durable fact records
├── ports/ # Inbound/outbound interfaces
├── service/ # Controller-facing services
│ ├── project/ # Project CRUD
│ ├── session/ # Session read-model assembly
│ ├── pr/ # PR observation service
│ └── review/ # Code review service
├── session_manager/ # Internal session command engine
├── lifecycle/ # Durable session fact reducer
├── observe/ # Observation loops
│ ├── scm/ # SCM (GitHub) observer
│ └── reaper/ # Runtime liveness observer
├── storage/ # SQLite persistence
│ └── sqlite/ # DB, migrations, queries, stores
├── cdc/ # Change-log poller and broadcaster
├── httpd/ # HTTP API, controllers, terminal mux
├── terminal/ # Terminal session protocol
├── adapters/ # Concrete adapter implementations
│ ├── agent/ # 23+ agent harnesses
│ ├── runtime/ # tmux/conpty runtimes
│ ├── workspace/ # git worktree
│ ├── scm/ # GitHub
│ └── tracker/ # GitHub tracker
├── daemon/ # Production wiring
└── config/ # Environment-based configuration
sequenceDiagram
participant UI as Dashboard
participant HTTP as HTTP Controller
participant Svc as Session Service
participant Mgr as Session Manager
participant LCM as Lifecycle Manager
participant Agent as Agent Adapter
participant Runtime as Runtime Adapter
participant WS as Workspace Adapter
participant DB as SQLite
participant CDC as CDC Broadcaster
UI->>HTTP: POST /sessions
HTTP->>Svc: Spawn(config)
Svc->>Mgr: Spawn(config)
Note over Mgr: 1. Create session row
Mgr->>DB: Insert session
DB->>CDC: trigger change_log
CDC->>UI: SSE session.created
Note over Mgr: 2. Create workspace
Mgr->>WS: Create(project, branch)
WS->>WS: git worktree add
Note over Mgr: 3. Launch runtime
Mgr->>Runtime: Create(session)
Runtime->>Runtime: Start tmux/conpty
Note over Mgr: 4. Start agent
Mgr->>Agent: GetLaunchCommand()
Agent-->>Mgr: launch command
Mgr->>Runtime: Execute(agent command)
Note over Mgr: 5. Mark spawned
Mgr->>LCM: MarkSpawned(handle)
LCM->>DB: Update activity_state
DB->>CDC: trigger change_log
CDC->>UI: SSE session.updated
Mgr-->>Svc: Session(created)
Svc-->>HTTP: Session response
HTTP-->>UI: 201 Created
flowchart TD
Start([User spawns session]) --> Validate[Validate project config]
Validate --> CreateRow[Create session row in SQLite]
CreateRow --> CreateWS[Create git worktree]
CreateWS --> CreateRT[Launch runtime tmux/conpty]
CreateRT --> GetCmd[Get agent launch command]
GetCmd --> ExecAgent[Execute agent in runtime]
ExecAgent --> MarkSpawned[MarkSpawned in LCM]
MarkSpawned --> Trigger1[CDC: session.created]
Trigger1 --> Trigger2[CDC: session.updated]
Trigger2 --> Done([Session running])
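The ordering in the spawn flow can be sketched as a list of stages run in sequence, stopping at the first failure. The `stage` type and the stub steps are hypothetical; the real session manager calls its adapters directly, and the source does not describe how a partially spawned session is handled.

```go
package main

import (
	"errors"
	"fmt"
)

// stage is one step of the spawn flow; the names mirror the flowchart above.
type stage struct {
	name string
	run  func() error
}

// runSpawn executes stages in order and stops at the first error,
// returning the names of the stages that completed.
func runSpawn(stages []stage) ([]string, error) {
	var done []string
	for _, s := range stages {
		if err := s.run(); err != nil {
			return done, fmt.Errorf("spawn failed at %q: %w", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

// spawnStages builds the pipeline from stub steps. failAt names a stage that
// should fail, or "" for none. The stubs stand in for real adapter calls.
func spawnStages(failAt string) []stage {
	names := []string{
		"validate project config",
		"create session row",
		"create git worktree",
		"launch runtime",
		"get agent launch command",
		"execute agent in runtime",
		"mark spawned",
	}
	stages := make([]stage, 0, len(names))
	for _, n := range names {
		n := n // capture a per-iteration copy for the closure
		stages = append(stages, stage{name: n, run: func() error {
			if n == failAt {
				return errors.New("stub failure")
			}
			return nil
		}})
	}
	return stages
}

func main() {
	done, err := runSpawn(spawnStages("launch runtime"))
	fmt.Println(len(done), err)
}
```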
flowchart TD
subgraph SCM["SCM Observer Loop"]
Poll1[Poll PRs every 30s]
Poll1 --> Fetch[Fetch from GitHub API]
Fetch --> Diff[Semantic diff vs local]
Diff --> Changed{Changed?}
Changed -->|Yes| WritePR[Write PR/check/comment]
Changed -->|No| Wait1[Wait for tick]
WritePR --> NotifyLCM[Notify Lifecycle Manager]
NotifyLCM --> Trigger1[CDC event]
Trigger1 --> Wait1
Wait1 --> Poll1
end
subgraph Reaper["Runtime Reaper Loop"]
Poll2[Poll every 5s]
Poll2 --> Probe[Probe each runtime]
Probe --> Report[Report fact to LCM]
Report --> Trigger2[CDC event]
Trigger2 --> Wait2[Wait for tick]
Wait2 --> Poll2
end
LCM[Lifecycle Manager] -->|consumes| NotifyLCM
LCM -->|consumes| Report
sequenceDiagram
participant SCM as SCM Observer
participant LCM as Lifecycle Manager
participant Agent as Agent Adapter
participant Runtime as Runtime Adapter
SCM->>SCM: Observe PR comment
SCM->>LCM: ApplySCMObservation()
LCM->>LCM: Detect actionable feedback
LCM->>Agent: SendNudge(feedback)
SCM->>SCM: Observe CI failure
SCM->>LCM: ApplySCMObservation()
LCM->>LCM: Detect actionable feedback
LCM->>Agent: SendNudge(CI failure)
SCM->>SCM: Observe merge conflict
SCM->>LCM: ApplySCMObservation()
LCM->>LCM: Detect actionable feedback
LCM->>Agent: SendNudge(merge conflict)
Note over Agent,Runtime: Agent receives nudges via<br/>runtime messages or hooks
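The "detect actionable feedback, then nudge" step in the sequence above could be reduced to something like the following. `SCMObservation`, `Nudge`, `NudgeSender`, and the message texts are hypothetical; only the three feedback kinds (PR comment, CI failure, merge conflict) come from the diagram.

```go
package main

import "fmt"

// SCMObservation is a hypothetical summary of what changed on a PR.
type SCMObservation struct {
	NewReviewComments []string
	CIFailed          bool
	MergeConflict     bool
}

// Nudge is a message routed to the agent.
type Nudge struct {
	Kind    string
	Message string
}

// actionableNudges turns an observation into zero or more nudges.
func actionableNudges(obs SCMObservation) []Nudge {
	var out []Nudge
	for _, c := range obs.NewReviewComments {
		out = append(out, Nudge{Kind: "pr_comment", Message: "Address review comment: " + c})
	}
	if obs.CIFailed {
		out = append(out, Nudge{Kind: "ci_failure", Message: "CI is failing; inspect the failing checks."})
	}
	if obs.MergeConflict {
		out = append(out, Nudge{Kind: "merge_conflict", Message: "The branch conflicts with its base."})
	}
	return out
}

// NudgeSender is a hypothetical slice of the agent adapter port.
type NudgeSender interface {
	SendNudge(sessionID string, n Nudge) error
}

// routeNudges sends every actionable nudge and reports how many were sent.
func routeNudges(s NudgeSender, sessionID string, obs SCMObservation) (int, error) {
	sent := 0
	for _, n := range actionableNudges(obs) {
		if err := s.SendNudge(sessionID, n); err != nil {
			return sent, fmt.Errorf("send %s nudge: %w", n.Kind, err)
		}
		sent++
	}
	return sent, nil
}

// recordingSender is a test double that remembers what it was asked to send.
type recordingSender struct{ sent []Nudge }

func (r *recordingSender) SendNudge(sessionID string, n Nudge) error {
	r.sent = append(r.sent, n)
	return nil
}

func main() {
	rec := &recordingSender{}
	n, err := routeNudges(rec, "s1", SCMObservation{CIFailed: true})
	fmt.Println(n, err, rec.sent)
}
```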
erDiagram
projects ||--o{ sessions : owns
sessions ||--o{ pull_requests : owns
pull_requests ||--o{ pr_checks : has
pull_requests ||--o{ pr_review_threads : has
pull_requests ||--o{ pr_comments : has
sessions ||--o{ notifications : has
change_log }|--|| projects : tracks
change_log }|--|| sessions : tracks
change_log }|--|| pull_requests : tracks
projects {
string id PK
string name
string repo
jsonb config
}
sessions {
string id PK
string project_id FK
string harness
string activity_state
boolean is_terminated
jsonb metadata
}
pull_requests {
string id PK
string session_id FK
integer number
string state
string title
boolean draft
boolean mergeable
}
pr_checks {
string id PK
string pr_id FK
string name
string status
string conclusion
}
change_log {
bigint seq PK
string table_name
string row_id
string operation
jsonb old_data
jsonb new_data
}
flowchart LR
DB[(SQLite)] -->|INSERT/UPDATE/DELETE| Trigger[DB Trigger]
Trigger -->|append| ChangeLog[change_log]
ChangeLog -->|poll| Poller[CDC Poller]
Poller -->|decode| Decoder[Event Decoder]
Decoder -->|Event| Broadcaster[Broadcaster]
Broadcaster -->|callback| Sub1[Terminal Fanout]
Broadcaster -->|callback| Sub2[SSE Writer]
Broadcaster -->|callback| Sub3[Cache Invalidation]
Poller -->|watermark| Watermark[seq tracking]
Watermark -->|resume position| Poller
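The watermark loop in the diagram above can be illustrated with an in-memory stand-in for the change_log table. This is a sketch under assumptions: `ChangeSource`, `memLog`, the batch limit of 100, and `PollOnce` are hypothetical, and a real poller would query SQLite rather than a slice.

```go
package main

import "fmt"

// Change mirrors a subset of the change_log columns shown in the schema.
type Change struct {
	Seq       int64
	TableName string
	RowID     string
	Operation string
}

// ChangeSource is a hypothetical read port. A SQLite-backed version would
// select rows with seq greater than the watermark, ordered by seq.
type ChangeSource interface {
	After(seq int64, limit int) []Change
}

// memLog is an in-memory stand-in; rows are assumed to be ordered by Seq.
type memLog struct{ rows []Change }

func (m *memLog) After(seq int64, limit int) []Change {
	var out []Change
	for _, c := range m.rows {
		if c.Seq > seq && len(out) < limit {
			out = append(out, c)
		}
	}
	return out
}

// Poller remembers the highest seq it has emitted (the watermark).
type Poller struct {
	src       ChangeSource
	watermark int64
}

// PollOnce emits every change past the watermark, advancing it after each
// emitted row, and returns the number of rows emitted.
func (p *Poller) PollOnce(emit func(Change)) int {
	batch := p.src.After(p.watermark, 100)
	for _, c := range batch {
		emit(c)
		p.watermark = c.Seq
	}
	return len(batch)
}

func main() {
	log := &memLog{rows: []Change{
		{Seq: 1, TableName: "sessions", RowID: "s1", Operation: "INSERT"},
		{Seq: 2, TableName: "sessions", RowID: "s1", Operation: "UPDATE"},
	}}
	p := &Poller{src: log}
	n := p.PollOnce(func(c Change) { fmt.Println(c.Seq, c.TableName, c.Operation) })
	fmt.Println("emitted", n, "watermark", p.watermark)
}
```

Persisting the watermark would let a restarted poller resume where it left off; whether the daemon persists it or restarts from the tail is not stated in the source.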
The service.Session computes display status from durable facts using this precedence (highest to lowest):
flowchart TD
CheckTerm{is_terminated?}
CheckTerm -->|Yes| PRMerged{PR merged?}
CheckTerm -->|No| CheckWait{activity_state<br/>== waiting_input?}
PRMerged -->|Yes| Merged[merged]
PRMerged -->|No| Terminated[terminated]
CheckWait -->|Yes| NeedsInput[needs_input]
CheckWait -->|No| CheckPR{Has PR facts?}
CheckPR -->|Yes| PRPipeline[PR Pipeline Check]
CheckPR -->|No| CheckActive{activity_state<br/>== active?}
PRPipeline --> PRState{PR State}
PRState -->|ci failed| CIFailed[ci_failed]
PRState -->|draft| Draft[draft]
PRState -->|changes requested| Changes[changes_requested]
PRState -->|not mergeable| Conflict[merge_conflict]
PRState -->|mergeable| Mergeable[mergeable]
PRState -->|approved| Approved[approved]
PRState -->|review pending| ReviewPending[review_pending]
PRState -->|open| PROpen[pr_open]
CheckActive -->|Yes| Working[working]
CheckActive -->|No| CheckSignal{Signal capable<br/>&& no signal?}
CheckSignal -->|Yes| NoSignal[no_signal]
CheckSignal -->|No| Idle[idle]
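The precedence in the flowchart above maps naturally onto an ordered switch. The sketch below is illustrative only: the `Facts` struct and its field names are assumptions, and the PR pipeline result is passed in as a precomputed string rather than derived here.

```go
package main

import "fmt"

// Facts is a hypothetical bundle of the durable facts the service reads.
type Facts struct {
	ActivityState string // active, idle, waiting_input, exited
	IsTerminated  bool
	PRMerged      bool
	PRStatus      string // result of the PR pipeline check; "" when there are no PR facts
	SignalCapable bool   // the harness is able to report activity signals
	HasSignal     bool   // at least one signal has been received
}

// DeriveStatus applies the precedence from the flowchart, highest first.
// Nothing is stored: the result is recomputed on every read.
func DeriveStatus(f Facts) string {
	switch {
	case f.IsTerminated && f.PRMerged:
		return "merged"
	case f.IsTerminated:
		return "terminated"
	case f.ActivityState == "waiting_input":
		return "needs_input"
	case f.PRStatus != "":
		return f.PRStatus
	case f.ActivityState == "active":
		return "working"
	case f.SignalCapable && !f.HasSignal:
		return "no_signal"
	default:
		return "idle"
	}
}

func main() {
	fmt.Println(DeriveStatus(Facts{ActivityState: "active"}))
	fmt.Println(DeriveStatus(Facts{ActivityState: "waiting_input", PRStatus: "ci_failed"}))
}
```

Note how waiting_input outranks any PR status: a session blocked on the user is shown as needs_input even when its PR has failing CI.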
flowchart LR
PR[Open PR] --> CI{CI Status}
CI -->|failing| CIFailed[ci_failed]
CI -->|pending| CIPending[ci_pending]
CI -->|passing| Review{Reviews}
Review -->|changes requested| Changes[changes_requested]
Review -->|approved| Mergeable{Mergeable?}
Mergeable -->|conflict| Conflict[merge_conflict]
Mergeable -->|yes| MergeableStatus[mergeable]
PR -.->|draft| Draft[Draft State]
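A possible reduction of the PR pipeline into a single status is sketched below. The `PRFacts` fields and their string values are hypothetical. The relative priority of draft versus CI state is not fixed by the diagrams, so the order used here (CI failure first, then draft) is an assumption taken from the listing order in the status flowchart.

```go
package main

import "fmt"

// PRFacts is a hypothetical projection of the pull request tables.
type PRFacts struct {
	Draft    bool
	CI       string // "failing", "pending", or "passing"
	Review   string // "changes_requested", "approved", or "pending"
	Conflict bool   // the branch cannot be merged cleanly
}

// prStatus walks the pipeline: CI first, then reviews, then mergeability.
func prStatus(p PRFacts) string {
	switch {
	case p.CI == "failing":
		return "ci_failed"
	case p.Draft: // assumed to rank below a CI failure
		return "draft"
	case p.CI == "pending":
		return "ci_pending"
	case p.Review == "changes_requested":
		return "changes_requested"
	case p.Review == "approved" && p.Conflict:
		return "merge_conflict"
	case p.Review == "approved":
		return "mergeable"
	default:
		return "review_pending"
	}
}

func main() {
	fmt.Println(prStatus(PRFacts{CI: "passing", Review: "approved"}))
	fmt.Println(prStatus(PRFacts{CI: "passing", Review: "approved", Conflict: true}))
}
```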
The lifecycle.Manager is the canonical write path for all session lifecycle facts:
flowchart TD
subgraph Inputs["Observation Inputs"]
RuntimeObs[Runtime Observations]
ActivitySignals[Agent Activity Signals]
SCMObs[SCM Observations]
end
subgraph LCM["Lifecycle Manager"]
Reducer[Fact Reducer]
StateMachine[Activity State Machine]
Termination[Termination Logic]
Nudge[Agent Nudge Engine]
end
subgraph Outputs["Durable Facts"]
ActivityState[activity_state]
IsTerminated[is_terminated]
PRFacts[PR Facts Table]
end
RuntimeObs --> Reducer
ActivitySignals --> Reducer
SCMObs --> Reducer
Reducer --> StateMachine
StateMachine --> Termination
Termination --> ActivityState
Termination --> IsTerminated
SCMObs --> Nudge
Nudge -->|route| Agent[Agent Adapter]
stateDiagram-v2
[*] --> Spawning: Spawn()
Spawning --> Active: MarkSpawned
Active --> Idle: activity_state = idle
Active --> Working: activity_state = active
Active --> Waiting: activity_state = waiting_input
Active --> Exited: activity_state = exited
Working --> Active: work completes
Waiting --> Active: user responds
Idle --> Active: agent starts work
Exited --> Terminated: process exit
Active --> Terminated: Kill()
Waiting --> Terminated: Kill()
Idle --> Terminated: Kill()
Terminated --> [*]
note right of Active
Agent is working
Runtime alive
end note
note right of Waiting
Agent needs input
Waiting for user
end note
note right of Terminated
Session over
Runtime cleaned up
end note
The lifecycle manager marks a session terminated only when all of the following conditions are met:
flowchart TD
Check{Can terminate?}
Check -->|No| Keep[Keep running]
Check -->|Yes| AllDead{Runtime AND<br/>process dead?}
AllDead -->|No| Keep
AllDead -->|Yes| NoRecent{No recent<br/>activity?}
NoRecent -->|No| Keep
NoRecent -->|Yes| NoPR{No merged PR<br/>ownership?}
NoPR -->|No| Keep
NoPR -->|Yes| Terminate[Mark terminated]
Terminate --> Cleanup[Trigger cleanup]
Cleanup --> CDC[CDC event]
CDC --> UI[Dashboard update]
Key principle: Failed probes are NOT proof of death. A session is only terminated when the runtime and process are both clearly dead and recent activity doesn't contradict that.
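The conservative termination rule can be expressed as a small predicate. In this sketch the three-valued `ProbeResult`, the `TerminationFacts` struct, and the two-minute `quietWindow` are assumptions; the source does not say how "recent activity" is measured, and the initial "Can terminate?" gate from the flowchart is left out.

```go
package main

import (
	"fmt"
	"time"
)

// ProbeResult distinguishes "dead" from "could not tell".
type ProbeResult int

const (
	ProbeAlive  ProbeResult = iota
	ProbeDead               // positively observed as gone
	ProbeFailed             // the probe errored; liveness is unknown
)

// quietWindow is an assumed threshold for "no recent activity".
const quietWindow = 2 * time.Minute

// TerminationFacts is a hypothetical input to the termination decision.
type TerminationFacts struct {
	Runtime       ProbeResult
	Process       ProbeResult
	SinceActivity time.Duration // time since the last agent activity signal
	OwnsMergedPR  bool
}

// shouldTerminate returns true only when every condition in the flowchart
// holds. A failed probe never counts as proof of death.
func shouldTerminate(f TerminationFacts) bool {
	if f.Runtime != ProbeDead || f.Process != ProbeDead {
		return false
	}
	if f.SinceActivity < quietWindow {
		return false // recent activity contradicts the probes
	}
	if f.OwnsMergedPR {
		return false // follows the "No merged PR ownership?" branch
	}
	return true
}

func main() {
	dead := TerminationFacts{Runtime: ProbeDead, Process: ProbeDead, SinceActivity: 10 * time.Minute}
	unsure := TerminationFacts{Runtime: ProbeFailed, Process: ProbeDead, SinceActivity: 10 * time.Minute}
	fmt.Println(shouldTerminate(dead), shouldTerminate(unsure))
}
```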
flowchart TD
Start([Observer Start]) --> Immediate[Immediate Poll]
Immediate --> Loop{Tick every 30s}
Loop --> ListRepos[List active repos]
ListRepos --> CheckCreds{Credentials<br/>available?}
CheckCreds -->|No| Disabled[Disabled mode]
CheckCreds -->|Yes| Fetch[Fetch PRs via ETags]
Fetch --> ListPRs[List open PRs]
ListPRs --> Discover[Discover new PRs]
Discover --> FetchDetailed[Fetch detailed PR data]
FetchDetailed --> FetchChecks[Fetch CI checks]
FetchChecks --> FetchReviews[Fetch review threads]
FetchReviews --> Write[Write to SQLite]
Write --> Notify[Notify Lifecycle]
Notify --> Trigger[CDC event]
Disabled --> Loop
Trigger --> Loop
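The "Fetch PRs via ETags" step relies on HTTP conditional requests: the client sends the last ETag in `If-None-Match` and the server answers 304 Not Modified when nothing changed. The sketch below shows that mechanism with the standard library against a local stub server; `fetchIfChanged` and `newStubServer` are hypothetical helpers, and the stub stands in for the GitHub API.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchIfChanged performs a conditional GET. It returns the body and the new
// ETag when the resource changed, or changed=false on 304 Not Modified.
func fetchIfChanged(client *http.Client, url, etag string) ([]byte, string, bool, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, etag, false, err
	}
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, etag, false, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNotModified:
		return nil, etag, false, nil // keep the previous ETag
	case http.StatusOK:
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			return nil, etag, false, err
		}
		return body, resp.Header.Get("ETag"), true, nil
	default:
		return nil, etag, false, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
}

// newStubServer serves one fixed resource with a fixed ETag.
func newStubServer() *httptest.Server {
	const tag = `"v1"`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("If-None-Match") == tag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.Header().Set("ETag", tag)
		fmt.Fprint(w, `[{"number":1}]`)
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	body, tag, changed, err := fetchIfChanged(srv.Client(), srv.URL, "")
	fmt.Println(string(body), tag, changed, err)

	_, _, changed, err = fetchIfChanged(srv.Client(), srv.URL, tag)
	fmt.Println(changed, err)
}
```

On an unchanged poll the observer can skip both the diff and the SQLite write, which also avoids emitting a spurious CDC event.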
flowchart TD
Start([Reaper Start]) --> Loop{Tick every 5s}
Loop --> List[List non-terminated<br/>sessions]
List --> ForEach[For each session]
ForEach --> GetHandle{Has runtime<br/>handle?}
GetHandle -->|No| Skip[Skip session]
GetHandle -->|Yes| Probe[Probe runtime]
Probe --> Result{Probe result}
Result -->|Error| ReportFailed[Report ProbeFailed]
Result -->|Alive| ReportAlive[Report ProbeAlive]
Result -->|Dead| ReportDead[Report ProbeDead]
ReportFailed --> Apply[ApplyRuntimeObservation]
ReportAlive --> Apply
ReportDead --> Apply
Apply --> LCM[Lifecycle Manager]
LCM --> Update[Update facts]
Update --> CDC[CDC event]
Skip --> NextSession{More sessions?}
CDC --> NextSession
NextSession -->|Yes| ForEach
NextSession -->|No| Loop
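One pass of the reaper loop above might be structured as follows. `Session`, `Prober`, `Observation`, and `reapOnce` are hypothetical names; the point of the sketch is that a probe error is reported as its own fact rather than being folded into "dead".

```go
package main

import (
	"errors"
	"fmt"
)

// ProbeResult is the fact reported to the lifecycle manager.
type ProbeResult string

const (
	ProbeAlive  ProbeResult = "alive"
	ProbeDead   ProbeResult = "dead"
	ProbeFailed ProbeResult = "failed" // the probe itself errored
)

// Session is a hypothetical projection of a non-terminated session row.
type Session struct {
	ID            string
	RuntimeHandle string
}

// Prober is a hypothetical slice of the runtime adapter port.
type Prober interface {
	Probe(handle string) (alive bool, err error)
}

// Observation pairs a session with what the probe found.
type Observation struct {
	SessionID string
	Result    ProbeResult
}

// reapOnce probes every session that has a runtime handle and returns one
// observation per probed session. Sessions without a handle are skipped.
func reapOnce(sessions []Session, p Prober) []Observation {
	var out []Observation
	for _, s := range sessions {
		if s.RuntimeHandle == "" {
			continue
		}
		alive, err := p.Probe(s.RuntimeHandle)
		switch {
		case err != nil:
			out = append(out, Observation{SessionID: s.ID, Result: ProbeFailed})
		case alive:
			out = append(out, Observation{SessionID: s.ID, Result: ProbeAlive})
		default:
			out = append(out, Observation{SessionID: s.ID, Result: ProbeDead})
		}
	}
	return out
}

// stubProber answers from a map and errors for unknown handles.
type stubProber map[string]bool

func (m stubProber) Probe(handle string) (bool, error) {
	alive, known := m[handle]
	if !known {
		return false, errors.New("probe timed out")
	}
	return alive, nil
}

func main() {
	sessions := []Session{{ID: "a", RuntimeHandle: "h1"}, {ID: "b"}, {ID: "c", RuntimeHandle: "h3"}}
	fmt.Println(reapOnce(sessions, stubProber{"h1": true}))
}
```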
flowchart LR
subgraph External["External State"]
GitHub[GitHub API]
Runtimes[tmux/conpty]
end
subgraph Observers["Observation Layer"]
SCM[SCM Observer]
Reaper[Runtime Reaper]
end
subgraph Core["Core Processing"]
LCM[Lifecycle Manager]
PRMgr[PR Manager]
end
subgraph Storage["Persistence"]
SQLite[(SQLite)]
end
GitHub --> SCM
Runtimes --> Reaper
SCM --> PRMgr
PRMgr --> SQLite
PRMgr --> LCM
Reaper --> LCM
LCM --> SQLite
flowchart TD
subgraph HTTPD["HTTP Daemon"]
Router[Router + Middleware]
Router --> API[REST API]
Router --> Events[SSE Events]
Router --> Terminal[Terminal WebSocket]
end
subgraph Controllers["Controllers"]
Sessions[Sessions Controller]
Projects[Projects Controller]
PRs[PRs Controller]
Reviews[Reviews Controller]
end
subgraph Services["Services"]
SessionSvc[Session Service]
ProjectSvc[Project Service]
PRSvc[PR Service]
ReviewSvc[Review Service]
end
API --> Sessions
API --> Projects
API --> PRs
API --> Reviews
Sessions --> SessionSvc
Projects --> ProjectSvc
PRs --> PRSvc
Reviews --> ReviewSvc
Events -->|subscribe| CDC[CDC Broadcaster]
Terminal --> TerminalMux[Terminal Manager]
sequenceDiagram
participant Client
participant Router
participant Controller
participant Service
participant Manager
participant Store
participant DB
Client->>Router: POST /api/v1/sessions
Router->>Router: Middleware (auth, logging)
Router->>Controller: handler(w, r)
Controller->>Controller: decode JSON
Controller->>Service: Spawn(config)
Service->>Manager: Spawn(config)
Manager->>Store: Create session
Store->>DB: INSERT INTO sessions
DB->>Store: session record
Store->>Manager: session record
Manager->>Manager: Create workspace
Manager->>Manager: Launch runtime
Manager->>Service: Session response
Service->>Controller: enriched session
Controller->>Controller: encode JSON
Controller->>Client: 201 Created + Session
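The controller portion of this request flow (decode JSON, call the service, encode the response) can be sketched with net/http. The request and response shapes, `SessionService`, and `stubService` are hypothetical; only the route and the 201 Created response come from the diagram.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// SpawnRequest and SessionResponse are assumed payload shapes.
type SpawnRequest struct {
	ProjectID string `json:"project_id"`
	Harness   string `json:"harness"`
}

type SessionResponse struct {
	ID     string `json:"id"`
	Status string `json:"status"`
}

// SessionService is a hypothetical slice of the session service port.
type SessionService interface {
	Spawn(req SpawnRequest) (SessionResponse, error)
}

// spawnHandler decodes the request, delegates to the service, and encodes
// the enriched session with 201 Created.
func spawnHandler(svc SessionService) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req SpawnRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		sess, err := svc.Spawn(req)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusCreated)
		_ = json.NewEncoder(w).Encode(sess)
	}
}

// stubService returns a canned session derived from the request.
type stubService struct{}

func (stubService) Spawn(req SpawnRequest) (SessionResponse, error) {
	return SessionResponse{ID: "sess-" + req.ProjectID, Status: "working"}, nil
}

// doSpawn drives the handler in memory and returns the status code and body.
func doSpawn(svc SessionService, body string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/api/v1/sessions", strings.NewReader(body))
	spawnHandler(svc).ServeHTTP(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	code, body := doSpawn(stubService{}, `{"project_id":"p1","harness":"example"}`)
	fmt.Println(code, body)
}
```

The handler holds no business logic of its own, which matches the layering in the diagram: controllers translate HTTP to service calls and back.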
flowchart TD
subgraph Frontend
Browser[Browser Terminal]
end
subgraph HTTPD
WS[WebSocket Handler]
end
subgraph Terminal
Mux[Terminal Mux]
Sessions[Session States]
end
subgraph Runtime
TMux[tmux Runtime]
ConPTY[conpty Runtime]
end
Browser -->|WebSocket| WS
WS -->|attach| Mux
Mux --> Sessions
Sessions -->|create| TMux
Sessions -->|create| ConPTY
TMux -->|PTY attach| Mux
ConPTY -->|loopback dial| Mux
Mux -->|frame| WS
WS -->|binary| Browser
sequenceDiagram
participant Client as Browser
participant WS as WebSocket Handler
participant Mux as Terminal Mux
participant Runtime as tmux/conpty
Client->>WS: WebSocket upgrade
WS->>Mux: Attach(session, rows, cols)
Mux->>Runtime: Attach(handle, rows, cols)
Runtime->>Runtime: Create PTY
Runtime->>Runtime: Spawn tmux attach
loop Data Loop
Runtime->>Mux: PTY output
Mux->>WS: Binary frame
WS->>Client: WebSocket message
Client->>WS: User input
WS->>Mux: Input frame
Mux->>Runtime: Write to PTY
end
Client->>WS: Close
WS->>Mux: Detach
Mux->>Runtime: Close PTY
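The output half of the data loop above amounts to reading chunks from the PTY and forwarding each one as a frame. The sketch below shows that shape with plain io interfaces; `pumpOutput`, the 4096-byte buffer, and the `send` callback are assumptions, and a real handler would write WebSocket binary messages instead of appending to a buffer.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// pumpOutput reads from the PTY until EOF and hands each chunk to send as
// one frame. It returns nil on a clean EOF and the first error otherwise.
func pumpOutput(pty io.Reader, send func(frame []byte) error) error {
	buf := make([]byte, 4096)
	for {
		n, err := pty.Read(buf)
		if n > 0 {
			// Copy so the callee may keep the frame after buf is reused.
			frame := make([]byte, n)
			copy(frame, buf[:n])
			if sendErr := send(frame); sendErr != nil {
				return sendErr
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// collectFrames is a demo helper: it feeds a fixed string through the pump
// and returns the reassembled output and the number of frames sent.
func collectFrames(output string) (string, int, error) {
	var sb strings.Builder
	frames := 0
	err := pumpOutput(strings.NewReader(output), func(frame []byte) error {
		sb.Write(frame)
		frames++
		return nil
	})
	return sb.String(), frames, err
}

func main() {
	text, frames, err := collectFrames("$ ls\r\nREADME.md\r\n")
	fmt.Printf("%q %d %v\n", text, frames, err)
}
```

The input direction (client to PTY) would be the mirror image, typically running in its own goroutine so that neither side blocks the other.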
These rules are load-bearing — changing them breaks fundamental architectural assumptions:
- Never store display status — Status is derived from durable facts at read time
- Never treat failed probes as death — A failed probe is a fact, not a termination signal
- Never force-delete dirty worktrees — User data safety over cleanup convenience
- All app state under ~/.ao — No OS-default app-data locations
- Daemon binds to 127.0.0.1 only — No network exposure, ever
- CLI is thin — All logic lives in the daemon, CLI is just an HTTP client
- CDC is the source of truth for events — DB triggers write to change_log, and the poller fans changes out
- Adapters are leaves — Adapters never import core packages, only ports and domain
- Hooks are gitignored — Every file an adapter writes must be in .gitignore
- Migrations never change — Add new migrations, never modify existing ones
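Some of these rules lend themselves to a startup guard. As one example, the loopback-only rule could be enforced with a check like the sketch below; `validateBindAddr` is a hypothetical helper, not a function known to exist in the daemon.

```go
package main

import (
	"fmt"
	"net"
)

// validateBindAddr rejects any listen address whose host is not exactly
// 127.0.0.1, including wildcard forms such as ":8080" and "0.0.0.0:8080".
func validateBindAddr(addr string) error {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return fmt.Errorf("invalid listen address %q: %w", addr, err)
	}
	if host != "127.0.0.1" {
		return fmt.Errorf("refusing to bind to host %q: the daemon must listen on 127.0.0.1", host)
	}
	return nil
}

func main() {
	for _, addr := range []string{"127.0.0.1:8080", "0.0.0.0:8080", ":8080"} {
		fmt.Println(addr, validateBindAddr(addr))
	}
}
```

Comparing the host literally, rather than accepting any loopback address, keeps the check aligned with the rule as written.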
Agent Orchestrator's architecture is designed around:
- Separation of concerns — Observation, persistence, and display are distinct layers
- Port-based design — Core code depends on interfaces, not implementations
- Durable minimalism — Store only facts, compute everything else
- Event-driven updates — CDC broadcasts changes to all subscribers
- Isolation — Each session in its own worktree with its own runtime
- Safety — Conservative termination, path validation, gitignored hooks
This architecture enables parallel AI agents to work safely while maintaining complete visibility and control.