Last updated: 2026-05-15 · Doc version: 2.1.0 · Entrypoint template: v14 (shared shim — body in
closed_claw/runtime/agent_loop.py)Deep reference. For a quick file-by-file map, see
CODEBASE_MAP.md. For conventions, seeCONVENTIONS.md.
Closed Claw is a local capsule-based multi-agent orchestrator. A coordinator LLM routes incoming tasks to specialized agent subprocesses ("capsules"), enforces approval gates, executes tools on behalf of agents, and records full audit/observability traces — all on a single machine, no external services required by default.
Primary components:
| Component | Location | Role |
|---|---|---|
| Coordinator graph | coordinator/graph.py, coordinator/nodes.py |
Central LangGraph state machine that drives every run |
| Agent capsules | agents/<agent_id>/ |
Isolated subprocesses; implement the JSON-line protocol |
| Registry + vectors | registry/store.py, registry/schema.sql |
SQLite store for agents, runs; sqlite-vec for semantic search |
| Runtime protocol | runtime/protocol.py, runtime/runner.py |
JSON-line message framing + subprocess lifecycle |
| Policy engine | policy/approval.py, policy/audit.py |
Human-in-the-loop gates (4 modes incl. web); structured audit trail |
| Tool execution | tools/executor.py |
Sandboxed terminal, HTTP, file I/O, Python, SQL tools |
| Embeddings | embeddings/provider.py |
sentence-transformers or SHA-256 hash fallback |
| Run logs | observability/runlog.py |
Per-run JSONL event stream |
| CLI | cli.py |
All user-facing commands (15 subcommands) |
| Web dashboard | web/server.py |
FastAPI REST API (27 endpoints) + SSE streaming + web approvals |
Every run command follows this LangGraph node sequence:
ingest_task
│ Validate task string; assign run_id + session_id; initialize state
▼
decompose_task
│ LLM/heuristic: split task into atomic subtasks with role tags + dependency graph
▼
execute_task_pool
│ For each subtask (dependency-aware):
│ 1. Embed role description → semantic search → rerank → reuse or create agent
│ 2. Launch agent subprocess
│ 3. Drive JSON-line I/O loop:
│ - api_call_intent → ApprovalGate + circuit breaker → ApiCallDecision
│ - tool_call_intent → allowlist check → ToolExecutor → ToolCallResult
│ 4. Receive AgentResponse; record result + metrics
▼
validate_outputs
│ Assert all subtasks completed OK; surface failures
▼
update_registry_and_audit
│ Update agent usage_count / success_rate / avg_latency; write audit rows
▼
synthesize_final_response
│ Merge subtask results into a single response string
▼
END
State flows as an immutable-merge dict (CoordinatorState) through every node. Each node returns _merge(state, **updates) — it never mutates the input.
execute_task_pool runs tasks in two phases:
- Discovery phase —
generate_task_plan(phase="discovery")decomposes the task into information-gathering subtasks - Execution phase —
generate_task_plan(phase="execution", discovery_results=...)uses discovery outputs to plan action subtasks
Each phase independently acquires agents (reuse or create) and runs them with dependency-awareness. The max_subtasks_per_phase guardrail (default 4) caps subtask count per phase.
synthesize_final_response calls the LLM to produce a coherent, user-facing summary from all subtask results — not just a passthrough concatenation.
agents/
skills/ # Shared base skill library (Layer 1 of system prompt composition)
terminal.md # Shell execution patterns
python_scripting.md # Python code / data-processing patterns
git.md # Git workflow patterns
file_system.md # file_io tool patterns
web_http.md # http_api / web_fetch patterns
sql_databases.md # sql_query patterns
data_analysis.md # data ingestion → transform → report patterns
<agent_id>/
manifest.json # AgentManifest — identity, metrics, tools_allowlist, tags, skill_ids, embedding
skill.md # Role overlay — agent-specific identity + decision rules (Layer 2)
memory.db # SQLite episodic memory (key/value; agent-writable)
entrypoint.py # Thin shim (v14) — delegates to closed_claw.runtime.agent_loop
logs/ # Per-run output artifacts
- Creation:
CoordinatorNodes._acquire_agent_for_rolecallsAgentFactory.create_capsule(), which writes the capsule directory.generate_agent_profile(via LLM or heuristic) producesskill.md,tools_allowlist, andskill_ids— a list of base skill module IDs the agent should inherit. - Reuse: If a registered agent's embedding is above
low_confidence_thresholdfor the current task, it is reused directly. - System prompt composition: Before each run,
CoordinatorNodes._compose_system_prompt()loads allagents/skills/<id>.mdfiles listed inmanifest.skill_ids(Layer 1) then appendsagents/<agent_id>/skill.md(Layer 2). The result is passed asconfig["system_prompt"]in theCoordinatorRequest. - Execution:
AgentRunner.run_agent(agent_dir, request, on_intent)launchesentrypoint.pyas a subprocess and drives the I/O loop. The v14 shim delegates toclosed_claw.runtime.agent_loop.main, which readsconfig["system_prompt"]and injects it as the system role message in every LLM call. Centralising the loop body means a fix lands in every capsule without per-capsule rewrites — capsules just need the entrypoint version bumped viacli rewrite-entrypoints. - Metrics: After each run,
RegistryStoremetrics update recordsusage_count,success_rate,avg_latency_ms.
Every agent's LLM identity is a two-layer composed system prompt built by CoordinatorNodes._compose_system_prompt() at run time:
Layer 1 — Base skill modules (agents/skills/<skill_id>.md)
Modular, reusable capability knowledge.
One file per tool domain: terminal, git, file_system, http, etc.
Each agent manifest carries a skill_ids list; only listed modules are included.
[ terminal.md ] [ git.md ] [ python_scripting.md ] ...
↓ ↓ ↓
concatenated in manifest skill_ids order
Layer 2 — Role overlay (agents/<agent_id>/skill.md)
Agent-specific identity, high-level decision rules, output format.
Appended after Layer 1.
The resulting string is injected as:
config["system_prompt"]inCoordinatorRequest(passed to the subprocess)- The system/role message prepended to every LLM call inside the shared
agent_loop.mainbody (v14 shim)
Adding a new base skill: Create agents/skills/<name>.md, add the module name to _BASE_SKILL_IDS in registry/search.py, and add a short faithful scope description to _BASE_SKILL_DESCRIPTIONS in the same file. The description is injected into the profile-gen prompt via _build_skill_catalog() so the LLM picks the skill on its actual scope rather than guessing from the bare ID — agents/skills/*.md files don't currently exist on disk, so the description is the only ground truth the prompt has.
Transport: \n-delimited JSON over stdin (coordinator → agent) and stdout (agent → coordinator).
Coordinator Agent
──────────── ─────
CoordinatorRequest
─────────────────────►
(optional, repeat):
ToolCallIntent
◄─────────────────────
ToolCallResult
─────────────────────►
ApiCallIntent
◄─────────────────────
ApiCallDecision
─────────────────────►
AgentResponse (final)
◄─────────────────────
| Type | Direction | Key fields |
|---|---|---|
CoordinatorRequest |
→ Agent | session_id, task, context, artifacts, config (llm, system_prompt, tool_registry) |
ToolCallIntent |
← Agent | tool, args, reason |
ToolCallResult |
→ Agent | ok, result, error |
ApiCallIntent |
← Agent | provider, endpoint, estimated_cost_usd, reason |
ApiCallDecision |
→ Agent | approved, note |
AgentResponse |
← Agent | status ("ok" or "error"), result, memory_updates, artifacts, metrics |
An agent must emit exactly one AgentResponse as its final line.
Inside the loop, the agent's final action accepts an optional status field — {"type":"final","status":"ok"|"error","result":...}. Emit status="error" when the task cannot be completed with the available tools (required tool missing, hard constraint violated). The coordinator treats agent-self-status "error" as a subtask failure and propagates to the top-level run status — so agents never need to wrap "I gave up" as success-shaped text.
All persistent state lives in .closed_claw/registry.db (SQLite):
| Table | Purpose |
|---|---|
agents |
Agent manifest (JSON blob) + indexed scalar fields |
agent_vectors |
sqlite-vec virtual table for cosine similarity search |
runs |
Run history with status, latency, agent reference |
agent_compositions |
Multi-agent composition records |
provider_circuit_breakers |
Per-provider failure counters + reset timestamps |
Note: The audit_events table is created dynamically by AuditStore._init_tables() (not defined in schema.sql).
A human-readable mirror lives at agents/registry.json.
task text
│
▼
EmbeddingProvider.embed(task) # 384-dim vector
│
▼
RegistryStore.search_agents(vector) # cosine similarity via sqlite-vec
│
▼
Reranker.rerank(task, candidates) # HeuristicReranker or LLMReranker
│
▼
score < low_confidence_threshold?
│ yes → ApprovalGate (create/reuse) → create new capsule
│ no → reuse existing agent
build_reranker(settings) in registry/search.py builds an LLMReranker using the configured provider. A heuristic fallback is used internally within the reranker when LLM calls fail.
Two gates, each independently configurable:
| Gate | Trigger | Config var |
|---|---|---|
| Create/reuse gate | Reranker score below threshold | CLOSED_CLAW_CREATE_APPROVAL_MODE |
| API approval gate | Agent emits api_call_intent for a paid provider |
CLOSED_CLAW_API_APPROVAL_MODE |
Modes for each gate:
interactive— prompts operator on stdin withapi_approval_timeout_secdeadlineapprove— auto-approves (use in CI scripts:--api-approval-mode approve)deny— auto-deniesweb— publishes approval to in-memory queue; web UI dashboard resolves it
Tracks per-provider failure/denial counts in provider_circuit_breakers table. After CLOSED_CLAW_CIRCUIT_BREAKER_FAILURES consecutive failures, the provider is blocked until CLOSED_CLAW_CIRCUIT_BREAKER_RESET_SEC elapses. This prevents runaway retry loops against paid APIs.
- Each agent's
manifest.jsonhas atools_allowlist— only listed tool names are executable by that agent. ToolExecutorenforcesallowed_rootsfilesystem roots (fromSettings.extra_allowed_paths).sql_queryvalidates the query begins withSELECTbefore execution.
Tools are invoked by the coordinator on behalf of agents (agents never execute tools directly).
| Tool | Description |
|---|---|
terminal |
Shell command in workspace |
http_api |
HTTP request (GET/POST/etc.) |
web_fetch |
Fetch + extract text from a URL |
file_io |
list/read/write/append text files within allowed roots |
python_exec |
Execute a Python code snippet in a subprocess |
sql_query |
SELECT-only SQLite query |
Flow: tool_call_intent → allowlist check → ToolExecutor.execute(tool, args) → ToolCallResult.
All configuration is via environment variables (read from os.environ first, then .env file).
Key variable groups:
| Group | Prefix/Example |
|---|---|
| Storage paths | CLOSED_CLAW_DB_PATH, CLOSED_CLAW_AGENTS_DIR, CLOSED_CLAW_RUN_LOGS_DIR |
| Approval modes | CLOSED_CLAW_CREATE_APPROVAL_MODE, CLOSED_CLAW_API_APPROVAL_MODE |
| LLM provider | CLOSED_CLAW_LLM_PROVIDER, CLOSED_CLAW_LLM_MODEL |
| API keys | OPENAI_API_KEY, GEMINI_API_KEY, ANTHROPIC_API_KEY, SIEMENS_API_KEY |
| Thresholds | CLOSED_CLAW_LOW_CONFIDENCE_THRESHOLD, CLOSED_CLAW_AGENT_TIMEOUT_SEC |
| Circuit breaker | CLOSED_CLAW_CIRCUIT_BREAKER_FAILURES, CLOSED_CLAW_CIRCUIT_BREAKER_RESET_SEC |
| Guardrails | CLOSED_CLAW_SUBTASK_MAX_ATTEMPTS, CLOSED_CLAW_MAX_TOOL_CALLS_PER_AGENT, CLOSED_CLAW_MAX_AGENTS_PER_RUN, CLOSED_CLAW_MAX_SUBTASKS_PER_PHASE |
| Sandbox | CLOSED_CLAW_EXTRA_ALLOWED_PATHS (comma-separated absolute paths) |
See .env.example for the complete list with defaults.
The supervisor owns the task pool loop for the duration of a single run invocation:
waiting → pending → in_progress → completed / failed
- Dependencies are enforced: a subtask stays
waitinguntil all its prerequisite subtasks arecompleted. - Role tags determine which agent handles each subtask.
- If a subtask reaches
CLOSED_CLAW_SUBTASK_MAX_ATTEMPTS(default 2) failures, it is markedfailed. - Status updates are emitted as
task_pool_updateevents to the run JSONL log.
The task pool is not durable — if the CLI process exits mid-run, the task pool is lost. There is no persistent background worker. See docs/todo.md for the target architecture.
| What to extend | Where |
|---|---|
| Add a new LLM provider | registry/search.py (reranker) + coordinator/nodes.py + entrypoint template _execute_llm_http + config.py |
| Add a new tool | tools/executor.py — SUPPORTED_TOOLS, TOOL_REGISTRY, ToolExecutor |
| Change agent capsule template | closed_claw/agents/factory.py — ENTRYPOINT_TEMPLATE (bump version constant) |
| Add a coordinator node | coordinator/nodes.py + coordinator/graph.py |
| Add a CLI command | cli.py — cmd_<name> + _build_parser |
| Change approval logic | policy/approval.py |
| Change routing algorithm | registry/search.py — implement RerankerProtocol |
| Add DB columns | registry/schema.sql + registry/store.py + registry/store.AgentManifest |
| Add a web API endpoint | web/server.py |
| Add a base skill module | agents/skills/<name>.md + add to _BASE_SKILL_IDS in registry/search.py |
When invoked with no subcommand (python -m closed_claw.cli), the system opens a Rich-powered interactive menu:
- ASCII art shown once at startup
- Numbered options: setup / init / doctor / run / list agents / inspect agent / delete agent
- All menu paths delegate to the same
cmd_*functions used by the CLI
The setup wizard prompts for provider → model → API key → runs a live verification request → writes .env. No config is saved until the user confirms.
.closed_claw/ # runtime data (gitignored)
registry.db # SQLite main database
# Tables: agents, agent_vectors, runs, agent_compositions,
# provider_circuit_breakers
# audit_events table created by AuditStore (not in schema.sql)
runs/
<run_id>.jsonl # per-run event stream (one line per event)
agents/ # agent capsule store
registry.json # human-readable agent list
skills/ # shared base skill library
terminal.md
python_scripting.md
git.md
file_system.md
web_http.md
sql_databases.md
data_analysis.md
<agent_id>/
manifest.json # includes skill_ids list
skill.md # role overlay (Layer 2)
memory.db
entrypoint.py # v14 shim → closed_claw.runtime.agent_loop
logs/