Skip to content

Latest commit

 

History

History
464 lines (360 loc) · 25.1 KB

File metadata and controls

464 lines (360 loc) · 25.1 KB

Codebase Map

Last updated: 2026-02-26 · Doc version: 2.0.0

Read this first. Every file, class, and key function — one reference.


Package: closed_claw/

__init__.py

Empty package marker.


cli.py

Purpose: argparse CLI entrypoints for every user-facing command.

Symbol Type What it does
cmd_init function Creates DB, initializes graph wiring
cmd_doctor function Validates env: provider config, API key presence, sqlite/langgraph health
cmd_run function Runs a single task through the full coordinator graph
cmd_agents function Lists registered agents with manifest details
cmd_agent function Shows one agent in full detail (manifest + skill.md + memory count)
cmd_runs function Lists run history from DB
cmd_audit function Lists audit events from DB
cmd_runlog function Streams JSONL events for a specific run
cmd_tools function Lists supported tools; optionally shows one agent's allowlist
cmd_delete_agent function Removes a single agent from registry + capsule dir
cmd_delete_all_agents function Removes all agents from registry + capsule dirs
cmd_setup function Delegates to interactive run_setup_wizard
cmd_cancel_run function Sets a cancellation flag in the run log to stop a running task
cmd_web function Launches FastAPI web dashboard on http://127.0.0.1:7860
_build_parser function Constructs the argparse tree; register new commands here
_migrate_legacy_agents function Auto-migrates agent entrypoints to v14 on init/run
cmd_rewrite_entrypoints function Overwrites every capsule's entrypoint.py with the current shim template (use after bumping CLOSED_CLAW_ENTRYPOINT_VERSION)
_sync_registry_index function Syncs agents/registry.json from DB state
main function CLI entry (python -m closed_claw.cli) — opens menu if no subcommand

Subcommands: setup, init, run, agents, agent, runs, audit, runlog, cancel-run, doctor, tools, delete-agent, delete-all-agents, menu, web

How to add a CLI command: add cmd_<name>(args: Namespace) -> int then register a subparser in _build_parser().


compat.py

Purpose: Thin Pydantic v1/v2 compatibility shim.

Symbol What it does
BaseModel Re-export of pydantic.BaseModel
Field Re-export of pydantic.Field

Always import BaseModel/Field from here, not directly from pydantic.


config.py

Purpose: Environment-backed runtime Settings loading.

Symbol Type What it does
Settings dataclass All runtime config as typed fields; source of truth for all paths
Settings.from_env() classmethod Reads .env + os.environ; call this at CLI entry, never at module init
_load_dotenv function Parses .env file to dict (no dependencies)
_getenv function Reads from os.environ first, then dotenv, then falls back to default

Key fields in Settings (34 total):

  • db_path — path to registry.db
  • agents_dir — root of all agent capsule dirs
  • run_logs_dir — where JSONL run logs go
  • llm_providersiemens | openai | gemini | claude (default: siemens)
  • llm_model — provider-specific model string (default: qwen3-30b-a3b-instruct-2507 for siemens)
  • llm_timeout_sec — LLM call timeout in seconds
  • create_approval_mode / api_approval_modeinteractive | approve | deny | web
  • low_confidence_threshold — float; below this triggers create/reuse gate
  • siemens_api_key / openai_api_key / gemini_api_key / anthropic_api_key — provider keys
  • siemens_base_url / openai_base_url / gemini_base_url / anthropic_base_url — provider base URLs
  • extra_allowed_paths — filesystem sandbox roots for tool execution
  • subtask_max_attempts (default 2) — retry count for failed subtasks
  • max_tool_calls_per_agent (default 50) — intent count limit per agent run
  • max_agents_per_run (default 10) — cap on agents created/used in one run
  • max_subtasks_per_phase (default 4) — cap on subtasks per discovery/execution phase

interactive.py

Purpose: Rich-powered terminal interactive menu.

Symbol Type What it does
run_main_menu function Infinite loop rendering numbered options (setup/init/doctor/run/inspect/delete)

Called by main() in cli.py when no subcommand is given.


setup_wizard.py

Purpose: Guided provider/key setup with live verification before writing .env.

Symbol Type What it does
run_setup_wizard function Prompts for provider → model → API key → verifies → writes .env

Package: closed_claw/agents/

factory.py

Purpose: Creates agent capsule directories and manages the registry.json flat index.

Symbol Type What it does
ENTRYPOINT_TEMPLATE str constant Template for entrypoint.py (currently v14). Thin shim that imports and calls closed_claw.runtime.agent_loop.main. v14 replaced the v13 per-capsule loop body with a shared module so fixes land in every capsule via cli rewrite-entrypoints (no per-capsule code edit). Loop config (agent_loop_max_steps, agent_loop_max_consecutive_errors) is read from config at run time. Auto-migrated on cli init / cli run.
AgentFactory class Creates capsule dirs: manifest.json, skill.md, memory.db, entrypoint.py, logs/
AgentFactory.create_capsule(name, description, ..., skill_content, skill_ids) method Writes capsule files to agents_dir/<agent_id>/; updates registry.json; stores skill_ids in manifest
AgentFactory.save_registry_index(path, manifests) static method Overwrites registry.json with a fresh list

Version detection: The entrypoint template version tag (CLOSED_CLAW_ENTRYPOINT_VERSION=14) is read by _migrate_legacy_agents in cli.py to auto-upgrade older capsules. The shared loop body lives at closed_claw/runtime/agent_loop.py — see the closed_claw/runtime/ package section below.


Package: closed_claw/coordinator/

state.py

Purpose: Shared coordinator state shapes.

Symbol What it is
CoordinatorState TypedDict — all possible keys in graph state dict
Candidate TypedDict{agent_id, score, reason} entry in candidate lists

graph.py

Purpose: Builds and wires the LangGraph StateGraph.

Symbol What it does
build_graph(settings) Instantiates all sub-components, creates StateGraph(dict), adds all nodes + edges, compiles and returns the runnable graph

Node order (edges):

ingest_task → decompose_task → execute_task_pool → validate_outputs
  → update_registry_and_audit → synthesize_final_response → END

To add a node: add it to CoordinatorNodes, then call graph.add_node(...) and graph.add_edge(...) in build_graph.


nodes.py

Purpose: All coordinator graph node implementations. ~1650 lines — the heart of the system.

Symbol Type What it does
CoordinatorNodes class Holds all node methods plus shared sub-components
__init__ method Wires settings, registry, reranker, embedder, runner, factory, approval_gate, audit, tool_executor
ingest_task async node Validates task string, sets run_id/session_id, initializes state
decompose_task async node Uses LLM/heuristic to split task into atomic subtasks with role tags and dependencies (discovery phase)
execute_task_pool async node Two-phase execution: discovery phase → execution phase. Dependency-aware loop: resolves agents per role tag, runs them via AgentRunner, handles api_call_intent / tool_call_intent mid-flight
validate_outputs async node Checks all subtasks completed OK; marks failures
update_registry_and_audit async node Updates agent metrics (usage, success rate, latency); writes audit rows
synthesize_final_response async node LLM synthesis — calls LLM to merge all subtask results into a coherent final response string
_handle_api_call_intent method Routes api_call_intent through ApprovalGate and circuit breaker
_handle_tool_call_intent method Validates allowlist + executes via ToolExecutor
_acquire_agent_for_role method Embeds role description → semantic search → rerank → reuse or create
_create_agent method Calls AgentFactory.create_capsule, generates skill.md + tools_allowlist + skill_ids via LLM/heuristic
_compose_system_prompt method Loads base skill modules (Layer 1) + role overlay (Layer 2) into a single system prompt string
_request_config_for_agent method Builds config dict for CoordinatorRequest — includes llm, system_prompt (composed), and tool_registry
_prepare_phase_pool method Prepares a subtask pool for a given phase (discovery or execution)
_execute_phase_pool method Runs all subtasks in a phase pool with dependency-awareness
_run_single_subtask method Executes one subtask: acquires agent, runs it, handles retries. Runs _verify_subtask_tool_execution + _verify_acceptance_criteria before marking the subtask completed
_find_reusable_capability_agent method Reuse gate: semantic-similarity-primary search; role-tag overlap + required-tools subset as sanity filters; below-threshold candidates skipped; success-rate-weighted ranking
_should_skip_discovery static method Decides whether to skip discovery based on which subtask pools are populated (used by the execute_task_pool simple-classification fast path)
_verify_subtask_tool_execution static method Filesystem-side check that any tool actually ran for a tool-requiring subtask
_verify_acceptance_criteria static method Lightweight content check: matches eight canonical "I gave up" failure phrasings (safety net for the agent's self-reported status) plus the $HOME / ~ rejection for "absolute path" criteria
_merge static method Shallow-merges two state dicts
_emit_runlog method Writes a named event to the run's JSONL log

Legacy/unwired methods (not connected to the graph): embed_task, semantic_search, llm_rerank, human_gate_if_low_confidence, decide_reuse_or_create, create_agent_if_needed, dispatch_agents_async, approval_gate_for_api_calls, continue_or_deny_api_path, failure_recovery


Package: closed_claw/embeddings/

provider.py

Purpose: Text-to-vector embedding with graceful fallback.

Symbol What it does
EmbeddingProvider Wraps sentence-transformers if available; returns zero-vector of correct dim otherwise
EmbeddingProvider.embed(text) Returns list[float] of length embedding_dim

Package: closed_claw/observability/

runlog.py

Purpose: Per-run JSONL event logging.

Symbol What it does
RunLogger Writes {"event": ..., "payload": ..., "ts": ...} lines to .closed_claw/runs/<run_id>.jsonl
RunLogger.emit(event, payload) Appends one event line

Package: closed_claw/policy/

approval.py

Purpose: Human-in-the-loop approval gate for paid API calls / low-confidence agent creation.

Symbol What it does
ApprovalRequest Pydantic model — call type, provider, endpoint, cost, reason, session_id
ApprovalDecision Pydantic model — approved bool, operator, timestamp, note
ApprovalGate Prompts user with timeout; returns ApprovalDecision
ApprovalGate.prompt(req, operator) Blocks on stdin with timeout_sec deadline

Gate modes (set via CLI flags or env vars):

  • interactive — shows prompt and waits
  • approve — auto-approves (CI / scripts)
  • deny — auto-denies
  • web — publishes to in-memory queue; web UI resolves via /api/approvals/*/decide

Web mode functions: get_pending_approvals(), resolve_approval(), _web_prompt()


audit.py

Purpose: Structured audit event persistence.

Symbol What it does
AuditStore Writes rows to audit_events table in registry.db
AuditStore.record(event_type, agent_id, session_id, payload) Inserts one audit row

Package: closed_claw/registry/

schema.sql

Purpose: SQLite DDL. Tables:

  • agents — agent manifests (JSON blob + indexed fields)
  • agent_vectors — sqlite-vec virtual table for cosine search
  • runs — run history
  • agent_compositions — multi-agent composition records
  • provider_circuit_breakers — failure tracking per provider

Note: audit_events table is created dynamically by AuditStore._init_tables(), not in this DDL file.


store.py

Purpose: SQLite-backed CRUD and semantic vector search for agents and runs.

Symbol What it does
AgentManifest Pydantic model — full agent identity, metrics, tools, embedding. Includes skill_ids: list[str] for base skill module selection.
SearchCandidate Dataclass — {agent_id, score, description} from vector search
RegistryStore All DB operations
RegistryStore.upsert_agent(manifest) Insert or update agent + vector
RegistryStore.search_agents(vector, limit) Cosine similarity search via sqlite-vec
RegistryStore.get_agent(agent_id) Fetch one AgentManifest
RegistryStore.list_agents(limit) Fetch N agents ordered by usage
RegistryStore.delete_agent(agent_id) Remove agent + vector row
RegistryStore.record_run(...) Insert a run record
RegistryStore.update_agent_metrics(...) Update usage_count, success_rate, avg_latency

search.py

Purpose: Reranking, task plan generation, and agent profile generation.

Symbol What it does
_BASE_SKILL_IDS list[str] constant — canonical list of available base skill module IDs; intended to match files in agents/skills/ (those files don't exist on disk yet; see _BASE_SKILL_DESCRIPTIONS)
_BASE_SKILL_DESCRIPTIONS dict[str, str] constant — short faithful scope description per skill ID; serves as the ground truth the profile-gen prompt sees until per-skill .md files exist
_build_tool_catalog(allowed) Injects description + args_schema + side_effects per tool into the profile-gen prompt (Fix A — closes hallucinated-capability bug where bare names let the LLM invent tool constraints)
_build_skill_catalog() Mirror of _build_tool_catalog for skill IDs — same hallucination-closing pattern, applied to skill modules
_CANONICAL_ROLE_RULES / _derive_canonical_tags(task, description, tools) Deterministic regex-based tagging (fs.list, db.read, tool.terminal, ...) appended to LLM-generated tags so reuse tag-overlap doesn't depend on LLM tag stability
RerankerProtocol Protocol interface — .rerank(task, candidates) -> list[Candidate]
HeuristicReranker Keyword overlap + recency scoring; used internally as fallback when LLM calls fail
LLMReranker Uses configured LLM provider to rerank candidates
build_reranker(settings) Builds an LLMReranker using the configured provider
generate_task_plan(settings, task, *, phase, discovery_results) Decomposes task text into subtask list via LLM/heuristic; supports phase="discovery" and phase="execution" with discovery_results
generate_agent_profile(settings, task, supported_tools, fallback_tools) Generates skill_md, tools_allowlist, and skill_ids for a new agent
_normalize_profile_payload(payload, task) Validates and normalises LLM-returned profile dict; filters skill_ids to valid _BASE_SKILL_IDS members; appends _derive_canonical_tags(...) output to tags

Package: closed_claw/runtime/

protocol.py

Purpose: All JSON-line protocol message models.

Symbol Direction What it is
CoordinatorRequest Coordinator → Agent Task, context, artifacts, config
ApiCallIntent Agent → Coordinator Request to call a paid external API
ApiCallDecision Coordinator → Agent Approval/denial for ApiCallIntent
ToolCallIntent Agent → Coordinator Request to run a tool
ToolCallResult Coordinator → Agent Tool execution result/error
AgentResponse Agent → Coordinator Final response: status, result, memory_updates, artifacts, metrics
AgentMetrics (nested in AgentResponse) latency_ms
parse_agent_line(line) utility Parses a raw stdout line into `ApiCallIntent

runner.py

Purpose: Subprocess lifecycle management for agent capsules.

Symbol What it does
AgentRuntimeError Exception raised on agent protocol violations or subprocess errors
AgentRunner Launches entrypoint.py, drives JSON-line I/O loop, enforces retries and intent count limit
AgentRunner.run_agent(agent_dir, request, on_intent) Sends CoordinatorRequest, loops on intents until AgentResponse, returns it

on_intent is a callback (intent) -> str that the coordinator uses to handle ApiCallIntent/ToolCallIntent and return the JSON-encoded response line.


agent_loop.py

Purpose: Shared ReAct loop body imported by every capsule's entrypoint.py shim (v14). Centralising it means a fix lands in every capsule without rewriting per-capsule code — cli rewrite-entrypoints bumps the shim version.

Symbol What it does
CLOSED_CLAW_ENTRYPOINT_VERSION int constant (currently 14) read by _migrate_legacy_agents
main() Subprocess entry. Reads CoordinatorRequest from stdin, runs the ReAct loop (observe → think → act), prints AgentResponse to stdout
append_memory(memory_db, session_id, content) Persists one episodic memory row

Final-action protocol: the agent's final JSON action accepts an optional status field — {"type":"final","status":"ok"|"error","result":...,"artifacts":[]}. Emit status="error" only when the task cannot be completed with the available tools; the coordinator treats this as a subtask failure and propagates to the run's top-level status.


Package: closed_claw/tools/

executor.py

Purpose: Sandboxed tool execution.

Symbol What it does
SUPPORTED_TOOLS list[str] — canonical tool names
TOOL_REGISTRY dict — name → description + args_schema (used for CLI display + LLM prompt)
ToolExecutionError Exception for tool-level failures
ToolExecutor Executes tools within allowed filesystem roots
ToolExecutor.execute(tool, args) Dispatches to the right tool impl; returns dict result
tool_registry_for_allowlist(allowlist) Returns filtered TOOL_REGISTRY subset for an agent

Tool implementations (all inside ToolExecutor):

  • _run_terminal(cmd, timeout_s) — subprocess shell command
  • _run_http_api(method, url, ...) — HTTP request via requests
  • _run_web_fetch(url, ...) — webpage fetch + text extraction
  • _run_file_io(op, path, ...) — list/read/write/append within allowed roots
  • _run_python_exec(code, timeout_s) — executes Python snippet in subprocess
  • _run_sql_query(db_path, query, params) — SELECT-only SQLite query

Package: closed_claw/web/

server.py

Purpose: FastAPI web dashboard + REST API. Launched via python -m closed_claw.cli webhttp://127.0.0.1:7860.

API endpoints (27):

Method Path Description
GET / HTML dashboard (ui.html)
GET /api/health Health check
POST /api/health/verify Live-test LLM connection
POST /api/settings/apikey Verify + save API key
GET /api/system System metadata
GET /api/system/status Connection status
GET /api/settings Configurable settings catalog
POST /api/settings Save settings to .env
GET /api/agents List agents
GET /api/agents/{agent_id} Get single agent detail
DELETE /api/agents/{agent_id} Delete agent
POST /api/agents/delete-bulk Bulk delete agents
GET /api/skills List base skill modules
GET /api/skills/{skill_id} Get skill content
GET /api/runs List runs
GET /api/runs/enriched Enriched run list with analysis
GET /api/runs/{run_id}/analysis Detailed run analysis
GET /api/runlog/{run_id} Get run log events
GET /api/runlog/{run_id}/stream SSE stream of run events
GET /api/audit List audit events
GET /api/tools List tools + schemas
POST /api/tools/{tool_name}/test Sandbox tool test
GET /api/stats Dashboard stats
POST /api/runs Create/start a run from web UI
GET /api/runs/active List active runs
GET /api/runs/{run_id}/status Live run status
GET /api/approvals/pending Pending web-mode approvals
POST /api/approvals/{id}/decide Resolve approval
POST /api/runs/{run_id}/cancel Cancel run
GET /api/events/stream Global SSE dashboard stream
POST /api/init Init system from web

ui.html

Single-page HTML UI served by server.py. Full dashboard with agent management, run creation, settings editor, approval queue, and SSE event streaming.


Generated Directories (runtime, not in source control)

agents/skills/

Shared base skill library. Each file is a Markdown document describing tool usage patterns for one capability domain. These are not agent-specific — they are composed into any agent's system prompt based on manifest.skill_ids.

File Domain
terminal.md Shell execution, process management, packages
python_scripting.md Python code via python_exec tool
git.md Git workflow via terminal tool
file_system.md file_io tool patterns
web_http.md http_api and web_fetch tool patterns
sql_databases.md sql_query tool, SELECT-only SQLite
data_analysis.md Cross-tool data pipeline patterns

To add a module: create the .md file here, add its name to _BASE_SKILL_IDS, and add a short faithful scope description to _BASE_SKILL_DESCRIPTIONS (both in registry/search.py). The description is the only ground truth the profile-gen prompt sees while the .md files don't exist on disk.

agents/<agent_id>/

File Purpose
manifest.json AgentManifest — identity, tools, metrics, tags, skill_ids, embedding
skill.md Role overlay (Layer 2) — agent-specific identity, decision rules, output format
memory.db Per-agent SQLite episodic memory (key/value store)
entrypoint.py Thin shim (v14) that imports and calls closed_claw.runtime.agent_loop.main; the loop body itself lives in that shared module
logs/ Per-run output artifacts

agents/registry.json

Flat JSON array of all registered agent manifests (human-readable mirror of DB).

.closed_claw/registry.db

Main SQLite database. Tables: agents, agent_vectors, runs, agent_compositions, provider_circuit_breakers. The audit_events table is created by AuditStore._init_tables() (not in schema.sql DDL).

.closed_claw/runs/<run_id>.jsonl

One file per run. Each line is a JSON event: {event, payload, ts}.


Test Coverage Map

Test file Covers
tests/integration/test_flow.py Full coordinator graph end-to-end
tests/unit/test_protocol.py Protocol model parse/serialize
tests/unit/test_manifest.py AgentManifest validation
tests/unit/test_approval.py ApprovalGate modes (incl. web)
tests/unit/test_config_env.py Settings.from_env() loading
tests/unit/test_registry_audit.py RegistryStore + AuditStore operations
tests/unit/test_tools_executor.py ToolExecutor sandboxing
tests/unit/test_reranker_config.py Reranker selection logic
tests/unit/test_agent_profile_generation.py generate_agent_profile
tests/unit/test_agent_entrypoint_fallback.py Agent entrypoint fallback behavior
tests/unit/test_coordinator_retry_loop.py Coordinator retry handling
tests/unit/test_coordinator_two_phase_pool.py Two-phase task pool execution
tests/unit/test_delete_all_agents.py delete-all-agents CLI command
tests/unit/test_setup_wizard.py Setup wizard flow
tests/unit/test_guardrails.py Guardrail enforcement (max_tool_calls, max_agents, etc.)