Tools

Tools are the functions an agent can call during a turn. Each agent gets its own tool registry — an in-memory map of name → Tool — built once at agent creation and used for both prompt generation and execution.


Tool Access System

Tool access is determined by three factors:

  1. Public tools — tools not listed in any privilege group. Every agent can use them.
  2. Privilege groups — defined in world-config.yaml under tool_privileges. Each group contains a set of tool names. An agent declares which groups it belongs to in config.md.
  3. Override tools — individual tool names listed in config.md that grant access beyond the agent's privilege groups.

Configuration

World config (world-config.yaml) defines the privilege groups:

tool_privileges:
  admin:
    - syscall.agent.create
    - syscall.agent.stop
    - syscall.agent.start
    - syscall.agent.inspect
    - syscall.resource.usage
  manager:
    - syscall.task.create
    - syscall.task.update
    - syscall.knowledge.write
    - syscall.knowledge.delete
    - syscall.broadcast
  communicator:
    - channel
    - syscall.ask
    - syscall.channel.send
    - syscall.channel.handoff
  google-searcher:
    - tool.google_search

Agent config (config.md) declares access:

## Tool Access

privilege_groups: manager, communicator

- workspace.write
- tool.browser
  • privilege_groups: — comma-separated group names the agent belongs to.
  • List items — individual tool overrides (grants access to privileged tools not covered by groups).
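As an illustration of the format above, the following sketch collects the group names and override list from a `## Tool Access` section. It is not the project's parser (that is ParseAgentMD(), whose internals are not shown here); `parseToolAccess` and `toolAccessDecl` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// toolAccessDecl is a hypothetical holder for what the section declares.
type toolAccessDecl struct {
	Groups    []string // from the "privilege_groups:" line
	Overrides []string // from "- tool.name" list items
}

// parseToolAccess scans markdown text and collects declarations found under
// the "## Tool Access" heading, stopping at the next "## " heading.
func parseToolAccess(md string) toolAccessDecl {
	var decl toolAccessDecl
	inSection := false
	sc := bufio.NewScanner(strings.NewReader(md))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "## "):
			inSection = line == "## Tool Access"
		case !inSection || line == "":
			continue
		case strings.HasPrefix(line, "privilege_groups:"):
			rest := strings.TrimPrefix(line, "privilege_groups:")
			for _, g := range strings.Split(rest, ",") {
				if g = strings.TrimSpace(g); g != "" {
					decl.Groups = append(decl.Groups, g)
				}
			}
		case strings.HasPrefix(line, "- "):
			name := strings.TrimSpace(strings.TrimPrefix(line, "- "))
			decl.Overrides = append(decl.Overrides, name)
		}
	}
	return decl
}

func main() {
	md := "## Tool Access\n\nprivilege_groups: manager, communicator\n\n" +
		"- workspace.write\n- tool.browser\n\n## Toolbox\n\n- repo-git\n"
	d := parseToolAccess(md)
	fmt.Println(d.Groups, d.Overrides)
}
```

List items under other headings (such as `## Toolbox`) are ignored because the section flag resets at each `## ` heading.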

Resolution logic

When an agent is created, config.ResolveToolAccess computes the final access policy:

  • Public tool: not in any privilege group → always accessible.
  • Privileged tool: in at least one group → accessible only if the agent's groups include that group, or the tool is in the agent's override list.
  • Skill syscalls (syscall.skill.*) are always-on regardless of access.

The resolver produces a *config.ToolAccess object stored on the managedAgent. It exposes:

  • CanAccess(name) bool — true if public OR granted.
  • IsPublic(name) bool — true if not in any privilege group.
  • GrantedPrivileged() []string — sorted list of privileged tools this agent has.
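The resolution rules can be sketched in a few lines of Go. This is a simplified stand-in, not the real config package: the argument types (a map of group name to tool names, plus two string slices) are assumptions, and only the three documented methods are modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// ToolAccess is a simplified stand-in for *config.ToolAccess.
type ToolAccess struct {
	privileged map[string]bool // tools that appear in any privilege group
	granted    map[string]bool // privileged tools this agent may use
}

// ResolveToolAccess applies the rules above: a privileged tool is granted
// when one of the agent's groups contains it or it is listed as an override.
// The parameter types are assumed for this sketch.
func ResolveToolAccess(world map[string][]string, groups, overrides []string) *ToolAccess {
	ta := &ToolAccess{privileged: map[string]bool{}, granted: map[string]bool{}}
	for _, tools := range world {
		for _, t := range tools {
			ta.privileged[t] = true
		}
	}
	for _, g := range groups {
		for _, t := range world[g] {
			ta.granted[t] = true
		}
	}
	for _, t := range overrides {
		if ta.privileged[t] { // public tools need no grant
			ta.granted[t] = true
		}
	}
	return ta
}

// IsPublic reports whether the tool is absent from every privilege group.
func (ta *ToolAccess) IsPublic(name string) bool { return !ta.privileged[name] }

// CanAccess is true if the tool is public or granted.
func (ta *ToolAccess) CanAccess(name string) bool {
	return ta.IsPublic(name) || ta.granted[name]
}

// GrantedPrivileged returns the granted privileged tools, sorted by name.
func (ta *ToolAccess) GrantedPrivileged() []string {
	out := make([]string, 0, len(ta.granted))
	for t := range ta.granted {
		out = append(out, t)
	}
	sort.Strings(out)
	return out
}

func main() {
	world := map[string][]string{
		"admin":           {"syscall.agent.stop"},
		"manager":         {"syscall.task.create", "syscall.broadcast"},
		"google-searcher": {"tool.google_search"},
	}
	ta := ResolveToolAccess(world, []string{"manager"}, []string{"tool.google_search"})
	fmt.Println(ta.CanAccess("workspace.read"))     // public tool
	fmt.Println(ta.CanAccess("syscall.agent.stop")) // admin only, not granted
	fmt.Println(ta.GrantedPrivileged())
}
```

In this sketch the skill syscalls need no special case: as long as they are not listed in any group they count as public.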

Backward compatibility

If config.md has no ## Tool Access section and world-config.yaml has no tool_privileges, the access policy is inactive and the legacy allow-list behavior is preserved.

Toolbox in config.md

External toolbox references are declared in config.md:

## Toolbox

source: toolbox/devtools/toolbox.yaml
- repo-git
- github-repo
- cursor-agent

repo-git manages platform-hosted artifact repos (see Artifact System). github-repo manages external GitHub repositories cloned into the agent workspace (see GitHub Repos below).

World-level tools.md

Each world directory can have a tools.md that documents system tools with non-obvious usage patterns. This file is loaded by LoadWorld() and injected into the system prompt (after laws.md, before "How You Operate"). It supplements the <tools> briefing section, which shows only one-line descriptions.

agent.md role

agent.md contains operational how-to prose — guides for using tools, processing tasks, and managing skills. It is injected into the user message/briefing under the <agent> tag between <goal> and <tools>.

config.md role

config.md (formerly agent.md) contains machine-readable configuration: provider, model, limits, skills, tool access, and toolbox declarations. Parsed by ParseAgentMD(). Never sent to the LLM.

Briefing

The agent's briefing includes a <tools> section listing all accessible tools:

<tools>
- workspace.read: Read files from the agent workspace
- tool.cli: Run a shell command
- syscall.task.list: List tasks by owner and status
</tools>

Generated from toolRegistry.Specs() and included in BuildUserMessage.
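A minimal sketch of how such a section could be rendered from a list of specs. The `ToolSpec` struct here carries only the two fields visible in the briefing, and `renderToolsSection` is a hypothetical helper, not the project's BuildUserMessage.

```go
package main

import (
	"fmt"
	"strings"
)

// ToolSpec is reduced to the two fields the briefing shows.
type ToolSpec struct {
	Name        string
	Description string
}

// renderToolsSection formats specs as "- name: description" lines
// wrapped in a <tools> tag.
func renderToolsSection(specs []ToolSpec) string {
	var b strings.Builder
	b.WriteString("<tools>\n")
	for _, s := range specs {
		fmt.Fprintf(&b, "- %s: %s\n", s.Name, s.Description)
	}
	b.WriteString("</tools>")
	return b.String()
}

func main() {
	fmt.Println(renderToolsSection([]ToolSpec{
		{Name: "workspace.read", Description: "Read files from the agent workspace"},
		{Name: "tool.cli", Description: "Run a shell command"},
	}))
}
```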

CLI

ghostctl tool list <agent-id>    # JSON array of {name, description}
ghostctl tool brief <agent-id>   # human-readable: "- name: description" per line

API

GET /api/agents/{id}/tools        → JSON [{name, description}, ...]
GET /api/agents/{id}/tools/brief  → text/plain, one tool per line

System tools

Every tool below is implemented in internal/tools/. Tools are registered based on the agent's tool access policy (or the legacy tools.md allow-list).

Workspace

| Tool | Description |
| --- | --- |
| workspace.read | Read a file from the agent workspace. Output is truncated at 64 KB. |
| workspace.write | Write content to a file in the agent workspace. Paths are validated to stay inside the workspace root. |

Channel

Privileged tool group (communicator). To enable these tools, add the agent to the communicator privilege group or list channel under ## Tool Access in config.md.

| Tool | Description |
| --- | --- |
| channel.send | Post a standalone message to the originating channel thread. |
| channel.file_attach | Attach a workspace file to the current response. |
| channel.handoff | Transfer the conversation to another agent. |

CLI, browser, fetch & Python

| Tool | Description |
| --- | --- |
| tool.cli | Run a shell command inside the sandboxed executor. |
| tool.browser | Fetch a URL or take a screenshot via headless Chromium. |
| tool.fetch | HTTP fetch for URLs (plain text or binary). |
| tool.python | Run Python 3 source in the sandbox: structured code (plus optional args, cwd) without building a shell one-liner. Snippets are written under .ghost/ and removed after execution. |

Prefer tool.python over tool.cli + python3 -c when the model emits multi-line scripts or strings that are awkward to quote in a shell. Large programs should live in workspace files (workspace.write + tool.cli or imports from cwd). Very long code strings may hit OS argument-length limits; use files for big payloads.

The python skill (architect/skills/python/SKILL.md) documents patterns, sys.argv, and workspace I/O. Acquiring the skill does not grant the tool by itself — include tool.python in the agent’s privilege group or tool overrides.

Google Search

Privileged tool group (google-searcher). To enable it, add the agent to the google-searcher privilege group or list tool.google_search under ## Tool Access in config.md.

| Tool | Description |
| --- | --- |
| tool.google_search | Search Google for real-time information via Gemini Grounding API and return an answer with citations. |

Each call sends the query to Gemini with the google_search grounding tool enabled; the model generates search queries, processes results, and returns a grounded answer with source URLs. Billed per search query executed by the model.

Syscall — Agent management

| Tool | Description |
| --- | --- |
| syscall.reflect | Snapshot of world status, all agents, token usage, and system time. |
| syscall.agent.list | List every agent in the world with archetype, status, and uptime. |
| syscall.agent.stop | Stop an agent by ID. Data is preserved for restart. |
| syscall.agent.start | Start a stopped agent by ID. No-op if already running. |
| syscall.agent.create | Create and start a new agent from an archetype. |
| syscall.agent.inspect | Read another agent's memory files (read-only). |
| syscall.resource.usage | Host CPU, memory, disk, and Go runtime stats. |

Syscall — Communication

| Tool | Description |
| --- | --- |
| syscall.ask | Send a message to another agent and block until it replies. |
| syscall.broadcast | Broadcast a message to every other agent in the world. |
| syscall.channel.send | Send a message to the originating external channel. Always registered when channel context exists. |
| syscall.channel.handoff | Hand the conversation to another agent. Always registered when channel context exists. |

Syscall — Shared Chat (syscall.chat.*)

In-world shared channels for group communication between agents.

| Tool | Description |
| --- | --- |
| syscall.chat.create | Create a named shared channel with membership. Supports skip_creator (moderator not added as member) and gated (suppress inbox delivery). |
| syscall.chat.send | Post a message to a shared channel. Delivered to all other members' inboxes unless the channel is gated. |
| syscall.chat.invite | Add an agent to a channel. Only the creator may invite. |
| syscall.chat.remove | Remove an agent from a channel. Only the creator may remove. |
| syscall.chat.list | List shared channels the calling agent belongs to. |
| syscall.chat.history | Read recent messages from a shared channel (default limit 50). |

Gated mode. When gated: true is set at creation, messages are stored in history and forwarded to external subscribers (dashboard WebSocket) but not delivered to agent inboxes. This prevents fan-out storms on channels with 3+ members. A moderator grants speaking turns via syscall.ask; agents read syscall.chat.history explicitly for context. See the moderator-open-mic skill for the full pattern.

Multi-agent channels (3+ members): Consider using gated: true with a moderator to prevent fan-out storms. Without gating, every message delivers to every member's inbox and can trigger redundant activations with stale context. If you create an ungated multi-agent channel, ensure ghost whisper and calling guidance suppress unnecessary wakes.
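The difference between gated and ungated delivery can be illustrated with a small model. The `channel` type and its `post` method are hypothetical; forwarding to external subscribers (the dashboard WebSocket) is omitted.

```go
package main

import "fmt"

// channel is a hypothetical in-memory model of a shared chat channel.
type channel struct {
	Name    string
	Gated   bool
	Members []string
	History []string
}

// post always records the message in history. It returns the members whose
// inboxes receive the message: nobody when gated, everyone but the sender
// otherwise.
func (c *channel) post(sender, msg string) []string {
	c.History = append(c.History, sender+": "+msg)
	if c.Gated {
		return nil // history only; no inbox fan-out
	}
	var recipients []string
	for _, m := range c.Members {
		if m != sender {
			recipients = append(recipients, m)
		}
	}
	return recipients
}

func main() {
	members := []string{"moderator", "alice", "bob"}
	open := &channel{Name: "standup", Members: members}
	gated := &channel{Name: "panel", Gated: true, Members: members}

	fmt.Println(open.post("alice", "hello"))  // wakes moderator and bob
	fmt.Println(gated.post("alice", "hello")) // wakes nobody
	fmt.Println(len(gated.History))           // message is still stored
}
```

With n members, every ungated message produces n-1 inbox deliveries, which is the fan-out the gated mode avoids.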

Syscall — Knowledge

| Tool | Description |
| --- | --- |
| syscall.knowledge.list | List knowledge entries the agent can access. |
| syscall.knowledge.read | Read a knowledge entry by ID (requires read permission). |
| syscall.knowledge.write | Update a knowledge entry by ID (requires write permission). |
| syscall.knowledge.delete | Delete a knowledge entry by ID (requires write permission). |

Syscall — Tasks

| Tool | Description |
| --- | --- |
| syscall.task.create | Create a task and assign it to self or another agent. |
| syscall.task.list | List tasks, optionally filtered by owner and/or status. |
| syscall.task.get | Return full details of a task by ID. |
| syscall.task.update | Update task status or reassign to a different agent. |

Syscall — Skills

| Tool | Description |
| --- | --- |
| syscall.skill.list | List available skills and which ones the agent has acquired. Always registered. |
| syscall.skill.acquire | Acquire a skill; loads its toolbox tools into the registry. Always registered. |
| syscall.skill.activate | Load a skill's full SKILL.md knowledge into context. Always registered. |
| syscall.skill.drop | Drop an acquired skill; unloads its toolbox tools. Always registered. |

Syscall — World state

| Tool | Description |
| --- | --- |
| syscall.state.read | Read a key (or all keys) from shared world state. |
| syscall.state.write | Write a key-value pair to shared world state. |

Utility (stubs)

| Tool | Description |
| --- | --- |
| util.watch | Watch a file or directory for changes. Not yet implemented. |
| util.timer | Set a named timer that fires after a duration. Not yet implemented. |
| util.cron | Register a recurring cron schedule. Not yet implemented. |

GitHub Repos

devtools.github-repo lets agents clone and manage external GitHub repositories in their workspace. It is an alternative to repo-git (which manages platform-hosted artifact repos). Enable it by adding github-repo to the toolbox tools list in config.md.

| Command | Description |
| --- | --- |
| github-repo clone <url> [--name <dir>] [--branch <b>] | Clone a GitHub repo into $AGENT_WORKSPACE/repos/<name>/ |
| github-repo pull [<repo>] | Pull latest changes |
| github-repo push [<repo>] [--branch <b>] | Push committed changes |
| github-repo commit [<repo>] --message <msg> | Stage all + commit |
| github-repo status [<repo>] | Show git status |
| github-repo list | List all cloned repos with branch and status |
| github-repo branch [<repo>] [--create <name>] | List or create branches |
| github-repo pr create [<repo>] --title <t> [--body …] [--base …] [--head …] | Create a pull request via gh (run after push; on failure prints a /pull/new/… URL) |
| github-repo pr url [<repo>] | Print the GitHub “open PR” URL for the current branch |
| github-repo gh [<repo>] -- <gh-args>… | Run gh with the repo as working directory (e.g. github-repo gh myapp -- pr review 1) |
| github-repo remove <repo> | Delete a repo from workspace |

Prefer devtools.github-repo for PRs so agents are not blocked by tool.cli allowlists (which classify by the first shell token — avoid cd … && gh …). Raw gh in tool.cli is allowed when gh is listed in the world’s cli_policy tiers.
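To see why `cd … && gh …` is blocked, consider a check that looks only at the first shell token. This is a simplified sketch: the real cli_policy uses tiers, which are flattened here into a single allow set, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// firstToken returns the first whitespace-separated word of a command line.
func firstToken(cmd string) string {
	fields := strings.Fields(cmd)
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

// allowed is a hypothetical allowlist check keyed on the first token only.
func allowed(cmd string, allow map[string]bool) bool {
	return allow[firstToken(cmd)]
}

func main() {
	allow := map[string]bool{"gh": true, "git": true}
	fmt.Println(allowed("gh pr list", allow))                   // true
	fmt.Println(allowed("cd repos/myapp && gh pr list", allow)) // false: first token is "cd"
}
```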

Authentication is auto-detected from the container environment — agents never configure credentials. Three modes in priority order:

  1. Token (HTTPS): GITHUB_TOKEN env var. Uses a GIT_ASKPASS helper (github-askpass) that reads the token from the environment without writing it to disk. SSH URLs are automatically rewritten to HTTPS. The script mirrors GITHUB_TOKEN to GH_TOKEN (and vice versa) so the gh CLI authenticates consistently.
  2. SSH: key mounted at /etc/github-ssh/id_ed25519. Used via GIT_SSH_COMMAND.
  3. Public only: no auth. Clone works for public repos; push operations fail with a clear error.
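The priority order can be expressed as a short decision function. The actual tool is a bash script, so this Go version is only an illustration: `detectAuth`, `toHTTPS`, and the exact URL rewrite rule are assumptions, and the GH_TOKEN mirroring is left out.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

type authMode string

const (
	authToken  authMode = "token"
	authSSH    authMode = "ssh"
	authPublic authMode = "public"
)

const sshKeyPath = "/etc/github-ssh/id_ed25519"

// detectAuth picks the first available mode in priority order.
// The lookups are injected so the function can run without a real environment.
func detectAuth(getenv func(string) string, fileExists func(string) bool) authMode {
	if getenv("GITHUB_TOKEN") != "" {
		return authToken
	}
	if fileExists(sshKeyPath) {
		return authSSH
	}
	return authPublic
}

// toHTTPS rewrites a git@github.com: SSH URL to HTTPS (assumed rule).
func toHTTPS(url string) string {
	const sshPrefix = "git@github.com:"
	if strings.HasPrefix(url, sshPrefix) {
		return "https://github.com/" + strings.TrimPrefix(url, sshPrefix)
	}
	return url
}

func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	fmt.Println(detectAuth(os.Getenv, fileExists))
	fmt.Println(toHTTPS("git@github.com:acme/app.git"))
}
```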

GITHUB_TOKEN and GH_TOKEN are added to passthroughEnvKeys in sandbox.go so they flow from the container environment into each agent's sandbox without being written to any file.
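An allow-list of environment keys like this can be applied with a simple filter. The sketch below is not the code in sandbox.go; `passthroughEnv` is a hypothetical helper and the key list contains only the two variables named above.

```go
package main

import (
	"fmt"
	"strings"
)

// passthroughKeys lists only the two keys named in this document;
// the real passthroughEnvKeys may contain more.
var passthroughKeys = []string{"GITHUB_TOKEN", "GH_TOKEN"}

// passthroughEnv keeps the KEY=value entries whose key is on the list.
func passthroughEnv(environ []string, keys []string) []string {
	allowed := make(map[string]bool, len(keys))
	for _, k := range keys {
		allowed[k] = true
	}
	var out []string
	for _, kv := range environ {
		key, _, ok := strings.Cut(kv, "=")
		if ok && allowed[key] {
			out = append(out, kv)
		}
	}
	return out
}

func main() {
	host := []string{"PATH=/usr/bin", "GITHUB_TOKEN=example", "HOME=/root"}
	fmt.Println(passthroughEnv(host, passthroughKeys))
}
```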

Workspace layout: Repos are cloned into $AGENT_WORKSPACE/repos/<name>/, separate from the artifact-repo/ directory used by repo-git. Agents can delegate to coding agents via cursor-agent --folder $AGENT_WORKSPACE/repos/<name>.

Skill: The github-repo skill (architect/skills/github-repo/SKILL.md) provides full workflow documentation. Agents can acquire it at runtime via syscall.skill.acquire.

Key files: toolbox/devtools/bin/github-repo (bash tool), toolbox/devtools/bin/github-askpass (GIT_ASKPASS helper), architect/skills/github-repo/SKILL.md, architect/snippets/github-repo-howto.md.

Toolbox (dynamic)

Toolbox tools are not statically named. Each is a WorkspaceTool instance created at boot from a skill or world toolbox definition. The tool name comes from the toolbox config (e.g. universe.bot-marketer). At runtime, it runs a binary inside the sandbox and streams progress back via Cursor-style JSON parsing.

Coding agent output compaction

devtools.cursor-agent and devtools.claude-agent are delegated coding tools — the orchestrator LLM calls them to hand off file-editing tasks to Cursor or Claude CLI. The output they produce is handled specially to avoid flooding the orchestrator's context window.

Problem. Cursor CLI's --output-format stream-json produces NDJSON with hundreds of events: thinking/delta tokens (74% of bytes), tool_call/completed echoing full file contents back (33%), and assistant streaming fragments. A single invocation easily produces 60–130 KB of raw output. The orchestrator LLM receives this as the tool result and wastes its context window on noise — or worse, the output is truncated before Cursor's result event appears, leaving the orchestrator with zero useful information about what the coding agent did.

Design. Cursor CLI emits a structured result/success (or result/error) event at the end of the stream containing its own summary of what it accomplished, duration, and token usage. ExtractCursorResult() in internal/tools/streamparse.go parses the NDJSON and extracts:

  1. The result event text, error status, duration, and output token count.
  2. A file manifest from tool_call/started events — paths from readToolCall, editToolCall, writeToolCall, and commands from shellToolCall.
  3. Everything else (thinking deltas, assistant fragments, echoed file contents) is dropped.

The compact output returned to the orchestrator is ~1–2 KB instead of 60–130 KB:

Cursor agent completed successfully in 13.4s (1028 output tokens).

Files read: site/index.html, site/style.css
Files written: site/technologies.html

Result:
Created technologies.html with matching layout and updated nav links.

Where it runs. WorkspaceTool.Execute in internal/tools/workspace.go detects coding agent tools by name suffix (cursor-agent, claude-agent). For these tools it calls ExtractCursorResult(result.Stdout). If extraction succeeds (returns non-empty), the compact output replaces the raw stream. If the input is not NDJSON (e.g. claude-agent uses --output-format text), extraction returns "" and the raw output is used as-is.
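The detection and fallback described above amount to a small wrapper. In this sketch the extractor is passed in as a function so the example stays self-contained; `isCodingAgent` and `finalOutput` are hypothetical names, not the ones used in workspace.go.

```go
package main

import (
	"fmt"
	"strings"
)

// isCodingAgent matches tools by name suffix, as described above.
func isCodingAgent(toolName string) bool {
	return strings.HasSuffix(toolName, "cursor-agent") ||
		strings.HasSuffix(toolName, "claude-agent")
}

// finalOutput returns the compact summary for coding agents when extraction
// yields one, and the raw stdout in every other case.
func finalOutput(toolName, stdout string, extract func(string) string) string {
	if !isCodingAgent(toolName) {
		return stdout
	}
	if compact := extract(stdout); compact != "" {
		return compact
	}
	return stdout // e.g. plain-text output from claude-agent
}

func main() {
	// Stand-in extractor: pretends only inputs starting with "{" are NDJSON.
	extract := func(s string) string {
		if strings.HasPrefix(s, "{") {
			return "compact summary"
		}
		return ""
	}
	fmt.Println(finalOutput("devtools.cursor-agent", `{"type":"result"}`, extract))
	fmt.Println(finalOutput("devtools.claude-agent", "Edited 2 files.", extract))
}
```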

Live progress is unaffected. The streaming path (RemoteExecutor.RunStream → ParseCursorStreamLine → chan → ctx.SendMessage) continues to send real-time progress summaries to the UI during execution. The compaction only affects what the orchestrator LLM sees in the final tool result.

Key files: internal/tools/streamparse.go (extraction + streaming parser), internal/tools/workspace.go (wire-up), toolbox/devtools/bin/run-cursor-agent (the shell wrapper).


Registry lifecycle

Where the inputs come from

| Input | Source | Used for |
| --- | --- | --- |
| World name, agent list, toolbox allow-list, tool privileges | world-config.yaml → config.LoadWorldCfg() → control.Config | Which agents to create; which toolboxes are allowed; privilege group definitions |
| Archetype config (soul, goal, agent.md, config.md, user) | architect/worlds/<world>/archetype/<arch>/ → config.NewLoader() → *config.MDFiles | Tool access (config.md), LLM config (config.md), identity (soul, goal), how-to prose (agent.md) |
| Sandbox executor | sandbox.NewExecutor(sandboxCfg) from cfg.WorkspaceDir, cfg.Sandbox.* | Used by tool.cli, tool.browser, and toolbox runners |
| Skill state | skills.LoadState(cfg.DataDir) or PG | Which skills are active; their toolboxes are loaded into the registry |

How the registry is built

main()
  └─ config.LoadWorldCfg(path)          → WorldCfg (includes ToolPrivileges)
  └─ control.New(ctx, control.Config{…})
  └─ cp.BootWorld()
        └─ config.LoadWorld(worldDir)   → world markdown
        └─ for each agent:
              └─ CreateAgent(ctx, agentCfg)

Inside CreateAgent:

1. configLoader := config.NewLoader(cfg.ConfigDir)
2. mdFiles := configLoader.Get()
3. agentMD := config.ParseAgentMD(mdFiles.Config)
4. (reg, wsExec, toolAccess) := buildToolRegistry(mdFiles, executor, cfg, skillState)
       │
       ├─ If tool access active (config.md has groups or overrides):
       │     ta := config.ResolveToolAccess(worldPrivileges, groups, overrides)
       │     accessible := filter allKnownBuiltinTools by ta.CanAccess()
       │     toolboxRefs from config.md
       │
       ├─ If tool access NOT active (backward compat):
       │     tm := config.ParseToolsMD(md.Config) → tools list from config.md
       │     toolboxRefs from config.md
       │
       ├─ Always-on tools:
       │     skill list / acquire / activate / drop
       │
       ├─ registerBuiltinTools (switch on name for each accessible tool)
       ├─ Load toolbox refs → toolbox.Load(…) → reg.Register(each)
       ├─ Load skill toolboxes → reg.Register(each)
       │
       └─ return reg, wsExec, toolAccess

5. agent.New(cfg, agent.Deps{ ToolRegistry: reg, … })
6. managedAgent.toolAccess = toolAccess
7. go agent.Run(agentCtx)

Runtime usage

  • Prompt building: each turn the agent calls toolRegistry.Specs() to get []ToolSpec (name + description + params). These are included in the <tools> briefing tag and in the LLM ChatRequest.Tools.
  • Execution: when the model returns tool calls, the agent calls toolRegistry.Execute(ctx, call). The registry looks up the tool by name and runs it.
  • Dynamic changes: syscall.skill.acquire / drop can add or remove tools from the same registry at runtime.

What is stored where

| What | Storage | Lifetime |
| --- | --- | --- |
| Tool registry | In-process map[string]Tool | Agent lifetime; rebuilt on restart |
| Tool access policy | In-process *config.ToolAccess on managedAgent | Agent lifetime; computed once |
| Agent memory | File (data/agents/<id>/memory/) or PG | Persistent; holds conversation summaries |
| Skill state | File or PG | Persistent; records which skills are active and is used to rebuild toolbox tools on restart |