Storage Design

Ghost uses three storage backends: PostgreSQL for structured state, a GCS-backed data directory (/mnt/data) for Ghost's internal runtime state, and a GCS-backed workspace directory (/mnt/workspace) for per-agent working files. When GCS is unavailable, both directories fall back to the local data/ directory.

Artifact versioning (bare Git repos at /mnt/repos/) is handled by the Artifact System — see artifact-system.md.

PostgreSQL

Ghost connects through DATABASE_URL, which is required. If the variable is unset, if Ghost cannot connect to the server, or if migrations fail, Ghost exits at startup; there is no file-only fallback.
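The fail-fast rule can be sketched as below. This is a minimal illustration, not Ghost's actual startup code: requireDatabaseURL is a hypothetical helper, and only the variable check is shown (connecting and migrating would need a database driver).

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// requireDatabaseURL returns the connection string, or an error when the
// variable is unset or empty. The lookup function is injected so the check
// can be exercised without touching the real environment.
// Hypothetical helper: the name and signature are assumptions.
func requireDatabaseURL(lookup func(string) (string, bool)) (string, error) {
	url, ok := lookup("DATABASE_URL")
	if !ok || url == "" {
		return "", errors.New("DATABASE_URL is required: there is no file-only fallback")
	}
	return url, nil
}

func main() {
	if _, err := requireDatabaseURL(os.LookupEnv); err != nil {
		fmt.Fprintln(os.Stderr, "startup failed:", err)
		os.Exit(1)
	}
	// A real startup would now connect and run migrations, exiting the
	// same way if either step fails.
	fmt.Println("DATABASE_URL is set")
}
```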

Migrations are embedded in internal/store/pgstore/migrations/ and run automatically on startup.

Tables

agents — one row per agent instance. Static config (soul, goal, tools, agent) lives in the archetype markdown; this table holds runtime/dynamic state.

| Column | Type | Purpose |
|---|---|---|
| id | TEXT PK | Agent instance ID |
| archetype | TEXT | Archetype name |
| name | TEXT | Display name |
| owner_id | TEXT | Owner (empty string if unowned) |
| world_id | TEXT | World this agent belongs to |
| kind | TEXT | persistent or dynamic |
| status | TEXT | running or stopped |
| config_overrides | JSONB | Per-instance config overrides |
| metadata | JSONB | Arbitrary metadata |
| created_at | TIMESTAMPTZ | Creation time |
| updated_at | TIMESTAMPTZ | Last update |
| last_active_at | TIMESTAMPTZ | Last activity (nullable) |

Indexes on world_id, archetype, and owner_id (partial).

memories — dialogue summaries used for long-term agent memory.

| Column | Type | Purpose |
|---|---|---|
| id | BIGSERIAL PK | Auto-increment ID |
| agent_id | TEXT | FK → agents(id) ON DELETE CASCADE |
| content | TEXT | Dialogue summary text |
| turn_number | INT | Conversation turn |
| token_estimate | INT | Estimated tokens (for budget trimming) |
| created_at | TIMESTAMPTZ | Creation time |

Indexes on (agent_id, created_at DESC) and (agent_id, turn_number DESC).
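The token_estimate column exists for budget trimming. One plausible way to use it is sketched below; the newest-first, stop-at-first-overflow policy is an assumption, and Memory and trimToBudget are illustrative names rather than Ghost's actual types.

```go
package main

import "fmt"

// Memory mirrors the memories columns that matter for trimming.
type Memory struct {
	TurnNumber    int
	TokenEstimate int
	Content       string
}

// trimToBudget keeps the newest memories whose combined token_estimate fits
// within budget. The input must be ordered newest first, which is the order
// the (agent_id, turn_number DESC) index serves.
// Assumed policy: stop at the first memory that would exceed the budget.
func trimToBudget(newestFirst []Memory, budget int) []Memory {
	kept := make([]Memory, 0, len(newestFirst))
	used := 0
	for _, m := range newestFirst {
		if used+m.TokenEstimate > budget {
			break
		}
		used += m.TokenEstimate
		kept = append(kept, m)
	}
	return kept
}

func main() {
	memories := []Memory{
		{TurnNumber: 30, TokenEstimate: 120, Content: "latest summary"},
		{TurnNumber: 20, TokenEstimate: 200, Content: "earlier summary"},
		{TurnNumber: 10, TokenEstimate: 150, Content: "oldest summary"},
	}
	for _, m := range trimToBudget(memories, 350) {
		fmt.Println(m.TurnNumber, m.Content)
	}
}
```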

agent_skills — tracks which skills each agent has acquired and whether they are active.

| Column | Type | Purpose |
|---|---|---|
| agent_id | TEXT | FK → agents(id) ON DELETE CASCADE |
| skill_name | TEXT | Skill identifier |
| active | BOOLEAN | Whether the skill is currently active |
| acquired_at | TIMESTAMPTZ | When the agent acquired the skill |

Primary key: (agent_id, skill_name).

tasks — work items assigned to agents. Created by orchestrator agents, executed by worker agents.

| Column | Type | Purpose |
|---|---|---|
| id | TEXT PK | Auto-generated UUID |
| world_id | TEXT | World this task belongs to |
| owner_id | TEXT | FK → agents(id), assigned agent |
| created_by | TEXT | FK → agents(id), agent that created it |
| title | TEXT | Short task title |
| description | TEXT | Detailed task description |
| status | TEXT | todo, wip, cancelled, or finished |
| priority | INT | Higher = more important (default 0) |
| metadata | JSONB | Arbitrary metadata |
| created_at | TIMESTAMPTZ | Creation time |
| updated_at | TIMESTAMPTZ | Last update |

Indexes on (owner_id, status) and world_id.
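To illustrate how the status and priority columns might be used together, the sketch below picks the next task for a worker. The selection rule (highest-priority todo task, first match wins on ties) is an assumption, and Task, validStatus, and nextTask are illustrative names.

```go
package main

import "fmt"

// Task mirrors the tasks columns used for selection.
type Task struct {
	ID       string
	OwnerID  string
	Status   string
	Priority int
}

// validStatus reports whether s is one of the four documented status values.
func validStatus(s string) bool {
	switch s {
	case "todo", "wip", "cancelled", "finished":
		return true
	}
	return false
}

// nextTask returns the highest-priority todo task owned by ownerID.
// This is the in-memory analogue of filtering on (owner_id, status) and
// ordering by priority descending; ties keep the first task seen.
func nextTask(tasks []Task, ownerID string) (Task, bool) {
	var best Task
	found := false
	for _, t := range tasks {
		if t.OwnerID != ownerID || t.Status != "todo" {
			continue
		}
		if !found || t.Priority > best.Priority {
			best, found = t, true
		}
	}
	return best, found
}

func main() {
	tasks := []Task{
		{ID: "t1", OwnerID: "worker-1", Status: "todo", Priority: 0},
		{ID: "t2", OwnerID: "worker-1", Status: "todo", Priority: 5},
		{ID: "t3", OwnerID: "worker-1", Status: "finished", Priority: 9},
		{ID: "t4", OwnerID: "worker-2", Status: "todo", Priority: 7},
	}
	if t, ok := nextTask(tasks, "worker-1"); ok {
		fmt.Println("next:", t.ID, "valid status:", validStatus(t.Status))
	}
}
```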

knowledge — world-level shared knowledge entries accessible to agents via permissions. See knowledge.md for the full design.

| Column | Type | Purpose |
|---|---|---|
| id | TEXT PK | Human-readable slug (e.g. company-handbook) |
| world_id | TEXT | World this entry belongs to |
| title | TEXT | Display name |
| content | TEXT | Knowledge body (markdown) |
| created_at | TIMESTAMPTZ | Creation time |
| updated_at | TIMESTAMPTZ | Last modification |

Index on world_id.

knowledge_access — per-agent permission grants on knowledge entries.

| Column | Type | Purpose |
|---|---|---|
| knowledge_id | TEXT | FK → knowledge(id) ON DELETE CASCADE |
| agent_id | TEXT | Agent granted access |
| permission | TEXT | read or write |
| granted_at | TIMESTAMPTZ | When access was granted |

Primary key: (knowledge_id, agent_id, permission). Index on agent_id.
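Because the primary key includes permission, read and write are stored as separate rows. A permission check can therefore be an exact-match lookup, as in this sketch. Whether write should imply read is not stated here, so the example does not assume it; Grant and hasPermission are illustrative names.

```go
package main

import "fmt"

// Grant mirrors one row of knowledge_access.
type Grant struct {
	KnowledgeID string
	AgentID     string
	Permission  string // "read" or "write"
}

// hasPermission reports whether a row exists for the exact
// (knowledge_id, agent_id, permission) triple, which is the primary key.
// Assumption: no permission implies another.
func hasPermission(grants []Grant, knowledgeID, agentID, permission string) bool {
	want := Grant{KnowledgeID: knowledgeID, AgentID: agentID, Permission: permission}
	for _, g := range grants {
		if g == want {
			return true
		}
	}
	return false
}

func main() {
	grants := []Grant{
		{KnowledgeID: "company-handbook", AgentID: "agent-1", Permission: "read"},
	}
	fmt.Println(hasPermission(grants, "company-handbook", "agent-1", "read"))
	fmt.Println(hasPermission(grants, "company-handbook", "agent-1", "write"))
}
```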

token_usage_daily — pre-aggregated daily token usage buckets, one row per (agent, provider, model, day).

| Column | Type | Purpose |
|---|---|---|
| agent_id | TEXT | Agent that made the LLM calls |
| provider | TEXT | LLM provider (e.g. anthropic) |
| model | TEXT | Model name (e.g. claude-sonnet-4-20250514) |
| day | DATE | Calendar day (UTC) |
| input_tokens | BIGINT | Sum of input tokens for the day |
| output_tokens | BIGINT | Sum of output tokens for the day |
| calls | BIGINT | Number of LLM calls for the day |

Primary key: (agent_id, provider, model, day). Index on day.

Structured state (PostgreSQL)

Agent records, memories, skills, knowledge, tasks, schedules, token usage, and related features use the tables above. There is no DB-less mode.

Events, world state, and workspace files still use the filesystem under /mnt/data and the workspace root.

Data Directory — /mnt/data

When world_id is set in the world YAML (e.g. world0), Ghost sets the DataDir, workspace, prompt-debug, and artifact-repo roots to /mnt/data/{world_id}, /mnt/workspace/{world_id}, /mnt/prompt_debug/{world_id}, and /mnt/repos/{world_id} respectively, unless GHOST_DATA_DIR, WORKSPACE_DIR, GHOST_PROMPT_DEBUG_DIR, or ARTIFACT_REPOS_DIR is already set in the environment. Local Docker mounts the repo's config/ tree at /app/config. To select a world at boot, optionally set GHOST_CONFIG=/app/config/<file>.yaml in the environment; otherwise, load a world at runtime via the admin API or dashboard.
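The root-resolution rule can be sketched as follows. The example covers only the case where world_id is set. It treats an empty variable as unset and applies each override to its own root independently; both are assumptions. The roots type and resolveRoots function are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path"
)

// roots groups the four per-world directories.
type roots struct {
	DataDir, WorkspaceDir, PromptDebugDir, ArtifactReposDir string
}

// resolveRoots derives the four roots from worldID, which must be non-empty.
// A non-empty environment value replaces the derived path for that root.
func resolveRoots(worldID string, getenv func(string) string) roots {
	pick := func(envVar, base string) string {
		if v := getenv(envVar); v != "" {
			return v // explicit environment value wins
		}
		return path.Join(base, worldID)
	}
	return roots{
		DataDir:          pick("GHOST_DATA_DIR", "/mnt/data"),
		WorkspaceDir:     pick("WORKSPACE_DIR", "/mnt/workspace"),
		PromptDebugDir:   pick("GHOST_PROMPT_DEBUG_DIR", "/mnt/prompt_debug"),
		ArtifactReposDir: pick("ARTIFACT_REPOS_DIR", "/mnt/repos"),
	}
}

func main() {
	fmt.Printf("%+v\n", resolveRoots("world0", os.Getenv))
}
```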

The data directory (default data/, or an explicit path from env / world_id above) holds Ghost internal runtime state.

/mnt/data/
├── events/
│   └── events.jsonl          # Event WAL (append-only JSONL)
└── world/
    └── state.json            # Shared world state (JSON key-value map)

Agent memory, skills, tasks, and related state live in PostgreSQL only. agents/{id}/ may still contain legacy files from older runs; they are not written in normal operation when Ghost starts with a database.

| Path | Purpose | Writer |
|---|---|---|
| events/events.jsonl | Append-only event WAL, replayed on startup | internal/events/memory.go (WithWAL) |
| world/state.json | Shared world state (JSON key-value map) | internal/world/state.go |

Agent Workspace — /mnt/workspace

With world_id set, the per-world root is /mnt/workspace/{world_id}/; each agent then uses /mnt/workspace/{world_id}/{agent_id}/. Without world_id, the legacy layout /mnt/workspace/{agent_id}/ applies when the workspace root is /mnt/workspace.
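Both layouts reduce to joining the base, an optional world_id, and the agent ID. A minimal sketch, with agentWorkspace as a hypothetical helper:

```go
package main

import (
	"fmt"
	"path"
)

// agentWorkspace builds an agent's workspace path. path.Join drops empty
// elements, so an empty worldID yields the legacy base/{agent_id} layout.
func agentWorkspace(base, worldID, agentID string) string {
	return path.Join(base, worldID, agentID)
}

func main() {
	fmt.Println(agentWorkspace("/mnt/workspace", "world0", "agent-1"))
	fmt.Println(agentWorkspace("/mnt/workspace", "", "agent-1"))
}
```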

Each agent gets an isolated workspace directory under that root. This is the working directory for the agent's sandbox and for toolbox commands. Cross-agent access is supported.

Shared read-only directories (e.g. site templates) are exposed to all agents via the WORKSPACE_SHARED_DIRS config. Agents access them through workspace.read("sites-template/..."). See agent-workspace.md for the full workspace design: layout, GCS mounts, shared directories, per-agent isolation, toolbox integration, and security.

Event Publishing

Ghost can optionally publish events to an external message bus. The event store is always in-memory (backed by the WAL on disk); these publishers send a copy externally.

| Publisher | Env vars | Code |
|---|---|---|
| Google Pub/Sub | GIS_EVENT_PUBLISHER=pubsub, GIS_PUBSUB_PROJECT, GIS_PUBSUB_TOPIC | internal/events/pubsub.go |
| NATS | GIS_EVENT_PUBLISHER=nats, GIS_NATS_URL, GIS_NATS_STREAM | internal/events/nats.go |
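Publisher selection from these variables might look like the sketch below. It assumes that an empty GIS_EVENT_PUBLISHER disables publishing and that every listed variable is required for its publisher; neither is confirmed here. publisherConfig, publisherFromEnv, and requireAll are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
)

// publisherConfig describes the selected publisher. An empty Kind means
// external publishing is disabled.
type publisherConfig struct {
	Kind     string
	Settings map[string]string
}

// requireAll collects the named variables, failing on the first empty one.
func requireAll(kind string, getenv func(string) string, names ...string) (publisherConfig, error) {
	cfg := publisherConfig{Kind: kind, Settings: map[string]string{}}
	for _, name := range names {
		v := getenv(name)
		if v == "" {
			return publisherConfig{}, fmt.Errorf("%s is required when GIS_EVENT_PUBLISHER=%s", name, kind)
		}
		cfg.Settings[name] = v
	}
	return cfg, nil
}

// publisherFromEnv maps GIS_EVENT_PUBLISHER to a publisher configuration.
func publisherFromEnv(getenv func(string) string) (publisherConfig, error) {
	switch kind := getenv("GIS_EVENT_PUBLISHER"); kind {
	case "":
		return publisherConfig{}, nil // publishing is optional
	case "pubsub":
		return requireAll(kind, getenv, "GIS_PUBSUB_PROJECT", "GIS_PUBSUB_TOPIC")
	case "nats":
		return requireAll(kind, getenv, "GIS_NATS_URL", "GIS_NATS_STREAM")
	default:
		return publisherConfig{}, fmt.Errorf("unknown GIS_EVENT_PUBLISHER %q", kind)
	}
}

func main() {
	cfg, err := publisherFromEnv(os.Getenv)
	fmt.Println(cfg.Kind, err)
}
```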

Static Content (read-only)

These directories are read at startup and never written to at runtime:

| Directory | Content | Baked into image |
|---|---|---|
| architect/worlds/{world}/ | World definitions (world.md, laws.md, ghost-whisper.md, archetypes) | No: Ghost uses /app/architect from gcsfuse in Kubernetes (bizs-*-shared-resource/architect) or a read-only bind mount of gis/architect in local Docker Compose |
| architect/skills/ | Skill definitions (SKILL.md + optional config) | No: same as worlds |
| architect/templates/ | Business artifact templates (used by the Artifact System) | No: Artifact System uses TEMPLATES_DIR (bind mount or gcsfuse architect/templates), not the Ghost image |
| config/ | World config YAML (world-config.yaml, etc.) | No (mounted) |
| toolbox/ | Toolbox manifests, binaries, and Dockerfiles | No (mounted) |

GCS Mounts

In production, containers use a mix of PVCs and gcsfuse mounts. Agent workspaces and bare repos are on PVCs. gcsfuse is used for Ghost internal data and shared read-only resources. Additional gcsfuse mounts use the numbered pattern GCS_BUCKET_2 / GCS_BUCKET_MOUNT_PATH_2 / GCSFUSE_OPTIONS_2, and optionally _3, _4, … up to _9.

| Mount point | Container | Backend | Purpose |
|---|---|---|---|
| /mnt/workspace | Toolbox-Devtools | PVC | Per-agent workspace (read-write) |
| /mnt/repos | Toolbox-Devtools + Artifact System | PVC | Bare git repos |
| /mnt/shared-readonly | Toolbox-Devtools | gcsfuse | Shared read-only resources (site templates, cursor config) |
| /mnt/data (Ghost) | Ghost | gcsfuse | Ghost internal data |
| /mnt/prompt_debug | Ghost | gcsfuse | Optional; prompt debug logs |
| /app/architect | Ghost | gcsfuse | Full architect tree (Kubernetes); prefix architect/ on bizs-*-shared-resource (read-only) |
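The numbered pattern for additional mounts (suffixes _2 through _9) can be enumerated as sketched below. The real logic lives in entrypoint.sh and is not shown here; Go is used only for illustration. Skipping unset indices rather than stopping at the first gap is an assumption, and mount and extraMounts are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// mount describes one additional gcsfuse mount.
type mount struct {
	Bucket, Path, Options string
}

// extraMounts collects GCS_BUCKET_n / GCS_BUCKET_MOUNT_PATH_n /
// GCSFUSE_OPTIONS_n for n = 2..9. Indices with no bucket are skipped.
func extraMounts(getenv func(string) string) []mount {
	var mounts []mount
	for i := 2; i <= 9; i++ {
		suffix := "_" + strconv.Itoa(i)
		bucket := getenv("GCS_BUCKET" + suffix)
		if bucket == "" {
			continue
		}
		mounts = append(mounts, mount{
			Bucket:  bucket,
			Path:    getenv("GCS_BUCKET_MOUNT_PATH" + suffix),
			Options: getenv("GCSFUSE_OPTIONS" + suffix),
		})
	}
	return mounts
}

func main() {
	for _, m := range extraMounts(os.Getenv) {
		fmt.Printf("%s -> %s (%s)\n", m.Bucket, m.Path, m.Options)
	}
}
```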

Local Docker Compose: Ghost bind-mounts the repo’s architect/ directory to /app/architect (:ro) instead of using a second gcsfuse mount.

Local Dev Fallback

When GCS is not configured (GCS_BUCKET env var is empty), entrypoint.sh skips gcsfuse. The Go engine uses DataDir (default ./data) for everything, and WorkspaceDir falls back to DataDir. This preserves backward compatibility for local development without GCS.