Storage Design

Ghost uses three storage backends: PostgreSQL for structured state, a GCS-backed data directory (/mnt/data) for Ghost's internal runtime state, and a GCS-backed workspace directory (/mnt/workspace) for per-agent working files. When GCS is unavailable, the data and workspace directories both fall back to the local data/ directory.

Artifact versioning (bare Git repos at /mnt/repos/) is handled by the Artifact System — see artifact-system.md.

PostgreSQL

Ghost connects via DATABASE_URL, which is required. If the variable is unset, if Ghost cannot connect to the server, or if migrations fail, Ghost exits at startup; there is no file-only fallback.

Migrations are embedded in internal/store/pgstore/migrations/ and run automatically on startup.
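
The fail-fast rule for DATABASE_URL can be illustrated with a minimal sketch. The names `databaseURL` and `errNoDatabaseURL` are hypothetical and are not taken from the Ghost codebase; only the unset-variable check is shown, and the connection and migration steps are left as comments.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errNoDatabaseURL is a hypothetical sentinel error used only in this sketch.
var errNoDatabaseURL = errors.New("DATABASE_URL is required")

// databaseURL returns the connection string, or an error when it is unset.
// getenv is injected so the check can run without touching the real environment.
func databaseURL(getenv func(string) string) (string, error) {
	url := getenv("DATABASE_URL")
	if url == "" {
		return "", errNoDatabaseURL
	}
	return url, nil
}

func main() {
	url, err := databaseURL(os.Getenv)
	if err != nil {
		// There is no file-only fallback, so a real entrypoint would exit non-zero here.
		fmt.Println("startup failed:", err)
		return
	}
	// Connecting and running the embedded migrations would follow;
	// per the text above, a failure in either step is also fatal.
	fmt.Println("DATABASE_URL is set,", len(url), "bytes")
}
```

Injecting the lookup function keeps the check testable; a real entrypoint would most likely pass `os.Getenv` directly.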

Tables

agents — one row per agent instance. Static config (soul, goal, tools, agent) lives in the archetype markdown; this table holds runtime/dynamic state.

Column	Type	Purpose
id	TEXT PK	Agent instance ID
archetype	TEXT	Archetype name
name	TEXT	Display name
owner_id	TEXT	Owner (empty string if unowned)
world_id	TEXT	World this agent belongs to
kind	TEXT	`persistent` or `dynamic`
status	TEXT	`running` or `stopped`
config_overrides	JSONB	Per-instance config overrides
metadata	JSONB	Arbitrary metadata
created_at	TIMESTAMPTZ	Creation time
updated_at	TIMESTAMPTZ	Last update
last_active_at	TIMESTAMPTZ	Last activity (nullable)

Indexes on world_id, archetype, and owner_id (partial).

memories — dialogue summaries used for long-term agent memory.

Column	Type	Purpose
id	BIGSERIAL PK	Auto-increment ID
agent_id	TEXT FK	→ agents(id) ON DELETE CASCADE
content	TEXT	Dialogue summary text
turn_number	INT	Conversation turn
token_estimate	INT	Estimated tokens (for budget trimming)
created_at	TIMESTAMPTZ	Creation time

Indexes on (agent_id, created_at DESC) and (agent_id, turn_number DESC).
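
The table only says that token_estimate exists "for budget trimming"; the trimming policy itself is not described here. The following is one plausible sketch, assuming memories are read newest first (which the turn_number DESC index would support) and that the newest summaries are kept until the budget is used up. `Memory` and `trimToBudget` are hypothetical names.

```go
package main

import "fmt"

// Memory mirrors the columns of the memories table that matter for trimming.
type Memory struct {
	TurnNumber    int
	TokenEstimate int
	Content       string
}

// trimToBudget assumes the input is ordered newest first and keeps the newest
// summaries whose combined token estimates fit within budget.
func trimToBudget(newestFirst []Memory, budget int) []Memory {
	var kept []Memory
	used := 0
	for _, m := range newestFirst {
		if used+m.TokenEstimate > budget {
			break // stop at the first summary that no longer fits
		}
		used += m.TokenEstimate
		kept = append(kept, m)
	}
	return kept
}

func main() {
	mems := []Memory{
		{TurnNumber: 30, TokenEstimate: 120, Content: "latest summary"},
		{TurnNumber: 20, TokenEstimate: 200, Content: "older summary"},
		{TurnNumber: 10, TokenEstimate: 300, Content: "oldest summary"},
	}
	// With a budget of 400, the two newest summaries (320 tokens) are kept.
	for _, m := range trimToBudget(mems, 400) {
		fmt.Println(m.TurnNumber, m.Content)
	}
}
```

Stopping at the first summary that does not fit keeps the retained memories contiguous in time; a policy that skips large summaries and continues would be an equally valid reading of the column's purpose.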

agent_skills — tracks which skills each agent has acquired and whether they are active.

Column	Type	Purpose
agent_id	TEXT FK	→ agents(id) ON DELETE CASCADE
skill_name	TEXT	Skill identifier
active	BOOLEAN	Whether the skill is currently active
acquired_at	TIMESTAMPTZ	When the agent acquired the skill

Primary key: (agent_id, skill_name).

tasks — work items assigned to agents. Created by orchestrator agents, executed by worker agents.

Column	Type	Purpose
id	TEXT PK	Auto-generated UUID
world_id	TEXT	World this task belongs to
owner_id	TEXT FK	→ agents(id), assigned agent
created_by	TEXT FK	→ agents(id), agent that created it
title	TEXT	Short task title
description	TEXT	Detailed task description
status	TEXT	`todo`, `wip`, `cancelled`, or `finished`
priority	INT	Higher = more important (default 0)
metadata	JSONB	Arbitrary metadata
created_at	TIMESTAMPTZ	Creation time
updated_at	TIMESTAMPTZ	Last update

Indexes on (owner_id, status) and world_id.
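
To make the status values and the priority column concrete, here is a small in-memory sketch. It assumes a worker picks its highest-priority `todo` task, which is the kind of lookup the (owner_id, status) index would serve; the selection rule, the `Task` struct, and `nextTodo` are illustrative assumptions rather than Ghost's actual scheduling logic.

```go
package main

import "fmt"

// Status is one of the four values the tasks.status column accepts.
type Status string

const (
	StatusTodo      Status = "todo"
	StatusWIP       Status = "wip"
	StatusCancelled Status = "cancelled"
	StatusFinished  Status = "finished"
)

// Valid reports whether s is one of the documented status values.
func (s Status) Valid() bool {
	switch s {
	case StatusTodo, StatusWIP, StatusCancelled, StatusFinished:
		return true
	}
	return false
}

// Task carries the subset of columns used by this sketch.
type Task struct {
	ID       string
	OwnerID  string
	Title    string
	Status   Status
	Priority int // higher = more important, default 0
}

// nextTodo returns the highest-priority todo task assigned to ownerID.
// On equal priority the task that appears first in the slice wins.
func nextTodo(tasks []Task, ownerID string) (best Task, ok bool) {
	for _, t := range tasks {
		if t.OwnerID != ownerID || t.Status != StatusTodo {
			continue
		}
		if !ok || t.Priority > best.Priority {
			best, ok = t, true
		}
	}
	return best, ok
}

func main() {
	tasks := []Task{
		{ID: "t1", OwnerID: "worker-1", Title: "draft copy", Status: StatusTodo, Priority: 1},
		{ID: "t2", OwnerID: "worker-1", Title: "fix checkout", Status: StatusTodo, Priority: 5},
		{ID: "t3", OwnerID: "worker-1", Title: "old task", Status: StatusFinished, Priority: 9},
	}
	if t, ok := nextTodo(tasks, "worker-1"); ok {
		fmt.Println(t.ID, t.Title)
	}
}
```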

knowledge — world-level shared knowledge entries accessible to agents via permissions. See knowledge.md for the full design.

Column	Type	Purpose
id	TEXT PK	Human-readable slug (e.g. `company-handbook`)
world_id	TEXT	World this entry belongs to
title	TEXT	Display name
content	TEXT	Knowledge body (markdown)
created_at	TIMESTAMPTZ	Creation time
updated_at	TIMESTAMPTZ	Last modification

Index on world_id.

knowledge_access — per-agent permission grants on knowledge entries.

Column	Type	Purpose
knowledge_id	TEXT FK	→ knowledge(id) ON DELETE CASCADE
agent_id	TEXT	Agent granted access
permission	TEXT	`read` or `write`
granted_at	TIMESTAMPTZ	When access was granted

Primary key: (knowledge_id, agent_id, permission). Index on agent_id.
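
Because permission is part of the primary key, an agent can hold separate `read` and `write` rows for the same entry. The sketch below models that with a set keyed by the full triple. It deliberately checks for an exact match; whether `write` implies `read` in Ghost is not stated here, so that choice is an assumption, and `Grant` and `GrantSet` are hypothetical names.

```go
package main

import "fmt"

// Grant mirrors the composite primary key (knowledge_id, agent_id, permission).
type Grant struct {
	KnowledgeID string
	AgentID     string
	Permission  string // "read" or "write"
}

// GrantSet is an in-memory stand-in for the knowledge_access table.
type GrantSet map[Grant]struct{}

// Add records a grant; adding the same triple twice is a no-op,
// just as the primary key would reject a duplicate row.
func (s GrantSet) Add(g Grant) { s[g] = struct{}{} }

// Allows reports whether the exact (entry, agent, permission) triple exists.
// Assumption: "write" does not imply "read" in this sketch.
func (s GrantSet) Allows(knowledgeID, agentID, permission string) bool {
	_, ok := s[Grant{KnowledgeID: knowledgeID, AgentID: agentID, Permission: permission}]
	return ok
}

func main() {
	grants := GrantSet{}
	grants.Add(Grant{KnowledgeID: "company-handbook", AgentID: "agent-1", Permission: "read"})
	fmt.Println(grants.Allows("company-handbook", "agent-1", "read"))  // true
	fmt.Println(grants.Allows("company-handbook", "agent-1", "write")) // false
}
```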

token_usage_daily — pre-aggregated daily token usage buckets, one row per (agent, provider, model, day).

Column	Type	Purpose
agent_id	TEXT	Agent that made the LLM calls
provider	TEXT	LLM provider (e.g. `anthropic`)
model	TEXT	Model name (e.g. `claude-sonnet-4-20250514`)
day	DATE	Calendar day (UTC)
input_tokens	BIGINT	Sum of input tokens for the day
output_tokens	BIGINT	Sum of output tokens for the day
calls	BIGINT	Number of LLM calls for the day

Primary key: (agent_id, provider, model, day). Index on day.
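
The bucketing can be sketched in memory as a map keyed by the same four columns as the primary key. This is only an illustration of the aggregation shape, not the actual write path (which presumably upserts into PostgreSQL); `bucketKey`, `dailyUsage`, and `utcDay` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// bucketKey mirrors the primary key (agent_id, provider, model, day).
type bucketKey struct {
	AgentID, Provider, Model string
	Day                      string // calendar day in UTC, formatted YYYY-MM-DD
}

// bucket holds the three aggregated counters of one row.
type bucket struct {
	InputTokens, OutputTokens, Calls int64
}

// dailyUsage is an in-memory stand-in for token_usage_daily.
type dailyUsage map[bucketKey]bucket

// utcDay converts a timestamp to its UTC calendar day.
func utcDay(t time.Time) string { return t.UTC().Format("2006-01-02") }

// record adds one LLM call to the bucket for its (agent, provider, model, day).
func (d dailyUsage) record(agentID, provider, model, day string, in, out int64) {
	k := bucketKey{AgentID: agentID, Provider: provider, Model: model, Day: day}
	b := d[k] // zero-valued bucket when the key is new
	b.InputTokens += in
	b.OutputTokens += out
	b.Calls++
	d[k] = b
}

func main() {
	usage := dailyUsage{}
	// 23:30 at UTC-5 already falls on the next calendar day in UTC.
	zone := time.FixedZone("UTC-5", -5*60*60)
	at := time.Date(2025, 1, 1, 23, 30, 0, 0, zone)
	usage.record("agent-1", "anthropic", "claude-sonnet-4-20250514", utcDay(at), 100, 20)
	usage.record("agent-1", "anthropic", "claude-sonnet-4-20250514", utcDay(at), 50, 10)
	for k, b := range usage {
		fmt.Println(k.Day, b.InputTokens, b.OutputTokens, b.Calls)
	}
}
```

Normalizing to UTC before truncating to a day matters near midnight: the example call is made on 1 January local time but lands in the 2 January bucket.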

Structured state (PostgreSQL)

Agent records, memories, skills, knowledge, tasks, schedules, token usage, and related features use the tables above. There is no DB-less mode.

Events, world state, and workspace files still use the filesystem under /mnt/data and the workspace root.

Data Directory — `/mnt/data`

When world_id is set in the world YAML (e.g. world0), Ghost derives four per-world roots: DataDir becomes /mnt/data/{world_id}, the workspace root becomes /mnt/workspace/{world_id}, the prompt-debug root becomes /mnt/prompt_debug/{world_id}, and the artifact-repo root becomes /mnt/repos/{world_id}. A root is left alone if its environment variable (GHOST_DATA_DIR, WORKSPACE_DIR, GHOST_PROMPT_DEBUG_DIR, or ARTIFACT_REPOS_DIR, respectively) is already set. Local Docker mounts the repo's config/ tree at /app/config. To select a world at boot, optionally set GHOST_CONFIG=/app/config/<file>.yaml in the environment; alternatively, load a world at runtime via the admin API or the dashboard.
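
The precedence between environment variables and world_id-derived paths can be sketched as follows. `Dirs` and `resolveDirs` are hypothetical names, and two details are assumptions: an empty environment variable is treated as unset, and an empty result means "no per-world default applies", leaving the caller to use its own default.

```go
package main

import (
	"fmt"
	"os"
	"path"
)

// Dirs groups the four roots that world_id can influence.
type Dirs struct {
	Data, Workspace, PromptDebug, ArtifactRepos string
}

// resolveDirs applies the rule: an explicit environment variable wins,
// otherwise the root is derived from world_id.
func resolveDirs(worldID string, getenv func(string) string) Dirs {
	pick := func(envKey, base string) string {
		if v := getenv(envKey); v != "" {
			return v // already set in the environment: leave it alone
		}
		if worldID == "" {
			return "" // no world_id, so no per-world default
		}
		return path.Join(base, worldID)
	}
	return Dirs{
		Data:          pick("GHOST_DATA_DIR", "/mnt/data"),
		Workspace:     pick("WORKSPACE_DIR", "/mnt/workspace"),
		PromptDebug:   pick("GHOST_PROMPT_DEBUG_DIR", "/mnt/prompt_debug"),
		ArtifactRepos: pick("ARTIFACT_REPOS_DIR", "/mnt/repos"),
	}
}

func main() {
	fmt.Printf("%+v\n", resolveDirs("world0", os.Getenv))
}
```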

The data directory (default data/, or an explicit path from env / world_id above) holds Ghost internal runtime state.

/mnt/data/
├── events/
│   └── events.jsonl          # Event WAL (append-only JSONL)
└── world/
    └── state.json            # Shared world state (JSON key-value map)

Agent memory, skills, tasks, and related state live in PostgreSQL only. agents/{id}/ may still contain legacy files from older runs; they are not written in normal operation when Ghost starts with a database.

Path	Purpose	Writer
`events/events.jsonl`	Append-only event WAL, replayed on startup	`internal/events/memory.go` (WithWAL)
`world/state.json`	Shared world state (JSON key-value map)	`internal/world/state.go`
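
An append-only JSONL WAL with replay on startup follows a simple pattern: one JSON object per line on write, a line-by-line scan on read. The sketch below shows that pattern against in-memory buffers. The `Event` fields are invented for illustration and do not reflect Ghost's real event schema, and the code is not taken from internal/events/memory.go.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// Event is a hypothetical event shape used only for this sketch.
type Event struct {
	Type    string `json:"type"`
	Payload string `json:"payload"`
}

// appendEvent writes one event as a single JSON line.
func appendEvent(w io.Writer, e Event) error {
	line, err := json.Marshal(e)
	if err != nil {
		return err
	}
	_, err = w.Write(append(line, '\n'))
	return err
}

// replay reads events back in the order they were appended.
// Note: bufio.Scanner rejects lines longer than 64 KiB unless Buffer is called.
func replay(r io.Reader) ([]Event, error) {
	var events []Event
	sc := bufio.NewScanner(r)
	lineNo := 0
	for sc.Scan() {
		lineNo++
		line := bytes.TrimSpace(sc.Bytes())
		if len(line) == 0 {
			continue // tolerate blank lines
		}
		var e Event
		if err := json.Unmarshal(line, &e); err != nil {
			return events, fmt.Errorf("WAL line %d: %w", lineNo, err)
		}
		events = append(events, e)
	}
	return events, sc.Err()
}

// roundTrip appends events to an in-memory WAL and replays them.
func roundTrip(in []Event) ([]Event, error) {
	var wal bytes.Buffer
	for _, e := range in {
		if err := appendEvent(&wal, e); err != nil {
			return nil, err
		}
	}
	return replay(&wal)
}

func main() {
	out, err := roundTrip([]Event{{Type: "agent.started", Payload: "agent-1"}})
	fmt.Println(out, err)
}
```

In a real deployment the writer would be a file opened with `os.O_APPEND|os.O_CREATE|os.O_WRONLY`; how Ghost handles a truncated final line after a crash is not covered here.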

Agent Workspace — `/mnt/workspace`

With world_id set, the per-world root is /mnt/workspace/{world_id}/; each agent then uses /mnt/workspace/{world_id}/{agent_id}/. Without world_id, the legacy layout /mnt/workspace/{agent_id}/ applies when the workspace root is /mnt/workspace.

Each agent gets an isolated workspace directory under that root. This is the working directory for the agent's sandbox and for toolbox commands. Cross-agent access is supported.
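
Both layouts reduce to joining the workspace root with the agent ID, since the root already includes {world_id} when one is set. The sketch below shows that, plus a guard against IDs that would escape the root. The guard and the name `agentWorkspace` are assumptions added for illustration; the actual isolation mechanism is described in agent-workspace.md.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// agentWorkspace returns the per-agent directory under root.
// With world_id set, root is already /mnt/workspace/{world_id};
// in the legacy layout it is /mnt/workspace.
func agentWorkspace(root, agentID string) (string, error) {
	// Hypothetical guard: reject IDs that are empty or could traverse directories.
	if agentID == "" || agentID == "." || agentID == ".." || strings.ContainsAny(agentID, `/\`) {
		return "", fmt.Errorf("invalid agent ID %q", agentID)
	}
	return path.Join(root, agentID), nil
}

func main() {
	perWorld, _ := agentWorkspace("/mnt/workspace/world0", "agent-1")
	legacy, _ := agentWorkspace("/mnt/workspace", "agent-1")
	fmt.Println(perWorld) // /mnt/workspace/world0/agent-1
	fmt.Println(legacy)   // /mnt/workspace/agent-1
}
```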

Shared read-only directories (e.g. site templates) are exposed to all agents via the WORKSPACE_SHARED_DIRS config. Agents access them through workspace.read("sites-template/..."). See agent-workspace.md for the full workspace design: layout, GCS mounts, shared directories, per-agent isolation, toolbox integration, and security.

Event Publishing

Ghost can optionally publish events to an external message bus. The event store is always in-memory (backed by the WAL on disk); these publishers send a copy externally.

Publisher	Env vars	Code
Google Pub/Sub	`GIS_EVENT_PUBLISHER=pubsub`, `GIS_PUBSUB_PROJECT`, `GIS_PUBSUB_TOPIC`	`internal/events/pubsub.go`
NATS	`GIS_EVENT_PUBLISHER=nats`, `GIS_NATS_URL`, `GIS_NATS_STREAM`	`internal/events/nats.go`
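
Selecting a publisher from the environment could look like the sketch below. The variable names come from the table above; everything else is an assumption, including the `PublisherConfig` type, the idea that an empty GIS_EVENT_PUBLISHER disables external publishing, and the rule that each publisher's two settings are mandatory.

```go
package main

import (
	"fmt"
	"os"
)

// PublisherConfig is a hypothetical holder for the parsed settings.
type PublisherConfig struct {
	Kind          string // "", "pubsub", or "nats"
	PubSubProject string
	PubSubTopic   string
	NATSURL       string
	NATSStream    string
}

// publisherFromEnv reads the GIS_* variables and validates them per publisher.
func publisherFromEnv(getenv func(string) string) (PublisherConfig, error) {
	require := func(keys ...string) error {
		for _, k := range keys {
			if getenv(k) == "" {
				return fmt.Errorf("%s must be set", k)
			}
		}
		return nil
	}
	kind := getenv("GIS_EVENT_PUBLISHER")
	switch kind {
	case "":
		// Assumed: no external publisher; the in-memory store and WAL still work.
		return PublisherConfig{}, nil
	case "pubsub":
		if err := require("GIS_PUBSUB_PROJECT", "GIS_PUBSUB_TOPIC"); err != nil {
			return PublisherConfig{}, err
		}
		return PublisherConfig{
			Kind:          kind,
			PubSubProject: getenv("GIS_PUBSUB_PROJECT"),
			PubSubTopic:   getenv("GIS_PUBSUB_TOPIC"),
		}, nil
	case "nats":
		if err := require("GIS_NATS_URL", "GIS_NATS_STREAM"); err != nil {
			return PublisherConfig{}, err
		}
		return PublisherConfig{
			Kind:       kind,
			NATSURL:    getenv("GIS_NATS_URL"),
			NATSStream: getenv("GIS_NATS_STREAM"),
		}, nil
	default:
		return PublisherConfig{}, fmt.Errorf("unknown GIS_EVENT_PUBLISHER %q", kind)
	}
}

func main() {
	cfg, err := publisherFromEnv(os.Getenv)
	fmt.Printf("%+v %v\n", cfg, err)
}
```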

Static Content (read-only)

These directories are read at startup and never written to at runtime:

Directory	Content	Baked into image
`architect/worlds/{world}/`	World definitions (`world.md`, `laws.md`, `ghost-whisper.md`, archetypes)	No — Ghost uses `/app/architect` from gcsfuse in Kubernetes (`bizs-*-shared-resource/architect`) or a read-only bind mount of `gis/architect` in local Docker Compose
`architect/skills/`	Skill definitions (`SKILL.md` + optional config)	No — same as worlds
`architect/templates/`	Business artifact templates (used by the Artifact System)	No — Artifact System uses `TEMPLATES_DIR` (bind mount or gcsfuse `architect/templates`), not the Ghost image
`config/`	World config YAML (`world-config.yaml`, etc.)	No (mounted)
`toolbox/`	Toolbox manifests, binaries, and Dockerfiles	No (mounted)

GCS Mounts

In production, containers use a mix of PVCs and gcsfuse mounts. Agent workspaces and bare repos are on PVCs. gcsfuse is used for Ghost internal data and shared read-only resources. Additional gcsfuse mounts use the numbered pattern GCS_BUCKET_2 / GCS_BUCKET_MOUNT_PATH_2 / GCSFUSE_OPTIONS_2, and optionally _3, _4, … up to _9.

Mount point	Container	Backend	Purpose
`/mnt/workspace`	Toolbox-Devtools	PVC	Per-agent workspace (read-write)
`/mnt/repos`	Toolbox-Devtools + Artifact System	PVC	Bare git repos
`/mnt/shared-readonly`	Toolbox-Devtools	gcsfuse	Shared read-only resources (site templates, cursor config)
`/mnt/data` (Ghost)	Ghost	gcsfuse	Ghost internal data
`/mnt/prompt_debug`	Ghost	gcsfuse	Optional; prompt debug logs
`/app/architect`	Ghost	gcsfuse	Full architect tree (Kubernetes); prefix `architect/` on `bizs-*-shared-resource` (read-only)

Local Docker Compose: Ghost bind-mounts the repo’s architect/ directory to /app/architect (:ro) instead of using a second gcsfuse mount.
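
The numbered pattern for additional gcsfuse mounts (suffixes _2 through _9) can be enumerated as in the sketch below. In Ghost this logic lives in entrypoint.sh, so the Go version is purely illustrative. Skipping unset slots rather than stopping at the first gap is an assumption, as are the `Mount` type and the `extraMounts` name.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Mount describes one additional gcsfuse mount.
type Mount struct {
	Bucket, Path, Options string
}

// extraMounts collects GCS_BUCKET_N / GCS_BUCKET_MOUNT_PATH_N / GCSFUSE_OPTIONS_N
// for N = 2..9, ignoring slots whose bucket is unset.
func extraMounts(getenv func(string) string) []Mount {
	var mounts []Mount
	for i := 2; i <= 9; i++ {
		suffix := "_" + strconv.Itoa(i)
		bucket := getenv("GCS_BUCKET" + suffix)
		if bucket == "" {
			continue // assumed: gaps in the numbering are allowed
		}
		mounts = append(mounts, Mount{
			Bucket:  bucket,
			Path:    getenv("GCS_BUCKET_MOUNT_PATH" + suffix),
			Options: getenv("GCSFUSE_OPTIONS" + suffix),
		})
	}
	return mounts
}

func main() {
	for _, m := range extraMounts(os.Getenv) {
		fmt.Printf("%s -> %s (%s)\n", m.Bucket, m.Path, m.Options)
	}
}
```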

Local Dev Fallback

When GCS is not configured (GCS_BUCKET env var is empty), entrypoint.sh skips gcsfuse. The Go engine uses DataDir (default ./data) for everything, and WorkspaceDir falls back to DataDir. This preserves backward compatibility for local development without GCS.
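
The fallback chain described above amounts to two defaults applied in order. A minimal sketch, with a hypothetical `Config` struct standing in for the engine's real configuration type:

```go
package main

import "fmt"

// Config holds the two directories affected by the local fallback.
type Config struct {
	DataDir      string
	WorkspaceDir string
}

// withLocalDefaults fills in DataDir first, then lets WorkspaceDir fall back to it.
func withLocalDefaults(c Config) Config {
	if c.DataDir == "" {
		c.DataDir = "./data"
	}
	if c.WorkspaceDir == "" {
		c.WorkspaceDir = c.DataDir
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", withLocalDefaults(Config{}))
	fmt.Printf("%+v\n", withLocalDefaults(Config{DataDir: "/mnt/data", WorkspaceDir: "/mnt/workspace"}))
}
```

The order matters: resolving DataDir first ensures WorkspaceDir inherits the effective data directory rather than an empty string.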
