A structured knowledge store that lets agents curate workspace documents with bidirectional wikilinks, semantic search, and team-scoped access — all layered on top of existing memory systems.
Knowledge Vault is a v3-only feature. It sits between agents and the episodic/KG stores, adding document-level notes with explicit relationships.
Vault vs Knowledge Graph — Vault stores full documents (notes, context files, specs) with lexical + semantic search and wikilinks. The Knowledge Graph stores extracted entities and relations from conversations. They complement each other: vault for curated docs, KG for auto-extracted facts. The VaultSearchService fans out to both simultaneously.
| Component | Role |
|---|---|
| VaultStore | Document CRUD, link management, hybrid FTS + vector search |
| VaultSearchService | Search coordinator: fan-out across vault, episodic, and KG stores with weighted ranking |
| VaultSyncWorker | Filesystem watcher: detects file changes (create/write/delete), syncs content hashes |
| EnrichWorker | Processes vault document upsert events to generate summaries, embeddings, and semantic links |
| VaultRetriever | Bridges vault search into the agent L0 memory system |
| HTTP Handlers | REST endpoints: list, get, search, links, tree, graph |
```text
Agent writes document → Workspace FS
        ↓
VaultSyncWorker detects change
        ↓
Update vault_documents (hash, metadata)
        ↓
On agent query: vault_search tool
        ↓
VaultSearchService (parallel fan-out)
   ↙             ↓              ↘
Vault         Episodic      Knowledge Graph
(0.4 weight)  (0.3 weight)  (0.3 weight)
   ↘             ↓              ↙
Normalize & Weight Scores
        ↓
Return Top Results
```
Documents are scoped by tenant (isolation boundary), agent (namespace), and document scope:
| Scope | Description |
|---|---|
| `personal` | Agent-specific documents (per-agent context files, per-user work) |
| `team` | Team workspace documents shared across team members |
| `shared` | Cross-tenant shared knowledge (future) |
The `scope` field has a strict ownership invariant, enforced at the database level by migration 000055 (the `vault_documents_scope_consistency` CHECK constraint):

| `scope` | `agent_id` | `team_id` | Visibility |
|---|---|---|---|
| `personal` | set | NULL | Owning agent only (within tenant) |
| `team` | NULL | set | Members of the team (within tenant) |
| `shared` | NULL | NULL | All agents within the tenant |
| `custom` | any | any | User-defined via `custom_scope` |
The CHECK constraint rejects any INSERT or UPDATE that violates the scope × agent_id × team_id relationship above. scope='custom' is the exception — it is intentionally unconstrained, allowing user-defined ownership semantics.
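The same rule can be mirrored in application code so that bad rows are rejected before they reach PostgreSQL. The following is a minimal Go sketch, not the constraint's actual source (which lives in SQL in migration 000055). `OwnedDoc` and `CheckScope` are hypothetical names, `nil` pointers stand in for SQL `NULL`, and rejecting unknown scope values is an assumption of this sketch.

```go
package main

import "fmt"

// OwnedDoc is a hypothetical view of the ownership columns of vault_documents.
// A nil pointer stands in for SQL NULL.
type OwnedDoc struct {
	Scope   string
	AgentID *string
	TeamID  *string
}

// CheckScope mirrors the scope × agent_id × team_id table above.
// It returns nil when the combination is allowed.
func CheckScope(d OwnedDoc) error {
	hasAgent, hasTeam := d.AgentID != nil, d.TeamID != nil
	ok := false
	switch d.Scope {
	case "personal":
		ok = hasAgent && !hasTeam
	case "team":
		ok = !hasAgent && hasTeam
	case "shared":
		ok = !hasAgent && !hasTeam
	case "custom":
		ok = true // intentionally unconstrained
	default:
		// Assumption: values outside the table are rejected.
		return fmt.Errorf("unknown scope %q", d.Scope)
	}
	if !ok {
		return fmt.Errorf("scope %q with agent_id set=%t, team_id set=%t violates the invariant",
			d.Scope, hasAgent, hasTeam)
	}
	return nil
}

func main() {
	agent := "agent-1"
	fmt.Println(CheckScope(OwnedDoc{Scope: "personal", AgentID: &agent}))
	fmt.Println(CheckScope(OwnedDoc{Scope: "shared", AgentID: &agent}))
}
```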
`vault_search`, `ListDocuments`, and `CountDocuments` always return:

- Documents owned by the querying agent (`agent_id = <agent>`)
- PLUS shared documents (`agent_id IS NULL`)

Within a team context (a `RunContext` with `TeamID` set), results also include team-scoped documents for that team (`scope = 'team'` with `team_id = <team>`). Tenant isolation (`tenant_id = <tenant>`) is always enforced regardless of scope.
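As an illustration, these visibility rules can be written as a single predicate. This Go sketch is hypothetical (`VaultDoc`, `Visible`, and `strPtr` are not from the codebase), and it treats "shared" as rows where both `agent_id` and `team_id` are NULL, following the ownership table, so that team documents from other teams stay hidden.

```go
package main

import "fmt"

// VaultDoc is a hypothetical subset of vault_documents columns; nil means SQL NULL.
type VaultDoc struct {
	TenantID string
	Scope    string
	AgentID  *string
	TeamID   *string
}

// Visible reports whether doc is returned to agentID in tenantID.
// teamID is "" when the RunContext carries no TeamID.
func Visible(doc VaultDoc, tenantID, agentID, teamID string) bool {
	if doc.TenantID != tenantID {
		return false // tenant isolation is always enforced
	}
	if doc.AgentID != nil && *doc.AgentID == agentID {
		return true // owned by the querying agent
	}
	if doc.AgentID == nil && doc.TeamID == nil {
		return true // tenant-shared document (assumption: both IDs NULL)
	}
	// Team-scoped documents only appear inside a matching team context.
	return teamID != "" && doc.Scope == "team" && doc.TeamID != nil && *doc.TeamID == teamID
}

func strPtr(s string) *string { return &s }

func main() {
	doc := VaultDoc{TenantID: "t1", Scope: "team", TeamID: strPtr("team-1")}
	fmt.Println(Visible(doc, "t1", "agent-1", ""))       // outside a team context
	fmt.Println(Visible(doc, "t1", "agent-1", "team-1")) // inside the matching team
}
```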
The `vault_documents` table is a registry of document metadata. Content lives on the filesystem; the registry stores path, hash, embeddings, and links.
| Column | Type | Notes |
|---|---|---|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Multi-tenant isolation |
| `agent_id` | UUID | Per-agent namespace; nullable for team-scoped or tenant-shared files (migration 046) |
| `scope` | TEXT | `personal`, `team`, `shared`, or `custom` |
| `chat_id` | TEXT | Chat-scope isolation for isolated teams; NULL = no chat scope (team-wide or legacy) |
| `path` | TEXT | Workspace-relative path (e.g., `workspace/notes/foo.md`) |
| `title` | TEXT | Display name |
| `doc_type` | TEXT | `context`, `memory`, `note`, `skill`, `episodic`, `image`, `video`, `audio`, `document` |
| `content_hash` | TEXT | SHA-256 of file content (change detection) |
| `embedding` | vector(1536) | pgvector semantic similarity |
| `tsv` | tsvector | GIN FTS index on title + path + summary |
| `metadata` | JSONB | Optional custom fields |
Migration 000056 adds the chat_id column to vault_documents to support isolated teams — groups where each chat channel is fully partitioned.
Invariant for isolated teams:

- `chat_id IS NOT NULL`: the document is visible only to that chat
- `chat_id IS NULL`: the document is team-wide (shared or legacy)
- Both rescan and search enforce this filter: `chat_id = <target> OR chat_id IS NULL`
What migration 000056 does:

- Adds column `vault_documents.chat_id TEXT` (nullable)
- Adds composite index `idx_vault_docs_team_chat` on `(team_id, chat_id) WHERE team_id IS NOT NULL`
- Drops the `vault_documents_scope_consistency` constraint before running the backfill UPDATEs. The constraint was added as `NOT VALID` in migration 055, meaning it skipped existing rows but still re-checked every UPDATE. Legacy data (pre-M46/M43) often violated the invariant, causing the backfill to abort and leaving migration 056 in a dirty state (issue #1035, fixed in v3.11.2). The constraint is re-added at the end of the migration with `NOT VALID`.
Backfill logic:

Migration 056 backfills `chat_id` for two groups:

- Team-scoped docs (`scope='team'`): extracts the chat segment from the path (`teams/<uuid>/<chat>/...` or `tenants/<slug>/teams/<uuid>/<chat>/...`). Segments starting with `.` (config dirs such as `.goclaw`) are skipped.
- Legacy docs (`team_id IS NULL`): a broader regex covers all channel integrations (`telegram`, `discord`, `zalo`, `feishu`, `lark`, `whatsapp`, `slack`, `line`, `messenger`, `wechat`, `viber`, `ws`, `delegate`, `api`), not just telegram/discord as in older releases.
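The path rule for team-scoped documents can be sketched in Go as below. The migration itself does this in SQL, so this is only an equivalent illustration under stated assumptions: `chatSegment` is a hypothetical name, the `<uuid>` segment is not validated, and a path must have at least one segment after the chat directory (so `teams/<uuid>/file.md` yields no chat).

```go
package main

import (
	"fmt"
	"strings"
)

// chatSegment returns the chat segment of a team-scoped vault path, if any.
func chatSegment(path string) (string, bool) {
	parts := strings.Split(path, "/")
	// Accept both "teams/<uuid>/<chat>/..." and "tenants/<slug>/teams/<uuid>/<chat>/...".
	start := 0
	if len(parts) > 2 && parts[0] == "tenants" {
		start = 2
	}
	// Need "teams", "<uuid>", "<chat>", and at least one more segment.
	if len(parts) < start+4 || parts[start] != "teams" {
		return "", false
	}
	chat := parts[start+2]
	if chat == "" || strings.HasPrefix(chat, ".") {
		return "", false // config dirs such as .goclaw are not chats
	}
	return chat, true
}

func main() {
	fmt.Println(chatSegment("tenants/acme/teams/1f0c/chat-42/notes/plan.md"))
	fmt.Println(chatSegment("teams/1f0c/.goclaw/config.md"))
}
```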
Related search parameters:

| Parameter | Type | Notes |
|---|---|---|
| `ChatID` | `*string` | Pointer to the chat ID to filter by; `nil` = no filter |
| `TeamIsolated` | `bool` | `true` = apply `ChatID` filter; `false` = skip (shared/personal) |
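A minimal Go sketch of how the two parameters combine into the `chat_id = <target> OR chat_id IS NULL` rule. `SearchFilter` and `matchesChat` are hypothetical names chosen for illustration; a `nil` document chat ID stands for a NULL column.

```go
package main

import "fmt"

// SearchFilter mirrors the ChatID / TeamIsolated parameters from the table above.
type SearchFilter struct {
	ChatID       *string
	TeamIsolated bool
}

// matchesChat applies "chat_id = <target> OR chat_id IS NULL" when isolation is on.
// docChatID is nil when the document's chat_id column is NULL.
func matchesChat(docChatID *string, f SearchFilter) bool {
	if !f.TeamIsolated || f.ChatID == nil {
		return true // filter skipped for shared/personal searches or when no chat is given
	}
	return docChatID == nil || *docChatID == *f.ChatID
}

func main() {
	target, other := "chat-a", "chat-b"
	f := SearchFilter{ChatID: &target, TeamIsolated: true}
	fmt.Println(matchesChat(nil, f), matchesChat(&target, f), matchesChat(&other, f))
}
```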
The `vault_links` table stores bidirectional links between documents (wikilinks, explicit references, and enrichment-generated semantic links).

| Column | Type | Notes |
|---|---|---|
| `from_doc_id` | UUID | Source document |
| `to_doc_id` | UUID | Target document |
| `link_type` | TEXT | `wikilink`, `reference`, `depends_on`, `extends`, `related`, `supersedes`, `contradicts`, `task_attachment`, `delegation_attachment` |
| `context` | TEXT | ~50-char surrounding text snippet |
| `metadata` | JSONB | Extra metadata from enrichment pipeline (migration 048) |
Unique constraint: (from_doc_id, to_doc_id, link_type) — no duplicate links.
The `vault_versions` table is prepared for version history in v3.1; it exists but is empty in v3.0.
Agents can create bidirectional markdown links in `[[target]]` format:

```markdown
See [[architecture/components]] for details.
Reference [[SOUL.md|agent persona]] here.
Link [[../parent-project]] up.
```

- `[[path/to/file.md]]`: path-based target
- `[[name|display text]]`: display text is cosmetic only
- `.md` extension is auto-appended if missing
- Empty or whitespace-only targets are skipped
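The parsing rules above fit in a short Go function. This is a sketch, not the vault's parser: `ExtractWikilinks` and the regular expression are assumptions, and it appends `.md` whenever the target does not already end in `.md` (how the real code treats other extensions is not specified here).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wikilinkRE matches [[...]] with no nested brackets inside.
var wikilinkRE = regexp.MustCompile(`\[\[([^\[\]]+)\]\]`)

// ExtractWikilinks returns normalized link targets found in content, in order.
func ExtractWikilinks(content string) []string {
	var targets []string
	for _, m := range wikilinkRE.FindAllStringSubmatch(content, -1) {
		target := m[1]
		if i := strings.Index(target, "|"); i >= 0 {
			target = target[:i] // display text is cosmetic only
		}
		target = strings.TrimSpace(target)
		if target == "" {
			continue // empty or whitespace-only targets are skipped
		}
		if !strings.HasSuffix(strings.ToLower(target), ".md") {
			target += ".md" // auto-append the extension
		}
		targets = append(targets, target)
	}
	return targets
}

func main() {
	fmt.Println(ExtractWikilinks("See [[architecture/components]] and [[SOUL.md|agent persona]]."))
}
```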
When resolving a wikilink target:

- Exact path match: find the document by path
- With `.md` suffix: retry if the target lacks the extension
- Basename search: scan all agent docs and match by filename (case-insensitive)
- Unresolved: silently skipped, so backlinks can be incomplete
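The lookup order can be sketched as a linear scan over the agent's documents. `Doc` and `Resolve` are hypothetical names, and "first basename match wins" is an assumption, since the text does not say how ambiguous basenames are handled.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Doc is a hypothetical minimal document record.
type Doc struct {
	ID   string
	Path string
}

func hasMD(s string) bool { return strings.HasSuffix(strings.ToLower(s), ".md") }

// Resolve applies the lookup order described above; false means unresolved.
func Resolve(target string, docs []Doc) (Doc, bool) {
	// 1. Exact path match.
	for _, d := range docs {
		if d.Path == target {
			return d, true
		}
	}
	// 2. Retry with a .md suffix if the target lacks one.
	if !hasMD(target) {
		for _, d := range docs {
			if d.Path == target+".md" {
				return d, true
			}
		}
	}
	// 3. Case-insensitive basename match across all of the agent's documents.
	names := []string{path.Base(target)}
	if !hasMD(target) {
		names = append(names, path.Base(target)+".md")
	}
	for _, d := range docs {
		for _, n := range names {
			if strings.EqualFold(path.Base(d.Path), n) {
				return d, true // first match wins (assumption)
			}
		}
	}
	// 4. Unresolved: the caller skips this link.
	return Doc{}, false
}

func main() {
	docs := []Doc{{"1", "notes/auth.md"}, {"2", "SOUL.md"}}
	fmt.Println(Resolve("auth", docs))
	fmt.Println(Resolve("missing", docs))
}
```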
`SyncDocLinks` keeps `vault_links` in sync with document content:

- Extract all `[[...]]` patterns from the content
- Delete existing outgoing links for the document (replace strategy)
- Resolve each target and create `vault_link` rows for resolved targets

This runs on every document upsert and on each VaultSyncWorker file event.
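The replace strategy can be illustrated with an in-memory stand-in for `vault_links`. Everything here (`LinkStore`, the `resolve` callback, the `Backlinks` scan) is hypothetical. The sketch assumes only `wikilink` rows are replaced and other link types are kept, since the text does not say whether the delete step touches them.

```go
package main

import "fmt"

// Link is a hypothetical in-memory vault_links row.
type Link struct {
	FromDocID string
	ToDocID   string
	LinkType  string
}

// LinkStore stands in for the vault_links table, keyed by from_doc_id.
type LinkStore struct {
	outgoing map[string][]Link
}

func NewLinkStore() *LinkStore { return &LinkStore{outgoing: map[string][]Link{}} }

// SyncDocLinks replaces fromID's outgoing wikilinks with links to the resolved
// targets. Non-wikilink rows are kept (assumption). Unresolved targets are skipped.
func (s *LinkStore) SyncDocLinks(fromID string, targets []string, resolve func(string) (string, bool)) {
	var kept []Link
	for _, l := range s.outgoing[fromID] {
		if l.LinkType != "wikilink" {
			kept = append(kept, l)
		}
	}
	seen := map[string]bool{} // mirrors UNIQUE (from_doc_id, to_doc_id, link_type)
	for _, t := range targets {
		toID, ok := resolve(t)
		if !ok || seen[toID] {
			continue
		}
		seen[toID] = true
		kept = append(kept, Link{FromDocID: fromID, ToDocID: toID, LinkType: "wikilink"})
	}
	s.outgoing[fromID] = kept
}

// Backlinks derives incoming links by scanning all outgoing rows.
func (s *LinkStore) Backlinks(docID string) []Link {
	var in []Link
	for _, links := range s.outgoing {
		for _, l := range links {
			if l.ToDocID == docID {
				in = append(in, l)
			}
		}
	}
	return in
}

func main() {
	ids := map[string]string{"a.md": "A", "b.md": "B"}
	resolve := func(t string) (string, bool) { id, ok := ids[t]; return id, ok }
	s := NewLinkStore()
	s.SyncDocLinks("X", []string{"a.md", "b.md", "missing.md"}, resolve)
	fmt.Println(len(s.outgoing["X"]), len(s.Backlinks("B")))
}
```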
Hybrid FTS + vector search on a single vault:

- FTS: PostgreSQL `plainto_tsquery()` on `tsv` (title + path keywords)
- Vector: pgvector cosine similarity on embeddings (semantic)
- Scoring: scores from each method are normalized to 0–1, then combined with query-time weights
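A minimal Go sketch of the scoring step, assuming non-negative raw scores and divide-by-maximum normalization (the text only says "normalized to 0–1", so the exact scheme is an assumption). `Hybrid` and its weight arguments are hypothetical; a document found by only one method contributes 0 for the other. The input values in `main` are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
)

type Hit struct {
	ID    string
	Score float64
}

// normalizeByMax scales non-negative scores into [0, 1] by dividing by the maximum.
func normalizeByMax(scores map[string]float64) map[string]float64 {
	top := 0.0
	for _, s := range scores {
		if s > top {
			top = s
		}
	}
	out := make(map[string]float64, len(scores))
	for id, s := range scores {
		if top > 0 {
			out[id] = s / top
		} else {
			out[id] = 0
		}
	}
	return out
}

// Hybrid combines normalized FTS and vector scores with query-time weights.
func Hybrid(fts, vec map[string]float64, wFTS, wVec float64) []Hit {
	combined := map[string]float64{}
	for id, s := range normalizeByMax(fts) {
		combined[id] += wFTS * s
	}
	for id, s := range normalizeByMax(vec) {
		combined[id] += wVec * s
	}
	hits := make([]Hit, 0, len(combined))
	for id, s := range combined {
		hits = append(hits, Hit{ID: id, Score: s})
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Score != hits[j].Score {
			return hits[i].Score > hits[j].Score
		}
		return hits[i].ID < hits[j].ID // deterministic tie-break
	})
	return hits
}

func main() {
	fts := map[string]float64{"notes/auth.md": 0.12, "notes/login.md": 0.06}
	vec := map[string]float64{"notes/login.md": 0.91, "specs/oauth.md": 0.45}
	for _, h := range Hybrid(fts, vec, 0.5, 0.5) {
		fmt.Printf("%s %.3f\n", h.ID, h.Score)
	}
}
```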
VaultSearchService fans out in parallel across all knowledge sources:
| Source | Weight | What it searches |
|---|---|---|
| Vault | 0.4 | Document titles, paths, embeddings |
| Episodic | 0.3 | Session summaries |
| Knowledge Graph | 0.3 | Entity names and descriptions |
Results are normalized per source (max score = 1.0), weighted, merged, deduplicated by ID, and sorted by final score descending.
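The fan-out and merge steps can be sketched with goroutines. `Source`, `Result`, and `FanOut` are hypothetical names, and two behaviors are assumptions not stated above: a failing source simply contributes no results, and when the same ID appears more than once the highest weighted score is kept.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Result is one hit from any source.
type Result struct {
	ID     string
	Source string
	Score  float64
}

// Source pairs a hypothetical search function with its fan-out weight.
type Source struct {
	Name   string
	Weight float64
	Search func(query string) ([]Result, error)
}

// FanOut queries all sources in parallel, normalizes each source so its best
// hit scores 1.0, applies the source weight, then merges, dedupes, and sorts.
func FanOut(query string, sources []Source) []Result {
	perSource := make([][]Result, len(sources))
	var wg sync.WaitGroup
	for i, src := range sources {
		wg.Add(1)
		go func(i int, src Source) {
			defer wg.Done()
			hits, err := src.Search(query)
			if err != nil || len(hits) == 0 {
				return // assumption: a failing source contributes no results
			}
			best := 0.0
			for _, h := range hits {
				if h.Score > best {
					best = h.Score
				}
			}
			out := make([]Result, len(hits))
			for j, h := range hits {
				norm := 0.0
				if best > 0 {
					norm = h.Score / best
				}
				out[j] = Result{ID: h.ID, Source: src.Name, Score: norm * src.Weight}
			}
			perSource[i] = out // each goroutine writes only its own slot
		}(i, src)
	}
	wg.Wait()

	// Deduplicate by ID, keeping the highest weighted score (assumption).
	merged := map[string]Result{}
	for _, hits := range perSource {
		for _, h := range hits {
			if cur, ok := merged[h.ID]; !ok || h.Score > cur.Score {
				merged[h.ID] = h
			}
		}
	}
	results := make([]Result, 0, len(merged))
	for _, r := range merged {
		results = append(results, r)
	}
	sort.Slice(results, func(a, b int) bool {
		if results[a].Score != results[b].Score {
			return results[a].Score > results[b].Score
		}
		return results[a].ID < results[b].ID // deterministic tie-break
	})
	return results
}

func main() {
	fixed := func(rs ...Result) func(string) ([]Result, error) {
		return func(string) ([]Result, error) { return rs, nil }
	}
	sources := []Source{
		{Name: "vault", Weight: 0.4, Search: fixed(Result{ID: "doc-1", Score: 2}, Result{ID: "doc-2", Score: 1})},
		{Name: "episodic", Weight: 0.3, Search: fixed(Result{ID: "ep-1", Score: 5})},
		{Name: "kg", Weight: 0.3, Search: fixed(Result{ID: "ent-1", Score: 0.5})},
	}
	for _, r := range FanOut("authentication flow", sources) {
		fmt.Printf("%-6s %-9s %.2f\n", r.ID, r.Source, r.Score)
	}
}
```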
| Param | Type | Default | Notes |
|---|---|---|---|
| `Query` | string | (none) | Required: natural language |
| `AgentID` | string | (none) | Scope to agent |
| `TenantID` | string | (none) | Scope to tenant |
| `Scope` | string | all | `personal`, `team`, `shared` |
| `DocTypes` | []string | all | `context`, `memory`, `note`, `skill`, `episodic` |
| `MaxResults` | int | 10 | Final result set size |
| `MinScore` | float64 | 0.0 | Minimum score filter |
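A Go sketch of how the defaults and the final filtering might be applied. The struct mirrors the table, but `WithDefaults`, `Finalize`, and `Scored` are hypothetical; treating a zero or negative `MaxResults` as "use the default" and keeping scores equal to `MinScore` (inclusive) are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// SearchOptions mirrors the parameter table above.
type SearchOptions struct {
	Query      string
	AgentID    string
	TenantID   string
	Scope      string   // "" = all scopes
	DocTypes   []string // nil = all doc types
	MaxResults int
	MinScore   float64
}

// WithDefaults validates required fields and fills in documented defaults.
func (o SearchOptions) WithDefaults() (SearchOptions, error) {
	if strings.TrimSpace(o.Query) == "" {
		return o, errors.New("query is required")
	}
	if o.MaxResults <= 0 {
		o.MaxResults = 10 // assumption: zero value means "use the default"
	}
	return o, nil // MinScore's zero value already matches the 0.0 default
}

type Scored struct {
	ID    string
	Score float64
}

// Finalize drops results below MinScore and truncates to MaxResults.
// results must already be sorted by score, descending.
func Finalize(results []Scored, o SearchOptions) []Scored {
	var out []Scored
	for _, r := range results {
		if len(out) >= o.MaxResults {
			break
		}
		if r.Score >= o.MinScore {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	o, err := SearchOptions{Query: "authentication flow", MinScore: 0.5}.WithDefaults()
	if err != nil {
		panic(err)
	}
	fmt.Println(Finalize([]Scored{{"doc-1", 0.92}, {"doc-2", 0.68}, {"doc-3", 0.31}}, o))
}
```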
VaultSyncWorker watches workspace directories for changes using `fsnotify`:

- Debounce: 500ms; multiple rapid changes collapse into one batch
- For each changed file:
  - Compute the SHA-256 hash
  - Compare it to `vault_documents.content_hash`
  - If different: update the hash in the DB
  - If the file was deleted: mark `metadata["deleted"] = true`
Note: Sync is one-way — only registered documents are watched. New files must first be registered by an agent write. The vault does not write back to the filesystem.
After each document upsert, EnrichWorker processes the event asynchronously to enrich vault documents with summaries, embeddings, and semantic links:

- Generates a text summary of the document content
- Computes a vector embedding for semantic search
- Classifies semantic relationships to other documents in the vault and creates `vault_link` rows
The classifier produces links with one of six relationship types:
| Type | Meaning |
|---|---|
| `reference` | Document cites another as a source |
| `depends_on` | Document requires another to be meaningful |
| `extends` | Document adds to or builds upon another |
| `related` | General topical relationship |
| `supersedes` | Document replaces or obsoletes another |
| `contradicts` | Document conflicts with another |
Two additional link types are created by the task/delegation system rather than the classifier:

- `task_attachment`: links a vault document to a team task it was attached to
- `delegation_attachment`: links a vault document to a delegation it was attached to

These are not affected by enrichment cleanup or rescan.
Real-time enrichment progress is broadcast as WebSocket events. The UI shows per-document status while the worker runs.
From the UI (or REST API), users can:
- Stop enrichment — halts the EnrichWorker for the current tenant
- Trigger rescan — re-queues all vault documents for re-enrichment (useful after model or config changes)
The vault accepts binary and media files in addition to text documents. Supported file types are controlled by an extension whitelist.
| `doc_type` | Used for |
|---|---|
| `image` | PNG, JPG, GIF, WEBP, SVG, etc. |
| `video` | MP4, MOV, AVI, etc. |
| `audio` | MP3, WAV, OGG, etc. |
| `document` | PDF, DOCX, XLSX, etc. |
Because media files cannot be read as text, the vault uses SynthesizeMediaSummary() to generate a deterministic semantic summary from the filename and parent folder context. No LLM call is needed. The summary is stored in vault_documents.summary and included in the FTS index, enabling keyword discovery of media files by name and location.
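To make the idea concrete, here is a Go sketch of extension-based classification plus a deterministic, LLM-free summary built from the filename and parent folder. The extension map is a partial, hypothetical subset of the whitelist, and `synthesizeMediaSummary` and its output format are illustrative stand-ins, not the behavior of the real `SynthesizeMediaSummary()`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// mediaTypes is a hypothetical subset of the extension whitelist.
var mediaTypes = map[string]string{
	".png": "image", ".jpg": "image", ".gif": "image", ".webp": "image", ".svg": "image",
	".mp4": "video", ".mov": "video", ".avi": "video",
	".mp3": "audio", ".wav": "audio", ".ogg": "audio",
	".pdf": "document", ".docx": "document", ".xlsx": "document",
}

// classify returns the doc_type for a media path, or false if the extension is not listed.
func classify(p string) (string, bool) {
	t, ok := mediaTypes[strings.ToLower(path.Ext(p))]
	return t, ok
}

// synthesizeMediaSummary builds a deterministic summary from filename and folder.
// The output format is an assumption made for this sketch.
func synthesizeMediaSummary(p string) string {
	docType, ok := classify(p)
	if !ok {
		docType = "file"
	}
	base := path.Base(p)
	name := strings.TrimSuffix(base, path.Ext(base))
	name = strings.NewReplacer("-", " ", "_", " ").Replace(name) // make words searchable
	folder := path.Base(path.Dir(p))
	if folder == "." || folder == "/" {
		return fmt.Sprintf("%s %s", docType, name)
	}
	return fmt.Sprintf("%s %s in folder %s", docType, name, folder)
}

func main() {
	fmt.Println(synthesizeMediaSummary("projects/launch/team-photo_2024.PNG"))
}
```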
`vault_search` is the primary discovery tool. It searches across vault, episodic memory, and Knowledge Graph with unified ranking.

```json
{
  "query": "authentication flow",
  "scope": "team",
  "types": "context,note",
  "maxResults": 10
}
```

Each result carries a source-specific ID field that tells you which follow-up tool to use:
| Source | ID field | Follow-up tool |
|---|---|---|
| `vault` | `doc_id` | `vault_read(doc_id=...)` |
| `kg` | `entity_id` | `knowledge_graph_search(entity_id=...)` |
| `episodic` | `episodic_id` | `memory_expand(id=episodic_id)` |
ID namespace protection: if you pass an `entity_id` or `episodic_id` to `vault_read` by mistake, the tool returns a descriptive error naming the correct tool to use, rather than a generic "document not found". Always use the `doc_id` from vault results with `vault_read`.
Note on linking: document linking no longer requires an explicit tool call; it is handled automatically. The `vault_link` agent tool has been removed. Links are created via wikilink syntax in document content (`[[target]]`) or generated semantically by EnrichWorker. You can view links via `GET /v1/agents/{agentID}/vault/documents/{docID}/links`.
All endpoints require Authorization: Bearer <token>.
| Method | Path | Description |
|---|---|---|
| `GET` | `/v1/agents/{agentID}/vault/documents` | List documents (`scope`, `doc_type`, `limit`, `offset`) |
| `GET` | `/v1/agents/{agentID}/vault/documents/{docID}` | Get single document |
| `POST` | `/v1/agents/{agentID}/vault/documents` | Create document (optional `content` body; see below) |
| `PUT` | `/v1/agents/{agentID}/vault/documents/{docID}` | Update document (optional `content` body; see below) |
| `POST` | `/v1/agents/{agentID}/vault/search` | Unified search |
| `GET` | `/v1/agents/{agentID}/vault/documents/{docID}/links` | Outlinks + backlinks |
| Method | Path | Description |
|---|---|---|
| `GET` | `/v1/vault/documents` | List across all tenant agents (filter by `agent_id`) |
| `POST` | `/v1/vault/documents` | Create document (optional `content` body; see below) |
| `PUT` | `/v1/vault/documents/{docID}` | Update document (optional `content` body; see below) |
| `GET` | `/v1/vault/tree` | Tree view of vault structure |
| `GET` | `/v1/vault/graph` | Cross-tenant graph visualization (node limit: 2000, FA2 layout) |
POST/PUT accept an optional content field. When supplied, the bytes are materialised at <tenant-workspace>/<path>, the SHA-256 hash is stored on the row, and an enrichment event is emitted so summaries, embeddings, and links are computed — the same code path the multipart /v1/vault/upload endpoint uses.
| `content` value | Behaviour |
|---|---|
| field omitted | Metadata-only stub; no file written, no enrichment event |
| present and non-empty | Bytes written to disk; hash stored; enrichment fires |
| present and empty (`""`) | A 0-byte file is written; enrichment still fires |
The path must use an allowed extension (same whitelist as /v1/vault/upload). Writes are sandboxed inside the tenant workspace: lexical ../ checks, symlink-ancestor resolution, and an atomic O_NOFOLLOW open reject any attempt to follow a symlink out of the workspace.
```http
POST /v1/vault/documents
Content-Type: application/json
Authorization: Bearer <token>

{
  "path": "notes/auth.md",
  "title": "Authentication Flow",
  "doc_type": "note",
  "content": "# Authentication Flow\n\nSee [[architecture/components]] for details."
}
```

| Method | Path | Description |
|---|---|---|
| `POST` | `/v1/vault/enrichment/stop` | Stop the enrichment worker |
```http
POST /v1/agents/agent-123/vault/search
Content-Type: application/json
Authorization: Bearer <token>

{
  "query": "authentication flow",
  "scope": "personal",
  "max_results": 5
}
```

```json
[
  {
    "document": {
      "id": "doc-456",
      "path": "notes/auth.md",
      "title": "Authentication Flow",
      "doc_type": "note"
    },
    "score": 0.92,
    "source": "vault"
  },
  {
    "document": {"id": "episodic-789", "title": "Session-2026-04-06"},
    "score": 0.68,
    "source": "episodic"
  }
]
```

```http
GET /v1/agents/agent-123/vault/documents/doc-456/links
```

```json
{
  "outlinks": [
    {
      "id": "uuid",
      "to_doc_id": "uuid",
      "link_type": "wikilink",
      "context": "See [[target]] for details."
    }
  ],
  "backlinks": [
    {
      "id": "uuid",
      "from_doc_id": "uuid",
      "link_type": "wikilink",
      "context": "Reference [[auth.md]] here."
    }
  ]
}
```

| Migration | Name | What changed |
|---|---|---|
| 046 | `vault_nullable_agent_id` | Makes `vault_documents.agent_id` nullable for team-scoped and tenant-shared files |
| 048 | `vault_media_linking` | Adds `base_name` generated column on `team_task_attachments`; adds `metadata` JSONB on `vault_links`; fixes CASCADE FK constraints |
| 049 | `vault_path_prefix_index` | Adds concurrent index `idx_vault_docs_path_prefix` with `text_pattern_ops` for fast prefix queries |
| 056 | `vault_chat_id` | Adds `chat_id` column + `idx_vault_docs_team_chat` index; backfills legacy data from all channel integrations; drops and re-adds scope-consistency CHECK (v3.11.1 + fix v3.11.2) |
- PostgreSQL with the `pgvector` extension (embeddings)
- Migration `000038_vault_tables` must have run successfully
- VaultStore initialized during gateway startup
- VaultSyncWorker started for filesystem sync
- EnrichWorker started for automatic enrichment (summaries, embeddings, semantic links)
No feature flag. Vault is active if the migration ran and VaultStore initialized.
- Vault documents are not auto-injected into the agent system prompt; they must be retrieved via `vault_search`
- FTS indexes title, path, and summary only, not full document content; body text requires vector embeddings for discovery
- Sync is one-way (filesystem → vault; vault does not write back)
- No conflict resolution — concurrent edits use last-write-wins
- Version history (`vault_versions` table) prepared for v3.1; empty in v3.0
- Knowledge Graph — Entity and relation graph auto-extracted from conversations
- Memory System — Vector-based long-term memory
- Context Files — Static documents injected into agent context