- Docker and Docker Compose installed
- An embedding API key (OpenAI, Gemini, or local Ollama)
- A `BRAIN_API_KEY` for authentication
```bash
# Clone and configure
cd /path/to/multi-agent
cp .env.example .env
# Edit .env — at minimum set BRAIN_API_KEY, QDRANT_API_KEY, and an embedding key

# Start core services (Qdrant + API with SQLite)
docker-compose up -d

# Verify health
curl http://localhost:8084/health
# Expected: {"status":"ok","service":"zengram","timestamp":"..."}
```
```bash
# Start with Postgres profile
docker-compose --profile postgres up -d
# Update .env:
#   STRUCTURED_STORE=postgres
#   POSTGRES_URL=postgresql://brain:brain_secret@postgres:5432/zengram
# Restart API to pick up changes
docker-compose restart memory-api
```

```bash
# Rebuild the API image after code changes
docker-compose up -d --build memory-api
```

```bash
# All services
docker-compose logs -f
# API only
docker-compose logs -f memory-api
# Qdrant only
docker-compose logs -f qdrant
```

| Endpoint | Auth | What It Returns |
|---|---|---|
| `GET /health` | No | `{"status":"ok"}` -- basic liveness |
| `GET /stats` | Yes | Full health dashboard: memory counts, decay stats, retrieval status |
| `GET /consolidate/status` | Yes | Consolidation engine state: running, last_run_at, LLM info |
curl -H "x-api-key: YOUR_KEY" http://localhost:8084/statsKey fields to monitor:
| Field | Healthy | Warning |
|---|---|---|
| `total_memories` | Growing steadily | Stagnant (agents not storing) or spiking (dedup broken) |
| `active` / `superseded` | Ratio depends on usage | If superseded >> active, consolidation is working |
| `decayed_below_50pct` | Low (< 5% of facts) | High count means facts are going stale without access |
| `retrieval.multi_path` | `true` | `false` means only vector search is active |
| `retrieval.keyword_search` | `true` | `false` if no structured store configured |
| `retrieval.graph_search` | `true` | `false` if no entity store (baserow/none backend) |
| `entities.total` | Growing | Zero means entity extraction is failing |
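To spot-check these fields from the command line, a minimal sketch (assumes `jq` is installed; the field paths follow the table above, though the exact JSON shape may differ):

```bash
# Pull the fields worth alerting on out of /stats
curl -s -H "x-api-key: YOUR_KEY" http://localhost:8084/stats \
  | jq '{total: .total_memories, decayed: .decayed_below_50pct, retrieval: .retrieval}'
```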
Browse http://localhost:8084/dashboard for a visual stats overview. No authentication required for the HTML page; the embedded JavaScript uses the API key from the URL or prompts for one.
Browse http://localhost:8084/graph/html?key=YOUR_KEY for the interactive entity browser with D3.js force-directed graphs.
Symptom: Docker reports zengram-api as unhealthy. API requests time out or return 502.
Root Cause: At 39K+ vectors, some Qdrant operations (scroll, count) exceed the default timeout. The health check itself is lightweight (GET /health does not query Qdrant), but heavy API operations can cause cascading slowness.
Resolution:
- Increase the Qdrant timeout: `QDRANT_TIMEOUT_MS=15000` or `20000` (default is 10000)
- The `scrollPoints` function uses this timeout for all Qdrant HTTP requests
- Run consolidation to merge/expire old memories and reduce collection size
- Monitor `total_memories` via `/stats` and consolidate proactively
Lesson learned: the 39K-vector stress test revealed that Qdrant count queries with `exact: true` are the bottleneck at scale. The stats endpoint runs 6+ count queries in parallel; under load this can saturate Qdrant.
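A quick way to apply the timeout bump (assumes `QDRANT_TIMEOUT_MS` is not already set in `.env`; otherwise edit the existing line):

```bash
# Raise the Qdrant HTTP timeout, then restart the API so it takes effect
echo "QDRANT_TIMEOUT_MS=20000" >> .env
docker-compose restart memory-api
```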
Symptom: Qdrant container killed by OOM, restarts repeatedly.
Resolution:
- Check vector count: `curl http://localhost:6334/collections/shared_memories`
- Qdrant stores vectors in memory; each 3072-dim float32 vector uses ~12KB
- At 50K vectors with 3072 dims: ~600MB RAM minimum
- Reduce dimensions: switch to 1536 (`GEMINI_EMBEDDING_DIMS=1536`) -- Gemini supports Matryoshka embeddings
- Add memory limits to Docker: edit `docker-compose.yml` to add `mem_limit: 2g`
- Run consolidation to expire unused old events
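To confirm whether memory pressure tracks the vector count, a quick check (the `jq` path matches Qdrant's standard REST response; the container name depends on your compose project prefix):

```bash
# Current vector count vs. live container memory usage
curl -s http://localhost:6334/collections/shared_memories | jq '.result.points_count'
docker stats --no-stream --format "{{.Name}}: {{.MemUsage}}" | grep -i qdrant
```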
Symptom: 429 Too Many Requests errors in logs during bulk import or consolidation.
Resolution:
- Import endpoint already batches in groups of 10 with 100ms delay between batches
- For Gemini: the free tier has low RPM limits; consider switching to OpenAI or Ollama
- The consolidation engine embeds each merged fact individually -- at high volumes this can hit rate limits
- Temporary fix: increase `CONSOLIDATION_INTERVAL` to reduce frequency
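To see which component is actually being throttled, grep the API logs for 429s (the bracketed prefixes are listed in the table at the end of this page):

```bash
# Find the component hitting the provider's rate limits
docker-compose logs memory-api | grep -E "429|Too Many Requests"
```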
Symptom: [consolidation] Scheduled run failed in logs. Memories accumulate without being consolidated.
Resolution:
- Check `/consolidate/status` for LLM provider info
- Verify the LLM API key is valid (`OPENAI_API_KEY` for OpenAI, `ANTHROPIC_API_KEY` for Anthropic)
- Check that the model name in `CONSOLIDATION_MODEL` is accessible from your account
- The consolidation engine parses JSON from LLM output; malformed responses are logged and skipped
- Manual trigger: `curl -X POST -H "x-api-key: KEY" "http://localhost:8084/consolidate?sync=true"`
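When a run fails, the `[consolidation]` log prefix is the fastest way to see why; a minimal check:

```bash
# Trigger a synchronous run, then inspect the consolidation log lines
curl -X POST -H "x-api-key: KEY" "http://localhost:8084/consolidate?sync=true"
docker-compose logs --tail=100 memory-api | grep "\[consolidation\]"
```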
Symptom: Entity extraction works but doesn't resolve aliases. Entities extracted as new instead of matching existing ones.
Resolution:
- The cache loads at startup from the structured store
- If the store was empty on first boot, the cache starts with only built-in tech names (~70 entries)
- After consolidation discovers entities, restart the API to reload the cache
- Check: `curl -H "x-api-key: KEY" http://localhost:8084/entities/stats`
Symptom: retrieval.keyword_search: false in stats. Only vector search results returned.
Resolution:
- Keyword search requires `STRUCTURED_STORE=sqlite` or `STRUCTURED_STORE=postgres`
- If set to `baserow` or `none`, keyword search is disabled
- For Postgres: the `memory_search` table needs a `content_tsv` generated column with a GIN index
- For SQLite: the `memory_search_fts` FTS5 virtual table is created automatically
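To enable it on a default setup, a minimal sketch (assumes `STRUCTURED_STORE` is not already set in `.env`; otherwise edit the existing line):

```bash
# Switch on the SQLite structured store and confirm keyword search comes up
echo "STRUCTURED_STORE=sqlite" >> .env
docker-compose restart memory-api
curl -s -H "x-api-key: KEY" http://localhost:8084/stats | jq '.retrieval.keyword_search'
```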
```bash
# Export all active memories (default limit 1000, max 5000)
curl -H "x-api-key: KEY" \
  "http://localhost:8084/export?limit=5000" > brain-backup.json

# Export specific client
curl -H "x-api-key: KEY" \
  "http://localhost:8084/export?client_id=acme-corp&limit=5000" > acme-backup.json

# Export with pagination (offset support)
curl -H "x-api-key: KEY" \
  "http://localhost:8084/export?offset=0&limit=1000" > page1.json
curl -H "x-api-key: KEY" \
  "http://localhost:8084/export?offset=1000&limit=1000" > page2.json
```

```bash
# Import from backup (max 500 per call, deduplicates by content hash)
curl -X POST -H "x-api-key: KEY" \
  -H "Content-Type: application/json" \
  -d @brain-backup.json \
  http://localhost:8084/export/import
# Response: {"imported": 342, "skipped": 158, "errors": 0}
```

Import re-embeds with the current provider, so switching embedding providers is safe -- just export and reimport.
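For collections larger than the 5000-record export cap, a paged loop keeps every file within limits; a minimal sketch (extend the `seq` range to cover your `total_memories`):

```bash
# Export in 1000-record pages: offsets 0, 1000, ..., 4000
for off in $(seq 0 1000 4000); do
  curl -s -H "x-api-key: KEY" \
    "http://localhost:8084/export?offset=$off&limit=1000" > "brain-page-$off.json"
done
```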
Raw Qdrant storage is at `./data/qdrant/`. For bare-metal backup:

```bash
# Cold-copy the Qdrant data directory
docker-compose stop qdrant
cp -r ./data/qdrant ./data/qdrant-backup-$(date +%Y%m%d)
docker-compose start qdrant
```

```bash
# Back up the SQLite structured store
cp ./data/brain.db ./data/brain-backup-$(date +%Y%m%d).db
```

```bash
# Restart the API only
docker-compose restart memory-api
```

```bash
# Full stack restart (containers recreated, data volumes kept)
docker-compose down
docker-compose up -d
```

```bash
# Factory reset: remove volumes and on-disk data
docker-compose down -v
rm -rf ./data/qdrant ./data/brain.db ./data/postgres
docker-compose up -d
```

This destroys all memories and the entity graph. Only use for a clean start.
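To restore a cold backup taken above, the reverse works; a minimal sketch (the backup directory name is whatever your date-stamped copy produced):

```bash
# Restore Qdrant data from a cold backup
docker-compose stop qdrant
rm -rf ./data/qdrant
cp -r ./data/qdrant-backup-20250101 ./data/qdrant
docker-compose start qdrant
```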
The API enforces per-key rate limits:
| Request Type | Default Limit | Window | Configurable Via |
|---|---|---|---|
| Writes (POST/PUT/PATCH/DELETE) | 60/min | 1 minute | RATE_LIMIT_WRITES |
| Reads (GET) | 120/min | 1 minute | RATE_LIMIT_READS |
| Consolidation (POST /consolidate) | 1/hour | 1 hour | Hardcoded |
When rate-limited, the API returns 429 with a Retry-After header.
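Clients can lean on curl's built-in backoff; reasonably recent curl versions treat 429 as transient and sleep per the `Retry-After` header when `--retry` is set. A minimal sketch:

```bash
# Retry up to 5 times, honoring the server's Retry-After header
curl --retry 5 -H "x-api-key: KEY" http://localhost:8084/stats
```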
The auth middleware tracks failed authentication attempts per IP:
- After 10 failures within 60 seconds, the IP is blocked with `429`
- Uses timing-safe comparison to prevent timing attacks
- Failed attempt records are cleaned up every 5 minutes
All log lines use bracketed prefixes for grep-friendly filtering:
| Prefix | Component |
|---|---|
| `[zengram]` | Startup, shutdown, top-level events |
| `[qdrant]` | Qdrant collection/index operations |
| `[embeddings]` | Embedding provider init/errors |
| `[store]` | Structured store operations |
| `[memory:store]` | `POST /memory` write path |
| `[memory:search]` | `GET /memory/search` |
| `[memory:update]` | `PATCH /memory/:id` |
| `[memory:delete]` | `DELETE /memory/:id` |
| `[consolidation]` | Consolidation engine runs |
| `[entities]` | Entity extraction and alias cache |
| `[keyword-search]` | BM25 keyword search |
| `[webhook:n8n]` | n8n webhook ingestion |
| `[subscribe]` | SSE subscription lifecycle |
| `[notifications]` | Webhook dispatch |
| `[auth]` | Agent key loading |
| `[reflect]` | LLM reflection |
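Combined with `docker-compose logs`, these prefixes make targeted tailing straightforward:

```bash
# Follow only the write path and embedding provider activity
docker-compose logs -f memory-api | grep -E "\[memory:store\]|\[embeddings\]"
```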
- Architecture -- system design and data flow
- Configuration -- every environment variable explained
- API Reference -- endpoint details for manual debugging