Date: November 2025
Status: Active Development
Memex provides layered knowledge graphs where raw data and ontological interpretation coexist. Like git for knowledge: content-addressed sources with interpretation layers on top.
Problem: RAG returns similar text chunks, but agents need structured relationships and cannot reach the raw sources behind those chunks.
Solution: Two-layer architecture:
- Source Layer: Raw data, content-addressed, immutable
- Ontology Layer: Interpreted entities and relationships, references sources
Key insight: One person's metadata is another's data. Don't hide sources behind interpretations.
┌─────────────────────────────────────────┐
│ Transaction Log (Merkle DAG)            │ ← Git-like history
│ - Every operation is a commit           │
│ - Content-addressed                     │
│ - Verifiable, auditable                 │
└──────────────┬──────────────────────────┘
               │ records
               ↓
┌─────────────────────────────────────────┐
│ Graph Layer (Neo4j)                     │
│                                         │
│  ┌────────────────────────────────┐     │
│  │ Source Nodes (Layer 0)         │     │ ← Raw data
│  │ - Content-addressed (hash IDs) │     │
│  │ - Immutable                    │     │
│  │ - Format metadata              │     │
│  └────────────────────────────────┘     │
│              ↑                          │
│              │ extracted_from           │
│              │                          │
│  ┌────────────────────────────────┐     │
│  │ Ontology Nodes (Layer 1+)      │     │ ← Interpretations
│  │ - Domain entities              │     │
│  │ - Relationships                │     │
│  │ - Multiple interpretations OK  │     │
│  └────────────────────────────────┘     │
└─────────────────────────────────────────┘
Source Layer (Layer 0):
- Every raw input becomes a Source node
- ID = hash of content (content-addressed)
- Stored once, referenced many times
- Can be reinterpreted later without loss
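The content-addressing rule above can be sketched in a few lines. This is a minimal illustration, not the memex-server implementation: the `make_source_node` function and the in-memory `SourceStore` class are hypothetical names, and the sketch assumes the content is UTF-8 text and that the ID is the SHA-256 digest of the raw bytes only (metadata such as `ingested_at` is deliberately left out of the hash).

```python
import hashlib
from datetime import datetime, timezone


def make_source_node(content, fmt):
    """Build a Source node whose ID is the SHA-256 hash of its raw content."""
    raw = content.encode("utf-8")
    return {
        "id": "sha256:" + hashlib.sha256(raw).hexdigest(),
        "type": "Source",
        "content": content,
        "meta": {
            "format": fmt,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "size_bytes": len(raw),
        },
    }


class SourceStore:
    """Hypothetical in-memory stand-in for the Source layer."""

    def __init__(self):
        self._nodes = {}  # maps node ID -> node dict

    def put(self, content, fmt):
        node = make_source_node(content, fmt)
        # Identical content hashes to the same ID, so a repeat ingest
        # keeps the first stored node and adds nothing new.
        self._nodes.setdefault(node["id"], node)
        return node["id"]

    def get(self, node_id):
        return self._nodes[node_id]

    def __len__(self):
        return len(self._nodes)
```

Ingesting the same text twice through `SourceStore.put` returns the same ID and stores one node, which is the "stored once, referenced many times" property.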
Example Source Node:
{
  "id": "sha256:abc123...",
  "type": "Source",
  "content": "<raw git log output>",
  "meta": {
    "format": "git-log",
    "ingested_at": "2025-11-17T10:00:00Z",
    "size_bytes": 15234
  }
}
Ontology Layer (Layer 1+):
- LLM or parser extracts entities and relationships
- References source via extracted_from links
- Multiple interpretations can coexist
- Includes extraction metadata
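As a rough sketch of the parser path described above, the function below turns text into Commit nodes and `extracted_from` links. Several things here are assumptions for illustration: the `extract_commits` name, the `oneline-parser` extractor label, the input format of one `<hash> <message>` pair per line, the `commit-<hash>` ID scheme, and a confidence of 1.0 for a deterministic parser. An LLM extractor would produce the same shapes with its own confidence and reasoning.

```python
from datetime import datetime, timezone


def extract_commits(source_id, content, extractor="oneline-parser"):
    """Turn '<hash> <message>' lines into Commit nodes plus extracted_from links."""
    nodes, links = [], []
    now = datetime.now(timezone.utc).isoformat()
    for line_no, line in enumerate(content.splitlines(), start=1):
        commit_hash, _, message = line.strip().partition(" ")
        if not commit_hash:
            continue  # skip blank lines
        node_id = "commit-" + commit_hash  # hypothetical ID scheme
        nodes.append({
            "id": node_id,
            "type": "Commit",
            "meta": {
                "hash": commit_hash,
                "message": message,
                "extracted_by": extractor,
                "extraction_timestamp": now,
            },
        })
        # The link points from the interpretation back to the raw source.
        links.append({
            "source": node_id,
            "target": source_id,
            "type": "extracted_from",
            "meta": {
                "confidence": 1.0,  # assumed value for a deterministic parser
                "reasoning": "Parsed from line %d" % line_no,
            },
        })
    return nodes, links
```

Because every link carries the source ID, an agent can always walk from a Commit node back to the exact raw text it came from.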
Example Ontology Node:
{
  "id": "commit-abc",
  "type": "Commit",
  "meta": {
    "hash": "abc123",
    "message": "fix: auth timeout",
    "author": "alice",
    "extracted_by": "gpt-4",
    "extraction_timestamp": "2025-11-17T10:01:00Z"
  }
}
Extraction Link:
{
  "source": "commit-abc",
  "target": "sha256:abc123...",
  "type": "extracted_from",
  "meta": {
    "confidence": 0.95,
    "reasoning": "Extracted from git log line 42"
  }
}
Transaction Log:
- Every operation creates a transaction
- Transactions form Merkle DAG (like git)
- Content-addressed, verifiable
- Enables: audit trails, replication, time travel
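To make the "content-addressed, verifiable" property concrete, here is a minimal sketch of a hash-linked log. It models only the single-parent case (a linear chain, which is the simplest Merkle DAG). The helper names `tx_hash`, `append_tx`, and `verify_log`, and the choice of sorted-key compact JSON as the canonical form to hash, are assumptions rather than the project's actual scheme.

```python
import hashlib
import json


def tx_hash(parent, timestamp, operations):
    """Hash a canonical JSON encoding of the transaction's contents."""
    body = {"parent": parent, "timestamp": timestamp, "operations": operations}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_tx(log, timestamp, operations):
    """Append a transaction whose parent is the current head of the log."""
    parent = log[-1]["tx_hash"] if log else None
    tx = {
        "tx_hash": tx_hash(parent, timestamp, operations),
        "parent": parent,
        "timestamp": timestamp,
        "operations": operations,
    }
    log.append(tx)
    return tx


def verify_log(log):
    """Recompute every hash and check that each parent pointer matches."""
    expected_parent = None
    for tx in log:
        if tx["parent"] != expected_parent:
            return False
        if tx["tx_hash"] != tx_hash(tx["parent"], tx["timestamp"], tx["operations"]):
            return False
        expected_parent = tx["tx_hash"]
    return True
```

Since each hash covers the parent hash, changing any past operation invalidates that transaction and, transitively, everything after it. That is what makes the audit trail checkable by a replica that only holds the log.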
Example Transaction:
{
  "tx_hash": "sha256:def456...",
  "parent": "sha256:prev-tx...",
  "timestamp": "2025-11-17T10:00:00Z",
  "operations": [
    {
      "op": "create_source",
      "node_id": "sha256:abc123...",
      "format": "git-log"
    },
    {
      "op": "extract_ontology",
      "source_id": "sha256:abc123...",
      "created_nodes": ["commit-abc", "commit-def"],
      "extractor": "gpt-4",
      "model_version": "gpt-4-0613"
    }
  ]
}
Status: Basic graph storage is working; the ingest layer is next.
memex (CLI) → HTTP API → memex-server (Go) → Neo4j
What works:
- ✓ Neo4j connection and driver
- ✓ Basic node/link CRUD (create, get, list)
- ✓ HTTP API endpoints
- ✓ CLI as API client
- ✓ Docker deployment
Building next:
- Content-addressed source storage
- Transaction log for operations
- Ingest endpoint with LLM extraction
- Export/import for graph portability
Future (not priority yet):
- Lenses: Domain-specific ontology views
- ECS: Model-agnostic embeddings
- Activation tracking: Query-induced memory
- Vector search integration
Design principles:
- Sources are first-class: Never hide raw data behind interpretations
- Content-addressed: Use hashes for immutable references
- Multiple interpretations: Same source can have many ontology views
- Transaction log: Every change is recorded and verifiable
- Export/import: Graphs are portable via transactions + nodes
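One way the export/import principle could work is sketched below. The bundle layout (`version`, `nodes`, `links`, `transactions`) and the function names `export_graph` and `import_graph` are hypothetical. The point of the sketch is that content addressing lets the importing side re-check every Source node instead of trusting the sender.

```python
import hashlib
import json


def export_graph(nodes, links, transactions):
    """Serialize nodes, links, and the transaction log into one JSON document."""
    bundle = {"version": 1, "nodes": nodes, "links": links, "transactions": transactions}
    return json.dumps(bundle, sort_keys=True)


def import_graph(payload):
    """Load a bundle, rejecting Source nodes whose ID does not match their content."""
    bundle = json.loads(payload)
    for node in bundle["nodes"]:
        if node["type"] != "Source":
            continue  # ontology nodes are not content-addressed in this sketch
        digest = hashlib.sha256(node["content"].encode("utf-8")).hexdigest()
        if node["id"] != "sha256:" + digest:
            raise ValueError("content hash mismatch for " + node["id"])
    return bundle["nodes"], bundle["links"], bundle["transactions"]
```

A fuller version would presumably also verify the transaction hashes on import; that step is omitted here to keep the sketch short.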
Why Neo4j:
- Purpose-built for graph operations
- Native vector support (5.11+) for future use
- Mature, production-ready
- Can self-host or use cloud
Why a transaction log:
- Git-like history for knowledge graphs
- Enables: audit trails, replication, time travel, verification
- A transaction system already exists in the codebase
Why content addressing:
- Immutable references (same content = same hash)
- Automatic deduplication
- Can reinterpret without re-ingesting
- Portable across servers
Why LLM extraction:
- Universal parser (any data format)
- Emergent ontologies (no predefined schemas)
- Captures reasoning (stores why, not just what)
- Can improve over time (rerun with better models)
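Rerunning extraction with a better model only adds new ontology nodes; the source and earlier interpretations stay in place. The sketch below shows one way to list the coexisting interpretations of a source. The `interpretations_of` helper is hypothetical, and it assumes each ontology node records its extractor in `meta.extracted_by`, as in the example ontology node in this document.

```python
from collections import defaultdict


def interpretations_of(source_id, nodes, links):
    """Group the ontology nodes extracted from one source by their extractor."""
    by_id = {node["id"]: node for node in nodes}
    grouped = defaultdict(list)
    for link in links:
        if link["type"] == "extracted_from" and link["target"] == source_id:
            node = by_id[link["source"]]
            grouped[node["meta"]["extracted_by"]].append(node["id"])
    return dict(grouped)
```

A caller can then prefer one extractor's view, compare two of them, or keep both, without ever re-ingesting the raw data.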
Problem: Coding agents don't remember project history
Solution: Memex ingests git, terminal, LLM traces
Result: Agents learn from past attempts, don't repeat mistakes
Entities: Commit, Error, Edit, Terminal Session, LLM Prompt, Function, Test
Relationships: fixes, caused_by, attempts_to_fix, suggested_by, tests
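To illustrate what an agent could ask of such a graph, here is a small sketch over plain link dictionaries. The link directions are assumptions, since this document names the relationship types but not their direction: a Commit `fixes` an Error, an Error is `caused_by` a Commit, and an Edit `attempts_to_fix` an Error. The `neighbors` and `fix_history` helpers are hypothetical.

```python
def neighbors(links, node_id, rel, incoming=False):
    """Return the IDs connected to node_id by links of type rel."""
    if incoming:
        return [l["source"] for l in links if l["type"] == rel and l["target"] == node_id]
    return [l["target"] for l in links if l["type"] == rel and l["source"] == node_id]


def fix_history(links, error_id):
    """Collect what caused an error, what fixed it, and what was tried before."""
    return {
        "caused_by": neighbors(links, error_id, "caused_by"),
        "fixed_by": neighbors(links, error_id, "fixes", incoming=True),
        "attempts": neighbors(links, error_id, "attempts_to_fix", incoming=True),
    }
```

An agent that hits a familiar error could call `fix_history` first and skip the attempts that already failed.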
Problem: RAG returns disconnected paper chunks
Solution: Memex builds citation graph + concept ontology
Result: Agents understand research lineage and influence
Entities: Paper, Author, Concept, Method, Dataset
Relationships: cites, introduces, applies, authored_by
Problem: Finding relevant case law requires expert knowledge
Solution: Memex maps legal precedents and statutes
Result: Agents navigate legal relationships accurately
Entities: Case, Statute, Regulation, Citation, Jurisdiction
Relationships: cites, overturns, distinguishes, applies
Next steps:
- Content-addressed storage: Implement SHA-256 hashing for source nodes
- Transaction log: Hook up the existing transaction system to record operations
- Ingest endpoint: POST /api/ingest with LLM extraction
- Export/import: Serialize graph + transactions for portability
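Since the ingest endpoint is still planned, the following is only a guess at how a client might call it. The path `/api/ingest` comes from this document; the JSON fields `content` and `format`, the `build_ingest_request` helper, and the base URL in the usage note are assumptions, not a documented schema. The function builds the request without sending it.

```python
import json
import urllib.request


def build_ingest_request(base_url, content, fmt):
    """Build (but do not send) a POST request for the planned ingest endpoint."""
    # "content" and "format" are assumed field names, not a documented schema.
    body = json.dumps({"content": content, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a running server, the request could be sent with `urllib.request.urlopen(build_ingest_request("http://localhost:8080", text, "git-log"))`, where the host and port are placeholders.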
Future:
- Lenses: Domain-specific ontology views and extraction patterns
- ECS Representations: Model-agnostic embeddings that can compile to any model
- Activation Tracking: Learn from queries to improve retrieval
- Query Engine: Natural language → graph traversal + LLM synthesis
This architecture reflects our current understanding as of November 2025. It will evolve as we build and learn.