Memex Architecture

Date: November 2025
Status: Active Development

Vision

Memex provides layered knowledge graphs where raw data and ontological interpretation coexist. Like git for knowledge: content-addressed sources with interpretation layers on top.

Core Concept

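Problem: RAG returns text chunks ranked by similarity. Agents need structured relationships, which similarity search does not provide, and they typically cannot reach the raw sources behind what they retrieve.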

Solution: Two-layer architecture:

Source Layer: Raw data, content-addressed, immutable
Ontology Layer: Interpreted entities and relationships, references sources

Key insight: One person's metadata is another's data. Don't hide sources behind interpretations.

Layered Architecture

┌─────────────────────────────────────────┐
│ Transaction Log (Merkle DAG)            │  ← Git-like history
│  - Every operation is a commit          │
│  - Content-addressed                     │
│  - Verifiable, auditable                 │
└──────────────┬──────────────────────────┘
               │ records
               ↓
┌─────────────────────────────────────────┐
│ Graph Layer (Neo4j)                     │
│                                         │
│  ┌────────────────────────────────┐   │
│  │ Source Nodes (Layer 0)          │   │  ← Raw data
│  │  - Content-addressed (hash IDs) │   │
│  │  - Immutable                     │   │
│  │  - Format metadata               │   │
│  └────────────────────────────────┘   │
│               ↑                         │
│               │ extracted_from          │
│               │                         │
│  ┌────────────────────────────────┐   │
│  │ Ontology Nodes (Layer 1+)       │   │  ← Interpretations
│  │  - Domain entities              │   │
│  │  - Relationships                │   │
│  │  - Multiple interpretations OK  │   │
│  └────────────────────────────────┘   │
└─────────────────────────────────────────┘

Source Layer

Every raw input becomes a Source node
ID = hash of content (content-addressed)
Stored once, referenced many times
Can be reinterpreted later without loss

Example Source Node:

{
  "id": "sha256:abc123...",
  "type": "Source",
  "content": "<raw git log output>",
  "meta": {
    "format": "git-log",
    "ingested_at": "2025-11-17T10:00:00Z",
    "size_bytes": 15234
  }
}
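A minimal Go sketch of how such an ID could be derived, assuming the ID is simply the SHA-256 digest of the raw content bytes with a "sha256:" prefix. The `SourceID` and `NewSource` names are hypothetical and not taken from the Memex codebase; the struct fields mirror the example node above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// SourceMeta mirrors the "meta" object of the example Source node.
type SourceMeta struct {
	Format     string `json:"format"`
	IngestedAt string `json:"ingested_at"`
	SizeBytes  int    `json:"size_bytes"`
}

// Source mirrors the example Source node.
type Source struct {
	ID      string     `json:"id"`
	Type    string     `json:"type"`
	Content string     `json:"content"`
	Meta    SourceMeta `json:"meta"`
}

// SourceID derives a content address: "sha256:" plus the hex digest of the raw bytes.
// Assumption: only the content is hashed, never the metadata.
func SourceID(content []byte) string {
	sum := sha256.Sum256(content)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// NewSource is a hypothetical constructor. ingestedAt is an RFC 3339 string.
func NewSource(content, format, ingestedAt string) Source {
	return Source{
		ID:      SourceID([]byte(content)),
		Type:    "Source",
		Content: content,
		Meta: SourceMeta{
			Format:     format,
			IngestedAt: ingestedAt,
			SizeBytes:  len(content),
		},
	}
}

func main() {
	raw := "commit abc123\nfix: auth timeout\n"
	a := NewSource(raw, "git-log", "2025-11-17T10:00:00Z")
	b := NewSource(raw, "git-log", "2025-11-18T10:00:00Z")
	// Same bytes give the same ID regardless of ingest time.
	fmt.Println(a.ID == b.ID, a.Meta.SizeBytes)
}
```

Keeping metadata out of the hash is what lets the same content be ingested twice and still resolve to one node.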

Ontology Layer

LLM or parser extracts entities and relationships
References source via extracted_from links
Multiple interpretations can coexist
Includes extraction metadata

Example Ontology Node:

{
  "id": "commit-abc",
  "type": "Commit",
  "meta": {
    "hash": "abc123",
    "message": "fix: auth timeout",
    "author": "alice",
    "extracted_by": "gpt-4",
    "extraction_timestamp": "2025-11-17T10:01:00Z"
  }
}

Extraction Link:

{
  "source": "commit-abc",
  "target": "sha256:abc123...",
  "type": "extracted_from",
  "meta": {
    "confidence": 0.95,
    "reasoning": "Extracted from git log line 42"
  }
}
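The link above can be followed in both directions: from an interpretation back to its raw source, and from a source to every interpretation of it. The Go sketch below illustrates this over an in-memory slice of links. The `SourcesOf` and `InterpretationsOf` helpers and the confidence threshold are assumptions for illustration, not part of the Memex API.

```go
package main

import "fmt"

// LinkMeta mirrors the "meta" object of the example extraction link.
type LinkMeta struct {
	Confidence float64
	Reasoning  string
}

// Link mirrors the example link: Source is the ontology node, Target the raw source.
type Link struct {
	Source string
	Target string
	Type   string
	Meta   LinkMeta
}

// SourcesOf returns the raw sources that nodeID was extracted from,
// keeping only links at or above minConfidence (hypothetical filter).
func SourcesOf(nodeID string, links []Link, minConfidence float64) []string {
	var out []string
	for _, l := range links {
		if l.Type == "extracted_from" && l.Source == nodeID && l.Meta.Confidence >= minConfidence {
			out = append(out, l.Target)
		}
	}
	return out
}

// InterpretationsOf returns every ontology node extracted from sourceID,
// so several interpretations of one source can be listed side by side.
func InterpretationsOf(sourceID string, links []Link) []string {
	var out []string
	for _, l := range links {
		if l.Type == "extracted_from" && l.Target == sourceID {
			out = append(out, l.Source)
		}
	}
	return out
}

func main() {
	links := []Link{
		{Source: "commit-abc", Target: "sha256:abc123...", Type: "extracted_from",
			Meta: LinkMeta{Confidence: 0.95, Reasoning: "Extracted from git log line 42"}},
		{Source: "commit-abc-v2", Target: "sha256:abc123...", Type: "extracted_from",
			Meta: LinkMeta{Confidence: 0.60, Reasoning: "Second pass with a different extractor"}},
	}
	fmt.Println(SourcesOf("commit-abc", links, 0.9))
	fmt.Println(InterpretationsOf("sha256:abc123...", links))
}
```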

Transaction Log

Every operation creates a transaction
Transactions form Merkle DAG (like git)
Content-addressed, verifiable
Enables: audit trails, replication, time travel

Example Transaction:

{
  "tx_hash": "sha256:def456...",
  "parent": "sha256:prev-tx...",
  "timestamp": "2025-11-17T10:00:00Z",
  "operations": [
    {
      "op": "create_source",
      "node_id": "sha256:abc123...",
      "format": "git-log"
    },
    {
      "op": "extract_ontology",
      "source_id": "sha256:abc123...",
      "created_nodes": ["commit-abc", "commit-def"],
      "extractor": "gpt-4",
      "model_version": "gpt-4-0613"
    }
  ]
}
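To illustrate why the log is verifiable, here is a Go sketch of the simplest case: a linear chain with one parent per transaction, as in the example above. It assumes the transaction hash is the SHA-256 of a JSON encoding of parent, timestamp, and operations. That encoding, along with the `HashTx`, `Append`, and `Verify` names, is hypothetical; the existing transaction system may hash differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Operation is a free-form record such as {"op": "create_source", ...}.
type Operation map[string]any

// Transaction mirrors the example transaction.
type Transaction struct {
	TxHash     string      `json:"tx_hash"`
	Parent     string      `json:"parent"`
	Timestamp  string      `json:"timestamp"`
	Operations []Operation `json:"operations"`
}

// HashTx hashes every field except tx_hash itself. encoding/json writes
// map keys in sorted order, so the encoding is deterministic.
func HashTx(tx Transaction) (string, error) {
	body := struct {
		Parent     string      `json:"parent"`
		Timestamp  string      `json:"timestamp"`
		Operations []Operation `json:"operations"`
	}{Parent: tx.Parent, Timestamp: tx.Timestamp, Operations: tx.Operations}
	raw, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return "sha256:" + hex.EncodeToString(sum[:]), nil
}

// Append links a new transaction to the current head of the chain.
func Append(chain []Transaction, timestamp string, ops ...Operation) ([]Transaction, error) {
	tx := Transaction{Timestamp: timestamp, Operations: ops}
	if len(chain) > 0 {
		tx.Parent = chain[len(chain)-1].TxHash
	}
	h, err := HashTx(tx)
	if err != nil {
		return chain, err
	}
	tx.TxHash = h
	return append(chain, tx), nil
}

// Verify recomputes every hash and checks each parent pointer.
func Verify(chain []Transaction) bool {
	parent := ""
	for _, tx := range chain {
		h, err := HashTx(tx)
		if err != nil || h != tx.TxHash || tx.Parent != parent {
			return false
		}
		parent = tx.TxHash
	}
	return true
}

func main() {
	chain, _ := Append(nil, "2025-11-17T10:00:00Z",
		Operation{"op": "create_source", "node_id": "sha256:abc123...", "format": "git-log"})
	chain, _ = Append(chain, "2025-11-17T10:01:00Z",
		Operation{"op": "extract_ontology", "source_id": "sha256:abc123...", "extractor": "gpt-4"})
	fmt.Println("valid:", Verify(chain))

	chain[0].Operations[0]["format"] = "csv" // tamper with recorded history
	fmt.Println("valid after tampering:", Verify(chain))
}
```

Because each transaction's hash covers its parent's hash, changing any earlier operation invalidates the stored hash of that transaction, which is the property audit trails and replication rely on.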

Current Implementation

Status: Basic graph storage working, building ingest layer next.

memex (CLI) → HTTP API → memex-server (Go) → Neo4j

What works:

✓ Neo4j connection and driver
✓ Basic node/link CRUD (create, get, list)
✓ HTTP API endpoints
✓ CLI as API client
✓ Docker deployment

Building next:

Content-addressed source storage
Transaction log for operations
Ingest endpoint with LLM extraction
Export/import for graph portability

Future (not priority yet):

Lenses: Domain-specific ontology views
ECS: Model-agnostic embeddings
Activation tracking: Query-induced memory
Vector search integration

Design Principles

Sources are first-class: Never hide raw data behind interpretations
Content-addressed: Use hashes for immutable references
Multiple interpretations: Same source can have many ontology views
Transaction log: Every change is recorded and verifiable
Export/import: Graphs are portable via transactions + nodes

Design Decisions

Why Neo4j?

Purpose-built for graph operations
Native vector support (5.11+) for future use
Mature, production-ready
Can self-host or use cloud

Why Transaction Log?

Git-like history for knowledge graphs
Enables: audit trails, replication, time travel, verification
Already have transaction system in codebase

Why Content-Addressed Sources?

Immutable references (same content = same hash)
Automatic deduplication
Can reinterpret without re-ingesting
Portable across servers
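The deduplication and reinterpretation points above can be sketched with a small in-memory store in Go. `SourceStore` is a hypothetical stand-in for the Source layer (the real storage target is Neo4j), and it assumes IDs are SHA-256 digests of the raw bytes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// SourceStore is a hypothetical in-memory stand-in for the Source layer.
type SourceStore struct {
	blobs map[string][]byte
}

// NewSourceStore returns an empty store.
func NewSourceStore() *SourceStore {
	return &SourceStore{blobs: make(map[string][]byte)}
}

// Put stores content under its hash and reports whether it was newly created.
// Ingesting identical bytes again is a no-op that returns the same ID.
func (s *SourceStore) Put(content []byte) (id string, created bool) {
	sum := sha256.Sum256(content)
	id = "sha256:" + hex.EncodeToString(sum[:])
	if _, ok := s.blobs[id]; ok {
		return id, false
	}
	// Copy so later changes to the caller's slice cannot alter stored bytes.
	s.blobs[id] = append([]byte(nil), content...)
	return id, true
}

// Get returns a copy of the stored bytes, e.g. to rerun extraction
// with a better model without ingesting the data again.
func (s *SourceStore) Get(id string) ([]byte, bool) {
	b, ok := s.blobs[id]
	if !ok {
		return nil, false
	}
	return append([]byte(nil), b...), true
}

// Len reports how many distinct sources are stored.
func (s *SourceStore) Len() int { return len(s.blobs) }

func main() {
	store := NewSourceStore()
	id1, created1 := store.Put([]byte("fix: auth timeout"))
	id2, created2 := store.Put([]byte("fix: auth timeout"))
	fmt.Println(id1 == id2, created1, created2, store.Len())
}
```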

Why LLM for Extraction?

Universal parser (any data format)
Emergent ontologies (no predefined schemas)
Captures reasoning (stores why, not just what)
Can improve over time (rerun with better models)

Use Cases

1. Development Memory (Primary Launch Use Case)

Problem: Coding agents don't remember project history
Solution: Memex ingests git, terminal, LLM traces
Result: Agents learn from past attempts, don't repeat mistakes

Entities: Commit, Error, Edit, Terminal Session, LLM Prompt, Function, Test
Relationships: fixes, caused_by, attempts_to_fix, suggested_by, tests

2. Research Assistant

Problem: RAG returns disconnected paper chunks
Solution: Memex builds citation graph + concept ontology
Result: Agents understand research lineage and influence

Entities: Paper, Author, Concept, Method, Dataset
Relationships: cites, introduces, applies, authored_by

3. Legal/Compliance

Problem: Finding relevant case law requires expert knowledge
Solution: Memex maps legal precedents and statutes
Result: Agents navigate legal relationships accurately

Entities: Case, Statute, Regulation, Citation, Jurisdiction
Relationships: cites, overturns, distinguishes, applies

Next Steps

Content-addressed storage - Implement SHA256 hashing for source nodes
Transaction log - Hook up existing transaction system to record operations
Ingest endpoint - POST /api/ingest with LLM extraction
Export/import - Serialize graph + transactions for portability
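As a rough picture of the ingest step, the Go sketch below wires a POST /api/ingest handler to a pluggable extractor. Only the endpoint path comes from this document. The request and response shapes, the `Extractor` type, and the toy `lineExtractor` (which stands in for the LLM call) are all assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// IngestRequest and IngestResponse are hypothetical payload shapes.
type IngestRequest struct {
	Content string `json:"content"`
	Format  string `json:"format"`
}

type IngestResponse struct {
	SourceID     string   `json:"source_id"`
	CreatedNodes []string `json:"created_nodes"`
}

// Extractor stands in for the LLM or parser; it returns ontology node IDs.
type Extractor func(content, format string) []string

// Ingest content-addresses the input, then runs the extractor over it.
func Ingest(req IngestRequest, extract Extractor) IngestResponse {
	sum := sha256.Sum256([]byte(req.Content))
	return IngestResponse{
		SourceID:     "sha256:" + hex.EncodeToString(sum[:]),
		CreatedNodes: extract(req.Content, req.Format),
	}
}

// IngestHandler exposes Ingest over HTTP.
func IngestHandler(extract Extractor) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		var req IngestRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.Content == "" {
			http.Error(w, "invalid request body", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(Ingest(req, extract))
	}
}

// lineExtractor is a toy extractor: one placeholder node per non-empty line.
func lineExtractor(content, format string) []string {
	var ids []string
	for i, line := range strings.Split(content, "\n") {
		if strings.TrimSpace(line) != "" {
			ids = append(ids, fmt.Sprintf("%s-%d", format, i))
		}
	}
	return ids
}

func main() {
	// Exercise the handler in-process instead of opening a port.
	body := strings.NewReader(`{"content":"fix: auth timeout\nadd: tests","format":"git-log"}`)
	req := httptest.NewRequest(http.MethodPost, "/api/ingest", body)
	rec := httptest.NewRecorder()
	IngestHandler(lineExtractor)(rec, req)
	fmt.Println(rec.Code, rec.Body.String())
}
```

Passing the extractor in as a function keeps the handler independent of any particular model, which fits the goal of rerunning extraction with better models later.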

Future Enhancements

Lenses - Domain-specific ontology views and extraction patterns
ECS Representations - Model-agnostic embeddings that can compile to any model
Activation Tracking - Learn from queries to improve retrieval
Query Engine - Natural language → graph traversal + LLM synthesis

This architecture reflects our current understanding as of November 2025. It will evolve as we build and learn.
