How do production memory systems for AI agents structure recall — across vector, graph, and key-value stores; across user, session, and agent scopes; across short-term context and long-term knowledge — and which conventions are converging?
The interesting bits are not the embeddings themselves; those are commodity. The interesting bits are: how does each system decide what to remember, when to update, how to scope, and what shape the memory artifacts take on disk and on the wire.
When reading each entry below, the working checklist is:
- Storage topology. Single store or hybrid (vector + graph + KV)? What determines which store handles which class of memory?
- Write policy. Append-only? Self-editing? LLM-summarized? Reconciled against a graph? When does a memory get superseded vs. merged?
- Scopes & namespacing. User / session / agent / tenant — how are these modeled, and how does retrieval respect them?
- Schema of a memory. What fields does a single memory record carry? Timestamps, provenance, confidence, embedding ref, references to other memories?
- Recall surface. Vector similarity, graph traversal, time-based, hybrid ranking? What's the API the agent actually calls?
- Eviction & compaction. Is there a policy? A summarizer? A TTL?
- Serialization on disk. What does the persisted form look like? JSON blobs, parquet, a specific graph format, SQL rows? Could a non-AI program parse it cleanly?
- Operational story. Local-first? Server-required? Stateless agent + external store?
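To make the "schema of a memory" and "scopes" items concrete, here is a minimal sketch of what a single record and a scope-respecting filter might look like. Every field and function name below is hypothetical and chosen for illustration; none is taken from any entry in the study.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Optional
import json


@dataclass
class MemoryRecord:
    # All field names are hypothetical; no surveyed system is claimed to use them.
    id: str
    text: str
    user_id: str                          # scope: who the memory is about
    session_id: Optional[str] = None      # scope: None means "all sessions"
    agent_id: Optional[str] = None        # scope: None means "all agents"
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    source: str = "conversation"          # provenance
    confidence: float = 1.0
    embedding_ref: Optional[str] = None   # pointer into a vector store
    superseded_by: Optional[str] = None   # id of the record that replaced this one
    related_ids: list = field(default_factory=list)


def visible(record, user_id, session_id=None):
    """Scope filter: same user, not superseded, and session-compatible."""
    if record.user_id != user_id or record.superseded_by is not None:
        return False
    return record.session_id is None or record.session_id == session_id
```

A quick way to apply the "could a non-AI program parse it cleanly?" question to a shape like this is to round-trip it through `json.dumps(asdict(record))` and check that nothing is lost.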
| Bet | Entry |
|---|---|
| Memory baked into the model (frozen-backbone transformer adapter) | Delta-Mem |
| Real typed bi-temporal graph | Graphiti |
| Self-editing core memory + paged archival (the MemGPT lineage) | Letta |
| Hybrid vector + entity-link + tight CRUD API | Mem0 |
| Verbatim chunks + hybrid recall, no extraction | MemPalace |
| Typed/scoped facts + deterministic supersession + outcome learning | Neo |
| Immutable Postgres log + summary DAG | Volt |
| Folder → in-process NetworkX KG + Leiden + MCP query surface | Graphify |
| Multi-peer "theory of mind" — what A knows about B, derived async | Honcho |
| Context as a filesystem; L0/L1/L2 abstract tiers; unified memories + resources + skills | OpenViking |
| Typed memory taxonomy (world facts / experiences / mental models) + bank-based scoping | Hindsight |
| 12+ semantic categories with bi-temporal validity (validFrom/validUntil) | RetainDB |
| Universal RAG + memory layer, currently top of LongMemEval / LoCoMo / ConvoMem | Supermemory |
| Coding-agent context tree (CLI + optional cloud sync; Elastic-licensed) | ByteRover CLI |
| Procedural/working memory — versioned dependency-typed task graph on a Git-like SQL DB (Dolt), no vectors | Beads |
| Conformance benchmark (the yardstick) | StateBench |
Deep per-entry write-ups live in context-v/profiles/.
- Repo: https://github.com/mem0ai/mem0 — Universal memory layer for AI Agents
- Maintainer: Mem0.ai (`mem0ai` org)
- Why this is here: Pioneers a composable hybrid architecture (vector + graph + KV store) with adaptive updates. Multi-level recall across user / session / agent scopes is its headline. The project's own benchmarks report +26% accuracy over OpenAI memory and 91% faster responses. Most starred entry in this space (54k+) and the most explicit about being a layer rather than a framework.
- Repo: https://github.com/Parslee-ai/neo — A self-improving code reasoning engine with persistent semantic memory
- Maintainer: Parslee AI (`Parslee-ai` org)
- Why this is here: A reasoning engine, not a framework: the memory is the point, not a feature bolted on. Worth comparing schema and write-policy choices against Mem0 and Graphiti. Smaller surface than the major players, which makes it easier to read end-to-end.
- Repo: https://github.com/declare-lab/delta-Mem — δ-mem: Efficient Online Memory for Large Language Models
- Maintainer: declare-lab (SUTD) — Jingdi Lei, Di Zhang, Junxian Li, Weida Wang, Kaixuan Fan, Xiang Liu, Qihan Liu, Xiaoteng Ma, Baian Chen, Soujanya Poria
- Why this is here: The architectural counter-bet to every other entry. While Mem0, MemPalace, Graphiti, Neo, and Volt all answer "where does the agent store and look things up," Delta-Mem answers "what if memory was part of the model's forward pass?" A frozen-backbone transformer adapter that gives each attention head a low-rank dense state matrix updated by a learned delta rule (S_{t+1} = λ·S_t − β·(S_t·k_t)⊗k_t + β·v_t⊗k_t), with three temporal write granularities (TSW/SSW/MSW), a Triton-accelerated affine scan, and a public Qwen3-4B-Instruct adapter on Hugging Face. Released alongside arXiv:2605.12357. Evaluated on LoCoMo, HotpotQA, IFEval, GPQA Diamond, and MemoryAgentBench. This is a research artifact (not a deployable library), and that's the point — it forces the question "is agent memory even a retrieval problem?" that the system-level entries quietly assume.
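The delta rule quoted above can be traced by hand on a tiny state matrix. The sketch below is a plain-Python illustration of that single update equation only. It treats λ and β as fixed constants (an assumption; the entry describes them as part of a learned rule) and leaves out the per-head low-rank structure, the write granularities, and the Triton scan. The function names are hypothetical.

```python
def delta_step(S, k, v, lam=1.0, beta=1.0):
    """One update S' = lam*S - beta*(S k) k^T + beta*v k^T, using plain lists.

    S is a d_v x d_k matrix (list of rows), k has length d_k, v has length d_v.
    """
    Sk = [sum(row[j] * k[j] for j in range(len(k))) for row in S]  # S @ k
    return [
        [lam * S[i][j] + beta * (v[i] - Sk[i]) * k[j] for j in range(len(k))]
        for i in range(len(v))
    ]


def read(S, k):
    """Recall the value currently associated with key k: S @ k."""
    return [sum(row[j] * k[j] for j in range(len(k))) for row in S]


S = [[0.0, 0.0], [0.0, 0.0]]
k = [1.0, 0.0]                       # unit-norm key
S = delta_step(S, k, [2.0, 3.0])     # write v = (2, 3) under k
first = read(S, k)                   # [2.0, 3.0]
S = delta_step(S, k, [5.0, 7.0])     # write a new value under the same key
second = read(S, k)                  # [5.0, 7.0]
```

With λ = β = 1 and a unit-norm key, the subtraction term removes whatever the state previously returned for that key before the new value is written, so the second write replaces the first rather than adding to it.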
- Repo: https://github.com/letta-ai/letta — Platform for building stateful agents: AI with advanced memory that can learn and self-improve over time
- Maintainer: letta-ai (formerly MemGPT-ai; Charles Packer, Sarah Wooders et al.)
- Why this is here: The direct successor to MemGPT, the 2023 Berkeley paper that named the agent-memory problem and shipped the OS-inspired hierarchical-memory pattern (core context = RAM, archival = disk, recall = paging). The MemGPT lineage is the framing fact. Letta is the operational productization: every agent is a persistent Postgres row, core memory is a set of agent-editable Blocks rendered into the system prompt, and archival memory is a paginated pgvector-backed passage store with semantic search. The agent edits its own memory via `core_memory_append` / `core_memory_replace`, the headline MemGPT design move that no other entry in the study replicates. Ships as a FastAPI + Postgres + pgvector compose stack with REST, WebSocket, and an OpenAI-compatible `/v1/chat/completions` endpoint, so a Letta agent looks like a ChatGPT-shaped model to any client. Multi-agent block sharing is a database join (a single Block row can belong to many agents). The newer git-backed memory mode renders blocks as files (`system/persona.md`, `skills/.../SKILL.md`) and turns every memory edit into a commit, making it the closest entry in the study to our own `context-vigilance` discipline, but with the agent as committer.
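The self-editing pattern can be sketched in a few lines. The two tool names come from the description above, but the signatures, the `Block` fields, the character limit, and the prompt rendering are all assumptions made for illustration; this is not Letta's code.

```python
from dataclasses import dataclass


@dataclass
class Block:
    label: str
    value: str = ""
    limit: int = 200  # hypothetical character budget per block


def core_memory_append(blocks, label, content):
    """Tool the agent calls to add a line to one of its own blocks."""
    block = blocks[label]
    new_value = (block.value + "\n" + content).strip()
    if len(new_value) > block.limit:
        raise ValueError(f"block {label!r} would exceed {block.limit} chars")
    block.value = new_value


def core_memory_replace(blocks, label, old, new):
    """Tool the agent calls to correct something it previously stored."""
    block = blocks[label]
    if old not in block.value:
        raise ValueError(f"{old!r} not found in block {label!r}")
    block.value = block.value.replace(old, new)


def render_system_prompt(base, blocks):
    """Blocks are re-rendered into the prompt on every turn."""
    sections = [f"<{b.label}>\n{b.value}\n</{b.label}>" for b in blocks.values()]
    return base + "\n\n" + "\n".join(sections)


blocks = {"human": Block("human"), "persona": Block("persona", "Helpful and terse.")}
core_memory_append(blocks, "human", "Name: Sam")
core_memory_append(blocks, "human", "Editor: vim")
core_memory_replace(blocks, "human", "Editor: vim", "Editor: helix")
prompt = render_system_prompt("You are an agent.", blocks)
```

The point of the sketch is the control primitive: the framework only exposes the two edit tools and the renderer, and the decision about what to write or correct sits with the agent.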
- Repo: https://github.com/getzep/graphiti — Build Real-Time Knowledge Graphs for AI Agents
- Maintainer: Zep Software, Inc. (Paul Paliychuk, Preston Rasmussen, Daniel Chalef)
- Why this is here: The only entry in the study that puts a real, typed, bi-temporal graph between the agent and its memories: four interchangeable Cypher-flavoured backends (Neo4j, FalkorDB, Kuzu, Neptune), a single-LLM-call episode-to-graph extraction pipeline with Pydantic-typed entities, hybrid BM25 + cosine + BFS recall with cross-encoder reranking, and label-propagation community detection. Every edge carries both `valid_at` / `invalid_at` (real-world validity) and `created_at` / `expired_at` (system time), which is textbook bi-temporality brought to agent memory. Backed by a peer-reviewed paper (Zep: A Temporal Knowledge Graph Architecture for Agent Memory). The interesting comparison is Graphiti vs MemPalace: opposite bets on structure (MemPalace says structure is over-engineered; Graphiti says it's under-engineered).
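What the two time axes buy is the ability to ask "what was true at time X, according to what the system knew at time Y". The sketch below uses the four field names mentioned above on a hypothetical `Edge` class with a textbook append-only correction; it illustrates the query shape only and is not Graphiti's data model or invalidation procedure.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Edge:
    fact: str
    valid_at: datetime                     # when the fact became true in the world
    created_at: datetime                   # when the system recorded it
    invalid_at: Optional[datetime] = None  # when it stopped being true
    expired_at: Optional[datetime] = None  # when the system retired this record


def as_of(edges, world_time, system_time):
    """Facts true at world_time, according to what was recorded by system_time."""
    return [
        e.fact for e in edges
        if e.valid_at <= world_time
        and (e.invalid_at is None or world_time < e.invalid_at)
        and e.created_at <= system_time
        and (e.expired_at is None or system_time < e.expired_at)
    ]


# The user moved in June 2023, but the system only learned of it in September.
edges = [
    Edge("lives in Paris", valid_at=datetime(2020, 1, 1),
         created_at=datetime(2020, 1, 5), expired_at=datetime(2023, 9, 1)),
    Edge("lives in Paris", valid_at=datetime(2020, 1, 1),
         created_at=datetime(2023, 9, 1), invalid_at=datetime(2023, 6, 1)),
    Edge("lives in Berlin", valid_at=datetime(2023, 6, 1),
         created_at=datetime(2023, 9, 1)),
]
july = datetime(2023, 7, 1)
believed_in_august = as_of(edges, july, datetime(2023, 8, 1))
believed_in_october = as_of(edges, july, datetime(2023, 10, 1))
```

Asking about July from the system's August viewpoint returns the stale Paris fact, while the same question from the October viewpoint returns Berlin. A single validity interval cannot distinguish those two answers.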
- Repo: https://github.com/MemPalace/mempalace — The best-benchmarked open-source AI memory system. And it's free.
- Maintainer: MemPalace Contributors (`MemPalace` org; milla-jovovich, @bensig)
- Why this is here: The most direct counter-bet to Mem0 in the study. Same problem space (give agents persistent memory), nearly opposite design choice at the write step: MemPalace stores verbatim text, with no LLM-driven extraction and no summarization, in ChromaDB (`mempalace_drawers` + `mempalace_closets`) plus a local SQLite temporal entity graph. Recall is hybrid (semantic + BM25 + closet-boost; closets signal, never gate), and the published benchmarks (`benchmarks/BENCHMARKS.md`) report 96.6% R@5 on LongMemEval with zero LLM calls at query time, and 92.9% vs Mem0's 30–45% on ConvoMem, a margin of at least ~2× that the project attributes directly to extraction losing information. Inspired by Zettelkasten + Method of Loci (`MISSION.md`). The benchmark-honesty note in `BENCHMARKS.md:70-95` (the 100% headline involved teaching-to-test; the held-out figure is 98.4%) is worth reading in its own right.
- Repo: https://github.com/Martian-Engineering/volt — Coding agent with lossless context management
- Maintainer: Martian Engineering (`Martian-Engineering` org)
- Why this is here: A coding agent built around Lossless Context Management (LCM), a dual-state design where every user message, assistant response, and tool result is persisted verbatim (immutable store) and the active context is assembled from recent raw messages plus precomputed summary nodes. Storage is a DAG in embedded Postgres (`voltcode_lcm`, optional external via `LCM_DATABASE_URL`). The write/summarize policy is deterministic (soft/hard token thresholds drive a control loop), not LLM-decided, which makes it a useful contrast against Mem0's adaptive updates and Letta's self-editing memory. Two runtime modes (Dolt: evict oldest with ghost-cue off-context retrieval; Upward: recursive bottom-up condensation, default) make the eviction/compaction trade-off explicit and readable.
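A threshold-driven control loop of the kind described above might look roughly like the following. This is a sketch under stated assumptions, not Volt's algorithm: the class and method names, the threshold values, the whitespace "tokenizer", and the stub summarizer are all hypothetical, and a real system would likely run the soft-threshold work in the background.

```python
def tokens(text):
    return len(text.split())  # crude stand-in for a real tokenizer


def summarize(texts):
    # Stub: a real system would call a summarizer; this keeps 3 words per input.
    return "SUMMARY: " + " | ".join(" ".join(t.split()[:3]) for t in texts)


class Context:
    def __init__(self, soft=40, hard=60, batch=2):
        self.log = []      # immutable, verbatim history (never rewritten)
        self.active = []   # what would actually be sent to the model
        self.soft, self.hard = soft, hard
        self.batch = max(2, batch)  # merging fewer than 2 items never shrinks

    def size(self):
        return sum(tokens(t) for t in self.active)

    def _compact_once(self):
        oldest, rest = self.active[: self.batch], self.active[self.batch:]
        self.active = [summarize(oldest)] + rest

    def add(self, message):
        self.log.append(message)
        self.active.append(message)
        if self.size() > self.soft:
            self._compact_once()   # soft threshold: one compaction step
        while self.size() > self.hard and len(self.active) > 1:
            self._compact_once()   # hard threshold: compact until it fits


ctx = Context()
for i in range(10):
    ctx.add(f"m{i} " + "w " * 9)   # ten messages of ten tokens each
```

The property worth noticing is that the decision to compact depends only on token counts, so the same message sequence always produces the same active context, while `log` keeps every original message untouched.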
- Repo: https://github.com/safishamsi/graphify — AI coding assistant skill that turns any folder of code, docs, papers, images, or videos into a queryable knowledge graph
- Maintainer: Safi Shamsi (`safishamsi`); MIT licensed; PyPI package `graphifyy`
- Why this is here: The codebase-as-memory bet. Where every other entry models conversational memory (user / session / agent turns), Graphify models the static corpus an agent works against as a persistent, queryable knowledge graph and exposes it to the agent via MCP (`graphify/serve.py:1`). The pipeline is single-process and legible end-to-end: `detect → extract → build → cluster → analyze → report → export` (`ARCHITECTURE.md`), with tree-sitter parsers for ~30 languages emitting `{nodes, edges}` dicts that get folded into a NetworkX graph, Leiden community detection via graspologic with a NetworkX-Louvain fallback (`graphify/cluster.py:48-76`), and per-extraction confidence labels (`EXTRACTED` / `INFERRED` / `AMBIGUOUS`) carried on every edge so the agent can reason about how much to trust each relation. The MCP server exposes graph-query tools (`query_graph`, `get_neighbors`, `get_community`, `god_nodes`, `shortest_path`, PR-triage tools), so "memory" here is structural recall over the project, not episodic recall over a conversation. The interesting contrast is against Graphiti and MemPalace: Graphiti puts a bi-temporal graph DB between agent and memory; MemPalace stores verbatim chunks; Graphify denies the graph-DB premise altogether and ships the graph as a single `graph.json` next to a static HTML viewer. A useful "do we even need a server?" data point in a study otherwise dominated by server-backed designs.
- Repo: https://github.com/plastic-labs/honcho — Build AI agents that truly know your users
- Maintainer: Plastic Labs (`plastic-labs` org); AGPL-3.0
- Why this is here: The only entry in the study that puts a multi-peer / theory-of-mind model at the center of the design. Where Mem0 models memories belonging to a user and Letta models memories editable by an agent, Honcho models what one peer understands about another: `peers`, `sessions`, `messages`, and `workspaces` are the primary entities, with vector-embedded "internal collections" keyed by the (observer, observed) pair. Write path is append-only messages + async background reasoning: new messages immutably persist, and a deriver worker generates summaries, peer cards, and conclusions out-of-band. This is closer to the "agents building rich models of users" framing that Plastic Labs' research thread (Open Source Honcho, the "Tutor-GPT" lineage) has been chasing since 2023, and it's the cleanest example in the study of separating storage from understanding. Worth comparing the deriver to Volt's deterministic summary thresholds: both are explicit, neither blocks the agent loop.
- Repo: https://github.com/volcengine/OpenViking — An open-source context database for AI agents (file-system paradigm)
- Maintainer: Volcengine (ByteDance's cloud division); AGPL-3.0 main / Apache-2.0 CLI + examples
- Why this is here: The most architecturally distinct entry on the storage side. OpenViking refuses the flat-vector premise entirely and organizes context as a filesystem (`viking://resources/`, `viking://user/`, `viking://agent/`) with three abstraction tiers per node (L0 one-sentence abstract, L1 overview, L2 full data) so retrieval can match at the coarsest tier and drill down on demand. "Directory recursive retrieval" combines vector search with hierarchical navigation. Unifies what every other study entry treats as separate concerns: memories (user / agent task), resources (docs, repos), and skills (agent instructions) are all just nodes in the same tree. ByteDance-scale backing is the operational story (Rust core in `crates/`, multi-language CLI, Docker deploy). The interesting compare-and-contrast is against Letta's git-backed memory mode (which also renders memory as files but on a flat-ish layout) and against Graphify (which also collapses resource-as-context but as a knowledge graph rather than a tree).
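The "match at the coarsest tier, then drill down" idea can be sketched with a toy tree. The `Node` class, the word-overlap score (standing in for vector search), and the leaf URIs below are hypothetical; only the three top-level `viking://` prefixes and the L0/L1/L2 tiers come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    uri: str
    l0: str                 # one-sentence abstract
    l1: str = ""            # overview
    l2: str = ""            # full data
    children: list = field(default_factory=list)


def score(query, text):
    # Word overlap as a stand-in for vector similarity.
    return len(set(query.lower().split()) & set(text.lower().split()))


def drill_down(root, query):
    """Match on L0 at each level, descend into the best child, return the path."""
    path, node = [root.uri], root
    while node.children:
        node = max(node.children, key=lambda c: score(query, c.l0))
        path.append(node.uri)
    return path, node


root = Node("viking://", "all context", children=[
    Node("viking://resources/", "project docs and repos"),
    Node("viking://user/", "user preferences and profile", children=[
        Node("viking://user/prefs", "editor and theme preferences",
             l2="Theme: dark. Editor: helix."),
        Node("viking://user/profile", "name and role profile",
             l2="Name: Sam. Role: backend."),
    ]),
])
path, leaf = drill_down(root, "user editor preferences")
```

Only the short L0 abstracts are compared on the way down, and the full L2 payload is read once, at the leaf the search settles on.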
- Repo: https://github.com/vectorize-io/hindsight — Agent memory system that helps AI agents learn over time
- Maintainer: Vectorize.io (`vectorize-io` org); MIT
- Why this is here: The first entry in the study to commit to a typed memory taxonomy as a first-class axis: every memory is one of world fact, experience, or mental model. Storage is deliberately polyglot (vector for semantics, graph for entity / causal links, time series for temporal context, BM25 for lexical match), and recall is a three-verb surface (`retain`, `recall`, `reflect`) where `reflect` is the unusual one: the agent analyzes its own existing memories rather than the world. Per-user / per-agent isolation lives in a "bank" abstraction. Ships as a full open-source stack (Python API, Node/Python SDKs, CLI, Docker / Helm), not a wrapper around a hosted service. The compare-and-contrast against Mem0 is direct: same hybrid-retrieval thesis, but Hindsight argues the type of memory (epistemic vs. experiential vs. inferential) should drive the write path, not just retrieval ranking.
- Repo: https://github.com/RetainDB/RetainDB — Durable memory for AI agents — decisions, preferences, workflows, corrections, project facts, session handoffs that survive across conversations
- Maintainer: RetainDB organization; Apache-2.0 (local / SDK / MCP), BSL-1.1 (server)
- Why this is here: Pushes the typed-memory bet further than Hindsight, with 12+ semantic categories (factual, preference, procedural, decision, constraint, correction, session-summary, etc.), and pairs it with bi-temporal validity (`validFrom` / `validUntil`) so superseding stale facts is an explicit, queryable operation rather than something the recall ranker has to paper over. Hybrid retrieval is "lexical + vector + graph signals + RRF fusion + reranking." Three deployment modes (Local single-machine, Server with Postgres + pgvector, managed Cloud) cover the same "library / server / hosted" trifecta Mem0 ships, but the BSL-licensed server is the flag for anyone planning on commercial hosting. The contrast against Mem0 is direct on the supersession axis (Mem0 has none in OSS; RetainDB makes it a first-class column) and against Neo on the typed-categories-vs-typed-facts axis (Neo carries a `superseded_by` chain on a single fact type; RetainDB enumerates categories).
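For readers who have not met RRF (reciprocal rank fusion) before, the fusion step in that retrieval description can be sketched generically. This is the standard RRF formula with the commonly used constant k = 60; whether RetainDB uses that constant, or weights its signals, is not stated above, so treat the parameters and the memory ids as assumptions.

```python
def rrf(rankings, k=60):
    """Fuse ranked lists: each list contributes 1 / (k + rank) per item."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["m3", "m1", "m7"]   # e.g. a BM25 ranking
vector = ["m1", "m4", "m3"]    # e.g. an embedding-similarity ranking
graph = ["m1", "m3"]           # e.g. a graph-neighbourhood ranking
fused = rrf([lexical, vector, graph])   # ["m1", "m3", "m4", "m7"]
```

RRF needs only ranks, not comparable scores, which is why it is a common choice when lexical, vector, and graph signals are produced by different engines.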
- Repo: https://github.com/supermemoryai/supermemory — The memory and context layer for AI
- Maintainer: supermemoryai research lab; MIT
- Why this is here: The current benchmark-leaderboard occupant: ranked #1 on LongMemEval, LoCoMo, and ConvoMem as of mid-2026, the three headline harnesses every other entry in this study reports against (compare to MemPalace's 96.6% R@5 on LongMemEval and Mem0's 91.6 / 93.4 on LoCoMo / LongMemEval; the leaderboard moves fast). At ~22.7k stars and 1,600+ commits, the most active OSS entry after Mem0 / Letta. Design surface is the universal RAG + memory bet: fact extraction from conversations + user-profile maintenance + hybrid search + multimodal ingest + external connectors (Google Drive, Notion, GitHub) that sync content into the same store. Scopes are `containerTag` (typically a user id) plus optional project tags. Claims to handle "temporal changes, contradictions, and automatic forgetting", which makes the comparison against StateBench (the yardstick) the obvious next step.
- Repo: https://github.com/campfirein/byterover-cli — Memory Hub for AI coding agents to remember
- Maintainer: ByteRover team (`campfirein` org)
- License note: Elastic License 2.0 (source-available, not OSI-open-source). Commercial use is restricted. We pin it for study reading; do not vendor the code into a product without reading the license.
- Why this is here: The only entry in the study focused specifically on coding-agent memory as an end-user product (compare to Volt, which is also coding-agent-shaped but ships as a full agent; ByteRover is just the memory hub). Organizes codebase knowledge into a context tree with local file-based storage and optional cloud sync. Project-level scope (per directory), multi-machine access, team sharing via ByteRover Cloud. Integrates with Cursor, Claude Code, and 20+ LLM providers as the consumer surface. The reason it earns a pin despite the licensing caveat: the CLI + REPL + dashboard packaging is a distinct deployment story (most other entries assume the agent is the only consumer; here the human and the agent are both editors of the same context tree) and worth studying as one possible answer to "how does a team of developers share agent memory."
- Repo: https://github.com/gastownhall/beads — Distributed graph issue tracker for AI agents, powered by Dolt
- Maintainer: `gastownhall` org; Go module path `github.com/steveyegge/beads` (Steve Yegge et al.); MIT. Ships as `@beads/bd` (npm), `beads-mcp` (PyPI), `beads` (Homebrew).
- Why this is here: The entry that names the missing axis. Every other entry models episodic / semantic memory: recall over conversation or facts. Beads models procedural / working memory: the agent's own task graph, so a coding agent can survive context compaction, session resets, and account rotations without losing the plot or redoing work. And it makes the study's sharpest storage bet: no vector store, no embeddings, no semantic similarity anywhere. Recall is SQL views + graph traversal over a typed dependency graph (`blocks`, `parent-child`, `supersedes`, `relates-to`, `discovered-from`, ~19 edge kinds) stored in Dolt, a Git-like, versioned, MySQL-compatible database with cell-level merge and native branching. Hash-based IDs (`bd-a1b2` = base36 of a SHA256 over title+description+creator+timestamp+nonce, `internal/idgen/hash.go`) give conflict-free writes across parallel agents and branches, with no shared auto-increment counter to collide on. The write path is mutate-the-row + append to an immutable `events` audit trail; supersession is an explicit `supersedes` graph edge (compare Neo / RetainDB). Eviction is real: closed issues get Claude-Haiku-summarized "memory decay" (`internal/compact/`) into Summary / Key Decisions / Resolution, with the full original preserved in snapshot tables for recovery. A separate, deliberately dumb memory surface, `bd remember` / `recall` / `memories` / `forget` (`cmd/bd/memory.go`), stores slugified key→value insights in the config KV table and re-injects them every session via `bd prime` (`cmd/bd/prime.go`) as a SessionStart hook. On disk, the Dolt DB under `.beads/` is authoritative; `.beads/issues.jsonl` is a plain-JSON export for viewing/interchange only, not the source of truth. Local-first and git-optional (`BEADS_DIR` + `--stealth`), with peer-to-peer federation carrying data-sovereignty tiers (the only GDPR-tier knob in the study). The cleanest pairing is Beads vs Volt (both coding-agent, both database-backed; mind the trap: Volt's eviction mode is named "Dolt" while Beads is built on Dolt the DB); the cleanest contrast is Beads vs every vector-backed entry. Deep write-up: `context-v/profiles/Profile__Beads.md`.
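The hash-based ID scheme is easy to picture in code. The sketch below follows the description above (base36 of a SHA256 over title, description, creator, timestamp, and nonce), but the field separator, the truncation length, the nonce-bumping policy, and all function names are assumptions for illustration; the real logic lives in `internal/idgen/hash.go` and is not reproduced here.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36(n):
    """Encode a non-negative integer in base 36."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def issue_id(title, description, creator, timestamp, nonce=0, length=4, prefix="bd"):
    # Separator and truncation length are assumptions, not the project's format.
    payload = "\x1f".join([title, description, creator, timestamp, str(nonce)])
    digest = hashlib.sha256(payload.encode("utf-8")).digest()
    return f"{prefix}-{base36(int.from_bytes(digest, 'big'))[:length]}"


def allocate(existing, title, description, creator, timestamp):
    """Bump the nonce until the short ID is unused in this local set."""
    nonce = 0
    while True:
        candidate = issue_id(title, description, creator, timestamp, nonce)
        if candidate not in existing:
            return candidate
        nonce += 1


first = issue_id("Fix login", "Token expires early", "agent-1", "2025-01-01T00:00:00Z")
second = allocate({first}, "Fix login", "Token expires early", "agent-1",
                  "2025-01-01T00:00:00Z")
```

Because the ID is derived from content rather than from a shared counter, two agents on different branches can create issues without coordinating, and a short-ID clash is resolved locally by trying the next nonce.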
- Repo: https://github.com/Parslee-ai/statebench — Conformance test for stateful AI agents. Measures state correctness over time.
- Maintainer: Parslee AI (`Parslee-ai` org)
- Why this is here: A benchmark rather than an implementation, so it fills a different slot in the study. Pinning it lets us evaluate the memory-layer implementations against a common harness, instead of trusting each project's self-reported numbers. Pair it with Mem0 and Neo as the units under test.
These are not yet pinned as submodules. When the study expands, run `git submodule add <url> <slug>` from the study root to add any of them.
- Zep / Graphiti: temporal knowledge graphs for session memory. Integrates with LangChain / LangGraph. The interesting code lives in Graphiti, not in the Zep examples repo.
  - https://github.com/getzep/graphiti (Build Real-Time Knowledge Graphs for AI Agents; 25k★, recommended entry point)
  - https://github.com/getzep/zep (examples and integrations; 4.5k★)
- Letta: open-source server with self-editing memory, descended from MemGPT. Stateful agents that persist user preferences and survive conversation resets.
- LangMem: LangChain's memory utilities; summarization for context limits. Smaller surface than the others but tightly integrated with the LangChain stack.
- Memary: graph-centric memory layer for autonomous agents. Worth comparing schema choices against Mem0 and Graphiti.
- Cognee: pipelines for RAG-style memory; positions itself as "memory for your AI agents in 6 lines of code." Focus on ingestion ergonomics.
These are large enough that they will dilute the study if added wholesale. Worth referencing only if a specific subdir is what we want to study.
- LangChain — modular memory buffers and summary memory live somewhere in here. https://github.com/langchain-ai/langchain (135k★)
- LlamaIndex — document-integrated memory and retrieval. https://github.com/run-llama/llama_index (49k★)
- Cloudflare Agents — Workers-runtime agent memory and ingestion. https://github.com/cloudflare/agents (4.8k★)
- Start with Mem0's README and `docs/` for the topology overview, then walk the source for the write/update path (look for "add", "update", "consolidate" verbs in the public API).
- Compare to Graphiti's temporal-graph approach: same problem, different storage primitive.
- Compare to Letta's self-editing memory: same problem, different control primitive (the agent edits memory, not the framework).
By the time those three are read in sequence, the design space should be mostly visible.