lossless-group/study-memory-layers-for-agents

Memory Layers for Agents — Study

The question

How do production memory systems for AI agents structure recall — across vector, graph, and key-value stores; across user, session, and agent scopes; across short-term context and long-term knowledge — and which conventions are converging?

The interesting bits are not the embeddings themselves; those are a commodity. The interesting bits are how each system decides what to remember, when to update, how to scope, and what shape the memory artifacts take on disk and on the wire.

What we are looking at, repo by repo

When reading each entry below, the working checklist is:

  • Storage topology. Single store or hybrid (vector + graph + KV)? What determines which store handles which class of memory?
  • Write policy. Append-only? Self-editing? LLM-summarized? Reconciled against a graph? When does a memory get superseded vs. merged?
  • Scopes & namespacing. User / session / agent / tenant — how are these modeled, and how does retrieval respect them?
  • Schema of a memory. What fields does a single memory record carry? Timestamps, provenance, confidence, embedding ref, references to other memories?
  • Recall surface. Vector similarity, graph traversal, time-based, hybrid ranking? What's the API the agent actually calls?
  • Eviction & compaction. Is there a policy? A summarizer? A TTL?
  • Serialization on disk. What does the persisted form look like? JSON blobs, parquet, a specific graph format, SQL rows? Could a non-AI program parse it cleanly?
  • Operational story. Local-first? Server-required? Stateless agent + external store?
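The "Schema of a memory" and "Serialization on disk" items are easier to compare across entries with a concrete record in hand. Below is a minimal sketch, assuming a flat record that carries the fields the checklist names. The class, the field names, and the JSON shape are hypothetical and do not reproduce any entry's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    """Hypothetical memory record covering the checklist fields (not any entry's real schema)."""

    id: str
    text: str
    scope: dict                          # e.g. {"user": "u1", "session": "s9"}
    created_at: datetime                 # when the record was written (timezone-aware)
    source: str                          # provenance: where the memory came from
    confidence: float = 1.0              # 0.0 to 1.0, how much to trust the record
    embedding_ref: Optional[str] = None  # pointer into a vector store, not the vector itself
    supersedes: Optional[str] = None     # id of an older record this one replaces
    links: tuple = ()                    # ids of related memories

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")

    def to_json(self) -> str:
        """Serialize to plain JSON so a non-AI program can parse the persisted form."""
        data = asdict(self)
        data["created_at"] = self.created_at.isoformat()
        data["links"] = list(self.links)
        return json.dumps(data, sort_keys=True)


record = MemoryRecord(
    id="m1",
    text="Prefers dark mode",
    scope={"user": "u1"},
    created_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
    source="chat:s9",
)
print(record.to_json())
```

Keeping the embedding as a reference rather than inlining the vector is one design choice. It keeps the persisted record small and readable, at the cost of a second lookup at recall time.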

The design space at a glance

| Bet | Entry |
|---|---|
| Memory baked into the model (frozen-backbone transformer adapter) | Delta-Mem |
| Real typed bi-temporal graph | Graphiti |
| Self-editing core memory + paged archival (the MemGPT lineage) | Letta |
| Hybrid vector + entity-link + tight CRUD API | Mem0 |
| Verbatim chunks + hybrid recall, no extraction | MemPalace |
| Typed/scoped facts + deterministic supersession + outcome learning | Neo |
| Immutable Postgres log + summary DAG | Volt |
| Folder → in-process NetworkX KG + Leiden + MCP query surface | Graphify |
| Multi-peer "theory of mind": what A knows about B, derived async | Honcho |
| Context as a filesystem; L0/L1/L2 abstract tiers; unified memories + resources + skills | OpenViking |
| Typed memory taxonomy (world facts / experiences / mental models) + bank-based scoping | Hindsight |
| 12+ semantic categories with bi-temporal validity (validFrom/validUntil) | RetainDB |
| Universal RAG + memory layer, currently top of LongMemEval / LoCoMo / ConvoMem | Supermemory |
| Coding-agent context tree (CLI + optional cloud sync; Elastic-licensed) | ByteRover CLI |
| Procedural/working memory: versioned dependency-typed task graph on a Git-like SQL DB (Dolt), no vectors | Beads |
| Conformance benchmark (the yardstick) | StateBench |

Deep per-entry write-ups live in context-v/profiles/.

In the study now

  • Repo: https://github.com/mem0ai/mem0 (Universal memory layer for AI Agents)
  • Maintainer: Mem0.ai (mem0ai org)
  • Why this is here: Pioneers a composable hybrid architecture (vector + graph + KV store) with adaptive updates. Multi-level recall across user / session / agent scopes is its headline. Mem0 reports +26% accuracy over OpenAI memory and 91% faster responses in its own benchmarks. It is the most-starred entry in this space (54k+ stars) and the most explicit about being a layer rather than a framework.
  • Repo: https://github.com/Parslee-ai/neo (A self-improving code reasoning engine with persistent semantic memory)
  • Maintainer: Parslee AI (Parslee-ai org)
  • Why this is here: A reasoning engine, not a framework — the memory is the point, not a feature bolted on. Worth comparing schema and write-policy choices against Mem0 and Graphiti. Smaller surface than the major players, which makes it easier to read end-to-end.
  • Repo: https://github.com/declare-lab/delta-Mem (δ-mem: Efficient Online Memory for Large Language Models)
  • Maintainer: declare-lab (SUTD) — Jingdi Lei, Di Zhang, Junxian Li, Weida Wang, Kaixuan Fan, Xiang Liu, Qihan Liu, Xiaoteng Ma, Baian Chen, Soujanya Poria
  • Why this is here: The architectural counter-bet to every other entry. While Mem0, MemPalace, Graphiti, Neo, and Volt all answer "where does the agent store and look things up," Delta-Mem answers "what if memory was part of the model's forward pass?" A frozen-backbone transformer adapter that gives each attention head a low-rank dense state matrix updated by a learned delta rule (S_{t+1} = λ·S_t − β·(S_t·k_t)⊗k_t + β·v_t⊗k_t), with three temporal write granularities (TSW/SSW/MSW), a Triton-accelerated affine scan, and a public Qwen3-4B-Instruct adapter on Hugging Face. Released alongside arXiv:2605.12357. Evaluated on LoCoMo, HotpotQA, IFEval, GPQA Diamond, and MemoryAgentBench. This is a research artifact (not a deployable library), and that's the point — it forces the question "is agent memory even a retrieval problem?" that the system-level entries quietly assume.
  • Repo: https://github.com/letta-ai/letta (Platform for building stateful agents: AI with advanced memory that can learn and self-improve over time)
  • Maintainer: letta-ai (formerly MemGPT-ai; Charles Packer, Sarah Wooders et al.)
  • Why this is here: The direct successor to MemGPT — the 2023 Berkeley paper that named the agent-memory problem and shipped the OS-inspired hierarchical-memory pattern (core context = RAM, archival = disk, recall = paging). The MemGPT lineage is the framing fact. Letta is the operational productization: every agent is a persistent Postgres row, core memory is a set of agent-editable Blocks rendered into the system prompt, archival memory is a paginated pgvector-backed passage store with semantic search. The agent edits its own memory via core_memory_append / core_memory_replace — the headline MemGPT design move that no other entry in the study replicates. Ships as a FastAPI + Postgres + pgvector compose stack with REST, WebSocket, and an OpenAI-compatible /v1/chat/completions endpoint so a Letta agent looks like a ChatGPT-shaped model to any client. Multi-agent block sharing is a database join (a single Block row can belong to many agents). The newer git-backed memory mode renders blocks as files (system/persona.md, skills/.../SKILL.md) and turns every memory edit into a commit — making it the closest entry in the study to our own context-vigilance discipline, but with the agent as committer.
  • Repo: https://github.com/getzep/graphiti (Build Real-Time Knowledge Graphs for AI Agents)
  • Maintainer: Zep Software, Inc. (Paul Paliychuk, Preston Rasmussen, Daniel Chalef)
  • Why this is here: The only entry in the study that puts a real, typed, bi-temporal graph between the agent and its memories — four interchangeable Cypher-flavoured backends (Neo4j, FalkorDB, Kuzu, Neptune), a single-LLM-call episode-to-graph extraction pipeline with Pydantic-typed entities, hybrid BM25 + cosine + BFS recall with cross-encoder reranking, and label-propagation community detection. Every edge carries both valid_at/invalid_at (real-world validity) and created_at/expired_at (system time) — textbook bi-temporality brought to agent memory. Backed by a peer-reviewed paper (Zep: A Temporal Knowledge Graph Architecture for Agent Memory). The interesting comparison is Graphiti vs MemPalace: opposite bets on structure (MemPalace says structure is over-engineered; Graphiti says it's under-engineered).
  • Repo: https://github.com/MemPalace/mempalace (The best-benchmarked open-source AI memory system. And it's free.)
  • Maintainer: MemPalace Contributors (MemPalace org; milla-jovovich, @bensig)
  • Why this is here: The most direct counter-bet to Mem0 in the study. Same problem space (give agents persistent memory), nearly opposite design choice at the write step: MemPalace stores verbatim text (no LLM-driven extraction, no summarization) in ChromaDB (mempalace_drawers + mempalace_closets) plus a local SQLite temporal entity graph. Recall is hybrid (semantic + BM25 + closet boost; closets act as a ranking signal, never as a gate), and the published benchmarks (benchmarks/BENCHMARKS.md) report 96.6% R@5 on LongMemEval with zero LLM calls at query time, and 92.9% vs Mem0's 30–45% on ConvoMem, a roughly 2–3× margin attributed directly to extraction losing information. Inspired by Zettelkasten + Method of Loci (MISSION.md). The benchmark-honesty note in BENCHMARKS.md:70-95 (the 100% headline involved teaching to the test; the held-out figure is 98.4%) is worth reading in its own right.
  • Repo: https://github.com/Martian-Engineering/volt (Coding agent with lossless context management)
  • Maintainer: Martian Engineering (Martian-Engineering org)
  • Why this is here: A coding agent built around Lossless Context Management (LCM) — a dual-state design where every user message, assistant response, and tool result is persisted verbatim (immutable store) and the active context is assembled from recent raw messages plus precomputed summary nodes. Storage is a DAG in embedded Postgres (voltcode_lcm, optional external via LCM_DATABASE_URL). The write/summarize policy is deterministic (soft/hard token thresholds drive a control loop), not LLM-decided — a useful contrast against Mem0's adaptive updates and Letta's self-editing memory. Two runtime modes (Dolt: evict oldest with ghost-cue off-context retrieval; Upward: recursive bottom-up condensation, default) make the eviction/compaction trade-off explicit and readable.
  • Repo: https://github.com/safishamsi/graphify (AI coding assistant skill that turns any folder of code, docs, papers, images, or videos into a queryable knowledge graph)
  • Maintainer: Safi Shamsi (safishamsi); MIT licensed; PyPI package graphifyy
  • Why this is here: The codebase-as-memory bet. Where every other entry models conversational memory (user / session / agent turns), Graphify models the static corpus an agent works against as a persistent, queryable knowledge graph and exposes it to the agent via MCP (graphify/serve.py:1). The pipeline is single-process and legible end-to-end: detect → extract → build → cluster → analyze → report → export (ARCHITECTURE.md), with tree-sitter parsers for ~30 languages emitting {nodes, edges} dicts that get folded into a NetworkX graph, Leiden community detection via graspologic with a NetworkX-Louvain fallback (graphify/cluster.py:48-76), and per-extraction confidence labels (EXTRACTED / INFERRED / AMBIGUOUS) carried on every edge so the agent can reason about how much to trust each relation. The MCP server exposes graph-query tools (query_graph, get_neighbors, get_community, god_nodes, shortest_path, PR-triage tools) — so "memory" here is structural recall over the project, not episodic recall over a conversation. The interesting contrast is against Graphiti and MemPalace: Graphiti puts a bi-temporal graph DB between agent and memory; MemPalace stores verbatim chunks; Graphify denies the graph-DB premise altogether and ships the graph as a single graph.json next to a static HTML viewer. A useful "do we even need a server?" data point in a study otherwise dominated by server-backed designs.
  • Repo: https://github.com/plastic-labs/honcho (Build AI agents that truly know your users)
  • Maintainer: Plastic Labs (plastic-labs org); AGPL-3.0
  • Why this is here: The only entry in the study that puts a multi-peer / theory-of-mind model at the center of the design. Where Mem0 models memories belonging to a user and Letta models memories editable by an agent, Honcho models what one peer understands about another, with peers, sessions, messages, and workspaces as primary entities and vector-embedded "internal collections" keyed by the (observer, observed) pair. The write path is append-only messages + async background reasoning: new messages persist immutably, and a deriver worker generates summaries, peer cards, and conclusions out-of-band. This is closer to the "agents building rich models of users" framing that Plastic Labs' research thread (Open Source Honcho, the "Tutor-GPT" lineage) has been chasing since 2023, and it's the cleanest example in the study of separating storage from understanding. Worth comparing the deriver to Volt's deterministic summary thresholds: both are explicit, and neither blocks the agent loop.
  • Repo: https://github.com/volcengine/OpenViking (An open-source context database for AI agents; file-system paradigm)
  • Maintainer: Volcengine (ByteDance's cloud division); AGPL-3.0 main / Apache-2.0 CLI + examples
  • Why this is here: The most architecturally distinct entry on the storage side. OpenViking refuses the flat-vector premise entirely and organizes context as a filesystem (viking://resources/, viking://user/, viking://agent/), with three abstraction tiers per node (L0 one-sentence abstract, L1 overview, L2 full data) so retrieval can match at the coarsest tier and drill down on demand. "Directory recursive retrieval" combines vector search with hierarchical navigation. It unifies what every other study entry treats as separate concerns: memories (user / agent task), resources (docs, repos), and skills (agent instructions) are all just nodes in the same tree. ByteDance-scale backing is the operational story (Rust core in crates/, multi-language CLI, Docker deploy). The interesting compare-and-contrast is against Letta's git-backed memory mode (which also renders memory as files, but in a flatter layout) and against Graphify (which also treats resources as context, but as a knowledge graph rather than a tree).
  • Repo: https://github.com/vectorize-io/hindsight (Agent memory system that helps AI agents learn over time)
  • Maintainer: Vectorize.io (vectorize-io org); MIT
  • Why this is here: The first entry in the study to commit to a typed memory taxonomy as a first-class axis: every memory is one of world fact, experience, or mental model. Storage is deliberately polyglot (vector for semantics, graph for entity / causal links, time series for temporal context, BM25 for lexical match), and the memory API is a three-verb surface (retain, recall, reflect) where reflect is the unusual one: the agent analyzes its own existing memories rather than the world. Per-user / per-agent isolation lives in a "bank" abstraction. Ships as a full open-source stack (Python API, Node/Python SDKs, CLI, Docker / Helm), not a wrapper around a hosted service. The compare-and-contrast against Mem0 is direct: same hybrid-retrieval thesis, but Hindsight argues the type of memory (epistemic vs. experiential vs. inferential) should drive the write path, not just retrieval ranking.
  • Repo: https://github.com/RetainDB/RetainDB (Durable memory for AI agents: decisions, preferences, workflows, corrections, project facts, session handoffs that survive across conversations)
  • Maintainer: RetainDB organization; Apache-2.0 (local / SDK / MCP), BSL-1.1 (server)
  • Why this is here: Pushes the typed-memory bet further than Hindsight — 12+ semantic categories (factual, preference, procedural, decision, constraint, correction, session-summary, etc.) — and pairs it with bi-temporal validity (validFrom / validUntil) so superseding stale facts is an explicit, queryable operation rather than something the recall ranker has to paper over. Hybrid retrieval is "lexical + vector + graph signals + RRF fusion + reranking." Three deployment modes (Local single-machine, Server with Postgres + pgvector, managed Cloud) cover the same "library / server / hosted" trifecta Mem0 ships, but the BSL-licensed server is the flag for anyone planning on commercial hosting. The contrast against Mem0 is direct on the supersession axis (Mem0 has none in OSS; RetainDB makes it a first-class column) and against Neo on the typed-categories-vs-typed-facts axis (Neo carries a superseded_by chain on a single fact type; RetainDB enumerates categories).
  • Repo: https://github.com/supermemoryai/supermemory (The memory and context layer for AI)
  • Maintainer: supermemoryai research lab; MIT
  • Why this is here: The current benchmark-leaderboard occupant — ranked #1 on LongMemEval, LoCoMo, and ConvoMem as of mid-2026, the three headline harnesses every other entry in this study reports against (compare to MemPalace's 96.6% R@5 on LongMemEval and Mem0's 91.6 / 93.4 on LoCoMo / LongMemEval — the leaderboard moves fast). At ~22.7k stars and 1,600+ commits, the most active OSS entry after Mem0 / Letta. Design surface is the universal RAG + memory bet: fact extraction from conversations + user-profile maintenance + hybrid search + multimodal ingest + external connectors (Google Drive, Notion, GitHub) that sync content into the same store. Scopes are containerTag (typically a user id) plus optional project tags. Claims to handle "temporal changes, contradictions, and automatic forgetting" — which makes the comparison against StateBench (the yardstick) the obvious next step.
  • Repo: https://github.com/campfirein/byterover-cli (Memory Hub for AI coding agents to remember)
  • Maintainer: ByteRover team (campfirein org)
  • License note: Elastic License 2.0 — source-available, not OSI-open-source. Commercial use is restricted. We pin it for study reading; do not vendor the code into a product without reading the license.
  • Why this is here: The only entry in the study focused specifically on coding-agent memory as an end-user product (compare to Volt, which is also coding-agent-shaped but ships as a full agent; ByteRover is just the memory hub). Organizes codebase knowledge into a context tree with local file-based storage and optional cloud sync. Project-level scope (per directory), multi-machine access, team sharing via ByteRover Cloud. Integrates with Cursor, Claude Code, and 20+ LLM providers as the consumer surface. The reason it earns a pin despite the licensing caveat: the CLI + REPL + dashboard packaging is a distinct deployment story (most other entries assume the agent is the only consumer; here the human and the agent are both editors of the same context tree) and worth studying as one possible answer to "how does a team of developers share agent memory."
  • Repo: https://github.com/gastownhall/beads (Distributed graph issue tracker for AI agents, powered by Dolt)
  • Maintainer: gastownhall org; Go module path github.com/steveyegge/beads (Steve Yegge et al.); MIT. Ships as @beads/bd (npm), beads-mcp (PyPI), beads (Homebrew).
  • Why this is here: The entry that names the missing axis. Every other entry models episodic / semantic memory — recall over conversation or facts. Beads models procedural / working memory: the agent's own task graph, so a coding agent can survive context compaction, session resets, and account rotations without losing the plot or redoing work. And it makes the study's sharpest storage bet — no vector store, no embeddings, no semantic similarity anywhere. Recall is SQL views + graph traversal over a typed dependency graph (blocks, parent-child, supersedes, relates-to, discovered-from, ~19 edge kinds) stored in Dolt — a Git-like, versioned, MySQL-compatible database with cell-level merge and native branching. Hash-based IDs (bd-a1b2 = base36 of a SHA256 over title+description+creator+timestamp+nonce, internal/idgen/hash.go) give conflict-free writes across parallel agents and branches — no shared auto-increment counter to collide on. The write path is mutate-the-row + append to an immutable events audit trail; supersession is an explicit supersedes graph edge (compare Neo / RetainDB). Eviction is real: closed issues get Claude-Haiku-summarized "memory decay" (internal/compact/) into Summary / Key Decisions / Resolution, with the full original preserved in snapshot tables for recovery. A separate, deliberately dumb memory surface — bd remember / recall / memories / forget (cmd/bd/memory.go) — stores slugified key→value insights in the config KV table and re-injects them every session via bd prime (cmd/bd/prime.go) as a SessionStart hook. On disk, the Dolt DB under .beads/ is authoritative; .beads/issues.jsonl is a plain-JSON export for viewing/interchange only — not the source of truth. Local-first and git-optional (BEADS_DIR + --stealth), with peer-to-peer federation carrying data-sovereignty tiers (the only GDPR-tier knob in the study). 
The cleanest pairing is Beads vs Volt (both coding-agent, both database-backed — and mind the trap: Volt's eviction mode is named "Dolt" while Beads is built on Dolt the DB); the cleanest contrast is Beads vs every vector-backed entry. Deep write-up: context-v/profiles/Profile__Beads.md.
  • Repo: https://github.com/Parslee-ai/statebench (Conformance test for stateful AI agents. Measures state correctness over time.)
  • Maintainer: Parslee AI (Parslee-ai org)
  • Why this is here: A benchmark rather than an implementation, so it fills a different slot in the study. Pinning it lets us evaluate the memory-layer implementations against a common harness instead of trusting each project's self-reported numbers. Pair it with Mem0 and Neo as the units under test.
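For the Mem0 entry, its headline feature is multi-level recall across user / session / agent scopes. Below is a minimal sketch of scope-respecting recall. It assumes each memory carries optional user, session, and agent identifiers, and that a query must match every scope it pins. `ScopedStore` and its methods are hypothetical, not Mem0's API. Word overlap stands in for embedding similarity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScopedMemory:
    text: str
    user: Optional[str] = None
    session: Optional[str] = None
    agent: Optional[str] = None


class ScopedStore:
    """Hypothetical scope-aware store; keyword overlap stands in for vector similarity."""

    def __init__(self) -> None:
        self._items: list = []

    def add(self, text: str, **scope: Optional[str]) -> None:
        self._items.append(ScopedMemory(text, **scope))

    def search(self, query: str, limit: int = 5, **scope: Optional[str]) -> list:
        terms = set(query.lower().split())
        hits = []
        for mem in self._items:
            # Scope filter runs before ranking: every scope the caller pins must match exactly.
            if any(getattr(mem, key) != value for key, value in scope.items() if value is not None):
                continue
            score = len(terms & set(mem.text.lower().split()))
            if score:
                hits.append((score, mem.text))
        hits.sort(key=lambda hit: -hit[0])  # stable sort keeps insertion order on ties
        return [text for _, text in hits[:limit]]


store = ScopedStore()
store.add("likes green tea", user="alice", session="s1")
store.add("likes black coffee", user="bob")
store.add("green tea gives alice headaches", user="alice", session="s2")
print(store.search("green tea", user="alice", session="s2"))
```

This sketch filters before ranking rather than down-weighting out-of-scope hits. That is the conservative choice: a memory from another user can never leak in on similarity alone.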
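For the Delta-Mem entry, the quoted update rule S_{t+1} = λ·S_t − β·(S_t·k_t)⊗k_t + β·v_t⊗k_t is small enough to run by hand. Below is a pure-Python sketch of one attention head's state. It assumes unit-norm keys and fixed scalar λ and β. It does not model how Delta-Mem actually produces k_t, v_t, λ, and β per token, and it leaves out the TSW/SSW/MSW granularities.

```python
def matvec(S, x):
    """S @ x for a list-of-rows matrix."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]


def delta_write(S, k, v, lam, beta):
    """S' = lam*S - beta*(S k) outer k + beta*(v outer k), the update as quoted in the Delta-Mem entry."""
    Sk = matvec(S, k)  # what the state currently returns for key k
    return [
        [lam * S[i][j] - beta * Sk[i] * k[j] + beta * v[i] * k[j] for j in range(len(k))]
        for i in range(len(v))
    ]


def zeros(d_v, d_k):
    return [[0.0] * d_k for _ in range(d_v)]


# Single head, d_k = d_v = 2, no decay (lam = 1) and full write strength (beta = 1).
S = zeros(2, 2)
S = delta_write(S, k=[1.0, 0.0], v=[3.0, 5.0], lam=1.0, beta=1.0)
S = delta_write(S, k=[1.0, 0.0], v=[7.0, 1.0], lam=1.0, beta=1.0)  # same key: overwrites
S = delta_write(S, k=[0.0, 1.0], v=[2.0, 4.0], lam=1.0, beta=1.0)  # orthogonal key: no interference
print(matvec(S, [1.0, 0.0]), matvec(S, [0.0, 1.0]))
```

With a unit-norm key and β = 1, writing a new value under an existing key replaces the old association instead of adding to it. That subtraction term is what separates a delta rule from plain linear-attention accumulation (S + v⊗k).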
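For the Letta entry, the self-editing move can be sketched in a few lines: the agent calls core_memory_append / core_memory_replace, and the edited blocks are re-rendered into the system prompt. This is a hypothetical re-creation under stated assumptions. The `Block` and `CoreMemory` classes, the character limit, and the tag-style rendering are ours, not Letta's schema or prompt format. Only the two tool names come from the entry above.

```python
from dataclasses import dataclass


@dataclass
class Block:
    label: str
    value: str
    limit: int = 2000  # hypothetical per-block character budget


class CoreMemory:
    """Hypothetical re-creation of the self-editing pattern, not Letta's code."""

    def __init__(self, *blocks: Block) -> None:
        self.blocks = {block.label: block for block in blocks}

    def core_memory_append(self, label: str, content: str) -> None:
        block = self.blocks[label]
        self._set(block, f"{block.value}\n{content}" if block.value else content)

    def core_memory_replace(self, label: str, old: str, new: str) -> None:
        block = self.blocks[label]
        if old not in block.value:
            raise ValueError(f"{old!r} not found in block {label!r}")
        self._set(block, block.value.replace(old, new, 1))

    @staticmethod
    def _set(block: Block, value: str) -> None:
        # Core memory lives in the prompt, so every edit is checked against the budget.
        if len(value) > block.limit:
            raise ValueError(f"block {block.label!r} would exceed {block.limit} chars")
        block.value = value

    def render(self) -> str:
        """Render every block into the system-prompt section the model sees each turn."""
        return "\n".join(f"<{b.label}>\n{b.value}\n</{b.label}>" for b in self.blocks.values())


memory = CoreMemory(Block("persona", "I am a helpful assistant."), Block("human", "Name: Sam"))
memory.core_memory_append("human", "Likes hiking")
memory.core_memory_replace("human", "Name: Sam", "Name: Samantha")
print(memory.render())
```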
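For the Graphiti entry, carrying both valid_at / invalid_at and created_at / expired_at lets the store answer two different "when" questions. Below is a minimal sketch of the textbook bi-temporal pattern using those four field names. The `Edge` class and the supersede-by-copy policy are assumptions for illustration, not Graphiti's implementation, which may invalidate edges differently.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Edge:
    fact: str
    valid_at: datetime                     # world time: when the fact became true
    created_at: datetime                   # system time: when we recorded this version
    invalid_at: Optional[datetime] = None  # world time: when the fact stopped being true
    expired_at: Optional[datetime] = None  # system time: when we stopped asserting this version


def _covers(t: datetime, start: datetime, end: Optional[datetime]) -> bool:
    return start <= t and (end is None or t < end)


def facts_as_of(edges: List[Edge], world_time: datetime, system_time: datetime) -> List[str]:
    """What did the system believe at system_time about what was true at world_time?"""
    return [
        e.fact
        for e in edges
        if _covers(system_time, e.created_at, e.expired_at)
        and _covers(world_time, e.valid_at, e.invalid_at)
    ]


def supersede(edges: List[Edge], old: Edge, new_fact: str, changed_at: datetime, now: datetime) -> None:
    """Never delete: expire the open-ended version, record a closed copy, add the new fact."""
    old.expired_at = now
    edges.append(Edge(old.fact, old.valid_at, now, invalid_at=changed_at))
    edges.append(Edge(new_fact, changed_at, now))


def day(d: int) -> datetime:
    return datetime(2024, 1, d)


edges = [Edge("Alice works at Acme", valid_at=day(1), created_at=day(2))]
# On day 15 we learn that Alice switched jobs on day 10.
supersede(edges, edges[0], "Alice works at Initech", changed_at=day(10), now=day(15))
print(facts_as_of(edges, world_time=day(12), system_time=day(12)))
```

The last query is the one a single-timestamp store cannot answer: on day 12 the system still believed Acme, because the change was only recorded on day 15.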
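For the MemPalace entry, the distinction between a closet that signals and one that gates is easy to miss in prose. Below is a minimal sketch of one reading of it. It assumes each drawer already has normalized semantic and BM25-style lexical scores. The weights, the boost size, and the function name are hypothetical, not MemPalace's values.

```python
from typing import Dict, List, Set, Tuple


def hybrid_rank(
    candidates: Dict[str, Tuple[float, float]],
    closet_hits: Set[str],
    w_semantic: float = 0.6,
    w_lexical: float = 0.4,
    closet_boost: float = 0.1,
) -> List[str]:
    """Rank drawers by a weighted semantic + lexical score, nudged up on a closet match.

    candidates maps drawer id -> (semantic score, lexical score), both pre-normalized to [0, 1].
    closet_hits holds the drawer ids whose closet matched the query.
    """
    scored = []
    for drawer_id, (semantic, lexical) in candidates.items():
        score = w_semantic * semantic + w_lexical * lexical
        if drawer_id in closet_hits:
            score += closet_boost  # signal: changes the order
        # A gate would add `else: continue` here, dropping every drawer without a closet match.
        scored.append((score, drawer_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [drawer_id for _, drawer_id in scored]


candidates = {"a": (0.9, 0.8), "b": (0.85, 0.8), "c": (0.2, 0.1)}
print(hybrid_rank(candidates, closet_hits={"b"}))
```

Because closet matches only reorder results, a verbatim drawer whose closet missed the query can still be recalled.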
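For the Volt entry, below is a minimal sketch of a deterministic soft/hard threshold loop. It assumes word counts as token counts, a two-item condensation unit, and example thresholds of 50 and 80. None of these are Volt's actual values or code, and the summarizer is a stub. Condensing summary nodes together with raw messages loosely mirrors the recursive bottom-up condensation of the Upward mode.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Context:
    log: List[str] = field(default_factory=list)     # immutable store: every message, verbatim
    active: List[str] = field(default_factory=list)  # what the model sees: summaries + recent raw messages


def tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer


def summarize(items: List[str]) -> str:
    return f"[summary of {len(items)} items]"  # stand-in for a summarizer call


def used(ctx: Context) -> int:
    return sum(tokens(item) for item in ctx.active)


def condense_oldest(ctx: Context, n: int = 2) -> None:
    """Replace the n oldest active items (raw or summary) with one summary node."""
    oldest, ctx.active = ctx.active[:n], ctx.active[n:]
    ctx.active.insert(0, summarize(oldest))


def step(ctx: Context, message: str, soft: int = 50, hard: int = 80) -> str:
    """Deterministic control loop: thresholds, not an LLM, decide when to compact."""
    ctx.log.append(message)  # the log is never rewritten
    ctx.active.append(message)
    if used(ctx) > hard:
        # Over the hard limit: condense until under soft, or until only one node
        # plus the newest raw message remain (the newest message is never condensed).
        while used(ctx) > soft and len(ctx.active) > 2:
            condense_oldest(ctx)
        return "hard"
    if used(ctx) > soft:
        condense_oldest(ctx)  # over the soft limit: one incremental condensation
        return "soft"
    return "ok"


ctx = Context()
results = [step(ctx, "w " * n) for n in (20, 20, 20, 45, 40)]
print(results, used(ctx), len(ctx.log))
```

Because the thresholds alone drive compaction, replaying the same message sequence always yields the same summaries, and the full log survives for off-context retrieval.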
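For the Honcho entry, the two moves that matter are keying derived knowledge by the (observer, observed) pair and deriving it off the write path. Below is a minimal asyncio sketch under those assumptions. `PeerStore`, its methods, and the string "conclusion" are hypothetical stand-ins, not Honcho's API or its deriver's output.

```python
import asyncio
from collections import defaultdict


class PeerStore:
    """Hypothetical peer-centric store: append-only messages plus per-(observer, observed) collections."""

    def __init__(self) -> None:
        self.messages = []                    # append-only message log
        self.collections = defaultdict(list)  # (observer, observed) -> derived conclusions
        self.queue = asyncio.Queue()          # work handed to the background deriver

    async def add_message(self, session: str, sender: str, text: str, audience: list) -> None:
        self.messages.append((session, sender, text))
        for observer in audience:
            if observer != sender:
                await self.queue.put((observer, sender, text))
        # Returns as soon as the work is queued; the agent loop never waits on derivation.

    async def deriver(self) -> None:
        while True:
            observer, observed, text = await self.queue.get()
            # Stand-in for an LLM call that turns a message into a conclusion about the sender.
            self.collections[(observer, observed)].append(f"{observed} said: {text}")
            self.queue.task_done()


async def demo() -> PeerStore:
    store = PeerStore()  # created inside the running loop so the queue is tied to that loop
    worker = asyncio.create_task(store.deriver())
    await store.add_message("s1", "alice", "I moved to Lisbon", audience=["alice", "tutor"])
    await store.queue.join()  # wait for the deriver only so the demo can inspect results
    worker.cancel()
    return store


store = asyncio.run(demo())
print(dict(store.collections))
```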
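For the OpenViking entry, "match at the coarsest tier and drill down on demand" can be sketched as a beam search over a tree. This minimal sketch assumes each node carries L0/L1/L2 text, and it uses word overlap on L0 as a stand-in for vector search. The `Node` type, the `beam` parameter, and the viking:// paths below the three documented roots are hypothetical, not OpenViking's API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    uri: str
    l0: str         # one-sentence abstract, cheap to match against
    l1: str = ""    # overview (unused in this sketch)
    l2: str = ""    # full data, loaded only at the leaf
    children: List["Node"] = field(default_factory=list)


def overlap(query: str, text: str) -> int:
    """Word overlap as a stand-in for vector similarity."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def recursive_retrieve(node: Node, query: str, beam: int = 1) -> List[Tuple[str, str]]:
    """Score children on L0, descend into the best `beam` of them, return L2 at the leaves."""
    if not node.children:
        return [(node.uri, node.l2)]
    ranked = sorted(node.children, key=lambda child: overlap(query, child.l0), reverse=True)
    results: List[Tuple[str, str]] = []
    for child in ranked[:beam]:
        results.extend(recursive_retrieve(child, query, beam))
    return results


root = Node("viking://", "root", children=[
    Node("viking://user/", "user preferences and profile", children=[
        Node("viking://user/prefs/editor", "editor preferences and keybindings",
             l2="Uses vim keybindings in every editor."),
    ]),
    Node("viking://resources/", "project docs and repos", children=[
        Node("viking://resources/readme", "project readme with install steps",
             l2="Run make install."),
    ]),
])
print(recursive_retrieve(root, "how do I install the project"))
```

Only the short L0 abstracts are compared on the way down. The full L2 payload is touched once, at the leaf that wins.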
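For the RetainDB entry, two pieces do the work: rank fusion and an explicit validity window. Below is a minimal sketch of both. It assumes three ranked candidate lists (lexical, vector, graph) and per-fact valid-from / valid-until dates. The constant k = 60 is the one commonly used with reciprocal rank fusion, not a value taken from RetainDB. The reranking stage is omitted.

```python
from datetime import date
from typing import Dict, List, Optional, Sequence, Tuple


def rrf(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) for every id it ranks."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, fact_id in enumerate(ranking, start=1):
            scores[fact_id] = scores.get(fact_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda fact_id: (-scores[fact_id], fact_id))


def currently_valid(
    ranked: List[str],
    validity: Dict[str, Tuple[date, Optional[date]]],
    as_of: date,
) -> List[str]:
    """Keep facts whose [valid_from, valid_until) window contains as_of; None means still valid."""
    kept = []
    for fact_id in ranked:
        valid_from, valid_until = validity[fact_id]
        if valid_from <= as_of and (valid_until is None or as_of < valid_until):
            kept.append(fact_id)
    return kept


lexical = ["f1", "f2", "f3"]
vector = ["f2", "f3", "f1"]
graph = ["f2", "f1"]
validity = {
    "f1": (date(2025, 1, 1), date(2025, 3, 1)),  # superseded on 2025-03-01
    "f2": (date(2025, 3, 1), None),              # the fact that replaced it
    "f3": (date(2024, 6, 1), None),
}
fused = rrf([lexical, vector, graph])
print(fused, currently_valid(fused, validity, date(2025, 6, 1)))
```

Here the validity filter runs after fusion. A superseded fact can still rank well, yet it is never returned for a date after its replacement. That is one way to make supersession a queryable operation rather than a ranking tweak.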
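For the Beads entry, the hash-based ID scheme is what removes the shared auto-increment counter. Below is a minimal Python sketch of the idea. It assumes the five fields are joined with a separator, hashed with SHA-256, and reduced to a fixed number of base36 digits. The separator and the digit count are assumptions, as is how the real code in internal/idgen/hash.go truncates or re-rolls on collision. This is not a port of that Go file.

```python
import hashlib

BASE36 = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, 36)
        digits.append(BASE36[remainder])
    return "".join(reversed(digits))


def issue_id(title: str, description: str, creator: str, timestamp: str, nonce: int,
             prefix: str = "bd", length: int = 4) -> str:
    """Content-derived ID: any agent on any branch computes it locally, no shared counter."""
    # The unit-separator byte keeps ("ab", "c") and ("a", "bc") from hashing identically.
    payload = "\x1f".join([title, description, creator, timestamp, str(nonce)]).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    n = int.from_bytes(digest, "big") % 36 ** length  # keep `length` base36 digits
    return f"{prefix}-{to_base36(n).rjust(length, '0')}"


first = issue_id("Fix login redirect", "Redirect loops after OAuth", "agent-1", "2025-01-01T00:00:00Z", 0)
again = issue_id("Fix login redirect", "Redirect loops after OAuth", "agent-1", "2025-01-01T00:00:00Z", 0)
print(first, first == again)
```

The ID depends only on the issue's content and the nonce, so two agents on different branches never race for the same counter value. A short suffix can still collide. Changing the nonce or lengthening the suffix is presumably how such a case would be resolved.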

Candidates to add (verified to exist on GitHub)

These are not yet pinned as submodules, except Graphiti and Letta, which already appear under "In the study now". When the study expands, run git submodule add <url> <slug> from the study root to add any of them.

Memory-specialized

  • Zep / Graphiti — temporal knowledge graphs for session memory. Integrates with LangChain / LangGraph. The interesting code lives in Graphiti, not in the Zep examples repo.

  • Letta — open-source server with self-editing memory, descended from MemGPT. Stateful agents that persist user preferences and survive conversation resets.

  • LangMem — LangChain's memory utilities; summarization for context limits. Smaller surface than the others but tightly integrated with the LangChain stack.

  • Memary — graph-centric memory layer for autonomous agents. Worth comparing schema choices against Mem0 and Graphiti.

  • Cognee — pipelines for RAG-style memory; positions itself as "memory for your AI agents in 6 lines of code." Focus on ingestion ergonomics.

Broader frameworks (not memory-first, but ship memory primitives)

These are large enough that they will dilute the study if added wholesale. Worth referencing only if a specific subdir is what we want to study.


Reading order suggestion

  1. Start with Mem0's README and docs/ for the topology overview, then walk the source for the write/update path (look for "add", "update", "consolidate" verbs in the public API).
  2. Compare to Graphiti's temporal-graph approach — same problem, different storage primitive.
  3. Compare to Letta's self-editing memory — same problem, different control primitive (the agent edits memory, not the framework).

By the time those three are read in sequence, the design space should be mostly visible.

About

A study of production memory systems for AI agents — storage topology, write policy, scopes, schema, and recall surface.
