The agent-memory space grew up fast in 2025–2026. This page is an honest side-by-side with the money, features, and real numbers — so you can pick the right tool for your job.
TL;DR. If you build a chatbot or a second-brain note app, pick mem0 or Supermemory. If you need a stateful agent runtime, pick Letta. If you want enterprise temporal KG, pick Zep. If you run a coding agent (Claude Code, Codex CLI, Cursor) and you want a local, MCP-native brain that learns how you work — this project is for you.
| Project | Funding (2026) | GitHub | Positioning |
|---|---|---|---|
| mem0 | $24M Series A (YC, AWS exclusive memory for Agent SDK) | ~48k⭐ | Universal memory SDK for LLM apps |
| Letta (ex-MemGPT) | $10M seed | ~17k⭐ | Stateful agent platform (LLM-as-OS, memory blocks) |
| Zep / Graphiti | $12M seed | ~3.5k⭐ (Zep) + ~6k⭐ (Graphiti) | Enterprise temporal KG memory |
| Supermemory | $2.6M seed | ~15k⭐ | Consumer second-brain + SaaS Memory API |
| Cognee | $7.5M seed (OpenAI + FAIR founder checks) | ~7k⭐ | Open-source AI memory engine, GraphRAG |
| LangMem | part of LangChain | — | Long-term memory primitives for LangGraph |
| Claude Memory (Anthropic) | built-in, GA March 2026 | — | In-product memory for Claude consumer tiers |
| ChatGPT Memory (OpenAI) | built-in, since 2024 | — | In-product memory for ChatGPT consumer |
| total-agent-memory | self-funded | this repo | Local-first memory for coding agents via MCP |
Legend: ✅ native · 🟡 partial/addon · ❌ not available · 🔒 gated behind paid plan
| Feature | mem0 | Letta | Zep | Supermemory | Cognee | LangMem | Claude Memory | total-agent-memory |
|---|---|---|---|---|---|---|---|---|
| Open-source core | ✅ | ✅ | 🟡 | 🟡 | ✅ | ✅ | ❌ | ✅ |
| Runs 100% local (no network required) | 🟡 | ✅ | 🟡 | ❌ | 🟡 | 🟡 | ❌ | ✅ |
| MCP server native (Claude Code, Codex, Cursor) | via SDK | ❌ | 🟡 (Graphiti MCP) | 🟡 | ❌ | ❌ | — | ✅ (46 tools) |
| Semantic search (vectors) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hybrid search (BM25 + dense + graph + RRF) | 🟡 | ❌ | ✅ | ✅ | ✅ | ❌ | — | ✅ 6-tier |
| Knowledge graph | 🔒 Pro $249/mo | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
Temporal facts (valid_from/valid_to, point-in-time queries) |
❌ | ❌ | ✅ bi-temporal | ❌ | 🟡 | ❌ | ❌ | ✅ |
| Procedural memory (learns workflows, predicts next task) | 🟡 (API only) | ❌ | ❌ | ❌ | ❌ | 🟡 | ❌ | ✅ workflow_predict |
| Self-improving rules (errors → patterns → auto-rules) | ❌ | ❌ | ❌ | ❌ | 🟡 memify | ❌ | ❌ | ✅ |
| Cross-project analogy (found-similar-in-another-repo) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ analogize |
| AST codebase ingest (tree-sitter, 9 langs) | ❌ | ❌ | ❌ | ❌ | 🟡 docs | ❌ | 🟡 | ✅ ingest_codebase |
| Pre-edit risk warnings (read file hot-spots before Edit) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ file_context |
| Post-bash-error learning | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ learn_error |
| Multi-agent A2A protocol | ❌ | 🟡 | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 3D WebGL live graph dashboard | ❌ | ❌ | 🟡 | ✅ | ❌ | ❌ | ❌ | ✅ |
| Privacy: your data never leaves your machine | 🟡 | ✅ | 🟡 | ❌ | 🟡 | 🟡 | ❌ | ✅ |
| Project | Free tier | Paid | What's gated |
|---|---|---|---|
| mem0 | Vector search only | $19/mo Standard, $249/mo Pro | Graph, multi-hop queries, structured knowledge |
| Letta | OSS full-feature | Managed cloud (custom) | Hosting, SSO |
| Zep | OSS Community | Zep Cloud (from ~$99/mo seat) | Temporal KG, managed Neo4j, support |
| Supermemory | 1M tokens/mo, 10k queries | $0.01 / 1k tokens usage-based | Tokens beyond free tier |
| Cognee | OSS full-feature | Managed Cloud (beta) | Hosting |
| LangMem | OSS | LangSmith / LangGraph Platform | Hosting, observability |
| Claude Memory | free tier included | Pro / Max plan quotas | Larger memory budget |
| ChatGPT Memory | free with ChatGPT | Plus / Team / Enterprise | Larger memory budget |
| total-agent-memory | full, forever | — | — (BYO Ollama for LLM enrichment) |
| Project | p50 | p95 | Source |
|---|---|---|---|
| Supermemory | ~200 ms | <300 ms | supermemory blog |
| Zep | ~100 ms | — | Zep paper (arxiv 2501.13956) |
| LangMem | — | 59.82 s | third-party (atlan.com) |
| total-agent-memory (cold) | 1332 ms | 2314 ms | evals/results-2026-04-17.json |
| total-agent-memory (warm) | 0.065 ms | 2.97 ms | evals/results-2026-04-17.json |
Cold = first invocation after process start (model load + index warmup). Warm = steady-state with HNSW in memory, FTS5 cached, LRU query cache hit.
Run it yourself on your data:
python -m claude_total_memory.cli benchmark
# or via MCP: benchmark()| Project | Benchmark | Score | Avg latency |
|---|---|---|---|
| Mastra "Observational Memory" | LongMemEval | 95.0% | cloud |
total-agent-memory (v7.0, full mode) |
LongMemEval (470 q, public) | R@5 96.2% / R_all 84.5% / NDCG 82.4% | 38.8 ms |
| Supermemory | LongMemEval | 85.4% overall | ~200 ms |
| Zep | DMR (different benchmark) | beats MemGPT SOTA | ~100 ms |
| mem0 | their own eval | >90% on internal tasks | cloud |
| LangMem | — | — | p95 59.82 s |
Reproducible: evals/longmemeval-2026-04-17.json.
Dataset: xiaowu0162/longmemeval-cleaned, 470 questions (abstention type excluded by the official bench).
| Question type | Count | R@5 |
|---|---|---|
| single-session-user | 64 | 100.0% |
| knowledge-update | 72 | 100.0% |
| multi-session | 121 | 96.7% |
| single-session-assistant | 56 | 96.4% |
| temporal-reasoning | 127 | 95.3% |
| single-session-preference | 30 | 80.0% |
Weakest spot is preference tracking (80%) — a known area where richer entity-level profiles help.
Temporal reasoning at 95.3% confirms the bi-temporal KG (kg_at) is pulling its weight.
+10.8 pp over Supermemory's published 85.4% on the same dataset.
How to read this.
recall_any@5= at least one required evidence fragment in top-5.recall_all@5= every required fragment in top-5. Supermemory's 85.4% is their headline number against the full LongMemEval corpus. Our 96.2% is the comparable headline (any match); 84.5% is the strict "all fragments" version, still at parity with their overall.
These are the actual moats. They're the reason a coding agent on your machine plays in a different league than a chatbot memory SaaS.
You start a non-trivial task. The memory layer predicts the approach based on how similar tasks
played out in your past — including which files got touched in what order, which tests failed, which
fix worked. If confidence < 0.3, the agent asks you before diving in. When you finish, you call
workflow_track(outcome) and the predictor learns from success and failure.
Nobody else does this. mem0 stores what you said; this stores how you work.
"Was there something like this in another repo?" One call, Jaccard similarity with Dempster-Shafer fusion across every project you've ever worked on. Instant transfer of hard-won lessons from project A to project B, without you remembering project A existed.
When a bash command fails or an edit backfires, learn_error(file, error, root_cause, fix, pattern)
records it. After N≥3 same patterns across projects it auto-consolidates into a behavioral rule,
surfaced to the agent at next session start via self_rules_context. Your memory stops making
the same mistakes. Literally nobody else ships this out of the box.
- mem0 — You're building a chatbot for customers and want a battle-tested SDK with enterprise support. AWS Agent SDK integration is a huge plus if you live in that ecosystem.
- Letta — You want to build a stateful agent from scratch and you're ok buying into their runtime. Their memory blocks + Sonnet 4.5 self-editing loop is genuinely cool.
- Zep / Graphiti — You're shipping an enterprise assistant where temporal correctness of facts matters (healthcare, finance, compliance). The bi-temporal model is the best in the field.
- Supermemory — You want a personal second brain with a beautiful UI across devices. B2C wins.
- Cognee — You want a research-grade open-source GraphRAG stack and you're willing to integrate it yourself with your agent.
- LangMem — You already use LangGraph and want long-term memory primitives that slot in naturally.
- Claude Memory / ChatGPT Memory — You're a consumer user and just want the chat to remember you.
- total-agent-memory — You are a developer using Claude Code / Codex CLI / Cursor / any MCP agent, and you want: (a) 100% local/private, (b) MCP-native, (c) persistent across sessions and projects, (d) memory that actually learns your workflow, not just stores your conversation.
Last updated: 2026-04-17. Benchmarks reproducible via evals/results-2026-04-17.json and
python -m claude_total_memory.cli benchmark. PRs with corrections welcome — if we got your
project wrong, open an issue.