|
7 | 7 | <p align="center"> |
8 | 8 | <a href="#quick-start">Quick Start</a> • |
9 | 9 | <a href="#features">Features</a> • |
| 10 | + <a href="#benchmarks">Benchmarks</a> • |
10 | 11 | <a href="#api-reference">API</a> • |
11 | 12 | <a href="#adapters">Adapters</a> • |
12 | | - <a href="#configuration">Config</a> • |
13 | | - <a href="#roadmap">Roadmap</a> |
| 13 | + <a href="#configuration">Config</a> |
14 | 14 | </p> |
15 | 15 | <p align="center"> |
| 16 | + <a href="https://github.com/ZenSystemAI/multi-agent-memory/actions/workflows/ci.yml"><img alt="CI" src="https://github.com/ZenSystemAI/multi-agent-memory/actions/workflows/ci.yml/badge.svg" /></a> |
16 | 17 | <a href="https://www.npmjs.com/package/@zensystemai/multi-agent-memory-mcp"><img alt="npm" src="https://img.shields.io/npm/v/@zensystemai/multi-agent-memory-mcp.svg" /></a> |
17 | 18 | <img alt="License: MIT" src="https://img.shields.io/badge/license-MIT-blue.svg" /> |
18 | 19 | <img alt="Node 20+" src="https://img.shields.io/badge/node-20%2B-green.svg" /> |
19 | 20 | <img alt="Docker" src="https://img.shields.io/badge/docker-ready-blue.svg" /> |
20 | 21 | <img alt="MCP" src="https://img.shields.io/badge/MCP-compatible-purple.svg" /> |
| 22 | + <a href="https://github.com/ZenSystemAI/multi-agent-memory/stargazers"><img alt="GitHub stars" src="https://img.shields.io/github/stars/ZenSystemAI/multi-agent-memory?style=social" /></a> |
21 | 23 | </p> |
22 | 24 | <p align="center"> |
23 | 25 | <img src=".github/hero.jpg" alt="Multi-Agent Memory — shared brain for AI agents" width="700" /> |
|
30 | 32 |
|
31 | 33 | Born from a production setup where [OpenClaw](https://github.com/openclaw/openclaw) agents, Claude Code, and n8n workflows needed to share memory across separate machines. Nothing existed that did this well, so we built it. |
32 | 34 |
|
33 | | -### Latest: v2.4 |
| 35 | +### Latest: v2.5 |
34 | 36 |
|
35 | | -- **`brain_reflect`** — On-demand LLM synthesis. Ask "what do we know about X?" and get patterns, timeline, contradictions, and knowledge gaps across your stored memories. |
36 | | -- **`brain_update`** — Amend existing memories in-place without full supersede. Content changes re-embed, re-extract entities, and re-index automatically. |
37 | | -- **Temporal Validity** — Facts and statuses now support `valid_from`/`valid_to` timestamps. Query "what was true at time X?" via the new `at_time` parameter on `brain_search`. |
38 | | -- **Pagination fixes** — Consolidation and briefings now process all memories, not just the first page. |
| 37 | +- **Web dashboard** — Browse, search, and manage memories visually from any browser. |
| 38 | +- **Python SDK** — `pip install multi-agent-memory` for native Python integration. |
| 39 | +- **SSE subscriptions** — Real-time event streaming for agents to subscribe to memory updates. |
| 40 | +- **Multi-collection support** — Isolated memory spaces per project or team. |
| 41 | +- **Retrieval fixes** — v2.5 scores **98.4% retrieval accuracy** on the [LongMemEval benchmark](https://github.com/xiaowu0162/LongMemEval). |
39 | 42 | - **114 tests passing** across RRF, entity extraction, validation, scrubbing, notifications, and client resolver. |
40 | 43 |
|
41 | | -See [CHANGELOG.md](CHANGELOG.md) for the full release history including v2.3 (multi-path RRF search), v2.2 (noise-free entity extraction, per-client knowledge base), and earlier versions. |
| 44 | +See [CHANGELOG.md](CHANGELOG.md) for the full release history including v2.4 (reflection, temporal validity, `brain_update`), v2.3 (multi-path RRF search), v2.2 (noise-free entity extraction), and earlier versions. |
42 | 45 |
|
43 | 46 | <p align="center"> |
44 | 47 | <img src=".github/shared memory.jpg" alt="Shared Memory Architecture" width="340" /> |
@@ -195,6 +198,44 @@ This means you get both "find memories similar to X" *and* "give me all facts wi |
195 | 198 | | Self-hostable (fully open) | **Yes** | Community ed. | **Yes** | Graphiti only | **Yes** | |
196 | 199 | | License | MIT | Apache 2.0 | Apache 2.0 | Open core | MIT | |
197 | 200 |
|
| 201 | +## Benchmarks |
| 202 | + |
| 203 | +### LongMemEval |
| 204 | + |
| 205 | +[LongMemEval](https://github.com/xiaowu0162/LongMemEval) is an academic benchmark for evaluating long-term memory in conversational AI systems. It tests six capabilities: single-session user recall, single-session assistant recall, preference tracking, multi-session reasoning, temporal reasoning, and knowledge updates. |
| 206 | + |
| 207 | +**v2.5 QA Scores** (answer accuracy, evaluated by LLM judge): |
| 208 | + |
| 209 | +| Task | GPT-4o-mini | GPT-4o | Change (pts) |
| 210 | +|------|:-----------:|:------:|:------:| |
| 211 | +| Single-session (user) | 92.9% | **94.3%** | +1.4 | |
| 212 | +| Single-session (assistant) | 92.9% | **92.9%** | — | |
| 213 | +| Knowledge update | 78.2% | **82.1%** | +3.9 | |
| 214 | +| Temporal reasoning | 49.6% | **70.7%** | +21.1 | |
| 215 | +| Multi-session | 54.9% | **64.7%** | +9.8 | |
| 216 | +| Preference | 50.0% | **60.0%** | +10.0 | |
| 217 | +| **Overall** | 66.4% | **76.0%** | **+9.6** | |
| 218 | + |
| 219 | +**How this compares:** |
| 220 | + |
| 221 | +| System | QA Score | Approach | |
| 222 | +|--------|:--------:|----------| |
| 223 | +| [Hindsight](https://github.com/cyanheads/hindsight-core) | 91.4% | Conversation replay + re-ranker + 4-path search | |
| 224 | +| **Multi-Agent Memory** | **76.0%** | **Cosine similarity only — see note below** | |
| 225 | +| Full-context GPT-4o | 72.4% | Brute-force: entire conversation history in prompt | |
| 226 | +| RAG baseline | ~50% | Single-path vector search | |
| 227 | + |
| 228 | +> **Benchmark methodology:** The LongMemEval benchmark runner (`query-direct.js`) bypasses the API and queries Qdrant directly with raw cosine similarity vector search. None of the v2.5 API features were used: |
| 229 | +> |
| 230 | +> - Multi-path search (vector + BM25 keyword + entity graph RRF fusion) — **not used** |
| 231 | +> - Temporal date filtering / proximity boost — **not used** |
| 232 | +> - Query expansion — **not used** |
| 233 | +> - Session diversity re-ranking — **not used** |
| 234 | +> |
| 235 | +> The 76.0% QA score therefore reflects embedding quality and memory model design alone. Separately, the full API retrieval pipeline reaches 98.4% retrieval accuracy, which is a different metric from QA accuracy. Further QA improvements are expected once the benchmark runner is updated to use the API's multi-path search. |
| 236 | +
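The difference between the two retrieval modes named in the methodology note can be illustrated with a small sketch. This is not the project's implementation: `cosineSimilarity` and `rrfFuse` are hypothetical helper names, the memory IDs are made up, and `k = 60` is the constant commonly used for reciprocal rank fusion, assumed here rather than taken from the codebase.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// This is the only signal the benchmark runner is described as using.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Reciprocal rank fusion (RRF): each ranked list contributes
// 1 / (k + rank) to a document's score, with rank starting at 1.
// `rankedLists` is an array of arrays of memory IDs, best match first.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((x, y) => y[1] - x[1])
    .map(([id, score]) => ({ id, score }));
}

// Hypothetical result lists from three retrieval paths.
const vectorHits = ['m1', 'm2', 'm3'];
const keywordHits = ['m3', 'm1', 'm4'];
const entityHits = ['m3', 'm5'];

const fused = rrfFuse([vectorHits, keywordHits, entityHits]);
console.log(fused.map((hit) => hit.id));
```

In this toy input, `m3` is only third in the vector list but appears in all three lists, so fusion moves it to the top. A cosine-only ranking, as used for the 76.0% figure, would leave it in third place.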
|
| 237 | +> **Note**: LongMemEval was designed for single-agent chat memory. Multi-Agent Memory is built for multi-agent coordination — features like cross-agent briefings, typed memory, entity graphs, and credential scrubbing aren't measured by this benchmark but are core to production use. |
| 238 | +
|
198 | 239 | ## Architecture |
199 | 240 |
|
200 | 241 | ``` |
@@ -859,21 +900,27 @@ multi-agent-memory/ |
859 | 900 | ## Roadmap |
860 | 901 |
|
861 | 902 | **Shipped:** |
862 | | -- ~~Entity relationships + graph~~ -- v2.0 |
863 | | -- ~~Import/Export~~ -- v2.0 |
864 | | -- ~~Webhook notifications~~ -- v2.0 |
865 | | -- ~~Client knowledge base~~ -- v2.0 |
866 | | -- ~~Noise-free entity extraction~~ -- v2.2 |
867 | | -- ~~Garbage entity cleanup tooling~~ -- v2.2 |
868 | | -- ~~Multi-path retrieval with RRF fusion~~ -- v2.3 |
| 903 | +- ~~Entity relationships + graph~~ — v2.0 |
| 904 | +- ~~Import/Export~~ — v2.0 |
| 905 | +- ~~Webhook notifications~~ — v2.0 |
| 906 | +- ~~Client knowledge base~~ — v2.0 |
| 907 | +- ~~Noise-free entity extraction~~ — v2.2 |
| 908 | +- ~~Garbage entity cleanup tooling~~ — v2.2 |
| 909 | +- ~~Multi-path retrieval with RRF fusion~~ — v2.3 |
| 910 | +- ~~On-demand LLM reflection~~ — v2.4 |
| 911 | +- ~~Temporal validity (valid_from/valid_to)~~ — v2.4 |
| 912 | +- ~~In-place memory updates~~ — v2.4 |
| 913 | +- ~~Web dashboard~~ — v2.5 |
| 914 | +- ~~Python SDK~~ — v2.5 |
| 915 | +- ~~SSE subscriptions~~ — v2.5 |
| 916 | +- ~~Multi-collection support~~ — v2.5 |
| 917 | +- ~~Entity type reclassification~~ — v2.5 |
869 | 918 |
|
870 | 919 | **Coming next:** |
871 | | -- **Web dashboard** — Browse, search, and manage memories visually |
872 | | -- **Python SDK** — `pip install multi-agent-memory` |
873 | 920 | - **Automatic memory capture** — System learns what's worth remembering vs what's noise |
874 | | -- **Multi-collection support** — Isolated memory spaces per project or team |
875 | | -- **SSE/WebSocket subscriptions** — Real-time streaming for agents to subscribe to memory updates |
876 | | -- **Entity type reclassification** — Batch fix mistyped entities from early extraction |
| 921 | +- **TypeScript SDK** — `npm install multi-agent-memory` client library |
| 922 | +- **Hosted documentation site** — Searchable, versioned docs |
| 923 | +- **LangChain / LlamaIndex integration** — First-class adapter for popular LLM frameworks |
877 | 924 |
|
878 | 925 | ## Contributing |
879 | 926 |
|
|