Commit 008f6fd

Grivn and claude committed

Add RLM + MAGMA theoretical foundations to DESIGN docs

Position Mnemon at the intersection of two research directions:

- RLM paradigm (LLM as orchestrator of external environments)
- MAGMA methodology (four-graph memory architecture)
- Mnemon's own engineering bridge (CLI protocol, hook lifecycle, zero-dependency distribution)

Changes across README (EN/ZH) and DESIGN (EN/ZH):

- DESIGN intro: lead with LLM-Supervised philosophy, not paper names
- DESIGN §2.4: new Theoretical Foundations section with 5-layer table
- DESIGN §5: connect MAGMA to RLM paradigm context
- DESIGN §12: reference RLM for two-stage validation pattern
- README: add References section (bottom, before License)
- Flatten ToC to top-level entries only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 4aa854f · commit 008f6fd

4 files changed: 90 additions, 12 deletions

README.md — 8 additions, 1 deletion

@@ -203,11 +203,18 @@ make help # show all targets
 
 ## Documentation
 
-- [Design & Architecture](docs/DESIGN.md) — philosophy, MAGMA four-graph model, algorithms, integration design
+- [Design & Architecture](docs/DESIGN.md) — philosophy, algorithms, integration design
 - [Usage & Reference](docs/USAGE.md) — CLI commands, embedding support, architecture overview
 - [Architecture Diagrams](docs/diagrams/) — system architecture, pipelines, lifecycle management
 - [中文文档](docs/zh/) — Chinese documentation
 
+## References
+
+Mnemon combines the paradigm of one paper with the methodology of another. See [Theoretical Foundations](docs/DESIGN.md#24-theoretical-foundations) for details.
+
+- **RLM** — Zhang, Kraska & Khattab. [Recursive Language Models](https://arxiv.org/abs/2512.24601). 2025. Establishes the paradigm: LLMs are more effective as orchestrators of external environments than as direct data processors.
+- **MAGMA** — Zou et al. [A Multi-Graph based Agentic Memory Architecture](https://arxiv.org/abs/2601.03236). 2025. Provides the methodology: four-graph model (temporal, entity, causal, semantic) with intent-adaptive retrieval.
+
 ## License
 
 [MIT](LICENSE)

docs/DESIGN.md — 37 additions, 5 deletions
@@ -4,7 +4,7 @@
 >
 > The word shares its root with Mnemosyne (Μνημοσύνη), the goddess of memory — from her union with Zeus the nine Muses were born, symbolizing memory as the wellspring of all knowledge and creativity.
 
-Mnemon is a persistent memory system designed for LLM agents. It implements the four-graph architecture from the [MAGMA](https://arxiv.org/abs/2601.03236) (Multi-Graph Agentic Memory Architecture) paper as a single Go binary + SQLite, with no external API dependencies.
+Mnemon is a persistent memory system designed for LLM agents. It adopts the **LLM-Supervised** pattern: the host LLM acts as external orchestrator of a standalone memory binary through symbolic CLI interfaces, while the binary handles deterministic storage, graph indexing, and lifecycle management. Memory is organized as a four-graph knowledge structure with temporal, entity, causal, and semantic edges. Implemented as a single Go binary + SQLite, with no external API dependencies.
 
 This document describes Mnemon's design philosophy, core concepts, system architecture, and key algorithms in detail.
 
@@ -16,8 +16,6 @@ This document describes Mnemon's design philosophy, core concepts, system archit
 - [2. Design Philosophy](#2-design-philosophy)
 - [3. Core Concepts](#3-core-concepts)
 - [4. System Architecture](#4-system-architecture)
-- [4.1 Data Directory Layout](#41-data-directory-layout)
-- [4.2 Store Isolation](#42-store-isolation)
 - [5. MAGMA Four-Graph Model](#5-magma-four-graph-model)
 - [6. Write Pipeline: Remember](#6-write-pipeline-remember)
 - [7. Read Pipeline: Smart Recall](#7-read-pipeline-smart-recall)
@@ -126,6 +124,40 @@ Binary encapsulates all logic that does not require an LLM; Skill only teaches t
 - **The memory layer is the only part worth deep investment** — memory has a compound interest effect; it is the dividing line between an agent as a "tool" versus an "assistant"
 - **The LLM itself is the best orchestrator** — no need for Python DAG orchestration of call chains; the LLM reads the Skill and knows what to do
 
+### 2.4 Theoretical Foundations
+
+Mnemon's design draws on the **paradigm** of one paper and the **methodology** of another, while making its own engineering choices for the bridge between them.
+
+**RLM Paradigm: LLM as Orchestrator**
+
+The [Recursive Language Models](https://arxiv.org/abs/2512.24601) paper (Zhang, Kraska & Khattab, MIT 2025) establishes the paradigm that LLMs are more effective as orchestrators of external structured environments than as direct data processors. The paper's key findings at the paradigm level:
+
+- An 8B model handles inputs **100x beyond its context window** by treating data as external environment variables
+- **Two-stage pipelines** (fast filtering + LLM semantic verification) consistently outperform single-pass approaches
+- Passing **constant-size metadata** — not raw data — to the model is more effective
+
+The RLM paper's own implementation uses **code generation + Python REPL** as the interaction mechanism: the LLM writes Python code, a sandbox executes it, and results feed back. Mnemon shares the paradigm but takes a different path at the protocol level (see below).
+
+**MAGMA Methodology: Four-Graph Memory Architecture**
+
+The [MAGMA](https://arxiv.org/abs/2601.03236) paper provides the concrete methodology for **what the external environment should contain**. Its key contribution: a single edge type (e.g., vector similarity) is insufficient for memory — different query intents require different relational perspectives. MAGMA's four-graph architecture (temporal, entity, causal, semantic) with intent-adaptive retrieval and multi-signal fusion gives Mnemon its data model and retrieval algorithms.
+
+**Mnemon's Own Contribution: The Engineering Bridge**
+
+Neither paper addresses how to connect an LLM orchestrator to a graph-structured memory in production. Mnemon fills this gap:
+
+| Layer | Source | Choice |
+|---|---|---|
+| **Paradigm** — who orchestrates? | RLM | The host LLM, not an embedded model |
+| **Methodology** — what's in the environment? | MAGMA | Four-graph with intent-adaptive retrieval |
+| **Protocol** — how do they talk? | Mnemon | CLI commands + structured JSON (not code generation) |
+| **Lifecycle** — how does memory evolve? | Mnemon | Hook-driven remember → diff → link → gc |
+| **Distribution** — how to ship it? | Mnemon | Single Go binary, zero dependencies |
+
+Where the RLM implementation relies on code generation in a sandboxed REPL (flexible but requires a runtime and raises safety concerns), Mnemon uses deterministic CLI commands as the symbolic interface — constrained, but auditable, portable, and zero-sandbox. Where MAGMA's reference implementation is a Python library with in-memory NetworkX graphs, Mnemon persists everything in SQLite with a complete write-back lifecycle.
+
+The result is: **RLM's paradigm + MAGMA's methodology + a CLI-native engineering path** that runs on any LLM CLI without Python, without sandboxes, without API keys.
+
 ![LLM-Supervised Architecture](diagrams/05-llm-supervised.jpg)
 
 ![System Architecture](diagrams/01-system-architecture.jpg)
@@ -359,7 +391,7 @@ This layered design serves different scenarios:
 
 ## 5. MAGMA Four-Graph Model
 
-The core idea of the MAGMA paper is: **a single edge type (such as pure vector similarity) is insufficient to capture the multidimensional relationships between memories.** Different query intents require different relational perspectives — asking "why" requires causal chains, asking "when" requires timelines, asking "about X" requires entity associations.
+Within the [RLM paradigm](#24-theoretical-foundations), MAGMA provides the specific data structure for the external environment that the LLM orchestrates. The core idea of the MAGMA paper is: **a single edge type (such as pure vector similarity) is insufficient to capture the multidimensional relationships between memories.** Different query intents require different relational perspectives — asking "why" requires causal chains, asking "when" requires timelines, asking "about X" requires entity associations.
 
 Mnemon implements four graphs, each capturing one dimension of relationships:
 

@@ -1022,4 +1054,4 @@ For CLIs without hook support, merge the recall/remember guidance into the corre
 | Embeddings | FAISS + OpenAI | Ollama (local, optional) |
 | Deployment | Python library | Single Go binary |
 
-Mnemon retains MAGMA's **architectural skeleton** (four-graph separation, intent-adaptive retrieval, multi-signal fusion) while replacing academic implementation details with production-ready simplifications. The core trade-off is: **use regex/heuristics to handle 80% of automation scenarios, and delegate the 20% requiring deep understanding to the host LLM.**
+Mnemon retains MAGMA's **architectural skeleton** (four-graph separation, intent-adaptive retrieval, multi-signal fusion) while replacing academic implementation details with production-ready simplifications. This two-tier approach — deterministic automation for the majority of cases, LLM judgment for the complex minority — is precisely the pattern validated by the [RLM paper](#24-theoretical-foundations): regex-based filtering plus LLM-driven semantic verification consistently outperforms either approach alone. The core trade-off is: **use regex/heuristics to handle 80% of automation scenarios, and delegate the 20% requiring deep understanding to the host LLM.**

docs/zh/DESIGN.md — 37 additions, 5 deletions

@@ -4,7 +4,7 @@
 >
 > The word shares its root with Mnemosyne (Μνημοσύνη), the goddess of memory — from her union with Zeus the nine Muses were born, symbolizing memory as the wellspring of all knowledge and creativity.
 
-Mnemon is a persistent memory system designed for LLM agents. It is based on the four-graph architecture of the [MAGMA](https://arxiv.org/abs/2601.03236) (Multi-Graph Agentic Memory Architecture) paper, implemented as a single Go binary + SQLite, with no external API dependencies.
+Mnemon is a persistent memory system designed for LLM agents. It adopts the **LLM-Supervised** pattern: the host LLM acts as external orchestrator of a standalone memory binary, interacting through symbolic CLI interfaces, while the binary handles deterministic storage, graph indexing, and lifecycle management. Memory is organized as a four-graph knowledge structure with temporal, entity, causal, and semantic edges. Implemented as a single Go binary + SQLite, with no external API dependencies.
 
 This document describes Mnemon's design philosophy, core concepts, system architecture, and key algorithms in detail.
 
@@ -16,8 +16,6 @@ Mnemon 是一个为 LLM 智能体设计的持久化记忆系统。它基于 [MAG
 - [2. Design Philosophy](#2-设计哲学)
 - [3. Core Concepts](#3-核心概念)
 - [4. System Architecture](#4-系统架构)
-- [4.1 Data Directory Layout](#41-数据目录布局)
-- [4.2 Store Isolation](#42-记忆体隔离)
 - [5. MAGMA Four-Graph Model](#5-magma-四图模型)
 - [6. Write Pipeline: Remember](#6-写入管线remember)
 - [7. Read Pipeline: Smart Recall](#7-读取管线smart-recall)
@@ -126,6 +124,40 @@ Binary 封装了所有不需要 LLM 的逻辑,Skill 只教 LLM 做需要智能
 - **The memory layer is the only part worth deep investment** — memory has a compound interest effect; it is the dividing line between an agent as a "tool" versus an "assistant"
 - **The LLM itself is the best orchestrator** — no need for Python DAG orchestration of call chains; the LLM reads the Skill and knows what to do
 
+### 2.4 Theoretical Foundations
+
+Mnemon's design takes the **paradigm** of one paper and the **methodology** of another, while making its own engineering choices between the two.
+
+**RLM Paradigm: LLM as Orchestrator**
+
+The [Recursive Language Models](https://arxiv.org/abs/2512.24601) paper (Zhang, Kraska & Khattab, MIT 2025) establishes a paradigm: an LLM acting as orchestrator of an external structured environment is more effective than one processing raw data directly. The paper's key findings at the paradigm level:
+
+- An 8B model can handle inputs **100x beyond** its context window by treating data as external environment variables
+- **Two-stage pipelines** (fast filtering + LLM semantic verification) consistently outperform single-pass approaches
+- Passing **constant-size metadata** — rather than raw data — to the model works better
+
+The RLM paper's own implementation uses **code generation + a Python REPL** as the interaction mechanism: the LLM writes Python code, a sandbox executes it, and results feed back. Mnemon shares this paradigm but takes a different path at the protocol level (see below).
+
+**MAGMA Methodology: Four-Graph Memory Architecture**
+
+The [MAGMA](https://arxiv.org/abs/2601.03236) paper provides the concrete methodology for **what the external environment should contain**. Its core contribution: a single edge type (such as vector similarity) is insufficient for a memory system — different query intents require different relational perspectives. MAGMA's four-graph architecture (temporal, entity, causal, semantic) plus intent-adaptive retrieval and multi-signal fusion give Mnemon its data model and retrieval algorithms.
+
+**Mnemon's Own Contribution: The Engineering Bridge**
+
+Neither paper solves how to connect an LLM orchestrator to a graph-structured memory in production. Mnemon fills this gap:
+
+| Layer | Source | Choice |
+|---|---|---|
+| **Paradigm** — who orchestrates? | RLM | The host LLM, not an embedded model |
+| **Methodology** — what goes in the environment? | MAGMA | Four-graph architecture + intent-adaptive retrieval |
+| **Protocol** — how do they communicate? | Mnemon | CLI commands + structured JSON (not code generation) |
+| **Lifecycle** — how does memory evolve? | Mnemon | Hook-driven remember → diff → link → gc |
+| **Distribution** — how is it shipped? | Mnemon | Single Go binary, zero dependencies |
+
+RLM's implementation relies on code generation inside a sandboxed REPL (flexible, but it needs a runtime and raises safety concerns); Mnemon uses deterministic CLI commands as the symbolic interface — constrained, but auditable, portable, and sandbox-free. MAGMA's reference implementation is a Python library with in-memory NetworkX graphs; Mnemon persists everything to SQLite with a complete write-back lifecycle.
+
+The end result: **RLM's paradigm + MAGMA's methodology + a CLI-native engineering path** — no Python, no sandbox, no API keys, runnable on any LLM CLI.
+
 ![LLM-Supervised Architecture](../diagrams/05-llm-supervised.jpg)
 
 ![System Architecture](../diagrams/01-system-architecture.jpg)
@@ -359,7 +391,7 @@ Mnemon 支持命名记忆体(store),为不同 agent、项目或场景提
 
 ## 5. MAGMA Four-Graph Model
 
-The core idea of the MAGMA paper is: **a single edge type (such as pure vector similarity) is insufficient to capture the multidimensional relationships between memories.** Different query intents require different relational perspectives — asking "why" requires causal chains, asking "when" requires timelines, asking "about X" requires entity associations.
+Within the [RLM paradigm](#24-理论基础), MAGMA provides the concrete data structure for the external environment that the LLM orchestrates. The core idea of the MAGMA paper is: **a single edge type (such as pure vector similarity) is insufficient to capture the multidimensional relationships between memories.** Different query intents require different relational perspectives — asking "why" requires causal chains, asking "when" requires timelines, asking "about X" requires entity associations.
 
 Mnemon implements four graphs, each capturing one dimension of relationships:
 
@@ -1022,4 +1054,4 @@ Prime 钩子始终安装。Remind、Nudge、Compact 钩子可选(Remind 和 Nu
 | Embeddings | FAISS + OpenAI | Ollama (local, optional) |
 | Deployment | Python library | Single Go binary |
 
-Mnemon retains MAGMA's **architectural skeleton** (four-graph separation, intent-adaptive retrieval, multi-signal fusion) while replacing academic implementation details with production-grade simplifications. The core trade-off: **use regex/heuristics to handle 80% of automation scenarios, and hand the 20% that needs deep understanding to the host LLM.**
+Mnemon retains MAGMA's **architectural skeleton** (four-graph separation, intent-adaptive retrieval, multi-signal fusion) while replacing academic implementation details with production-grade simplifications. This two-tier approach — deterministic automation for the majority of cases, LLM judgment for the complex minority — is precisely the pattern validated by the [RLM paper](#24-理论基础): regex filtering plus LLM semantic verification consistently outperforms either used alone. The core trade-off: **use regex/heuristics to handle 80% of automation scenarios, and hand the 20% that needs deep understanding to the host LLM.**

docs/zh/README.md — 8 additions, 1 deletion

@@ -190,10 +190,17 @@ make help # 显示所有目标
 
 ## Documentation
 
-- [Design & Architecture](DESIGN.md) — core concepts, four-graph model, LLM-Supervised architecture, algorithms, integration design
+- [Design & Architecture](DESIGN.md) — core concepts, algorithms, integration design
 - [Usage & Reference](USAGE.md) — CLI commands, embedding support, architecture overview
 - [Architecture Diagrams](../diagrams/) — system architecture, remember/recall pipelines, four-graph model, lifecycle management
 
+## References
+
+Mnemon takes the paradigm of one paper and the methodology of another. See [Theoretical Foundations](DESIGN.md#24-理论基础) for details.
+
+- **RLM** — Zhang, Kraska & Khattab. [Recursive Language Models](https://arxiv.org/abs/2512.24601). 2025. Establishes the paradigm: an LLM is more effective as orchestrator of an external environment than as a direct data processor.
+- **MAGMA** — Zou et al. [A Multi-Graph based Agentic Memory Architecture](https://arxiv.org/abs/2601.03236). 2025. Provides the methodology: a four-graph model (temporal, entity, causal, semantic) + intent-adaptive retrieval.
+
 ## License
 
 [MIT](../../LICENSE)
