|
| 1 | +# memU Architecture |
| 2 | + |
| 3 | +## Purpose and scope |
| 4 | + |
| 5 | +This document describes the architecture of the self-hosted `memu` Python package as implemented in this repository. |
| 6 | + |
| 7 | +The repository also describes a hosted Cloud product in `README.md`, but this document focuses on the local `MemoryService` runtime and its code paths. |
| 8 | + |
| 9 | +## System overview |
| 10 | + |
| 11 | +memU follows the "memory as file system" concept from the README and implements it with three persistent layers plus a relation table that links items to categories: |
| 12 | + |
| 13 | +- `Resource`: raw source artifacts (conversation/document/image/video/audio) |
| 14 | +- `MemoryItem`: extracted atomic memories with embeddings |
| 15 | +- `MemoryCategory`: grouped topic summaries |
| 16 | +- `CategoryItem`: item-category relation edges |
| 17 | + |
| 18 | +At runtime, `MemoryService` orchestrates ingestion, retrieval, and manual CRUD over these layers. |
| 19 | + |
| 20 | +```mermaid |
| 21 | +flowchart TD |
| 22 | + A["Input Resource or Query"] --> B["MemoryService"] |
| 23 | + B --> C["Workflow Pipelines"] |
| 24 | + C --> D["LLM Clients"] |
| 25 | + C --> E["Database Repositories"] |
| 26 | + E --> F["Resources"] |
| 27 | + E --> G["Memory Items"] |
| 28 | + E --> H["Memory Categories"] |
| 29 | + E --> I["Category Relations"] |
| 30 | +``` |
| 31 | + |
| 32 | +## Core runtime components |
| 33 | + |
| 34 | +### `MemoryService` as composition root |
| 35 | + |
| 36 | +`src/memu/app/service.py` constructs and owns: |
| 37 | + |
| 38 | +- typed configs (`LLMProfilesConfig`, `DatabaseConfig`, `MemorizeConfig`, `RetrieveConfig`, `UserConfig`) |
| 39 | +- storage backend (`build_database(...)`) |
| 40 | +- resource filesystem fetcher (`LocalFS`) |
| 41 | +- LLM client cache and wrappers |
| 42 | +- workflow and LLM interceptor registries |
| 43 | +- workflow runner (`local` by default, pluggable) |
| 44 | +- named workflow pipelines via `PipelineManager` |
| 45 | + |
| 46 | +Public APIs are assembled by mixins: |
| 47 | + |
| 48 | +- `MemorizeMixin`: `memorize(...)` |
| 49 | +- `RetrieveMixin`: `retrieve(...)` |
| 50 | +- `CRUDMixin`: list/clear/create/update/delete memory operations |
| 51 | + |
| 52 | +### Workflow engine |
| 53 | + |
| 54 | +All major operations execute as workflows (`WorkflowStep`) with: |
| 55 | + |
| 56 | +- explicit required/produced state keys |
| 57 | +- declared capability tags (`llm`, `vector`, `db`, `io`, `vision`) |
| 58 | +- per-step config (for profile selection) |
| 59 | + |
| 60 | +`PipelineManager` validates step dependencies at registration/mutation time and supports runtime pipeline revisioning (`config_step`, `insert_before/after`, `replace_step`, `remove_step`). |
| 61 | + |
| 62 | +`WorkflowRunner` is a protocol; the default `LocalWorkflowRunner` executes steps sequentially with `run_steps(...)`. |
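
The key-based step contracts described above can be illustrated with a minimal sketch. It is not memU's implementation: the `Step` dataclass, `validate_steps`, and this `run_steps` signature are hypothetical stand-ins. The sketch only assumes that each step declares the state keys it requires and produces.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

State = dict[str, Any]


@dataclass
class Step:
    """Hypothetical stand-in for a workflow step (not memU's WorkflowStep)."""
    step_id: str
    handler: Callable[[State], State]
    requires: set[str] = field(default_factory=set)
    produces: set[str] = field(default_factory=set)


def validate_steps(steps: list[Step], initial_keys: set[str]) -> None:
    """Fail early if a step needs a key that no earlier step produces."""
    available = set(initial_keys)
    for step in steps:
        missing = step.requires - available
        if missing:
            raise ValueError(f"step {step.step_id!r} is missing keys: {sorted(missing)}")
        available |= step.produces


def run_steps(steps: list[Step], state: State) -> State:
    """Run steps one after another, merging each step's output into the state."""
    validate_steps(steps, set(state))
    for step in steps:
        state = {**state, **step.handler(state)}
    return state


pipeline = [
    Step("ingest", lambda s: {"text": s["url"].upper()}, {"url"}, {"text"}),
    Step("extract", lambda s: {"items": s["text"].split("/")}, {"text"}, {"items"}),
]
result = run_steps(pipeline, {"url": "a/b"})
```

Because the contract is checked on key names only, a reordered pipeline fails at validation time rather than midway through execution, but a step that writes the wrong value type under the right key is not caught.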
| 63 | + |
| 64 | +### Interception and observability hooks |
| 65 | + |
| 66 | +Two interceptor systems exist: |
| 67 | + |
| 68 | +- workflow step interceptors: before/after/on_error around each step |
| 69 | +- LLM call interceptors: before/after/on_error around `chat/summarize/vision/embed/transcribe` |
| 70 | + |
| 71 | +LLM wrappers also extract best-effort usage metadata from raw provider responses. |
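
As a rough illustration of the before/after/on_error pattern, the sketch below wraps a single call with a list of interceptors. The `Interceptor` base class, `Recorder`, and `call_with_interceptors` are hypothetical names chosen for this example; memU's real registries and hook signatures may differ.

```python
from typing import Any, Callable


class Interceptor:
    """Hypothetical hook interface; every method is optional to override."""

    def before(self, name: str, payload: Any) -> None:
        pass

    def after(self, name: str, payload: Any, result: Any) -> None:
        pass

    def on_error(self, name: str, payload: Any, error: Exception) -> None:
        pass


class Recorder(Interceptor):
    """Collects hook events, for example to feed logs or metrics."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def before(self, name: str, payload: Any) -> None:
        self.events.append(f"before:{name}")

    def after(self, name: str, payload: Any, result: Any) -> None:
        self.events.append(f"after:{name}")

    def on_error(self, name: str, payload: Any, error: Exception) -> None:
        self.events.append(f"error:{name}")


def call_with_interceptors(
    name: str,
    fn: Callable[[Any], Any],
    payload: Any,
    interceptors: list[Interceptor],
) -> Any:
    """Run fn(payload) with hooks fired before, after, and on failure."""
    for hook in interceptors:
        hook.before(name, payload)
    try:
        result = fn(payload)
    except Exception as exc:
        for hook in interceptors:
            hook.on_error(name, payload, exc)
        raise  # interceptors observe the error; they do not swallow it
    for hook in interceptors:
        hook.after(name, payload, result)
    return result


recorder = Recorder()
reply = call_with_interceptors("chat", str.upper, "hello", [recorder])
```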
| 72 | + |
| 73 | +## Ingestion architecture (`memorize`) |
| 74 | + |
| 75 | +`memorize(...)` executes the `memorize` pipeline: |
| 76 | + |
| 77 | +1. `ingest_resource`: fetch local/remote resource into `blob_config.resources_dir` via `LocalFS` |
| 78 | +2. `preprocess_multimodal`: modality-specific preprocessing for conversation/document/audio (text-oriented path) and image/video (vision-oriented path) |
| 79 | +3. `extract_items`: per-memory-type LLM extraction into structured entries |
| 80 | +4. `dedupe_merge`: placeholder stage (currently pass-through) |
| 81 | +5. `categorize_items`: persist resource + memory items + item-category relations and embeddings |
| 82 | +6. `persist_index`: update category summaries; optionally persist item references |
| 83 | +7. `build_response`: return resource(s), items, categories, relations |
| 84 | + |
| 85 | +Category bootstrap is lazy and scoped: categories are initialized when needed with embeddings, and mapped by normalized category name. |
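
A minimal sketch of lazy category bootstrap follows. The normalization rule (lowercase, collapsed whitespace), the `CategoryRegistry` class, and the `embed` callable are assumptions made for illustration, and scoping is left out for brevity.

```python
from typing import Callable


def normalize(name: str) -> str:
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(name.lower().split())


class CategoryRegistry:
    """Hypothetical registry that creates a category only on first use."""

    def __init__(self, embed: Callable[[str], list[float]]) -> None:
        self._embed = embed
        self._by_name: dict[str, dict] = {}

    def get_or_create(self, name: str) -> dict:
        key = normalize(name)
        if key not in self._by_name:
            # The embedding is computed once, at creation time.
            self._by_name[key] = {"name": key, "embedding": self._embed(key)}
        return self._by_name[key]


embed_calls: list[str] = []


def fake_embed(text: str) -> list[float]:
    """Stand-in for a real embedding call; records what was embedded."""
    embed_calls.append(text)
    return [float(len(text))]


registry = CategoryRegistry(fake_embed)
first = registry.get_or_create("  Personal   Info ")
second = registry.get_or_create("personal info")
```

Both lookups resolve to the same category object, so the embedding call runs only once.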
| 86 | + |
| 87 | +## Retrieval architecture (`retrieve`) |
| 88 | + |
| 89 | +`retrieve(...)` chooses one of two pipelines from config: |
| 90 | + |
| 91 | +- `retrieve_rag` (embedding-driven ranking) |
| 92 | +- `retrieve_llm` (LLM-driven ranking) |
| 93 | + |
| 94 | +Both use the same staged pattern: |
| 95 | + |
| 96 | +1. route intention + optional query rewrite |
| 97 | +2. category recall |
| 98 | +3. sufficiency check (optional) |
| 99 | +4. item recall |
| 100 | +5. sufficiency check (optional) |
| 101 | +6. resource recall |
| 102 | +7. response build |
| 103 | + |
| 104 | +Key behavior: |
| 105 | + |
| 106 | +- `where` filters are validated against `user_model` fields before querying |
| 107 | +- RAG path uses vector similarity (and optional salience ranking for items) |
| 108 | +- LLM path ranks IDs from formatted category/item/resource context |
| 109 | +- retrieval can stop early after a stage when the sufficiency check decides the gathered context is enough |
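
The staged pattern with early stopping can be sketched as below. This is an illustration under assumptions, not the pipeline code: `staged_recall`, the stage callables, and the `is_sufficient` predicate are hypothetical, and in memU the sufficiency check is described as optional.

```python
from typing import Any, Callable

Context = dict[str, list[Any]]
Stage = tuple[str, Callable[[str], list[Any]]]


def staged_recall(
    query: str,
    stages: list[Stage],
    is_sufficient: Callable[[str, Context], bool],
) -> Context:
    """Run recall stages in order; stop once the context is judged sufficient."""
    context: Context = {}
    for index, (name, recall) in enumerate(stages):
        context[name] = recall(query)
        is_last = index == len(stages) - 1
        # No check after the final stage: there is nothing left to skip.
        if not is_last and is_sufficient(query, context):
            break
    return context


stages: list[Stage] = [
    ("categories", lambda q: ["travel"]),
    ("items", lambda q: ["prefers night trains"]),
    ("resources", lambda q: ["conversation-1"]),
]

# Toy predicate: any recalled item counts as enough context.
context = staged_recall(
    "how does the user like to travel?",
    stages,
    lambda q, ctx: len(ctx.get("items", [])) > 0,
)
```

In this run the resource stage is skipped because the toy predicate is satisfied once items are recalled.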
| 110 | + |
| 111 | +## Data and storage architecture |
| 112 | + |
| 113 | +### Repository contracts |
| 114 | + |
| 115 | +Storage is abstracted through a `Database` protocol with four repositories: |
| 116 | + |
| 117 | +- `ResourceRepo` |
| 118 | +- `MemoryItemRepo` |
| 119 | +- `MemoryCategoryRepo` |
| 120 | +- `CategoryItemRepo` |
| 121 | + |
| 122 | +### Backends |
| 123 | + |
| 124 | +`build_database(...)` selects the backend by `database_config.metadata_store.provider`: |
| 125 | + |
| 126 | +- `inmemory`: in-process dict/list state |
| 127 | +- `sqlite`: SQLModel persistence, embeddings stored as JSON text, brute-force cosine search |
| 128 | +- `postgres`: SQLModel persistence with pgvector support (when enabled), local fallback ranking when needed |
| 129 | + |
| 130 | +For Postgres, startup runs migration bootstrap and attempts `CREATE EXTENSION IF NOT EXISTS vector` in `ddl_mode="create"`. |
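
To make the brute-force cosine search over JSON-encoded embeddings concrete, here is a small standard-library sketch. The row shape `(id, embedding_json)` and the function names are assumptions, not the SQLite backend's actual schema or API.

```python
import json
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[tuple[str, str]], k: int) -> list[tuple[str, float]]:
    """Score every row against the query and keep the k best matches."""
    scored = [(row_id, cosine(query, json.loads(raw))) for row_id, raw in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Embeddings stored as JSON text, as they would be in a TEXT column.
rows = [
    ("a", "[1.0, 0.0]"),
    ("b", "[0.0, 1.0]"),
    ("c", "[1.0, 1.0]"),
]
hits = top_k([1.0, 0.0], rows, k=2)
```

Every query decodes and scores every stored vector, so cost grows linearly with the number of rows. That is the portability-versus-scale tradeoff compared with an indexed pgvector column.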
| 131 | + |
| 132 | +### Scope model propagation |
| 133 | + |
| 134 | +`UserConfig.model` is merged into record/table models so scope fields (for example `user_id`) become first-class columns/attributes across resources, items, categories, and relations. |
| 135 | + |
| 136 | +This is why `where` filters and `user_data` writes are consistently available across APIs. |
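
One way to picture the link between the scope model and `where` validation is the sketch below. It uses a plain dataclass as a stand-in for `UserConfig.model` and supports equality filters only. `UserScope`, `validate_where`, and `matches` are hypothetical names, and the real filter syntax may be richer.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class UserScope:
    """Hypothetical scope model; real deployments define their own fields."""
    user_id: Optional[str] = None
    agent_id: Optional[str] = None


def validate_where(where: dict[str, Any], scope_model: type) -> dict[str, Any]:
    """Reject filters on fields that the scope model does not declare."""
    allowed = {f.name for f in fields(scope_model)}
    unknown = set(where) - allowed
    if unknown:
        raise ValueError(f"unknown filter fields: {sorted(unknown)}")
    return dict(where)


def matches(record: dict[str, Any], where: dict[str, Any]) -> bool:
    """Equality-only matching of a stored record against the filter."""
    return all(record.get(key) == value for key, value in where.items())


records = [
    {"id": 1, "user_id": "u1", "agent_id": "assistant"},
    {"id": 2, "user_id": "u2", "agent_id": "assistant"},
]
where = validate_where({"user_id": "u1"}, UserScope)
visible = [r["id"] for r in records if matches(r, where)]
```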
| 137 | + |
| 138 | +## LLM/provider architecture |
| 139 | + |
| 140 | +LLM access is profile-based (`llm_profiles`): |
| 141 | + |
| 142 | +- `default` profile for chat-like tasks |
| 143 | +- `embedding` profile for embedding tasks (auto-derived from default if not set) |
| 144 | + |
| 145 | +Per-step profile routing happens through step config (`chat_llm_profile`, `embed_llm_profile`, or `llm_profile`). |
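
Profile resolution could look roughly like the sketch below. The config keys `chat_llm_profile`, `embed_llm_profile`, and `llm_profile` come from this document. The precedence between them, the helper names, and deriving `embedding` by copying `default` are assumptions for illustration.

```python
from typing import Any

Profiles = dict[str, dict[str, Any]]


def resolve_profiles(profiles: Profiles) -> Profiles:
    """Return a copy in which a missing `embedding` profile falls back to `default`."""
    if "default" not in profiles:
        raise KeyError("a 'default' profile is required")
    resolved = {name: dict(cfg) for name, cfg in profiles.items()}
    # Assumption: derivation is a plain copy of the default profile.
    resolved.setdefault("embedding", dict(resolved["default"]))
    return resolved


def profile_for_step(step_config: dict[str, Any], task: str, profiles: Profiles) -> dict[str, Any]:
    """Pick a profile for a `chat` or `embed` task from per-step config."""
    fallback = "embedding" if task == "embed" else "default"
    # Assumed precedence: task-specific key, then generic key, then fallback.
    name = (
        step_config.get(f"{task}_llm_profile")
        or step_config.get("llm_profile")
        or fallback
    )
    return profiles[name]


profiles = resolve_profiles({
    "default": {"model": "chat-model"},
    "fast": {"model": "small-model"},
})
chat_profile = profile_for_step({"chat_llm_profile": "fast"}, "chat", profiles)
embed_profile = profile_for_step({}, "embed", profiles)
```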
| 146 | + |
| 147 | +Client backends: |
| 148 | + |
| 149 | +- `sdk`: official OpenAI SDK wrapper |
| 150 | +- `httpx`: provider-adapted HTTP backend (OpenAI, Doubao, Grok, OpenRouter) |
| 151 | +- `lazyllm_backend`: LazyLLM adapter |
| 152 | + |
| 153 | +## Integration surfaces |
| 154 | + |
| 155 | +- `memu.client.openai_wrapper`: opt-in OpenAI client wrapper that auto-retrieves memories and injects them into system context |
| 156 | +- `memu.integrations.langgraph`: LangChain/LangGraph tool adapter (`save_memory`, `search_memory`) |
| 157 | + |
| 158 | +## Current constraints and tradeoffs |
| 159 | + |
| 160 | +- workflow state is dict-based, so step contracts are validated by key names rather than static types |
| 161 | +- SQLite/inmemory vector search is brute-force (portable but less scalable) |
| 162 | +- category update quality and extraction quality are prompt/LLM dependent |
| 163 | +- some extension hooks exist as placeholders (for example dedupe/merge stage) |
| 164 | + |
| 165 | +## Related ADRs |
| 166 | + |
| 167 | +- `docs/adr/0001-workflow-pipeline-architecture.md` |
| 168 | +- `docs/adr/0002-pluggable-storage-and-vector-strategy.md` |
| 169 | +- `docs/adr/0003-user-scope-in-data-model.md` |