docs: add architecture explanation and adrs (#353)

ankaisen · web-flow · commit 8676a2721917 · 2026-02-24T22:39:42.000+09:00
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,79 @@
+# AGENTS.md
+
+Guidance for AI coding agents working in this repository.
+
+## Goal
+
+Implement features and fix bugs with minimal regression risk, while preserving memU's architecture:
+
+- `MemoryService` as composition root
+- workflow-based execution (`memorize`, `retrieve`, CRUD/patch)
+- pluggable storage backends (`inmemory`, `sqlite`, `postgres`)
+- profile-based LLM routing (`default`, `embedding`, custom profiles)
+
+See `docs/architecture.md` for the current architectural view.
+
+## Where to Change Code
+
+- Service/runtime wiring: `src/memu/app/service.py`
+- Memorize flow: `src/memu/app/memorize.py`
+- Retrieve flow: `src/memu/app/retrieve.py`
+- CRUD/Patch flow: `src/memu/app/crud.py`
+- Config models/defaults: `src/memu/app/settings.py`
+- Workflow engine: `src/memu/workflow/*`
+- Storage abstraction/factory: `src/memu/database/interfaces.py`, `src/memu/database/factory.py`
+- In-memory: `src/memu/database/inmemory/*`
+- SQLite: `src/memu/database/sqlite/*`
+- Postgres: `src/memu/database/postgres/*`
+- LLM clients/wrappers/interceptors: `src/memu/llm/*`
+- Integrations: `src/memu/integrations/*`, `src/memu/client/*`
+- Tests: `tests/*`
+
+## Implementation Rules
+
+- Keep changes small and localized.
+- Do not change public API signatures unless explicitly required.
+- Preserve async behavior and existing workflow step contracts (`requires`/`produces` keys).
+- If adding a new capability, prefer integrating it through an existing pipeline step or a new, clearly named step.
+- Maintain backend parity where appropriate (if a repository contract changes, update all relevant backends).
+- Validate `where`/scope behavior against `UserConfig.model`; do not bypass scope filtering.
+- Keep type hints and mypy compatibility intact.
+
+## Feature Work Checklist
+
+1. Locate affected flow(s): memorize, retrieve, CRUD, or integration layer.
+2. Update config models/defaults if behavior is configurable.
+3. Wire behavior through `MemoryService` pipelines and step config (LLM profiles/capabilities).
+4. Implement backend/repository changes for all impacted providers.
+5. Add/extend tests for happy path and edge cases.
+6. Update docs when behavior changes (`README.md`, `docs/*`, examples if needed).
+7. If the change is architectural, add/update ADRs under `docs/adr/`.
+
+## Bug Fix Checklist
+
+1. Reproduce with an existing or new failing test.
+2. Implement the smallest safe fix at the correct layer.
+3. Add a regression test that fails before and passes after.
+4. Check cross-backend effects (`inmemory`, `sqlite`, `postgres`) and retrieval modes (`rag`, `llm`) when relevant.
+5. Verify no unintended API/output shape changes.
+
+## Testing and Validation
+
+Use `uv` for all local runs.
+
+- Setup: `make install`
+- Run all tests: `make test`
+- Run focused tests: `uv run python -m pytest tests/<target_test>.py`
+- Full quality checks: `make check`
+
+At minimum, run targeted tests for touched code. Run `make check` for broad or cross-cutting changes.
+If you cannot run a required check, state it explicitly in your final summary.
+
+## Done Criteria
+
+Before finishing, ensure:
+
+- Code compiles and tests for changed behavior pass.
+- New behavior is covered by tests.
+- Docs are updated for user-visible or architectural changes.
+- No unrelated files were modified.
diff --git a/docs/adr/0001-workflow-pipeline-architecture.md b/docs/adr/0001-workflow-pipeline-architecture.md
@@ -0,0 +1,35 @@
+# ADR 0001: Use Workflow Pipelines for Core Operations
+
+- Status: Accepted
+- Date: 2026-02-24
+
+## Context
+
+memU has multiple high-level operations (`memorize`, `retrieve`, and CRUD/patch operations) that each require multi-stage execution, LLM calls, storage writes, and optional short-circuit behavior.
+
+A single monolithic function per operation would make these flows hard to extend, observe, and customize.
+
+## Decision
+
+Model each core operation as a named workflow pipeline composed of ordered `WorkflowStep` units.
+
+- Register pipelines centrally in `MemoryService` via `PipelineManager`
+- Validate required/produced state keys at pipeline registration/mutation time
+- Execute through a `WorkflowRunner` abstraction (`local` by default)
+- Support runtime customization by step-level config and structural mutation (insert/replace/remove)
+- Provide before/after/on_error step interceptors for instrumentation and control
+
+## Consequences
+
+Positive:
+
+- uniform execution model across memorize/retrieve/CRUD
+- explicit, inspectable stage boundaries
+- extension points for custom runners and step customization
+- easier interception and observability around stage execution
+
+Negative:
+
+- dict-based workflow state relies on key naming discipline
+- pipeline mutation can increase behavioral variance between deployments
+- more framework code compared to direct function calls
diff --git a/docs/adr/0002-pluggable-storage-and-vector-strategy.md b/docs/adr/0002-pluggable-storage-and-vector-strategy.md
@@ -0,0 +1,42 @@
+# ADR 0002: Use Pluggable Storage with Backend-Specific Vector Search
+
+- Status: Accepted
+- Date: 2026-02-24
+
+## Context
+
+memU must support:
+
+- zero-setup local development
+- lightweight persisted deployments
+- production deployments that need scalable vector similarity
+
+No single storage engine fits all three cases.
+
+## Decision
+
+Adopt repository-based storage abstraction behind a `Database` protocol, with selectable providers:
+
+- `inmemory`: in-process state, brute-force similarity
+- `sqlite`: file-based persistence, embeddings stored as JSON text, brute-force similarity
+- `postgres`: SQL persistence, pgvector-enabled similarity when configured
+
+Vector behavior is backend-aware:
+
+- brute-force cosine search remains available for portability
+- Postgres can use pgvector distance queries when vector support is enabled
+- salience ranking (reinforcement/recency-aware) uses local scoring logic
+
+## Consequences
+
+Positive:
+
+- one service API works across local and production footprints
+- clear backend contracts through repository interfaces
+- predictable fallback behavior when native vector index is unavailable
+
+Negative:
+
+- duplicate repository logic across backends
+- behavior/performance differences between providers
+- brute-force vector search in the SQLite and in-memory backends does not scale as well as indexed pgvector
diff --git a/docs/adr/0003-user-scope-in-data-model.md b/docs/adr/0003-user-scope-in-data-model.md
@@ -0,0 +1,32 @@
+# ADR 0003: Model User Scope as First-Class Fields on Memory Records
+
+- Status: Accepted
+- Date: 2026-02-24
+
+## Context
+
+memU retrieval and writes need scoped operation (for example per `user_id`, `agent_id`, or session) for multi-user and multi-agent scenarios.
+
+Keeping scope outside stored records would force ad-hoc filtering logic and weaken data isolation.
+
+## Decision
+
+Embed scope directly into all persisted entities by merging a configurable `UserConfig.model` with core record models.
+
+- Scope fields are part of resource/category/item/relation models
+- Repositories accept `user_data` on writes and `where` filters on reads
+- API-level `where` filters are validated against configured scope fields before execution
+
+## Consequences
+
+Positive:
+
+- consistent filtering model across memorize/retrieve/CRUD APIs
+- backend-independent scoping semantics
+- supports multi-tenant and multi-agent patterns without separate storage stacks
+
+Negative:
+
+- schema/model generation complexity increases
+- schema and index shape can vary by chosen scope model
+- callers must keep `where` and `user` payloads aligned with configured scope fields
diff --git a/docs/adr/README.md b/docs/adr/README.md
@@ -0,0 +1,5 @@
+# Architecture Decision Records
+
+- [0001: Use Workflow Pipelines for Core Operations](0001-workflow-pipeline-architecture.md)
+- [0002: Use Pluggable Storage with Backend-Specific Vector Search](0002-pluggable-storage-and-vector-strategy.md)
+- [0003: Model User Scope as First-Class Fields on Memory Records](0003-user-scope-in-data-model.md)
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -0,0 +1,169 @@
+# memU Architecture
+
+## Purpose and scope
+
+This document describes the self-hosted `memu` Python package architecture as implemented in this repository.
+
+The repository also describes a hosted Cloud product in `README.md`, but this document focuses on the local `MemoryService` runtime and its code paths.
+
+## System overview
+
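+memU follows the "memory as file system" concept from the README and implements it with three persistent layers (`Resource`, `MemoryItem`, `MemoryCategory`) plus `CategoryItem` relation records that link items to categories: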
+
+- `Resource`: raw source artifacts (conversation/document/image/video/audio)
+- `MemoryItem`: extracted atomic memories with embeddings
+- `MemoryCategory`: grouped topic summaries
+- `CategoryItem`: item-category relation edges
+
+At runtime, `MemoryService` orchestrates ingestion, retrieval, and manual CRUD over these layers.
+
+```mermaid
+flowchart TD
+  A["Input Resource or Query"] --> B["MemoryService"]
+  B --> C["Workflow Pipelines"]
+  C --> D["LLM Clients"]
+  C --> E["Database Repositories"]
+  E --> F["Resources"]
+  E --> G["Memory Items"]
+  E --> H["Memory Categories"]
+  E --> I["Category Relations"]
+```
+
+## Core runtime components
+
+### `MemoryService` as composition root
+
+`src/memu/app/service.py` constructs and owns:
+
+- typed configs (`LLMProfilesConfig`, `DatabaseConfig`, `MemorizeConfig`, `RetrieveConfig`, `UserConfig`)
+- storage backend (`build_database(...)`)
+- resource filesystem fetcher (`LocalFS`)
+- LLM client cache and wrappers
+- workflow and LLM interceptor registries
+- workflow runner (`local` by default, pluggable)
+- named workflow pipelines via `PipelineManager`
+
+Public APIs are assembled by mixins:
+
+- `MemorizeMixin`: `memorize(...)`
+- `RetrieveMixin`: `retrieve(...)`
+- `CRUDMixin`: list/clear/create/update/delete memory operations
+
+### Workflow engine
+
+All major operations execute as workflows (`WorkflowStep`) with:
+
+- explicit required/produced state keys
+- declared capability tags (`llm`, `vector`, `db`, `io`, `vision`)
+- per-step config (for profile selection)
+
+`PipelineManager` validates step dependencies at registration/mutation time and supports runtime pipeline revisioning (`config_step`, `insert_before/after`, `replace_step`, `remove_step`).
+
+`WorkflowRunner` is a protocol; default `LocalWorkflowRunner` executes sequentially with `run_steps(...)`.
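
The dependency check described above can be pictured with a minimal sketch. The `Step` dataclass, the `validate_pipeline` function, and the state key names are hypothetical illustrations; they are not memU's actual `WorkflowStep` or `PipelineManager` API, only one plausible way to validate required/produced keys before running anything.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for a workflow step (not memU's WorkflowStep)."""

    name: str
    requires: frozenset = field(default_factory=frozenset)
    produces: frozenset = field(default_factory=frozenset)


def validate_pipeline(steps, initial_keys):
    """Raise ValueError if a step requires a key that no earlier step produced."""
    available = set(initial_keys)
    for step in steps:
        missing = step.requires - available
        if missing:
            raise ValueError(f"step {step.name!r} is missing keys: {sorted(missing)}")
        # Keys produced by this step become available to later steps.
        available |= step.produces


# Key names here are assumptions chosen for illustration.
steps = [
    Step("ingest_resource", frozenset({"resource_url"}), frozenset({"raw_text"})),
    Step("extract_items", frozenset({"raw_text"}), frozenset({"items"})),
]
validate_pipeline(steps, {"resource_url"})  # passes silently
```

Running the same check after an insert, replace, or remove is what would let a structural mutation fail at registration time instead of midway through a run.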
+
+### Interception and observability hooks
+
+Two interceptor systems exist:
+
+- workflow step interceptors: before/after/on_error around each step
+- LLM call interceptors: before/after/on_error around `chat/summarize/vision/embed/transcribe`
+
+LLM wrappers also extract best-effort usage metadata from raw provider responses.
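
A before/after/on_error hook arrangement like the one listed above might look roughly like the following. This is a synchronous sketch under stated assumptions: `run_with_interceptors` and its callback signatures are hypothetical, and memU's real steps are async and registered through its own interceptor registries.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def run_with_interceptors(
    name: str,
    handler: Callable[[State], State],
    state: State,
    before: Callable[[str, State], None],
    after: Callable[[str, State], None],
    on_error: Callable[[str, Exception], None],
) -> State:
    """Run one step, calling the hooks around it (hypothetical helper)."""
    before(name, state)
    try:
        result = handler(state)
    except Exception as exc:
        # Report the failure, then re-raise so the caller still sees it.
        on_error(name, exc)
        raise
    after(name, result)
    return result


events = []
out = run_with_interceptors(
    "extract_items",
    lambda s: {**s, "items": ["likes tea"]},
    {"raw_text": "I like tea"},
    before=lambda n, s: events.append(f"before:{n}"),
    after=lambda n, s: events.append(f"after:{n}"),
    on_error=lambda n, e: events.append(f"error:{n}"),
)
```

The same wrapper shape would apply to LLM calls, with the call kind (`chat`, `embed`, and so on) taking the place of the step name.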
+
+## Ingestion architecture (`memorize`)
+
+`memorize(...)` executes the `memorize` pipeline:
+
+1. `ingest_resource`: fetch local/remote resource into `blob_config.resources_dir` via `LocalFS`
+2. `preprocess_multimodal`: modality-specific preprocessing for conversation/document/audio (text-oriented path) and image/video (vision-oriented path)
+3. `extract_items`: per-memory-type LLM extraction into structured entries
+4. `dedupe_merge`: placeholder stage (currently pass-through)
+5. `categorize_items`: persist resource + memory items + item-category relations and embeddings
+6. `persist_index`: update category summaries; optionally persist item references
+7. `build_response`: return resource(s), items, categories, relations
+
+Category bootstrap is lazy and scoped: categories are initialized when needed with embeddings, and mapped by normalized category name.
+
+## Retrieval architecture (`retrieve`)
+
+`retrieve(...)` chooses one of two pipelines from config:
+
+- `retrieve_rag` (embedding-driven ranking)
+- `retrieve_llm` (LLM-driven ranking)
+
+Both use the same staged pattern:
+
+1. route intention + optional query rewrite
+2. category recall
+3. sufficiency check (optional)
+4. item recall
+5. sufficiency check (optional)
+6. resource recall
+7. response build
+
+Key behavior:
+
+- `where` filters are validated against `user_model` fields before querying
+- RAG path uses vector similarity (and optional salience ranking for items)
+- LLM path ranks IDs from formatted category/item/resource context
+- each stage can stop early if sufficiency check decides context is enough

+
+## Data and storage architecture
+
+### Repository contracts
+
+Storage is abstracted through a `Database` protocol with four repositories:
+
+- `ResourceRepo`
+- `MemoryItemRepo`
+- `MemoryCategoryRepo`
+- `CategoryItemRepo`
+
+### Backends
+
+`build_database(...)` selects backend by `database_config.metadata_store.provider`:
+
+- `inmemory`: in-process dict/list state
+- `sqlite`: SQLModel persistence, embeddings stored as JSON text, brute-force cosine search
+- `postgres`: SQLModel persistence with pgvector support (when enabled), local fallback ranking when needed
+
+For Postgres, startup runs migration bootstrap and attempts `CREATE EXTENSION IF NOT EXISTS vector` in `ddl_mode="create"`.
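
To make "embeddings stored as JSON text, brute-force cosine search" concrete, here is a small standard-library sketch. The `cosine` and `top_k` helpers and the row layout are assumptions for illustration and are not taken from memU's SQLite repository code.

```python
import json
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, rows, k):
    """Score every (item_id, embedding_json) row against the query; keep the best k."""
    scored = [(item_id, cosine(query, json.loads(text))) for item_id, text in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Each embedding is stored as JSON text, as a text column would hold it.
rows = [("a", "[1.0, 0.0]"), ("b", "[0.0, 1.0]"), ("c", "[0.7, 0.7]")]
best = top_k([1.0, 0.1], rows, k=2)
```

Every query decodes and scores all rows, so cost grows linearly with the number of stored items. That is the scalability tradeoff compared with an indexed pgvector query.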
+
+### Scope model propagation
+
+`UserConfig.model` is merged into record/table models so scope fields (for example `user_id`) become first-class columns/attributes across resources, items, categories, and relations.
+
+This is why `where` filters and `user_data` writes are consistently available across APIs.
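
One way to picture the validation of `where` filters against scope fields is the sketch below. The `Scope` dataclass stands in for a configured `UserConfig.model`, and `validate_where` is a hypothetical helper; memU's own validation logic and error types may differ.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class Scope:
    """Hypothetical scope model standing in for UserConfig.model."""

    user_id: Optional[str] = None
    agent_id: Optional[str] = None


def validate_where(where: Dict[str, Any], scope_model: type) -> Dict[str, Any]:
    """Reject filter keys that are not declared scope fields (illustrative only)."""
    allowed = {f.name for f in fields(scope_model)}
    unknown = set(where) - allowed
    if unknown:
        raise ValueError(f"unknown scope fields in where: {sorted(unknown)}")
    return dict(where)


checked = validate_where({"user_id": "u1"}, Scope)
```

Rejecting unknown keys up front, rather than silently ignoring them, keeps a misspelled filter from quietly widening the result set beyond the intended scope.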
+
+## LLM/provider architecture
+
+LLM access is profile-based (`llm_profiles`):
+
+- `default` profile for chat-like tasks
+- `embedding` profile for embedding tasks (auto-derived from default if not set)
+
+Per-step profile routing happens through step config (`chat_llm_profile`, `embed_llm_profile`, or `llm_profile`).
+
+Client backends:
+
+- `sdk`: official OpenAI SDK wrapper
+- `httpx`: provider-adapted HTTP backend (OpenAI, Doubao, Grok, OpenRouter)
+- `lazyllm_backend`: LazyLLM adapter
+
+## Integration surfaces
+
+- `memu.client.openai_wrapper`: opt-in OpenAI client wrapper that auto-retrieves memories and injects them into system context
+- `memu.integrations.langgraph`: LangChain/LangGraph tool adapter (`save_memory`, `search_memory`)
+
+## Current constraints and tradeoffs
+
+- workflow state is dict-based, so step contracts are validated by key names rather than static types
+- SQLite/inmemory vector search is brute-force (portable but less scalable)
+- category update quality and extraction quality are prompt/LLM dependent
+- some extension hooks exist as placeholders (for example dedupe/merge stage)
+
+## Related ADRs
+
+- `docs/adr/0001-workflow-pipeline-architecture.md`
+- `docs/adr/0002-pluggable-storage-and-vector-strategy.md`
+- `docs/adr/0003-user-scope-in-data-model.md`