Skip to content

Commit 40fa728

Browse files
Aayush KatariaCopilot
authored andcommitted
W2: split god-class client into store + 4 services + base mixin
Refactor CosmosMemoryClient from a 1,749 LOC god-class into a thin orchestrator that delegates to a typed MemoryStore plus four small services. Async client gets the same treatment. New packages: - agent_memory_toolkit/store/ -- MemoryStore + AsyncMemoryStore. Typed CRUD + query primitives. Owns Cosmos container interaction, ETag guards on supersede writes, and partition-key-aware reads. - agent_memory_toolkit/services/ -- LLMService (ChatClient + embeddings + telemetry), RetrievalService (semantic + hybrid + tag-filtered query composition; owns _QueryBuilder consumer surface), PipelineService (extract_memories, synthesize_procedural, reconcile_memories, thread/user summaries -- the LLM orchestration lifted from pipeline.py), ProcessorService (process_now, process_now_and_wait, _maybe_auto_trigger, counter container). - agent_memory_toolkit/_base/ -- _BaseMemoryClient mixin holding shared init-config, validation, local-store CRUD, scope resolution, and the embedding-dim-mismatch warning. - agent_memory_toolkit/services/__init__.py -- Protocol stubs (MemoryStoreProtocol, LLMServiceProtocol, RetrievalServiceProtocol, PipelineServiceProtocol, ProcessorServiceProtocol + async variants) for clean dependency injection and future swap-ability. Slim-down: - cosmos_memory_client.py: 1,749 -> 519 LOC - aio/cosmos_memory_client.py: 1,903 -> 593 LOC - pipeline.py: 1,648 LOC -> 33-line re-export shim (services/pipeline.py is the implementation; shim keeps existing imports working). Public API unchanged: 30 public methods on CosmosMemoryClient, same signatures. Behavior preserved end-to-end. Tests: - 553 -> 585 unit tests passing (+32 new tests across the 5 new packages: store, services/llm, services/retrieval, services/pipeline, services/processor, _base). - 20 integration tests collect cleanly. - Radon average cyclomatic complexity: A (4.28). A handful of legacy methods on _BaseMemoryClient measure C(11-14) -- lifted verbatim from the old client; further reduction can land in a follow-up. Function App updated: pipeline_factory now constructs PipelineService directly. function_app/shared/cosmos_clients.py + the thread_summary orchestrator updated for the new service entrypoints. Docs: Docs/public_api.md added documenting the public client API, async client surface, and the service Protocols as advertised extension points. Architecture diagram (high level): CosmosMemoryClient (519 LOC orchestrator) |- MemoryStore (CRUD + query primitives) |- LLMService (ChatClient + embeddings + telemetry) |- RetrievalService (semantic/hybrid/tag-filtered queries) |- PipelineService (extract/synthesize/reconcile/summarize) |- ProcessorService (process_now + auto-trigger counters) Async parity follows the same shape via AsyncMemoryStore, AsyncLLMService, AsyncRetrievalService, AsyncProcessorService. The async client wraps the sync PipelineService in asyncio.to_thread for LLM orchestration -- W3 will land a true-async PipelineService. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent e87df99 commit 40fa728

31 files changed

Lines changed: 7031 additions & 4633 deletions

Docs/public_api.md

Lines changed: 118 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,118 @@
1+
# Public API
2+
3+
## Architecture
4+
5+
`CosmosMemoryClient` and `AsyncCosmosMemoryClient` are thin orchestrators. They keep local-buffer state and Cosmos connection lifecycle, then delegate persistence to `MemoryStore` / `AsyncMemoryStore` and higher-level behavior to four services:
6+
7+
- `LLMService` / `AsyncLLMService` for chat completions and embeddings.
8+
- `RetrievalService` / `AsyncRetrievalService` for filtering, vector search, and episodic context.
9+
- `PipelineService` for extraction, summaries, procedural synthesis, and reconciliation.
10+
- `ProcessorService` / `AsyncProcessorService` for immediate processing and threshold-driven auto-triggering.
11+
12+
## CosmosMemoryClient (sync)
13+
14+
### Connection
15+
16+
- `__init__(cosmos_endpoint=None, cosmos_credential=None, cosmos_key=None, cosmos_database=None, cosmos_container=None, cosmos_counter_container=None, cosmos_lease_container=None, cosmos_throughput_mode=None, cosmos_autoscale_max_ru=None, ai_foundry_endpoint=None, ai_foundry_credential=None, ai_foundry_api_key=None, embedding_deployment_name='text-embedding-3-large', embedding_dimensions=None, chat_deployment_name='gpt-4o-mini', use_default_credential=True, processor=None) -> None` — configure local state, model clients, optional Cosmos auto-connect, and optional processing backend.
17+
- `close() -> None` — close Cosmos/model clients and owned credentials.
18+
- `connect_cosmos(endpoint=None, credential=None, key=None, database=None, container=None) -> None` — connect to an existing memory container.
19+
- `create_memory_store(database=None, container=None, counter_container=None, lease_container=None, endpoint=None, credential=None, key=None, embedding_dimensions=None, embedding_data_type=None, distance_function=None, full_text_language=None, throughput_mode=None, autoscale_max_ru=None) -> None` — create/connect the memory, counter, and lease containers.
20+
21+
### Memory CRUD
22+
23+
- `add_local(user_id, role, content, memory_type='turn', agent_id=None, metadata=None, thread_id=None, tags=None, ttl=None, salience=None) -> None` — append a memory to the local buffer.
24+
- `get_local(memory_id=None, user_id=None, role=None, memory_types=None) -> list[dict]` — filter local buffered memories.
25+
- `update_local(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a local buffered memory.
26+
- `delete_local(memory_id) -> None` — remove a local buffered memory.
27+
- `add_cosmos(user_id, role, content, memory_type='turn', metadata=None, thread_id=None, tags=None, ttl=None, salience=None, embedding=None, embed=None) -> str` — upsert one memory to Cosmos and return its id.
28+
- `push_to_cosmos(batch_size=25) -> None` — flush local buffered memories to Cosmos.
29+
- `get_memories(memory_id=None, user_id=None, thread_id=None, role=None, memory_types=None, recent_k=None, tags=None, any_tags=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None) -> list[dict]` — retrieve memories with filters.
30+
- `update_cosmos(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a Cosmos memory.
31+
- `delete_cosmos(memory_id, thread_id, user_id) -> None` — delete a Cosmos memory.
32+
- `get_thread(thread_id, user_id=None, memory_types=None, recent_k=None, tags=None, exclude_tags=None, include_superseded=False) -> list[dict]` — retrieve a thread oldest-first.
33+
- `get_user_summary(user_id) -> Optional[dict]` — retrieve the active user-summary document.
34+
35+
### Retrieval
36+
37+
- `search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags=None, any_tags=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None) -> list[dict]` — vector or hybrid search memories.
38+
- `get_procedural_prompt(user_id) -> Optional[str]` — read the active procedural prompt.
39+
- `get_procedural_history(user_id, limit=10) -> list[dict]` — read procedural prompt history.
40+
- `get_procedural_memories(user_id, priority=None, category=None, min_salience=None, include_superseded=False) -> list[dict]` — retrieve procedural memory documents.
41+
- `search_episodic_memories(user_id, search_terms, top_k=5, min_salience=None, include_superseded=False) -> list[dict]` — search episodic memories.
42+
- `build_procedural_context(user_id) -> str` — format procedural context for prompts.
43+
- `build_episodic_context(user_id, query, top_k=3) -> str` — format relevant episodic context.
44+
45+
### Processing
46+
47+
- `extract_memories(user_id, thread_id, recent_k=None) -> dict[str, int]` — extract facts/episodic memories from a thread.
48+
- `synthesize_procedural(user_id, *, force=False) -> dict` — synthesize the procedural prompt.
49+
- `generate_thread_summary(user_id, thread_id, recent_k=None, **kwargs) -> dict` — generate and persist a thread summary.
50+
- `generate_user_summary(user_id, thread_ids=None, recent_k=None, **kwargs) -> dict` — generate and persist a user summary.
51+
- `reconcile(user_id, n=None) -> dict[str, int]` — reconcile duplicate or contradictory facts.
52+
- `process_now(*, user_id, thread_id) -> ProcessThreadResult` — run the configured processor immediately.
53+
- `process_now_and_wait(*, user_id, thread_id, timeout=30.0) -> bool` — process and wait for a summary.
54+
55+
### Tagging
56+
57+
- `add_tags(memory_id, user_id, thread_id, tags) -> None` — add tags to a memory.
58+
- `remove_tags(memory_id, user_id, thread_id, tags) -> None` — remove tags from a memory.
59+
60+
## AsyncCosmosMemoryClient
61+
62+
Local-buffer methods remain synchronous in-memory operations; Cosmos, retrieval, and processing methods are `async` and must be awaited.
63+
64+
### Connection
65+
66+
- `__init__(cosmos_endpoint=None, cosmos_credential=None, cosmos_key=None, cosmos_database=None, cosmos_container=None, cosmos_counter_container=None, cosmos_lease_container=None, cosmos_throughput_mode=None, cosmos_autoscale_max_ru=None, ai_foundry_endpoint=None, ai_foundry_credential=None, ai_foundry_api_key=None, embedding_deployment_name='text-embedding-3-large', embedding_dimensions=None, chat_deployment_name='gpt-4o-mini', use_default_credential=True, processor=None) -> None` — configure async local state, model clients, and optional processing backend.
67+
- `async close() -> None` — close async/sync resources and owned credentials.
68+
- `async connect_cosmos(endpoint=None, credential=None, key=None, database=None, container=None) -> None` — connect to an existing memory container.
69+
- `async create_memory_store(database=None, container=None, counter_container=None, lease_container=None, endpoint=None, credential=None, key=None, embedding_dimensions=None, embedding_data_type=None, distance_function=None, full_text_language=None, throughput_mode=None, autoscale_max_ru=None) -> None` — create/connect memory, counter, and lease containers.
70+
71+
### Memory CRUD
72+
73+
- `add_local(user_id, role, content, memory_type='turn', agent_id=None, metadata=None, thread_id=None, tags=None, ttl=None, salience=None) -> None` — append a memory to the local buffer.
74+
- `get_local(memory_id=None, user_id=None, role=None, memory_types=None) -> list[dict]` — filter local buffered memories.
75+
- `update_local(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a local buffered memory.
76+
- `delete_local(memory_id) -> None` — remove a local buffered memory.
77+
- `async add_cosmos(user_id, role, content, memory_type='turn', metadata=None, thread_id=None, tags=None, ttl=None, salience=None, embedding=None, embed=None) -> str` — upsert one memory to Cosmos and return its id.
78+
- `async push_to_cosmos(batch_size=25) -> None` — flush local buffered memories to Cosmos.
79+
- `async get_memories(memory_id=None, user_id=None, thread_id=None, role=None, memory_types=None, recent_k=None, tags=None, any_tags=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None) -> list[dict]` — retrieve memories with filters.
80+
- `async update_cosmos(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a Cosmos memory.
81+
- `async delete_cosmos(memory_id, thread_id, user_id) -> None` — delete a Cosmos memory.
82+
- `async get_thread(thread_id, user_id=None, memory_types=None, recent_k=None, tags=None, exclude_tags=None, include_superseded=False) -> list[dict]` — retrieve a thread oldest-first.
83+
- `async get_user_summary(user_id) -> Optional[dict]` — retrieve the active user-summary document.
84+
85+
### Retrieval
86+
87+
- `async search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags=None, any_tags=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None) -> list[dict]` — vector or hybrid search memories.
88+
- `async get_procedural_prompt(user_id) -> Optional[str]` — read the active procedural prompt.
89+
- `async get_procedural_history(user_id, limit=10) -> list[dict]` — read procedural prompt history.
90+
- `async get_procedural_memories(user_id, priority=None, category=None, min_salience=None, include_superseded=False) -> list[dict]` — retrieve procedural memory documents.
91+
- `async search_episodic_memories(user_id, search_terms, top_k=5, min_salience=None, include_superseded=False) -> list[dict]` — search episodic memories.
92+
- `async build_procedural_context(user_id) -> str` — format procedural context for prompts.
93+
- `async build_episodic_context(user_id, query, top_k=3) -> str` — format relevant episodic context.
94+
95+
### Processing
96+
97+
- `async extract_memories(user_id, thread_id, recent_k=None) -> dict[str, int]` — extract facts/episodic memories from a thread.
98+
- `async synthesize_procedural(user_id, *, force=False) -> dict` — synthesize the procedural prompt.
99+
- `async generate_thread_summary(user_id, thread_id, recent_k=None, **kwargs) -> dict` — generate and persist a thread summary.
100+
- `async generate_user_summary(user_id, thread_ids=None, recent_k=None, **kwargs) -> dict` — generate and persist a user summary.
101+
- `async reconcile(user_id, n=None) -> dict[str, int]` — reconcile duplicate or contradictory facts.
102+
- `async process_now(*, user_id, thread_id) -> ProcessThreadResult` — run the configured processor immediately.
103+
- `async process_now_and_wait(*, user_id, thread_id, timeout=30.0) -> bool` — process and wait for a summary.
104+
105+
### Tagging
106+
107+
- `async add_tags(memory_id, user_id, thread_id, tags) -> None` — add tags to a memory.
108+
- `async remove_tags(memory_id, user_id, thread_id, tags) -> None` — remove tags from a memory.
109+
110+
## Extension Points
111+
112+
Advanced users can subclass or swap service implementations by matching these protocols from `agent_memory_toolkit.services`:
113+
114+
- `LLMServiceProtocol` / `AsyncLLMServiceProtocol`: `chat`, `embed`, `embed_one`.
115+
- `RetrievalServiceProtocol` / `AsyncRetrievalServiceProtocol`: `search`, `get_memories`, `search_episodic`, `build_episodic_context`.
116+
- `PipelineServiceProtocol` / `AsyncPipelineServiceProtocol`: extraction, procedural synthesis, summaries, reconciliation, procedural context.
117+
- `ProcessorServiceProtocol` / `AsyncProcessorServiceProtocol`: processor access, counter access, immediate processing, wait, and auto-triggering.
118+
- Store layer protocols (`MemoryStoreProtocol`, `AsyncMemoryStoreProtocol`) define the persistence primitives consumed by services.
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
"""Shared client mixins."""
2+
3+
from agent_memory_toolkit._base.base_client import _BaseMemoryClient
4+
5+
__all__ = ["_BaseMemoryClient"]

0 commit comments

Comments
 (0)