|
| 1 | +# Public API |
| 2 | + |
| 3 | +## Architecture |
| 4 | + |
| 5 | +`CosmosMemoryClient` and `AsyncCosmosMemoryClient` are thin orchestrators. They own the local-buffer state and the Cosmos connection lifecycle, delegate persistence to `MemoryStore` / `AsyncMemoryStore`, and delegate higher-level behavior to: |
| 6 | + |
| 7 | +- `ChatClient` / `EmbeddingsClient` (sync) and `AsyncEmbeddingsClient` (async) — Azure OpenAI wrappers. |
| 8 | +- `RetrievalService` / `AsyncRetrievalService` for filtering, vector search, and episodic context. |
| 9 | +- `PipelineService` / `AsyncPipelineService` for extraction, summaries, procedural synthesis, and reconciliation. |
| 10 | +- `InProcessProcessor` / `AsyncInProcessProcessor` / `DurableFunctionProcessor` for immediate or change-feed-driven processing. |
| 11 | +- `auto_trigger.maybe_trigger_steps` (sync) and `aio.auto_trigger.maybe_trigger_steps` (async) for threshold-driven step firing after each `push_to_cosmos`. |
| 12 | + |
| 13 | +## CosmosMemoryClient (sync) |
| 14 | + |
| 15 | +### Connection |
| 16 | + |
| 17 | +- `__init__(cosmos_endpoint=None, cosmos_credential=None, cosmos_key=None, cosmos_database=None, cosmos_container=None, cosmos_turns_container=None, cosmos_counter_container=None, cosmos_lease_container=None, cosmos_throughput_mode=None, cosmos_autoscale_max_ru=None, ai_foundry_endpoint=None, ai_foundry_credential=None, ai_foundry_api_key=None, embedding_deployment_name='text-embedding-3-large', embedding_dimensions=None, chat_deployment_name='gpt-4o-mini', use_default_credential=True, processor=None) -> None` — configure local state, model clients, optional Cosmos auto-connect, and optional processing backend. When `cosmos_turns_container` is set, turn-type documents land in a dedicated container so the main `memories` container only fires the Durable change-feed trigger for processed memory writes. |
| 18 | +- `close() -> None` — close Cosmos/model clients and owned credentials. |
| 19 | +- `connect_cosmos(endpoint=None, credential=None, key=None, database=None, container=None, turns_container=None) -> None` — connect to an existing memory container. |
| 20 | +- `create_memory_store(database=None, container=None, turns_container=None, counter_container=None, lease_container=None, endpoint=None, credential=None, key=None, embedding_dimensions=None, embedding_data_type=None, distance_function=None, full_text_language=None, throughput_mode=None, autoscale_max_ru=None) -> None` — create/connect the memory, optional turns, counter, and lease containers. |
| 21 | + |
| 22 | +### Memory CRUD |
| 23 | + |
| 24 | +- `add_local(user_id, role, content, memory_type='turn', agent_id=None, metadata=None, thread_id=None, tags=None, ttl=None, salience=None) -> None` — append a memory to the local buffer. |
| 25 | +- `get_local(memory_id=None, user_id=None, role=None, memory_types=None) -> list[dict]` — filter local buffered memories. |
| 26 | +- `update_local(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a local buffered memory. |
| 27 | +- `delete_local(memory_id) -> None` — remove a local buffered memory. |
| 28 | +- `add_cosmos(user_id, role, content, memory_type='turn', metadata=None, thread_id=None, tags=None, ttl=None, salience=None, embedding=None, embed=None) -> str` — upsert one memory to Cosmos and return its id. |
| 29 | +- `push_to_cosmos(batch_size=25) -> None` — flush local buffered memories to Cosmos. |
| 30 | +- `get_memories(memory_id=None, user_id=None, thread_id=None, role=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — retrieve memories with filters. |
| 31 | +- `update_cosmos(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a Cosmos memory. |
| 32 | +- `delete_cosmos(memory_id, thread_id, user_id) -> None` — delete a Cosmos memory. |
| 33 | +- `get_thread(thread_id, user_id=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, created_after=None, created_before=None) -> list[dict]` — retrieve a thread oldest-first. |
| 34 | +- `get_user_summary(user_id) -> Optional[dict]` — retrieve the active user-summary document. |
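
The buffer-then-flush flow behind `add_local`, `get_local`, and `push_to_cosmos` can be pictured with a small stand-in. This is a minimal sketch, not the toolkit's implementation: `FakeStore`, `BufferedClient`, and `upsert_batch` are hypothetical names, and the assumption that a flush writes chunks of `batch_size` and then empties the buffer is inferred from the signature alone.

```python
import uuid
from typing import Optional


class FakeStore:
    """Hypothetical in-memory stand-in for the Cosmos-backed store."""

    def __init__(self) -> None:
        self.batches: list[list[dict]] = []

    def upsert_batch(self, docs: list[dict]) -> None:
        # Record each batch so the chunking is observable.
        self.batches.append(list(docs))


class BufferedClient:
    """Illustrative only: reuses the documented method names, nothing more."""

    def __init__(self, store: FakeStore) -> None:
        self._store = store
        self._buffer: list[dict] = []

    def add_local(self, user_id: str, role: str, content: str,
                  memory_type: str = "turn",
                  thread_id: Optional[str] = None) -> None:
        self._buffer.append({
            "id": str(uuid.uuid4()),
            "user_id": user_id,
            "role": role,
            "content": content,
            "memory_type": memory_type,
            "thread_id": thread_id,
        })

    def get_local(self, user_id: Optional[str] = None,
                  role: Optional[str] = None) -> list[dict]:
        # A filter left as None matches everything.
        return [
            doc for doc in self._buffer
            if (user_id is None or doc["user_id"] == user_id)
            and (role is None or doc["role"] == role)
        ]

    def push_to_cosmos(self, batch_size: int = 25) -> None:
        # Assumption: write in chunks of batch_size, then clear the buffer.
        for start in range(0, len(self._buffer), batch_size):
            self._store.upsert_batch(self._buffer[start:start + batch_size])
        self._buffer.clear()


store = FakeStore()
client = BufferedClient(store)
for i in range(5):
    client.add_local("u1", "user", f"message {i}", thread_id="t1")
buffered_before = len(client.get_local(user_id="u1"))
client.push_to_cosmos(batch_size=2)
```

With five buffered turns and `batch_size=2`, the stand-in store receives three batches and the local buffer ends up empty.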
| 35 | + |
| 36 | +### Retrieval |
| 37 | + |
| 38 | +- `search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search memories. |
| 39 | +- `get_procedural_prompt(user_id) -> Optional[str]` — read the active procedural prompt. |
| 40 | +- `get_procedural_history(user_id, limit=10) -> list[dict]` — read procedural prompt history. |
| 41 | +- `get_procedural_memories(user_id, priority=None, category=None, min_salience=None, include_superseded=False) -> list[dict]` — retrieve procedural memory documents. |
| 42 | +- `search_episodic_memories(user_id, search_terms, top_k=5, min_salience=None, include_superseded=False) -> list[dict]` — search episodic memories. |
| 43 | +- `build_procedural_context(user_id) -> str` — format procedural context for prompts. |
| 44 | +- `build_episodic_context(user_id, query, top_k=3) -> str` — format relevant episodic context. |
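
The `tags_all`, `tags_any`, and `exclude_tags` filters accepted by `get_memories`, `get_thread`, and `search_cosmos` are not spelled out above. The sketch below shows one plausible reading based only on the parameter names: every tag in `tags_all` must be present, at least one tag in `tags_any` must be present, and no tag in `exclude_tags` may be present. `matches_tags` is a hypothetical helper for illustration; the toolkit may well evaluate these filters inside the Cosmos query instead.

```python
from typing import Iterable, Optional


def matches_tags(doc_tags: Iterable[str],
                 tags_all: Optional[list[str]] = None,
                 tags_any: Optional[list[str]] = None,
                 exclude_tags: Optional[list[str]] = None) -> bool:
    """Assumed semantics only: AND over tags_all, OR over tags_any, NOT over exclude_tags."""
    tags = set(doc_tags)
    if tags_all and not set(tags_all) <= tags:
        return False  # a required tag is missing
    if tags_any and not (tags & set(tags_any)):
        return False  # none of the optional tags matched
    if exclude_tags and (tags & set(exclude_tags)):
        return False  # an excluded tag is present
    return True


docs = [
    {"id": "m1", "tags": ["topic:food", "lang:en"]},
    {"id": "m2", "tags": ["topic:travel", "lang:en"]},
    {"id": "m3", "tags": ["topic:food", "draft"]},
]
selected = [
    d["id"] for d in docs
    if matches_tags(d["tags"], tags_all=["topic:food"], exclude_tags=["draft"])
]
```

Under these assumed semantics, only `m1` carries `topic:food` without also carrying `draft`.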
| 45 | + |
| 46 | +### Processing |
| 47 | + |
| 48 | +- `extract_memories(user_id, thread_id, recent_k=None) -> dict[str, int]` — extract facts/episodic memories from a thread. |
| 49 | +- `synthesize_procedural(user_id, *, force=False) -> dict` — synthesize the procedural prompt. |
| 50 | +- `generate_thread_summary(user_id, thread_id, recent_k=None, **kwargs) -> dict` — generate and persist a thread summary. |
| 51 | +- `generate_user_summary(user_id, thread_ids=None, recent_k=None, **kwargs) -> dict` — generate and persist a user summary. |
| 52 | +- `reconcile(user_id, n=None) -> dict[str, int]` — reconcile duplicate or contradictory facts. |
| 53 | +- `process_now(*, user_id, thread_id) -> ProcessThreadResult` — run the configured processor immediately. |
| 54 | +- `process_now_and_wait(*, user_id, thread_id, timeout=30.0) -> bool` — process and wait for a summary. |
| 55 | + |
| 56 | +### Tagging |
| 57 | + |
| 58 | +- `add_tags(memory_id, user_id, thread_id, tags) -> None` — add tags to a memory. |
| 59 | +- `remove_tags(memory_id, user_id, thread_id, tags) -> None` — remove tags from a memory. |
| 60 | +- `list_tags(user_id, *, thread_id=None, prefix=None, include_sys=False) -> list[str]` — list sorted, deduped tags for a user; omits `sys:*` by default. |
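
The `list_tags` contract (sorted, deduped, `sys:*` hidden unless `include_sys=True`) can be expressed over plain dictionaries. This is a sketch of the described behavior rather than the toolkit's code: `list_tags_sketch` is a hypothetical name, and reading `prefix` as a simple `str.startswith` filter is an assumption.

```python
from typing import Iterable, Optional


def list_tags_sketch(docs: Iterable[dict], *, prefix: Optional[str] = None,
                     include_sys: bool = False) -> list[str]:
    """Collect unique tags from docs and return them sorted."""
    seen: set[str] = set()
    for doc in docs:
        for tag in doc.get("tags") or []:
            if not include_sys and tag.startswith("sys:"):
                continue  # system tags are hidden by default
            if prefix is not None and not tag.startswith(prefix):
                continue  # assumption: prefix is a plain startswith match
            seen.add(tag)
    return sorted(seen)


docs = [
    {"tags": ["topic:food", "sys:auto"]},
    {"tags": ["topic:food", "lang:en"]},
    {},  # documents without tags are tolerated
]
default_tags = list_tags_sketch(docs)
all_tags = list_tags_sketch(docs, include_sys=True)
topic_tags = list_tags_sketch(docs, prefix="topic:")
```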
| 61 | + |
| 62 | +## AsyncCosmosMemoryClient |
| 63 | + |
| 64 | +Local-buffer methods remain synchronous in-memory operations; Cosmos, retrieval, and processing methods are `async` and must be awaited. |
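
The sync/async split described above looks like this at a call site. `FakeAsyncClient` is a hypothetical stand-in so the snippet runs without Azure resources; only the method names and which of them are awaited follow this document, and the bodies are placeholders.

```python
import asyncio


class FakeAsyncClient:
    """Hypothetical stand-in; not the real AsyncCosmosMemoryClient."""

    def __init__(self) -> None:
        self._buffer: list[dict] = []
        self.persisted: list[dict] = []
        self.closed = False

    def add_local(self, user_id: str, role: str, content: str) -> None:
        # Plain in-memory append, so no await is needed.
        self._buffer.append({"user_id": user_id, "role": role, "content": content})

    async def push_to_cosmos(self) -> None:
        await asyncio.sleep(0)  # placeholder for network I/O
        self.persisted.extend(self._buffer)
        self._buffer.clear()

    async def close(self) -> None:
        await asyncio.sleep(0)  # placeholder for releasing connections
        self.closed = True


async def main() -> FakeAsyncClient:
    client = FakeAsyncClient()
    try:
        client.add_local("u1", "user", "hello")    # sync: called directly
        client.add_local("u1", "assistant", "hi")
        await client.push_to_cosmos()              # async: must be awaited
    finally:
        await client.close()                       # close is async as well
    return client


client = asyncio.run(main())
```

Wrapping the work in `try`/`finally` is one way to make sure `close()` is awaited even when a Cosmos call raises.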
| 65 | + |
| 66 | +### Connection |
| 67 | + |
| 68 | +- `__init__(cosmos_endpoint=None, cosmos_credential=None, cosmos_key=None, cosmos_database=None, cosmos_container=None, cosmos_turns_container=None, cosmos_counter_container=None, cosmos_lease_container=None, cosmos_throughput_mode=None, cosmos_autoscale_max_ru=None, ai_foundry_endpoint=None, ai_foundry_credential=None, ai_foundry_api_key=None, embedding_deployment_name='text-embedding-3-large', embedding_dimensions=None, chat_deployment_name='gpt-4o-mini', use_default_credential=True, processor=None) -> None` — configure async local state, model clients, and optional processing backend. When `cosmos_turns_container` is set, turn-type documents land in a dedicated container so the main `memories` container only fires the Durable change-feed trigger for processed memory writes. |
| 69 | +- `async close() -> None` — close async/sync resources and owned credentials. |
| 70 | +- `async connect_cosmos(endpoint=None, credential=None, key=None, database=None, container=None, turns_container=None) -> None` — connect to an existing memory container. |
| 71 | +- `async create_memory_store(database=None, container=None, turns_container=None, counter_container=None, lease_container=None, endpoint=None, credential=None, key=None, embedding_dimensions=None, embedding_data_type=None, distance_function=None, full_text_language=None, throughput_mode=None, autoscale_max_ru=None) -> None` — create/connect memory, optional turns, counter, and lease containers. |
| 72 | + |
| 73 | +### Memory CRUD |
| 74 | + |
| 75 | +- `add_local(user_id, role, content, memory_type='turn', agent_id=None, metadata=None, thread_id=None, tags=None, ttl=None, salience=None) -> None` — append a memory to the local buffer. |
| 76 | +- `get_local(memory_id=None, user_id=None, role=None, memory_types=None) -> list[dict]` — filter local buffered memories. |
| 77 | +- `update_local(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a local buffered memory. |
| 78 | +- `delete_local(memory_id) -> None` — remove a local buffered memory. |
| 79 | +- `async add_cosmos(user_id, role, content, memory_type='turn', metadata=None, thread_id=None, tags=None, ttl=None, salience=None, embedding=None, embed=None) -> str` — upsert one memory to Cosmos and return its id. |
| 80 | +- `async push_to_cosmos(batch_size=25) -> None` — flush local buffered memories to Cosmos. |
| 81 | +- `async get_memories(memory_id=None, user_id=None, thread_id=None, role=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — retrieve memories with filters. |
| 82 | +- `async update_cosmos(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a Cosmos memory. |
| 83 | +- `async delete_cosmos(memory_id, thread_id, user_id) -> None` — delete a Cosmos memory. |
| 84 | +- `async get_thread(thread_id, user_id=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, created_after=None, created_before=None) -> list[dict]` — retrieve a thread oldest-first. |
| 85 | +- `async get_user_summary(user_id) -> Optional[dict]` — retrieve the active user-summary document. |
| 86 | + |
| 87 | +### Retrieval |
| 88 | + |
| 89 | +- `async search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search memories. |
| 90 | +- `async get_procedural_prompt(user_id) -> Optional[str]` — read the active procedural prompt. |
| 91 | +- `async get_procedural_history(user_id, limit=10) -> list[dict]` — read procedural prompt history. |
| 92 | +- `async get_procedural_memories(user_id, priority=None, category=None, min_salience=None, include_superseded=False) -> list[dict]` — retrieve procedural memory documents. |
| 93 | +- `async search_episodic_memories(user_id, search_terms, top_k=5, min_salience=None, include_superseded=False) -> list[dict]` — search episodic memories. |
| 94 | +- `async build_procedural_context(user_id) -> str` — format procedural context for prompts. |
| 95 | +- `async build_episodic_context(user_id, query, top_k=3) -> str` — format relevant episodic context. |
| 96 | + |
| 97 | +### Processing |
| 98 | + |
| 99 | +- `async extract_memories(user_id, thread_id, recent_k=None) -> dict[str, int]` — extract facts/episodic memories from a thread. |
| 100 | +- `async synthesize_procedural(user_id, *, force=False) -> dict` — synthesize the procedural prompt. |
| 101 | +- `async generate_thread_summary(user_id, thread_id, recent_k=None, **kwargs) -> dict` — generate and persist a thread summary. |
| 102 | +- `async generate_user_summary(user_id, thread_ids=None, recent_k=None, **kwargs) -> dict` — generate and persist a user summary. |
| 103 | +- `async reconcile(user_id, n=None) -> dict[str, int]` — reconcile duplicate or contradictory facts. |
| 104 | +- `async process_now(*, user_id, thread_id) -> ProcessThreadResult` — run the configured processor immediately. |
| 105 | +- `async process_now_and_wait(*, user_id, thread_id, timeout=30.0) -> bool` — process and wait for a summary. |
| 106 | + |
| 107 | +### Tagging |
| 108 | + |
| 109 | +- `async add_tags(memory_id, user_id, thread_id, tags) -> None` — add tags to a memory. |
| 110 | +- `async remove_tags(memory_id, user_id, thread_id, tags) -> None` — remove tags from a memory. |
| 111 | +- `async list_tags(user_id, *, thread_id=None, prefix=None, include_sys=False) -> list[str]` — list sorted, deduped tags for a user; omits `sys:*` by default. |
| 112 | + |
| 113 | +## Extension Points |
| 114 | + |
| 115 | +Sync extension protocols live in `agent_memory_toolkit.services`; async variants live in `agent_memory_toolkit.aio.services`. |
| 116 | + |
| 117 | +- `MemoryStoreProtocol` (`agent_memory_toolkit.services`): persistence primitives (`query`, `read_item`, `add_cosmos`, `mark_superseded`) consumed by the pipeline. |
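
A custom store only has to provide the four primitives named above. The sketch below shows the general shape using `typing.Protocol`. The method names come from this document, but every parameter list and return type is an assumption, as are the names `MemoryStoreLike` and `InMemoryStore`; check the real `MemoryStoreProtocol` before implementing against it.

```python
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class MemoryStoreLike(Protocol):
    """Hypothetical mirror of the protocol; all signatures are assumed."""

    def query(self, query: str, parameters: Optional[list[dict]] = None) -> list[dict]: ...
    def read_item(self, item_id: str, partition_key: Any = None) -> Optional[dict]: ...
    def add_cosmos(self, **fields: Any) -> str: ...
    def mark_superseded(self, memory_id: str, superseded_by: str) -> None: ...


class InMemoryStore:
    """Dictionary-backed store, handy for unit-testing pipeline logic."""

    def __init__(self) -> None:
        self._items: dict[str, dict] = {}

    def query(self, query: str, parameters: Optional[list[dict]] = None) -> list[dict]:
        # The query text is ignored here; a real store would execute it.
        return list(self._items.values())

    def read_item(self, item_id: str, partition_key: Any = None) -> Optional[dict]:
        return self._items.get(item_id)

    def add_cosmos(self, **fields: Any) -> str:
        memory_id = fields.get("id") or f"mem-{len(self._items) + 1}"
        self._items[memory_id] = {**fields, "id": memory_id}
        return memory_id

    def mark_superseded(self, memory_id: str, superseded_by: str) -> None:
        self._items[memory_id]["superseded_by"] = superseded_by


store = InMemoryStore()
old_id = store.add_cosmos(content="likes tea")
new_id = store.add_cosmos(content="likes coffee")
store.mark_superseded(old_id, new_id)
```

Because the protocol is structural, `InMemoryStore` satisfies it without inheriting from it. `runtime_checkable` only verifies that the methods exist, not that their signatures match.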
| 118 | + |
| 119 | +Concrete service classes are exported from their respective packages: |
| 120 | + |
| 121 | +- Sync: `RetrievalService`, `PipelineService` from `agent_memory_toolkit.services` (sub-modules `retrieval`, `pipeline`). |
| 122 | +- Async: `AsyncRetrievalService` and `AsyncPipelineService` from `agent_memory_toolkit.aio.services` (sub-modules `retrieval`, `pipeline`). The async pipeline is a fully-native asyncio implementation — not an `asyncio.to_thread` shim over the sync pipeline. |
| 123 | +- Threshold-driven auto-trigger: `maybe_trigger_steps` from `agent_memory_toolkit.auto_trigger` (sync) and `agent_memory_toolkit.aio.auto_trigger` (async). |
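
Threshold-driven firing can be illustrated with a counter per step. This is a sketch of the general idea only: `maybe_trigger_sketch`, the `counts` / `thresholds` / `steps` dictionaries, and the reset-to-zero behavior are all assumptions, and the real `maybe_trigger_steps` signature is not documented here.

```python
from typing import Callable


def maybe_trigger_sketch(counts: dict[str, int],
                         thresholds: dict[str, int],
                         steps: dict[str, Callable[[], None]]) -> list[str]:
    """Run each step whose counter has reached its threshold."""
    fired: list[str] = []
    for name, threshold in thresholds.items():
        if counts.get(name, 0) >= threshold:
            steps[name]()
            counts[name] = 0  # assumption: the counter resets once the step runs
            fired.append(name)
    return fired


calls: list[str] = []
counts = {"extract": 5, "summary": 2}   # e.g. turns written since each step last ran
thresholds = {"extract": 5, "summary": 10}
steps = {
    "extract": lambda: calls.append("extract"),
    "summary": lambda: calls.append("summary"),
}
fired = maybe_trigger_sketch(counts, thresholds, steps)
```

Here only the `extract` step fires, because its counter has reached its threshold while `summary` is still below its own.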