Skip to content

Commit e0de0b1

Browse files
aayush3011Aayush KatariaCopilot
authored
Adding support for procedural memory, hardens the durable + SDK pipeline, and simplifies the deployment for Release (#15)
* Add procedural synthesis, strip per-turn procedural, fact_count rename Brings forward the procedural-synthesis feature on top of the post-PR-#13 baseline and applies hygiene cleanup. - Adds synthesize_procedural() on pipeline + processors (sync/async) and the Durable Functions orchestrator. Procedural memory is now produced via this dedicated reflection flow over a user's full history (system- prompt-style), not as a per-turn extraction. - Strips procedural emission from extract_memories: removes the procedural type from the extraction prompt, drops procedural_count from result dicts and docstrings, and trims procedural from _load_existing_memories. - Renames facts_count -> fact_count across pipeline, processors, FA orchestrator docstring, and all tests. - Cosmetic enum rename: supersede_reason value 'contradiction' -> 'contradict' for verb-tense consistency with 'update'. README and Docs/concepts.md updated accordingly. Unit suite: 531 passed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor store + services * updating record schema, adding logging * improvements to tagging, filtering * Improving the bicep files * resolving comments * resolving comments * lint: fix ruff E501/F841 violations and apply ruff format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * resolving comments * resolving comments * resolving comments * resolving comments --------- Co-authored-by: Aayush Kataria <aayushkataria@Aayushs-MacBook-Pro-2.local> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent af13306 commit e0de0b1

126 files changed

Lines changed: 14563 additions & 6607 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.gitignore

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,3 +35,10 @@ venv/
3535

3636
# macOS
3737
.DS_Store
38+
39+
# JetBrains IDEs
40+
.idea/
41+
42+
# Local scratch notes
43+
Agent Memory Toolkit.txt
44+
Cosmos AI Memory Service.txt

Docs/architecture/orchestrators.md

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
# Durable orchestrator activity chains
2+
3+
W15 splits LLM extraction from persistence so Durable retries after a Cosmos write failure do not re-run the LLM activity.
4+
5+
```text
6+
ExtractMemoriesOrchestrator
7+
em_Extract (load recent turns + LLM + parse; no embeddings/writes)
8+
em_Persist (embeddings + deterministic create_item; 409 = already persisted)
9+
em_ReconcileMemories (optional; single activity for GA)
10+
11+
ThreadSummaryOrchestrator
12+
ts_Extract
13+
ts_PersistSummary
14+
15+
UserSummaryOrchestrator
16+
us_Extract
17+
us_PersistUserSummary
18+
19+
SynthesizeProceduralOrchestrator
20+
sp_SynthesizeProcedural (single activity for GA)
21+
```
22+
23+
Fact and episodic IDs are deterministic from user, thread, and normalized content. Thread and user summaries keep their deterministic summary IDs.

Docs/concepts.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -119,7 +119,7 @@ Prompts for summarization and fact extraction live in `azure_functions/prompts/`
119119
The `reconcile_memories(user_id, n=50)` pipeline step reads up to N most-recent active facts for a user and asks the LLM to identify two orthogonal outcomes in one pass:
120120

121121
- **Duplicates** — two or more facts that restate the same claim in different words. Resolution: collapse into one merged fact; the originals are soft-deleted with `supersede_reason="duplicate"` and `superseded_by` set to the merged fact's id.
122-
- **Contradictions** — two facts that assert opposing claims about the same subject. Resolution: keep the winner (more recent first, higher confidence as tiebreaker), soft-delete the loser with `supersede_reason="contradiction"` and `superseded_by` set to the winner.
122+
- **Contradictions** — two facts that assert opposing claims about the same subject. Resolution: keep the winner (more recent first, higher confidence as tiebreaker), soft-delete the loser with `supersede_reason="contradict"` and `superseded_by` set to the winner.
123123

124124
### Why one pass
125125

Docs/operations.md

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
# Operations
2+
3+
Runtime knobs for an Agent-Memory-Toolkit deployment. Most ops levers live in `.env` / Function-app App Settings — change them, restart the consumer, and you're done. Deployment-time knobs (Bicep params bound to `azd env set ...`) live in [`infra/README.md`](../infra/README.md).
4+
5+
## Memory lifecycle (TTL)
6+
7+
| Type | Default TTL | Source |
8+
|---|---:|---|
9+
| turn | 30 d | container default (memories_turns) |
10+
| episodic | 90 d | per-doc ttl (memories container) |
11+
| thread_summary | never | container default (memories, -1) |
12+
| user_summary | never | container default |
13+
| fact | never | container default; supersession handles aging |
14+
| procedural | never | container default; supersession handles aging |
15+
16+
Override per write:
17+
18+
client.add_memory(text, type="turn", ttl=60) # expires in 60 seconds
19+
20+
Override per container at provision time:
21+
22+
azd env set MEMORIES_TURNS_DEFAULT_TTL 86400 # 1 day
23+
24+
## Counter-based trigger configuration
25+
26+
Function-app threshold knobs (`THREAD_SUMMARY_EVERY_N`, `FACT_EXTRACTION_EVERY_N`, `DEDUP_EVERY_N`, `USER_SUMMARY_EVERY_N`, `MAX_BATCH_SIZE`, `MEMORY_PROCESSOR_OWNER`) are documented in [`infra/README.md` → Counter-based trigger configuration](../infra/README.md#counter-based-trigger-configuration-function-app-only). Change them with `azd env set ...` then `azd up`.
27+

Docs/public_api.md

Lines changed: 123 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,123 @@
1+
# Public API
2+
3+
## Architecture
4+
5+
`CosmosMemoryClient` and `AsyncCosmosMemoryClient` are thin orchestrators. They keep local-buffer state and Cosmos connection lifecycle, then delegate persistence to `MemoryStore` / `AsyncMemoryStore` and higher-level behavior to:
6+
7+
- `ChatClient` / `EmbeddingsClient` (sync) and `AsyncEmbeddingsClient` (async) — Azure OpenAI wrappers.
8+
- `RetrievalService` / `AsyncRetrievalService` for filtering, vector search, and episodic context.
9+
- `PipelineService` for extraction, summaries, procedural synthesis, and reconciliation.
10+
- `InProcessProcessor` / `AsyncInProcessProcessor` / `DurableFunctionProcessor` for immediate or change-feed-driven processing.
11+
- `auto_trigger.maybe_trigger_steps` (sync) and `aio.auto_trigger.maybe_trigger_steps` (async) for threshold-driven step firing after each `push_to_cosmos`.
12+
13+
## CosmosMemoryClient (sync)
14+
15+
### Connection
16+
17+
- `__init__(cosmos_endpoint=None, cosmos_credential=None, cosmos_key=None, cosmos_database=None, cosmos_container=None, cosmos_turns_container=None, cosmos_counter_container=None, cosmos_lease_container=None, cosmos_throughput_mode=None, cosmos_autoscale_max_ru=None, ai_foundry_endpoint=None, ai_foundry_credential=None, ai_foundry_api_key=None, embedding_deployment_name='text-embedding-3-large', embedding_dimensions=None, chat_deployment_name='gpt-4o-mini', use_default_credential=True, processor=None) -> None` — configure local state, model clients, optional Cosmos auto-connect, and optional processing backend. When `cosmos_turns_container` is set, turn-type documents land in a dedicated container so the main `memories` container only fires the Durable change-feed trigger for processed memory writes.
18+
- `close() -> None` — close Cosmos/model clients and owned credentials.
19+
- `connect_cosmos(endpoint=None, credential=None, key=None, database=None, container=None, turns_container=None) -> None` — connect to an existing memory container.
20+
- `create_memory_store(database=None, container=None, turns_container=None, counter_container=None, lease_container=None, endpoint=None, credential=None, key=None, embedding_dimensions=None, embedding_data_type=None, distance_function=None, full_text_language=None, throughput_mode=None, autoscale_max_ru=None) -> None` — create/connect the memory, optional turns, counter, and lease containers.
21+
22+
### Memory CRUD
23+
24+
- `add_local(user_id, role, content, memory_type='turn', agent_id=None, metadata=None, thread_id=None, tags=None, ttl=None, salience=None) -> None` — append a memory to the local buffer.
25+
- `get_local(memory_id=None, user_id=None, role=None, memory_types=None) -> list[dict]` — filter local buffered memories.
26+
- `update_local(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a local buffered memory.
27+
- `delete_local(memory_id) -> None` — remove a local buffered memory.
28+
- `add_cosmos(user_id, role, content, memory_type='turn', metadata=None, thread_id=None, tags=None, ttl=None, salience=None, embedding=None, embed=None) -> str` — upsert one memory to Cosmos and return its id.
29+
- `push_to_cosmos(batch_size=25) -> None` — flush local buffered memories to Cosmos.
30+
- `get_memories(memory_id=None, user_id=None, thread_id=None, role=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — retrieve memories with filters.
31+
- `update_cosmos(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a Cosmos memory.
32+
- `delete_cosmos(memory_id, thread_id, user_id) -> None` — delete a Cosmos memory.
33+
- `get_thread(thread_id, user_id=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, created_after=None, created_before=None) -> list[dict]` — retrieve a thread oldest-first.
34+
- `get_user_summary(user_id) -> Optional[dict]` — retrieve the active user-summary document.
35+
36+
### Retrieval
37+
38+
- `search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search memories.
39+
- `get_procedural_prompt(user_id) -> Optional[str]` — read the active procedural prompt.
40+
- `get_procedural_history(user_id, limit=10) -> list[dict]` — read procedural prompt history.
41+
- `get_procedural_memories(user_id, priority=None, category=None, min_salience=None, include_superseded=False) -> list[dict]` — retrieve procedural memory documents.
42+
- `search_episodic_memories(user_id, search_terms, top_k=5, min_salience=None, include_superseded=False) -> list[dict]` — search episodic memories.
43+
- `build_procedural_context(user_id) -> str` — format procedural context for prompts.
44+
- `build_episodic_context(user_id, query, top_k=3) -> str` — format relevant episodic context.
45+
46+
### Processing
47+
48+
- `extract_memories(user_id, thread_id, recent_k=None) -> dict[str, int]` — extract facts/episodic memories from a thread.
49+
- `synthesize_procedural(user_id, *, force=False) -> dict` — synthesize the procedural prompt.
50+
- `generate_thread_summary(user_id, thread_id, recent_k=None, **kwargs) -> dict` — generate and persist a thread summary.
51+
- `generate_user_summary(user_id, thread_ids=None, recent_k=None, **kwargs) -> dict` — generate and persist a user summary.
52+
- `reconcile(user_id, n=None) -> dict[str, int]` — reconcile duplicate or contradictory facts.
53+
- `process_now(*, user_id, thread_id) -> ProcessThreadResult` — run the configured processor immediately.
54+
- `process_now_and_wait(*, user_id, thread_id, timeout=30.0) -> bool` — process and wait for a summary.
55+
56+
### Tagging
57+
58+
- `add_tags(memory_id, user_id, thread_id, tags) -> None` — add tags to a memory.
59+
- `remove_tags(memory_id, user_id, thread_id, tags) -> None` — remove tags from a memory.
60+
- `list_tags(user_id, *, thread_id=None, prefix=None, include_sys=False) -> list[str]` — list sorted, deduped tags for a user; omits `sys:*` by default.
61+
62+
## AsyncCosmosMemoryClient
63+
64+
Local-buffer methods remain synchronous in-memory operations; Cosmos, retrieval, and processing methods are `async` and must be awaited.
65+
66+
### Connection
67+
68+
- `__init__(cosmos_endpoint=None, cosmos_credential=None, cosmos_key=None, cosmos_database=None, cosmos_container=None, cosmos_turns_container=None, cosmos_counter_container=None, cosmos_lease_container=None, cosmos_throughput_mode=None, cosmos_autoscale_max_ru=None, ai_foundry_endpoint=None, ai_foundry_credential=None, ai_foundry_api_key=None, embedding_deployment_name='text-embedding-3-large', embedding_dimensions=None, chat_deployment_name='gpt-4o-mini', use_default_credential=True, processor=None) -> None` — configure async local state, model clients, and optional processing backend. When `cosmos_turns_container` is set, turn-type documents land in a dedicated container so the main `memories` container only fires the Durable change-feed trigger for processed memory writes.
69+
- `async close() -> None` — close async/sync resources and owned credentials.
70+
- `async connect_cosmos(endpoint=None, credential=None, key=None, database=None, container=None, turns_container=None) -> None` — connect to an existing memory container.
71+
- `async create_memory_store(database=None, container=None, turns_container=None, counter_container=None, lease_container=None, endpoint=None, credential=None, key=None, embedding_dimensions=None, embedding_data_type=None, distance_function=None, full_text_language=None, throughput_mode=None, autoscale_max_ru=None) -> None` — create/connect memory, optional turns, counter, and lease containers.
72+
73+
### Memory CRUD
74+
75+
- `add_local(user_id, role, content, memory_type='turn', agent_id=None, metadata=None, thread_id=None, tags=None, ttl=None, salience=None) -> None` — append a memory to the local buffer.
76+
- `get_local(memory_id=None, user_id=None, role=None, memory_types=None) -> list[dict]` — filter local buffered memories.
77+
- `update_local(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a local buffered memory.
78+
- `delete_local(memory_id) -> None` — remove a local buffered memory.
79+
- `async add_cosmos(user_id, role, content, memory_type='turn', metadata=None, thread_id=None, tags=None, ttl=None, salience=None, embedding=None, embed=None) -> str` — upsert one memory to Cosmos and return its id.
80+
- `async push_to_cosmos(batch_size=25) -> None` — flush local buffered memories to Cosmos.
81+
- `async get_memories(memory_id=None, user_id=None, thread_id=None, role=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — retrieve memories with filters.
82+
- `async update_cosmos(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a Cosmos memory.
83+
- `async delete_cosmos(memory_id, thread_id, user_id) -> None` — delete a Cosmos memory.
84+
- `async get_thread(thread_id, user_id=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, created_after=None, created_before=None) -> list[dict]` — retrieve a thread oldest-first.
85+
- `async get_user_summary(user_id) -> Optional[dict]` — retrieve the active user-summary document.
86+
87+
### Retrieval
88+
89+
- `async search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search memories.
90+
- `async get_procedural_prompt(user_id) -> Optional[str]` — read the active procedural prompt.
91+
- `async get_procedural_history(user_id, limit=10) -> list[dict]` — read procedural prompt history.
92+
- `async get_procedural_memories(user_id, priority=None, category=None, min_salience=None, include_superseded=False) -> list[dict]` — retrieve procedural memory documents.
93+
- `async search_episodic_memories(user_id, search_terms, top_k=5, min_salience=None, include_superseded=False) -> list[dict]` — search episodic memories.
94+
- `async build_procedural_context(user_id) -> str` — format procedural context for prompts.
95+
- `async build_episodic_context(user_id, query, top_k=3) -> str` — format relevant episodic context.
96+
97+
### Processing
98+
99+
- `async extract_memories(user_id, thread_id, recent_k=None) -> dict[str, int]` — extract facts/episodic memories from a thread.
100+
- `async synthesize_procedural(user_id, *, force=False) -> dict` — synthesize the procedural prompt.
101+
- `async generate_thread_summary(user_id, thread_id, recent_k=None, **kwargs) -> dict` — generate and persist a thread summary.
102+
- `async generate_user_summary(user_id, thread_ids=None, recent_k=None, **kwargs) -> dict` — generate and persist a user summary.
103+
- `async reconcile(user_id, n=None) -> dict[str, int]` — reconcile duplicate or contradictory facts.
104+
- `async process_now(*, user_id, thread_id) -> ProcessThreadResult` — run the configured processor immediately.
105+
- `async process_now_and_wait(*, user_id, thread_id, timeout=30.0) -> bool` — process and wait for a summary.
106+
107+
### Tagging
108+
109+
- `async add_tags(memory_id, user_id, thread_id, tags) -> None` — add tags to a memory.
110+
- `async remove_tags(memory_id, user_id, thread_id, tags) -> None` — remove tags from a memory.
111+
- `async list_tags(user_id, *, thread_id=None, prefix=None, include_sys=False) -> list[str]` — list sorted, deduped tags for a user; omits `sys:*` by default.
112+
113+
## Extension Points
114+
115+
Sync extension protocols live in `agent_memory_toolkit.services`; async variants live in `agent_memory_toolkit.aio.services`.
116+
117+
- `MemoryStoreProtocol` (`agent_memory_toolkit.services`): persistence primitives (`query`, `read_item`, `add_cosmos`, `mark_superseded`) consumed by the pipeline.
118+
119+
Concrete service classes are exported from their respective packages:
120+
121+
- Sync: `RetrievalService`, `PipelineService` from `agent_memory_toolkit.services` (sub-modules `retrieval`, `pipeline`).
122+
- Async: `AsyncRetrievalService` and `AsyncPipelineService` from `agent_memory_toolkit.aio.services` (sub-modules `retrieval`, `pipeline`). The async pipeline is a fully-native asyncio implementation — not an `asyncio.to_thread` shim over the sync pipeline.
123+
- Threshold-driven auto-trigger: `maybe_trigger_steps` from `agent_memory_toolkit.auto_trigger` (sync) and `agent_memory_toolkit.aio.auto_trigger` (async).

README.md

Lines changed: 4 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -68,9 +68,7 @@ cp .env.template .env
6868
# edit COSMOS_DB_ENDPOINT, AI_FOUNDRY_ENDPOINT, AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME, AI_FOUNDRY_CHAT_DEPLOYMENT_NAME
6969
```
7070

71-
You can also point `azd up` at existing resources via `azd env set USE_EXISTING_COSMOS true` / `USE_EXISTING_AI_FOUNDRY true` (full BYOR flag list in `infra/README.md`).
72-
73-
> For the Durable Function app counter-trigger settings, Bicep module reference, and RBAC scopes — see **[`infra/README.md`](infra/README.md)**.
71+
> For the Durable Function app counter-trigger settings, Bicep module reference, RBAC scopes, and the SDK-only escape hatch (`DEPLOY_FUNCTION_APP=false`) — see **[`infra/README.md`](infra/README.md)**.
7472
7573
### 3. Use the SDK
7674

@@ -170,14 +168,14 @@ high_conf_facts = memory.get_memories(user_id="u1", memory_types=["fact"], min_c
170168

171169
### Memory Reconciliation
172170

173-
`reconcile(user_id, n=50)` (on the public client; underlying pipeline method is `ProcessingPipeline.reconcile_memories`) collapses paraphrased duplicates and resolves semantic contradictions in a single LLM pass over the N most-recent active facts. Both outcomes soft-delete the loser with a `supersede_reason` of `"duplicate"` or `"contradiction"`. See [Docs/concepts.md](Docs/concepts.md#memory-reconciliation) for details.
171+
`reconcile(user_id, n=50)` (on the public client; underlying pipeline method is `ProcessingPipeline.reconcile_memories`) collapses paraphrased duplicates and resolves semantic contradictions in a single LLM pass over the N most-recent active facts. Both outcomes soft-delete the loser with a `supersede_reason` of `"duplicate"` or `"contradict"`. See [Docs/concepts.md](Docs/concepts.md#memory-reconciliation) for details.
174172

175173
> **Cost note.** Each reconciliation makes one LLM call covering up to `n` facts (default 50, hard cap 500). With auto-trigger, this fires every `FACT_EXTRACTION_EVERY_N × DEDUP_EVERY_N` turns per user, with `n` taken from `DEDUP_POOL_SIZE`. The previous cosine-cluster pre-filter was removed deliberately — it could not catch semantic contradictions like "vegetarian" vs "ribeye steak" — so the LLM is now invoked whenever there are ≥ 2 active facts. To bound LLM cost more tightly: raise `DEDUP_EVERY_N` (lower frequency — reconcile fires every Nth extraction, so a *higher* N means *less often*), lower `DEDUP_POOL_SIZE` (smaller per-call pool), or override `n` per call when invoking `reconcile()` directly.
176174
177175
| New `MemoryRecord` field | Meaning |
178176
|---|---|
179177
| `content_hash` | SHA-256 of normalized content; enables write-time exact-dedup short-circuit |
180-
| `supersede_reason` | `"duplicate"` or `"contradiction"` (None for live records) |
178+
| `supersede_reason` | `"duplicate"` or `"contradict"` (None for live records) |
181179
| `superseded_at` | ISO timestamp when the supersede happened (None for live records) |
182180
| `superseded_by` | Id of the record that replaced this one (existing field) |
183181

@@ -282,7 +280,7 @@ Async equivalents (`AsyncInProcessProcessor`, `AsyncDurableFunctionProcessor`) l
282280
- **[Docs/design_patterns.md](Docs/design_patterns.md)** — Integration patterns for chat apps and multi-agent systems
283281
- **[Docs/local_testing.md](Docs/local_testing.md)** — Prerequisites, environment setup, running locally, debugging
284282
- **[Docs/azure_testing.md](Docs/azure_testing.md)** — Azure deployment, RBAC, cloud validation
285-
- **[infra/README.md](infra/README.md)**`azd` deployment, Bicep modules, BYOR settings, counter-trigger tuning
283+
- **[infra/README.md](infra/README.md)**`azd` deployment, Bicep modules, RBAC, counter-trigger tuning, SDK-only mode
286284
- **[Docs/troubleshooting.md](Docs/troubleshooting.md)** — Common issues and resolutions for setup, auth, Cosmos DB, embeddings, Durable Functions, vector search, change feed, etc.
287285

288286
---

Samples/Advanced/advanced_search_patterns.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -140,7 +140,7 @@ def filtered_by_memory_type(mem: CosmosMemoryClient, user_id: str) -> None:
140140
results = mem.search_cosmos(
141141
search_terms="food preferences",
142142
user_id=user_id,
143-
memory_type="fact",
143+
memory_types=["fact"],
144144
top_k=3,
145145
)
146146
print_results(results)

0 commit comments

Comments
 (0)