@@ -59,7 +59,7 @@ tests/ Unit + integration tests (pytest)
59
59
60
60
---
61
61
62
-
## Quick Start
62
+
## Quickstart
63
63
64
64
### 1. Install
65
65
@@ -70,92 +70,102 @@ pip install .
70
70
pip install ".[dev]"
71
71
```
72
72
73
-
### 2. Local-only (no Azure)
73
+
### 2. Provision Azure resources
74
74
75
-
```python
76
-
import uuid
77
-
from agent_memory_toolkit import CosmosMemoryClient
75
+
The toolkit needs a Cosmos DB account, an Azure OpenAI / AI Foundry deployment, and, optionally (for the remote processor), an Azure Function app. Pick whichever path matches your situation:
76
+
77
+
**Option A — One-command provision (`azd up`).** Creates everything from scratch — Cosmos + AI Foundry + Function app (Flex Consumption, idle cost ≈ $0) + UAMI + RBAC — and writes a working `.env` to `.azure/<env>/.env`:
78
+
79
+
```bash
80
+
# Prereqs: az + azd installed; subscription with quota for gpt-4o-mini
81
+
# and text-embedding-3-large in your chosen region (default: eastus2,
# ~10 min later: Cosmos account + AI Foundry account + 2 model deployments
93
+
# (gpt-4o-mini, text-embedding-3-large) + UAMI + RBAC + Function app
94
+
# are provisioned. Outputs are written to .azure/memorytoolkit-dev/.env
83
95
```
84
96
85
-
### 3. With Cosmos DB + Azure OpenAI
97
+
The Function app is always provisioned but only used when you opt into `DurableFunctionProcessor` — it sits idle (and bills nothing) for in-process workloads.
98
+
99
+
Load the generated env vars and you're ready to use the SDK:
86
100
87
101
```bash
88
-
cp .env.template .env # fill in endpoint values
102
+
set -a && . ./.azure/memorytoolkit-dev/.env && set +a
89
103
```
90
104
105
+
To tear everything down later: `azd down --purge` (the `--purge` flag skips Cosmos / AI Foundry soft-delete so names are immediately reusable).
106
+
107
+
**Option B — Bring your own resources.** If you already have a Cosmos DB account and an AI Foundry / Azure OpenAI deployment, copy the env template and fill in the endpoints:
You can also point `azd up` at existing resources via `azd env set USE_EXISTING_COSMOS true` / `USE_EXISTING_AI_FOUNDRY true` (full BYOR flag list in `infra/README.md`).
115
+
116
+
> For the Durable Function app counter-trigger settings, the Bicep module reference, and RBAC scopes, see **[`infra/README.md`](infra/README.md)**.
117
+
118
+
### 3. Use the SDK
119
+
91
120
```python
92
121
import os, uuid
93
122
from dotenv import load_dotenv
94
-
from azure.identity import DefaultAzureCredential
95
123
from agent_memory_toolkit import CosmosMemoryClient
The [`infra/`](infra/) folder contains the Bicep templates and Azure Developer CLI (`azd`) configuration for provisioning the toolkit's Azure resources end-to-end. It deploys Cosmos DB for NoSQL, AI Foundry model deployments, managed identity and RBAC assignments, and the Azure Function app used for Durable Functions processing.
164
+
```bash
165
+
python Samples/quickstart_cosmos.py
166
+
```
157
167
158
-
For deployment prerequisites, configuration options, bring-your-own-resource settings, and cleanup commands, see the [infra README](infra/README.md).
168
+
See [`Samples/`](Samples/) for end-to-end scenarios (chat memory, RAG, multi-agent, customer support, remote processor).
159
169
160
170
---
161
171
@@ -173,6 +183,148 @@ All services use **Entra ID** auth via `DefaultAzureCredential`.
173
183
174
184
---
175
185
186
+
## Concepts in 60 seconds
187
+
188
+
| Concept | What it is | API |
|---|---|---|
| **Turn** | One message (user or assistant); the raw conversation atom | `add_cosmos(...)`, `add_local(...)` |
| **Thread summary** | LLM-generated, incrementally updated rollup of a single thread | `generate_thread_summary(...)` |
| **Fact** | Discrete, independently searchable assertion extracted from turns | `extract_memories(...)` |
| **Procedural** | Behavioral rule / instruction the user wants followed | `extract_memories(...)` |
| **User summary** | Cross-thread profile of what's known about a user | `generate_user_summary(...)`, `get_user_summary(...)` |
| **Search** | Vector + full-text + filter; returns any of the above | `search_cosmos(...)` |
| **Process now** | Run the full pipeline (summary → facts → user profile) for recent turns | `process_now(...)`, `process_now_and_wait(...)` |
198
+
199
+
All memory kinds live in the same Cosmos container, partitioned by `(user_id, thread_id)`, distinguished by a `type` discriminator.
200
+
201
+
### Memory Type Taxonomy
202
+
203
+
The `extract_memories` pipeline classifies each item it pulls from the conversation into one of four buckets. Every memory carries a top-level `confidence` (0.0–1.0) so retrieval can suppress weakly-grounded extractions.
`reconcile(user_id, n=50)` (on the public client; underlying pipeline method is `ProcessingPipeline.reconcile_memories`) collapses paraphrased duplicates and resolves semantic contradictions in a single LLM pass over the N most-recent active facts. Both outcomes soft-delete the loser with a `supersede_reason` of `"duplicate"` or `"contradiction"`. See [Docs/concepts.md](Docs/concepts.md#memory-reconciliation) for details.
231
+
232
+
> **Cost note.** Each reconciliation makes one LLM call covering up to `n` facts (default 50, hard cap 500). With auto-trigger, this fires every `FACT_EXTRACTION_EVERY_N × DEDUP_EVERY_N` turns per user, with `n` taken from `DEDUP_POOL_SIZE`. The previous cosine-cluster pre-filter was removed deliberately — it could not catch semantic contradictions like "vegetarian" vs "ribeye steak" — so the LLM is now invoked whenever there are ≥ 2 active facts. To bound LLM cost more tightly: raise `DEDUP_EVERY_N` (lower frequency — reconcile fires every Nth extraction, so a *higher* N means *less often*), lower `DEDUP_POOL_SIZE` (smaller per-call pool), or override `n` per call when invoking `reconcile()` directly.
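
The cadence arithmetic in the cost note can be sketched as follows. This is a back-of-the-envelope estimator, not toolkit code: the function names `reconcile_cadence` and `reconcile_calls` are hypothetical, and it assumes that disabling extraction (`FACT_EXTRACTION_EVERY_N=0`) also means reconcile never auto-fires, since reconcile is counted in extractions.

```python
def reconcile_cadence(fact_extraction_every_n: int = 1,
                      dedup_every_n: int = 5,
                      dedup_pool_size: int = 50,
                      hard_cap: int = 500) -> dict:
    """Estimate how often auto-triggered reconcile fires and how many facts it covers."""
    if fact_extraction_every_n <= 0 or dedup_every_n <= 0:
        # Assumption: a threshold of 0 disables the step, so reconcile never auto-fires.
        return {"turns_between_reconciles": None, "facts_per_call": 0}
    return {
        # Reconcile fires every Nth extraction, and extraction fires every Mth turn.
        "turns_between_reconciles": fact_extraction_every_n * dedup_every_n,
        # The pool size is capped at the documented hard limit.
        "facts_per_call": min(dedup_pool_size, hard_cap),
    }


def reconcile_calls(total_turns: int, **settings) -> int:
    """Number of reconcile LLM calls one user triggers over total_turns turns."""
    cadence = reconcile_cadence(**settings)["turns_between_reconciles"]
    return 0 if cadence is None else total_turns // cadence
```

With the documented defaults, 1,000 turns from one user would trigger roughly 200 reconcile calls; raising `DEDUP_EVERY_N` to `10` halves that.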
233
+
234
+
| New `MemoryRecord` field | Meaning |
|---|---|
| `content_hash` | SHA-256 of normalized content; enables write-time exact-dedup short-circuit |
| `supersede_reason` | `"duplicate"` or `"contradiction"` (`None` for live records) |
| `superseded_at` | ISO timestamp when the supersede happened (`None` for live records) |
| `superseded_by` | Id of the record that replaced this one (existing field) |
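
To illustrate how a `content_hash` can short-circuit exact duplicates at write time, here is a minimal sketch. The normalization shown (Unicode NFKC, case folding, whitespace collapsing) is an assumption for illustration; the toolkit's actual normalization rules are not stated here, and the helper names are hypothetical.

```python
import hashlib
import unicodedata


def content_hash(content: str) -> str:
    """SHA-256 hex digest of normalized content."""
    # Assumed normalization: NFKC, case-insensitive, single-spaced.
    normalized = unicodedata.normalize("NFKC", content).casefold()
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_exact_duplicate(content: str, existing_hashes: set) -> bool:
    """True when an already-stored record has the same normalized content."""
    return content_hash(content) in existing_hashes
```

A hash match only catches exact (post-normalization) duplicates; paraphrases and contradictions still need the LLM reconciliation pass described above.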
240
+
241
+
### Auto-trigger (per-turn extraction)
242
+
243
+
By default, the **InProcess processor** runs each pipeline step independently as its own threshold trips inside `push_to_cosmos()`:
244
+
245
+
| Env var | Default | Step that fires | Async behavior |
|---|---|---|---|
| `FACT_EXTRACTION_EVERY_N` | `1` (every turn) | `process_extract_memories` | scheduled via `asyncio.create_task` |
| `DEDUP_EVERY_N` | `5` | `process_reconcile` (fires every Nth extract → effectively every `FACT_EXTRACTION_EVERY_N × DEDUP_EVERY_N` turns) | scheduled via `asyncio.create_task` |
| `DEDUP_POOL_SIZE` | `50` | pool size (`n`) passed to `process_reconcile` from the auto-trigger; hard-capped at `500` | n/a (per-call) |
| `THREAD_SUMMARY_EVERY_N` | `10` | `process_thread_summary` | scheduled via `asyncio.create_task` |
| `USER_SUMMARY_EVERY_N` | `20` | `process_user_summary` | scheduled via `asyncio.create_task` |
252
+
253
+
Setting any `*_EVERY_N` to `0` disables only that step. Dedup is gated independently of extraction because cross-thread dedup is far more expensive than per-thread extraction (it reads every active fact for the user), and running it on every extraction overloaded AI Foundry. The Durable backend uses the same defaults via the change-feed function app, except that the function-app `azd` deploy raises `FACT_EXTRACTION_EVERY_N` to `5`, since the function-app path is intended for higher-volume workloads. Calling `process_now()` is therefore normally redundant; it remains as an explicit "process now" hook for tests, manual workflows, and operators who set every threshold to `0`.
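
The threshold behavior can be sketched as a simple modulo check per step. This is an illustrative model only: it assumes each step fires when a running turn count is a multiple of its threshold, and the names `DEFAULTS`, `_trips`, and `steps_due` are hypothetical rather than part of the toolkit. How the real counters are scoped and stored is not shown here.

```python
from typing import Optional

# Documented defaults for the in-process auto-trigger.
DEFAULTS = {
    "FACT_EXTRACTION_EVERY_N": 1,
    "DEDUP_EVERY_N": 5,
    "THREAD_SUMMARY_EVERY_N": 10,
    "USER_SUMMARY_EVERY_N": 20,
}


def _trips(count: int, every_n: int) -> bool:
    # A threshold of 0 disables the step; otherwise fire on every Nth count.
    return every_n > 0 and count > 0 and count % every_n == 0


def steps_due(turn_count: int, settings: Optional[dict] = None) -> list:
    """Return the pipeline steps that would fire on the given turn."""
    cfg = {**DEFAULTS, **(settings or {})}
    due = []
    extract_n = cfg["FACT_EXTRACTION_EVERY_N"]
    if _trips(turn_count, extract_n):
        due.append("process_extract_memories")
        # Reconcile is gated on the extraction count, not the raw turn count.
        extraction_count = turn_count // extract_n
        if _trips(extraction_count, cfg["DEDUP_EVERY_N"]):
            due.append("process_reconcile")
    if _trips(turn_count, cfg["THREAD_SUMMARY_EVERY_N"]):
        due.append("process_thread_summary")
    if _trips(turn_count, cfg["USER_SUMMARY_EVERY_N"]):
        due.append("process_user_summary")
    return due
```

Under this model, `steps_due(20)` lists all four steps with the defaults, while `steps_due(25, {"FACT_EXTRACTION_EVERY_N": 5})` lists only extraction and reconcile, matching the "every `FACT_EXTRACTION_EVERY_N × DEDUP_EVERY_N` turns" cadence.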
254
+
255
+
The async client (`AsyncCosmosMemoryClient.push_to_cosmos`) does **not** await the auto-trigger; it schedules it as a background `asyncio.Task` so the write call returns as soon as the Cosmos upserts complete. Background failures are surfaced via `logger.warning` (search for `"Background auto-trigger task failed"`).
Both the SDK auto-trigger and the function-app change-feed processor write into the same `counter` container. If you accidentally point an `InProcessProcessor` at a Cosmos container that already has a function app attached, both backends will run the pipeline on the same writes — double extraction, double dedup, double counters.
260
+
261
+
Set the env var on **both sides** to make ownership explicit:
The default (unset) preserves backward compatibility. For any production deployment we recommend setting it on both sides so a misconfiguration produces a loud log line instead of silent double-work.
270
+
271
+
> **Advisory, not enforced.** `MEMORY_PROCESSOR_OWNER` is operator-configured exclusivity, not a server-side lock. Each backend reads its own env var; if the SDK is set to `inprocess` but the function app is not set to `durable` (or vice versa), both still run. As a backstop, every counter write stamps `last_owner=<this backend>` on the doc. When the SDK observes a counter previously written by `durable` (or vice versa), it logs a one-shot `WARN`, so misconfiguration surfaces in logs without spamming. Treat this as a configuration audit signal, not a hard guarantee.
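
The `last_owner` backstop could look roughly like the sketch below. It is a simplified model under stated assumptions: the counter document is a plain dict, the one-shot behavior is a process-wide flag, and `stamp_owner` is a hypothetical helper rather than a toolkit function.

```python
import logging

logger = logging.getLogger("owner_sketch")
_warned = False  # one-shot flag so the warning is logged once per process


def stamp_owner(counter_doc: dict, this_backend: str) -> dict:
    """Stamp last_owner on a counter doc, warning once if another backend wrote it."""
    global _warned
    previous = counter_doc.get("last_owner")
    if previous is not None and previous != this_backend and not _warned:
        logger.warning(
            "Counter previously written by %r; this backend is %r. "
            "Check MEMORY_PROCESSOR_OWNER on both sides.",
            previous,
            this_backend,
        )
        _warned = True
    # Always record the most recent writer, whether or not a warning fired.
    counter_doc["last_owner"] = this_backend
    return counter_doc
```

The check detects a conflict only after both backends have written to the same counter, which is why it serves as an audit signal and not as a lock.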
272
+
273
+
---
274
+
275
+
## Two processor flavors
276
+
277
+
Pick at construction time via the `processor=` kwarg.
`DurableFunctionProcessor` is a thin marker — there is no SDK→Function HTTP call. The SDK just writes turns; the deployed Function app picks them up via the Cosmos change feed. Counter-based trigger configuration and Bicep module reference live in [`infra/README.md`](infra/README.md).
293
+
294
+
---
295
+
296
+
## Public API reference
297
+
298
+
| Symbol | Module | Purpose |
|---|---|---|
| `CosmosMemoryClient` | `agent_memory_toolkit` | Sync client: local CRUD, Cosmos DB I/O, processing |
- **[troubleshooting.md](Docs/troubleshooting.md)**: Common setup, auth, Cosmos DB, Durable Functions, search, and change feed failure modes
183
335
184
-
---
336
+
---
337
+
338
+
## Trademark notice
339
+
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.