Commit aefb567

fixed accidental removal of README.md content
1 parent 41ab8b3 commit aefb567

1 file changed

Lines changed: 212 additions & 57 deletions

README.md

````diff
@@ -59,7 +59,7 @@ tests/ Unit + integration tests (pytest)
 
 ---
 
-## Quick Start
+## Quickstart
 
 ### 1. Install
 
````

````diff
@@ -70,92 +70,102 @@ pip install .
 pip install ".[dev]"
 ```
 
-### 2. Local-only (no Azure)
+### 2. Provision Azure resources
 
-```python
-import uuid
-from agent_memory_toolkit import CosmosMemoryClient
+The toolkit needs a Cosmos DB account, an Azure OpenAI / AI Foundry deployment, and (optionally for the remote processor) an Azure Function app. Pick whichever path matches your situation:
+
+**Option A — One-command provision (`azd up`).** Creates everything from scratch — Cosmos + AI Foundry + Function app (Flex Consumption, idle cost ≈ $0) + UAMI + RBAC — and writes a working `.env` to `.azure/<env>/.env`:
+
+```bash
+# Prereqs: az + azd installed; subscription with quota for gpt-4o-mini
+# and text-embedding-3-large in your chosen region (default: eastus2,
+# also supported: swedencentral, westus3).
 
-memory = CosmosMemoryClient(use_default_credential=False)
-thread_id = str(uuid.uuid4())
-memory.add_local(user_id="user-001", role="user", thread_id=thread_id, content="Hello world")
-print(memory.get_local())
+az login
+azd auth login
+
+azd env new memorytoolkit-dev
+# Optional: pin a region other than eastus2
+# azd env set AZURE_LOCATION swedencentral
+
+azd up
+# ~10 min later: Cosmos account + AI Foundry account + 2 model deployments
+# (gpt-4o-mini, text-embedding-3-large) + UAMI + RBAC + Function app
+# are provisioned. Outputs are written to .azure/memorytoolkit-dev/.env
 ```
 
-### 3. With Cosmos DB + Azure OpenAI
+The Function app is always provisioned but only used when you opt into `DurableFunctionProcessor` — it sits idle (and bills nothing) for in-process workloads.
+
+Load the generated env vars and you're ready to use the SDK:
 
 ```bash
-cp .env.template .env # fill in endpoint values
+set -a && . ./.azure/memorytoolkit-dev/.env && set +a
 ```
 
+To tear everything down later: `azd down --purge` (the `--purge` flag skips Cosmos / AI Foundry soft-delete so names are immediately reusable).
+
+**Option B — Bring your own resources.** If you already have a Cosmos DB account and an AI Foundry / Azure OpenAI deployment, copy the env template and fill in the endpoints:
+
+```bash
+cp .env.template .env
+# edit COSMOS_DB_ENDPOINT, AI_FOUNDRY_ENDPOINT, AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME, AI_FOUNDRY_CHAT_DEPLOYMENT_NAME
+```
+
+You can also point `azd up` at existing resources via `azd env set USE_EXISTING_COSMOS true` / `USE_EXISTING_AI_FOUNDRY true` (full BYOR flag list in `infra/README.md`).
+
+> For the Durable Function app counter-trigger settings, Bicep module reference, and RBAC scopes — see **[`infra/README.md`](infra/README.md)**.
+
+### 3. Use the SDK
+
````

````diff
 ```python
 import os, uuid
 from dotenv import load_dotenv
-from azure.identity import DefaultAzureCredential
 from agent_memory_toolkit import CosmosMemoryClient
 
 load_dotenv()
 
 memory = CosmosMemoryClient(
-    cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
-    cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
-    cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
-    cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
-    cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
-    cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
-    cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
-    ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
-    embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
-    adf_endpoint=os.getenv("ADF_ENDPOINT", "http://localhost:7071/api"),
-    adf_key=os.getenv("ADF_KEY", ""),
+    cosmos_endpoint=os.environ["COSMOS_DB_ENDPOINT"],
+    cosmos_database=os.getenv("COSMOS_DB_DATABASE", "ai_memory"),
+    cosmos_container=os.getenv("COSMOS_DB_CONTAINER", "memories"),
+    ai_foundry_endpoint=os.environ["AI_FOUNDRY_ENDPOINT"],
+    embedding_deployment_name=os.getenv("AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME", "text-embedding-3-large"),
+    chat_deployment_name=os.getenv("AI_FOUNDRY_CHAT_DEPLOYMENT_NAME", "gpt-4o-mini"),
     use_default_credential=True,
-    cosmos_credential=DefaultAzureCredential(),
+    # processor=InProcessProcessor() # implicit default
 )
-# Constructor auto-creates the database and required containers if they don't exist.
-# `serverless` is the default throughput mode. Set `COSMOS_DB_THROUGHPUT_MODE=autoscale`
-# to provision memories, counter, and lease containers with a shared autoscale RU cap.
-
-# Add directly to Cosmos
-thread_id = str(uuid.uuid4())
-memory.add_cosmos(user_id="user-001", role="user", thread_id=thread_id, content="Stored in Cosmos")
-print(memory.get_memories(user_id="user-001", thread_id=thread_id))
-
-# Or add locally first, then bulk-upload
-memory.add_local(user_id="user-001", role="agent", thread_id=thread_id, content="Response text")
-memory.push_to_cosmos()
-```
+memory.connect_cosmos() # auto-creates database + containers if missing
 
-### 4. Durable Function operations
+USER, THREAD = "user-001", str(uuid.uuid4())
 
-These require the Azure Durable Functions host. See [local_testing.md](Docs/local_testing.md) for setup.
-
-```python
-# Thread summary (incremental — merges with existing if present)
-result = memory.generate_thread_summary(user_id="user-001", thread_id=thread_id, recent_k=5)
+# Add raw turns to a conversation
+memory.add_cosmos(user_id=USER, thread_id=THREAD, role="user", content="I love Cosmos DB.")
+memory.add_cosmos(user_id=USER, thread_id=THREAD, role="assistant", content="It is fantastic.")
 
-# Fact extraction
-result = memory.extract_facts(user_id="user-001", thread_id=thread_id)
+# Run the processing pipeline (thread summary + fact extraction + user summary)
+memory.process_now(user_id=USER, thread_id=THREAD)
 
-# User summary (incremental — cross-thread profile)
-result = memory.generate_user_summary(user_id="user-001")
+# Search semantically across the stored memory
+hits = memory.search_cosmos(user_id=USER, query_text="Cosmos DB preferences", top=5)
+for h in hits:
+    print(h["memory_type"], "-", h["content"][:80])
 
-# Retrieve stored user summary
-summary = memory.get_user_summary(user_id="user-001")
+# Retrieve the cross-thread user profile
+print(memory.get_user_summary(user_id=USER))
 ```
````

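The new Quickstart reads `COSMOS_DB_ENDPOINT` and `AI_FOUNDRY_ENDPOINT` with `os.environ[...]`, so a missing value fails fast, while the other settings fall back to defaults. A minimal sketch of that resolution logic as a standalone function, so it can be checked without Azure access. `resolve_memory_settings` is a hypothetical helper, not part of `agent_memory_toolkit`; the variable names and defaults are the ones in the snippet.

```python
from collections.abc import Mapping

# Required: the Quickstart reads these with os.environ[...], so absence is an error.
REQUIRED = {
    "cosmos_endpoint": "COSMOS_DB_ENDPOINT",
    "ai_foundry_endpoint": "AI_FOUNDRY_ENDPOINT",
}

# Optional: (environment variable, fallback) pairs copied from the Quickstart snippet.
OPTIONAL = {
    "cosmos_database": ("COSMOS_DB_DATABASE", "ai_memory"),
    "cosmos_container": ("COSMOS_DB_CONTAINER", "memories"),
    "embedding_deployment_name": ("AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME", "text-embedding-3-large"),
    "chat_deployment_name": ("AI_FOUNDRY_CHAT_DEPLOYMENT_NAME", "gpt-4o-mini"),
}


def resolve_memory_settings(env: Mapping[str, str]) -> dict[str, str]:
    """Hypothetical helper: map environment variables to constructor keyword arguments."""
    missing = [var for var in REQUIRED.values() if var not in env]
    if missing:
        # Report every missing variable at once instead of failing on the first.
        raise KeyError(f"missing required environment variables: {', '.join(missing)}")
    settings = {kwarg: env[var] for kwarg, var in REQUIRED.items()}
    for kwarg, (var, default) in OPTIONAL.items():
        settings[kwarg] = env.get(var, default)
    return settings
```

Passing `os.environ` (after `load_dotenv()`) gives a dict whose keys match the keyword arguments in the updated snippet. Collecting all missing names before raising is a design choice for friendlier setup errors, not documented toolkit behavior.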
````diff
 
-> The async API (`AsyncCosmosMemoryClient`) is identical — just `await` each call. Import from the `aio` subpackage:
->
+> Async API is identical — just `await` each call:
 > ```python
 > from agent_memory_toolkit.aio import AsyncCosmosMemoryClient
 > ```
 
----
-
-## Deploy Azure Resources
+### 4. Run a sample
 
-The [`infra/`](infra/) folder contains the Bicep templates and Azure Developer CLI (`azd`) configuration for provisioning the toolkit's Azure resources end-to-end. It deploys Cosmos DB for NoSQL, AI Foundry model deployments, managed identity and RBAC assignments, and the Azure Function app used for Durable Functions processing.
+```bash
+python Samples/quickstart_cosmos.py
+```
 
-For deployment prerequisites, configuration options, bring-your-own-resource settings, and cleanup commands, see the [infra README](infra/README.md).
+See [`Samples/`](Samples/) for end-to-end scenarios (chat memory, RAG, multi-agent, customer support, remote processor).
 
 ---
 
````

````diff
@@ -173,6 +183,148 @@ All services use **Entra ID** auth via `DefaultAzureCredential`.
 
 ---
 
+## Concepts in 60 seconds
+
+| Concept | What it is | API |
+|---|---|---|
+| **Turn** | One message (user or assistant) — the raw conversation atom | `add_cosmos(...)`, `add_local(...)` |
+| **Thread summary** | LLM-generated, incrementally updated rollup of a single thread | `generate_thread_summary(...)` |
+| **Fact** | Discrete, independently searchable assertion extracted from turns | `extract_memories(...)` |
+| **Procedural** | Behavioral rule / instruction the user wants followed | `extract_memories(...)` |
+| **Episodic** | Past situation → action → outcome experience (90-day TTL) | `extract_memories(...)` |
+| **User summary** | Cross-thread profile of what's known about a user | `generate_user_summary(...)`, `get_user_summary(...)` |
+| **Search** | Vector + full-text + filter; returns any of the above | `search_cosmos(...)` |
+| **Process now** | Run the full pipeline (summary → facts → user profile) for recent turns | `process_now(...)`, `process_now_and_wait(...)` |
+
+All memory kinds live in the same Cosmos container, partitioned by `(user_id, thread_id)`, distinguished by a `type` discriminator.
+
````

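The Concepts section states that every memory kind shares one Cosmos container, partitioned by `(user_id, thread_id)` and told apart by a `type` field. The sketch below uses an in-memory dict to show how one partition holds turns, summaries, and facts side by side, and how a read narrows by `type`. Only `user_id`, `thread_id`, `type`, and `"fact"` come from the README. The `"turn"` and `"thread_summary"` values, the other fields, and the helper names are assumptions for illustration, not the toolkit's schema.

```python
import uuid
from collections import defaultdict


def make_item(user_id: str, thread_id: str, kind: str, content: str) -> dict:
    """Build an illustrative item; only user_id, thread_id and type come from the README."""
    return {
        "id": str(uuid.uuid4()),
        "user_id": user_id,
        "thread_id": thread_id,
        "type": kind,  # discriminator; "turn" and "thread_summary" are assumed values
        "content": content,
    }


def partition_key(item: dict) -> tuple[str, str]:
    """Two-level key: user first, then thread."""
    return (item["user_id"], item["thread_id"])


# Stand-in for the single container: every kind of item, grouped by partition key.
container: dict[tuple[str, str], list[dict]] = defaultdict(list)


def store(item: dict) -> None:
    container[partition_key(item)].append(item)


def query(user_id: str, thread_id: str, kind: str) -> list[dict]:
    """Single-partition read, narrowed by the type discriminator."""
    return [i for i in container.get((user_id, thread_id), []) if i["type"] == kind]


for item in (
    make_item("user-001", "t1", "turn", "I love Cosmos DB."),
    make_item("user-001", "t1", "fact", "User likes Cosmos DB"),
    make_item("user-001", "t1", "thread_summary", "Chat about databases"),
    make_item("user-001", "t2", "turn", "Hello again"),
):
    store(item)
```

Keeping every kind in one partition means a thread's turns and derived memories can be read together without a cross-partition query. That is the main reason the single-container layout appeals, although the README does not state its rationale.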
````diff
+### Memory Type Taxonomy
+
+The `extract_memories` pipeline classifies each item it pulls from the conversation into one of four buckets. Every memory carries a top-level `confidence` (0.0–1.0) so retrieval can suppress weakly-grounded extractions.
+
+| Bucket | Meaning | Storage type | TTL |
+|---|---|---|---|
+| Fact | Declarative knowledge ("user prefers dark mode") | `type="fact"` | none |
+| Procedural | Behavioral rule ("always confirm before deleting") | `type="procedural"` | none |
+| Episodic | Past experience: situation → action → outcome | `type="episodic"` | 90 days |
+| Unclassified | Item worth keeping but the LLM couldn't confidently classify | `type="fact"` + tag `sys:unclassified` | none |
+
````

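The taxonomy table maps four buckets onto three stored `type` values. Unclassified items are folded into `type="fact"` with a `sys:unclassified` tag, and only episodic memories expire. A minimal sketch of that mapping follows. It assumes the 90-day TTL is written as a Cosmos DB item-level `ttl` in seconds, which only takes effect when TTL is enabled on the container. The `to_storage` helper and the `tags` field name are hypothetical.

```python
EPISODIC_TTL_SECONDS = 90 * 24 * 60 * 60  # 90 days in seconds


def to_storage(bucket: str) -> dict:
    """Hypothetical mapping from an extraction bucket to stored fields, per the taxonomy table."""
    bucket = bucket.lower()
    if bucket == "fact":
        return {"type": "fact", "tags": []}
    if bucket == "procedural":
        return {"type": "procedural", "tags": []}
    if bucket == "episodic":
        # Only episodic memories expire; assumes a Cosmos item-level `ttl` in seconds.
        return {"type": "episodic", "tags": [], "ttl": EPISODIC_TTL_SECONDS}
    if bucket == "unclassified":
        # Stored as a fact but tagged, so it can be filtered out or reviewed later.
        return {"type": "fact", "tags": ["sys:unclassified"]}
    raise ValueError(f"unknown bucket: {bucket!r}")
```

Folding unclassified items into `fact` keeps them searchable through the same path as ordinary facts, while the tag still lets retrieval exclude them.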
````diff
+#### Confidence Scale
+
+| Range | Meaning |
+|---|---|
+| 0.9–1.0 | Directly stated and unambiguous |
+| 0.7–0.9 | Clearly implied, no contradicting evidence |
+| 0.5–0.7 | Inferred from context — plausible but not explicit |
+| < 0.5 | Should be in `unclassified` instead |
+
+Filter at retrieval time:
+
+```python
+results = memory.search_cosmos("user preferences", user_id="u1", min_confidence=0.7)
+high_conf_facts = memory.get_memories(user_id="u1", memory_types=["fact"], min_confidence=0.7)
+```
+
````

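The confidence ranges share their endpoints (0.7 and 0.9 each appear twice), so any code that buckets scores has to pick a side. This sketch treats each lower bound as inclusive and mirrors what a `min_confidence` filter would do on the client side. The tie-breaking rule, the "missing score counts as 0.0" rule, and the `band` / `filter_min_confidence` helpers are all assumptions, not the toolkit's implementation.

```python
def band(confidence: float) -> str:
    """Place a score on the README's confidence scale (lower bounds inclusive, by assumption)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0.0, 1.0]")
    if confidence >= 0.9:
        return "directly stated"
    if confidence >= 0.7:
        return "clearly implied"
    if confidence >= 0.5:
        return "inferred"
    return "unclassified"


def filter_min_confidence(memories: list[dict], min_confidence: float) -> list[dict]:
    """Keep memories at or above the threshold; a missing score is treated as 0.0."""
    return [m for m in memories if m.get("confidence", 0.0) >= min_confidence]


memories = [
    {"content": "prefers dark mode", "confidence": 0.95},
    {"content": "probably vegetarian", "confidence": 0.6},
    {"content": "always confirm before deleting", "confidence": 0.7},
]
```

With `min_confidence=0.7`, the inferred item (0.6) is dropped and the two higher-confidence items survive. That matches the intent of the `search_cosmos` / `get_memories` calls in the diff.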
````diff
+### Memory Reconciliation
+
+`reconcile(user_id, n=50)` (on the public client; underlying pipeline method is `ProcessingPipeline.reconcile_memories`) collapses paraphrased duplicates and resolves semantic contradictions in a single LLM pass over the N most-recent active facts. Both outcomes soft-delete the loser with a `supersede_reason` of `"duplicate"` or `"contradiction"`. See [Docs/concepts.md](Docs/concepts.md#memory-reconciliation) for details.
+
+> **Cost note.** Each reconciliation makes one LLM call covering up to `n` facts (default 50, hard cap 500). With auto-trigger, this fires every `FACT_EXTRACTION_EVERY_N × DEDUP_EVERY_N` turns per user, with `n` taken from `DEDUP_POOL_SIZE`. The previous cosine-cluster pre-filter was removed deliberately — it could not catch semantic contradictions like "vegetarian" vs "ribeye steak" — so the LLM is now invoked whenever there are ≥ 2 active facts. To bound LLM cost more tightly: raise `DEDUP_EVERY_N` (lower frequency — reconcile fires every Nth extraction, so a *higher* N means *less often*), lower `DEDUP_POOL_SIZE` (smaller per-call pool), or override `n` per call when invoking `reconcile()` directly.
+
````

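The cost note's cadence is the product of two counters: extraction runs every `FACT_EXTRACTION_EVERY_N` turns, and reconcile runs on every `DEDUP_EVERY_N`-th extraction. With the defaults (1 × 5), a user's facts are reconciled every 5 turns. The function-app deploy's `FACT_EXTRACTION_EVERY_N=5` stretches that to 25. The small calculator below makes the trade-off concrete. The helper names are hypothetical. Treating a `0` in either setting as "disabled" follows the auto-trigger section's `*_EVERY_N=0` rule.

```python
from typing import Optional

HARD_POOL_CAP = 500  # the README's hard cap on facts per reconciliation call


def reconcile_interval(fact_extraction_every_n: int, dedup_every_n: int) -> Optional[int]:
    """Turns between auto-triggered reconciliations for one user, or None if disabled."""
    if fact_extraction_every_n <= 0 or dedup_every_n <= 0:
        return None  # a 0 threshold disables the step
    return fact_extraction_every_n * dedup_every_n


def reconcile_pool(n: int) -> int:
    """Facts sent to the LLM in one call: the requested pool, clamped to the hard cap."""
    return max(0, min(n, HARD_POOL_CAP))


def llm_calls_for(turns: int, fact_extraction_every_n: int, dedup_every_n: int) -> int:
    """Reconcile LLM calls over `turns` turns, assuming >= 2 active facts at each run."""
    interval = reconcile_interval(fact_extraction_every_n, dedup_every_n)
    return 0 if interval is None else turns // interval
```

Over 100 turns, the in-process defaults cost 20 reconcile calls, and the function-app setting costs 4. Raising `DEDUP_EVERY_N` from 5 to 10 halves either figure, which is the "higher N means less often" point in the note.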
````diff
+| New `MemoryRecord` field | Meaning |
+|---|---|
+| `content_hash` | SHA-256 of normalized content; enables write-time exact-dedup short-circuit |
+| `supersede_reason` | `"duplicate"` or `"contradiction"` (None for live records) |
+| `superseded_at` | ISO timestamp when the supersede happened (None for live records) |
+| `superseded_by` | Id of the record that replaced this one (existing field) |
+
````

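The new fields imply two mechanics: an exact-duplicate check keyed on `content_hash`, and a soft delete that stamps the losing record instead of removing it. The README specifies SHA-256 over *normalized* content but not the normalization itself. The trim, casefold, and whitespace-collapse rule below is an assumption. The `Record` dataclass and the `supersede` / `is_exact_duplicate` helpers are also illustrative, not `MemoryRecord` itself.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def content_hash(content: str) -> str:
    # Assumed normalization: collapse whitespace runs, trim, and casefold.
    normalized = " ".join(content.split()).casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class Record:
    id: str
    content: str
    content_hash: str = ""
    supersede_reason: Optional[str] = None  # "duplicate" or "contradiction"
    superseded_at: Optional[str] = None     # ISO timestamp
    superseded_by: Optional[str] = None     # id of the winning record

    def __post_init__(self) -> None:
        if not self.content_hash:
            self.content_hash = content_hash(self.content)

    @property
    def is_live(self) -> bool:
        return self.supersede_reason is None


def supersede(loser: Record, winner: Record, reason: str) -> None:
    """Soft-delete `loser` in favour of `winner`; nothing is physically removed."""
    if reason not in ("duplicate", "contradiction"):
        raise ValueError(f"unsupported supersede_reason: {reason!r}")
    loser.supersede_reason = reason
    loser.superseded_at = datetime.now(timezone.utc).isoformat()
    loser.superseded_by = winner.id


def is_exact_duplicate(candidate: str, records: list[Record]) -> bool:
    """Write-time short-circuit: does any live record already hash to the same value?"""
    h = content_hash(candidate)
    return any(r.content_hash == h for r in records if r.is_live)
```

The hash check only catches exact matches after normalization. Paraphrases such as "eats no meat" versus "is vegetarian" still need the LLM reconciliation pass described in the diff.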
````diff
+### Auto-trigger (per-turn extraction)
+
+By default, the **InProcess processor** runs each pipeline step independently as its own threshold trips inside `push_to_cosmos()`:
+
+| Env var | Default | Step that fires | Async behavior |
+|---|---|---|---|
+| `FACT_EXTRACTION_EVERY_N` | `1` (every turn) | `process_extract_memories` | scheduled via `asyncio.create_task` |
+| `DEDUP_EVERY_N` | `5` | `process_reconcile` (fires every Nth extract → effectively every `FACT_EXTRACTION_EVERY_N × DEDUP_EVERY_N` turns) | scheduled via `asyncio.create_task` |
+| `DEDUP_POOL_SIZE` | `50` | pool size (`n`) passed to `process_reconcile` from the auto-trigger; hard-capped at `500` | n/a (per-call) |
+| `THREAD_SUMMARY_EVERY_N` | `10` | `process_thread_summary` | scheduled via `asyncio.create_task` |
+| `USER_SUMMARY_EVERY_N` | `20` | `process_user_summary` | scheduled via `asyncio.create_task` |
+
+Each `*_EVERY_N=0` disables only that step. Dedup is gated independently of extract because cross-thread dedup is dramatically more expensive than per-thread extract (it reads every active fact for the user) — running it on every extract slammed AI Foundry. The Durable backend uses the same defaults via the change-feed function app (the function-app `azd` deploy bumps `FACT_EXTRACTION_EVERY_N` to `5` since the FA path is intended for higher-volume workloads). Calling `process_now()` is normally redundant — it remains as an explicit "process now" hook for tests, manual workflows, and operators who set every threshold to `0`.
+
````

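The auto-trigger table describes independent counters. Each step fires when its own count reaches a multiple of its threshold, a threshold of `0` switches that step off, and reconcile counts *extractions* rather than turns. The simulation below encodes those rules with the documented defaults so the resulting schedule can be inspected. `AutoTrigger` and `run` are a hypothetical model. It collapses the per-thread and per-user counters into one stream and does not reproduce the toolkit's counter-container logic.

```python
from dataclasses import dataclass


@dataclass
class AutoTrigger:
    fact_extraction_every_n: int = 1
    dedup_every_n: int = 5
    thread_summary_every_n: int = 10
    user_summary_every_n: int = 20
    turns: int = 0
    extractions: int = 0

    @staticmethod
    def _due(count: int, every_n: int) -> bool:
        # A threshold of 0 disables the step entirely.
        return every_n > 0 and count % every_n == 0

    def on_turn(self) -> list[str]:
        """Record one turn and return the pipeline steps that fire for it."""
        self.turns += 1
        fired = []
        if self._due(self.turns, self.fact_extraction_every_n):
            fired.append("extract_memories")
            self.extractions += 1
            # Reconcile is gated on the extraction count, not the turn count.
            if self._due(self.extractions, self.dedup_every_n):
                fired.append("reconcile")
        if self._due(self.turns, self.thread_summary_every_n):
            fired.append("thread_summary")
        if self._due(self.turns, self.user_summary_every_n):
            fired.append("user_summary")
        return fired


def run(trigger: AutoTrigger, turns: int) -> dict[str, int]:
    """Simulate `turns` turns and count how often each step fired."""
    counts: dict[str, int] = {}
    for _ in range(turns):
        for step in trigger.on_turn():
            counts[step] = counts.get(step, 0) + 1
    return counts
```

Running 20 turns with the defaults gives 20 extractions, 4 reconciles, 2 thread summaries, and 1 user summary. With `fact_extraction_every_n=5` (the function-app setting), 50 turns yield 10 extractions and 2 reconciles.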
````diff
+The async client (`AsyncCosmosMemoryClient.push_to_cosmos`) does **not** await the auto-trigger; it schedules it as a background `asyncio.Task` so the write call returns as soon as the Cosmos upserts complete. Background failures are surfaced via `logger.warning` (search for `"Background auto-trigger task failed"`).
+
````

````diff
+#### Backend exclusivity (`MEMORY_PROCESSOR_OWNER`)
+
+Both the SDK auto-trigger and the function-app change-feed processor write into the same `counter` container. If you accidentally point an `InProcessProcessor` at a Cosmos container that already has a function app attached, both backends will run the pipeline on the same writes — double extraction, double dedup, double counters.
+
+Set the env var on **both sides** to make ownership explicit:
+
+| `MEMORY_PROCESSOR_OWNER` | SDK behavior | Function-app behavior |
+|---|---|---|
+| _unset_ (default) | runs auto-trigger | runs orchestrator (today's behavior) |
+| `inprocess` | runs auto-trigger | change-feed trigger skips batch + logs |
+| `durable` | auto-trigger logs warning + skips | runs orchestrator |
+
+The default (unset) preserves backward compatibility. For any production deployment we recommend setting it on both sides so a misconfiguration produces a loud log line instead of silent double-work.
+
+> **Advisory, not enforced.** `MEMORY_PROCESSOR_OWNER` is operator-configured exclusivity, not a server-side lock. Each backend reads its own env var; if the SDK is set to `inprocess` but the FA forgets to set `durable` (or vice versa), both still run. As a backstop, every counter write stamps `last_owner=<this backend>` on the doc — when the SDK observes a counter previously written by `durable` (or vice versa), it logs a one-shot `WARN` so misconfiguration surfaces in logs without spamming. Treat this as a configuration audit signal, not a hard guarantee.
+
````

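The ownership table reduces to a pair of predicates, and the `last_owner` backstop to a warn-once check. This minimal sketch follows the table's semantics. The function names, the case-insensitive parsing, rejecting unknown values with `ValueError`, and the set used to remember which warnings have fired are all assumptions rather than the toolkit's code.

```python
import logging
from typing import Optional

logger = logging.getLogger("memory.owner")
VALID_OWNERS = {None, "inprocess", "durable"}


def _check(owner: Optional[str]) -> Optional[str]:
    """Normalize the env value; empty or missing means unset."""
    owner = owner.strip().lower() if owner else None
    if owner not in VALID_OWNERS:
        raise ValueError(f"unexpected MEMORY_PROCESSOR_OWNER: {owner!r}")
    return owner


def sdk_runs_auto_trigger(owner: Optional[str]) -> bool:
    # Only an explicit "durable" owner makes the SDK stand down.
    return _check(owner) != "durable"


def function_app_runs_orchestrator(owner: Optional[str]) -> bool:
    # Only an explicit "inprocess" owner makes the change-feed trigger skip.
    return _check(owner) != "inprocess"


_warned: set[tuple[str, str]] = set()


def audit_counter(this_backend: str, last_owner: Optional[str]) -> bool:
    """Warn once per (this, other) pair when a counter was last written by the other backend."""
    if last_owner is None or last_owner == this_backend:
        return False
    key = (this_backend, last_owner)
    if key in _warned:
        return False
    _warned.add(key)
    logger.warning("counter last written by %s, but this backend is %s", last_owner, this_backend)
    return True
```

Because each side evaluates only its own variable, setting `inprocess` on the SDK and leaving the function app unset still lets both run. The audit check is what turns that gap into a single visible warning.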
````diff
+---
+
+## Two processor flavors
+
+Pick at construction time via the `processor=` kwarg.
+
+| | `InProcessProcessor` (default) | `DurableFunctionProcessor` |
+|---|---|---|
+| Infra | None — just `pip install` | Sibling Azure Function app |
+| Best for | Prototypes, low TPS, single-agent | Fleet / multi-agent / high TPS |
+| `process_now()` | Synchronous, returns when done | No-op (work runs async on change feed) |
+| `process_now_and_wait()` | Returns immediately after flush | Polls until summary visible (RU-costly; tests/demos) |
+
+```python
+from agent_memory_toolkit import CosmosMemoryClient, DurableFunctionProcessor
+
+memory = CosmosMemoryClient(..., processor=DurableFunctionProcessor())
+```
+
+`DurableFunctionProcessor` is a thin marker — there is no SDK→Function HTTP call. The SDK just writes turns; the deployed Function app picks them up via the Cosmos change feed. Counter-based trigger configuration and Bicep module reference live in [`infra/README.md`](infra/README.md).
+
````

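With the Durable backend, `process_now_and_wait()` is described as polling until the summary becomes visible, which costs RUs on every read. A generic poll-with-deadline loop shows where that cost comes from and how an interval and a timeout bound it. `wait_until` is a hypothetical helper. In real use, the `check` callable would stand in for whatever read the SDK actually performs.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float = 60.0, interval_s: float = 2.0) -> int:
    """Call `check` until it returns True and return how many calls it took.

    Each call models one Cosmos read (one RU charge). Raises TimeoutError if the
    next sleep would overrun the deadline.
    """
    deadline = time.monotonic() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"condition not met after {attempts} attempts")
        time.sleep(interval_s)
```

The worst-case read count is roughly `timeout_s / interval_s`. A longer interval trades latency for fewer RUs, which is why the table reserves this mode for tests and demos.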
````diff
+---
+
+## Public API reference
+
+| Symbol | Module | Purpose |
+|---|---|---|
+| `CosmosMemoryClient` | `agent_memory_toolkit` | Sync client — local CRUD, Cosmos DB I/O, processing |
+| `AsyncCosmosMemoryClient` | `agent_memory_toolkit.aio` | Async mirror |
+| `MemoryProcessor` | `agent_memory_toolkit` | Protocol that any processor backend implements |
+| `InProcessProcessor` | `agent_memory_toolkit` | Default backend — runs the pipeline in-process |
+| `DurableFunctionProcessor` | `agent_memory_toolkit` | Marker backend — work runs in sibling Function app via change feed |
+| `client.process_now()` || Run the pipeline for recent turns (in-process) or no-op (remote) |
+| `client.process_now_and_wait()` || Opt-in poll until processing completes; useful for tests/demos with the remote backend |
+| `MemoryRecord`, `MemoryType`, `Role` | `agent_memory_toolkit` | Pydantic models / enums |
+
+Async equivalents (`AsyncInProcessProcessor`, `AsyncDurableFunctionProcessor`) live in `agent_memory_toolkit.aio`.
+
````

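`MemoryProcessor` is listed as a Protocol that every backend implements. `process_now()` does real work in-process and nothing for the remote backend. The structural-typing idea can be sketched with `typing.Protocol`. The `ProcessorLike` name, the method signature, the boolean return value, and the two toy classes are assumptions chosen to mirror the "Two processor flavors" table. They are not the toolkit's real interface.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ProcessorLike(Protocol):
    """Hypothetical stand-in for a processor protocol: anything with process_now()."""

    def process_now(self, user_id: str, thread_id: str) -> bool: ...


class ToyInProcess:
    """Runs the (fake) pipeline synchronously and reports that work happened."""

    def __init__(self) -> None:
        self.runs: list[tuple[str, str]] = []

    def process_now(self, user_id: str, thread_id: str) -> bool:
        self.runs.append((user_id, thread_id))  # stand-in for summary + extraction
        return True


class ToyRemote:
    """Marker-style backend: the work happens elsewhere, so process_now is a no-op."""

    def process_now(self, user_id: str, thread_id: str) -> bool:
        return False


def flush(processor: ProcessorLike, user_id: str, thread_id: str) -> bool:
    return processor.process_now(user_id, thread_id)
```

Neither toy class inherits from the protocol. `isinstance` succeeds because `runtime_checkable` only checks that the method exists, which is what lets an unrelated backend plug in without importing a base class.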
````diff
+---
+
+## Project structure
+
+```
+agent_memory_toolkit/ Python SDK (sync + aio mirror)
+processors/ MemoryProcessor Protocol + InProcess/Durable backends
+function_app/ Sibling Azure Durable Function app
+infra/ Bicep modules + main.bicep for `azd up`
+azure.yaml `azd` config — provisions Cosmos + AI Foundry + Function app
+Samples/ Demo notebooks + sample scripts
+Docs/ Conceptual + operational docs
+tests/ Unit + integration tests (pytest)
+```
+
+---
+
 ## Documentation
 
 - **[concepts.md](Docs/concepts.md)** — Memory types, threads, roles, embeddings, processing pipeline
````

````diff
@@ -181,4 +333,7 @@ All services use **Entra ID** auth via `DefaultAzureCredential`.
 - **[azure_testing.md](Docs/azure_testing.md)** — Azure deployment, RBAC, cloud validation
 - **[troubleshooting.md](Docs/troubleshooting.md)** — Common setup, auth, Cosmos DB, Durable Functions, search, and change feed failure modes
 
----
+---
+
+## Trademark notice
+Trademarks This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.
````
