Commit 973cb14

updated toolkit to use Serverless Cosmos DB by default, configurable for autoscale
1 parent 1fac8b9 commit 973cb14

15 files changed

Lines changed: 517 additions & 76 deletions

.env.template

Lines changed: 8 additions & 0 deletions
@@ -9,6 +9,14 @@ COSMOS_DB_DATABASE=ai_memory
 COSMOS_DB_CONTAINER=memories
 COSMOS_DB_COUNTERS_CONTAINER=counter
 COSMOS_DB_LEASE_CONTAINER=leases
+# Throughput mode for all required Cosmos DB containers created by the toolkit
+# (memories, counter, and lease).
+# - serverless: default. The toolkit does not send container RU/s settings.
+# Use this only with a Cosmos DB account configured for serverless.
+# - autoscale: the toolkit provisions all required containers with autoscale
+# throughput using COSMOS_DB_AUTOSCALE_MAX_RU as the max RU/s cap.
+# Default max RU/s is 1000.
+COSMOS_DB_THROUGHPUT_MODE=serverless
 COSMOS_DB_AUTOSCALE_MAX_RU=1000

 # ---- Change Feed Thresholds (set to 0 to disable) ----
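
Reviewer sketch (not part of the commit): the two settings added to `.env.template` could be read and validated roughly as follows. The function name `read_throughput_settings`, the case-insensitive handling, and the check that the autoscale cap is a multiple of 1000 are assumptions made for illustration; the toolkit's own parsing is not shown in this diff and may differ.

```python
from typing import Mapping, Optional, Tuple

# The two modes named in the .env.template comments.
VALID_MODES = ("serverless", "autoscale")


def read_throughput_settings(env: Mapping[str, str]) -> Tuple[str, Optional[int]]:
    """Return (mode, autoscale_max_ru); the cap is None in serverless mode.

    Hypothetical helper: the defaults mirror the template
    (serverless, 1000 RU/s), but the validation rules are assumptions.
    """
    mode = env.get("COSMOS_DB_THROUGHPUT_MODE", "serverless").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"COSMOS_DB_THROUGHPUT_MODE must be one of {VALID_MODES}, got {mode!r}")

    if mode == "serverless":
        # Serverless: no RU/s value is sent, so the cap is ignored.
        return mode, None

    max_ru = int(env.get("COSMOS_DB_AUTOSCALE_MAX_RU", "1000"))
    # Assumed constraint: autoscale caps start at 1000 and move in steps of 1000.
    if max_ru < 1000 or max_ru % 1000 != 0:
        raise ValueError(f"COSMOS_DB_AUTOSCALE_MAX_RU must be a multiple of 1000, got {max_ru}")
    return mode, max_ru
```

Passing `os.environ` as `env` would read the live process environment; a plain dict works for tests.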

Docs/README.md

Lines changed: 3 additions & 3 deletions
@@ -6,9 +6,9 @@ This folder contains the main project documentation for Agent Memory Toolkit.

 | Document | Purpose |
 |----------|---------|
-| [concepts.md](concepts.md) | Explains the core memory model, including memory types (turn, summary, fact, user summary), threads, roles, the processing pipeline, and automatic change feed processing. |
-| [local_testing.md](local_testing.md) | Covers local setup, environment configuration, RBAC, Cosmos provisioning, running the toolkit and Azure Functions locally, and testing change feed auto-processing. |
-| [azure_testing.md](azure_testing.md) | Covers Azure deployment, cloud configuration, required services, change feed settings, and validation steps for running the toolkit in Azure. |
+| [concepts.md](concepts.md) | Explains the core memory model, including memory types (turn, summary, fact, user summary), threads, roles, the processing pipeline, automatic change feed processing, and shared Cosmos throughput configuration. |
+| [local_testing.md](local_testing.md) | Covers local setup, environment configuration, RBAC, Cosmos provisioning, running the toolkit and Azure Functions locally, and testing change feed auto-processing with serverless or autoscale container provisioning. |
+| [azure_testing.md](azure_testing.md) | Covers Azure deployment, cloud configuration, required services, change feed settings, throughput mode configuration, and validation steps for running the toolkit in Azure. |
 | [design_patterns.md](design_patterns.md) | Shows when and how to call CRUD operations, summarization, fact extraction, and memory retrieval in chat and multi-agent applications, including automatic processing via the change feed. |

 ## Recommended Reading Order

Docs/azure_testing.md

Lines changed: 23 additions & 4 deletions
@@ -1,6 +1,6 @@
 # Deploying and Testing Agent Memory Toolkit in Azure

-This guide covers the minimum Azure resources, deployment steps, and validation order for running the toolkit in Azure.
+This guide covers the minimum Azure resources, deployment steps, throughput settings, and validation order for running the toolkit in Azure.

 ---

@@ -71,7 +71,7 @@ az cosmosdb create \
 --resource-group <resource-group>
 ```

-The toolkit can create the database and container later via `create_memory_store()`.
+The toolkit can create the database and required containers later via `create_memory_store()`.

 ---
7777

@@ -104,13 +104,18 @@ az functionapp config appsettings set \
 COSMOS_DB_ENDPOINT="https://<cosmos-account-name>.documents.azure.com:443/" \
 COSMOS_DB_DATABASE="ai_memory" \
 COSMOS_DB_CONTAINER="memories" \
+COSMOS_DB_COUNTERS_CONTAINER="counter" \
+COSMOS_DB_LEASE_CONTAINER="leases" \
+COSMOS_DB_THROUGHPUT_MODE="serverless" \
 COSMOS_DB_AUTOSCALE_MAX_RU="1000" \
 AI_FOUNDRY_ENDPOINT="https://<openai-account-name>.openai.azure.com/" \
 EMBEDDING_MODEL="text-embedding-3-large" \
 EMBEDDING_DIMENSIONS="1536" \
 LLM_MODEL="gpt-5-mini"
 ```

+`COSMOS_DB_THROUGHPUT_MODE=serverless` is the default and creates the `memories`, `counter`, and `leases` containers without specifying RU/s. Set `COSMOS_DB_THROUGHPUT_MODE=autoscale` to apply the shared `COSMOS_DB_AUTOSCALE_MAX_RU` cap to all required containers.
+
 ### Change feed settings (optional)

 To enable automatic processing via the change feed trigger, add these settings:
@@ -122,14 +127,17 @@ az functionapp config appsettings set \
 --settings \
 COSMOS_DB__accountEndpoint="https://<cosmos-account-name>.documents.azure.com:443/" \
 COSMOS_DB_COUNTERS_CONTAINER="counter" \
+COSMOS_DB_LEASE_CONTAINER="leases" \
+COSMOS_DB_THROUGHPUT_MODE="serverless" \
+COSMOS_DB_AUTOSCALE_MAX_RU="1000" \
 THREAD_SUMMARY_EVERY_N="5" \
 FACT_EXTRACTION_EVERY_N="3" \
 USER_SUMMARY_EVERY_N="10"
 ```

 Set any threshold to `"0"` to disable that processing type.

-The `leases` container is created automatically by the Azure Functions runtime.
+The `leases` container is provisioned by `create_memory_store()` alongside the `memories` and `counter` containers, so the Function App should be configured to use that existing lease container.

 If you use function-key auth for the HTTP trigger, keep the key for the client as `ADF_KEY`.

@@ -161,6 +169,9 @@ Update `.env` to point at Azure instead of localhost:
 COSMOS_DB_ENDPOINT=https://<cosmos-account-name>.documents.azure.com:443/
 COSMOS_DB_DATABASE=ai_memory
 COSMOS_DB_CONTAINER=memories
+COSMOS_DB_COUNTERS_CONTAINER=counter
+COSMOS_DB_LEASE_CONTAINER=leases
+COSMOS_DB_THROUGHPUT_MODE=serverless
 COSMOS_DB_AUTOSCALE_MAX_RU=1000

 AI_FOUNDRY_ENDPOINT=https://<openai-account-name>.openai.azure.com/
@@ -192,6 +203,10 @@ memory = AgentMemory(
 cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
 cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
 cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
+cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
+cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
+cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
+cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
 ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
 embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
 adf_endpoint=os.getenv("ADF_ENDPOINT"),
@@ -218,6 +233,10 @@ memory = AsyncAgentMemory(
 cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
 cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
 cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
+cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
+cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
+cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
+cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
 ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
 embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
 adf_endpoint=os.getenv("ADF_ENDPOINT"),
@@ -235,7 +254,7 @@ await memory.connect_cosmos(
 await memory.create_memory_store()
 ```

-This provisions the hierarchical partition key (`user_id`, `thread_id`), vector index, full-text index, and autoscale throughput.
+This provisions the `memories`, `counter`, and `leases` containers. `serverless` is the default throughput mode; if you set `COSMOS_DB_THROUGHPUT_MODE=autoscale`, the shared `COSMOS_DB_AUTOSCALE_MAX_RU` value is applied to all three containers.

 ---

Docs/concepts.md

Lines changed: 10 additions & 1 deletion
@@ -154,7 +154,16 @@ Set any value to `0` to disable that processing type. For example, setting `THRE
 |-----------|---------------|---------|
 | `memories` | `/user_id`, `/thread_id` (hierarchical) | Existing memory store |
 | `counter` | `/user_id`, `/thread_id` (hierarchical) | Message count tracking for automatic processing |
-| `leases` | `/id` | Auto-created by the trigger for change feed checkpointing |
+| `leases` | `/id` | Change feed checkpointing container created by `create_memory_store()` |
+
+### Throughput configuration
+
+The toolkit provisions all required Cosmos containers under one shared throughput mode:
+
+- `serverless` is the default. The toolkit creates the `memories`, `counter`, and `leases` containers without specifying RU/s.
+- `autoscale` applies the shared `COSMOS_DB_AUTOSCALE_MAX_RU` cap to all three containers.
+
+This keeps the change feed dependencies aligned with the main memory store instead of letting the Functions trigger create the lease container independently.

 ### Push vs. pull
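
Reviewer sketch (not part of the commit): the shared throughput mode described in the `concepts.md` hunk above can be pictured as one plan covering all three containers. `ContainerPlan` and `plan_containers` are hypothetical names used only for illustration; the partition key paths and default container names come from the table in the diff, while the actual provisioning code in `create_memory_store()` is not shown here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ContainerPlan:
    """What one container would be created with (illustrative only)."""
    name: str
    partition_key_paths: Tuple[str, ...]
    autoscale_max_ru: Optional[int]  # None means "send no RU/s setting"


def plan_containers(
    mode: str,
    autoscale_max_ru: int = 1000,
    memories: str = "memories",
    counter: str = "counter",
    leases: str = "leases",
) -> List[ContainerPlan]:
    """Apply one throughput mode to the memories, counter, and lease containers."""
    if mode not in ("serverless", "autoscale"):
        raise ValueError(f"unsupported throughput mode: {mode!r}")

    # Serverless sends no throughput; autoscale shares a single cap.
    max_ru = autoscale_max_ru if mode == "autoscale" else None
    hierarchical = ("/user_id", "/thread_id")
    return [
        ContainerPlan(memories, hierarchical, max_ru),
        ContainerPlan(counter, hierarchical, max_ru),
        ContainerPlan(leases, ("/id",), max_ru),
    ]
```

With the `azure-cosmos` Python SDK, a non-`None` cap would typically be passed as `offer_throughput=ThroughputProperties(auto_scale_max_throughput=...)` when creating the container, and the argument would simply be omitted for serverless accounts. Whether the toolkit does exactly this is an assumption.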

Docs/local_testing.md

Lines changed: 17 additions & 1 deletion
@@ -72,6 +72,9 @@ Minimum `.env` values:
 COSMOS_DB_ENDPOINT=https://<your-account>.documents.azure.com:443/
 COSMOS_DB_DATABASE=ai_memory
 COSMOS_DB_CONTAINER=memories
+COSMOS_DB_COUNTERS_CONTAINER=counter
+COSMOS_DB_LEASE_CONTAINER=leases
+COSMOS_DB_THROUGHPUT_MODE=serverless
 COSMOS_DB_AUTOSCALE_MAX_RU=1000

 AI_FOUNDRY_ENDPOINT=https://<your-project>.services.ai.azure.com/
@@ -85,13 +88,18 @@ ADF_KEY=

 The Functions runtime uses `azure_functions/local.settings.json`, not `.env`, so mirror the same values there.

+`COSMOS_DB_THROUGHPUT_MODE=serverless` is the default and creates the required Cosmos containers without specifying RU/s. If you set `COSMOS_DB_THROUGHPUT_MODE=autoscale`, the toolkit provisions the memories, counter, and lease containers with the shared max RU/s value from `COSMOS_DB_AUTOSCALE_MAX_RU`.
+
 ### Change feed settings (optional)

 In `azure_functions/local.settings.json`, add these to enable automatic processing:

 ```json
 "COSMOS_DB__accountEndpoint": "https://<your-account>.documents.azure.com:443/",
 "COSMOS_DB_COUNTERS_CONTAINER": "counter",
+"COSMOS_DB_LEASE_CONTAINER": "leases",
+"COSMOS_DB_THROUGHPUT_MODE": "serverless",
+"COSMOS_DB_AUTOSCALE_MAX_RU": "1000",
 "THREAD_SUMMARY_EVERY_N": "5",
 "FACT_EXTRACTION_EVERY_N": "3",
 "USER_SUMMARY_EVERY_N": "10"
@@ -153,6 +161,10 @@ memory = AgentMemory(
 cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
 cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
 cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
+cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
+cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
+cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
+cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
 ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
 embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
 adf_endpoint=os.getenv("ADF_ENDPOINT", "http://localhost:7071/api"),
@@ -192,6 +204,10 @@ memory = AsyncAgentMemory(
 cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
 cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
 cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
+cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
+cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
+cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
+cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
 ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
 embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
 adf_endpoint=os.getenv("ADF_ENDPOINT", "http://localhost:7071/api"),
@@ -217,7 +233,7 @@ for r in results:
 await memory.close()
 ```

-`create_memory_store()` creates the database/container and configures the hierarchical partition key (`user_id`, `thread_id`), vector index, full-text index, and autoscale throughput.
+`create_memory_store()` creates the database and required containers, configures the hierarchical partition key (`user_id`, `thread_id`) for memories and counters, uses `/id` for the lease container, and applies either serverless or autoscale throughput based on `COSMOS_DB_THROUGHPUT_MODE`.

 ---
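
Reviewer sketch (not part of the commit): `local_testing.md` asks the reader to mirror `.env` values into `azure_functions/local.settings.json` by hand. A small helper along these lines could do the copy for the Cosmos settings touched by this change. The function names and the `MIRRORED_KEYS` selection are hypothetical, the `.env` parser is deliberately naive (no quoting or `export` handling), and the only format assumption is that Functions app settings live under the top-level `Values` object.

```python
import json
from typing import Dict, Mapping

# Settings this commit adds or relies on; adjust the tuple as needed.
MIRRORED_KEYS = (
    "COSMOS_DB_COUNTERS_CONTAINER",
    "COSMOS_DB_LEASE_CONTAINER",
    "COSMOS_DB_THROUGHPUT_MODE",
    "COSMOS_DB_AUTOSCALE_MAX_RU",
)


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def merge_into_local_settings(settings_json: str, env_values: Mapping[str, str]) -> str:
    """Copy the mirrored keys into the "Values" object and return new JSON text."""
    settings = json.loads(settings_json) if settings_json.strip() else {}
    values = settings.setdefault("Values", {})
    for key in MIRRORED_KEYS:
        if key in env_values:
            # App settings are stored as strings, so keep the value as text.
            values[key] = env_values[key]
    return json.dumps(settings, indent=2)
```

Reading `.env`, calling `merge_into_local_settings` on the current contents of `local.settings.json`, and writing the result back keeps the two files in step; existing entries under `Values` that are not in `MIRRORED_KEYS` are left untouched.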

README.md

Lines changed: 8 additions & 2 deletions
@@ -134,14 +134,20 @@ memory = CosmosMemoryClient(
 cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
 cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
 cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
+cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
+cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
+cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
+cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
 ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
 embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
 adf_endpoint=os.getenv("ADF_ENDPOINT", "http://localhost:7071/api"),
 adf_key=os.getenv("ADF_KEY", ""),
 use_default_credential=True,
 cosmos_credential=DefaultAzureCredential(),
 )
-# Constructor auto-creates the database and container if they don't exist.
+# Constructor auto-creates the database and required containers if they don't exist.
+# `serverless` is the default throughput mode. Set `COSMOS_DB_THROUGHPUT_MODE=autoscale`
+# to provision memories, counter, and lease containers with a shared autoscale RU cap.

 # Add directly to Cosmos
 thread_id = str(uuid.uuid4())
@@ -187,7 +193,7 @@ summary = memory.get_user_summary(user_id="user-001")
 | **Azure OpenAI / AI Foundry** | Embedding model + chat model for summarization / fact extraction |
 | **Azure Functions** | Durable Functions orchestrator and activity functions |

-Automatic change feed processing stores lightweight counter documents in a dedicated `counter` container and also uses a `leases` container (auto-created). See [concepts.md](Docs/concepts.md#automatic-processing-change-feed) for details.
+Automatic change feed processing stores lightweight counter documents in a dedicated `counter` container and also uses a `leases` container that is provisioned by `create_memory_store()`. Throughput defaults to `serverless`; set `COSMOS_DB_THROUGHPUT_MODE=autoscale` to apply the shared `COSMOS_DB_AUTOSCALE_MAX_RU` cap to the memories, counter, and lease containers. See [concepts.md](Docs/concepts.md#automatic-processing-change-feed) for details.

 All services use **Entra ID** auth via `DefaultAzureCredential`.
