This guide covers the minimum Azure resources, deployment steps, throughput settings, and validation order for running the toolkit in Azure.
| Service | Purpose |
|---|---|
| Azure Cosmos DB for NoSQL | Persistent memory store with vector and full-text indexes |
| Azure OpenAI / AI Services | Embeddings and chat generation |
| Azure Functions | Durable Functions orchestrator and activities |
| Azure Storage Account | Required by Azure Functions |
| Application Insights | Recommended for monitoring |
You need:
- an Azure subscription
- the Azure CLI, signed in with `az login`
- Python 3.11+
- Azure Functions Core Tools v4
- dependencies installed:
pip install -e ".[dev]"
pip install -r function_app/requirements.txt

The recommended way to provision all required Azure resources is `azd up` from the repo root, which uses the Bicep templates under `infra/`. See `infra/README.md` for details. The manual `az ...` commands below are kept as a reference for operators who can't use `azd`.
Create, or reuse, the following:
- resource group
- storage account
- Function App
- Cosmos DB for NoSQL account
- Azure OpenAI resource with:
  - one embedding model
  - one chat model
Examples:
az group create --name <resource-group> --location <location>
az storage account create \
--name <storage-account-name> \
--resource-group <resource-group> \
--location <location> \
--sku Standard_LRS
az functionapp create \
--name <function-app-name> \
--resource-group <resource-group> \
--storage-account <storage-account-name> \
--consumption-plan-location <location> \
--runtime python \
--runtime-version 3.11 \
--functions-version 4 \
--os-type Linux
az cosmosdb create \
--name <cosmos-account-name> \
--resource-group <resource-group>

The toolkit can create the database and required containers later via `create_memory_store()`.
Grant these roles:
- Cosmos DB Built-in Data Contributor on the Cosmos account
- Cognitive Services OpenAI User on the AI resource
Enable managed identity on the Function App and use that principal for production role assignments:
az functionapp identity assign \
--name <function-app-name> \
--resource-group <resource-group>

Set the runtime settings:
az functionapp config appsettings set \
--name <function-app-name> \
--resource-group <resource-group> \
--settings \
COSMOS_DB_ENDPOINT="https://<cosmos-account-name>.documents.azure.com:443/" \
COSMOS_DB_DATABASE="ai_memory" \
COSMOS_DB_MEMORIES_CONTAINER="memories" \
COSMOS_DB_COUNTERS_CONTAINER="counter" \
COSMOS_DB_LEASE_CONTAINER="leases" \
COSMOS_DB_THROUGHPUT_MODE="serverless" \
COSMOS_DB_AUTOSCALE_MAX_RU="1000" \
AI_FOUNDRY_ENDPOINT="https://<openai-account-name>.openai.azure.com/" \
AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME="text-embedding-3-large" \
AI_FOUNDRY_EMBEDDING_DIMENSIONS="1536" \
AI_FOUNDRY_CHAT_DEPLOYMENT_NAME="gpt-5-mini" \
THREAD_SUMMARY_EVERY_N="10" \
FACT_EXTRACTION_EVERY_N="1" \
USER_SUMMARY_EVERY_N="20" \
MEMORY_PROCESSOR_OWNER="durable"

`COSMOS_DB_THROUGHPUT_MODE=serverless` is the default and creates the `memories`, `memories_turns`, `memories_summaries`, `counter`, and `leases` containers without specifying RU/s. Set `COSMOS_DB_THROUGHPUT_MODE=autoscale` to apply the shared `COSMOS_DB_AUTOSCALE_MAX_RU` cap to all required containers.
MEMORY_PROCESSOR_OWNER=durable tells the SDK that the deployed Function App owns processing, so any CosmosMemoryClient pointed at the same container will skip its in-process auto-trigger and avoid double-extraction. See the README's processor-ownership table for details.
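The relationship between the two throughput settings can be checked before deployment. The following is a minimal sketch and not part of the toolkit: `check_throughput_settings` is a hypothetical helper, the set of valid modes is assumed to be only the two named in this guide, and the rule that the RU cap matters only in autoscale mode is taken from the description above.

```python
from typing import Mapping

# Assumption: only the two modes mentioned in this guide are accepted.
VALID_MODES = {"serverless", "autoscale"}


def check_throughput_settings(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems found in the throughput settings (hypothetical helper)."""
    problems = []
    # serverless is the documented default when the setting is absent.
    mode = env.get("COSMOS_DB_THROUGHPUT_MODE", "serverless")
    if mode not in VALID_MODES:
        problems.append(f"unknown COSMOS_DB_THROUGHPUT_MODE: {mode!r}")
    if mode == "autoscale":
        # The RU cap is only consulted in autoscale mode.
        raw = env.get("COSMOS_DB_AUTOSCALE_MAX_RU", "")
        if not raw.isdigit() or int(raw) <= 0:
            problems.append(
                "COSMOS_DB_AUTOSCALE_MAX_RU must be a positive integer in autoscale mode"
            )
    return problems
```

Passing `os.environ` after `load_dotenv()` would check a local `.env`; an empty list means no problems were found.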
To enable automatic processing via the change feed trigger, add these settings:
az functionapp config appsettings set \
--name <function-app-name> \
--resource-group <resource-group> \
--settings \
COSMOS_DB__accountEndpoint="https://<cosmos-account-name>.documents.azure.com:443/" \
COSMOS_DB_COUNTERS_CONTAINER="counter" \
COSMOS_DB_LEASE_CONTAINER="leases" \
COSMOS_DB_THROUGHPUT_MODE="serverless" \
COSMOS_DB_AUTOSCALE_MAX_RU="1000" \
THREAD_SUMMARY_EVERY_N="5" \
FACT_EXTRACTION_EVERY_N="3" \
USER_SUMMARY_EVERY_N="10"

Set any threshold to `"0"` to disable that processing type.
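To make the threshold semantics concrete, here is a small sketch of how an "every N turns" rule with a zero-disables convention might be evaluated. It is illustrative only: `is_due` is a hypothetical function, and it assumes processing fires each time the turn count reaches a multiple of N. The toolkit's actual counter logic may differ.

```python
def is_due(turn_count: int, every_n: int) -> bool:
    """Return True when a processing step should run after `turn_count` turns."""
    if every_n <= 0:
        # A threshold of 0 disables this processing type.
        return False
    # Assumption: the step fires on every multiple of N, never at zero turns.
    return turn_count > 0 and turn_count % every_n == 0


# Hypothetical threshold values; user summaries are disabled here.
thresholds = {"thread_summary": 5, "fact_extraction": 3, "user_summary": 0}
due = [name for name, n in thresholds.items() if is_due(15, n)]
print(due)  # ['thread_summary', 'fact_extraction']
```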
The leases container is provisioned by create_memory_store() alongside the memories and counter containers, so the Function App should be configured to use that existing lease container.
The Function App authenticates to Cosmos DB and Azure OpenAI via its managed identity — there's no shared key or function-key handoff between the SDK and the Function App.
The recommended path is azd up (which builds and deploys the function_app/ service automatically). For manual deployment:
cd function_app
func azure functionapp publish <function-app-name>

Verify deployment:
az functionapp function list \
--name <function-app-name> \
--resource-group <resource-group> \
-o table

Update `.env` to point at Azure instead of localhost:
COSMOS_DB_ENDPOINT=https://<cosmos-account-name>.documents.azure.com:443/
COSMOS_DB_DATABASE=ai_memory
COSMOS_DB_MEMORIES_CONTAINER=memories
COSMOS_DB_COUNTERS_CONTAINER=counter
COSMOS_DB_LEASE_CONTAINER=leases
COSMOS_DB_THROUGHPUT_MODE=serverless
COSMOS_DB_AUTOSCALE_MAX_RU=1000
AI_FOUNDRY_ENDPOINT=https://<openai-account-name>.openai.azure.com/
AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME=text-embedding-3-large
AI_FOUNDRY_EMBEDDING_DIMENSIONS=1536
AI_FOUNDRY_CHAT_DEPLOYMENT_NAME=gpt-5-mini
# Tells the SDK that the deployed Function App owns auto-processing,
# so this client skips its in-process auto-trigger.
MEMORY_PROCESSOR_OWNER=durable

Run once if the database and containers do not already exist:
import os

from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential
from azure.cosmos.agent_memory import CosmosMemoryClient

# Load the settings from .env into the process environment.
load_dotenv()

memory = CosmosMemoryClient(
    cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
    cosmos_database=os.getenv("COSMOS_DB_DATABASE", "ai_memory"),
    cosmos_container=os.getenv("COSMOS_DB_MEMORIES_CONTAINER", "memories"),
    cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
    cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
    cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
    cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
    ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
    embedding_deployment_name=os.getenv("AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME", "text-embedding-3-large"),
    chat_deployment_name=os.getenv("AI_FOUNDRY_CHAT_DEPLOYMENT_NAME", "gpt-5-mini"),
    use_default_credential=True,
    cosmos_credential=DefaultAzureCredential(),
)
memory.create_memory_store()
memory.connect_cosmos()

The async equivalent:

import os

from dotenv import load_dotenv
from azure.identity.aio import DefaultAzureCredential as AsyncDefaultAzureCredential
from azure.cosmos.agent_memory.aio import AsyncCosmosMemoryClient

load_dotenv()

memory = AsyncCosmosMemoryClient(
    cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
    cosmos_database=os.getenv("COSMOS_DB_DATABASE", "ai_memory"),
    cosmos_container=os.getenv("COSMOS_DB_MEMORIES_CONTAINER", "memories"),
    cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
    cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
    cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
    cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
    ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
    embedding_deployment_name=os.getenv("AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME", "text-embedding-3-large"),
    chat_deployment_name=os.getenv("AI_FOUNDRY_CHAT_DEPLOYMENT_NAME", "gpt-5-mini"),
    use_default_credential=True,
    cosmos_credential=AsyncDefaultAzureCredential(),
)
await memory.connect_cosmos()
await memory.create_memory_store()

This provisions the `memories`, `memories_turns`, `memories_summaries`, `counter`, and `leases` containers. `serverless` is the default throughput mode; if you set `COSMOS_DB_THROUGHPUT_MODE=autoscale`, the shared `COSMOS_DB_AUTOSCALE_MAX_RU` value is applied to all five containers.
Bring the environment up in this order:
- `az login`
- verify Cosmos DB RBAC
- verify Azure OpenAI RBAC
- create Cosmos resources with `create_memory_store()`
- test `add_cosmos()` / `push_to_cosmos()` / `get_memories()`
- test `get_memories(user_id=..., thread_id=...)` filtering
- test `search_cosmos()`
- deploy the Function App (e.g., via `azd up`) so the change-feed processor is running
- write a few turns and verify a thread `summary` memory appears
- write more turns and verify `fact`, `procedural`, and `episodic` memories appear
- verify a per-user `user_summary` memory appears once `USER_SUMMARY_EVERY_N` turns have accumulated for that user
- test deduplication by writing two near-duplicate facts and confirming the dedup orchestrator merges them
This keeps failures isolated and easier to diagnose.
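One way to enforce that ordering in a script is to run each validation step in sequence and stop at the first failure. The sketch below is a generic pattern, not something the toolkit provides: `run_in_order` and the check names are hypothetical, and each check is assumed to be a zero-argument callable that raises on failure.

```python
from typing import Callable, Iterable, Optional, Tuple

# A check is a (name, callable) pair; the callable raises on failure.
Check = Tuple[str, Callable[[], None]]


def run_in_order(checks: Iterable[Check]) -> Optional[str]:
    """Run checks in order; return the name of the first failing check, or None."""
    for name, check in checks:
        try:
            check()
        except Exception as exc:
            # Stop immediately so later steps cannot mask the root cause.
            print(f"FAILED at '{name}': {exc}")
            return name
        print(f"ok: {name}")
    return None


def _placeholder() -> None:
    """Stand-in for a real check such as a test read against Cosmos DB."""


first_failure = run_in_order([
    ("cosmos rbac", _placeholder),
    ("openai rbac", _placeholder),
    ("create store", _placeholder),
])
```

In practice each placeholder would wrap one of the steps listed above, so the first reported failure points at the earliest broken layer.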
memory.add_cosmos(user_id="user-1", role="user", content="Hello from Azure")
print(memory.get_memories(user_id="user-1"))
print(memory.search_cosmos("hello", user_id="user-1"))

Processing is no longer invoked directly from the SDK. Write turns with `add_cosmos()` / `push_to_cosmos()`, and the deployed Function App's change-feed trigger fires the `extract_memories`, `thread_summary`, and `user_summary` orchestrators per the configured thresholds.
import time

# Write enough turns to cross THREAD_SUMMARY_EVERY_N (default 10).
for i in range(10):
    memory.add_cosmos(
        user_id="user-1",
        thread_id="thread-1",
        role="user",
        content=f"Turn {i+1}",
    )

# Wait for the change-feed processor to catch up, then read derived memories.
time.sleep(15)
print(memory.get_thread_summary(user_id="user-1", thread_id="thread-1"))
print(memory.get_memories(user_id="user-1", memory_types=["fact"]))
print(memory.get_user_summary(user_id="user-1"))

If you configured the change feed settings, verify automatic processing:
import time
import uuid

# Use a threshold of 3 (THREAD_SUMMARY_EVERY_N=3) for testing
thread_id = str(uuid.uuid4())
for i in range(3):
    memory.add_cosmos(
        user_id="user-1",
        thread_id=thread_id,
        role="user",
        content=f"Turn {i+1} for change feed validation",
    )

# Wait a few seconds for the change feed to trigger, then check:
time.sleep(10)
results = memory.get_thread_summary(user_id="user-1", thread_id=thread_id)
print(results)  # Should contain an auto-generated summary

Check the Function App logs to confirm the `on_memory_change` trigger fired and the orchestrator completed.
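A fixed `time.sleep()` can be too short on a cold start and needlessly long otherwise. As an alternative, the check can poll until the derived memory appears. This is a generic sketch: `wait_for` is a hypothetical helper, and it assumes the read call returns a falsy value (such as `None` or an empty list) while no summary exists yet, which this guide does not confirm.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    fetch: Callable[[], Optional[T]],
    timeout: float = 60.0,
    interval: float = 2.0,
) -> Optional[T]:
    """Call `fetch` until it returns a truthy value or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            # Give up and let the caller decide how to report the timeout.
            return None
        time.sleep(interval)
```

With the client from the earlier examples, usage might look like `wait_for(lambda: memory.get_thread_summary(user_id="user-1", thread_id=thread_id))`. A `None` result then means the summary did not appear within the timeout.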
print(memory.get_thread_summary(user_id="user-1", thread_id="thread-1"))
print(memory.get_memories(user_id="user-1", memory_types=["fact"]))
print(memory.get_user_summary(user_id="user-1"))

Tail Function App logs:
az functionapp log tail \
--name <function-app-name> \
--resource-group <resource-group>

Common issues:
| Symptom | Likely Cause |
|---|---|
| 401 / 403 from Cosmos DB | Missing Cosmos DB RBAC |
| 401 / 403 from Azure OpenAI | Missing OpenAI RBAC |
| Durable Function starts but fails | Missing app settings or downstream RBAC |
| `No memories found` | No turn memories exist, or all candidate turns predate the existing summary |
| Search is slow | Embedding latency, index choice, or region mismatch |
| Change feed trigger not firing | Verify COSMOS_DB__accountEndpoint is set and the function can write to the configured COSMOS_DB_COUNTERS_CONTAINER container |
| Auto-processing not starting | Check threshold settings are > 0 in Function App configuration |
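
Several of these symptoms trace back to missing app settings. The sketch below checks the JSON printed by `az functionapp config appsettings list` (an array of objects with `name` and `value` fields) for the setting names used in this guide. `missing_settings` is a hypothetical helper, and treating every listed name as required is an assumption.

```python
import json

# Setting names taken from the app settings shown earlier in this guide.
# Assumption: all of them are required for the Function App to run.
EXPECTED = (
    "COSMOS_DB_ENDPOINT",
    "COSMOS_DB_DATABASE",
    "COSMOS_DB_MEMORIES_CONTAINER",
    "COSMOS_DB_COUNTERS_CONTAINER",
    "COSMOS_DB_LEASE_CONTAINER",
    "AI_FOUNDRY_ENDPOINT",
    "AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME",
    "AI_FOUNDRY_CHAT_DEPLOYMENT_NAME",
    "MEMORY_PROCESSOR_OWNER",
)


def missing_settings(appsettings_json: str) -> list[str]:
    """Return expected setting names that are absent or empty in the CLI output."""
    entries = json.loads(appsettings_json)
    # Only settings with a non-empty value count as present.
    present = {entry["name"] for entry in entries if entry.get("value")}
    return [name for name in EXPECTED if name not in present]
```

Saving the CLI output to a file and passing its contents to `missing_settings` gives a quick list of names to fix before looking at RBAC.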
Recommended checks:
- enable Application Insights
- confirm Function App managed identity roles
- confirm `MEMORY_PROCESSOR_OWNER=durable` is set on any client pointed at a container that the Function App is also processing
- confirm model deployment names are correct