This folder provisions everything the Agent Memory Toolkit needs in a single Azure subscription via the Azure Developer CLI (azd). One command deploys the full stack end-to-end.
azd up creates all of the following:
-
Cosmos DB for NoSQL — serverless account with the
ai_memorydatabase and this container topology:Container Purpose memoriesFacts, episodics, and procedural memories; vector + FTS index memories_turnsTurns; minimal index, change-feed binding target memories_summariesThread + user summaries; minimal index, point-read access pattern leasesChange-feed checkpoints counterAtomic counters -
AI Foundry (
Microsoft.CognitiveServices/accountswithkind: AIServices) — withgpt-4o-miniandtext-embedding-3-largedeployments -
User-assigned managed identity (UAMI) — used by the Function app
-
RBAC role assignments — Cosmos DB Built-in Data Reader + Contributor, Cognitive Services OpenAI User, Storage Blob/Queue/Table data roles, granted to both the UAMI and the deploying user (full table in Identity & RBAC)
-
Function app — Flex Consumption (Python 3.11), Storage account, App Insights, Log Analytics
The Function app is always provisioned, even if you plan to use InProcessProcessor only. Flex Consumption is pay-per-execution — at zero traffic the Function app is essentially free (idle cost is the Storage account, ~$0.05/month). The Function app sits idle and unused for in-process workloads.
Advanced escape hatch: set
azd env set DEPLOY_FUNCTION_APP falseand runazd provisionto skip the Function app + its supporting resources entirely. Not recommended unless you have a strong reason. See SDK-only mode below for the full procedure.
Bring-your-own-resources is not supported. If you already have a Cosmos account or AI Foundry account you want to reuse, point the SDK and Function app at them via the standard
COSMOS_DB_ENDPOINT/AI_FOUNDRY_ENDPOINTenvironment variables and skipazd upentirely — you only need the Bicep when you want the toolkit to manage the accounts for you. Wiring BYO accounts into this template proved fragile (cross-RG scoping, account-already-exists races, partial RBAC) for low real-world value.
az(Azure CLI) andazd(Azure Developer CLI) installed- An Azure subscription with quota for
gpt-4o-miniandtext-embedding-3-largein the chosen region (defaulteastus2; allowed:eastus2,swedencentral,westus3,eastus)
az login
azd auth login
azd env new memorytoolkit-dev
azd env set AZURE_LOCATION eastus2 # required — subscription-scoped Bicep needs it
# Optional: pin a different region
# azd env set AZURE_LOCATION swedencentral
azd up
# ~10 min later: provisioned + function code deployedazd writes resource outputs to .azure/<env-name>/.env:
COSMOS_DB_ENDPOINT=...
COSMOS_DB_DATABASE=ai_memory
COSMOS_DB_MEMORIES_CONTAINER=memories
AI_FOUNDRY_ENDPOINT=...
AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME=text-embedding-3-large
AI_FOUNDRY_CHAT_DEPLOYMENT_NAME=gpt-4o-mini
FUNCTION_APP_NAME=func-...
FUNCTION_APP_URL=https://func-....azurewebsites.net
Source it before running samples or tests:
set -a && . ./.azure/memorytoolkit-dev/.env && set +aIf you only ever plan to use the in-process MemoryProcessor and want to keep the Bicep footprint minimal — no Function app, no Storage account, no App Insights, no Log Analytics:
azd env new memorytoolkit-sdkonly
azd env set AZURE_LOCATION eastus2
azd env set DEPLOY_FUNCTION_APP false
azd provision # skips Function app, Storage, App Insights, Log Analytics, storage RBACUse
azd provision(notazd up) for SDK-only.azd upalways invokesazd deploy --all, which looks for a resource taggedazd-service-name: function_app. With the Function app turned off there is no such resource and the deploy step fails.azd provisionruns Bicep without the deploy step.
When DEPLOY_FUNCTION_APP=false, the Bicep template automatically sets MEMORY_PROCESSOR_OWNER=inprocess in the generated .env so the in-process processor's auto-trigger fires. If you previously deployed with DEPLOY_FUNCTION_APP=true and are switching to SDK-only mode, also clear the override explicitly:
azd env set MEMORY_PROCESSOR_OWNER inprocessTwo concepts kept separate:
| Concept | What it is | Default |
|---|---|---|
Model name (*_MODEL_NAME) |
The catalog model published by Azure OpenAI (e.g. gpt-4o-mini, text-embedding-3-large). |
gpt-4o-mini / text-embedding-3-large |
Deployment name (*_DEPLOYMENT_NAME) |
The name you give the deployment in your AOAI account. Can be anything. | empty → defaults to model name |
Override either before azd up:
# Use a different catalog model with the default deployment name
azd env set AI_FOUNDRY_CHAT_MODEL_NAME gpt-4o
# Or pin a custom deployment name (existing or to-be-created)
azd env set AI_FOUNDRY_CHAT_DEPLOYMENT_NAME my-prod-chat
azd env set AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME my-prod-embedThe *_DEPLOYMENT_NAME value is what the SDK and Function app pass as the model= argument to the Azure OpenAI client at runtime.
The Function app uses a counter document per (user_id, thread_id) to decide when to fire each orchestrator. Every knob is a Bicep parameter bound to an azd env variable — azd up re-renders them on every deploy, so changing a value is a one-line azd env set ... followed by azd up.
azd env variable |
Bicep param | Default | Effect |
|---|---|---|---|
THREAD_SUMMARY_EVERY_N |
threadSummaryEveryN |
10 |
Run thread-summary orchestration every N turns within a (user_id, thread_id). 0 disables it. |
FACT_EXTRACTION_EVERY_N |
factExtractionEveryN |
1 |
Run fact / episodic / procedural extraction every N turns within a (user_id, thread_id). 0 disables it. |
DEDUP_EVERY_N |
dedupEveryN |
5 |
Run fact dedup every Nth fact-extraction (so dedup actually fires every FACT_EXTRACTION_EVERY_N × DEDUP_EVERY_N turns). |
USER_SUMMARY_EVERY_N |
userSummaryEveryN |
20 |
Run user-summary orchestration every N turns from a given user_id across all threads. 0 disables it. |
MAX_BATCH_SIZE |
maxBatchSize |
20 |
Maximum number of change-feed items processed per orchestration batch. |
MEMORY_PROCESSOR_OWNER |
memoryProcessorOwner |
durable |
Backend that owns processing. durable = this function-app fleet owns it; SDK clients pointed at the same container will skip auto-triggering. inprocess = SDK owns processing instead. |
The Function-app defaults match the SDK in-process defaults so behavior is consistent across backends. The Function-app fleet pays real per-turn LLM cost on every orchestrator invocation; bump FACT_EXTRACTION_EVERY_N / DEDUP_EVERY_N for cost-sensitive production traffic.
Set any value to 0 to disable auto-triggering for that orchestrator. Update at runtime with:
azd env set THREAD_SUMMARY_EVERY_N 8
azd up # re-runs provisioning and pushes new App Settingsazd deploy (code-only) is also supported but will not pick up changes to threshold values because they live in Bicep; always use azd up after threshold changes.
Every data-plane role is granted at the resource (account) scope — never at subscription or RG level. principalType is set explicitly on every standard Microsoft.Authorization/roleAssignments so first-deploy from a freshly-created service principal succeeds without the usual "PrincipalNotFound" RBAC race.
| Resource | Built-in role | Granted to | Why |
|---|---|---|---|
| Cosmos DB account | 00000000-0000-0000-0000-000000000001 — Cosmos DB Built-in Data Reader |
UAMI + deploying user | Explicit read-only scope. Granted alongside Data Contributor so downstream consumers (audit dashboards, analytics jobs) can run as the same identity but be validated by security review as needing only the Reader scope. Cosmos uses its own sqlRoleAssignments resource type which does not accept principalType (Cosmos enforces it internally via principalId). |
| Cosmos DB account | 00000000-0000-0000-0000-000000000002 — Cosmos DB Built-in Data Contributor |
UAMI + deploying user | Data-plane reads/writes from Function app + local samples. |
| AI Foundry account | 5e0bd9bd-7b93-4f28-af87-19fc36ad61bd — Cognitive Services OpenAI User |
UAMI + deploying user | Inference (chat + embeddings) from Function app + local samples. |
| Storage account | b7e6dc6d-f1e8-4753-8033-0f276bb0955b — Storage Blob Data Owner |
UAMI + deploying user | Function-app deployment-from-blob, Durable history blobs, local sample blob inspection. Owner (not Contributor) keeps azd deploy symmetric with manual ops scripts that may need lease/ACL operations. |
| Storage account | 974c5e8b-45b9-4653-ba55-5f855dd0fb88 — Storage Queue Data Contributor |
UAMI only | Durable Functions task hub (default Azure Storage provider) uses Queues for orchestration messages. Without this, the very first orchestration start returns 403 even though Blob is fine. |
| Storage account | 0a9a7e1f-b9d0-4cc4-a60d-0319b160aaa3 — Storage Table Data Contributor |
UAMI only | Durable Functions history Tables. Same 403 symptom as queues if missing. |
All Storage roles are skipped when DEPLOY_FUNCTION_APP=false (no Storage account = nothing to grant on).
For a same-name re-deploy, the simplest path is azd down --purge followed by azd up. --purge instructs azd to skip the Cosmos / AI Foundry soft-delete window so the names are free to reuse immediately.
azd down --purgeGenerate a GitHub Actions or Azure Pipelines pipeline for the same flow:
azd pipeline config| Gotcha | Mitigation |
|---|---|
| First-time provisioning is slow (8–15 min for Cosmos + AI Foundry + Function app) | azd up shows progress; just wait |
location property missing — subscription-scoped Bicep requires AZURE_LOCATION |
Always run azd env set AZURE_LOCATION eastus2 after azd env new |
| AI Foundry region constraints — many regions don't have all features / models | Default AZURE_LOCATION=eastus2; supported: eastus2, swedencentral, westus3, eastus |
| Model deployment quota — fails if the subscription has zero quota for the model in the chosen region | Request quota or change region; error from Azure points to the right doc |
| Cosmos free-tier limit (one per subscription) | Default is serverless — no idle cost, no free-tier conflict |
| AAD propagation — RBAC takes 30–90s; the Function app may briefly 403 on its first invocation after deploy | Retry after a minute. dependsOn chains in Bicep ensure roles exist before the Function app starts |
| Resource naming rules — Storage ≤24 chars lowercase, AI Foundry has its own | Naming uses take(uniqueString(...), 13) and toLower() to satisfy all rules |
The Bicep uses a single Microsoft.CognitiveServices/accounts resource with kind: AIServices (named aif-<token>) instead of the full AI Foundry hub + project + ML workspace. The AIServices account exposes the same Azure OpenAI endpoint and supports the same Cognitive Services OpenAI User RBAC role, which is everything the toolkit needs for embeddings and chat completions. This avoids the extra Storage / Key Vault / App Insights / ML workspace resources a hub-style deployment would create. This is the pattern used by most current azd-based Microsoft samples (e.g. azure-search-openai-demo, openai-chat-app-quickstart).
infra/
├── main.bicep # subscription-scoped entry point
├── main.parameters.json # binds ${AZURE_*} env vars to Bicep params
├── abbreviations.json # standard Azure name prefixes
└── modules/
├── identity.bicep # User-assigned managed identity
├── cosmos.bicep # Cosmos DB NoSQL serverless account + database + 5 containers
├── cosmos-rbac.bicep # Cosmos data-plane role assignments
├── ai-foundry.bicep # Cognitive Services AIServices account + chat + embedding deployments
├── ai-foundry-rbac.bicep # Cognitive Services OpenAI User role assignments
├── functions.bicep # Flex Consumption Python 3.11 function app + Storage + App Insights + Log Analytics
└── storage-rbac.bicep # Blob/Queue/Table role assignments for the function-app UAMI
Runtime ops (TTL tuning, monitoring, scaling beyond defaults) live in
Docs/operations.md. This README owns everything related toazd up/ Bicep / deployment.