From ada618f4f6131407aa2129328bcf89e758a1b78b Mon Sep 17 00:00:00 2001 From: Mahya Gheini <46275333+m-gheini@users.noreply.github.com> Date: Thu, 18 Jun 2026 10:06:46 -0700 Subject: [PATCH] Add deployment preflight check, cleanup, and shared private link setup for search service scripts. (#384) * Added Cleanup and pre-check scripts * Shared private link for Azure AI search service Add shared private link on azure AI search service. Useful when Azure AI Search service needs to reach out Foundry resource for models (for vectorizer and agentic RAG) and Foundry resource is in a vnet * Moved shared private link to common deployment tools * Added README for preflight-check * Updated scripts and added READMEs * Updated error logging for cleanup * Early exit in case of error * Increased caphost deletion time-out * Updated cleanup script after test on hosted agent cleanups * Updated preflight checks per template updates. * Added diagnostic script for post deployment validation * Address Comments * Made SAL wait scoped on account * Updates to diagnostic and preflight scripts --------- Co-authored-by: Pratyush Mishra <8485494+mishrapratyush@users.noreply.github.com> --- .gitignore | 1 + .../README.md | 6 + .../18-managed-virtual-network/README.md | 4 + .../19-private-network-agent-tools/README.md | 6 + .../deployment-tools/cleanup/README.md | 110 ++ .../deployment-tools/cleanup/cleanup.ps1 | 530 +++++++ .../deployment-tools/diagnostic/README.md | 96 ++ .../diagnostic/diagnostic-check.ps1 | 1371 +++++++++++++++++ .../diagnostic/diagnostic.config.sample | 9 + .../deployment-tools/networking/README.md | 156 ++ .../ai-search-shared-private-link.bicep | 121 ++ .../ai-search-shared-private-link.bicepparam | 14 + .../deployment-tools/preflight/README.md | 157 ++ .../preflight/preflight-check.ps1 | 705 +++++++++ .../preflight/preflight.config.sample | 40 + 15 files changed, 3326 insertions(+) create mode 100644 
infrastructure/infrastructure-setup-bicep/deployment-tools/cleanup/README.md create mode 100644 infrastructure/infrastructure-setup-bicep/deployment-tools/cleanup/cleanup.ps1 create mode 100644 infrastructure/infrastructure-setup-bicep/deployment-tools/diagnostic/README.md create mode 100644 infrastructure/infrastructure-setup-bicep/deployment-tools/diagnostic/diagnostic-check.ps1 create mode 100644 infrastructure/infrastructure-setup-bicep/deployment-tools/diagnostic/diagnostic.config.sample create mode 100644 infrastructure/infrastructure-setup-bicep/deployment-tools/networking/README.md create mode 100644 infrastructure/infrastructure-setup-bicep/deployment-tools/networking/ai-search-shared-private-link.bicep create mode 100644 infrastructure/infrastructure-setup-bicep/deployment-tools/networking/ai-search-shared-private-link.bicepparam create mode 100644 infrastructure/infrastructure-setup-bicep/deployment-tools/preflight/README.md create mode 100644 infrastructure/infrastructure-setup-bicep/deployment-tools/preflight/preflight-check.ps1 create mode 100644 infrastructure/infrastructure-setup-bicep/deployment-tools/preflight/preflight.config.sample diff --git a/.gitignore b/.gitignore index 6e2061f54..037d1c906 100644 --- a/.gitignore +++ b/.gitignore @@ -14,6 +14,7 @@ *.sln.docstates *.env venv/ +preflight.config # User-specific files (MonoDevelop/Xamarin Studio) *.userprefs diff --git a/infrastructure/infrastructure-setup-bicep/15-private-network-standard-agent-setup/README.md b/infrastructure/infrastructure-setup-bicep/15-private-network-standard-agent-setup/README.md index 9a2cc8fda..484feef21 100644 --- a/infrastructure/infrastructure-setup-bicep/15-private-network-standard-agent-setup/README.md +++ b/infrastructure/infrastructure-setup-bicep/15-private-network-standard-agent-setup/README.md @@ -87,6 +87,8 @@ Use the table below to choose the right infrastructure template for your scenari * If no parameters are passed in, this template creates an Microsoft 
Foundry resource, Foundry project, Azure Cosmos DB for NoSQL, Azure AI Search, and Azure Storage account 1. Azure CLI installed and configured on your local workstation or deployment pipeline server +> **💡 Recommended**: Run the [preflight check](../deployment-tools/preflight/README.md) before deploying to catch common misconfigurations (provider registration, subnet conflicts, soft-deleted accounts) before they surface as cryptic ARM errors mid-deploy. + --- ## Pre-Deployment Steps @@ -223,6 +225,8 @@ To use an existing Azure AI Search resource, set aiSearchServiceResourceId param > --aad-auth-failure-mode http401WithBearerChallenge > ``` +> **AI Search → AI Services connectivity**: This template configures AI Services with `networkAcls.bypass: AzureServices`, which allows Azure AI Search to reach AI Services through the trusted-services bypass. This works for most scenarios. If your security policy requires removing the bypass (setting it to `None`), deploy [Shared Private Links](../deployment-tools/networking/README.md) from AI Search to AI Services instead — this creates a private endpoint from AI Search's managed infrastructure directly into AI Services via Private Link. + 4. **Use an existing Azure Storage account** @@ -283,6 +287,8 @@ az group delete --name --yes --no-wait > **Important**: If you need to reuse the same subnet, follow the [Account Deletion Prerequisites and Cleanup Guidance](#account-deletion-prerequisites-and-cleanup-guidance) to properly purge the account and wait for the capability host to fully unlink (~20 minutes). +> **💡 Tip**: For VNet-injection deployments, use the [cleanup tool](../deployment-tools/cleanup/README.md). It handles the required deletion order (project caphost → account caphost → purge → SAL wait) automatically. 
+ --- ## Network Secured Agent Project Architecture Deep Dive diff --git a/infrastructure/infrastructure-setup-bicep/18-managed-virtual-network/README.md b/infrastructure/infrastructure-setup-bicep/18-managed-virtual-network/README.md index 603dcf9a5..9ba19c6fd 100644 --- a/infrastructure/infrastructure-setup-bicep/18-managed-virtual-network/README.md +++ b/infrastructure/infrastructure-setup-bicep/18-managed-virtual-network/README.md @@ -120,6 +120,8 @@ Use the table below to choose the right infrastructure template for your scenari 1. Azure CLI installed and configured on your local workstation or deployment pipeline server. Azure CLI support is required to run the 'az rest' commands to update your managed virtual network. +> **💡 Recommended**: Run the [preflight check](../deployment-tools/preflight/README.md) before deploying to catch common misconfigurations (provider registration, soft-deleted accounts, BYO resource issues) before they surface as cryptic ARM errors mid-deploy. + 1. **Register Resource Providers** Make sure you have an active Azure subscription that allows registering resource providers. If it's not already registered, run the commands below: @@ -294,6 +296,8 @@ az group delete --name --yes --no-wait > **Important**: Follow the [Account Deletion Prerequisites and Cleanup Guidance](#account-deletion-prerequisites-and-cleanup-guidance) to properly purge the account and wait for the capability host to fully unlink (~20 minutes). +> **💡 Tip**: Use the [cleanup tool](../deployment-tools/cleanup/README.md). It handles the required deletion order (project caphost → account caphost → purge → SAL wait) automatically. 
+ --- ## Post-Deployment Steps (Critical for Hosted Agents) diff --git a/infrastructure/infrastructure-setup-bicep/19-private-network-agent-tools/README.md b/infrastructure/infrastructure-setup-bicep/19-private-network-agent-tools/README.md index 5a7ea9c7d..f932214e9 100644 --- a/infrastructure/infrastructure-setup-bicep/19-private-network-agent-tools/README.md +++ b/infrastructure/infrastructure-setup-bicep/19-private-network-agent-tools/README.md @@ -104,6 +104,8 @@ Use the table below to choose the right infrastructure template for your scenari * If no parameters are passed in, this template creates an Microsoft Foundry resource, Foundry project, Azure Cosmos DB for NoSQL, Azure AI Search, and Azure Storage account 1. Azure CLI installed and configured on your local workstation or deployment pipeline server +> **💡 Recommended**: Run the [preflight check](../deployment-tools/preflight/README.md) before deploying to catch common misconfigurations (provider registration, subnet conflicts, soft-deleted accounts) before they surface as cryptic ARM errors mid-deploy. + --- ## Pre-Deployment Steps @@ -275,6 +277,8 @@ To use an existing Cosmos DB for NoSQL resource, set `existingAzureCosmosDBAccou To use an existing Azure AI Search resource, set `existingAiSearchResourceId` to the full ARM ID of the target search service. - `param existingAiSearchResourceId = '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Search/searchServices/{searchServiceName}'` +> **AI Search → AI Services connectivity**: This template configures AI Services with `networkAcls.bypass: AzureServices`, which allows Azure AI Search to reach AI Services through the trusted-services bypass. This works for most scenarios. 
If your security policy requires removing the bypass (setting it to `None`), deploy [Shared Private Links](../deployment-tools/networking/README.md) from AI Search to AI Services instead — this creates a private endpoint from AI Search's managed infrastructure directly into AI Services via Private Link. + 4. **Use an existing Azure Storage account** @@ -369,6 +373,8 @@ az group delete --name --yes --no-wait > **Important**: If you need to reuse the same subnet, follow the [Account Deletion Prerequisites and Cleanup Guidance](#account-deletion-prerequisites-and-cleanup-guidance) to properly purge the account and wait for the capability host to fully unlink (~20 minutes). +> **💡 Tip**: For VNet-injection deployments, use the [cleanup tool](../deployment-tools/cleanup/README.md). It handles the required deletion order (project caphost → account caphost → purge → SAL wait) automatically. + --- ## Network Secured Agent Project Architecture Deep Dive diff --git a/infrastructure/infrastructure-setup-bicep/deployment-tools/cleanup/README.md b/infrastructure/infrastructure-setup-bicep/deployment-tools/cleanup/README.md new file mode 100644 index 000000000..03447cc62 --- /dev/null +++ b/infrastructure/infrastructure-setup-bicep/deployment-tools/cleanup/README.md @@ -0,0 +1,110 @@ +# Foundry Private Network Cleanup + +`cleanup.ps1` safely tears down Foundry deployments with VNet injection. It handles the specific deletion order required to avoid stuck resources, orphaned locks, and subnet conflicts that make manual cleanup painful. + +## Why This Script Exists + +Deleting a Foundry private network deployment is **not** as simple as `az group delete`. The deployment creates capability hosts with service association links (SALs) on VNet subnets. If you delete the resource group directly: + +- **Capability hosts must be deleted in order** — project-level first, then account-level. Deleting in the wrong order can leave the account in a failed state. 
+- **SALs block subnet reuse** — subnets with active SALs cannot be re-delegated or deleted. SAL cleanup happens asynchronously after caphost deletion and can take up to 24 hours. +- **Soft-deleted accounts block redeployment** — Cognitive Services accounts are soft-deleted for 48 hours. A new deployment with the same name will fail unless the old account is purged. + +This script handles all of this automatically: it discovers resources, deletes them in the correct order, waits for SAL cleanup, and purges soft-deleted accounts. + +## Important: Use the VNet Resource Group + +The `-ResourceGroup` parameter must point to the resource group containing the **AI Foundry account, project, and VNet** — not the resource group with dependent resources (Search, Cosmos, Storage). + +If your deployment uses multiple resource groups, the cleanup script only needs the one with the AI account and VNet. Dependent resources in other resource groups can be deleted with a simple `az group delete`. + +## Prerequisites + +- [PowerShell 7+](https://learn.microsoft.com/en-us/powershell/scripting/install/installing-powershell) (cross-platform) +- [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) logged in with access to the target subscription +- Active subscription must be set: `az account set --subscription ` + +## Usage + +```bash +cd infrastructure/infrastructure-setup-bicep/deployment-tools/cleanup +``` + +**Always start with a dry run** to see what would be deleted before making changes: + +```powershell +.\cleanup.ps1 -SubscriptionId "" -ResourceGroup "" -DryRun +``` + +> [!IMPORTANT] +> Always run `-DryRun` first and review the discovered accounts/projects/caphosts before running cleanup without `-DryRun`. + +When you're satisfied with the discovery output, run without `-DryRun`. The script will prompt for confirmation before deleting anything. 
+ +```powershell +.\cleanup.ps1 -SubscriptionId "" -ResourceGroup "" +``` + +## Parameters + +| Parameter | Required | Description | +|---|---|---| +| `-SubscriptionId` | Yes | Azure subscription ID | +| `-ResourceGroup` | Yes | Resource group containing the AI Foundry account, project, and VNet | +| `-AccountName` | No | Limit cleanup to a specific AI Services account. When omitted, all AIServices accounts in the RG are discovered and cleaned up. | +| `-DryRun` | No | Show what would be cleaned up without taking any action | +| `-SkipSalWait` | No | Skip waiting for SAL removal (faster but risky — subnet may not be reusable immediately) | +| `-DeleteRG` | No | Delete the resource group after cleanup. Not allowed with `-AccountName` (account-scoped cleanup must not delete the whole RG). | + +When `-AccountName` is provided, active cleanup remains scoped to that account, while soft-deleted account purge remains RG-wide residue cleanup. + +## What It Does + +### Step 0: Discovery + +Auto-discovers all resources in the resource group — no need to know account or project names: + +- AI Foundry accounts (kind: `AIServices`) +- Projects under each account +- Capability hosts (project-level and account-level) +- VNet subnets with active service association links + +After discovery, a summary is printed and you are prompted to confirm before proceeding (unless `-DryRun` is set). + +### Step 1: Delete Project Capability Hosts + +Deletes all project-level capability hosts first. This is required before account-level caphosts can be removed. + +### Step 2: Delete Account Capability Hosts + +Deletes account-level capability hosts. Handles async deletion with polling (up to 30 min timeout). + +### Step 3: Delete Projects and Purge AI Accounts + +Deletes all projects under each account first (accounts cannot be deleted while nested projects exist), then deletes and purges each AI Services account to prevent soft-delete name collisions on redeployment. 
Also checks for and purges any previously soft-deleted accounts in the resource group. + +### Step 4: Wait for SAL Cleanup + +Waits for service association links to be removed from subnets (up to 20 min). SAL removal happens asynchronously after caphost deletion. If SALs are still present after 20 minutes, the script warns you to check again later — backend cleanup can take up to 24 hours. + +SAL waiting runs only for caphost-linked subnets discovered during cleanup. If none are discovered, SAL waiting is skipped with a warning. + +### Step 5: Resource Group (optional) + +If `-DeleteRG` is specified, initiates an async deletion of the resource group. Otherwise, prints the command for manual deletion. + +## Examples + +```powershell +# Dry run — see what would be cleaned up +.\cleanup.ps1 -SubscriptionId "xxxx" -ResourceGroup "my-foundry-rg" -DryRun + +# Clean up a specific account only +.\cleanup.ps1 -SubscriptionId "xxxx" -ResourceGroup "my-foundry-rg" -AccountName "my-ai-account" + +# Full cleanup including resource group deletion +.\cleanup.ps1 -SubscriptionId "xxxx" -ResourceGroup "my-foundry-rg" -DeleteRG + +# Fast cleanup (skip SAL wait — subnet may not be immediately reusable) +.\cleanup.ps1 -SubscriptionId "xxxx" -ResourceGroup "my-foundry-rg" -SkipSalWait +``` diff --git a/infrastructure/infrastructure-setup-bicep/deployment-tools/cleanup/cleanup.ps1 b/infrastructure/infrastructure-setup-bicep/deployment-tools/cleanup/cleanup.ps1 new file mode 100644 index 000000000..483ee78c3 --- /dev/null +++ b/infrastructure/infrastructure-setup-bicep/deployment-tools/cleanup/cleanup.ps1 @@ -0,0 +1,530 @@ +<# +.SYNOPSIS + Foundry Private Network Cleanup Script + +.DESCRIPTION + Safely tears down Foundry deployments with VNet injection. + Auto-discovers all resources — no need to know account/project names. 
+ +.PARAMETER SubscriptionId + Azure subscription ID + +.PARAMETER ResourceGroup + Resource group containing the AI Foundry account, project, and VNet + +.PARAMETER AccountName + Optional. Limit cleanup to a specific AI Services account. + When omitted, all AIServices accounts in the RG are discovered. + +.PARAMETER DryRun + Show what would be cleaned up without taking action + +.PARAMETER SkipSalWait + Don't wait for serviceAssociationLink removal (faster but risky) + +.PARAMETER DeleteRG + Delete the resource group after cleanup + +.EXAMPLE + .\cleanup.ps1 -SubscriptionId "xxx" -ResourceGroup "my-rg" -DryRun + .\cleanup.ps1 -SubscriptionId "xxx" -ResourceGroup "my-rg" -AccountName "my-account" + .\cleanup.ps1 -SubscriptionId "xxx" -ResourceGroup "my-rg" -DeleteRG +#> + +param( + [Parameter(Mandatory)][string]$SubscriptionId, + [Parameter(Mandatory)][string]$ResourceGroup, + [string]$AccountName, + [switch]$DryRun, + [switch]$SkipSalWait, + [switch]$DeleteRG +) + +if ($AccountName -and $DeleteRG) { + Write-Host "[FAIL] -DeleteRG cannot be used with -AccountName. Account-scoped cleanup must not delete the whole resource group." 
-ForegroundColor Red + exit 1 +} + +$ErrorActionPreference = "Continue" +# ARM API version for CognitiveServices capabilityHosts — update when this API reaches GA +$ApiVersion = "2025-04-01-preview" +$script:Errors = 0 + +# ---- Logging ---- +function Log { param([string]$Msg) Write-Host "[INFO] $Msg" -ForegroundColor Cyan } +function Pass { param([string]$Msg) Write-Host "[DONE] $Msg" -ForegroundColor Green } +function Warn { param([string]$Msg) Write-Host "[WARN] $Msg" -ForegroundColor Yellow } +function Fail { param([string]$Msg) Write-Host "[FAIL] $Msg" -ForegroundColor Red; $script:Errors++ } +function Dry { param([string]$Msg) Write-Host "[DRY-RUN] Would: $Msg" -ForegroundColor Yellow } + +function Get-AzToken { + az account get-access-token --query accessToken -o tsv 2>$null +} + +# ---- Caphost deletion with full error handling ---- +function Remove-CapabilityHost { + param( + [string]$ResourcePath, # e.g. "accounts/myaccount" or "accounts/myaccount/projects/myproject" + [string]$CaphostName, + [string]$DisplayName + ) + + $apiUrl = "https://management.azure.com/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.CognitiveServices/$ResourcePath/capabilityHosts/${CaphostName}?api-version=$ApiVersion" + + if ($DryRun) { + Dry "Delete caphost: $DisplayName ($CaphostName)" + return $true + } + + Log "Deleting $DisplayName capability host: $CaphostName" + $token = Get-AzToken + + try { + $headers = @{ Authorization = "Bearer $token"; "Content-Type" = "application/json" } + $response = Invoke-WebRequest -Uri $apiUrl -Method Delete -Headers $headers -ErrorAction Stop + + if ($response.StatusCode -eq 200) { + Pass "$DisplayName caphost deleted (synchronous)" + return $true + } + + if ($response.StatusCode -eq 202) { + # Async — extract operation URL and poll + $operationUrl = $response.Headers["Azure-AsyncOperation"] + if (-not $operationUrl) { + $operationUrl = $response.Headers["azure-asyncoperation"] + } + if (-not $operationUrl) { + 
Warn "No async operation URL returned. Assuming deletion in progress." + return $true + } + # Handle array header values + if ($operationUrl -is [array]) { $operationUrl = $operationUrl[0] } + + Log "Polling deletion status..." + $status = "Deleting" + $pollCount = 0 + $maxPolls = 60 # 30 min max (account caphosts can take 15-20 min) + + while ($status -eq "Deleting" -or $status -eq "InProgress" -or $status -eq "Creating" -or $status -eq "Running") { + Start-Sleep -Seconds 30 + $pollCount++ + if ($pollCount -ge $maxPolls) { + Fail "Timeout polling caphost deletion after 30 minutes" + return $false + } + $token = Get-AzToken + $pollHeaders = @{ Authorization = "Bearer $token" } + try { + $pollResponse = Invoke-RestMethod -Uri $operationUrl -Headers $pollHeaders -ErrorAction Stop + if ($pollResponse.error.code -eq "TransientError") { + Warn "Transient error, retrying..." + continue + } + $status = $pollResponse.status + Log " Status: $status ($pollCount/$maxPolls)" + } catch { + Warn "Poll error: $($_.Exception.Message). Retrying..." + } + } + + if ($status -eq "Succeeded") { + Pass "$DisplayName caphost deleted successfully" + return $true + } elseif ($status -eq "Failed" -or $status -eq "Canceled") { + Fail "$DisplayName caphost deletion failed with status: $status" + return $false + } else { + Warn "$DisplayName caphost deletion returned status: $status. The DELETE was accepted — backend cleanup may still be in progress." + Warn "The SAL wait step will verify whether cleanup completed." + return $true + } + } + } catch { + $statusCode = $_.Exception.Response.StatusCode.value__ + $errBody = if ($_.ErrorDetails.Message) { $_.ErrorDetails.Message } else { $_.Exception.Message } + switch ($statusCode) { + 404 { + Pass "$DisplayName caphost not found (already deleted)" + return $true + } + 409 { + Fail "$DisplayName caphost returned 409 Conflict. It may be in a failed state." 
+ Fail "Error: $errBody" + return $false + } + default { + Fail "$DisplayName caphost deletion failed (HTTP $statusCode)" + Fail "Error: $errBody" + return $false + } + } + } +} + +# ---- SAL wait ---- +function Wait-ForSalCleanup { + param( + [string]$VnetRg, + [string]$VnetName, + [string]$SubnetName + ) + + if ($DryRun) { + Dry "Wait for SAL removal on $VnetName/$SubnetName" + return $true + } + + if ($SkipSalWait) { + Warn "Skipping SAL wait for $SubnetName (-SkipSalWait)" + return $true + } + + $maxWait = 1200 # 20 min + $elapsed = 0 + + Log "Waiting for serviceAssociationLink cleanup on $SubnetName (up to 20 min)..." + Log "SAL is held by the platform's managed Container Apps environment — cleanup happens asynchronously after caphost deletion." + + while ($elapsed -lt $maxWait) { + $subnetInfo = az network vnet subnet show ` + --resource-group $VnetRg ` + --vnet-name $VnetName ` + --name $SubnetName ` + --query "{salCount:length(serviceAssociationLinks || ``[]``), salType:(serviceAssociationLinks[0].linkedResourceType || serviceAssociationLinks[0].properties.linkedResourceType), salState:(serviceAssociationLinks[0].provisioningState || serviceAssociationLinks[0].properties.provisioningState)}" ` + -o json 2>$null | ConvertFrom-Json + + $salCount = $subnetInfo.salCount + if ($salCount -eq 0) { + Pass "serviceAssociationLink removed from $SubnetName" + return $true + } + + if ($elapsed -eq 0) { + Log " SAL type: $($subnetInfo.salType) | state: $($subnetInfo.salState)" + } + + Log " Still linked ($salCount SALs, state: $($subnetInfo.salState)). Elapsed: ${elapsed}s / ${maxWait}s" + Start-Sleep -Seconds 30 + $elapsed += 30 + } + + Warn "SAL still present on $SubnetName after 20 min. Backend cleanup can take up to 24 hours." + Warn "Diagnostics to run manually:" + Warn " 1. Check SAL: az network vnet subnet show --resource-group $VnetRg --vnet-name $VnetName --name $SubnetName --query serviceAssociationLinks" + Warn " 2. 
Check caphosts still exist: az rest --method GET --url `"https://management.azure.com/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.CognitiveServices/accounts//capabilityHosts?api-version=$ApiVersion`"" + Warn "If still present after 24 hours, file a support ticket." + return $false +} + +# ============================================================================= +# STEP 0: Discovery +# ============================================================================= +Write-Host "========================================" +Write-Host "Foundry Private Network Cleanup" +Write-Host "========================================" +Write-Host "Subscription: $SubscriptionId" +Write-Host "Resource Group: $ResourceGroup" +if ($AccountName) { + Write-Host "Account filter: $AccountName" +} +Write-Host "Dry Run: $DryRun" +Write-Host "" + +# Verify login and active subscription (never switch the user's active subscription) +$azAccount = az account show -o json 2>$null | ConvertFrom-Json +if (-not $azAccount) { + Fail "Not logged in to Azure CLI. Run: az login" + return +} + +# Verify subscription access without switching the active context +$subCheck = az account show --subscription $SubscriptionId --query "id" -o tsv 2>$null +if (-not $subCheck) { + Fail "Cannot access subscription $SubscriptionId. Verify the ID and your access." + return +} + +# Ensure the CLI is already pointed at the right subscription +$activeSubId = ($azAccount.id).Trim() +if ($activeSubId -ne $SubscriptionId) { + Fail "Active subscription ($activeSubId) does not match requested ($SubscriptionId)." + Fail "Run: az account set --subscription $SubscriptionId" + return +} +Pass "Subscription verified: $($azAccount.name)" + +# Discover AI Foundry accounts +Log "Discovering AI Foundry accounts..." 
+if ($AccountName) { + # Verify the specific account exists and is AIServices + $kind = az cognitiveservices account show --name $AccountName --resource-group $ResourceGroup ` + --query "kind" -o tsv 2>$null + if ($kind -eq "AIServices") { + $accounts = @($AccountName) + } else { + Fail "Account '$AccountName' not found or is not an AIServices account in $ResourceGroup" + return + } +} else { + $accounts = @(az cognitiveservices account list --resource-group $ResourceGroup ` + --query "[?kind=='AIServices'].name" -o tsv 2>$null) | Where-Object { $_ } +} + +if ($accounts.Count -eq 0) { + Warn "No AI Foundry accounts found in $ResourceGroup" +} else { + Log "Found accounts: $($accounts -join ', ')" +} + +# Discover caphosts per account (subnet tracking is derived from caphost properties) +Log "Discovering capability hosts..." +$token = Get-AzToken +$headers = @{ Authorization = "Bearer $token" } + +$projectCaphosts = @() # [PSCustomObject]{Account, Project, Caphost} +$accountCaphosts = @() # [PSCustomObject]{Account, Caphost} +$accountProjects = @() # [PSCustomObject]{Account, Project} +$caphostSubnets = @{} # Deduplicated: subnetResourceId -> [PSCustomObject]{VnetRg, Vnet, Subnet} + +foreach ($account in $accounts) { + # Account-level caphosts + try { + $accCH = Invoke-RestMethod -Uri "https://management.azure.com/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.CognitiveServices/accounts/$account/capabilityHosts?api-version=$ApiVersion" -Headers $headers -ErrorAction Stop + foreach ($ch in $accCH.value) { + $accountCaphosts += [PSCustomObject]@{ Account = $account; Caphost = $ch.name } + Log " Account caphost: $account -> $($ch.name)" + # Track the subnet this caphost is linked to + $subnetId = $ch.properties.customerSubnet + if ($subnetId -and -not $caphostSubnets.ContainsKey($subnetId)) { + if ($subnetId -match '/subscriptions/[^/]+/resourceGroups/([^/]+)/providers/Microsoft.Network/virtualNetworks/([^/]+)/subnets/([^/]+)') { + 
$caphostSubnets[$subnetId] = [PSCustomObject]@{ VnetRg = $Matches[1]; Vnet = $Matches[2]; Subnet = $Matches[3] } + Log " Caphost subnet: $($Matches[2])/$($Matches[3])" + } + } + } + } catch { } + + # Discover projects + try { + $projects = Invoke-RestMethod -Uri "https://management.azure.com/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.CognitiveServices/accounts/$account/projects?api-version=$ApiVersion" -Headers $headers -ErrorAction Stop + foreach ($proj in $projects.value) { + # API returns "account/project" format — extract just the project name + $projName = if ($proj.name -match '/') { ($proj.name -split '/')[-1] } else { $proj.name } + $accountProjects += [PSCustomObject]@{ Account = $account; Project = $projName } + Log " Project: $account/$projName" + # Project-level caphosts + try { + $projCH = Invoke-RestMethod -Uri "https://management.azure.com/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.CognitiveServices/accounts/$account/projects/$projName/capabilityHosts?api-version=$ApiVersion" -Headers $headers -ErrorAction Stop + foreach ($ch in $projCH.value) { + $projectCaphosts += [PSCustomObject]@{ Account = $account; Project = $projName; Caphost = $ch.name } + Log " Project caphost: $account/$projName -> $($ch.name)" + # Track the subnet this caphost is linked to + $subnetId = $ch.properties.customerSubnet + if ($subnetId -and -not $caphostSubnets.ContainsKey($subnetId)) { + if ($subnetId -match '/subscriptions/[^/]+/resourceGroups/([^/]+)/providers/Microsoft.Network/virtualNetworks/([^/]+)/subnets/([^/]+)') { + $caphostSubnets[$subnetId] = [PSCustomObject]@{ VnetRg = $Matches[1]; Vnet = $Matches[2]; Subnet = $Matches[3] } + Log " Caphost subnet: $($Matches[2])/$($Matches[3])" + } + } + } + } catch { } + } + } catch { } +} + + +if ($caphostSubnets.Count -eq 0 -and $accounts.Count -gt 0) { + Warn "No caphost-linked subnet found. Skipping SAL wait." 
+}
+
+# Summary
+Write-Host ""
+Write-Host "========================================"
+Write-Host "Discovery Summary"
+Write-Host "========================================"
+Write-Host " Accounts: $($accounts.Count)"
+Write-Host " Project caphosts: $($projectCaphosts.Count)"
+Write-Host " Account caphosts: $($accountCaphosts.Count)"
+Write-Host " Projects: $($accountProjects.Count)"
+Write-Host " Caphost subnets to monitor: $($caphostSubnets.Count)"
+Write-Host "========================================"
+Write-Host ""
+
+# Confirmation prompt (skip with -DryRun)
+$totalItems = $accounts.Count + $projectCaphosts.Count + $accountCaphosts.Count + $caphostSubnets.Count
+if ($totalItems -eq 0) {
+    Warn "Nothing to clean up."
+    return
+}
+
+if (-not $DryRun) {
+    Write-Host "This will DELETE the resources listed above. This action cannot be undone." -ForegroundColor Yellow
+    $confirm = Read-Host "Continue? [y/N]"
+    if ($confirm -notmatch '^[Yy]') {
+        Write-Host "Aborted."
+        return
+    }
+}
+
+# =============================================================================
+# STEP 1: Delete Project Capability Hosts
+# =============================================================================
+Write-Host "=== Step 1: Delete Project Capability Hosts ==="
+if ($projectCaphosts.Count -eq 0) {
+    Log "No project capability hosts to delete"
+} else {
+    foreach ($pc in $projectCaphosts) {
+        $result = Remove-CapabilityHost -ResourcePath "accounts/$($pc.Account)/projects/$($pc.Project)" `
+            -CaphostName $pc.Caphost -DisplayName "Project $($pc.Project)"
+        if (-not $result) {
+            Fail "Cannot proceed — project caphost deletion failed. Fix the issue above and re-run."
+            exit 1
+        }
+    }
+}
+
+# =============================================================================
+# STEP 2: Delete Account Capability Hosts
+# =============================================================================
+Write-Host ""
+Write-Host "=== Step 2: Delete Account Capability Hosts ==="
+if ($accountCaphosts.Count -eq 0) {
+    Log "No account capability hosts to delete"
+} else {
+    foreach ($ac in $accountCaphosts) {
+        $result = Remove-CapabilityHost -ResourcePath "accounts/$($ac.Account)" `
+            -CaphostName $ac.Caphost -DisplayName "Account $($ac.Account)"
+        if (-not $result) {
+            Fail "Cannot proceed — account caphost deletion failed. Fix the issue above and re-run."
+            exit 1
+        }
+    }
+}
+
+# =============================================================================
+# STEP 3: Delete Projects and Purge AI Accounts
+# =============================================================================
+Write-Host ""
+Write-Host "=== Step 3: Delete Projects and Purge AI Accounts ==="
+
+# 3a: Delete projects first (accounts cannot be deleted while nested projects exist)
+if ($accountProjects.Count -gt 0) {
+    foreach ($ap in $accountProjects) {
+        if ($DryRun) {
+            Dry "Delete project: $($ap.Account)/$($ap.Project)"
+            continue
+        }
+        Log "Deleting project: $($ap.Account)/$($ap.Project)"
+        $token = Get-AzToken
+        $projUrl = "https://management.azure.com/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.CognitiveServices/accounts/$($ap.Account)/projects/$($ap.Project)?api-version=$ApiVersion"
+        try {
+            Invoke-WebRequest -Uri $projUrl -Method Delete -Headers @{ Authorization = "Bearer $token" } -ErrorAction Stop | Out-Null
+            Pass "Project $($ap.Project) deleted"
+        } catch {
+            if ($_.Exception.Response.StatusCode.value__ -eq 404) {
+                Pass "Project $($ap.Project) not found (already deleted)"
+            } else {
+                $errMsg = if ($_.ErrorDetails.Message) { $_.ErrorDetails.Message } else { $_.Exception.Message }
+                Fail "Failed to delete project $($ap.Project): $errMsg"
+            }
+        }
+    }
+}
+
+# 3b: Delete and purge accounts
+foreach ($account in $accounts) {
+    if ($DryRun) {
+        Dry "Delete + purge account: $account"
+        continue
+    }
+
+    $location = az cognitiveservices account show --name $account --resource-group $ResourceGroup `
+        --query "location" -o tsv 2>$null
+
+    if ([string]::IsNullOrEmpty($location)) {
+        Warn "Account $account not found (already deleted)"
+    } else {
+        Log "Deleting account: $account"
+        $delOut = az cognitiveservices account delete --name $account --resource-group $ResourceGroup 2>&1
+        if ($LASTEXITCODE -ne 0) {
+            Fail "Failed to delete account ${account}: $delOut"
+            continue
+        }
+        Log "Purging account: $account (location: $location)"
+        $purgeOut = az cognitiveservices account purge --name $account --resource-group $ResourceGroup --location $location 2>&1
+        if ($LASTEXITCODE -ne 0) {
+            Fail "Failed to purge account ${account}: $purgeOut"
+            continue
+        }
+        Pass "Account $account deleted and purged"
+    }
+}
+
+# Check for soft-deleted accounts
+Log "Checking for soft-deleted accounts in resource group (residue cleanup)..."
+$deletedAccounts = az cognitiveservices account list-deleted `
+    --query "[?contains(id, '/resourceGroups/$ResourceGroup/')].{name:name, location:location}" -o json 2>$null | ConvertFrom-Json
+
+if ($deletedAccounts -and $deletedAccounts.Count -gt 0) {
+    foreach ($da in $deletedAccounts) {
+        if ($DryRun) {
+            Dry "Purge soft-deleted account: $($da.name)"
+        } else {
+            Log "Purging soft-deleted account: $($da.name)"
+            az cognitiveservices account purge --name $da.name --resource-group $ResourceGroup --location $da.location 2>$null
+            Pass "Purged: $($da.name)"
+        }
+    }
+} else {
+    Pass "No soft-deleted accounts found"
+}
+
+# =============================================================================
+# STEP 4: Wait for SAL cleanup
+# =============================================================================
+Write-Host ""
+Write-Host "=== Step 4: Wait for Subnet Link Cleanup ==="
+if ($caphostSubnets.Count -eq 0) {
+    Log "No caphost subnets to monitor — skipping wait"
+} else {
+    foreach ($entry in $caphostSubnets.Values) {
+        $salClean = Wait-ForSalCleanup -VnetRg $entry.VnetRg -VnetName $entry.Vnet -SubnetName $entry.Subnet
+        if (-not $salClean) {
+            Fail "SAL cleanup timed out on $($entry.Vnet)/$($entry.Subnet) — subnet is still blocked"
+        }
+    }
+}
+
+# =============================================================================
+# STEP 5: Delete Resource Group (optional)
+# =============================================================================
+Write-Host ""
+Write-Host "=== Step 5: Resource Group ==="
+if ($DeleteRG) {
+    if ($DryRun) {
+        Dry "Delete resource group: $ResourceGroup"
+    } else {
+        Log "Deleting resource group: $ResourceGroup"
+        az group delete --name $ResourceGroup --yes --no-wait 2>$null
+        Pass "Resource group deletion initiated (async)"
+    }
+} else {
+    Write-Host "To delete the resource group:"
+    Write-Host " az group delete --name $ResourceGroup --yes"
+}
+
+# =============================================================================
+# Summary
+# =============================================================================
+Write-Host ""
+Write-Host "========================================"
+if ($DryRun) {
+    Write-Host "DRY RUN complete. No changes were made." -ForegroundColor Yellow
+} elseif ($script:Errors -eq 0) {
+    Write-Host "Cleanup completed successfully." -ForegroundColor Green
+} else {
+    Write-Host "Cleanup completed with $($script:Errors) error(s). Review output above." -ForegroundColor Red
+}
+Write-Host "========================================"
+return
diff --git a/infrastructure/infrastructure-setup-bicep/deployment-tools/diagnostic/README.md b/infrastructure/infrastructure-setup-bicep/deployment-tools/diagnostic/README.md
new file mode 100644
index 000000000..f309290e1
--- /dev/null
+++ b/infrastructure/infrastructure-setup-bicep/deployment-tools/diagnostic/README.md
@@ -0,0 +1,96 @@
+# Post-Deployment Diagnostic for Foundry Private Network
+
+Validates that all resources, networking, RBAC, and capability hosts are healthy after deploying a Foundry private network template (Bicep or Terraform).
+
+## What It Checks
+
+Checks are ordered **outside-in**, following the network path an agent request takes from the Data Proxy through the VNet to backend resources:
+
+| # | Area | What's Verified |
+|---|------|----------------|
+| 1 | Discovery | Auto-discovers AIServices accounts in the resource group |
+| 2 | Network Injection | Data Proxy config and subnet reference valid — is the platform infra alive? |
+| 3 | VNet & Subnets | Delegations correct, ServiceAssociationLinks present on agent subnet |
+| 4 | NSG Rules | Custom NSG rules don't block required traffic (443/445 outbound, VNet inbound on PE/MCP subnets) |
+| 5 | DNS Zones | Private DNS zones exist with VNet links |
+| 6 | Custom DNS | Detects whether VNet uses custom DNS servers or Azure default DNS; if custom, reports server IPs and required forwarder target (`168.63.129.16`) |
+| 7 | Private Endpoints | PE connections, shared PEs (resourceAccessRules), network rules, bypass, and shared key per resource |
+| 8 | Projects & MI | Project provisioned with system-assigned managed identity |
+| 9 | Capability Host | Project (and optionally account) caphost in `Succeeded` state with connections wired |
+| 10 | Connections | Cosmos DB, Storage, and AI Search project connections exist with AAD auth |
+| 11 | RBAC | All 5 ARM roles + Cosmos SQL data-plane role assigned to project MI |
+| 12 | Provisioning | All resources in `Succeeded` state |
+| 13 | Public Access + ACLs | `publicNetworkAccess: Disabled` on all resources, AI Services ACL `Deny` + `AzureServices` bypass |
+| 14 | Model Deployment | At least one model deployed and healthy |
+| 15 | Azure Policy | Non-compliant policy evaluations, Deny-effect policies that block deployment |
+
+## Usage
+
+### With config file
+
+```powershell
+cp diagnostic.config.sample diagnostic.config
+# Edit diagnostic.config with your values
+.\diagnostic-check.ps1 -ConfigFile .\diagnostic.config
+```
+
+### With parameters
+
+```powershell
+.\diagnostic-check.ps1 -SubscriptionId "your-sub-id" -ResourceGroup "your-rg"
+```
+
+### With specific account
+
+```powershell
+.\diagnostic-check.ps1 -SubscriptionId "your-sub-id" -ResourceGroup "your-rg" -AccountName "aifoundryabcd"
+```
+
+## Prerequisites
+
+- PowerShell 7+ (`pwsh`) — Windows PowerShell 5.1 is not supported
+- Azure CLI installed and logged in (`az login`)
+- Active subscription set to the target subscription
+- Reader access on the resource group (Contributor for full RBAC checks)
+
+## Reading the Output
+
+- **[PASS]** — check passed
+- **[FAIL]** — something is wrong that will likely break agent functionality
+- **[WARN]** — unexpected configuration that may or may not cause issues
+- **[INFO]** — informational (BYO resources in other RGs, etc.)
+
+## Common Failure Patterns
+
+| Symptom | Likely Cause | Check # |
+|---------|-------------|---------|
+| Agent calls timeout | Network injection subnet missing or deleted | 2 |
+| Caphost failed | SAL conflict on subnet from prior deployment — use a new VNet | 3, 9 |
+| NSG blocks agent traffic | Custom NSG deny-all outbound without AzureCloud allow | 4 |
+| MCP tools unreachable | NSG on MCP subnet blocks inbound from VNet | 4 |
+| DNS resolution fails | DNS zone not linked to VNet, or custom DNS without forwarders | 5, 6 |
+| Private endpoint returns public IP | Custom DNS servers missing conditional forwarders to 168.63.129.16 | 6 |
+| Agent can't reach AI Search | Missing PE or DNS link for `privatelink.search.windows.net` | 5, 7 |
+| Storage unreachable (no PE, no bypass) | No PE, no shared PE, and no AzureServices bypass on storage | 7 |
+| Shared PE pending | Resource access rule exists but PE needs approval on target | 7 |
+| AI Search shared PE pending | AI Search outbound shared PE needs approval | 7 |
+| Caphost stuck in Creating | RBAC not assigned before caphost creation | 9, 11 |
+| Agent can't store threads | Missing Cosmos DB SQL data-plane role | 11 |
+| Agent can't write files | Missing Storage Blob Data Owner with ABAC condition | 11 |
+| Deployment blocked by policy | Azure Policy with Deny effect on resource config | 15 |
+
+## BYO (Bring Your Own) Resources in Other Resource Groups
+
+The script auto-discovers BYO resources that live outside the primary `ResourceGroup`:
+
+1. **VNet**: Extracted from the network injection `subnetArmId` — if the subnet is in a different RG, the VNet is included in sections 3, 4, and 6 automatically.
+2. **Storage, Cosmos DB, AI Search**: Discovered by parsing project connection target URLs and looking up the resource subscription-wide. Automatically included in sections 7, 12, and 13.
+3. **DNS Zones**: Searched in both the primary RG and any BYO RGs discovered above.
+
+No extra parameters are needed. The script logs `[INFO] BYO '' discovered in RG ''` for each cross-RG resource it finds.
+
+> **Limitation**: Resources in a different *subscription* are not auto-discovered. If you have cross-subscription BYO resources, those sections will show `[WARN]` and you'll need to check them manually.
+
+## Applies To
+
+Works with all Foundry private network setups (Bicep and Terraform).
diff --git a/infrastructure/infrastructure-setup-bicep/deployment-tools/diagnostic/diagnostic-check.ps1 b/infrastructure/infrastructure-setup-bicep/deployment-tools/diagnostic/diagnostic-check.ps1
new file mode 100644
index 000000000..9dbca72ec
--- /dev/null
+++ b/infrastructure/infrastructure-setup-bicep/deployment-tools/diagnostic/diagnostic-check.ps1
@@ -0,0 +1,1371 @@
+<#
+.SYNOPSIS
+    Post-deployment diagnostic for Foundry private network setups.
+
+.DESCRIPTION
+    Runs after 'az deployment group create' or 'terraform apply' to validate
+    that all resources, networking, RBAC, and capability hosts are healthy.
+
+    Checks are ordered outside-in, following the network path an agent
+    request takes from the Data Proxy through the VNet to backend resources:
+
+    1. Discovery — find AI Services accounts
+    2. Network Injection (Data Proxy) — is the platform infra alive?
+    3. VNet, Subnets, Delegations, SAL — network plumbing
+    4. NSG Rules — can traffic flow between subnets?
+    5. DNS Zones — zone existence + VNet links
+    6. Custom DNS Detection — forwarder requirements
+    7. Private Endpoints + Network Rules — PE connectivity
+    8.
Projects & Managed Identity — project exists with MI + 9. Capability Host — caphost healthy + connections wired + 10. Project Connections — Cosmos/Storage/Search connections exist + 11. RBAC Role Assignments — MI has required roles + 12. Resource Provisioning State — all resources healthy + 13. Public Network Access + AI Services ACLs — lockdown audit + 14. Model Deployment — models ready + 15. Azure Policy — nothing blocking + + Use this script to quickly pinpoint why agents fail after a seemingly + successful deployment. + +.PARAMETER ConfigFile + Path to a config file with key=value pairs. See diagnostic.config.sample. + +.PARAMETER SubscriptionId + Azure subscription ID + +.PARAMETER ResourceGroup + Resource group containing the deployed resources + +.PARAMETER AccountName + Optional. AI Services account name. Auto-discovered if omitted. + +.EXAMPLE + .\diagnostic-check.ps1 -ConfigFile .\diagnostic.config + +.EXAMPLE + .\diagnostic-check.ps1 -SubscriptionId "xxx" -ResourceGroup "my-rg" + +.EXAMPLE + .\diagnostic-check.ps1 -SubscriptionId "xxx" -ResourceGroup "my-rg" -AccountName "aifoundryabcd" +#> + +#Requires -Version 7.0 + +param( + [string]$ConfigFile = '', + [string]$SubscriptionId = '', + [string]$ResourceGroup = '', + [string]$AccountName = '' +) + +# --- Load config file if provided --- +if ($ConfigFile) { + if (-not (Test-Path $ConfigFile)) { + Write-Host "[FAIL] Config file not found: $ConfigFile" -ForegroundColor Red + exit 1 + } + $configLines = Get-Content $ConfigFile | Where-Object { $_ -match '^\s*[^#]' -and $_ -match '=' } + $config = @{} + foreach ($line in $configLines) { + $parts = $line -split '=', 2 + $key = $parts[0].Trim() + $val = $parts[1].Trim() + $config[$key] = $val + } + if (-not $SubscriptionId -and $config['SubscriptionId']) { $SubscriptionId = $config['SubscriptionId'] } + if (-not $ResourceGroup -and $config['ResourceGroup']) { $ResourceGroup = $config['ResourceGroup'] } + if (-not $AccountName -and $config['AccountName']) { 
$AccountName = $config['AccountName'] } +} + +# --- Validate required params --- +if (-not $SubscriptionId -or -not $ResourceGroup) { + Write-Host "ERROR: SubscriptionId and ResourceGroup are required." -ForegroundColor Red + Write-Host "Usage:" + Write-Host " .\diagnostic-check.ps1 -ConfigFile .\diagnostic.config" + Write-Host " .\diagnostic-check.ps1 -SubscriptionId 'xxx' -ResourceGroup 'my-rg'" + exit 1 +} + +$ScriptVersion = "1.0.0" +$ErrorActionPreference = "Continue" +$script:PassCount = 0 +$script:FailCount = 0 +$script:WarnCount = 0 + +function Pass { param([string]$Msg) Write-Host "[PASS] $Msg" -ForegroundColor Green; $script:PassCount++ } +function Fail { param([string]$Msg) Write-Host "[FAIL] $Msg" -ForegroundColor Red; $script:FailCount++ } +function Warn { param([string]$Msg) Write-Host "[WARN] $Msg" -ForegroundColor Yellow; $script:WarnCount++ } +function Info { param([string]$Msg) Write-Host "[INFO] $Msg" -ForegroundColor Cyan } +function Detail { param([string]$Msg) Write-Host " $Msg" -ForegroundColor Gray } + +Write-Host "" +Write-Host "========================================" +Write-Host "Post-Deployment Diagnostic (outside-in)" +Write-Host "========================================" +Write-Host "Version: $ScriptVersion" +Write-Host "Subscription: $SubscriptionId" +Write-Host "Resource Group: $ResourceGroup" +if ($AccountName) { Write-Host "Account: $AccountName" } +Write-Host "Timestamp: $(Get-Date -Format 'yyyy-MM-dd HH:mm:ss UTC' -AsUTC)" +Write-Host "========================================" +Write-Host "" + +# Verify Azure CLI login +$azAccount = az account show -o json 2>$null | ConvertFrom-Json +if (-not $azAccount) { + Write-Host "[FAIL] Not logged in to Azure CLI. Run: az login" -ForegroundColor Red + exit 1 +} + +$activeSubId = ($azAccount.id).Trim() +if ($activeSubId -ne $SubscriptionId) { + Write-Host "[FAIL] Active subscription ($activeSubId) does not match requested ($SubscriptionId)." 
-ForegroundColor Red + Write-Host " Run: az account set --subscription $SubscriptionId" -ForegroundColor Red + exit 1 +} + +# ARM API version +$CogApiVersion = "2025-04-01-preview" + +function Get-AzToken { + az account get-access-token --query accessToken -o tsv 2>$null +} + +function Invoke-ArmGet { + param([string]$Url) + $token = Get-AzToken + $headers = @{ Authorization = "Bearer $token"; "Content-Type" = "application/json" } + try { + $resp = Invoke-RestMethod -Uri $Url -Headers $headers -Method Get -ErrorAction Stop + return $resp + } catch { + return $null + } +} + +# --- Helper: check PE connections on a resource --- +function Test-PEConnections { + param([string]$ResourceId, [string]$Label) + $peList = az network private-endpoint-connection list --id $ResourceId -o json 2>$null | ConvertFrom-Json + $approvedCount = 0 + if ($peList -and $peList.Count -gt 0) { + foreach ($pec in $peList) { + $pecName = $pec.name + if (-not $pecName -and $pec.id) { $pecName = ($pec.id -split '/')[-1] } + # Handle both nested (ARM REST) and flat (az CLI) property structures + $pecStatus = $pec.properties.privateLinkServiceConnectionState.status + if (-not $pecStatus) { $pecStatus = $pec.privateLinkServiceConnectionState.status } + if ($pecStatus -eq 'Approved') { + Pass "$Label PE '$pecName': $pecStatus" + $approvedCount++ + } elseif ($pecStatus -eq 'Pending') { + Fail "$Label PE '$pecName': $pecStatus — needs manual approval" + } else { + Fail "$Label PE '$pecName': $pecStatus" + } + } + } + return $approvedCount +} + +# ============================================================================= +# 1. DISCOVER AI SERVICES ACCOUNTS +# ============================================================================= +Write-Host "--- 1. 
Discover AI Services Accounts ---" + +if ($AccountName) { + $accounts = @(@{ name = $AccountName }) + Info "Using provided account: $AccountName" +} else { + $accountsJson = az cognitiveservices account list --resource-group $ResourceGroup --query "[?kind=='AIServices']" -o json 2>$null | ConvertFrom-Json + if (-not $accountsJson -or $accountsJson.Count -eq 0) { + Fail "No AIServices accounts found in resource group '$ResourceGroup'" + Write-Host "" + Write-Host "========================================" + Write-Host "Results: $($script:PassCount) passed, $($script:FailCount) failed, $($script:WarnCount) warnings" + Write-Host "========================================" + exit 1 + } + $accounts = $accountsJson + Pass "Found $($accounts.Count) AIServices account(s): $($accounts.name -join ', ')" +} +Write-Host "" + +foreach ($acct in $accounts) { + $acctName = $acct.name + Write-Host "========== Account: $acctName ==========" + Write-Host "" + + # Get full account details + $acctDetail = az cognitiveservices account show --name $acctName --resource-group $ResourceGroup -o json 2>$null | ConvertFrom-Json + if (-not $acctDetail) { + Fail "Cannot retrieve account '$acctName'. It may have been deleted or you lack access." 
+ continue + } + + $acctLocation = $acctDetail.location + + # Enumerate all resources in the primary RG + $allResources = az resource list --resource-group $ResourceGroup -o json 2>$null | ConvertFrom-Json + + # Extract typed resources for use across sections + $storageAccounts = @($allResources | Where-Object { $_.type -eq 'Microsoft.Storage/storageAccounts' }) + $cosmosAccounts = @($allResources | Where-Object { $_.type -eq 'Microsoft.DocumentDB/databaseAccounts' }) + $searchServices = @($allResources | Where-Object { $_.type -eq 'Microsoft.Search/searchServices' }) + $apimServices = @($allResources | Where-Object { $_.type -eq 'Microsoft.ApiManagement/service' }) + $containerRegistries = @($allResources | Where-Object { $_.type -eq 'Microsoft.ContainerRegistry/registries' }) + $vnets = @($allResources | Where-Object { $_.type -eq 'Microsoft.Network/virtualNetworks' }) + + # Track which RGs we've already scanned to avoid duplicate lookups + $scannedRGs = @($ResourceGroup) + # Track BYO resource groups discovered + $byoRGs = @() + + # --- BYO discovery 1: VNet RG from network injection subnetArmId --- + $networkInjections = $acctDetail.properties.networkInjections + if ($networkInjections) { + foreach ($ni in $networkInjections) { + $subnetId = $ni.subnetArmId + if ($subnetId -and $subnetId -match '/subscriptions/[^/]+/resourceGroups/([^/]+)/') { + $vnetRG = $Matches[1] + if ($vnetRG -ne $ResourceGroup -and $scannedRGs -notcontains $vnetRG) { + Info "BYO VNet detected in RG '$vnetRG' (from network injection subnetArmId)" + $byoRGs += $vnetRG + $scannedRGs += $vnetRG + # Fetch VNet by parsing the full VNet resource ID from the subnet ID + $vnetId = ($subnetId -replace '/subnets/[^/]+$', '') + $byoVnet = az resource show --ids $vnetId -o json 2>$null | ConvertFrom-Json + if ($byoVnet) { + $vnets += $byoVnet + } + } + } + } + } + + # --- BYO discovery 2: Resources from project connection targets --- + # Pre-scan all project connections to discover BYO 
storage/cosmos/search + $projectsPreUrl = "https://management.azure.com/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.CognitiveServices/accounts/$acctName/projects?api-version=$CogApiVersion" + $projectsPre = Invoke-ArmGet -Url $projectsPreUrl + + if ($projectsPre -and $projectsPre.value) { + foreach ($proj in $projectsPre.value) { + # ARM returns name as 'accountName/projectName' — extract just the project part + $projNamePre = ($proj.name -split '/')[-1] + $connPreUrl = "https://management.azure.com/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.CognitiveServices/accounts/$acctName/projects/$projNamePre/connections?api-version=$CogApiVersion" + $connPre = Invoke-ArmGet -Url $connPreUrl + if (-not $connPre -or -not $connPre.value) { continue } + + foreach ($conn in $connPre.value) { + $connCat = $conn.properties.category + $connTarget = $conn.properties.target + if (-not $connTarget) { continue } + + # Parse resource name from connection target URL + $resourceName = $null + $resourceType = $null + switch ($connCat) { + 'AzureStorageAccount' { + # Target: https://.blob.core.windows.net + if ($connTarget -match 'https://([^.]+)\.blob\.core\.windows\.net') { + $resourceName = $Matches[1] + $resourceType = 'Microsoft.Storage/storageAccounts' + } + } + { $_ -in 'AzureCosmosDBNoSQL', 'CosmosDb' } { + # Target: https://.documents.azure.com:443/ + if ($connTarget -match 'https://([^.]+)\.documents\.azure\.com') { + $resourceName = $Matches[1] + $resourceType = 'Microsoft.DocumentDB/databaseAccounts' + } + } + 'CognitiveSearch' { + # Target: https://.search.windows.net + if ($connTarget -match 'https://([^.]+)\.search\.windows\.net') { + $resourceName = $Matches[1] + $resourceType = 'Microsoft.Search/searchServices' + } + } + } + + if (-not $resourceName -or -not $resourceType) { continue } + + # Check if already in our typed arrays + $alreadyKnown = $false + switch ($resourceType) { + 
'Microsoft.Storage/storageAccounts' { $alreadyKnown = ($storageAccounts | Where-Object { $_.name -eq $resourceName }).Count -gt 0 } + 'Microsoft.DocumentDB/databaseAccounts' { $alreadyKnown = ($cosmosAccounts | Where-Object { $_.name -eq $resourceName }).Count -gt 0 } + 'Microsoft.Search/searchServices' { $alreadyKnown = ($searchServices | Where-Object { $_.name -eq $resourceName }).Count -gt 0 } + } + if ($alreadyKnown) { continue } + + # Look up the resource subscription-wide by name and type + $byoResource = az resource list --name $resourceName --resource-type $resourceType -o json 2>$null | ConvertFrom-Json + if ($byoResource -and $byoResource.Count -gt 0) { + $byo = $byoResource[0] + $byoRGName = ($byo.id -split '/')[4] # resourceGroups/ + Info "BYO $connCat '$resourceName' discovered in RG '$byoRGName' (from project connection)" + if ($byoRGs -notcontains $byoRGName) { $byoRGs += $byoRGName } + switch ($resourceType) { + 'Microsoft.Storage/storageAccounts' { $storageAccounts += $byo } + 'Microsoft.DocumentDB/databaseAccounts' { $cosmosAccounts += $byo } + 'Microsoft.Search/searchServices' { $searchServices += $byo } + } + } else { + Warn "Connection target '$resourceName' ($connCat) not found in subscription — may be in another subscription" + } + } + } + } + + if ($byoRGs.Count -gt 0) { + Info "BYO resource groups discovered: $($byoRGs -join ', ')" + } + + # ============================================================================= + # 2. NETWORK INJECTION (DATA PROXY) — Is the platform infra alive? + # ============================================================================= + Write-Host "--- 2. 
Network Injection (Data Proxy) ---" + + $networkInjections = $acctDetail.properties.networkInjections + # Collect injected subnet IDs for subnet role identification in sections 3 & 4 + $injectedSubnetIds = @() + if ($networkInjections -and $networkInjections.Count -gt 0) { + foreach ($ni in $networkInjections) { + $scenario = $ni.scenario + $subnetId = $ni.subnetArmId + $useManagedNet = $ni.useMicrosoftManagedNetwork + if ($subnetId) { $injectedSubnetIds += $subnetId.ToLower() } + Pass "Network injection: scenario=$scenario, useManagedNetwork=$useManagedNet" + if ($subnetId) { + Info " Subnet: $subnetId" + # Verify the subnet exists + $subnetCheck = az resource show --ids $subnetId -o json 2>$null | ConvertFrom-Json + if ($subnetCheck) { + Pass " Injected subnet exists" + } else { + Fail " Injected subnet NOT found — network injection will fail" + } + } + } + } else { + Info "No network injections configured (Managed VNet or non-agent setup)" + } + Write-Host "" + + # ============================================================================= + # 3. VNET, SUBNETS, DELEGATIONS, AND SAL + # ============================================================================= + Write-Host "--- 3. VNet, Subnets, Delegations, and SAL ---" + + foreach ($vnet in $vnets) { + $vnetDetail = az network vnet show --ids $vnet.id -o json 2>$null | ConvertFrom-Json + Pass "VNet '$($vnet.name)' found in $($vnetDetail.location)" + + foreach ($subnet in $vnetDetail.subnets) { + $sName = $subnet.name + $delegations = $subnet.delegations + $sals = $subnet.serviceAssociationLinks + $subnetFullId = $subnet.id.ToLower() + + # Identify subnet role by properties, not names + $hasAppEnvDelegation = $false + $hasDnsResolverDelegation = $false + $hasWebDelegation = $false + $otherDelegations = @() + if ($delegations) { + foreach ($d in $delegations) { + $dServiceName = $d.properties.serviceName ?? 
$d.serviceName + if ($dServiceName -eq 'Microsoft.App/environments') { + $hasAppEnvDelegation = $true + } elseif ($dServiceName -eq 'Microsoft.Network/dnsResolvers') { + $hasDnsResolverDelegation = $true + } elseif ($dServiceName -eq 'Microsoft.Web/serverFarms') { + $hasWebDelegation = $true + } elseif ($dServiceName) { + $otherDelegations += $dServiceName + } + } + } + + # Agent subnet = Microsoft.App/environments delegation AND referenced by network injection + $isAgentSubnet = $hasAppEnvDelegation -and ($injectedSubnetIds -contains $subnetFullId) + # MCP/other Container Apps subnet = same delegation but NOT the injected agent subnet + $isContainerAppsSubnet = $hasAppEnvDelegation -and -not $isAgentSubnet + # Web app subnet + $isWebAppSubnet = $hasWebDelegation + # PE subnet = no delegation at all + $isPeSubnet = (-not $delegations -or $delegations.Count -eq 0) + + if ($isAgentSubnet) { + Pass "Subnet '$sName': agent subnet (delegated to Microsoft.App/environments, referenced by network injection)" + } elseif ($isContainerAppsSubnet) { + Pass "Subnet '$sName': Container Apps subnet (delegated to Microsoft.App/environments, not the agent subnet)" + } elseif ($isWebAppSubnet) { + Pass "Subnet '$sName': web app subnet (delegated to Microsoft.Web/serverFarms — VNet-integrated app)" + } elseif ($hasDnsResolverDelegation) { + Pass "Subnet '$sName': delegated to Microsoft.Network/dnsResolvers" + } elseif ($isPeSubnet) { + Pass "Subnet '$sName': no delegation (PE or general-purpose subnet)" + } else { + foreach ($od in $otherDelegations) { + Info "Subnet '$sName': delegated to $od" + } + } + + # Check SAL (ServiceAssociationLink) + if ($sals) { + foreach ($sal in $sals) { + $salType = $sal.properties.linkedResourceType ?? $sal.linkedResourceType + $salAllowDelete = $sal.properties.allowDelete ?? 
$sal.allowDelete + if ($isAgentSubnet -and $salType -eq 'Microsoft.App/environments') { + Info "Subnet '$sName' has SAL: $salType (allowDelete=$salAllowDelete) — expected for active caphost" + } elseif ($hasDnsResolverDelegation -and $salType -eq 'Microsoft.Network/dnsResolvers') { + Info "Subnet '$sName' has SAL: $salType — expected for DNS Private Resolver" + } elseif ($isContainerAppsSubnet -and $salType -eq 'Microsoft.App/environments') { + Info "Subnet '$sName' has SAL: $salType — Container Apps environment bound" + } else { + Warn "Subnet '$sName' has unexpected SAL: $salType" + } + } + } elseif ($isAgentSubnet) { + Warn "Agent subnet '$sName' has no SAL. Capability host may not have provisioned or was deleted." + } + } + } + if ($vnets.Count -eq 0) { + Info "No VNets found in primary RG or via network injection (Managed VNet mode or BYO VNet in unlinked RG)" + } + Write-Host "" + + # ============================================================================= + # 4. NSG RULES — Can traffic flow between subnets? + # ============================================================================= + Write-Host "--- 4. NSG Rules on Subnets ---" + + foreach ($vnet in $vnets) { + $vnetDetail = az network vnet show --ids $vnet.id -o json 2>$null | ConvertFrom-Json + foreach ($subnet in $vnetDetail.subnets) { + $sName = $subnet.name + $nsgRef = $subnet.networkSecurityGroup + + if (-not $nsgRef) { + Info "Subnet '$sName': no NSG attached" + continue + } + + $nsgId = $nsgRef.id + $nsgName = ($nsgId -split '/')[-1] + Info "Subnet '$sName': NSG '$nsgName' attached" + + $nsgDetail = az network nsg show --ids $nsgId -o json 2>$null | ConvertFrom-Json + if (-not $nsgDetail) { + Fail "Cannot read NSG '$nsgName' (may be in another subscription or you lack access)" + continue + } + + # Combine default + custom rules + # az network nsg show returns flat properties (e.g. 
$r.direction, $r.access) + # ARM REST returns nested ($r.properties.direction) — handle both + $allRules = @() + if ($nsgDetail.securityRules) { $allRules += $nsgDetail.securityRules } + if ($nsgDetail.defaultSecurityRules) { $allRules += $nsgDetail.defaultSecurityRules } + + # Helper to read rule property from flat or nested structure + function Get-RuleProp($rule, $prop) { + $val = $rule.properties.$prop + if ($null -eq $val) { $val = $rule.$prop } + return $val + } + + # Identify subnet role by properties (same logic as section 3) + $hasAppEnvDelegation = $false + $hasWebDelegation = $false + if ($subnet.delegations) { + foreach ($d in $subnet.delegations) { + $dServiceName = $d.properties.serviceName ?? $d.serviceName + if ($dServiceName -eq 'Microsoft.App/environments') { $hasAppEnvDelegation = $true } + if ($dServiceName -eq 'Microsoft.Web/serverFarms') { $hasWebDelegation = $true } + } + } + $subnetFullId = $subnet.id.ToLower() + $isAgentSubnet = $hasAppEnvDelegation -and ($injectedSubnetIds -contains $subnetFullId) + $isContainerAppsSubnet = $hasAppEnvDelegation -and -not $isAgentSubnet + $isWebAppSubnet = $hasWebDelegation + $isPeSubnet = (-not $subnet.delegations -or $subnet.delegations.Count -eq 0) + + # --- Check: Deny-all outbound blocks Azure services --- + $denyAllOut = $allRules | Where-Object { + (Get-RuleProp $_ 'direction') -eq 'Outbound' -and + (Get-RuleProp $_ 'access') -eq 'Deny' -and + (Get-RuleProp $_ 'destinationAddressPrefix') -eq '*' -and + (Get-RuleProp $_ 'protocol') -eq '*' + } | Sort-Object { [int](Get-RuleProp $_ 'priority') } | Select-Object -First 1 + + if ($denyAllOut) { + $denyPriority = [int](Get-RuleProp $denyAllOut 'priority') + + # Check if there's an Allow for HTTPS outbound to AzureCloud before the deny + $allowAzureOut = $allRules | Where-Object { + (Get-RuleProp $_ 'direction') -eq 'Outbound' -and + (Get-RuleProp $_ 'access') -eq 'Allow' -and + [int](Get-RuleProp $_ 'priority') -lt $denyPriority -and + ((Get-RuleProp $_ 
'destinationAddressPrefix') -match 'AzureCloud|VirtualNetwork|\*') + } + + if ($allowAzureOut) { + Pass "NSG '$nsgName' on '$sName': has deny-all outbound but allows Azure traffic at higher priority" + } else { + Fail "NSG '$nsgName' on '$sName': deny-all outbound (priority $denyPriority) with no Azure allow rule" + Detail "Agent/PE/MCP subnets need outbound HTTPS (443) to AzureCloud service tag" + } + } + + # --- Check: Required outbound ports --- + if ($isAgentSubnet -or $isContainerAppsSubnet -or $isWebAppSubnet) { + $subnetLabel = if ($isAgentSubnet) { 'agent' } elseif ($isContainerAppsSubnet) { 'Container Apps' } else { 'web app' } + $requiredPorts = @('443') + if ($isAgentSubnet -or $isContainerAppsSubnet) { $requiredPorts += '445' } + foreach ($port in $requiredPorts) { + $blockRule = $allRules | Where-Object { + (Get-RuleProp $_ 'direction') -eq 'Outbound' -and + (Get-RuleProp $_ 'access') -eq 'Deny' -and + ((Get-RuleProp $_ 'destinationPortRange') -eq $port -or + ((Get-RuleProp $_ 'destinationPortRanges') -and (Get-RuleProp $_ 'destinationPortRanges') -contains $port)) + } + if ($blockRule) { + Fail "NSG '$nsgName' on $subnetLabel subnet '$sName': explicitly blocks outbound port $port" + if ($port -eq '443') { Detail "Port 443 is required for Azure service communication (including AI Search, Storage, Cosmos)" } + if ($port -eq '445') { Detail "Port 445 is required for Azure Files (agent file share)" } + } + } + + # Check outbound to VirtualNetwork (needed for PE connectivity from this subnet) + $denyVnetOut = $allRules | Where-Object { + (Get-RuleProp $_ 'direction') -eq 'Outbound' -and + (Get-RuleProp $_ 'access') -eq 'Deny' -and + ((Get-RuleProp $_ 'destinationAddressPrefix') -eq 'VirtualNetwork') -and + [int](Get-RuleProp $_ 'priority') -lt 65000 + } + if ($denyVnetOut) { + Fail "NSG '$nsgName' on $subnetLabel subnet '$sName': blocks outbound to VirtualNetwork — cannot reach private endpoints" + Detail "Resources like AI Search, Storage, and Cosmos are 
accessed via private endpoints within the VNet" + } + } + + # --- Check: PE subnet inbound from VNet --- + if ($isPeSubnet) { + $denyVnetIn = $allRules | Where-Object { + (Get-RuleProp $_ 'direction') -eq 'Inbound' -and + (Get-RuleProp $_ 'access') -eq 'Deny' -and + ((Get-RuleProp $_ 'sourceAddressPrefix') -eq 'VirtualNetwork' -or + (Get-RuleProp $_ 'sourceAddressPrefix') -eq '*') -and + ((Get-RuleProp $_ 'destinationPortRange') -eq '443' -or + (Get-RuleProp $_ 'destinationPortRange') -eq '*') + } | Sort-Object { [int](Get-RuleProp $_ 'priority') } | Select-Object -First 1 + + if ($denyVnetIn -and [int](Get-RuleProp $denyVnetIn 'priority') -lt 65000) { + $allowBefore = $allRules | Where-Object { + (Get-RuleProp $_ 'direction') -eq 'Inbound' -and + (Get-RuleProp $_ 'access') -eq 'Allow' -and + [int](Get-RuleProp $_ 'priority') -lt [int](Get-RuleProp $denyVnetIn 'priority') -and + ((Get-RuleProp $_ 'sourceAddressPrefix') -match 'VirtualNetwork|\*') + } + if (-not $allowBefore) { + Fail "NSG '$nsgName' on PE subnet '$sName': blocks inbound from VNet — PEs won't be reachable" + } + } + } + + # --- Check: Container Apps subnet inbound (MCP / other Container Apps) --- + if ($isContainerAppsSubnet) { + $denyMcpIn = $allRules | Where-Object { + (Get-RuleProp $_ 'direction') -eq 'Inbound' -and + (Get-RuleProp $_ 'access') -eq 'Deny' -and + (Get-RuleProp $_ 'sourceAddressPrefix') -eq '*' -and + (Get-RuleProp $_ 'destinationPortRange') -eq '*' + } | Sort-Object { [int](Get-RuleProp $_ 'priority') } | Select-Object -First 1 + + if ($denyMcpIn -and [int](Get-RuleProp $denyMcpIn 'priority') -lt 65000) { + $allowBefore = $allRules | Where-Object { + (Get-RuleProp $_ 'direction') -eq 'Inbound' -and + (Get-RuleProp $_ 'access') -eq 'Allow' -and + [int](Get-RuleProp $_ 'priority') -lt [int](Get-RuleProp $denyMcpIn 'priority') -and + ((Get-RuleProp $_ 'sourceAddressPrefix') -match 'VirtualNetwork') + } + if (-not $allowBefore) { + Warn "NSG '$nsgName' on Container Apps subnet 
'$sName': deny-all inbound with no VNet allow — tools may be unreachable from agents" + } + } + } + + # --- Summary of custom (non-default) rules --- + $customRules = $nsgDetail.securityRules + if ($customRules -and $customRules.Count -gt 0) { + Info " $($customRules.Count) custom rule(s) on '$nsgName':" + foreach ($r in ($customRules | Sort-Object { $p = (Get-RuleProp $_ 'priority'); if ($p) { [int]$p } else { 0 } })) { + $rDir = Get-RuleProp $r 'direction' + $rAcc = Get-RuleProp $r 'access' + $rPri = Get-RuleProp $r 'priority' + $rDstPort = Get-RuleProp $r 'destinationPortRange' + $rDstPrefix = Get-RuleProp $r 'destinationAddressPrefix' + if (-not $rDir) { $rDir = '???' } else { $rDir = $rDir.Substring(0, [Math]::Min(3, $rDir.Length)) } + $acc = if ($rAcc -eq 'Allow') { 'Allow' } else { 'DENY' } + $dst = if ($rDstPort -eq '*') { 'all-ports' } else { "port:$rDstPort" } + Detail " [$rPri] $rDir $acc $dst -> $rDstPrefix ($($r.name))" + } + } else { + Pass "NSG '$nsgName': default rules only (no custom rules)" + } + } + } + Write-Host "" + + # ============================================================================= + # 5. PRIVATE DNS ZONES — Do names resolve to private IPs? + # ============================================================================= + Write-Host "--- 5. 
Private DNS Zones ---" + + $expectedZones = @( + 'privatelink.services.ai.azure.com', + 'privatelink.openai.azure.com', + 'privatelink.cognitiveservices.azure.com', + 'privatelink.search.windows.net', + 'privatelink.blob.core.windows.net', + 'privatelink.documents.azure.com' + ) + if ($apimServices.Count -gt 0) { + $expectedZones += 'privatelink.azure-api.net' + } + if ($containerRegistries.Count -gt 0) { + $expectedZones += 'privatelink.azurecr.io' + } + + $dnsZones = az network private-dns zone list --resource-group $ResourceGroup -o json 2>$null | ConvertFrom-Json + $foundZoneNames = @() + $dnsZoneRGMap = @{} # zone name -> RG where it was found + if ($dnsZones) { + foreach ($z in $dnsZones) { + $foundZoneNames += $z.name + $dnsZoneRGMap[$z.name] = $ResourceGroup + } + } + + # Also check BYO resource groups for DNS zones + foreach ($byoRG in $byoRGs) { + $byoDnsZones = az network private-dns zone list --resource-group $byoRG -o json 2>$null | ConvertFrom-Json + if ($byoDnsZones) { + foreach ($z in $byoDnsZones) { + if ($foundZoneNames -notcontains $z.name) { + $foundZoneNames += $z.name + $dnsZoneRGMap[$z.name] = $byoRG + Info "DNS zone '$($z.name)' found in BYO RG '$byoRG'" + } + } + } + } + + foreach ($expected in $expectedZones) { + if ($foundZoneNames -contains $expected) { + $zoneRG = $dnsZoneRGMap[$expected] + $rgLabel = if ($zoneRG -ne $ResourceGroup) { " (BYO RG: $zoneRG)" } else { "" } + Pass "DNS zone '$expected': exists$rgLabel" + + # Check VNet links (in the RG where the zone was found) + $links = az network private-dns link vnet list --zone-name $expected --resource-group $zoneRG -o json 2>$null | ConvertFrom-Json + if ($links -and $links.Count -gt 0) { + foreach ($link in $links) { + $linkState = $link.properties.provisioningState ?? 
$link.provisioningState + if ($linkState -eq 'Succeeded') { + Pass " VNet link '$($link.name)': $linkState" + } elseif (-not $linkState) { + Info " VNet link '$($link.name)': state unknown" + } else { + Fail " VNet link '$($link.name)': $linkState" + } + } + } else { + Fail " DNS zone '$expected' has no VNet links — DNS resolution will fail" + } + } else { + Warn "DNS zone '$expected': not found in RG or discovered BYO RGs (may be in a central DNS RG — verify it exists and is VNet-linked)" + } + } + Write-Host "" + + # ============================================================================= + # 6. CUSTOM DNS SERVER DETECTION + # ============================================================================= + Write-Host "--- 6. Custom DNS Server Detection ---" + + foreach ($vnet in $vnets) { + $vnetDetail = az network vnet show --ids $vnet.id -o json 2>$null | ConvertFrom-Json + $dnsServers = $vnetDetail.dhcpOptions.dnsServers + + if ($dnsServers -and $dnsServers.Count -gt 0) { + Info "VNet '$($vnet.name)' uses custom DNS servers: $($dnsServers -join ', ') — ensure conditional forwarders for privatelink.* zones point to 168.63.129.16" + } else { + Pass "VNet '$($vnet.name)' uses Azure default DNS (168.63.129.16) — privatelink zones resolve automatically" + } + } + Write-Host "" + + # ============================================================================= + # 7. PRIVATE CONNECTIVITY (PEs, Shared PEs, Network Rules) + # ============================================================================= + Write-Host "--- 7. 
Private Endpoints and Network Rules ---" + + # AI Services PEs + $aiPeCount = Test-PEConnections -ResourceId $acctDetail.id -Label "AI Services" + if ($aiPeCount -eq 0) { + Warn "No approved PE connections on AI Services account (expected for private setup)" + } + + # Storage PEs + shared PE / resource access rules + foreach ($sa in $storageAccounts) { + $saDetail = az storage account show --ids $sa.id -o json 2>$null | ConvertFrom-Json + $saPeCount = Test-PEConnections -ResourceId $sa.id -Label "Storage '$($sa.name)'" + + # Check resourceAccessRules (shared private endpoints from AI Services) + $resourceAccessRules = $saDetail.networkRuleSet.resourceAccessRules + $hasSharedPE = $false + if ($resourceAccessRules -and $resourceAccessRules.Count -gt 0) { + foreach ($rule in $resourceAccessRules) { + $tenantId = $rule.tenantId + $ruleResId = $rule.resourceId + # Check if this grants access from the AI Services account + if ($ruleResId -match 'Microsoft.CognitiveServices/accounts') { + Pass "Storage '$($sa.name)': shared PE / resource access rule allows AI Services ($ruleResId)" + $hasSharedPE = $true + } else { + Info "Storage '$($sa.name)': resource access rule for $ruleResId" + } + } + } + + # Network rules summary + $saNetRules = $saDetail.networkRuleSet + $saDefaultAction = $saNetRules.defaultAction + $saBypass = $saNetRules.bypass + + if ($saDefaultAction -eq 'Deny') { + Pass "Storage '$($sa.name)' network defaultAction: Deny" + } else { + Warn "Storage '$($sa.name)' network defaultAction: $saDefaultAction (expected Deny)" + } + + if ($saBypass -match 'AzureServices') { + Pass "Storage '$($sa.name)' bypass includes AzureServices (trusted services can access)" + } else { + Warn "Storage '$($sa.name)' bypass does NOT include AzureServices — AI Services trusted access blocked" + if (-not $hasSharedPE -and $saPeCount -eq 0) { + Fail "Storage '$($sa.name)' has no PE, no shared PE, and no AzureServices bypass — AI Services cannot reach it" + } + } + + # Shared key 
access + $allowSharedKey = $saDetail.allowSharedKeyAccess + if ($allowSharedKey -eq $false) { + Pass "Storage '$($sa.name)' allowSharedKeyAccess: Disabled (AAD-only — correct)" + } elseif ($allowSharedKey -eq $true) { + Info "Storage '$($sa.name)' allowSharedKeyAccess: Enabled (consider disabling for security)" + } + + # Connectivity verdict + if ($saPeCount -eq 0 -and -not $hasSharedPE) { + if ($saBypass -match 'AzureServices') { + if ($injectedSubnetIds.Count -gt 0) { + Warn "Storage '$($sa.name)': no PE — agents in VNet rely on AzureServices bypass (trusted access). PE recommended." + } else { + Info "Storage '$($sa.name)': no PE — relying on AzureServices bypass (trusted access)" + } + } else { + Fail "Storage '$($sa.name)': no PE and no shared PE — data-plane access will fail" + } + } + + if ($saNetRules.ipRules -and $saNetRules.ipRules.Count -gt 0) { + Warn "Storage '$($sa.name)': $($saNetRules.ipRules.Count) IP rule(s) — may allow public access" + } + if ($saNetRules.virtualNetworkRules -and $saNetRules.virtualNetworkRules.Count -gt 0) { + Info "Storage '$($sa.name)': $($saNetRules.virtualNetworkRules.Count) VNet rule(s)" + } + } + + # Cosmos DB PEs + network rules + foreach ($cdb in $cosmosAccounts) { + $cdbPeCount = Test-PEConnections -ResourceId $cdb.id -Label "Cosmos DB '$($cdb.name)'" + $cdbDetail = az cosmosdb show --ids $cdb.id -o json 2>$null | ConvertFrom-Json + + # Cosmos uses isVirtualNetworkFilterEnabled + virtualNetworkRules + if ($cdbDetail.isVirtualNetworkFilterEnabled -eq $true) { + Info "Cosmos DB '$($cdb.name)': VNet filter enabled" + if ($cdbDetail.virtualNetworkRules -and $cdbDetail.virtualNetworkRules.Count -gt 0) { + Info " $($cdbDetail.virtualNetworkRules.Count) VNet rule(s) configured" + } + } + + if ($cdbDetail.ipRules -and $cdbDetail.ipRules.Count -gt 0) { + Warn "Cosmos DB '$($cdb.name)': $($cdbDetail.ipRules.Count) IP rule(s) — may allow public access" + } + + if ($cdbPeCount -eq 0) { + if ($injectedSubnetIds.Count -gt 0) { + Fail 
"Cosmos DB '$($cdb.name)': no PE — agents in VNet cannot reach Cosmos without a private endpoint" + } else { + Warn "Cosmos DB '$($cdb.name)': no approved PEs — data-plane access may rely on VNet rules or public access" + } + } + } + + # AI Search PEs + network rules + foreach ($ss in $searchServices) { + $ssPeCount = Test-PEConnections -ResourceId $ss.id -Label "AI Search '$($ss.name)'" + $ssDetail = az resource show --ids $ss.id --api-version 2025-05-01 -o json 2>$null | ConvertFrom-Json + + # Check shared private link resources on AI Search (outbound shared PEs from search to other services) + $sharedPLResources = $ssDetail.properties.sharedPrivateLinkResources + if ($sharedPLResources -and $sharedPLResources.Count -gt 0) { + foreach ($spl in $sharedPLResources) { + $splName = $spl.name + $splStatus = $spl.properties.status + $splTarget = $spl.properties.privateLinkResourceId + if ($splStatus -eq 'Approved') { + Pass "AI Search '$($ss.name)' shared PE '$splName': $splStatus" + } elseif ($splStatus -eq 'Pending') { + Fail "AI Search '$($ss.name)' shared PE '$splName': $splStatus — needs approval on target resource" + } else { + Warn "AI Search '$($ss.name)' shared PE '$splName': $splStatus" + } + } + } + + $ssNetRules = $ssDetail.properties.networkRuleSet + if ($ssNetRules) { + $ssBypass = $ssNetRules.bypass + if ($ssNetRules.ipRules -and $ssNetRules.ipRules.Count -gt 0) { + Warn "AI Search '$($ss.name)': $($ssNetRules.ipRules.Count) IP rule(s)" + } + if ($ssBypass -and $ssBypass -ne 'None') { + Info "AI Search '$($ss.name)' bypass: $ssBypass" + } + } + + if ($ssPeCount -eq 0) { + if ($injectedSubnetIds.Count -gt 0) { + Fail "AI Search '$($ss.name)': no PE — agents in VNet cannot reach search without a private endpoint" + Detail "Network injection is active. Create a PE for AI Search in the PE subnet and link privatelink.search.windows.net DNS zone." 
+ } else { + Warn "AI Search '$($ss.name)': no approved PEs — data-plane access may fail if publicNetworkAccess is Disabled" + } + } + } + + # APIM PEs + foreach ($apim in $apimServices) { + $apimPeCount = Test-PEConnections -ResourceId $apim.id -Label "APIM '$($apim.name)'" + if ($apimPeCount -eq 0) { + if ($injectedSubnetIds.Count -gt 0) { + Fail "APIM '$($apim.name)': no PE — agents in VNet cannot reach APIM without a private endpoint" + } else { + Warn "APIM '$($apim.name)': no approved PEs" + } + } + } + + # ACR PEs + foreach ($acr in $containerRegistries) { + $acrPeCount = Test-PEConnections -ResourceId $acr.id -Label "ACR '$($acr.name)'" + if ($acrPeCount -eq 0) { + Warn "ACR '$($acr.name)': no approved private endpoint (a PE is expected for private network setups)" + } + } + Write-Host "" + + # ============================================================================= + # 8. PROJECTS AND MANAGED IDENTITY + # ============================================================================= + Write-Host "--- 8.
Projects and Managed Identity ---" + + $projectsUrl = "https://management.azure.com/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.CognitiveServices/accounts/$acctName/projects?api-version=$CogApiVersion" + $projectsResp = Invoke-ArmGet -Url $projectsUrl + + if (-not $projectsResp -or -not $projectsResp.value -or $projectsResp.value.Count -eq 0) { + Warn "No projects found under account '$acctName'" + Write-Host "" + continue + } + + $projIndex = 0 + $projTotal = $projectsResp.value.Count + foreach ($proj in $projectsResp.value) { + # ARM returns name as 'accountName/projectName' — extract just the project part + $projName = ($proj.name -split '/')[-1] + $projIndex++ + Write-Host "---------- Project $projIndex/$projTotal`: $projName ----------" + $projState = $proj.properties.provisioningState + $projPrincipalId = $proj.identity.principalId + + if ($projState -eq 'Succeeded') { + Pass "Project '$projName': $projState" + } else { + Fail "Project '$projName': $projState" + } + + if ($projPrincipalId) { + Pass "Project '$projName' has system-assigned MI: $projPrincipalId" + } else { + Fail "Project '$projName' has no system-assigned managed identity" + } + Write-Host "" + + # ============================================================================= + # 9. CAPABILITY HOST STATUS + # ============================================================================= + Write-Host "--- 9. 
Capability Host Status [$projName] ---" + + # Project-level capability hosts + $projCaphostsUrl = "https://management.azure.com/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.CognitiveServices/accounts/$acctName/projects/$projName/capabilityHosts?api-version=$CogApiVersion" + $projCaphosts = Invoke-ArmGet -Url $projCaphostsUrl + + if ($projCaphosts -and $projCaphosts.value -and $projCaphosts.value.Count -gt 0) { + foreach ($ch in $projCaphosts.value) { + $chName = $ch.name + $chState = $ch.properties.provisioningState + $chKind = $ch.properties.capabilityHostKind + + if ($chState -eq 'Succeeded') { + Pass "Project caphost '$chName' ($chKind): $chState" + } elseif ($chState -eq 'Creating' -or $chState -eq 'Updating') { + Warn "Project caphost '$chName' ($chKind): $chState — still in progress" + } else { + Fail "Project caphost '$chName' ($chKind): $chState" + } + + # Check connections + $connections = @() + if ($ch.properties.vectorStoreConnections) { $connections += "vectorStore: $($ch.properties.vectorStoreConnections -join ',')" } + if ($ch.properties.storageConnections) { $connections += "storage: $($ch.properties.storageConnections -join ',')" } + if ($ch.properties.threadStorageConnections) { $connections += "threadStorage: $($ch.properties.threadStorageConnections -join ',')" } + if ($connections.Count -gt 0) { + Info " Connections: $($connections -join ' | ')" + } else { + Warn " No connections configured on caphost '$chName'" + } + } + } else { + Fail "No project-level capability hosts found for project '$projName'" + Detail "The capability host is required for agent functionality" + Detail "Check that RBAC was assigned before caphost creation" + } + + # Account-level capability hosts + $acctCaphostsUrl = "https://management.azure.com/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.CognitiveServices/accounts/$acctName/capabilityHosts?api-version=$CogApiVersion" + $acctCaphosts = Invoke-ArmGet 
-Url $acctCaphostsUrl + + if ($acctCaphosts -and $acctCaphosts.value -and $acctCaphosts.value.Count -gt 0) { + foreach ($ch in $acctCaphosts.value) { + $chName = $ch.name + $chState = $ch.properties.provisioningState + if ($chState -eq 'Succeeded') { + Pass "Account caphost '$chName': $chState" + } elseif ($chState -eq 'Creating' -or $chState -eq 'Updating') { + Warn "Account caphost '$chName': $chState — still in progress" + } else { + Fail "Account caphost '$chName': $chState" + } + } + } else { + Info "No account-level capability hosts (deployment may use project-only pattern)" + } + Write-Host "" + + # ============================================================================= + # 10. PROJECT CONNECTIONS + # ============================================================================= + Write-Host "--- 10. Project Connections [$projName] ---" + + $connUrl = "https://management.azure.com/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.CognitiveServices/accounts/$acctName/projects/$projName/connections?api-version=$CogApiVersion" + $connResp = Invoke-ArmGet -Url $connUrl + + $expectedCategories = @( + @{ Display = 'Cosmos DB'; Matches = @('AzureCosmosDBNoSQL', 'CosmosDb') }, + @{ Display = 'Storage'; Matches = @('AzureStorageAccount') }, + @{ Display = 'AI Search'; Matches = @('CognitiveSearch') } + ) + $foundCategories = @() + + if ($connResp -and $connResp.value) { + foreach ($conn in $connResp.value) { + $connName = ($conn.name -split '/')[-1] + $connCat = $conn.properties.category + $connAuth = $conn.properties.authType + $connTarget = $conn.properties.target + $foundCategories += $connCat + + Pass "Connection '$connName': category=$connCat, auth=$connAuth" + # AAD auth is expected for caphost-related connections; other categories (e.g. 
AppInsights) may use ApiKey + $aadExpectedCategories = @('AzureCosmosDBNoSQL', 'CosmosDb', 'AzureStorageAccount', 'CognitiveSearch') + if ($connAuth -ne 'AAD' -and $aadExpectedCategories -contains $connCat) { + Warn " Connection uses '$connAuth' auth instead of AAD — may not work with managed identity" + } + } + } + + foreach ($expected in $expectedCategories) { + $found = $false + foreach ($m in $expected.Matches) { + if ($foundCategories -contains $m) { $found = $true; break } + } + if (-not $found) { + Fail "Missing project connection: $($expected.Display) (expected category: $($expected.Matches -join ' or '))" + Detail "Capability host needs this connection to function properly" + } + } + Write-Host "" + + # ============================================================================= + # 11. RBAC ROLE ASSIGNMENTS + # ============================================================================= + Write-Host "--- 11. RBAC Role Assignments [$projName] (MI: $projPrincipalId) ---" + + if ($projPrincipalId) { + # Check role assignments for project MI across the RG scope + $raJson = az role assignment list --assignee $projPrincipalId --resource-group $ResourceGroup --include-inherited -o json 2>$null | ConvertFrom-Json + + # Also check cross-RG role assignments (for BYO resources) + $raAllJson = az role assignment list --assignee $projPrincipalId --all -o json 2>$null | ConvertFrom-Json + + $allRolesFound = @() + if ($raJson) { $allRolesFound += $raJson } + if ($raAllJson) { $allRolesFound += $raAllJson } + $allRolesFound = $allRolesFound | Sort-Object -Property id -Unique + + # Expected roles + $expectedRoles = @{ + 'Cosmos DB Operator' = @{ scope = 'Cosmos DB'; required = $true; preCaphost = $true } + 'Storage Blob Data Contributor' = @{ scope = 'Storage'; required = $true; preCaphost = $true } + 'Search Index Data Contributor' = @{ scope = 'AI Search'; required = $true; preCaphost = $true } + 'Search Service Contributor' = @{ scope = 'AI Search'; required = $true; 
preCaphost = $true } + 'Storage Blob Data Owner' = @{ scope = 'Storage'; required = $true; preCaphost = $false } + } + + foreach ($roleName in $expectedRoles.Keys) { + $found = $allRolesFound | Where-Object { $_.roleDefinitionName -eq $roleName } + $meta = $expectedRoles[$roleName] + if ($found) { + $timing = if ($meta.preCaphost) { "pre-caphost" } else { "post-caphost" } + Pass "Role '$roleName' assigned on $($meta.scope) ($timing)" + + # Check ABAC condition on Storage Blob Data Owner + if ($roleName -eq 'Storage Blob Data Owner') { + $hasCondition = $found | Where-Object { $_.condition } + if ($hasCondition) { + Pass " Storage Blob Data Owner has ABAC condition (scoped to agent containers)" + } else { + Warn " Storage Blob Data Owner has no ABAC condition — broader than needed" + } + } + } else { + Fail "Role '$roleName' NOT found for project MI on $($meta.scope)" + if ($meta.preCaphost) { + Detail "This role must be assigned BEFORE capability host creation" + } else { + Detail "This role must be assigned AFTER capability host creation" + } + } + } + + # Check Cosmos DB SQL data-plane role + foreach ($cdb in $cosmosAccounts) { + $cdbRG = ($cdb.id -split '/')[4] + $cosmosRoles = az cosmosdb sql role assignment list --account-name $cdb.name --resource-group $cdbRG -o json 2>$null | ConvertFrom-Json + if ($cosmosRoles) { + $dataContrib = $cosmosRoles | Where-Object { + $_.principalId -eq $projPrincipalId -and + $_.roleDefinitionId -match '00000000-0000-0000-0000-000000000002' + } + if ($dataContrib) { + Pass "Cosmos DB SQL Built-in Data Contributor role assigned to project MI" + } else { + Fail "Cosmos DB SQL Built-in Data Contributor role NOT assigned to project MI" + Detail "This data-plane role must be assigned AFTER capability host creation" + } + } + } + } else { + Warn "Skipping RBAC checks — no project MI principal ID" + } + Write-Host "" + Write-Host "---------- End Project: $projName ----------" + Write-Host "" + + } # end foreach project + + # 
============================================================================= + # 12. RESOURCE PROVISIONING STATE + # ============================================================================= + Write-Host "--- 12. Resource Provisioning State ---" + + $acctState = $acctDetail.properties.provisioningState + if ($acctState -eq 'Succeeded') { + Pass "AI Services account: $acctState" + } else { + Fail "AI Services account: $acctState" + } + + $depTypes = @( + 'Microsoft.Storage/storageAccounts', + 'Microsoft.DocumentDB/databaseAccounts', + 'Microsoft.Search/searchServices', + 'Microsoft.ContainerRegistry/registries', + 'Microsoft.ApiManagement/service' + ) + foreach ($depType in $depTypes) { + $typeName = $depType.Split('/')[-1] + # Use the merged typed arrays (includes BYO resources from other RGs) + $resources = switch ($depType) { + 'Microsoft.Storage/storageAccounts' { $storageAccounts } + 'Microsoft.DocumentDB/databaseAccounts' { $cosmosAccounts } + 'Microsoft.Search/searchServices' { $searchServices } + 'Microsoft.ContainerRegistry/registries' { $containerRegistries } + 'Microsoft.ApiManagement/service' { $apimServices } + } + foreach ($res in $resources) { + $resDetail = az resource show --ids $res.id -o json 2>$null | ConvertFrom-Json + $state = $resDetail.properties.provisioningState + if (-not $state) { $state = "Unknown" } + $resRG = ($res.id -split '/')[4] + $rgLabel = if ($resRG -ne $ResourceGroup) { " (BYO RG: $resRG)" } else { "" } + if ($state -eq 'Succeeded') { + Pass "$typeName '$($res.name)'${rgLabel}: $state" + } else { + Fail "$typeName '$($res.name)'${rgLabel}: $state" + } + } + if ($resources.Count -eq 0) { + Info "No $typeName found in RG or via BYO discovery" + } + } + Write-Host "" + + # ============================================================================= + # 13. PUBLIC NETWORK ACCESS + AI SERVICES ACLS (Lockdown Audit) + # ============================================================================= + Write-Host "--- 13. 
Public Network Access and ACL Lockdown ---" + + $publicAccess = $acctDetail.properties.publicNetworkAccess + if ($publicAccess -eq 'Disabled') { + Pass "AI Services publicNetworkAccess: Disabled" + } elseif ($publicAccess -eq 'Enabled') { + Warn "AI Services publicNetworkAccess: Enabled (expected Disabled for private network setup)" + } else { + Info "AI Services publicNetworkAccess: $publicAccess" + } + + # AI Services network ACLs + $networkAcls = $acctDetail.properties.networkAcls + if ($networkAcls) { + $defaultAction = $networkAcls.defaultAction + $bypass = $networkAcls.bypass + + if ($defaultAction -eq 'Deny') { + Pass "AI Services ACL defaultAction: Deny" + } else { + Warn "AI Services ACL defaultAction: $defaultAction (expected Deny for private setup)" + } + + if ($bypass -match 'AzureServices') { + Pass "AI Services ACL bypass includes AzureServices" + } else { + Warn "AI Services ACL bypass does not include AzureServices — trusted Azure services may be blocked" + } + + if ($networkAcls.ipRules -and $networkAcls.ipRules.Count -gt 0) { + Warn "AI Services ACL has $($networkAcls.ipRules.Count) IP rule(s) — may allow public access" + } + if ($networkAcls.virtualNetworkRules -and $networkAcls.virtualNetworkRules.Count -gt 0) { + Info "AI Services ACL has $($networkAcls.virtualNetworkRules.Count) VNet rule(s)" + } + } else { + Warn "No network ACLs configured on AI Services account" + } + + # Check Storage public access + foreach ($sa in $storageAccounts) { + $saDetail = az storage account show --ids $sa.id -o json 2>$null | ConvertFrom-Json + $saPNA = $saDetail.publicNetworkAccess + if ($saPNA -eq 'Disabled') { + Pass "Storage '$($sa.name)' publicNetworkAccess: Disabled" + } else { + Warn "Storage '$($sa.name)' publicNetworkAccess: $saPNA (expected Disabled)" + } + } + + # Check Cosmos DB public access + auth + foreach ($cdb in $cosmosAccounts) { + $cdbDetail = az cosmosdb show --ids $cdb.id -o json 2>$null | ConvertFrom-Json + $cdbPNA = 
$cdbDetail.publicNetworkAccess + if ($cdbPNA -eq 'Disabled') { + Pass "Cosmos DB '$($cdb.name)' publicNetworkAccess: Disabled" + } else { + Warn "Cosmos DB '$($cdb.name)' publicNetworkAccess: $cdbPNA (expected Disabled)" + } + if ($cdbDetail.disableLocalAuth -eq $true) { + Pass "Cosmos DB '$($cdb.name)' disableLocalAuth: true (Entra-only)" + } else { + Warn "Cosmos DB '$($cdb.name)' disableLocalAuth: false — key-based auth enabled (expected true for private setups)" + } + } + + # Check AI Search public access + auth + foreach ($ss in $searchServices) { + $ssDetail = az resource show --ids $ss.id --api-version 2025-05-01 -o json 2>$null | ConvertFrom-Json + $ssPNA = $ssDetail.properties.publicNetworkAccess ?? $ssDetail.properties.publicInternetAccess + if ($ssPNA -eq 'disabled' -or $ssPNA -eq 'Disabled') { + Pass "AI Search '$($ss.name)' publicNetworkAccess: Disabled" + } elseif (-not $ssPNA) { + Warn "AI Search '$($ss.name)' publicNetworkAccess: unknown (API call may have failed)" + } else { + Warn "AI Search '$($ss.name)' publicNetworkAccess: $ssPNA (expected Disabled)" + } + # Auth check + $ssLocalAuth = $ssDetail.properties.disableLocalAuth + $ssAuthOptions = $ssDetail.properties.authOptions + if ($ssLocalAuth -eq $true) { + Pass "AI Search '$($ss.name)' disableLocalAuth: true (Entra-only)" + } elseif ($ssAuthOptions -and $ssAuthOptions.aadOrApiKey) { + Pass "AI Search '$($ss.name)' authOptions: aadOrApiKey (AAD accepted alongside API key)" + } elseif ($ssLocalAuth -eq $false -and -not $ssAuthOptions) { + Warn "AI Search '$($ss.name)' disableLocalAuth: false with no aadOrApiKey — API-key only, AAD tokens rejected" + } else { + Info "AI Search '$($ss.name)' auth: disableLocalAuth=$ssLocalAuth" + } + } + + # Check ACR public access + foreach ($acr in $containerRegistries) { + $acrDetail = az acr show --ids $acr.id -o json 2>$null | ConvertFrom-Json + $acrPNA = $acrDetail.publicNetworkAccess + if ($acrPNA -eq 'Disabled') { + Pass "ACR '$($acr.name)' 
publicNetworkAccess: Disabled" + } else { + Info "ACR '$($acr.name)' publicNetworkAccess: $acrPNA (developer access mode — verify IP allowlist is configured)" + } + } + + # Check APIM public access + foreach ($apim in $apimServices) { + $apimDetail = az resource show --ids $apim.id -o json 2>$null | ConvertFrom-Json + $apimPNA = $apimDetail.properties.publicNetworkAccess + if ($apimPNA -eq 'Disabled') { + Pass "APIM '$($apim.name)' publicNetworkAccess: Disabled" + } else { + Warn "APIM '$($apim.name)' publicNetworkAccess: $apimPNA (expected Disabled for private setup)" + } + } + + # AI Services local auth + $acctLocalAuth = $acctDetail.properties.disableLocalAuth + if ($acctLocalAuth -eq $true) { + Pass "AI Services '$acctName' disableLocalAuth: true (Entra-only)" + } else { + Info "AI Services '$acctName' disableLocalAuth: false (API keys enabled — expected for agent setups)" + } + + Info "(Storage, Cosmos DB, AI Search, ACR, and APIM network rules are checked in section 7 — Private Endpoints)" + Write-Host "" + + # ============================================================================= + # 14. MODEL DEPLOYMENT + # ============================================================================= + Write-Host "--- 14. 
Model Deployments ---" + + $deploymentsJson = az cognitiveservices account deployment list --name $acctName --resource-group $ResourceGroup -o json 2>$null | ConvertFrom-Json + if ($deploymentsJson -and $deploymentsJson.Count -gt 0) { + foreach ($dep in $deploymentsJson) { + $depName = $dep.name + $depState = $dep.properties.provisioningState + $modelName = $dep.properties.model.name + $modelVersion = $dep.properties.model.version + $sku = $dep.sku.name + $capacity = $dep.sku.capacity + + if ($depState -eq 'Succeeded') { + Pass "Model '$depName' ($modelName v$modelVersion, $sku, capacity ${capacity}): $depState" + } else { + Fail "Model '$depName' ($modelName v$modelVersion): $depState" + } + } + } else { + Warn "No model deployments found on account '$acctName'" + } + Write-Host "" + +} # end foreach account + +# ============================================================================= +# 15. AZURE POLICY COMPLIANCE (summary only — Deny policies can block redeployments) +# ============================================================================= +Write-Host "--- 15.
Azure Policy Compliance ---" + +$policyStates = az policy state list --resource-group $ResourceGroup --filter "complianceState eq 'NonCompliant'" -o json 2>$null | ConvertFrom-Json + +if ($policyStates -and $policyStates.Count -gt 0) { + $denyPolicies = @($policyStates | Where-Object { $_.policyDefinitionAction -in 'deny', 'Deny' }) + $otherPolicies = @($policyStates | Where-Object { $_.policyDefinitionAction -notin 'deny', 'Deny' }) + + if ($denyPolicies.Count -gt 0) { + $denyGrouped = $denyPolicies | Group-Object -Property policyAssignmentName + Warn "$($denyPolicies.Count) Deny policy evaluation(s) across $($denyGrouped.Count) assignment(s) — may block redeployment" + foreach ($g in $denyGrouped) { + $resources = ($g.Group | ForEach-Object { ($_.resourceId -split '/')[-1] } | Select-Object -Unique) -join ', ' + Detail " Assignment '$($g.Name)': $resources" + } + } else { + Pass "No Deny policies — redeployments will not be blocked by policy" + } + + if ($otherPolicies.Count -gt 0) { + Info "$($otherPolicies.Count) non-blocking policy evaluation(s) (Audit/DINE) — informational only" + } +} else { + Pass "No non-compliant Azure Policy evaluations in resource group" +} +Write-Host "" + +# ============================================================================= +# SUMMARY +# ============================================================================= +Write-Host "" +Write-Host "========================================" +Write-Host "Diagnostic Summary" +Write-Host "========================================" +Write-Host " Passed: $($script:PassCount)" -ForegroundColor Green +Write-Host " Failed: $($script:FailCount)" -ForegroundColor $(if ($script:FailCount -gt 0) { 'Red' } else { 'Green' }) +Write-Host " Warnings: $($script:WarnCount)" -ForegroundColor $(if ($script:WarnCount -gt 0) { 'Yellow' } else { 'Green' }) +Write-Host "========================================" + +if ($script:FailCount -gt 0) { + Write-Host "" + Write-Host "Common remediation steps:" 
-ForegroundColor Yellow + Write-Host " - Net injection missing: Network injection must be configured on the AI Services account (section 2)" + Write-Host " - SAL missing: Agent subnet has no SAL. Caphost may not have provisioned (section 3)" + Write-Host " - NSG blocking: Ensure outbound 443 to AzureCloud and inbound from VNet on PE/MCP subnets (section 4)" + Write-Host " - Missing DNS link: 'az network private-dns link vnet create' to link zone to your VNet (section 5)" + Write-Host " - Custom DNS: Add conditional forwarders to 168.63.129.16 for all privatelink.* zones (section 6)" + Write-Host " - PE 'Pending': Approve via portal or 'az network private-endpoint-connection approve' (section 7)" + Write-Host " - Caphost 'Failed': Delete caphost, use a NEW VNet/subnet, and re-deploy (section 9)" + Write-Host " - Missing connection: Re-deploy or manually create project connections for Cosmos/Storage/Search (section 10)" + Write-Host " - Missing RBAC: Pre-caphost roles must exist before caphost creation. Re-deploy or assign manually (section 11)" + Write-Host " - Policy 'Deny': Review Azure Policy assignments. Exempt or adjust policies blocking deployment (section 15)" + Write-Host "" + Write-Host "Docs: https://learn.microsoft.com/azure/ai-foundry/how-to/configure-private-link" + Write-Host " https://learn.microsoft.com/azure/ai-services/agents/how-to/virtual-networks" + exit 1 +} else { + Write-Host "" + Write-Host "All checks passed. If agents still fail, test from within the VNet (VPN/Bastion)." 
-ForegroundColor Green + exit 0 +} diff --git a/infrastructure/infrastructure-setup-bicep/deployment-tools/diagnostic/diagnostic.config.sample b/infrastructure/infrastructure-setup-bicep/deployment-tools/diagnostic/diagnostic.config.sample new file mode 100644 index 000000000..de4d9d300 --- /dev/null +++ b/infrastructure/infrastructure-setup-bicep/deployment-tools/diagnostic/diagnostic.config.sample @@ -0,0 +1,9 @@ +# Post-Deployment Diagnostic Configuration +# Uncomment and fill in the values below. + +# Required +SubscriptionId= +ResourceGroup= + +# Optional — auto-discovered if omitted +# AccountName= diff --git a/infrastructure/infrastructure-setup-bicep/deployment-tools/networking/README.md b/infrastructure/infrastructure-setup-bicep/deployment-tools/networking/README.md new file mode 100644 index 000000000..7fa7edb73 --- /dev/null +++ b/infrastructure/infrastructure-setup-bicep/deployment-tools/networking/README.md @@ -0,0 +1,156 @@ +--- +description: Shared Private Link (SPL) setup from Azure AI Search to AI Services / Foundry for private network scenarios. +page_type: tool +products: +- azure +- azure-resource-manager +languages: +- bicep +--- + +# AI Search → AI Services Shared Private Link + +## Overview + +When Azure AI Services (the Foundry account) has `publicNetworkAccess=Disabled`, Azure AI Search's vectorizer, indexer enrichment skills, and hosted model skills fail because those calls originate from AI Search's **managed backend infrastructure** — outside your VNet. + +Private endpoints in your VNet only cover **inbound** traffic to AI Services. They do nothing for **outbound** calls from AI Search's managed infrastructure. + +A **Shared Private Link (SPL)** provisions a private endpoint **from** AI Search's managed infrastructure **into** AI Services via Azure Private Link — no public access required. + +--- + +## What Gets Created + +This template deploys **three** SPL resources from AI Search into a single AI Services account. 
All three are required for full Foundry coverage: + +| SPL Name | Group ID | Purpose | +|----------|----------|---------| +| `-openai` | `openai_account` | Vectorizer — query-time embedding via integrated vectorization | +| `-cogsvc` | `cognitiveservices_account` | Built-in AI enrichment skills (OCR, entity extraction, key phrases) and Foundry billing link | +| `-foundry` | `foundry_account` | Azure-hosted model skills — GenAI prompt skill, Azure OpenAI embedding skill, Content Understanding skill | + +> **Important**: The standard private endpoint `groupId` value `account` does **not** work for SPLs. Using it returns: `BadRequest: Cannot create private endpoint for requested type 'account'`. + +> **Not needed for**: Cosmos DB and Storage — they are passive data stores and never initiate outbound calls from AI Search. + +--- + +## Prerequisites + +1. **Azure AI Search** — S1 or higher recommended. +2. **Azure AI Services / Foundry account** already deployed with `publicNetworkAccess=Disabled` (or about to be disabled). +3. Both resources must be in the same Azure subscription for SPL creation. +4. **Permissions**: + - `Microsoft.Search/searchServices/sharedPrivateLinkResources/write` on the AI Search resource. + - `Microsoft.CognitiveServices/accounts/privateEndpointConnections/write` on the AI Services resource (for approval). + +--- + +## Parameters + +| Parameter | Description | Default | Required | +|-----------|-------------|---------|----------| +| `aiSearchName` | Name of the Azure AI Search service | — | Yes | +| `aiServicesResourceId` | Full ARM resource ID of the AI Services / Foundry account | — | Yes | +| `splNamePrefix` | Prefix for SPL resource names. Change if connecting multiple AI Services accounts to the same AI Search instance. | `foundry-spl` | No | + +--- + +## Usage + +### 1. 
Fill in the parameter file + +Edit `ai-search-shared-private-link.bicepparam`: + +```bicep +using './ai-search-shared-private-link.bicep' + +param aiSearchName = 'my-ai-search' +param aiServicesResourceId = '/subscriptions//resourceGroups//providers/Microsoft.CognitiveServices/accounts/' + +// Uncomment to customize the SPL name prefix (e.g., when targeting multiple AI Services accounts): +// param splNamePrefix = 'team2-spl' +``` + +### 2. Deploy + +```bash +az deployment group create \ + --resource-group \ + --template-file ai-search-shared-private-link.bicep \ + --parameters ai-search-shared-private-link.bicepparam +``` + +### 3. Approve the pending private endpoint connections + +After deployment, the three SPLs are in **Pending** state. They must be approved on the AI Services side before traffic can flow. + +**Option A — Azure Portal**: +Navigate to **AI Services resource → Networking → Private endpoint connections** and approve each pending connection. + +**Option B — Azure CLI**: +```bash +# List pending connections +az network private-endpoint-connection list \ + --id /subscriptions//resourceGroups//providers/Microsoft.CognitiveServices/accounts/ + +# Approve each connection +az network private-endpoint-connection approve \ + --id \ + --description "Approved for AI Search SPL" +``` + +> **Note**: The SPLs will not route traffic until approved. Approval is a one-time step per SPL. + +--- + +## Redeployment + +Re-running the deployment is safe — ARM PUT operations are idempotent. Existing SPLs are updated in place. However, if a connection was previously **rejected**, re-deployment creates a new pending connection that must be approved again. 
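
The approval in Usage step 3 can also be scripted. The following is a minimal sketch and not part of this tool. `approve_pending_connections` is a hypothetical helper name, and the JMESPath filter assumes each connection reports its state under `properties.privateLinkServiceConnectionState.status`. It approves every pending connection on the account, so list and review them first.

```bash
# Hypothetical helper (not part of this repo): approve every pending private
# endpoint connection on the given AI Services account.
approve_pending_connections() {
  local account_id="$1"
  local pending_ids id count=0

  # Assumption: each connection reports its state under
  # properties.privateLinkServiceConnectionState.status.
  pending_ids=$(az network private-endpoint-connection list \
    --id "$account_id" \
    --query "[?properties.privateLinkServiceConnectionState.status=='Pending'].id" \
    -o tsv) || return 1

  # Resource IDs contain no whitespace, so word splitting is safe here.
  for id in $pending_ids; do
    az network private-endpoint-connection approve \
      --id "$id" \
      --description "Approved for AI Search SPL" >/dev/null || return 1
    count=$((count + 1))
  done

  echo "Approved $count connection(s)"
}
```

Call it with the full resource ID of the AI Services account once the deployment has finished, for example `approve_pending_connections "$AI_SERVICES_ID"`, where `AI_SERVICES_ID` is a variable you set yourself.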
+ + --- + + ## Limitations + + ### How the templates connect AI Search → AI Services today + + The private network templates ([15](../../15-private-network-standard-agent-setup/), [16](../../16-private-network-standard-agent-apim-setup/), [17](../../17-private-network-standard-user-assigned-identity-agent-setup/), [18](../../18-managed-virtual-network/), [19](../../19-private-network-agent-tools/)) deploy: + + | Resource | Setting | Template Value | + |----------|---------|---------------| + | AI Services (Foundry) | `publicNetworkAccess` | `Disabled` | + | AI Services (Foundry) | `networkAcls.bypass` | `AzureServices` (trusted services bypass **on**) | + | AI Services (Foundry) | `networkAcls.defaultAction` | `Deny` | + | AI Search | SKU | `standard` (S1) | + | AI Search | `publicNetworkAccess` | `disabled` | + | AI Search | `networkRuleSet.bypass` | `None` | + | AI Search | SPLs | **None deployed** | + + **Default behavior**: AI Search reaches AI Services via the **trusted services bypass** (`bypass: 'AzureServices'`). `Microsoft.Search` is on the [trusted services list](https://learn.microsoft.com/en-us/azure/ai-services/cognitive-services-virtual-networks#grant-access-to-trusted-azure-services-for-azure-openai), so AI Search's managed infrastructure can connect without SPLs. + + **With this tool**: Deploy SPLs and then set AI Services `bypass` to `'None'` for zero-trust — only your specific AI Search instance has private access.
+ +### SPL operational constraints + +Per [SPL documentation](https://learn.microsoft.com/en-us/azure/search/search-indexer-howto-access-private) and [service limits](https://learn.microsoft.com/en-us/azure/search/search-limits-quotas-capacity#shared-private-link-resource-limits): + +| Constraint | Details | Source | +|------------|---------|--------| +| **`openai_account` — public cloud + Azure Gov only** | Other sovereign clouds do not support `openai_account` SPLs. | [SPL docs footnote 7](https://learn.microsoft.com/en-us/azure/search/search-indexer-howto-access-private#prerequisites) | +| **One SPL per resource + groupId** | Only one SPL can exist per resource and subresource (`groupId`) combination on a search service. | [SPL docs](https://learn.microsoft.com/en-us/azure/search/search-indexer-howto-access-private#when-to-use-a-shared-private-link) | +| **Billed feature** | Shared private links are billed through [Azure Private Link pricing](https://azure.microsoft.com/pricing/details/private-link/). 
| [SPL docs](https://learn.microsoft.com/en-us/azure/search/search-indexer-howto-access-private) | + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---------|-------|-----| +| `BadRequest: Cannot create private endpoint for requested type 'account'` | Wrong `groupId` — used `account` instead of the SPL-specific values | Use `openai_account`, `cognitiveservices_account`, or `foundry_account` | +| SPL stuck in **Pending** state | Connection not approved on AI Services side | Approve the connection (see step 3 above) | +| SPL shows **Rejected** | Someone rejected the connection on AI Services side | Delete the SPL, redeploy, and approve the new connection | +| Vectorizer still fails after SPL approval | DNS resolution not updated yet | Wait a few minutes for DNS propagation, or verify the private DNS zone for `*.openai.azure.com` is linked to your VNet | +| `Conflict` error during deployment | AI Search only accepts one SPL write at a time; parallel writes conflict | The template handles this via `dependsOn` — if running manually, serialize the operations | + +--- diff --git a/infrastructure/infrastructure-setup-bicep/deployment-tools/networking/ai-search-shared-private-link.bicep b/infrastructure/infrastructure-setup-bicep/deployment-tools/networking/ai-search-shared-private-link.bicep new file mode 100644 index 000000000..c9a664aff --- /dev/null +++ b/infrastructure/infrastructure-setup-bicep/deployment-tools/networking/ai-search-shared-private-link.bicep @@ -0,0 +1,121 @@ +/* + PATCH: AI Search → AI Services Shared Private Link + --------------------------------------------------- + Applies to: BYO Azure AI Search scenarios where publicNetworkAccess is Disabled + on AI Services (the Foundry account). + + PROBLEM: + Azure AI Search's vectorizer and indexer AI enrichment skills call AI Services + outbound from AI Search's managed backend — which lives outside your VNet. + When AI Services has publicNetworkAccess=Disabled, those calls are rejected. 
+ Private endpoints in your VNet only cover INBOUND traffic to AI Services. + They do nothing for OUTBOUND calls from AI Search's managed infrastructure. + + SOLUTION: + A Shared Private Link provisions a private endpoint FROM AI Search's managed + infra INTO AI Services via Azure Private Link — no public access required. + + USAGE: + az deployment group create \ + --resource-group \ + --template-file ai-search-shared-private-link.bicep \ + --parameters ai-search-shared-private-link.bicepparam + + AFTER DEPLOYMENT: + Approve the pending private endpoint + connection on the AI Services side. The link is not active until approved. + + SCOPE: + Three SPLs are required for full Foundry coverage — all target the same AI Services + resource ID but each uses a different groupId required by the AI Search SPL API: + + openai_account — vectorizer (query-time embedding, integrated vectorization) + cognitiveservices_account — built-in AI enrichment skills (OCR, entity extraction, + key phrases) and their Foundry billing link + foundry_account — Azure-hosted model skills: GenAI prompt skill, + Azure OpenAI embedding skill, Content Understanding skill + + NOTE: 'account' (the standard PE groupId for Cognitive Services) is NOT valid here — + it causes: BadRequest: Cannot create private endpoint for requested type 'account'. + + Not needed: CosmosDB and Storage are passive data stores — they never initiate + outbound calls, so no shared private link is required for them. +*/ + +@minLength(1) +@description('Name of the Azure AI Search service (BYO resource).') +param aiSearchName string + +@minLength(1) +@description('Full resource ID of the Azure AI Services / Foundry account to connect to. Format: /subscriptions//resourceGroups//providers/Microsoft.CognitiveServices/accounts/') +param aiServicesResourceId string + +@description('Prefix for SPL resource names. 
Change this if you need multiple SPL sets targeting different AI Services accounts from the same AI Search instance.') +param splNamePrefix string = 'foundry-spl' + +// --------------------------------------------------------------------------- +// Reference the existing AI Search service in the deployment resource group +// --------------------------------------------------------------------------- +resource searchService 'Microsoft.Search/searchServices@2024-03-01-preview' existing = { + name: aiSearchName +} + +// --------------------------------------------------------------------------- +// SPL 1 — openai_account +// Covers: vectorizer (query-time embedding via integrated vectorization) +// --------------------------------------------------------------------------- +resource splOpenAI 'Microsoft.Search/searchServices/sharedPrivateLinkResources@2024-03-01-preview' = { + parent: searchService + name: '${splNamePrefix}-openai' + properties: { + privateLinkResourceId: aiServicesResourceId + groupId: 'openai_account' + requestMessage: 'AI Search vectorizer requires private access to AI Services (openai_account)' + } +} + +// --------------------------------------------------------------------------- +// SPL 2 — cognitiveservices_account +// Covers: built-in AI enrichment skills (OCR, entity extraction, etc.) 
and +// the Foundry billing link used by skillsets +// dependsOn splOpenAI: AI Search only accepts one SPL write at a time +// --------------------------------------------------------------------------- +resource splCogSvc 'Microsoft.Search/searchServices/sharedPrivateLinkResources@2024-03-01-preview' = { + parent: searchService + name: '${splNamePrefix}-cogsvc' + properties: { + privateLinkResourceId: aiServicesResourceId + groupId: 'cognitiveservices_account' + requestMessage: 'AI Search enrichment skills require private access to AI Services (cognitiveservices_account)' + } + dependsOn: [splOpenAI] +} + +// --------------------------------------------------------------------------- +// SPL 3 — foundry_account +// Covers: Azure-hosted model skills — GenAI prompt skill, Azure OpenAI +// embedding skill, Content Understanding skill +// dependsOn splCogSvc: AI Search only accepts one SPL write at a time +// --------------------------------------------------------------------------- +resource splFoundry 'Microsoft.Search/searchServices/sharedPrivateLinkResources@2024-03-01-preview' = { + parent: searchService + name: '${splNamePrefix}-foundry' + properties: { + privateLinkResourceId: aiServicesResourceId + groupId: 'foundry_account' + requestMessage: 'AI Search model skills require private access to AI Services (foundry_account)' + } + dependsOn: [splCogSvc] +} + +// --------------------------------------------------------------------------- +// Outputs +// --------------------------------------------------------------------------- +@description('Approval status (for example Pending or Approved) of the openai_account SPL (vectorizer).') +output splOpenAIState string = splOpenAI.properties.status + +@description('Approval status (for example Pending or Approved) of the cognitiveservices_account SPL (enrichment skills).') +output splCogSvcState string = splCogSvc.properties.status + +@description('Approval status (for example Pending or Approved) of the foundry_account SPL (hosted model skills).') +output splFoundryState string = splFoundry.properties.status
diff --git a/infrastructure/infrastructure-setup-bicep/deployment-tools/networking/ai-search-shared-private-link.bicepparam b/infrastructure/infrastructure-setup-bicep/deployment-tools/networking/ai-search-shared-private-link.bicepparam new file mode 100644 index 000000000..187783c17 --- /dev/null +++ b/infrastructure/infrastructure-setup-bicep/deployment-tools/networking/ai-search-shared-private-link.bicepparam @@ -0,0 +1,14 @@ +using './ai-search-shared-private-link.bicep' + +// Name of the BYO AI Search service +param aiSearchName = '' + +// Full resource ID of the Foundry AI Services account (with unique suffix) +// Three SPLs are deployed automatically targeting this resource: +// -openai (openai_account) — vectorizer +// -cogsvc (cognitiveservices_account) — enrichment skills + billing +// -foundry (foundry_account) — hosted model skills +param aiServicesResourceId = '/subscriptions//resourceGroups//providers/Microsoft.CognitiveServices/accounts/' + +// SPL name prefix — change only if you connect multiple AI Services accounts to the same AI Search +// param splNamePrefix = 'foundry-spl' diff --git a/infrastructure/infrastructure-setup-bicep/deployment-tools/preflight/README.md b/infrastructure/infrastructure-setup-bicep/deployment-tools/preflight/README.md new file mode 100644 index 000000000..2b7fd9952 --- /dev/null +++ b/infrastructure/infrastructure-setup-bicep/deployment-tools/preflight/README.md @@ -0,0 +1,157 @@ +# Preflight Check for Foundry Private Network Deployments + +`preflight-check.ps1` validates your Azure environment **before** running `az deployment group create`. It catches common misconfigurations that would otherwise surface as cryptic ARM errors mid-deploy — saving you from failed deployments, wasted time, and difficult-to-diagnose issues. + +## Why Run Preflight Checks? + +ARM template deployments can fail 10–20 minutes in with opaque error messages. 
By that point resources may be partially created, leaving your environment in an inconsistent state that requires manual cleanup. This script validates everything upfront so you can fix issues before they become problems. + +## What It Checks + +### 0. Azure CLI Login & Subscription + +| Check | What It Prevents | +|---|---| +| Azure CLI is logged in | Script failures when `az` commands can't authenticate | +| Subscription is accessible | Deployment against a subscription you don't have access to | +| Active subscription matches the requested one | Running checks (and later deploying) against the wrong subscription | +| Location is a valid Azure region | Deployments targeting a misspelled or non-existent region (accepts both display names like "Sweden Central" and API names like "swedencentral") | + +### 1. Resource Provider Registration + +| Check | What It Prevents | +|---|---| +| Required providers are registered: `Microsoft.CognitiveServices`, `Microsoft.Storage`, `Microsoft.Search`, `Microsoft.DocumentDB`, `Microsoft.Network`, `Microsoft.App`, `Microsoft.KeyVault`, `Microsoft.MachineLearningServices`, `Microsoft.ContainerService` | Deployment failures with "resource provider not registered" errors, which require registration and a retry | +| Optional providers are checked: `Microsoft.Bing` (Bing Search tool), `Microsoft.ApiManagement` (APIM setups), `Microsoft.Web` (Azure Functions agent tools), `Microsoft.ManagedIdentity` (user-assigned identity setups), `Microsoft.ContainerRegistry` (most templates enable ACR) | Warns if an optional provider needed for your scenario is not registered | + +### 2. 
Resource Group State + +| Check | What It Prevents | +|---|---| +| Resource group exists and its location matches the deployment location | Cross-region failures where private endpoints reference `resourceGroup().location` but the RG is in a different region than intended | +| Existing AI accounts in the resource group | Accidental creation of duplicate resources with timestamp-based naming when re-deploying, instead of reusing existing ones | +| Soft-deleted Cognitive Services accounts | Name collision failures when a new deployment tries to create an account with the same name as a soft-deleted one | + +### 3. BYO (Bring-Your-Own) Resource Validation + +When you pass existing resource IDs, the script validates them before the template tries to reference them. + +| Check | What It Prevents | +|---|---| +| ARM resource ID format is valid (correct segments, valid subscription GUID) | Template failures caused by malformed resource IDs | +| AI Search resource exists and is accessible | References to non-existent or inaccessible Search services | +| AI Search SKU is not `free` | Private endpoint creation failures — free tier doesn't support private endpoints | +| AI Search has AAD authentication enabled (`disableLocalAuth` or `aadOrApiKey`) | Bicep deployment failures when AAD auth is not configured — provides fix commands | +| Storage Account exists and is accessible | References to non-existent or inaccessible storage accounts | +| Storage Account kind is `StorageV2` | Feature incompatibilities (e.g., file shares) with non-StorageV2 account kinds | +| Cosmos DB exists and is accessible | References to non-existent or inaccessible Cosmos DB accounts | +| Cosmos DB `disableLocalAuth` is enabled | Silent failures in Foundry role assignments when key-based auth is still active | +| API Management instance exists (when `ApiManagementResourceId` provided) | References to non-existent or inaccessible APIM instances in private network setups | +| Fabric Workspace exists 
(when `FabricWorkspaceResourceId` provided) | References to non-existent or inaccessible Fabric workspaces in agent tools / MCP setups | + +### 4. VNet and Subnet Validation + +When using an existing VNet (`-ExistingVnetId`), the script performs deep network checks. + +| Check | What It Prevents | +|---|---| +| VNet exists and its location matches the deployment location | Private endpoint failures when VNet and deployment are in different regions | +| Expected subnets exist (`agent-subnet`, `pe-subnet`, `mcp-subnet`) with per-subnet guidance on which scenarios need which subnet | Unclear errors when the template expects subnets that don't exist yet | +| No Service Association Links (SALs) on subnets | Deployment will fail — the platform cannot inject into a subnet already owned by another resource (SAL holder type is reported, e.g. `Microsoft.App/environments` on agent/mcp subnets) | +| Agent and MCP subnets are delegated to `Microsoft.App/environments` | Container App environment provisioning failures due to missing or incorrect delegation | +| PE subnet has enough usable IPs (4 base PEs for AI Services, Search, Storage, Cosmos DB — plus 1 each for APIM and/or Fabric when configured) | Private endpoint creation failures when the subnet is too small (recommends /24, minimum /28) | + +### 5. DNS Zone Conflict Detection + +| Check | What It Prevents | +|---|---| +| Checks for existing private DNS zones in the resource group: `privatelink.services.ai.azure.com`, `privatelink.openai.azure.com`, `privatelink.cognitiveservices.azure.com`, `privatelink.search.windows.net`, `privatelink.blob.core.windows.net`, `privatelink.documents.azure.com`, `privatelink.azurecr.io` (ACR), `privatelink.azure-api.net` (APIM), `privatelink.fabric.microsoft.com` (Fabric) | VNet link creation failures when the template tries to create a DNS zone that already exists and is already linked | + +### 6. 
Model and Cosmos DB Quota Checks + +Model checks only run when `ModelName`, `ModelFormat`, `ModelSkuName`, and `ModelCapacity` are all provided. Cosmos DB throughput check only runs when `CosmosDBResourceId` is provided. + +| Check | What It Prevents | +|---|---| +| Model is available in the target region (name + format) | Deployment failures when the requested model isn't available in the selected region | +| Sufficient TPM (tokens per minute) quota for the requested model SKU and capacity | Model deployment failures due to insufficient quota — directs you to the quota increase page | +| Cosmos DB account throughput cap is at least 3000 RU/s (3 containers × 1000 RU/s per project), or no cap is set | Agent service failures when a hard throughput cap on the account would block the required per-container provisioning | + +### 7. Resource Quota Checks + +| Check | What It Prevents | +|---|---| +| AI Search Standard tier quota in the target region | Search service creation failures due to exhausted quota | +| Storage account count in the region (limit: 250) | Storage account creation failures when approaching or exceeding the per-region limit | +| VNet and Private Endpoint quotas in the target region | Network resource creation failures when quotas are exhausted or near capacity (warns at 80%) | + +## Usage + +```powershell +cd infrastructure/infrastructure-setup-bicep/deployment-tools/preflight +``` + +### Option 1: Config File + +Copy the sample config, fill in your values, and run: + +```powershell +cp preflight.config.sample preflight.config +# Edit preflight.config with your values +.\preflight-check.ps1 -ConfigFile .\preflight.config +``` + +### Option 2: Command-Line Parameters + +```powershell +# Minimal (new VNet, new resources) +.\preflight-check.ps1 -SubscriptionId "your-sub-id" -ResourceGroup "my-rg" -Location "swedencentral" + +# With BYO VNet and AI Search +.\preflight-check.ps1 -SubscriptionId "your-sub-id" -ResourceGroup "my-rg" -Location 
"swedencentral" ` + -ExistingVnetId "/subscriptions/.../virtualNetworks/my-vnet" ` + -AiSearchResourceId "/subscriptions/.../searchServices/my-search" +``` + +Command-line parameters override config file values. + +### Parameters + +| Parameter | Required | Description | +|---|---|---| +| `ConfigFile` | No | Path to a key=value config file (see `preflight.config.sample`) | +| `SubscriptionId` | Yes* | Azure subscription ID | +| `ResourceGroup` | Yes* | Target resource group | +| `Location` | Yes* | Deployment region (e.g., `swedencentral`) | +| `ExistingVnetId` | No | Full ARM resource ID of an existing VNet | +| `AiSearchResourceId` | No | Full ARM resource ID of an existing AI Search resource | +| `StorageAccountResourceId` | No | Full ARM resource ID of an existing Storage Account | +| `CosmosDBResourceId` | No | Full ARM resource ID of an existing Cosmos DB account | +| `ApiManagementResourceId` | No | Full ARM resource ID of an existing API Management instance (for APIM private network setup) | +| `FabricWorkspaceResourceId` | No | Full ARM resource ID of an existing Fabric Workspace (for agent tools / MCP setup) | +| `ModelName` | No | Model name for quota checks (e.g., `gpt-4o`) | +| `ModelFormat` | No | Model format — must match `az cognitiveservices model list` output (e.g., `OpenAI`, `Mistral AI`) | +| `ModelSkuName` | No | Model SKU name (e.g., `Standard`, `GlobalStandard`) | +| `ModelCapacity` | No | Requested TPM capacity in thousands (e.g., `10` = 10K TPM) | + +\* Can be provided via config file instead. + +## Output + +The script produces color-coded output: + +- **[PASS]** (green) — Check passed +- **[FAIL]** (red) — Must fix before deploying +- **[WARN]** (yellow) — Potential issue, review recommended +- **[INFO]** (cyan) — Informational + +A summary at the end shows total pass/fail/warn counts. The script sets exit code `1` if any checks fail, `0` if all pass. 
+ +## Prerequisites + +- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli) installed and logged in (`az login`) +- Active subscription set to the target subscription: + ```powershell + az account set --subscription + ``` +- Sufficient permissions to read resources in the target subscription diff --git a/infrastructure/infrastructure-setup-bicep/deployment-tools/preflight/preflight-check.ps1 b/infrastructure/infrastructure-setup-bicep/deployment-tools/preflight/preflight-check.ps1 new file mode 100644 index 000000000..5bde78a05 --- /dev/null +++ b/infrastructure/infrastructure-setup-bicep/deployment-tools/preflight/preflight-check.ps1 @@ -0,0 +1,705 @@ +<# +.SYNOPSIS + Pre-deployment validation for Foundry private network templates. + +.DESCRIPTION + Validates prerequisites before running az deployment group create. + Catches common misconfigurations that would otherwise surface as + cryptic ARM errors mid-deploy. + + Checks performed: + - Resource provider registration + - Resource group state (existing resources, soft-deleted accounts) + - BYO resource validation (existence, SKU, configuration) + - VNet and subnet validation (SALs, delegations, PE capacity) + - DNS zone conflicts + +.PARAMETER ConfigFile + Path to a config file with key=value pairs. See preflight.config.sample for format. + When provided, SubscriptionId, ResourceGroup, and Location are read from the file. + +.PARAMETER SubscriptionId + Azure subscription ID (or set in config file) + +.PARAMETER ResourceGroup + Target resource group for deployment (or set in config file) + +.PARAMETER Location + Deployment region. Accepts either display name (e.g. "Sweden Central") + or API name (e.g. "swedencentral") — the script normalizes automatically. 
+ +.PARAMETER ExistingVnetId + Full ARM resource ID of an existing VNet (optional, for BYO VNet scenarios) + +.PARAMETER AiSearchResourceId + Full ARM resource ID of an existing AI Search resource (optional, for BYO) + +.PARAMETER StorageAccountResourceId + Full ARM resource ID of an existing Storage Account (optional, for BYO) + +.PARAMETER CosmosDBResourceId + Full ARM resource ID of an existing Cosmos DB account (optional, for BYO) + +.PARAMETER ApiManagementResourceId + Full ARM resource ID of an existing API Management instance + (optional, for APIM private network setup) + +.PARAMETER FabricWorkspaceResourceId + Full ARM resource ID of an existing Fabric Workspace + (optional, for agent tools / MCP setup) + +.PARAMETER ModelName + Model name for quota checks (e.g., gpt-4o). Leave empty to skip model quota checks. + +.PARAMETER ModelFormat + Model format — must match az cognitiveservices model list output (e.g., OpenAI, Mistral AI) + +.PARAMETER ModelSkuName + Model SKU name (e.g., Standard, GlobalStandard) + +.PARAMETER ModelCapacity + Requested TPM capacity in thousands (e.g., 10 = 10K TPM) + +.EXAMPLE + .\preflight-check.ps1 -ConfigFile .\preflight.config + +.EXAMPLE + .\preflight-check.ps1 -SubscriptionId "xxx" -ResourceGroup "my-rg" -Location "swedencentral" + +.EXAMPLE + .\preflight-check.ps1 -SubscriptionId "xxx" -ResourceGroup "my-rg" -Location "swedencentral" ` + -ExistingVnetId "/subscriptions/.../virtualNetworks/my-vnet" ` + -AiSearchResourceId "/subscriptions/.../searchServices/my-search" +#> + +param( + [string]$ConfigFile = '', + [string]$SubscriptionId = '', + [string]$ResourceGroup = '', + [string]$Location = '', + [string]$ExistingVnetId = '', + [string]$AiSearchResourceId = '', + [string]$StorageAccountResourceId = '', + [string]$CosmosDBResourceId = '', + [string]$ApiManagementResourceId = '', + [string]$FabricWorkspaceResourceId = '', + [string]$ModelName = '', + [string]$ModelFormat = '', + [string]$ModelSkuName = '', + [int]$ModelCapacity 
= 0 +) + +# --- Load config file if provided --- +if ($ConfigFile) { + if (-not (Test-Path $ConfigFile)) { + Write-Host "[FAIL] Config file not found: $ConfigFile" -ForegroundColor Red + exit 1 + } + $configLines = Get-Content $ConfigFile | Where-Object { $_ -match '^\s*[^#]' -and $_ -match '=' } + $config = @{} + foreach ($line in $configLines) { + $parts = $line -split '=', 2 + $key = $parts[0].Trim() + $val = $parts[1].Trim() + $config[$key] = $val + } + # Config values are used as defaults; explicit params override + if (-not $SubscriptionId -and $config['SubscriptionId']) { $SubscriptionId = $config['SubscriptionId'] } + if (-not $ResourceGroup -and $config['ResourceGroup']) { $ResourceGroup = $config['ResourceGroup'] } + if (-not $Location -and $config['Location']) { $Location = $config['Location'] } + if (-not $ExistingVnetId -and $config['ExistingVnetId']) { $ExistingVnetId = $config['ExistingVnetId'] } + if (-not $AiSearchResourceId -and $config['AiSearchResourceId']) { $AiSearchResourceId = $config['AiSearchResourceId'] } + if (-not $StorageAccountResourceId -and $config['StorageAccountResourceId']) { $StorageAccountResourceId = $config['StorageAccountResourceId'] } + if (-not $CosmosDBResourceId -and $config['CosmosDBResourceId']) { $CosmosDBResourceId = $config['CosmosDBResourceId'] } + if (-not $ApiManagementResourceId -and $config['ApiManagementResourceId']) { $ApiManagementResourceId = $config['ApiManagementResourceId'] } + if (-not $FabricWorkspaceResourceId -and $config['FabricWorkspaceResourceId']) { $FabricWorkspaceResourceId = $config['FabricWorkspaceResourceId'] } + if (-not $ModelName -and $config['ModelName']) { $ModelName = $config['ModelName'] } + if (-not $ModelFormat -and $config['ModelFormat']) { $ModelFormat = $config['ModelFormat'] } + if (-not $ModelSkuName -and $config['ModelSkuName']) { $ModelSkuName = $config['ModelSkuName'] } + if ($ModelCapacity -eq 0 -and $config['ModelCapacity']) { $ModelCapacity = [int]$config['ModelCapacity'] 
} +} + +# --- Validate required params --- +if (-not $SubscriptionId -or -not $ResourceGroup -or -not $Location) { + Write-Host "ERROR: SubscriptionId, ResourceGroup, and Location are required." -ForegroundColor Red + Write-Host "Provide them as parameters or in a config file (-ConfigFile)." -ForegroundColor Red + Write-Host "" + Write-Host "Usage:" + Write-Host " .\preflight-check.ps1 -ConfigFile .\preflight.config" + Write-Host " .\preflight-check.ps1 -SubscriptionId 'xxx' -ResourceGroup 'my-rg' -Location 'swedencentral'" + exit 1 +} + +# Normalize Location to API format (lowercase, no spaces) so both +# display names like "Sweden Central" and API names like "swedencentral" work. +$originalLocation = $Location +$Location = ($Location -replace '\s','').ToLower() + +$ErrorActionPreference = "Continue" +$script:PassCount = 0 +$script:FailCount = 0 +$script:WarnCount = 0 + +function Pass { param([string]$Msg) Write-Host "[PASS] $Msg" -ForegroundColor Green; $script:PassCount++ } +function Fail { param([string]$Msg) Write-Host "[FAIL] $Msg" -ForegroundColor Red; $script:FailCount++ } +function Warn { param([string]$Msg) Write-Host "[WARN] $Msg" -ForegroundColor Yellow; $script:WarnCount++ } +function Info { param([string]$Msg) Write-Host "[INFO] $Msg" -ForegroundColor Cyan } + +# Helper: validate ARM resource ID format +function Test-ArmResourceId { + param([string]$Id, [string]$Label) + if ([string]::IsNullOrWhiteSpace($Id)) { return $false } + + $segments = $Id -split '/' + # ARM IDs have the form /subscriptions/{guid}/resourceGroups/{rg}/providers/{ns}/{type}/{name} + if ($segments.Count -lt 9) { + Fail "$Label resource ID has too few segments ($($segments.Count)). 
Expected ARM format: /subscriptions/{guid}/resourceGroups/{rg}/providers/{provider}/{type}/{name}" + return $false + } + $subGuid = $segments[2] + if ($subGuid -notmatch '^[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$') { + Fail "$Label resource ID has invalid subscription GUID: $subGuid" + return $false + } + Pass "$Label resource ID format is valid" + return $true +} + +Write-Host "========================================" +Write-Host "Pre-Deployment Validation" +Write-Host "========================================" +Write-Host "Subscription: $SubscriptionId" +Write-Host "Resource Group: $ResourceGroup" +if ($originalLocation -ne $Location) { + Write-Host "Location: $Location (normalized from '$originalLocation')" +} else { + Write-Host "Location: $Location" +} +Write-Host "Existing VNet: $(if ($ExistingVnetId) { $ExistingVnetId } else { '' })" +if ($ApiManagementResourceId) { Write-Host "APIM: $ApiManagementResourceId" } +if ($FabricWorkspaceResourceId) { Write-Host "Fabric: $FabricWorkspaceResourceId" } +Write-Host "" + +# Verify Azure CLI is logged in +$azAccount = az account show -o json 2>$null | ConvertFrom-Json +if (-not $azAccount) { + Write-Host "[FAIL] Not logged in to Azure CLI. Run: az login" -ForegroundColor Red + exit 1 +} + +# Verify subscription access without switching the active context +$subCheck = az account show --subscription $SubscriptionId --query "id" -o tsv 2>$null +if (-not $subCheck) { + Write-Host "[FAIL] Cannot access subscription $SubscriptionId. Verify the ID and your access." -ForegroundColor Red + exit 1 +} +# Ensure the CLI is already pointed at the right subscription +$activeSubId = ($azAccount.id).Trim() +if ($activeSubId -ne $SubscriptionId) { + Write-Host "[FAIL] Active subscription ($activeSubId) does not match requested ($SubscriptionId)." 
-ForegroundColor Red + Write-Host " Run: az account set --subscription $SubscriptionId" -ForegroundColor Red + exit 1 +} + +# Validate Location is a real Azure region +$validLocations = az account list-locations --query "[].name" -o tsv 2>$null +if ($validLocations) { + $locationList = $validLocations -split "`n" | ForEach-Object { $_.Trim() } | Where-Object { $_ } + if ($locationList -contains $Location) { + Pass "Location '$Location' is a valid Azure region" + } else { + Fail "Location '$Location' is not a valid Azure region. Run: az account list-locations --query ""[].name"" -o tsv" + exit 1 + } +} else { + Warn "Could not query Azure locations. Skipping location validation." +} + +# ============================================================================= +# 1. Resource Provider Registration +# ============================================================================= +Write-Host "--- Resource Provider Registration ---" +$requiredProviders = @( + 'Microsoft.CognitiveServices', + 'Microsoft.Storage', + 'Microsoft.Search', + 'Microsoft.DocumentDB', + 'Microsoft.Network', + 'Microsoft.App', + 'Microsoft.KeyVault', + 'Microsoft.MachineLearningServices', + 'Microsoft.ContainerService' +) + +# Optional providers — warn if not registered instead of failing +$optionalProviders = @{ + 'Microsoft.Bing' = 'Required only if using Grounding with Bing Search tool' + 'Microsoft.ApiManagement' = 'Required only for APIM setups' + 'Microsoft.Web' = 'Required only for agent tools setup (Azure Functions)' + 'Microsoft.ManagedIdentity' = 'Required only for user-assigned identity setups' + 'Microsoft.ContainerRegistry' = 'Required when enableContainerRegistry is true (most templates)' +} + +foreach ($rp in $requiredProviders) { + $state = az provider show --namespace $rp --query "registrationState" -o tsv 2>$null + if ($state -eq 'Registered') { + Pass "$rp is registered" + } else { + Fail "$rp is NOT registered (state: $state). 
Run: az provider register --namespace '$rp'" + } +} + +foreach ($rp in $optionalProviders.Keys) { + $state = az provider show --namespace $rp --query "registrationState" -o tsv 2>$null + if ($state -eq 'Registered') { + Pass "$rp is registered" + } else { + Warn "$rp is NOT registered. $($optionalProviders[$rp]). Run: az provider register --namespace '$rp'" + } +} + +# ============================================================================= +# 2. Resource Group +# ============================================================================= +Write-Host "" +Write-Host "--- Resource Group ---" +$rgExists = az group exists --name $ResourceGroup 2>$null + +if ($rgExists -eq 'true') { + $rgLocation = az group show --name $ResourceGroup --query "location" -o tsv 2>$null + Pass "Resource group '$ResourceGroup' exists in '$rgLocation'" + + if ($rgLocation -ne $Location) { + Warn "RG location ($rgLocation) differs from deployment location ($Location). Private endpoints use resourceGroup().location — this may cause cross-region failures." + } + + # Check for existing AI accounts (orphan risk with timestamp-based naming) + $existingAI = az cognitiveservices account list --resource-group $ResourceGroup --query "[].name" -o tsv 2>$null + if ($existingAI) { + Warn "Existing AI account(s) found: $existingAI. Re-deploying creates NEW resources with a different suffix. Pass existing resource IDs to reuse them." + } + + # Soft-deleted accounts + $deleted = az cognitiveservices account list-deleted ` + --query "[?contains(id, '/resourceGroups/$ResourceGroup/')].name" -o tsv 2>$null + if ($deleted) { + Warn "Soft-deleted AI account(s) found in this RG: $deleted. If the new deployment uses the same name, it will fail. 
Purge if needed: az cognitiveservices account purge --name --resource-group $ResourceGroup --location $Location" + } +} else { + Pass "Resource group '$ResourceGroup' will be created" +} + +# ============================================================================= +# 3. BYO Resource Validation +# ============================================================================= + +# Helper: extract subscription and RG from a BYO resource ID and +# flag cross-subscription or cross-resource-group differences. +function Test-ByoResourceContext { + param([string]$Id, [string]$Label) + $segments = $Id -split '/' + $byoSub = $segments[2] + $byoRg = $segments[4] + + # Cross-subscription + if ($byoSub -ne $SubscriptionId) { + Info "$Label is in a different subscription ($byoSub). Templates support cross-subscription BYO — ensure the deploying principal has read access." + } + # Cross-resource-group + if ($byoRg -ne $ResourceGroup) { + Info "$Label is in resource group '$byoRg' (deploying to '$ResourceGroup'). Private endpoints will be created in the deployment RG." + } +} + +if ($AiSearchResourceId -or $StorageAccountResourceId -or $CosmosDBResourceId) { + Write-Host "" + Write-Host "--- BYO Resource Validation ---" +} + +# --- AI Search --- +if ($AiSearchResourceId) { + if (Test-ArmResourceId -Id $AiSearchResourceId -Label "AI Search") { + $searchInfo = az resource show --ids $AiSearchResourceId --query "{sku:sku.name, location:location}" -o json 2>$null | ConvertFrom-Json + if ($searchInfo) { + Pass "AI Search resource exists (SKU: $($searchInfo.sku), Location: $($searchInfo.location))" + Test-ByoResourceContext -Id $AiSearchResourceId -Label "AI Search" + if ($searchInfo.sku -eq 'free') { + Fail "AI Search SKU is 'free'. Free tier does not support private endpoints. Use a dedicated tier (basic or higher)." 
+ } + # Check AAD auth is enabled (replicates validate-search-aad-auth.bicep logic) + $searchAuth = az search service show --ids $AiSearchResourceId --query "{disableLocalAuth:disableLocalAuth, authOptions:authOptions}" -o json 2>$null | ConvertFrom-Json + if ($searchAuth) { + if ($searchAuth.disableLocalAuth -eq $true) { + Pass "AI Search AAD auth: local auth disabled (AAD-only)" + } elseif ($searchAuth.authOptions -and $searchAuth.authOptions.aadOrApiKey) { + Warn "AI Search has local auth enabled. Consider disabling it for private network deployments." + Write-Host " Fix: az search service update --ids $AiSearchResourceId --disable-local-auth true" -ForegroundColor Yellow + } else { + Fail "AI Search does not have AAD authentication enabled. The Bicep deployment will fail." + Write-Host " Fix: az search service update --ids $AiSearchResourceId --auth-options aadOrApiKey --aad-auth-failure-mode http401WithBearerChallenge" -ForegroundColor Yellow + } + } else { + Warn "Could not query AI Search auth settings. Skipping AAD auth check." + } + } else { + Fail "AI Search resource not found or no access: $AiSearchResourceId" + } + } +} + +# --- Storage Account --- +if ($StorageAccountResourceId) { + if (Test-ArmResourceId -Id $StorageAccountResourceId -Label "Storage Account") { + $storageInfo = az resource show --ids $StorageAccountResourceId --query "{kind:kind, location:location}" -o json 2>$null | ConvertFrom-Json + if ($storageInfo) { + Pass "Storage Account exists" + Test-ByoResourceContext -Id $StorageAccountResourceId -Label "Storage Account" + if ($storageInfo.kind -eq "StorageV2") { + Pass "Storage Account kind is StorageV2" + } else { + Warn "Storage Account kind is '$($storageInfo.kind)'. Templates create StorageV2. Some features (e.g., file shares) may not work with other kinds." 
+ } + } else { + Fail "Storage Account not found or no access: $StorageAccountResourceId" + } + } +} + +# --- Cosmos DB --- +if ($CosmosDBResourceId) { + if (Test-ArmResourceId -Id $CosmosDBResourceId -Label "Cosmos DB") { + $cosmosInfo = az cosmosdb show --ids $CosmosDBResourceId --query "{disableLocalAuth:disableLocalAuth, location:location}" -o json 2>$null | ConvertFrom-Json + if ($cosmosInfo) { + Pass "Cosmos DB exists (Location: $($cosmosInfo.location))" + Test-ByoResourceContext -Id $CosmosDBResourceId -Label "Cosmos DB" + if ($cosmosInfo.disableLocalAuth -ne $true) { + Fail "Cosmos DB disableLocalAuth is not true. Foundry requires AAD-only auth. Fix: az resource update --ids $CosmosDBResourceId --set properties.disableLocalAuth=true" + } else { + Pass "Cosmos DB disableLocalAuth is enabled" + } + } else { + Fail "Cosmos DB not found or no access: $CosmosDBResourceId" + } + } +} + +# --- API Management (APIM private network setup - optional) --- +if ($ApiManagementResourceId) { + if (Test-ArmResourceId -Id $ApiManagementResourceId -Label "API Management") { + $apimInfo = az resource show --ids $ApiManagementResourceId --query "{id:id, location:location}" -o json 2>$null | ConvertFrom-Json + if ($apimInfo) { + Pass "API Management instance exists" + Test-ByoResourceContext -Id $ApiManagementResourceId -Label "API Management" + } else { + Fail "API Management not found or no access: $ApiManagementResourceId" + } + } +} + +# --- Fabric Workspace (agent tools / MCP setup - optional) --- +if ($FabricWorkspaceResourceId) { + if (Test-ArmResourceId -Id $FabricWorkspaceResourceId -Label "Fabric Workspace") { + $fabricInfo = az resource show --ids $FabricWorkspaceResourceId --query "{id:id, location:location}" -o json 2>$null | ConvertFrom-Json + if ($fabricInfo) { + Pass "Fabric Workspace exists" + Test-ByoResourceContext -Id $FabricWorkspaceResourceId -Label "Fabric Workspace" + } else { + Fail "Fabric Workspace not found or no access: $FabricWorkspaceResourceId" + 
} + } +} + +# ============================================================================= +# 4. Existing VNet Validation +# ============================================================================= +if ($ExistingVnetId) { + Write-Host "" + Write-Host "--- Existing VNet Validation ---" + + if ($ExistingVnetId -match 'resourceGroups/([^/]+).*virtualNetworks/([^/]+)') { + $vnetRg = $Matches[1] + $vnetName = $Matches[2] + } else { + Fail "Cannot parse VNet resource ID: $ExistingVnetId" + $vnetRg = $null + } + + if ($vnetRg) { + $vnetLocation = az network vnet show --resource-group $vnetRg --name $vnetName --query "location" -o tsv 2>$null + if ($vnetLocation) { + Pass "VNet '$vnetName' exists in '$vnetLocation'" + if ($vnetLocation -ne $Location) { + Fail "VNet location ($vnetLocation) differs from deployment location ($Location). PEs must be in the same region as the VNet." + } + } else { + Fail "VNet '$vnetName' not found in RG '$vnetRg'" + } + + # Check subnets — which subnets matter depends on the template: + # pe-subnet: all private-network templates + # agent-subnet: VNet-injection — NOT managed network + # mcp-subnet: only agent tools + # The script doesn't know the target template, so it checks all three + # and gives per-subnet guidance when one is missing. + $subnetNotes = @{ + 'pe-subnet' = 'Required by all private-network templates. Most templates create it automatically if deploying a new VNet.' + 'agent-subnet' = 'Required by VNet-injection. Not used by managed network. Created automatically when deploying a new VNet.' + 'mcp-subnet' = 'Only required by agent tools with MCP.' 
+ } + foreach ($subnetName in @('agent-subnet', 'pe-subnet', 'mcp-subnet')) { + $subnetJson = az network vnet subnet show --resource-group $vnetRg --vnet-name $vnetName --name $subnetName -o json 2>$null + if ($subnetJson) { + $subnet = $subnetJson | ConvertFrom-Json + Pass "Subnet '$subnetName' exists ($($subnet.addressPrefix))" + + # SAL check + $salCount = ($subnet.serviceAssociationLinks | Measure-Object).Count + if ($salCount -gt 0) { + $salType = $subnet.serviceAssociationLinks[0].linkedResourceType + Fail "Subnet '$subnetName' has a serviceAssociationLink held by '$salType'. Deploying to this subnet will fail — the platform cannot inject into a subnet already owned by another resource." + } + + # Delegation check for agent/mcp subnets + if ($subnetName -eq 'agent-subnet' -or $subnetName -eq 'mcp-subnet') { + $delegation = if ($subnet.delegations) { $subnet.delegations[0].serviceName } else { $null } + if ($delegation -eq 'Microsoft.App/environments') { + Pass "Subnet '$subnetName' delegation: $delegation" + } else { + Fail "Subnet '$subnetName' must be delegated to Microsoft.App/environments (current: $(if ($delegation) { $delegation } else { 'none' }))" + } + } + } else { + Warn "Subnet '$subnetName' does not exist. $($subnetNotes[$subnetName])" + } + } + + # PE subnet capacity check + $peSubnet = az network vnet subnet show --resource-group $vnetRg --vnet-name $vnetName --name 'pe-subnet' -o json 2>$null | ConvertFrom-Json + if ($peSubnet) { + $prefix = $peSubnet.addressPrefix + if ($prefix -match '/(\d+)$') { + $cidrBits = [int]$Matches[1] + $totalIPs = [math]::Pow(2, 32 - $cidrBits) + $usableIPs = $totalIPs - 5 # Azure reserves 5 IPs per subnet + # Base templates create 4 PEs (AI Services, Search, Storage, CosmosDB). + # APIM and agent-tools setups add additional PEs. 
+ $requiredPEs = 4 + if ($ApiManagementResourceId) { $requiredPEs++ } + if ($FabricWorkspaceResourceId) { $requiredPEs++ } + if ($usableIPs -lt $requiredPEs) { + Fail "PE subnet /$cidrBits has ~$usableIPs usable IPs but template needs at least $requiredPEs. Use /28 minimum (/24 recommended)." + } else { + Pass "PE subnet /$cidrBits has ~$usableIPs usable IPs (need $requiredPEs)" + } + } + } + } +} + +# ============================================================================= +# 5. DNS Zone Conflict Check +# ============================================================================= +Write-Host "" +Write-Host "--- DNS Zone Conflicts ---" +$dnsZones = @{ + 'privatelink.services.ai.azure.com' = 'Used by all private-network setups for AI Services PE.' + 'privatelink.openai.azure.com' = 'Used by all private-network setups for OpenAI PE.' + 'privatelink.cognitiveservices.azure.com'= 'Used by all private-network setups for Cognitive Services PE.' + 'privatelink.search.windows.net' = 'Used by all private-network setups for AI Search PE.' + 'privatelink.blob.core.windows.net' = 'Used by all private-network setups for Storage PE.' + 'privatelink.documents.azure.com' = 'Used by all private-network setups for Cosmos DB PE.' + 'privatelink.azurecr.io' = 'Used when enableContainerRegistry is true (most templates).' + 'privatelink.azure-api.net' = 'Only needed for APIM setups.' + 'privatelink.fabric.microsoft.com' = 'Only needed for agent tools with Fabric connection.' +} + +if ($rgExists -eq 'true') { + $existingZones = az network private-dns zone list --resource-group $ResourceGroup --query "[].name" -o tsv 2>$null + $dnsConflicts = 0 + foreach ($zone in $dnsZones.Keys) { + if ($existingZones -and ($existingZones -split "`n" | Where-Object { $_.Trim() -eq $zone })) { + Warn "DNS zone '$zone' already exists in $ResourceGroup. $($dnsZones[$zone]) Template may fail on VNet link creation if zone is already linked." 
+ $dnsConflicts++ + } + } + if ($dnsConflicts -eq 0) { + Pass "No DNS zone conflicts found" + } +} else { + Pass "New resource group — no DNS zone conflicts possible" +} + +# ============================================================================= +# 6. Quota Checks +# ============================================================================= +if ($ModelName) { + Write-Host "" + Write-Host "--- Quota Checks ---" + + # 6a. Model availability + $modelAvailable = az cognitiveservices model list --location $Location ` + --query "[?model.name=='$ModelName' && model.format=='$ModelFormat'].model.name" -o tsv 2>$null | Select-Object -First 1 + + if ($modelAvailable) { + Pass "Model '$ModelName' ($ModelFormat) is available in $Location" + } else { + Fail "Model '$ModelName' ($ModelFormat) is NOT available in $Location. Check region support or choose a different model." + } + + # 6b. Model quota (TPM capacity) + if ($modelAvailable -and $ModelSkuName -and $ModelCapacity -gt 0) { + # Quota name pattern: {QuotaPrefix}.{Sku}.{Model} e.g. OpenAI.Standard.gpt-4o, AIServices.GlobalStandard.Mistral-Large-3 + # Note: Azure sometimes strips hyphens from model names (e.g. gpt-4.1 -> gpt4.1) so we try both. 
+ $quotaPrefix = if ($ModelFormat -eq 'OpenAI') { 'OpenAI' } else { 'AIServices' } + $quotaName = "$quotaPrefix.$ModelSkuName.$ModelName" + $quotaInfo = az cognitiveservices usage list --location $Location ` + --query "[?name.value=='$quotaName']" -o json 2>$null | ConvertFrom-Json + if (-not $quotaInfo -or $quotaInfo.Count -eq 0) { + $altName = "$quotaPrefix.$ModelSkuName.$($ModelName -replace '-','')" + if ($altName -ne $quotaName) { + $quotaInfo = az cognitiveservices usage list --location $Location ` + --query "[?name.value=='$altName']" -o json 2>$null | ConvertFrom-Json + if ($quotaInfo -and $quotaInfo.Count -gt 0) { $quotaName = $altName } + } + } + if ($quotaInfo -and $quotaInfo.Count -gt 0) { + $currentUsage = $quotaInfo[0].currentValue + $limit = $quotaInfo[0].limit + $remaining = $limit - $currentUsage + if ($remaining -ge $ModelCapacity) { + Pass "Model quota: ${remaining}K TPM available out of ${limit}K ($ModelSkuName); requesting ${ModelCapacity}K" + } else { + Fail "Model quota insufficient: ${remaining}K TPM available out of ${limit}K ($ModelSkuName); need ${ModelCapacity}K. Request increase at https://aka.ms/oai/quotaincrease" + } + } else { + Warn "Could not query quota for $ModelName/$ModelSkuName in $Location. Verify manually." + } + } +} # end if ($ModelName) + +# 6c. 
Cosmos DB throughput check (BYO) +if ($CosmosDBResourceId) { + # Query by full resource ID (as the Cosmos DB check in section 3 does) so that BYO + # accounts in a different subscription or resource group are still resolved. + $offerInfo = az cosmosdb show --ids $CosmosDBResourceId ` + --query "{enableAutomaticFailover:enableAutomaticFailover, totalThroughputLimit:totalThroughputLimit}" -o json 2>$null | ConvertFrom-Json + if ($offerInfo) { + # Each project needs 3 containers × 1000 RU/s = 3000 RU/s minimum + $minRUs = 3000 + if ($offerInfo.totalThroughputLimit -and $offerInfo.totalThroughputLimit -gt 0 -and $offerInfo.totalThroughputLimit -lt $minRUs) { + Fail "Cosmos DB total throughput limit is $($offerInfo.totalThroughputLimit) RU/s. Agent service requires at least $minRUs RU/s per project (3 containers × 1000 RU/s)." + } else { + Pass "Cosmos DB throughput limit: $(if ($offerInfo.totalThroughputLimit -and $offerInfo.totalThroughputLimit -gt 0) { "$($offerInfo.totalThroughputLimit) RU/s" } else { 'unlimited' }) (need $minRUs per project)" + } + } +} + +# ============================================================================= +# 7. Resource Quota Checks (per region) +# ============================================================================= +Write-Host "" +Write-Host "--- Resource Quotas ($Location) ---" + +# 7a. 
AI Search service quota (only when template will create a new one) +if ($AiSearchResourceId) { + Pass "AI Search quota check skipped — using BYO resource" +} else { + # Templates default to 'standard' SKU when creating AI Search + $searchSkuToCheck = 'standard' + $searchQuota = az rest --method GET ` + --url "https://management.azure.com/subscriptions/$SubscriptionId/providers/Microsoft.Search/locations/$Location/usages?api-version=2024-06-01-preview" ` + --query "value[?name.value=='$searchSkuToCheck']" -o json 2>$null | ConvertFrom-Json + if ($searchQuota -and $searchQuota.Count -gt 0) { + $searchCurrent = [int]$searchQuota[0].currentValue + $searchLimit = [int]$searchQuota[0].limit + if ($searchLimit -eq 0) { + Warn "AI Search $searchSkuToCheck tier has 0 quota in $Location. Search service creation will fail." + } elseif ($searchCurrent -ge $searchLimit) { + Fail "AI Search $searchSkuToCheck tier quota exhausted in ${Location}: $searchCurrent/$searchLimit. Request increase or use a different region." + } else { + Pass "AI Search $searchSkuToCheck quota in ${Location}: $searchCurrent/$searchLimit" + } + } else { + Warn "Could not query AI Search quota for $Location." + } + + # 7a-note. AI Search capacity limitation + # Azure does not expose a public API for physical capacity checks on Search. + # Quota (above) checks service-count limits, but a region can have quota available + # and still fail with 'InsufficientResourcesAvailable' when physical capacity is exhausted. + # ARM deployment validate does NOT catch this either — it only validates template schema. + # If Search creation fails mid-deploy, try a different region. + Info "No API exists to pre-check AI Search physical capacity. If deployment fails with 'InsufficientResourcesAvailable', try a different region." +} + +# 7b. 
Storage account quota (only when template will create a new one) +if ($StorageAccountResourceId) { + Pass "Storage account quota check skipped — using BYO resource" +} else { + $storageQuotaRaw = az storage account list --query "length([?location=='$Location'])" -o tsv 2>$null + $storageCount = 0 + if ($storageQuotaRaw -match '^\d+$') { + $storageCount = [int]$storageQuotaRaw + # Default limit is 250 per region per subscription + if ($storageCount -ge 250) { + Fail "Storage account limit reached in ${Location}: $storageCount/250." + } elseif ($storageCount -ge 200) { + Warn "Storage accounts in ${Location}: $storageCount/250 — approaching limit." + } else { + Pass "Storage accounts in ${Location}: $storageCount/250" + } + } +} + +# 7c. Network quotas (VNets, PEs) +# Skip VNet quota if BYO VNet provided; always check PE quota (templates create PEs regardless) +if ($ExistingVnetId) { + Pass "VNet quota check skipped — using BYO VNet" + $netQuotaFilter = "[?name.value=='PrivateEndpoints'].{name:name.localizedValue, current:currentValue, limit:limit}" +} else { + $netQuotaFilter = "[?name.value=='VirtualNetworks' || name.value=='PrivateEndpoints'].{name:name.localizedValue, current:currentValue, limit:limit}" +} +$netQuotas = az network list-usages --location $Location ` + --query $netQuotaFilter ` + -o json 2>$null | ConvertFrom-Json +if ($netQuotas) { + foreach ($q in $netQuotas) { + $currentVal = [int]$q.current + $limitVal = [int]$q.limit + $pct = if ($limitVal -gt 0) { [math]::Round(($currentVal / $limitVal) * 100) } else { 0 } + if ($currentVal -ge $limitVal) { + Fail "$($q.name) quota exhausted in ${Location}: $currentVal/$limitVal" + } elseif ($pct -ge 80) { + Warn "$($q.name) in ${Location}: $currentVal/$limitVal ($pct% used)" + } else { + Pass "$($q.name) in ${Location}: $currentVal/$limitVal" + } + } +} + +# ============================================================================= +# Summary +# 
============================================================================= +Write-Host "" +Write-Host "========================================" +Write-Host "Results: " -NoNewline +Write-Host "$($script:PassCount) passed" -ForegroundColor Green -NoNewline +Write-Host ", " -NoNewline +Write-Host "$($script:FailCount) failed" -ForegroundColor Red -NoNewline +Write-Host ", " -NoNewline +Write-Host "$($script:WarnCount) warnings" -ForegroundColor Yellow +Write-Host "========================================" + +Write-Host "" +Write-Host "NOTE: Region support and IP range restrictions vary by scenario." -ForegroundColor Cyan +Write-Host "Before deploying, verify your region and subnet ranges against:" -ForegroundColor Cyan +Write-Host " - Managed VNet regions: https://learn.microsoft.com/azure/foundry/how-to/managed-virtual-network#limitations" -ForegroundColor Cyan +Write-Host " - Private networking: https://learn.microsoft.com/azure/foundry/agents/how-to/virtual-networks#limitations" -ForegroundColor Cyan + +if ($script:FailCount -gt 0) { + Write-Host "Fix the failures above before deploying." -ForegroundColor Red + exit 1 +} else { + Write-Host "Pre-checks passed. Safe to deploy." -ForegroundColor Green + exit 0 +} diff --git a/infrastructure/infrastructure-setup-bicep/deployment-tools/preflight/preflight.config.sample b/infrastructure/infrastructure-setup-bicep/deployment-tools/preflight/preflight.config.sample new file mode 100644 index 000000000..391254eb5 --- /dev/null +++ b/infrastructure/infrastructure-setup-bicep/deployment-tools/preflight/preflight.config.sample @@ -0,0 +1,40 @@ +# ============================================================================= +# Foundry Private Network — Preflight Check Configuration +# ============================================================================= +# Fill in the values below and run: +# .\preflight-check.ps1 -ConfigFile .\preflight.config +# +# Explicit command-line params override config file values. 
+# Do not quote values — no '' or "" around them. +# ============================================================================= + +# Required +SubscriptionId= +ResourceGroup= +Location=swedencentral + +# Optional — Existing VNet (full ARM resource ID) +# Leave empty if the template will create a new VNet. +# ExistingVnetId=/subscriptions//resourceGroups//providers/Microsoft.Network/virtualNetworks/ + +# Optional — Bring-your-own resources (full ARM resource IDs) +# Leave empty if the template will create new resources. +# AiSearchResourceId=/subscriptions//resourceGroups//providers/Microsoft.Search/searchServices/ +# StorageAccountResourceId=/subscriptions//resourceGroups//providers/Microsoft.Storage/storageAccounts/ +# CosmosDBResourceId=/subscriptions//resourceGroups//providers/Microsoft.DocumentDB/databaseAccounts/ + +# Optional — Template-specific resources (full ARM resource IDs) +# Set ApiManagementResourceId when deploying with APIM private network setup. +# Set FabricWorkspaceResourceId when deploying with agent tools / MCP setup. +# ApiManagementResourceId=/subscriptions//resourceGroups//providers/Microsoft.ApiManagement/service/ +# FabricWorkspaceResourceId=/subscriptions//resourceGroups//providers/Microsoft.Fabric/capacities/ + +# Optional — Model deployment (for quota checks) +# Leave empty to skip model quota checks. +# ModelFormat: OpenAI, Mistral AI, Meta, Cohere, etc. Must match the format from az cognitiveservices model list. +# ModelSkuName: Standard, GlobalStandard, ProvisionedManaged, etc. +# ModelCapacity: Requested TPM (tokens per minute) in thousands. e.g. 10 = 10K TPM. +# ModelName=gpt-4o +# ModelFormat=OpenAI +# ModelSkuName=Standard +# ModelCapacity=10