Commit dff071d

m-gheini and mishrapratyush authored and committed
Add deployment preflight check, cleanup, and shared private link setup for search service scripts. (#384)
* Added Cleanup and pre-check scripts
* Shared private link for Azure AI Search service

  Add shared private link on the Azure AI Search service. Useful when the Azure AI Search service needs to reach a Foundry resource for models (for vectorizer and agentic RAG) and the Foundry resource is in a VNet.
* Moved shared private link to common deployment tools
* Added README for preflight-check
* Updated scripts and added READMEs
* Updated error logging for cleanup
* Early exit in case of error
* Increased caphost deletion time-out
* Updated cleanup script after test on hosted agent cleanups
* Updated preflight checks per template updates.
* Added diagnostic script for post deployment validation
* Address Comments
* Made SAL wait scoped on account
* Updates to diagnostic and preflight scripts

---------

Co-authored-by: Pratyush Mishra <8485494+mishrapratyush@users.noreply.github.com>
1 parent d89d98f commit dff071d

15 files changed

Lines changed: 3326 additions & 0 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@
 *.sln.docstates
 *.env
 venv/
+preflight.config
 # User-specific files (MonoDevelop/Xamarin Studio)
 *.userprefs

infrastructure/infrastructure-setup-bicep/15-private-network-standard-agent-setup/README.md

Lines changed: 6 additions & 0 deletions
@@ -87,6 +87,8 @@ Use the table below to choose the right infrastructure template for your scenari
 * If no parameters are passed in, this template creates a Microsoft Foundry resource, Foundry project, Azure Cosmos DB for NoSQL, Azure AI Search, and Azure Storage account
 1. Azure CLI installed and configured on your local workstation or deployment pipeline server

+> **💡 Recommended**: Run the [preflight check](../deployment-tools/preflight/README.md) before deploying to catch common misconfigurations (provider registration, subnet conflicts, soft-deleted accounts) that would otherwise surface as cryptic ARM errors mid-deploy.
+
 ---

 ## Pre-Deployment Steps
@@ -223,6 +225,8 @@ To use an existing Azure AI Search resource, set aiSearchServiceResourceId param
 > --aad-auth-failure-mode http401WithBearerChallenge
 > ```

+> **AI Search → AI Services connectivity**: This template configures AI Services with `networkAcls.bypass: AzureServices`, which allows Azure AI Search to reach AI Services through the trusted-services bypass. This works for most scenarios. If your security policy requires removing the bypass (setting it to `None`), deploy [Shared Private Links](../deployment-tools/networking/README.md) from AI Search to AI Services instead. This creates a private endpoint from AI Search's managed infrastructure directly into AI Services via Private Link.
+

 4. **Use an existing Azure Storage account**

@@ -283,6 +287,8 @@ az group delete --name <your-resource-group> --yes --no-wait

 > **Important**: If you need to reuse the same subnet, follow the [Account Deletion Prerequisites and Cleanup Guidance](#account-deletion-prerequisites-and-cleanup-guidance) to properly purge the account and wait for the capability host to fully unlink (~20 minutes).

+> **💡 Tip**: For VNet-injection deployments, use the [cleanup tool](../deployment-tools/cleanup/README.md). It handles the required deletion order (project caphost → account caphost → purge → SAL wait) automatically.
+
 ---

 ## Network Secured Agent Project Architecture Deep Dive

infrastructure/infrastructure-setup-bicep/18-managed-virtual-network/README.md

Lines changed: 4 additions & 0 deletions
@@ -120,6 +120,8 @@ Use the table below to choose the right infrastructure template for your scenari

 1. Azure CLI installed and configured on your local workstation or deployment pipeline server. Azure CLI support is required to run the 'az rest' commands to update your managed virtual network.

+> **💡 Recommended**: Run the [preflight check](../deployment-tools/preflight/README.md) before deploying to catch common misconfigurations (provider registration, soft-deleted accounts, BYO resource issues) that would otherwise surface as cryptic ARM errors mid-deploy.
+
 1. **Register Resource Providers**

 Make sure you have an active Azure subscription that allows registering resource providers. If it's not already registered, run the commands below:
@@ -294,6 +296,8 @@ az group delete --name <your-resource-group> --yes --no-wait

 > **Important**: Follow the [Account Deletion Prerequisites and Cleanup Guidance](#account-deletion-prerequisites-and-cleanup-guidance) to properly purge the account and wait for the capability host to fully unlink (~20 minutes).

+> **💡 Tip**: Use the [cleanup tool](../deployment-tools/cleanup/README.md). It handles the required deletion order (project caphost → account caphost → purge → SAL wait) automatically.
+
 ---

 ## Post-Deployment Steps (Critical for Hosted Agents)

infrastructure/infrastructure-setup-bicep/19-private-network-agent-tools/README.md

Lines changed: 6 additions & 0 deletions
@@ -104,6 +104,8 @@ Use the table below to choose the right infrastructure template for your scenari
 * If no parameters are passed in, this template creates a Microsoft Foundry resource, Foundry project, Azure Cosmos DB for NoSQL, Azure AI Search, and Azure Storage account
 1. Azure CLI installed and configured on your local workstation or deployment pipeline server

+> **💡 Recommended**: Run the [preflight check](../deployment-tools/preflight/README.md) before deploying to catch common misconfigurations (provider registration, subnet conflicts, soft-deleted accounts) that would otherwise surface as cryptic ARM errors mid-deploy.
+
 ---

 ## Pre-Deployment Steps
@@ -275,6 +277,8 @@ To use an existing Cosmos DB for NoSQL resource, set `existingAzureCosmosDBAccou
 To use an existing Azure AI Search resource, set `existingAiSearchResourceId` to the full ARM ID of the target search service.
 - `param existingAiSearchResourceId = '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Search/searchServices/{searchServiceName}'`

+> **AI Search → AI Services connectivity**: This template configures AI Services with `networkAcls.bypass: AzureServices`, which allows Azure AI Search to reach AI Services through the trusted-services bypass. This works for most scenarios. If your security policy requires removing the bypass (setting it to `None`), deploy [Shared Private Links](../deployment-tools/networking/README.md) from AI Search to AI Services instead. This creates a private endpoint from AI Search's managed infrastructure directly into AI Services via Private Link.
+

 4. **Use an existing Azure Storage account**

@@ -369,6 +373,8 @@ az group delete --name <your-resource-group> --yes --no-wait

 > **Important**: If you need to reuse the same subnet, follow the [Account Deletion Prerequisites and Cleanup Guidance](#account-deletion-prerequisites-and-cleanup-guidance) to properly purge the account and wait for the capability host to fully unlink (~20 minutes).

+> **💡 Tip**: For VNet-injection deployments, use the [cleanup tool](../deployment-tools/cleanup/README.md). It handles the required deletion order (project caphost → account caphost → purge → SAL wait) automatically.
+
 ---

 ## Network Secured Agent Project Architecture Deep Dive
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
# Foundry Private Network Cleanup

`cleanup.ps1` safely tears down Foundry deployments that use VNet injection. It enforces the deletion order required to avoid the stuck resources, orphaned locks, and subnet conflicts that make manual cleanup painful.

## Why This Script Exists

Deleting a Foundry private network deployment is **not** as simple as `az group delete`. The deployment creates capability hosts with service association links (SALs) on VNet subnets. Deleting the resource group directly runs into three problems:

- **Capability hosts must be deleted in order**: project-level first, then account-level. Deleting in the wrong order can leave the account in a failed state.
- **SALs block subnet reuse**: subnets with active SALs cannot be re-delegated or deleted. SAL cleanup happens asynchronously after caphost deletion and can take up to 24 hours.
- **Soft-deleted accounts block redeployment**: Cognitive Services accounts are soft-deleted for 48 hours. A new deployment with the same name will fail unless the old account is purged.

This script handles all of this automatically: it discovers resources, deletes them in the correct order, waits for SAL cleanup, and purges soft-deleted accounts.
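
The ordering rule for capability hosts can be illustrated with a small shell sketch. This is not taken from `cleanup.ps1`; `order_caphosts` is a hypothetical helper, and it assumes that project-level capability host resource IDs contain a `/projects/<name>/capabilityHosts/` segment while account-level IDs do not.

```bash
# Hypothetical helper (not part of cleanup.ps1): reads capability host
# resource IDs on stdin, one per line, and prints them in a safe deletion
# order: project-level hosts first, then account-level hosts.
order_caphosts() {
  caphost_ids="$(cat)"
  # Assumption: project-level IDs contain /projects/<name>/capabilityHosts/.
  printf '%s\n' "$caphost_ids" | grep '/projects/[^/]*/capabilityHosts/' || true
  # Any remaining capability host ID is treated as account-level.
  printf '%s\n' "$caphost_ids" | grep '/capabilityHosts/' | grep -v '/projects/' || true
}
```

Piping a mixed list of IDs through `order_caphosts` yields a list that can be deleted top to bottom without violating the project-before-account rule.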

## Important: Use the VNet Resource Group

The `-ResourceGroup` parameter must point to the resource group containing the **AI Foundry account, project, and VNet**, not the resource group with dependent resources (Search, Cosmos, Storage).

If your deployment uses multiple resource groups, the cleanup script only needs the one with the AI account and VNet. Dependent resources in other resource groups can be deleted with a simple `az group delete`.

## Prerequisites

- [PowerShell 7+](https://learn.microsoft.com/en-us/powershell/scripting/install/installing-powershell) (cross-platform)
- [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) logged in with access to the target subscription
- Active subscription must be set: `az account set --subscription <subscription-id>`

## Usage

```bash
cd infrastructure/infrastructure-setup-bicep/deployment-tools/cleanup
```

**Always start with a dry run** to see what would be deleted before making changes:

```powershell
.\cleanup.ps1 -SubscriptionId "<subscription-id>" -ResourceGroup "<resource-group>" -DryRun
```

> [!IMPORTANT]
> Always run `-DryRun` first and review the discovered accounts/projects/caphosts before running cleanup without `-DryRun`.

When you're satisfied with the discovery output, run without `-DryRun`. The script will prompt for confirmation before deleting anything.

```powershell
.\cleanup.ps1 -SubscriptionId "<subscription-id>" -ResourceGroup "<resource-group>"
```

## Parameters

| Parameter | Required | Description |
|---|---|---|
| `-SubscriptionId` | Yes | Azure subscription ID |
| `-ResourceGroup` | Yes | Resource group containing the AI Foundry account, project, and VNet |
| `-AccountName` | No | Limit cleanup to a specific AI Services account. When omitted, all AIServices accounts in the RG are discovered and cleaned up. |
| `-DryRun` | No | Show what would be cleaned up without taking any action |
| `-SkipSalWait` | No | Skip waiting for SAL removal (faster but risky: the subnet may not be reusable immediately) |
| `-DeleteRG` | No | Delete the resource group after cleanup. Not allowed with `-AccountName` (account-scoped cleanup must not delete the whole RG). |

When `-AccountName` is provided, deletion of active resources is scoped to that account, but the purge of soft-deleted accounts still covers the whole resource group.
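
The `-DeleteRG` / `-AccountName` restriction amounts to a simple mutual-exclusion check. The sketch below shows the rule in shell form; `validate_cleanup_args` is a hypothetical function written for illustration, and the script's actual validation code is not shown in this README.

```bash
# Hypothetical pre-check mirroring the documented rule.
# $1 = account name (empty when -AccountName is omitted)
# $2 = "true" when resource group deletion was requested
validate_cleanup_args() {
  if [ -n "$1" ] && [ "$2" = "true" ]; then
    echo "error: -DeleteRG cannot be combined with -AccountName" >&2
    return 1
  fi
  return 0
}
```

Rejecting the combination up front avoids a situation where an account-scoped run removes resources that belong to other accounts in the same resource group.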

## What It Does

### Step 0: Discovery

Auto-discovers all resources in the resource group, so you do not need to know account or project names:

- AI Foundry accounts (kind: `AIServices`)
- Projects under each account
- Capability hosts (project-level and account-level)
- VNet subnets with active service association links

After discovery, a summary is printed and you are prompted to confirm before proceeding (unless `-DryRun` is set).
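
As a rough illustration of the account discovery step, the Azure CLI can list accounts of kind `AIServices` in a resource group. This is a sketch of one way to do it, not the query `cleanup.ps1` necessarily uses; `list_ai_accounts` is a hypothetical wrapper name.

```bash
# Hypothetical wrapper: prints the names of AIServices accounts in the
# given resource group, one per line.
# $1 = resource group name
list_ai_accounts() {
  az cognitiveservices account list \
    --resource-group "$1" \
    --query "[?kind=='AIServices'].name" \
    --output tsv
}
```

For example, `list_ai_accounts my-foundry-rg` prints the candidate accounts, which is a convenient cross-check against the `-DryRun` summary.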

### Step 1: Delete Project Capability Hosts

Deletes all project-level capability hosts first. This is required before account-level caphosts can be removed.

### Step 2: Delete Account Capability Hosts

Deletes account-level capability hosts. Deletion is asynchronous, so the script polls until it completes (timeout: 30 minutes).
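
Polling with a timeout follows a common pattern: re-run a check at a fixed interval until it succeeds or a deadline passes. The sketch below is a generic shell version of that pattern, not the script's implementation; `wait_until` and the check command passed to it are hypothetical.

```bash
# Hypothetical polling helper.
# $1 = timeout in seconds, $2 = polling interval in seconds,
# remaining arguments = command that exits 0 once the condition is met.
# Returns 0 on success, 1 if the timeout elapses first.
wait_until() {
  wait_timeout="$1"
  wait_interval="$2"
  shift 2
  wait_deadline=$(( $(date +%s) + wait_timeout ))
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$(date +%s)" -ge "$wait_deadline" ]; then
      return 1
    fi
    sleep "$wait_interval"
  done
}
```

With a 30-minute budget and a 30-second interval, a call would look like `wait_until 1800 30 caphost_is_gone "$id"`, where `caphost_is_gone` stands in for whatever check reports that the capability host no longer exists.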

### Step 3: Delete Projects and Purge AI Accounts

Deletes all projects under each account first (accounts cannot be deleted while nested projects exist), then deletes and purges each AI Services account to prevent soft-delete name collisions on redeployment. Also checks for and purges any previously soft-deleted accounts in the resource group.
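
The delete-then-purge part of this step maps onto two Azure CLI commands. The sketch below assumes the projects under the account have already been removed and omits error reporting; `delete_and_purge_account` is a hypothetical wrapper, not a function from `cleanup.ps1`.

```bash
# Hypothetical wrapper: deletes an AI Services account, then purges the
# soft-deleted copy so the name can be reused immediately.
# $1 = resource group, $2 = account name, $3 = account location
delete_and_purge_account() {
  az cognitiveservices account delete \
    --resource-group "$1" --name "$2" &&
  az cognitiveservices account purge \
    --resource-group "$1" --name "$2" --location "$3"
}
```

Chaining with `&&` means the purge is attempted only if the delete succeeded. Accounts that were soft-deleted earlier can be found with `az cognitiveservices account list-deleted` and purged the same way.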

### Step 4: Wait for SAL Cleanup

Waits for service association links to be removed from subnets (up to 20 min). SAL removal happens asynchronously after caphost deletion. If SALs are still present after 20 minutes, the script warns you to check again later, because backend cleanup can take up to 24 hours.

SAL waiting runs only for caphost-linked subnets discovered during cleanup. If none are discovered, SAL waiting is skipped with a warning.
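
If the script times out, a subnet can be checked manually. The sketch below assumes that `az network vnet subnet show` exposes the links under a top-level `serviceAssociationLinks` field in its output; `sal_count` and `subnet_is_free` are hypothetical helper names.

```bash
# Hypothetical helper: prints the number of service association links
# on a subnet (0 when the field is missing or empty).
# $1 = resource group, $2 = VNet name, $3 = subnet name
sal_count() {
  az network vnet subnet show \
    --resource-group "$1" --vnet-name "$2" --name "$3" \
    --query 'length(serviceAssociationLinks || `[]`)' \
    --output tsv
}

# Succeeds (exit 0) only when the subnet has no SALs left.
subnet_is_free() {
  [ "$(sal_count "$1" "$2" "$3")" = "0" ]
}
```

Running `subnet_is_free <resource-group> <vnet> <subnet>` before a redeployment gives a quick yes/no answer on whether the subnet can be reused.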

### Step 5: Resource Group (optional)

If `-DeleteRG` is specified, initiates an async deletion of the resource group. Otherwise, prints the command for manual deletion.

## Examples

```powershell
# Dry run: see what would be cleaned up
.\cleanup.ps1 -SubscriptionId "xxxx" -ResourceGroup "my-foundry-rg" -DryRun

# Clean up a specific account only
.\cleanup.ps1 -SubscriptionId "xxxx" -ResourceGroup "my-foundry-rg" -AccountName "my-ai-account"

# Full cleanup including resource group deletion
.\cleanup.ps1 -SubscriptionId "xxxx" -ResourceGroup "my-foundry-rg" -DeleteRG

# Fast cleanup (skip SAL wait; subnet may not be immediately reusable)
.\cleanup.ps1 -SubscriptionId "xxxx" -ResourceGroup "my-foundry-rg" -SkipSalWait
```
