Commit 157780b

Reduce model capacity to 50k
1 parent 0c35778 commit 157780b

9 files changed

Lines changed: 24 additions & 24 deletions

.github/workflows/deploy.yml

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ on:
     - cron: "0 11,23 * * *" # Runs at 11:00 AM and 11:00 PM GMT
   workflow_dispatch: #Allow manual triggering
 env:
-  GPT_MIN_CAPACITY: 150
+  GPT_MIN_CAPACITY: 50
   O4_MINI_MIN_CAPACITY: 50
   GPT41_MINI_MIN_CAPACITY: 50
   BRANCH_NAME: ${{ github.head_ref || github.ref_name }}

.github/workflows/job-deploy.yml

Lines changed: 1 addition & 1 deletion
@@ -98,7 +98,7 @@ on:
         value: ${{ jobs.azure-setup.outputs.QUOTA_FAILED }}

 env:
-  GPT_MIN_CAPACITY: 150
+  GPT_MIN_CAPACITY: 50
   O4_MINI_MIN_CAPACITY: 50
   GPT41_MINI_MIN_CAPACITY: 50
   BRANCH_NAME: ${{ github.event.workflow_run.head_branch || github.head_ref || github.ref_name }}

docs/CustomizingAzdParameters.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ By default this template will use the environment name as the prefix to prevent
 | `AZURE_ENV_MODEL_4_1_DEPLOYMENT_TYPE` | string | `GlobalStandard` | Defines the deployment type for the AI model (e.g., Standard, GlobalStandard). |
 | `AZURE_ENV_MODEL_4_1_NAME` | string | `gpt-4.1` | Specifies the name of the GPT model to be deployed. |
 | `AZURE_ENV_MODEL_4_1_VERSION` | string | `2025-04-14` | Version of the GPT model to be used for deployment. |
-| `AZURE_ENV_MODEL_4_1_CAPACITY` | int | `150` | Sets the GPT model capacity. |
+| `AZURE_ENV_MODEL_4_1_CAPACITY` | int | `50` | Sets the GPT model capacity. |
 | `AZURE_ENV_REASONING_MODEL_DEPLOYMENT_TYPE` | string | `GlobalStandard` | Defines the deployment type for the AI model (e.g., Standard, GlobalStandard). |
 | `AZURE_ENV_REASONING_MODEL_NAME` | string | `o4-mini` | Specifies the name of the reasoning GPT model to be deployed. |
 | `AZURE_ENV_REASONING_MODEL_VERSION` | string | `2025-04-16` | Version of the reasoning GPT model to be used for deployment. |

docs/DeploymentGuide.md

Lines changed: 2 additions & 2 deletions
@@ -68,7 +68,7 @@ Ensure you have access to an [Azure subscription](https://azure.microsoft.com/fr
 📖 **Follow:** [Quota Check Instructions](./quota_check.md) to ensure sufficient capacity.

 **Default Quota Configuration:**
-- **GPT-4.1:** 150k tokens
+- **GPT-4.1:** 50k tokens
 - **o4-mini:** 50k tokens
 - **GPT-4.1-mini:** 50k tokens

@@ -246,7 +246,7 @@ You can customize various deployment settings before running `azd up`, including
 <details>
 <summary><b>[Optional] Quota Recommendations</b></summary>

-By default, the **GPT model capacity** in deployment is set to **150k tokens**.
+By default, the **GPT model capacity** in deployment is set to **50k tokens**.

 To adjust quota settings, follow these [steps](./AzureGPTQuotaSettings.md).

docs/quota_check.md

Lines changed: 6 additions & 6 deletions
@@ -1,7 +1,7 @@
 ## Check Quota Availability Before Deployment

 Before deploying the accelerator, **ensure sufficient quota availability** for the required model.
-> **For Global Standard | GPT-4o - the capacity to at least 150k tokens for optimal performance.**
+> **For Global Standard | GPT-4o - the capacity to at least 50k tokens for optimal performance.**

 ### Login if you have not done so already
 ```
@@ -16,7 +16,7 @@ az login --use-device-code

 ### 📌 Default Models & Capacities:
 ```
-gpt4.1:150,o4-mini:50,gpt4.1-mini:50
+gpt4.1:50,o4-mini:50,gpt4.1-mini:50
 ```
 ### 📌 Default Regions:
 ```
@@ -42,23 +42,23 @@ australiaeast, eastus2, francecentral, japaneast, norwayeast, swedencentral, uks
 ```
 ✔️ Check specific model(s) in default regions:
 ```
-./quota_check_params.sh --models gpt4.1:150
+./quota_check_params.sh --models gpt4.1:50
 ```
 ✔️ Check default models in specific region(s):
 ```
 ./quota_check_params.sh --regions eastus2,westus
 ```
 ✔️ Passing Both models and regions:
 ```
-./quota_check_params.sh --models gpt4.1:150 --regions eastus2,westus
+./quota_check_params.sh --models gpt4.1:50 --regions eastus2,westus
 ```
 ✔️ All parameters combined:
 ```
-./quota_check_params.sh --models gpt4.1:150 --regions eastus2,westus --verbose
+./quota_check_params.sh --models gpt4.1:50 --regions eastus2,westus --verbose
 ```
 ✔️ Multiple models with single region:
 ```
-./quota_check_params.sh --models gpt4.1:150,gpt4.1-mini:50 --regions eastus2 --verbose
+./quota_check_params.sh --models gpt4.1:50,gpt4.1-mini:50 --regions eastus2 --verbose
 ```

 ### **Sample Output**

infra/main.bicep

Lines changed: 3 additions & 3 deletions
@@ -37,7 +37,7 @@ var deployingUserPrincipalId = deployerInfo.objectId
   azd: {
     type: 'location'
     usageName: [
-      'OpenAI.GlobalStandard.gpt4.1, 150'
+      'OpenAI.GlobalStandard.gpt4.1, 50'
       'OpenAI.GlobalStandard.o4-mini, 50'
       'OpenAI.GlobalStandard.gpt4.1-mini, 50'
     ]
@@ -100,8 +100,8 @@ param gptReasoningModelDeploymentType string = 'GlobalStandard'
 @description('Optional. AI model deployment token capacity. Defaults to 50 for optimal performance.')
 param gptDeploymentCapacity int = 50

-@description('Optional. AI model deployment token capacity. Defaults to 150 for optimal performance.')
-param gpt4_1ModelCapacity int = 150
+@description('Optional. AI model deployment token capacity. Defaults to 50 for optimal performance.')
+param gpt4_1ModelCapacity int = 50

 @description('Optional. AI model deployment token capacity. Defaults to 50 for optimal performance.')
 param gptReasoningModelCapacity int = 50

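The `usageName` entries in this file take the form `OpenAI.<deployment type>.<model>, <capacity>`, while `quota_check_params.sh` writes the same defaults as `model:capacity` pairs. A minimal sketch of how the two notations line up, assuming a POSIX shell; the `to_usage_name` helper is hypothetical and exists in neither file:

```shell
# Hypothetical helper, for illustration only.
# $1 = deployment type (e.g. GlobalStandard)
# $2 = a "model:capacity" pair in the quota-check script's notation
to_usage_name() {
  # ${2%%:*} keeps the text before the first colon (the model);
  # ${2##*:} keeps the text after the last colon (the capacity).
  printf 'OpenAI.%s.%s, %s\n' "$1" "${2%%:*}" "${2##*:}"
}

to_usage_name GlobalStandard "gpt4.1:50"
# prints: OpenAI.GlobalStandard.gpt4.1, 50
```

Because the default is spelled out separately in each file, a change such as this one has to touch every copy.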
infra/main.json

Lines changed: 6 additions & 6 deletions
@@ -6,7 +6,7 @@
 "_generator": {
   "name": "bicep",
   "version": "0.43.8.12551",
-  "templateHash": "6587818059632090787"
+  "templateHash": "17441022390921143507"
 },
 "name": "Multi-Agent Custom Automation Engine",
 "description": "This module contains the resources required to deploy the [Multi-Agent Custom Automation Engine solution accelerator](https://github.com/microsoft/Multi-Agent-Custom-Automation-Engine-Solution-Accelerator) for both Sandbox environments and WAF aligned environments.\n\n> **Note:** This module is not intended for broad, generic use, as it was designed by the Commercial Solution Areas CTO team, as a Microsoft Solution Accelerator. Feature requests and bug fix requests are welcome if they support the needs of this organization but may not be incorporated if they aim to make this module more generic than what it needs to be for its primary use case. This module will likely be updated to leverage AVM resource modules in the future. This may result in breaking changes in upcoming versions when these features are implemented.\n"
@@ -64,7 +64,7 @@
 "azd": {
   "type": "location",
   "usageName": [
-    "OpenAI.GlobalStandard.gpt4.1, 150",
+    "OpenAI.GlobalStandard.gpt4.1, 50",
     "OpenAI.GlobalStandard.o4-mini, 50",
     "OpenAI.GlobalStandard.gpt4.1-mini, 50"
   ]
@@ -176,9 +176,9 @@
 },
 "gpt4_1ModelCapacity": {
   "type": "int",
-  "defaultValue": 150,
+  "defaultValue": 50,
   "metadata": {
-    "description": "Optional. AI model deployment token capacity. Defaults to 150 for optimal performance."
+    "description": "Optional. AI model deployment token capacity. Defaults to 50 for optimal performance."
   }
 },
 "gptReasoningModelCapacity": {
@@ -27975,9 +27975,9 @@
   },
   "dependsOn": [
     "aiFoundryAiServices",
-    "[format('avmPrivateDnsZones[{0}]', variables('dnsZoneIndex').openAI)]",
-    "[format('avmPrivateDnsZones[{0}]', variables('dnsZoneIndex').cognitiveServices)]",
     "[format('avmPrivateDnsZones[{0}]', variables('dnsZoneIndex').aiServices)]",
+    "[format('avmPrivateDnsZones[{0}]', variables('dnsZoneIndex').cognitiveServices)]",
+    "[format('avmPrivateDnsZones[{0}]', variables('dnsZoneIndex').openAI)]",
     "virtualNetwork"
   ]
 },

infra/main_custom.bicep

Lines changed: 3 additions & 3 deletions
@@ -37,7 +37,7 @@ var deployingUserPrincipalId = deployerInfo.objectId
   azd: {
     type: 'location'
     usageName: [
-      'OpenAI.GlobalStandard.gpt4.1, 150'
+      'OpenAI.GlobalStandard.gpt4.1, 50'
       'OpenAI.GlobalStandard.o4-mini, 50'
       'OpenAI.GlobalStandard.gpt4.1-mini, 50'
     ]
@@ -100,8 +100,8 @@ param gptReasoningModelDeploymentType string = 'GlobalStandard'
 @description('Optional. AI model deployment token capacity. Defaults to 50 for optimal performance.')
 param gptDeploymentCapacity int = 50

-@description('Optional. AI model deployment token capacity. Defaults to 150 for optimal performance.')
-param gpt4_1ModelCapacity int = 150
+@description('Optional. AI model deployment token capacity. Defaults to 50 for optimal performance.')
+param gpt4_1ModelCapacity int = 50

 @description('Optional. AI model deployment token capacity. Defaults to 50 for optimal performance.')
 param gptReasoningModelCapacity int = 50

infra/scripts/quota_check_params.sh

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ log_verbose() {
 }

 # Default Models and Capacities (Comma-separated in "model:capacity" format)
-DEFAULT_MODEL_CAPACITY="gpt4.1:150,o4-mini:50,gpt4.1-mini:50"
+DEFAULT_MODEL_CAPACITY="gpt4.1:50,o4-mini:50,gpt4.1-mini:50"
 # Convert the comma-separated string into an array
 IFS=',' read -r -a MODEL_CAPACITY_PAIRS <<< "$DEFAULT_MODEL_CAPACITY"

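The script's context lines show `DEFAULT_MODEL_CAPACITY` being split on commas into `model:capacity` pairs. A minimal sketch of splitting such a string down to individual models and capacities, assuming a POSIX shell. It swaps the script's bash array for a `tr` and `while read` pipeline so it also runs under plain `sh`; the `parse_pairs` name and the output format are hypothetical and do not come from `quota_check_params.sh`:

```shell
# Hypothetical helper, for illustration only.
# $1 = comma-separated "model:capacity" pairs
# Prints one "model capacity" line per pair.
parse_pairs() {
  # tr puts each pair on its own line; read then splits each pair on ':'.
  printf '%s\n' "$1" | tr ',' '\n' | while IFS=':' read -r model capacity; do
    printf '%s %s\n' "$model" "$capacity"
  done
}

parse_pairs "gpt4.1:50,o4-mini:50,gpt4.1-mini:50"
# prints:
# gpt4.1 50
# o4-mini 50
# gpt4.1-mini 50
```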