|
| 1 | +# Vally eval config for Azure-authenticated Microsoft Foundry E2E checks. |
| 2 | +# The filename is e2e-eval.yaml so default discovery does not include it in full. |
| 3 | + |
| 4 | +name: microsoft-foundry-e2e-eval |
| 5 | +description: | |
| 6 | + E2E evaluation for the microsoft-foundry skill in a workflow that has already |
| 7 | + logged into Azure CLI and configured azd to use Azure CLI authentication. |
| 8 | + |
| 9 | +tags: |
| 10 | + type: e2e |
| 11 | + skill: microsoft-foundry |
| 12 | + |
| 13 | +config: |
| 14 | + runs: 1 |
| 15 | + timeout: "30m" |
| 16 | + executor: integration-test-agent-runner |
| 17 | + model: claude-sonnet-4.6 |
| 18 | + |
| 19 | +scoring: |
| 20 | + threshold: 0.8 |
| 21 | + |
| 22 | +stimuli: |
| 23 | + - name: "Create and deploy hosted agent" |
| 24 | + tags: |
| 25 | + id: create-and-deploy-hosted-agent |
| 26 | + type: e2e |
| 27 | + tier: full |
| 28 | + cost: llm |
| 29 | + area: deploy |
| 30 | + prompt: | |
| 31 | + Create a Python hosted agent for B2B customer onboarding and deploy it to my existing Foundry project. Use the Responses protocol. After it is done, run it locally to make sure it can run successfully; then deploy it to Foundry and ensure it can respond to users correctly. |
| 32 | + Always create a new .venv in the working directory and install the agent's dependencies there, instead of installing them globally. |
| 33 | + Use an agent name with the `foundry-skill-e2e` prefix and add a random suffix. |
| 34 | + Use these environment variables as the Foundry-related configuration values: |
| 35 | + Foundry project endpoint: https://foundry-test-0603-resource.services.ai.azure.com/api/projects/foundry-test-0603 |
| 36 | + Foundry ARM resource: /subscriptions/1756abc0-3554-4341-8d6a-46674962ea19/resourceGroups/anchenyi-ai/providers/Microsoft.CognitiveServices/accounts/foundry-test-0603-resource/projects/foundry-test-0603 |
| 37 | + Foundry model deployment name: gpt-5.4-mini |
| 38 | + graders: |
| 39 | + - type: skill-invocation |
| 40 | + config: |
| 41 | + required: |
| 42 | + - microsoft-foundry |
| 43 | + - type: completed |
| 44 | + - type: prompt |
| 45 | + config: |
| 46 | + scoring: binary |
| 47 | + threshold: 1 |
| 48 | + prompt: | |
| 49 | + Verify that the coding agent generated hosted-agent code, deployed the agent |
| 50 | + successfully to Microsoft Foundry, invoked the deployed agent after deployment, |
| 51 | + and received a successful response from that deployed agent. Fail if |
| 52 | + code was not generated, deployment did not succeed, the deployed agent was not |
| 53 | + actually invoked after deployment, or the deployed agent invocation failed. |
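The stimulus prompt asks for an agent name with the `foundry-skill-e2e` prefix plus a random suffix, but does not say how the suffix should be produced. As a minimal sketch of one way to satisfy it: the helper name `make_agent_name`, the hyphen separator, and the 8-character hex suffix are all assumptions for illustration, not something this config or the skill prescribes.

```python
import secrets

# Prefix comes from the stimulus prompt; everything else here is an assumption.
PREFIX = "foundry-skill-e2e"


def make_agent_name(prefix: str = PREFIX, suffix_bytes: int = 4) -> str:
    """Return '<prefix>-<random hex suffix>' (hypothetical helper)."""
    # token_hex yields two hex characters per byte, so 4 bytes -> 8 characters.
    suffix = secrets.token_hex(suffix_bytes)
    return f"{prefix}-{suffix}"


if __name__ == "__main__":
    print(make_agent_name())
```

A random suffix keeps repeated E2E runs from colliding on the same agent name in the shared Foundry project, and the fixed prefix makes leftover test agents easy to find and clean up.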