Re-enable extension policy tests in daily runbook #3607

Open
maddieford wants to merge 3 commits into Azure:develop from maddieford:enable_policy_tests

maddieford (Contributor) commented May 13, 2026

Description

The extension policy suites (ext_policy and ext_policy_with_dependencies) were previously disabled due to quota issues. This PR re-enables them in our daily runs.

This PR also adds the 'security_type' option to the image definition schema. Currently, the only supported values are:

  • "" (default) -> For VMSS deployments, 'Standard' is used. For LISA deployments, LISA infers the security type from the image capabilities and the SKU.
  • "ConfidentialVM" -> Forces any VMSS/VM created from this image definition to be created as a CVM.

Previously, we were depending on LISA to correctly deploy VMs with the 'ConfidentialVM' security type when using our *_cvm image definitions (even though those images can also be used for TrustedLaunch or Standard VMs). Now, we explicitly tell LISA to deploy CVMs when using those images instead of relying on its selection logic.
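
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolve_security_type`, its parameters, and the use of `None` to mean "let LISA infer the type" are hypothetical names and conventions chosen for the example.

```python
from typing import Optional

# Hypothetical constant: the description lists "" and "ConfidentialVM" as the only supported values.
SUPPORTED_SECURITY_TYPES = {"", "ConfidentialVM"}


def resolve_security_type(image_security_type: str, is_scale_set: bool) -> Optional[str]:
    """Return the security type to request, or None to let LISA infer it."""
    if image_security_type not in SUPPORTED_SECURITY_TYPES:
        raise ValueError(f"Unsupported security_type: {image_security_type!r}")
    if image_security_type == "ConfidentialVM":
        # Explicit setting: always deploy as a CVM, for both VMSS and LISA VMs.
        return "ConfidentialVM"
    # Default (""): VMSS deployments use 'Standard'; LISA deployments infer the type.
    return "Standard" if is_scale_set else None
```

Under these assumptions, an image definition without 'security_type' keeps the previous behavior, while a *_cvm definition can no longer be deployed as a non-CVM by accident.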

Issue #

PR information

  • Ensure development PR is based on the develop branch.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines


Distro maintenance information, if applicable

  • This is a contribution from a distro maintainer
  • The changes in this PR have been taken as a downstream patch (Note: it is not recommended to patch the agent without upstream review and approval)

Copilot AI review requested due to automatic review settings May 13, 2026 17:17

Copilot AI left a comment

Pull request overview

This PR re-enables the extension policy suites in the daily E2E runbook and adjusts related suite/image metadata so those suites can run with CVM coverage and cloud-specific image restrictions.

Changes:

  • Re-added ext_policy and ext_policy_with_dependencies to the daily runbook.
  • Added CVM image location/cloud restrictions and retained explicit CVM VM size metadata.
  • Updated extension policy suite image selections and removed some suite-level location restrictions.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests_e2e/orchestrator/runbook.yml | Re-enables extension policy suites in default daily runs. |
| tests_e2e/test_suites/ext_policy.yml | Limits endorsed coverage via random selection, adds CVM coverage, and skips Debian 11 in China. |
| tests_e2e/test_suites/ext_policy_with_dependencies.yml | Converts images to a list and adds CVM coverage for the VMSS-based policy dependency suite. |
| tests_e2e/test_suites/ext_signature_validation.yml | Removes the explicit public-cloud location override. |
| tests_e2e/test_suites/images.yml | Adds public-cloud location restrictions and non-public cloud exclusions for CVM images. |

Comment thread tests_e2e/test_suites/ext_policy.yml Outdated
- "AzureUSGovernment"
owns_vm: false
skip_on_images:
- "AzureChinaCloud:debian_11" # The ConfigurationforLinux-1.26.109 extension is failing on Debian 11 in China cloud only; skip this image until the issue is in the extension is fixed
\ No newline at end of file
images: "endorsed"
images:
- "endorsed"
- "cvm-endorsed"
Contributor Author

After reading this comment, I confirmed that the CVM VMSS are being created with the default SKU (Standard_D2s_v3) instead of the CVM SKU from images.yml (Standard_DC2ads_v5):
[screenshot]
I didn't notice because there weren't any deployment failures, but it turns out they were not being created as CVMs:
2026-05-13T17:59:21.601406Z INFO ExtHandler ExtHandler This is not a confidential virtual machine.

Good catch by Copilot :) I'll fix it

Contributor Author

Fixed

- "ext_policy/ext_policy.py"
images:
- "endorsed"
- "random(endorsed,10)" # TODO: Remove randomization and run on all endorsed images once the test suite is optimized to reduce runtime.
Contributor Author

I made the decision to limit this scenario to 10 random endorsed images per day, in addition to all 6 cvm-endorsed images. That gives us coverage across 16 images per daily run.

I made this change because each run of this suite takes ~30 minutes (primarily due to the delete scenario waiting on the 15-minute CRP timeout). If this test runs on all 30 endorsed images, most of the 32 available LISA runners will be stuck on the environments with this scenario, preventing us from processing other environments, and we get pipeline timeouts.

I've added a TODO to optimize this test, but in the meantime, I think it's appropriate to only run on 10 of the endorsed images and all of the cvm-endorsed images. If we want more coverage, we can extend the pipeline timeout instead as a temporary measure until the test is optimized.
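
As a rough illustration of the selection described above (not the orchestrator's actual implementation of `random(endorsed,10)`), the daily image set could be sketched like this. The function name `select_daily_images` and the placeholder image names are hypothetical; only the counts (30 endorsed, 6 cvm-endorsed, sample of 10) come from the comment.

```python
import random
from typing import List, Optional


def select_daily_images(endorsed: List[str], cvm_endorsed: List[str],
                        sample_size: int = 10, seed: Optional[int] = None) -> List[str]:
    """Pick a random subset of endorsed images and keep every cvm-endorsed image."""
    rng = random.Random(seed)
    # Sample without replacement so no image is tested twice in the same run.
    sampled = rng.sample(endorsed, min(sample_size, len(endorsed)))
    return sampled + list(cvm_endorsed)


# Placeholder names standing in for the real image definitions.
endorsed = [f"endorsed_{i}" for i in range(30)]
cvm_endorsed = [f"cvm_{i}" for i in range(6)]

selected = select_daily_images(endorsed, cvm_endorsed, seed=0)
print(len(selected))  # 16
```

With ~30 minutes per image, 16 images occupy half of the 32 runners at once, whereas 36 images (30 endorsed plus 6 cvm-endorsed) would need more runners than are available.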

- "random(endorsed,10)" # TODO: Remove randomization and run on all endorsed images once the test suite is optimized to reduce runtime.
- "cvm-endorsed"
# This test is executed in southcentralus as a workaround for recurring fabric "ServiceUnavailableFault" issues observed in westus2.
locations: "AzureCloud:southcentralus"
Contributor Author

Previously this ran on southcentralus when we only ran on 'endorsed' images, but the CVM sku we're using is not available in southcentralus.

Instead of updating the location for the entire test suite to westeurope (where the CVM SKU is available), I just updated the CVM image definitions in images.yml to list the regions where the SKU is available.

Now, this suite will run on the default location (westus2) for the 'endorsed' images and westeurope for the 'cvm-endorsed' images (since that is the first region listed in the images.yml definitions for those CVM images).

This reduces the number of environments created per test run.
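
The location choice described above can be sketched as follows. This is an assumption-laden illustration rather than the orchestrator's real logic: `pick_location` and `DEFAULT_LOCATION` are hypothetical names, and the second region in the usage example is only there to show that the first listed region wins.

```python
from typing import Dict, List

# Default location mentioned in the comment above.
DEFAULT_LOCATION = "westus2"


def pick_location(cloud: str, image_locations: Dict[str, List[str]]) -> str:
    """Use the first region the image lists for this cloud, else the default location."""
    regions = image_locations.get(cloud, [])
    return regions[0] if regions else DEFAULT_LOCATION


# An image with no location restrictions runs in the default location.
print(pick_location("AzureCloud", {}))  # westus2
# An image restricted to specific regions runs in the first one listed.
print(pick_location("AzureCloud", {"AzureCloud": ["westeurope", "northeurope"]}))  # westeurope
```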

- "cvm-endorsed"
# This test is executed in southcentralus as a workaround for recurring fabric "ServiceUnavailableFault" issues observed in westus2.
locations: "AzureCloud:southcentralus"
# TODO: This test is currently failing on usgov cloud due to an issue with the GuestConfig extension. Re-enable once the extension fix has been rolled out.
Contributor Author

I checked and I'm not seeing that issue in USGov anymore; the test is passing there.

There are GuestConfig extension failures in China cloud now, but only on the debian_11 image, so I am skipping only that one.
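
For readers unfamiliar with the "AzureChinaCloud:debian_11" entry, a cloud-qualified skip rule could be evaluated roughly like this. The helper `should_skip` is hypothetical, and treating an entry without a cloud prefix as "skip on every cloud" is an assumption made for the sketch, not something confirmed by this PR.

```python
from typing import List


def should_skip(cloud: str, image: str, skip_on_images: List[str]) -> bool:
    """Return True if the image is listed in skip_on_images for the given cloud."""
    for entry in skip_on_images:
        entry_cloud, separator, entry_image = entry.partition(":")
        if not separator:
            # Assumption: an entry without a cloud prefix applies to all clouds.
            if entry == image:
                return True
        elif entry_cloud == cloud and entry_image == image:
            return True
    return False


skip_list = ["AzureChinaCloud:debian_11"]
print(should_skip("AzureChinaCloud", "debian_11", skip_list))  # True
print(should_skip("AzureCloud", "debian_11", skip_list))       # False
```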

executes_on_scale_set: true
owns_vm: false
# This test is executed in southcentralus as a workaround for recurring fabric "ServiceUnavailableFault" issues observed in westus2.
locations: "AzureCloud:southcentralus"
Contributor Author

Same reasoning as in the ext_policy scenario for removing the location here.

@@ -18,4 +18,4 @@ skip_on_images:

# TODO: The current deployment of VmAccess 1.5.22 prevents the extension from uninstalling; enable this test when the issue is fixed
Contributor Author

Nora confirmed that v1.5.24 of VmAccess, which contains the fix for this, will hit USGov regions by the end of this week, so we should be able to remove this soon.

# Extension signature is sent by CRP only for CVMs, so this test suite should run exclusively on CVMs.
images: "cvm-endorsed"
# Extension signatures are currently only available in the public cloud, so we skip this test on other clouds.
locations: "AzureCloud:westeurope"
Contributor Author

We previously specified the location for this suite because the CVM SKU we're using is only available in certain regions. I updated the CVM image definitions in images.yml with the locations where they can be used, so this is no longer necessary.

urn: "microsoftcblmariner azure-linux-3 azure-linux-3-cvm latest"
vm_sizes:
- "Standard_DC2ads_v5" # CVM v5 SKU
- "Standard_DC2ads_v5" # TODO: The sku for this image should be updated to 'Standard_DC2as_v6' once we have capacity for it in our test subs, since it is available in more regions
Contributor Author

I'm going to work with the ACC PM to get us capacity for the 'Standard_DC2as_v6' CVM SKU in our subscriptions (Public and Gov; CVM is not supported in China).

Once we have the capacity for that SKU, there are significantly more regions we can test in for CVMs:

$ az vm list-skus --resource-type virtualMachines --query "[?name=='Standard_DC2as_v6'].locations[]"
[
  "australiaeast",
  "KoreaCentral",
  "uksouth",
  "westcentralus",
  "westeurope",
  "WestUS3",
  "AustriaEast",
  "BelgiumCentral",
  "ChileCentral",
  "eastus2",
  "FranceCentral",
  "IndiaSouthCentral",
  "KoreaSouth",
  "MexicoCentral",
  "NewZealandNorth",
  "northeurope",
  "southcentralus",
  "SouthCentralUSSTG",
  "southeastasia",
  "SpainCentral",
  "SwedenCentral",
  "TaiwanNorth",
  "TaiwanNorthwest",
  "westus2",
  "CentralUSEUAP",
  "EastUS2EUAP"
]

VS the current SKU we're using:

$ az vm list-skus --resource-type virtualMachines --query "[?name=='Standard_DC2ads_v5'].locations[]"
[
  "EastUS2EUAP",
  "northeurope",
  "southeastasia",
  "westeurope"
]
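
To make the difference between the two outputs above concrete, the region lists can be compared with a small script. The lists are copied from the outputs above; the comparison is case-insensitive because the CLI output mixes casing (for example "westus2" and "WestUS3"). The variable and function names are illustrative only.

```python
# Region lists copied from the two `az vm list-skus` outputs above.
V6_REGIONS = [
    "australiaeast", "KoreaCentral", "uksouth", "westcentralus", "westeurope",
    "WestUS3", "AustriaEast", "BelgiumCentral", "ChileCentral", "eastus2",
    "FranceCentral", "IndiaSouthCentral", "KoreaSouth", "MexicoCentral",
    "NewZealandNorth", "northeurope", "southcentralus", "SouthCentralUSSTG",
    "southeastasia", "SpainCentral", "SwedenCentral", "TaiwanNorth",
    "TaiwanNorthwest", "westus2", "CentralUSEUAP", "EastUS2EUAP",
]
V5_REGIONS = ["EastUS2EUAP", "northeurope", "southeastasia", "westeurope"]


def normalize(regions):
    """Lower-case region names so the comparison ignores casing differences."""
    return {region.lower() for region in regions}


gained = sorted(normalize(V6_REGIONS) - normalize(V5_REGIONS))
lost = sorted(normalize(V5_REGIONS) - normalize(V6_REGIONS))
print(len(gained), lost)  # 22 []
```

Every region that offers Standard_DC2ads_v5 also offers Standard_DC2as_v6, and the v6 SKU adds 22 more, including westus2 and southcentralus.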

locations: Dict[str, List[str]]
# Indicates that the image is available only for those VM sizes. If empty, the image should be available for all VM sizes
vm_sizes: List[str]
# Optional security type (e.g. "ConfidentialVM") to use when deploying this image. When set, the deployment
Contributor Author

Previously, we were depending on LISA to correctly deploy VMs with the 'ConfidentialVM' security type when using our *_cvm image definitions (even though those images can be used for TrustedLaunch or Standard VMs).
Now, we explicitly tell lisa to deploy CVMs when using those images instead of relying on their selection logic.

3 participants