Azure: Sample apps: Terraform tests #32

Merged
DrisDary merged 30 commits into main from terraform-ci-test on Jan 28, 2026

@DrisDary DrisDary commented Jan 28, 2026

Contributor

Changes

  1. All Terraform tests used the same resource group name, "local-rg", which caused conflicts when tests ran in parallel on CI. Each test sample now gets a unique prefix.
  2. sqlcmd with ODBC Driver 18 rejects LocalStack's self-signed SSL certificate by default, so I added the -C flag (trust server certificate) to all sqlcmd commands.
  3. Not all Terraform commands work with Python 3.13, so I had to revert to Python 3.12.
  4. Because of disk usage limit errors and function app deployment timeout errors, I now run the tests in 4 shards instead of 2, which leaves more free resources per test.
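
For illustration, the sharding in item 4 could look roughly like the sketch below. It assumes round-robin assignment over sorted sample names; `assign_shards` and the sample names are hypothetical, and the actual workflow may split the tests differently.

```python
from typing import Dict, List


def assign_shards(samples: List[str], shard_count: int) -> Dict[int, List[str]]:
    """Distribute sample names round-robin across shard_count shards."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards: Dict[int, List[str]] = {index: [] for index in range(shard_count)}
    # Sort first so every CI job computes the same assignment.
    for position, sample in enumerate(sorted(samples)):
        shards[position % shard_count].append(sample)
    return shards


if __name__ == "__main__":
    # Hypothetical sample names, used only for illustration.
    demo = ["sample-a", "sample-b", "sample-c", "sample-d", "sample-e"]
    for shard, names in assign_shards(demo, 4).items():
        print(shard, names)
```

Going from 2 to 4 shards roughly halves the number of samples each runner has to provision, which is where the extra headroom per test comes from.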

To Do:

  1. One of the tests, function-app-managed-identity, sometimes fails with this error:

Status Code: 500, Details: {"error": true, "exception": "URLError", "message": ""}

This is a result of the _check_function_status function in our function deployer code, which sets a 1-second timeout on urlopen. This timeout needs to be increased.

  2. The runners we use for these large sample tests are relatively weak, which can cause timeouts and errors such as API rate limit errors. In the future we need to either upgrade these runners or move to a repo that allows more powerful runners.
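
A rough sketch of what a more tolerant status check for item 1 might look like. This is not the actual _check_function_status code; `wait_until_ready`, its parameters, and the default values are assumptions chosen for illustration. The idea is a longer per-request timeout combined with polling until an overall deadline.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def wait_until_ready(
    url: str,
    request_timeout: float = 10.0,
    deadline_seconds: float = 60.0,
    poll_interval: float = 2.0,
    opener: Callable = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it answers with a 2xx status or the deadline passes."""
    deadline = time.monotonic() + deadline_seconds
    while True:
        try:
            # A per-request timeout well above 1 second gives a slow
            # container time to answer before URLError is raised.
            with opener(url, timeout=request_timeout) as response:
                if 200 <= response.status < 300:
                    return True
        except (urllib.error.URLError, TimeoutError):
            pass  # Not ready yet; fall through and retry.
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval)
```

The `opener` and `sleep` parameters exist only so the function can be exercised without a network; in normal use the defaults apply.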

@paolosalvatori

Contributor

@DrisDary thanks for building this workflow. You don't need a real review, you know. But here are a few comments for you:

  • I see that two of the four jobs still fail. Is this because the runners do not have enough CPU and memory?
  • I see that you modified main.tf files to add a resource_group_name = "${var.prefix}-rg" local variable, and then you modified terraform.tfvars files to add a unique default name for the prefix variable, one for each sample.
  • Do you delete the current resource group before provisioning the next Terraform sample? That could slow down execution a bit, but it would reduce the amount of resources required, as you remove the containers used by previous samples.
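
To make the naming change concrete, here is a small sketch of a check that no two samples end up with the same resource group name. It assumes the "${var.prefix}-rg" pattern quoted above; the helper names and the sample-to-prefix mapping are hypothetical and not part of the PR.

```python
from collections import defaultdict
from typing import Dict, List


def resource_group_name(prefix: str) -> str:
    """Mirror the "${var.prefix}-rg" pattern from the Terraform files."""
    return f"{prefix}-rg"


def find_collisions(prefixes: Dict[str, str]) -> Dict[str, List[str]]:
    """Return resource group names that more than one sample would create."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for sample, prefix in sorted(prefixes.items()):
        owners[resource_group_name(prefix)].append(sample)
    return {name: samples for name, samples in owners.items() if len(samples) > 1}
```

An empty result means every sample provisions its own resource group, so parallel runs cannot clash on that name.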

@paolosalvatori paolosalvatori left a comment

Contributor

Approved. Just reply to my comments. ☝️

DrisDary commented Jan 28, 2026

Contributor Author

> @DrisDary thanks for building this workflow. You don't need a real review, you know. But here are a few comments for you:
>
>   • I see that two of the four jobs still fail. Is this because the runners do not have enough CPU and memory?
>   • I see that you modified main.tf files to add a resource_group_name = "${var.prefix}-rg" local variable, and then you modified terraform.tfvars files to add a unique default name for the prefix variable, one for each sample.
>   • Do you delete the current resource group before provisioning the next Terraform sample? That could slow down execution a bit, but it would reduce the amount of resources required, as you remove the containers used by previous samples.

  • One failure is due to a LocalStack timeout, which is fixed by deleting the GitHub cache and rerunning. The other is an API rate limit error, which I believe means that the total number of API calls across all running tasks is capped and that exceeding the cap causes the failure; in our case the total spans 4 jobs. Rerunning only the failed runs should fix this, since fewer tests run on a rerun. Our localstack-pro repo is more advanced and has no such limitations in terms of API rate limits or weak runners. [I will rerun the failed tests to see what happens.]

  • Yes, I changed the names and made sure they are consistent throughout the Terraform files, because identical names sometimes cause conflicts and therefore failures.

  • Yes, deletion happens between all tests to save disk usage, and its cost in CPU usage and time is insignificant. The next test doesn't start until the previous test's resources are completely deleted, so it should not cause any resource usage errors.

@DrisDary

Contributor Author

@paolosalvatori it worked after rerunning, just as expected.

@paolosalvatori

Contributor

@DrisDary if you see 429 errors, these are due to throttling (too many requests). The only solution in this case is to retry.
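
As a sketch of what retrying on throttling could look like in code, assuming the throttled call surfaces as urllib.error.HTTPError with code 429 (the tools in this workflow may report it differently). `call_with_backoff` and its defaults are hypothetical.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying with exponential backoff on HTTP 429."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except urllib.error.HTTPError as error:
            last_attempt = attempt == max_attempts - 1
            if error.code != 429 or last_attempt:
                raise  # Not throttling, or out of retries.
            # Wait 1x, 2x, 4x ... the base delay before trying again.
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

Doubling the wait between attempts gives the rate limiter time to recover instead of adding more requests while it is still throttling.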

@DrisDary DrisDary merged commit f88d1b4 into main Jan 28, 2026
10 of 12 checks passed
DrisDary added a commit that referenced this pull request Jun 1, 2026