feat: add multi-cloud VM support with AWS backend and VMProvider protocol #66
Merged
…ocol
- Create VMProvider Protocol (typing.Protocol) for cloud-agnostic VM management
- Create AWSVMManager with boto3 for EC2 lifecycle (create, delete, start, stop)
- Add resource_scope/ssh_username properties to AzureVMManager
- Add list_pool_resources/cleanup_pool_resources to AzureVMManager
- Parameterize pool.py SSH calls and scripts with username/home_dir
- Add --cloud flag (azure|aws) to all pool CLI commands
- Add cloud_provider/aws_region to config.py settings
- Add boto3 optional dependency (openadapt-evals[aws])
- Update tests for WAA_START_SCRIPT_TEMPLATE rename
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
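As a rough illustration of the `typing.Protocol` approach named in this commit, the sketch below defines a cut-down protocol and a fake provider that satisfies it without inheriting from it. Only `ssh_username` and `create_vm` are taken from this PR's text; `VMProviderSketch`, `FakeVMManager`, `provision`, the signatures, and the return values are assumptions for illustration, and the real `VMProvider` has more members than shown here.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class VMProviderSketch(Protocol):
    """Cut-down, hypothetical stand-in for the PR's VMProvider protocol."""

    @property
    def ssh_username(self) -> str: ...

    def create_vm(self, name: str) -> str: ...


class FakeVMManager:
    """Satisfies the protocol structurally; note there is no base class."""

    @property
    def ssh_username(self) -> str:
        return "ubuntu"  # assumed value, for illustration only

    def create_vm(self, name: str) -> str:
        return f"created:{name}"


def provision(provider: VMProviderSketch, name: str) -> str:
    # Callers depend only on the protocol, not on a concrete cloud class.
    return f"{provider.ssh_username}@{provider.create_vm(name)}"


result = provision(FakeVMManager(), "waa-pool-00")
```

Because the match is structural, an existing manager class can satisfy the protocol simply by exposing the right members, which is presumably why no inheritance changes were required.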
- Fix DOCKER_SETUP_SCRIPT_WITH_ACR daemon.json double-brace corruption
that produced invalid JSON ({{"data-root"...}}) breaking Docker start
- Use .metal instance types for AWS (KVM/nested virt required for QEMU)
- Fix region mismatch: update self.region and invalidate cached clients
when create_vm uses a different region than the manager default
- Fix hardcoded "azureuser" in pool-wait diagnostic message
- Set AWSVMManager = None on ImportError so `import *` doesn't raise
- Only delete pool registry on successful cleanup (prevents orphaned
cloud resources when deletion fails)
- Remove unused `time` import from aws_vm.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
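The daemon.json fix is easiest to see with a small reproduction. The commit does not show the script itself, so the template below is only one plausible shape of the bug: the `ESCAPED` string, the `data_root` placeholder, and the `/mnt/docker` path are all assumed for illustration. The point is that `{{` and `}}` collapse to single braces only when the string actually passes through `str.format()`.

```python
import json

# Hypothetical daemon.json fragment; the path below is an assumed example value.
ESCAPED = '{{"data-root": "{data_root}"}}'

# Doubled braces collapse to single ones only when str.format() is applied.
formatted = ESCAPED.format(data_root="/mnt/docker")
config = json.loads(formatted)

# Used verbatim (no format call), the doubled braces survive and the
# result is not valid JSON, matching the symptom described above.
try:
    json.loads(ESCAPED)
    raw_is_valid = True
except json.JSONDecodeError:
    raw_is_valid = False
```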
- Fix pool-vnc/pool-logs/pool-exec hardcoded azureuser: read ssh_username from pool registry with backward-compatible default
- Store ssh_username in VMPool dataclass and persist to registry on create
- Move set_auto_shutdown after SSH is available (was racing with boot)
- Fix cleanup_pool_resources: handle raw instance IDs and allocation IDs for resources without Name tags (prevents orphaned resources)
- Narrow key pair exception handling: re-raise unless InvalidKeyPair.NotFound
- Add TODO for restricting SSH security group to user's IP
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
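A minimal sketch of the backward-compatible default described in this commit, assuming the registry is a JSON document. `PoolSketch`, `load_pool`, and the `pool_id` field are hypothetical stand-ins; only `ssh_username` and the `azureuser` default come from the commit message.

```python
import json
from dataclasses import asdict, dataclass

DEFAULT_SSH_USERNAME = "azureuser"  # default named in the commit message


@dataclass
class PoolSketch:
    """Hypothetical, reduced stand-in for the PR's VMPool dataclass."""

    pool_id: str
    ssh_username: str = DEFAULT_SSH_USERNAME


def load_pool(raw: str) -> PoolSketch:
    data = json.loads(raw)
    # Registries written before this change have no ssh_username key,
    # so fall back to the old implicit value instead of failing.
    return PoolSketch(
        pool_id=data["pool_id"],
        ssh_username=data.get("ssh_username", DEFAULT_SSH_USERNAME),
    )


old_registry = json.dumps({"pool_id": "pool-a"})
new_registry = json.dumps(asdict(PoolSketch("pool-b", ssh_username="ubuntu")))

old_pool = load_pool(old_registry)
new_pool = load_pool(new_registry)
```

The default only helps if the load path actually reads the key; a loader that ignores it would silently hand back `azureuser` for every pool.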
- Add ssh_username to VMPoolRegistry.load() so it persists across process restarts (was silently reverting to "azureuser" default)
- Fix disassociate_address for raw allocation IDs: look up AssociationId via describe_addresses first (disassociate_address does not accept AllocationId parameter)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
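The allocation-ID fix can be sketched as a lookup followed by two calls. This is not the PR's code: `release_elastic_ip` and `StubEC2` are hypothetical names, and the stub exists only so the example runs without boto3 or AWS credentials. The three client methods and their keyword arguments (`AllocationIds`, `AssociationId`, `AllocationId`) follow the boto3 EC2 client.

```python
from typing import Any, Optional


def release_elastic_ip(ec2: Any, allocation_id: str) -> Optional[str]:
    """Disassociate (if attached) and release an Elastic IP by allocation ID.

    `ec2` is expected to behave like a boto3 EC2 client. Returns the
    AssociationId that was removed, or None if the address was unattached.
    """
    resp = ec2.describe_addresses(AllocationIds=[allocation_id])
    addresses = resp.get("Addresses", [])
    association_id = addresses[0].get("AssociationId") if addresses else None
    if association_id:
        # disassociate_address is keyed by AssociationId, not AllocationId.
        ec2.disassociate_address(AssociationId=association_id)
    ec2.release_address(AllocationId=allocation_id)
    return association_id


class StubEC2:
    """Tiny in-memory double that records calls; not part of boto3."""

    def __init__(self):
        self.calls = []

    def describe_addresses(self, **kwargs):
        self.calls.append(("describe_addresses", kwargs))
        return {
            "Addresses": [
                {"AllocationId": "eipalloc-123", "AssociationId": "eipassoc-456"}
            ]
        }

    def disassociate_address(self, **kwargs):
        self.calls.append(("disassociate_address", kwargs))

    def release_address(self, **kwargs):
        self.calls.append(("release_address", kwargs))


stub = StubEC2()
removed = release_elastic_ip(stub, "eipalloc-123")
```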
This was referenced Mar 3, 2026
Summary
- `VMProvider` Protocol (`typing.Protocol`): Cloud-agnostic interface for VM lifecycle management. Both `AzureVMManager` and `AWSVMManager` satisfy it via structural subtyping, so no inheritance changes are needed
- `AWSVMManager` (`boto3`): Full EC2 lifecycle management (create/delete/start/stop instances, EIP allocation, idempotent VPC/subnet/SG setup, tag-based pool resource discovery and cleanup)
- Pool management uses `VMProvider` instead of hardcoded `AzureVMManager`. All pool commands (`pool-create`, `pool-wait`, `pool-run`, `pool-cleanup`, `pool-pause`, `pool-resume`) accept `--cloud azure|aws`
- Hardcoded `azureuser` references replaced with `vm_manager.ssh_username` / `{home_dir}` template variables
Changes by file
| File | Change |
|---|---|
| `infrastructure/vm_provider.py` | `VMProvider` Protocol (10 methods + 2 properties) |
| `infrastructure/aws_vm.py` | `AWSVMManager` dataclass using boto3 |
| `infrastructure/azure_vm.py` | `resource_scope`, `ssh_username` properties; `list_pool_resources`, `cleanup_pool_resources` methods; parameterized `ssh_run`/`wait_for_ssh` with username |
| `infrastructure/pool.py` | `VMProvider` instead of `AzureVMManager`; parameterized scripts with `{home_dir}`/`{ssh_username}`; delegated cleanup to provider |
| `benchmarks/vm_cli.py` | `_create_vm_manager()` factory; `--cloud` arg on all pool subparsers |
| `config.py` | `cloud_provider`, `aws_region` settings |
| `infrastructure/__init__.py` | `VMProvider`, `AWSVMManager` |
| `pyproject.toml` | `aws = ["boto3>=1.34.0"]` optional dep |
| `tests/test_evaluate_server_deploy.py` | `WAA_START_SCRIPT_TEMPLATE` rename |
Test plan
- `WAA_START_SCRIPT_TEMPLATE` tests updated and passing
- `VMProvider`, `AWSVMManager` (with/without boto3)
- `oa-vm pool-create`, `pool-pause`, `pool-resume`, `pool-cleanup`
- `oa-vm pool-create --cloud aws --workers 1`: creates EC2 instance
- `oa-vm pool-cleanup --cloud aws -y`: terminates instances, releases resources

🤖 Generated with Claude Code
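For readers who want to see how a `--cloud azure|aws` flag can be shared across pool subcommands, here is a minimal `argparse` sketch. It is not the code in `benchmarks/vm_cli.py`: `build_parser`, the `azure` default, the `--yes` long form, and the `--workers` default are assumptions, and only two of the pool subcommands are shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Shared parent so every pool subcommand accepts the same --cloud flag.
    cloud_parent = argparse.ArgumentParser(add_help=False)
    cloud_parent.add_argument(
        "--cloud",
        choices=["azure", "aws"],
        default="azure",  # assumed default, for illustration only
    )

    parser = argparse.ArgumentParser(prog="oa-vm")
    sub = parser.add_subparsers(dest="command", required=True)

    create = sub.add_parser("pool-create", parents=[cloud_parent])
    create.add_argument("--workers", type=int, default=1)

    cleanup = sub.add_parser("pool-cleanup", parents=[cloud_parent])
    cleanup.add_argument("-y", "--yes", action="store_true")

    return parser


args = build_parser().parse_args(
    ["pool-create", "--cloud", "aws", "--workers", "1"]
)
```

Restricting `choices` means an unsupported provider name is rejected at parse time, before any cloud call is made.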