Complete deployment guide for AWS AutoML Lite using Terraform.
Before you begin, ensure you have:
- ✅ AWS Account with administrative access
- ✅ AWS CLI installed and configured (Install Guide)
- ✅ Terraform >= 1.9 installed (Download)
- ✅ Docker installed and running (Get Docker) - Only for training container
- ✅ Git installed
Note: Docker is ONLY needed for building the training container (AWS Batch). The API Lambda function uses direct code deployment (no containers).
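The checklist above can be verified from a shell before starting. The following is a minimal sketch, not part of the project: the helper names `version_ge` and `check_tools` are hypothetical, and `version_ge` assumes a `sort` that supports `-V` (GNU coreutils and recent macOS releases).

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print every required tool that is not on PATH; return non-zero if any is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (left commented so the definitions can be sourced safely):
# check_tools aws terraform docker git
# tf_version=$(terraform version | head -n 1 | sed 's/^Terraform v//')
# version_ge "$tf_version" 1.9 || echo "Terraform >= 1.9 required, found $tf_version"
# docker info >/dev/null 2>&1 || echo "Docker is installed but not running"
```

The Docker check only matters if you plan to build the training container.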
aws configure
Enter your:
- AWS Access Key ID
- AWS Secret Access Key
- Default region (e.g., us-east-1)
- Default output format: json
Verify configuration:
aws sts get-caller-identity
# Navigate to Terraform directory
cd infrastructure/terraform
# Initialize Terraform
terraform init
# Review what will be created
terraform plan
# Deploy (type 'yes' when prompted)
terraform apply
⏱️ Deployment time: ~5-10 minutes
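Later steps in this guide read the outputs `ecr_repository_url`, `aws_region`, `api_gateway_url`, and the three bucket names, so it can help to confirm they exist once the apply finishes. A minimal sketch, assuming `terraform output` prints one `name = value` line per output; the helper name `missing_outputs` is hypothetical:

```shell
# Print each required output name that is absent from `terraform output` text.
# $1: text printed by `terraform output` (lines of the form: name = value)
# remaining arguments: output names to require
missing_outputs() {
  outputs_text=$1
  shift
  for name in "$@"; do
    if ! printf '%s\n' "$outputs_text" | grep -q "^${name} = "; then
      echo "$name"
    fi
  done
}

# Example usage (run from infrastructure/terraform):
# missing_outputs "$(terraform output)" \
#   ecr_repository_url aws_region api_gateway_url \
#   datasets_bucket_name models_bucket_name reports_bucket_name
```

An empty result means every listed output is present.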
After infrastructure is deployed:
# Get ECR repository URL
ECR_URL=$(terraform output -raw ecr_repository_url)
REGION=$(terraform output -raw aws_region)
# Authenticate Docker to ECR
aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $(echo $ECR_URL | cut -d'/' -f1)
# Build and push training container
cd ../../backend/training
docker build -t automl-training:latest .
docker tag automl-training:latest $ECR_URL:latest
docker push $ECR_URL:latest
Windows PowerShell:
# Get ECR repository URL
$EcrUrl = terraform output -raw ecr_repository_url
$Region = terraform output -raw aws_region
# Authenticate Docker to ECR
$Password = aws ecr get-login-password --region $Region
$Password | docker login --username AWS --password-stdin $($EcrUrl.Split('/')[0])
# Build and push training container
cd ../../backend/training
docker build -t automl-training:latest .
# Braces keep PowerShell from parsing "$EcrUrl:" as a scope or drive qualifier
docker tag automl-training:latest "${EcrUrl}:latest"
docker push "${EcrUrl}:latest"
cd infrastructure/terraform
terraform output api_gateway_url
Copy this URL; you'll need it for the frontend and for testing.
Example output: https://abc123xyz.execute-api.us-east-1.amazonaws.com/dev
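API Gateway REST API invoke URLs follow the pattern `https://{api-id}.execute-api.{region}.amazonaws.com/{stage}`, so the region and stage can be read back from the URL as a quick sanity check. A minimal sketch using POSIX parameter expansion; the helper names are hypothetical, and treating the stage as equal to the `environment` variable is an assumption based on the example above:

```shell
# Extract the region from an API Gateway invoke URL.
api_url_region() {
  host=${1#https://}              # drop the scheme
  host=${host%%/*}                # keep only the hostname
  rest=${host#*.execute-api.}     # "{region}.amazonaws.com"
  printf '%s\n' "${rest%%.*}"
}

# Extract the stage (first path segment) from an API Gateway invoke URL.
api_url_stage() {
  path=${1#https://*/}            # text after the first slash following the host
  printf '%s\n' "${path%%/*}"
}

# Example usage:
# api_url_region "$(terraform output -raw api_gateway_url)"
# api_url_stage  "$(terraform output -raw api_gateway_url)"
```

If the region does not match `aws_region` in terraform.tfvars, you are probably looking at outputs from a different workspace or state.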
API_URL=$(terraform output -raw api_gateway_url)
curl $API_URL/health
Expected response:
{
  "status": "healthy",
  "service": "automl-api",
  "region": "us-east-1"
}
curl -X POST $API_URL/upload \
  -H "Content-Type: application/json" \
  -d '{"filename": "test.csv"}'
terraform output

| Resource | Purpose | Cost Impact |
|---|---|---|
| 3 S3 Buckets | Store datasets, models, reports | $0.23/month (10GB) |
| 2 DynamoDB Tables | Training history & metadata | $1.00/month (on-demand) |
| Lambda Function | API endpoints | $0.80/month (100K requests) |
| API Gateway | REST API | $1.00/month (100K requests) |
| AWS Batch | Training jobs (Fargate Spot) | $3.00/month (20 jobs) |
| ECR Repository | Container images | Included |
| CloudWatch Logs | Monitoring | $0.50/month |
| IAM Roles | Permissions | Free |
💰 Total Estimated: ~$3-25/month ($0 when idle) vs ~$36-171/month for SageMaker endpoints.
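As a worked check, the priced rows in the table can be summed directly. This is only arithmetic on the figures listed above at the usage levels the table assumes; the helper name `monthly_total` is hypothetical:

```shell
# Sum per-service monthly estimates (USD) passed as arguments.
monthly_total() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.2f\n", total }'
}

# S3, DynamoDB, Lambda, API Gateway, Batch, CloudWatch Logs
monthly_total 0.23 1.00 0.80 1.00 3.00 0.50   # prints 6.53
```

The result, $6.53/month, sits inside the ~$3-25/month range quoted above; actual costs depend on request volume, storage, and the number of training jobs.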
Edit infrastructure/terraform/terraform.tfvars before deploying:
# Environment
environment = "dev" # or "prod"
aws_region = "us-east-1"
# Lambda Configuration
lambda_memory_size = 1024 # MB
lambda_timeout = 60 # seconds
# Batch Configuration
batch_vcpu = "2" # vCPUs
batch_memory = "4096" # MB
batch_max_vcpus = 4
# Lifecycle
s3_lifecycle_days = 90 # Days before auto-delete
cloudwatch_retention_days = 7
# VPC (leave empty to use default VPC)
vpc_id = ""
subnet_ids = []
security_group_ids = []
aws logs tail /aws/lambda/automl-lite-dev-api --follow
aws logs tail /aws/batch/automl-lite-dev-training --follow
terraform show
terraform state list
cd infrastructure/terraform
terraform apply -target=aws_lambda_function.api
# Rebuild and push new image
cd backend/training
docker build -t automl-training:latest .
docker tag automl-training:latest $ECR_URL:latest
docker push $ECR_URL:latest
# Batch will use the new image on next training job
cd infrastructure/terraform
terraform apply
cd infrastructure/terraform
# Empty S3 buckets first (required)
aws s3 rm s3://$(terraform output -raw datasets_bucket_name) --recursive
aws s3 rm s3://$(terraform output -raw models_bucket_name) --recursive
aws s3 rm s3://$(terraform output -raw reports_bucket_name) --recursive
# Destroy all infrastructure
terraform destroy
Type 'yes' when prompted.
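The three `aws s3 rm` calls above can be folded into a loop with a dry-run default, which makes it easier to review what will be deleted. A minimal sketch; the function name `empty_buckets` and the `DRY_RUN` variable are hypothetical:

```shell
# Print (default) or run the command that empties each bucket named in the arguments.
# Set DRY_RUN=0 to actually delete objects.
empty_buckets() {
  for bucket in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "aws s3 rm s3://$bucket --recursive"
    else
      aws s3 rm "s3://$bucket" --recursive
    fi
  done
}

# Example usage (run from infrastructure/terraform):
# empty_buckets \
#   "$(terraform output -raw datasets_bucket_name)" \
#   "$(terraform output -raw models_bucket_name)" \
#   "$(terraform output -raw reports_bucket_name)"
```

Note that `aws s3 rm --recursive` removes current objects only. If the buckets have versioning enabled (an assumption; this guide does not say), noncurrent versions remain and must be deleted separately before `terraform destroy` can remove the buckets.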
Solution:
# Force unlock (use the Lock ID from error message)
terraform force-unlock <LOCK_ID>
Solution:
# Check if package is too large
cd infrastructure/terraform
ls -lh lambda_function.zip
# If > 50MB, optimize dependencies or use Lambda layers
Solution:
- Verify ECR image exists:
  aws ecr describe-images --repository-name automl-training
- Check compute environment status:
  aws batch describe-compute-environments
- Verify VPC/subnet configuration in terraform.tfvars
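For the compute environment check, the fields that matter are `state` (should be `ENABLED`) and `status` (should be `VALID`). A minimal sketch that filters the CLI output down to those fields; the helper name `ce_ready` is hypothetical:

```shell
# Succeed only when a compute environment is both ENABLED and VALID.
# $1: state, $2: status
ce_ready() {
  [ "$1" = "ENABLED" ] && [ "$2" = "VALID" ]
}

# Example usage: list every compute environment that is not ready.
# aws batch describe-compute-environments \
#   --query 'computeEnvironments[].[computeEnvironmentName,state,status]' \
#   --output text |
# while read -r name state status; do
#   ce_ready "$state" "$status" || echo "$name is not ready: $state/$status"
# done
```

When a compute environment reports `INVALID`, the `statusReason` field in the full `describe-compute-environments` output usually explains why.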
Solution:
- Check Lambda logs for errors
- Verify Lambda has correct environment variables
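- Test Lambda directly (on AWS CLI v2, add --cli-binary-format raw-in-base64-out so the JSON payload is not read as base64):
  aws lambda invoke --function-name automl-lite-dev-api --payload '{}' response.json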
✅ Implemented by default:
- S3 buckets are private (block public access)
- IAM roles use least privilege
- CloudWatch logging enabled
- X-Ray tracing enabled
- Encryption at rest (S3, DynamoDB)
🔒 Additional recommendations:
- Enable MFA for AWS account
- Use AWS Secrets Manager for sensitive data
- Enable CloudTrail for audit logs
- Use VPC endpoints for S3/DynamoDB (optional)
- Rotate AWS access keys regularly
- Run Locally
  # Configure backend
  cp backend/.env.example backend/.env
  # Edit with values from: terraform output
  # Start API
  docker-compose up
  # Configure and start frontend
  cd frontend
  cp .env.local.example .env.local
  # Edit with API URL from terraform output
  pnpm install && pnpm dev
- Test Complete Workflow
  - Upload sample CSV
  - Train model
  - Download results
- Write Article
  - Document architecture
  - Share cost analysis
  - Publish as AWS Community Builder
- 📖 Full Documentation: See PROJECT_REFERENCE.md
- 🔧 Terraform Docs: See infrastructure/terraform/README.md
- 🐛 Issues: Create a GitHub issue
- 💬 AWS Support: Use AWS Support Center
# Terraform
terraform init # Initialize
terraform plan # Preview changes
terraform apply # Deploy
terraform destroy # Delete everything
terraform output # Show outputs
terraform state list # List resources
terraform show # Show current state
# AWS CLI
aws sts get-caller-identity # Check credentials
aws s3 ls # List S3 buckets
aws lambda list-functions # List Lambda functions
aws batch list-jobs --job-queue <name> --job-status RUNNING # Check Batch jobs
# Docker
docker build -t name:tag . # Build image
docker push repo:tag # Push to registry
docker images # List local images
docker ps # List running containers
# Logs
aws logs tail <log-group> --follow # Stream logs
aws logs describe-log-groups # List log groups
🎉 You're all set! Your AutoML platform is now running on AWS.