EC2 provides virtual machines (instances) in the cloud. You choose the OS, instance type (CPU/RAM), and networking. AWS manages the physical hardware.
```text
m5.2xlarge
││ └── Size: nano, micro, small, medium, large, xlarge, 2xlarge...
│└── Generation: 5th gen
└── Family: m = general purpose
```
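
The naming scheme above can be checked mechanically. The following is a minimal sketch, assuming instance types follow the pattern letters, digits, optional attribute letters, a dot, then the size. `parse_instance_type` is a hypothetical helper, not an AWS tool, and the "attributes" field (for example the `dn` in `g4dn`) is an assumption that goes beyond the three parts named in the diagram.

```bash
#!/usr/bin/env bash
# Split an EC2 instance type such as "m5.2xlarge" into its naming parts.
# parse_instance_type is a hypothetical helper for illustration only.
parse_instance_type() {
  local type="$1"
  # family letters, generation digits, optional attribute letters, ".", size
  local re='^([a-z]+)([0-9]+)([a-z-]*)\.([a-z0-9]+)$'
  if [[ "$type" =~ $re ]]; then
    echo "family=${BASH_REMATCH[1]} generation=${BASH_REMATCH[2]} attributes=${BASH_REMATCH[3]:-none} size=${BASH_REMATCH[4]}"
  else
    echo "unrecognized instance type: $type" >&2
    return 1
  fi
}

parse_instance_type m5.2xlarge   # → family=m generation=5 attributes=none size=2xlarge
parse_instance_type g4dn.xlarge  # → family=g generation=4 attributes=dn size=xlarge
```

The regular expression is deliberately strict, so unusual names that do not fit this simple pattern are reported as unrecognized rather than guessed at.
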
| Family | Optimized for | Examples |
|---|---|---|
| t | Burstable, cost-efficient | t3.micro, t3.small |
| m | General purpose (balanced) | m5.large, m6i.xlarge |
| c | Compute optimized | c5.xlarge (CPU-intensive) |
| r | Memory optimized | r5.2xlarge (Redis, Spark) |
| g / p | GPU | g4dn.xlarge (ML inference) |
| i | Storage optimized (NVMe) | i3.large (high IOPS) |
| inf | Machine learning inference | inf1.xlarge |
| Option | When to use | Savings vs On-Demand |
|---|---|---|
| On-Demand | Short-term, unpredictable | 0% (baseline) |
| Reserved (1yr) | Steady-state workload | ~40% |
| Reserved (3yr) | Long-term steady workload | ~60% |
| Savings Plans | Flexible (commit to $/hr) | ~40-60% |
| Spot | Fault-tolerant batch jobs | ~70-90% |
| Dedicated Host | Compliance (BYOL, socket licensing) | N/A |
| Dedicated Instance | Physical isolation requirement | Higher cost |
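
To make the savings column concrete, here is a rough arithmetic sketch. The hourly rate of 10 cents and the 730-hour month are illustrative assumptions, not AWS prices, and `monthly_cost_cents` is a hypothetical helper. The discount percentages are the approximate figures from the table.

```bash
#!/usr/bin/env bash
# Estimate a monthly bill in whole cents after a percentage discount.
# monthly_cost_cents is a hypothetical helper; the rate is made up for illustration.
monthly_cost_cents() {
  local hourly_cents="$1" discount_pct="$2" hours="${3:-730}"
  echo $(( hourly_cents * hours * (100 - discount_pct) / 100 ))
}

# Assumed on-demand rate: 10 cents/hour, running all month (about 730 hours).
monthly_cost_cents 10 0    # On-Demand baseline        → 7300
monthly_cost_cents 10 40   # Reserved 1yr, ~40% off    → 4380
monthly_cost_cents 10 70   # Spot, low end of ~70-90%  → 2190
```

Integer cents keep the arithmetic exact in plain bash, which has no floating-point support.
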

Spot Instances:
- Cheapest option: spare AWS capacity
- AWS can reclaim an instance with a 2-minute warning
- Never use for: databases, critical stateful apps
- Use for: batch processing, ML training, rendering, CI/CD workers
- Spot Fleet: mixes instance types to maintain target capacity even if one type is reclaimed

A User Data script runs once at first launch (as root):
```bash
#!/bin/bash
# Bootstrap a basic Apache web server on a yum-based distro
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "Hello from $(hostname)" > /var/www/html/index.html
```
EC2 instances can query their own metadata:
```bash
# IMDSv1 (legacy, insecure): plain unauthenticated GET
curl http://169.254.169.254/latest/meta-data/

# IMDSv2 (secure, token-based, recommended)
# Step 1: request a session token (TTL 21600 s = 6 hours)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
# Step 2: send the token as a header on every metadata request
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/

# What you can get:
# - Instance ID, type, region, AZ
# - IAM role credentials (temporary!)
# - Public/private IP
# - User data script
```
IMDSv2 is the expected exam answer for metadata security questions: it mitigates SSRF attacks against the metadata service.

Placement groups control how EC2 instances are placed physically:
| Type | Physical placement | Use for |
|---|---|---|
| Cluster | Same rack (same AZ) | HPC, low latency network (10Gbps+ between instances) |
| Spread | Each instance on a different rack; can span multiple AZs | Max HA for a few critical instances (max 7 running instances per AZ per group) |
| Partition | Different partitions (rack groups) | Large distributed systems (Kafka, HDFS, Cassandra) |
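
The Spread limit of 7 instances per AZ turns into a simple ceiling division when sizing a group. The sketch below assumes a single Spread placement group and that the limit is exactly 7 per AZ as stated in the table. `min_azs_for_spread` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Minimum number of AZs one Spread placement group needs for N instances,
# given the 7-instances-per-AZ limit. Hypothetical helper for illustration.
min_azs_for_spread() {
  local instances="$1" per_az_limit=7
  # Integer ceiling division: (n + d - 1) / d
  echo $(( (instances + per_az_limit - 1) / per_az_limit ))
}

min_azs_for_spread 7    # → 1
min_azs_for_spread 8    # → 2
min_azs_for_spread 15   # → 3
```
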
AMI = snapshot of an instance (OS + installed software + config):
EC2 Instance (fully configured) → Create AMI → Launch new instances from AMI
Golden AMI pattern:
- Launch base instance
- Install all dependencies, configure everything
- Create AMI
- Auto Scaling Group uses this AMI → instances launch faster (no bootstrap needed)
AMI is regional — must copy to other regions for cross-region launch.

An Auto Scaling Group (ASG) automatically maintains the right number of EC2 instances:
```text
ASG: min=2, desired=4, max=10
Policy: scale out when CPU > 70%, scale in when CPU < 30%
Traffic spike → CPU 85% → ASG launches 2 more instances → CPU drops
Traffic drop → CPU 20% → ASG terminates 2 instances → save cost
```
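
The scale-out and scale-in decision in the example above can be sketched as a small function. This is a simplification under stated assumptions: the thresholds (70% and 30%), the step of 2, and the min/max bounds are taken from the example, while `next_desired` is a hypothetical helper. A real ASG acts on CloudWatch alarms and also applies cooldowns and warm-up, which are not modeled here.

```bash
#!/usr/bin/env bash
# Compute the next desired capacity from average CPU, clamped to [min, max].
# next_desired is a hypothetical helper; defaults mirror the example above.
next_desired() {
  local cpu="$1" desired="$2" min="${3:-2}" max="${4:-10}" step="${5:-2}"
  if (( cpu > 70 )); then
    desired=$(( desired + step ))   # scale out
  elif (( cpu < 30 )); then
    desired=$(( desired - step ))   # scale in
  fi
  # Never leave the configured bounds
  if (( desired > max )); then desired=$max; fi
  if (( desired < min )); then desired=$min; fi
  echo "$desired"
}

next_desired 85 4    # traffic spike   → 6
next_desired 20 4    # traffic drop    → 2
next_desired 85 10   # already at max  → 10
```
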
| Policy | How | Use for |
|---|---|---|
| Target Tracking | Maintain metric at target | CPU=70%, ALB requests/instance |
| Step Scaling | Add/remove based on metric bands | Aggressive scaling at critical thresholds |
| Scheduled | Scale at specific times | Known traffic patterns (9am-5pm) |
| Predictive | ML-based, forecast and pre-scale | Cyclical patterns |

Lifecycle hooks pause an instance during launch or termination so custom actions can run:
```text
Launch: Pending → [Lifecycle Hook: install agent, register with Consul] → InService
Terminate: Terminating → [Lifecycle Hook: drain connections, backup data] → Terminated
```

Instance Refresh performs a rolling update of all instances (e.g., after an AMI update):
```text
Set MinHealthyPercentage: 80%
→ Terminate 20% of instances (replace with new AMI)
→ Wait until they are healthy → continue until all are replaced
```
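
The rolling replacement above can be simulated to see how many rounds a given MinHealthyPercentage implies. This is a sketch, not AWS's actual algorithm: the rounding (healthy count rounded up, at least one instance replaced per round) is an assumption, and `refresh_batch_size` and `simulate_refresh` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# How many instances can be replaced at once while keeping min_healthy_pct
# of the group in service. Rounding rules here are assumptions.
refresh_batch_size() {
  local total="$1" min_healthy_pct="$2"
  local must_stay=$(( (total * min_healthy_pct + 99) / 100 ))  # round up
  local batch=$(( total - must_stay ))
  if (( batch < 1 )); then batch=1; fi   # always make progress
  echo "$batch"
}

# Print one line per replacement round until every instance is replaced.
simulate_refresh() {
  local total="$1" pct="$2" replaced=0 round=1 batch n
  batch=$(refresh_batch_size "$total" "$pct")
  while (( replaced < total )); do
    n=$(( total - replaced < batch ? total - replaced : batch ))
    replaced=$(( replaced + n ))
    echo "round $round: replace $n, $replaced/$total on new AMI"
    round=$(( round + 1 ))
  done
}

simulate_refresh 10 80   # 5 rounds of 2 instances each
```

With 10 instances and 80% minimum healthy, each round replaces 2 instances, so a higher percentage means a safer but slower refresh.
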
| Practice | Reason |
|---|---|
| Use IAM Instance Profile (not access keys) | Temporary credentials, auto-rotated |
| Enable IMDSv2 | Prevent SSRF metadata attacks |
| Use Auto Scaling Groups | Automatic capacity management |
| Use Golden AMI | Fast, consistent launches |
| Use Spot Instances for batch workloads | 70-90% cost reduction |
| Enable detailed monitoring (1-min) | Faster reaction to scaling events |
| Use lifecycle hooks for graceful shutdown | No dropped connections on scale-in |
| Anti-Pattern | Impact | Fix |
|---|---|---|
| Hardcoding AWS credentials on EC2 | Security risk, manual rotation | Use Instance Profile (IAM role) |
| IMDSv1 enabled | SSRF vulnerability → credential theft | Require IMDSv2 |
| Single instance, no ASG | Single point of failure | Use ASG with min=2 across AZs |
| Using On-Demand for everything | Forgoes the ~40-60% Reserved discount on steady-state workloads | Use Reserved for steady-state |
- EC2 Instance Connect: browser-based SSH. No SSH keys to manage (a temporary key is pushed to the instance). Requires inbound SSH (port 22) from the EC2 Instance Connect IP range.
- IMDSv2 uses token — PUT to get token, then use token in GET. Prevents SSRF.
- Placement Group Cluster: same AZ, low latency. Spread: different racks, max HA.
- Spot interruption: 2-minute notice. Lambda/EventBridge can handle the interruption.
- T instance bursting: CPU credits accumulate when idle, used when CPU > baseline. Unlimited mode: charges extra for sustained burst.
- On-Demand vs Reserved exam trick: If the workload runs 24/7 for a year or more → Reserved is cheaper.
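
The T-instance credit mechanics in the list above can be sketched as simple bookkeeping. One CPU credit equals one vCPU at 100% for one minute. The defaults below (2 vCPUs, 12 credits earned per hour) are assumed t3.micro figures; check them against the current AWS tables before relying on them. `credit_balance_after` is a hypothetical helper, and the maximum credit balance is not modeled.

```bash
#!/usr/bin/env bash
# Track a burstable instance's CPU credit balance over a number of hours.
# Defaults are assumed t3.micro values: 2 vCPUs, 12 credits earned per hour.
credit_balance_after() {
  local balance="$1" util_pct="$2" hours="$3"
  local vcpus="${4:-2}" earn_per_hour="${5:-12}"
  # Credits spent per hour = vCPUs x utilization x 60 minutes
  local spend_per_hour=$(( vcpus * util_pct * 60 / 100 ))
  balance=$(( balance + (earn_per_hour - spend_per_hour) * hours ))
  if (( balance < 0 )); then balance=0; fi   # standard mode: cannot go negative
  echo "$balance"
}

credit_balance_after 0 5 10    # idle at 5% for 10 h: credits accumulate → 60
credit_balance_after 60 50 1   # burst at 50% for 1 h: credits drain     → 12
credit_balance_after 60 50 2   # burst continues: balance exhausted      → 0
```

At 10% utilization the assumed instance spends exactly the 12 credits it earns per hour, which is what "baseline" means. Once the balance hits zero, standard mode throttles to baseline, while Unlimited mode keeps bursting and bills the surplus.
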
Q: ML training job needs cheapest compute for batch workload? → Spot Instances — can handle interruptions by checkpointing.
Q: Auto Scaling not scaling fast enough during a sudden traffic spike? → Use Predictive Scaling or Scheduled Scaling if the pattern is known; otherwise set a more aggressive step scaling policy.
Q: EC2 needs to call DynamoDB securely? → Attach an IAM Instance Profile with DynamoDB permissions.
Q: Need to run 7 instances always on separate physical hardware? → Spread Placement Group (max 7 per AZ per group).