RDS & Aurora — Managed Relational Databases

What Is It?

RDS is AWS's managed relational database service — AWS handles OS patching, backups, failover, and replication. You focus on your schema and queries.

Real-World: Running MySQL on EC2 means handling OS updates, disk provisioning, replication setup, and backup scripts yourself. With RDS, AWS takes on that operational work.


Supported Engines

| Engine | Best for |
|---|---|
| MySQL | Open-source apps, WordPress |
| PostgreSQL | Complex queries, JSON, extensions |
| MariaDB | MySQL-compatible, open-source |
| Oracle | Enterprise apps, legacy systems |
| SQL Server | .NET apps, Windows integration |
| Aurora MySQL | High-performance MySQL compatible |
| Aurora PostgreSQL | High-performance PostgreSQL compatible |

RDS Multi-AZ vs Read Replicas

| Feature | Multi-AZ | Read Replica |
|---|---|---|
| Purpose | High Availability | Performance/Scaling |
| Replication | Synchronous | Asynchronous |
| Failover | Automatic (< 2 min) | Manual promotion |
| Can read from standby? | No (standby is passive) | Yes (that's the point) |
| Cross-region? | No | Yes |
| DNS change on failover? | Yes (same DNS, new IP) | No (separate endpoint) |

Multi-AZ Failover

Primary DB (us-east-1a) → Synchronous replication → Standby (us-east-1b)
                                                              ↓
Primary fails → RDS DNS CNAME flips to standby → App reconnects

Same DNS name, ~60-120 second failover. Not instant — design apps for reconnection.
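Because the endpoint name stays the same while the DNS record moves, reconnection logic usually amounts to retrying the connect call with a backoff. A minimal sketch, assuming a hypothetical zero-argument `connect` callable that wraps your driver's connect function and raises `ConnectionError` while the failover is in progress (real drivers raise their own exception types):

```python
import time


def connect_with_retry(connect, attempts=10, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, backing off exponentially between tries.

    `connect` is a hypothetical zero-argument callable wrapping your DB driver;
    it is assumed to raise ConnectionError while the database is unreachable.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up after the final attempt
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

With these defaults the function keeps retrying for about two and a half minutes, which is longer than the 60-120 second failover window. Clients that cache DNS lookups for a long time may still need a shorter DNS TTL to pick up the new address.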

Read Replicas

Primary DB ← Writes only
Read Replica 1 ← Reads (reporting, analytics)
Read Replica 2 ← Reads (app read traffic)
Read Replica 3 in eu-west-1 ← Cross-region reads

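The long-standing limit is 5 read replicas per RDS instance (15 for Aurora). RDS for MySQL, MariaDB, and PostgreSQL now also allow up to 15; Oracle and SQL Server remain at 5.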

Promoting a Read Replica: Promotion turns the replica into a standalone, writable DB instance and stops replication from the source. This is useful for cross-region disaster recovery.
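Each read replica has its own endpoint, so the application has to decide where each statement goes. A minimal sketch of that routing, assuming made-up endpoint names and a deliberately naive rule that treats only plain `SELECT` statements as reads:

```python
import itertools


class EndpointRouter:
    """Send writes to the primary and spread reads across replica endpoints."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Naive classification: only statements starting with SELECT count as reads
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


# Hypothetical endpoint names, for illustration only
router = EndpointRouter(
    primary="mydb.example.internal",
    replicas=["mydb-replica-1.example.internal", "mydb-replica-2.example.internal"],
)
```

Since replication is asynchronous, a read that must see a write made a moment ago should still go to the primary. Statements such as `SELECT ... FOR UPDATE` would also need to be routed to the primary in a real implementation.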


RDS Backups

Automated Backups

  • Daily snapshot + transaction logs
  • Retention: 0-35 days
  • Point-in-time recovery to any second within retention window
  • Stored in S3 managed by AWS (not visible in your own buckets; backup storage up to the size of your provisioned database storage carries no extra charge)
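The retention setting determines how far back a point-in-time restore can reach. A rough sketch of that check, under the simplifying assumption that the window starts exactly `retention_days` before now and ends at the latest restorable time, which trails the present slightly (the 5-minute lag below is an illustrative value, not a guarantee):

```python
from datetime import datetime, timedelta, timezone


def can_restore_to(target, now, retention_days, latest_restorable_time):
    """Return True if `target` falls inside the point-in-time recovery window.

    Simplification: the window is taken to start `retention_days` before `now`
    and to end at `latest_restorable_time`.
    """
    if not 0 <= retention_days <= 35:
        raise ValueError("retention must be between 0 and 35 days")
    if retention_days == 0:
        return False  # automated backups disabled, so no point-in-time recovery
    earliest = now - timedelta(days=retention_days)
    return earliest <= target <= latest_restorable_time


now = datetime(2024, 6, 15, 12, 0, tzinfo=timezone.utc)
latest = now - timedelta(minutes=5)  # assumed lag, for illustration only
```

For example, with 7 days of retention a target three days back is restorable, while a target eight days back is not.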

Manual Snapshots

  • You trigger explicitly
  • Kept until you delete them (even after DB deleted)
  • Restore = creates new DB instance

Sharing snapshots: manual snapshots can be shared with other AWS accounts or made public. Encrypted snapshots cannot be made public.


RDS Encryption

  • Must enable at creation time (can't add later without restore)
  • Uses AWS KMS (a customer managed key or the AWS managed key)
  • Encrypted DB → encrypted snapshots → encrypted read replicas
  • Encrypting unencrypted DB: snapshot → copy snapshot with encryption → restore
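The snapshot, copy, restore sequence in the last bullet can be sketched against a boto3-style RDS client. This is an outline under assumptions rather than a production script: `rds` is expected to behave like `boto3.client("rds")`, and the snapshot naming scheme and identifiers are made up for illustration.

```python
def encrypt_existing_instance(rds, source_id, target_id, kms_key_id):
    """Snapshot -> encrypted copy -> restore, using a boto3-style RDS client.

    `rds` is expected to behave like boto3.client("rds"); the snapshot names
    derived below are illustrative assumptions.
    """
    plain_snap = f"{source_id}-plain"
    encrypted_snap = f"{source_id}-encrypted"
    waiter = rds.get_waiter("db_snapshot_available")

    # 1. Snapshot the unencrypted instance
    rds.create_db_snapshot(
        DBInstanceIdentifier=source_id, DBSnapshotIdentifier=plain_snap
    )
    waiter.wait(DBSnapshotIdentifier=plain_snap)

    # 2. Copy the snapshot, supplying a KMS key so the copy is encrypted
    rds.copy_db_snapshot(
        SourceDBSnapshotIdentifier=plain_snap,
        TargetDBSnapshotIdentifier=encrypted_snap,
        KmsKeyId=kms_key_id,
    )
    waiter.wait(DBSnapshotIdentifier=encrypted_snap)

    # 3. Restore the encrypted copy as a new DB instance
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=target_id, DBSnapshotIdentifier=encrypted_snap
    )
    return encrypted_snap
```

The restore creates a new instance with a new endpoint, so the application still has to be pointed at it and the old instance retired afterwards.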

RDS Proxy

Problem: Lambda functions each open their own DB connection. 1,000 concurrent Lambdas = 1,000 DB connections → MySQL max connections exceeded.

Solution: RDS Proxy pools connections.

1,000 Lambda instances → RDS Proxy (50 pooled connections) → RDS
Benefits:
- Reduces database load
- Reduces failover time (handles connection during Multi-AZ failover)
- Can enforce IAM authentication to the database
- Secrets Manager integration

RDS Proxy is highly recommended for Lambda + RDS.
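The pooling idea can be illustrated with a toy in-process pool in which many callers share a small, fixed set of connections. This is only a conceptual sketch: RDS Proxy is a managed service that does this between your functions and the database, and the "connections" below are placeholder objects.

```python
import queue
import threading


class ConnectionPool:
    """Tiny fixed-size pool: callers borrow a connection and hand it back."""

    def __init__(self, make_connection, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(make_connection())

    def run(self, work):
        conn = self._idle.get()  # blocks while every connection is in use
        try:
            return work(conn)
        finally:
            self._idle.put(conn)


opened = []  # every "connection" the pool ever opens


def make_connection():
    opened.append(object())  # stand-in for a real database connection
    return opened[-1]


pool = ConnectionPool(make_connection, size=5)
results = []
threads = [
    threading.Thread(target=lambda i=i: results.append(pool.run(lambda conn: i * 2)))
    for i in range(100)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All 100 callers complete their work, yet only 5 connections are ever opened, which is the effect the 1,000-to-50 figure above describes.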


Aurora — High Performance MySQL/PostgreSQL

Aurora is NOT just another RDS option — it's a completely re-architected cloud-native database.

Architecture

Aurora Cluster:
├── Primary Writer (1 instance — reads + writes)
├── Replica 1 (reads only)
├── Replica 2 (reads only)
└── Shared Storage (6 copies across 3 AZs — self-healing)
  • Storage: automatically grows in 10 GB increments up to 128 TB
  • Replicas: up to 15 Aurora Replicas per cluster
  • Failover: typically < 30 seconds (vs 60-120 s for RDS Multi-AZ)
  • Performance: up to 5x the throughput of standard MySQL and 3x that of standard PostgreSQL, according to AWS

Aurora Endpoints

| Endpoint | Use for |
|---|---|
| Cluster endpoint (Writer) | All writes + reads if you want |
| Reader endpoint (load-balanced) | Read traffic, automatically load-balanced across replicas |
| Instance endpoint | Direct to specific instance (for diagnostics) |
| Custom endpoint | You define which instances (e.g., larger instances for analytics) |

Aurora Serverless

No provisioned instances — scales automatically based on actual usage:

Low traffic → scales to 0 capacity (no compute charge; storage is still billed)
Traffic spike → scales up in seconds

Use for: Dev/test, infrequent or unpredictable workloads, new apps.

Aurora Serverless v2: Scales almost instantly in fine-grained increments, whereas v1 scaled in coarser steps with a noticeable delay.

Aurora Global Database

Multi-region, low-latency reads:

Primary Region (us-east-1): Writer + Readers
Secondary Region (eu-west-1): Readers only
Secondary Region (ap-southeast-1): Readers only

Replication lag: typically < 1 second
RPO: < 1 second
RTO: < 1 minute (promote secondary to primary)

Use for: Global apps, disaster recovery, compliance (data in specific region).


ElastiCache — In-Memory Caching

Redis vs Memcached

| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Strings, Lists, Sets, Sorted Sets, Hashes | Strings only |
| Persistence | Optional (RDB/AOF) | No |
| Replication | Yes (Multi-AZ) | No |
| Clustering | Yes (cluster mode) | Yes (horizontal) |
| Lua scripting | Yes | No |
| Pub/Sub | Yes | No |
| Use for | Caching + sessions + leaderboards + pub-sub | Simple caching, high throughput |

Caching Patterns

Cache-Aside (Lazy Loading) — most common:

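import json

# Assumes `redis` is a connected redis.Redis client and `db` is your database helper.
def get_user(user_id):
    key = f"user:{user_id}"

    # Try the cache first
    cached = redis.get(key)
    if cached is not None:
        return json.loads(cached)

    # Cache miss: load from the database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # Cache only rows that exist, with a 1-hour TTL
    if user is not None:
        redis.setex(key, 3600, json.dumps(user))
    return user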

Write-Through — update cache when DB is updated:

def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = ?", user_id)
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))  # Update cache too

Cache Invalidation — delete on update:

def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = ?", user_id)
    redis.delete(f"user:{user_id}")  # Next read will repopulate

Session Storage with Redis

# Store user session
redis.setex(f"session:{session_id}", 3600, json.dumps({
    'userId': 'user_123',
    'email': 'john@example.com',
    'role': 'admin'
}))

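# Retrieve session (the key is gone once the TTL expires, so handle None)
raw = redis.get(f"session:{session_id}")
session = json.loads(raw) if raw is not None else None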

Redis Sorted Sets (Leaderboard)

# Add/update player score
redis.zadd('leaderboard', {'player_123': 9500})
redis.zadd('leaderboard', {'player_456': 8200})
redis.zadd('leaderboard', {'player_789': 11000})

# Get top 10 players
top10 = redis.zrevrange('leaderboard', 0, 9, withscores=True)
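# [('player_789', 11000.0), ('player_123', 9500.0), ('player_456', 8200.0)]
# (scores come back as floats; members are bytes unless the client uses decode_responses=True)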

# Get player rank
rank = redis.zrevrank('leaderboard', 'player_123')  # 1 (0-indexed)

Good Practices

| Practice | Reason |
|---|---|
| Enable Multi-AZ for production RDS | Automatic failover, no manual intervention |
| Use RDS Proxy with Lambda | Prevents connection pool exhaustion |
| Use Aurora for new high-traffic apps | Better performance, faster failover |
| Set appropriate connection pool size | Avoid overwhelming DB with connections |
| Use Read Replicas for read-heavy workloads | Scale reads independently |
| Enable encryption at creation | Can't add later without data migration |
| Use parameter groups for DB config | Version-controlled DB configuration |
| Cache frequently-read data in ElastiCache | Reduce DB load, improve latency |

Bad Practices

| Anti-Pattern | Impact | Fix |
|---|---|---|
| Lambda connecting directly to RDS without proxy | Connection exhaustion at scale | Use RDS Proxy |
| Read traffic going to primary | Unnecessary load on writer | Use Read Replica endpoint |
| Storing sessions in RDS | High read load, latency | Use ElastiCache Redis for sessions |
| Not using Multi-AZ for production | Single point of failure | Enable Multi-AZ |
| Long-lived Lambda DB connections without proper handling | Stale connections, errors | Use connection pooling via RDS Proxy |

Exam Tips

  1. Multi-AZ = HA (High Availability). Read Replicas = scalability.
  2. Multi-AZ standby is NOT readable — it's a hot standby, not a read replica.
  3. Aurora Replicas act as both failover targets and read replicas: the same instance serves reads and can be promoted to writer.
  4. RDS automated backup retention: 0 (disabled) to 35 days.
  5. ElastiCache in VPC: not reachable from the public internet. Clients must be in the same VPC or connect through VPC peering, VPN, or Direct Connect.
  6. Redis vs Memcached for exam: If question mentions sessions, sorted sets, pub/sub, Multi-AZ → Redis. Simple key-value, multi-threaded, no persistence needed → Memcached.
  7. Aurora Global Database: the primary region handles writes; secondary regions handle reads. RPO < 1 second, RTO < 1 minute.

Common Exam Scenarios

Q: Lambda causes too many RDS connections? → Use RDS Proxy.

Q: Reduce RDS load from reporting queries? → Create a Read Replica and point reporting to it.

Q: App needs sub-millisecond read latency for product catalog? → Use ElastiCache (Redis or Memcached) as cache layer in front of RDS.

Q: Store user sessions across multiple EC2 instances? → ElastiCache Redis for centralized session storage.

Q: Database needs to survive AZ failure automatically? → Enable RDS Multi-AZ.

Q: Global app with low-latency reads in multiple regions? → Aurora Global Database with read replicas in each region.