DynamoDB is AWS's fully managed, serverless NoSQL database. No patching, no cluster management. It's a key-value + document store that scales from zero to 10+ million requests per second automatically.
Real-World: Every Amazon.com shopping cart, every Amazon Prime order lookup — DynamoDB handles millions of reads per second with single-digit millisecond latency.
- Table: like a collection in MongoDB, or a table in SQL (without a fixed schema)
- Item: a row (up to 400KB)
- Attribute: a field/column (no schema enforced except keys)
Option 1: Partition Key only (Simple PK)
Table: Users
PK: userId (e.g., "user_123")
All data with same partition key lives in the same physical partition.
Option 2: Partition Key + Sort Key (Composite PK)
Table: Orders
PK: customerId (e.g., "cust_456")
SK: orderDate#orderId (e.g., "2024-01-15#ord_789")
Now you can query: "all orders for customer 456 in January 2024" — one efficient query.
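That query can be sketched as low-level Query parameters (a sketch, not the only way to write it: `orderKey` is a hypothetical attribute name for the `orderDate#orderId` sort key, and `begins_with` on the sort key narrows the range to January 2024):

```python
def january_orders_query(customer_id: str) -> dict:
    """Build Query parameters for 'all orders for a customer in January 2024'.

    Low-level DynamoDB API shape; you would pass this dict to
    boto3's client.query(**params). Table/attribute names are illustrative.
    """
    return {
        "TableName": "Orders",
        "KeyConditionExpression": "customerId = :c AND begins_with(orderKey, :prefix)",
        "ExpressionAttributeValues": {
            ":c": {"S": customer_id},
            # Matches sort-key values like "2024-01-15#ord_789"
            ":prefix": {"S": "2024-01"},
        },
    }

params = january_orders_query("cust_456")
```

Because the sort key starts with a sortable date, a single `begins_with` prefix selects the whole month — no filter expression, no scan.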
- You specify Read Capacity Units (RCUs) and Write Capacity Units (WCUs)
- 1 RCU = 1 strongly consistent read per second for items up to 4KB
- 1 WCU = 1 write per second for items up to 1KB
- Use when: traffic is predictable. Can use Auto Scaling.
- Pay per request — no capacity planning
- Scales automatically, immediately
- Use when: unpredictable spikes, new apps, dev/test
Real-World: An e-commerce site uses Provisioned with Auto Scaling for steady traffic, but switches to On-Demand the week of Black Friday.
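The two billing modes differ by one `CreateTable` parameter. A minimal sketch (table and key names are illustrative):

```python
def table_params(on_demand: bool) -> dict:
    """Build CreateTable parameters for On-Demand vs Provisioned billing."""
    params = {
        "TableName": "Orders",
        "KeySchema": [
            {"AttributeName": "customerId", "KeyType": "HASH"},   # partition key
            {"AttributeName": "orderDate", "KeyType": "RANGE"},   # sort key
        ],
        "AttributeDefinitions": [
            {"AttributeName": "customerId", "AttributeType": "S"},
            {"AttributeName": "orderDate", "AttributeType": "S"},
        ],
    }
    if on_demand:
        params["BillingMode"] = "PAY_PER_REQUEST"  # no capacity planning
    else:
        params["BillingMode"] = "PROVISIONED"
        params["ProvisionedThroughput"] = {
            "ReadCapacityUnits": 10,   # example values
            "WriteCapacityUnits": 5,
        }
    return params
```

Switching an existing table between modes is an `UpdateTable` call with the same `BillingMode` field.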
| Type | RCUs used | When to use |
|---|---|---|
| Eventually Consistent | 0.5 RCU | Default — may return stale data (usually consistent within a second) |
| Strongly Consistent | 1 RCU | Need latest data (e.g., payment confirmation) |
| Transactional | 2 RCUs | Multi-item ACID transactions |
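The RCU arithmetic above can be made concrete: round the item size up to the next 4KB block, then apply the per-mode factor. A sketch:

```python
import math

# Per-read multipliers from the table above
RCU_FACTOR = {"eventual": 0.5, "strong": 1.0, "transactional": 2.0}

def rcus_per_read(item_size_bytes: int, mode: str = "eventual") -> float:
    """RCUs consumed by a single read of an item of the given size."""
    blocks = math.ceil(item_size_bytes / 4096)  # billed in 4KB blocks
    return blocks * RCU_FACTOR[mode]
```

So a 6KB item costs 1 RCU eventually consistent (2 blocks x 0.5) but 4 RCUs transactional.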
- Same partition key, different sort key
- Must be created at table creation time
- Uses the table's RCU/WCU
- Max 5 per table
Table: Orders (PK: customerId, SK: orderDate)
LSI: customerId (PK) + orderStatus (SK)
→ "All PENDING orders for customer 456"
- Different partition key AND/OR sort key
- Can be created anytime
- Has its own RCU/WCU (common source of throttling!)
- Max 20 per table (default quota)
Table: Orders (PK: customerId, SK: orderDate)
GSI: orderStatus (PK) + orderDate (SK)
→ "All PENDING orders across ALL customers in January"
Hot Tip: GSIs are how you enable flexible querying on non-key attributes. Design GSIs carefully — they're the most common DynamoDB exam topic.
| Operation | How it works | RCUs used | Performance |
|---|---|---|---|
| Query | Uses PK (required) + SK (optional) | Only items returned | Fast — O(log n) |
| Scan | Reads every item in table | Every item in table | Slow — O(n), avoid! |
Real-World Anti-Pattern: Team queries a 50M item table using Scan with a filter expression to find users by email. Every query reads all 50M items, burns thousands of RCUs.
Fix: Create a GSI with email as the partition key.
# BAD - Scan
response = table.scan(
FilterExpression=Attr('email').eq('john@example.com')
)
# GOOD - Query on GSI
response = table.query(
IndexName='email-index',
KeyConditionExpression=Key('email').eq('john@example.com')
)

What: DynamoDB Streams is a changelog of every item modification (INSERT, MODIFY, REMOVE). Events appear in near real time and are retained for 24 hours.
StreamViewType options:
- KEYS_ONLY: just the PK/SK of the changed item
- NEW_IMAGE: the entire item after the change
- OLD_IMAGE: the entire item before the change
- NEW_AND_OLD_IMAGES: both (most useful)
Real-World Use Cases:
- Audit trail: Lambda reads stream, writes to S3/CloudWatch
- Real-time analytics: Stream → Lambda → OpenSearch (formerly Elasticsearch)
- Cross-region replication: DynamoDB Global Tables use streams internally
- Event sourcing: User profile update → stream → Lambda → send welcome email
DynamoDB Write → DynamoDB Stream → Lambda (EventSourceMapping) → Process event
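The Lambda side of that pipeline can be sketched as a plain handler (the `Records` / `eventName` / `dynamodb` fields are the documented stream event shape; what you do per event — write to S3, OpenSearch, etc. — is up to you):

```python
def handler(event, context=None):
    """Sketch of a Lambda handler invoked by a DynamoDB Streams
    event source mapping. Counts changes by type as a placeholder
    for real processing."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        counts[record["eventName"]] += 1
        # With NEW_AND_OLD_IMAGES, record["dynamodb"] carries
        # "NewImage" and/or "OldImage" in DynamoDB JSON format.
    return counts
```

The event source mapping polls the stream shards for you; the handler only ever sees batches of records.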
DynamoDB supports ACID transactions across multiple items (even across tables):
import boto3

client = boto3.client('dynamodb')

# Low-level client API: method is transact_write_items, and
# attribute values use the typed format ({'S': ...}, {'N': ...})
client.transact_write_items(
    TransactItems=[
        {
            'Update': {
                'TableName': 'accounts',
                'Key': {'accountId': {'S': 'acc_A'}},
                'UpdateExpression': 'SET balance = balance - :amount',
                'ConditionExpression': 'balance >= :amount',
                'ExpressionAttributeValues': {':amount': {'N': '100'}}
            }
        },
        {
            'Update': {
                'TableName': 'accounts',
                'Key': {'accountId': {'S': 'acc_B'}},
                'UpdateExpression': 'SET balance = balance + :amount',
                'ExpressionAttributeValues': {':amount': {'N': '100'}}
            }
        }
    ]
)

Cost: Transactional reads/writes cost 2x the normal RCUs/WCUs.
Problem: Two users update the same item simultaneously — last write wins, data lost.
Solution: Use version attribute for optimistic locking.
# Only update if version matches what we last read
table.update_item(
Key={'itemId': 'item_123'},
UpdateExpression='SET price = :newPrice, version = version + :inc',
ConditionExpression='version = :expectedVersion',
ExpressionAttributeValues={
':newPrice': Decimal('29.99'),
':inc': 1,
':expectedVersion': 5
}
)
# Throws ConditionalCheckFailedException if someone else updated first

Real-World: E-commerce inventory system — prevents overselling when two customers try to buy the last item.
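The usual response to that exception is a read-retry loop. A generic sketch, with a hypothetical `ConflictError` standing in for botocore's `ConditionalCheckFailedException` and `attempt_update` standing in for the re-read plus conditional `update_item`:

```python
class ConflictError(Exception):
    """Stand-in for botocore's ConditionalCheckFailedException."""

def update_with_retry(attempt_update, max_attempts: int = 3):
    """Optimistic locking loop: when the conditional check fails,
    re-read the item (inside attempt_update) and try again."""
    for _ in range(max_attempts):
        try:
            return attempt_update()
        except ConflictError:
            continue  # someone else won the race; retry with fresh version
    raise RuntimeError("gave up after repeated write conflicts")
```

In real code `attempt_update` would call `get_item` for the current version, then issue the conditional `update_item` shown above.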
Automatically delete items after a timestamp. No extra charge.
import time
table.put_item(Item={
'sessionId': 'sess_abc',
'userId': 'user_123',
'data': {...},
'ttl': int(time.time()) + 3600 # expires in 1 hour
})

Real-World: Session tokens, rate limiting counters, temporary cache items, cart abandonment tracking.
Key: TTL deletion is not immediate — can take up to 48 hours. Items past TTL might still be readable briefly. Filter them client-side if exactness matters.
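That client-side filter is a one-liner. A sketch, assuming items carry the expiry in a `ttl` attribute as a Unix timestamp (as in the example above):

```python
import time

def drop_expired(items, now=None):
    """Filter out items whose TTL has passed but that DynamoDB
    hasn't physically deleted yet. Items without a ttl attribute
    never expire."""
    now = int(time.time()) if now is None else now
    return [item for item in items if item.get("ttl", float("inf")) > now]
```

Run query/scan results through this before using them if stale-but-not-yet-deleted items would be a correctness problem.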
In-memory cache specifically for DynamoDB — reduces read latency from milliseconds to microseconds.
App → DAX (in-memory) → DynamoDB (on miss)
Use When:
- Read-heavy workloads (leaderboards, product catalog)
- Same query repeated thousands of times per second
- Not a good fit for write-heavy or strongly consistent reads (DAX serves eventually consistent only)
NOT a fix for: Write bottlenecks, complex analytics (use Athena/Redshift).
DynamoDB Global Tables = multi-region, multi-active replication.
us-east-1  ←→  eu-west-1  ←→  ap-southeast-1
 Reads &        Reads &         Reads &
 Writes         Writes          Writes
Real-World: A global gaming leaderboard where players worldwide see near-real-time updates. Each region writes locally, changes replicate globally in ~1 second.
Conflict resolution: "Last writer wins" — based on timestamp.
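"Last writer wins" reduces to comparing timestamps on the conflicting writes. An illustrative sketch (`updatedAt` is a hypothetical timestamp attribute, not something DynamoDB exposes to you — Global Tables does this internally):

```python
def resolve_conflict(item_a: dict, item_b: dict) -> dict:
    """'Last writer wins': the replica write with the later
    timestamp survives; the other write is silently discarded."""
    return item_a if item_a["updatedAt"] >= item_b["updatedAt"] else item_b
```

The practical consequence: concurrent writes to the same item in two regions can silently lose one of the writes, so route writes for a given item to one region when that matters.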
Table: GameScores
PK: gameId = "fortnite" ← ALL writes go to one partition!
All writes for the most popular game hit one partition → throttled.
gameId = "fortnite_1", "fortnite_2", ..., "fortnite_10"
Scatter writes across 10 partitions.
PK: gameId#date (e.g., "fortnite#2024-01-15")
Distributes by day.
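The suffix strategy can be sketched in a few lines: writers pick a random shard, and readers must fan out across all shards and merge (shard count of 10 as in the example; `gameId` values are illustrative):

```python
import random

NUM_SHARDS = 10

def sharded_write_key(game_id: str) -> str:
    """Pick a random shard so writes spread across NUM_SHARDS partitions."""
    return f"{game_id}_{random.randint(1, NUM_SHARDS)}"

def all_shard_keys(game_id: str) -> list:
    """Readers must query every shard key and merge the results."""
    return [f"{game_id}_{n}" for n in range(1, NUM_SHARDS + 1)]
```

The trade-off is explicit: writes scale by roughly the shard count, but every read becomes N queries instead of one.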
| Practice | Reason |
|---|---|
| Design for your access patterns first | LSIs can't be added after table creation; GSIs can, but backfilling them costs extra |
| Use composite keys (PK+SK) for complex queries | Enables range queries, pagination, flexible filtering |
| Keep items under 400KB | Hard limit; large items hurt performance |
| Use sparse indexes | Only items with the indexed attribute appear in GSI → lower cost |
| Set TTL on temporary data | Automatic cleanup, no Lambda needed |
| Use Condition Expressions | Prevents data races, implements optimistic locking |
| Use DynamoDB Streams for async workflows | Decouple write → process pattern |
| Batch reads/writes | BatchGetItem (up to 100 items) / BatchWriteItem (up to 25 put/delete requests) — fewer round trips |
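Since BatchWriteItem caps each call at 25 put/delete requests, callers usually chunk their workload. A minimal sketch:

```python
def chunk_write_requests(requests, batch_size=25):
    """Split a list of put/delete requests into BatchWriteItem-sized
    chunks (25 per call is the service limit)."""
    return [requests[i:i + batch_size] for i in range(0, len(requests), batch_size)]
```

Each chunk becomes one BatchWriteItem call; a production version would also retry any `UnprocessedItems` the response returns.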
| Anti-Pattern | Impact | Fix |
|---|---|---|
| Using Scan instead of Query | Reads entire table = expensive + slow | Create GSI for the access pattern |
| Hot partition key (user_id on viral content) | Throttling on popular items | Add shard suffix to partition key |
| Storing large blobs in DynamoDB | 400KB limit, high cost | Store in S3, keep S3 key in DynamoDB |
| Using DynamoDB as a relational DB | Poor query flexibility, high cost | Use RDS if you need joins |
| Creating too many GSIs | Each GSI costs WCUs to maintain | Limit GSIs, consolidate with smart SK design |
| Not using Condition Expressions | Race conditions, double-writes | Always use conditions for critical writes |
- 400KB per item — hard limit. Exam scenario: "item too large" → store in S3.
- GSI throttling is separate from table throttling — GSI needs its own capacity.
- LSI uses table capacity, GSI has its own capacity.
- Query requires Partition Key — you can't query without it.
- Scan + FilterExpression still reads all items (billing/performance hit) — filter happens AFTER reading.
- DynamoDB is schema-less — different items can have different attributes.
- Strongly consistent reads consume 2x the RCUs of eventually consistent reads.
- Transact operations support up to 100 items per transaction (older materials cite the original limit of 25).
- BatchWriteItem doesn't support Update — use for Put and Delete only.
- DynamoDB Streams + Lambda = most common exam serverless pattern.
Q: Query DynamoDB by an attribute that is not the PK? → Create a GSI with that attribute as partition key.
Q: Lambda processes DynamoDB changes in real-time? → Enable DynamoDB Streams, use Lambda as stream processor.
Q: Prevent two users from updating the same item simultaneously? → Use Condition Expressions with a version number (optimistic locking).
Q: DynamoDB latency too high for a gaming leaderboard? → Add DAX cluster in front.
Q: Store user sessions that expire after 24 hours? → Add TTL attribute with Unix timestamp.
Q: One partition key getting all the traffic? → Add random suffix (write sharding) or redesign partition key.