S3 is AWS's object store with virtually unlimited capacity. Think of it as a giant key-value store where the key is the "path" (e.g., images/user123/avatar.jpg) and the value is the file bytes. It is object storage, not a file system: the slashes in a key are just characters, not real directories.
| Class | Use Case | Retrieval | Min Duration | Cost |
|---|---|---|---|---|
| Standard | Frequently accessed data | Instant | None | Highest |
| Standard-IA | Infrequent access, rapid retrieval | Instant | 30 days | Lower storage |
| One Zone-IA | IA data, tolerate AZ loss | Instant | 30 days | Cheaper |
| Intelligent-Tiering | Unknown access pattern | Instant | None | Per-monitoring fee |
| Glacier Instant | Archives, once a quarter | Instant | 90 days | Very low |
| Glacier Flexible | Archives, rarely accessed | 1-5 min to 12 hrs, by retrieval tier | 90 days | Lower |
| Glacier Deep Archive | 7-10 year compliance archives | 12-48 hours | 180 days | Cheapest |
Real-World: A media company stores raw video uploads in Standard, transcoded copies in Standard-IA after 30 days, and original masters in Glacier Deep Archive after 1 year. S3 Lifecycle policies automate this.
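The media-company tiering above could be written as a lifecycle configuration in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. This is a minimal sketch: the prefixes `transcoded/` and `masters/`, the rule IDs, the bucket name, and the helper `transition_rule` are assumptions for illustration.

```python
def transition_rule(rule_id, prefix, days, storage_class):
    """Build one lifecycle rule that moves objects under `prefix` after `days`."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


# Hypothetical prefixes; adjust to the bucket's real layout.
lifecycle_config = {
    "Rules": [
        transition_rule("TranscodedToIA", "transcoded/", 30, "STANDARD_IA"),
        transition_rule("MastersToDeepArchive", "masters/", 365, "DEEP_ARCHIVE"),
    ]
}

# Applying it would look roughly like this (needs boto3 and credentials):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="media-bucket", LifecycleConfiguration=lifecycle_config)
```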
<LifecycleConfiguration>
<Rule>
<ID>ArchiveOldLogs</ID>
<Status>Enabled</Status>
<Filter><Prefix>logs/</Prefix></Filter>
<Transition>
<Days>30</Days>
<StorageClass>STANDARD_IA</StorageClass>
</Transition>
<Transition>
<Days>90</Days>
<StorageClass>GLACIER</StorageClass>
</Transition>
<Expiration>
<Days>365</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>
Enabling versioning means every PUT creates a new version. A DELETE just adds a "delete marker"; the data is not lost.
Real-World: A document management system uses S3 versioning so users can restore previous versions of contracts.
bucket/contract.pdf → version 1 (original)
bucket/contract.pdf → version 2 (modified) ← current
bucket/contract.pdf → delete marker ← "deleted" but v1, v2 still there
MFA Delete: Requires MFA to permanently delete versions. Use for compliance buckets.
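The version stack and delete-marker behavior can be pictured with a toy in-memory model. This is only an illustration of the semantics, not how S3 is implemented; the class `VersionedKey` and its method names are made up for this sketch.

```python
class VersionedKey:
    """Toy in-memory model of one key in a versioning-enabled bucket."""

    DELETE_MARKER = object()

    def __init__(self):
        self.versions = []  # oldest first; the last entry is "current"

    def put(self, data):
        # Every PUT stacks a new version on top of the old ones.
        self.versions.append(data)

    def delete(self):
        # A plain DELETE only stacks a delete marker on top.
        self.versions.append(self.DELETE_MARKER)

    def get(self):
        if not self.versions or self.versions[-1] is self.DELETE_MARKER:
            return None  # S3 would answer 404 here
        return self.versions[-1]

    def remove_delete_marker(self):
        # Deleting the marker itself makes the previous version current again.
        if self.versions and self.versions[-1] is self.DELETE_MARKER:
            self.versions.pop()


contract = VersionedKey()
contract.put(b"v1 original")
contract.put(b"v2 modified")
contract.delete()                 # object now looks deleted
contract.remove_delete_marker()   # "undelete"
print(contract.get())  # b'v2 modified'
```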
| Method | Key Management | Performance | Use Case |
|---|---|---|---|
| SSE-S3 | AWS manages everything | Fast | Default, no compliance requirements |
| SSE-KMS | KMS manages keys, you control | API call per encrypt/decrypt | Audit trail needed, HIPAA/PCI |
| SSE-C | YOU provide key per request | Client sends key in header | Regulatory: you must hold key |
| Client-Side | You encrypt before upload | Client CPU overhead | Maximum control |
{
"Effect": "Deny",
"Principal": "*",
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::my-bucket/*",
"Condition": {
"StringNotEquals": {
"s3:x-amz-server-side-encryption": "aws:kms"
}
}
}
Any upload WITHOUT SSE-KMS is denied. This enforces encryption at the policy level.
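A small sketch of how that Deny condition evaluates, assuming the standard IAM behavior that a negated operator such as `StringNotEquals` also matches when the key is missing from the request. The function `put_is_denied` is a hypothetical stand-in for the policy engine, not an AWS API.

```python
def put_is_denied(request_headers):
    """Mimic the Deny statement: StringNotEquals on the SSE header.

    A missing header counts as "not equal", so an upload that sends
    no encryption header at all is denied as well.
    """
    sse = request_headers.get("x-amz-server-side-encryption")
    return sse != "aws:kms"


print(put_is_denied({}))                                            # True
print(put_is_denied({"x-amz-server-side-encryption": "AES256"}))    # True
print(put_is_denied({"x-amz-server-side-encryption": "aws:kms"}))   # False
```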
{
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
"Condition": {
"Bool": {"aws:SecureTransport": "false"}
}
}
Is there an explicit DENY? → YES → DENY (end)
↓ NO
Is it a cross-account request?
→ Bucket policy must allow + IAM must allow
Is it same-account request?
→ Bucket policy allows OR IAM allows (either is enough)
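The decision flow above can be sketched as a single function. This is a deliberately simplified model that covers only the three checks in the flow; real evaluation also involves SCPs, permission boundaries, session policies, and ACLs. The function name and parameters are hypothetical.

```python
def s3_access_allowed(explicit_deny, cross_account, bucket_policy_allows, iam_allows):
    """Simplified S3 authorization decision following the flow above."""
    if explicit_deny:
        return False  # an explicit Deny always wins
    if cross_account:
        # Both sides must agree for cross-account requests.
        return bucket_policy_allows and iam_allows
    # Same account: either policy is enough.
    return bucket_policy_allows or iam_allows


# Cross-account request where only the bucket policy allows it
print(s3_access_allowed(False, True, True, False))   # False
# Same-account request where only the IAM policy allows it
print(s3_access_allowed(False, False, False, True))  # True
```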
- Bucket Policy: Resource-based, JSON, grants access to accounts/users/public
- ACL: Legacy, grant to canonical user IDs, avoid for new setups
- IAM Policy: Identity-based, controls what your users/roles can do to S3
Best Practice: Use bucket policies for cross-account and public access. Use IAM policies for your own users.
Problem: Your app needs to let users upload a profile picture directly to S3 without going through your server.
Solution: Generate a presigned URL server-side, give it to the client.
import boto3

s3 = boto3.client('s3')
user_id = 'user123'  # example value; normally taken from the authenticated session

# Generate presigned upload URL (PUT)
url = s3.generate_presigned_url(
    'put_object',
    Params={
        'Bucket': 'user-uploads',
        'Key': f'profiles/{user_id}/avatar.jpg',
        'ContentType': 'image/jpeg'
    },
    ExpiresIn=3600  # 1 hour
)
# Send URL to client; they PUT directly to S3
Key Facts:
- Inherits the permissions of the IAM entity that generated it
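- Max expiry: 7 days with SigV4 (604800 seconds); a URL signed with temporary credentials stops working when those credentials expire, even if its own expiry is later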
- Great for: downloads of private files, direct-to-S3 uploads
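On the client side, the upload is a plain HTTP PUT to the presigned URL. A minimal sketch using only the standard library follows; the URL is a placeholder, and `build_upload_request` is a hypothetical helper. One assumption worth checking in practice: because `ContentType` was included when the URL was signed, the client is expected to send the same `Content-Type` header, otherwise S3 may reject the signature.

```python
import urllib.request


def build_upload_request(presigned_url, image_bytes):
    """Prepare (but do not send) the PUT a client would issue."""
    return urllib.request.Request(
        presigned_url,
        data=image_bytes,
        method="PUT",
        # Should match the ContentType that was signed into the URL.
        headers={"Content-Type": "image/jpeg"},
    )


# Placeholder URL for illustration; a real one comes from generate_presigned_url.
placeholder_url = (
    "https://user-uploads.s3.amazonaws.com/profiles/user123/avatar.jpg"
    "?X-Amz-Signature=placeholder"
)
req = build_upload_request(placeholder_url, b"\xff\xd8fake-jpeg-bytes")
# urllib.request.urlopen(req) would perform the actual upload.
```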
Trigger Lambda/SQS/SNS when objects are created/deleted.
Real-World: Image upload triggers Lambda to resize and create thumbnails.
User uploads image → S3 → S3 Event Notification → Lambda
→ Resize image → store thumbnail back in S3
Limitation: S3 event notifications are delivered at least once, not exactly once, so duplicates are possible. Use SQS as a buffer for reliable processing and make the consumer idempotent.
S3 EventBridge integration: For more advanced filtering and routing, send S3 events to EventBridge → route to multiple targets.
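Because duplicates are possible, a handler is safer when it is idempotent. The sketch below parses the standard S3 notification record layout and skips records it has already seen. The in-memory set `_seen` is a stand-in for a durable store, and using the `sequencer` field as part of the dedupe key is an assumption of this example.

```python
from urllib.parse import unquote_plus

_seen = set()  # stand-in for a durable store such as a database table


def handler(event, context=None):
    """Return the (bucket, key) pairs that have not been processed before."""
    fresh = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification payload.
        key = unquote_plus(s3_info["object"]["key"])
        dedupe_id = (bucket, key, s3_info["object"].get("sequencer"))
        if dedupe_id in _seen:
            continue  # duplicate delivery; skip it
        _seen.add(dedupe_id)
        fresh.append((bucket, key))  # resize / thumbnail work would go here
    return fresh
```

Calling the handler twice with the same event returns the object the first time and an empty list the second time.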
Use multipart upload for files larger than 100 MB; it is required above 5 GB:
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')
# upload_file switches to multipart automatically above multipart_threshold
config = TransferConfig(multipart_threshold=100 * 1024 * 1024)  # 100 MB
s3.upload_file('large-file.zip', 'bucket', 'large-file.zip', Config=config)
Lifecycle rule to clean up incomplete uploads:
<AbortIncompleteMultipartUpload>
<DaysAfterInitiation>7</DaysAfterInitiation>
</AbortIncompleteMultipartUpload>
Transfer Acceleration speeds up cross-region uploads by routing them through CloudFront edge locations.
Real-World: Users in Australia uploading to a US-East bucket — enable Transfer Acceleration, uploads route through Sydney edge → faster.
URL format: bucket.s3-accelerate.amazonaws.com
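A tiny helper that derives the accelerate hostname from a bucket name, as a sketch. The function name is hypothetical; the check reflects the requirement that accelerated buckets have DNS-compliant names without periods.

```python
def accelerate_endpoint(bucket):
    """Return the Transfer Acceleration hostname for a bucket."""
    if "." in bucket:
        # Acceleration needs a DNS-compliant bucket name without periods.
        raise ValueError("bucket names containing '.' cannot use acceleration")
    return f"{bucket}.s3-accelerate.amazonaws.com"


print(accelerate_endpoint("user-uploads"))  # user-uploads.s3-accelerate.amazonaws.com

# With boto3, the same endpoint is selected through client configuration:
# from botocore.config import Config
# s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
```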
Problem: Your JavaScript app at app.example.com calls S3 to load images. The browser blocks the request with a CORS error. Fix: attach a CORS configuration to the bucket that allows that origin:
[{
"AllowedHeaders": ["*"],
"AllowedMethods": ["GET", "PUT"],
"AllowedOrigins": ["https://app.example.com"],
"ExposeHeaders": ["ETag"],
"MaxAgeSeconds": 3000
}]
| Type | Cross-Region | Same-Region | Use Case |
|---|---|---|---|
| CRR | Yes | No | Disaster recovery, latency |
| SRR | No | Yes | Log aggregation, compliance copy |
Requirements: Both the source and destination buckets must have versioning enabled.
Key: Replication is asynchronous — not instant. Existing objects NOT replicated automatically (use S3 Batch Operations).
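A replication rule can be sketched in the dictionary shape that boto3's `put_bucket_replication` takes. The role ARN, bucket names, rule ID, and both helper functions are placeholders for illustration; the versioning check mirrors the requirement that both buckets have versioning enabled.

```python
def replication_ready(source_versioning, destination_versioning):
    """Both buckets need versioning status 'Enabled' before replication works."""
    return source_versioning == "Enabled" and destination_versioning == "Enabled"


def replication_config(role_arn, destination_bucket, prefix=""):
    """Build a single-rule replication configuration."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [{
            "ID": "ReplicateNewObjects",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},  # empty prefix = whole bucket
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
        }],
    }


# Hypothetical role and bucket names.
config = replication_config(
    "arn:aws:iam::111122223333:role/replication-role", "dr-copy-bucket"
)
# boto3.client("s3").put_bucket_replication(
#     Bucket="source-bucket", ReplicationConfiguration=config)
```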
S3 Object Lock, for regulatory compliance (SEC Rule 17a-4, FINRA):
- Governance Mode: Users with the s3:BypassGovernanceRetention permission can override
- Compliance Mode: No one (not even root) can delete/modify before retention expires
from datetime import datetime

s3.put_object_retention(
    Bucket='compliance-bucket',
    Key='audit-log-2024.gz',
    Retention={
        'Mode': 'COMPLIANCE',
        'RetainUntilDate': datetime(2031, 1, 1)
    }
)
- 3,500 PUT/COPY/POST/DELETE per prefix per second
- 5,500 GET/HEAD per prefix per second
Real-World Problem: Your app was uploading everything to uploads/ prefix — hitting rate limits.
Fix: Spread across prefixes: uploads/a/, uploads/b/, uploads/c/ — 3x the throughput.
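One way to spread keys across prefixes is to derive a shard from a hash of the file name. This is a sketch under assumptions: the `uploads/` layout, the shard count of 16, and the function name are all illustrative choices.

```python
import hashlib


def sharded_key(filename, shards=16):
    """Place a file under uploads/<shard>/ using a stable hash of its name."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % shards
    return f"uploads/{shard:02x}/{filename}"


key = sharded_key("avatar-123.jpg")
# The same name always maps to the same prefix, so reads can recompute it.
assert key == sharded_key("avatar-123.jpg")
```

The trade-off is that listing objects in upload order becomes harder, since related files are scattered across prefixes.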
Instead of downloading a 1GB CSV to find 10 rows, query inside S3:
response = s3.select_object_content(
    Bucket='data-lake',
    Key='sales-2024.csv',
    ExpressionType='SQL',
    Expression="SELECT * FROM s3object WHERE region = 'us-east'",
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization={'CSV': {}}
)
Reduces data transfer and Lambda cost dramatically.
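The response from `select_object_content` is an event stream rather than a plain body, so the rows have to be collected from its `Records` events. A minimal sketch follows; `collect_select_rows` is a hypothetical helper, and `fake_response` is a hand-built stand-in for a real response so the example runs without AWS access.

```python
def collect_select_rows(response):
    """Concatenate the Records chunks from a select_object_content response."""
    chunks = []
    for event in response["Payload"]:
        # Other event types (Stats, Progress, End) carry no row data.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join before decoding so multi-byte characters split across chunks survive.
    return b"".join(chunks).decode("utf-8")


# Hand-built stand-in for the event stream, for illustration only.
fake_response = {"Payload": [
    {"Records": {"Payload": b"us-east,100\n"}},
    {"Stats": {"Details": {"BytesScanned": 42}}},
    {"Records": {"Payload": b"us-east,250\n"}},
    {"End": {}},
]}
rows = collect_select_rows(fake_response)
```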
| Practice | Reason |
|---|---|
| Block Public Access at account level | Prevents accidental data exposure |
| Enable S3 Access Logs | Audit trail for compliance |
| Enable versioning on critical buckets | Accidental delete protection |
| Use Lifecycle policies | Automatic cost optimization |
| Use VPC Endpoints for S3 | Traffic stays in AWS network, no NAT gateway costs |
| Enforce SSE-KMS via bucket policy | Compliance + audit trail in CloudTrail |
| Use presigned URLs for user uploads | Never expose AWS credentials to clients |
| Enable MFA Delete on sensitive buckets | Prevent accidental/malicious permanent deletion |
| Anti-Pattern | Impact | Fix |
|---|---|---|
| "Principal": "*" with no conditions | Public bucket = data breach | Add conditions or use presigned URLs |
| Storing credentials in S3 public bucket | Game over — credentials stolen | Use Secrets Manager |
| Not enabling versioning on app assets | Can't recover from accidental delete | Enable versioning + lifecycle |
| Using ACLs for access control | ACLs are legacy, hard to audit | Use bucket policies + IAM |
| One prefix for all uploads | Rate limit bottleneck | Distribute across multiple prefixes |
| Not cleaning up incomplete multipart uploads | Stealth storage cost | Add lifecycle rule to abort after N days |
- S3 has been strongly consistent since Dec 2020: reads after PUTs (new objects and overwrites), DELETEs, and LIST operations all reflect the latest write. Exam may have old questions about eventual consistency; the answer is "strong consistency" now.
- Bucket names are globally unique across all AWS accounts.
- S3 does NOT support append operations — you must rewrite the whole object.
- ACLs are legacy — exam will push you toward bucket policies.
- Cross-account S3 access: BOTH the bucket policy AND the IAM policy must allow it.
- Glacier Flexible Retrieval types: Expedited (1-5 min), Standard (3-5 hours), Bulk (5-12 hours).
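- Server Access Logging vs CloudTrail: Server Access Logs = detailed per-request records delivered to a bucket on a best-effort basis. CloudTrail = management events by default, plus object-level data events if enabled.
- Static website hosting: a CORS configuration on the bucket is required when browser JS served from a different origin calls the bucket directly.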
Q: Cheapest way to store backups accessed once a year? → S3 Glacier Deep Archive
Q: S3 objects deleted accidentally, how to protect? → Enable versioning + MFA Delete
Q: Lambda processes S3 uploads but sometimes misses events? → Add SQS between S3 and Lambda as event buffer (Dead Letter Queue on SQS)
Q: How to allow a different AWS account to access your S3 bucket?
→ Add bucket policy with Principal: {"AWS": "arn:aws:iam::OTHER_ACCOUNT:root"}
Q: Developer wants to enforce all uploads use KMS encryption?
→ Bucket policy with Deny if s3:x-amz-server-side-encryption != aws:kms
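The cross-account answer above can be expanded into a full bucket policy document. This is a sketch assuming read-only access is wanted; the action list, the `Sid`, the account ID, and the function name are illustrative choices. The other account still needs a matching IAM policy on its side.

```python
import json


def cross_account_read_policy(bucket, other_account_id):
    """Bucket policy letting another account list and read objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowOtherAccountRead",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{other_account_id}:root"},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            # ListBucket applies to the bucket ARN, GetObject to the objects.
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        }],
    }


policy = cross_account_read_policy("my-bucket", "111122223333")
policy_json = json.dumps(policy, indent=2)
# boto3.client("s3").put_bucket_policy(Bucket="my-bucket", Policy=policy_json)
```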