S3 — Simple Storage Service

What Is It?

S3 is AWS's object store with virtually unlimited capacity. Think of it as a giant key-value store where the key is the "path" (e.g., images/user123/avatar.jpg) and the value is the file bytes. It's not a file system: folders are only a naming convention on keys, and objects are written and read whole.
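To make the key-value point concrete, here is a toy model of how a listing with a delimiter turns flat keys into folder-like groups. It is a sketch of the idea only, not the S3 API; the function name `list_with_delimiter` and the sample keys are made up for illustration.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Toy model of a delimiter-based listing over flat keys.

    Returns (objects directly under prefix, folder-like common prefixes).
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter looks like a "folder"
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


keys = [
    "images/user123/avatar.jpg",
    "images/user456/avatar.jpg",
    "images/logo.png",
    "readme.txt",
]
print(list_with_delimiter(keys, prefix="images/"))
```

No directory is stored anywhere in this model: the "folders" come purely from how the keys are split at listing time.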

Core Concepts

Storage Classes — Choose Based on Access Pattern

Class	Use Case	Retrieval	Min Duration	Cost
Standard	Frequently accessed data	Instant	None	Highest
Standard-IA	Infrequent access, rapid retrieval	Instant	30 days	Lower storage
One Zone-IA	IA data, tolerate AZ loss	Instant	30 days	Cheaper
Intelligent-Tiering	Unknown access pattern	Instant	None	Per-monitoring fee
Glacier Instant	Archives accessed about once a quarter	Instant	90 days	Very low
Glacier Flexible	Archives that can wait minutes to hours	Minutes to hours (1-5 min up to 12 hrs)	90 days	Lower
Glacier Deep Archive	7-10 year compliance archives	12-48 hours	180 days	Cheapest

Real-World: A media company stores raw video uploads in Standard, transcoded copies in Standard-IA after 30 days, and original masters in Glacier Deep Archive after 1 year. S3 Lifecycle policies automate this.

Lifecycle Policies — Automate Tiering

<LifecycleConfiguration>
  <Rule>
    <ID>ArchiveOldLogs</ID>
    <Status>Enabled</Status>
    <Filter><Prefix>logs/</Prefix></Filter>
    <Transition>
      <Days>30</Days>
      <StorageClass>STANDARD_IA</StorageClass>
    </Transition>
    <Transition>
      <Days>90</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <Expiration>
      <Days>365</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
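The same rule can be expressed as the dictionary that boto3's `put_bucket_lifecycle_configuration` accepts. This is a minimal sketch: the helper names `archive_logs_rule` and `apply_lifecycle` are illustrative, and the client is passed in so the rule can be inspected without touching AWS.

```python
def archive_logs_rule():
    """Return the ArchiveOldLogs rule as a boto3-style dictionary."""
    return {
        "ID": "ArchiveOldLogs",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 365},
    }


def apply_lifecycle(s3_client, bucket):
    """Replace the bucket's lifecycle configuration with the single rule above."""
    return s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [archive_logs_rule()]},
    )
```

A call would look like `apply_lifecycle(boto3.client("s3"), "my-bucket")`. The call overwrites the whole configuration, so any existing rules have to be included in the list.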

Versioning

With versioning enabled, every PUT to an existing key creates a new version instead of overwriting the old one. A DELETE without a version ID only adds a "delete marker", so earlier versions are not lost.

Real-World: A document management system uses S3 versioning so users can restore previous versions of contracts.

bucket/contract.pdf  → version 1 (original)
bucket/contract.pdf  → version 2 (modified)  ← current
bucket/contract.pdf  → delete marker          ← "deleted" but v1, v2 still there

MFA Delete: Requires MFA to permanently delete versions. Use for compliance buckets.
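Restoring a "deleted" object comes down to removing its delete marker. The sketch below assumes a boto3 S3 client is passed in; the helper names `enable_versioning` and `undelete` are illustrative, and pagination of the version listing is ignored for brevity.

```python
def enable_versioning(s3_client, bucket):
    """Turn on versioning for a bucket."""
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )


def undelete(s3_client, bucket, key):
    """Remove the current delete marker for key, if there is one.

    Returns True when a marker was removed, which makes the most recent
    real version current again.
    """
    listing = s3_client.list_object_versions(Bucket=bucket, Prefix=key)
    for marker in listing.get("DeleteMarkers", []):
        if marker["Key"] == key and marker["IsLatest"]:
            s3_client.delete_object(
                Bucket=bucket, Key=key, VersionId=marker["VersionId"]
            )
            return True
    return False
```

Deleting with an explicit `VersionId` is a permanent delete of that version, which is exactly the operation MFA Delete is meant to protect.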

Encryption

Four Encryption Options

Method	Key Management	Performance	Use Case
SSE-S3	AWS manages everything	Fast	Default, no compliance requirements
SSE-KMS	KMS manages keys, you control	API call per encrypt/decrypt	Audit trail needed, HIPAA/PCI
SSE-C	YOU provide key per request	Client sends key in header	Regulatory: you must hold key
Client-Side	You encrypt before upload	Client CPU overhead	Maximum control

Enforcing Encryption via Bucket Policy

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::my-bucket/*",
    "Condition": {
      "StringNotEquals": {
        "s3:x-amz-server-side-encryption": "aws:kms"
      }
    }
  }]
}

Any upload whose request does not ask for SSE-KMS is denied, including uploads that omit the encryption header entirely. This enforces encryption at the policy level.
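On the client side, an upload passes this policy only if it requests SSE-KMS explicitly. A minimal sketch of the `put_object` arguments follows; the helper name `kms_put_kwargs`, the bucket, and the key are placeholders chosen for illustration.

```python
def kms_put_kwargs(bucket, key, body, kms_key_id=None):
    """Build put_object arguments that request SSE-KMS explicitly."""
    kwargs = {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # Sent as the x-amz-server-side-encryption header
        "ServerSideEncryption": "aws:kms",
    }
    if kms_key_id is not None:
        # Leave out to use the AWS managed key for S3 in the account
        kwargs["SSEKMSKeyId"] = kms_key_id
    return kwargs


args = kms_put_kwargs("my-bucket", "reports/q1.csv", b"a,b\n1,2\n")
print(args["ServerSideEncryption"])  # aws:kms
```

The upload itself would then be `boto3.client("s3").put_object(**args)`.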

Enforce HTTPS Only

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
    "Condition": {
      "Bool": {"aws:SecureTransport": "false"}
    }
  }]
}
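A bucket has a single policy document, so the SSE-KMS and HTTPS-only statements end up side by side in one `Statement` list. The sketch below assembles that document in Python; `secure_bucket_policy` is an illustrative helper name, not part of any library.

```python
import json


def secure_bucket_policy(bucket):
    """Combine the SSE-KMS and HTTPS-only Deny statements into one policy."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                },
            },
            {
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


policy_json = json.dumps(secure_bucket_policy("my-bucket"), indent=2)
print(policy_json)
```

The resulting string is what boto3's `put_bucket_policy(Bucket="my-bucket", Policy=policy_json)` expects.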

Access Control — Who Can Access What?

Decision Flow

Is there an explicit DENY? → YES → DENY (end)
↓ NO
Is it a cross-account request?
  → Bucket policy must allow + IAM must allow
Is it same-account request?
  → Bucket policy allows OR IAM allows (either is enough)
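The flow above can be written as a small truth function. This is a deliberately simplified model for study purposes: it ignores other layers such as Block Public Access, SCPs, and permission boundaries, and the function name is made up.

```python
def is_allowed(explicit_deny, same_account, bucket_policy_allows, iam_allows):
    """Simplified model of the decision flow for one S3 request."""
    if explicit_deny:
        return False  # an explicit Deny always wins
    if same_account:
        return bucket_policy_allows or iam_allows  # either is enough
    return bucket_policy_allows and iam_allows  # cross-account needs both


# Same account: an IAM allow alone is enough
print(is_allowed(False, True, False, True))   # True
# Cross-account: an IAM allow alone is not enough
print(is_allowed(False, False, False, True))  # False
```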

Bucket Policy vs ACL vs IAM

Bucket Policy: Resource-based, JSON, grants access to accounts/users/public
ACL: Legacy, grant to canonical user IDs, avoid for new setups
IAM Policy: Identity-based, controls what your users/roles can do to S3

Best Practice: Use bucket policies for cross-account and public access. Use IAM policies for your own users.

Presigned URLs

Problem: Your app needs to let users upload a profile picture directly to S3 without going through your server.

Solution: Generate a presigned URL server-side, give it to the client.

import boto3

s3 = boto3.client('s3')
user_id = 'user123'  # example value; use the authenticated user's ID

# Generate presigned upload URL (PUT)
url = s3.generate_presigned_url(
    'put_object',
    Params={
        'Bucket': 'user-uploads',
        'Key': f'profiles/{user_id}/avatar.jpg',
        'ContentType': 'image/jpeg'
    },
    ExpiresIn=3600  # 1 hour
)
# Send URL to client; they PUT directly to S3 with Content-Type: image/jpeg

Key Facts:

Inherits the permissions of the IAM entity that generated it
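Max expiry: 7 days with SigV4 (604800 seconds); a URL signed with temporary credentials stops working when those credentials expire, even if ExpiresIn is longer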
Great for: downloads of private files, direct-to-S3 uploads
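On the client side, using a presigned upload URL is a plain HTTP PUT. The sketch below builds that request with the standard library; the URL is a placeholder (a real one comes from `generate_presigned_url`) and `build_presigned_put` is an illustrative name.

```python
import urllib.request


def build_presigned_put(url, data, content_type="image/jpeg"):
    """Build the HTTP PUT a client would send to a presigned upload URL.

    The Content-Type should match the ContentType used when the URL was
    generated, otherwise S3 will typically reject the request.
    """
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# Placeholder URL for illustration only
request = build_presigned_put("https://example.com/presigned", b"\xff\xd8\xff")
print(request.get_method())  # PUT
# To actually upload: urllib.request.urlopen(request)
```

No AWS credentials or SDK are needed on the client, which is the point of the pattern.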

S3 Events

Trigger Lambda/SQS/SNS when objects are created/deleted.

Real-World: Image upload triggers Lambda to resize and create thumbnails.

User uploads image → S3 → S3 Event Notification → Lambda
  → Resize image → store thumbnail back in S3

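Limitation: S3 event notifications are delivered at least once, not exactly once, so a handler can occasionally see duplicates. Make processing idempotent and use SQS as a buffer for reliable processing.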

S3 EventBridge integration: For more advanced filtering and routing, send S3 events to EventBridge and route them to multiple targets.
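When SQS sits between S3 and Lambda, each SQS message body carries the S3 notification as JSON. Here is a minimal sketch of a handler that unwraps it and skips duplicates within a batch; `handler` is a hypothetical entry point and the `print` stands in for real processing such as resizing.

```python
import json
from urllib.parse import unquote_plus


def objects_from_sqs_event(event):
    """Yield (bucket, key) for every S3 record inside an SQS-triggered event."""
    for message in event.get("Records", []):
        body = json.loads(message["body"])
        # s3:TestEvent messages carry no "Records" list, so default to empty
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Keys arrive URL-encoded, with spaces sent as '+'
            key = unquote_plus(record["s3"]["object"]["key"])
            yield bucket, key


def handler(event, context):
    """Hypothetical Lambda entry point; returns the number of unique objects."""
    seen = set()
    for bucket, key in objects_from_sqs_event(event):
        if (bucket, key) in seen:
            continue  # duplicates within a batch are processed once
        seen.add((bucket, key))
        print(f"would resize s3://{bucket}/{key}")
    return len(seen)
```

Deduplicating inside one batch does not cover duplicates across invocations; for that, the processing itself needs to be idempotent, for example by writing the thumbnail to a key derived from the source key.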

Multipart Upload

Recommended for files larger than 100 MB and required above 5 GB, the limit for a single PUT:

# Boto3 switches to multipart automatically above multipart_threshold
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')
config = TransferConfig(multipart_threshold=1024*1024*100)  # 100 MB
s3.upload_file('large-file.zip', 'bucket', 'large-file.zip', Config=config)

Lifecycle rule element (placed inside a Rule) to clean up incomplete uploads, whose parts are billed until the upload is completed or aborted:

<AbortIncompleteMultipartUpload>
  <DaysAfterInitiation>7</DaysAfterInitiation>
</AbortIncompleteMultipartUpload>
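Part size matters for very large objects because a multipart upload can have at most 10,000 parts, and every part except the last must be at least 5 MiB. The following sketch works through that arithmetic; `plan_parts` is an illustrative helper, not a boto3 function.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000        # S3 limit on parts per multipart upload
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last


def _ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def plan_parts(object_size, part_size=100 * MIB):
    """Return (part_size, part_count) that fits within the 10,000-part limit."""
    part_size = max(part_size, MIN_PART_SIZE)
    if _ceil_div(object_size, part_size) > MAX_PARTS:
        # Grow the part size just enough to fit into MAX_PARTS parts
        part_size = _ceil_div(object_size, MAX_PARTS)
    return part_size, max(1, _ceil_div(object_size, part_size))


print(plan_parts(1024 * MIB))  # (104857600, 11)
```

A 1 GiB file at 100 MiB per part needs 11 parts, ten full ones and a smaller final part.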

S3 Transfer Acceleration

Speeds up long-distance uploads and downloads by routing traffic through the nearest CloudFront edge location and then over the AWS network to the bucket.

Real-World: Users in Australia uploading to a US-East bucket — enable Transfer Acceleration, uploads route through Sydney edge → faster.

URL format: bucket.s3-accelerate.amazonaws.com
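As a small illustration of the URL format, the helper below derives the accelerate hostname from a bucket name. The function name is made up; the period check reflects the restriction that Transfer Acceleration does not work with bucket names containing periods.

```python
def accelerate_endpoint(bucket):
    """Return the Transfer Acceleration hostname for a bucket."""
    if "." in bucket:
        # Acceleration does not work with bucket names that contain periods
        raise ValueError(f"bucket name {bucket!r} contains a period")
    return f"{bucket}.s3-accelerate.amazonaws.com"


print(accelerate_endpoint("user-uploads"))
```

With boto3 there is usually no need to build the hostname by hand: a client created with `botocore.config.Config(s3={"use_accelerate_endpoint": True})` uses the accelerate endpoint once acceleration is enabled on the bucket.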

CORS Configuration

Problem: Your JavaScript app at app.example.com calls S3 to load images, and the browser blocks the request with a CORS error. Fix: add a CORS configuration to the bucket that allows the app's origin:

[{
  "AllowedHeaders": ["*"],
  "AllowedMethods": ["GET", "PUT"],
  "AllowedOrigins": ["https://app.example.com"],
  "ExposeHeaders": ["ETag"],
  "MaxAgeSeconds": 3000
}]
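The same rules can be applied from Python with boto3's `put_bucket_cors`, which wraps the list in a `CORSRules` key. A minimal sketch, with illustrative helper names and the client passed in as a parameter:

```python
def cors_configuration(origin):
    """Return the CORS rules above in the shape put_bucket_cors expects."""
    return {
        "CORSRules": [
            {
                "AllowedHeaders": ["*"],
                "AllowedMethods": ["GET", "PUT"],
                "AllowedOrigins": [origin],
                "ExposeHeaders": ["ETag"],
                "MaxAgeSeconds": 3000,
            }
        ]
    }


def apply_cors(s3_client, bucket, origin):
    """Replace the bucket's CORS configuration."""
    return s3_client.put_bucket_cors(
        Bucket=bucket, CORSConfiguration=cors_configuration(origin)
    )
```

For example: `apply_cors(boto3.client("s3"), "my-bucket", "https://app.example.com")`.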

Replication (CRR & SRR)

Type	Cross-Region	Same-Region	Use Case
CRR	Yes	No	Disaster recovery, latency
SRR	No	Yes	Log aggregation, compliance copy

Requirements: Versioning must be enabled on both the source and destination buckets, and S3 needs an IAM role it can assume to copy the objects.

Key: Replication is asynchronous, not instant. Existing objects are NOT replicated automatically (use S3 Batch Replication, a Batch Operations job, to backfill them).
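A replication rule is configured on the source bucket. The sketch below builds the dictionary that boto3's `put_bucket_replication` takes; the helper name, rule ID, role ARN, and bucket names are placeholders for illustration.

```python
def replication_configuration(role_arn, destination_bucket, prefix=""):
    """Build a one-rule replication configuration for the source bucket."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": "ReplicateAll",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": prefix},  # empty prefix matches every key
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


config = replication_configuration(
    "arn:aws:iam::111122223333:role/replication-role",  # hypothetical role
    "backup-bucket",
)
print(config["Rules"][0]["Destination"]["Bucket"])  # arn:aws:s3:::backup-bucket
```

It would be applied with `s3.put_bucket_replication(Bucket="source-bucket", ReplicationConfiguration=config)`. Whether the destination is in another region (CRR) or the same one (SRR) depends only on where that bucket lives.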

S3 Object Lock & WORM

For regulatory compliance (SEC Rule 17a-4, FINRA):

Governance Mode: Users with special IAM permission can override
Compliance Mode: No one (not even root) can delete/modify before retention expires

import boto3
from datetime import datetime, timezone

s3 = boto3.client('s3')
s3.put_object_retention(
    Bucket='compliance-bucket',
    Key='audit-log-2024.gz',
    Retention={
        'Mode': 'COMPLIANCE',
        'RetainUntilDate': datetime(2031, 1, 1, tzinfo=timezone.utc)
    }
)
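Retention periods are often stated as a duration ("keep for 7 years") rather than a fixed date. The sketch below computes the retain-until date from a number of days and returns arguments for `put_object_retention`; the helper name and the `now` parameter (there to make the result reproducible) are illustrative.

```python
from datetime import datetime, timedelta, timezone


def retention_args(bucket, key, days, mode="COMPLIANCE", now=None):
    """Build put_object_retention arguments for a retention period in days."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError(f"unknown Object Lock mode: {mode!r}")
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Retention": {
            "Mode": mode,
            "RetainUntilDate": now + timedelta(days=days),
        },
    }


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
args = retention_args("compliance-bucket", "audit-log-2024.gz", 366, now=start)
print(args["Retention"]["RetainUntilDate"].date())  # 2025-01-01
```

Trying the call in GOVERNANCE mode first is a cautious choice, since a COMPLIANCE retention date cannot be shortened once it is set.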

Performance Optimization

S3 Request Rate Limits

3,500 PUT/COPY/POST/DELETE per prefix per second
5,500 GET/HEAD per prefix per second

Real-World Problem: Your app was uploading everything to the uploads/ prefix and hitting rate limits (503 Slow Down responses). Fix: Spread writes across prefixes such as uploads/a/, uploads/b/, uploads/c/ for up to 3x the throughput.
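One way to spread keys across prefixes is to pick the prefix from a hash of the file name, so the choice is even and repeatable. This is a sketch under that assumption; `sharded_key` and `write_ceiling` are made-up helpers, and the ceiling is the simple multiplication implied by the per-prefix limits above.

```python
import hashlib


def sharded_key(filename, shards=("a", "b", "c"), base="uploads"):
    """Pick a prefix deterministically from a hash of the file name."""
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    shard = shards[digest[0] % len(shards)]
    return f"{base}/{shard}/{filename}"


def write_ceiling(prefix_count, per_prefix=3500):
    """Aggregate PUT/COPY/POST/DELETE requests per second across prefixes."""
    return prefix_count * per_prefix


print(sharded_key("avatar.jpg"))
print(write_ceiling(3))  # 10500
```

Because the shard is derived from the name, a reader can recompute the full key without a lookup table.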

S3 Select

Instead of downloading a 1GB CSV to find 10 rows, query inside S3:

import boto3

s3 = boto3.client('s3')

# Run the SQL filter inside S3; only matching rows are returned
response = s3.select_object_content(
    Bucket='data-lake',
    Key='sales-2024.csv',
    ExpressionType='SQL',
    Expression="SELECT * FROM s3object WHERE region = 'us-east'",
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization={'CSV': {}}
)

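Because only the matching rows leave S3, this reduces data transfer and Lambda execution time (and therefore cost) dramatically. Note: AWS closed S3 Select to new customers in 2024; accounts that already used it can continue to.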
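The result of `select_object_content` is an event stream, not a finished body, so the rows still have to be collected. A minimal sketch, assuming CSV output as in the call above; `rows_from_select` is an illustrative helper name.

```python
import csv
import io


def rows_from_select(response):
    """Collect the CSV rows from a select_object_content response."""
    chunks = []
    for event in response["Payload"]:
        # Only "Records" events carry data; Stats, Progress and End do not
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # A row can be split across events, so join the bytes before parsing
    text = b"".join(chunks).decode("utf-8")
    return list(csv.reader(io.StringIO(text)))
```

Each returned row is a list of strings, one per column of the matching CSV line.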

Good Practices

Practice	Reason
Block Public Access at account level	Prevents accidental data exposure
Enable S3 Access Logs	Audit trail for compliance
Enable versioning on critical buckets	Accidental delete protection
Use Lifecycle policies	Automatic cost optimization
Use VPC Endpoints for S3	Traffic stays in AWS network, no NAT gateway costs
Enforce SSE-KMS via bucket policy	Compliance + audit trail in CloudTrail
Use presigned URLs for user uploads	Never expose AWS credentials to clients
Enable MFA Delete on sensitive buckets	Prevent accidental/malicious permanent deletion

Bad Practices

Anti-Pattern	Impact	Fix
`"Principal": "*"` with no conditions	Public bucket = data breach	Add conditions or use presigned URLs
Storing credentials in S3 public bucket	Game over — credentials stolen	Use Secrets Manager
Not enabling versioning on app assets	Can't recover from accidental delete	Enable versioning + lifecycle
Using ACLs for access control	ACLs are legacy, hard to audit	Use bucket policies + IAM
One prefix for all uploads	Rate limit bottleneck	Distribute across multiple prefixes
Not cleaning up incomplete multipart uploads	Stealth storage cost	Add lifecycle rule to abort after N days

Exam Tips

S3 is strongly consistent: since Dec 2020, reads after any PUT (new object or overwrite) or DELETE, and list operations, return the latest state. Exam may have old questions that mention eventual consistency; the answer is "strong consistency" now.
Bucket names are globally unique across all AWS accounts.
S3 does NOT support append operations — you must rewrite the whole object.
ACLs are legacy — exam will push you toward bucket policies.
Cross-account S3 access: BOTH the bucket policy AND the IAM policy must allow it.
Glacier Flexible Retrieval options: Expedited (1-5 min), Standard (3-5 hours), Bulk (5-12 hours).
Server Access Logging vs CloudTrail: Server Access Logs = detailed per-request records, delivered on a best-effort basis. CloudTrail = management events + data events (if enabled).
Static website hosting: a CORS configuration on the bucket is required when browser JS served from a different origin calls the bucket directly.

Common Exam Scenarios

Q: Cheapest way to store backups accessed once a year? → S3 Glacier Deep Archive

Q: S3 objects deleted accidentally, how to protect? → Enable versioning + MFA Delete

Q: Lambda processes S3 uploads but sometimes misses events? → Add SQS between S3 and Lambda as event buffer (Dead Letter Queue on SQS)

Q: How to allow a different AWS account to access your S3 bucket? → Add bucket policy with Principal: {"AWS": "arn:aws:iam::OTHER_ACCOUNT:root"}

Q: Developer wants to enforce all uploads use KMS encryption? → Bucket policy with Deny if s3:x-amz-server-side-encryption != aws:kms
