"Donde tus datos viven, respiran y escalan en la nube"
Tiempo de Estudio: ~5 horas
Dificultad: Principiante a Intermedio
Importancia: ⭐⭐⭐⭐⭐ (¡Crítico para todas las aplicaciones!)
El almacenamiento es la base de toda aplicación - entender los servicios de almacenamiento de AWS es esencial porque toda aplicación necesita almacenar y recuperar datos de manera eficiente, segura y rentable.
- 🗄️ Entendiendo el Almacenamiento en la Nube
- 📁 Amazon S3 (Simple Storage Service)
- 💽 Amazon EBS (Elastic Block Store)
- 📂 Amazon EFS (Elastic File System)
- 🌉 AWS Storage Gateway
- ❄️ Soluciones de Archivo y Respaldo
- 🎯 Eligiendo el Almacenamiento Correcto
- 📝 Escenarios del Mundo Real
El almacenamiento en la nube proporciona almacenamiento de datos escalable, duradero y accesible a través de internet, eliminando la necesidad de infraestructura de almacenamiento física.
Piensa en el almacenamiento en la nube como diferentes soluciones de almacenamiento en la vida real:
📁 Almacenamiento de Objetos (S3) = Unidad de Almacenamiento
├── Almacena cualquier cosa de cualquier tamaño
├── Acceso con una clave única
├── Pagas por lo que almacenas
└── Accesible desde cualquier lugar
💽 Almacenamiento de Bloques (EBS) = Disco Duro Personal
├── Conectado a tu computadora (EC2)
├── Almacenamiento crudo que puedes formatear
├── Alto rendimiento para uso activo
└── Un disco, una computadora
📂 Almacenamiento de Archivos (EFS) = Unidad de Red
├── Carpeta compartida accesible por muchos
├── Interfaz estándar de sistema de archivos
├── Múltiples computadoras pueden acceder
└── Crece y se reduce automáticamente
| Tipo de Almacenamiento | Método de Acceso | Caso de Uso | Escalabilidad |
|---|---|---|---|
| Objetos (S3) | REST API/Web | Archivos, respaldos, contenido | Ilimitada |
| Bloques (EBS) | Sistema Operativo | Base de datos, sistemas de archivos | Hasta 64 TiB |
| Archivos (EFS) | Protocolo NFS | Contenido compartido, aplicaciones | Ilimitada |
Amazon S3 is object storage that stores and retrieves any amount of data from anywhere on the web. Think of it as an unlimited digital storage unit where each item has a unique address.
S3 is like a massive digital library:
- 📚 Books (objects) stored on shelves (buckets)
- 📖 Each book has a unique catalog number (key)
- 🏢 Library buildings in different cities (regions)
- 📋 Librarian manages access permissions
- 💰 Pay only for books you store
- 📦 Containers for objects - like folders but at the top level
- 🌍 Globally unique names - no two buckets can have the same name
- 📍 Region-specific - stored in a specific AWS region
- 📋 Naming rules: 3-63 characters, lowercase, no spaces
Bucket Example:
Bucket name: my-company-website-assets
Region: us-east-1
Contents:
├── images/logo.png
├── css/styles.css
├── js/app.js
└── documents/manual.pdf
- 📄 Files stored in buckets - any type, any size
- 🔑 Unique key (name) within the bucket
- 📊 Size limit: 0 bytes to 5 TB per object
- 📋 Metadata: Additional information about the object
Object Example:
Object key: images/products/laptop-2024.jpg
Size: 2.5 MB
Metadata:
├── Content-Type: image/jpeg
├── Last-Modified: 2024-06-17
├── Custom-Tag: product-image
└── Cache-Control: max-age=3600
AWS offers different storage classes optimized for different access patterns and cost requirements:
- 🚀 High performance for frequently accessed data
- 🎯 Use case: Active websites, content distribution, analytics
- 💰 Cost: Higher storage cost, lower access cost
- ⚡ Availability: 99.99%
- 🛡️ Durability: 99.999999999% (11 9's)
- 🤖 Automatically moves objects between access tiers
- 📊 Monitors access patterns and optimizes costs
- 🎯 Use case: Unknown or changing access patterns
- 💰 Cost: Small monitoring fee + optimized storage costs
- 🔄 Automatic: No operational overhead
- 📉 Lower storage cost than Standard
- 💰 Higher retrieval cost than Standard
- 🎯 Use case: Backups, disaster recovery, long-term storage
- ⏰ Minimum storage: 30 days
- 📊 Retrieval time: Immediate
- 💰 Lower cost than Standard-IA
- 📍 Single AZ storage (lower redundancy)
- 🎯 Use case: Recreatable data, secondary backup copies
⚠️ Risk: Data lost if AZ fails- 📊 Cost savings: Up to 20% vs Standard-IA
- 🧊 Archive storage with instant access
- 💰 Lower storage cost than IA classes
- 🎯 Use case: Archive data accessed once per quarter
- ⏰ Minimum storage: 90 days
- 📊 Retrieval time: Milliseconds
- 🧊 Long-term archive storage
- 💰 Very low storage cost
- 🎯 Use case: Data backup, compliance archives
- ⏰ Minimum storage: 90 days
- 📊 Retrieval time: 1-12 hours
- 🏔️ Lowest cost storage class
- 💰 Cheapest long-term storage
- 🎯 Use case: Compliance, digital preservation
- ⏰ Minimum storage: 180 days
- 📊 Retrieval time: 12-48 hours
How often do you access data?
├── Daily → S3 Standard
├── Weekly/Monthly → S3 Standard-IA
├── Quarterly → S3 Glacier Instant Retrieval
├── Yearly → S3 Glacier Flexible Retrieval
├── Almost never → S3 Glacier Deep Archive
└── Don't know → S3 Intelligent-Tiering
Automatically transition objects between storage classes based on rules you define, optimizing costs over time.
Document Lifecycle:
Day 0: Upload to S3 Standard
├── Active editing and sharing
Day 30: Transition to S3 Standard-IA
├── Reference document, occasional access
Day 90: Transition to S3 Glacier Instant Retrieval
├── Archive document, quarterly compliance checks
Day 365: Transition to S3 Glacier Deep Archive
├── Long-term legal retention
Day 2555 (7 years): Delete
├── Legal retention period complete
- 🔐 Bucket policies - JSON documents defining access rules
- 👥 IAM policies - User and role-based permissions
- 🎫 Access Control Lists (ACLs) - Object-level permissions
- 🔒 Pre-signed URLs - Temporary access to objects
- 🔐 Server-side encryption (SSE)
- SSE-S3: AWS managed keys
- SSE-KMS: AWS KMS managed keys
- SSE-C: Customer provided keys
- 🔒 Client-side encryption - Encrypt before uploading
- 📊 CloudTrail - API call logging
- 📈 CloudWatch - Metrics and monitoring
- 🔍 Access logging - Detailed request logs
- 🚨 Event notifications - SNS, SQS, Lambda triggers
Website Architecture:
├── HTML, CSS, JS files stored in S3
├── CloudFront CDN for global delivery
├── Route 53 for custom domain
└── Certificate Manager for HTTPS
Benefits:
- No web servers to manage
- Scales automatically
- Global performance
- Cost-effective for static content
Mobile App Storage:
├── User profile pictures
├── App data backups
├── Content downloads
└── Log files
Features:
- Direct upload from mobile apps
- Pre-signed URLs for secure access
- Automatic scaling
- Integration with mobile SDKs
Big Data Analytics:
├── Raw data ingestion from multiple sources
├── Data processing with EMR/Glue
├── Analytics with Athena/Redshift
└── Machine learning with SageMaker
Benefits:
- Store structured and unstructured data
- Scale to petabytes
- Integrate with analytics services
- Cost-effective data retention
Amazon EBS provides persistent block storage for EC2 instances. Think of it as a virtual hard drive that you can attach to your EC2 instances.
EBS is like an external hard drive:
- 💻 Plugs into your computer (attaches to EC2)
- 💾 Stores your files persistently (data survives instance restarts)
- 🔄 Can be unplugged and moved (detach and reattach)
- 📊 Different sizes and speeds (various volume types)
- 💰 Pay for storage capacity you provision
- ⚡ Balanced price and performance
- 📊 Baseline: 3,000 IOPS, 125 MiB/s throughput
- 🎯 Use case: Boot volumes, small-medium databases, dev/test
- 💰 Cost: $0.08/GB/month
- 📈 Scalable: Up to 16,000 IOPS, 1,000 MiB/s
- 🚀 High performance for critical workloads
- 📊 Up to 64,000 IOPS per volume
- 🎯 Use case: Large databases, I/O-intensive applications
- 💰 Cost: Higher than gp3
- 🛡️ Durability: 99.999% (higher than other types)
- 📈 High throughput for large sequential workloads
- 📊 Up to 500 MiB/s throughput
- 🎯 Use case: Big data, data warehouses, log processing
- 💰 Cost: Lower than SSD options
⚠️ Limitation: Cannot be boot volume
- 💰 Lowest cost option
- 📊 Up to 250 MiB/s throughput
- 🎯 Use case: Infrequently accessed data
- 💾 Storage: Large volumes of data
⚠️ Limitation: Cannot be boot volume
| Type | Performance | Use Case | Cost |
|---|---|---|---|
| gp3 | Balanced | General purpose | Medium |
| io2 | High IOPS | Databases | High |
| st1 | High throughput | Big data | Low |
| sc1 | Low cost | Archives | Lowest |
- 📸 Point-in-time backups of EBS volumes
- 📦 Stored in S3 (you don't see the bucket)
- 🔄 Incremental backups - only changed blocks
- 🌍 Cross-region copying for disaster recovery
Snapshot Example:
Volume Snapshot Timeline:
Day 1: Full snapshot (10 GB)
Day 2: Incremental snapshot (2 GB changed)
Day 3: Incremental snapshot (1 GB changed)
Storage used: 13 GB total
Recovery: Can restore any point in time
- 🔐 Encryption at rest using AWS KMS
- 🔒 Encryption in transit between instance and volume
- 🔑 Encrypted snapshots automatically
- ⚡ No performance impact
- 🔗 Attach single volume to multiple instances
- 🎯 Use case: Shared file systems, cluster applications
⚠️ Requirement: Cluster-aware file system- 📍 Limitation: Same AZ only
Database Server Configuration:
├── Root volume (gp3): Operating system
├── Database volume (io2): High IOPS for database files
├── Log volume (gp3): Transaction logs
└── Backup volume (sc1): Database backups
Benefits:
- Consistent high performance
- Separate volumes for different workloads
- Easy backup with snapshots
- Encryption for sensitive data
Application Server Setup:
├── Boot volume: OS and applications
├── Data volume: Application data
├── Logs volume: Application logs
└── Temp volume: Temporary processing
Features:
- Persistent storage across reboots
- Resize volumes as needed
- Snapshot for backup
- Move between instances
Amazon EFS provides fully managed NFS (Network File System) that can be mounted on multiple EC2 instances simultaneously.
EFS is like a company shared network drive:
- 📁 Everyone can access the same files
- 🔄 Real-time sharing - changes visible immediately
- 📈 Grows automatically as you add more files
- 💻 Works from any computer in the network
- 🔒 Permissions control who can access what
- 🤖 No servers to manage - AWS handles infrastructure
- 📈 Automatic scaling - grows and shrinks automatically
- 🛡️ Built-in redundancy - stored across multiple AZs
- 💰 Pay only for storage used
- 📁 Standard file system interface - works like traditional file systems
- 🔐 File permissions - standard Unix permissions
- 🔗 Symbolic links and hard links supported
- 📊 Metadata preservation
General Purpose:
- ⚡ Low latency for most workloads
- 📊 Up to 7,000 file operations/second
- 🎯 Best for: Most applications
Max I/O:
- 📈 Higher throughput and operations per second
- ⏰ Slightly higher latency
- 🎯 Best for: Applications that can tolerate higher latency
- 🚀 Frequently accessed files
- ⚡ Lowest latency
- 💰 Higher storage cost
- 🎯 Use case: Active application data
- 💰 Lower storage cost
- ⏰ Higher access cost
- 🎯 Use case: Files accessed less than once per month
- 🔄 Automatic transition available
Multi-Server Web Application:
├── Multiple EC2 instances behind load balancer
├── Shared EFS for website content
├── All servers serve same content
└── Content updates visible on all servers immediately
Benefits:
- Consistent content across servers
- Easy content management
- Scale web servers independently
- Shared logs and configuration
Media Company Workflow:
├── Video editors access shared project files
├── Rendering farm processes files in parallel
├── Multiple users collaborate simultaneously
└── Automatic backup of all content
Features:
- Concurrent access by multiple users
- Standard file system operations
- No capacity planning required
- Built-in durability
Scientific Computing Cluster:
├── Compute nodes mount shared file system
├── Input data accessible to all nodes
├── Results written to shared storage
└── Post-processing from any node
Advantages:
- Shared data access across cluster
- Scales with compute requirements
- Standard NFS interface
- High throughput for large files
AWS Storage Gateway is a hybrid cloud storage service that connects on-premises environments to AWS cloud storage services.
Storage Gateway is like a bridge connecting two cities:
- 🏠 On-premises city (your data center)
- ☁️ Cloud city (AWS)
- 🌉 Bridge (Storage Gateway) enables travel between them
- 🚛 Trucks (data) can move in both directions
- 🎫 Toll system (seamless integration) manages traffic
- 📁 NFS/SMB file shares backed by S3
- 🔄 Files stored as objects in S3 buckets
- 💾 Local cache for frequently accessed files
- 🎯 Use case: File shares, content distribution
Stored Volumes:
- 💽 Primary data on-premises, backup to S3
- 📊 1 GB to 16 TB per volume
- 🎯 Use case: Backup existing applications
Cached Volumes:
- ☁️ Primary data in S3, cache on-premises
- 📊 1 GB to 32 TB per volume
- 🎯 Use case: Expand storage to cloud
- 📼 Virtual Tape Library for backup applications
- 🤖 Works with existing backup software
- 📦 Virtual tapes stored in S3 and Glacier
- 🎯 Use case: Replace physical tape infrastructure
Hybrid Backup Strategy:
├── On-premises backup software unchanged
├── Storage Gateway presents virtual tape library
├── Backups written to "virtual tapes"
├── Tapes automatically stored in S3/Glacier
└── Restore from cloud when needed
Benefits:
- Keep existing backup processes
- Unlimited cloud storage capacity
- Cost-effective long-term retention
- No physical tape management
Gradual Cloud Migration:
├── File Gateway provides NFS/SMB shares
├── Applications see normal file system
├── Files automatically synced to S3
├── Gradual migration to cloud-native apps
└── Maintain on-premises access during transition
Features:
- Transparent cloud storage
- Local caching for performance
- Standard file operations
- Easy migration path
Glacier is AWS's long-term archival storage service, designed for data you access infrequently but need to retain for compliance or backup purposes.
Glacier is like storing items in a secure mountain cave:
- 🏔️ Very secure and protected from disasters
- 💰 Very cheap to store things
- ⏰ Takes time to retrieve items when needed
- 📦 Perfect for things you rarely need
- 🔐 Long-term safety guaranteed
AWS Backup is a centralized backup service that makes it easy to back up data across AWS services.
- 🔄 Centralized backup across multiple AWS services
- 📅 Automated backup scheduling
- 🔐 Encryption and access control
- 📊 Backup monitoring and reporting
- 🌍 Cross-region backup for disaster recovery
AWS Backup supports:
├── EC2 instances
├── EBS volumes
├── EFS file systems
├── RDS databases
├── DynamoDB tables
├── S3 buckets
└── Storage Gateway volumes
- 📁 File system operations → EFS
- 🌐 Web/REST API → S3
- 💽 Block device (disk) → EBS
- 🌉 Hybrid (on-premises + cloud) → Storage Gateway
- 🚀 Very frequently (daily) → S3 Standard, EBS gp3
- 📅 Occasionally (monthly) → S3 Standard-IA
- 📆 Rarely (quarterly) → S3 Glacier Instant Retrieval
- 🏔️ Almost never (yearly) → S3 Glacier Deep Archive
- ⚡ High IOPS → EBS io2
- 📈 High throughput → EBS st1, EFS Max I/O
- 🔄 Balanced → EBS gp3, EFS General Purpose
- 💰 Cost-optimized → EBS sc1, S3 IA classes
- 1️⃣ Single EC2 instance → EBS
- 👥 Multiple EC2 instances → EFS or S3
- 🌐 Internet access → S3
- 🏢 On-premises systems → Storage Gateway
| Use Case | Primary Choice | Alternative | Why |
|---|---|---|---|
| Database files | EBS io2 | EBS gp3 | High IOPS, single attachment |
| Website assets | S3 + CloudFront | EFS | Global distribution, web access |
| Shared application data | EFS | S3 | Multi-instance access, POSIX |
| Backup and archive | S3 Glacier | AWS Backup | Cost-effective long-term storage |
| Big data processing | S3 | EBS st1 | Massive scale, analytics integration |
| Development environment | EBS gp3 | S3 | Balanced performance, cost |
Architecture:
├── Web Tier: EC2 with EBS (gp3) for OS
├── App Tier: EC2 with EFS for shared files
├── Database Tier: EC2 with EBS (io2) for database
└── Static Assets: S3 with CloudFront CDN
Data Flow:
User → CloudFront → S3 (static content)
User → Load Balancer → EC2 (dynamic content)
EC2 App → EFS (shared files)
EC2 App → EC2 DB with EBS (database)
Architecture:
├── Raw Data Ingestion: S3 Standard
├── Processed Data: S3 Standard-IA
├── Archive Data: S3 Glacier
├── Analytics: EMR with EBS
└── Backup: S3 Cross-Region Replication
Benefits:
- Scale to petabytes
- Cost optimization with storage classes
- Analytics integration
- Disaster recovery
On-Premises:
├── File Gateway: NFS/SMB shares
├── Volume Gateway: iSCSI volumes
├── Tape Gateway: Virtual tape library
└── Local cache for performance
AWS Cloud:
├── S3: Object storage backend
├── EBS: Volume snapshots
├── Glacier: Long-term archives
└── CloudWatch: Monitoring
Benefits:
- Gradual cloud adoption
- Keep existing workflows
- Cloud economics
- Disaster recovery
Situation:
- Streaming service with global audience
- Thousands of video files
- Variable viewing patterns
- Need cost optimization
Requirements:
- Store large video files
- Global content delivery
- Optimize costs based on popularity
- Handle peak viewing times
Storage Strategy:
Content Lifecycle:
├── New releases: S3 Standard + CloudFront
├── Popular content: S3 Standard
├── Seasonal content: S3 Standard-IA
├── Old content: S3 Glacier Instant Retrieval
└── Archive content: S3 Glacier Deep Archive
Cost Optimization:
- Intelligent-Tiering for unknown patterns
- Lifecycle policies for automatic transitions
- CloudFront caching reduces S3 costs
- Cross-region replication for global access
Benefits:
- 70% storage cost reduction with lifecycle
- Global low-latency access
- Automatic cost optimization
- Scales to unlimited content
Situation:
- Hospital with electronic health records
- Compliance requirements (7-year retention)
- Need secure access from multiple systems
- Disaster recovery requirements
Requirements:
- HIPAA compliance
- Long-term retention
- Multi-system access
- Backup and disaster recovery
Storage Solution:
Healthcare Data Architecture:
├── Active patient records: EFS (encrypted)
├── Medical imaging: S3 Standard (encrypted)
├── Archive records: S3 Glacier (compliance)
├── Database storage: EBS io2 (encrypted)
└── Backup: AWS Backup to S3
Security Features:
- KMS encryption for all storage
- IAM policies for access control
- CloudTrail for audit logging
- VPC endpoints for secure access
Compliance Benefits:
- HIPAA-eligible services
- Encryption at rest and in transit
- Complete audit trail
- Long-term retention capability
Situation:
- Factory with IoT sensors
- Real-time data processing
- Historical data analysis
- Predictive maintenance
Requirements:
- Ingest streaming sensor data
- Real-time processing
- Long-term analytics
- Cost-effective storage
Data Strategy:
Manufacturing Data Pipeline:
├── Real-time ingestion: Kinesis → S3
├── Hot data (30 days): S3 Standard
├── Warm data (1 year): S3 Standard-IA
├── Cold data (7+ years): S3 Glacier
├── Processing: EMR with EBS st1
└── Results: RDS with EBS io2
Analytics Workflow:
Sensors → Kinesis → Lambda → S3
S3 → EMR (batch processing) → Analytics
S3 → Athena (ad-hoc queries)
S3 → QuickSight (dashboards)
Value Delivered:
- Real-time operational insights
- Predictive maintenance capabilities
- Historical trend analysis
- 80% storage cost optimization
- Simple = Easy to use
- Storage = Unlimited capacity
- Service = Fully managed
- Elastic = Resizable
- Block = Raw disk storage
- Store = Persistent storage
- Elastic = Auto-scaling
- File = Standard file system
- System = Shared across instances
Need storage? Start here:
├── Multiple instances need access?
│ ├── Yes → File system interface needed?
│ │ ├── Yes → EFS
│ │ └── No → S3
│ └── No → High performance needed?
│ ├── Yes → EBS (io2 or gp3)
│ └── No → Infrequent access?
│ ├── Yes → S3 IA classes
│ └── No → S3 Standard
└── Hybrid cloud needed? → Storage Gateway
S3 Storage Classes by Cost (Low to High):
Deep Archive < Glacier < One Zone-IA < Standard-IA < Standard
S3 Storage Classes by Access Speed (Fast to Slow):
Standard = IA = One Zone-IA > Glacier Instant > Glacier > Deep Archive
Antes de proceder, asegúrate de poder:
- Explicar las diferencias entre almacenamiento de objetos, bloques y archivos
- Elegir clases de almacenamiento S3 apropiadas basadas en patrones de acceso
- Entender tipos de volúmenes EBS y sus casos de uso
- Saber cuándo usar EFS para almacenamiento de archivos compartido
- Identificar soluciones de almacenamiento híbrido con Storage Gateway
- Diseñar arquitecturas de almacenamiento para diferentes escenarios
- Entender estrategias de optimización de costos para almacenamiento
Una empresa necesita almacenar logs de aplicación que se acceden frecuentemente durante el primer mes, luego raramente se acceden por propósitos de cumplimiento durante 7 años. ¿Cuál es la estrategia de almacenamiento S3 más costo-efectiva?
A) Almacenar todo en S3 Standard B) Usar S3 Intelligent-Tiering C) Usar políticas de ciclo de vida de S3 para hacer transición de datos D) Almacenar todo en S3 Glacier Deep Archive
💡 Haz clic para ver la Respuesta
Respuesta: C) Usar políticas de ciclo de vida de S3 para hacer transición de datos
Explicación: Las políticas de ciclo de vida pueden hacer transición automáticamente de objetos desde S3 Standard (primeros 30 días) a S3 Standard-IA (acceso menos frecuente) y eventualmente a S3 Glacier o Deep Archive para retención a largo plazo. Esto proporciona la estructura de costos óptima para el patrón de acceso descrito.
Una aplicación web ejecutándose en múltiples instancias EC2 necesita almacenamiento compartido para archivos de configuración que todas las instancias deben acceder. ¿Qué servicio de almacenamiento es más apropiado?
A) Amazon S3 B) Amazon EBS C) Amazon EFS D) AWS Storage Gateway
💡 Haz clic para ver la Respuesta
Respuesta: C) Amazon EFS
Explicación: EFS proporciona un sistema de archivos compartido que puede ser montado por múltiples instancias EC2 simultáneamente. EBS solo puede ser adjuntado a una instancia a la vez, mientras que S3 requeriría cambios en la aplicación para usar APIs de almacenamiento de objetos en lugar de operaciones de sistema de archivos.
¿Qué tipo de volumen EBS proporciona el mayor rendimiento IOPS para una base de datos crítica?
A) SSD de Propósito General (gp3) B) SSD de IOPS Aprovisionadas (io2) C) HDD Optimizado para Rendimiento (st1) D) HDD Frío (sc1)
💡 Haz clic para ver la Respuesta
Respuesta: B) SSD de IOPS Aprovisionadas (io2)
Explicación: El SSD de IOPS Aprovisionadas (io2) está diseñado para aplicaciones intensivas en E/S como bases de datos y puede proporcionar hasta 64,000 IOPS por volumen con alta durabilidad, convirtiéndolo en la mejor opción para cargas de trabajo de bases de datos críticas.
Ahora que entiendes dónde almacenar tus datos, exploremos cómo conectar tus servicios y entregar contenido a usuarios alrededor del mundo.
🎯 Siguiente Capítulo: Servicios de Red
¡Aprende sobre VPCs, balanceadores de carga, CDNs y cómo construir arquitecturas de red seguras y escalables!
🎉 ¡Excelente progreso! Ahora entiendes la base del almacenamiento de datos en AWS. A continuación, exploraremos cómo todos estos servicios se conectan y comunican.