Amazon CloudWatch is a comprehensive monitoring and management service designed for AWS and hybrid cloud applications. This guide covers everything from basic concepts to advanced configurations, helping you leverage CloudWatch for performance monitoring, troubleshooting, and operational insights.
- Amazon CloudWatch is a monitoring and observability service for AWS resources and custom applications.
- Provides actionable insights through metrics, logs, alarms, and dashboards.
- Supports both infrastructure and application-level monitoring.
- Metrics: Collect and monitor key performance data.
- Logs: Aggregate, analyze, and search logs.
- Alarms: Set thresholds for metrics to trigger automated actions.
- Dashboards: Visualize data in real time.
- CloudWatch Events: Trigger actions based on changes in AWS resources (this capability is now provided by Amazon EventBridge).
- Data Sources:
  - AWS Services: EC2, RDS, Lambda, etc.
  - On-premises servers or hybrid setups using the CloudWatch Agent.
- Core Components:
  - Metrics: Quantifiable data points (e.g., CPU utilization).
  - Logs: Application and system logs.
  - Alarms: Notifications or automated responses.
  - Dashboards: Custom visualizations.
  - Insights: Advanced log analytics.
- Go to the AWS Management Console.
- Navigate to CloudWatch under the Management & Governance section.
To monitor custom metrics or on-premises resources:
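- Install the CloudWatch Agent on your instance (the yum command applies to Amazon Linux):
    sudo yum install amazon-cloudwatch-agent
- Configure the agent with the interactive wizard:
    sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
- Start the agent with the configuration the wizard saved (config.json in the same directory by default; use -m onPremise on on-premises servers):
    sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json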
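The configuration wizard produces a JSON file that tells the agent which metrics and log files to collect. As a rough sketch of what such a file can look like, the snippet below builds a minimal configuration in Python. The log file path, log group name, and namespace are hypothetical placeholders, and the selected measurements are only one possible choice.

```python
import json


def build_agent_config(log_file: str, log_group: str,
                       namespace: str = "CustomNamespace") -> dict:
    """Return a minimal CloudWatch Agent configuration as a dict."""
    return {
        "agent": {"metrics_collection_interval": 60},  # seconds
        "metrics": {
            "namespace": namespace,
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["/"]},
            },
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": log_file,
                            "log_group_name": log_group,
                            # The agent substitutes the instance ID here.
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        },
    }


# Hypothetical application log and log group names.
config = build_agent_config("/var/log/myapp/app.log",
                            "/application/production/myapp")
print(json.dumps(config, indent=2))
```

Saving the printed JSON to a file and passing that file to the agent is one way to skip the interactive wizard on repeated installs.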
Attach the CloudWatchFullAccess policy to the IAM role or user that manages CloudWatch. For instances that only run the agent, the narrower CloudWatchAgentServerPolicy is sufficient.
- In the CloudWatch console, go to Metrics.
- Select a namespace (e.g., AWS/EC2, AWS/Lambda).
- Choose metrics like CPUUtilization, DiskWriteOps, etc.
- EC2: CPUUtilization, DiskReadBytes, NetworkIn, NetworkOut
- RDS: DatabaseConnections, ReadIOPS, WriteLatency
- Lambda: Invocations, Duration, Errors
To send custom metrics:
- Install the AWS CLI.
- Publish a metric:
    aws cloudwatch put-metric-data --namespace "CustomNamespace" --metric-name "MetricName" --value 100
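The same custom metric can also be published from application code. The following is a minimal sketch, assuming the boto3 library is installed and AWS credentials are configured; the dimension name and value are hypothetical. The payload builder runs on its own, and boto3 is only imported when `publish` is actually called.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit="Count", dimensions=None):
    """Build one entry for the MetricData list of PutMetricData."""
    datum = {
        "MetricName": name,
        "Value": float(value),
        "Unit": unit,
        "Timestamp": datetime.now(timezone.utc),
    }
    if dimensions:
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in dimensions.items()
        ]
    return datum


def publish(namespace, data):
    """Send the data to CloudWatch (assumes boto3 and credentials)."""
    import boto3  # imported lazily so the builder works without it

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace, MetricData=data
    )


# "Service" / "checkout" is a hypothetical dimension for illustration.
datum = build_metric_datum("MetricName", 100,
                           dimensions={"Service": "checkout"})
# publish("CustomNamespace", [datum])  # uncomment to send for real
```

Dimensions are optional, but each unique combination of dimensions counts as a separate metric, so it helps to keep them few and stable.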
- Navigate to Logs in the CloudWatch console.
- Create a Log Group (e.g., /aws/lambda/my-function).
- Each application/service writes to a Log Stream under the group.
- Go to Logs → Select a log group.
- Click Actions → Export data to Amazon S3.
- Configure the export with the desired time range.
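The export steps above can also be scripted. This sketch prepares the parameters for the CloudWatch Logs `create_export_task` call, which expects the time range in milliseconds since the Unix epoch. The bucket name, prefix, and task name are hypothetical, and the call itself assumes boto3, credentials, and a bucket policy that allows CloudWatch Logs to write to the bucket.

```python
from datetime import datetime, timezone


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def build_export_task(log_group, bucket, start, end, prefix="exported-logs"):
    """Build keyword arguments for logs.create_export_task."""
    if end <= start:
        raise ValueError("end must be later than start")
    return {
        "taskName": f"export-{start:%Y%m%d}",
        "logGroupName": log_group,
        "fromTime": to_epoch_ms(start),
        "to": to_epoch_ms(end),
        "destination": bucket,          # S3 bucket name (hypothetical)
        "destinationPrefix": prefix,
    }


params = build_export_task(
    "/aws/lambda/my-function",
    "my-log-archive-bucket",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)

# With boto3 installed and credentials configured:
# import boto3
# boto3.client("logs").create_export_task(**params)
```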
- Navigate to Logs Insights.
- Write queries for analysis:
    fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20
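To make the stages of that query concrete, here is a small local emulation in Python. It is not the Logs Insights engine, only an illustration of what filter, sort, and limit do to a list of events; the sample events are made up.

```python
import re


def run_error_query(events, limit=20):
    """Emulate: filter @message like /ERROR/ | sort @timestamp desc | limit N."""
    pattern = re.compile(r"ERROR")
    # filter: keep events whose message matches the regular expression
    matches = [e for e in events if pattern.search(e["@message"])]
    # sort: newest first (ISO 8601 strings in one zone sort chronologically)
    matches.sort(key=lambda e: e["@timestamp"], reverse=True)
    # limit: return at most `limit` rows
    return matches[:limit]


# Made-up sample events for illustration.
events = [
    {"@timestamp": "2024-01-01T10:00:00Z", "@message": "INFO service started"},
    {"@timestamp": "2024-01-01T10:01:00Z", "@message": "ERROR db timeout"},
    {"@timestamp": "2024-01-01T10:02:00Z", "@message": "ERROR retry failed"},
]

result = run_error_query(events)
for row in result:
    print(row["@timestamp"], row["@message"])
```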
- Go to Alarms in the CloudWatch console.
- Click Create Alarm.
- Select a metric (e.g., CPUUtilization).
- Set a threshold (e.g., > 80% for 5 minutes).
- Choose an action (e.g., send an SNS notification).
- OK: Metric is within the defined threshold.
- ALARM: Metric breaches the threshold.
- INSUFFICIENT_DATA: The alarm has just started, the metric is not available, or there is not enough data to determine the state.
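The alarm from the steps above (CPUUtilization above 80% for 5 minutes, notifying SNS) and its three states can be sketched as follows. The parameter names follow the `put_metric_alarm` API; the instance ID and topic ARN are hypothetical. The `evaluate_state` function is a deliberate simplification for illustration: the real service also applies configurable missing-data handling and "M out of N" evaluation.

```python
def build_cpu_alarm(instance_id, topic_arn, threshold=80.0):
    """Build keyword arguments for cloudwatch.put_metric_alarm."""
    return {
        "AlarmName": f"cpu-high-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # one 5-minute period
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }


def evaluate_state(datapoints, threshold, datapoints_to_alarm=1):
    """Simplified state logic: not the service's full evaluation rules."""
    if not datapoints:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in datapoints if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Hypothetical identifiers for illustration.
alarm = build_cpu_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)

# With boto3 installed and credentials configured:
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**alarm)
```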
- Composite Alarms: Combine multiple alarms.
- Actions:
  - Notify via SNS.
  - Trigger Lambda functions.
  - Stop, terminate, reboot, or recover EC2 instances.
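A composite alarm is defined by a rule expression over the states of other alarms. As a sketch, the helpers below assemble such expressions in Python; the child alarm names are hypothetical, and the resulting string would be passed as `AlarmRule` to `put_composite_alarm`.

```python
def all_in_alarm(alarm_names):
    """Rule that fires only when every child alarm is in ALARM."""
    return " AND ".join(f'ALARM("{name}")' for name in alarm_names)


def any_in_alarm(alarm_names):
    """Rule that fires when at least one child alarm is in ALARM."""
    return " OR ".join(f'ALARM("{name}")' for name in alarm_names)


# Hypothetical child alarms that must already exist.
rule = all_in_alarm(["cpu-high", "memory-high"])
print(rule)

# With boto3 installed and credentials configured:
# import boto3
# boto3.client("cloudwatch").put_composite_alarm(
#     AlarmName="host-under-pressure",
#     AlarmRule=rule,
#     AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
# )
```

Requiring several conditions at once is one way to cut down on notifications that a single noisy metric would otherwise trigger.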
- Go to Dashboards in the CloudWatch console.
- Click Create Dashboard.
- Add widgets:
  - Line for metrics.
  - Number for single values.
  - Text for notes.
- Choose metrics from different namespaces.
- Configure time ranges and granularity.
- EC2 Metrics: CPU, Disk, Network.
- RDS Metrics: Connections, IOPS.
- Lambda Metrics: Invocations, Errors.
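Dashboards can also be defined as a JSON document rather than built by hand. The sketch below assembles a dashboard body with a text widget, a line widget, and a number widget; the instance ID, function name, and region are hypothetical, and the layout values are arbitrary.

```python
import json


def metric_widget(title, metrics, x, y, view="timeSeries",
                  stat="Average", region="us-east-1"):
    """A metric widget; view is 'timeSeries' (line) or 'singleValue' (number)."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": 12, "height": 6,
        "properties": {
            "title": title,
            "metrics": metrics,
            "view": view,
            "stat": stat,
            "period": 300,
            "region": region,
        },
    }


def text_widget(markdown, x, y):
    """A text widget for notes."""
    return {
        "type": "text",
        "x": x, "y": y, "width": 24, "height": 2,
        "properties": {"markdown": markdown},
    }


body = {
    "widgets": [
        text_widget("Service overview", 0, 0),
        metric_widget(
            "EC2 CPU",
            [["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"]],
            0, 2,
        ),
        metric_widget(
            "Lambda invocations",
            [["AWS/Lambda", "Invocations", "FunctionName", "my-function"]],
            12, 2, view="singleValue", stat="Sum",
        ),
    ]
}
dashboard_body = json.dumps(body)

# With boto3 installed and credentials configured:
# import boto3
# boto3.client("cloudwatch").put_dashboard(
#     DashboardName="service-overview", DashboardBody=dashboard_body)
```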
- Navigate to Rules under Events in the CloudWatch console (in current consoles, rules are managed in Amazon EventBridge).
- Create a rule with an event pattern (e.g., EC2 state change).
- Add a target (e.g., SNS, Lambda, Step Functions).
- Event Pattern:
    {
      "source": ["aws.ec2"],
      "detail-type": ["EC2 Instance State-change Notification"],
      "detail": { "state": ["stopped"] }
    }
- Target: Send an SNS notification.
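To illustrate how such a pattern selects events, the following sketch implements a simplified matcher: every field named in the pattern must be present in the event, and its value must be one of the listed values. This covers only exact-value matching on nested fields, not the full pattern language (prefix, numeric, and other operators are left out). The sample events are made up.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified exact-value matcher for event patterns."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the event value must be one of the listed values.
            return False
    return True


pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}

# Made-up events for illustration.
stopped = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
running = {**stopped, "detail": {**stopped["detail"], "state": "running"}}

print(matches(pattern, stopped), matches(pattern, running))
```

Fields that the pattern does not mention, such as the instance ID here, are ignored, which is why a short pattern can match many events.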
- Create a cross-account role with permissions to access CloudWatch in the target account.
- Use the CloudWatch:ListMetrics and CloudWatch:GetMetricData APIs.
Enable anomaly detection for metrics:
- Go to Metrics → Select a metric.
- Click Actions → Enable anomaly detection.
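Once anomaly detection is enabled, an alarm can compare a metric against the expected band instead of a fixed threshold. The sketch below builds parameters for such an alarm using the `ANOMALY_DETECTION_BAND` metric math function. The instance ID is hypothetical, and the band width of 2 is an arbitrary starting point, not a recommendation.

```python
def build_anomaly_alarm(instance_id, band_width=2):
    """Build put_metric_alarm arguments for an anomaly detection alarm."""
    return {
        "AlarmName": f"cpu-anomaly-{instance_id}",
        # Fire when the metric rises above the upper edge of the band.
        "ComparisonOperator": "GreaterThanUpperThreshold",
        "EvaluationPeriods": 2,
        "ThresholdMetricId": "ad1",  # the band acts as the threshold
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/EC2",
                        "MetricName": "CPUUtilization",
                        "Dimensions": [
                            {"Name": "InstanceId", "Value": instance_id}
                        ],
                    },
                    "Period": 300,
                    "Stat": "Average",
                },
            },
            {
                "Id": "ad1",
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
                "Label": "CPUUtilization (expected)",
                "ReturnData": True,
            },
        ],
    }


anomaly_alarm = build_anomaly_alarm("i-0123456789abcdef0")

# With boto3 installed and credentials configured:
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**anomaly_alarm)
```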
Perform calculations across metrics:
- Example: Average CPU utilization across two instances, where m1 and m2 are the IDs of the two CPUUtilization metrics:
    (m1+m2)/2
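As a sketch of how that expression fits into a `GetMetricData` request, the code below generates one query per instance plus the math expression, and also evaluates the same formula locally on sample values. The instance IDs and data points are made up; only the expression result is marked to be returned.

```python
def build_average_query(instance_ids):
    """Build MetricDataQueries that average CPUUtilization across instances."""
    if not instance_ids:
        raise ValueError("at least one instance ID is required")
    queries = []
    for index, instance_id in enumerate(instance_ids, start=1):
        queries.append({
            "Id": f"m{index}",
            "ReturnData": False,  # inputs only; hide the raw series
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EC2",
                    "MetricName": "CPUUtilization",
                    "Dimensions": [
                        {"Name": "InstanceId", "Value": instance_id}
                    ],
                },
                "Period": 300,
                "Stat": "Average",
            },
        })
    ids = "+".join(query["Id"] for query in queries)
    queries.append({
        "Id": "avg_cpu",
        "Expression": f"({ids})/{len(queries)}",
        "Label": "Average CPU",
        "ReturnData": True,
    })
    return queries


def average_series(*series):
    """Apply the same formula locally, point by point."""
    return [sum(points) / len(points) for points in zip(*series)]


queries = build_average_query(["i-0aaa", "i-0bbb"])  # hypothetical IDs
local = average_series([40.0, 60.0], [20.0, 80.0])   # made-up data points
print(queries[-1]["Expression"], local)
```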
- Use console.log() to write logs to CloudWatch.
- Monitor Lambda-specific metrics like Errors and Throttles.
- Enable CloudWatch Container Insights for detailed monitoring.
- Use the awslogs driver to send container logs to CloudWatch.
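For containers running on Amazon ECS, the awslogs driver is set in the container definition's log configuration. Here is a minimal sketch of that fragment, built in Python so it can be dumped to JSON; the container name, image, log group, and region are hypothetical.

```python
import json


def awslogs_configuration(group, region, stream_prefix):
    """Build the logConfiguration block for the awslogs driver."""
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": group,
            "awslogs-region": region,
            "awslogs-stream-prefix": stream_prefix,
        },
    }


# Hypothetical container definition for illustration.
container_definition = {
    "name": "web",
    "image": "nginx:latest",
    "essential": True,
    "logConfiguration": awslogs_configuration("/ecs/web", "us-east-1", "web"),
}

print(json.dumps(container_definition, indent=2))
```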
- Use Datadog or Grafana for enhanced visualization.
- Integrate CloudWatch metrics into these platforms using APIs.
- Set retention policies for logs to reduce costs:
    aws logs put-retention-policy --log-group-name "/aws/lambda/my-function" --retention-in-days 30
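The retention setting accepts only specific values, so an arbitrary number of days is rejected. The helper below is a small sketch that rounds a requested retention up to the next accepted value. The list of accepted values is an assumption based on the API documentation at the time of writing and should be checked against the current reference.

```python
# Assumed set of accepted --retention-in-days values; verify before relying on it.
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def nearest_retention(days: int) -> int:
    """Return the smallest accepted retention that is >= days."""
    if days < 1:
        raise ValueError("retention must be at least 1 day")
    for allowed in ALLOWED_RETENTION_DAYS:
        if allowed >= days:
            return allowed
    raise ValueError("requested retention exceeds the largest accepted value")


retention = nearest_retention(45)
print(retention)

# With boto3 installed and credentials configured:
# import boto3
# boto3.client("logs").put_retention_policy(
#     logGroupName="/aws/lambda/my-function", retentionInDays=retention)
```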
- Use IAM policies to restrict access to specific metrics, logs, or dashboards.
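As an example of such a restriction, the sketch below generates an IAM policy document that grants read-only access to a single log group. The region, account ID, and log group are hypothetical, and the chosen actions are one reasonable read-only set rather than a definitive list.

```python
import json


def read_only_log_group_policy(region, account_id, log_group):
    """Build an IAM policy allowing read access to one log group."""
    group_arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "logs:DescribeLogStreams",
                    "logs:GetLogEvents",
                    "logs:FilterLogEvents",
                ],
                # The group itself plus the log streams beneath it.
                "Resource": [group_arn, f"{group_arn}:*"],
            }
        ],
    }


# Hypothetical account and log group for illustration.
policy = read_only_log_group_policy(
    "us-east-1", "123456789012", "/aws/lambda/my-function"
)
print(json.dumps(policy, indent=2))
```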
- Metrics: Charged per metric, per month.
- Logs:
  - Ingestion: Cost per GB ingested.
  - Storage: Cost per GB stored.
- Dashboards: Charged per dashboard, per month.
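- Collect only the data you need; for example, use metric filters to extract a few targeted metrics from logs rather than publishing many custom metrics.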
- Set shorter retention periods for logs.
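To see how these pricing dimensions add up, here is a small worked estimate. All rates in it are hypothetical placeholders chosen for the arithmetic, not quoted prices; actual rates vary by region and tier and should be taken from the current pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical monthly rates in USD (placeholders, not real prices)."""
    per_metric: float
    per_gb_ingested: float
    per_gb_stored: float
    per_dashboard: float


def monthly_cost(rates, metrics, gb_ingested, gb_stored, dashboards):
    """Sum the four pricing dimensions into one monthly estimate."""
    total = (
        metrics * rates.per_metric
        + gb_ingested * rates.per_gb_ingested
        + gb_stored * rates.per_gb_stored
        + dashboards * rates.per_dashboard
    )
    return round(total, 2)


rates = Rates(per_metric=0.30, per_gb_ingested=0.50,
              per_gb_stored=0.03, per_dashboard=3.00)

# 50 metrics, 100 GB ingested, 200 GB stored, 2 dashboards
estimate = monthly_cost(rates, 50, 100, 200, 2)

# Halving stored data (e.g., via shorter retention) lowers only that term.
reduced = monthly_cost(rates, 50, 100, 100, 2)
print(estimate, reduced)
```

In this made-up scenario ingestion is the largest term, which is why reducing what gets logged in the first place often matters more than shortening retention.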
- Organize Log Groups:
  - Use consistent naming conventions (e.g., /application/environment/service).
- Use Alarms Wisely:
  - Avoid too many alarms to prevent alert fatigue.
  - Use composite alarms to group related alarms.
- Automate Monitoring:
  - Automate alert creation and dashboards using CloudFormation or Terraform.
- Optimize Log Storage:
  - Export logs to S3 for long-term storage and analysis.
- Enable Anomaly Detection:
  - Automate anomaly detection for critical metrics.
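As one way to automate alarm creation, the sketch below generates a small CloudFormation template containing a single `AWS::CloudWatch::Alarm` resource. The logical ID, instance ID, and topic ARN are hypothetical; in practice these values would more likely come from template parameters.

```python
import json


def alarm_template(instance_id, topic_arn, threshold=80):
    """Build a CloudFormation template with one CPU alarm."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "CpuHighAlarm": {  # hypothetical logical ID
                "Type": "AWS::CloudWatch::Alarm",
                "Properties": {
                    "AlarmDescription": "CPU above threshold for 5 minutes",
                    "Namespace": "AWS/EC2",
                    "MetricName": "CPUUtilization",
                    "Dimensions": [
                        {"Name": "InstanceId", "Value": instance_id}
                    ],
                    "Statistic": "Average",
                    "Period": 300,
                    "EvaluationPeriods": 1,
                    "Threshold": threshold,
                    "ComparisonOperator": "GreaterThanThreshold",
                    "AlarmActions": [topic_arn],
                },
            }
        },
    }


template = alarm_template(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
print(json.dumps(template, indent=2))
```

Keeping the template in version control alongside the application makes alarm changes reviewable like any other code change.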
