Commit ef4c743

Author: Yuriy Bezsonov
feat(perf-platform): add CloudWatch datasource and Latency Metrics dashboard
Replace the perf_profile_cpu_ratio Prometheus alert with an ALB-based path: Grafana now reads CloudWatch metrics through a dedicated read-only Pod Identity role, and a "Latency Metrics" dashboard surfaces ALB p99 TargetResponseTime, request rate, and 5xx error counts for whichever ALB(s) participants deploy. The ServiceLatency alert rule itself is created by the workshop chapter (Ch 4) because it depends on the participant-deployed ALB ARN, which only exists after the unicorn-store-spring workload rolls out.

CDK
- PerfPlatform construct adds grafana-cloudwatch-pod-role, trusted by pods.eks.amazonaws.com, with cloudwatch:GetMetricData, cloudwatch:GetMetricStatistics, cloudwatch:ListMetrics, cloudwatch:DescribeAlarms, cloudwatch:DescribeAlarmsForMetric, cloudwatch:DescribeAlarmHistory, tag:GetResources, ec2:DescribeRegions, and ec2:DescribeTags.

perf-platform.sh
- Bind the Grafana ServiceAccount to grafana-cloudwatch-pod-role via EKS Pod Identity, then restart Grafana so the credential attaches to a fresh pod.
- Provision the CloudWatch datasource via the Grafana sidecar ConfigMap pattern (grafana_datasource=1).
- Provision the "Latency Metrics" dashboard via the Grafana HTTP API directly into the existing "Workshop Dashboards" folder. The dashboard JSON is embedded in the script as a heredoc to keep all setup-time provisioning in one file.
- The dashboard uses CloudWatch SEARCH() expressions so it auto-discovers any ALB that reports metrics in the AWS/ApplicationELB namespace. There are no pre-baked LB names, so it works for both the EKS ingress ALB and the ECS service ALB once participants deploy.
- Remove the old perf_profile_cpu_ratio Prometheus alert and the ALERT_RULE_TITLE env var.
- Update the summary block to reflect the new state.

Verified end-to-end on the live cluster: pod identity attached, CloudWatch datasource shows both ALBs in dimension-values queries, all four panels render real data under load.
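The dashboard JSON is not part of this file's diff, so the exact queries are not visible here. As a rough illustration of the SEARCH() approach the message describes, the sketch below builds CloudWatch search expressions for the three ALB metrics named in the commit. The class name, the helper, the `LoadBalancer`-only dimension schema, the `Sum` statistic for the two counters, and the 60-second period are all assumptions made for illustration, not values taken from the commit.

```java
import java.util.List;

public class AlbSearchExpressions {

    /**
     * Builds a CloudWatch metric-math search expression of the form
     * SEARCH('{Namespace,Dim1,...} MetricName="Name"', 'Statistic', Period).
     * Hypothetical helper; not part of the commit.
     */
    public static String search(String namespace, List<String> dimensions,
                                String metricName, String statistic, int periodSeconds) {
        String schema = "{" + namespace + "," + String.join(",", dimensions) + "}";
        return "SEARCH('" + schema + " MetricName=\"" + metricName + "\"', '"
                + statistic + "', " + periodSeconds + ")";
    }

    public static void main(String[] args) {
        // Assumption: metrics are searched per load balancer only.
        List<String> dims = List.of("LoadBalancer");
        String ns = "AWS/ApplicationELB";

        // One expression per metric named in the commit; statistics and period are assumed.
        System.out.println(search(ns, dims, "TargetResponseTime", "p99", 60));
        System.out.println(search(ns, dims, "RequestCount", "Sum", 60));
        System.out.println(search(ns, dims, "HTTPCode_Target_5XX_Count", "Sum", 60));
    }
}
```

Because the expression names a namespace and dimension schema rather than a specific load balancer, a query of this shape returns one series per matching ALB, which is what lets a dashboard pick up ALBs created after provisioning.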
1 parent e6daf23 commit ef4c743

2 files changed

Lines changed: 327 additions & 63 deletions

infra/cdk/src/main/java/sample/com/constructs/PerfPlatform.java

Lines changed: 49 additions & 4 deletions
@@ -8,10 +8,11 @@
 
 /**
  * PerfPlatform construct for the agentic performance platform (perf-analyzer module).
- * Creates three IAM roles used by the platform components on Amazon EKS:
- * - perf-analyzer-eks-pod-role (perf-analyzer Spring Boot service)
- * - perf-collector-eks-pod-role (perf-collector DaemonSet)
- * - pyroscope-eks-pod-role (Pyroscope server, for S3-backed storage)
+ * Creates four IAM roles used by the platform components on Amazon EKS:
+ * - perf-analyzer-eks-pod-role (perf-analyzer Spring Boot service)
+ * - perf-collector-eks-pod-role (perf-collector DaemonSet)
+ * - pyroscope-eks-pod-role (Pyroscope server, for S3-backed storage)
+ * - grafana-cloudwatch-pod-role (Grafana, to read ALB metrics from CloudWatch)
  *
  * On Amazon ECS Fargate the collector sidecar runs inside the target task and
  * reuses that task's existing role — we add S3-write for profiling artifacts to
@@ -28,6 +29,7 @@ public class PerfPlatform extends Construct {
     private final Role perfAnalyzerEksPodRole;
     private final Role perfCollectorEksPodRole;
     private final Role pyroscopeEksPodRole;
+    private final Role grafanaCloudwatchPodRole;
 
     public static class PerfPlatformProps {
         private Bucket workshopBucket;
@@ -57,6 +59,7 @@ public PerfPlatform(final Construct scope, final String id, final PerfPlatformPr
         this.perfAnalyzerEksPodRole = createAnalyzerEksPodRole(props);
         this.perfCollectorEksPodRole = createCollectorEksPodRole(props);
         this.pyroscopeEksPodRole = createPyroscopeEksPodRole(props);
+        this.grafanaCloudwatchPodRole = createGrafanaCloudwatchPodRole();
         grantProfilingWriteToUnicornEcsTaskRole(props);
     }
 
@@ -130,6 +133,44 @@ private Role createPyroscopeEksPodRole(PerfPlatformProps props) {
         return role;
     }
 
+    /**
+     * Grafana CloudWatch pod role.
+     * Trusts pods.eks.amazonaws.com (Pod Identity).
+     * Grants the Grafana ServiceAccount in the monitoring namespace read-only
+     * access to CloudWatch metrics so the perf-platform alert rule and the
+     * Latency Metrics dashboard can query ALB TargetResponseTime, RequestCount,
+     * and HTTPCode_Target_5XX_Count for whichever ALB(s) participants deploy
+     * during the workshop.
+     */
+    private Role createGrafanaCloudwatchPodRole() {
+        ServicePrincipal podsPrincipal = ServicePrincipal.Builder.create("pods.eks.amazonaws.com").build();
+
+        Role role = Role.Builder.create(this, "GrafanaCloudwatchPodRole")
+            .roleName("grafana-cloudwatch-pod-role")
+            .assumedBy(podsPrincipal)
+            .description("Role for Grafana to read CloudWatch metrics for the perf-platform Latency Metrics dashboard and ServiceLatency alert")
+            .build();
+
+        addTagSession(role);
+        // Standard CloudWatch read-only set used by Grafana's CloudWatch datasource.
+        role.addToPolicy(PolicyStatement.Builder.create()
+            .effect(Effect.ALLOW)
+            .actions(List.of(
+                "cloudwatch:GetMetricData",
+                "cloudwatch:GetMetricStatistics",
+                "cloudwatch:ListMetrics",
+                "cloudwatch:DescribeAlarmsForMetric",
+                "cloudwatch:DescribeAlarmHistory",
+                "cloudwatch:DescribeAlarms",
+                "tag:GetResources",
+                "ec2:DescribeRegions",
+                "ec2:DescribeTags"
+            ))
+            .resources(List.of("*"))
+            .build());
+        return role;
+    }
+
     /**
      * Grant the Unicorn ECS task role permissions the perf-collector sidecar needs.
      * Attaches to the existing task role so the sidecar runs under the task's role
@@ -269,4 +310,8 @@ public Role getPerfCollectorEksPodRole() {
     public Role getPyroscopeEksPodRole() {
         return pyroscopeEksPodRole;
     }
+
+    public Role getGrafanaCloudwatchPodRole() {
+        return grafanaCloudwatchPodRole;
+    }
 }
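
For reviewers who want to see what the new role amounts to in plain IAM terms, the sketch below renders approximate trust and permissions policy documents using only the Java standard library. It is not output from CDK: the class and method names are hypothetical, the JSON layout is hand-written, and the trust policy assumes that the `addTagSession` helper (not shown in this diff) adds `sts:TagSession` alongside `sts:AssumeRole`, as EKS Pod Identity requires. The nine actions are copied from the diff above.

```java
import java.util.List;
import java.util.stream.Collectors;

public class GrafanaCloudwatchPolicySketch {

    // The read-only actions granted by createGrafanaCloudwatchPodRole() in the diff.
    public static final List<String> READ_ONLY_ACTIONS = List.of(
            "cloudwatch:GetMetricData",
            "cloudwatch:GetMetricStatistics",
            "cloudwatch:ListMetrics",
            "cloudwatch:DescribeAlarmsForMetric",
            "cloudwatch:DescribeAlarmHistory",
            "cloudwatch:DescribeAlarms",
            "tag:GetResources",
            "ec2:DescribeRegions",
            "ec2:DescribeTags");

    // Wraps each value in double quotes and joins them with ", ".
    static String quoteAll(List<String> values) {
        return values.stream()
                .map(v -> "\"" + v + "\"")
                .collect(Collectors.joining(", "));
    }

    /** Approximate permissions policy: one Allow statement on all resources. */
    public static String permissionsPolicy() {
        return "{\"Version\": \"2012-10-17\", \"Statement\": [{"
                + "\"Effect\": \"Allow\", "
                + "\"Action\": [" + quoteAll(READ_ONLY_ACTIONS) + "], "
                + "\"Resource\": \"*\"}]}";
    }

    /** Approximate trust policy; sts:TagSession is an assumption about addTagSession. */
    public static String trustPolicy() {
        return "{\"Version\": \"2012-10-17\", \"Statement\": [{"
                + "\"Effect\": \"Allow\", "
                + "\"Principal\": {\"Service\": \"pods.eks.amazonaws.com\"}, "
                + "\"Action\": [\"sts:AssumeRole\", \"sts:TagSession\"]}]}";
    }

    public static void main(String[] args) {
        System.out.println(trustPolicy());
        System.out.println(permissionsPolicy());
    }
}
```

Every action in the list is a read or describe call, which matches the "read-only" wording in the commit message. The `*` resource reflects the `.resources(List.of("*"))` line in the diff.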
