|
1 | 1 | # AI JVM Analyzer |
2 | 2 |
|
3 | | -AI-powered JVM performance analyzer using Amazon Bedrock. Receives webhook alerts from monitoring systems (Grafana, CloudWatch), collects thread dumps and profiling data, and generates actionable performance analysis reports. |
| 3 | +AI-powered JVM performance analyzer using Amazon Bedrock. Receives webhook alerts from monitoring systems (Grafana), collects JFR recordings and thread dumps, and generates actionable performance analysis reports with source code references. |
4 | 4 |
|
5 | 5 | ## Architecture |
6 | 6 |
|
7 | 7 | ``` |
8 | | -┌─────────────────────────────────────────────────────────────┐ |
9 | | -│ Monitoring System (Grafana/CloudWatch/Prometheus) │ |
10 | | -│ - Detects high CPU, memory, or thread count alerts │ |
11 | | -└─────────────────────────────────────────────────────────────┘ |
12 | | - │ |
13 | | - ▼ POST /webhook |
14 | | -┌─────────────────────────────────────────────────────────────┐ |
15 | | -│ WebhookController │ |
16 | | -│ - Receives alert payloads with pod name and IP │ |
17 | | -│ - Validates alerts, filters invalid entries │ |
18 | | -└─────────────────────────────────────────────────────────────┘ |
19 | | - │ |
20 | | - ▼ |
21 | | -┌─────────────────────────────────────────────────────────────┐ |
22 | | -│ AnalyzerService │ |
23 | | -│ - Parallel processing with Virtual Threads │ |
24 | | -│ - Fetches thread dump from pod's /actuator/threaddump │ |
25 | | -│ - Retrieves profiling data (flamegraph) from S3 │ |
26 | | -└─────────────────────────────────────────────────────────────┘ |
27 | | - │ |
28 | | - ┌───────────────┴───────────────┐ |
29 | | - ▼ ▼ |
30 | | -┌─────────────────────────┐ ┌─────────────────────────┐ |
31 | | -│ AiService │ │ S3Repository │ |
32 | | -│ - Spring AI + Bedrock │ │ - Fetch profiling data │ |
33 | | -│ - Claude Sonnet 4 │ │ - Store analysis │ |
34 | | -│ - Structured prompts │ │ - Thread dumps │ |
35 | | -└─────────────────────────┘ └─────────────────────────┘ |
| 8 | +Grafana Alert |
| 9 | + │ |
| 10 | + ▼ POST /webhook |
| 11 | +┌──────────────────┐ |
| 12 | +│ WebhookController│ |
| 13 | +└────────┬─────────┘ |
| 14 | + ▼ |
| 15 | +┌──────────────────┐ ┌───────────────┐ |
| 16 | +│ AnalyzerService │────▶│ S3Repository │ |
| 17 | +│ (virtual threads)│ │ fetch JFR, │ |
| 18 | +│ │ │ store results │ |
| 19 | +│ 1. Fetch JFR │ └───────────────┘ |
| 20 | +│ 2. Parse metrics │ |
| 21 | +│ 3. Collapsed stks│ ┌───────────────┐ |
| 22 | +│ 4. HTML flamegrph│────▶│ JfrParser │ |
| 23 | +│ 5. Thread dump │ │ CPU, GC, JVM │ |
| 24 | +│ 6. AI analysis │ └───────────────┘ |
| 25 | +│ 7. Store to S3 │ |
| 26 | +└────────┬─────────┘ ┌───────────────────┐ |
| 27 | + ▼ │ FlamegraphGenerator│ |
| 28 | +┌──────────────────┐ │ collapsed stacks + │ |
| 29 | +│ AiService │ │ HTML flamegraph │ |
| 30 | +│ Spring AI + │ └───────────────────┘ |
| 31 | +│ Amazon Bedrock │ |
| 32 | +│ Claude Sonnet 4 │ |
| 33 | +│ + GitHubSource │ |
| 34 | +│ CodeTool │ |
| 35 | +└──────────────────┘ |
36 | 36 | ``` |
37 | 37 |
|
38 | 38 | ## Project Structure |
39 | 39 |
|
40 | 40 | ``` |
41 | 41 | src/main/java/com/example/ai/jvmanalyzer/ |
42 | | -├── Application.java # Spring Boot entry point, beans config |
43 | | -├── WebhookController.java # REST endpoint for monitoring webhooks |
44 | | -├── AnalyzerService.java # Orchestrates analysis workflow |
45 | | -├── AiService.java # Bedrock integration via Spring AI |
46 | | -└── S3Repository.java # S3 storage for profiling data and results |
| 42 | +├── Application.java # Spring Boot entry point |
| 43 | +├── WebhookController.java # REST endpoint for Grafana webhooks |
| 44 | +├── AnalyzerService.java # Orchestrates analysis pipeline (async, virtual threads) |
| 45 | +├── AiService.java # Amazon Bedrock integration via Spring AI |
| 46 | +├── GitHubSourceCodeTool.java # Spring AI @Tool — fetches source code from GitHub |
| 47 | +├── JfrParser.java # Extracts CPU load, GC heap, JVM info from JFR |
| 48 | +├── FlamegraphGenerator.java # Collapsed stacks + HTML flamegraph via jfr-converter |
| 49 | +└── S3Repository.java # S3 storage for JFR, profiling, analysis artifacts |
47 | 50 | ``` |
48 | 51 |
|
49 | 52 | ## How It Works |
50 | 53 |
|
51 | | -1. Monitoring system detects performance issue (high CPU, thread count, etc.) |
52 | | -2. Alert webhook sent to `/webhook` with pod name and IP address |
53 | | -3. Analyzer fetches thread dump from the pod's actuator endpoint |
54 | | -4. Retrieves latest flamegraph/profiling data from S3 |
55 | | -5. Sends both to Claude Sonnet 4 for analysis |
56 | | -6. Stores thread dump, profiling data, and AI analysis report in S3 |
57 | | - |
58 | | -## Webhook Payload Format |
59 | | - |
60 | | -```json |
61 | | -{ |
62 | | - "alerts": [ |
63 | | - { |
64 | | - "labels": { |
65 | | - "pod": "unicorn-store-spring-abc123", |
66 | | - "instance": "10.0.1.50:8080" |
67 | | - } |
68 | | - } |
69 | | - ] |
70 | | -} |
71 | | -``` |
| 54 | +1. A Grafana alert fires when the POST request rate exceeds a threshold
| 55 | +2. Grafana sends a webhook to `/webhook` with the pod name and IP address
| 56 | +3. AnalyzerService runs asynchronously on a virtual thread: |
| 57 | + - Retrieves latest JFR recording from S3 (with retry for in-progress files) |
| 58 | + - `JfrParser` extracts runtime metrics (CPU load, GC heap, JVM config) from JFR binary |
| 59 | + - `FlamegraphGenerator` produces collapsed stacks text and HTML flamegraph using async-profiler's `jfr-converter` library |
| 60 | + - Fetches thread dump from pod's actuator endpoint |
| 61 | + - `AiService` sends profiling summary + thread dump to Amazon Bedrock (Claude Sonnet 4) |
| 62 | + - If `GITHUB_REPO_URL` is configured, the model uses `GitHubSourceCodeTool` to look up source code of methods found in stack traces |
| 63 | +4. Stores 5 artifacts per analysis to S3 |
| 64 | + |
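Collapsed-stacks text conventionally holds one line per unique stack: frames separated by `;` from root to leaf, then a space and a sample count. As a minimal sketch of how such text could be summarized, the hypothetical `CollapsedStacks` helper below (not part of this project) sums samples per leaf frame. The sample input is invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the analyzer: summarizes collapsed-stack text.
final class CollapsedStacks {

    // Sums sample counts per leaf (top-of-stack) frame.
    // Each input line is expected to look like "root;child;leaf 42".
    static Map<String, Long> samplesByLeaf(String collapsed) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String rawLine : collapsed.split("\\R")) {
            String line = rawLine.strip();
            int space = line.lastIndexOf(' ');
            if (space <= 0) {
                continue; // skip blank or malformed lines
            }
            long count;
            try {
                count = Long.parseLong(line.substring(space + 1));
            } catch (NumberFormatException e) {
                continue; // skip lines without a numeric sample count
            }
            String stack = line.substring(0, space);
            String leaf = stack.substring(stack.lastIndexOf(';') + 1);
            totals.merge(leaf, count, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Invented sample data; real input would come from the profiling artifact.
        String sample = String.join("\n",
                "main;Controller.create;Service.save 30",
                "main;Controller.create;Json.write 12",
                "main;Scheduler.run;Service.save 5");
        System.out.println(samplesByLeaf(sample));
    }
}
```

Grouping by leaf frame shows where CPU samples land; grouping by a frame higher in the stack would instead show which caller is responsible.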
| 65 | +## Source Code Tool |
72 | 66 |
|
73 | | -## Analysis Report Contents |
| 67 | +When `GITHUB_REPO_URL` is set, `AiService` registers a `GitHubSourceCodeTool` with the `ChatClient`. During analysis, the model can call this tool to fetch source files from the GitHub repository, which lets it reference specific file paths and line numbers and propose concrete code fixes.
74 | 68 |
|
75 | | -The AI generates a structured report including: |
76 | | -- Health status (Healthy/Degraded/Critical) |
77 | | -- Thread analysis with state distribution |
78 | | -- Top 3 critical issues with root cause and fix |
79 | | -- Performance hotspots from flamegraph |
80 | | -- Immediate and short-term recommendations |
| 69 | +- Uses GitHub REST API (`/contents/{path}`) with base64 decoding |
| 70 | +- `GITHUB_REPO_PATH` specifies the application root within the repo |
| 71 | +- `GITHUB_TOKEN` enables access to private repositories (optional for public repos) |
81 | 72 |
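The GitHub contents endpoint returns a file as base64 text with embedded line breaks. The sketch below shows one way the URL, request, and decoding steps could look. It is not the project's `GitHubSourceCodeTool`: the class name `GitHubContentsSketch`, its method names, and the way `GITHUB_REPO_PATH` is joined to the file path are all assumptions. No network call is made and JSON parsing is left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch, not the project's GitHubSourceCodeTool.
final class GitHubContentsSketch {

    // Joins the API base URL, the application root, and a file path
    // into a /contents/{path} URL (assumed joining rule).
    static URI contentsUri(String repoApiUrl, String repoPath, String filePath) {
        String base = repoApiUrl.replaceAll("/+$", "");
        String root = repoPath == null ? "" : repoPath.replaceAll("^/+|/+$", "");
        String file = filePath.replaceAll("^/+", "");
        String path = root.isEmpty() ? file : root + "/" + file;
        return URI.create(base + "/contents/" + path);
    }

    // Builds the GET request; the token may be null for public repositories.
    static HttpRequest contentsRequest(URI uri, String token) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(uri)
                .header("Accept", "application/vnd.github+json")
                .GET();
        if (token != null && !token.isBlank()) {
            builder.header("Authorization", "Bearer " + token);
        }
        return builder.build();
    }

    // The "content" field contains line breaks, which the MIME decoder ignores.
    static String decodeContent(String base64Content) {
        byte[] bytes = Base64.getMimeDecoder().decode(base64Content);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        URI uri = contentsUri("https://api.github.com/repos/aws-samples/java-on-aws",
                "apps/unicorn-store-spring", "pom.xml");
        System.out.println(contentsRequest(uri, null).uri());
        System.out.println(decodeContent("cGFja2FnZSBjb20u\nZXhhbXBsZTs=\n"));
    }
}
```

`Base64.getMimeDecoder()` is used instead of `Base64.getDecoder()` because the basic decoder rejects the newline characters in the response.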
|
82 | 73 | ## Dependencies |
83 | 74 |
|
84 | 75 | | Dependency | Version | Purpose | |
85 | 76 | |------------|---------|---------| |
86 | | -| Spring Boot | 4.0.1 | Application framework | |
87 | | -| Spring AI | 1.1.1 | Bedrock integration | |
| 77 | +| Spring Boot | 4.0.2 | Application framework | |
| 78 | +| Spring AI | 1.1.2 | Amazon Bedrock integration | |
88 | 79 | | AWS SDK | 2.40.15 | S3 client | |
89 | | -| Testcontainers | 2.0.3 | Integration testing | |
90 | | -| jqwik | 1.9.3 | Property-based testing | |
91 | | - |
92 | | -## Configuration |
93 | | - |
94 | | -| Property | Default | Description | |
95 | | -|----------|---------|-------------| |
96 | | -| `analyzer.thread-dump.url-template` | `http://{podIp}:8080/actuator/threaddump` | Thread dump endpoint | |
97 | | -| `analyzer.s3.bucket` | `ai-jvm-analyzer-bucket` | S3 bucket for storage | |
98 | | -| `analyzer.s3.prefix.analysis` | `analysis/` | Prefix for analysis results | |
99 | | -| `analyzer.s3.prefix.profiling` | `profiling/` | Prefix for profiling data | |
100 | | -| `spring.ai.bedrock.converse.chat.options.model` | `anthropic.claude-sonnet-4-20250514-v1:0` | Bedrock model | |
| 80 | +| jfr-converter | 4.3 | async-profiler collapsed stacks + flamegraph | |
101 | 81 |
|
102 | 82 | ## Environment Variables |
103 | 83 |
|
104 | 84 | | Variable | Required | Description | |
105 | 85 | |----------|----------|-------------| |
106 | | -| `AWS_REGION` | Yes | AWS region for Bedrock and S3 | |
| 86 | +| `AWS_REGION` | Yes | AWS Region for Amazon Bedrock and S3 | |
107 | 87 | | `AWS_S3_BUCKET` | Yes | S3 bucket name | |
| 88 | +| `GITHUB_REPO_URL` | No | GitHub API URL (e.g. `https://api.github.com/repos/aws-samples/java-on-aws`) | |
| 89 | +| `GITHUB_REPO_PATH` | No | Application root within repo (e.g. `apps/unicorn-store-spring`) | |
| 90 | +| `GITHUB_TOKEN` | No | GitHub PAT with `contents:read` scope (for private repos) | |
| 91 | +| `FLAMEGRAPH_INCLUDE` | No | Regex filter for HTML flamegraph frames (e.g. `.*unicorn.*`). Only affects the visual flamegraph — collapsed stacks sent to the AI model remain unfiltered | |
| 92 | + |
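How `FLAMEGRAPH_INCLUDE` is applied inside the flamegraph generation is not shown in this README. As a rough illustration of what a pattern such as `.*unicorn.*` selects, the sketch below keeps only stacks in which at least one frame matches the regex in full. The class name `IncludeFilterSketch`, the matching rule, and the sample frames are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration of an include filter; the real semantics may differ.
final class IncludeFilterSketch {

    // Keeps stacks (frames joined by ';') where any single frame matches the regex.
    static List<String> keepMatching(List<String> stacks, String includeRegex) {
        Pattern include = Pattern.compile(includeRegex);
        return stacks.stream()
                .filter(stack -> Arrays.stream(stack.split(";"))
                        .anyMatch(frame -> include.matcher(frame).matches()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Falls back to the example pattern when the variable is not set.
        String regex = System.getenv().getOrDefault("FLAMEGRAPH_INCLUDE", ".*unicorn.*");
        List<String> stacks = List.of(
                "java/lang/Thread.run;com/unicorn/store/Service.save",
                "java/lang/Thread.run;sun/nio/ch/Net.poll");
        System.out.println(keepMatching(stacks, regex));
    }
}
```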
| 93 | +## S3 Storage Layout |
| 94 | + |
| 95 | +``` |
| 96 | +s3://{bucket}/ |
| 97 | +├── profiling/{pod-name}/ |
| 98 | +│ └── profile-{yyyyMMdd}-{HHmmss}.jfr # async-profiler JFR recordings |
| 99 | +└── analysis/ |
| 100 | + ├── {timestamp}.jfr # JFR binary (for re-analysis) |
| 101 | + ├── {timestamp}_profiling_{pod-name}.md # Runtime metrics + collapsed stacks |
| 102 | + ├── {timestamp}_threaddump_{pod-name}.json # Thread dump snapshot |
| 103 | + ├── {timestamp}_flamegraph_{pod-name}.html # Interactive HTML flamegraph |
| 104 | + └── {timestamp}_analysis_{pod-name}.md # AI-generated performance report |
| 105 | +``` |
108 | 106 |
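The layout above can be expressed as a small key builder. This is a sketch only: `S3KeysSketch` and its methods are hypothetical names. The format of `{timestamp}` in the `analysis/` keys is not specified here, so it is passed in as an opaque string.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical key builder mirroring the documented S3 layout.
final class S3KeysSketch {

    // Matches the documented profile-{yyyyMMdd}-{HHmmss}.jfr naming.
    private static final DateTimeFormatter PROFILE_STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    static String profilingKey(String podName, LocalDateTime recordedAt) {
        return "profiling/" + podName + "/profile-" + PROFILE_STAMP.format(recordedAt) + ".jfr";
    }

    // Returns the five artifact keys written per analysis.
    // The timestamp format is an assumption left to the caller.
    static List<String> analysisKeys(String timestamp, String podName) {
        String prefix = "analysis/" + timestamp;
        return List.of(
                prefix + ".jfr",
                prefix + "_profiling_" + podName + ".md",
                prefix + "_threaddump_" + podName + ".json",
                prefix + "_flamegraph_" + podName + ".html",
                prefix + "_analysis_" + podName + ".md");
    }

    public static void main(String[] args) {
        String pod = "unicorn-store-spring-abc123";
        System.out.println(profilingKey(pod, LocalDateTime.of(2025, 1, 2, 3, 4, 5)));
        analysisKeys("20250102-030405", pod).forEach(System.out::println);
    }
}
```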
|
109 | 107 | ## Building |
110 | 108 |
|
111 | 109 | ```bash |
112 | | -mvn package # Standard JAR |
113 | | -mvn package -Pnative # Native image (GraalVM 25) |
114 | | -mvn jib:dockerBuild # Container with Jib |
| 110 | +mvn compile jib:build -Dimage={ECR_URI}:latest # Container with Jib |
| 111 | +mvn package # Standard JAR |
115 | 112 | ``` |
116 | 113 |
|
117 | 114 | ## API Endpoints |
118 | 115 |
|
119 | 116 | | Method | Endpoint | Description | |
120 | 117 | |--------|----------|-------------| |
121 | | -| POST | `/webhook` | Receive monitoring alerts | |
| 118 | +| POST | `/webhook` | Receive Grafana alert notifications | |
122 | 119 | | GET | `/actuator/health` | Health check | |
123 | | -| GET | `/actuator/prometheus` | Metrics | |
124 | | - |
125 | | -## S3 Storage Layout |
126 | | - |
127 | | -``` |
128 | | -s3://ai-jvm-analyzer-bucket/ |
129 | | -├── profiling/ |
130 | | -│ └── {pod-name}/ |
131 | | -│ └── profile-{yyyyMMdd}-{timestamp}.html # Flamegraph data |
132 | | -└── analysis/ |
133 | | - ├── {timestamp}_threaddump_{pod-name}.json # Raw thread dump |
134 | | - ├── {timestamp}_profiling_{pod-name}.html # Profiling snapshot |
135 | | - └── {timestamp}_analysis_{pod-name}.md # AI analysis report |
136 | | -``` |
137 | | - |
138 | | -## IAM Permissions Required |
139 | | - |
140 | | -```json |
141 | | -{ |
142 | | - "Version": "2012-10-17", |
143 | | - "Statement": [ |
144 | | - { |
145 | | - "Effect": "Allow", |
146 | | - "Action": [ |
147 | | - "s3:GetObject", |
148 | | - "s3:PutObject", |
149 | | - "s3:ListBucket" |
150 | | - ], |
151 | | - "Resource": [ |
152 | | - "arn:aws:s3:::ai-jvm-analyzer-bucket", |
153 | | - "arn:aws:s3:::ai-jvm-analyzer-bucket/*" |
154 | | - ] |
155 | | - }, |
156 | | - { |
157 | | - "Effect": "Allow", |
158 | | - "Action": "bedrock:InvokeModel", |
159 | | - "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.claude-sonnet-4-20250514-v1:0" |
160 | | - } |
161 | | - ] |
162 | | -} |
163 | | -``` |
| 120 | +| GET | `/actuator/prometheus` | Prometheus metrics | |