Commit 0a5361c

author
Yuriy Bezsonov
committed
feat(ai-jvm-analyzer): enhance JVM analyzer with JFR parsing and flamegraph generation
1 parent 72c0562 commit 0a5361c

10 files changed

Lines changed: 583 additions & 323 deletions

apps/ai-jvm-analyzer/README.md

Lines changed: 79 additions & 122 deletions
@@ -1,163 +1,120 @@
11
# AI JVM Analyzer
22

3-
AI-powered JVM performance analyzer using Amazon Bedrock. Receives webhook alerts from monitoring systems (Grafana, CloudWatch), collects thread dumps and profiling data, and generates actionable performance analysis reports.
3+
AI-powered JVM performance analyzer using Amazon Bedrock. Receives webhook alerts from monitoring systems (Grafana), collects JFR recordings and thread dumps, and generates actionable performance analysis reports with source code references.
44

55
## Architecture
66

77
```
8-
┌─────────────────────────────────────────────────────────────┐
9-
│ Monitoring System (Grafana/CloudWatch/Prometheus)           │
10-
│   - Detects high CPU, memory, or thread count alerts        │
11-
└─────────────────────────────────────────────────────────────┘
12-
                              │
13-
                              ▼ POST /webhook
14-
┌─────────────────────────────────────────────────────────────┐
15-
│ WebhookController                                           │
16-
│   - Receives alert payloads with pod name and IP            │
17-
│   - Validates alerts, filters invalid entries               │
18-
└─────────────────────────────────────────────────────────────┘
19-
                              │
20-
                              ▼
21-
┌─────────────────────────────────────────────────────────────┐
22-
│ AnalyzerService                                             │
23-
│   - Parallel processing with Virtual Threads                │
24-
│   - Fetches thread dump from pod's /actuator/threaddump     │
25-
│   - Retrieves profiling data (flamegraph) from S3           │
26-
└─────────────────────────────────────────────────────────────┘
27-
                              │
28-
               ┌──────────────────────────────┐
29-
               ▼                              ▼
30-
┌─────────────────────────┐         ┌─────────────────────────┐
31-
│ AiService               │         │ S3Repository            │
32-
│ - Spring AI + Bedrock   │         │ - Fetch profiling data  │
33-
│ - Claude Sonnet 4       │         │ - Store analysis        │
34-
│ - Structured prompts    │         │ - Thread dumps          │
35-
└─────────────────────────┘         └─────────────────────────┘
8+
Grafana Alert
9+
         │
10+
         ▼ POST /webhook
11+
┌──────────────────┐
12+
│ WebhookController│
13+
└────────┬─────────┘
14+
         │
15+
┌──────────────────┐     ┌───────────────┐
16+
│ AnalyzerService  │────▶│ S3Repository  │
17+
│ (virtual threads)│     │ fetch JFR,    │
18+
│                  │     │ store results │
19+
│ 1. Fetch JFR     │     └───────────────┘
20+
│ 2. Parse metrics │
21+
│ 3. Collapsed stks│     ┌───────────────┐
22+
│ 4. HTML flamegrph│────▶│ JfrParser     │
23+
│ 5. Thread dump   │     │ CPU, GC, JVM  │
24+
│ 6. AI analysis   │     └───────────────┘
25+
│ 7. Store to S3   │
26+
└──────────────────┘     ┌────────────────────┐
27+
                         │ FlamegraphGenerator│
28+
┌──────────────────┐     │ collapsed stacks + │
29+
│ AiService        │     │ HTML flamegraph    │
30+
│ Spring AI +      │     └────────────────────┘
31+
│ Amazon Bedrock   │
32+
│ Claude Sonnet 4  │
33+
│ + GitHubSource   │
34+
│   CodeTool       │
35+
└──────────────────┘
3636
```
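
The "(virtual threads)" note in the diagram could be realised roughly as follows. This is a minimal sketch, not the repository's code: the class name, the `Alert` record, and the `analyze` placeholder are all hypothetical, and it assumes Java 21 or later for `Executors.newVirtualThreadPerTaskExecutor()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: none of these names come from the repository.
final class VirtualThreadDispatchSketch {

    // Minimal stand-in for one alert taken from the webhook payload.
    record Alert(String pod, String podIp) {}

    // Placeholder for the per-pod pipeline (fetch JFR, parse, flamegraph,
    // thread dump, AI analysis, store). Here it only reports where it ran.
    static String analyze(Alert alert) {
        return alert.pod() + ":virtual=" + Thread.currentThread().isVirtual();
    }

    // Starts one virtual thread per alert and waits for every result,
    // preserving the order of the input list.
    static List<String> analyzeAll(List<Alert> alerts) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (Alert alert : alerts) {
                futures.add(CompletableFuture.supplyAsync(() -> analyze(alert), executor));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                results.add(future.join());
            }
            return results;
        }
    }

    public static void main(String[] args) {
        List<Alert> alerts = List.of(
                new Alert("unicorn-store-spring-abc123", "10.0.1.50"),
                new Alert("unicorn-store-spring-def456", "10.0.1.51"));
        analyzeAll(alerts).forEach(System.out::println);
    }
}
```

A real webhook handler would more likely return immediately and let the pipeline finish in the background; the sketch waits only so that the results can be printed.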
3737

3838
## Project Structure
3939

4040
```
4141
src/main/java/com/example/ai/jvmanalyzer/
42-
├── Application.java # Spring Boot entry point, beans config
43-
├── WebhookController.java # REST endpoint for monitoring webhooks
44-
├── AnalyzerService.java # Orchestrates analysis workflow
45-
├── AiService.java # Bedrock integration via Spring AI
46-
└── S3Repository.java # S3 storage for profiling data and results
42+
├── Application.java # Spring Boot entry point
43+
├── WebhookController.java # REST endpoint for Grafana webhooks
44+
├── AnalyzerService.java # Orchestrates analysis pipeline (async, virtual threads)
45+
├── AiService.java # Amazon Bedrock integration via Spring AI
46+
├── GitHubSourceCodeTool.java # Spring AI @Tool — fetches source code from GitHub
47+
├── JfrParser.java # Extracts CPU load, GC heap, JVM info from JFR
48+
├── FlamegraphGenerator.java # Collapsed stacks + HTML flamegraph via jfr-converter
49+
└── S3Repository.java # S3 storage for JFR, profiling, analysis artifacts
4750
```
4851

4952
## How It Works
5053

51-
1. Monitoring system detects performance issue (high CPU, thread count, etc.)
52-
2. Alert webhook sent to `/webhook` with pod name and IP address
53-
3. Analyzer fetches thread dump from the pod's actuator endpoint
54-
4. Retrieves latest flamegraph/profiling data from S3
55-
5. Sends both to Claude Sonnet 4 for analysis
56-
6. Stores thread dump, profiling data, and AI analysis report in S3
57-
58-
## Webhook Payload Format
59-
60-
```json
61-
{
62-
"alerts": [
63-
{
64-
"labels": {
65-
"pod": "unicorn-store-spring-abc123",
66-
"instance": "10.0.1.50:8080"
67-
}
68-
}
69-
]
70-
}
71-
```
54+
1. Grafana alert fires when POST request rate exceeds threshold
55+
2. Webhook sent to `/webhook` with pod name and IP address
56+
3. AnalyzerService runs asynchronously on a virtual thread:
57+
- Retrieves latest JFR recording from S3 (with retry for in-progress files)
58+
- `JfrParser` extracts runtime metrics (CPU load, GC heap, JVM config) from JFR binary
59+
- `FlamegraphGenerator` produces collapsed stacks text and HTML flamegraph using async-profiler's `jfr-converter` library
60+
- Fetches thread dump from pod's actuator endpoint
61+
- `AiService` sends profiling summary + thread dump to Amazon Bedrock (Claude Sonnet 4)
62+
- If `GITHUB_REPO_URL` is configured, the model uses `GitHubSourceCodeTool` to look up source code of methods found in stack traces
63+
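4. Stores five artifacts per analysis in S3: the JFR binary, the profiling summary, the thread dump, the HTML flamegraph, and the AI analysis report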
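
The "retry for in-progress files" step might look something like the sketch below. The README does not say how the retry is implemented or how an in-progress recording is detected, so the helper name, the attempt count, and the fixed delay are all assumptions made for illustration.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: a generic bounded retry, not the repository's implementation.
final class JfrRetrySketch {

    // Calls `attempt` up to `maxAttempts` times, sleeping `delay` between tries.
    // An empty Optional from `attempt` means "recording not ready yet".
    static <T> Optional<T> retry(Supplier<Optional<T>> attempt, int maxAttempts, Duration delay) {
        for (int i = 1; i <= maxAttempts; i++) {
            Optional<T> result = attempt.get();
            if (result.isPresent()) {
                return result;
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated S3 lookup: the recording becomes available on the third call.
        Optional<String> key = retry(
                () -> ++calls[0] < 3 ? Optional.<String>empty() : Optional.of("profile.jfr"),
                5, Duration.ofMillis(10));
        System.out.println(key.orElse("not found") + " after " + calls[0] + " attempts");
    }
}
```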
64+
65+
## Source Code Tool
7266

73-
## Analysis Report Contents
67+
When `GITHUB_REPO_URL` is set, `AiService` registers a `GitHubSourceCodeTool` with the `ChatClient`. During analysis, the model can call this tool to fetch source files from the GitHub repository, enabling it to reference specific file paths, line numbers, and provide concrete code fixes.
7468

75-
The AI generates a structured report including:
76-
- Health status (Healthy/Degraded/Critical)
77-
- Thread analysis with state distribution
78-
- Top 3 critical issues with root cause and fix
79-
- Performance hotspots from flamegraph
80-
- Immediate and short-term recommendations
69+
- Uses GitHub REST API (`/contents/{path}`) with base64 decoding
70+
- `GITHUB_REPO_PATH` specifies the application root within the repo
71+
- `GITHUB_TOKEN` enables access to private repositories (optional for public repos)
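
As an illustration of the `/contents/{path}` lookup and base64 decoding, here is a minimal sketch. The class and method names are hypothetical and the HTTP call and JSON parsing are left out; it assumes only that the `content` field returned by the API is base64 text that may contain line breaks, which is why the MIME decoder is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch: helper names are illustrative, not the tool's real API.
final class GitHubContentsSketch {

    // Joins GITHUB_REPO_URL, GITHUB_REPO_PATH, and a file path into a contents URL.
    static String contentsUrl(String repoUrl, String repoPath, String filePath) {
        return trimSlashes(repoUrl) + "/contents/" + trimSlashes(repoPath) + "/" + trimSlashes(filePath);
    }

    // Removes leading and trailing slashes so the parts can be joined safely.
    private static String trimSlashes(String value) {
        return value.replaceAll("^/+|/+$", "");
    }

    // Decodes the base64 `content` field; the MIME decoder ignores embedded line breaks.
    static String decodeContent(String base64Content) {
        byte[] bytes = Base64.getMimeDecoder().decode(base64Content);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(contentsUrl(
                "https://api.github.com/repos/aws-samples/java-on-aws",
                "apps/unicorn-store-spring",
                "pom.xml"));
        // Simulate a response body by encoding a snippet with short, newline-separated lines.
        String encoded = Base64.getMimeEncoder(8, "\n".getBytes(StandardCharsets.UTF_8))
                .encodeToString("class A {}".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeContent(encoded));
    }
}
```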
8172

8273
## Dependencies
8374

8475
| Dependency | Version | Purpose |
8576
|------------|---------|---------|
86-
| Spring Boot | 4.0.1 | Application framework |
87-
| Spring AI | 1.1.1 | Bedrock integration |
77+
| Spring Boot | 4.0.2 | Application framework |
78+
| Spring AI | 1.1.2 | Amazon Bedrock integration |
8879
| AWS SDK | 2.40.15 | S3 client |
89-
| Testcontainers | 2.0.3 | Integration testing |
90-
| jqwik | 1.9.3 | Property-based testing |
91-
92-
## Configuration
93-
94-
| Property | Default | Description |
95-
|----------|---------|-------------|
96-
| `analyzer.thread-dump.url-template` | `http://{podIp}:8080/actuator/threaddump` | Thread dump endpoint |
97-
| `analyzer.s3.bucket` | `ai-jvm-analyzer-bucket` | S3 bucket for storage |
98-
| `analyzer.s3.prefix.analysis` | `analysis/` | Prefix for analysis results |
99-
| `analyzer.s3.prefix.profiling` | `profiling/` | Prefix for profiling data |
100-
| `spring.ai.bedrock.converse.chat.options.model` | `anthropic.claude-sonnet-4-20250514-v1:0` | Bedrock model |
80+
| jfr-converter | 4.3 | async-profiler collapsed stacks + flamegraph |
10181

10282
## Environment Variables
10383

10484
| Variable | Required | Description |
10585
|----------|----------|-------------|
106-
| `AWS_REGION` | Yes | AWS region for Bedrock and S3 |
86+
| `AWS_REGION` | Yes | AWS Region for Amazon Bedrock and S3 |
10787
| `AWS_S3_BUCKET` | Yes | S3 bucket name |
88+
| `GITHUB_REPO_URL` | No | GitHub API URL (e.g. `https://api.github.com/repos/aws-samples/java-on-aws`) |
89+
| `GITHUB_REPO_PATH` | No | Application root within repo (e.g. `apps/unicorn-store-spring`) |
90+
| `GITHUB_TOKEN` | No | GitHub PAT with `contents:read` scope (for private repos) |
91+
| `FLAMEGRAPH_INCLUDE` | No | Regex filter for HTML flamegraph frames (e.g. `.*unicorn.*`). Only affects the visual flamegraph — collapsed stacks sent to the AI model remain unfiltered |
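
To show what a pattern such as `.*unicorn.*` selects, the sketch below applies an include regex to collapsed-stack lines (`frame;frame;frame count`). It illustrates the regex semantics only: the class is hypothetical, the sample stacks are made up, and the actual matching is done by `jfr-converter`, whose exact rules may differ.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: illustrates regex selection, not jfr-converter's implementation.
final class FlamegraphIncludeSketch {

    // Keeps a collapsed-stack line if at least one of its frames fully matches the regex.
    static List<String> include(List<String> collapsedLines, String includeRegex) {
        Pattern pattern = Pattern.compile(includeRegex);
        return collapsedLines.stream()
                .filter(line -> anyFrameMatches(line, pattern))
                .toList();
    }

    private static boolean anyFrameMatches(String line, Pattern pattern) {
        // The sample count follows the last space; everything before it is the stack.
        int separator = line.lastIndexOf(' ');
        String stack = separator < 0 ? line : line.substring(0, separator);
        for (String frame : stack.split(";")) {
            if (pattern.matcher(frame).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "java/lang/Thread.run;com/unicorn/store/UnicornService.create 12",
                "java/lang/Thread.run;sun/nio/ch/EPoll.wait 30");
        include(lines, ".*unicorn.*").forEach(System.out::println);
    }
}
```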
92+
93+
## S3 Storage Layout
94+
95+
```
96+
s3://{bucket}/
97+
├── profiling/{pod-name}/
98+
│ └── profile-{yyyyMMdd}-{HHmmss}.jfr # async-profiler JFR recordings
99+
└── analysis/
100+
├── {timestamp}.jfr # JFR binary (for re-analysis)
101+
├── {timestamp}_profiling_{pod-name}.md # Runtime metrics + collapsed stacks
102+
├── {timestamp}_threaddump_{pod-name}.json # Thread dump snapshot
103+
├── {timestamp}_flamegraph_{pod-name}.html # Interactive HTML flamegraph
104+
└── {timestamp}_analysis_{pod-name}.md # AI-generated performance report
105+
```
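
The key layout above can be expressed as two small helpers. This is a sketch under assumptions: the class and method names are hypothetical, and because the README does not state the `{timestamp}` format used under `analysis/`, the timestamp is passed in as an opaque string.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: key builders that mirror the documented layout.
final class S3KeySketch {

    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HHmmss");

    // profiling/{pod-name}/profile-{yyyyMMdd}-{HHmmss}.jfr
    static String profilingKey(String podName, LocalDateTime recordedAt) {
        return "profiling/" + podName + "/profile-"
                + DATE.format(recordedAt) + "-" + TIME.format(recordedAt) + ".jfr";
    }

    // analysis/{timestamp}_{kind}_{pod-name}.{extension}
    static String analysisKey(String timestamp, String kind, String podName, String extension) {
        return "analysis/" + timestamp + "_" + kind + "_" + podName + "." + extension;
    }

    public static void main(String[] args) {
        String pod = "unicorn-store-spring-abc123";
        System.out.println(profilingKey(pod, LocalDateTime.of(2025, 1, 2, 3, 4, 5)));
        System.out.println(analysisKey("20250102-030405", "flamegraph", pod, "html"));
    }
}
```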
108106

109107
## Building
110108

111109
```bash
112-
mvn package # Standard JAR
113-
mvn package -Pnative # Native image (GraalVM 25)
114-
mvn jib:dockerBuild # Container with Jib
110+
mvn compile jib:build -Dimage={ECR_URI}:latest # Container with Jib
111+
mvn package # Standard JAR
115112
```
116113

117114
## API Endpoints
118115

119116
| Method | Endpoint | Description |
120117
|--------|----------|-------------|
121-
| POST | `/webhook` | Receive monitoring alerts |
118+
| POST | `/webhook` | Receive Grafana alert notifications |
122119
| GET | `/actuator/health` | Health check |
123-
| GET | `/actuator/prometheus` | Metrics |
124-
125-
## S3 Storage Layout
126-
127-
```
128-
s3://ai-jvm-analyzer-bucket/
129-
├── profiling/
130-
│ └── {pod-name}/
131-
│ └── profile-{yyyyMMdd}-{timestamp}.html # Flamegraph data
132-
└── analysis/
133-
├── {timestamp}_threaddump_{pod-name}.json # Raw thread dump
134-
├── {timestamp}_profiling_{pod-name}.html # Profiling snapshot
135-
└── {timestamp}_analysis_{pod-name}.md # AI analysis report
136-
```
137-
138-
## IAM Permissions Required
139-
140-
```json
141-
{
142-
"Version": "2012-10-17",
143-
"Statement": [
144-
{
145-
"Effect": "Allow",
146-
"Action": [
147-
"s3:GetObject",
148-
"s3:PutObject",
149-
"s3:ListBucket"
150-
],
151-
"Resource": [
152-
"arn:aws:s3:::ai-jvm-analyzer-bucket",
153-
"arn:aws:s3:::ai-jvm-analyzer-bucket/*"
154-
]
155-
},
156-
{
157-
"Effect": "Allow",
158-
"Action": "bedrock:InvokeModel",
159-
"Resource": "arn:aws:bedrock:*::foundation-model/anthropic.claude-sonnet-4-20250514-v1:0"
160-
}
161-
]
162-
}
163-
```
120+
| GET | `/actuator/prometheus` | Prometheus metrics |

apps/ai-jvm-analyzer/pom.xml

Lines changed: 5 additions & 51 deletions
@@ -5,7 +5,7 @@
55
<parent>
66
<groupId>org.springframework.boot</groupId>
77
<artifactId>spring-boot-starter-parent</artifactId>
8-
<version>4.0.1</version>
8+
<version>4.0.2</version>
99
<relativePath/>
1010
</parent>
1111
<groupId>com.example.ai</groupId>
@@ -21,8 +21,7 @@
2121
<maven.compiler.target>25</maven.compiler.target>
2222
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
2323
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
24-
<spring-ai.version>1.1.1</spring-ai.version>
25-
<testcontainers.version>2.0.3</testcontainers.version>
24+
<spring-ai.version>1.1.2</spring-ai.version>
2625
</properties>
2726

2827
<dependencyManagement>
@@ -41,25 +40,13 @@
4140
<type>pom</type>
4241
<scope>import</scope>
4342
</dependency>
44-
<dependency>
45-
<groupId>org.apache.commons</groupId>
46-
<artifactId>commons-compress</artifactId>
47-
<version>1.27.1</version>
48-
<scope>runtime</scope>
49-
</dependency>
5043
</dependencies>
5144
</dependencyManagement>
5245

5346
<dependencies>
5447
<dependency>
5548
<groupId>org.springframework.boot</groupId>
5649
<artifactId>spring-boot-starter-web</artifactId>
57-
<exclusions>
58-
<exclusion>
59-
<groupId>commons-logging</groupId>
60-
<artifactId>commons-logging</artifactId>
61-
</exclusion>
62-
</exclusions>
6350
</dependency>
6451
<dependency>
6552
<groupId>org.springframework.boot</groupId>
@@ -72,48 +59,15 @@
7259
<dependency>
7360
<groupId>software.amazon.awssdk</groupId>
7461
<artifactId>s3</artifactId>
75-
<exclusions>
76-
<exclusion>
77-
<groupId>commons-logging</groupId>
78-
<artifactId>commons-logging</artifactId>
79-
</exclusion>
80-
</exclusions>
8162
</dependency>
8263
<dependency>
8364
<groupId>io.micrometer</groupId>
8465
<artifactId>micrometer-registry-prometheus</artifactId>
8566
</dependency>
8667
<dependency>
87-
<groupId>com.fasterxml.jackson.core</groupId>
88-
<artifactId>jackson-databind</artifactId>
89-
</dependency>
90-
<dependency>
91-
<groupId>org.springframework.boot</groupId>
92-
<artifactId>spring-boot-starter-test</artifactId>
93-
<scope>test</scope>
94-
</dependency>
95-
<dependency>
96-
<groupId>org.springframework.boot</groupId>
97-
<artifactId>spring-boot-testcontainers</artifactId>
98-
<scope>test</scope>
99-
</dependency>
100-
<dependency>
101-
<groupId>net.jqwik</groupId>
102-
<artifactId>jqwik</artifactId>
103-
<version>1.9.3</version>
104-
<scope>test</scope>
105-
</dependency>
106-
<dependency>
107-
<groupId>org.testcontainers</groupId>
108-
<artifactId>testcontainers-junit-jupiter</artifactId>
109-
<version>${testcontainers.version}</version>
110-
<scope>test</scope>
111-
</dependency>
112-
<dependency>
113-
<groupId>org.testcontainers</groupId>
114-
<artifactId>testcontainers-localstack</artifactId>
115-
<version>${testcontainers.version}</version>
116-
<scope>test</scope>
68+
<groupId>tools.profiler</groupId>
69+
<artifactId>jfr-converter</artifactId>
70+
<version>4.3</version>
11771
</dependency>
11872
</dependencies>
11973
