Commit 11ccf10

Game of Pools
1 parent d85723c commit 11ccf10

52 files changed

Lines changed: 2433 additions & 0 deletions

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
docker-compose.override.yml
result
tmp
Lines changed: 294 additions & 0 deletions
@@ -0,0 +1,294 @@
# Game of Pools - Aidbox Performance Testing Framework

An automated performance testing framework for the Aidbox FHIR server. It systematically tests combinations of CPU limits, web thread counts, and database connection pool sizes.

## Overview

This project helps you find an optimal Aidbox configuration by:
- Testing multiple CPU, web thread, and DB pool combinations automatically
- Simulating realistic FHIR workloads with complex resource relationships
- Measuring throughput (RPS) and latency (P99, P95, avg)
- Generating detailed analysis reports and visualizations

## Prerequisites

- Docker and Docker Compose
- Python 3.8+
- K6 load testing tool
- At least 8 GB RAM (16 GB recommended)
- Fast storage (NVMe SSD recommended)

## Project Structure

```
.
├── docker-compose.yaml          # Main Docker Compose configuration
├── docker-compose.override.yml  # Generated per-test configuration
├── init-bundle.json             # Aidbox initialization bundle
├── performance_test.py          # Main test orchestration script
├── analyze_results.py           # Results analysis script
├── summarize_results.py         # Generate visualizations
├── k6/                          # K6 load test scripts
│   ├── crud.js                  # Main CRUD test scenario
│   ├── prewarm.js               # Cache warmup script
│   ├── util.js                  # Utility functions
│   └── seed/                    # FHIR resource templates
│       ├── patient.js
│       ├── encounter.js
│       ├── observation.js
│       └── ...
├── result/                      # Raw test results (JSON)
├── analysis/                    # Analysis reports (CSV, SVG)
├── prometheus/                  # Prometheus configuration
└── grafana/                     # Grafana dashboards
```

## Quick Start

### 1. Clone the Repository

```bash
git clone <repository-url>
cd game-of-pools
```

### 2. Configure Test Parameters

Edit `performance_test.py` to set your test parameters:

```python
# Test parameters
CPU_LIMITS = [2, 4, 6, 8]                      # CPU core limits to test
WEB_THREAD_MULTIPLIERS = [1, 1.5, 2, 2.5, 3]   # Web thread multipliers
DB_POOL_MULTIPLIERS = [1.5, 2, 2.5, 3]         # DB pool multipliers
```

**For a quick test** (recommended for first run):
```python
CPU_LIMITS = [2]
WEB_THREAD_MULTIPLIERS = [1, 1.5, 2]
DB_POOL_MULTIPLIERS = [1.5, 2]
```
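
To make the multipliers concrete, the sketch below expands them into explicit web thread and DB pool values. It assumes that web threads scale with the CPU limit and that the DB pool scales with the web thread count, which matches the rules of thumb later in this README. Truncating fractional values is an assumption, and `expand_grid` is a hypothetical helper, not a function from `performance_test.py`.

```python
from itertools import product

CPU_LIMITS = [2, 4, 6, 8]
WEB_THREAD_MULTIPLIERS = [1, 1.5, 2, 2.5, 3]
DB_POOL_MULTIPLIERS = [1.5, 2, 2.5, 3]


def expand_grid(cpu_limits, web_multipliers, pool_multipliers):
    """Return (cpu, web_threads, db_pool) tuples for every combination.

    Assumption: fractional thread and pool counts are truncated with int().
    """
    configs = []
    for cpu, web_mult, pool_mult in product(cpu_limits, web_multipliers, pool_multipliers):
        web_threads = int(cpu * web_mult)       # web threads scale with the CPU limit
        db_pool = int(web_threads * pool_mult)  # DB pool scales with the web threads
        configs.append((cpu, web_threads, db_pool))
    return configs


if __name__ == "__main__":
    grid = expand_grid(CPU_LIMITS, WEB_THREAD_MULTIPLIERS, DB_POOL_MULTIPLIERS)
    print(len(grid), "configurations in the full grid")
    print(grid[:4])
```

Under these assumptions, 4 CPUs with a 1.5x web thread multiplier and a 2x pool multiplier yields 6 threads and a pool of 12.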

### 3. Run Performance Tests

```bash
python3 performance_test.py
```

This will:
- Start Docker Compose services (Aidbox, PostgreSQL, Prometheus, Grafana)
- Run through all configuration combinations
- Save results to the `./result/` directory
- Take about 5 minutes per configuration plus overhead, so budget at least 1 hour per CPU limit (the full default grid of 5 × 4 = 20 combinations needs roughly 100 minutes of test time alone)

**Note:** Services remain running after the tests complete so that you can analyze the results.
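
The run time can be estimated from the grid size. This is a rough sketch, assuming every combination runs for the 5-minute k6 duration. `overhead_minutes` is a made-up placeholder for container restart and prewarm time, not a measured value.

```python
def estimate_minutes(n_web_multipliers, n_pool_multipliers,
                     test_minutes=5.0, overhead_minutes=1.0):
    """Estimate wall-clock minutes needed to test one CPU limit.

    overhead_minutes is a hypothetical per-configuration allowance for
    restarting services and prewarming; adjust it to what you observe.
    """
    n_configs = n_web_multipliers * n_pool_multipliers
    return n_configs * (test_minutes + overhead_minutes)


if __name__ == "__main__":
    print("full default grid:", estimate_minutes(5, 4), "min per CPU limit")
    print("quick-test grid:  ", estimate_minutes(3, 2), "min per CPU limit")
```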

### 4. Analyze Results

```bash
python3 analyze_results.py
```

This generates:
- `analysis/cpu_X_avg_rps.csv` - RPS matrices per CPU count
- `analysis/cpu_X_p99_latency.csv` - Latency matrices per CPU count
- `analysis/summary_all_tests.csv` - Complete results table
- `analysis/best_configurations.csv` - Optimal configs per CPU

## Monitoring During Tests

### Grafana Dashboard

Access Grafana at http://localhost:3000
- **Username:** admin
- **Password:** password

The pre-configured Aidbox dashboard shows:
- Request rates and latency percentiles
- Database connection pool usage
- CPU and memory utilization
- FHIR operation breakdown

### Prometheus

Access Prometheus at http://localhost:9090

Useful queries:
```promql
# Request rate
rate(http_requests_total[5m])

# P99 latency
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))

# DB pool utilization
db_pool_active_connections / db_pool_max_connections
```
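
The same queries can be run from a script through Prometheus's HTTP API (`/api/v1/query`). This is a minimal sketch using only the standard library. It assumes Prometheus is reachable at the URL above and that the metric names in the queries match what your Aidbox instance exports. `build_query_url` and `instant_query` are illustrative helper names.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(expr, base_url=PROMETHEUS_URL):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def instant_query(expr, base_url=PROMETHEUS_URL, timeout=5):
    """Run an instant query and return the list of result series."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return payload["data"]["result"]


if __name__ == "__main__":
    # Only prints the URL; call instant_query(...) while the stack is running.
    print(build_query_url("rate(http_requests_total[5m])"))
```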


## Understanding Test Results

### Best Configurations CSV

```csv
CPU Limit,Metric,Best Web Threads,Best DB Pool Size,Value
2,Best RPS,3,4,1596.15
2,Best P99 Latency,2,5,6.02
```

- **Best RPS**: Configuration with highest throughput
- **Best P99 Latency**: Configuration with lowest 99th percentile latency
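
For scripted use, the file can be loaded with the standard `csv` module. The sketch below reuses the two sample rows shown above and relies only on the column names in that header. `load_best` is a hypothetical helper.

```python
import csv
import io

# The sample rows from this README; in practice open analysis/best_configurations.csv.
SAMPLE = """CPU Limit,Metric,Best Web Threads,Best DB Pool Size,Value
2,Best RPS,3,4,1596.15
2,Best P99 Latency,2,5,6.02
"""


def load_best(csv_file):
    """Map (cpu_limit, metric) to the winning configuration and its value."""
    best = {}
    for row in csv.DictReader(csv_file):
        key = (int(row["CPU Limit"]), row["Metric"])
        best[key] = {
            "web_threads": int(row["Best Web Threads"]),
            "db_pool": int(row["Best DB Pool Size"]),
            "value": float(row["Value"]),
        }
    return best


best = load_best(io.StringIO(SAMPLE))
print(best[(2, "Best RPS")])
```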

### RPS Matrix (example for 4 CPUs)

```
Web Threads / DB Pool | 6       | 8       | 10      | 12
----------------------|---------|---------|---------|--------
4                     | 2694.48 | 2705.79 | 2747.58 | 2697.15
6                     | -       | -       | -       | 3094.02
8                     | -       | -       | -       | 2718.51
```

- Rows: Web thread counts
- Columns: DB pool sizes
- Values: Requests per second (RPS)
- `-`: Configuration not tested
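
A matrix like this can be built by pivoting flat `(web_threads, db_pool, rps)` records. The sketch below uses the 4-CPU values shown above and marks missing combinations as `None`. It illustrates the layout and is not taken from `analyze_results.py`.

```python
# (web_threads, db_pool, rps) records copied from the 4-CPU example matrix.
RESULTS = [
    (4, 6, 2694.48), (4, 8, 2705.79), (4, 10, 2747.58), (4, 12, 2697.15),
    (6, 12, 3094.02),
    (8, 12, 2718.51),
]


def build_matrix(results):
    """Pivot records into rows (web threads) x columns (DB pool sizes)."""
    rows = sorted({w for w, _, _ in results})
    cols = sorted({p for _, p, _ in results})
    lookup = {(w, p): rps for w, p, rps in results}
    matrix = [[lookup.get((w, p)) for p in cols] for w in rows]
    return rows, cols, matrix


def best_cell(results):
    """Return the record with the highest RPS."""
    return max(results, key=lambda record: record[2])


rows, cols, matrix = build_matrix(RESULTS)
for web_threads, values in zip(rows, matrix):
    cells = ["-".rjust(7) if v is None else f"{v:7.2f}" for v in values]
    print(f"{web_threads:<3} | " + " | ".join(cells))
print("best:", best_cell(RESULTS))
```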

## Customizing Tests

### Modify Test Duration

Edit `k6/crud.js`:

```javascript
export const options = {
  scenarios: {
    crud: {
      executor: 'constant-vus',
      vus: __ENV.K6_VUS || 300,
      duration: '5m', // Change this (e.g., '10m', '1h')
      gracefulStop: '30s',
    },
  },
}
```

### Adjust Virtual Users

The framework automatically sets VUs to 2x web threads. To override:

```bash
K6_VUS=100 python3 performance_test.py
```
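
The override rule can be sketched as follows: use `K6_VUS` when it is set, otherwise fall back to twice the web thread count. `resolve_vus` is a hypothetical helper, and the actual logic in `performance_test.py` may differ.

```python
import os


def resolve_vus(web_threads, env=None):
    """Return the k6 virtual user count for a configuration.

    Assumption: an explicit K6_VUS value wins; otherwise VUs = 2 x web threads.
    """
    env = os.environ if env is None else env
    override = env.get("K6_VUS")
    if override:
        return int(override)
    return 2 * web_threads


print(resolve_vus(6, env={}))                 # default rule
print(resolve_vus(6, env={"K6_VUS": "100"}))  # explicit override
```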
184+
185+
### Test Specific Configuration
186+
187+
To test a single configuration:
188+
189+
```python
190+
# In performance_test.py
191+
CPU_LIMITS = [4]
192+
WEB_THREAD_MULTIPLIERS = [1.5]
193+
DB_POOL_MULTIPLIERS = [2]
194+
```
195+
196+
Or manually create `docker-compose.override.yml`:
197+
198+
```yaml
199+
services:
200+
aidbox:
201+
environment:
202+
BOX_WEB_THREAD: "6"
203+
BOX_DB_POOL_MAXIMUM_POOL_SIZE: "12"
204+
deploy:
205+
resources:
206+
limits:
207+
cpus: "4"
208+
```
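
If you want to generate this file for several configurations, a small helper can render it. This is a sketch that reproduces the keys from the YAML above with plain string formatting. `render_override` and `write_override` are illustrative names, not functions from `performance_test.py`.

```python
from pathlib import Path


def render_override(cpus, web_threads, db_pool):
    """Render docker-compose.override.yml content for one configuration."""
    return (
        "services:\n"
        "  aidbox:\n"
        "    environment:\n"
        f'      BOX_WEB_THREAD: "{web_threads}"\n'
        f'      BOX_DB_POOL_MAXIMUM_POOL_SIZE: "{db_pool}"\n'
        "    deploy:\n"
        "      resources:\n"
        "        limits:\n"
        f'          cpus: "{cpus}"\n'
    )


def write_override(path, cpus, web_threads, db_pool):
    """Write the rendered override file to the given path."""
    Path(path).write_text(render_override(cpus, web_threads, db_pool), encoding="utf-8")


if __name__ == "__main__":
    print(render_override(cpus=4, web_threads=6, db_pool=12))
```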

Then run K6 directly:

```bash
docker compose up -d --wait
k6 run k6/prewarm.js
k6 run --summary-export=result.json k6/crud.js
```
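
The exported `result.json` can then be reduced to the headline numbers. The sketch below assumes the usual k6 summary-export layout, with a top-level `metrics` object containing `http_reqs.rate` and `http_req_duration` trend stats in milliseconds. It also assumes `p(99)` is present only if the script adds it to k6's `summaryTrendStats` option. The sample values are invented for illustration.

```python
import json


def summarize(summary):
    """Extract throughput and latency figures from a k6 summary export dict."""
    metrics = summary.get("metrics", {})
    duration = metrics.get("http_req_duration", {})
    return {
        "rps": metrics.get("http_reqs", {}).get("rate"),
        "avg_ms": duration.get("avg"),
        "p95_ms": duration.get("p(95)"),
        "p99_ms": duration.get("p(99)"),  # None unless p(99) was requested
    }


def load_summary(path="result.json"):
    """Read a summary export file and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize(json.load(fh))


# Invented sample, shaped like a summary export; not real measurements.
SAMPLE = {
    "metrics": {
        "http_reqs": {"count": 300000, "rate": 1000.0},
        "http_req_duration": {"avg": 2.5, "p(95)": 4.0},
    }
}
print(summarize(SAMPLE))
```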

### Add Custom FHIR Resources

1. Create a new seed file in `k6/seed/`:

```javascript
// k6/seed/procedure.js
export default {
  "resourceType": "Procedure",
  "status": "completed",
  // ... your resource structure
}
```

2. Import and use in `k6/crud.js`:

```javascript
import procedure from './seed/procedure.js'

export function setup() {
  return {
    seeds: {
      procedure: JSON.stringify(procedure),
      // ... other seeds
    }
  }
}
```

## Troubleshooting

### Services Won't Start

```bash
# Check logs
docker compose logs aidbox
docker compose logs postgres

# Restart services
docker compose down
docker compose up -d --wait
```

## Performance Tips

### For Faster Testing

1. **Reduce test duration**: Edit `k6/crud.js` to use `duration: '2m'`
2. **Test fewer configurations**: Reduce multiplier arrays
3. **Use local storage**: Ensure Docker volumes are on fast SSD
4. **Allocate more resources**: Increase Docker Desktop memory limit

### For Production-Like Testing

1. **Use realistic data volumes**: Pre-populate database with test data
2. **Match production hardware**: Test on similar CPU/memory specs
3. **Include background load**: Run maintenance tasks during tests
4. **Test with real network latency**: Use remote database

## Results Interpretation

### Choosing Configuration

**For maximum throughput:**
- Use 1.5x CPU for web threads
- Use 2-2.5x threads for DB pool
- Example: 4 CPUs → 6 threads, 12 pool → 3,094 RPS

**For low latency:**
- Use 1x CPU for web threads
- Use 1.5-2x threads for DB pool
- Example: 4 CPUs → 4 threads, 8 pool → 6.12ms P99

**For balanced:**
- Use 1.5x CPU for web threads
- Use 2x threads for DB pool
- Best for most production deployments
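
These rules of thumb can be written down as a small lookup. The sketch below encodes the multipliers listed above and returns a web thread count plus a DB pool range. Truncating fractional values is an assumption, and `recommend` is a hypothetical helper rather than part of the framework.

```python
# profile -> (web thread multiplier, min pool multiplier, max pool multiplier)
RULES = {
    "throughput": (1.5, 2.0, 2.5),
    "latency": (1.0, 1.5, 2.0),
    "balanced": (1.5, 2.0, 2.0),
}


def recommend(cpus, profile="balanced"):
    """Suggest web threads and a DB pool range for a CPU limit.

    Assumption: fractional results are truncated with int().
    """
    web_mult, pool_low, pool_high = RULES[profile]
    web_threads = int(cpus * web_mult)
    return {
        "web_threads": web_threads,
        "db_pool_min": int(web_threads * pool_low),
        "db_pool_max": int(web_threads * pool_high),
    }


for name in RULES:
    print(name, recommend(4, name))
```

Treat the output as a starting point for a focused test run on your own hardware, not as a measured optimum.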

samurai-lab/game-of-pools/analysis/2_cpu_p99_latency.svg

Lines changed: 1 addition & 0 deletions

samurai-lab/game-of-pools/analysis/2_cpu_throughput.svg

Lines changed: 1 addition & 0 deletions

samurai-lab/game-of-pools/analysis/4_cpu_p99_latency.svg

Lines changed: 1 addition & 0 deletions

samurai-lab/game-of-pools/analysis/4_cpu_throughput.svg

Lines changed: 1 addition & 0 deletions

samurai-lab/game-of-pools/analysis/6_cpu_p99_latency.svg

Lines changed: 1 addition & 0 deletions

samurai-lab/game-of-pools/analysis/6_cpu_throughput.svg

Lines changed: 1 addition & 0 deletions

samurai-lab/game-of-pools/analysis/8_cpu_p99_latency.svg

Lines changed: 1 addition & 0 deletions

samurai-lab/game-of-pools/analysis/8_cpu_throughput.svg

Lines changed: 1 addition & 0 deletions
