Commit d57e914 (parent ae49e9c): 20 files changed, 2941 additions, 0 deletions

benchmarks/.gitignore (5 additions, 0 deletions)

```text
# Benchmark results (regenerated each run)
results/

# Trace directories created during benchmarks
.benchmark-traces*/
```

benchmarks/PROFILING.md (212 additions, 0 deletions)
# Profiling the Drift Python SDK

This document explains how to profile the SDK to understand where its performance overhead comes from.

## Quick Start

```bash
cd /path/to/drift-python-sdk

# Run cProfile (recommended starting point)
./benchmarks/profile/profile.sh cprofile
```

## Available Profilers

### 1. cProfile (Built-in, Deterministic)

**Best for:** Understanding call counts and cumulative time per function.

```bash
./benchmarks/profile/profile.sh cprofile
```

This runs the profiling workload and outputs:

- A `.prof` file in `benchmarks/profile/results/`
- A summary of the top functions by cumulative and total time

**View interactively with snakeviz:**

```bash
pip install snakeviz
snakeviz benchmarks/profile/results/cprofile_*.prof
```

This opens an interactive sunburst diagram in your browser showing the call hierarchy.
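If you prefer to drive cProfile from Python rather than the wrapper script, a minimal sketch (the `workload` function here is a placeholder, not part of the SDK):

```python
import cProfile
import os
import pstats

def workload():
    # Placeholder for the code you actually want to profile.
    sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Save alongside the script-generated profiles, then print a quick summary.
os.makedirs("benchmarks/profile/results", exist_ok=True)
profiler.dump_stats("benchmarks/profile/results/manual.prof")
pstats.Stats(profiler).strip_dirs().sort_stats("cumulative").print_stats(10)
```

The resulting `.prof` file can be opened with snakeviz exactly like the script-generated ones.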
### 2. py-spy (Sampling, Flame Graphs)

**Best for:** Low-overhead profiling and classic flame graph visualization.

```bash
# Requires sudo on macOS
sudo ./benchmarks/profile/profile.sh pyspy
```

This generates an SVG flame graph that you can open in any browser.

**Install:**

```bash
pip install py-spy
# Or on macOS: brew install py-spy
```
### 3. Scalene (CPU + Memory)

**Best for:** Understanding both CPU time and memory allocation per line.

```bash
pip install scalene
./benchmarks/profile/profile.sh scalene
```

This generates an HTML report with a line-by-line CPU and memory breakdown.

### 4. VizTracer (Timeline)

**Best for:** Seeing the sequence and timing of function calls over time.

```bash
pip install viztracer
./benchmarks/profile/profile.sh viztracer
```

**View the trace:**

```bash
vizviewer benchmarks/profile/results/viztracer_*.json
```

This opens a Chrome DevTools-style timeline visualization.
## Profile Analysis Results

Based on profiling 500 HTTP requests through the SDK:

### Top Overhead Sources

| Function | Time/Call | Description |
|----------|-----------|-------------|
| `span_serialization.clean_span_to_proto` | ~1.7ms | Converting spans to protobuf format |
| `td_span_processor.on_end` | ~2.1ms | Processing spans when they complete |
| `handler.finalize_wsgi_span` | ~2.2ms | Finalizing HTTP/WSGI spans |
| `otel_converter.otel_span_to_clean_span_data` | ~0.4ms | Converting OpenTelemetry spans |

### Key Findings

1. **Span serialization is the biggest bottleneck**
   - `_dict_to_struct`, `_value_to_proto`, and `_json_schema_to_proto` are called recursively
   - Converting rich span data to protobuf is inherently expensive

2. **The instrumentation itself is cheap**
   - Function patching/wrapping adds minimal overhead
   - Most time is spent in span processing, not in the hooks

3. **Sampling reduces overhead proportionally**
   - At 10% sampling, most requests skip span serialization entirely
   - This explains why lower sampling rates dramatically improve performance
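A back-of-the-envelope model makes the proportionality concrete. The per-call costs come from the table above; treating their sum as the cost of one sampled request is an assumption made purely for illustration:

```python
# Per-request span-processing cost when a request IS sampled, summed from
# the profile table above (serialization + on_end + finalize + convert).
PER_SPAN_COST_MS = 1.7 + 2.1 + 2.2 + 0.4

def expected_overhead_ms(sampling_rate: float) -> float:
    # Unsampled requests skip span serialization entirely, so the average
    # overhead scales linearly with the sampling rate.
    return sampling_rate * PER_SPAN_COST_MS

print(expected_overhead_ms(1.0))  # 100% sampling: every request pays the full cost
print(expected_overhead_ms(0.1))  # 10% sampling: roughly a tenth on average
```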
### Optimization Opportunities

Based on the profile data:

1. **Lazy serialization** - Defer protobuf conversion until export time
2. **Batch serialization** - Serialize multiple spans together
3. **Schema caching** - Cache JSON schema conversions
4. **Attribute filtering** - Skip serializing large/unnecessary attributes
## Custom Profiling

### Profile a Specific Workload

Edit `benchmarks/profile/simple_profile.py` to customize:

```python
# Adjust the number of iterations
iterations = 1000

# Change the request mix
if i % 3 == 0:
    # Your custom endpoint
    response = session.get(f"{server_url}/your-endpoint")
```
### Profile with Different SDK Settings

```python
# In simple_profile.py, modify the SDK initialization:
sdk = TuskDrift.initialize(
    api_key="profile-test-key",
    env="profile",
    sampling_rate=0.1,   # Test with different sampling rates
    transforms={...},    # Test with transforms enabled
    log_level="warning",
)
```
### Profile Production Code

You can use py-spy to attach to a running process:

```bash
# Find your Python process PID
ps aux | grep python

# Attach and record for 30 seconds
sudo py-spy record -o profile.svg --pid <PID> --duration 30
```
## Comparing Before/After Changes

1. Run the profile before your changes:

   ```bash
   ./benchmarks/profile/profile.sh cprofile
   mv benchmarks/profile/results/cprofile_*.prof benchmarks/profile/results/before.prof
   ```

2. Make your changes.

3. Run the profile after your changes:

   ```bash
   ./benchmarks/profile/profile.sh cprofile
   mv benchmarks/profile/results/cprofile_*.prof benchmarks/profile/results/after.prof
   ```

4. Compare with pstats:

   ```python
   import pstats

   before = pstats.Stats('benchmarks/profile/results/before.prof')
   after = pstats.Stats('benchmarks/profile/results/after.prof')

   print("=== BEFORE ===")
   before.strip_dirs().sort_stats('cumulative').print_stats(20)

   print("=== AFTER ===")
   after.strip_dirs().sort_stats('cumulative').print_stats(20)
   ```
## Output Files

Profile results are saved to `benchmarks/profile/results/` (gitignored):

| File | Description |
|------|-------------|
| `cprofile_*.prof` | cProfile binary data |
| `flamegraph_*.svg` | py-spy flame graph |
| `scalene_*.html` | Scalene HTML report |
| `viztracer_*.json` | VizTracer timeline data |
| `traces/` | SDK trace output during profiling |

## Tips

- **Start with cProfile** - It's built-in and gives a good overview
- **Use snakeviz for exploration** - Interactive visualization helps find hotspots
- **Profile realistic workloads** - Micro-benchmarks may not reflect production patterns
- **Compare sampling rates** - Profile with 100% vs 10% sampling to see the difference
- **Watch for I/O** - File writes and network calls can dominate profiles

benchmarks/README.md (161 additions, 0 deletions)
# Benchmarks

These benchmarks measure the performance overhead of the Drift Python SDK.

## Overview

The benchmark suite runs a Flask test server and makes HTTP requests to various endpoints while measuring latency, throughput, CPU usage, and memory consumption. Three configurations are tested:

1. **SDK Disabled (baseline)** - No SDK instrumentation
2. **SDK Active** - SDK in RECORD mode, capturing traces
3. **SDK Active with Transforms** - SDK with data transformation rules enabled

## Usage

### Prerequisites

Make sure you have the required dependencies:

```bash
cd /path/to/drift-python-sdk
uv pip install psutil flask requests
```

### Running All Benchmarks

```bash
# Run all benchmarks and compare the results
./benchmarks/run_benchmarks.sh
```
### Running Individual Benchmarks

```bash
# Realistic workload benchmark (recommended)
python benchmarks/bench/realistic_workload.py

# Fixed-QPS latency test (measures latency at controlled request rates)
python benchmarks/bench/fixed_qps_latency.py

# Synthetic benchmarks (stress tests)
python benchmarks/bench/sdk_disabled.py
python benchmarks/bench/sdk_active.py
python benchmarks/bench/sdk_active_with_transforms.py

# Sampling rate comparison
python benchmarks/bench/sdk_sampling_rates.py
```

### Comparing Results

After running the benchmarks, compare the results:

```bash
python benchmarks/compare_benchmarks.py
```
### Configuration

You can configure the benchmarks via environment variables:

- `BENCHMARK_ENABLE_MEMORY=false` - Disable memory monitoring (reduces CPU overhead)

Or modify the options in `common.py`:

```python
DEFAULT_OPTIONS = {
    "time_per_task_ms": 10_000,  # Duration per task (10 seconds by default)
    "warmup_iterations": 5,      # Warmup iterations before measurement
    "enable_memory_tracking": True,
}
```
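The environment flag is presumably parsed along these lines (a sketch; the actual logic in `common.py` may differ):

```python
import os

def memory_tracking_enabled(default: bool = True) -> bool:
    # Only an explicit "false" disables memory monitoring; anything else,
    # including the variable being unset, keeps the default.
    raw = os.environ.get("BENCHMARK_ENABLE_MEMORY")
    if raw is None:
        return default
    return raw.strip().lower() != "false"

os.environ["BENCHMARK_ENABLE_MEMORY"] = "false"
print(memory_tracking_enabled())
```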
## Benchmark Tasks

### Realistic Workloads (Recommended)

These endpoints simulate production API behavior:

- **GET /api/typical-read** (~5-10ms): Auth check + DB read + response serialization
- **POST /api/typical-write** (~15-25ms): Validation + DB write + response
- **POST /api/realistic** (~10-20ms): Validation + DB query + data processing + response

### Synthetic Workloads

These are stress tests, not representative of production:

- **POST /api/compute-hash**: Pure CPU (iterative SHA-256) - useful for profiling
- **POST /api/io-bound**: Pure I/O (sleep delays) - tests baseline overhead
- **POST /api/auth/login**, **POST /api/users**: Sensitive data for transform testing
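As a point of reference, an iterative SHA-256 workload like the one behind `/api/compute-hash` can be sketched as follows (the function name and round count are assumptions, not the test server's actual code):

```python
import hashlib

def iterative_sha256(seed: bytes, rounds: int = 10_000) -> str:
    # Re-hash the previous digest repeatedly: deterministic, pure-CPU work
    # with no I/O, which is what makes it useful for profiling.
    digest = seed
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()

print(iterative_sha256(b"benchmark-seed")[:16])
```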
## Output

Results are saved to `benchmarks/results/` as JSON files:

- `sdk-disabled.json` - Baseline results
- `sdk-active.json` - SDK enabled results
- `sdk-active-with-transforms.json` - SDK with transforms results

The comparison script outputs a markdown table showing:

- Throughput delta (negative = worse)
- Tail latency (p99) delta (positive = worse)
- CPU usage delta
- Memory overhead
## Interpreting Results

- **Throughput $\Delta$**: Percentage change in operations/second. Negative means slower.
- **Tail Latency $\Delta$**: Percentage change in p99 latency. Positive means slower.
- **CPU User $\Delta$**: Change in user-space CPU percentage.
- **Memory $\Delta$**: Additional memory used by the SDK.

Ideally, the SDK should have minimal impact:

- Throughput should be within ±5%
- The tail latency increase should be <10%
- Memory overhead should be reasonable (<50MB)
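The deltas are ordinary percentage changes relative to the baseline run; a minimal sketch (the helper name is an illustration, not necessarily what `compare_benchmarks.py` uses):

```python
def pct_delta(baseline: float, value: float) -> float:
    # Percentage change relative to the baseline: negative for throughput
    # means slower, positive for latency means slower.
    return (value - baseline) / baseline * 100.0

print(pct_delta(1000.0, 950.0))  # throughput dropped 5%
print(pct_delta(12.0, 13.2))     # p99 latency rose about 10%
```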
## Profiling

For detailed profiling to understand where the SDK overhead comes from, see **[PROFILING.md](./PROFILING.md)**.

Quick start:

```bash
# Run the cProfile analysis
./benchmarks/profile/profile.sh cprofile

# View it interactively
pip install snakeviz
snakeviz benchmarks/profile/results/cprofile_*.prof
```
## Architecture

```text
benchmarks/
├── bench/
│   ├── common.py                      # Shared benchmark logic
│   ├── fixed_qps_latency.py           # Fixed-QPS latency test
│   ├── realistic_workload.py          # Realistic API workload benchmark
│   ├── resource_monitor.py            # CPU/memory monitoring
│   ├── result_utils.py                # Result serialization
│   ├── sdk_disabled.py                # Baseline benchmark (synthetic)
│   ├── sdk_active.py                  # SDK active benchmark (synthetic)
│   ├── sdk_active_with_transforms.py  # SDK + transforms benchmark (synthetic)
│   └── sdk_sampling_rates.py          # Sampling rate impact benchmark
├── profile/
│   ├── profile.sh                     # Profiler runner script
│   ├── simple_profile.py              # Profiling workload
│   └── results/                       # Profile output (gitignored)
├── server/
│   └── test_server.py                 # Flask test server
├── results/                           # JSON output (gitignored)
├── compare_benchmarks.py              # Result comparison script
├── run_benchmarks.sh                  # Runner script
├── PROFILING.md                       # Profiling documentation
└── README.md
```

benchmarks/bench/__init__.py (1 addition, 0 deletions)

```python
# Benchmark utilities
```
