# Profiling the Drift Python SDK

This document explains how to profile the SDK to understand where performance overhead comes from.

## Quick Start

```bash
cd /path/to/drift-python-sdk

# Run cProfile (recommended starting point)
./benchmarks/profile/profile.sh cprofile
```

## Available Profilers

### 1. cProfile (Built-in, Deterministic)

**Best for:** Understanding call counts and cumulative time per function.

```bash
./benchmarks/profile/profile.sh cprofile
```

This runs the profiling workload and outputs:

- A `.prof` file in `benchmarks/profile/results/`
- A summary of the top functions by cumulative and total time

**View interactively with snakeviz:**

```bash
pip install snakeviz
snakeviz benchmarks/profile/results/cprofile_*.prof
```

This opens an interactive sunburst diagram in your browser showing the call hierarchy.
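If you prefer to stay in the terminal, the same profile data can be inspected programmatically with the standard library's `pstats` module. A minimal sketch (it profiles a toy workload in-process as a stand-in; `pstats.Stats` equally accepts a path to one of the saved `cprofile_*.prof` files):

```python
import cProfile
import io
import pstats

# Produce profile data in-process; with a saved file you would instead do
# pstats.Stats("benchmarks/profile/results/<your>.prof")
prof = cProfile.Profile()
prof.enable()
sum(i * i for i in range(100_000))  # toy workload
prof.disable()

buf = io.StringIO()
stats = pstats.Stats(prof, stream=buf)
stats.strip_dirs().sort_stats("cumulative").print_stats(10)
print(buf.getvalue())
```

This prints the same "ordered by cumulative time" listing that the `profile.sh` summary is built from.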
| 37 | + |
| 38 | +### 2. py-spy (Sampling, Flame Graphs) |
| 39 | + |
| 40 | +**Best for:** Low-overhead profiling and classic flame graph visualization. |
| 41 | + |
| 42 | +```bash |
| 43 | +# Requires sudo on macOS |
| 44 | +sudo ./benchmarks/profile/profile.sh pyspy |
| 45 | +``` |
| 46 | + |
| 47 | +Generates an SVG flame graph that you can open in any browser. |
| 48 | + |
| 49 | +**Install:** |
| 50 | + |
| 51 | +```bash |
| 52 | +pip install py-spy |
| 53 | +# Or on macOS: brew install py-spy |
| 54 | +``` |
| 55 | + |
| 56 | +### 3. Scalene (CPU + Memory) |
| 57 | + |
| 58 | +**Best for:** Understanding both CPU time and memory allocation per line. |
| 59 | + |
| 60 | +```bash |
| 61 | +pip install scalene |
| 62 | +./benchmarks/profile/profile.sh scalene |
| 63 | +``` |
| 64 | + |
| 65 | +Generates an HTML report with line-by-line CPU and memory breakdown. |
| 66 | + |
| 67 | +### 4. VizTracer (Timeline) |
| 68 | + |
| 69 | +**Best for:** Seeing the sequence and timing of function calls over time. |
| 70 | + |
| 71 | +```bash |
| 72 | +pip install viztracer |
| 73 | +./benchmarks/profile/profile.sh viztracer |
| 74 | +``` |
| 75 | + |
| 76 | +**View the trace:** |
| 77 | + |
| 78 | +```bash |
| 79 | +vizviewer benchmarks/profile/results/viztracer_*.json |
| 80 | +``` |
| 81 | + |
| 82 | +Opens a Chrome DevTools-style timeline visualization. |
| 83 | + |
| 84 | +## Profile Analysis Results |
| 85 | + |
| 86 | +Based on profiling 500 HTTP requests through the SDK: |
| 87 | + |
| 88 | +### Top Overhead Sources |
| 89 | + |
| 90 | +| Function | Time/Call | Description | |
| 91 | +|----------|-----------|-------------| |
| 92 | +| `span_serialization.clean_span_to_proto` | ~1.7ms | Converting spans to protobuf format | |
| 93 | +| `td_span_processor.on_end` | ~2.1ms | Processing spans when they complete | |
| 94 | +| `handler.finalize_wsgi_span` | ~2.2ms | Finalizing HTTP/WSGI spans | |
| 95 | +| `otel_converter.otel_span_to_clean_span_data` | ~0.4ms | Converting OpenTelemetry spans | |
| 96 | + |
| 97 | +### Key Findings |
| 98 | + |
| 99 | +1. **Span serialization is the biggest bottleneck** |
| 100 | + - `_dict_to_struct`, `_value_to_proto`, `_json_schema_to_proto` are called recursively |
| 101 | + - Converting rich span data to protobuf is inherently expensive |
| 102 | + |
| 103 | +2. **The instrumentation itself is cheap** |
| 104 | + - Function patching/wrapping adds minimal overhead |
| 105 | + - Most time is spent in span processing, not in the hooks |
| 106 | + |
| 107 | +3. **Sampling reduces overhead proportionally** |
| 108 | + - At 10% sampling, most requests skip span serialization entirely |
| 109 | + - This explains why lower sampling rates dramatically improve performance |
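The sampling effect is easy to reason about: with a head-based record/skip decision made once per request, unsampled requests never reach the serialization path at all. A minimal sketch (the `should_record` helper is hypothetical, not the SDK's actual sampling logic):

```python
import random

def should_record(sampling_rate: float) -> bool:
    # Head sampling: decide up front whether this request's span is
    # recorded; skipped requests bypass span serialization entirely.
    return random.random() < sampling_rate

random.seed(0)  # deterministic for illustration
recorded = sum(should_record(0.1) for _ in range(10_000))
print(f"recorded {recorded} of 10000 requests (~10%)")
```

At a 0.1 sampling rate, roughly 90% of requests pay only the cost of this one comparison.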
| 110 | + |
| 111 | +### Optimization Opportunities |
| 112 | + |
| 113 | +Based on the profile data: |
| 114 | + |
| 115 | +1. Lazy serialization - Defer protobuf conversion until export time |
| 116 | +2. Batch serialization - Serialize multiple spans together |
| 117 | +3. Schema caching - Cache JSON schema conversions |
| 118 | +4. Attribute filtering - Skip serializing large/unnecessary attributes |
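Schema caching in particular maps naturally onto `functools.lru_cache`, assuming identical schemas recur across spans. A sketch (`schema_to_proto_cached` is a hypothetical stand-in for `_json_schema_to_proto`, with `json.loads` as a placeholder for the real conversion):

```python
import json
from functools import lru_cache

@lru_cache(maxsize=256)
def schema_to_proto_cached(schema_json: str):
    # Placeholder for the real schema-to-protobuf conversion. The key is a
    # JSON string because lru_cache requires hashable arguments.
    return json.loads(schema_json)

schema = json.dumps({"type": "object", "properties": {"id": {"type": "string"}}})
for _ in range(1000):
    schema_to_proto_cached(schema)

info = schema_to_proto_cached.cache_info()
print(f"hits={info.hits}, misses={info.misses}")  # hits=999, misses=1
```

The expensive conversion runs once per distinct schema; every repeat is a dictionary lookup.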
| 119 | + |
| 120 | +## Custom Profiling |
| 121 | + |
| 122 | +### Profile a Specific Workload |
| 123 | + |
| 124 | +Edit `benchmarks/profile/simple_profile.py` to customize: |
| 125 | + |
| 126 | +```python |
| 127 | +# Adjust number of iterations |
| 128 | +iterations = 1000 |
| 129 | + |
| 130 | +# Change request mix |
| 131 | +if i % 3 == 0: |
| 132 | + # Your custom endpoint |
| 133 | + response = session.get(f"{server_url}/your-endpoint") |
| 134 | +``` |
| 135 | + |
| 136 | +### Profile with Different SDK Settings |
| 137 | + |
| 138 | +```python |
| 139 | +# In simple_profile.py, modify SDK initialization: |
| 140 | +sdk = TuskDrift.initialize( |
| 141 | + api_key="profile-test-key", |
| 142 | + env="profile", |
| 143 | + sampling_rate=0.1, # Test with different sampling rates |
| 144 | + transforms={...}, # Test with transforms enabled |
| 145 | + log_level="warning", |
| 146 | +) |
| 147 | +``` |
| 148 | + |
| 149 | +### Profile Production Code |
| 150 | + |
| 151 | +You can use py-spy to attach to a running process: |
| 152 | + |
| 153 | +```bash |
| 154 | +# Find your Python process PID |
| 155 | +ps aux | grep python |
| 156 | + |
| 157 | +# Attach and record |
| 158 | +sudo py-spy record -o profile.svg --pid <PID> --duration 30 |
| 159 | +``` |
| 160 | + |
| 161 | +## Comparing Before/After Changes |
| 162 | + |
| 163 | +1. Run profile before changes: |
| 164 | + |
| 165 | + ```bash |
| 166 | + ./benchmarks/profile/profile.sh cprofile |
| 167 | + mv benchmarks/profile/results/cprofile_*.prof benchmarks/profile/results/before.prof |
| 168 | + ``` |
| 169 | + |
| 170 | +2. Make your changes |
| 171 | + |
| 172 | +3. Run profile after changes: |
| 173 | + |
| 174 | + ```bash |
| 175 | + ./benchmarks/profile/profile.sh cprofile |
| 176 | + mv benchmarks/profile/results/cprofile_*.prof benchmarks/profile/results/after.prof |
| 177 | + ``` |
| 178 | + |
| 179 | +4. Compare with pstats: |
| 180 | + |
| 181 | + ```python |
| 182 | + import pstats |
| 183 | + |
| 184 | + before = pstats.Stats('benchmarks/profile/results/before.prof') |
| 185 | + after = pstats.Stats('benchmarks/profile/results/after.prof') |
| 186 | + |
| 187 | + print("=== BEFORE ===") |
| 188 | + before.strip_dirs().sort_stats('cumulative').print_stats(20) |
| 189 | + |
| 190 | + print("=== AFTER ===") |
| 191 | + after.strip_dirs().sort_stats('cumulative').print_stats(20) |
| 192 | + ``` |
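Beyond eyeballing the two listings, `pstats.Stats` exposes the total profiled time as the `total_tt` attribute, which is handy for a quick numeric comparison. A self-contained sketch (a toy workload run at two sizes stands in for the two recorded `.prof` files):

```python
import cProfile
import pstats

def workload(n):
    # Toy stand-in for the profiled SDK workload
    return sum(i * i for i in range(n))

totals = {}
for label, n in [("before", 300_000), ("after", 100_000)]:
    prof = cProfile.Profile()
    prof.enable()
    workload(n)
    prof.disable()
    totals[label] = pstats.Stats(prof).total_tt

print(f"before={totals['before']:.4f}s after={totals['after']:.4f}s")
```

With the real files, `pstats.Stats('benchmarks/profile/results/before.prof').total_tt` gives the same number for a recorded run.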
| 193 | + |
| 194 | +## Output Files |
| 195 | + |
| 196 | +Profile results are saved to `benchmarks/profile/results/` (gitignored): |
| 197 | + |
| 198 | +| File | Description | |
| 199 | +|------|-------------| |
| 200 | +| `cprofile_*.prof` | cProfile binary data | |
| 201 | +| `flamegraph_*.svg` | py-spy flame graph | |
| 202 | +| `scalene_*.html` | Scalene HTML report | |
| 203 | +| `viztracer_*.json` | VizTracer timeline data | |
| 204 | +| `traces/` | SDK trace output during profiling | |
| 205 | + |
| 206 | +## Tips |
| 207 | + |
| 208 | +- **Start with cProfile** - It's built-in and gives good overview |
| 209 | +- **Use snakeviz for exploration** - Interactive visualization helps find hotspots |
| 210 | +- **Profile realistic workloads** - Micro-benchmarks may not reflect production patterns |
| 211 | +- **Compare sampling rates** - Profile with 100% vs 10% sampling to see the difference |
| 212 | +- **Watch for I/O** - File writes and network calls can dominate profiles |