Commit 526f289 (parent 156fdbf), committed by bk86a and claude

docs: performance baseline + reproducible harness (#59)

Measures sustained throughput, latency curve, and stability of the production deployment using a labeled trusted token. Headline: ~30 RPS / ~1,800 req/min sustained, p99 < 200 ms at 27 RPS over a 3-minute run. scripts/perf_test.sh is parameterised via PC2NUTS_TARGET and PC2NUTS_TOKEN and downloads a fresh GISCO TERCET corpus on first run. The doc identifies the single-worker plateau as the current bottleneck and lists the conditions under which the baseline must be re-measured (issues #7, #45, multi-worker switch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2 files changed: 312 additions & 0 deletions

docs/performance.md

Lines changed: 126 additions & 0 deletions
# Performance characterisation

**Date:** 2026-04-30
**Commit:** `5e0b6ae`
**Target:** production deployment (single edge region, single uvicorn worker, single container).
**Test client:** Belgian residential connection → DE PoP, single source IP, authenticated via a labeled trusted token (revoked after the run).
**Tools:** `bombardier` v1.2.6, `vegeta` v12.12.0.
**Reproduction:** `scripts/perf_test.sh` (parameterised on `PC2NUTS_TARGET` and `PC2NUTS_TOKEN`).

---
## Headline

> **Sustained throughput ceiling: ~30 requests/second (~1,800 requests/minute).**
>
> **Recommended operating point: 27 RPS (~1,620/min), p99 < 200 ms.**

The current `60/minute` per-IP cap is therefore not the system bottleneck — the deployment can serve roughly **30× that volume in aggregate** before throughput plateaus. A single client could be permitted up to ~1,500/minute (25 RPS) without exhausting the overall headroom; regardless, the per-IP cap should stay well below the aggregate ceiling.

---

## Latency curve (Scenario B — random valid lookups across 5 countries)

This is the realistic-input scenario and the basis for the headline number.

| Offered RPS | Achieved RPS | Success | p50 | p90 | p95 | p99 | Max |
|------------:|-------------:|--------:|----:|----:|----:|----:|----:|
| 10 | 10.0 | 100% | 46 ms | 53 ms | 63 ms | 74 ms | 104 ms |
| 20 | 20.0 | 100% | 45 ms | 54 ms | 60 ms | 96 ms | 136 ms |
| 25 | 25.1 | 100% | 46 ms | 54 ms | 73 ms | 151 ms | 228 ms |
| **30** | **30.0** | **100%** | **48 ms** | **109 ms** | **137 ms** | **193 ms** | **222 ms** |
| 35 | 32.2 | 100% | 2.27 s | 3.65 s | 4.07 s | 4.47 s | 5.62 s |

The **knee is at 30 RPS**. From 30 → 35, throughput barely moves (32.2 vs 30.0) but latencies jump 23-47× across the percentiles. Beyond the knee, queue depth grows without bound — the curve is sharp, not gradual.
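One way to make the knee reproducible rather than eyeballed is to scan the sweep results for the first rate step at which p99 explodes. A minimal sketch, using the measured (rate, p99) pairs from the table above; the 3× p99-jump criterion and the `find_knee` helper are illustrative assumptions, not part of the harness:

```python
# Locate the saturation knee: the highest offered rate before p99 blows up.
# (offered RPS, p99 in ms) pairs measured in Scenario B.
sweep = [(10, 74), (20, 96), (25, 151), (30, 193), (35, 4470)]

def find_knee(points, jump=3.0):
    """Return the last rate whose successor's p99 grew by more than `jump`x."""
    for (rate, p99), (_, next_p99) in zip(points, points[1:]):
        if next_p99 / p99 > jump:
            return rate
    return points[-1][0]  # no blow-up observed within the sweep

print(find_knee(sweep))  # → 30
```

The same scan could be run over fresh `vegeta report` outputs when re-baselining, so the knee is picked by the same rule every time.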
## Saturation discovery (Scenario A — hot single key, BE 3080)

Throughput plateaus regardless of client concurrency, confirming the bottleneck is per-request work on the server (single event loop / single worker), not concurrency exhaustion on the client.

| Connections | Reqs/sec | p50 | p95 | p99 |
|------------:|---------:|----:|----:|----:|
| 5 | 29.6 | 169 ms | 225 ms | 267 ms |
| 10 | 31.0 | 325 ms | 443 ms | 479 ms |
| 20 | 31.8 | 617 ms | 795 ms | 1.00 s |
| 40 | 30.9 | 1.21 s | 1.63 s | 2.31 s |
| 80 | 30.4 | 2.30 s | 3.92 s | 6.92 s |

Throughput is bounded; concurrency just queues.
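That queueing behaviour is consistent with Little's law: at a pinned service rate, time in system grows as concurrency divided by throughput. A quick check against the table above (the helper name and the comparison itself are mine, not part of the harness):

```python
# Little's law sanity check: once saturated, time-in-system ~= conns / throughput.
# (connections, measured RPS, measured p50 in seconds) rows from Scenario A.
rows = [(10, 31.0, 0.325), (20, 31.8, 0.617), (40, 30.9, 1.21), (80, 30.4, 2.30)]

def predicted_latency(conns, rps):
    """Little's law L = lambda * W rearranged: W = L / lambda."""
    return conns / rps

for conns, rps, p50 in rows:
    w = predicted_latency(conns, rps)
    print(f"c={conns}: predicted {w:.2f}s vs measured p50 {p50}s")
```

Predicted and measured medians agree to within ~15% at every concurrency level, which is what a fixed-rate server with a growing queue should produce.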
**At c≥100 the platform pushes back.** An exploratory pre-run at c=100, 200, 400, 800 produced widespread `5xx`, `dial tcp … connection timed out`, and `tls handshake timed out` errors — i.e. the edge platform aggressively refuses connections at very high concurrency from a single source. Stay well below c=100 in any scripted test against this deployment.

## Fallback-path cost (Scenario C — 50/50 hit/miss at 25 RPS)

Compared to Scenario B at the same rate (25/s), the 50/50 mix is statistically indistinguishable: p50 45 ms vs 46 ms; p99 136 ms vs 151 ms. The Tier 3 prefix-approximation path (taken on every "miss") imposes **no measurable latency cost** at this load. The hard work is per-request HTTP/TLS framing and JSON serialisation, not the lookup itself.

## FastAPI/uvicorn floor (Scenario D — `/health` at 25 RPS)

| Endpoint | p50 | p95 | p99 | Max |
|---|---:|---:|---:|---:|
| `/health` | **15 ms** | 19 ms | 27 ms | 62 ms |
| `/lookup` (Scenario B at 25/s) | 46 ms | 73 ms | 151 ms | 228 ms |

`/health` is roughly **3× faster** than `/lookup`. About 15 ms of every request is the platform/network/TLS/uvicorn floor; the additional ~30 ms on `/lookup` is the endpoint logic plus Pydantic response serialisation. **Optimisation candidates** if a higher ceiling is needed: response serialisation (the dict access itself is microseconds), reducing JSON envelope size, or moving to multi-worker.
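The floor-versus-endpoint split above is simple arithmetic on the two medians; written out as a sketch (the constant names are illustrative, the values are the two p50s measured above):

```python
# Decompose /lookup's median latency into platform floor + endpoint work.
HEALTH_P50_MS = 15   # Scenario D: network/TLS/platform/uvicorn floor
LOOKUP_P50_MS = 46   # Scenario B at 25 RPS

endpoint_work_ms = LOOKUP_P50_MS - HEALTH_P50_MS
print(f"floor ~{HEALTH_P50_MS} ms, endpoint logic + serialisation ~{endpoint_work_ms} ms "
      f"({LOOKUP_P50_MS / HEALTH_P50_MS:.1f}x the floor)")
```

Any optimisation can only attack the ~31 ms endpoint share; the ~15 ms floor is outside the application's control.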
## Stability (Scenario E — sustained 27 RPS for 3 minutes)

| Metric | Value |
|---|---|
| Total requests | 4,860 |
| Achieved rate | 27.0/s |
| Success | 100.0% (200: 4,860) |
| p50 / p95 / p99 / max | 46 / 89 / 132 / 324 ms |
| <50 ms | 73.0% |
| <100 ms | 97.4% |
| <200 ms | 99.8% |
| 5xx | 0 |
| 429 | 0 |

No drift over the 3-minute window. p99 stayed well under 200 ms throughout.

---
## Methodology notes

- **Cooldown between runs.** A short pause (10 s) between scenarios is needed; without it, residual queueing from the previous run pollutes the next.
- **Bombardier's default 2 s timeout is too aggressive** here — runs near saturation see legitimate 1-2 s tail latencies. Use `--timeout 30s` to avoid spurious "timeout" classifications.
- **Single-region edge means single-PoP measurements.** The platform allocates the deployment to one region (DE). Latency from clients elsewhere will differ accordingly, but the throughput ceiling is unaffected — every request still hits the same single container.
- **Single-source-IP test client.** Distributed traffic from many IPs would not change the aggregate ceiling (the bottleneck is the container) but would change the per-IP rate-limit behaviour, since slowapi keys limits per source IP.
- **No CDN cache between client and `/lookup`.** Verified by inspecting response headers — no `Cache-Status`, no `CDN-Cache-Status`; every request reaches the container.

---
## Recommendations

1. **Keep the per-IP cap conservative relative to the aggregate ceiling.** The current `60/minute` (1 RPS per IP) leaves comfortable headroom: roughly 30 clients each running at the full per-IP cap would be needed to reach the aggregate ceiling. No change needed unless trusted-token traffic patterns become heavy.

2. **Pick `p99 ≤ 200 ms` as the SLO** at the recommended 27 RPS operating point. The full 3-minute sustained run met this.

3. **Re-baseline after issue #7, issue #45, or any worker-count change lands.** Specifically:
   - **#7 (UK NSPL, +1.79M postcodes)** — should not change per-request latency materially (still a dict lookup) but roughly doubles in-memory state. Re-run to confirm.
   - **#45 (happyGISCO outbound geocoding)** — would add a network call to the lookup path; the saturation RPS will drop sharply. Re-baselining is **mandatory**.
   - **Switching from single-worker to multi-worker** — likely the easiest large win. Each additional worker should add roughly another 30 RPS of headroom, up to the container's CPU count.

4. **Don't run unattended high-concurrency tests.** Bombardier at c≥100 from a single source triggers platform-level connection refusal (`5xx`, dial timeouts) and risks short-term throttling. Keep scripted load below c=80.
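The multi-worker expectation in item 3 can be put as a back-of-envelope estimate. This assumes linear per-worker scaling capped by CPU count, which the re-baseline would need to confirm; `estimated_ceiling` is a hypothetical helper, not anything in the repo:

```python
# Rough aggregate-ceiling estimate after a multi-worker uvicorn switch.
# Assumes each worker adds one single-worker ceiling, capped by CPU count.
PER_WORKER_RPS = 30  # measured single-worker ceiling from this baseline

def estimated_ceiling(workers: int, cpus: int) -> int:
    return PER_WORKER_RPS * min(workers, cpus)

print(estimated_ceiling(2, 4))  # two workers on a 4-CPU container
print(estimated_ceiling(4, 2))  # oversubscribed: the CPUs cap the win
```

Whatever the container's actual CPU count turns out to be, the estimate only sets expectations; the real ceiling must be re-measured with the harness.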
---

## Reproducing

```bash
# 1. Issue a labeled trusted token (operator credentials required):
export PC2NUTS_TOKEN_DB_URL='libsql://...'
export PC2NUTS_TOKEN_DB_AUTH_TOKEN='...'
python -m scripts.tokens add --label "perf-test-$(date -I)"
# Token will become active in the running service after one refresh interval (default 60 s).

# 2. Run the suite:
export PC2NUTS_TARGET='https://example.invalid'
export PC2NUTS_TOKEN='<the value printed above>'
scripts/perf_test.sh

# 3. Revoke when done:
python -m scripts.tokens revoke <id>
```

Raw outputs are written to `/tmp/perf/`. The harness automatically downloads a fresh corpus from public GISCO TERCET ZIPs on first run.
scripts/perf_test.sh

Lines changed: 186 additions & 0 deletions
#!/usr/bin/env bash
# Performance test harness for the PostalCode2NUTS service.
#
# Discovers max sustainable RPS, characterises the latency curve, and verifies
# stability at the chosen operating point. See docs/performance.md for the
# methodology and the most recent measured results.
#
# Required env vars:
#   PC2NUTS_TARGET    Base URL of the service (no trailing slash). Example:
#                     https://example.invalid
#   PC2NUTS_TOKEN     Trusted-token value granting rate-limit bypass. Issue with:
#                     python -m scripts.tokens add --label "perf-test-YYYY-MM-DD"
#
# Optional env vars:
#   OUTDIR            Output directory for raw results (default: /tmp/perf).
#   CORPUS_COUNTRIES  Space-separated CC list to pull from GISCO TERCET (default:
#                     "BE AT IE LU EE" — small files, fast download).
#   SCENARIOS         Subset of "warm A B C D E" (default: all).
#
# Required tools on PATH: bombardier, vegeta, curl, file, unzip, python3.
#
# Stop conditions: any 5xx >1%, p99 >5s, or any 429 → halt and inspect output.
set -euo pipefail

: "${PC2NUTS_TARGET:?PC2NUTS_TARGET (e.g. https://example.invalid) is required}"
: "${PC2NUTS_TOKEN:?PC2NUTS_TOKEN is required (use scripts/tokens.py add)}"
OUTDIR="${OUTDIR:-/tmp/perf}"
CORPUS_COUNTRIES="${CORPUS_COUNTRIES:-BE AT IE LU EE}"
SCENARIOS="${SCENARIOS:-warm A B C D E}"
HEADER="Authorization: Bearer ${PC2NUTS_TOKEN}"

mkdir -p "${OUTDIR}"
CORPUS_DIR="${OUTDIR}/corpus"
mkdir -p "${CORPUS_DIR}"

# --- Build the corpus from public TERCET ZIPs --------------------------------
build_corpus() {
  echo "Building corpus from GISCO TERCET (${CORPUS_COUNTRIES})..."
  for cc in ${CORPUS_COUNTRIES}; do
    for yr in 2025 2024 2023; do
      url="https://gisco-services.ec.europa.eu/tercet/NUTS-2024/pc${yr}_${cc}_NUTS-2024_v1.0.zip"
      tmp="${CORPUS_DIR}/${cc}.zip"
      curl -sf -o "${tmp}" "${url}" || continue
      if [ "$(file -b --mime-type "${tmp}")" = "application/zip" ]; then
        (cd "${CORPUS_DIR}" && unzip -oq "${cc}.zip")
        break
      fi
      rm -f "${tmp}"
    done
  done
  python3 - <<'PYEOF'
import csv, os, random, re

random.seed(20260430)
corpus_dir = os.environ["CORPUS_DIR"]
target = os.environ["PC2NUTS_TARGET"]

gen_invalid = {
    "BE": lambda: f"{random.randint(1000,9999):04d}",
    "AT": lambda: f"{random.randint(1000,9999):04d}",
    "EE": lambda: f"{random.randint(10000,99999):05d}",
    "LU": lambda: f"{random.randint(1000,9999):04d}",
}

by_cc = {}
for fn in sorted(os.listdir(corpus_dir)):
    m = re.match(r"pc\d+_([A-Z]{2})_.*\.csv$", fn)
    if not m:
        continue
    cc = m.group(1)
    codes = set()
    with open(os.path.join(corpus_dir, fn), encoding="utf-8-sig") as f:
        r = csv.reader(f, delimiter=";")
        next(r, None)  # skip header row
        for row in r:
            if len(row) >= 2 and (code := row[1].strip().strip("'")):
                codes.add(code)
    by_cc[cc] = codes
print(f"Loaded: {[(cc, len(c)) for cc, c in by_cc.items()]}")

valid = []
per_cc = max(1, 5000 // len(by_cc))
for cc, codes in by_cc.items():
    valid.extend((cc, c) for c in random.sample(sorted(codes), min(per_cc, len(codes))))
random.shuffle(valid)

invalid, attempts = [], 0
ccs = [cc for cc in gen_invalid if cc in by_cc]
while len(invalid) < 500 and attempts < 50_000:
    attempts += 1
    cc = random.choice(ccs)
    pc = gen_invalid[cc]()
    if pc not in by_cc[cc]:
        invalid.append((cc, pc))

with open(os.path.join(corpus_dir, "..", "targets_B.txt"), "w") as f:
    for cc, pc in valid:
        f.write(f"GET {target}/lookup?country={cc}&postal_code={pc}\n\n")

mix = []
for i in range(max(len(valid), len(invalid))):
    if i < len(valid): mix.append(valid[i])
    if i < len(invalid): mix.append(invalid[i])
with open(os.path.join(corpus_dir, "..", "targets_C.txt"), "w") as f:
    for cc, pc in mix:
        f.write(f"GET {target}/lookup?country={cc}&postal_code={pc}\n\n")

with open(os.path.join(corpus_dir, "..", "targets_D.txt"), "w") as f:
    f.write(f"GET {target}/health\n")
print(f"valid={len(valid)} invalid={len(invalid)} mix={len(mix)}")
PYEOF
}

run_warm() {
  echo "=== warm: 500 sequential mixed lookups ==="
  local errors=0
  for _ in $(seq 1 500); do
    # targets_B.txt alternates "GET <url>" lines with blank separator lines,
    # so the GET lines sit on odd line numbers only.
    local n=$(( (RANDOM % 100) * 2 + 1 ))
    local line
    line=$(sed -n "${n}p" "${OUTDIR}/targets_B.txt")
    local code
    code=$(curl -s -o /dev/null -w "%{http_code}" -H "${HEADER}" "${line#GET }")
    [ "${code}" = "200" ] || errors=$((errors + 1))
  done
  echo "warm complete; errors=${errors}"
}

run_A() {
  echo "=== A: hot-key saturation sweep (BE 3080, c={5,10,20,40,80} × 20s) ==="
  local URL="${PC2NUTS_TARGET}/lookup?country=BE&postal_code=3080"
  for c in 5 10 20 40 80; do
    echo "-- A: -c ${c} --"
    bombardier -c "${c}" -d 20s -l --timeout 30s -H "${HEADER}" "${URL}" \
      | tee "${OUTDIR}/A_c${c}.txt" \
      | grep -E "Reqs/sec|Latency|^ [0-9]+%|HTTP codes|^ [0-9xa-z]" || true
    sleep 10
  done
}

run_B() {
  echo "=== B: random-corpus rate sweep (10/20/25/30/35 RPS × 20s) ==="
  for r in 10 20 25 30 35; do
    echo "-- B: ${r}/s --"
    vegeta attack -duration=20s -rate="${r}/s" -header="${HEADER}" \
      -targets="${OUTDIR}/targets_B.txt" > "${OUTDIR}/B_r${r}.bin"
    vegeta report -type=text "${OUTDIR}/B_r${r}.bin" | tee "${OUTDIR}/B_r${r}.txt"
    sleep 10
  done
}

run_C() {
  echo "=== C: 50/50 hit-miss mix at 25/s × 20s (Tier 3 fallback cost) ==="
  vegeta attack -duration=20s -rate=25/s -header="${HEADER}" \
    -targets="${OUTDIR}/targets_C.txt" > "${OUTDIR}/C_r25.bin"
  vegeta report -type=text "${OUTDIR}/C_r25.bin" | tee "${OUTDIR}/C_r25.txt"
  sleep 10
}

run_D() {
  echo "=== D: /health at 25/s × 20s (FastAPI/uvicorn floor) ==="
  vegeta attack -duration=20s -rate=25/s -header="${HEADER}" \
    -targets="${OUTDIR}/targets_D.txt" > "${OUTDIR}/D_r25.bin"
  vegeta report -type=text "${OUTDIR}/D_r25.bin" | tee "${OUTDIR}/D_r25.txt"
  sleep 10
}

run_E() {
  echo "=== E: sustained at 27/s for 3 min (90% of knee, stability check) ==="
  vegeta attack -duration=3m -rate=27/s -header="${HEADER}" \
    -targets="${OUTDIR}/targets_B.txt" > "${OUTDIR}/E_r27.bin"
  vegeta report -type=text "${OUTDIR}/E_r27.bin" | tee "${OUTDIR}/E_r27.txt"
  vegeta report -type='hist[0,50ms,100ms,200ms,500ms,1s,2s,5s]' \
    "${OUTDIR}/E_r27.bin" | tee -a "${OUTDIR}/E_r27.txt"
}

# --- main --------------------------------------------------------------------
export CORPUS_DIR PC2NUTS_TARGET
[ -s "${OUTDIR}/targets_B.txt" ] || build_corpus

for s in ${SCENARIOS}; do
  case "${s}" in
    warm) run_warm ;;
    A)    run_A ;;
    B)    run_B ;;
    C)    run_C ;;
    D)    run_D ;;
    E)    run_E ;;
    *)    echo "unknown scenario: ${s}" >&2; exit 2 ;;
  esac
done

echo
echo "Done. Raw outputs in ${OUTDIR}/"
echo "Remember to revoke the trusted token:"
echo "  python -m scripts.tokens revoke <id>"