Commit 526f289 (parent 156fdbf), committed by bk86a and claude

docs: performance baseline + reproducible harness (#59)

Measures sustained throughput, latency curve, and stability of the production deployment using a labeled trusted token. Headline: ~30 RPS / ~1,800 req/min sustained, p99 < 200 ms at 27 RPS over a 3-minute run. scripts/perf_test.sh is parameterised via PC2NUTS_TARGET and PC2NUTS_TOKEN and downloads a fresh GISCO TERCET corpus on first run. The doc identifies the single-worker plateau as the current bottleneck and lists the conditions under which the baseline must be re-measured (issues #7, #45, multi-worker switch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2 files changed: 312 additions & 0 deletions

docs/performance.md

Lines changed: 126 additions & 0 deletions
# Performance characterisation

**Date:** 2026-04-30
**Commit:** `5e0b6ae`
**Target:** production deployment (single edge region, single uvicorn worker, single container).
**Test client:** Belgian residential connection → DE PoP, single source IP, authenticated via a labeled trusted token (revoked after the run).
**Tools:** `bombardier` v1.2.6, `vegeta` v12.12.0.
**Reproduction:** `scripts/perf_test.sh` (parameterised on `PC2NUTS_TARGET` and `PC2NUTS_TOKEN`).

---
## Headline

> **Sustained throughput ceiling: ~30 requests/second (~1,800 requests/minute).**
>
> **Recommended operating point: 27 RPS (~1,620/min), p99 < 200 ms.**

The current `60/minute` per-IP cap is therefore not the system bottleneck — the deployment can serve roughly **30× that volume in aggregate** before throughput plateaus. A single client could be permitted up to ~1,500/minute (25 RPS) without exhausting the overall headroom; regardless, the per-IP cap should stay well below the aggregate ceiling.

---

## Latency curve (Scenario B — random valid lookups across 5 countries)

This is the realistic-input scenario and the basis for the headline number.

| Offered RPS | Achieved RPS | Success | p50 | p90 | p95 | p99 | Max |
|------------:|-------------:|--------:|----:|----:|----:|----:|----:|
| 10 | 10.0 | 100% | 46 ms | 53 ms | 63 ms | 74 ms | 104 ms |
| 20 | 20.0 | 100% | 45 ms | 54 ms | 60 ms | 96 ms | 136 ms |
| 25 | 25.1 | 100% | 46 ms | 54 ms | 73 ms | 151 ms | 228 ms |
| **30** | **30.0** | **100%** | **48 ms** | **109 ms** | **137 ms** | **193 ms** | **222 ms** |
| 35 | 32.2 | 100% | 2.27 s | 3.65 s | 4.07 s | 4.47 s | 5.62 s |

The **knee is at 30 RPS**. From 30 → 35, throughput barely moves (32.2 vs 30.0) but latencies jump 23-47× across the percentiles. Beyond the knee, queue depth grows without bound — the curve is sharp, not gradual.
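One way to make the knee reproducible rather than eyeballed is to scan the sweep results for the first rate step at which p99 explodes. A minimal sketch, using the measured (rate, p99) pairs from the table above; the 3× p99-jump criterion and the `find_knee` helper are illustrative assumptions, not part of the harness:

```python
# Locate the saturation knee: the highest offered rate before p99 blows up.
# (offered RPS, p99 in ms) pairs measured in Scenario B.
sweep = [(10, 74), (20, 96), (25, 151), (30, 193), (35, 4470)]

def find_knee(points, jump=3.0):
    """Return the last rate whose successor's p99 grew by more than `jump`x."""
    for (rate, p99), (_, next_p99) in zip(points, points[1:]):
        if next_p99 / p99 > jump:
            return rate
    return points[-1][0]  # no blow-up observed within the sweep

print(find_knee(sweep))  # → 30
```

The same scan could be run over fresh `vegeta report` outputs when re-baselining, so the knee is picked by the same rule every time.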
## Saturation discovery (Scenario A — hot single key, BE 3080)

Throughput plateaus regardless of client concurrency, confirming the bottleneck is per-request work on the server (single event loop / single worker), not concurrency exhaustion on the client.

| Connections | Reqs/sec | p50 | p95 | p99 |
|------------:|---------:|----:|----:|----:|
| 5 | 29.6 | 169 ms | 225 ms | 267 ms |
| 10 | 31.0 | 325 ms | 443 ms | 479 ms |
| 20 | 31.8 | 617 ms | 795 ms | 1.00 s |
| 40 | 30.9 | 1.21 s | 1.63 s | 2.31 s |
| 80 | 30.4 | 2.30 s | 3.92 s | 6.92 s |

Throughput is bounded; concurrency just queues.
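That queueing behaviour is consistent with Little's law: at a pinned service rate, time in system grows as concurrency divided by throughput. A quick check against the table above (the helper name and the comparison itself are mine, not part of the harness):

```python
# Little's law sanity check: once saturated, time-in-system ~= conns / throughput.
# (connections, measured RPS, measured p50 in seconds) rows from Scenario A.
rows = [(10, 31.0, 0.325), (20, 31.8, 0.617), (40, 30.9, 1.21), (80, 30.4, 2.30)]

def predicted_latency(conns, rps):
    """Little's law L = lambda * W rearranged: W = L / lambda."""
    return conns / rps

for conns, rps, p50 in rows:
    w = predicted_latency(conns, rps)
    print(f"c={conns}: predicted {w:.2f}s vs measured p50 {p50}s")
```

Predicted and measured medians agree to within ~15% at every concurrency level, which is what a fixed-rate server with a growing queue should produce.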
**At c≥100 the platform pushes back.** An exploratory pre-run at c=100, 200, 400, 800 produced widespread `5xx`, `dial tcp … connection timed out`, and `tls handshake timed out` errors — i.e. the edge platform aggressively refuses connections at very high concurrency from a single source. Stay well below c=100 in any scripted test against this deployment.

## Fallback-path cost (Scenario C — 50/50 hit/miss at 25 RPS)

Compared to Scenario B at the same rate (25/s), the 50/50 mix is statistically indistinguishable: p50 45 ms vs 46 ms; p99 136 ms vs 151 ms. The Tier 3 prefix-approximation path (taken on every "miss") imposes **no measurable latency cost** at this load. The hard work is per-request HTTP/TLS framing and JSON serialisation, not the lookup itself.

## FastAPI/uvicorn floor (Scenario D — `/health` at 25 RPS)

| Endpoint | p50 | p95 | p99 | Max |
|---|---:|---:|---:|---:|
| `/health` | **15 ms** | 19 ms | 27 ms | 62 ms |
| `/lookup` (Scenario B at 25/s) | 46 ms | 73 ms | 151 ms | 228 ms |

`/health` is roughly **3× faster** than `/lookup`. About 15 ms of every request is the platform/network/TLS/uvicorn floor; the additional ~30 ms on `/lookup` is the endpoint logic plus Pydantic response serialisation. **Optimisation candidates** if a higher ceiling is needed: response serialisation (the dict access itself is microseconds), reducing JSON envelope size, or moving to multi-worker.
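The floor-versus-endpoint split above is simple arithmetic on the two medians; written out as a sketch (the constant names are illustrative, the values are the two p50s measured above):

```python
# Decompose /lookup's median latency into platform floor + endpoint work.
HEALTH_P50_MS = 15   # Scenario D: network/TLS/platform/uvicorn floor
LOOKUP_P50_MS = 46   # Scenario B at 25 RPS

endpoint_work_ms = LOOKUP_P50_MS - HEALTH_P50_MS
print(f"floor ~{HEALTH_P50_MS} ms, endpoint logic + serialisation ~{endpoint_work_ms} ms "
      f"({LOOKUP_P50_MS / HEALTH_P50_MS:.1f}x the floor)")
```

Any optimisation can only attack the ~31 ms endpoint share; the ~15 ms floor is outside the application's control.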
## Stability (Scenario E — sustained 27 RPS for 3 minutes)

| Metric | Value |
|---|---|
| Total requests | 4,860 |
| Achieved rate | 27.0/s |
| Success | 100.0% (200: 4,860) |
| p50 / p95 / p99 / max | 46 / 89 / 132 / 324 ms |
| <50 ms | 73.0% |
| <100 ms | 97.4% |
| <200 ms | 99.8% |
| 5xx | 0 |
| 429 | 0 |

No drift over the 3-minute window. p99 stayed well under 200 ms throughout.

---
## Methodology notes

- **Cooldown between runs.** A short pause (10 s) between scenarios is needed; without it, residual queueing from the previous run pollutes the next.
- **Bombardier's default 2 s timeout is too aggressive** here — runs near saturation see legitimate 1-2 s tail latencies. Use `--timeout 30s` to avoid spurious "timeout" classifications.
- **Single-region edge means single-PoP measurements.** The platform allocates the deployment to one region (DE). Latency from clients elsewhere will differ accordingly, but the throughput ceiling is unaffected — every request still hits the same single container.
- **Single-source-IP test client.** Distributed traffic from many IPs would not change the aggregate ceiling (the bottleneck is the container) but would change the per-IP rate-limit behaviour, since slowapi keys limits per source IP.
- **No CDN cache between client and `/lookup`.** Verified by inspecting response headers — no `Cache-Status`, no `CDN-Cache-Status`; every request reaches the container.

---
## Recommendations

1. **Keep the per-IP cap conservative relative to the aggregate ceiling.** The current `60/minute` (1 RPS per IP) leaves comfortable headroom: roughly 30 clients each running at the full per-IP cap would be needed to reach the aggregate ceiling. No change needed unless trusted-token traffic patterns become heavy.

2. **Pick `p99 ≤ 200 ms` as the SLO** at the recommended 27 RPS operating point. The full 3-minute sustained run met this.

3. **Re-baseline after issue #7, issue #45, or any worker-count change lands.** Specifically:
   - **#7 (UK NSPL, +1.79M postcodes)** — should not change per-request latency materially (still a dict lookup) but roughly doubles in-memory state. Re-run to confirm.
   - **#45 (happyGISCO outbound geocoding)** — would add a network call to the lookup path; the saturation RPS will drop sharply. Re-baselining is **mandatory**.
   - **Switching from single-worker to multi-worker** — likely the easiest large win. Each additional worker should add roughly another 30 RPS of headroom, up to the container's CPU count.

4. **Don't run unattended high-concurrency tests.** Bombardier at c≥100 from a single source triggers platform-level connection refusal (`5xx`, dial timeouts) and risks short-term throttling. Keep scripted load below c=80.
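The multi-worker expectation in item 3 can be put as a back-of-envelope estimate. This assumes linear per-worker scaling capped by CPU count, which the re-baseline would need to confirm; `estimated_ceiling` is a hypothetical helper, not anything in the repo:

```python
# Rough aggregate-ceiling estimate after a multi-worker uvicorn switch.
# Assumes each worker adds one single-worker ceiling, capped by CPU count.
PER_WORKER_RPS = 30  # measured single-worker ceiling from this baseline

def estimated_ceiling(workers: int, cpus: int) -> int:
    return PER_WORKER_RPS * min(workers, cpus)

print(estimated_ceiling(2, 4))  # two workers on a 4-CPU container
print(estimated_ceiling(4, 2))  # oversubscribed: the CPUs cap the win
```

Whatever the container's actual CPU count turns out to be, the estimate only sets expectations; the real ceiling must be re-measured with the harness.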
---

## Reproducing

```bash
# 1. Issue a labeled trusted token (operator credentials required):
export PC2NUTS_TOKEN_DB_URL='libsql://...'
export PC2NUTS_TOKEN_DB_AUTH_TOKEN='...'
python -m scripts.tokens add --label "perf-test-$(date -I)"
# Token will become active in the running service after one refresh interval (default 60 s).

# 2. Run the suite:
export PC2NUTS_TARGET='https://example.invalid'
export PC2NUTS_TOKEN='<the value printed above>'
scripts/perf_test.sh

# 3. Revoke when done:
python -m scripts.tokens revoke <id>
```

Raw outputs are written to `/tmp/perf/`. The harness automatically downloads a fresh corpus from public GISCO TERCET ZIPs on first run.
scripts/perf_test.sh

Lines changed: 186 additions & 0 deletions
#!/usr/bin/env bash
# Performance test harness for the PostalCode2NUTS service.
#
# Discovers max sustainable RPS, characterises the latency curve, and verifies
# stability at the chosen operating point. See docs/performance.md for the
# methodology and the most recent measured results.
#
# Required env vars:
#   PC2NUTS_TARGET    Base URL of the service (no trailing slash). Example:
#                     https://example.invalid
#   PC2NUTS_TOKEN     Trusted-token value granting rate-limit bypass. Issue with:
#                     python -m scripts.tokens add --label "perf-test-YYYY-MM-DD"
#
# Optional env vars:
#   OUTDIR            Output directory for raw results (default: /tmp/perf).
#   CORPUS_COUNTRIES  Space-separated CC list to pull from GISCO TERCET (default:
#                     "BE AT IE LU EE" — small files, fast download).
#   SCENARIOS         Subset of "warm A B C D E" (default: all).
#
# Required tools on PATH: bombardier, vegeta, curl, file, unzip, python3.
#
# Stop conditions: any 5xx >1%, p99 >5s, or any 429 → halt and inspect output.
set -euo pipefail

: "${PC2NUTS_TARGET:?PC2NUTS_TARGET (e.g. https://example.invalid) is required}"
: "${PC2NUTS_TOKEN:?PC2NUTS_TOKEN is required (use scripts/tokens.py add)}"
OUTDIR="${OUTDIR:-/tmp/perf}"
CORPUS_COUNTRIES="${CORPUS_COUNTRIES:-BE AT IE LU EE}"
SCENARIOS="${SCENARIOS:-warm A B C D E}"
HEADER="Authorization: Bearer ${PC2NUTS_TOKEN}"

mkdir -p "${OUTDIR}"
CORPUS_DIR="${OUTDIR}/corpus"
mkdir -p "${CORPUS_DIR}"

# --- Build the corpus from public TERCET ZIPs --------------------------------
build_corpus() {
  echo "Building corpus from GISCO TERCET (${CORPUS_COUNTRIES})..."
  for cc in ${CORPUS_COUNTRIES}; do
    for yr in 2025 2024 2023; do
      url="https://gisco-services.ec.europa.eu/tercet/NUTS-2024/pc${yr}_${cc}_NUTS-2024_v1.0.zip"
      tmp="${CORPUS_DIR}/${cc}.zip"
      curl -sf -o "${tmp}" "${url}" || continue
      if [ "$(file -b --mime-type "${tmp}")" = "application/zip" ]; then
        (cd "${CORPUS_DIR}" && unzip -oq "${cc}.zip")
        break
      fi
      rm -f "${tmp}"
    done
  done
  python3 - <<'PYEOF'
import csv, os, random, re

random.seed(20260430)
corpus_dir = os.environ["CORPUS_DIR"]
target = os.environ["PC2NUTS_TARGET"]

gen_invalid = {
    "BE": lambda: f"{random.randint(1000,9999):04d}",
    "AT": lambda: f"{random.randint(1000,9999):04d}",
    "EE": lambda: f"{random.randint(10000,99999):05d}",
    "LU": lambda: f"{random.randint(1000,9999):04d}",
}

by_cc = {}
for fn in sorted(os.listdir(corpus_dir)):
    m = re.match(r"pc\d+_([A-Z]{2})_.*\.csv$", fn)
    if not m:
        continue
    cc = m.group(1)
    codes = set()
    with open(os.path.join(corpus_dir, fn), encoding="utf-8-sig") as f:
        r = csv.reader(f, delimiter=";")
        next(r, None)  # skip header row
        for row in r:
            if len(row) >= 2 and (code := row[1].strip().strip("'")):
                codes.add(code)
    by_cc[cc] = codes
print(f"Loaded: {[(cc, len(c)) for cc, c in by_cc.items()]}")

valid = []
per_cc = max(1, 5000 // len(by_cc))
for cc, codes in by_cc.items():
    valid.extend((cc, c) for c in random.sample(sorted(codes), min(per_cc, len(codes))))
random.shuffle(valid)

invalid, attempts = [], 0
ccs = [cc for cc in gen_invalid if cc in by_cc]
while len(invalid) < 500 and attempts < 50_000:
    attempts += 1
    cc = random.choice(ccs)
    pc = gen_invalid[cc]()
    if pc not in by_cc[cc]:
        invalid.append((cc, pc))

with open(os.path.join(corpus_dir, "..", "targets_B.txt"), "w") as f:
    for cc, pc in valid:
        f.write(f"GET {target}/lookup?country={cc}&postal_code={pc}\n\n")

mix = []
for i in range(max(len(valid), len(invalid))):
    if i < len(valid): mix.append(valid[i])
    if i < len(invalid): mix.append(invalid[i])
with open(os.path.join(corpus_dir, "..", "targets_C.txt"), "w") as f:
    for cc, pc in mix:
        f.write(f"GET {target}/lookup?country={cc}&postal_code={pc}\n\n")

with open(os.path.join(corpus_dir, "..", "targets_D.txt"), "w") as f:
    f.write(f"GET {target}/health\n")
print(f"valid={len(valid)} invalid={len(invalid)} mix={len(mix)}")
PYEOF
}

run_warm() {
  echo "=== warm: 500 sequential mixed lookups ==="
  local errors=0
  for _ in $(seq 1 500); do
    # targets_B.txt alternates "GET <url>" lines with blank separator lines,
    # so the GET lines sit on odd line numbers only.
    local n=$(( (RANDOM % 100) * 2 + 1 ))
    local line
    line=$(sed -n "${n}p" "${OUTDIR}/targets_B.txt")
    local code
    code=$(curl -s -o /dev/null -w "%{http_code}" -H "${HEADER}" "${line#GET }")
    [ "${code}" = "200" ] || errors=$((errors + 1))
  done
  echo "warm complete; errors=${errors}"
}

run_A() {
  echo "=== A: hot-key saturation sweep (BE 3080, c={5,10,20,40,80} × 20s) ==="
  local URL="${PC2NUTS_TARGET}/lookup?country=BE&postal_code=3080"
  for c in 5 10 20 40 80; do
    echo "-- A: -c ${c} --"
    bombardier -c "${c}" -d 20s -l --timeout 30s -H "${HEADER}" "${URL}" \
      | tee "${OUTDIR}/A_c${c}.txt" \
      | grep -E "Reqs/sec|Latency|^ [0-9]+%|HTTP codes|^ [0-9xa-z]" || true
    sleep 10
  done
}

run_B() {
  echo "=== B: random-corpus rate sweep (10/20/25/30/35 RPS × 20s) ==="
  for r in 10 20 25 30 35; do
    echo "-- B: ${r}/s --"
    vegeta attack -duration=20s -rate="${r}/s" -header="${HEADER}" \
      -targets="${OUTDIR}/targets_B.txt" > "${OUTDIR}/B_r${r}.bin"
    vegeta report -type=text "${OUTDIR}/B_r${r}.bin" | tee "${OUTDIR}/B_r${r}.txt"
    sleep 10
  done
}

run_C() {
  echo "=== C: 50/50 hit-miss mix at 25/s × 20s (Tier 3 fallback cost) ==="
  vegeta attack -duration=20s -rate=25/s -header="${HEADER}" \
    -targets="${OUTDIR}/targets_C.txt" > "${OUTDIR}/C_r25.bin"
  vegeta report -type=text "${OUTDIR}/C_r25.bin" | tee "${OUTDIR}/C_r25.txt"
  sleep 10
}

run_D() {
  echo "=== D: /health at 25/s × 20s (FastAPI/uvicorn floor) ==="
  vegeta attack -duration=20s -rate=25/s -header="${HEADER}" \
    -targets="${OUTDIR}/targets_D.txt" > "${OUTDIR}/D_r25.bin"
  vegeta report -type=text "${OUTDIR}/D_r25.bin" | tee "${OUTDIR}/D_r25.txt"
  sleep 10
}

run_E() {
  echo "=== E: sustained at 27/s for 3 min (90% of knee, stability check) ==="
  vegeta attack -duration=3m -rate=27/s -header="${HEADER}" \
    -targets="${OUTDIR}/targets_B.txt" > "${OUTDIR}/E_r27.bin"
  vegeta report -type=text "${OUTDIR}/E_r27.bin" | tee "${OUTDIR}/E_r27.txt"
  vegeta report -type='hist[0,50ms,100ms,200ms,500ms,1s,2s,5s]' \
    "${OUTDIR}/E_r27.bin" | tee -a "${OUTDIR}/E_r27.txt"
}

# --- main --------------------------------------------------------------------
export CORPUS_DIR PC2NUTS_TARGET
[ -s "${OUTDIR}/targets_B.txt" ] || build_corpus

for s in ${SCENARIOS}; do
  case "${s}" in
    warm) run_warm ;;
    A)    run_A ;;
    B)    run_B ;;
    C)    run_C ;;
    D)    run_D ;;
    E)    run_E ;;
    *)    echo "unknown scenario: ${s}" >&2; exit 2 ;;
  esac
done

echo
echo "Done. Raw outputs in ${OUTDIR}/"
echo "Remember to revoke the trusted token:"
echo "  python -m scripts.tokens revoke <id>"