chore: raise default rate limit from 60/minute to 120/minute

bk86a · claude · bk86a · commit a03821a3a5e3 · 2026-04-30T10:23:39.000+02:00
Doubles the per-IP cap to make it friendlier for casual users while staying well below the measured aggregate ceiling (~30 RPS = ~1,800 req/min). At 2 RPS per IP, up to 15 simultaneous full-rate anonymous clients can coexist without degradation; batch users still feel friction and are nudged toward requesting a trusted token. To be revisited when multi-worker (#68) ships and the aggregate ceiling rises.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
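
The headroom figures in the message can be checked with a few lines of arithmetic. The sketch below is illustrative only: `per_second` and `full_rate_clients` are hypothetical helper names, not functions from this repository or from slowapi, and the parser handles only the simple `<count>/<unit>` form used in this commit.

```python
# Illustrative helpers; these names are assumptions, not part of the service.
UNITS_IN_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def per_second(limit: str) -> float:
    """Convert a simple "<count>/<unit>" limit string into requests per second."""
    count, unit = limit.split("/")
    return int(count) / UNITS_IN_SECONDS[unit.strip()]


def full_rate_clients(aggregate_rps: float, per_ip_limit: str) -> float:
    """How many clients, each saturating its per-IP cap, fit under the ceiling."""
    return aggregate_rps / per_second(per_ip_limit)


AGGREGATE_CEILING_RPS = 30  # measured ceiling quoted in the commit message

print(per_second("120/minute"))                                # 2.0
print(full_rate_clients(AGGREGATE_CEILING_RPS, "120/minute"))  # 15.0
print(full_rate_clients(AGGREGATE_CEILING_RPS, "60/minute"))   # 30.0
```

Under these assumptions the new default gives 15 full-rate clients of headroom, against 30 at the previous `60/minute`.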
diff --git a/README.md b/README.md
@@ -314,7 +314,7 @@ All settings are overridable via environment variables prefixed with `PC2NUTS_`:
 | `PC2NUTS_DB_CACHE_TTL_DAYS` | `30` | Days between automatic TERCET data refreshes. If the refresh fails, the service falls back to the previous data and sets `data_stale: true` in the health endpoint. |
 | `PC2NUTS_ESTIMATES_CSV` | `./tercet_missing_codes.csv` | Path to the estimates CSV. Loaded automatically at startup if the file exists. |
 | `PC2NUTS_EXTRA_SOURCES` | *(empty)* | Comma-separated list of ZIP URLs containing additional postal code data. Loaded after TERCET; entries overwrite TERCET data. |
-| `PC2NUTS_RATE_LIMIT` | `60/minute` | Rate limit for `/lookup` and `/pattern` endpoints. Uses [slowapi](https://github.com/laurentS/slowapi) syntax (e.g. `100/minute`, `5/second`). `/health` is exempt. |
+| `PC2NUTS_RATE_LIMIT` | `120/minute` | Rate limit for `/lookup` and `/pattern` endpoints. Uses [slowapi](https://github.com/laurentS/slowapi) syntax (e.g. `100/minute`, `5/second`). `/health` is exempt. The default leaves comfortable headroom under the measured aggregate ceiling (~30 RPS) — see [`docs/performance.md`](docs/performance.md) for the rationale. |
 | `PC2NUTS_STARTUP_TIMEOUT` | `300` | Maximum seconds allowed for initial data loading. If exceeded, the service starts with whatever data was loaded and sets `data_stale: true`. |
 | `PC2NUTS_TRUSTED_TOKENS` | `""` (empty — bypass disabled) | Comma-separated list of opaque tokens that bypass the per-IP rate limit when sent via `Authorization: Bearer <token>`. Continues to work as a union with the DB-backed registry below; set this only as a disaster-recovery fallback or for env-var-only deployments. See [Authentication & rate-limit bypass](#authentication--rate-limit-bypass) for the operator runbook. |
 | `PC2NUTS_TOKEN_DB_URL` | `""` (unset) | Connection string for the trusted-token database. Accepts both `https://…` and `libsql://…` (the latter is rewritten to `https://` automatically). Empty → DB-backed bypass disabled, falls back to env-var-only behaviour. |
@@ -328,7 +328,7 @@ All settings are overridable via environment variables prefixed with `PC2NUTS_`:
 
 ## Authentication & rate-limit bypass
 
-The service applies a per-IP rate limit (`60/minute` by default) to `/lookup` and `/pattern`. Trusted callers — operator-issued, manually distributed — can bypass this limit by presenting an `Authorization: Bearer <token>` header. `/health` stays anonymous.
+The service applies a per-IP rate limit (`120/minute` by default) to `/lookup` and `/pattern`. Trusted callers — operator-issued, manually distributed — can bypass this limit by presenting an `Authorization: Bearer <token>` header. `/health` stays anonymous.
 
 ### Configuration
 
@@ -413,7 +413,7 @@ Then remove that token from `PC2NUTS_TRUSTED_TOKENS` on the next config edit.
 
 | Request | Result |
 |---|---|
-| No `Authorization` header | Per-IP `60/minute` cap, normal `200` / `429` |
+| No `Authorization` header | Per-IP `120/minute` cap, normal `200` / `429` |
 | `Authorization: Bearer <valid_token>` | Rate limit fully bypassed; `token_id=<8hex>` appended to access log |
 | `Authorization: Bearer <unknown_token>` | `401 Unauthorized` |
 | `Authorization: <not Bearer>` or malformed | `400 Bad Request` |
diff --git a/app/config.py b/app/config.py
@@ -22,7 +22,7 @@ class Settings(BaseSettings):
     token_db_url: str = ""
     token_db_auth_token: str = ""
     token_refresh_seconds: int = Field(default=60, ge=1)
-    rate_limit: str = _defaults.get("rate_limit", "60/minute")
+    rate_limit: str = _defaults.get("rate_limit", "120/minute")
     rate_limit_headers: bool = _defaults.get("rate_limit_headers", True)
     cache_max_age: int = _defaults.get("cache_max_age", 3600)
     startup_timeout: int = 300
diff --git a/app/settings.json b/app/settings.json
@@ -21,7 +21,7 @@
     "nuts1": 0.90
   },
   "approximate_min_confidence": 0.1,
-  "rate_limit": "60/minute",
+  "rate_limit": "120/minute",
   "rate_limit_headers": true,
   "cache_max_age": 3600
 }
diff --git a/docs/performance.md b/docs/performance.md
@@ -15,7 +15,7 @@
 >
 > **Recommended operating point: 27 RPS (~1,620/min), p99 < 200 ms.**
 
-The current `60/minute` per-IP cap is therefore not the system bottleneck — the deployment can serve roughly **30× that volume in aggregate** before throughput plateaus. A single client could be permitted up to ~1,500/minute (25 RPS) without affecting overall headroom; the per-IP cap should be set well below the aggregate ceiling regardless.
+The per-IP cap is therefore not the system bottleneck — the deployment can serve roughly **15× the default `120/minute` cap in aggregate** before throughput plateaus. A single client could in principle be permitted up to ~1,500/minute (25 RPS) without affecting overall headroom; the per-IP cap is set well below the aggregate ceiling so that ~15 simultaneous full-rate clients can coexist without degradation.
 
 ---
 
@@ -92,7 +92,7 @@ No drift over the 3-minute window. p99 stayed well under 200 ms throughout.
 
 ## Recommendations
 
-1. **Keep per-IP cap conservative relative to aggregate ceiling.** The current `60/minute` (1 RPS per IP) leaves comfortable headroom: even ~30 saturation-rate clients in parallel could sustain themselves before degrading the aggregate. No change needed unless trusted-token traffic patterns become heavy.
+1. **Per-IP cap set to `120/minute` (2 RPS per IP).** Chosen as 1/15 of the aggregate ceiling — up to 15 simultaneous full-rate anonymous clients can sustain themselves before the aggregate degrades. Friendlier UX for casual users (a small country's worth of postcodes finishes in roughly half the time it took at `60/minute`) while still tight enough that batch users feel the pressure to request a trusted token. Revisit when multi-worker (#68) ships and the aggregate ceiling rises.
 
 2. **Pick `p99 ≤ 200 ms` as the SLO** at the recommended 27 RPS operating point. The full 3-minute sustained run met this.
 
diff --git a/tests/test_api.py b/tests/test_api.py
@@ -224,11 +224,11 @@ def test_health_ignores_malformed_header(self, trusted_client):
         assert resp.status_code == 200
 
     def test_valid_token_bypasses_rate_limit(self, trusted_client):
-        """Default rate limit is 60/minute. With a valid bypass token, more
-        than 60 requests in tight succession all return 200. Without bypass,
-        request 61+ would 429."""
+        """Default rate limit is 120/minute. With a valid bypass token, more
+        than 120 requests in tight succession all return 200. Without bypass,
+        request 121+ would 429."""
         headers = {"Authorization": "Bearer test-token-aaa"}
-        for i in range(80):
+        for i in range(150):
             resp = trusted_client.get(
                 "/lookup",
                 params={"postal_code": "10115", "country": "DE"},

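The behaviour the updated test relies on (150 requests pass with a valid token, while request 121 onward would be rejected without one) can be illustrated with a toy model. This is a minimal sketch assuming a per-IP fixed-window counter in which bypassed requests are not counted; `FixedWindowLimiter` is a hypothetical stand-in and does not reproduce slowapi's implementation or the service's middleware.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Toy per-key fixed-window counter; a stand-in, not slowapi's implementation."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window_seconds = window_seconds
        self._counts = defaultdict(int)  # (key, window index) -> requests seen

    def status_for(self, key: str, now: float, bypass: bool = False) -> int:
        """Return the HTTP status a request would get: 200 or 429."""
        if bypass:  # assumption: a trusted token skips counting entirely
            return 200
        window = int(now // self.window_seconds)
        self._counts[(key, window)] += 1
        return 200 if self._counts[(key, window)] <= self.limit else 429


limiter = FixedWindowLimiter(limit=120)

# 150 anonymous requests from one IP inside the same minute.
anonymous = [limiter.status_for("203.0.113.7", now=0.0) for _ in range(150)]
# 150 requests from the same IP carrying a valid bypass token.
trusted = [limiter.status_for("203.0.113.7", now=0.0, bypass=True) for _ in range(150)]

print(anonymous.count(200), anonymous.count(429))  # 120 30
print(set(trusted))                                # {200}
```

The loop bound of 150 matters for the same reason as in the test: it has to exceed the default of 120, otherwise the bypass path would never be distinguished from the ordinary one.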