
feat: multi-worker uvicorn with shared rate-limit backend (#68) #71

Merged
bk86a merged 12 commits into main from feat/multi-worker-uvicorn on May 1, 2026

Conversation


bk86a (Owner) commented May 1, 2026

Summary

Implements #68: multi-worker uvicorn behind a shared rate-limit backend.

  • New env vars: PC2NUTS_WORKERS (default 1), PC2NUTS_RATE_LIMIT_STORAGE_URI (default unset).
  • Startup hard-fails if PC2NUTS_WORKERS > 1 without a storage URI configured (Pydantic model validator), preventing the per-IP rate-limit cap from silently loosening; a sketch of the guard follows this list.
  • slowapi's in_memory_fallback_enabled=True handles transient backend outages with a per-process MemoryStorage fallback and exponential-backoff re-probing; it logs one WARNING per outage and an INFO line on recovery.
  • Default behaviour (single worker, in-memory) is byte-for-byte unchanged: when neither env var is set, the Limiter is constructed exactly as before.
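
As a rough sketch of that guard (assuming pydantic-settings and field names mirroring the env vars; the real validator lives in app/config.py and may differ):

```python
# Hypothetical sketch of the startup guard, not the literal app/config.py code.
from pydantic import model_validator
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="PC2NUTS_")

    workers: int = 1
    rate_limit_storage_uri: str | None = None  # e.g. redis://host:6379/0

    @model_validator(mode="after")
    def _require_shared_storage_for_multi_worker(self) -> "Settings":
        # Treat an empty string the same as unset (the "empty-string
        # aliasing" case the test plan mentions).
        uri = self.rate_limit_storage_uri or None
        if self.workers > 1 and uri is None:
            raise ValueError(
                "PC2NUTS_WORKERS > 1 requires PC2NUTS_RATE_LIMIT_STORAGE_URI: "
                "without shared storage each worker keeps its own counters "
                "and the per-IP cap silently loosens."
            )
        return self
```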

Spec: docs/superpowers/specs/2026-05-01-multi-worker-uvicorn-design.md
Plan: docs/superpowers/plans/2026-05-01-multi-worker-uvicorn.md

Acceptance criteria from #68

  • Dockerfile updated to launch N workers via PC2NUTS_WORKERS (shell-form CMD with exec uvicorn ... --workers ${PC2NUTS_WORKERS:-1})
  • Rate-limit behaviour with N workers documented in README.md (new "Multi-worker deployment" subsection under ## Configuration)
  • Memory usage measured against the container's allocated memory — operational follow-up post-deploy
  • Performance baseline re-run; docs/performance.md updated — operational follow-up post-deploy

Test plan

Automated (passing on this branch)

  • Existing rate-limit tests still pass (single-worker default branch is byte-for-byte unchanged: 178 passed)
  • tests/test_config.py::TestWorkersValidator proves the validator fires for WORKERS>1 with no URI (4 tests covering the truth table, including empty-string aliasing; the shape is sketched after this list)
  • tests/test_limiter.py::TestLimiterStorageSelection proves the storage URI is honoured and fallback flag is set
  • Docker smoke-test confirmed locally: build OK, single-worker startup OK, PC2NUTS_WORKERS=2 without storage URI fails immediately with the validator's error message naming both env vars
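
The truth-table test has roughly this shape (illustrative only; the real cases and names live in tests/test_config.py::TestWorkersValidator):

```python
# Illustrative truth-table test; the import path and case list are assumptions.
import pytest
from pydantic import ValidationError

from app.config import Settings  # assumed import path


@pytest.mark.parametrize(
    ("workers", "uri", "should_raise"),
    [
        (1, None, False),                       # default single-worker deploy
        (1, "redis://localhost:6379/0", False),
        (4, "redis://localhost:6379/0", False),
        (4, None, True),                        # the unsafe combination
        (4, "", True),                          # empty string aliases to unset
    ],
)
def test_workers_requires_storage_uri(workers, uri, should_raise):
    kwargs = {"workers": workers, "rate_limit_storage_uri": uri}
    if should_raise:
        with pytest.raises(ValidationError):
            Settings(**kwargs)
    else:
        Settings(**kwargs)
```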

Pre-merge manual (operational)

  • Deploy this branch image to the target hosting environment with PC2NUTS_WORKERS set to 2-4 and a real Redis configured via PC2NUTS_RATE_LIMIT_STORAGE_URI
  • Confirm /health returns 200 from all workers
  • Fire >120 requests in a minute from a single anonymous client; confirm the 120/minute cap is observed across workers (i.e. shared Redis storage works); a throwaway probe for this check is sketched after this list
  • Fire requests with a valid trusted-token bearer; confirm bypass still works (exempt_when=is_trusted_request runs before the storage call)
  • Briefly take Redis offline mid-traffic; confirm one WARNING log line, traffic continues to be served (fallback active), no 5xx burst
  • Bring Redis back; confirm one INFO "Rate limit storage recovered" line within ~30 s
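
A throwaway probe for the shared-cap check could look like this (the base URL and endpoint are placeholders for the real deployment):

```python
# Hypothetical probe script; BASE_URL and ENDPOINT are placeholders.
from collections import Counter

import httpx

BASE_URL = "https://staging.example.com"  # placeholder deployment URL
ENDPOINT = "/lookup"                      # assumed rate-limited endpoint

counts: Counter[int] = Counter()
with httpx.Client(base_url=BASE_URL, timeout=10) as client:
    for _ in range(130):  # 10 over the published 120/minute cap
        counts[client.get(ENDPOINT).status_code] += 1

# With shared Redis storage: ~120 x 200 and ~10 x 429 across all workers.
# With per-worker in-memory counters, 429s would not appear until ~120 x N.
print(dict(counts))
```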

Post-merge operational

  • Re-run perf baseline (scripts/perf_test.sh) and update docs/performance.md with the new RPS at the chosen worker count
  • Document measured per-worker resident-set size and confirm headroom against the container memory cap

Notes

  • Final review caught one supply-chain issue (commit d0baaeb): the initial requirements.txt declaration slowapi[redis]>=0.1.9,<1 resolves to redis<4 (slowapi 0.1.9's stale extra constraint), conflicting with the redis==7.4.0 lock pin. Switched to limits[redis]>=2.3 which exposes the modern redis>3,<8 constraint.
  • Spec was simplified mid-implementation when slowapi 0.1.9 turned out to ship the fail-degraded behaviour we'd designed (in_memory_fallback_enabled=True); a custom _FailDegradedStorage wrapper class was dropped from the design — see commit 7cf2d00.

🤖 Generated with Claude Code

bk86a and others added 12 commits May 1, 2026 11:38

Brainstormed design for issue #68. Two new opt-in env vars
(PC2NUTS_WORKERS, PC2NUTS_RATE_LIMIT_STORAGE_URI) drive multi-worker
uvicorn behind a fail-degraded shared rate-limit backend; defaults
preserve the current single-worker / in-memory deploy byte-for-byte.

Key decisions captured in the spec:
  - Option (a) Redis-backed slowapi (over edge-layer or per-process
    division), preserving the strict 120/min anonymous cap while keeping
    trusted-token bypass working.
  - Fail-degraded behaviour (option III): on Redis unavailability, fall
    back to per-worker in-memory storage for a 30 s window before
    re-probing. Logs once per outage.
  - Hard-fail at startup if PC2NUTS_WORKERS > 1 with no storage URI
    configured, so the cap can never silently loosen.
  - uvicorn --workers (not gunicorn); shell-form CMD in Dockerfile to
    expand the env var.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

While prepping the implementation plan, found that slowapi 0.1.9
already ships exactly the fail-degraded behaviour we'd designed:
Limiter(in_memory_fallback_enabled=True) routes to a per-process
MemoryStorage when the primary raises, logs once per outage, and
re-probes with exponential backoff (better than the fixed 30s window
we'd specified). Custom _FailDegradedStorage class is no longer
needed — drops a new module and four unit tests' worth of code we'd
have to maintain.

Spec sections 4.2, 4.3, 5, 6, 7, and 10 updated to reflect the
library-feature approach. Architecture and operator-visible behaviour
are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Step-by-step TDD plan for issue #68 against the simplified spec at
docs/superpowers/specs/2026-05-01-multi-worker-uvicorn-design.md.

Seven tasks: Settings + validator, redis dep (ordered before the
limiter test that exercises it), limiter module extraction, Dockerfile,
README, CHANGELOG, final verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New Pydantic model validator hard-fails startup if PC2NUTS_WORKERS > 1
without PC2NUTS_RATE_LIMIT_STORAGE_URI configured, so the per-IP rate
limit can never silently loosen under multi-worker.

Defaults preserve current behaviour: workers=1, storage URI unset.

Pulls in redis-py at the version limits 5.8.0 expects, used only when
PC2NUTS_RATE_LIMIT_STORAGE_URI is set. Single-host deployers who never
configure shared storage pay the install-size cost but no runtime cost
(redis is imported by limits.storage.RedisStorage at Limiter
construction, and only when the storage URI is configured).

Without this, regenerating requirements.lock via the documented
'pip install -r requirements.txt && pip freeze > requirements.lock'
flow would drop the redis pin added in dc56d8c. Using slowapi's
[redis] extra (rather than pinning redis directly) keeps the
declaration aligned with our actual dependency and lets slowapi's
constraint chain choose the right transitive version of redis-py.

When PC2NUTS_RATE_LIMIT_STORAGE_URI is set, construct the Limiter with
that storage URI and in_memory_fallback_enabled=True so transient
backend outages fall back to per-process MemoryStorage. When unset,
construction is byte-for-byte the previous inline call.

slowapi's built-in fallback handles outage detection, once-per-outage
WARNING logging, and exponential-backoff recovery probes.
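
The construction this commit describes looks roughly like this (helper and import names are assumptions; per the commit, the unset branch matches the previous inline call):

```python
# Sketch of the storage-selection branch; is_trusted_request and the
# 120/minute cap come from the PR text, other names are assumed.
from slowapi import Limiter
from slowapi.util import get_remote_address

from app.config import settings  # assumed settings accessor


def build_limiter() -> Limiter:
    if settings.rate_limit_storage_uri:
        # Shared backend: counters live in Redis and are shared by all
        # workers; slowapi falls back to per-process MemoryStorage (with
        # a once-per-outage WARNING and backoff re-probing) if Redis drops.
        return Limiter(
            key_func=get_remote_address,
            storage_uri=settings.rate_limit_storage_uri,
            in_memory_fallback_enabled=True,
        )
    # Unset branch: the previous single-worker, in-memory construction.
    return Limiter(key_func=get_remote_address)
```

The trusted-token bypass stays at the route level (exempt_when=is_trusted_request in the test plan), which is why it runs before any storage call is made.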

One-line whitespace fix flagged by ruff format --check on Task 3's
new test file. CI runs format --check, so this would have broken the
lint job.

Switches CMD from exec-form to shell-form with 'exec uvicorn …' so
${PC2NUTS_WORKERS:-1} expands at container start while uvicorn remains
the foreground PID-1 process for proper SIGTERM handling.

Default of 1 preserves current single-worker behaviour. Multi-worker
mode also requires PC2NUTS_RATE_LIMIT_STORAGE_URI; the Settings
validator (added in feat(config)) refuses to start otherwise.
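
The resulting CMD line looks roughly like this (the uvicorn app path and port are assumptions):

```dockerfile
# Shell-form so ${PC2NUTS_WORKERS:-1} expands at container start; 'exec'
# replaces the shell, keeping uvicorn as PID 1 for clean SIGTERM handling.
CMD exec uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers ${PC2NUTS_WORKERS:-1}
```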

New 'Multi-worker deployment' subsection covers both env vars, the
startup-validation guard for the unsafe combination, and the slowapi
fail-degraded behaviour during a backend outage.

….txt (#68)

Reviewer caught that slowapi 0.1.9's [redis] extra pins
'redis>=3.4.1,<4.0.0' — a stale 6-year-old constraint that contradicts
the redis==7.4.0 lock pin. Any operator running the documented
'pip install -r requirements.txt && pip freeze > requirements.lock'
regeneration flow would silently downgrade to redis 3.5.3.

limits[redis] exposes the modern constraint
'redis!=4.5.2,!=4.5.3,<8.0.0,>3', which matches the lock pin and is what
we actually want. slowapi already depends on limits, so declaring
limits[redis] merely activates the redis extra on a package we install
anyway; we lose no functionality by dropping the slowapi[redis] extra
in favour of limits[redis].

Verified: pip install --dry-run -r requirements.txt now resolves redis
to 7.4.0 cleanly, matching requirements.lock.
bk86a merged commit 0973020 into main May 1, 2026
11 checks passed
bk86a deleted the feat/multi-worker-uvicorn branch May 1, 2026 10:31
bk86a added a commit that referenced this pull request May 1, 2026

PR #71 shipped multi-worker uvicorn behind a shared rate-limit backend.
This commit captures the re-run of scripts/perf_test.sh against the
post-#68 deployment so the open AC items on #68 ("memory headroom" and
"verify approximately N× headroom on /lookup") have measured numbers
rather than estimates.

Headlines:

- Realistic-corpus knee (Scenario B) moved from 30 → 35-38 RPS.
  Single-worker collapsed at 35 (p99 4.47 s); multi-worker absorbs 35
  cleanly (p99 150 ms) and only saturates between 35 and 40.
- Hot-key plateau (Scenario A, persistent connections) roughly doubled:
  ~30 → ~50 RPS, with p99 at saturation 2.5× lower.
- Recommended operating point unchanged at 27 RPS — Scenario E
  (3-min sustained) still meets the p99 ≤ 200 ms SLO. The win is
  headroom (~10% → ~30-40%), not the operating point itself.

The 1.6× rather than 2× scaling is consistent with shared-edge TLS
termination and Pydantic GIL contention being part of the cap, not
just per-worker compute. Documented in the methodology notes.

Also adds a new "Rate-limit shared-storage verification" subsection:
130 anonymous requests against the published 120/minute cap from a
single source IP yielded exactly 120 HTTP 200 responses and 10 HTTP
429s: conclusive evidence that the Redis sidecar is reachable from both
workers and that the cap is enforced globally rather than per-worker
(the failure mode the startup validator at app/config.py:42-50 exists
to prevent).

CHANGELOG entry under [Unreleased] summarises both the re-baseline and
the perf_test.sh fix from the previous commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>