| 1 | +# Multi-instance deploy plan |
| 2 | + |
| 3 | +Single-VPS deploy currently caps at ~95 rps (measured in the keepalive load test, PR #162). That is fine for normal traffic plus light promo, but **not** for an HN frontpage spike (200–500 rps). |
| 4 | + |
| 5 | +This doc lays out three escalation paths from "fine to ship" to "scales to viral spike," ordered by complexity. Pick when traffic/revenue justifies. |
| 6 | + |
| 7 | +## Current state (single-instance) |
| 8 | + |
| 9 | +``` |
| 10 | + DNS (codecrispi.es A+AAAA) |
| 11 | + ↓ |
| 12 | + netcup VPS (37.120.174.54) |
| 13 | + ↓ |
| 14 | + Caddy (terminates TLS, routes by host) |
| 15 | + ↓ |
| 16 | + cc (one nginx container, static files) |
| 17 | + ↓ |
| 18 | + dist/ (~30MB after PNG optimize) |
| 19 | +``` |
| 20 | + |
| 21 | +Throughput ceiling = nginx workers in the single container + Caddy CPU + VPS network. It is tuned, but it is a **vertical** ceiling: it only rises with a bigger box. |
| 22 | + |
| 23 | +## Path 1: Cloudflare in front of single VPS (cheapest) |
| 24 | + |
| 25 | +**Win:** absorbs ~95% of traffic at the edge. Origin sees ~5% of requests for static assets, and full traffic only for SPA routes that miss the cache. |
| 26 | + |
| 27 | +``` |
| 28 | + Cloudflare CDN (free tier, 270 PoPs) |
| 29 | + ↓ (cache hit ~95%) |
| 30 | + netcup VPS ← origin |
| 31 | +``` |
| 32 | + |
| 33 | +**Setup:** |
| 34 | +1. Add `codecrispi.es` to Cloudflare account |
| 35 | +2. Update nameservers at registrar to Cloudflare's |
| 36 | +3. Cache rule: cache everything except `/auth/*`, `/api/*`, `/health` |
| 37 | +4. Cache rule: HTML respects `Cache-Control: max-age=300` (already set) |
| 38 | +5. Origin: keep current netcup IP, Cloudflare proxies |
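
To make the bypass list in step 3 concrete, here is a minimal Python sketch of the decision the cache rule encodes. It is a model for reasoning and testing, not Cloudflare's rule syntax; `BYPASS_PATTERNS` and `should_cache` are hypothetical names introduced for illustration.

```python
from fnmatch import fnmatchcase

# Paths the setup list excludes from edge caching (taken from step 3).
BYPASS_PATTERNS = ("/auth/*", "/api/*", "/health")


def should_cache(path: str) -> bool:
    """Return True if the edge may cache a request for this path.

    Assumption: patterns are shell-style globs, so "*" also matches "/".
    """
    return not any(fnmatchcase(path, pattern) for pattern in BYPASS_PATTERNS)


if __name__ == "__main__":
    for p in ("/assets/app.js", "/api/v1/users", "/auth/callback", "/health"):
        print(p, "cache" if should_cache(p) else "bypass")
```

A table like this is handy as a checklist when verifying the rule after setup: request each path and compare the cache status the edge reports against what the predicate says.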
| 39 | + |
| 40 | +**Effect:** capacity multiplies ~20x for static (assets, blog OG). HN-frontpage no longer breaks the box. |
| 41 | + |
| 42 | +**Cost:** $0 (free tier). Caveat: Cloudflare terminates TLS, so it sees user IPs and can read traffic in plaintext. Acceptable for an open-source educational site; not for a service handling financial data. |
| 43 | + |
| 44 | +**Risk:** Cloudflare outage = site down. Has happened (June 2022, over an hour). Mitigation: keep DNS TTL short so you can fail back to direct origin in ~5 min. |
| 45 | + |
| 46 | +**When to ship:** before any HN/Reddit submission, or as soon as star count and outreach start picking up. |
| 47 | + |
| 48 | +## Path 2: Two-instance active-active (medium effort) |
| 49 | + |
| 50 | +**Win:** redundancy + horizontal compute. If one VPS reboots, the other carries traffic. |
| 51 | + |
| 52 | +``` |
| 53 | + DNS round-robin or HAProxy |
| 54 | + ↓ |
| 55 | + ┌────┴────┐ |
| 56 | + ↓ ↓ |
| 57 | + VPS A VPS B |
| 58 | + (primary) (secondary) |
| 59 | + ↓ ↓ |
| 60 | + Caddy Caddy |
| 61 | + ↓ ↓ |
| 62 | + cc cc |
| 63 | +``` |
| 64 | + |
| 65 | +**Tooling pick:** |
| 66 | +- **DNS round-robin** (simplest): apex A records → both IPs. Clients pick one more or less at random. No health checks: if one box is down, clients still holding its IP hit a slow path until their DNS cache expires (~5 min with a short TTL). |
| 67 | +- **HAProxy** (better): tiny third VPS in front routing health-aware. Adds ~1ms latency. Costs another EUR 4-7/mo at netcup. |
| 68 | +- **Anycast IPs** (cleanest): only available with bigger providers (Hetzner Cloud, OVH). Costs more. |
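
The difference between the first two options can be sketched in a few lines of Python. This is an illustration of the selection logic only, not HAProxy configuration; the backend addresses are documentation-range placeholders and `is_healthy` stands in for whatever check (for example a probe of `/health`) the proxy would run.

```python
import random
from typing import Callable, Optional, Sequence


def pick_round_robin(backends: Sequence[str]) -> str:
    """DNS round-robin model: any listed backend may be chosen, dead or alive."""
    return random.choice(list(backends))


def pick_health_aware(
    backends: Sequence[str], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Proxy model: choose only among backends that pass the health check."""
    healthy = [b for b in backends if is_healthy(b)]
    return random.choice(healthy) if healthy else None


if __name__ == "__main__":
    vps = ["192.0.2.10", "192.0.2.20"]  # placeholder IPs for VPS A and VPS B
    b_is_down = lambda backend: backend != "192.0.2.20"
    print(pick_round_robin(vps))              # may still return the dead box
    print(pick_health_aware(vps, b_is_down))  # always VPS A while B is down
```

With round-robin, roughly half of new clients land on the dead box during an outage; the health-aware picker never does, which is what the extra proxy VPS buys.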
| 69 | + |
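| 70 | +**Cert sharing:** caddy stack stores LE certs in `./data` per-VPS. Each box independently runs the HTTP-01 challenge, which means 2x cert issuance traffic. Caveat: with DNS round-robin the validation request can land on the other box, so issuance may fail intermittently and need retries; shared Caddy storage or the DNS-01 challenge avoids that. |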
| 71 | + |
| 72 | +**Sync:** static-only stack — no database state on the VPS. Each box pulls from gitea on deploy. Independent. Easy. |
| 73 | + |
| 74 | +**Effect:** doubles capacity, removes single-box failure mode. |
| 75 | + |
| 76 | +**Cost:** +1 VPS at netcup ~EUR 4-7/mo. +HAProxy VPS if going that route. |
| 77 | + |
| 78 | +**When to ship:** if Path 1 (Cloudflare) isn't enough, OR if traffic is non-static enough that the CDN cache-hit rate is poor (rare for this app). |
| 79 | + |
| 80 | +## Path 3: Container orchestrator (k3s / docker-swarm) |
| 81 | + |
| 82 | +**Win:** declarative scale-out, zero-downtime deploys, auto-restart. |
| 83 | + |
| 84 | +``` |
| 85 | + Cloudflare or HAProxy |
| 86 | + ↓ |
| 87 | + k3s ingress (traefik or nginx-ingress) |
| 88 | + ↓ |
| 89 | + ┌───────┼───────┐ |
| 90 | + ↓ ↓ ↓ |
| 91 | + cc-1 cc-2 cc-3 ← N replicas |
| 92 | +``` |
| 93 | + |
| 94 | +**Tooling pick:** |
| 95 | +- **k3s**: full Kubernetes, lighter than vanilla. Steepest learning curve, most flexibility. ~250MB RAM per node beyond the app |
| 96 | +- **docker-swarm**: built into Docker, simpler. Handles 80% of k8s use cases. Low sunk cost, so it is easy to walk away from if you outgrow it |
| 97 | +- **Nomad**: middle ground, less popular |
| 98 | + |
| 99 | +**State:** stack already stateless per-replica — `dist/` baked into image. Replica count purely throughput-driven. |
| 100 | + |
| 101 | +**Operational cost:** ops complexity. You go from "ssh, git pull, restart" to managing a cluster. Worth it only if: |
| 102 | +- You're deploying 3+ stacks together (codecrispi.es + others) |
| 103 | +- You want CI/CD without per-stack Caddyfile edits |
| 104 | +- You're already running k8s for $work and can amortize learning |
| 105 | + |
| 106 | +**Effect:** trivial scale to N replicas. Crash recovery + rolling deploys + autoscaling. |
| 107 | + |
| 108 | +**Cost:** 3+ nodes minimum (1 control plane + 2 workers) → ~EUR 12-20/mo at netcup. Plus your time. |
| 109 | + |
| 110 | +**When to ship:** when you have 5+ apps to host together. Single-app on k8s is overkill. |
| 111 | + |
| 112 | +## Decision matrix |
| 113 | + |
| 114 | +| Traffic regime | Recommended path | |
| 115 | +|---|---| |
| 116 | +| < 50 rps (current) | Stay single-instance | |
| 117 | +| 50–200 rps sustained | Path 1 (Cloudflare) | |
| 118 | +| 200–1000 rps spike (HN) | Path 1 + tuned cache TTLs | |
| 119 | +| Sustained > 500 rps | Path 2 (active-active) on top of Path 1 | |
| 120 | +| Multi-app cluster | Path 3 (k3s/swarm) | |
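
The matrix can be read as a small decision function. The sketch below mirrors the rows; `recommend_path` and its parameters are hypothetical names, and two cases the matrix leaves open are resolved by assumption: sustained 200–500 rps stays on Path 1, and spikes above 1000 rps are treated like the 200–1000 row.

```python
def recommend_path(
    sustained_rps: float, spike_rps: float = 0.0, multi_app: bool = False
) -> str:
    """Map a traffic regime to the row of the decision matrix it falls in."""
    if multi_app:
        return "Path 3 (k3s/swarm)"
    if sustained_rps > 500:
        return "Path 2 (active-active)"
    if spike_rps >= 200:
        # Assumption: spikes beyond 1000 rps are handled like the 200-1000 row.
        return "Path 1 + tuned cache TTLs"
    if sustained_rps >= 50:
        # Assumption: the uncovered 200-500 sustained band stays on Path 1.
        return "Path 1 (Cloudflare)"
    return "Stay single-instance"


if __name__ == "__main__":
    print(recommend_path(30))
    print(recommend_path(40, spike_rps=400))
    print(recommend_path(600))
```

Checking `multi_app` first reflects that Path 3 is chosen for hosting reasons rather than throughput.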
| 121 | + |
| 122 | +## Concrete next-step (for THIS app) |
| 123 | + |
| 124 | +If/when star count + outreach starts: **Path 1 only**. Free, fast (1 hour to set up), proven. |
| 125 | + |
| 126 | +Don't build path 2 or 3 speculatively. Single VPS + Cloudflare handles HN frontpage cleanly. Defer infra until traffic forces it. |
| 127 | + |
| 128 | +## Estimates (reference) |
| 129 | + |
| 130 | +| Setup | Sustained rps ceiling | HN-spike survives | Setup effort | Monthly cost | |
| 131 | +|---|---|---|---|---| |
| 132 | +| Current (1 VPS, hardened) | ~95 | ❌ | done | EUR 4 | |
| 133 | +| + Cloudflare (Path 1) | ~2000 | ✓ | 1h | EUR 4 | |
| 134 | +| 2× VPS active-active (Path 2) | ~190 | ⚠ | 4h | EUR 8-15 | |
| 135 | +| Path 1 + Path 2 combined | ~4000 | ✓✓ | 5h | EUR 8-15 | |
| 136 | +| 3-node k3s + Cloudflare (Path 3) | ~6000 | ✓✓ | days | EUR 16+ | |
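
The ceilings in this table are consistent with a simple back-of-envelope model: the origin only serves cache misses, so the visitor-facing ceiling is origin capacity divided by the miss ratio. The sketch below assumes linear scaling across instances and the ~95% hit ratio quoted for Path 1; `visitor_ceiling` is a hypothetical helper, and the table values appear to be rounded versions of what it returns.

```python
def visitor_ceiling(
    origin_rps: float, instances: int = 1, cache_hit_ratio: float = 0.0
) -> float:
    """Estimate the request rate the setup sustains before the origin saturates.

    Assumptions: capacity scales linearly with instance count, and only
    cache misses (1 - cache_hit_ratio) ever reach the origin.
    """
    if instances < 1:
        raise ValueError("instances must be >= 1")
    if not 0.0 <= cache_hit_ratio < 1.0:
        raise ValueError("cache_hit_ratio must be in [0, 1)")
    return origin_rps * instances / (1.0 - cache_hit_ratio)


if __name__ == "__main__":
    single = 95  # measured single-VPS ceiling, rps
    print(round(visitor_ceiling(single)))                                    # current
    print(round(visitor_ceiling(single, cache_hit_ratio=0.95)))              # Path 1
    print(round(visitor_ceiling(single, instances=2)))                       # Path 2
    print(round(visitor_ceiling(single, instances=2, cache_hit_ratio=0.95))) # 1 + 2
    print(round(visitor_ceiling(single, instances=3, cache_hit_ratio=0.95))) # Path 3
```

The model also shows why Path 1 dominates: a 95% hit ratio is a 20x multiplier, while each extra VPS only adds another 1x.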
| 137 | + |
| 138 | +## Out of scope here |
| 139 | + |
| 140 | +- Database scaling: Supabase handles its own (managed) |
| 141 | +- Realtime: Supabase Realtime channels scale with the Supabase plan |
| 142 | +- Static assets served from object storage instead of through Cloudflare proxy mode (e.g. R2, or S3 + CloudFront): feasible, but more moving parts than proxy mode |