Commit 3fd4197

nextlevelshit authored and Michael Czechowski committed
docs: multi-instance scaling plan (3 escalation paths) (#195)
Co-authored-by: Michael Czechowski <mail@dailysh.it> Co-committed-by: Michael Czechowski <mail@dailysh.it>
1 parent 343fa77 commit 3fd4197

1 file changed

Lines changed: 142 additions & 0 deletions

docs/MULTI-INSTANCE.md

@@ -0,0 +1,142 @@
# Multi-instance deploy plan

The single-VPS deploy currently caps at ~95 rps (measured in the keepalive load test, PR #162). That is fine for normal traffic plus light promo. It is **not** fine for an HN frontpage spike (200–500 rps).

This doc lays out three escalation paths from "fine to ship" to "scales to viral spike," ordered by complexity. Pick one when traffic or revenue justifies it.

## Current state (single-instance)

```
DNS (codecrispi.es A+AAAA)
        ↓
netcup VPS (37.120.174.54)
        ↓
Caddy (terminates TLS, routes by host)
        ↓
cc (one nginx container, static files)
        ↓
dist/ (~30MB after PNG optimize)
```

Throughput ceiling = nginx workers in the single container + Caddy CPU + VPS network. Tuned, but still a **vertical** ceiling.

## Path 1: Cloudflare in front of single VPS (cheapest)

**Win:** absorbs ~95% of traffic at the edge. For static assets the origin sees only ~5% of requests; it sees full traffic only for SPA routes that miss the cache.

```
Cloudflare CDN (free tier, 270 PoPs)
        ↓ (cache hit ~95%)
netcup VPS ← origin
```

**Setup:**
1. Add `codecrispi.es` to the Cloudflare account
2. Update nameservers at the registrar to Cloudflare's
3. Cache rule: cache everything except `/auth/*`, `/api/*`, `/health`
4. Cache rule: HTML respects `Cache-Control: max-age=300` (already set)
5. Origin: keep the current netcup IP; Cloudflare proxies to it

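The bypass rule in step 3 amounts to a path check. A minimal sketch in Python, assuming glob-style matching; the name `is_edge_cacheable` and the matching semantics are illustrative and are not Cloudflare's rule syntax:

```python
from fnmatch import fnmatchcase

# Paths from step 3 that must always reach the origin.
BYPASS_PATTERNS = ("/auth/*", "/api/*", "/health")


def is_edge_cacheable(path: str) -> bool:
    """Return True if the request path may be served from the edge cache."""
    return not any(fnmatchcase(path, pattern) for pattern in BYPASS_PATTERNS)
```

Under these patterns `/assets/app.js` is cacheable and `/api/v1/items` is not. A bare `/auth` (no trailing slash) would still be cached, so the real rule may need an extra exact match.
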
**Effect:** capacity multiplies ~20x for static content (assets, blog OG). An HN frontpage spike no longer breaks the box.

**Cost:** $0 (free tier). Caveat: Cloudflare sees user IPs and terminates TLS, so it can read traffic in the clear. Acceptable for an open-source educational site; not for a service handling financial data.

**Risk:** Cloudflare outage = site down. Has happened (2022, ~6 hours). Mitigation: keep the DNS TTL short so you can fail back to the direct origin in ~5 min.

**When to ship:** before any HN/Reddit submission, once star count and outreach start picking up.

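The ~20x figure follows from the ~95% hit rate: only cache misses reach the origin. A rough sketch of that arithmetic, assuming a uniform hit ratio across all requests (an idealization; real hit rates vary by route). The function names are illustrative.

```python
def origin_rps(edge_rps: float, hit_ratio: float) -> float:
    """Requests per second that miss the edge cache and reach the origin."""
    return edge_rps * (1.0 - hit_ratio)


def edge_capacity(origin_ceiling_rps: float, hit_ratio: float) -> float:
    """Edge-side rps the setup can absorb before the origin saturates."""
    return origin_ceiling_rps / (1.0 - hit_ratio)


# Numbers from this doc: ~95 rps origin ceiling, ~95% hit rate, 500 rps HN spike.
spike_at_origin = origin_rps(500, 0.95)    # about 25 rps, under the ~95 rps ceiling
ceiling_at_edge = edge_capacity(95, 0.95)  # about 1900 rps, i.e. ~20x the origin
```

If the hit ratio drops to 80%, the same spike puts about 100 rps on the origin, which is already over the measured ceiling. The cache rules matter more than the box.
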
## Path 2: Two-instance active-active (medium effort)

**Win:** redundancy + horizontal compute. If one VPS reboots, the other carries traffic.

```
DNS round-robin or HAProxy
            │
       ┌────┴────┐
       ↓         ↓
     VPS A     VPS B
  (primary) (secondary)
       ↓         ↓
     Caddy     Caddy
       ↓         ↓
      cc        cc
```

**Tooling pick:**
- **DNS round-robin** (simplest): apex A records → both IPs. Clients pick one, roughly at random. No health checking: a client holding the IP of a dead box is on a slow path until the record expires (~5 min).
- **HAProxy** (better): tiny third VPS in front, routing health-aware. Adds ~1ms latency. Costs another EUR 4-7/mo at netcup.
- **Anycast IPs** (cleanest): only available with bigger providers (Hetzner Cloud, OVH). Costs more.

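The difference between plain round-robin and health-aware routing can be sketched as below. This is a toy model of the idea, not HAProxy's implementation or configuration; the `Backend` class, the `healthy` flag and the second IP (a documentation address) are assumptions for illustration.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterator, List


@dataclass
class Backend:
    name: str
    ip: str
    healthy: bool = True  # in practice updated by a periodic health check


def round_robin(backends: List[Backend]) -> Iterator[Backend]:
    """DNS-style rotation: ignores health, so dead boxes still get picked."""
    return cycle(backends)


def health_aware(backends: List[Backend]) -> Iterator[Backend]:
    """Rotation that skips backends currently marked unhealthy."""
    for backend in cycle(backends):
        if backend.healthy:
            yield backend
        elif not any(b.healthy for b in backends):
            raise RuntimeError("no healthy backend available")


vps_a = Backend("VPS A", "37.120.174.54")
vps_b = Backend("VPS B", "203.0.113.10")  # hypothetical second box
vps_a.healthy = False  # e.g. VPS A is rebooting

picker = health_aware([vps_a, vps_b])
picks = [next(picker).name for _ in range(3)]  # every pick lands on VPS B
```

With `round_robin` the same three picks would alternate between both boxes, so roughly half of the requests would go to the box that is down.
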
**Cert sharing:** the Caddy stack stores LE certs in `./data` per VPS. Each box runs its own HTTP-01 challenge. Works, but means 2x cert issuance traffic. With DNS round-robin the validation request can land on the box that did not ask for the cert, so expect retries.

**Sync:** static-only stack, no database state on the VPS. Each box pulls from Gitea on deploy. Independent. Easy.

**Effect:** doubles capacity, removes the single-box failure mode.

**Cost:** +1 VPS at netcup, ~EUR 4-7/mo. Plus the HAProxy VPS if going that route.

**When to ship:** if Path 1 (Cloudflare) isn't enough, OR if traffic is non-static enough that the CDN cache-hit rate is poor (rare for this app).

## Path 3: Container orchestrator (k3s / docker-swarm)

**Win:** declarative scale-out, zero-downtime deploys, auto-restart.

```
     Cloudflare or HAProxy
               ↓
k3s ingress (traefik or nginx-ingress)
               │
       ┌───────┼───────┐
       ↓       ↓       ↓
     cc-1    cc-2    cc-3   ← N replicas
```

**Tooling pick:**
- **k3s**: full Kubernetes, lighter than vanilla. Steepest learning curve, most flexibility. ~250MB RAM per node on top of the app.
- **docker-swarm**: built into Docker, simpler. Handles 80% of k8s use cases. Low sunk cost: you can drop it if you outgrow it.
- **Nomad**: middle ground, less popular.

**State:** the stack is already stateless per replica: `dist/` is baked into the image. Replica count is purely throughput-driven.

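Because replica count is purely throughput-driven, sizing is a ceiling division. A back-of-envelope sketch, assuming each replica sustains roughly the ~95 rps measured for the single-VPS setup (an assumption: replicas sharing one node will not each reach that) and a hypothetical `headroom` factor:

```python
import math


def replicas_needed(target_rps: float, per_replica_rps: float = 95.0,
                    headroom: float = 1.0) -> int:
    """Smallest replica count whose combined throughput covers target_rps.

    headroom > 1.0 over-provisions, e.g. 1.25 keeps 25% spare capacity.
    """
    if target_rps <= 0:
        return 1  # keep at least one replica running
    return max(1, math.ceil(target_rps * headroom / per_replica_rps))
```

For a 500 rps spike served without a CDN this gives 6 replicas, or 7 with 25% headroom.
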
**Operational cost:** ops complexity. You go from "ssh, git pull, restart" to managing a cluster. Worth it only if:
- You're deploying 3+ stacks together (codecrispi.es + others)
- You want CI/CD without per-stack Caddyfile edits
- You're already running k8s for $work and can amortize the learning

**Effect:** trivial scale to N replicas. Crash recovery + rolling deploys + autoscaling.

**Cost:** 3+ nodes minimum (1 control plane + 2 workers) → ~EUR 12-20/mo at netcup. Plus your time.

**When to ship:** when you have 5+ apps to host together. Single-app on k8s is overkill.

## Decision matrix

| Traffic regime | Recommended path |
|---|---|
| < 50 rps (current) | Stay single-instance |
| 50–200 rps sustained | Path 1 (Cloudflare) |
| 200–1000 rps spike (HN) | Path 1 + tuned cache TTLs |
| Sustained > 500 rps | Path 2 (active-active) |
| Multi-app cluster | Path 3 (k3s/swarm) |

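The table above can be read as a small decision function. A sketch, with two assumptions that the table does not state: 200–500 rps sustained falls back to Path 1, and spikes above 1000 rps are treated like the HN row. The function name and parameters are illustrative.

```python
def recommend_path(sustained_rps: float, spike_rps: float = 0.0,
                   multi_app: bool = False) -> str:
    """Map a traffic regime to the recommended path from the decision matrix."""
    if multi_app:
        return "Path 3 (k3s/swarm)"
    if sustained_rps > 500:
        return "Path 2 (active-active)"
    if spike_rps >= 200:
        # Assumption: spikes above 1000 rps are handled like the HN row.
        return "Path 1 + tuned cache TTLs"
    if sustained_rps >= 50:
        # Assumption: the table leaves 200-500 rps sustained unspecified.
        return "Path 1 (Cloudflare)"
    return "Stay single-instance"
```

For this app today, `recommend_path(30)` returns "Stay single-instance", and `recommend_path(30, spike_rps=400)` returns "Path 1 + tuned cache TTLs".
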
## Concrete next-step (for THIS app)

If/when star count + outreach starts: **Path 1 only**. Free, fast (1 hour to set up), proven.

Don't build Path 2 or 3 speculatively. Single VPS + Cloudflare handles an HN frontpage spike cleanly. Defer infra until traffic forces it.

## Estimates (reference)

| Setup | Sustained rps ceiling | Survives HN spike | Setup effort | Monthly cost |
|---|---|---|---|---|
| Current (1 VPS, hardened) | ~95 | ✗ | done | EUR 4 |
| + Cloudflare (Path 1) | ~2000 | ✓ | 1h | EUR 4 |
| 2× VPS active-active (Path 2) | ~190 | | 4h | EUR 8-15 |
| Path 1 + Path 2 combined | ~4000 | ✓✓ | 5h | EUR 8-15 |
| 3-node k3s + Cloudflare (Path 3) | ~6000 | ✓✓ | days | EUR 16+ |

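The ceilings in the table above are consistent with two multipliers from earlier sections: ~95 rps per box and ~20x from a ~95% cache hit rate. A sketch that reproduces them on that assumption; that the figures were derived this way is a guess, and the table rounds the results.

```python
PER_BOX_RPS = 95     # measured single-VPS ceiling
CDN_MULTIPLIER = 20  # 1 / (1 - 0.95), from the ~95% cache hit rate


def ceiling(boxes: int, behind_cdn: bool) -> int:
    """Estimated sustained rps ceiling for a setup (linear scaling assumed)."""
    return boxes * PER_BOX_RPS * (CDN_MULTIPLIER if behind_cdn else 1)


estimates = {
    "Current (1 VPS)": ceiling(1, False),         # 95
    "+ Cloudflare (Path 1)": ceiling(1, True),    # 1900, table says ~2000
    "2x VPS (Path 2)": ceiling(2, False),         # 190
    "Path 1 + Path 2": ceiling(2, True),          # 3800, table says ~4000
    "3-node k3s + Cloudflare": ceiling(3, True),  # 5700, table says ~6000
}
```

Linear scaling is optimistic: a shared network link or a control-plane node that serves no traffic would lower the multi-box figures.
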
## Out of scope here

- Database scaling: Supabase handles its own (managed)
- Realtime: Supabase Realtime channels scale with the Supabase plan
- Static asset CDN without Cloudflare proxy mode (e.g. R2 / S3 + CloudFront): feasible, but more moving parts than Cloudflare proxy mode
