Commit d4ea59d

PttCodingMan and claude committed
fix: bound public rate-limit memory and clarify threat model
The per-IP sliding-window dict had no eviction: an IP that hit the endpoint once and never returned left a stale bucket forever, and an IP-rotating scraper could leak memory linearly. Add a 10k-entry cap with true LRU eviction (pop+reinsert on touch so the dict's insertion order tracks recency, then drop from the front when over the cap).

Also rewrite the comment to be honest about what the limiter is and isn't: gentle DoS / scrape protection on a public-by-definition endpoint, not a security boundary. Counters reset on process restart; that's acceptable for a self-hosted small-team wiki.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent a08e78e commit d4ea59d
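
The pop+reinsert trick from the commit message can be seen in isolation with a plain dict. This is a minimal sketch, not code from the commit: `touch`, `MAX_ENTRIES`, and the sample keys are hypothetical names chosen for illustration. It relies on the fact that Python dicts preserve insertion order (guaranteed since Python 3.7), so removing a key and writing it back moves it to the end.

```python
# Illustrative only: `touch`, `MAX_ENTRIES`, and the keys are assumptions,
# not identifiers from the commit.
MAX_ENTRIES = 3


def touch(table: dict[str, int], key: str) -> None:
    """Mark `key` as most recently used, then evict the oldest keys over the cap."""
    count = table.pop(key, 0)  # remove first so the reinsert lands at the end
    table[key] = count + 1
    overflow = len(table) - MAX_ENTRIES
    if overflow > 0:
        # The front of the dict holds the least recently touched keys.
        for stale in list(table)[:overflow]:
            del table[stale]


table: dict[str, int] = {}
for key in ["a", "b", "c", "a", "d"]:
    touch(table, key)

print(list(table))  # ['c', 'a', 'd']: "b" was the least recently touched
```

Without the `pop`, re-assigning `table["a"]` would leave "a" at the front of the dict, and the cap would evict it even though it had just been used.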

1 file changed

Lines changed: 22 additions & 3 deletions

backend/app/routers/public.py

@@ -18,20 +18,39 @@
 _HTML_COMMENT_RE = re.compile(r"<!--[\s\S]*?-->")
 
 # In-memory rate limit: 60 requests per IP per 60 seconds.
-# Single-process only — see to-do Known Limitations.
+#
+# Threat model: gentle DoS / scrape protection on a public-by-definition
+# endpoint, not a security boundary. The counter is per-process and resets
+# on restart; an attacker who can rotate IPs (or trigger restarts) gets
+# fresh budget. Acceptable for a self-hosted small-team wiki.
+#
+# Memory: empty buckets are dropped after the prune so a one-off visitor
+# doesn't leave a permanent entry; if the dict still grows past _MAX_IPS
+# we drop the oldest-touched entries. Without this, an IP-rotating scraper
+# would leak memory linearly.
 _access_log: dict[str, list[float]] = defaultdict(list)
 _RATE_LIMIT_MAX = 60
 _RATE_LIMIT_WINDOW = 60  # seconds
+_MAX_IPS = 10_000
 
 
 def _check_rate_limit(ip: str):
     now = time.monotonic()
-    log = _access_log[ip]
+    # pop+reinsert so the dict's insertion order tracks recency — the
+    # eviction step below can then drop true LRU entries.
+    log = _access_log.pop(ip, [])
     pruned = [t for t in log if now - t < _RATE_LIMIT_WINDOW]
-    _access_log[ip] = pruned
     if len(pruned) >= _RATE_LIMIT_MAX:
+        _access_log[ip] = pruned
         raise HTTPException(status_code=429, detail="Too many requests")
     pruned.append(now)
+    _access_log[ip] = pruned
+
+    if len(_access_log) > _MAX_IPS:
+        # Cap memory against IP-rotating scrapers. Drop oldest-touched
+        # entries first; cheap one-shot pop loop only when we hit the cap.
+        for stale in list(_access_log)[: len(_access_log) - _MAX_IPS]:
+            _access_log.pop(stale, None)
 
 
 @router.get("/pages/{slug}")
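
To exercise the limiter's behaviour without a running server, the logic in the hunk can be restated as a small framework-free class. This is a sketch under stated assumptions, not the project's code: `SlidingWindowLimiter`, `RateLimited`, and the injectable `clock` parameter are hypothetical, and `RateLimited` stands in for the `HTTPException(status_code=429)` raised in the diff. The tiny limits in the usage part are chosen only so the 429 path and the cap are easy to trigger.

```python
import time
from typing import Callable


class RateLimited(Exception):
    """Hypothetical stand-in for the HTTP 429 raised by the real endpoint."""


class SlidingWindowLimiter:
    """Per-key sliding window with an LRU cap on the number of tracked keys."""

    def __init__(self, max_hits: int, window: float, max_keys: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_hits = max_hits
        self.window = window
        self.max_keys = max_keys
        self.clock = clock  # injectable so tests can control time
        self.hits: dict[str, list[float]] = {}

    def check(self, key: str) -> None:
        now = self.clock()
        # pop + reinsert keeps the dict's order equal to recency order.
        recent = [t for t in self.hits.pop(key, []) if now - t < self.window]
        if len(recent) >= self.max_hits:
            self.hits[key] = recent
            raise RateLimited(key)
        recent.append(now)
        self.hits[key] = recent
        # Only does work when the cap is exceeded.
        overflow = len(self.hits) - self.max_keys
        if overflow > 0:
            for stale in list(self.hits)[:overflow]:
                self.hits.pop(stale, None)


# Usage with a fake clock (a one-element list so the lambda can read updates).
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_hits=2, window=60, max_keys=2,
                               clock=lambda: fake_now[0])
limiter.check("10.0.0.1")
limiter.check("10.0.0.1")
try:
    limiter.check("10.0.0.1")  # third hit inside the window
    blocked = False
except RateLimited:
    blocked = True

fake_now[0] = 61.0
limiter.check("10.0.0.1")  # allowed again: the old hits fell out of the window
limiter.check("10.0.0.2")
limiter.check("10.0.0.3")  # third key exceeds max_keys, evicting 10.0.0.1
```

In this sketch, as in the hunk as shown, a bucket is always written back after a check, so a one-off visitor's entry stays in the dict until the cap evicts it. The cap, not the prune, is what bounds memory here.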
