# Design Tradeoffs

This document records the engineering decisions made during design and implementation.

---

## Algorithm: fixed window vs token bucket vs sliding window

**Chosen: sliding window counter**

### Fixed window (rejected)

Fixed windows divide time into discrete buckets (e.g. 12:00-12:01, 12:01-12:02).
The limit applies per bucket. Flaw: a client can fire `limit` requests at 12:00:59
and another `limit` at 12:01:00 - a burst of `2×limit` within two seconds, defeating
the intent of the quota. Known as the "boundary burst" problem.
| 17 | + |
| 18 | +### Token bucket (rejected as primary) |
| 19 | + |
| 20 | +Tokens refill at a constant rate. A client can consume tokens as fast as they accumulate |
| 21 | +or save them up for a burst up to `capacity`. Advantages: smooth burst handling, simple |
| 22 | +mental model for "sustained rate". |
| 23 | + |
| 24 | +Disadvantages for this use case: |
| 25 | +- Storing `tokens` and `last_refill` as floats in Redis and performing arithmetic in Lua |
| 26 | +introduces floating-point imprecision that is harder to audit for correctness. |
| 27 | +- A quota of "100 requests/minute" is harder to explain with token bucked than with |
| 28 | +sliding window, which maps directly to "count requests in the last 60 seconds". |
| 29 | +- The token_bucket.lua script is kept in `internal/store/scripts/lua` |

### Sliding window counter (chosen)

Stores each request timestamp as a member in a Redis sorted set, scored by timestamp.
On each decision:
1. Remove members with score < `new_window` (expired requests).
2. Count the remaining members (requests within the active window).
3. Allow if count < limit; reject otherwise.

Correctness is easy to reason about: "at most N timestamps exist in the set at any time".
Memory is bounded to `limit` entries per key (rejected requests are never added).

**Tradeoff:** sorted sets use more memory per entry (~50-80 bytes) than a simple counter.
For high-limit keys (e.g. 10,000 req/min), this is measurable. If memory is a concern,
switch to a token bucket or approximate counter approach.

---

## Atomicity: application-level locking vs WATCH/MULTI/EXEC vs Redis Lua

**Chosen: Redis Lua scripts (EVALSHA)**

### Application-level locking (rejected)

A read-modify-write in Go application code:
```
count = GET key
if count < limit:
    SET key (count + 1)
    return allow
```
This is a TOCTOU race across replicas. Two replicas can both read `count = 99`
(limit: 100), both decide to allow, and both write `100` - admitting 101 requests
in total. Incorrect by design.

### WATCH/MULTI/EXEC - optimistic transactions (rejected)

Redis's optimistic locking aborts and retries the transaction if a watched key
changes between WATCH and EXEC. Under high concurrency - many replicas targeting
the same popular API key - the retry rate can grow without bound, causing p99 latency
spikes. Each retry is a full round-trip.

### Redis Lua scripts (chosen)

Redis executes Lua scripts atomically in its single-threaded interpreter.
The script runs to completion before any other command is processed.
This serialises all decisions at the Redis server, not in the application layer.

Benefits:
- Single round-trip (EVALSHA sends only the 40-char SHA digest, not the script text)
- No retries, no contention-dependent latency growth
- The correctness argument is simple: Redis processes one Lua call at a time

The correctness proof is in `tests/integration/concurrency_test.go`:
N goroutines firing simultaneously against a single key, asserting that exactly
`limit` requests are admitted.

---

## Failure behaviour: fail-open vs fail-closed

**Chosen: fail-open**

When Redis is unreachable, `AllowRequest` returns `(true, error)` - the request
is allowed through and the error is recorded in `ratelimiter_redis_errors_total`.

**Rationale:** This service enforces API quotas, not security boundaries.
The cost of over-admitting requests during a Redis outage (a few extra requests
reach the upstream API) is lower than the cost of dropping all traffic (a total
service outage from the perspective of the callers).

**When to choose fail-closed:** If the rate limiter enforces a billing limit,
prevents credential stuffing, or protects a resource with strict capacity limits,
fail-closed is correct. Change the `limiter.go` line:
```go
return true, fmt.Errorf("store error (failing open): %w", err)
// →
return false, fmt.Errorf("store error (failing closed): %w", err)
```

**Observability:** Alert on `ratelimiter_redis_errors_total > 0` to detect and
measure fail-open windows. The Grafana dashboard includes this panel.

---

## Stateless replicas

Multiple service replicas are safe because no rate-limit state lives in the Go process.
All state is in Redis. A request landing on any replica gets the same correct decision,
because the decision is made atomically by the Lua script on Redis.

**What this requires:**
- Redis must be reachable from all replicas (handled by the Redis Service in k8s)
- Redis must not be sharded such that the same key could land on different shards
  without coordination. For very high scale, Redis Cluster with `{key}` hashtags
  can ensure per-key routing to a single shard.

**HPA implication:** Because replicas are stateless, the HPA can scale them freely
without worrying about state migration. See `deploy/k8s/hpa.yml`.

---

## Memory growth in Redis

Each admitted request adds one member to the sorted set for the API key.
Rejected requests are not recorded (the Lua script returns 0 before ZADD).

Memory per key ≈ `limit × 70 bytes` (approximate sorted set entry size).

At a 100 req/min limit: ~7 KB per key.
At a 10,000 req/min limit: ~700 KB per key.
At 10,000 active keys × a 100 req/min limit: ~70 MB total - acceptable.

The `ZREMRANGEBYSCORE` call at the top of every Lua invocation keeps each set
bounded to the active window. The `PEXPIRE` call ensures idle keys (no requests
for a full window duration) are evicted automatically by Redis.

Without `PEXPIRE`, sorted sets for inactive keys would persist indefinitely.