Skip to content

Commit d01cad4

Browse files
committed
Update sizing formula with measured coefficient and clearer structure
- Update k from 35 to 25 mc/MB/s based on measured 10-topic data - Reformulate as CPU(mc) = k × (P + N × C) to make fan-out explicit - Note k=4-8 mc/MB/s for 100+ topic deployments (3× lower, more realistic) - Frame record encryption as among the most CPU-intensive filters possible - Replace single worked example with 1:1 and fan-out pair - Drop theoretical ×1.3 headroom — k is derived from real measurements Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
1 parent b0e5033 commit d01cad4

1 file changed

Lines changed: 8 additions & 6 deletions

File tree

_posts/2026-05-26-benchmarking-the-proxy.md

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ So we stopped saying "it depends" — we built something you can run **yourselve
1616
**TL;DR**:
1717
- A passthrough proxy adds negligible overhead: publish latency impact is below measurement noise, E2E adds ~2 ms at moderate topic rates, throughput unaffected
1818
- Add record encryption and expect a ~25% throughput reduction and 0.2–3 ms of additional latency at comfortable rates
19-
- The throughput ceiling scales linearly with CPU: budget ~35 mc per MB/s of total proxy traffic (conservative; a companion post, coming soon, has the full sizing formula)
19+
- The throughput ceiling scales linearly with CPU: budget ~25 mc per MB/s of total proxy traffic (conservative; a companion post, coming soon, has the full sizing formula)
2020
- The full benchmark harness is open source — run it on your own cluster for numbers that reflect your workload
2121

2222
## What we measured
@@ -150,15 +150,17 @@ Numbers without guidance aren't very useful, so here's how to translate these re
150150

151151
**With filters (record encryption is the representative example here):**
152152

153-
1. **Throughput budget**: encryption imposes a CPU-driven throughput ceiling. As a planning formula:
153+
1. **Throughput budget**: record encryption — among the most CPU-intensive filters we can imagine — imposes a CPU-driven throughput ceiling. As a planning formula:
154154

155-
> **`proxy CPU (millicores) = 35 × total proxy throughput (MB/s)`**
155+
> **`CPU (mc) = k × (P + N × C)`**
156156
>
157-
> where *total* = produce MB/s + (each consumer group's consume MB/s independently)
157+
> where *k* = sizing coefficient (mc/MB/s), *P* = produce throughput (MB/s), *N* = number of consumer groups, *C* = consume throughput per group (MB/s)
158158
159-
This is a conservative estimate derived from single-partition workloads; the companion post (coming soon) has the full derivation and a lower bound for multi-topic workloads. For a single produce:consume pair this simplifies to `70 × produce MB/s`. Fan-out multiplies: 100 MB/s produce to 3 consumer groups = 100 + 300 = 400 MB/s total → 14,000m. Add ×1.3 headroom for GC pauses and burst. Measured on AMD EPYC-Rome 2 GHz with AES-NI — calibrate on your hardware using the rate sweep.
159+
On our hardware (AMD EPYC-Rome 2 GHz with AES-NI), we measured *k* = 25 mc/MB/s on a 10-topic workload with record encryption — a conservative estimate: more realistic deployments with 100+ topics show *k* = 4–8 mc/MB/s, roughly 3× lower. Simpler filters will be cheaper still. *k* is measured from real workloads, so measure your throughput and validate on your own hardware. The companion post (coming soon) has the full coefficient grid across topic counts and core allocations.
160160

161-
Worked example: 100k msg/s at 1 KB, 1 consumer group = 100 MB/s produce + 100 MB/s consume = 200 MB/s × 35 = 7,000m, plus headroom → ~9,100m (~9 cores).
161+
*1:1 (100k msg/s at 1 KB, 1 consumer group)*: k=25, P=100, N=1, C=100 → 25 × (100 + 1 × 100) = 5,000m (~5 cores)
162+
163+
*Fan-out (same rate, 3 consumer groups)*: k=25, P=100, N=3, C=100 → 25 × (100 + 3 × 100) = 10,000m (~10 cores)
162164

163165
2. **Latency budget**: well below saturation, expect 2–3 ms additional average publish latency and up to ~15 ms additional p99. The overhead scales with how hard you're pushing — give yourself headroom and you'll barely notice it.
164166

0 commit comments

Comments
 (0)