Update sizing formula with measured coefficient and clearer structure

SamBarker · SamBarker · commit d01cad488fe9 · 2026-05-28T12:59:02.000+12:00
- Update k from 35 to 25 mc/MB/s based on measured 10-topic data
- Reformulate as CPU(mc) = k × (P + N × C) to make fan-out explicit
- Note k=4-8 mc/MB/s for 100+ topic deployments (3× lower, more realistic)
- Frame record encryption as among the most CPU-intensive filters possible
- Replace single worked example with 1:1 and fan-out pair
- Drop theoretical ×1.3 headroom — k is derived from real measurements

Assisted-by: Claude Sonnet 4.6 &lt;noreply@anthropic.com&gt;
Signed-off-by: Sam Barker &lt;sam@quadrocket.co.uk&gt;
diff --git a/_posts/2026-05-26-benchmarking-the-proxy.md b/_posts/2026-05-26-benchmarking-the-proxy.md
@@ -16,7 +16,7 @@ So we stopped saying "it depends" — we built something you can run **yourselve
 **TL;DR**:
 - A passthrough proxy adds negligible overhead: publish latency impact is below measurement noise, E2E adds ~2 ms at moderate topic rates, throughput unaffected
 - Add record encryption and expect a ~25% throughput reduction and 0.2–3 ms of additional latency at comfortable rates
-- The throughput ceiling scales linearly with CPU: budget ~35 mc per MB/s of total proxy traffic (conservative; a companion post, coming soon, has the full sizing formula)
+- The throughput ceiling scales linearly with CPU: budget ~25 mc per MB/s of total proxy traffic (conservative; a companion post, coming soon, has the full sizing formula)
 - The full benchmark harness is open source — run it on your own cluster for numbers that reflect your workload
 
 ## What we measured
@@ -150,15 +150,17 @@ Numbers without guidance aren't very useful, so here's how to translate these re
 
 **With filters (record encryption is the representative example here):**
 
-1. **Throughput budget**: encryption imposes a CPU-driven throughput ceiling. As a planning formula:
+1. **Throughput budget**: record encryption — among the most CPU-intensive filters we can imagine — imposes a CPU-driven throughput ceiling. As a planning formula:
 
-   > **`proxy CPU (millicores) = 35 × total proxy throughput (MB/s)`**
+   > **`CPU (mc) = k × (P + N × C)`**
    >
-   > where *total* = produce MB/s + (each consumer group's consume MB/s independently)
+   > where *k* = sizing coefficient (mc/MB/s), *P* = produce throughput (MB/s), *N* = number of consumer groups, *C* = consume throughput per group (MB/s)
 
-   This is a conservative estimate derived from single-partition workloads; the companion post (coming soon) has the full derivation and a lower bound for multi-topic workloads. For a single produce:consume pair this simplifies to `70 × produce MB/s`. Fan-out multiplies: 100 MB/s produce to 3 consumer groups = 100 + 300 = 400 MB/s total → 14,000m. Add ×1.3 headroom for GC pauses and burst. Measured on AMD EPYC-Rome 2 GHz with AES-NI — calibrate on your hardware using the rate sweep.
+   On our hardware (AMD EPYC-Rome 2 GHz with AES-NI), we measured *k* = 25 mc/MB/s on a 10-topic workload with record encryption — a conservative estimate: more realistic deployments with 100+ topics show *k* = 4–8 mc/MB/s, roughly 3× lower. Simpler filters will be cheaper still. *k* is measured from real workloads, so measure your throughput and validate on your own hardware. The companion post (coming soon) has the full coefficient grid across topic counts and core allocations.
 
-   Worked example: 100k msg/s at 1 KB, 1 consumer group = 100 MB/s produce + 100 MB/s consume = 200 MB/s × 35 = 7,000m, plus headroom → ~9,100m (~9 cores).
+   *1:1 (100k msg/s at 1 KB, 1 consumer group)*: k=25, P=100, N=1, C=100 → 25 × (100 + 1 × 100) = 5,000m (~5 cores)
+
+   *Fan-out (same rate, 3 consumer groups)*: k=25, P=100, N=3, C=100 → 25 × (100 + 3 × 100) = 10,000m (~10 cores)
 
 2. **Latency budget**: well below saturation, expect 2–3 ms additional average publish latency and up to ~15 ms additional p99. The overhead scales with how hard you're pushing — give yourself headroom and you'll barely notice it.