_posts/2026-05-28-benchmarking-the-proxy.md
7 additions & 7 deletions
@@ -16,7 +16,7 @@ So we stopped saying "it depends" — we built something you can run **yourselve
 **TL;DR**:
 - A passthrough proxy adds negligible overhead: publish latency impact is below measurement noise, E2E adds ~2 ms at moderate topic rates, throughput unaffected
 - Add record encryption and expect a ~25% throughput reduction; at comfortable rates, E2E latency stays within measurement noise and publish latency adds up to ~10 ms
-- The throughput ceiling scales linearly with CPU: budget ~25 mc per MB/s of total proxy traffic (conservative; a companion post, coming soon, has the full sizing formula)
+- The throughput ceiling scales linearly with CPU: budget ~25 mc per MB/s of total proxy traffic (conservative; the [companion post]({% post_url 2026-06-03-benchmarking-the-proxy-under-the-hood %}) has the full coefficient grid)
 - The full benchmark harness is open source — run it on your own cluster for numbers that reflect your workload
 
 ## What we measured
@@ -27,7 +27,7 @@ We ran three scenarios against the same Apache Kafka® cluster on the same hardw
 - **Passthrough proxy** — traffic routed through Kroxylicious with no filter chain configured
 - **Record encryption** — traffic through Kroxylicious with AES-256-GCM record encryption enabled, using HashiCorp Vault as the KMS
 
-We used [OpenMessaging Benchmark (OMB)](https://github.com/openmessaging/benchmark) rather than Kafka's own `kafka-producer-perf-test`. OMB is an industry-standard tool that coordinates producers and consumers together, measures end-to-end latency (not just publish latency), and produces structured JSON that makes comparison straightforward. More on why we built a whole harness around it in a companion engineering post, coming soon.
+We used [OpenMessaging Benchmark (OMB)](https://github.com/openmessaging/benchmark) rather than Kafka's own `kafka-producer-perf-test`. OMB is an industry-standard tool that coordinates producers and consumers together, measures end-to-end latency (not just publish latency), and produces structured JSON that makes comparison straightforward. More on why we built a whole harness around it in the [companion engineering post]({% post_url 2026-06-03-benchmarking-the-proxy-under-the-hood %}).
 
 ## Test environment
 
@@ -108,7 +108,7 @@ The overhead staying flat across 1, 10, and 100 topics makes sense for the same
 
 ## Record encryption: now we're doing real work
 
-Ok, so let's make the proxy smarter — make it do something people actually care about! [Record encryption](https://kroxylicious.io/documentation/0.20.0/html/record-encryption-guide) uses AES-256-GCM to encrypt each record passing through the proxy. AES-256-GCM is going to ask the CPU to work relatively hard on its own, but it's also going to push the proxy to parse each record it receives, unpack it, copy it, encrypt it, and re-pack it before sending it on to the broker. With all that work going on we expect some impact to latency and throughput. To answer our original question we need to identify two things: the latency when everything is going smoothly, and the reduction in throughput all this work causes. Monitoring latency once we go past the throughput inflection point isn't very helpful — it's dominated by the throughput limits and their erratic impacts on the latency of individual requests (a big hello to batching and buffering effects).
+Ok, so let's make the proxy smarter — make it do something people actually care about! [Record encryption](https://kroxylicious.io/documentation/0.21.0/html/record-encryption-guide) uses AES-256-GCM to encrypt each record passing through the proxy. AES-256-GCM is going to ask the CPU to work relatively hard on its own, but it's also going to push the proxy to parse each record it receives, unpack it, copy it, encrypt it, and re-pack it before sending it on to the broker. With all that work going on we expect some impact to latency and throughput. To answer our original question we need to identify two things: the latency when everything is going smoothly, and the reduction in throughput all this work causes. Monitoring latency once we go past the throughput inflection point isn't very helpful — it's dominated by the throughput limits and their erratic impacts on the latency of individual requests (a big hello to batching and buffering effects).
 
 ### Latency at sub-saturation rates
 
@@ -147,7 +147,7 @@ The single-producer ceiling at RF=3 is Kafka-limited, not proxy-limited — the
 
 To find the proxy's real ceiling, you need a workload that doesn't hit the Kafka partition limit first: RF=1, spread across multiple topics. With that workload, the ceiling is squarely in the proxy — and it scales linearly with CPU. The mechanism: CPU limit controls `availableProcessors()`, which controls how many Netty event loop threads the proxy creates. More threads, more concurrent connections handled in parallel, higher aggregate ceiling.
 
-**The practical implication**: the throughput ceiling is not a fixed number — it's a function of the CPU you allocate. Set `requests` equal to `limits` in your pod spec; this makes the CPU budget deterministic and the ceiling predictable. A companion engineering post, coming soon, has the full story of how we found this, including the workload design choices needed to isolate proxy CPU from Kafka's own limits.
+**The practical implication**: the throughput ceiling is not a fixed number — it's a function of the CPU you allocate. Set `requests` equal to `limits` in your pod spec; this makes the CPU budget deterministic and the ceiling predictable. The [companion engineering post]({% post_url 2026-06-03-benchmarking-the-proxy-under-the-hood %}) has the full story of how we found this, including the workload design choices needed to isolate proxy CPU from Kafka's own limits.
 
 ---
 
@@ -165,7 +165,7 @@ Numbers without guidance aren't very useful, so here's how to translate these re
 >
 > where *mc* = millicores (the Kubernetes CPU scheduling unit; 1,000 mc = 1 core per second), *k* = sizing coefficient (mc/MB/s), *P* = produce throughput (MB/s), *N* = number of consumer groups, *C* = consume throughput per group (MB/s)
 
-On our hardware (AMD EPYC-Rome 2 GHz with AES-NI), we measured *k* = 25 mc/MB/s on a 10-topic workload with record encryption — a conservative estimate: more realistic deployments with 100+ topics show *k* = 4–8 mc/MB/s, roughly 3× lower. Simpler filters will be cheaper still. *k* is measured from real workloads, so measure your throughput and validate on your own hardware. The companion post (coming soon) has the full coefficient grid across topic counts and core allocations.
+On our hardware (AMD EPYC-Rome 2 GHz with AES-NI), we measured *k* = 25 mc/MB/s on a 10-topic workload with record encryption — a conservative estimate: more realistic deployments with 100+ topics show *k* = 4–8 mc/MB/s, roughly 3× lower. Simpler filters will be cheaper still. *k* is measured from real workloads, so measure your throughput and validate on your own hardware. The [companion post]({% post_url 2026-06-03-benchmarking-the-proxy-under-the-hood %}) has the full coefficient grid across topic counts and core allocations.
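Note on the hunk above: a worked instance of the sizing rule may help when sanity-checking the coefficient. The formula line itself falls outside this diff, so the sketch below assumes it has the form mc = k × (P + N × C), which matches the "mc per MB/s of total proxy traffic" wording in the TL;DR. The function name and the workload figures are hypothetical; only the two *k* values come from the post.

```python
def proxy_cpu_millicores(k, produce_mbps, consumer_groups, consume_mbps_per_group):
    """Estimate proxy CPU in millicores.

    Assumed form (not shown in this diff): mc = k * (P + N * C),
    i.e. the coefficient times total traffic through the proxy.
    """
    if min(k, produce_mbps, consumer_groups, consume_mbps_per_group) < 0:
        raise ValueError("inputs must be non-negative")
    # Total proxy traffic: what producers send plus what every consumer group reads.
    total_mbps = produce_mbps + consumer_groups * consume_mbps_per_group
    return k * total_mbps


# Hypothetical workload: 40 MB/s produced, 2 consumer groups each reading 40 MB/s.
conservative = proxy_cpu_millicores(25, 40, 2, 40)  # k = 25, the 10-topic measurement
optimistic = proxy_cpu_millicores(8, 40, 2, 40)     # k = 8, top of the 100+ topic range
print(conservative, optimistic)
```

With these made-up inputs the total proxy traffic is 120 MB/s, so the conservative coefficient suggests about 3 cores (3,000 mc) and *k* = 8 suggests just under 1 core (960 mc). Treat both as starting points to validate on your own hardware, as the post itself advises.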
@@ -185,11 +185,11 @@ Numbers without guidance aren't very useful, so here's how to translate these re
 
 These are real results from real hardware, but they don't tell a story for your workload. A few things worth knowing before you put these numbers in a slide deck:
 
-- **Sub-saturation assumed**: all results assume the system is operating below its throughput ceiling — both the proxy's and Kafka's own replication limits. Above either, queueing and batching effects dominate and the numbers in this post no longer apply. A companion post, coming soon, explains how to identify where those ceilings are.
+- **Sub-saturation assumed**: all results assume the system is operating below its throughput ceiling — both the proxy's and Kafka's own replication limits. Above either, queueing and batching effects dominate and the numbers in this post no longer apply. The [companion post]({% post_url 2026-06-03-benchmarking-the-proxy-under-the-hood %}) explains how to identify where those ceilings are.
 - **Message size**: all results use 1 KB messages. The coefficient is message-size-dependent — encryption overhead as a percentage is likely lower for larger messages.
 - **Horizontal scaling**: linear scaling has been validated across CPU allocations on a single pod; multi-pod horizontal scaling hasn't been measured but is expected to follow the same coefficient.
 - **Memory**: the workloads tested here are CPU-bound before they become memory-bound — we kept container memory settings consistent across all runs (2 Gi request / 4 Gi limit at the pod level) and it was never the constraint. If you're running larger messages or larger batches, revisit this assumption.
 
-For the engineering story — why we built a custom harness on top of OMB, what the CPU flamegraphs actually show, and the bugs we found in our own tooling along the way — that's in a companion post, coming soon.
+For the engineering story — why we built a custom harness on top of OMB, what the CPU flamegraphs actually show, and the bugs we found in our own tooling along the way — that's in the [companion post]({% post_url 2026-06-03-benchmarking-the-proxy-under-the-hood %}).
 
 The full benchmark suite, quickstart guide, and sizing reference are in `kroxylicious-openmessaging-benchmarks/` in the [main Kroxylicious repository](https://github.com/kroxylicious/kroxylicious).