Commit 18d2d54

Merge pull request #261 from SamBarker/blog/benchmarking-the-proxy-under-the-hood
docs(blog): add benchmarking under-the-hood post with redirect infrastructure
2 parents 723cb92 + 69fc6e8 commit 18d2d54

7 files changed

Lines changed: 20566 additions & 42001 deletions

_data/redirects/blog.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+delay: 0
+mappings:
+  - name: benchmark-data
+    subgroup: benchmarking-the-proxy-under-the-hood
+    absoluteTarget: https://drive.google.com/drive/folders/14jR_eWoTeVQVC-WiTuZAK7NlwSVKN2a5?usp=drive_link
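
For readers tracing how this mapping becomes a page, here is a minimal Ruby sketch. It parses the YAML added in this file and applies the same directory rule that `AbsoluteRedirectPage#initialize` uses in this commit. The helper name `redirect_path` is hypothetical, and the group name `blog` is an assumption (taken from the data file's basename); the URL Jekyll finally serves also depends on its permalink handling.

```ruby
require 'yaml'

# Contents of _data/redirects/blog.yaml as added in this commit.
BLOG_REDIRECTS = <<~CONFIG
  delay: 0
  mappings:
    - name: benchmark-data
      subgroup: benchmarking-the-proxy-under-the-hood
      absoluteTarget: https://drive.google.com/drive/folders/14jR_eWoTeVQVC-WiTuZAK7NlwSVKN2a5?usp=drive_link
CONFIG

# Hypothetical helper mirroring the directory rule in AbsoluteRedirectPage#initialize.
def redirect_path(group, mapping)
  subgroup = mapping['subgroup']
  dir = subgroup ? "/redirect/#{group}/#{subgroup}/" : "/redirect/#{group}/"
  "#{dir}#{mapping['name']}.html"
end

config = YAML.safe_load(BLOG_REDIRECTS)
mapping = config['mappings'].first

# 'blog' is assumed to be the group key derived from the data file name.
path = redirect_path('blog', mapping)
puts "#{path} -> #{mapping['absoluteTarget']} (delay #{config['delay']})"
```

If the assumption about the group name holds, the redirect page would be generated under `/redirect/blog/benchmarking-the-proxy-under-the-hood/`.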

_plugins/redirector.rb

Lines changed: 42 additions & 10 deletions
@@ -9,17 +9,21 @@ def generate(site)
     config.each { |redirect_config|
       Jekyll.logger.info "generating redirects for #{redirect_config[0]}"
       redirect_config[1]['mappings'].each do |mapping|
-        to_version = Version.parse(mapping['toVersion'] ||= latest_release)
-        from_version = Version.parse(mapping['fromVersion'] ||= latest_release)
-        versions = releases.select { |rel| rel.between?(from_version, to_version) }
-        versions.each { |version|
-          mapping['version'] = version
-          site.pages << RedirectPage.new(site, redirect_config[0], redirect_config[1], mapping)
-          if version == latest_release
-            mapping['landing_version'] = "latest"
+        if mapping['absoluteTarget']
+          site.pages << AbsoluteRedirectPage.new(site, redirect_config[0], redirect_config[1], mapping)
+        else
+          to_version = Version.parse(mapping['toVersion'] ||= latest_release)
+          from_version = Version.parse(mapping['fromVersion'] ||= latest_release)
+          versions = releases.select { |rel| rel.between?(from_version, to_version) }
+          versions.each { |version|
+            mapping['version'] = version
             site.pages << RedirectPage.new(site, redirect_config[0], redirect_config[1], mapping)
-          end
-        }
+            if version == latest_release
+              mapping['landing_version'] = "latest"
+              site.pages << RedirectPage.new(site, redirect_config[0], redirect_config[1], mapping)
+            end
+          }
+        end
       end
       Jekyll.logger.info "Generated redirects #{redirect_config[0]}"
     }
@@ -70,6 +74,34 @@ def url_placeholders
   end
 end

+class AbsoluteRedirectPage < Jekyll::Page
+  def initialize(site, group, redirect_config, mapping)
+    @site = site
+    @base = site.source
+    subgroup = mapping['subgroup']
+    @dir = subgroup ? "/redirect/#{group}/#{subgroup}/" : "/redirect/#{group}/"
+    @basename = mapping['name']
+    @ext = '.html'
+    @name = basename + ext
+    delay = redirect_config['delay'] ||= 1
+    @data = {
+      'target' => mapping['absoluteTarget'],
+      'layout' => 'redirect',
+      'delay' => "#{delay}",
+    }
+    Jekyll.logger.info "generated absolute redirect from #{@dir}#{@basename} to #{data['target']}"
+  end
+
+  def url_placeholders
+    {
+      :path => @dir,
+      :category => @dir,
+      :basename => basename,
+      :output_ext => output_ext,
+    }
+  end
+end
+
 class Version
   include Comparable
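
The new branch in `generate(site)` can be explored without a Jekyll site. The sketch below is an approximation under stated assumptions: `plan_redirects` is a hypothetical helper, `Gem::Version` stands in for the plugin's own `Version` class (which this diff shows only in part), and the release list and mappings are made up for illustration. It returns a plain description of the pages that would be generated rather than real `Jekyll::Page` objects.

```ruby
require 'rubygems'

# Sketch of the dispatch in generate(site), without Jekyll.
# Gem::Version is a stand-in for the plugin's Version class (assumption).
def plan_redirects(mappings, releases, latest_release)
  mappings.flat_map do |mapping|
    if mapping['absoluteTarget']
      # Absolute mappings yield exactly one page, independent of releases.
      next [[:absolute, mapping['name'], mapping['absoluteTarget']]]
    end

    from_version = Gem::Version.new(mapping['fromVersion'] || latest_release)
    to_version = Gem::Version.new(mapping['toVersion'] || latest_release)
    in_range = releases.select do |rel|
      Gem::Version.new(rel).between?(from_version, to_version)
    end

    in_range.flat_map do |release|
      pages = [[:versioned, mapping['name'], release]]
      # The latest release also gets a second page that lands on "latest".
      pages << [:versioned, mapping['name'], 'latest'] if release == latest_release
      pages
    end
  end
end

# Hypothetical inputs for illustration only.
releases = ['0.19.0', '0.20.0', '0.21.0']
mappings = [
  { 'name' => 'benchmark-data', 'absoluteTarget' => 'https://example.invalid/data' },
  { 'name' => 'example-guide', 'fromVersion' => '0.20.0' }
]

plan = plan_redirects(mappings, releases, '0.21.0')
plan.each { |page| puts page.inspect }
```

With these inputs the absolute mapping produces one entry and the versioned mapping produces three: one per release in range plus the extra "latest" page.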

_posts/2026-05-28-benchmarking-the-proxy.md

Lines changed: 7 additions & 7 deletions
@@ -16,7 +16,7 @@ So we stopped saying "it depends" — we built something you can run **yourselve
 **TL;DR**:
 - A passthrough proxy adds negligible overhead: publish latency impact is below measurement noise, E2E adds ~2 ms at moderate topic rates, throughput unaffected
 - Add record encryption and expect a ~25% throughput reduction; at comfortable rates, E2E latency stays within measurement noise and publish latency adds up to ~10 ms
-- The throughput ceiling scales linearly with CPU: budget ~25 mc per MB/s of total proxy traffic (conservative; a companion post, coming soon, has the full sizing formula)
+- The throughput ceiling scales linearly with CPU: budget ~25 mc per MB/s of total proxy traffic (conservative; the [companion post]({% post_url 2026-06-03-benchmarking-the-proxy-under-the-hood %}) has the full coefficient grid)
 - The full benchmark harness is open source — run it on your own cluster for numbers that reflect your workload

 ## What we measured
@@ -27,7 +27,7 @@ We ran three scenarios against the same Apache Kafka® cluster on the same hardw
 - **Passthrough proxy** — traffic routed through Kroxylicious with no filter chain configured
 - **Record encryption** — traffic through Kroxylicious with AES-256-GCM record encryption enabled, using HashiCorp Vault as the KMS

-We used [OpenMessaging Benchmark (OMB)](https://github.com/openmessaging/benchmark) rather than Kafka's own `kafka-producer-perf-test`. OMB is an industry-standard tool that coordinates producers and consumers together, measures end-to-end latency (not just publish latency), and produces structured JSON that makes comparison straightforward. More on why we built a whole harness around it in a companion engineering post, coming soon.
+We used [OpenMessaging Benchmark (OMB)](https://github.com/openmessaging/benchmark) rather than Kafka's own `kafka-producer-perf-test`. OMB is an industry-standard tool that coordinates producers and consumers together, measures end-to-end latency (not just publish latency), and produces structured JSON that makes comparison straightforward. More on why we built a whole harness around it in the [companion engineering post]({% post_url 2026-06-03-benchmarking-the-proxy-under-the-hood %}).

 ## Test environment

@@ -108,7 +108,7 @@ The overhead staying flat across 1, 10, and 100 topics makes sense for the same

 ## Record encryption: now we're doing real work

-Ok, so let's make the proxy smarter — make it do something people actually care about! [Record encryption](https://kroxylicious.io/documentation/0.20.0/html/record-encryption-guide) uses AES-256-GCM to encrypt each record passing through the proxy. AES-256-GCM is going to ask the CPU to work relatively hard on its own, but it's also going to push the proxy to parse each record it receives, unpack it, copy it, encrypt it, and re-pack it before sending it on to the broker. With all that work going on we expect some impact to latency and throughput. To answer our original question we need to identify two things: the latency when everything is going smoothly, and the reduction in throughput all this work causes. Monitoring latency once we go past the throughput inflection point isn't very helpful — it's dominated by the throughput limits and their erratic impacts on the latency of individual requests (a big hello to batching and buffering effects).
+Ok, so let's make the proxy smarter — make it do something people actually care about! [Record encryption](https://kroxylicious.io/documentation/0.21.0/html/record-encryption-guide) uses AES-256-GCM to encrypt each record passing through the proxy. AES-256-GCM is going to ask the CPU to work relatively hard on its own, but it's also going to push the proxy to parse each record it receives, unpack it, copy it, encrypt it, and re-pack it before sending it on to the broker. With all that work going on we expect some impact to latency and throughput. To answer our original question we need to identify two things: the latency when everything is going smoothly, and the reduction in throughput all this work causes. Monitoring latency once we go past the throughput inflection point isn't very helpful — it's dominated by the throughput limits and their erratic impacts on the latency of individual requests (a big hello to batching and buffering effects).

 ### Latency at sub-saturation rates

@@ -147,7 +147,7 @@ The single-producer ceiling at RF=3 is Kafka-limited, not proxy-limited — the

 To find the proxy's real ceiling, you need a workload that doesn't hit the Kafka partition limit first: RF=1, spread across multiple topics. With that workload, the ceiling is squarely in the proxy — and it scales linearly with CPU. The mechanism: CPU limit controls `availableProcessors()`, which controls how many Netty event loop threads the proxy creates. More threads, more concurrent connections handled in parallel, higher aggregate ceiling.

-**The practical implication**: the throughput ceiling is not a fixed number — it's a function of the CPU you allocate. Set `requests` equal to `limits` in your pod spec; this makes the CPU budget deterministic and the ceiling predictable. A companion engineering post, coming soon, has the full story of how we found this, including the workload design choices needed to isolate proxy CPU from Kafka's own limits.
+**The practical implication**: the throughput ceiling is not a fixed number — it's a function of the CPU you allocate. Set `requests` equal to `limits` in your pod spec; this makes the CPU budget deterministic and the ceiling predictable. The [companion engineering post]({% post_url 2026-06-03-benchmarking-the-proxy-under-the-hood %}) has the full story of how we found this, including the workload design choices needed to isolate proxy CPU from Kafka's own limits.

 ---

@@ -165,7 +165,7 @@ Numbers without guidance aren't very useful, so here's how to translate these re
 >
 > where *mc* = millicores (the Kubernetes CPU scheduling unit; 1,000 mc = 1 core per second), *k* = sizing coefficient (mc/MB/s), *P* = produce throughput (MB/s), *N* = number of consumer groups, *C* = consume throughput per group (MB/s)

-On our hardware (AMD EPYC-Rome 2 GHz with AES-NI), we measured *k* = 25 mc/MB/s on a 10-topic workload with record encryption — a conservative estimate: more realistic deployments with 100+ topics show *k* = 4–8 mc/MB/s, roughly 3× lower. Simpler filters will be cheaper still. *k* is measured from real workloads, so measure your throughput and validate on your own hardware. The companion post (coming soon) has the full coefficient grid across topic counts and core allocations.
+On our hardware (AMD EPYC-Rome 2 GHz with AES-NI), we measured *k* = 25 mc/MB/s on a 10-topic workload with record encryption — a conservative estimate: more realistic deployments with 100+ topics show *k* = 4–8 mc/MB/s, roughly 3× lower. Simpler filters will be cheaper still. *k* is measured from real workloads, so measure your throughput and validate on your own hardware. The [companion post]({% post_url 2026-06-03-benchmarking-the-proxy-under-the-hood %}) has the full coefficient grid across topic counts and core allocations.

 *1:1 (100k msg/s at 1 KB, 1 consumer group)*: k=25, P=100, N=1, C=100 → 25 × (100 + 1 × 100) = 5,000m (~5 cores)
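
The worked example in this hunk can be reproduced with a few lines of Ruby. This is a sketch only: the formula `k * (P + N * C)` is inferred from the example and the variable definitions in the context lines, and the method and parameter names are hypothetical.

```ruby
# CPU budget in millicores, inferred from the worked example: k * (P + N * C).
#   k                      - sizing coefficient in mc per MB/s
#   produce_mb_s           - produce throughput P in MB/s
#   consumer_groups        - number of consumer groups N
#   consume_mb_s_per_group - consume throughput per group C in MB/s
def proxy_cpu_millicores(k:, produce_mb_s:, consumer_groups:, consume_mb_s_per_group:)
  k * (produce_mb_s + consumer_groups * consume_mb_s_per_group)
end

# The 1:1 case from the post: 100k msg/s at 1 KB with one consumer group.
budget = proxy_cpu_millicores(
  k: 25,
  produce_mb_s: 100,
  consumer_groups: 1,
  consume_mb_s_per_group: 100
)
puts "#{budget}m (~#{budget / 1000.0} cores)" # prints "5000m (~5.0 cores)"
```

Swapping in a coefficient measured on your own hardware is the intended use; the value 25 is the conservative figure quoted in the post.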

@@ -185,11 +185,11 @@ Numbers without guidance aren't very useful, so here's how to translate these re

 These are real results from real hardware, but they don't tell a story for your workload. A few things worth knowing before you put these numbers in a slide deck:

-- **Sub-saturation assumed**: all results assume the system is operating below its throughput ceiling — both the proxy's and Kafka's own replication limits. Above either, queueing and batching effects dominate and the numbers in this post no longer apply. A companion post, coming soon, explains how to identify where those ceilings are.
+- **Sub-saturation assumed**: all results assume the system is operating below its throughput ceiling — both the proxy's and Kafka's own replication limits. Above either, queueing and batching effects dominate and the numbers in this post no longer apply. The [companion post]({% post_url 2026-06-03-benchmarking-the-proxy-under-the-hood %}) explains how to identify where those ceilings are.
 - **Message size**: all results use 1 KB messages. The coefficient is message-size-dependent — encryption overhead as a percentage is likely lower for larger messages.
 - **Horizontal scaling**: linear scaling has been validated across CPU allocations on a single pod; multi-pod horizontal scaling hasn't been measured but is expected to follow the same coefficient.
 - **Memory**: the workloads tested here are CPU-bound before they become memory-bound — we kept container memory settings consistent across all runs (2 Gi request / 4 Gi limit at the pod level) and it was never the constraint. If you're running larger messages or larger batches, revisit this assumption.

-For the engineering story — why we built a custom harness on top of OMB, what the CPU flamegraphs actually show, and the bugs we found in our own tooling along the way — that's in a companion post, coming soon.
+For the engineering story — why we built a custom harness on top of OMB, what the CPU flamegraphs actually show, and the bugs we found in our own tooling along the way — that's in the [companion post]({% post_url 2026-06-03-benchmarking-the-proxy-under-the-hood %}).

 The full benchmark suite, quickstart guide, and sizing reference are in `kroxylicious-openmessaging-benchmarks/` in the [main Kroxylicious repository](https://github.com/kroxylicious/kroxylicious).
