Commit 55b91d1

WIP: Redrafting the engineering deep dive
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
1 parent dc24d90 commit 55b91d1

2 files changed

Lines changed: 22 additions & 12 deletions

_posts/2026-05-28-benchmarking-the-proxy-under-the-hood.md

Lines changed: 21 additions & 11 deletions
@@ -1,45 +1,55 @@
11
---
22
layout: post
3-
title: "Benchmarking a Kafka proxy: the engineering story"
3+
title: "How hard can it be??? Maxing out a Kroxylicious instance"
44
date: 2026-05-28 00:00:00 +0000
55
author: "Sam Barker"
66
author_url: "https://github.com/SamBarker"
77
categories: benchmarking performance engineering
88
---
99

10-
The [first post]({% post_url 2026-05-21-benchmarking-the-proxy %}) covered what we measured and what the numbers mean for operators. This one is for the people who want to know how we measured it, what the flamegraphs actually show, and what we found when we started looking carefully at our own tooling.
10+
How hard can it be? We started with a laptop, a codebase, and a lot of confidence it was fast. We ended up with a benchmark harness, a six-node cluster, and a much more nuanced answer.
11+
12+
Harder than expected. More interesting too.
13+
14+
We've already given everyone [the numbers]({% post_url 2026-05-21-benchmarking-the-proxy %}) in a bland but slide-worthy way. This one is the engineering story: how we built the harness, what the flamegraphs actually show, the workload design choices that changed the answers, and the bugs we found in our own tooling.
1115

1216
## Why not Kafka's own tools?
1317

1418
Kafka ships with `kafka-producer-perf-test` and `kafka-consumer-perf-test`. We'd used them before. The problems:
1519

1620
- **Too noisy**: individual runs produced widely varying results depending on JVM warm-up, scheduling jitter, and GC behaviour. Results were hard to trust and harder to compare across scenarios.
17-
- **Producer-only view**: `kafka-producer-perf-test` gives you publish latency, but nothing about the consumer side. You can't see end-to-end latency — which is what operators actually care about.
21+
- **Producer-only view**: `kafka-producer-perf-test` gives you publish latency, but nothing about the consumer side. You can't see end-to-end latency — which is something operators actually care about.
1822
- **Awkward to sweep**: running parametric rate sweeps requires scripting around these tools, and comparing results across scenarios requires manual work.
23+
- **Coordinated omission**: under load, `kafka-producer-perf-test` only measures the requests it actually sends! So when things start loading up and applying back pressure, the send rate drops and the reported latency keeps looking nice and healthy. Only it's not healthy in reality: records are queuing up in your producer.
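
To make the coordinated omission point concrete, here is a minimal sketch of a toy model: a single blocking sender that is supposed to send one request every 10 ms, with one request that stalls for 100 ms. It is not a model of `kafka-producer-perf-test` itself; the `simulate` function, the timings, and the single-sender assumption are all made up for illustration. It compares latency measured from the actual send time with latency measured from the intended send time.

```python
def simulate(service_times_ms, interval_ms):
    """Toy model: one blocking sender on a fixed schedule.

    Returns (naive, corrected) latency lists in milliseconds.
    """
    naive, corrected = [], []
    free_at = 0.0  # time at which the sender is free to send again
    for i, service in enumerate(service_times_ms):
        intended = i * interval_ms         # when the request should have gone out
        actual = max(intended, free_at)    # a blocked sender sends late
        done = actual + service
        naive.append(done - actual)        # latency from the actual send time
        corrected.append(done - intended)  # also counts time spent waiting to send
        free_at = done
    return naive, corrected


# Ten requests, one every 10 ms; the third stalls for 100 ms (hypothetical numbers).
service = [1.0, 1.0, 100.0] + [1.0] * 7
naive, corrected = simulate(service, interval_ms=10.0)

slow_naive = sum(1 for x in naive if x > 10.0)
slow_corrected = sum(1 for x in corrected if x > 10.0)
print(f"slow requests (naive view):     {slow_naive}")
print(f"slow requests (corrected view): {slow_corrected}")
```

In this toy run the naive view reports a single slow request, while the corrected view shows that every request scheduled behind the stall was late too.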
1924

20-
[OpenMessaging Benchmark (OMB)](https://github.com/openmessaging/benchmark) is a better fit. It's an industry-standard tool used by Confluent, the Pulsar team, and others for their published performance comparisons. OMB coordinates producers and consumers across separate worker pods, runs a configurable warmup phase before taking measurements, and outputs structured JSON that's straightforward to process programmatically.
25+
And critically, it's never heard of Kroxylicious... You have though, you're here!
2126

22-
Using OMB also means our numbers are directly comparable to other published Kafka benchmarks — that credibility matters when you're trying to make the case that your proxy doesn't break things.
27+
[OpenMessaging Benchmark (OMB)](https://github.com/openmessaging/benchmark) is a better fit. It's an industry-standard tool used by Confluent, the Pulsar team, and others for their published performance comparisons, so who am I to argue? OMB coordinates producers and consumers across separate worker pods, runs a configurable warmup phase before taking measurements, takes its latency tracking seriously by accounting for coordinated omission, and outputs structured JSON that's straightforward to process programmatically. What's not to like?
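
As a rough idea of what "straightforward to process programmatically" can look like, here is a minimal sketch that pulls a few headline figures out of a result document. The field names (`publishRate`, `aggregatedPublishLatency99pct`, `aggregatedEndToEndLatency99pct`) are assumptions about the OMB result format, so check them against your own output. The sample values are invented, and `summarise` is a hypothetical helper rather than part of our tooling.

```python
import json


def summarise(result_json):
    """Reduce one result document to a few headline figures.

    Field names are assumed, not taken from this post; adjust to your output.
    """
    result = json.loads(result_json)
    rates = result["publishRate"]  # assumed: one sample per reporting interval
    return {
        "mean_publish_rate": sum(rates) / len(rates),
        "publish_p99_ms": result["aggregatedPublishLatency99pct"],
        "e2e_p99_ms": result["aggregatedEndToEndLatency99pct"],
    }


# Invented sample data, standing in for a real result file.
sample = json.dumps({
    "publishRate": [9990.0, 10010.0],
    "aggregatedPublishLatency99pct": 7.5,
    "aggregatedEndToEndLatency99pct": 12.25,
})

summary = summarise(sample)
print(summary)
```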
28+
29+
Using OMB also means our methodology is directly comparable to other published Kafka benchmarks. The numbers aren't comparable, of course: it's not the same hardware, network conditions, or phase of the moon.
2330

2431
## What we built on top of OMB
2532

26-
OMB handles the measurement. We built everything around it: deployment, teardown, diagnostics collection, and result processing. All of it lives in `kroxylicious-openmessaging-benchmarks/` in the main repo.
33+
So we just fire up OMB and get some numbers, right? Errr, no. OMB just does the measurement part. I work really hard at being lazy, I hate clicking things with a mouse, and I knew these tests needed to be repeatable. So we scripted deployment (of all the things), teardown (for isolation), diagnostic collection (WHAT BROKE NOW??), and last but not least, result processing (what does this wall of JSON mean?).
34+
35+
So now all of that lives in [`kroxylicious-openmessaging-benchmarks`](https://github.com/kroxylicious/kroxylicious/tree/main/kroxylicious-openmessaging-benchmarks) in the main tree (monorepo FTW).
2736

2837
### Helm chart
2938

3039
A Helm chart (`helm/kroxylicious-benchmark/`) deploys the full benchmark stack into Kubernetes:
3140

3241
- OMB coordinator and worker pods
33-
- A Strimzi Kafka cluster
34-
- The Kroxylicious proxy (via the Kroxylicious Kubernetes operator)
35-
- HashiCorp Vault (for the KMS in the encryption scenario)
42+
- A Strimzi Kafka cluster. If you're deploying Kafka on K8s, what else are you going to use? (Answers to /dev/null.)
43+
- The Kroxylicious operator
44+
- The Kroxylicious proxy
45+
- HashiCorp Vault (for the KMS in the encryption scenario). Importantly, if you have your own KMS (and you will run this yourself for your workload, right?!), you can plug that in instead.
3646

3747
Scenario-specific configuration lives in `helm/kroxylicious-benchmark/scenarios/` as YAML overrides:
3848

3949
| Scenario file | What it deploys |
4050
|---------------|-----------------|
4151
| `baseline-values.yaml` | Direct Kafka, no proxy |
42-
| `proxy-no-filters-values.yaml` | Proxy with empty filter chain |
52+
| `proxy-no-filters-values.yaml` | Proxy with no user filters |
4353
| `encryption-values.yaml` | Proxy with AES-256-GCM encryption and Vault |
4454
| `rate-sweep-values.yaml` | Extended run profiles for sweep experiments |
4555

@@ -236,4 +246,4 @@ The coefficient is validated at 1, 2, and 4 cores for 1 KB messages. Known gaps:
236246
- **Horizontal scaling**: multiple proxy pods haven't been measured; linear scaling is expected but not confirmed.
237247
- **Multi-pass sweeps**: each rate point was measured once. Running each probe three times and taking the median would give tighter bounds in the saturation transition zone.
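
The multi-pass idea can be sketched in a few lines. This is a hypothetical illustration, not something the harness does today: `median_per_rate` and the sample p99 figures are invented, and it assumes each pass is available as a mapping from target rate to measured latency.

```python
from statistics import median


def median_per_rate(runs):
    """Take the per-rate median across repeated sweep passes.

    runs: list of dicts mapping target rate (msg/s) -> measured p99 latency (ms).
    Only rates present in every pass are kept.
    """
    common_rates = sorted(set.intersection(*(set(run) for run in runs)))
    return {rate: median(run[rate] for run in runs) for rate in common_rates}


# Three invented passes; the 30k point is noisy, as you might see near saturation.
runs = [
    {10_000: 4.1, 20_000: 6.0, 30_000: 48.0},
    {10_000: 4.3, 20_000: 5.8, 30_000: 19.5},
    {10_000: 4.0, 20_000: 6.4, 30_000: 22.1},
]

medians = median_per_rate(runs)
print(medians)
```

With three passes, the median simply discards the highest and lowest reading at each rate, so a single outlier near the saturation point no longer decides where the knee appears to be.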
238248

239-
The operator-facing sizing reference and all the key tables are in `SIZING-GUIDE.md` in the benchmarks directory.
249+
The operator-facing sizing reference and all the key tables are in `SIZING-GUIDE.md` in the benchmarks directory.

performance.markdown

Lines changed: 1 addition & 1 deletion
@@ -101,5 +101,5 @@ The [engineering post](/blog/2026/05/28/benchmarking-the-proxy-under-the-hood/)
101101
## Further reading
102102

103103
- [Operator guide: results, methodology, and sizing recommendations](/blog/2026/05/21/benchmarking-the-proxy/) — the full benchmark story for operators
104-
- [Engineering deep dive: tooling, flamegraphs, and what we discovered](/blog/2026/05/28/benchmarking-the-proxy-under-the-hood/) — how we measured it, where the CPU goes, and what surprised us
104+
- [How hard can it be??? Maxing out a Kroxylicious instance](/blog/2026/05/28/benchmarking-the-proxy-under-the-hood/) — how we measured it, where the CPU goes, and what surprised us
105105
- [Benchmark quickstart](https://github.com/kroxylicious/kroxylicious/tree/main/kroxylicious-openmessaging-benchmarks/QUICKSTART.md) — run the benchmarks yourself
