Add AIMD based dynamic producer batch sizing to `JCQueue` by GGraziadei · Pull Request #8796 · apache/storm

GGraziadei · 2026-06-20T23:15:42Z

What is the purpose of the change

This PR introduces an adaptive batch-sizing strategy for JCQueue's producer-side inserter, controlled by a new feature flag (topology.producer.batch.dynamic). Instead of committing to a fixed producerBatchSz, the new DynamicBatchInserter starts at a batch size of 1 and adjusts it online using AIMD (additive increase, multiplicative decrease). It grows the effective size additively (+1) after flushing a full batch, which signals heavy load, and shrinks it multiplicatively (halving, down to a minimum of 1) after a timer-driven partial flush, which signals light load. The configured batch size acts as a ceiling rather than a fixed target. This lets the queue favor low latency under light load while preserving throughput under sustained back-pressure, without manual per-topology tuning.
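The adjustment rule described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: the class name `AimdBatchSizer` and its method names are hypothetical, and the sketch assumes integer halving and that the two flush events are reported by the caller.

```java
// Hypothetical sketch of the AIMD rule; not the actual DynamicBatchInserter.
final class AimdBatchSizer {
    private final int ceiling; // configured batch size, used as an upper bound
    private int current = 1;   // effective batch size, starts at 1

    AimdBatchSizer(int ceiling) {
        if (ceiling < 1) {
            throw new IllegalArgumentException("ceiling must be >= 1");
        }
        this.ceiling = ceiling;
    }

    int current() {
        return current;
    }

    // A full batch was flushed (heavy load): additive increase, capped at the ceiling.
    void onFullBatchFlush() {
        current = Math.min(ceiling, current + 1);
    }

    // The flush timer fired on a partial batch (light load): halve, never below 1.
    void onTimerPartialFlush() {
        current = Math.max(1, current / 2);
    }

    public static void main(String[] args) {
        AimdBatchSizer sizer = new AimdBatchSizer(4);
        for (int i = 0; i < 5; i++) {
            sizer.onFullBatchFlush();
            System.out.println("after full flush: " + sizer.current());
        }
        sizer.onTimerPartialFlush();
        System.out.println("after partial flush: " + sizer.current());
    }
}
```

With a ceiling of 4, repeated full flushes raise the size one step at a time until it reaches 4, and a single timer-driven partial flush drops it back to 2.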

How was the change tested

Unit tests
Benchmark of BatchInserter (baseline) vs. DynamicBatchInserter; the report is in the first comment.

GGraziadei · 2026-06-20T23:22:50Z

Performance analysis

The benchmarks were run against the FileReadWordCountTopo topology, a standard word-count workload from Storm's performance suite that exercises the inter-executor transfer path governed by JCQueue's producer batching. The topology was deployed on a dockerized dev-cluster, a reproducible, self-contained Storm environment, so each configuration ran under identical resource and parallelism conditions. The topology shape was fixed across all runs (2 workers, 7 tasks, 7 executors, single spout executor), leaving the producer batch-sizing strategy as the only variable.

The results consistently favored dynamic batching. Across batch ceilings of 10, 100, and 1000 it matched or beat every static configuration on all three metrics, with throughput gains of up to ~9% and average complete-latency reductions of 8–12%. The benefit was largest at small ceilings, where static batching faces the sharpest latency-versus-throughput trade-off, and no downside was measured at any setting.

An extended 600-second run at ceiling 1000 further showed a brief learning phase, after which the AIMD controller converges on a stable optimum: latency settles near 376 ms with a standard deviation of ~2 ms. This indicates that the policy discovers a good batch size online and then locks onto it, rather than relying on a manually tuned fixed value.
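The comparison above can be written down as a small matrix of topology settings. The sketch below uses plain maps rather than Storm's `Config` class so that it stays self-contained. It assumes that the new flag takes a boolean value and that the static baselines used the same batch sizes (10, 100, 1000) as the dynamic ceilings; the class name `BatchBenchmarkConfigs` is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that enumerates the static and dynamic configurations.
final class BatchBenchmarkConfigs {
    // Existing Storm setting for the producer batch size.
    static final String BATCH_SIZE = "topology.producer.batch.size";
    // Feature flag introduced by this PR; a boolean value is assumed here.
    static final String DYNAMIC = "topology.producer.batch.dynamic";

    // For each batch size, emit one static (flag off) and one dynamic (flag on) config.
    static List<Map<String, Object>> matrix(int... batchSizes) {
        List<Map<String, Object>> configs = new ArrayList<>();
        for (int size : batchSizes) {
            for (boolean dynamic : new boolean[] {false, true}) {
                Map<String, Object> conf = new LinkedHashMap<>();
                conf.put(BATCH_SIZE, size);
                conf.put(DYNAMIC, dynamic);
                configs.add(conf);
            }
        }
        return configs;
    }

    public static void main(String[] args) {
        for (Map<String, Object> conf : matrix(10, 100, 1000)) {
            System.out.println(conf);
        }
    }
}
```

Each map holds only the two settings that vary between runs; the rest of the topology configuration would stay identical.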

Attached raw data and report.

jcqueue-dynamic-batch-size.txt

Dynamic Producer Batch Sizing in Apache Storm.pdf

init - add DynamicBatchInserter

0cd9216

GGraziadei wants to merge 1 commit into apache:master from GGraziadei:jcqueue-dynamic-batch-size
