
Commit e09c023

paulohtb6 and Feediver1 committed
transactions: add info about preventing OOM
Apply suggestions from code review

Co-authored-by: Joyce Fee <102751339+Feediver1@users.noreply.github.com>
1 parent 0dc9aed commit e09c023

2 files changed

Lines changed: 30 additions & 3 deletions


modules/develop/pages/transactions.adoc

Lines changed: 21 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -270,6 +270,27 @@ Redpanda’s default configuration supports exactly-once processing. To preserve
270270

271271
To help avoid common pitfalls and optimize performance, consider the following when configuring transactional workloads in Redpanda:
272272

273+
=== Tune producer ID limits
274+
275+
For production environments with heavy producer usage, consider setting xref:reference:properties/cluster-properties.adoc#max_concurrent_producer_ids[`max_concurrent_producer_ids`] to prevent out-of-memory (OOM) crashes. The default value is unlimited, which can lead to unbounded memory growth, especially with transactions or idempotent producers.
276+
277+
Calculate an appropriate value based on your expected concurrent producers:
278+
279+
* **Lower bound**: `kafka_connections_max` ÷ `number_of_shards` (based on the assumption that each producer connects to only one shard)
280+
* **Upper bound**: `topic_partitions_per_shard` × `kafka_connections_max` (based on the assumption that producers connect to all shards)
281+
* **Recommended starting point**: Use a value between these upper and lower bounds, considering your application's produce patterns
282+
283+
Applications with wide fan-out patterns (producers writing to many partitions across multiple shards) require values closer to the upper bound.
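
Reviewer note: the bounds added in this hunk can be sketched as a small calculation. The following Python is only an illustration under stated assumptions. The function names are hypothetical, rounding the lower bound up is an assumption (the text only says to divide), and the `starting_point` interpolation is an invented heuristic for picking "a value between these upper and lower bounds", not something the Redpanda documentation prescribes.

```python
def producer_id_bounds(kafka_connections_max, number_of_shards,
                       topic_partitions_per_shard):
    """Return (lower, upper) bounds for max_concurrent_producer_ids.

    Mirrors the two bullets in the documentation text; the function
    name and signature are hypothetical.
    """
    inputs = {
        "kafka_connections_max": kafka_connections_max,
        "number_of_shards": number_of_shards,
        "topic_partitions_per_shard": topic_partitions_per_shard,
    }
    for name, value in inputs.items():
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")

    # Lower bound: assumes each producer connects to only one shard.
    # Ceiling division is an assumption; the docs do not specify rounding.
    lower = -(-kafka_connections_max // number_of_shards)

    # Upper bound: assumes producers connect to all shards.
    upper = topic_partitions_per_shard * kafka_connections_max
    return lower, upper


def starting_point(lower, upper, fan_out):
    """Hypothetical heuristic: interpolate linearly between the bounds.

    fan_out is a made-up knob in [0.0, 1.0]; 0.0 means producers stay on
    one shard, 1.0 means producers write across all shards.
    """
    if not 0.0 <= fan_out <= 1.0:
        raise ValueError("fan_out must be between 0.0 and 1.0")
    return lower + round((upper - lower) * fan_out)


if __name__ == "__main__":
    # Example inputs are illustrative, not recommended settings.
    lower, upper = producer_id_bounds(
        kafka_connections_max=10_000,
        number_of_shards=8,
        topic_partitions_per_shard=1_000,
    )
    print(f"lower bound: {lower}, upper bound: {upper}")
    print(f"narrow fan-out start: {starting_point(lower, upper, 0.0)}")
```

With these example inputs the bounds are 1,250 and 10,000,000, which shows how wide the range can be and why the produce pattern matters when choosing a value inside it.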
284+
285+
Monitor these metrics to determine if the limit is being reached:
286+
287+
* xref:reference:internal-metrics-reference.adoc#vectorized_cluster_producer_state_manager_evicted_producers[`vectorized_cluster_producer_state_manager_evicted_producers`]: Number of evicted producers (should be 0 in steady state)
288+
* xref:reference:internal-metrics-reference.adoc#vectorized_cluster_producer_state_manager_producer_manager_total_active_producers[`vectorized_cluster_producer_state_manager_producer_manager_total_active_producers`]: Current number of active producers per shard
289+
290+
If `vectorized_cluster_producer_state_manager_evicted_producers` is greater than 0, the shard is exceeding the configured limit. For applications with long-running transactions, ensure xref:reference:properties/cluster-properties.adoc#transactional_id_expiration_ms[`transactional_id_expiration_ms`] accommodates your typical transaction lifetime to avoid premature producer ID expiration.
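
Reviewer note: the eviction check described above could be automated along these lines. This is a minimal sketch that assumes the metrics are available as Prometheus text exposition format. The sample text, its `shard` labels, and the `sum_metric` helper are hypothetical; only the metric names come from the documentation change.

```python
EVICTED = "vectorized_cluster_producer_state_manager_evicted_producers"

# Hypothetical sample of scraped metrics text; real output will differ.
SAMPLE = """\
# HELP vectorized_cluster_producer_state_manager_evicted_producers Number of evicted producers
vectorized_cluster_producer_state_manager_evicted_producers{shard="0"} 0
vectorized_cluster_producer_state_manager_evicted_producers{shard="1"} 3
vectorized_cluster_producer_state_manager_producer_manager_total_active_producers{shard="0"} 120
"""


def sum_metric(exposition_text, metric_name):
    """Sum every sample of metric_name across all label sets."""
    total = 0.0
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        if line.startswith(metric_name + "{"):
            # The value follows the closing brace of the label set.
            rest = line[line.rfind("}") + 1:]
        elif line.startswith(metric_name + " "):
            # Sample without labels.
            rest = line[len(metric_name):]
        else:
            continue
        fields = rest.split()
        if fields:
            # First field is the value; an optional timestamp may follow.
            total += float(fields[0])
    return total


if __name__ == "__main__":
    evicted = sum_metric(SAMPLE, EVICTED)
    if evicted > 0:
        print(f"{evicted:.0f} producers evicted: the limit is being reached")
    else:
        print("no evictions observed")
```

Summing across shards gives a single yes/no signal. Because the limit applies per shard, a per-shard breakdown would be more useful for deciding how far to raise it.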
291+
292+
=== Configure transaction timeouts and limits
293+
273294
* If a consumer is configured to use the read_committed isolation level, it can only process successfully committed transactions. As a result, an ongoing transaction with a large timeout that becomes stuck could prevent the consumer from processing other committed transactions.
274295
+
275296
To avoid this, don't set the transaction timeout client setting (`transaction.timeout.ms` in the Kafka Java client implementation) to a value that is too high. The longer the timeout, the longer consumers may be blocked.

modules/reference/partials/properties/cluster-properties.adoc

Lines changed: 9 additions & 3 deletions
@@ -12375,7 +12375,9 @@ endif::[]
1237512375

1237612376
=== max_concurrent_producer_ids
1237712377

12378-
Maximum number of active producer sessions per shard. Each shard tracks producer IDs using an LRU (Least Recently Used) eviction policy. When the configured limit is exceeded, the least recently used producer IDs are evicted from the cache. IMPORTANT: The default value is unlimited, which can lead to unbounded memory growth and out-of-memory (OOM) crashes in production environments with heavy producer usage, especially when using transactions or idempotent producers. It is strongly recommended to set a reasonable limit in production deployments.
12378+
Maximum number of active producer sessions per shard. Each shard tracks producer IDs using an LRU (Least Recently Used) eviction policy. When the configured limit is exceeded, the least recently used producer IDs are evicted from the cache.
12379+
12380+
IMPORTANT: The default value is unlimited, which can lead to unbounded memory growth and out-of-memory (OOM) crashes in production environments with heavy producer usage, especially when using transactions or idempotent producers. It is strongly recommended to set a reasonable limit in production deployments.
1237912381

1238012382
[cols="1s,2a"]
1238112383
|===
@@ -12420,6 +12422,8 @@ endif::[]
1242012422

1242112423
* xref:reference:properties/cluster-properties.adoc#transactional_id_expiration_ms[transactional_id_expiration_ms]
1242212424

12425+
* xref:manage:monitoring.adoc[Monitor Redpanda]
12426+
1242312427
|===
1242412428

1242512429

@@ -19095,7 +19099,9 @@ endif::[]
1909519099

1909619100
=== transactional_id_expiration_ms
1909719101

19098-
Expiration time of producer IDs. Measured starting from the time of the last write until now for a given ID. Producer IDs are automatically removed from memory when they expire, which helps manage memory usage. However, this natural cleanup may not be sufficient for workloads with high producer churn rates. For applications with long-running transactions, ensure this value accommodates your typical transaction lifetime to avoid premature producer ID expiration.
19102+
Expiration time of producer IDs. Measured starting from the time of the last write until now for a given ID.
19103+
19104+
Producer IDs are automatically removed from memory when they expire, which helps manage memory usage. However, this natural cleanup may not be sufficient for workloads with high producer churn rates. For applications with long-running transactions, ensure this value accommodates your typical transaction lifetime to avoid premature producer ID expiration.
1909919105

1910019106
[cols="1s,2a"]
1910119107
|===
@@ -19141,7 +19147,7 @@ endif::[]
1914119147
|
1914219148
* xref:develop:transactions.adoc#tune-producer-id-limits[Tune producer ID limits]
1914319149

19144-
* xref:reference:properties/cluster-properties.adoc#max_concurrent_producer_ids[max_concurrent_producer_ids]
19150+
* xref:reference:properties/cluster-properties.adoc#max_concurrent_producer_ids[`max_concurrent_producer_ids`]
1914519151

1914619152
|===
1914719153
