modules/develop/pages/transactions.adoc
Lines changed: 10 additions & 7 deletions
@@ -272,22 +272,25 @@ To help avoid common pitfalls and optimize performance, consider the following w
=== Tune producer ID limits
-
For production environments with heavy producer usage, consider using xref:reference:properties/cluster-properties.adoc#max_concurrent_producer_ids[`max_concurrent_producer_ids`] to prevent out-of-memory (OOM) crashes. The default unlimited value can lead to unbounded memory growth, especially with transactions or idempotent producers.
+
For production environments with heavy producer usage, configure both xref:reference:properties/cluster-properties.adoc#max_concurrent_producer_ids[`max_concurrent_producer_ids`] and xref:reference:properties/cluster-properties.adoc#transactional_id_expiration_ms[`transactional_id_expiration_ms`] to prevent out-of-memory (OOM) crashes. Setting limits on producer IDs helps manage memory usage in high-throughput environments, particularly when using transactions or idempotent producers.
-
Calculate an appropriate value based on your expected concurrent producers:
+
If `kafka_connections_max` is configured, you can derive lower and upper bounds for `max_concurrent_producer_ids` from your connection patterns.
-
* **Lower bound**: `kafka_connections_max` ÷ `number_of_shards` (based on the assumption that each producer connects to only one shard)
-
* **Upper bound**: `topic_partitions_per_shard` × `kafka_connections_max` (based on the assumption that producers connect to all shards)
-
* **Recommended starting point**: Use a value between these upper and lower bounds, considering your application's produce patterns
+
Lower bound: `kafka_connections_max` / `number_of_shards`, assuming each producer connects to only one shard.
+
Upper bound: `topic_partitions_per_shard` * `kafka_connections_max`, assuming producers connect to all shards.
-
Applications with wide fan-out patterns (producers writing to many partitions across multiple shards) require values closer to the upper bound.
+
If `kafka_connections_max` is not configured in your environment, estimate based on your application patterns. A conservative approach is to start with 1000-5000 per shard, then monitor and adjust as needed. Applications with many partitions per producer typically require higher values, such as 10000 or more per shard.
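The lower and upper bounds described above can be worked through with a short calculation. The following is a minimal sketch, not part of Redpanda: the function name `producer_id_bounds` and the input values are hypothetical, and the lower bound is rounded up so that it never undercounts.

```python
from typing import Tuple


def producer_id_bounds(
    kafka_connections_max: int,
    number_of_shards: int,
    topic_partitions_per_shard: int,
) -> Tuple[int, int]:
    """Return (lower, upper) bounds for max_concurrent_producer_ids.

    Hypothetical helper for illustration only.
    """
    if number_of_shards <= 0:
        raise ValueError("number_of_shards must be positive")
    # Lower bound: every producer connects to exactly one shard.
    # Ceiling division so the bound is not rounded down.
    lower = -(-kafka_connections_max // number_of_shards)
    # Upper bound: producers connect to all shards.
    upper = topic_partitions_per_shard * kafka_connections_max
    return lower, upper


# Made-up example values, not recommendations.
lower, upper = producer_id_bounds(
    kafka_connections_max=2000,
    number_of_shards=8,
    topic_partitions_per_shard=100,
)
print(lower, upper)  # 250 200000
```

A value between the two results is a starting point only. Monitor the metrics listed in this section and adjust.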
+
+
Tune `transactional_id_expiration_ms` based on your application's transaction patterns. Calculate the value by taking your longest expected transaction time and adding a safety buffer. For example, if transactions typically run for 30 minutes, consider setting this to 2 to 4 hours. Workloads with short-lived transactions can use values between 1 and 4 hours, while batch processing applications should use their batch interval plus buffer time. Interactive applications may benefit from shorter values, which free memory sooner.
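As a rough illustration of the "longest transaction plus buffer" rule, the sketch below converts a duration into milliseconds. The helper name and the `buffer_factor` parameter are assumptions made for this example. A factor of 4 reproduces the 30-minute to 2-hour case mentioned above.

```python
from datetime import timedelta


def expiration_ms(longest_transaction: timedelta, buffer_factor: float = 4.0) -> int:
    """Suggest a transactional_id_expiration_ms value.

    Hypothetical helper: multiplies the longest expected transaction
    time by a safety factor and converts the result to milliseconds.
    """
    if buffer_factor < 1.0:
        raise ValueError("buffer_factor must be at least 1.0")
    return int(longest_transaction.total_seconds() * 1000 * buffer_factor)


# 30-minute transactions with a 4x buffer give 2 hours.
print(expiration_ms(timedelta(minutes=30)))  # 7200000
```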
+
+
Client applications should minimize producer ID churn. Reuse producer instances when possible instead of creating new ones for each operation. Avoid using random transactional IDs, as some Flink configurations do, since this creates excessive producer ID churn. Instead, use consistent transactional IDs that can be resumed across application restarts.
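One way to keep transactional IDs consistent across restarts is to derive them from values that do not change between runs. The sketch below assumes a hypothetical application name and task index. The naming scheme is illustrative, not a Redpanda or Kafka client convention.

```python
def stable_transactional_id(app_name: str, task_index: int) -> str:
    """Build a transactional ID that is identical after every restart.

    Hypothetical scheme: the ID depends only on the application name
    and a fixed task index, never on random or time-based input.
    """
    if not app_name:
        raise ValueError("app_name must not be empty")
    if task_index < 0:
        raise ValueError("task_index must not be negative")
    return f"{app_name}-txn-{task_index}"


# The same inputs always produce the same ID, so a restarted task
# resumes its existing producer session instead of creating a new one.
print(stable_transactional_id("orders-service", 3))  # orders-service-txn-3
```

Pass the resulting string as the client's transactional ID setting, and reuse one producer instance per task.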
Monitor these metrics to determine if the limit is being reached:
* xref:reference:internal-metrics-reference.adoc#vectorized_cluster_producer_state_manager_evicted_producers[`vectorized_cluster_producer_state_manager_evicted_producers`]: Number of evicted producers (should be 0 in steady state)
* xref:reference:internal-metrics-reference.adoc#vectorized_cluster_producer_state_manager_producer_manager_total_active_producers[`vectorized_cluster_producer_state_manager_producer_manager_total_active_producers`]: Current number of active producers per shard
-
If `evicted_producers` > 0, the shard is exceeding the configured limit. For applications with long-running transactions, ensure xref:reference:properties/cluster-properties.adoc#transactional_id_expiration_ms[`transactional_id_expiration_ms`] accommodates your typical transaction lifetime to avoid premature producer ID expiration.
+
If `vectorized_cluster_producer_state_manager_evicted_producers` > 0, the shard is exceeding the configured limit. For applications with long-running transactions, ensure xref:reference:properties/cluster-properties.adoc#transactional_id_expiration_ms[`transactional_id_expiration_ms`] accommodates your typical transaction lifetime to avoid premature producer ID expiration.
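The eviction check can be automated against scraped metrics. The sketch below assumes the metrics are available as Prometheus text-format lines without timestamps. The sample text, including its label names and values, is made up for illustration and is not real output.

```python
EVICTED = "vectorized_cluster_producer_state_manager_evicted_producers"


def series_over_limit(metrics_text: str) -> dict:
    """Return evicted-producer series whose value is greater than 0.

    Assumes Prometheus text format with no trailing timestamps.
    """
    over = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        series, _, value = line.rpartition(" ")
        if series.split("{", 1)[0] != EVICTED:
            continue  # a different metric
        if float(value) > 0:
            over[series] = float(value)
    return over


# Made-up sample input for illustration only.
SAMPLE = """\
# sample data, not real output
vectorized_cluster_producer_state_manager_evicted_producers{shard="0"} 0
vectorized_cluster_producer_state_manager_evicted_producers{shard="1"} 12
vectorized_cluster_producer_state_manager_producer_manager_total_active_producers{shard="1"} 5000
"""

for series, value in series_over_limit(SAMPLE).items():
    print(f"limit exceeded: {series} = {value}")
```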
modules/reference/partials/properties/cluster-properties.adoc
Lines changed: 4 additions & 2 deletions
@@ -12377,6 +12377,8 @@ endif::[]
Maximum number of active producer sessions per shard. Each shard tracks producer IDs using an LRU (Least Recently Used) eviction policy. When the configured limit is exceeded, the least recently used producer IDs are evicted from the cache.
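To make the LRU behavior concrete, the following is a generic sketch of an LRU-bounded set of producer IDs. It illustrates the eviction policy only and is not Redpanda's implementation. The class and method names are hypothetical.

```python
from collections import OrderedDict


class ProducerIdCache:
    """Generic LRU-bounded set of producer IDs (illustration only)."""

    def __init__(self, max_ids: int) -> None:
        if max_ids <= 0:
            raise ValueError("max_ids must be positive")
        self.max_ids = max_ids
        self.evicted = 0
        self._ids = OrderedDict()  # oldest entry first

    def touch(self, producer_id: int) -> None:
        """Record activity for a producer ID, evicting if over the limit."""
        if producer_id in self._ids:
            self._ids.move_to_end(producer_id)  # mark as most recently used
        else:
            self._ids[producer_id] = None
        while len(self._ids) > self.max_ids:
            self._ids.popitem(last=False)  # drop the least recently used ID
            self.evicted += 1

    def active(self) -> list:
        return list(self._ids)


cache = ProducerIdCache(max_ids=2)
for pid in (1, 2, 1, 3):
    cache.touch(pid)
print(cache.active(), cache.evicted)  # [1, 3] 1
```

Producer 2 is evicted because producer 1 was used more recently when producer 3 arrived.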
+
If you upgrade from 23.2.x to 23.3.x and experience `OUT_OF_SEQUENCE` errors, consider increasing this value. In 23.3.x, this limit changed from a per-partition basis to a per-shard basis.
+
IMPORTANT: The default value is unlimited, which can lead to unbounded memory growth and out-of-memory (OOM) crashes in production environments with heavy producer usage, especially when using transactions or idempotent producers. It is strongly recommended to set a reasonable limit in production deployments.
[cols="1s,2a"]
@@ -19099,9 +19101,9 @@ endif::[]
=== transactional_id_expiration_ms
-
Expiration time of producer IDs. Measured starting from the time of the last write until now for a given ID.
+
Expiration time of producer IDs for both transactional and idempotent producers. Despite the name, this setting applies to all producer types. The expiration period for a given ID is measured from the time of its last write.
-
Producer IDs are automatically removed from memory when they expire, which helps manage memory usage. However, this natural cleanup may not be sufficient for workloads with high producer churn rates. For applications with long-running transactions, ensure this value accommodates your typical transaction lifetime to avoid premature producer ID expiration.
+
Producer IDs are automatically removed from memory when they expire, which helps manage memory usage. However, this natural cleanup may not be sufficient for workloads with high producer churn rates. Tune this value based on your application's producer session and transaction lifetimes. Consider your longest-running transaction duration plus a buffer to avoid premature expiration.