Commit 1c8d155

micheleRP and claude committed
DOC-2130 Warn against max_in_flight > 1 with batching processors
Add a NOTE to the max_in_flight reference of every batched output (46 connectors registered via MustRegisterBatchOutput) explaining that setting max_in_flight > 1 alongside a batching block with processors risks shipping raw, unprocessed messages to the output if a batching processor errors at runtime.

Per CON-461, the underlying behavior lives in shared benthos framework code; until the next Connect v5 major can fix it, every affected output gets the same advisory.

Implementation: introduce a shared definitions.max_in_flight_batched override and $ref it from the 32 batched outputs that lacked an existing max_in_flight override. Repoint elasticsearch_v8 and elasticsearch_v9 from the now-removed elasticsearch_max_in_flight definition to the new shared one. Append the NOTE to the 12 connectors with bespoke max_in_flight prose (couchbase, cypher, gcp_bigquery, kafka_franz, mongodb, questdb, redpanda, redpanda_common, redpanda_migrator, salesforce_sink, snowflake_streaming, sql_raw) plus ockam_kafka.kafka.max_in_flight where the field is nested.

Cloud Connect docs single-source from this repo, so the same change covers both sites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
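
For illustration, a configuration that follows the advisory might look like the sketch below. It assumes an `aws_s3` output with `archive` and `compress` batching processors; the bucket name, object path, and batch thresholds are hypothetical and are not taken from this commit.

```yaml
output:
  aws_s3:
    bucket: example-bucket   # hypothetical bucket name
    path: 'batches/${! timestamp_unix_nano() }.tar.gz'   # hypothetical object path
    # Keep a single batch in flight because the batching block below
    # runs processors, as the NOTE added by this commit advises.
    max_in_flight: 1
    batching:
      count: 100    # hypothetical flush threshold (messages)
      period: 10s   # hypothetical flush interval
      processors:
        # Batching processors of the kind the NOTE names.
        - archive:
            format: tar
        - compress:
            algorithm: gzip
```

Outputs without `processors` under `batching` are outside the scope of the NOTE's recommendation.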
1 parent 5d68a60 commit 1c8d155

45 files changed

Lines changed: 268 additions & 28 deletions

docs-data/overrides.json

Lines changed: 170 additions & 18 deletions
Large diffs are not rendered by default.

modules/components/partials/fields/outputs/arc.adoc

Lines changed: 2 additions & 0 deletions
@@ -394,6 +394,8 @@ Size in bytes of the per-connection write buffer.
 
 The maximum number of messages to have in flight at a given time. Increase this to improve throughput.
 
+NOTE: Set `max_in_flight: 1` when a `batching` block with `processors` is configured on this output. If a batching processor (for example `parquet_encode`, `archive`, or `compress`) errors at runtime, Redpanda Connect can write raw, unprocessed messages to the output instead of the encoded batch, producing corrupt downstream data (for example, a `.parquet` file containing raw JSON). Because the batching processor is single threaded, higher values do not improve performance.
+
 *Type*: `int`
 
 *Default*: `64`

modules/components/partials/fields/outputs/aws_dynamodb.adoc

Lines changed: 2 additions & 0 deletions
@@ -227,6 +227,8 @@ json_map_columns:
 
 The maximum number of messages to have in flight at a given time. Increase this to improve throughput.
 
+NOTE: Set `max_in_flight: 1` when a `batching` block with `processors` is configured on this output. If a batching processor (for example `parquet_encode`, `archive`, or `compress`) errors at runtime, Redpanda Connect can write raw, unprocessed messages to the output instead of the encoded batch, producing corrupt downstream data (for example, a `.parquet` file containing raw JSON). Because the batching processor is single threaded, higher values do not improve performance.
+
 *Type*: `int`
 
 *Default*: `64`

modules/components/partials/fields/outputs/aws_kinesis.adoc

Lines changed: 3 additions & 1 deletion
@@ -211,7 +211,9 @@ This field supports xref:configuration:interpolation.adoc#bloblang-queries[inter
 
 === `max_in_flight`
 
-The maximum number of parallel message batches to have in flight at any given time.
+The maximum number of messages to have in flight at a given time. Increase this to improve throughput.
+
+NOTE: Set `max_in_flight: 1` when a `batching` block with `processors` is configured on this output. If a batching processor (for example `parquet_encode`, `archive`, or `compress`) errors at runtime, Redpanda Connect can write raw, unprocessed messages to the output instead of the encoded batch, producing corrupt downstream data (for example, a `.parquet` file containing raw JSON). Because the batching processor is single threaded, higher values do not improve performance.
 
 *Type*: `int`
 

modules/components/partials/fields/outputs/aws_kinesis_firehose.adoc

Lines changed: 2 additions & 0 deletions
@@ -206,6 +206,8 @@ Allows you to specify a custom endpoint for the AWS API.
 
 The maximum number of messages to have in flight at a given time. Increase this to improve throughput.
 
+NOTE: Set `max_in_flight: 1` when a `batching` block with `processors` is configured on this output. If a batching processor (for example `parquet_encode`, `archive`, or `compress`) errors at runtime, Redpanda Connect can write raw, unprocessed messages to the output instead of the encoded batch, producing corrupt downstream data (for example, a `.parquet` file containing raw JSON). Because the batching processor is single threaded, higher values do not improve performance.
+
 *Type*: `int`
 
 *Default*: `64`

modules/components/partials/fields/outputs/aws_s3.adoc

Lines changed: 2 additions & 0 deletions
@@ -251,6 +251,8 @@ An optional server-side encryption key.
 
 The maximum number of messages to have in flight at a given time. Increase this to improve throughput.
 
+NOTE: Set `max_in_flight: 1` when a `batching` block with `processors` is configured on this output. If a batching processor (for example `parquet_encode`, `archive`, or `compress`) errors at runtime, Redpanda Connect can write raw, unprocessed messages to the output instead of the encoded batch, producing corrupt downstream data (for example, a `.parquet` file containing raw JSON). Because the batching processor is single threaded, higher values do not improve performance.
+
 *Type*: `int`
 
 *Default*: `64`

modules/components/partials/fields/outputs/aws_sqs.adoc

Lines changed: 3 additions & 1 deletion
@@ -211,7 +211,9 @@ Allows you to specify a custom endpoint for the AWS API.
 
 === `max_in_flight`
 
-The maximum number of parallel message batches to have in flight at any given time.
+The maximum number of messages to have in flight at a given time. Increase this to improve throughput.
+
+NOTE: Set `max_in_flight: 1` when a `batching` block with `processors` is configured on this output. If a batching processor (for example `parquet_encode`, `archive`, or `compress`) errors at runtime, Redpanda Connect can write raw, unprocessed messages to the output instead of the encoded batch, producing corrupt downstream data (for example, a `.parquet` file containing raw JSON). Because the batching processor is single threaded, higher values do not improve performance.
 
 *Type*: `int`
 

modules/components/partials/fields/outputs/azure_cosmosdb.adoc

Lines changed: 2 additions & 0 deletions
@@ -201,6 +201,8 @@ item_id: ${! json("id") }
 
 The maximum number of messages to have in flight at a given time. Increase this to improve throughput.
 
+NOTE: Set `max_in_flight: 1` when a `batching` block with `processors` is configured on this output. If a batching processor (for example `parquet_encode`, `archive`, or `compress`) errors at runtime, Redpanda Connect can write raw, unprocessed messages to the output instead of the encoded batch, producing corrupt downstream data (for example, a `.parquet` file containing raw JSON). Because the batching processor is single threaded, higher values do not improve performance.
+
 *Type*: `int`
 
 *Default*: `64`

modules/components/partials/fields/outputs/azure_queue_storage.adoc

Lines changed: 3 additions & 1 deletion
@@ -114,7 +114,9 @@ processors:
 
 === `max_in_flight`
 
-The maximum number of parallel message batches to have in flight at any given time.
+The maximum number of messages to have in flight at a given time. Increase this to improve throughput.
+
+NOTE: Set `max_in_flight: 1` when a `batching` block with `processors` is configured on this output. If a batching processor (for example `parquet_encode`, `archive`, or `compress`) errors at runtime, Redpanda Connect can write raw, unprocessed messages to the output instead of the encoded batch, producing corrupt downstream data (for example, a `.parquet` file containing raw JSON). Because the batching processor is single threaded, higher values do not improve performance.
 
 *Type*: `int`
 

modules/components/partials/fields/outputs/azure_table_storage.adoc

Lines changed: 3 additions & 1 deletion
@@ -114,7 +114,9 @@ processors:
 
 === `max_in_flight`
 
-The maximum number of parallel message batches to have in flight at any given time.
+The maximum number of messages to have in flight at a given time. Increase this to improve throughput.
+
+NOTE: Set `max_in_flight: 1` when a `batching` block with `processors` is configured on this output. If a batching processor (for example `parquet_encode`, `archive`, or `compress`) errors at runtime, Redpanda Connect can write raw, unprocessed messages to the output instead of the encoded batch, producing corrupt downstream data (for example, a `.parquet` file containing raw JSON). Because the batching processor is single threaded, higher values do not improve performance.
 
 *Type*: `int`
 
