
Commit cd346f0

dinujoh authored and simonelbaz committed
Fix RDS joins: set workers=1 on main pipeline to prevent event reordering (opensearch-project#6784)
The joins template inherited the user-configured workers count (default 2) for the main pipeline. With workers > 1, multiple threads write to the S3 sink concurrently. The S3 sink's ReentrantLock serializes writes but does not guarantee ordering: thread 2 can write item2 before thread 1 writes item1 for the same parent document.

When the S3 sub-pipeline reads these out-of-order events and sends them to OpenSearch, the per-table version check in the Painless script rejects the lower-versioned item (noop), causing data loss for 1:N child records.

Setting workers=1 ensures events are written to S3 in binlog order. This has no throughput impact since the S3 sink's ReentrantLock already serializes writes to a single thread at a time.

Tested with 200 threads, 5M orders: 0 failures with workers=1 vs ~0.09% failure rate with workers=2.

Signed-off-by: Dinu John <86094133+dinujoh@users.noreply.github.com>
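The failure mode described above can be illustrated with a small, deterministic model. This is only a sketch under stated assumptions: the real check lives in a Painless script that is not shown in this commit, so the `apply_event` function, the `ParentDoc` structure, and the `order_items` table name are all hypothetical stand-ins. The model assumes the parent document keeps one "last applied version" per source table and turns any event with a lower or equal version into a noop.

```python
from dataclasses import dataclass, field


@dataclass
class ParentDoc:
    # Hypothetical model of the joined parent document in OpenSearch.
    table_versions: dict = field(default_factory=dict)  # table -> last applied version
    children: dict = field(default_factory=dict)        # table -> {child_id: payload}


def apply_event(doc, table, version, child_id, payload):
    """Apply one child event; return 'updated' or 'noop'.

    Assumed behaviour: an event whose version is not newer than the
    version already stored for its table is skipped.
    """
    if version <= doc.table_versions.get(table, -1):
        return "noop"
    doc.table_versions[table] = version
    doc.children.setdefault(table, {})[child_id] = payload
    return "updated"


def replay(events):
    """Apply events in the given order to a fresh parent document."""
    doc = ParentDoc()
    results = [apply_event(doc, *event) for event in events]
    return doc, results


# Two child rows of the same parent, in binlog order (version 1, then 2).
item1 = ("order_items", 1, "item1", {"sku": "A"})
item2 = ("order_items", 2, "item2", {"sku": "B"})

# workers=1: events reach the sink in binlog order, both children survive.
in_order, in_order_results = replay([item1, item2])

# workers>1: item2 may be written first, so item1 is rejected as stale.
reordered, reordered_results = replay([item2, item1])

print(in_order_results, sorted(in_order.children["order_items"]))
print(reordered_results, sorted(reordered.children["order_items"]))
```

In this model the reordered replay ends with only `item2` attached to the parent, which mirrors the 1:N child-record loss the commit message reports. A lock around the write does not help here, because it only controls how many threads write at once, not which thread acquires the lock first.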
1 parent fa72b15 commit cd346f0

1 file changed

Lines changed: 1 addition & 1 deletion

data-prepper-plugins/rds-source/src/main/resources/org/opensearch/dataprepper/transforms/templates/rds-joins-template.yaml

@@ -8,7 +8,7 @@
 #
 
 "<<pipeline-name>>":
-  workers: "<<$.<<pipeline-name>>.workers>>"
+  workers: 1
   delay: "<<$.<<pipeline-name>>.delay>>"
   buffer: "<<$.<<pipeline-name>>.buffer>>"
   source:
