Commit cd346f0
Fix RDS joins: set workers=1 on main pipeline to prevent event reordering (opensearch-project#6784)
The joins template inherited the user-configured workers count (default 2)
for the main pipeline. With workers > 1, multiple threads write to the S3
sink concurrently. The S3 sink's ReentrantLock serializes writes but does
not guarantee ordering: thread 2 can write item2 before thread 1 writes
item1 for the same parent document.
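The difference between serialized and ordered writes can be sketched with a small simulation. This is not Data Prepper code: it uses Python's `threading.Lock` as a stand-in for the sink's ReentrantLock, a plain list as a stand-in for the S3 sink, and a hypothetical `worker2_done` event to force the unlucky scheduling deterministically.

```python
import threading


def write_with_two_workers():
    """Two workers share one lock; worker 1 is delayed before it writes."""
    sink = []                         # stand-in for the S3 sink
    lock = threading.Lock()           # serializes writes, like the sink's lock
    worker2_done = threading.Event()  # hypothetical: forces the bad schedule

    def worker1():
        # Simulate worker 1 being descheduled after it picked up item1.
        worker2_done.wait()
        with lock:
            sink.append("item1")

    def worker2():
        with lock:
            sink.append("item2")
        worker2_done.set()

    threads = [threading.Thread(target=worker1),
               threading.Thread(target=worker2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink


def write_with_one_worker():
    """A single worker writes events in the order it received them."""
    sink = []
    lock = threading.Lock()
    for item in ["item1", "item2"]:
        with lock:
            sink.append(item)
    return sink


print(write_with_two_workers())  # ['item2', 'item1']
print(write_with_one_worker())   # ['item1', 'item2']
```

In both cases only one thread holds the lock at a time, yet only the single-worker variant preserves the source order.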
When the S3 sub-pipeline reads these out-of-order events and sends them
to OpenSearch, the per-table version check in the Painless script rejects
the lower-versioned item as a noop, causing data loss for 1:N child records.
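A simplified model of this failure, written in Python rather than Painless, is sketched below. The document layout (`versions`, `items`) and the helper names `apply_event` and `replay` are assumptions made for illustration; the actual script in the template may differ.

```python
def apply_event(doc, event):
    """Apply a child-table event only if its version is newer than the
    last version recorded for that table (assumed rule, for illustration)."""
    table = event["table"]
    if event["version"] <= doc["versions"].get(table, -1):
        return "noop"  # rejected: the child record is never added
    doc["versions"][table] = event["version"]
    doc["items"].append(event["item"])
    return "updated"


def replay(events):
    """Replay events against an empty parent document."""
    doc = {"versions": {}, "items": []}
    results = [apply_event(doc, e) for e in events]
    return doc["items"], results


item1 = {"table": "order_items", "version": 1, "item": "item1"}
item2 = {"table": "order_items", "version": 2, "item": "item2"}

print(replay([item1, item2]))  # (['item1', 'item2'], ['updated', 'updated'])
print(replay([item2, item1]))  # (['item2'], ['updated', 'noop'])
```

When item2 arrives first, the recorded version for the table is already 2, so item1 fails the check and is dropped from the parent document.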
Setting workers=1 ensures events are written to S3 in binlog order.
This has no throughput impact since the S3 sink's ReentrantLock already
serializes writes to a single thread at a time.
Tested with 200 threads, 5M orders: 0 failures with workers=1 vs
~0.09% failure rate with workers=2.
Signed-off-by: Dinu John <86094133+dinujoh@users.noreply.github.com>
1 parent fa72b15 commit cd346f0
1 file changed
Lines changed: 1 addition & 1 deletion
File tree
- data-prepper-plugins/rds-source/src/main/resources/org/opensearch/dataprepper/transforms/templates
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 8-10 | 8-10 | (unchanged) |
| 11 | | - |
| | 11 | + |
| 12-14 | 12-14 | (unchanged) |