Initial addition of the PullIngester for the opensearch sink. by dlvenable · Pull Request #6842 · opensearch-project/data-prepper

dlvenable · 2026-05-08T15:08:03Z

Description

This creates a new PullIngester and implements the first one as the KafkaPullEngine. It reads an index to find the pull ingestion topic and then writes data to that topic. It routes documents to shards using the same Murmur3 approach that OpenSearch uses. Pull-based ingestion is marked as experimental. Currently, the configuration requires specifying the document ID.
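The first step of this flow, finding the topic and sizing it by shard count, can be sketched as below. This is not the PR's code. It assumes the index settings are fetched in flat form (for example with GET <index>/_settings?flat_settings=true) and that OpenSearch's pull-based ingestion stores the Kafka topic under index.ingestion_source.param.topic. Both that key and the PullTargetResolver/PullTarget names are assumptions for illustration. The one-partition-per-primary-shard sizing assumes each shard consumes its own partition.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: the topic setting key is an assumption based on OpenSearch's
// pull-based ingestion settings and should be checked against the OpenSearch docs.
final class PullTargetResolver {
    static final String TOPIC_KEY = "index.ingestion_source.param.topic"; // assumed key
    static final String SHARDS_KEY = "index.number_of_shards";

    /** Topic to write to and the partition count it needs (one partition per primary shard). */
    static final class PullTarget {
        final String topic;
        final int requiredPartitions;

        PullTarget(final String topic, final int requiredPartitions) {
            this.topic = topic;
            this.requiredPartitions = requiredPartitions;
        }
    }

    /** Returns empty when the index is not configured for pull-based ingestion. */
    static Optional<PullTarget> resolve(final Map<String, String> flatIndexSettings) {
        final String topic = flatIndexSettings.get(TOPIC_KEY);
        final String shards = flatIndexSettings.get(SHARDS_KEY);
        if (topic == null || shards == null) {
            return Optional.empty();
        }
        return Optional.of(new PullTarget(topic, Integer.parseInt(shards)));
    }
}
```

Returning Optional.empty() rather than throwing lets the sink fall back to regular (push) indexing when an index has no ingestion source configured.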

Issues Resolved

Resolves #6835

This is the first phase of #6796.

Check List

New functionality includes testing.
New functionality has a documentation issue. Please link to it in this PR.
- New functionality has javadoc added
Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Resolves opensearch-project#6835 Signed-off-by: David Venable <dlv@amazon.com>

kkondaka · 2026-05-25T22:19:58Z

-            openSearchSinkConfig.getIndexConfiguration().getSemanticEnrichmentConfig(),
-            openSearchSinkConfig.getIndexConfiguration().getSemanticEnrichmentResourceName(),
-            configuredIndexAlias);
+    if (openSearchSinkConfiguration.getPullIndexing() == null) {


Is semantic enrichment not applicable for pull indexing?

I'm guessing that it is. We can add this later. This PR is just phase 1 of the implementation.

kkondaka · 2026-05-25T22:22:36Z

+        return calculateShard(routingValue, numberOfShards);
+    }
+
+    static int calculateShard(final String routingValue, final int numberOfShards) {


Is this the correct approach? What if something changes on the OpenSearch side? Isn't there an API in OpenSearch for this?

As I understand it, you cannot change the primary shard count of an existing index. Split and shrink create a new index with a different primary shard count.
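A related detail worth cross-checking: as far as I can tell from OpenSearch's OperationRouting, the shard for a document is not simply floorMod(hash, number_of_shards). The hash is taken modulo the index's routing shard count, then divided by the routing factor. By default, the routing shard count (index.number_of_routing_shards) is the largest number_of_shards * 2^k not exceeding 1024, with k forced to at least 1. The sketch below shows that computation and takes the Murmur3 hash as an input. ShardRoutingSketch and its methods are hypothetical names. If calculateShard(routingValue, numberOfShards) already accounts for this, treat the sketch as a cross-check only. Because the routing shard count can also be set explicitly at index creation, reading it from the index metadata may be safer than recomputing the default.

```java
// Hypothetical helper mirroring, to our understanding, OpenSearch's OperationRouting.
final class ShardRoutingSketch {

    /**
     * Default routing shard count: the largest numShards * 2^k not exceeding 1024,
     * with k forced to at least 1 so the index can be split at least once.
     */
    static int defaultRoutingNumShards(final int numShards) {
        final int log2MaxNumShards = 10;                                            // log2(1024)
        final int log2NumShards = 32 - Integer.numberOfLeadingZeros(numShards - 1); // ceil(log2(numShards))
        final int numSplits = Math.max(1, log2MaxNumShards - log2NumShards);
        return numShards << numSplits;
    }

    /** Maps a routing hash (e.g. Murmur3 of the document ID) to a primary shard. */
    static int shardForHash(final int routingHash, final int numShards, final int routingNumShards) {
        final int routingFactor = routingNumShards / numShards; // routing shards per primary shard
        return Math.floorMod(routingHash, routingNumShards) / routingFactor;
    }
}
```

For a 5-shard index with the default 640 routing shards, a hash of 128 maps to shard 1, whereas floorMod(128, 5) would give 3.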

kkondaka · 2026-05-25T22:23:30Z

+     * Murmur3 hash matching OpenSearch's Murmur3HashFunction.hash(String).
+     * 32-bit Murmur3 with seed 0, operating on the UTF-8 bytes of the input.
+     */
+    static int murmur3Hash(final String routing) {


Same comment as above. What if OpenSearch's hash function changes? Shouldn't we get this from an OpenSearch API instead of keeping our own copy?

@kkondaka, I agree here and would like to get this as a library. However, this is currently not exposed as a library in OpenSearch. We can open an issue to make this available to Data Prepper for pull ingestion.
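For reference while the local copy exists, here is a self-contained sketch of 32-bit Murmur3 (x86_32, seed 0), the same variant Lucene exposes as StringHelper.murmurhash3_x86_32. One detail may be worth double-checking. My reading of OpenSearch's Murmur3HashFunction.hash(String) is that it does not hash UTF-8 bytes. Instead, it encodes each UTF-16 char as two little-endian bytes before hashing. The sketch follows that reading, which differs from the UTF-8 wording in the javadoc above. Comparing both against a few routing values computed by OpenSearch itself would settle which one matches. Murmur3RoutingSketch is a hypothetical name.

```java
// Hypothetical sketch; the string encoding reflects our reading of OpenSearch and should be verified.
final class Murmur3RoutingSketch {

    /** Encodes each char as (low byte, high byte), i.e. UTF-16 little-endian without a BOM. */
    static byte[] utf16LeBytes(final String routing) {
        final byte[] bytes = new byte[routing.length() * 2];
        for (int i = 0; i < routing.length(); i++) {
            final char c = routing.charAt(i);
            bytes[i * 2] = (byte) c;
            bytes[i * 2 + 1] = (byte) (c >>> 8);
        }
        return bytes;
    }

    static int hash(final String routing) {
        return murmur3x86_32(utf16LeBytes(routing), 0);
    }

    /** Standard Murmur3 x86_32 over the whole array. */
    static int murmur3x86_32(final byte[] data, final int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        final int len = data.length;
        final int roundedEnd = len & ~3; // end of the last full 4-byte block
        int h1 = seed;
        for (int i = 0; i < roundedEnd; i += 4) {
            // Little-endian load of one 4-byte block
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }
        // Tail: mix in the remaining 1 to 3 bytes
        int k1 = 0;
        switch (len & 3) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
                // fall through
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
                // fall through
            case 1:
                k1 |= data[roundedEnd] & 0xff;
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
                break;
            default:
                break;
        }
        // Finalization mix (fmix32)
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }
}
```

With the UTF-16 encoding the input length is always even, so only the 2-byte tail case is reachable for strings; the general tail handling is kept so the function matches the reference algorithm.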

kkondaka · 2026-05-25T23:38:01Z

+            if (currentPartitions < requiredPartitions) {
+                LOG.info("Topic '{}' has {} partition(s) but {} required, increasing partition count",
+                        topicName, currentPartitions, requiredPartitions);
+                adminClient.createPartitions(


Could this be invoked by more than one Data Prepper instance (when they work together as one Data Prepper cluster)? If so, there may be a race condition here. You may need to catch exceptions such as InvalidPartitionsException or ReassignmentInProgressException and retry.

This is a good catch. I'll follow up.
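One way to make the partition increase tolerant of several Data Prepper nodes racing is to treat it as "ensure at least N partitions": re-read the count before each attempt and treat a concurrent increase as success. The sketch keeps Kafka behind small functional parameters so the retry logic stands on its own. In real code, currentPartitions could wrap adminClient.describeTopics(...) and read TopicDescription.partitions().size(). increaseTo could wrap adminClient.createPartitions(Map.of(topic, NewPartitions.increaseTo(n))).all().get(), unwrapping the ExecutionException cause. isRetryable could match InvalidPartitionsException or ReassignmentInProgressException, as suggested above. EnsurePartitionsSketch and its parameters are hypothetical.

```java
import java.util.function.IntConsumer;
import java.util.function.IntSupplier;
import java.util.function.Predicate;

// Hypothetical sketch of an idempotent "ensure at least N partitions" step.
final class EnsurePartitionsSketch {

    /** Returns true once the topic has at least requiredPartitions partitions. */
    static boolean ensurePartitions(final IntSupplier currentPartitions,
                                    final IntConsumer increaseTo,
                                    final Predicate<RuntimeException> isRetryable,
                                    final int requiredPartitions,
                                    final int maxAttempts,
                                    final long backoffMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Re-read every time: another node may already have grown the topic.
            if (currentPartitions.getAsInt() >= requiredPartitions) {
                return true;
            }
            try {
                increaseTo.accept(requiredPartitions);
                return true;
            } catch (final RuntimeException e) {
                if (!isRetryable.test(e)) {
                    throw e;
                }
            }
            try {
                Thread.sleep(backoffMillis * attempt); // linear backoff before re-checking
            } catch (final InterruptedException ie) {
                Thread.currentThread().interrupt();   // preserve the interrupt and give up
                return false;
            }
        }
        return currentPartitions.getAsInt() >= requiredPartitions;
    }
}
```

The re-check at the top of the loop is what makes the race benign: if another node wins, the next attempt observes the larger count and returns without issuing a second increase.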

kkondaka · 2026-05-25T23:39:33Z

+
+    @Override
+    public void shutdown() {
+        if (producer != null) {


Should we explicitly call flush() before close() to avoid any data loss?
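A minimal flush-then-close shutdown could look like the sketch below. ProducerLike is a hypothetical stand-in for the two KafkaProducer methods involved, flush() and close(Duration), so the ordering can be exercised without a broker. The no-argument KafkaProducer.close() already blocks until previously sent requests complete, so the explicit flush() matters most when close is given a bounded timeout, as in this sketch.

```java
import java.time.Duration;

// Hypothetical sketch: ProducerLike mirrors KafkaProducer's flush() and close(Duration).
final class ProducerShutdownSketch {

    interface ProducerLike {
        void flush();
        void close(Duration timeout);
    }

    static void shutdown(final ProducerLike producer, final Duration closeTimeout) {
        if (producer == null) {
            return;
        }
        try {
            producer.flush();             // block until buffered records are sent (or fail)
        } finally {
            producer.close(closeTimeout); // always release resources, even if flush() throws
        }
    }
}
```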

kkondaka · 2026-05-25T23:41:11Z

+                pullEngine.write(partition, docId, envelope);
+                pullIngestionMetrics.recordBytes(envelope.length);
+                pullIngestionMetrics.incrementDocumentsSucceeded();
+                event.getEventHandle().release(true);


If the positive acknowledgment happens here, could an error still occur later, when the data is pulled from Kafka into the index? How would those documents end up in the DLQ?

I actually plan to support DLQ in #6838. This PR is phase 1 of the implementation.


dlvenable requested review from KarstenSchnitter, Zhangxunmt, dinujoh, divbok, graytaylor0, kkondaka, oeyh, san81, sb2k16 and srikanthjg as code owners May 8, 2026 15:08

dlvenable force-pushed the 6835-pull-ingestion-phase1 branch 2 times, most recently from af6842c to 1cdaed0 May 11, 2026 14:14

dlvenable force-pushed the 6835-pull-ingestion-phase1 branch from 1cdaed0 to 2002dc8 May 19, 2026 15:08

kkondaka reviewed May 25, 2026


Addressing PR feedback: Flush on shutdown and retry to initialize the…

00d6b7f

… Kafka topic. Signed-off-by: David Venable <dlv@amazon.com>

dlvenable requested a review from srikanthpadakanti as a code owner May 26, 2026 22:19

dlvenable wants to merge 2 commits into opensearch-project:main from dlvenable:6835-pull-ingestion-phase1
