Fixing the crawler framework to handle ddb outage scenario by san81 · Pull Request #6207 · opensearch-project/data-prepper

san81 · 2025-10-29T23:44:51Z

Description

The crawler framework relies on the Source Coordinator framework, which uses DynamoDB (DDB) as its store for state management. When the DDB store becomes unreachable, the expectation is that the pipeline fully recovers to its normal state as soon as DDB is accessible again. Currently, the leader scheduler holds on to an expired ownership record, which prevents it from fully recovering after DDB returns to normal. The fix is to give up the partition when the DDB store is unavailable, so that the scheduler can reacquire it once the store is back online and reachable.
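The give-up-and-reacquire behavior described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: `LeaderSchedulerSketch`, `CoordinationStore`, `StoreUnavailableException`, and `ToggleableStore` are hypothetical names invented for the example, and the real Source Coordinator API differs. Only the `leaderPartition = null` step mirrors the actual change.

```java
import java.util.Optional;

// Hypothetical sketch: none of these types exist in data-prepper under these names.
class LeaderSchedulerSketch {

    /** Assumed stand-in for the coordination store (DDB in this PR). */
    interface CoordinationStore {
        Optional<String> tryAcquireLeaderPartition(String ownerId);
        void renewLeaderPartition(String partition, String ownerId);
    }

    /** Assumed exception type signalling that the store is unreachable. */
    static class StoreUnavailableException extends RuntimeException {
        StoreUnavailableException(String message) {
            super(message);
        }
    }

    /** In-memory store whose reachability can be toggled to simulate an outage. */
    static class ToggleableStore implements CoordinationStore {
        volatile boolean reachable = true;
        private String owner; // owner currently recorded in the store

        @Override
        public Optional<String> tryAcquireLeaderPartition(String ownerId) {
            ensureReachable();
            if (owner == null || owner.equals(ownerId)) {
                owner = ownerId;
                return Optional.of("leader");
            }
            return Optional.empty();
        }

        @Override
        public void renewLeaderPartition(String partition, String ownerId) {
            ensureReachable();
            owner = ownerId;
        }

        private void ensureReachable() {
            if (!reachable) {
                throw new StoreUnavailableException("store unreachable");
            }
        }
    }

    private final CoordinationStore store;
    private final String ownerId;
    private String leaderPartition; // null while this node does not hold leadership

    LeaderSchedulerSketch(CoordinationStore store, String ownerId) {
        this.store = store;
        this.ownerId = ownerId;
    }

    /** One iteration of the scheduler loop. */
    void runOnce() {
        try {
            if (leaderPartition == null) {
                leaderPartition = store.tryAcquireLeaderPartition(ownerId).orElse(null);
            }
            if (leaderPartition != null) {
                // Leader-only work would go here; afterwards, extend the ownership.
                store.renewLeaderPartition(leaderPartition, ownerId);
            }
        } catch (StoreUnavailableException e) {
            // Give up the partition so that it is freshly reacquired once the
            // store is reachable again, instead of holding an expired record.
            leaderPartition = null;
        }
    }

    boolean holdsLeaderPartition() {
        return leaderPartition != null;
    }
}
```

In this sketch, calling `runOnce()` while `reachable` is `false` drops the in-memory ownership, and the next call after the store recovers goes through `tryAcquireLeaderPartition` again rather than relying on stale state.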

Check List

- New functionality includes testing.
- New functionality has a documentation issue. Please link to it in this PR.
- New functionality has javadoc added
- Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>

dlvenable · 2025-10-30T16:59:14Z

+                        // It should give up the leader partition and reacquire it whenever the ddb store is reachable again.
+                        // Not giving up would make it continue to hold on to an expired ownership record, which would create inconsistent state.
+                        // If you are not the owner in the ddb table, you are not supposed to hold the ownership.
+                        leaderPartition = null;


Please include a unit test for this scenario and the expected outcome.

Added a unit test to validate the reacquisition scenario.

dlvenable · 2025-10-30T18:56:50Z

        });
    }
+
+    @Test


You can move your comment to the test name to make it easier to read.

@Test("Ensure that if DynamoDB becomes unreachable, the leader gives up the partition and retries acquisition")

Added a @DisplayName annotation with the description.

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>

…h-project#6207) * Fixing the crawler framework to handle ddb outage scenario Signed-off-by: Vecheka Chhourn <vecheka@amazon.com>

Fixing the crawler framework to handle ddb outage scenario

7b3b5cc

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>

san81 marked this pull request as ready for review October 29, 2025 23:44

san81 requested review from KarstenSchnitter, chenqi0805, dinujoh, dlvenable, engechas, graytaylor0, kkondaka, oeyh, sb2k16 and srikanthjg as code owners October 29, 2025 23:44

graytaylor0 previously approved these changes Oct 30, 2025

dlvenable requested changes Oct 30, 2025

Adding corresponding unit test to validate reacquiring of leader part…

308e3ef

…ition state after ddb calls succeed Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>

san81 dismissed graytaylor0’s stale review via 308e3ef October 30, 2025 17:18

san81 requested review from dlvenable and graytaylor0 October 30, 2025 18:11

dlvenable reviewed Oct 30, 2025

Better display name

39b6e02

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>

san81 requested a review from dlvenable October 30, 2025 19:13

graytaylor0 approved these changes Oct 30, 2025

dlvenable approved these changes Oct 30, 2025

san81 merged commit 094db94 into opensearch-project:main Oct 30, 2025
45 of 47 checks passed

san81 merged 3 commits into opensearch-project:main from san81:crawler-source-coordinator-fix
