
Implement handling strategy for retryable vs non-retryable exceptions in workerPartition #6269

Closed
vecheka wants to merge 1 commit into opensearch-project:main from vecheka:worker-partition-retry-strategy

vecheka commented Nov 14, 2025


Description

This is the second part of the change to handle retryable vs non-retryable exceptions.

See the first-part PR for more context: #6255

How

We are adding a new generic exception class, SaaSCrawlerException (referred to below as CrawlerException), to be shared by all connectors. This class is similar to the previous API-specific exception classes (e.g., Office365Exception).

CrawlerException will classify failures by two criteria:

  • Any REST API call failure is considered retryable. A failure while writing to the buffer is considered retryable as well.
  • Other exceptions (e.g., internal failures) are considered non-retryable.
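
The classification above could be sketched roughly as follows. This is only an illustration, not the class added in this PR: `SketchCrawlerException`, `CrawlerFailures`, and the factory method names are hypothetical stand-ins, and the real SaaSCrawlerException may have a different shape.

```java
// Hypothetical stand-in for the shared exception class; the real class may differ.
class SketchCrawlerException extends RuntimeException {
    private final boolean retryable;

    SketchCrawlerException(final String message, final Throwable cause, final boolean retryable) {
        super(message, cause);
        this.retryable = retryable;
    }

    // True when the caller may retry with the normal backoff behaviour.
    boolean isRetryable() {
        return retryable;
    }
}

// Hypothetical helper that encodes the two criteria from the description.
final class CrawlerFailures {
    private CrawlerFailures() {
    }

    // REST API call failures are treated as retryable.
    static SketchCrawlerException restApiFailure(final String message, final Throwable cause) {
        return new SketchCrawlerException(message, cause, true);
    }

    // Failures while writing to the buffer are treated as retryable too.
    static SketchCrawlerException bufferWriteFailure(final String message, final Throwable cause) {
        return new SketchCrawlerException(message, cause, true);
    }

    // Everything else (for example, internal failures) is treated as non-retryable.
    static SketchCrawlerException internalFailure(final String message, final Throwable cause) {
        return new SketchCrawlerException(message, cause, false);
    }
}
```

Keeping the flag on the exception itself means callers higher up only need to check `isRetryable()` and do not need to know which connector raised it.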

We will use CrawlerException and propagate it all the way up to WorkerScheduler, where, in the follow-up PR:

  • If the exception is flagged as retryable (retryable = true), we will continue with the current behaviour of backoff retry every 5ms.
  • Otherwise, we will delay the retry by 1 day by calling sourceCoordinator.saveProgressStateForPartition(workerPartition, DURATION_TO_DELAY_RETRY). If the partition keeps failing for up to 30 days (checked using the partitionCreationTime field), we will give up the worker partition.
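
A minimal sketch of the decision described in the two bullets above, kept separate from the real scheduler and coordinator code. `RetryAction`, `RetryDecision`, and `MAX_RETRY_WINDOW` are hypothetical names; the 1-day and 30-day values come from this description, and treating exactly 30 days as still inside the window is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical outcome type; not part of Data Prepper.
enum RetryAction { BACKOFF_RETRY, DELAY_RETRY, GIVE_UP }

// Hypothetical helper showing the intended branching only.
final class RetryDecision {
    // Delay applied to non-retryable failures (1 day, per the description).
    static final Duration DURATION_TO_DELAY_RETRY = Duration.ofDays(1);
    // Assumed name for the 30-day limit measured from partition creation.
    static final Duration MAX_RETRY_WINDOW = Duration.ofDays(30);

    private RetryDecision() {
    }

    static RetryAction decide(final boolean retryable,
                              final Instant partitionCreationTime,
                              final Instant now) {
        if (retryable) {
            // Retryable failures keep the existing backoff behaviour.
            return RetryAction.BACKOFF_RETRY;
        }
        final Duration partitionAge = Duration.between(partitionCreationTime, now);
        if (partitionAge.compareTo(MAX_RETRY_WINDOW) > 0) {
            // Older than the window: stop retrying this partition.
            return RetryAction.GIVE_UP;
        }
        // Still inside the window: retry again after DURATION_TO_DELAY_RETRY.
        return RetryAction.DELAY_RETRY;
    }
}
```

In the real change, the `DELAY_RETRY` branch would correspond to the saveProgressStateForPartition call mentioned above; the sketch deliberately leaves that call out because its exact signature is not shown here.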

Is this change backward compatible?

Yes. We keep the catch block on the generic "Exception", so all other connectors will still use the default behaviour of backoff retry every 5ms for all exception types.
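
To illustrate why keeping the generic catch preserves the old behaviour, here is a rough sketch of the catch ordering. `CatchOrderSketch`, `FlaggedException`, and the returned strings are hypothetical and only label which path was taken; the actual scheduler code is not reproduced here.

```java
// Hypothetical sketch of the catch ordering; not the actual scheduler code.
final class CatchOrderSketch {

    // Hypothetical stand-in for the shared crawler exception.
    static class FlaggedException extends RuntimeException {
        private final boolean retryable;

        FlaggedException(final String message, final boolean retryable) {
            super(message);
            this.retryable = retryable;
        }

        boolean isRetryable() {
            return retryable;
        }
    }

    private CatchOrderSketch() {
    }

    static String handle(final Runnable work) {
        try {
            work.run();
            return "completed";
        } catch (final FlaggedException e) {
            // New path: only exceptions that carry the flag can opt out of backoff retry.
            return e.isRetryable() ? "backoff-retry" : "delayed-retry";
        } catch (final Exception e) {
            // Unchanged path: any other exception type still gets the default backoff retry.
            return "backoff-retry";
        }
    }
}
```

Because the more specific catch comes first, connectors that never throw the new exception type always fall through to the generic branch and see no change.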

Testing

Unit tests. The commands below ran successfully:

./gradlew :data-prepper-plugins:saas-source-plugins:microsoft-office365-source:test \
--tests "org.opensearch.dataprepper.plugins.source.microsoft_office365.Office365SourceConfigTest"


./gradlew :data-prepper-plugins:saas-source-plugins:source-crawler:test \
--tests "org.opensearch.dataprepper.plugins.source.source_crawler.coordination.scheduler.WorkerSchedulerTest"

./gradlew :data-prepper-plugins:saas-source-plugins:microsoft-office365-source:checkstyleTest  
./gradlew :data-prepper-plugins:saas-source-plugins:source-crawler:checkstyleTest  

Issues Resolved

N/A

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…in workerPartition

Signed-off-by: Vecheka Chhourn <vecheka@amazon.com>