allow retry sagemaker batch job creation for longer time window #6082

Merged
graytaylor0 merged 2 commits into opensearch-project:main from Zhangxunmt:main
Sep 15, 2025

@Zhangxunmt

Collaborator

Description

When the minimum OCU count is set above 5 and S3Scan reads multiple files, the ml processor can trigger throttling on the SageMaker Batch API. This PR adds a retry mechanism for the throttled records: they are retried within a 10-minute window until the throttling clears, which improves the overall success rate.
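
For readers who want a concrete picture of the retry window described above, here is a minimal sketch. It is not the code merged in this PR. The class `ThrottleRetry`, the exception `ThrottledException`, the method `submitWithRetry`, and the backoff parameters are all hypothetical names and values chosen for illustration; only the 10-minute window comes from the description. The clock and the sleep function are passed in so the loop can be exercised without real waiting.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

/** Hypothetical helper for illustration; not the implementation in this PR. */
final class ThrottleRetry {

    /** Hypothetical marker thrown by the submit callback when a call is throttled. */
    static final class ThrottledException extends RuntimeException {
        ThrottledException(String message) {
            super(message);
        }
    }

    /**
     * Submits every record once, then keeps re-submitting only the throttled
     * ones, with capped exponential backoff, until none remain or the window
     * has elapsed.
     *
     * @return the records that were still throttled when the window closed
     */
    static <T> List<T> submitWithRetry(List<T> records,
                                       Consumer<T> submit,
                                       Duration window,
                                       long initialBackoffMillis,
                                       long maxBackoffMillis,
                                       LongSupplier clockMillis,
                                       LongConsumer sleepMillis) {
        long deadline = clockMillis.getAsLong() + window.toMillis();
        long backoff = initialBackoffMillis;
        List<T> pending = new ArrayList<>(records);

        while (true) {
            List<T> throttled = new ArrayList<>();
            for (T record : pending) {
                try {
                    submit.accept(record);
                } catch (ThrottledException e) {
                    // Only throttled records are retried; other failures propagate.
                    throttled.add(record);
                }
            }
            pending = throttled;
            if (pending.isEmpty()) {
                return pending;
            }
            long remaining = deadline - clockMillis.getAsLong();
            if (remaining <= 0) {
                return pending; // window exhausted; caller decides what to do
            }
            // Never sleep past the end of the window.
            sleepMillis.accept(Math.min(backoff, remaining));
            backoff = Math.min(backoff * 2, maxBackoffMillis);
        }
    }
}
```

In production code the clock could be `System::currentTimeMillis` and the sleep function a small wrapper around `Thread.sleep` that handles `InterruptedException`. The window would be `Duration.ofMinutes(10)` to match the description; the initial and maximum backoff values are assumptions that would need tuning against the actual API limits.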

Issues Resolved

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Xun Zhang <xunzh@amazon.com>
Signed-off-by: Xun Zhang <xunzh@amazon.com>
@graytaylor0 graytaylor0 merged commit e1195c1 into opensearch-project:main Sep 15, 2025
44 of 47 checks passed
3 participants