Skip to content

feat(sink/opensearch): Retry DELETE operations on 404 status#5674

Merged
dlvenable merged 2 commits into
opensearch-project:mainfrom
fidelity-contributions:fix-issue-5521
Aug 11, 2025
Merged

feat(sink/opensearch): Retry DELETE operations on 404 status#5674
dlvenable merged 2 commits into
opensearch-project:mainfrom
fidelity-contributions:fix-issue-5521

Conversation

@saketh-pallempati
Copy link
Copy Markdown
Contributor

Description

feat(sink/opensearch): Retry DELETE operations on 404 status

Enhances the BulkRetryStrategy to retry DELETE operations that initially fail with a 404 (Not Found) status. This addresses race conditions where a delete might be processed before its corresponding create in the same bulk request due to eventual consistency.

  • Modified canRetry and createBulkRequestForRetry to identify and allow retries specifically for DELETE + 404.
  • Added unit tests verifying the new retry behavior for DELETE + 404 and ensuring other 404s are retried.

Issues Resolved

Resolves #5521

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

                  Makes the BulkRetryStrategy retry DELETE operations that initially fail with a 404 (Not Found) status. This handles race conditions where a delete might be processed before its corresponding create in the same bulk request due to eventual consistency.

                  - Modified canRetry and createBulkRequestForRetry to identify and allow retries specifically for DELETE + 404.
                  - Added unit tests verifying the new retry behavior for DELETE + 404 and ensuring other 404s are retried.

Signed-off-by: Pallempati Saketh <pallempati.saketh@fmr.com>
Copy link
Copy Markdown
Member

@dlvenable dlvenable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @saketh-pallempati for this contribution!

return true;
if (isItemInError(bulkItemResponse)) {
boolean isGenerallyRetryable = !NON_RETRY_STATUS.contains(bulkItemResponse.status());
boolean isDeleteNotFound = bulkItemResponse.status() == RestStatus.NOT_FOUND.getStatus() &&
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name isDeleteNotFound makes it sound like a delete was not found. Perhaps isDeleteResultingInNotFound would be clearer.

Also, this code is mostly replicated below. Let's have a common method to consolidate this logic. Something like canRetryItem(final BulkItemResponse bulkItemResponse).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactor suggestoins. Will implement them.

boolean isDeleteNotFound = bulkItemResponse.status() == RestStatus.NOT_FOUND.getStatus() &&
bulkItemResponse.operationType() == OperationType.Delete;

if (isGenerallyRetryable || isDeleteNotFound) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the question of max retries from the original issue (see comment):

What if the delete operation is for a document that never existed? This would keep retrying because there is not a create operation in the bulk request. What would it take to see if a create/index operation exists in the current request?

- Add DELETE_404_MAX_RETRIES constant (3 attempts max) to prevent infinite retry loops
- Refactor canRetryItem into focused methods: isDeleteOperationWithNotFoundError, canRetryDeleteNotFoundOperation, isGenerallyRetryableOperation
- Add comprehensive test coverage for DELETE+404 edge cases and mixed operations

This prevents infinite retries when DELETE operations encounter 404 errors due to race conditions in bulk operations, while maintaining backward compatibility and not affecting other operation types.

Signed-off-by: Pallempati Saketh <pallempati.saketh@fmr.com>
@dlvenable dlvenable merged commit 0a06b2a into opensearch-project:main Aug 11, 2025
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Creates and deletes in the same bulk request can result in 404

4 participants