feat(sink/opensearch): Retry DELETE operations on 404 status#5674
Conversation
Makes the BulkRetryStrategy retry DELETE operations that initially fail with a 404 (Not Found) status. This handles race conditions where a delete might be processed before its corresponding create in the same bulk request due to eventual consistency.
- Modified canRetry and createBulkRequestForRetry to identify and allow retries specifically for DELETE + 404.
- Added unit tests verifying the new retry behavior for DELETE + 404 and ensuring other 404s are retried.
Signed-off-by: Pallempati Saketh <pallempati.saketh@fmr.com>
dlvenable
left a comment
There was a problem hiding this comment.
Thank you @saketh-pallempati for this contribution!
| return true; | ||
| if (isItemInError(bulkItemResponse)) { | ||
| boolean isGenerallyRetryable = !NON_RETRY_STATUS.contains(bulkItemResponse.status()); | ||
| boolean isDeleteNotFound = bulkItemResponse.status() == RestStatus.NOT_FOUND.getStatus() && |
There was a problem hiding this comment.
The variable name isDeleteNotFound makes it sound like a delete was not found. Perhaps isDeleteResultingInNotFound would be clearer.
Also, this code is mostly replicated below. Let's have a common method to consolidate this logic. Something like canRetryItem(final BulkItemResponse bulkItemResponse).
There was a problem hiding this comment.
Thanks for the refactor suggestoins. Will implement them.
| boolean isDeleteNotFound = bulkItemResponse.status() == RestStatus.NOT_FOUND.getStatus() && | ||
| bulkItemResponse.operationType() == OperationType.Delete; | ||
|
|
||
| if (isGenerallyRetryable || isDeleteNotFound) { |
There was a problem hiding this comment.
Related to the question of max retries from the original issue (see comment):
What if the delete operation is for a document that never existed? This would keep retrying because there is not a create operation in the bulk request. What would it take to see if a create/index operation exists in the current request?
- Add DELETE_404_MAX_RETRIES constant (3 attempts max) to prevent infinite retry loops - Refactor canRetryItem into focused methods: isDeleteOperationWithNotFoundError, canRetryDeleteNotFoundOperation, isGenerallyRetryableOperation - Add comprehensive test coverage for DELETE+404 edge cases and mixed operations This prevents infinite retries when DELETE operations encounter 404 errors due to race conditions in bulk operations, while maintaining backward compatibility and not affecting other operation types. Signed-off-by: Pallempati Saketh <pallempati.saketh@fmr.com>
Description
feat(sink/opensearch): Retry DELETE operations on 404 status
Enhances the
BulkRetryStrategyto retry DELETE operations that initially fail with a 404 (Not Found) status. This addresses race conditions where a delete might be processed before its corresponding create in the same bulk request due to eventual consistency.canRetryandcreateBulkRequestForRetryto identify and allow retries specifically forDELETE + 404.DELETE + 404and ensuring other 404s are retried.Issues Resolved
Resolves #5521
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.