Commit 4dfec8d
The original problem, reported by one customer, was that they received
non-retryable errors during CompleteMPU calls. In that case, their client
would logically follow up with a call to the AbortMPU API.
We made a mistake in the last bugfix:
- We incorrectly assumed that the error was caused by a failure during
the extra part deletion, and fixed it by cleaning up everything during AbortMPU.
But this is wrong: when an error happens in this step of the CompleteMPU API,
it means the S3 object was properly created AND the overview keys and parts
metadata were already cleaned up. Aborting in such a case would just return
a 404 error and do nothing else. The state that the customer actually
encountered was different: they failed at the `deletePartsMetadata` step.
This is a case we support with the `CompleteMultiPartUpload` retry logic.
The cleanup in the AbortMPU API is actually useful: when we end up in this
situation, the client has two available (and valid) choices:
- They retry the CompleteMultiPartUpload.
- They call the AbortMPU API.
The latter is now possible, since the cleanup was introduced. In this case, we
detect that the object is in an inconsistent state and proceed with the
removal of everything (both in the S3 bucket and the mpuShadowBucket), so
the result is consistent: the MPU is aborted and the customer must re-upload
the whole object.
The first option is preferred, as it keeps the object in the bucket and avoids
the need to re-upload.
The problem is that we do not guarantee that a retryable error is
returned to the client: in case of a metadata (MD) update failure, the
returned error might not be retryable, as we can get DeleteConflict or
NoSuchKey errors. Other errors are usually retryable (InternalError, etc.).
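
From the client's side, the choice between the two options can be sketched as
below. This is a minimal illustration under stated assumptions: the helper
`nextActionAfterCompleteFailure` and the `RETRYABLE_CODES` set are hypothetical,
and the only error codes used are the ones named in this message.

```javascript
// Hypothetical client-side helper; not part of any SDK or of the server code.
// Assumption: the client classifies failures by their S3 error code.
const RETRYABLE_CODES = new Set(['InternalError']);

// Decide what to do after a failed CompleteMultiPartUpload call.
// attempt is the 1-based number of the call that just failed.
function nextActionAfterCompleteFailure(errorCode, attempt, maxAttempts) {
    if (RETRYABLE_CODES.has(errorCode) && attempt < maxAttempts) {
        // Preferred path: retrying keeps the object and avoids a re-upload.
        return 'retry-complete';
    }
    // Non-retryable error (e.g. DeleteConflict, NoSuchKey) or retries
    // exhausted: fall back to AbortMPU and re-upload the whole object.
    return 'abort';
}
```

With this classification, a DeleteConflict pushes the client straight to the
abort path, which is why the server should avoid returning it at this step.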
So we want to ensure two behaviors:
- In the batchDeleteObjectMetadata error callback, we should always
return a retryable error. This means that in case of a DeleteConflict, we
should always retry, and in case of a NoSuchKey, we should not retry.
- During batchDeleteExtraParts, we have already cleaned up the mpu bucket,
so returning an error to the client if the operation fails would wrongly
lead to a CompleteMPU retry or an Abort, even though the operation succeeded
and we only created ghosts. In any case, the ghosts are permanent: without
the mpu bucket, we cannot clean them up in subsequent calls.
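
The two behaviors above can be sketched as follows. This is a hedged
illustration, not the code from this commit: the function names and the plain
error objects are hypothetical, and two points are assumptions rather than
facts from this message: that NoSuchKey is treated as "nothing left to do",
and that a failed extra parts deletion is logged and reported as success.

```javascript
// Hypothetical sketch; names and error shapes are illustrative only and do
// not mirror the actual server code.

// Behavior 1: a failure while deleting the parts metadata must never reach
// the client as a non-retryable error.
function mapPartsMetadataDeleteError(err) {
    if (!err) {
        return null;
    }
    if (err.code === 'NoSuchKey') {
        // Assumption: the metadata is already gone, so there is nothing
        // to retry and no error is returned.
        return null;
    }
    // DeleteConflict and any other failure are surfaced as a retryable
    // InternalError so that the client retries CompleteMultiPartUpload.
    return { code: 'InternalError', retryable: true, cause: err.code };
}

// Behavior 2: the mpu bucket is already cleaned up at this point, so a
// failure here only leaves ghosts behind. Assumption: we log it and do not
// return an error, since the completion itself succeeded.
function mapExtraPartsDeleteError(err, log) {
    if (err) {
        log(`extra parts deletion failed, ghosts may remain: ${err.code}`);
    }
    return null;
}
```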
Issue: CLDSRV-6691
parent 34ee809, commit 4dfec8d
1 file changed
Lines changed: 52 additions & 3 deletions