fix(storage): address multipart upload thread safety and file-size crashes#4203
Draft
harsh62 wants to merge 1 commit into
Draft
fix(storage): address multipart upload thread safety and file-size crashes#4203harsh62 wants to merge 1 commit into
harsh62 wants to merge 1 commit into
Conversation
…ashes (#4138) Serialize all access to StorageMultipartUploadSession.multipartUpload on its existing serial queue, so handlers read and mutate the state under the lock and then branch on a captured snapshot. Replaces the fatalError sites that fire when a concurrent cancel/pause swaps the state between the sync block and a subsequent re-read with thrown StorageError via the existing fail path. Make FileSystem.getFileSize throw instead of Fatal.require so a missing source file (e.g. user deleted the video between selection and upload) surfaces as a StorageError through the upload task rather than crashing. Adds StorageMultipartUploadSafetyTests covering: - concurrent cancel + failed-part event (reproduces the Invalid state crash) - concurrent fail + handle data race (TSan-only reliable repro) - getFileSize on missing file - UploadSource.local with missing source propagates error
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4203 +/- ##
==========================================
- Coverage 68.12% 66.92% -1.20%
==========================================
Files 1150 1091 -59
Lines 56035 41599 -14436
==========================================
- Hits 38173 27841 -10332
+ Misses 17862 13758 -4104
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the three crashes reported in #4138 on
AWSS3StoragePluginv2.53.3.StorageMultipartUploadSession.multipartUploadon the existing serial queue.handle(multipartUploadEvent:),handle(uploadPartEvent:),retryPartUpload,restart,createSubTask, anduploadPartsnow read and mutate the state under the lock, then branch on a captured snapshot instead of re-reading the property after the lock releases. Eliminates the torn-read race that surfaced asswift_retaininStorageMultipartUpload.uploadId.getter(Crash 2) and thefatalError(\"Invalid state\")atStorageMultipartUploadSession.swift:350(Crash 1).fatalErrorsites that fire when a concurrent cancel/pause swaps the state between the sync block and a subsequent re-read with a thrownFailure.invalidStateTransitionrouted through the existingfail(error:)path, so the upload surfaces aStorageErrorinstead of crashing the process.FileSystem.getFileSizethrow instead ofFatal.requireand propagate throughUploadSource.getFile(which is alreadythrowsand already wrapped inattempt(..., fail:)at the upload call sites). Fixes Crash 3 where deleting the source file between selection and upload start crashed the app.Test plan
TDD-first: regression tests reproduce each crash against unfixed code, then turn green after the fix.
New
StorageMultipartUploadSafetyTests:testConcurrentCancelAndFailedPartEvent_doesNotCrash— reproduces Crash 1 (fatalError(\"Invalid state\")) against unfixed code; passes after fixtestConcurrentFailAndHandle_doesNotRace— stress test for Crash 2; reliable repro under Thread Sanitizer (documented in the test;swift test --sanitize=threadis broken on Xcode 26 CLI, run with the Xcode scheme diagnostic)testGetFileSize_missingFile_throwsInsteadOfCrashing— reproduces Crash 3 against unfixed code; passes after fixtestUploadSource_localSource_missingFile_propagatesError— end-to-end variant of Crash 3Regression suite:
StorageMultipartUploadSessionTestspass (includingtestProgressStallTimeoutWhenNoProgressFailsWithStallErroradded in feat(storage): add progress stall timeout for S3 uploads #4162)testConcurrentFailAndHandle_doesNotRacefrom Xcode UI on reviewer's machineCloses #4138