Skip to content

fix(storage): address multipart upload thread safety and file-size crashes#4203

Draft
harsh62 wants to merge 1 commit into
mainfrom
fix/multipart-upload-thread-safety
Draft

fix(storage): address multipart upload thread safety and file-size crashes#4203
harsh62 wants to merge 1 commit into
mainfrom
fix/multipart-upload-thread-safety

Conversation

@harsh62
Copy link
Copy Markdown
Member

@harsh62 harsh62 commented May 4, 2026

Summary

Fixes the three crashes reported in #4138 on AWSS3StoragePlugin v2.53.3.

  • Serialize all access to StorageMultipartUploadSession.multipartUpload on the existing serial queue. handle(multipartUploadEvent:), handle(uploadPartEvent:), retryPartUpload, restart, createSubTask, and uploadParts now read and mutate the state under the lock, then branch on a captured snapshot instead of re-reading the property after the lock releases. Eliminates the torn-read race that surfaced as swift_retain in StorageMultipartUpload.uploadId.getter (Crash 2) and the fatalError(\"Invalid state\") at StorageMultipartUploadSession.swift:350 (Crash 1).
  • Replace three fatalError sites that fire when a concurrent cancel/pause swaps the state between the sync block and a subsequent re-read with a thrown Failure.invalidStateTransition routed through the existing fail(error:) path, so the upload surfaces a StorageError instead of crashing the process.
  • Make FileSystem.getFileSize throw instead of Fatal.require and propagate through UploadSource.getFile (which is already throws and already wrapped in attempt(..., fail:) at the upload call sites). Fixes Crash 3 where deleting the source file between selection and upload start crashed the app.
  • Preserves the progress-stall-timer plumbing from feat(storage): add progress stall timeout for S3 uploads #4162 (reset/cancel calls intact across all state transitions).

Test plan

TDD-first: regression tests reproduce each crash against unfixed code, then turn green after the fix.

New StorageMultipartUploadSafetyTests:

  • testConcurrentCancelAndFailedPartEvent_doesNotCrash — reproduces Crash 1 (fatalError(\"Invalid state\")) against unfixed code; passes after fix
  • testConcurrentFailAndHandle_doesNotRace — stress test for Crash 2; reliable repro under Thread Sanitizer (documented in the test; swift test --sanitize=thread is broken on Xcode 26 CLI, run with the Xcode scheme diagnostic)
  • testGetFileSize_missingFile_throwsInsteadOfCrashing — reproduces Crash 3 against unfixed code; passes after fix
  • testUploadSource_localSource_missingFile_propagatesError — end-to-end variant of Crash 3

Regression suite:

  • All existing StorageMultipartUploadSessionTests pass (including testProgressStallTimeoutWhenNoProgressFailsWithStallError added in feat(storage): add progress stall timeout for S3 uploads #4162)
  • Full Storage unit-test suite: 1018 passed, 0 failed
  • TSan run of testConcurrentFailAndHandle_doesNotRace from Xcode UI on reviewer's machine

Closes #4138

…ashes (#4138)

Serialize all access to StorageMultipartUploadSession.multipartUpload on its
existing serial queue, so handlers read and mutate the state under the lock
and then branch on a captured snapshot. Replaces the fatalError sites that
fire when a concurrent cancel/pause swaps the state between the sync block
and a subsequent re-read with thrown StorageError via the existing fail path.

Make FileSystem.getFileSize throw instead of Fatal.require so a missing
source file (e.g. user deleted the video between selection and upload)
surfaces as a StorageError through the upload task rather than crashing.

Adds StorageMultipartUploadSafetyTests covering:
- concurrent cancel + failed-part event (reproduces the Invalid state crash)
- concurrent fail + handle data race (TSan-only reliable repro)
- getFileSize on missing file
- UploadSource.local with missing source propagates error
@harsh62 harsh62 requested a review from a team as a code owner May 4, 2026 13:19
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 had a problem deploying to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Failure
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 had a problem deploying to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Failure
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Inactive
@harsh62 harsh62 had a problem deploying to IntegrationTest May 4, 2026 13:19 — with GitHub Actions Failure
@harsh62 harsh62 marked this pull request as draft May 4, 2026 13:32
@codecov
Copy link
Copy Markdown

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 87.17949% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.92%. Comparing base (1a430e3) to head (cbc9898).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
...pport/Internal/StorageMultipartUploadSession.swift 87.50% 9 Missing ⚠️
...SS3StoragePlugin/Support/Internal/FileSystem.swift 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4203      +/-   ##
==========================================
- Coverage   68.12%   66.92%   -1.20%     
==========================================
  Files        1150     1091      -59     
  Lines       56035    41599   -14436     
==========================================
- Hits        38173    27841   -10332     
+ Misses      17862    13758    -4104     
Flag Coverage Δ
API_plugin_unit_test 68.42% <ø> (-1.68%) ⬇️
AWSPluginsCore ?
Amplify 47.50% <ø> (-1.14%) ⬇️
Amplify_Foundation_Bridge_unit_test 62.28% <ø> (+0.18%) ⬆️
Amplify_Foundation_unit_test 67.64% <ø> (-2.46%) ⬇️
Analytics_plugin_unit_test 83.43% <ø> (-0.96%) ⬇️
Auth_plugin_unit_test 72.42% <ø> (-1.21%) ⬇️
DataStore_plugin_unit_test 82.81% <ø> (-1.55%) ⬇️
Firehose_plugin_unit_test 53.15% <ø> (-1.08%) ⬇️
Geo_plugin_unit_test 73.39% <ø> (-2.56%) ⬇️
Kinesis_plugin_unit_test 52.17% <ø> (-1.41%) ⬇️
Logging_plugin_unit_test 64.86% <ø> (-0.70%) ⬇️
Predictions_plugin_unit_test 33.89% <ø> (-0.36%) ⬇️
PushNotifications_plugin_unit_test 85.66% <ø> (-1.46%) ⬇️
RecordCache_unit_test 76.40% <ø> (-3.79%) ⬇️
Storage_plugin_unit_test 79.17% <87.17%> (-0.13%) ⬇️
unit_tests 66.92% <87.17%> (-1.20%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 14:28 — with GitHub Actions Inactive
@harsh62 harsh62 temporarily deployed to IntegrationTest May 4, 2026 14:28 — with GitHub Actions Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multipart upload crashes persist on 2.53.3 (thread-safety in StorageMultipartUploadSession)

1 participant