fix: throttle retriable transport upload failures #770
Detect retriable transport errors via a shared Swift detector and apply throttling for message/alias retry paths while preserving existing HTTP retry ordering.
Remove the dedicated transport-throttle helper and handle transport retry throttling inline for message and alias uploads to keep the retry flow straightforward.
PR Summary (Cursor Bugbot, commit 5b13232): Medium Risk. Standardizes connector timeout errors to a shared domain/code.
📦 SDK Size Impact Report
Measures how much the SDK adds to an app's size (with-SDK minus without-SDK).
➡️ SDK size impact change is minimal.
Raw measurements
Target branch (main): {"baseline_app_size_kb":84,"baseline_executable_size_bytes":75464,"with_sdk_app_size_kb":1896,"with_sdk_executable_size_bytes":76312,"sdk_impact_kb":1812,"sdk_executable_impact_bytes":848,"xcframework_size_kb":6596}
This PR: {"baseline_app_size_kb":84,"baseline_executable_size_bytes":75464,"with_sdk_app_size_kb":1900,"with_sdk_executable_size_bytes":76312,"sdk_impact_kb":1816,"sdk_executable_impact_bytes":848,"xcframework_size_kb":6604}
Keep transport throttling focused on host/connectivity failures by excluding not-connected and timed-out NSURL errors, and update tests to lock in the new retry classification behavior.
BrandonStalnaker
left a comment
The code looks solid to me, but I'm somewhat worried the change is a bit broad. For a few of these error types we would want to retry sooner than 2 hours after the failure (I think that's the minimum, right?). I'd like another opinion before I approve.
NSURLErrorCannotFindHost,
NSURLErrorCannotConnectToHost,
NSURLErrorNetworkConnectionLost,
NSURLErrorDNSLookupFailed,
NSURLErrorCannotLoadFromNetwork,
NSURLErrorSecureConnectionFailed,
NSURLErrorInternationalRoamingOff,
NSURLErrorDataNotAllowed,
NSURLErrorCallIsActive,
NSURLErrorAppTransportSecurityRequiresSecureConnection:
jamesnrokt
left a comment
Requesting changes based on the above comments.
My core concern: as it stands, any retriable transport error puts the SDK into a 2-hour upload freeze with no recovery hook, because minUploadDate is only checked, never cleared early.
Even a brief connectivity drop costs 2 hours of buffered events. Worse, because every client uses the same hardcoded default with no jitter, a brief endpoint blip triggers a synchronized thundering herd of requests hitting our servers 2 hours later.
Use the SDK logger in throttleWithHTTPResponse to align with the new logging path and ensure custom logger handling is applied consistently.
Add a Swift NSDictionary extension for parsing Retry-After seconds and cover NSNumber, String, invalid, and missing header scenarios with unit tests.
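The four header scenarios named in this commit (NSNumber, String, invalid, missing) can be sketched as follows. This is a minimal illustration, not the PR's code: the function name `retryAfterSeconds(from:)` is hypothetical, the exact-case `"Retry-After"` key lookup is an assumption, and rejecting negative or non-finite values is a choice made for this sketch.

```swift
import Foundation

/// Hypothetical helper (not the PR's actual API): reads Retry-After as seconds.
/// Returns nil when the header is missing or cannot be read as a non-negative number.
func retryAfterSeconds(from headers: [AnyHashable: Any]) -> TimeInterval? {
    // Assumption: the key is stored with this exact casing.
    guard let raw = headers["Retry-After"] else { return nil }

    let seconds: TimeInterval?
    switch raw {
    case let number as NSNumber:
        seconds = number.doubleValue
    case let text as String:
        seconds = TimeInterval(text.trimmingCharacters(in: .whitespaces))
    default:
        seconds = nil
    }

    // Sketch-level policy: treat negative or non-finite values as invalid.
    guard let value = seconds, value.isFinite, value >= 0 else { return nil }
    return value
}
```

Returning an optional leaves the fallback interval to the caller, so a missing header and an unparsable header are handled the same way at the call site.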
Use dictionary retry-after helpers for date and seconds parsing in throttling logic and add Swift tests for both parsing paths.
Move retry-after interval calculation into a dedicated Swift network helper and cover the helper behavior with unit tests for seconds and date-based headers.
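A rough sketch of what such an interval calculation might look like for both header forms. The function name `retryInterval(forRetryAfter:now:)` is hypothetical, and the sketch assumes date-based values use the HTTP IMF-fixdate form (for example `Thu, 01 Jan 1970 00:02:00 GMT`); the PR's helper may accept other formats.

```swift
import Foundation

/// Hypothetical helper: converts a Retry-After value into a wait interval.
/// Accepts delay-seconds ("120") or an HTTP date in IMF-fixdate form.
func retryInterval(forRetryAfter value: String, now: Date = Date()) -> TimeInterval? {
    let trimmed = value.trimmingCharacters(in: .whitespaces)

    // Seconds form: use the number directly, never below zero.
    if let seconds = TimeInterval(trimmed), seconds.isFinite {
        return max(0, seconds)
    }

    // Date form: measure the distance from `now`, never below zero.
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX")
    formatter.timeZone = TimeZone(secondsFromGMT: 0)
    formatter.dateFormat = "EEE, dd MMM yyyy HH:mm:ss 'GMT'"
    guard let date = formatter.date(from: trimmed) else { return nil }
    return max(0, date.timeIntervalSince(now))
}
```

Passing `now` as a parameter keeps the date-based path deterministic in unit tests.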
Add transport-error retry backoff state to the detector, route retriable transport failures through a dedicated throttling path, and reset the counter after successful uploads.
Unify throttling into one method that accepts retryAfter and compute retry intervals at each call site for HTTP response and transport error paths.
Use a local shared instance reference in throttleWithRetryAfter to keep logger and state machine access consistent and avoid repeated singleton lookups.
Promote the connector semaphore timeout domain/code to shared constants and reuse them in connector timeout errors. Restore transport detector handling for the SDK timeout signal and align unit expectations with the updated behavior.
Classify NSURLErrorNotConnectedToInternet and NSURLErrorTimedOut as retriable transport failures and update detector tests to match the restored behavior.
Guard against empty retry schedules and clamp the calculated backoff index to the last available entry to avoid out-of-bounds access if constants drift.
Lower transport retry intervals to a 5s/15s/60s/120s/300s progression and cap at 5 minutes to avoid long upload freezes after transient connectivity failures.
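The schedule lookup described in the last two commits (guard against an empty schedule, clamp the index, cap at 5 minutes) could look roughly like this. The names `transportRetrySchedule` and `transportRetryInterval(failureCount:schedule:)` are hypothetical, and treating `failureCount` as a 1-based count of consecutive failures is an assumption.

```swift
import Foundation

// Progression from the commit message: 5s, 15s, 60s, 120s, 300s.
let transportRetrySchedule: [TimeInterval] = [5, 15, 60, 120, 300]

/// Hypothetical lookup: maps a consecutive-failure count to a wait interval.
/// Returns nil for an empty schedule; counts past the end reuse the last entry.
func transportRetryInterval(
    failureCount: Int,
    schedule: [TimeInterval] = transportRetrySchedule
) -> TimeInterval? {
    guard let last = schedule.last else { return nil }
    // Assumption: the first failure (count 1) maps to index 0.
    let index = max(0, failureCount - 1)
    return index < schedule.count ? schedule[index] : last
}
```

Because the cap comes from the last schedule entry rather than a separate constant, changing the schedule cannot push the index out of bounds.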
Exclude ATS secure-connection failures from retriable transport errors with matching test coverage, and update OneTrust vendor details call to the current vendorId selector.
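For orientation, the classification after these commits might be sketched as below. The code set is an assumption pieced together from the reviewed list and the commit messages in this thread: it includes not-connected and timed-out, and it reads "ATS secure-connection failures" as `appTransportSecurityRequiresSecureConnection` only. The function name is hypothetical, and the SDK's own connector timeout signal is not modeled.

```swift
import Foundation

// Assumed set of retriable codes; the PR's MPTransportErrorDetector may differ.
let retriableTransportCodes: Set<URLError.Code> = [
    .cannotFindHost,
    .cannotConnectToHost,
    .networkConnectionLost,
    .dnsLookupFailed,
    .cannotLoadFromNetwork,
    .secureConnectionFailed,
    .internationalRoamingOff,
    .dataNotAllowed,
    .callIsActive,
    .notConnectedToInternet,
    .timedOut,
]

/// Hypothetical check: true only for NSURLErrorDomain errors in the set above.
func isRetriableTransportError(_ error: Error) -> Bool {
    let nsError = error as NSError
    guard nsError.domain == NSURLErrorDomain else { return false }
    return retriableTransportCodes.contains(URLError.Code(rawValue: nsError.code))
}
```

Checking the domain first prevents errors from other domains that happen to share a numeric code from being throttled.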
Switch the OneTrust vendor consent lookup to the exported getVendorDetailsWithVendorID:for: selector so kit builds succeed with the current SDK headers.
Prefix retry-after category methods with mp_ to avoid Objective-C selector collisions in host apps, and update helper and tests to use the namespaced API.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 9ef2c1c.

Background
Some transport/network failures can produce excessive retry traffic when uploads cannot reach mParticle endpoints. This update introduces explicit transport-error detection and keeps throttling behavior aligned with existing retry flow.
What Has Changed
MPTransportErrorDetector to centralize retriable transport error classification.
Screenshots/Video
N/A
Checklist
Additional Notes