Skip to content

fix: throttle retriable transport upload failures#770

Merged
denischilik merged 21 commits into
mainfrom
fix/retry-transport-detection
May 21, 2026
Merged

fix: throttle retriable transport upload failures#770
denischilik merged 21 commits into
mainfrom
fix/retry-transport-detection

Conversation

@denischilik
Copy link
Copy Markdown
Contributor

@denischilik denischilik commented May 14, 2026

Background

Some transport/network failures can produce excessive retry traffic when uploads cannot reach mParticle endpoints. This update introduces explicit transport-error detection and keeps throttling behavior aligned with existing retry flow.

What Has Changed

  • Added a Swift MPTransportErrorDetector to centralize retriable transport error classification.
  • Applied transport-aware throttling in both message and alias upload retry paths.
  • Kept existing 429/503 handling order while adding transport-failure throttling in the non-success retry branch.
  • Added unit tests for transport error detection and wired them into the iOS test target.

Screenshots/Video

N/A

Checklist

  • I have performed a self-review of my own code.
  • I have made corresponding changes to the documentation.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have tested this locally.

Additional Notes

Detect retriable transport errors via a shared Swift detector and apply throttling for message/alias retry paths while preserving existing HTTP retry ordering.
Remove the dedicated transport-throttle helper and handle transport retry throttling inline for message and alias uploads to keep the retry flow straightforward.
@denischilik denischilik requested a review from a team as a code owner May 14, 2026 17:26
@cursor
Copy link
Copy Markdown

cursor Bot commented May 14, 2026

PR Summary

Medium Risk
Changes retry/throttling behavior for message and alias uploads by introducing transport-error classification and backoff; mistakes could delay uploads or alter retry volume under failure conditions.

Overview
Reduces retry traffic during network/transport failures by adding transport-aware throttling for message and alias uploads, using a new MPTransportErrorDetector backoff schedule and resetting the backoff on successful requests.

Refactors Retry-After header parsing into Swift utilities (NSDictionary+MPRetryAfter and MPNetworkCommunicationHelper) and updates ObjC upload paths to use the shared calculation for 429/503 throttling.

Standardizes connector timeout errors to a shared domain/code exposed by MPTransportErrorDetector, and adds unit tests covering transport error classification, backoff timing, and Retry-After parsing; wires the new tests into the Xcode project.

Reviewed by Cursor Bugbot for commit 5b13232. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 14, 2026

📦 SDK Size Impact Report

Measures how much the SDK adds to an app's size (with-SDK minus without-SDK).

Metric Target Branch This PR Change
App Bundle Impact 1.77 MB 1.77 MB +4 KB
Executable Impact 848 bytes 848 bytes +N/A
XCFramework Size 6.44 MB 6.45 MB +8 KB

➡️ SDK size impact change is minimal.

Raw measurements

Target branch (main):

{"baseline_app_size_kb":84,"baseline_executable_size_bytes":75464,"with_sdk_app_size_kb":1896,"with_sdk_executable_size_bytes":76312,"sdk_impact_kb":1812,"sdk_executable_impact_bytes":848,"xcframework_size_kb":6596}

This PR:

{"baseline_app_size_kb":84,"baseline_executable_size_bytes":75464,"with_sdk_app_size_kb":1900,"with_sdk_executable_size_bytes":76312,"sdk_impact_kb":1816,"sdk_executable_impact_bytes":848,"xcframework_size_kb":6604}

Comment thread mParticle-Apple-SDK-Swift/Sources/Network/MPTransportErrorDetector.swift Outdated
Keep transport throttling focused on host/connectivity failures by excluding not-connected and timed-out NSURL errors, and update tests to lock in the new retry classification behavior.
Copy link
Copy Markdown
Collaborator

@BrandonStalnaker BrandonStalnaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks solid to me. I'm somewhat worried the change is a bit broad. I'm worried that for a few of these error types we would want to retry sooner than 2 hours from the failure (I think thats the minimum right). I'd like anothers opinion before I'll approve.
NSURLErrorCannotFindHost,
NSURLErrorCannotConnectToHost,
NSURLErrorNetworkConnectionLost,
NSURLErrorDNSLookupFailed,
NSURLErrorCannotLoadFromNetwork,
NSURLErrorSecureConnectionFailed,
NSURLErrorInternationalRoamingOff,
NSURLErrorDataNotAllowed,
NSURLErrorCallIsActive,
NSURLErrorAppTransportSecurityRequiresSecureConnection:

thomson-t
thomson-t previously approved these changes May 15, 2026
Comment thread mParticle-Apple-SDK-Swift/Sources/Network/MPTransportErrorDetector.swift Outdated
Comment thread mParticle-Apple-SDK/Network/MPNetworkCommunication.m Outdated
Comment thread mParticle-Apple-SDK-Swift/Sources/Network/MPTransportErrorDetector.swift Outdated
Copy link
Copy Markdown
Collaborator

@jamesnrokt jamesnrokt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes based on the above comments.

My core concern: as it stands, any retriable transport error puts the SDK into a 2-hour upload freeze with no recovery hook as minUploadDate is only checked, never cleared early.

Even a brief connectivity drop costs 2 hours of buffered events. Worse, because every client uses the same hardcoded default with no jitter, a brief endpoint blip triggers a synchronized thundering herd of requests hitting our servers 2 hours later.

Use the SDK logger in throttleWithHTTPResponse to align with the new logging path and ensure custom logger handling is applied consistently.
Add a Swift NSDictionary extension for parsing Retry-After seconds and cover NSNumber, String, invalid, and missing header scenarios with unit tests.
Use dictionary retry-after helpers for date and seconds parsing in throttling logic and add Swift tests for both parsing paths.
Move retry-after interval calculation into a dedicated Swift network helper and cover the helper behavior with unit tests for seconds and date-based headers.
Add transport-error retry backoff state to the detector, route retriable transport failures through a dedicated throttling path, and reset the counter after successful uploads.
Unify throttling into one method that accepts retryAfter and compute retry intervals at each call site for HTTP response and transport error paths.
Use a local shared instance reference in throttleWithRetryAfter to keep logger and state machine access consistent and avoid repeated singleton lookups.
Comment thread mParticle-Apple-SDK-Swift/Sources/Network/MPTransportErrorDetector.swift Outdated
Comment thread mParticle-Apple-SDK-Swift/Sources/Network/MPTransportErrorDetector.swift Outdated
Comment thread mParticle-Apple-SDK/Network/MPNetworkCommunication.m
Comment thread mParticle-Apple-SDK/Network/MPNetworkCommunication.m
Promote the connector semaphore timeout domain/code to shared constants and reuse them in connector timeout errors. Restore transport detector handling for the SDK timeout signal and align unit expectations with the updated behavior.
Classify NSURLErrorNotConnectedToInternet and NSURLErrorTimedOut as retriable transport failures and update detector tests to match the restored behavior.
Guard against empty retry schedules and clamp the calculated backoff index to the last available entry to avoid out-of-bounds access if constants drift.
Lower transport retry intervals to a 5s/15s/60s/120s/300s progression and cap at 5 minutes to avoid long upload freezes after transient connectivity failures.
Comment thread mParticle-Apple-SDK-Swift/Sources/Network/MPTransportErrorDetector.swift Outdated
Comment thread mParticle-Apple-SDK-Swift/Sources/Network/MPTransportErrorDetector.swift Outdated
Exclude ATS secure-connection failures from retriable transport errors with matching test coverage, and update OneTrust vendor details call to the current vendorId selector.
Switch the OneTrust vendor consent lookup to the exported getVendorDetailsWithVendorID:for: selector so kit builds succeed with the current SDK headers.
Prefix retry-after category methods with mp_ to avoid Objective-C selector collisions in host apps, and update helper and tests to use the namespaced API.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 9ef2c1c. Configure here.

Copy link
Copy Markdown
Collaborator

@jamesnrokt jamesnrokt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments but will leave up to you depending on urgency

Comment thread mParticle-Apple-SDK/Network/MPConnector.m
@denischilik denischilik merged commit ed6ff9d into main May 21, 2026
72 of 74 checks passed
@denischilik denischilik deleted the fix/retry-transport-detection branch May 21, 2026 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants