[fix] Distinguish client cancellation from server errors during slice upload#290
Conversation
… upload Client-side timeouts (per-attempt CancellationTokenSource) and user cancellations were converted to a synthetic HTTP 500 by the upload strategies, which caused the adaptive upload controller to repeatedly reset the chunk size to 500 KB and masked the real network condition. Introduce UploadFailureKind to distinguish ServerError, ClientCancellation and ClientTimeout, and stop downscaling the chunk size on client-side cancellations. Raise the per-slice attempt timeout floor from 30s to 60s so a single intermittent slice no longer kills the whole upload on a degraded network. Made-with: Cursor
There was a problem hiding this comment.
Pull request overview
This PR improves upload resilience by explicitly classifying upload failures (server error vs client cancellation/timeout) and preventing client-driven cancellations/timeouts from being interpreted as bandwidth/server issues by the adaptive upload controller.
Changes:
- Introduces
UploadFailureKindand carries it throughUploadFileResponse, upload strategies, worker refinement, and adaptive controller recording. - Adds shared
UploadFailureClassifierand updates Azure/Cloudflare upload strategies to classify cancellations distinctly. - Adjusts per-attempt timeout computation (new named constants; 60s floor / 120s ceiling) and adds/extends unit tests across the touched components.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/ByteSync.Client.UnitTests/Services/Communications/Transfers/Uploading/FileUploadWorkerTests.cs | Adds assertions that failureKind is plumbed to the adaptive controller correctly. |
| tests/ByteSync.Client.UnitTests/Services/Communications/Transfers/Uploading/FileUploadWorkerInternalsTests.cs | New tests for attempt-timeout math and cancellation-kind refinement logic. |
| tests/ByteSync.Client.UnitTests/Services/Communications/Transfers/Uploading/AdaptiveUploadControllerTests.cs | Adds regression coverage ensuring client failures don’t reset chunk size. |
| tests/ByteSync.Client.UnitTests/Services/Communications/Transfers/Strategies/UploadFailureClassifierTests.cs | New tests for shared failure classification behavior. |
| tests/ByteSync.Client.UnitTests/Services/Communications/Transfers/Strategies/CloudflareR2UploadStrategyTests.cs | New tests validating failureKind classification and status handling for R2 uploads. |
| tests/ByteSync.Client.UnitTests/Common/UploadFileResponseTests.cs | New tests validating UploadFileResponse constructors and default FailureKind. |
| tests/ByteSync.Client.IntegrationTests/TestHelpers/FixedAdaptiveUploadController.cs | Updates test helper to match new RecordUploadResult signature. |
| src/ByteSync.Common/Business/Communications/Transfers/UploadFileResponse.cs | Adds FailureKind field and constructors for client cancellation/timeout responses. |
| src/ByteSync.Common/Business/Communications/Transfers/UploadFailureKind.cs | New enum defining failure categories. |
| src/ByteSync.Client/Services/Communications/Transfers/Uploading/FileUploadWorker.cs | Refines failureKind based on which CTS canceled; updates attempt timeout computation. |
| src/ByteSync.Client/Services/Communications/Transfers/Uploading/AdaptiveUploadController.cs | Accepts failureKind and short-circuits bandwidth reset for client-side failures. |
| src/ByteSync.Client/Services/Communications/Transfers/Strategies/UploadFailureClassifier.cs | New shared classifier for mapping exceptions + token state to UploadFileResponse. |
| src/ByteSync.Client/Services/Communications/Transfers/Strategies/CloudflareR2UploadStrategy.cs | Uses shared classifier and distinguishes caller-cancellation path. |
| src/ByteSync.Client/Services/Communications/Transfers/Strategies/AzureBlobStorageUploadStrategy.cs | Uses shared classifier and distinguishes caller-cancellation path. |
| src/ByteSync.Client/Interfaces/Controls/Communications/IAdaptiveUploadController.cs | Adds failureKind parameter (defaulted) to RecordUploadResult. |
| src/ByteSync.Client/ByteSync.Client.csproj | Exposes internals to unit tests via InternalsVisibleTo. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Made-with: Cursor
Made-with: Cursor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Made-with: Cursor
|



Context
A user reported a stress-test failure synchronizing a 150 GB game file with an unreliable network on the receiving side (B-SIDE). After ~6 minutes of E2EE upload of the 909 MB inventory ZIP, slice uploads started failing in cascade with
SocketException (995)/TaskCanceledException, and after 6 retries on a single slice the entireRunFullInventorywas aborted on B-SIDE, which then propagated to A-SIDE asLocal Inventory is cancelled due to a premature end to another Session Member.The B-SIDE log showed a clear pattern: every per-attempt timeout was reported by the upload strategy as a generic HTTP 500, which the adaptive upload controller then interpreted as a bandwidth/server error and used to reset the chunk size to its initial 500 KB on every failure — preventing any real adaptation to the degraded network and making the failure noisier than necessary.
Summary
Tier 1 of the upload-resilience plan:
UploadFailureKindenum (None,ServerError,ClientCancellation,ClientTimeout) carried onUploadFileResponse. New staticUploadFailureClassifiershared by both upload strategies (CloudflareR2UploadStrategy,AzureBlobStorageUploadStrategy): anOperationCanceledExceptionwhoseCancellationTokenis already canceled is now reported asClientCancellationinstead of a synthetic HTTP 500.IAdaptiveUploadController.RecordUploadResultaccepts afailureKind;AdaptiveUploadController.HandleBandwidthResetis short-circuited when the kind isClientCancellationorClientTimeout, so chunk sizing reflects real bandwidth signals only.FileUploadWorkerupgrades a strategy-levelClientCancellationtoClientTimeoutwhen the worker’s per-attempt CTS is the cancellation source (and not the global token).30s → 60s; ceiling:90s → 120s; constants now named (AttemptTimeoutFloorSeconds,AttemptTimeoutCeilingSeconds,SecondsPerMegabyteHeuristic). A single intermittent slice on a degraded network no longer trips the timeout in 30s.Tests
All new and updated test classes target ≥ 85% coverage of the touched logic. Full
ByteSync.Client.UnitTestssuite: 1173 / 1173 passing.UploadFileResponseTests(new) — 6 tests over success, server failure (status + exception),ClientCancellation,ClientTimeout, and defaultFailureKind.UploadFailureClassifierTests(new) — 5 tests covering OCE/TCE with a canceled token, OCE with a live token, generic exceptions andHttpRequestException.CloudflareR2UploadStrategyTests(new, mockedHttpMessageHandler) — 8 tests covering200,429,500,HttpRequestException,IOException, in-flight cancellation via aHangingHttpMessageHandler, and OCE both with and without a canceled caller token.FileUploadWorkerInternalsTests(new) — 12 tests overComputeAttemptTimeoutSeconds(floor 60s, linear 3s/MB region, ceiling 120s),RefineFailureKind, andDetermineCancellationKind.AdaptiveUploadControllerTests(extended) — 4 new tests asserting thatClientCancellationandClientTimeoutdo not reset the chunk size, and that a 500 with nofailureKindstill does (regression guard for existing behavior).FileUploadWorkerTests(extended) — 3 new tests verifying thefailureKindplumbed toRecordUploadResultforSuccess,ClientCancellation, and serverFailure(503)paths.Notes
ByteSync.Clientexposes internals toByteSync.Client.UnitTestsvia anInternalsVisibleToMSBuild item to keepComputeAttemptTimeoutSeconds,RefineFailureKindandDetermineCancellationKindnon-publicwhile remaining testable.Test plan
ByteSync.ClientandByteSync.Client.IntegrationTests.ByteSync.Client.UnitTestssuite (1173 passing).