Adds caller cancellation token propagation in hedging and timeout strategies#3094
Open
DaRosenberg wants to merge 1 commit into
Open
Adds caller cancellation token propagation in hedging and timeout strategies#3094DaRosenberg wants to merge 1 commit into
DaRosenberg wants to merge 1 commit into
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull Request
The issue or feature being addressed
Fixes #3086 (Timeout strategy does not propagate the caller's
CancellationToken)When a resilience pipeline contains a strategy that substitutes the execution
CancellationTokenwith an internal one (timeout, and also hedging), a caller-initiated cancellation surfaces anOperationCanceledExceptionwhoseCancellationTokenis Polly's internal token rather than the caller's. This breaks the common pattern of letting caller cancellation pass through unchanged while wrapping other failures, because callers cannot reliably compareOperationCanceledException.CancellationTokento their own token.Details on the issue fix or feature implementation
Goal
Polly should throw an
OperationCanceledExceptioncarrying the caller's token if and only if the cancellation was actually caused by a cancellation request on that token — for any pipeline, regardless of which strategies it is composed of or how they are nested.Approach
A repo-wide audit shows that within
Polly.Coreexactly two strategies substitute the execution token: timeout (TimeoutResilienceStrategy) and hedging (TaskExecutionviaResilienceContext.InitializeFrom). Every other strategy and the pipeline plumbing only readcontext.CancellationToken, so they already emit the correct token at their own level.The fix therefore lives in those two strategies, via a small shared helper:
TimeoutRejectedException) is unchanged.Because each substituting strategy normalizes back to its own previous token, the behavior composes correctly through arbitrary nesting: an inner timeout rewrites to the mid-level token, the outer timeout rewrites that to the caller's token, and so on. The simplest case (
AddTimeoutonly) and deeply nested cases both end up with the caller's token.Design decisions and trade-offs
new OperationCanceledException(callerToken).TrySetStackTrace(), matching the existing convention inDelegatingComponent,CompositeComponent, and hedging's pre-execution cancellation check. We deliberately did not chain the original exception as anInnerException; it keeps the behavior consistent with the rest of the codebase, at the cost of not preserving the original deep stack trace in this specific path.Polly.Core) only. The legacy v7PolicyAPI uses a combined linked token and already documents (inAsyncTimeoutEngine/TimeoutEngine) that the token on the exception is not reliable for this determination. We left v7 untouched to avoid changing long-stable behavior.callerToken.IsCancellationRequested, read after the exception has been produced. Because aCancellationTokenis a monotonic latch with no record of when or why it fired, there is an inherent, unavoidable race: if a non-caller cause produces the exception and the caller then cancels within the small window before we inspect the token, the cancellation is attributed to the caller. We chose not to add machinery to fight this, for three reasons:previousToken.IsCancellationRequestedproxy, so this change does not make it worse (it only makes the resulting token more coherent).Tests
Issues(IssuesTests.CancellationTokenPropagation_3086.cs) with end-to-end coverage: the exact issue repro, timeout±retry in both orders, hedging, nested timeouts, a no-substitution baseline, and two "only if" guards (a real timeout still throwsTimeoutRejectedException; an unrelatedOperationCanceledExceptionis preserved when the caller token is not cancelled)All
Polly.Core.Testspass; branch and method coverage remain at 100% and the new code is fully covered.Confirm the following