fix(ext-workflow): retry transient gRPC errors in wait_for_orchestration_*

wait_for_orchestration_start and wait_for_orchestration_completion call
the workflow runtime through the local Dapr sidecar. Immediately after a
sidecar restart (placement re-dissemination not yet applied, actor
registration still propagating, etc.), the sidecar can return
FAILED_PRECONDITION or UNAVAILABLE for an instance whose persistent
state is intact. The previous implementation surfaced these as a hard
error to the caller, so a client polling a long-running workflow would
fail permanently even though the workflow itself was recoverable.

Wrap both wait methods in a single _call_with_transient_retry helper:
- Retry FAILED_PRECONDITION and UNAVAILABLE with exponential backoff
(0.5s, doubling, capped at 5s).
- Respect the caller's timeout. A timeout of 0 or None means unbounded.
The first call passes the user's timeout verbatim so behavior on a
healthy runtime is unchanged. On retry, the per-call gRPC deadline
is the remaining budget against a monotonic deadline anchored to the
start of the loop.
- DEADLINE_EXCEEDED and budget exhaustion both surface as the public
TimeoutError (preserved through a private _TransientTimeout
sentinel).
- Non-transient RpcErrors propagate immediately, unchanged.

Behavior on a healthy runtime is unchanged: the first call succeeds and
no retry loop runs.
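
The retry policy described above can be sketched roughly as follows. This
is an illustration, not the code in this commit: Code and FakeRpcError are
hypothetical stand-ins for grpc.StatusCode and grpc.RpcError so the sketch
runs without grpcio, call_with_transient_retry and its sleep/clock
parameters are made-up names, the builtin TimeoutError is assumed to be the
public one, and the private _TransientTimeout sentinel is left out.

```python
import time
from enum import Enum


class Code(Enum):
    # Hypothetical stand-in for grpc.StatusCode.
    UNAVAILABLE = "unavailable"
    FAILED_PRECONDITION = "failed_precondition"
    DEADLINE_EXCEEDED = "deadline_exceeded"
    NOT_FOUND = "not_found"


class FakeRpcError(Exception):
    # Hypothetical stand-in for grpc.RpcError; exposes the status via code().
    def __init__(self, code):
        super().__init__(code.name)
        self._code = code

    def code(self):
        return self._code


TRANSIENT = {Code.UNAVAILABLE, Code.FAILED_PRECONDITION}


def call_with_transient_retry(call, timeout, initial=0.5, cap=5.0,
                              sleep=time.sleep, clock=time.monotonic):
    """Invoke call(per_call_timeout), retrying transient codes with backoff."""
    # A timeout of 0 or None means there is no overall budget.
    deadline = clock() + timeout if timeout else None
    delay = initial
    per_call = timeout  # the first attempt gets the caller's timeout verbatim
    while True:
        try:
            return call(per_call)
        except FakeRpcError as err:
            if err.code() is Code.DEADLINE_EXCEEDED:
                raise TimeoutError("timed out waiting for orchestration") from err
            if err.code() not in TRANSIENT:
                raise  # non-transient errors propagate unchanged
        # Transient failure: back off, but never sleep past the deadline.
        if deadline is None:
            sleep(delay)
        else:
            remaining = deadline - clock()
            if remaining <= 0:
                raise TimeoutError("retry budget exhausted")
            sleep(min(delay, remaining))
            per_call = deadline - clock()  # remaining budget for the retry
            if per_call <= 0:
                raise TimeoutError("retry budget exhausted")
        delay = min(delay * 2, cap)  # 0.5s, doubling, capped at 5s
```

Injecting sleep and clock is only a convenience for exercising the loop
without real waiting; the defaults use time.sleep and time.monotonic.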
Signed-off-by: Javier Aliaga <javier@diagrid.io>