[Internal] Add dynamic timeout escalation spec #4244
Conversation
Add DYNAMIC_TIMEOUTS_SPEC.md to the driver docs describing escalating connection and request timeouts on transport retries. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolve 3 of 4 open questions based on review feedback:
- Commit to Context-based approach (Option A) for delivering per-attempt timeouts, removing Options B/C. No changes needed in azure_core or typespec_client_core.
- Decide hedged attempts always start at tier 0 of the timeout ladder since they target independent regional endpoints.
- Record effective per-attempt timeout values in DiagnosticsContext for observability.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update the dynamic timeout spec to reflect the new pipeline architecture from PR Azure#3887 (Driver Transport step 02):
- Replace MAX_TRANSPORT_RETRIES/TransportRetry references with the new FailoverRetry/SessionRetry model and failover_retry_count
- Move retry loop references from cosmos_driver.rs to the 7-stage loop in operation_pipeline.rs
- Change timeout delivery from Context type-map to TransportRequest field, matching the new pipeline data flow
- Update goal from 'increase retry count' to 'leverage existing retry budget' since max_failover_retries already defaults to 3

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Propose adding an optional per-request timeout field to typespec_client_core::http::Request with set_timeout()/timeout() methods. The reqwest HttpClient implementation applies it via reqwest::RequestBuilder::timeout(). For connection timeout escalation, the reqwest client is built with the maximum ladder value (5s) and the per-request timeout provides the tighter overall bound on each attempt. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove all connection timeout escalation from the spec. Connection timeouts remain static (configured via ConnectionPoolOptions). Only request timeouts are escalated per retry attempt. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Connection timeouts use a failure-rate adaptive model instead of per-retry escalation:
- Start at 1s (sufficient for any cloud/datacenter network)
- ShardedHttpTransport monitors connection failure rate
- On sustained failures, create new HttpClient instances with 5s connect_timeout (one-time persistent transition)
- No azure_core changes needed — entirely internal to the sharded transport

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…om/Azure/azure-sdk-for-rust into tvaron3/dynamicTimeouts
- Remove direct mode non-goal (won't be implemented)
- Resolve connection failure threshold: >3 consecutive failures per endpoint triggers 1s to 5s escalation
- Old shards are actively marked unhealthy on escalation for immediate reclamation by health sweep
- Explicitly state per-endpoint failure tracking scope
- Remove resolved open questions 2 and 3

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- ShardedHttpTransport is no longer 'future integration' — it exists in sharded_transport.rs with per-shard health sweeps
- Fix HttpClientFactory::create() → build()
- Fix HttpClientConfig connect_timeout claim — connect_timeout is sourced from ConnectionPoolOptions::max_connect_timeout() inside the factory build() method, not as a config field
- Reference existing per-request timeout mechanism in transport pipeline (azure_core::sleep racing the HTTP future)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix section numbering (missing §9)
- Add rationale for 65s tier-3 jump
- Add cross-reference for ladder saturation in §5
- Add typespec_client_core cross-crate PR strategy
- Clarify adaptive connection timeout: idempotent transition, per-endpoint aggregation, no-fallback rationale

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Major architectural changes to align with codebase patterns:
- Move timeout escalation to transport pipeline level, indexed by transport-level timeout retries (not operation failover)
- Use deadline-based pattern (Instant) instead of adding a Duration field to TransportRequest — feeds into existing remaining_request_timeout() mechanism
- Remove TimeoutLadder struct; use free function + constants matching DOP style
- Remove AttemptTimeoutDiagnostics struct; record flat field directly on RequestDiagnostics
- Clarify sleep-race is enforcement mechanism; Request::set_timeout is long-term replacement (never both active)
- Use AtomicBool/AtomicU32 for adaptive connection timeout state matching lock-free hot-path pattern (sketched below)
- Reference PipelineType for ladder selection
- Explicit clamping order: pool bounds first, deadline last
- Add note that min_* timeout fields are unused placeholders

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
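For reference, a minimal sketch of what the lock-free adaptive connect-timeout state described in that commit could look like. The type, field, and method names here are illustrative only, not the driver's actual code; only the 1s/5s values and the >3 consecutive-failure threshold come from the spec.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::time::Duration;

const CONSECUTIVE_CONNECT_FAILURE_THRESHOLD: u32 = 3;
const INITIAL_CONNECT_TIMEOUT: Duration = Duration::from_secs(1);
const ELEVATED_CONNECT_TIMEOUT: Duration = Duration::from_secs(5);

/// Per-endpoint adaptive connect-timeout state, lock-free on the hot path.
struct AdaptiveConnectTimeout {
    consecutive_connect_failures: AtomicU32,
    elevated: AtomicBool,
}

impl AdaptiveConnectTimeout {
    fn new() -> Self {
        Self {
            consecutive_connect_failures: AtomicU32::new(0),
            elevated: AtomicBool::new(false),
        }
    }

    /// Connect timeout to use for new HttpClient shards for this endpoint.
    fn current(&self) -> Duration {
        if self.elevated.load(Ordering::Relaxed) {
            ELEVATED_CONNECT_TIMEOUT
        } else {
            INITIAL_CONNECT_TIMEOUT
        }
    }

    /// Called on a connection-class failure. Returns true exactly once, when the
    /// threshold is crossed, so the caller can rebuild shards with the elevated
    /// connect timeout (one-time persistent transition: 1s -> 5s, never reverted).
    fn record_connect_failure(&self) -> bool {
        let failures = self.consecutive_connect_failures.fetch_add(1, Ordering::Relaxed) + 1;
        failures > CONSECUTIVE_CONNECT_FAILURE_THRESHOLD
            && !self.elevated.swap(true, Ordering::Relaxed)
    }

    /// Called on any successful attempt; resets the consecutive-failure count.
    fn record_success(&self) {
        self.consecutive_connect_failures.store(0, Ordering::Relaxed);
    }
}
```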
…iews' into tvaron3/dynamicTimeouts
- Fix metadata min timeout max bound: 65s → 6s (matches code)
- Reconcile adaptive 1s initial vs existing 5s default: 1s is a new internal initial value, transitions to max_connect_timeout
- Reference Step 6 (PR Azure#3957, now merged) as the introducing PR for ShardedHttpTransport
- Pull latest from release/azure_data_cosmos-previews

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace typespec_client_core/azure_core dependency with driver-internal approach using reqwest::RequestBuilder::timeout() directly per request. No cross-crate changes needed. Key changes:
- Remove all typespec_client_core Request/ClientMethodOptions proposals — driver controls reqwest directly via HttpClientFactory
- Enforce per-attempt timeout via reqwest::RequestBuilder::timeout() instead of azure_core::sleep() race
- Add options hierarchy section showing how users control timeout bounds (ConnectionPoolOptions) and e2e deadline (RuntimeOptions)
- Add thin client (Gateway 2.0) future ladder note: 6s, 6s, 10s
- Fix markdown lint: table alignment, heading levels, list formatting

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses 15 findings from the PR Deep Reviewer pass:
- §1: Reframe Goal Azure#4 as a transport-level timeout-retry counter; add non-goal clarifying users have no per-attempt knobs.
- §3: Note that per-attempt enforcement needs concrete reqwest::Client access (HttpClient trait exposes no per-request timeout); add 'Why not azure_core::RetryPolicy' rationale.
- §4: Mark thin-client (Gateway 2.0) ladder as deferred.
- §5: Add counter interactions table with strict evaluation order, MAX_TIMEOUT_RETRIES hard bound, and a single canonical pseudocode that does not mutate request.deadline.
- §6: Add 'Per-Attempt Deadline ≠ request.deadline' subsection.
- §7: Cite min ≤ max invariant; document that min_*_request_timeout becomes a hard floor (migration note).
- §8: Rewrite dispatch strategy — driver-internal DriverHttpClient wrapper exposing the concrete reqwest::Client; require removing the unconditional builder.timeout() in http_client_factory.rs:173; document per-attempt Duration threading through AdaptiveTransport and ShardedHttpTransport.
- §9: Use ErrorKind::Connection (not reqwest::Error::is_connect); acknowledge long-lived-process transient-blip degradation as an accepted trade-off; distinguish new per-endpoint consecutive_connect_failures from the existing per-shard counter.
- Header date bumped to 2026-04-21.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds an internal design spec documenting how azure_data_cosmos_driver should implement per-attempt request timeout escalation (laddered timeouts on transport retries) plus an adaptive connect-timeout elevation mechanism in ShardedHttpTransport.
Changes:
- Introduces a new `DYNAMIC_TIMEOUTS_SPEC.md` describing timeout ladder values for data-plane vs metadata requests.
- Specifies retry-loop integration details (counter ordering, clamping order vs `ConnectionPoolOptions` and the E2E deadline).
- Specifies an adaptive connection-timeout model (1s → 5s) based on consecutive connection failures.
Applies Copilot reviewer suggestions and fixes the CI Build Analyze
(cspell) failure on the dynamic timeouts spec:
- §1 Non-goals: use instance-method form
`OperationOptions::new().with_end_to_end_latency_policy(...)` instead
of the inaccurate associated-function form.
- §4 Thin client ladder: rewritten to match the actual types
(`HttpClientConfig::dataplane_gateway20`, `AdaptiveTransport::gateway20`,
`TransportMode::Gateway20`) and clarify that "thin client" refers to
a future Gateway20 mode with distinct retry semantics, not the existing
HTTP/2 prior-knowledge Gateway20 transport.
- §7 Dispatch signatures: drop stale line-number citations for
`AdaptiveTransport::send_with_dispatch` and `ShardedHttpTransport::send`
and switch to symbol-based references. Mark the hedging path as future
(not yet implemented).
- §9 Adaptive connection timeout: clamp the 1s initial value to
`ConnectionPoolOptions::{min,max}_connect_timeout()` so the adaptive
mechanism never starts outside the configured bounds. Update the flow
diagram to reflect the clamp.
- Replace "cutover" (now gone with the §4 rewrite) and "unmutated"
(→ "unchanged") to clear the cspell failures reported by the
`Build Analyze` job.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Drop the Java SDK cross-reference tables (gateway/direct mode timeouts) and the 'Why not azure_core::RetryPolicy?' subsection from the dynamic timeouts spec. These sections are no longer needed in the design document. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove §2 'Current Behavior' which detailed static timeout defaults and transport retry internals. These are implementation details that don't belong in the design spec. Renumber remaining sections 3-11 to 2-10 and update all cross-references. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
analogrelay left a comment
Looks good. Some general clarifications, but I think it's solid.
> Progressively increasing the timeout on each retry attempt gives the operation a better chance of succeeding without requiring the first attempt to use an unnecessarily long timeout — which would add latency to the common case.
It's not so much that increasing the initial timeout causes latency, because we still conclude the request when the response is received no matter what the timeout was. It's that long timeouts mean we can't react as quickly to network issues. Maybe we just need to use a fresh connection to resolve the issues. It's like hedging, but instead of issuing a second request to a different region, we cancel the current one and start a new one to try and "reset" things. A tighter initial timeout means we can react quickly to intermittent network issues. But keeping it small means we risk thrashing the network with a bunch of retries that make things worse. Scaling the timeout interval up gives us a balance of the two.
> When hedging is enabled (see `TRANSPORT_PIPELINE_SPEC.md` §3.2), the hedged attempt runs in a secondary region as a speculative execution. **Hedged attempts always start at tier 0 of the timeout ladder**, independent of the primary attempt's ladder position.
And, to be clear, Hedging is an Operation Pipeline action, right? So from the transport pipeline's perspective, there's no existing timeout ladder state for a hedged request.
> 1. `HttpClientFactory::build()` returns a driver-owned wrapper that exposes the underlying concrete `reqwest::Client` (e.g., `Arc<DriverHttpClient>` where `DriverHttpClient` holds a `reqwest::Client` directly). The wrapper still implements `azure_core::http::HttpClient` for any other consumer that needs the trait, but the driver's transport layer holds and uses the concrete type.
Call it HttpClient, just move it into the driver :), and there's no need to still implement azure_core::http::HttpClient IMO.
> 2. `ClientShard` (`sharded_transport.rs`) stores the `Arc<DriverHttpClient>` instead of `Arc<dyn HttpClient>`. `ShardedHttpTransport::send()` calls `client.dispatch_with_timeout(req, attempt_timeout)` which internally builds the `reqwest::Request` and applies `RequestBuilder::timeout(attempt_timeout)` before `.send()`.
We could just put the timeout on our transport request structure; that would be similar to how reqwest does it. Just a style choice rather than a correctness thing, though.
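As an illustration of that style choice, a hypothetical shape for carrying the per-attempt timeout on the transport request itself; the field names are invented for this sketch and are not the driver's actual `TransportRequest`:

```rust
use std::time::{Duration, Instant};

// Hypothetical shape only; the driver's real TransportRequest has more fields.
struct TransportRequest {
    // ... method, URL, headers, body ...
    /// End-to-end deadline for the whole operation (unchanged across retries).
    deadline: Option<Instant>,
    /// Per-attempt timeout set by the retry loop from the ladder, carried on the
    /// request itself in the same style as reqwest's per-request timeout.
    attempt_timeout: Option<Duration>,
}
```

The spec as written instead threads the per-attempt value as a `Duration` parameter through the dispatch chain; either way, `request.deadline` keeps its E2E meaning.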
> This is intentionally a **driver-internal** concrete-reqwest dispatch path — it does not change `typespec_client_core` or the public `azure_core::http::HttpClient` trait. Consumers outside the driver who pass an `Arc<dyn HttpClient>` into the driver are not supported for timeout escalation; the driver constructs its dispatch clients itself via `HttpClientFactory` regardless of any externally supplied `HttpClient`.
We shouldn't allow using Arc<dyn azure_core::http::HttpClient>. We need our own abstraction now anyway.
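For illustration, a minimal sketch of such a driver-internal dispatch client, reusing the `DriverHttpClient` and `dispatch_with_timeout` names from the quoted spec text. The constructor and conversion details here are assumptions, not the actual implementation:

```rust
use std::time::Duration;

/// Hypothetical driver-internal dispatch client. It holds the concrete
/// `reqwest::Client` so a per-attempt timeout can be applied to each request.
struct DriverHttpClient {
    inner: reqwest::Client,
}

impl DriverHttpClient {
    fn new() -> reqwest::Result<Self> {
        // No client-wide request timeout here; a client-wide value would
        // override every per-attempt value supplied below.
        let inner = reqwest::Client::builder()
            .connect_timeout(Duration::from_secs(1)) // adaptive initial value; clamped in the real design
            .build()?;
        Ok(Self { inner })
    }

    /// Dispatch a single attempt with the ladder-provided per-attempt timeout.
    async fn dispatch_with_timeout(
        &self,
        mut request: reqwest::Request,
        attempt_timeout: Duration,
    ) -> reqwest::Result<reqwest::Response> {
        // reqwest carries an optional per-request timeout on the Request itself;
        // setting it here bounds only this attempt, leaving the E2E deadline alone.
        *request.timeout_mut() = Some(attempt_timeout);
        self.inner.execute(request).await
    }
}
```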
…iews' into tvaron3/dynamicTimeouts
Sync code references with latest codebase:
- Factory returns Arc<dyn TransportClient> not Arc<dyn HttpClient>
- TODO is at transport_pipeline.rs:225-227 (not 247-249)
- OperationOptions.end_to_end_latency_policy is a public field, not a builder method — update prose and pseudocode examples
- Merge upstream/release/azure_data_cosmos-previews (session token helpers, upsert fix, response headers, query/item driver cutover)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Document how reqwest per-request timeouts affect the underlying connection for HTTP/1.1 vs HTTP/2:
- HTTP/1.1: dropping the response future cancels the hyper H1 callback, the dispatcher calls close() on the connection, and it is not returned to the pool. Each timeout kills a TCP conn.
- HTTP/2: dropping the response future sends RST_STREAM(CANCEL) for that stream only; the multiplexed connection stays alive and is reused immediately.

Also covers implications for ShardedHttpTransport shard health and h2 reset-stream accounting.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
> - **Shard health**: The shard health tracking in `ShardedHttpTransport` should count timeout-induced connection loss the same as any other connection failure for HTTP/1.1. For HTTP/2, a stream-level timeout is **not** a shard-health signal — the connection is still healthy.
We should probably track stream failures, and refresh connections if they hit a certain threshold. But that can be a future iteration on this.
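A possible shape for that future iteration, sketched with invented names and an arbitrary threshold; nothing here is committed in the spec:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Illustrative threshold only; any real value would need tuning.
const STREAM_FAILURE_THRESHOLD: u32 = 8;

/// Per-shard HTTP/2 stream-failure accounting (hypothetical, future iteration).
struct ShardStreamHealth {
    stream_failures: AtomicU32,
}

impl ShardStreamHealth {
    /// Record a stream-level timeout or reset. Returns true when the shard should
    /// be marked unhealthy so the health sweep refreshes its connection.
    fn record_stream_failure(&self) -> bool {
        self.stream_failures.fetch_add(1, Ordering::Relaxed) + 1 >= STREAM_FAILURE_THRESHOLD
    }

    /// Reset the counter on a successful stream.
    fn record_stream_success(&self) {
        self.stream_failures.store(0, Ordering::Relaxed);
    }
}
```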
Dynamic Timeout Spec
Adds `DYNAMIC_TIMEOUTS_SPEC.md` to `sdk/cosmos/azure_data_cosmos_driver/docs/` describing the design for escalating per-attempt request timeouts on transport retries plus adaptive connection-timeout tuning.

This PR replaces the previously-closed #3871. It contains the original spec plus the revisions made in response to that PR's deep review (see "Changes since #3871" below).
Motivation
When a request times out and is retried with the same timeout, it often fails again — particularly on slow connections, regional latency spikes, or first-byte tail latencies. Escalating the timeout on each retry attempt gives the operation a meaningfully better chance of succeeding without making the initial attempt unnecessarily slow.
Key Design Decisions
- Data-plane request timeout ladder: `6s → 10s → 65s` per retry attempt (see the sketch after this list).
- Metadata request timeout ladder: `5s → 10s → 20s`, aligned with the Java SDK `DatabaseAccount` ladder.
- The existing `max_failover_retries = 3` already maps cleanly onto the 3-tier ladder; a hard `MAX_TIMEOUT_RETRIES` bound prevents unbounded growth if that default ever changes.
- `ConnectionPoolOptions` min/max bounds still clamp the effective values.
- Adaptive connection timeout starts at `1s` and escalates to `5s` after `>3` consecutive connection failures per endpoint (one-time persistent transition in `ShardedHttpTransport`).
- Per-attempt timeouts are enforced via `reqwest::RequestBuilder::timeout` — threaded as a `Duration` parameter through the dispatch chain rather than being folded into `request.deadline`, so the deadline retains its E2E meaning.
- Dispatch goes through a driver-internal `DriverHttpClient` wrapper exposing the concrete `reqwest::Client` so the dispatch site can actually call `.timeout()` (the previous `Arc<dyn HttpClient>` made the per-request setter unreachable).
- Requires removing the unconditional `builder.timeout(config.request_timeout)` at `http_client_factory.rs:173` — that client-wide timeout would override every per-attempt value.
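A minimal sketch of the ladder-plus-clamping shape implied by these decisions (pool bounds first, remaining deadline last). The function signature and parameter names are illustrative only; only the ladder values and the clamping order come from the spec:

```rust
use std::time::{Duration, Instant};

const DATA_PLANE_TIMEOUT_LADDER: [Duration; 3] = [
    Duration::from_secs(6),
    Duration::from_secs(10),
    Duration::from_secs(65),
];
const METADATA_TIMEOUT_LADDER: [Duration; 3] = [
    Duration::from_secs(5),
    Duration::from_secs(10),
    Duration::from_secs(20),
];

/// Per-attempt timeout for a given transport timeout-retry index, clamped to the
/// configured pool bounds first and to the remaining E2E deadline last.
fn attempt_timeout(
    timeout_retry: usize,
    is_metadata: bool,
    min_bound: Duration, // ConnectionPoolOptions-style floor (min <= max assumed)
    max_bound: Duration, // ConnectionPoolOptions-style ceiling
    deadline: Option<Instant>,
) -> Duration {
    let ladder = if is_metadata {
        &METADATA_TIMEOUT_LADDER
    } else {
        &DATA_PLANE_TIMEOUT_LADDER
    };
    // Saturate at the last tier once the retry index runs past the ladder.
    let tier = timeout_retry.min(ladder.len() - 1);
    // 1. Clamp to the configured min/max bounds.
    let mut timeout = ladder[tier].clamp(min_bound, max_bound);
    // 2. Never exceed whatever remains of the end-to-end deadline.
    if let Some(deadline) = deadline {
        timeout = timeout.min(deadline.saturating_duration_since(Instant::now()));
    }
    timeout
}
```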
Cross-SDK Alignment

The Java SDK already implements timeout escalation for gateway/metadata HTTP calls (QueryPlan: `0.5s → 5s → 10s`, DatabaseAccount: `5s → 10s → 20s`). This spec extends the pattern to both data plane and metadata requests in the Rust driver.

Changes since #3871
The spec was substantially revised in response to the deep review on the previous PR. Highlights:
- Dispatch strategy rewritten around `HttpClientFactory` and the `DriverHttpClient` wrapper that exposes `reqwest::Client` to the dispatch site.
- `Duration` plumbing — explicitly threaded through dispatch instead of mutating `request.deadline`; counter-ordering table added to make the interaction with the E2E deadline unambiguous.
- `MAX_TIMEOUT_RETRIES` bound added to defend against future increases to `max_failover_retries`.
- `ErrorKind::Connection` used for failure classification instead of `reqwest::Error::is_connect()` (the latter doesn't survive the abstraction boundary).
- `min_*_request_timeout` migration note clarifies how existing `ConnectionPoolOptions` values continue to clamp the new ladder.
- `azure_core::RetryPolicy` rationale explains why a custom escalation policy is needed at the transport layer.

Deferred
- … (`transport_pipeline.rs`)