Skip to content

Hedge cold container-metadata reads across regions (#4253)#4608

Open
NaluTripician wants to merge 1 commit into
Azure:mainfrom
NaluTripician:nalutripician/cosmos-metadata-structural
Open

Hedge cold container-metadata reads across regions (#4253)#4608
NaluTripician wants to merge 1 commit into
Azure:mainfrom
NaluTripician:nalutripician/cosmos-metadata-structural

Conversation

@NaluTripician

Copy link
Copy Markdown
Contributor

Summary

Reworks the fix for #4253 to use the driver's structural cross-region hedging instead of a detached background task (the prior approach, in #4607, conflicted with the crate's stated "no detached tasks" model). Closing #4607 in favor of this.

A cold container/collection metadata read (DocumentCollection point read) that warms the container cache could be routed to a slow or unhealthy preferred region and stall the read — and any operation blocked on warming the cache — past the caller's timeout. If the caller's future is dropped (Rust cancellation) during the control-plane timeout escalation, the inline cross-region failover is preempted and the next region is never tried.

Fix

Widen HEDGEABLE_RESOURCE_TYPES from [Document] to [Document, DocumentCollection] in hedging_eligibility.rs. Container point reads (OperationType::Read + ResourceType::DocumentCollection) become hedge-eligible, so once the hedge threshold elapses the read is speculatively dispatched to a second preferred region in parallel and a healthy region returns within the caller's deadline — structurally removing the single-region-stall / cancellation-preemption window, with no detached tasks. This is exactly the metadata-read phase the eligibility comment already anticipated; the rest of the hedging machinery (evaluate_hedge_eligibility -> should_hedge -> execute_hedged) already handles PipelineType::Metadata, so no other change is required.

Scope (matches the origin issue)

  • ✅ Container/collection cache read — covered.
  • ❌ Partition-key-range reads — intentionally excluded; the .NET origin issue (#5805) explicitly notes the pk-range cache is not affected. (ReadFeed is also not hedge-eligible.)
  • ❌ Container writes (create/replace/delete) — excluded; only idempotent point reads hedge.
  • Account-metadata bootstrap is off-pipeline (not the [cosmos] Metadata retry: dropped caller future silently cancels cross-region failover #4253 scenario).

Safety

Hedging a container read issues at most one extra idempotent GET to another region, and only after the threshold (default min(1000ms, request_timeout/2)) — so there's no duplicate-side-effect risk and no cost on the fast path. All existing hedge invariants (result classification, session tokens, PPCB/PPAF feedback, diagnostics) apply unchanged; the data-plane-only hub-region latch correctly stays off for metadata ops.

Tests

Adds: container read is hedged; container write is not; container ReadFeed is not. cargo clippy --all-targets clean, cargo fmt --check clean, full driver lib suite 1811 passed / 0 failed (37 hedging-eligibility tests including the 3 new).

Fixes #4253

A cold container/collection metadata read (DocumentCollection point read) that
warms the container cache could be routed to a slow or unhealthy preferred
region and stall the read -- and any operation blocked on it -- past the
caller's timeout. If the caller's future was dropped (Rust cancellation) during
the control-plane timeout escalation, the inline cross-region failover was
preempted and no attempt was made against the next region (issue Azure#4253; same
class as dotnet #5805).

Rather than keeping the read alive past caller-drop with a detached task (which
conflicts with the crate's structural, no-detached-tasks model), this widens the
cross-region hedging eligibility to cover DocumentCollection point reads. Once
the hedge threshold elapses, the read is speculatively dispatched to a second
preferred region in parallel, so a healthy region returns within the caller's
deadline -- structurally removing the single-region-stall / cancellation-
preemption window. This is the metadata-read phase the eligibility comment
already anticipated; no other change to should_hedge is required.

Scope matches the origin issue: only the container/collection cache read is
covered. Partition-key-range reads are intentionally excluded (the .NET issue
notes the pk-range cache is unaffected), and container writes / feed reads stay
ineligible (idempotent point reads only).

Adds tests: container read is hedged; container write is not; container ReadFeed
is not.

Fixes Azure#4253

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the Cosmos driver’s existing structural cross-region hedging to cover cold container/collection metadata point reads so a slow/unhealthy preferred region can’t stall cache warm-up past the caller’s timeout (addressing #4253) without introducing detached background tasks.

Changes:

  • Expanded hedging eligibility to include ResourceType::DocumentCollection (in addition to Document).
  • Added hedging-eligibility regression tests ensuring container point reads hedge, while container writes and feed reads do not.
  • Added a driver changelog entry describing the behavioral fix.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/hedging_eligibility.rs Widens hedge-eligible resource types to include container metadata reads and adds targeted tests for container read/write/feed behavior.
sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md Documents the bug fix in the unreleased changelog section.


### Bugs Fixed

- Extended cross-region hedging to cover the container/collection metadata read (`DocumentCollection` point reads). A cold container-cache warm-up routed to a slow or unhealthy preferred region could stall the read — and any operation blocked on it — past the caller's timeout, with the cross-region failover never reached if the caller gave up first. Hedging now speculatively dispatches the read to a second preferred region once the hedge threshold elapses, so a healthy region returns within the caller's deadline. This uses the existing structural hedging mechanism (no detached background tasks) and applies only to idempotent reads; container writes and feed reads are unaffected. ([#4253](https://github.com/Azure/azure-sdk-for-rust/issues/4253))

@simorenoh simorenoh Jun 25, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with copilot - also the issue is being referenced, not the PR link

@NaluTripician NaluTripician left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — looks good, two non-blocking follow-ups

Correct and tightly scoped. Verified against the PR-head source:

  • Gating is sound. should_hedge admits an op only when resource_type ∈ {Document, DocumentCollection} and is_read_only() and operation_type ∈ {Read}. So a container point read qualifies, while create_container (Create) and read_all_containers (ReadFeed) are correctly excluded.
  • Constructor→type mappings match the tests (cosmos_operation.rs): read_containerRead+DocumentCollection, create_containerCreate, read_all_containersReadFeed. The assert_eq!(resource_type(), DocumentCollection) lines guard against constructor drift silently invalidating a test — nice.
  • Not inert. should_hedge/evaluate_hedge_eligibility are consulted on the metadata pipeline path, and the data-plane hub-region latch already stays off for metadata ops (region_latch_none_on_multi_master_or_metadata), so the widening genuinely activates hedging for the #4253 scenario.

Follow-ups (non-blocking)

1. Spec drift. HEDGING_SPEC.md still tracks metadata-read hedging as deferred Phase 2 ("design pass required before scheduling"), and the capability matrix shows it as not-yet-shipped. This PR ships the Container subset, so the design doc now contradicts the code. Please update the spec to move container/collection metadata reads from deferred → shipped.

2. Resolve the spec's own open question. The Phase-2 section explicitly asks: "Hedged metadata reads must not produce stale-cache races when one region returns an older view than another; decide whether to prefer the latest _etag/resource id or the fastest response." This PR effectively answers fastest-wins (default hedge behavior). I think that's safe for the container cache — the partition-key definition we warm on is immutable without a recreate, and staleness is bounded by replication lag + cache TTL (no worse than a single-region read) — but since the spec flagged it, let's record that reasoning rather than resolve it silently.

@simorenoh simorenoh left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just two nits on comments/ changelog

Comment on lines +39 to +40
/// past the caller's timeout, which is the cross-region-failover-preemption
/// scenario in issue #4253. Both are idempotent reads.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't need to mention the relevant issue here


### Bugs Fixed

- Extended cross-region hedging to cover the container/collection metadata read (`DocumentCollection` point reads). A cold container-cache warm-up routed to a slow or unhealthy preferred region could stall the read — and any operation blocked on it — past the caller's timeout, with the cross-region failover never reached if the caller gave up first. Hedging now speculatively dispatches the read to a second preferred region once the hedge threshold elapses, so a healthy region returns within the caller's deadline. This uses the existing structural hedging mechanism (no detached background tasks) and applies only to idempotent reads; container writes and feed reads are unaffected. ([#4253](https://github.com/Azure/azure-sdk-for-rust/issues/4253))

@simorenoh simorenoh Jun 25, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with copilot - also the issue is being referenced, not the PR link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Cosmos The azure_cosmos crate

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

[cosmos] Metadata retry: dropped caller future silently cancels cross-region failover

3 participants