
Cosmos: Adds Cross-Region Hedging Design Spec to Driver Crate#4330

Open
kundadebdatta wants to merge 6 commits into
release/azure_data_cosmos-previews from
users/kundadebdatta/3935_add_hedging_spec

Conversation

@kundadebdatta
Member

@kundadebdatta kundadebdatta commented May 3, 2026

Summary

Adds HEDGING_SPEC.md to the azure_data_cosmos_driver crate's docs/ folder. This is a doc-only PR — no production code, no API changes, no test changes, no Cargo.toml changes. Single file, +1569 lines.

The spec is the design document for the cross-region hedging (AvailabilityStrategy / HedgingStrategy) feature that will be implemented in a follow-up series of PRs. It is being landed on the previews branch ahead of implementation so reviewers can iterate on the design in-tree alongside the companion specs already there.

Why hedging?

When a Cosmos DB region is degraded but not fully down (elevated tail latency, slow GC, partial network blip) the existing failover paths — PPAF (per-partition automatic failover) and PPCB (per-partition circuit breaker) — do not trigger because the region eventually returns successful responses. Applications see p99 / p99.9 latency spikes on requests routed to the slow region.

Cross-region hedging issues a speculative second request to an alternate region after a configurable threshold and returns whichever response arrives first, bounding tail latency at roughly threshold + cross-region-RTT.

Scope of this PR

In scope: Design document only.

Out of scope (follow-up PRs):

  • HedgingStrategy / AvailabilityStrategy types
  • should_hedge() / is_final_result() pure functions
  • execute_with_hedging() orchestrator
  • HedgeDiagnostics
  • Integration into cosmos_driver.rs
  • PPAF default-strategy auto-enable wiring
  • Unit + fault-injection tests

What the spec covers (17 sections, ~1,569 lines)

§ Topic
1 Goals, non-goals, and the phased operation-type rollout (Phase 1 reads + writes; Phase 2 Query + ReadMany; Phase 3 ChangeFeed + metadata; Future sprocs/triggers/UDFs)
2 Background: full walkthrough of the .NET v3 reference implementation (CrossRegionHedgingAvailabilityStrategy), including PPAF/PPCB integration
3 Architectural overview
4 Configuration surface (HedgingStrategy, builder, environment variables)
5 Eligibility rules (should_hedge() decision matrix) and default hedging enablement driven by PPAF with a full activation truth table
6 Hedging algorithm — primary at t=0, hedge fan-out at threshold + N · step, tokio::select! race, drain loop
7 Final-vs-transient status code classification (incl. explicit 403 / 403/3 rows)
8 Operation-pipeline integration, including the explicit local-only-retry contract (ExcludeRegions invariant for retries inside a hedge)
9 Interaction with PPAF, PPCB, session consistency, throughput control, end-to-end timeout
10 Diagnostics & observability (HedgeDiagnostics shape, attachment contract, reserved cosmos.hedge.* tracing/metrics surface)
11 Options API design and layered resolution priority table (operation > client > SDK default > none)
12 Cancellation & resource cleanup (lock-free tokio_util::sync::CancellationToken hierarchy)
13 Multi-write region write hedging (409/412 risk, idempotency considerations)
14 Error handling & edge cases (RU accounting, late-hedge budget)
15 Test plan
16 Implementation phases
17 Open questions (most resolved during spec review)

Design highlights

  • PPAF-only auto-enable, matching .NET v3 exactly. Enabling PPCB alone
    does not auto-enable hedging — PPCB is failure-driven and does not by
    itself signal a desire for latency hedging.
  • Default thresholds when auto-enabled by PPAF:
    min(1000ms, request_timeout / 2) / 500ms, mirroring .NET
    (Java's static 500ms / 100ms is documented in the cross-SDK comparison
    for reference but not adopted).
  • Lock-free design throughout: tokio_util::sync::CancellationToken
    for cancellation, Arc<Bytes> for zero-copy hedge body sharing.
  • Type-safe disabled sentinel via AvailabilityStrategy::Disabled enum
    variant rather than .NET's nullable TimeSpan? / sentinel object pattern.
  • Always-full HedgeDiagnostics attached whenever a strategy was active,
    avoiding .NET's two-shape (fast-path vs drain-path) bookkeeping.
  • Explicit contracts that .NET and Java rely on implicitly: local-only
    retry contract (§8.4), diagnostics attachment contract (§10.1),
    preferred-regions precondition (§5.2).
  • Reserved telemetry namespace under cosmos.hedge.* for tracing events
    and metrics, ready for a future observability PR without breaking changes.

Cross-SDK alignment

The spec was reviewed against both the .NET v3 (CrossRegionHedgingAvailabilityStrategy) and Java v4 (ThresholdBasedAvailabilityStrategy) implementations. Final alignment summary:

| Capability | .NET v3 | Java v4 | Rust spec |
|---|---|---|---|
| Auto-enabled by PPAF | | | |
| Auto-enabled by PPCB alone | | | |
| Default thresholds | min(1000ms, RT/2) / 500ms | 500ms / 100ms | min(1000ms, RT/2) / 500ms (= .NET) |
| Phase-1 read hedging | | | |
| Phase-1 write hedging on multi-master | | | |
| Query / ReadMany | | | Phase 2 |
| ChangeFeed + metadata | ✅ / ❌ | ❌ / ❌ | Phase 3 |

Validation

  • Markdown only — renders correctly in GitHub preview
  • All cross-SDK source citations link to specific file/line ranges on
    Azure/azure-cosmos-dotnet-v3 (main) and Azure/azure-sdk-for-java (main)
  • All internal §N.N references resolve to existing spec sections
  • No .rs, .toml, Cargo.lock, or test changes; only the doc-render CI job
    is affected

Branch / target

  • Source branch: users/kundadebdatta/3935_add_hedging_spec
  • Target branch: release/azure_data_cosmos-previews
  • Commits: 2 (Code changes to add hedging spec.
    Code changes to update hedging spec.)
  • Files changed: 1
  • Diff: +1,569 / -0

Follow-up work tracker

The implementation will land in subsequent PRs roughly tracking §16 phases:

  1. Phase 1 — HedgingStrategy types + should_hedge() + is_final_result()
    • orchestrator + reads/writes (multi-master) + PPAF auto-enable
  2. Phase 2 — Query / ReadMany hedging
  3. Phase 3 — Change feed / metadata operation hedging
  4. Future — sprocs/triggers/UDFs, adaptive thresholds, telemetry surface

@kundadebdatta kundadebdatta changed the title Code changes to add hedging spec. Cosmos: Adds Cross-Region Hedging Design Spec to Driver Crate May 4, 2026
@kundadebdatta kundadebdatta self-assigned this May 4, 2026
@kundadebdatta kundadebdatta moved this from Todo to In Progress in CosmosDB Go/Rust Crew May 4, 2026

@NaluTripician NaluTripician left a comment


Cross-checked the spec against the actual .NET source in azure-cosmos-dotnet-v3 (CrossRegionHedgingAvailabilityStrategy.cs, AvailabilityStrategy.cs, DocumentClient.InitializePartitionLevelFailoverWithDefaultHedging). Overall the spec is well-grounded — author clearly read the .NET code, not just the docs. One material issue around the SDK-default write-hedging behavior on multi-master, plus a handful of minor nits inline.

Comment thread sdk/cosmos/azure_data_cosmos_driver/docs/HEDGING_SPEC.md
Comment thread sdk/cosmos/azure_data_cosmos_driver/docs/HEDGING_SPEC.md Outdated
Comment thread sdk/cosmos/azure_data_cosmos_driver/docs/HEDGING_SPEC.md Outdated
Comment thread sdk/cosmos/azure_data_cosmos_driver/docs/HEDGING_SPEC.md Outdated
request_number,
region: regions[request_number].clone(),
result: Err(cancelled_error()),
}

Missing app-cancellation re-raise behavior from .NET. The .NET impl deliberately awaits the faulted task when the app token is the source of cancellation (lines 209–212 of CrossRegionHedgingAvailabilityStrategy.cs):

if (applicationProvidedCancellationToken.IsCancellationRequested)
{
    await (Task<HedgingResponse>)completedTask;
}

This is what allows RequestSenderAndResultCheckAsync to rethrow as CosmosOperationCanceledException with the trace attached (lines 358–363). The Rust pseudocode collapses both "hedge cancellation" and "app cancellation" into a generic cancelled_error(), losing the trace context that .NET preserves.

Suggest distinguishing app-token cancellation from hedge-token cancellation in the orchestrator and propagating the former with full diagnostics, mirroring .NET's behavior.

|---:|-----------|--------|
| 1 | No strategy resolved (or `AvailabilityStrategy::Disabled`) | No |
| 2 | Application preferred-region list empty (no fan-out targets) | No |
| 3 | `ResourceType != Document` | No |

This rule won't survive Phase 3. §16 Phase 3 adds metadata operations (Database / Container / Offer / Throughput), all of which have non-Document ResourceType. As written, this row is a hard reject for all non-Document ops — when Phase 3 lands, this row needs to become phase-/resource-type-gated rather than a blanket exclusion.

Worth either:

  • Tagging this row with "Phase 1 only — see §16 Phase 3 for metadata coverage", or
  • Restating it as ResourceType not in <phase-allowed set> so the eligibility rule's evolution is encoded in one place rather than getting rewritten in Phase 3.

Comment thread sdk/cosmos/azure_data_cosmos_driver/docs/HEDGING_SPEC.md
@kundadebdatta kundadebdatta marked this pull request as ready for review May 5, 2026 22:58
@kundadebdatta kundadebdatta requested a review from a team as a code owner May 5, 2026 22:58
Copilot AI review requested due to automatic review settings May 5, 2026 22:58
Contributor

Copilot AI left a comment


Pull request overview

Adds a new in-repo design specification (HEDGING_SPEC.md) to the Cosmos driver crate docs, describing the planned cross-region hedging (availability strategy) feature and its intended integration with existing routing/retry systems.

Changes:

  • Adds a comprehensive design spec for cross-region request hedging (configuration, eligibility, algorithm, diagnostics, and phased rollout plan).
  • Documents intended interactions with PPAF/PPCB, session consistency, throughput control, deadlines, and cancellation.

Comment on lines +71 to +81
### Operation-type scope (phased)

| Operation type | Phase 1 | Phase 2 | Future |
|---|:---:|:---:|:---:|
| Document point reads (GetItem) | ✅ | ✅ | ✅ |
| Document point writes on multi-master (Create/Replace/Upsert/Delete/Patch) | ✅ | ✅ | ✅ |
| Queries (`QueryItems`) | ✅ | ✅ | ✅ |
| `ReadMany` | ✅ | ✅ | ✅ |
| Change feed | ✅ | ✅ | ✅ |
| Metadata operations (Database / Container / Offer / Throughput) | ❌ | ✅ | ✅ |
| Stored procedures / triggers / UDFs execution | ❌ | ❌ | 🟡 candidate |
Comment thread sdk/cosmos/azure_data_cosmos_driver/docs/HEDGING_SPEC.md Outdated
Comment thread sdk/cosmos/azure_data_cosmos_driver/docs/HEDGING_SPEC.md Outdated
Comment thread sdk/cosmos/azure_data_cosmos_driver/docs/HEDGING_SPEC.md Outdated
Comment thread sdk/cosmos/azure_data_cosmos_driver/docs/HEDGING_SPEC.md Outdated
Comment thread sdk/cosmos/azure_data_cosmos_driver/docs/HEDGING_SPEC.md Outdated
Comment thread sdk/cosmos/azure_data_cosmos_driver/docs/HEDGING_SPEC.md
### Solution: Speculative Hedging

**Hedging** sends the same request to an alternate region after a latency threshold
is exceeded, and returns whichever response arrives first. This bounds tail latency
Member


Suggested change
is exceeded, and returns whichever response arrives first. This bounds tail latency
is exceeded, and returns whichever final response arrives first. This bounds tail latency

**Hedging** sends the same request to an alternate region after a latency threshold
is exceeded, and returns whichever response arrives first. This bounds tail latency
at roughly `threshold + cross-region-RTT` instead of waiting for the slow region to
respond.
Member

@FabianMeiswinkel FabianMeiswinkel May 8, 2026


Should frequently choosing the hedged response from the second region also trigger PPCB? Like when identifying, at least for one partition, that the second region is always faster: why not skip even trying the first region after a few iterations? This is a bit like Bhaskar's initial proposal for hedging, which was more dynamic than what we have now. Please follow up with Bhaskar and let him give you the docs he had; we should re-evaluate whether at least some aspects of his initial idea would now make sense to adopt in Rust. We were not able to convince customers of Java V4 at the time, but it is definitely worth looking at again.

4. **Complementary to failover** — hedging handles *latency*; PPAF/PPCB handle
*failures*. They compose without interference.
5. **Resource-safe** — hedged requests that lose the race are cancelled promptly to
avoid wasted RU/s and transport resources.
Member


Intent makes sense, but it is not perfectly "safe", so I would set expectations a bit more carefully.

| `ReadMany` | ✅ | ✅ | ✅ |
| Change feed | ✅ | ✅ | ✅ |
| Metadata operations (Database / Container / Offer / Throughput) | ❌ | ✅ | ✅ |
| Stored procedures / triggers / UDFs execution | ❌ | ❌ | 🟡 candidate |
Member


Triggers are never executed separately: a header on a normal point operation is sent to the service indicating that the service should execute the trigger, so this is completely independent of hedging. If you hedge the createItem it is hedged with or without triggers. Just remove triggers and UDFs here; only stored procedure execution is its own "operation".

operation coverage where it is safe and cheap. Sprocs / triggers / UDFs
are deferred to Future because their server-side execution model
interacts with hedging in non-obvious ways (server-side state,
idempotency). See §16 for the full rollout plan.
Member


Triggers / UDFs are irrelevant here.

Stored procedures and point writes all have the same issue with idempotency in multi-master. I think in Phase 1 (and maybe all we ever do) we should stick to reads/queries only. Multi-master with hedging could result in a significantly higher number of conflicts, and that is usually not something you would want. So, my 2 cents: it was a mistake to ever enable hedging for writes in multi-master in Java, and very few customers if any have enabled it (it also requires opting into retriable writes). I would scope this down to Phase 1 reads, Phase 2 metadata operations, and explicitly not enable hedging for writes.

so a PPAF-enabled deployment ends up running with all three (PPAF + PPCB +
hedging) active simultaneously.

**The Rust driver matches .NET exactly:** the SDK-default hedging strategy
Member


To me we should simplify this. Always enable PPAF (if the server allows), PPCB, and hedging. For PPCB and hedging we could allow an opt-out as an escape hatch. PPAF will always follow the server signal. Whether we ship with the escape hatch or force enablement of PPCB and hedging we can decide after some more stress tests.


**Rationale:** Hedging must operate above the retry loop because each hedged
request needs its own independent retry state, session tokens, and endpoint
resolution. The operation pipeline already handles per-region retries; hedging
Member


This needs to dynamically add excluded regions. Also, we should make sure that any cloning/allocations only happen when the hedging threshold kicks in: any request not needing hedging should be free/super cheap. I would call this out here explicitly as a design constraint.

Member


Also I am wondering whether we ever really need to hedge to more than one region. Two Azure regions out at once is an extremely unlikely scenario, and when clients trigger hedging to more regions because the issue is on the client side, it usually makes things worse. My 2 cents: gate hedging to at most one region. The default routing policy in Rust uses proximity; if a customer intentionally chooses a slow second preferred region we should honor that choice even if a third region might be faster. That will simplify this and IMO is the right design from what we have learned so far.

cancelled when a winner is found.
3. **Immutable request cloning** — the `CosmosOperation` (which contains `&[u8]`
body, headers, partition key) is cheap to clone (bytes are `Arc`-backed).
4. **Respect existing systems** — hedging does not interfere with PPAF/PPCB,
Member


I think you might want to reconsider this for PPCB: if the first region is so slow that the hedged second region wins repeatedly, this should probably trigger PPCB? Like e2e timeouts would do.

/// Configuration error returned by fallible `HedgingStrategy` constructors.
#[derive(Debug, thiserror::Error)]
pub enum HedgingConfigError {
#[error("hedging threshold must be > 0, got {0:?}")]
Member

@FabianMeiswinkel FabianMeiswinkel May 8, 2026


I think instead of making it fallible I would use newtypes for Duration that enforce the guard rails natively; that way you do not have to worry about validation.

> read account that has only one *write* region should still hedge writes
> only when the write list has ≥ 2 entries.

### 5.2 Default Hedging Enablement Driven by PPAF
Member


The only reason we combined PPAF and enabling hedging in .NET and Java was that PPAF was an opt-in customers had to do explicitly anyway, and it was the only way to enable hedging as automatically as possible without possibly being breaking.

In Rust these are independent features, and hedging should be on by default, independent of PPAF. Maybe we allow an opt-out (could be by just allowing an artificially high threshold). No threshold-step is needed anymore if we gate hedging to at most one region. That simplifies the API surface a lot IMO.

> still pending.

```rust
async fn execute_with_hedging(
Member


I skipped this for now; it will become much simpler when we allow hedging to at most one region.

/// Diagnostic information about a hedging execution, attached to the winning
/// response.
#[derive(Clone, Debug)]
pub struct HedgeDiagnostics {
Member


Seems too verbose - didn't we align on a much narrower surface area in .Net/for IC3?

Member

@FabianMeiswinkel FabianMeiswinkel left a comment


LGTM overall, but I think there are a few areas that need further iteration:

  • No correlation at all between hedging and PPAF
  • Limit hedging to at most one region?
  • Don't allow hedging for writes at all?
  • Can the config surface area be minimized? Do we need more than the threshold, and can that be the way to opt out?
  • Should a second-region hedging winner impact PPCB?
  • Alignment with TRANSPORT_PIPELINE_SPEC.md
  • Alignment with the FeedOperation spec Ashley is working on; hedging at the level proposed might not work well with queries, where hedging really should happen for each "Page" individually

@github-project-automation github-project-automation Bot moved this from In Progress to Changes Requested in CosmosDB Go/Rust Crew May 8, 2026

## 3. Architectural Overview

### 3.1 Where Hedging Sits in the Driver
Member

@FabianMeiswinkel FabianMeiswinkel May 8, 2026


This contradicts TRANSPORT_PIPELINE_SPEC §4.2. Let us discuss which is the better approach, but this might need reconciliation with the pipeline spec.

🔴 Blocking — Direct contradiction with TRANSPORT_PIPELINE_SPEC.md §4.2 — needs reconciliation, not parallel specs

File: HEDGING_SPEC.md §3.1, §6 vs. TRANSPORT_PIPELINE_SPEC.md §4.2

The two specs disagree on essentially every design axis:

| Axis | TRANSPORT_PIPELINE_SPEC §4.2 | HEDGING_SPEC (this PR) |
|---|---|---|
| Layer | `OperationAction::Hedge { secondary_routing }` returned by `evaluate_transport_result` inside the pipeline loop | Orchestrator wraps `execute_operation_pipeline()` from above |
| Fan-out | Single secondary region (max 2 concurrent) | Up to N regions, progressive timer |
| Default | Enabled by default for all ops (writes only on MWR) | Off by default; auto-enabled only by PPAF |
| Threshold | Dynamic, P99-based, clamped 50–4000 ms | Static (`min(1000ms, RT/2)` / 500ms) |
| Decision enum | New `OperationAction::Hedge` variant (TPS line 463) | No new variant; orchestration is external |
| ExecutionContext | Has dedicated `Hedging` value (TPS line 300) | Not addressed |

|---|:---:|:---:|:---:|
| Document point reads (GetItem) | ✅ | ✅ | ✅ |
| Document point writes on multi-master (Create/Replace/Upsert/Delete/Patch) | ✅ | ✅ | ✅ |
| Queries (`QueryItems`) | ✅ | ✅ | ✅ |
Member


For FeedOperations the model with hedging on top of execute_operation is non-trivial and needs careful integration with the query pipeline and Ashley's FeedRange spec. IMO this is a bit of an open question and will need some more thought and alignment between you and Ashley. The TRANSPORT_PIPELINE_SPEC model makes it a bit simpler, but in either case this needs more investigation.

Member

@simorenoh simorenoh left a comment


Just nits - looks good. Interested in what Fabian mentioned about hedging and PPCB basically playing together after a region has been picked several times through hedging.

And in the decision on enabling this by default like PPCB / using a single additional region only.


### 2.3 Eligibility — `ShouldHedge()`

Hedging applies **only** to document-level point operations:
Member


Since we mentioned queries / ReadMany above as part of the inclusion:

Suggested change
Hedging applies **only** to document-level point operations:
Hedging applies **only** to document-level operations:


```rust
/// Sentinel value used to disable hedging for a specific operation when a
/// client-level strategy is configured.
Member

@simorenoh simorenoh May 8, 2026


If we're enabling hedging by default per the other design docs this also applies to the entire client

public static AvailabilityStrategy CrossRegionHedgingStrategy(
TimeSpan threshold, // Time before first hedge fires
TimeSpan? thresholdStep, // Time between subsequent hedges
bool enableMultiWriteRegionHedge = false); // Opt-in for writes on MM
Member


Seems risky, but as long as we have an explicit config for this it makes sense.


@NaluTripician NaluTripician left a comment


Hedging ↔ x-ms-cosmos-hub-region-processing-only header coordination

Reviewing #4330 alongside two related PRs surfaced a coordination gap that the spec
should record before any orchestrator code lands. Posting the proposed addition here
as a review with four inline suggestions you can apply à la carte; each one is
independently mergeable.

Context — the two related PRs

  1. Rust PR #4389 added
    the x-ms-cosmos-hub-region-processing-only: True header. It is emitted on the
    retry triggered by the first 404 / 1002 (READ_SESSION_NOT_AVAILABLE) on a
    single-master, data-plane operation and on every subsequent attempt within
    that operation. The latch is a bool field on OperationRetryState
    (components.rs:108-125),
    set in build_session_retry_state
    (retry_evaluation.rs:355-381),
    and consumed in apply_hub_region_header
    (operation_pipeline.rs:945-968).
    Full normative spec:
    HUB_REGION_PROCESSING_HEADER_SPEC.md.
    The header tells the gateway "this client has already discovered the hub region —
    process the request only in that region", which lets the operation skip the
    multi-region discovery round-trip on every retry after the first 1002.

  2. .NET azure-cosmos-dotnet-v3#5815 fixed the
    exact failure mode this comment is about for the .NET v3 SDK. Before #5815, each
    hedged request in CrossRegionHedgingAvailabilityStrategy carried its own copy of
    the latch state, so every hedge independently re-ran the 404/1002 discovery cycle
    and the header's "one round-trip per operation" guarantee held only inside an
    individual hedge, not across the fan-out. The fix:

    "added a new CrossRegionAvailabilityContext (with the property
    ShouldAddHubRegionProcessingOnlyHeader) that is propagated through the
    RequestMessage.Properties to every cloned hedge request" — quoted in §9.5.1.

    Because Properties is a Dictionary<string, object> whose Clone() is shallow,
    every clone gets a shared reference to one CrossRegionAvailabilityContext
    instance; latching it once is observable from all hedges.

The gap as it lands today in Rust

Spec §8.2 of this PR explicitly lists OperationRetryState as cloned per hedge.
PR #4389 stores the hub-region latch as a bool on OperationRetryState. So the
two PRs in isolation are each correct, but composed they recreate the exact
pre-fix .NET v3 behavior — every hedge in the fan-out independently goes through its
own 404/1002 discovery cycle, and the header buys nothing under hedging except for
the one hedge that happens to observe 1002 first.

The proposed fix

Rust counterpart of .NET v3's CrossRegionAvailabilityContext: extend
OperationRetryState with a shared_hub_region_latch: Option<Arc<AtomicBool>>
that is Some only while running under execute_with_hedging(). Arc<AtomicBool>
is the Rust idiom equivalent to "shared mutable object propagated via
RequestMessage.Properties"; the Option lets the non-hedged path keep PR #4389’s
behavior bit-for-bit when shared_hub_region_latch = None.

The four suggestions on this review record the design in this PR's spec:

| Suggestion | Anchor | What it adds |
|---|---|---|
| §8.2 carve-out | line 1110 | New bullet in the "Items shared (via `Arc` or reference)" list pointing to §9.5. |
| §9.5 full section | line 1292 | The new normative section between §9.4 and §10. |
| §15.1 unit tests | line 1715 | Five rows covering: latch initialization, the two negative cases (non-hedged path, multi-master/metadata), cross-hedge propagation, and the no-1002-no-header invariant. |
| §15.2 fault-injection test | line 1732 | One row asserting end-to-end propagation under a 2-region SM data-plane fault injection. |

I've also left small inline suggestions on
#4389 for the three
load-bearing docstrings in the implementation (OperationRetryState::hub_region_processing_only,
build_session_retry_state, apply_hub_region_header) so the forward reference
to §9.5 lives next to the latch sites themselves.

No code in PR #4389 needs to ship a behavior change for this PR to merge — these
suggestions are doc only. The behavior change lives in the orchestrator PR that
introduces execute_with_hedging(), and §9.5 of this spec is what that PR will
point at for its acceptance criteria.

- `CosmosOperation` — immutable; body is `Bytes` (cheaply cloneable)
- `LocationStateStore` — lock-free; multiple readers are safe
- `SessionManager` — designed for concurrent access
- `Credential` — `Arc`-wrapped


Suggested addition — §8.2 carve-out for the hub-region latch.

Adds the shared Arc<AtomicBool> to the “Items shared (via Arc or reference)” bullet list and forward-references the new §9.5 for full rationale. This is the core hand-off — without this carve-out, the §8.2 “items cloned per hedge” rule (which covers OperationRetryState) silently makes the hub-region latch per-hedge and defeats the discovery propagation.

Suggested change
- `Credential` — `Arc`-wrapped
- `Credential` — `Arc`-wrapped
- **Hub-region-processing-only latch** — a single `Arc<AtomicBool>` is
shared across the primary and every hedge for the lifetime of the
outer operation. See §9.5 for the full rationale; the short version
is that the per-`OperationRetryState` `hub_region_processing_only`
field added by [PR #4389](https://github.com/Azure/azure-sdk-for-rust/pull/4389)
is otherwise per-hedge, which would force every hedge to independently
re-discover the hub region via its own 404/1002 cycle. .NET v3 hit and
fixed this in [PR #5815](https://github.com/Azure/azure-cosmos-dotnet-v3/pull/5815)
via the `CrossRegionAvailabilityContext` shared object; Rust must
adopt the equivalent shared signal.

Implication: late hedges have less time budget. If the deadline is 5s and the
threshold is 3s, the hedge has only ~2s to complete.

---


Suggested addition — new §9.5 “Hub-Region-Processing-Only Header”.

This is the substantive content. It records:

- §9.5.1 — the correctness gap: per-hedge `OperationRetryState` clones each carry an independent `hub_region_processing_only: bool` from PR #4389, so without coordination every hedge re-runs its own 404/1002 discovery cycle. Includes a verbatim quote from azure-cosmos-dotnet-v3#5815, where .NET hit and fixed the same gap via `CrossRegionAvailabilityContext`.
- §9.5.2 — the required Rust design: extend `OperationRetryState` with `shared_hub_region_latch: Option<Arc<AtomicBool>>`, construct one `Arc<AtomicBool>` in `execute_with_hedging()` before fan-out, set it in `build_session_retry_state` alongside the per-state flag (`Release` store), and OR it into emission in `apply_hub_region_header` (`Acquire` load). Includes the two minimal code samples.
- §9.5.3 — eligibility gates (data-plane + single-master + `regions.len() > 1`); when any gate fails, `shared_hub_region_latch = None` and the behavior of PR #4389 (*Emit x-ms-cosmos-hub-region-processing-only header*) is preserved bit-for-bit.
- §9.5.4 — non-interference with the §8.4 local-only-retry invariant.
- §9.5.5 — concurrency / memory-ordering notes.
Suggested change
---
### 9.5 Hub-Region-Processing-Only Header
The driver emits the `x-ms-cosmos-hub-region-processing-only: True`
request header on retries triggered by a `404 / 1002
(READ_SESSION_NOT_AVAILABLE)` response, scoped to **single-master
data-plane** operations. The header is specified in
[`HUB_REGION_PROCESSING_HEADER_SPEC.md`](../../azure_data_cosmos/docs/HUB_REGION_PROCESSING_HEADER_SPEC.md)
and implemented in [Rust PR
#4389](https://github.com/Azure/azure-sdk-for-rust/pull/4389) /
[.NET PR #5447](https://github.com/Azure/azure-cosmos-dotnet-v3/pull/5447)
(parity baseline).
#### 9.5.1 The hedging-specific correctness gap
The Rust latch lives on `OperationRetryState`
([`components.rs::OperationRetryState::hub_region_processing_only`](../src/driver/pipeline/components.rs))
and is set in
[`retry_evaluation.rs::build_session_retry_state`](../src/driver/pipeline/retry_evaluation.rs)
when all four conditions hold (cf. spec §7.1):
1. `is_dataplane`
2. `!can_use_multiple_write_locations` (single-master account)
3. `session_token_retry_count == 0` (first 1002 within the operation)
4. `!hub_region_processing_only` (idempotency)
It is consumed in
[`operation_pipeline.rs::apply_hub_region_header`](../src/driver/pipeline/operation_pipeline.rs)
on every subsequent transport attempt of the same operation.
Per §8.2, **each hedge has its own `OperationRetryState`**. Without
additional coordination, this means each hedge — primary, hedge 1,
hedge 2, … — would independently observe its own first 1002, then
independently re-issue the next attempt with the header set. Every
hedge pays the full hub-discovery latency cost; the header's purpose
(*bound the discovery cycle to a single 1002 round-trip per operation*)
is defeated for everyone except the lucky hedge that observes 1002 first.
This is the same gap .NET v3 had after its first hub-region header PR
([#5447](https://github.com/Azure/azure-cosmos-dotnet-v3/pull/5447)) and
**explicitly fixed** in
[PR #5815 — *Read Consistency Strategy: Adds hub region header for
LastCommittedWriteRegion strategy*](https://github.com/Azure/azure-cosmos-dotnet-v3/pull/5815),
in the section *"Hedging request with hub region header"*:
> When `CrossRegionHedgingAvailabilityStrategy` is active, the primary
> request may discover the hub region mid-flight … Hedged requests are
> clones of the original and run with their own `ClientRetryPolicy`
> instance, so they would normally repeat the entire hub discovery cycle
> independently. To avoid this redundant retry overhead, we introduce a
> `CrossRegionAvailabilityContext` — a lightweight shared object with a
> volatile `bool ShouldAddHubRegionProcessingOnlyHeader` flag. This
> context is injected into `RequestMessage.Properties` before the clone
> loop in `CrossRegionHedgingAvailabilityStrategy`. Since `Clone()`
> performs a shallow dictionary copy, all clones (primary + hedges)
> share the same `CrossRegionAvailabilityContext` reference. When the
> primary's `ClientRetryPolicy` sets the hub flag after 2× 404/1002, it
> also sets the flag on the shared context. Each hedge's
> `ClientRetryPolicy.OnBeforeSendRequest` reads this shared flag on
> every attempt and attaches the
> `x-ms-cosmos-hub-region-processing-only` header immediately — without
> needing to go through its own 404/1002 discovery.
The Rust orchestrator MUST adopt the equivalent design.
#### 9.5.2 Required design — `Arc<AtomicBool>` shared latch
Construct a single `Arc<AtomicBool>` in `execute_with_hedging()`
**before any hedge is spawned**, and thread it into every pipeline
invocation (primary and hedges). Concretely:
```rust
// In execute_with_hedging(), before the spawn-and-race loop:
let shared_hub_region_latch: Arc<AtomicBool> = Arc::new(AtomicBool::new(false));

// When constructing each hedge's pipeline params:
let retry_state = OperationRetryState::initial(/**/)
    .with_shared_hub_region_latch(shared_hub_region_latch.clone());
```

This requires a small extension to `OperationRetryState`:

```rust
pub struct OperationRetryState {
    // … existing fields …

    /// Per-operation hub-region-processing-only latch.
    /// Sticky for the lifetime of this `OperationRetryState`.
    pub hub_region_processing_only: bool,

    /// Cross-hedge shared latch. `Some(_)` only when this operation is
    /// running inside `execute_with_hedging()` — `None` on the
    /// non-hedged code path so today's allocator behavior is preserved.
    ///
    /// Mirrors .NET v3's `CrossRegionAvailabilityContext` injected into
    /// `RequestMessage.Properties` before the clone loop
    /// (azure-cosmos-dotnet-v3 PR #5815).
    pub shared_hub_region_latch: Option<Arc<AtomicBool>>,
}
```

The two existing call sites are then extended:

- **`build_session_retry_state`** (latch-set side). When the four trigger
  conditions fire and the new state sets `hub_region_processing_only = true`,
  also set `shared_hub_region_latch` to `true` if present (a plain `store`
  suffices for a monotonic 0 → 1 flag; no compare-and-swap is needed):

  ```rust
  if let Some(shared) = &retry_state.shared_hub_region_latch {
      shared.store(true, Ordering::Release);
  }
  ```

  `Release` is sufficient — the only thing being published is the bool
  itself; no further state hangs off it.

- **`apply_hub_region_header`** (header-emission side). Emit the header when
  either the per-state latch or the shared latch is set:

  ```rust
  let emit = retry_state.hub_region_processing_only
      || retry_state
          .shared_hub_region_latch
          .as_ref()
          .map(|shared| shared.load(Ordering::Acquire))
          .unwrap_or(false);
  if emit {
      transport_request.headers.insert(
          HeaderName::from_static(request_header_names::HUB_REGION_PROCESSING_ONLY),
          HeaderValue::from_static("True"),
      );
  }
  ```

This preserves the §5/§7/§8 invariants of
`HUB_REGION_PROCESSING_HEADER_SPEC.md` (account-level scope, data-plane
scope, idempotency / sticky semantics) on a per-hedge basis while
also propagating the discovery from any hedge to every other hedge as
soon as it happens.

#### 9.5.3 Eligibility — when the shared latch is actually wired

The shared latch is only populated when all of the following are
true at the start of `execute_with_hedging()`:

| Condition | Why |
| --- | --- |
| Operation is data-plane (`is_dataplane`) | Mirrors the §1.5 scope of `HUB_REGION_PROCESSING_HEADER_SPEC.md`. |
| Account is single-master (`!can_use_multiple_write_locations`) | Mirrors AC-4 of `HUB_REGION_PROCESSING_HEADER_SPEC.md`; multi-master accounts have a separate recovery path and the header is never emitted. |
| Hedging actually fans out (`regions.len() > 1`) | When the orchestrator falls through to the single-region path (§6.4), the per-state latch alone is sufficient — there is no second hedge to propagate to. |

When any condition fails, `shared_hub_region_latch` is `None` and the
existing per-state behavior from PR #4389 is preserved bit-for-bit.

#### 9.5.4 Interaction with §8.4 (local-only retries inside a hedge)

The §8.4 local-only-retry contract is unaffected by the shared latch:
the latch governs only which request header is emitted, not the
endpoint resolution. `ExcludeRegions` continues to pin each hedge to
its own region across retries; the shared latch merely ensures every
hedge's retries — within their pinned region — also carry the
hub-region hint once any hedge has observed a 1002. No new retry-trigger
paths or region-fallback edges are introduced.

#### 9.5.5 Concurrency notes

- `AtomicBool` with `Release`/`Acquire` ordering is sufficient — the
  bool is the only thing being shared and there is no dependent state.
  `Relaxed` would also be functionally correct (a single-flag race with a
  monotonic 0 → 1 transition), but `Release`/`Acquire` is preferred for
  reader clarity and costs nothing on any architecture the Rust SDK
  targets.
- The latch is monotonic 0 → 1 and never reset within an operation —
  matching the "sticky" semantics of the per-state latch
  (`components.rs:108-125`).
- The `Arc` is scoped to one outer `execute_with_hedging()` call, so it
  is dropped when the orchestrator returns (no global state, no leakage
  across operations).
- A losing hedge whose transport already responded after cancellation
  (cf. §14.2) may still observe and set the shared latch — this is
  benign: the orchestrator has already returned a winner, and no later
  reader of the soon-to-be-dropped `Arc` exists.

| `hedging_config_requires_explicit_step` | `threshold_step` must be provided explicitly; constructor does not default it from `threshold` |
| `region_exclusion_for_hedge_n` | Correct `ExcludeRegions` per hedge |
| `exclude_regions_honored_by_every_retry_trigger` | For each retry trigger class — PPAF write retry, PPCB markdown failback, transport-layer 503, throttling 429, session-token 1002 — fault-inject the trigger inside a hedge and assert the retry attempt does **not** route to a region listed in the hedge's `ExcludeRegions`. Encodes the §8.4 cross-cutting invariant; new retry triggers added in later phases must extend this test. |
| `app_cancel_preserves_hedge_diagnostics` | Cancel the application token mid-fan-out; assert the returned error carries `HedgeDiagnostics` from the most-advanced in-flight hedge (covers §6.5 invariant #6). |

Suggested addition — five §15.1 unit-test rows for §9.5.

One row per acceptance criterion in §9.5.2/§9.5.3, including the cross-hedge propagation test (shared_hub_region_latch_propagates_first_1002_to_other_hedges) which is the Rust counterpart of .NET PR #5815’s CrossRegionAvailabilityContext_PropagatesHubHeaderFlagToHedgedRequests test.

Suggested change
| `app_cancel_preserves_hedge_diagnostics` | Cancel the application token mid-fan-out; assert the returned error carries `HedgeDiagnostics` from the most-advanced in-flight hedge (covers §6.5 invariant #6). |
| `app_cancel_preserves_hedge_diagnostics` | Cancel the application token mid-fan-out; assert the returned error carries `HedgeDiagnostics` from the most-advanced in-flight hedge (covers §6.5 invariant #6). |
| `shared_hub_region_latch_initialized_when_eligible` | `execute_with_hedging()` invoked on a data-plane / single-master operation with `regions.len() > 1`; assert every hedge's `OperationRetryState.shared_hub_region_latch` is `Some(_)` and points to the same `Arc<AtomicBool>` instance (encodes §9.5.2 / §9.5.3). |
| `shared_hub_region_latch_none_on_non_hedged_path` | `execute_with_hedging()` falls through to `execute_operation_pipeline` because `regions.len() <= 1`; assert `shared_hub_region_latch` is `None` (preserves PR #4389 baseline allocator behavior — §9.5.2). |
| `shared_hub_region_latch_none_on_multi_master_or_metadata` | Multi-master *or* metadata pipeline; assert `shared_hub_region_latch` is `None` even when hedging fans out, matching `HUB_REGION_PROCESSING_HEADER_SPEC.md` §5 account-level / §1.5 data-plane gates (§9.5.3). |
| `shared_hub_region_latch_propagates_first_1002_to_other_hedges` | Drive 1002 through `build_session_retry_state` on hedge 0; assert (a) hedge 0's per-state `hub_region_processing_only` is `true`, (b) the shared `Arc<AtomicBool>` is `true`, (c) on the next transport attempt, hedge 1 and hedge 2 — whose per-state latches are still `false` — both have `apply_hub_region_header` emit the header. Encodes the §9.5 cross-hedge invariant and is the Rust counterpart of .NET PR #5815's `CrossRegionAvailabilityContext_PropagatesHubHeaderFlagToHedgedRequests` test. |
| `shared_hub_region_latch_no_1002_emits_no_header` | No hedge observes 1002; assert no hedge calls `apply_hub_region_header` with the header set. |

| `hedging_with_ppcb` | 503 on Region A reads; PPCB enabled | PPCB and hedging both apply; circuit breaker tripped AND hedge succeeds |
| `hedging_cancels_losers` | Delay on Region A | Region B wins; verify Region A task cancelled (hit_count ≤ expected) |
| `hedging_failback_to_primary` | Region A initially slow, then fast | First few reads hedged; after threshold tightened, primary wins again |
| `hedging_exclude_regions_under_503_retry` | Region B inside hedge returns 503 (triggers transport retry) while Region C is healthy and excluded by that hedge's `ExcludeRegions` | Hedge B's retry stays pinned to Region B (does NOT fall back to Region C) — fault-injection counterpart to the §8.4 invariant unit test. |

Suggested addition — one §15.2 fault-injection integration test for §9.5.

Fault-injects 1002 on the primary's first attempt against Region A and asserts the header propagates to the Region-B hedge — even though Region B never observes 1002 itself. The end-to-end counterpart of the §15.1 unit test, and the direct Rust analogue of .NET PR #5815's emulator-level coverage.

Suggested change
| `hedging_exclude_regions_under_503_retry` | Region B inside hedge returns 503 (triggers transport retry) while Region C is healthy and excluded by that hedge's `ExcludeRegions` | Hedge B's retry stays pinned to Region B (does NOT fall back to Region C) — fault-injection counterpart to the §8.4 invariant unit test. |
| `hedging_exclude_regions_under_503_retry` | Region B inside hedge returns 503 (triggers transport retry) while Region C is healthy and excluded by that hedge's `ExcludeRegions` | Hedge B's retry stays pinned to Region B (does NOT fall back to Region C) — fault-injection counterpart to the §8.4 invariant unit test. |
| `hedging_hub_region_header_propagates_across_hedges` | 2-region SM data-plane account; fault-inject `404/1002` on the primary's first attempt against Region A, healthy 200 on Region B after threshold | Primary's retry against Region A emits `x-ms-cosmos-hub-region-processing-only: True` (per-state latch) **and** the hedge spawned against Region B emits the same header on every attempt — without itself ever observing a 1002 (per the shared `Arc<AtomicBool>` from §9.5). Encodes the cross-hedge propagation invariant under fault injection; counterpart of .NET PR #5815's `CrossRegionAvailabilityContext_PropagatesHubHeader…` emulator tests. |

Labels: Cosmos (The azure_cosmos crate), high-availability

Status: Changes Requested